I wish it came with an explanation of what _exactly_ the virtues of Impatience and Hubris entail (given that both are generally described as vices, yet I do have the feeling that they can be good; the same goes for Laziness, but there I believe I already have a better understanding).
"the agent would lack a nuanced understanding of what we consider terrible": isn't that the whole point of Eliezer's genie stories? While having #2 as a separate request is good, failing at #1 can still be catastrophic, because computers think faster, so our formal "staying in control" may not count for much.
Oh, then sorry about the RNN attack ;)
Well, no. In particular, if you feed the same sound input to the linguistic module (PF) and to a perception module (say, one initially dedicated to vision), the very intuition behind Fodorian modules is that they will *not* do the same thing: PF will try to find linguistic expressions similar to the input, whereas the perception module will try to, well, tell where the sound comes from, how loud it is, and so on.
This memoizing seems similar to "dynamic programming" (which is, semi-predictably, neither quite dynamic nor stricto sensu programming). Have you considered that angle?
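To make the comparison concrete, here is a minimal sketch in Python. The Fibonacci function is purely an illustrative assumption, not something from the discussion above. Memoization caches the results of a recursive definition on demand (top-down), while dynamic programming in the textbook sense fills a table of subresults in dependency order (bottom-up); both rely on reusing overlapping subproblems.

```python
from functools import lru_cache


# Top-down memoization: the recursive definition stays as written,
# and each subresult is cached the first time it is computed.
@lru_cache(maxsize=None)
def fib_memo(n: int) -> int:
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)


# Bottom-up dynamic programming: fill a table of subresults
# from the smallest case upward, so every lookup is already available.
def fib_table(n: int) -> int:
    table = [0, 1] + [0] * max(0, n - 1)
    for i in range(2, n + 1):
        table[i] = table[i - 1] + table[i - 2]
    return table[n]


# Both compute the same values; only the evaluation order differs.
assert all(fib_memo(k) == fib_table(k) for k in range(30))
```

The design difference is mostly about who decides the order of computation: with memoization the recursion discovers which subproblems are needed, with a table the programmer fixes the order in advance.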
1. "My understanding is that we can do things like remember a word by putting it on loop using speech motor control circuits": this is called the phonological loop in psycholinguistics (psychology), and it is NOT THE SAME as working memory. In fact, working-memory tests usually include reading something aloud precisely to occupy those circuits and keep the test subject from relying on their phonological loop. What I mean by working memory is the number of things one can hold in mind simultaneously, as captured by the "5±2" work and Daneman's tests, whatever the explanation turns out to be.
2. Fodorian modules are, by definition, barely compatible with CCA. And the Zeitgeist of theoretical linguistics leads me to think that when you use an RNN to explain something, you are cheating your way to performance instead of explaining what goes on (i.e., it leads me to think that the brain ISN'T an RNN or a combination of RNNs, at least not in any obvious sense). So we don't quite share neurological assumptions, though finding common ground may well be possible.
Allowing the user to specify another overseer? Not to generalize from fiction, but have you seen Spider-Man: Far From Home? The new overseer may well turn out to be a manipulator who convinced Hugh to hand over the reins, and this is much more likely than a manipulator who can influence every one of Hugh's decisions. So the AI should probably carry a big sparkling warning sign saying NOT CHANGING THE OVERSEER, perhaps unless an "external observer" party approves. This is somewhat reminiscent of "turtles all the way down", but manipulating several observers is clearly harder than manipulating one.
Also, the SIMPLE case of natural language? The fact that current NLP works with strings, neural nets, and other most likely wrong assumptions about language kinda suggests that it is not simple.
On the latter: yes, this is part of the question but not the whole question. See addendum.
On the former: technically not true. If we take "human values" as "values averaged across different humans" (not necessarily by arithmetic mean, of course), they may be vastly different from "is this good from my viewpoint?".
On the bracketed part: yeah, that too. And our current morals may not be that good, judged by our own metamorals.
Again, I want to underscore that I mention this as a theoretical possibility that is not so improbable as to be not worth considering, not as an unavoidable fact.
I would think that the former are the _mechanism_ of the latter - though, as they say, "don't quote me on that".
There is an interesting question of whether, if many things are modules, there is also a non-modular part, the "general intelligence" part, which does not share those properties. Perhaps unsurprisingly, there is no consensus (though my intuition says there is a GI part).
Also, it seems that different modules might use the same (common) working memory, though this is not set in stone. It depends, in particular, on your analysis of language: if late Chomsky is right, only phonology (PF) and perhaps semantics (LF) are modular, whereas syntax uses our general recursive ability, which is why it uses general working memory.
This led me to think: why do we even believe that human values are good? Perhaps typical human behaviour, amplified by the capabilities of a superintelligence, would actually destroy the universe. I don't personally find this very likely (that's why I never posted it before), but given that almost all AI safety work is built, one way or another, around "how to check that AI's values are convergent with human values", perhaps something else should be tried: for example, re-simulating history (actual, human history) from a given starting point (say, the Roman Principate or 1945) with actors assigned values different from human values (but standing in similar relationships to each other, if applicable) and finding what leads to better results (in particular, to us not being destroyed by 2020). All with the usual sandbox precautions, of course.
(Addendum: Of course, pace "fragility of value", we should keep some inheritance from our metamorals. But we don't actually know how compatible our morals (and systems in "reliable inheritance" from them) are with our metamorals, especially in an environment as extreme as superintelligence.)
On the second point: I misunderstood you; now I see what you're talking about. If the Fodorian view of modules is right, the neocortical module(s) still wouldn't be "conscious". The received wisdom, as I learned it, says that modules are:
1) Automatic (one cannot consciously change how they work, except by cutting off their input), hence susceptible to illusions/wrong analyses/...;
2) Autonomous (consciousness only "sees" their outputs; a module is a black box to its owner. These two properties are related but distinct, yet something that has both can hardly be called "conscious");
3) Inherited, with a critical period of fine-tuning (that's basically what you called a time window).
There were a few more points, but I have (obviously) forgotten them. That brings me to your first point: I can't point to a textbook right away, but this was part of several courses I took (Psychology of Cognitive Processes at Moscow State University, Fundamental and Applied Linguistics program; Language, Music, and Cognition at NYI 2016, nyi.spb.ru).