Sounds like a (much better than original) explanation of Igor Mel'čuk's "structural model" vs. "functional model". An old topic in linguistics and, arguably, other cognitive sciences.
Infertility does not entail non-producing of hormones (the most obvious examples being vasectomy in males and the operation on tubes what's-its-name in females). It is pretty unlikely that COVID-19 actually castrates its victims; it is testable, though, by measuring levels of testosterone and estrogenes.
suppoesd - should read supposed
I wish it came with an explanation what _exactly_ Impatience and Hybris virtues entail (given that both are generally described as non-virtues but I do seem to have the feeling that they can be good; same works for Laziness, but here I believe I have better understanding already).
"the agent would lack a nuanced understanding of what we consider terrible" - isn't it the whole narrative for Eliezer's genie tales? While having #2 as a separate request is good, failure to follow #1 can still be catastrophic enough because computers think faster, so our formal "staying in control" may not matter enough.
Oh, then sorry about the RNN attack ;)
Well, no. In particular, if you feed the same sound input to linguistic module (PF) and to the module of (say, initially visual) perception, the very intuition behind Fodorian modules is that they will *not* do the same - PF will try to find linguistic expressions similar to the input whereas the perception module will try to, well, tell where the sound comes from, how loud it is and things like that.
This memoizing seems similar to "dynamic programming" (which is, semi-predictably, neither quite dynamic nor stricto sensu programming). Have you considered that angle?
1. "My understanding is that we can do things like remember a word by putting it on loop using speech motor control circuits" - this is called phonological loop in psycholinguistics (psychology) and is NOT THE SAME as working memory - in fact, tests for working memory usually include reading something aloud precisely to occupy the circuits and not let the test subject take advantage of their phonological loop. What I mean by working memory is the number of things one can hold in their mind simultaneously captured by "5+-2" work and Daneman's tests - whatever the explanation is.
2. Fodorian modules are, by definition, barely compatible with CCA. And the Zeitgeist of theoretical linguistics leads me to think that when you use RNN to explain something you're cheating your way to performance instead of explaining what goes on (i.e. to think that brain ISN'T an RNN or a combination thereof - at least not in an obvious sense). Thus we don't quite share neurological assumptions - though bridging to a common point may well be possible.
Allowing to specify another overseer? Not to generalize from fiction, but have you even seen Spider-Man: Away from home? The new overseer may well turn out to be a manipulator who convinced Hugh to turn over the reins - and this is much more likely than a manipulator that can influence every decision of Hugh. Thus AI should probably have a big sparkling warning sign of NOT CHANGING THE OVERSEER, maybe unless an "external observer" party approves - though this is somewhat reminiscent of "turtles all the way down" manipulating several observers is trivially more difficult.
Also, SIMPLE case of natural language? The fact that current NLP works on strings and neural nets and other most likely wrong assumptions about language kinda suggests that it is not simple.
On the latter: yes, this is part of the question but not the whole question. See addendum.
On the former: technically not true. If we take "human values" as "values averaged between different humans" (not necessarily by arithmetical mean, of course) they may be vastly different from "is this good from my viewpoint?".
On the bracketed part: yeah, that too. And our current morals may not be that good judging by our metamorals.
Again, I want to underscore that I mention this as a theoretical possibility not so improbable as to make it not worth considering - not as an unavoidable fact.