Hmmm...the orthogonality thesis is pretty simple to state, so I don't necessarily think it has been grossly misunderstood. The bad reasoning in Fallacy 4 seems to come from a more general phenomenon with classic AI Safety arguments, where they do hold up, but only with some caveats and/or more precise phrasing. So I guess "bad coverage" could apply to the extent that popular sources don't go in depth enough.
I do think the author presented good summaries of Bostrom's and Russell's viewpoints. But then they immediately jump to a "special sauce" type argument. (Quoting the full thing just in case)
The thought experiments proposed by Bostrom and Russell seem to assume that an AI system could be “superintelligent” without any basic humanlike common sense, yet while seamlessly preserving the speed, precision and programmability of a computer. But these speculations about superhuman AI are plagued by flawed intuitions about the nature of intelligence. Nothing in our knowledge of psychology or neuroscience supports the possibility that “pure rationality” is separable from the emotions and cultural biases that shape our cognition and our objectives. Instead, what we’ve learned from research in embodied cognition is that human intelligence seems to be a strongly integrated system with closely interconnected attributes, including emotions, desires, a strong sense of selfhood and autonomy, and a commonsense understanding of the world. It’s not at all clear that these attributes can be separated.
I really don't understand where the author is coming from with this. I will admit that the classic paperclip maximizer example is pretty far-fetched, and maybe not the best way to explain the orthogonality thesis to a skeptic. I prefer more down-to-earth examples like, say, a chess bot with plenty of compute to look ahead, but its goal is to protect its pawns at all costs instead of its king. It will pursue its goal intelligently but the goal is silly to us, if what we want is for it to be a good chess player.
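To make the chess bot example concrete, here is a minimal sketch in Python. It is not real chess: the state, the two moves, and the names (step, best_plan, protect_king, protect_pawns) are all hypothetical and made up for illustration. The point is only that the same search procedure can be handed either goal and will pursue it equally competently.

```python
from itertools import product

# Hypothetical toy position, not real chess: (pawns_left, king_alive).
START = (8, True)

# Assumption for illustration: every turn the king is under attack, and the
# bot must either block with a pawn (losing it) or keep its pawns back
# (losing the king).
MOVES = ("block_with_pawn", "keep_pawns_back")


def step(state, move):
    """Apply one move to the toy position."""
    pawns, king_alive = state
    if move == "block_with_pawn":
        return (max(pawns - 1, 0), king_alive)
    if move == "keep_pawns_back":
        return (pawns, False)
    raise ValueError(f"unknown move: {move}")


def best_plan(objective, depth=3):
    """Exhaustively search all plans of length `depth` and return the one
    whose final position scores highest under `objective`."""
    def final_state(plan):
        state = START
        for move in plan:
            state = step(state, move)
        return state

    return max(product(MOVES, repeat=depth),
               key=lambda plan: objective(final_state(plan)))


def protect_king(state):
    """The goal we actually want: keep the king alive, then keep pawns."""
    pawns, king_alive = state
    return (king_alive, pawns)


def protect_pawns(state):
    """The silly goal: keep as many pawns as possible, ignore the king."""
    pawns, _king_alive = state
    return pawns


print(best_plan(protect_king))   # blocks with a pawn every turn
print(best_plan(protect_pawns))  # keeps pawns back every turn, king is lost
```

The search code never changes between the two calls; only the objective does. Giving it more lookahead (a larger depth) makes it better at whichever goal it was handed, not better at picking a goal we would endorse.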
I feel like the author's counterargument would make more sense if they framed it as an outer alignment objection like "it's exceedingly difficult to make an AI whose goal is to maximize paperclips unboundedly, with no other human values baked in, because the training data is made by humans". And maybe this is also what their intuition was, and they just picked on the orthogonality thesis since it's connected to the paperclip maximizer example and easy to state. Hard to tell.
It would be nice if AI Safety were less disorganized, and had a textbook or something. Then, a researcher would have a hard time learning about the orthogonality thesis without also hearing a refutation of this common objection. But a textbook seems a long way away...
I mean...sure...but again, this does not affect the validity of my counterargument. Like I said, I'm making the counterargument as strong as possible by saying that even if the non-brain parts of the body were to add 2-100x computing power, this would not restrict our ability to scale up NNs to get human-level cognition. Obviously this still holds if we replace "2-100x" with "1x".
The advantage of "2-100x" is that it is extraordinarily charitable to the "embodied cognition" theory—if (and I consider this to be extremely low probability) embodied cognition does turn out to be highly true in some strong sense, then "2-100x" takes care of this in a way that "~1x" does not. And I may as well be extraordinarily charitable to the embodied cognition theory, since "Bitter lesson" type reasoning is independent of its veracity.
This claim is false (as in, the probability that it is true is vanishingly close to zero, unless the human brain uses supernatural elements). All of the motor drivers except for the most primitive reflexes (certain spinal reflexes) are in the brain. You can say that, for all practical purposes, 100% of the computational power the body has is in the brain.
I agree with your intuition here, but this doesn't really affect the validity of my counterargument. I should have stated more clearly that I was computing a rough upper bound: assuming embodied cognition is true, the non-brain parts of the body might add an extra 2, 10, or 100 times computing power. Even under the very generous assumption that they add 100 times computing power (which seems vanishingly unlikely), this still doesn't mean that embodied cognition refutes the idea that simply scaling up a NN with sufficient compute will produce human-level cognition.
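One way to see why a constant multiplier doesn't change the conclusion is to express it as a number of compute doublings. The sketch below is just arithmetic on the 2x, 10x, and 100x figures from the upper bound; the helper name extra_doublings is made up for illustration, and nothing here assumes how long a doubling takes.

```python
import math


def extra_doublings(multiplier):
    """Number of compute doublings needed to absorb a constant-factor
    overhead, i.e. log base 2 of the multiplier."""
    return math.log2(multiplier)


# Multipliers taken from the upper-bound argument: the non-brain parts of
# the body add 1x (nothing), 2x, 10x, or 100x computing power.
for multiplier in (1, 2, 10, 100):
    print(f"{multiplier:>3}x overhead -> "
          f"{extra_doublings(multiplier):.2f} extra doublings")
```

Even the 100x case comes out to fewer than seven doublings, so it shifts the compute requirement by a fixed amount rather than putting it out of reach of scaling.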
What do you do to keep up with AI Safety / ML / theoretical CS research, to the extent that you do? And how much time do you spend on this? For example, do you browse arXiv, Twitter, ...?
A broader question I'd also be interested in (if you're willing to share) is how you allocate your working hours in general.
What's your take on "AI Ethics", as it appears in large tech companies such as Google or Facebook? Is it helping or hurting the general AI safety movement?
You've appeared on the 80,000 Hours podcast two times. To the extent that you remember what you said in 2018-19, are there any views you communicated then which you no longer hold now? Another way of asking this question is—do you still consider those episodes to be accurate reflections of your views?
According to your internal model of the problem of AI safety, what are the main axes of disagreement researchers have?