Chapter 5, “Its Favorite Things,” starts with Yudkowsky’s “Correct-Nest parable” about intelligent aliens who care a lot about the exact number of stones found in their nests.
Immediately after the parable, on page 82:
Most alien species, if they evolved similarly to how known biological evolution usually works, and if given a chance to have things the way they liked them most, probably would not choose a civilization where all their homes contained a large prime number of stones. There are just a lot of other ways to be; there are a lot of other directions one could steer. Much like predicting that your next lottery ticket won’t be a winning one, this is an easy call.
Similarly, most powerful artificial intelligences, created by any method remotely resembling the current methods, would not choose to build a future full of happy, free people. We aren't saying this because we get a kick out of being bleak. It's just that those powerful machine intelligences will not be born with preferences much like ours.
This is just a classic “counting argument” against alignment efforts being successful, right?
I recall Alex Turner (TurnTrout) arguing that at least some commonly made counting arguments are wrong (Many arguments for AI x-risk are wrong), and quoting Nora Belrose and Quintin Pope arguing the same (Counting arguments provide no evidence for AI doom). Some people in the comments, such as Evan Hubinger, seem to disagree, but as a layperson I found the discussion too technical to follow.
In any case, the version of the counting argument in the book seems simple enough that even as a layperson I can tell it's wrong. To me it clearly proves too much.
Insofar as Yudkowsky and Soares are saying here that an ASI created by any method remotely resembling current methods will likely not choose to build a future full of happy, free people simply because the possible preferences an ASI could have vastly outnumber the narrow subset that would lead it to build such a future, I think the argument is wrong.
This counting observation is a reason to think that the preferences an ASI ends up with might not be the preferences its creators try to train into it (so maybe the "no evidence" in the linked post title above is too strong): the target preferences are indeed a narrow target, and narrow targets are easier to miss than broad ones. But surely the counting observation alone is not sufficient to conclude that ASI creators will fail to hit their narrow target; you would need further reasons to conclude that.
Yes, in my language it's a *random potshot* fallacy.
One would be the random potshot version of the Orthogonality Thesis, where there is an even chance of hitting any mind in mindspace, and therefore a high chance of hitting an eldritch, alien mind. But equiprobability is only one way of turning possibilities into probabilities, and not a particularly realistic one. A random potshot isn't analogous to the probability density you get from the deliberate act of building a certain type of AI, even without knowing much about what it will turn out to be.
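To make that concrete, here is a minimal illustrative sketch (the numbers and the measure $\mu$ are my own assumptions, not anything from the book or the linked posts). Let $\Omega$ be a set of $N$ possible preference-profiles a trained AI could end up with, and let $T \subset \Omega$ be the narrow "aligned" subset, with $|T| = k \ll N$. Under the equiprobable, random-potshot measure,

$$P_{\mathrm{uniform}}(T) = \frac{k}{N} \approx 0,$$

so the counting argument goes through. But under a training-induced measure $\mu$ that concentrates mass near what the developers optimize for, say $\mu(T) = 0.9$,

$$P_{\mu}(T) = 0.9,$$

even though the counting facts ($k \ll N$) are unchanged. The counting observation constrains the answer only if one also argues that the real measure is close to uniform, which is exactly the step the quoted passage skips.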
Does this just change the problem to one of corrigibility? If the target is narrow but the AI can be guided toward it, that's good. If the target is narrow and the AI cannot be guided effectively, then it's predictably not going to hit the target.
I think you have to assume at least one of incorrigibility, very rapid takeoff, or deception for a doom argument to go through.
*If Anyone Builds It, Everyone Dies* was published ten days ago, and in all that time nobody created a general discussion post for the book on LessWrong. Maybe there's a good reason for that, but I don't know what it is, and I personally would value having a discussion post like this to comment on while rereading the book, so here it is.
"This is a review of the reviews" pointed out that it's weird for people who think AI extinction risk is significant to write reviews of IABIED consisting exclusively of a bunch of disagreements with the book without clearly stating upfront that the situation we find ourselves in is insane:
If you think there's a 1 in 20 chance it could be so over, it feels to me the part where people are not doing the ‘yes the situation is insane’ even if that is immediately followed up with ‘im more hopeful than them tbc’ is weird.
I agree. Notably, however, this thread is not for complete reviews of the book! So it's not weird to just comment with your miscellaneous thoughts below without giving context on your views on AI risk or your overall take on the book.
As Steven Byrnes said:
Basically, I’m in favor of people having nitpicky high-decoupling discussion on lesswrong, and meanwhile doing rah rah activism action PR stuff on twitter and bluesky and facebook and intelligence.org and pauseai.info and op-eds and basically the entire rest of the internet and world. Just one website of carve-out. I don’t think this is asking too much!