If you believe "there'll probably be warning shots", that's an argument against "someone will get to build It", but not an argument against "if someone built It, everyone would die" (where "It" specifically means "an AI smart enough to confidently outmaneuver all humanity, built by methods similar to today's, where AIs are 'organically grown' in hard-to-predict ways").
It's a bit of both.
Suppose there are no warning shots. A hypothetical AI that's a bit weaker than humanity but still awfully impressive doesn't do anything at all that manifests an intent to harm us. That could mean:
I take Yudkowsky and Soares to put all the weight on #2 and #3 (with, based on their scenario, perhaps more of it on #2).
I don't think that's right. I think if we have reached the point where an AI really could plausibly start and win a war with us and it doesn't do anything nasty, there's a fairly good chance we're in #1. We may not even really understand how we got into #1, but sometimes things just work out.
I'm not saying this is some kind of great strategy for dealing with the risk; the scenario I'm describing is one where there's a real chance we all die, and I don't think you get a strong signal until you get into the range where the AI might win, which is a bad range. But it's still very different from imagining the AI will inherently wait to strike until it has ironclad advantages.
Because LLMs are already avoiding being shut down: https://arxiv.org/abs/2509.14260 .
Very interesting, thanks. As I said in the review, I wish there was more of this kind of thing in the book.
If your terminal goal is to enjoy watching a good movie, you can't achieve it if you're dead/shut down.
If your terminal goal is for you to watch the movie, then sure. But if your terminal goal is that the movie be watched, then shutting you down might well be perfectly consistent with it.
Ok, let's say there is an "in between" period, and let's say we win the fight against a misaligned AI. After the fight, we will still be left with the same alignment problems, as other people in this thread pointed out. We will still need to figure out how to make safe, benevolent AI, because there is no guarantee that we will win the next fight, and the fight after that, and the one after that, etc.
At that point, the shut down argument is no longer speculative, and you can probably actually do it.
To be clear, I'm not saying that's a good plan if you can foresee all the developments in advance. But, if you're uncertain about all of it, then there seems likely to be a period of time, before it's necessarily too late, when a lot of that uncertainty gets resolved.
No, I can't. And I suspect that if the authors conducted a more realistic political analysis, the book might just be called "Everyone's Going to Die."
But, if you're trying to come up with an idea that's at least capable of meeting the magnitude of the asserted threat, then you'd consider things like:
And then you just have to bite the bullet and accept that if these entail a risk of a nuclear war with China, then you fight a nuclear war with China. I don't think either of those would really work out either, but at least they could work out.
If there is some clever idea out there for how to achieve an AI shutdown, I suspect it involves some way of ensuring that developing AI is economically unprofitable. I personally have no idea how to do that, but unless you cut off the financial incentive, someone's going to do it.
Yea, I get that.
That said, they're clearly writing the book for this moment, and so it would be reasonable to give some space to what's going on with AI at this moment and what is likely to happen within the foreseeable future (however long that is). Book sales/readership follow a rapidly decaying exponential, so the fact that such information might well be outdated to the point of irrelevance in a few years shouldn't really hold them back.
But, in those cases, it's most likely better for the AI to wait, and it will know that it's better to wait, until it gets more powerful.
But why? People foolishly start wars all the time, including in specific circumstances where it would be much better to wait.
(A counterargument here is "an AI might want to launch a pre-emptive strike before other more powerful AIs show up", which could happen. But, if we win that war, we're still left with "the sort of tools that can constrain a near-human superintelligence, would not obviously apply to a much smarter AI", and we still have to solve the same problems.)
Or, having fought a "war" with an AI, we have relatively clear, non-speculative evidence about the consequences of continuing AI development. And that's the point where you might actually muster the political will to cut that off in the future and take the steps necessary for that to really work.
Did the book convince you that if superintelligence is built in the next 20 years (however that happens, if it does, and for at least some sensible threshold-like meaning of "superintelligence"), then there is at least a 5-10% chance that as a result literally everyone will literally die?
I'm much more in the world of Knightian uncertainty here (i.e., it could happen but I have no idea how to quantify that) than in one where I feel like I can reasonably collapse it into a clear, probabilistic risk. I am persuaded that this is something that cannot be ruled out.
I have the sense that rationalists think there's a very important distinction between "literally everyone will die" and, say, "the majority of people will suffer and/or die." I do not share that sense, and to me, the burden of proof set by the title is unreasonably high.
I'll assent to the statement that there's at least a 10% chance of something very bad happening, where "very bad" means >50% of people dying or experiencing severe suffering or something equivalent to that.
I think this kind of claim is the crux for motivating some sort of global ban or pause on rushed advanced general AI development in the near future (as an input to policy, separate from the difficulty of actually making this happen). Or for not being glad that there is an "AI race" (even if it's very hard to mitigate). So it's interesting whether your "not sure on existential risk" takeaway denies or affirms this claim.
Give me a magic, zero-side-effect pause button, and I'll hit it instantly.
Are these actually costly actions to any meaningful degree? In the context of the amount of money sloshing around the AI space, hiring even "lots" of safety researchers seems like a rounding error.
I may misunderstand the commitments you're referring to, but I think these are all purely internal? And thus not really commitments at all.
This seems to presume that I have some well-formed views on how AI labs compare, and I don't have those. All I really know about Meta is that they're behind and doing open source. I wouldn't even know where to start an analysis of their relative level of moral integrity. So far as it goes (and, again, this is just the view of someone who reads what breaks through in mainstream news coverage), I have a very clear sense that OpenAI is run by compulsive liars, but not much more to go on beyond that, other than a general sense that people in the industry do a lot of hype.
I'm deliberately not looking this up; I'm just telling you my impression of this phenomenon. I'm coming up with three cases of it (my recollection is maybe garbled) that broke through into my media universe:
And then, beyond that, you seem to have a lot of people signing these open letters with no cost attached. For something like this to break through, it needs to be (in my estimation at least) large numbers of people acting in a coordinated way and leaving the industry entirely.
I'd analogize it to politics. In any given presidential administration, you have one or two people who get really worked up and resign angrily and then go on TV attacking their former bosses. That's just to be expected and doesn't really reflect anything beyond the fact that sometimes people have strong reactions or particularized grievances or whatever. The thing that (should) wake you up is when this is happening at scale.
Only steps that carry meaningful financial consequences. I agree that any individual researcher can send a credible signal by quitting and giving up their stock, at least to the extent they don't just immediately go into a similarly compensated position. But, you're always left with the counter-signal from all the other researchers not doing that.
On a more institutional level, it would have to be something that actually threatens the valuation of the companies.