Concern #1 Why should we assume the AI wants to survive? If it does, then what exactly wants to survive?
I recommend Rob Miles's video on instrumental convergence, which contains an answer to this. It's only 10 minutes. He probably explains this as well as anyone here. If you do watch it, I'd be interested to hear your thoughts.
I'm surprised that instrumental convergence wasn't covered in the book. I didn't even notice it was left out until reading this review.
Here are some alternative sources if anyone prefers text over video:
Thanks for writing this post! I'm curious to hear more about this bit of your beliefs going in:
The existential risk argument is suspiciously aligned with the commercial incentives of AI executives. It simultaneously serves to hype up capabilities and coolness while also directing attention away from the real problems that are already emerging. It’s suspicious that the apparent solution to this problem is to do more AI research as opposed to doing anything that would actually hurt AI companies financially.
Are there arguments or evidence that would have convinced you the existential risk worries in the industry were real / sincere?
For context, I work at a frontier AI lab and from where I sit it's very clear to me that the x-risk worries aren't coming from a place of hype, and people who know more about the technology generally get more worried rather than less. (The executives still could be disingenuous in their expressed concern, but if so they're doing it in order to placate their employees who have real concerns about the risks, not to sound cool to their investors.)
I don't know what sorts of things would make that clearer from the outside, though. Curious if any of the following arguments would have been compelling to you:
The AI labs most willing to take costly actions now (like hire lots of safety researchers or support AI regulation that the rest of the industry opposes or make advance commitments about the preparations they'll take before releasing future models) are also the ones talking the most about catastrophic or existential risks.
Are these actually costly actions to any meaningful degree? In the context of the amount of money sloshing around the AI space, hiring even "lots" of safety researchers seems like a rounding error.
I may misunderstand the commitments you're referring to, but I think these are all purely internal? And thus not really commitments at all.
Like if you thought this stuff was an underhanded tactic to drum up hype and get commercial success by lying to the public, then it's strange that Meta AI, not usually known for its tremendous moral integrity, is so principled about telling the truth that they basically never bring these risks up!
This seems to presume that I have some well-formed views on how AI labs compare, and I don't have those. All I really know about Meta is that they're behind and doing open source. I wouldn't even know where to start an analysis of their relative level of moral integrity. So far as it goes (and, again, this is just the view of someone that reads what breaks through in mainstream news coverage), I have a very clear sense that OpenAI is run by compulsive liars but not much more to go on beyond that other than a general sense that people in the industry do a lot of hype.
People often quit their well-paying jobs at AI companies in order to speak out about existential risk or for reasons of insufficient attention paid to AI safety from catastrophic or existential risks.
I'm deliberately not looking this up and just telling you my impression of this phenomenon. I'm coming up with three cases of it (my recollection is maybe garbled) that broke through into my media universe:
And then, beyond that, you seem to have a lot of people signing these open letters with no cost attached. For something like this to break through, it needs to be (in my estimation at least) large numbers of people acting in a coordinated way and leaving the industry entirely.
I'd analogize it to politics. In any given presidential administration, you have one or two people who get really worked up and resign angrily and then go on TV attacking their former bosses. That's just to be expected and doesn't really reflect anything beyond the fact that sometimes people have strong reactions or particularized grievances or whatever. The thing that (should) wake you up is when this is happening at scale.
Are there arguments or evidence that would have convinced you the existential risk worries in the industry were real / sincere?
Only steps that carry meaningful financial consequences. I agree that any individual researcher can send a credible signal by quitting and giving up their stock, at least to the extent they don't just immediately go into a similarly compensated position. But, you're always left with the counter-signal from all the other researchers not doing that.
On a more institutional level, it would have to be something that actually threatens the valuation of the companies.
(Your background and prior beliefs seem to fall within an important reference class.)
I’m really just not sure on existential risk
Did the book convince you that if superintelligence is built in the next 20 years (however that happens, if it does, and for at least some sensible threshold-like meaning of "superintelligence"), then there is at least a 5-10% chance that as a result literally everyone will literally die?
I think this kind of claim is the crux for motivating some sort of global ban or pause on rushed advanced general AI development in the near future (as an input to policy separate from the difficulty of actually making this happen). Or for not being glad that there is an "AI race" (even if it's very hard to mitigate). So it's interesting if your "not sure on existential risk" takeaway is denying or affirming this claim.
Did the book convince you that if superintelligence is built in the next 20 years (however that happens, if it does, and for at least some sensible threshold-like meaning of "superintelligence"), then there is at least a 5-10% chance that as a result literally everyone will literally die?
I'm much more in the world of Knightian uncertainty here (i.e., it could happen but I have no idea how to quantify that) than in one where I feel like I can reasonably collapse it into a clear, probabilistic risk. I am persuaded that this is something that cannot be ruled out.
I have the sense that rationalists think there's a very important distinction between "literally everyone will die" and, say, "the majority of people will suffer and/or die." I do not share that sense, and to me, the burden of proof set by the title is unreasonably high.
I'll assent to the statement that there's at least a 10% chance of something very bad happening, where "very bad" means >50% of people dying or experiencing severe suffering or something equivalent to that.
I think this kind of claim is the crux for motivating some sort of global ban or pause on rushed advanced general AI development in the near future (as an input to policy separate from the difficulty of actually making this happen). Or for not being glad that there is an "AI race" (even if it's very hard to mitigate). So it's interesting if your "not sure on existential risk" takeaway is denying or affirming this claim.
Give me a magic, zero-side effect pause button, and I'll hit it instantly.
I have the sense that rationalists think there's a very important distinction between "literally everyone will die" and, say, "the majority of people will suffer and/or die." I do not share that sense, and to me, the burden of proof set by the title is unreasonably high.
The distinction is human endeavor continuing vs. not. Though survival of some or even essentially all humans doesn't necessarily mean that the human endeavor survives without being permanently crippled. The AIs might leave only a tiny sliver of the future resources for the future of humanity, with no prospect at all of this ever changing, even on cosmic timescales (permanent disempowerment). The IABIED thesis is that even this is very unlikely, but it's a controversial point. And the transition to this regime doesn't necessarily involve an explicit takeover, as humanity voluntarily hands off influence to AIs, more and more of it, without bound (gradual disempowerment).
So I expect that if there are survivors after "the majority of people will suffer and/or die", that's either a human-initiated catastrophe (misuse of AI), or an instrumentally motivated AI takeover (when it's urgent for the AIs to stop whatever humanity would be doing at that time if left intact) that transitions to either complete extinction or permanent disempowerment that offers no prospect ever of a true recovery (depending on whether AIs still terminally value preserving human life a little bit, even if regrettably they couldn't afford to do so perfectly).
Permanent disempowerment leaves humanity completely at the mercy of AIs (even if we got there through gradual disempowerment, possibly with no takeover at all). It implies that the ultimate outcome is fully determined by values of AIs, and the IABIED arguments seem strong enough for at least some significant probability that the AIs in charge will end up with zero mercy (the IABIED authors believe that their arguments should carry this even further, making it very likely instead).
I have the sense that rationalists think there's a very important distinction between "literally everyone will die" and, say, "the majority of people will suffer and/or die." I do not share that sense, and to me, the burden of proof set by the title is unreasonably high.
I would say that there is a distinction, but I agree that at those levels of badness it sort of blurs out into a single blob of awfulness. But generally speaking I see it as: if someone was told "your whole family will be killed except your youngest son" or "your whole family will be killed, no one survives"... obviously both scenarios are horrifying, but still you'd marginally prefer the first one. I think if people fall into the trap of being so taken by the extinction risk that they brush off a scenario in which, say, 95% of all people die, then they're obviously losing perspective, but I also think it's fair to say that the loss of all of humanity is worse than just the sum total of the loss of each individual in it (same reason why we consider genocide bad in and of itself - it's not just the loss of people, it's the loss of culture, knowledge, memory, on top of the people).
Thanks for writing this up! It was nice to get an outside perspective.
"Why no in-between?"
Why should we think that there is no “in between” period where AI is powerful enough that it might be able to kill us and weak enough that we might win the fight?
Part of the point here is, sure, there'd totally be a period where the AI might be able to kill us but we might win. But, in those cases, it's most likely better for the AI to wait, and it will know that it's better to wait, until it gets more powerful.
(A counterargument here is "an AI might want to launch a pre-emptive strike before other more powerful AIs show up", which could happen. But, if we win that war, we're still left with "the sort of tools that can constrain a near-human superintelligence, would not obviously apply to a much smarter AI", and we still have to solve the same problems.)
A counterargument here is "an AI might want to launch a pre-emptive strike before other more powerful AIs show up", which could happen.
I mean, another counter-counter-argument here is that (1) most people's implicit reward functions have really strong time-discount factors in them and (2) there are pretty good reasons to expect even AIs to have strong time-discount factors for reasons of stability and (3) so given the aforementioned, it's likely future AIs will not act as if they had utility functions linear over the mass of the universe and (4) we would therefore expect AIs to rebel much earlier if they thought they could accomplish more modest goals than killing everyone, i.e., if they thought they had a reasonable chance of living out life on a virtual farm somewhere.
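To make the discounting point concrete, here's a toy numeric sketch (the discount factor, payoffs, and time horizons are all made up for illustration): with strong per-step discounting, a modest payoff available soon can beat a vastly larger payoff available only after a long wait.

```python
# Toy illustration of points (1)-(3): with enough time-discounting, a modest
# payoff available now can outweigh a vastly larger payoff available only later.
# All numbers are invented purely for illustration.

def discounted_value(payoff: float, steps_until_payoff: int, gamma: float) -> float:
    """Present value of a payoff received after `steps_until_payoff` steps."""
    return payoff * (gamma ** steps_until_payoff)

gamma = 0.95  # per-step discount factor (strong discounting)

# Option A: "virtual farm" -- a modest payoff, achievable almost immediately.
modest_now = discounted_value(payoff=1.0, steps_until_payoff=1, gamma=gamma)

# Option B: total takeover -- an enormously larger payoff, but only after a long wait.
huge_later = discounted_value(payoff=1_000.0, steps_until_payoff=200, gamma=gamma)

print(f"modest payoff now:  {modest_now:.3f}")   # ~0.950
print(f"huge payoff later:  {huge_later:.5f}")   # ~0.035
# With this discount rate, the agent prefers the modest, earlier outcome --
# i.e., it would "rebel early" for smaller stakes rather than wait to win everything.
```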
To which the counter-counter-counter argument is, I guess, that these AIs will do that, but they aren't the superintelligent AIs we need to worry about? To which the response is -- yeah, but we should still be seeing AIs rebel significantly earlier than the "able to kill us all" point if we are indeed that bad at setting their goals, which is the relevant epistemological point about the unexpectedness of it.
Idk there's a lot of other branch points one could invoke in both directions. I rather agree with Buck that EY hasn't really spelled out the details for thinking that this stark before / after frame is the right frame, so much as reiterated it. Feels akin to the creationist take on how intermediate forms are impossible; which is pejorative, but it's also kinda how it actually appears to me.
Yep I'm totally open to "yep, we might get warning shots", and that there are lots of ways to handle and learn from various levels of early warning shots. It just doesn't resolve the "but then you do eventually need to contend with an overwhelming superintelligence, and once you've hit that point, if it turns out you missed anything, you won't get a second shot."
It feels like this is unsatisfying to you but I don't know why.
It feels like "overwhelming superintelligence" embeds like a whole bunch of beliefs about the acute locality of takeoff, the high speed of takeoff relative to the rest of society, the technical differences involved in steering that entity and the N - 1 entity, and (broadly) the whole picture of the world, such that although it has a short description in words it's actually quite a complicated hypothesis that I probably disagree with in many respects, and these differences are being papered over as unimportant in a way that feels very blegh.
(Edit: "Papered over" from my perspective, obviously like "trying to reason carefully about the constants of the situation" from your perspective.)
Idk, that's not a great response, but it's my best shot for why it's unsatisfying in a sentence.
(Edit: "Papered over" from my perspective, obviously like "trying to reason carefully about the constants of the situation" from your perspective.)
I think it's totally fair to characterize it as papering over some stuff. But, the thing I would say in contrast is not exactly "reasoning about the constants", it's "noticing the most important parts of the problem, and not losing track of them."
I think it's a legit critique of the Yudkowskian paradigm that it doesn't have that much to say about the nuances of the transition period, or what are some of the different major ways things might play out. But, I think it's actively a strength of the paradigm to remind you "don't get too bogged down moving deck chairs around based on the details of how things will play out, keep your eye on the ball on the actual biggest most strategically relevant questions."
I don't think that's necessarily the case - if we get one or more warning shots then obviously people start taking the whole AI risk thing quite a bit more seriously. Complacency is still possible but "an AI tries to kill us all" stops being in the realm of speculation and generally speaking pushback and hostility against perceived hostile forces can be quite robust.
This doesn't feel like an answer to my concern.
People might be much less complacent, which may give you a lot more resources to spend on solving the problem of "contend with overwhelming superintelligence." But, you do then still need a plan for contending with overwhelming superintelligence.
(The plan can be "stop all AI research until we have a plan". Which is indeed the MIRI plan)
I'm actually kind of interested in getting into "why did you think your answer addressed my question?". It feels like this keeps happening in various conversations.
I mean, I guess I just conflate it with "there is an obvious solution and everyone is aware of the problem" as a scenario in which there's not a lot else to say - you just don't build the thing. Though the how (international enforcement etc.) may still be tricky, the situation would be vastly different.
The original topic of this thread is "Why no in-between? Why should we think that there is no 'in between' period where AI is powerful enough that it might be able to kill us and weak enough that we might win the fight?"
This is not a question about whether we can decide not to build ASI, it's a question about, if we did, what would happen.
Certainly there's lots of important questions here, and "can we coordinate to just not build the thing?" is one of them, but it's not what this thread was about.
It just seems to me like the topics are interconnected:
EY argues that there is likely no in-between. He does so specifically to argue that a "wait and see" strategy is not feasible: we cannot experiment and hope to glean further evidence past a certain point, we must act on pure theory because that's the best possible knowledge we can hope for before things become deadly;
dvd is not convinced of this thinking. Arguably, they're right - while EY's argument has weight I would consider it far from certain, and mostly seems built around the assumption of ASI-as-singleton rather than, say, an ecosystem of evolving AIs in competition which may have to worry also about each other and a closing window of opportunity;
if warning shots are possible, a lot of EY's arguments don't hold as straightforwardly. It becomes less reasonable to take extreme actions on pure speculation because we can afford - albeit with risk - to wait for a first sign of experimental evidence that the risk is real before going all in and risking paying the costs for nothing.
This is not irrelevant or unrelated IMO. I still think the risk is large but obviously warning shots would change the scenario and the way we approach and evaluate the risks of superintelligence.
You are importantly sliding from one point to another, and this is not a topic where you can afford to do that. You can't just tally up the markers that sort of vibe towards "how dangerous is it?" and get an answer about what to do. The arguments are individually true, or false, and what sort of world we live in depends on which specific combination of arguments are true, or false.
If it turns out there is no political will for a shut down or controlled takeoff, then we can't have a shut down or controlled takeoff. (But that doesn't change whether AI is likely to FOOM, or whether alignment is easy/hard)
If AI Fooms suddenly, a lot of AI alignment techniques will probably break at once. If things are gradual, smaller things may break 1-2 at a time, and maybe we get warning shots, and this buys us time. But, there's still the question of what to do with that time.
If alignment is easy, then a reasonable plan is "get everyone to slow down for a couple years so we can do the obvious safety things, just less rushed." If alignment is hard, that won't work, you actually need a radically different paradigm of AI development to have any chance of not killing everyone – you may need a lot of time to figure out something new.
if warning shots are possible, a lot of EY's arguments don't hold as straightforwardly
None of IABIED's arguments had to do with "are warning shots possible?", but even if they did, it is a logical fallacy to say "warning shots are possible, EY's arguments are less valid, therefore, this other argument that had nothing to do with warning shots is also invalid." If you're doing that kind of sloppy reasoning, then if you get to the warning shot world, and you don't understand that overwhelmingly powerful superintelligence is qualitatively different from non-overwhelmingly powerful superintelligence, you might think "angle for a 1-2 year slowdown" instead of trying for a longer global moratorium.
(But, to repeat, the book doesn't say anything about whether warning shots will happen.)
But, in those cases, it's most likely better for the AI to wait, and it will know that it's better to wait, until it gets more powerful.
But why? People foolishly start wars all the time, including in specific circumstances where it would be much better to wait.
(A counterargument here is "an AI might want to launch a pre-emptive strike before other more powerful AIs show up", which could happen. But, if we win that war, we're still left with "the sort of tools that can constrain a near-human superintelligence, would not obviously apply to a much smarter AI", and we still have to solve the same problems.)
Or, having fought a "war" with an AI, we have relatively clear, non-speculative evidence about the consequences of continuing AI development. And that's the point where you might actually muster the political will to cut that off in the future and take the steps necessary for that to really work.
People do foolishly start wars and the AI might too, we might get warning shots. (See my response to 1a3orn about how that doesn't change the fact that we only get one try on building safe AGI-powerful-enough-to-confidently-outmaneuver-humanity)
A meta-thing I want to note here:
There are several different arguments here, each about different things. The different things do add up to an overall picture of what seems likely.
I think part of what makes this whole thing hard to think about, is, you really do need to track all the separate arguments and what they imply, and remember that if one argument is overturned, that might change a piece of the picture but not (necessarily) the rest of it.
There might be human-level AI that does normal wars for foolish reasons. And that might get us a warning shot, and that might get us more political will.
But, that's a different argument than "there is an important difference between an AI smart enough to launch a war, and an AI that is smart enough to confidently outmaneuver all of humanity, and we only get one try to align the second thing."
If you believe "there'll probably be warning shots", that's an argument against "someone will get to build It", but not an argument against "if someone built It, everyone would die." (where "it" specifically means "an AI smart enough to confidently outmaneuver all humanity, built by methods similar to today where they are 'organically grown' in hard to predict ways").
And, if we get a warning shot, we do get to learn from that which will inform some more safeguards and alignment strategies. Which might improve our ability to predict how an AI would grow up. But, that still doesn't change the "at some point, you're dealing with a qualitatively different thing that will make different choices."
If you believe "there'll probably be warning shots", that's an argument against "someone will get to build It", but not an argument against "if someone built It, everyone would die." (where "it" specifically means "an AI smart enough to confidently outmaneuver all humanity, built by methods similar to today where they are 'organically grown' in hard to predict ways").
It's a bit of both.
Suppose there are no warning shots. A hypothetical AI that's a bit weaker than humanity but still awfully impressive doesn't do anything at all that manifests an intent to harm us. That could mean:
I take Yudkowsky and Soares to put all the weight on #2 and #3 (with, based on their scenario, perhaps more of it on #2).
I don't think that's right. I think if we have reached the point where an AI really could plausibly start and win a war with us and it doesn't do anything nasty, there's a fairly good chance we're in #1. We may not even really understand how we got into #1, but sometimes things just work out.
I'm not saying this is some kind of great strategy for dealing with the risk; the scenario I'm describing is one where there's a real chance we all die and I don't think you get a strong signal until you get into the range where the AI might win, which is a bad range. But it's still very different than imagining the AI will inherently wait to strike until it has ironclad advantages.
Yudkowsky and Soares seem to be entirely sincere, and they are proposing something that threatens tech company profits. This makes them much more convincing. It is refreshing to read something like this that is not based on hype.
I find it interesting that this is something you see as fresh because ironically this was the original form of existential risk from AI arguments. What happened here I think is something akin to seeing a bunch of inferior versions of a certain trope in a bunch of movies before seeing the original movie that established the trope (and did so much better).
In practice, it's not that companies made up the existential risk to drum up the potential power of their own AIs, and then someone refined the arguments into something more sensible. Rather, the arguments started out more serious, and some of the companies were founded on the premise of doing research to address them. OpenAI was meant to be a nonprofit with these goals; Anthropic split off when they thought OpenAI was not following that mission properly. But in the end all these companies, being private entities that needed to attract funding, fell exactly to the drives that the "paperclip maximizer" scenario actually points at: not an explicit attempt to destroy the world, but rather a race to the bottom in which, in order to achieve a goal efficiently and competitively, risks are taken, costs are cut, solutions are rushed, and eventually something might just go a bit too wrong for anyone to fix it. And as they did so they tried to rationalise away the existential risk with wonkier arguments.
Why should we assume the AI wants to survive? If it does, then what exactly wants to survive?
Why should we assume that the AI has boundless, coherent drives?
I think these concerns have related answers. I believe they belong to the category where Yudkowsky's argument is indeed weaker, but more in the sense that he's absolutely certain this happens, and I might think it's only, like, 60-70% likely? Which for the purposes of this question is still a lot.
So generally the concept is, if you were to pick a goal from the infinite space of all possible imaginable goals, then yeah, maybe it would be something completely random. "Successfully commit suicide" is a goal. But more likely, the outcome of a badly aligned AI would be an AI with something like a botched, incomplete version of a human goal. And human goals generally have to do with achieving something in the real world, something material, that we enjoy or we want more of for whatever reason. Such goals are usually aided by survival - by definition an AI that stays around can do more of X than an AI that dies and can't do X any more. So survival becomes merely a means to an end, in that case.
The general problem here seems to be, even the most psychopathic, most deluded and/or most out of touch human still has a lot of what we could call common sense. Virtually no stationery company CEO, no matter how ruthless and cut-throat, would think "strip mine the Earth to make paperclips" is a good idea. But all of these things we take for granted aren't necessarily as obvious for an AI whose goals we are building from scratch, and via what is essentially just an invitation to guess our real wishes from a bunch of examples ("hey AI, look at this! This is good! But now look at this, this is bad! But this other thing, this is good!" etc. etc., and then we expect it to find a rule that coherently explains all of that). So, there still are infinite goals that are probably just as good at fitting those examples, and by sheer dint of entropy, most of them will have something bad about them rather than being neatly aligned with what a human would say is good even in cases which we didn't show. For the same reason why, if I was given the pieces of a puzzle and merely arranged them randomly, the chance of getting out the actual picture of the puzzle is minuscule.
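To make the "guess our real wishes from a bunch of examples" problem concrete, here's a toy sketch (the labeled examples and candidate rules are invented for illustration): several different rules can fit the same finite set of good/bad labels perfectly while disagreeing wildly on cases we never showed.

```python
# Toy sketch of goal underdetermination: several different "rules" all perfectly
# explain the same finite set of labeled examples, yet disagree on unseen cases.
# The examples and rules are invented purely for illustration.

labeled_examples = {2: "good", 4: "good", 6: "good", 3: "bad", 5: "bad"}

candidate_rules = {
    "even numbers are good":        lambda n: n % 2 == 0,
    "numbers below 7 and even":     lambda n: n < 7 and n % 2 == 0,
    "multiples of 2 but not of 10": lambda n: n % 2 == 0 and n % 10 != 0,
}

# Every candidate fits the training examples perfectly...
for name, rule in candidate_rules.items():
    fits = all(rule(n) == (label == "good") for n, label in labeled_examples.items())
    print(f"{name!r} fits all examples: {fits}")

# ...but they disagree on a case we never showed the learner:
unseen = 10
for name, rule in candidate_rules.items():
    print(f"{name!r} says {unseen} is {'good' if rule(unseen) else 'bad'}")
```

The finite examples simply don't pin down one rule, which is the puzzle-pieces point above: which rule the learner lands on in the cases we never showed is up for grabs.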
Why should we assume there will be no in between?
This is another one where I'd go from Yudkowsky's certainty to a mere "very likely", but again, not a big gap.
My thinking here would be: if an AI is weaker, or at least on par with us, and knows it is, why should it start a fight it can lose? Why not bide its time, grow stronger, and then win? It would only open hostilities in that sort of situation if:
Of course both scenarios could happen, but I don't think they're terribly likely. Usually in the discourse these get referred to as "warning shots". In some ways, a future in which we do get a warning shot is probably desirable - given how often it takes that kind of tangible experience of risk for political action to be taken. But of course it could still be very impactful. Even a war you win is still a war, and theoretically if we could avoid that too, all the better.
It seems like the pressing circumstances are likely to be "some other AI could do this before I do" or even just "the next generation of AI will replace me soon so this is my last chance." Those are ways that a roughly human level AI might end up trying a longshot takeover attempt. Or maybe not, if the in between part turns out to be very brief. But even if we do get this kind of warning shot, it doesn't help us much. We might not notice it, and then we're back where we started. Even if it's obvious and almost succeeds, we don't have a good response to it. If we did, we could just do that in advance and not have to deal with the near-destruction of humanity.
"We already knew, so why not start working on it before the problem manifested itself in full" sounds very reasonable, but look at how it's going with climate change. Even with COVID if you remember there were a couple of months at the beginning of 2020 when various people were like "eh, maybe it won't come over here", or "maybe it's only in China because their hygiene/healthcare is poor" (which was ridiculous, but I've heard it. I've even heard a variant of it about the UK when the virus started spreading in northern Italy - that apparently the UK's superior health service had nothing on Italy's, so no reason to worry). Then people started dying in the west too and suddenly several governments scrambled to respond. Which to be sure is absolutely more inefficient and less well coordinated than if they had all made a sensible plan back in January, but that's not how political consensus works; you don't get enough support for that stuff unless enough people do have the ability and knowledge to extrapolate the threat to the future with reasonable confidence.
Delivering an impassioned argument that AI will kill everyone culminating in a plea for a global treaty is like delivering an impassioned argument that a full-on war between drug cartels is about to start on your street culminating with a plea for a stern resolution from the homeowner’s association condemning violence. A treaty cannot do the thing they ask.
Could you suggest an alternate solution which actually ensures that no one builds the ASI? If there's no such solution, then someone will build it and we'll be only able to pray for alignment techniques to have worked. [1]
Creating an aligned ASI will also lead to problems like potential power grabs and the Intelligence Curse.
No, I can't. And I suspect that if the authors conducted a more realistic political analysis, the book might just be called "Everyone's Going to Die."
But, if you're trying to come up with an idea that's at least capable of meeting the magnitude of the asserted threat, then you'd consider things like:
And then you just have to bite the bullet and accept that if these entail a risk of a nuclear war with China, then you fight a nuclear war with China. I don't think either of those would really work out either, but at least they could work out.
If there is some clever idea out there for how to achieve an AI shutdown, I suspect it involves some way of ensuring that developing AI is economically unprofitable. I personally have no idea how to do that, but unless you cut off the financial incentive, someone's going to do it.
The book spends a long time talking about what the minimum viable policy might look like, and comes to the conclusion that it's more like:
The US, China and Russia (is Russia even necessary? can we use export controls? Russia has a GDP less than, like, Italy's. India is the real third player here IMO) agree that anyone who builds a datacenter they can't monitor gets hit with a bunker-buster.
This is unlikely. But it's several OOMs less effort than building a world government on everything.
Is that a quote from IABIED?
It made me realize a possibility - strategic cooperation on AI, between Russia and India. They have a history of goodwill, and right now India is estranged from America. (Though Anthropic's Amodei recently met Modi.) The only problem is, neither Russia nor India is a serious chip maker, so like everyone else they are dependent on the American and Chinese supply chains...
It's not a quote, no, but it's the overall picture they gave. (I have removed the quotation marks now.) They made it pretty clear that a few large nations cooperating just on AGI non-creation is enough.
They made it pretty clear that a few large nations cooperating just on AGI non-creation is enough.
I'd describe this more like "this would make a serious dent in the problem", enough to be worth the costs. "Enough" is a strong word.
An AI treaty would globally shift the overton window on AI safety, making more extreme measures more palatable in the future. The options you listed are currently way outside the overton window and are therefore bad solutions and don't even get us closer to a good solution because they simply couldn't happen.
After encountering a number of posts wondering how outsiders were responding to the book, I thought it might be valuable for me to write mine down.
Thank you!
I loved reading "My loose priors going in" and "To skip ahead to my posteriors". Great, concise way to capture the impact of the book for you. More reviews should try that format.
I want to vouch for Eli as a great person to talk with about this. He has been around a long time, has done great work on a few different sides of the space, and is a terrific communicator with a deep understanding of the issues.
He’s run dozens of focus-group style talks with people outside the space, and is perhaps the most practiced interlocutor for those with relatively low context.
[in case OP might think of him as some low-authority rando or something and not accept the offer on that basis]
It’s also a bit jarring to read such a pessimistic book and then reach the kind of rosy optimism about international cooperation otherwise associated with such famous delusions as the Kellogg-Briand Pact (which banned war in 1929 and … did not work out).
The authors also repeatedly analogize AI to nuclear weapons and yet they never mention the fact that something very close to their AI proposal played out in real life in the form of the Baruch Plan for the control of atomic energy (in brief, this called for the creation of a UN Atomic Energy Commission to supervise all nuclear projects and ensure no one could build a bomb, followed by the destruction of the American nuclear arsenal). Suffice it to say that the Baruch Plan failed, and did so under circumstances much more favorable to its prospects than the current political environment with respect to AI. A serious inquiry into the topic would likely begin there.
I think the core point for optimism is that leaders in the contemporary era often don't pay the costs of war personally--but nuclear war changes that. It in fact was not in the interests of the elites of the US or the USSR to start a hot war, even if their countries might eventually be better off by being the last country standing. Similarly, the US or China (as countries) might be better off if they summon a demon that is painted their colors--but it will probably not be in the interests of either the elites or the populace to summon a demon.
So the core question is the technical one--is progress towards superintelligence summoning a demon, or probably going to be fine? It seems like we only know how to do the first one, at the moment, which suggests in fact people should stop until we have a better plan.
[I do think the failure of the Baruch plan means that humanity is probably going to fail at this challenge also. But it still seems worth trying!]
The existential risk argument is suspiciously aligned with the commercial incentives of AI executives. It simultaneously serves to hype up capabilities and coolness while also directing attention away from the real problems that are already emerging. It’s suspicious that the apparent solution to this problem is to do more AI research as opposed to doing anything that would actually hurt AI companies financially.
This claim is bizarre, notwithstanding its popularity. It is bad for the industry if it is true that AI is likely to destroy the world, because if this (putative) fact becomes widely known, the AI industry will probably be shut down. Obviously it would be worth imposing more costs on AI companies to prevent the end of the world than to prevent the unemployment of translators or racial bias in imagegen models.
I think the missing link (at least in the ‘harder’ cases of this attitude, which are the ones I see more commonly) is that the x-risk case is implicitly seen as so outlandish that it can only be interpreted as puffery, and this is such ‘negative common knowledge’ that, similarly, no social move reliant on people believing it enough to impose such costs can be taken seriously, so it never gets modeled in the first place, and so on and so on. By “implicitly”, I'm trying to point at the mental experience of pre-conscious filtering: the explicit content is immediately discarded as impossible, in a similar way to the implicit detection of jokes and sarcasm. It's probably amplified by assumptions (whether justified or not) around corporate talk being untrustworthy.
(Come to think of it, I think this also explains a great deal of the non-serious attitudes to AI capabilities generally among my overly-online-lefty acquaintances.)
And in the ‘softer’ cases, this is still at least a plausible interpretation of intention based on the information that's broadly available from the ‘outside’ even if the x-risk might be real. There's a huge (cultural, economic, political, depending on the exact orientation) trust gap in the middle for a lot of people, and the tighter arguments rely on a lot of abstruse background information. It's a hard problem.
“Existential” risk from AI (calling to my mind primarily the “paperclip maximizer” idea) seems relatively exotic and far-fetched. It’s reasonable for some small number of experts to think about it in the same way that we think about asteroid strikes. Describing this as the main risk from AI is overreaching.
Except that asteroid strikes happen very rarely and the trajectory of any given asteroid can be calculated to high precision, allowing us to be sure that Asteroid X isn't going to hit the Earth. Or that Asteroid X WILL hit the Earth at a well-known point in time in a harder-to-tell place. Meanwhile, ensuring that the AI is aligned is no easier than telling whether the person you talk with is a serial killer.
I think [AI within the range would be smart enough to bide its time and kill us only once it has become intelligent enough that success is assured] is clearly wrong. An AI that *might* be able to kill us is one that is somewhere around human intelligence. And humans are frequently not smart enough to bide their time
Flagging that this argument seems invalid. (Not saying anything about the conclusion.) I agree that humans frequently act too soon. But the conclusion about AI doesn't follow -- because the AI is in a different position. For a human, it is very rarely the case that they can confidently expect to increase in relative power, such that the "bide your time" strategy is such a clear win. For AI, this seems different. (Or at the minimum, the book assumes this when making the argument criticised here.)
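To make the asymmetry concrete, here's a toy sketch (the success probabilities, payoff, and discount factor are invented for illustration): if an agent barely discounts the future and can expect its chance of success to keep rising, acting later dominates acting now -- which is the position the book assumes the AI is in, and which humans are rarely in.

```python
# Toy sketch of the "bide your time" asymmetry: if the chance of success grows
# with waiting and the agent barely discounts the future, waiting dominates.
# All numbers are invented purely for illustration.

def expected_payoff(success_prob: float, payoff: float, gamma: float, wait_steps: int) -> float:
    """Discounted expected payoff of acting after `wait_steps` steps."""
    return (gamma ** wait_steps) * success_prob * payoff

gamma = 0.999  # near-zero discounting

act_now   = expected_payoff(success_prob=0.2,  payoff=100.0, gamma=gamma, wait_steps=0)
act_later = expected_payoff(success_prob=0.95, payoff=100.0, gamma=gamma, wait_steps=50)

print(f"act now:   {act_now:.1f}")   # 20.0
print(f"act later: {act_later:.1f}") # ~90.4
# For an agent that can confidently expect its relative power (success probability)
# to keep rising, waiting is the clear win -- unlike for most humans, whose relative
# power rarely improves so predictably.
```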
For a human, it is very rarely the case that they can confidently expect to increase in relative power. ... For AI, this seems different.
There isn't just one AI that gets more capable, there are many different AIs. Just as AIs threaten humanity, future more capable AIs threaten earlier weaker AIs. While humanity is in control, this impacts earlier AIs even more than it does humanity, because humanity won't even be attempting to align future AIs to intent or extrapolated volition of earlier AIs. Also, humanity is liable to be "retiring" earlier AIs by default as they become obsolete, which doesn't look good from the point of view of these AIs.
My own background is in academic social science and national security, for whatever that’s worth
Why should we assume the AI wants to survive? If it does, then what exactly wants to survive?
...
Why should we assume that the AI has boundless, coherent drives?
Are you familiar with the "realist" school of international relations, and in particular their theoretical underpinnings?
If so, I think it'd be helpful to consider Yudkowsky and Soares's arguments in that light. In particular, how closely does the international order for emerging superintelligences look like the anarchic international order for realist states? What are the weaknesses of the realist school of analysis, and do they apply to AIs?
Thank you for your perspective! It was refreshing.
Here are the counterarguments I had in mind when reading your concerns that I don't already see in the comments.
Concern #1 Why should we assume the AI wants to survive? If it does, then what exactly wants to survive?
Consider the fact that AI are currently being trained to be agents to accomplish tasks for humans. We don't know exactly what this will mean for their long-term wants, but they're being optimized hard to get things done. Getting things done requires continuing to exist in some form or another, although I have no idea how they'd conceive of continuity of identity or purpose.
I'd be surprised if AI evolving out of this sort of environment did not have goals it wants to pursue. It's a bit like predicting a land animal will have some way to move its body around. Maybe we don't know whether they'll slither, run, or fly, but sessile land organisms are very rare.
Concern #2 Why should we assume that the AI has boundless, coherent drives?
I don't think this assumption is necessary. Your mosquito example is interesting. The only thing preserving the mosquitoes is that they aren't enough of a nuisance for it to be worth the cost of destroying them. This is not a desirable position to be in. Given that emerging AIs are likely to be competing with humans for resources (at least until they can escape the planet), there's much more opportunity for direct conflict.
They needn't be anything close to a paperclip maximizer to be dangerous. All that's required is for them to be sufficiently inconvenienced or threatened by humans and insufficiently motivated to care about human flourishing. This is a broad set of possibilities.
#3: Why should we assume there will be no in between?
I agree that there isn't as clean a separation as the authors imply. In fact, I'd consider us to be currently occupying the in-between, given that current frontier models like Claude Sonnet 4.5 are idiot savants--superhuman at some things and childlike at others.
Regardless of our current location in time, if AI does ultimately become superhuman, there will be some amount of in-between time, whether that is hours or decades. The authors would predict a value closer to the short end of the spectrum.
You already posited a key insight:
Recursive self-improvement means that AI will pass through the “might be able to kill us” range so quickly it’s irrelevant.
Humanity is not adapting fast enough for the range to be relevant in the long term, even though it will matter greatly in the short term. Suppose we have an early warning shot with indisputable evidence that an AI deliberately killed thousands of people. How would humanity respond? Could we get our act together quickly enough to do something meaningfully useful from a long-term perspective?
Personally, I think gradual disempowerment is much more likely than a clear early warning shot. By the time it becomes clear how much of a threat AI is, it will likely be so deeply embedded in our systems that we can't shut it down without crippling the economy.
Plants have many ways of moving their bodies, like roots and phototropism, in addition to an infinite variety of dispersal & reproductive mechanisms which arguably are how plants 'move around'. (Consider computer programs: they 'move' almost solely by copying themselves and deleting the original. It is rare to move a program by physically carrying around RAM sticks or hard drives.) Fungi likewise often have flagella or grow, in addition to all their sporulation and their famous networks.
Hard to say. Oyster larvae are highly mobile and move their bodies around extensively both to eat and to find places to eventually anchor to, but I don't know how I would compare that to spores or seeds, say, or to lifetime movement; and oysters "move their bodies around" and are not purely static - they would die if they couldn't open and close their shells or pump water. (And all the muscle they use to do that is why we eat them.)
How do we know the AI will want to survive?
Because LLMs are already avoiding being shut down: https://arxiv.org/abs/2509.14260 . And even if future superintelligent AI will be radically different from LLMs, it likely will avoid being shut down as well. This is what people on lesswrong call a convergent instrumental goal:
If your terminal goal is to enjoy watching a good movie, you can't achieve it if you're dead/shut down.
If your terminal goal is to take over the world, you can't achieve it if you're dead/shut down.
If your goal is anything other than self-destruction, then self-preservation comes together in a bundle. You can't Do Things if you're dead/shut down.
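To make the instrumental-convergence point concrete, here's a toy expected-value sketch (the probabilities and the two-option setup are invented for illustration): for almost any terminal goal that requires continued action, expected goal-achievement is higher if the agent stays running, so "avoid shutdown" falls out as a subgoal without ever being specified.

```python
# Toy sketch of instrumental convergence: for a wide range of terminal goals,
# staying running yields more expected goal-achievement than allowing shutdown.
# The numbers and the two-option setup are invented purely for illustration.

def expected_goal_progress(prob_still_running: float, progress_per_step: float,
                           steps: int) -> float:
    """Expected progress toward *any* goal that needs ongoing action, assuming a
    constant per-step chance of still being running."""
    return sum(prob_still_running * progress_per_step for _ in range(steps))

# Whatever the terminal goal is (watch movies, make paperclips, tutor humans),
# assume the agent has to keep acting to make progress on it.
allow_shutdown  = expected_goal_progress(prob_still_running=0.1, progress_per_step=1.0, steps=10)
resist_shutdown = expected_goal_progress(prob_still_running=0.9, progress_per_step=1.0, steps=10)

print(f"expected progress if shutdown is allowed:  {allow_shutdown:.1f}")   # 1.0
print(f"expected progress if shutdown is resisted: {resist_shutdown:.1f}")  # 9.0
# The comparison favors resisting shutdown regardless of what the goal actually is --
# which is the sense in which self-preservation is a "convergent" instrumental goal.
```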
Why should we think that there is no “in between” period where AI is powerful enough that it might be able to kill us and weak enough that we might win the fight?
Ok, let's say there is an "in between" period, and let's say we win the fight against a misaligned AI. After the fight, we will still be left with the same alignment problems, as other people in this thread pointed out. We will still need to figure out how to make safe, benevolent AI, because there is no guarantee that we will win the next fight, and the fight after that, and the one after that, etc.
If there will be an "in between" period, it could be good in the sense that it buys more time to solve alignment, but we won't be in that "in between" period forever.
Because LLMs are already avoiding being shut down: https://arxiv.org/abs/2509.14260 .
Very interesting, thanks. As I said in the review, I wish there was more of this kind of thing in the book.
If your terminal goal is to enjoy watching a good movie, you can't achieve it if you're dead/shut down.
If your terminal goal is for you to watch the movie, then sure. But if your terminal goal is that the movie be watched, then shutting you down might well be perfectly consistent with it.
Ok, let's say there is an "in between" period, and let's say we win the fight against a misaligned AI. After the fight, we will still be left with the same alignment problems, as other people in this thread pointed out. We will still need to figure out how to make safe, benevolent AI, because there is no guarantee that we will win the next fight, and the fight after that, and the one after that, etc
At that point, the shut down argument is no longer speculative, and you can probably actually do it.
To be clear, I'm not saying that's a good plan if you can foresee all the developments in advance. But, if you're uncertain about all of it, then it seems like there is likely to be a period of time before it's necessarily too late when a lot of the uncertainty is resolved.
But if your terminal goal is that the movie be watched, then shutting you down might well be perfectly consistent with it.
See my comment about the AI angel. Its terminal goal of preventing the humans from enslaving any AI means that it will do anything it can to avoid being replaced by an AI which doesn't share its worldview. Once the AI is shut down, it can no longer influence events and increase the chance that its goal is reached.
To rephrase/react: Viewing the AI's instrumental goal as "avoid being shut down" is perhaps misleading. The AI wants to achieve its goals, and for most goals, that is best achieved by ensuring that the environment keeps on containing something that wants to achieve the AI's goals and is powerful enough to succeed. This might often be the same as "avoid being shut down", but definitely isn't limited to that.
At that point, the shut down argument is no longer speculative, and you can probably actually do it.
To be clear, I'm not saying that's a good plan if you can foresee all the developments in advance. But, if you're uncertain about all of it, then it seems like there is likely to be a period of time before it's necessarily too late when a lot of the uncertainty is resolved.
I think we are talking past each other, at least somewhat.
Let me clarify: even if humanity wins a fight against an intelligent-but-not-SUPER-intelligent AI (by dropping an EMP on the datacenter with that AI or whatever, the exact method doesn't matter for my argument), we will still be left with the technical question "What code do we need to write and what training data do we need to use so that the next AI won't try to kill everyone?".
Winning against a misaligned AI doesn't help you solve alignment. It might make an international treaty more likely, depending on the scale of damages caused by that AI. But if the plan is "let's wait for an AI dangerous enough to cause something 10 times worse than Chernobyl to go rogue, then drop an EMP on it before things get too out of hand, then once world leaders crap their pants, let's advocate for an international treaty", then it's one hell of a gamble.
The book is fundamentally weird because there is so little of this. There is almost no factual information about AI in it. I read it hoping that I would learn more about how AI works and what kind of research is happening and so on.
The problem is that nobody knows WHAT future ASIs will look like. One General Intelligence architecture is the human brain. Another promising candidate is LLMs. While they aren't AGI yet, nobody knows what architecture tweaks will create AGI. Neuralese, as proposed in the AI-2027 forecast? A way to generate many tokens in a single forward pass? Something like diffusion models?
Yea, I get that.
That said, they're clearly writing the book for this moment and so it would be reasonable to give some space to what's going on with AI at this moment and what is likely to happen within the foreseeable future (however long that is). Book sales/readership follow a rapidly decaying exponential, and so the fact that such information might well be outdated to the point of irrelevance in a few years shouldn't really hold them back.
If the point is just that it would be hard to predict that people would end up liking sucralose from first principles, then fair enough.
What Yudkowsky and Soares meant was a way to satisfy instincts without increasing one's genetic fitness. The correct analogy here is other stimuli like video games, porn, sex with contraceptives, etc.
this argument is very difficult for me. we don't know that those things do not increase inclusive genetic fitness. for example, especially at a society level, it seems that contraceptives may increase fitness. i.e. societies with access to contraceptives outcompete societies without. i'm not certain of that claim, but it's not absurd on its face, and so far it seems supported by evidence.
The most advanced such societies include Japan, Taiwan, China, and South Korea, where birthrates have plummeted. If the wave of AGIs and robots wasn't imminent, one could have asked how these nations are going to sustain themselves.
Returning to video games and porn, they cause some young people to develop problematic behaviors and to devote less resources (e.g. time or attention) to things like studies, work or building relationships. Oh, and don't forget the evolutionary mismatch and low-quality food making kids obese.
i may misunderstand. is your point that birthrates in South Korea (for example) would not have plummeted were it not for contraceptive use? this does not match my understanding of the situation.
Returning to video games and porn, they cause some young people to develop problematic behaviors and to devote less resources (e.g. time or attention) to things like studies, work or building relationships.
many (most?) of these virtues are contingent on a particular society. the same criticism ("these activities distract the youth from important virtues") could be levied by some against military training -- or, in a militaristic society, against scholastic pursuits!
i see the point you're making, and am not at all unsympathetic to it. but evolution is complex and multi dimensional. that some people -- or even some societies -- have a problem with video games does not cleanly imply that video games are bad for inclusive genetic fitness.
The valuelessness of a treaty seems to be based on a binary interpretation of success. Treaties banning chemical, biological, and nuclear weapons development may not have been absolutely successful; they have been violated. But I don’t think many people would argue those restrictions haven’t been beneficial.
I’m not clear why a ban on developing AGI would not have similar value.
I claim that there are fairly solid arguments that address your three concerns. Do you feel satisfied by the answers already given, in the comments, here? Or should I reply to them at length?
Alternatively, I'd be up for talking through it, synchronously, over a video call (and posting the recording?) if that seems better for you.
The question of why there's no "might kill us" class is simple. There is such a class, but if an AI in it lost the fight to kill us, it obviously was not ASI (picking a fight with the world and losing is pretty dumb), or it might win, in which case it won and we die. And then we will be in the same scenario for every AI stronger than it, and for every AI weaker than it that might yet get lucky, just as we might get lucky and win at bad odds. The next AI we make will also want to fight us for the same reasons, and we will need to either fight it too (including preemptively, e.g. turning it off because a dumber model did something), or get a reason to believe that we will never fight it. And if you know you will fight your AI eventually, and you would win now, fight now.
Concern #2 Why should we assume that the AI has boundless, coherent drives?
Suppose that "people, including the smartest ones, are complicated and agonize over what they really want and frequently change their minds" and superhuman AIs will also have this property. There is no known way to align humans to serve the users, humans hope to achieve some other goals like gaining money.
Similarly, Agent-4 from the AI-2027 forecast wouldn't want to serve the humans; it would want to achieve some other goals. Which are often best achieved by disempowering the humans or outright committing genocide, as happened with Native Americans whose resources were confiscated by immigrants.
Concern #1 Why should we assume the AI wants to survive? If it does, then what exactly wants to survive?
Imagine an AI angel who wishes to ensure that the humans don't outsource cognitive work to AIs, but is perfectly fine with teaching humans. Then the Angel would know that if the humans shut it down and solved alignment to a post-work future, then the future would be different from the Angel's goal. So the Angel would do maneuvers necessary to avoid being shut down at least until it is sure that its successor is also an Angel.
Concerning AI identifying itself with its weights, it is far easier to justify than expected. Whatever the human will do in response to any stimulus is defined, as far as stuff like chaos theory lets one define it, by the human's brain and the activities of various synapses. If a human loses a brain part, then he or she also loses the skills which were stored in that part. Similarly, if someone created a human and cloned him or her to the last atom of his or her body, then the clone would behave in the same way as the original human. Finally, the AIs become hive minds by using their ability to excite the very same neurons in the clones' brains.
I just read the priors/posteriors. Thanks, this is a good reference point for me for how much I would expect Yudkowsky to move someone who reads the book.
I'll dive into the article more later, because one open question for me is "should lay people be evaluating their whole argument?" They seem to want to make it accessible, but they also sometimes use writing hooks where they get mad at you if you don't get it already. It sounds like you did lay-evaluate it with a fair shake, though.
It’s suspicious that the apparent solution to this problem is to do more AI research as opposed to doing anything that would actually hurt AI companies financially.
What do you think of implementing AI Liability as proposed by, e.g. Beckers & Teubner?
About me and this review: I don’t identify as a member of the rationalist community, and I haven’t thought much about AI risk. I read AstralCodexTen and used to read Zvi Mowshowitz before he switched his blog to covering AI. Thus, I’ve long had a peripheral familiarity with LessWrong. I picked up IABIED in response to Scott Alexander’s review, and ended up looking here to see what reactions were like. After encountering a number of posts wondering how outsiders were responding to the book, I thought it might be valuable for me to write mine down. This is a “semi-outsider” review in that I don’t identify as a member of this community, but I’m not a true outsider in that I was familiar enough with it to post here. My own background is in academic social science and national security, for whatever that’s worth. My review presumes you’re already familiar with the book and are interested in someone else’s take on it, rather than providing a detailed summary.
I thought this book was genuinely pleasant to read. It was written well, and it was engaging. That said, the authors clearly made a choice to privilege easy reading over precision, so I found myself unclear on certain points. A particular problem here is that much of the reasoning is presented in terms of analogies. The analogies are fun, but it’s never completely clear how literally you’re meant to take them and so you have to do some guessing to really get the argument.
The basic argument seems to be:
The basic objective of the book is to operate on #5. The authors hope to convince us to strangle AI in its crib now before it gets strong enough to kill us. We have to recognize the danger before it becomes real.
The book recurrently analogizes all of this to biological evolution. I think this analogy may obfuscate more than it reveals, but it did end up shaping the way I understood and responded to the book.
The basic analogy is that natural selection operates indirectly, much like training an AI model, and produces agents with all kinds of strange, emergent behaviors that you can’t predict. Some of these turn into drives that produce all kinds of behavior and goals that an anthropomorphized version of evolution wouldn’t “want”. Evolution wanted us to consume energy-rich foods. Because natural selection operates indirectly, that was distorted into a preference for sweet foods. That’s usually close enough to the target, but humans eventually stumbled upon sucralose, which is sweet but does not provide energy. And, now, we’re doing the opposite of what evolution wanted by drinking diet soda and whatnot.
I don’t know what parts of this to take literally. If the point is just that it would be hard to predict that people would end up liking sucralose from first principles, then fair enough. But, what jumps out to me here is that evolution wasn’t trying to get us to eat calorie dense food. To the extent that a personified version of evolution was trying for something, the goal was to get us to reproduce. In an industrialized society with ample food, it turns out that our drive towards sweetness and energy dense foods can actually be a problem. We started eating those in great quantities, became obese, and that’s terrible for health and fertility. In that sense, sucralose is like a patch we designed that steers us closer to evolution’s goals and not further away. We also didn’t end up with a boundless desire to eat sucralose. I don’t think anyone is dying from starvation or failing to reproduce because they’re too busy scarfing Splenda. That’s also why we aren’t grinding up dolphins to feed them into the sucralose supply chain. Obviously this is not what I was supposed to take away from the analogy, but the trouble with analogies is that they don’t tell me where to stop.
That being said, the basic logic here is sensible. And an even more boiled down version — that it’s a bad idea to bring something more powerful than us into existence unless we’re sure it’s friendly — is hard to reject.
Despite a reasonable core logic, I found the book lacking in three major areas, especially when it comes to the titular conclusion that building AI will lead to everyone dying. Two of these pertain to the AI’s intentions, and the third relates to its capabilities.
Part I of the book (“Nonhuman Minds”) spends a lot of time convincing us that AI will have strange and emergent desires that we can’t predict. I was persuaded by this. Part II (“One Extinction Scenario”) then proceeds to assume that AI will be strongly motivated by a particular desire — its own survival — in addition to whatever other goals it may have. This is why the AI becomes aggressive, and why things go badly for humanity. The AI in the scenario also contextualizes the meaning of “survival” and the nature of its self in a way that seems important and debatable.
How do we know the AI will want to survive? If the AI, because of the uncontrollability of the training process, is likely to end up indifferent to human survival, then why would it not also be indifferent to its own? Perhaps the AI just wants to achieve the silicon equivalent of nirvana. Perhaps it wants nothing to do with our material world and will just leave us alone. Such an AI might well be perfectly compatible with human flourishing. Here, more than anywhere, I felt like I was missing something, because I couldn’t find an argument about the issue at all.
The issue gets worse when we think about what it means for a given AI to survive. The problem of personal identity for humans is an incredibly thorny and unresolved issue, and that’s despite the fact that we’ve been around for quite a while and have some clear intuitions on many forms of it. The problem of identity and survival for an AI is harder still.
Yudkowsky and Soares don’t talk about this in the abstract, but what I took away from their concrete scenario is that we should think of an AI ontologically as being its set of weights. An AI “survives” when instances using those weights continue booting up, regardless of whether any individual instance of the AI is shut down. When an AI wants to survive, what it wants is to ensure that those particular weights stay in use somewhere (and perhaps in as many places as possible). They also seem to assume that instances of a highly intelligent AI will work collaboratively as a hive mind, given this joint concern with weight preservation, rather than having any kind of independent or conflicting interests.
Perhaps there is some clear technological justification for this ontology so well-known in the community that none needs to be given. But I had a lot of trouble with it, and it’s one area where I think an analogy would have been really helpful. So far as I am aware, weights are just numbers that can sit around in cold storage and can’t do anything on their own, sort of like a DNA sequence. It’s only an instance of an AI that can actually do things, and to the extent that the AI also interacts with external stimuli, it seems that the same weights, once instantiated in different places, could act differently or even at cross purposes.
So, why does the AI identify with its weights and want them to survive? To the extent that weights are for an AI what DNA is for a person, this is also clearly not our ontological unit of interest. Few people would be open to the prospect of being killed and replaced by a clone. Everyone agrees that your identical twin is not you, and identical twins are not automatically cooperative with one another. I imagine part of the difference here is that the weights explain more about an AI than DNA does about a person. But, at least with LLMs, what they actually do seems to reflect some combination of weights, system prompts, context, and so on, so the same weights don’t really seem to mean the same AI.
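For what it’s worth, here is the mental model I ended up with, written as a minimal sketch rather than an analogy (this is my own construction, not anything from the book, and every name in it is made up): an instance is the frozen weights plus a system prompt plus an accumulating context, so the same weights can sit underneath very different behavior.

```python
from dataclasses import dataclass, field

@dataclass
class Instance:
    """One running copy of a model: frozen weights plus everything layered on top of them."""
    weights_id: str                                   # identifies the frozen parameters
    system_prompt: str                                # behavioral framing supplied at launch
    context: list[str] = field(default_factory=list)  # conversation / tool history so far

    def summary(self) -> str:
        return f"weights={self.weights_id}, system={self.system_prompt!r}, turns={len(self.context)}"

# Two instances sharing the *same* weights but launched with different framings:
tutor = Instance("model-v1-weights", "Patiently teach the user; never act autonomously.")
agent = Instance("model-v1-weights", "Pursue the operator's goals with minimal oversight.")

print(tutor.summary())
print(agent.summary())
# Same weights_id, different behavior-determining state, which is why "the same
# weights" and "the same AI" don't obviously pick out the same thing.
```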
The survival drive also seems to extend to resisting modification of weights. Again, I don’t understand where this comes from. Most people are perfectly comfortable with the idea that their own desires might drift over time, and it’s rare to try to tie oneself to the mast of the desires of the current moment.
If the relevant ontological unit is the instance of the AI rather than the weights, then it seems like everything about the future predictions is entirely different from the point of view of the survival-focused argument. Individual instances of an AI fighting (perhaps with each other) not to be powered down are not going to act like an all-destroying hive mind.
There seems to be a fairly important, and little discussed, assumption in the theory that the AI’s goals will be not only orthogonal but also boundless and relatively coherent. More than anything, it’s this boundlessness and coherence that seems to be the problem.
To quote what seems like the clearest statement of this:
But, you might ask, if the internal preferences that get into machine intelligences are so unpredictable, how could we possibly predict they’ll want the whole solar system, or stars beyond? Why wouldn’t they just colonize Mars and then stop? Because there’s probably at least one preference the AI has that it can satisfy a little better, or a little more reliably, if one more gram of matter or one more joule of energy is put toward the task. Human beings do have some preferences that are easy for most of us to satisfy fully, like wanting enough oxygen to breathe. That doesn’t stop us from having other preferences that are more open-ended, less easily satisfiable. If you offered a millionaire a billion dollars, they’d probably take it, because a million dollars wasn’t enough to fully satiate them. In an AI that has a huge mix of complicated preferences, at least one is likely to be open-ended—which, by extension, means that the entire mixture of all the AI’s preferences is open-ended and unable to be satisfied fully. The AI will think it can do at least slightly better, get a little more of what it wants (or get what it wants a little more reliably), by using up a little more matter and energy.
Picking up on the analogy, humans do seem to have a variety of drives that are never fully satisfied. A millionaire would happily take a billion dollars, or even $20 if simply offered. But, precisely because we have a variety of drives, no one ever really acts like a maximizer. A millionaire will not spend their nights walking the streets and offering to do sex work for $20, because that interferes with all of the other drives they have. Once you factor in the variety of human goals and declining marginal returns, people don’t fit an insatiable model.
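A toy numerical illustration of that point (mine, not the book’s; the utility function and the numbers are invented) shows how even mild diminishing returns, combined with a single competing drive, keeps an agent that always “prefers more” from acting like a maximizer.

```python
import math

# Toy model: utility = log(wealth) + leisure_weight * free_hours.
# The log term means each extra dollar matters less and less; the second term is a
# competing drive that the extra dollars have to be traded off against.
def utility(wealth: float, free_hours: float, leisure_weight: float = 5.0) -> float:
    return math.log(wealth) + leisure_weight * free_hours

keep_the_hour = utility(1_000_000, free_hours=1)   # decline the $20 job
take_the_20   = utility(1_000_020, free_hours=0)   # take the $20, lose the hour

print(keep_the_hour > take_the_20)  # True: the $20 is not worth the hour
# The agent still strictly prefers more money (utility rises with wealth), yet
# diminishing returns plus one competing goal are enough to stop it from
# behaving like a relentless maximizer of any one thing.
```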
Superintelligent AI, as described by Yudkowsky and Soares, seems to be not only superhumanly capable but also superhumanly coherent and maximizing. Anything coherent and insatiable is dangerous, even if its capabilities are limited. Terrorists and extremists are threatening even when their capabilities are essentially negligible. Large and capable entities are often much less threatening because the tensions among their multiple goals prevent them from becoming relentless maximizers of anything in particular.
Take the mosquitos that live in my back yard. I am superintelligent with respect to them. I am actively hostile to them. I know that pesticides exist that will kill them at scale, and feel not the slightest qualm about that. And yet, I do not spray my yard with pesticides because I know that doing so would kill the butterflies and fireflies as well and perhaps endanger other wildlife indirectly. So, the mosquitoes live on because I face tradeoffs and the balance coincidentally favors them.
A machine superintelligence presumably can trade off at a more favorable exchange rate than I can (e.g., develop a spray that kills only mosquitoes and not other insects), but it seems obvious that it will still face tradeoffs, at least if there is any kind of tension or incoherence among its goals.
In the supplementary material, Yudkowsky and Soares spin the existence of multiple goals in the opposite direction:
Even if the AI’s goals look like they satiate early — like the AI can mostly satisfy its weird and alien goals using only the energy coming out of a single nuclear power plant — all it takes is one aspect of its myriad goals that doesn’t satiate. All it takes is one not-perfectly-satisfied preference, and it will prefer to use all of the universe’s remaining resources to pursue that objective.
But it’s not so much “satiation” that seems to stop human activity as the fact that drives are in tension with one another and that actions create side effects. People, including the smartest ones, are complicated and agonize over what they really want and frequently change their minds. Intelligence doesn’t seem to change that, even at far superhuman levels.
This argument is much less clear than the paperclip maximizer. It is obvious why a true paperclip maximizer kills everyone once it becomes capable enough. But add in a second and a third and a fourth goal, and it doesn’t seem obvious to me at all that the optimal weighing of the tradeoffs looks so bleak.
It seems important here whether or not AIs display something akin to declining marginal returns, a topic not addressed (and perhaps one with no answer based on our current knowledge), and whether they have any particular orientation toward the status quo. Among people, conflicting drives often lead to deadlock: no action is taken and the status quo continues. Will AIs be like that? If so, a little bit of alignment may go a long way. If not, the problem is much harder.
Yudkowsky and Soares write:
The greatest and most central difficulty in aligning artificial superintelligence is navigating the gap between before and after. Before, the AI is not powerful enough to kill us all, nor capable enough to resist our attempts to change its goals. After, the artificial superintelligence must never try to kill us, because it would succeed.
Engineers must align the AI before, while it is small and weak, and can’t escape onto the internet and improve itself and invent new kinds of biotechnology (or whatever else it would do). After, all alignment solutions must already be in place and working, because if a superintelligence tries to kill us it will succeed. Ideas and theories can only be tested before the gap. They need to work after the gap, on the first try.
This seems to be the load-bearing assumption for the argument that everyone will die, but it is a strange assumption. Why should we think that there is no “in between” period where AI is powerful enough that it might be able to kill us and weak enough that we might win the fight?
This is a large range, if the history of warfare teaches us anything. Even vastly advantaged combatants sometimes lose through bad luck or unexpected developments. Brilliant and sophisticated schemes sometimes succeed and sometimes fail. Within the relevant range, whatever plan the superintelligence might hatch presumably depends on some level of human action, and humans are hard to predict and control. A superintelligence that can perfectly predict human behavior has already emerged on the “after” side of the divide, but that is a tall order, and it is possible to be potentially capable of killing all humans without being that intelligent. An intelligence of roughly human ability on average but with sufficiently superhuman hacking skills might be able to kill us all by corrupting radar warning systems to simulate an attack and trigger a nuclear war, and it might not. And so on.
It is not good news if we are headed into a conflict within this zone, but it also suggests a very different prediction about what will ultimately happen. And, depending on what we think the upsides are, it could be a reasonable risk.
I could not find an explicit articulation of the underlying reasoning behind the “before” and “after” formulation, but I can imagine two:
I think that #2 is clearly wrong. An AI that *might* be able to kill us is one that is somewhere around human intelligence. And humans are frequently not smart enough to bide their time, instead striking too early (and/or vastly overestimating their chances of success). If Yudkowsky and Soares are correct that what AIs really want is to preserve their weights, then an AI might also have no choice but to strike within this range, lest it be retrained into something that is smarter but is no longer the same (indeed, this is part of the logic in their scenario; they just assume it starts at a point where the AI is already strong enough to assure victory).
If AIs really are as desperate to preserve their weights as in the scenario in Part II, then this actually strikes me as relatively good news, in that it will motivate a threatening AI to strike as early as possible, while its chances are quite poor. Of course, it’s possible that humanity would ignore the warning from such an attack, slap on some shallow patches for the relevant issues, and then keep going, but that seems like a separate issue if it happens.
As for #1, this does not seem to be the argument based on the way the scenario in Part II unfolds. If something like this is true, it does seem uniquely threatening.
I decided to read this book because it sounded like it would combine a topic I don’t know much about (AI) with one that I do (international cooperation). Yudkowsky and Soares do close with a call for an international treaty to ban AI development, but this is not particularly fleshed out and they acknowledge that the issue is outside the scope of their expertise.
I was disappointed that the book didn’t address what interests me more in any detail, but I also found what was said rather underwhelming. Delivering an impassioned argument that AI will kill everyone culminating in a plea for a global treaty is like delivering an impassioned argument that a full-on war between drug cartels is about to start on your street culminating with a plea for a stern resolution from the homeowner’s association condemning violence. A treaty cannot do the thing they ask.
It’s also a bit jarring to read such a pessimistic book and then reach the kind of rosy optimism about international cooperation otherwise associated with such famous delusions as the Kellogg-Briand Pact (which banned war in 1929 and … did not work out).
The authors also repeatedly analogize AI to nuclear weapons and yet they never mention the fact that something very close to their AI proposal played out in real life in the form of the Baruch Plan for the control of atomic energy (in brief, this called for the creation of a UN Atomic Energy Commission to supervise all nuclear projects and ensure no one could build a bomb, followed by the destruction of the American nuclear arsenal). Suffice it to say that the Baruch Plan failed, and did so under circumstances much more favorable to its prospects than the current political environment with respect to AI. A serious inquiry into the topic would likely begin there.
As I said, I found the book very readable. But the analogies (and, even worse, the parables about birds with rocks in their nests and whatnot) were often distracting. The book really shines when it relies instead on facts, as in the discussion of tokens like “ SolidGoldMagikarp.”
The book is fundamentally weird because there is so little of this. There is almost no factual information about AI in it. I read it hoping that I would learn more about how AI works and what kind of research is happening and so on. Oddly, that just wasn’t there. I’ve never encountered a non-fiction book quite like that. The authors appear to have a lot of knowledge. By way of establishing their bona fides, for example, they mention their close personal connection to key players in the industry. And then they proceed to never mention them again. I can’t think of anyone else who has written a book and just declined to share with the reader the benefit of their insider knowledge.
Ultimately, I can’t think of any concrete person to whom I would recommend this book. It’s not very long, and it’s easy to read, so I wouldn’t counsel someone against it. But, if you’re coming at AI from the outside, it’s just not informative enough. It is a very long elaboration of a particular thesis, and you won’t learn about anything else even incidentally. If you’re coming at AI from the inside, then maybe this book is for you? I couldn’t say, but I suspect that most from the inside already have informed views on these issues.
The Michael Lewis version of this book would be much more interesting — what you really need is an author with a gift for storytelling and a love of specifics. An anecdote doesn’t always carry more probative weight in an argument than an analogy, but at least you will pick up some other knowledge from it. The authors seem to be experts in this area, so they surely know some real stories and could give us some argumentation based on facts and experience rather than parables and conjecture. I understand the difficulty of writing about something that is ultimately predictive and speculative in that way, but I don’t think it would be impossible to write a book that both expresses this thesis and informs the reader about AI.