I enjoyed most of IABIED

Buck

17th Sep 2025

I listened to "If Anyone Builds It, Everyone Dies" today.

I think the first two parts of the book are the best available explanation of the basic case for AI misalignment risk for a general audience. I thought the last part was pretty bad, and I'd probably recommend skipping it. Even though the authors fail to address counterarguments that I think are crucial, and as a result I am not persuaded of the book’s thesis and think the book neglects to discuss crucial aspects of the situation and makes poor recommendations, I would happily recommend the book to a lay audience and I hope that more people read it.

I can't give an overall assessment of how well this book will achieve its goals. The point of the book is to be well-received by people who don't know much about AI, and I’m not very good at predicting how laypeople will respond to it; it seems like results so far are mixed, and reviews from people who are familiar with AI risk are fairly negative. So I’ll just talk about whether I think the arguments in the book are reasonable enough that I want them to be persuasive to the target audience, rather than whether I think they’ll actually succeed.

Thanks to several people for helpful and quick comments and discussion, especially Oli Habryka and Malo Bourgon!

Synopsis

Here's a synopsis and some brief thoughts, part-by-part:

In part 1, they explain what neural nets are and why you might expect powerful AIs to be misaligned. I thought it was very good. I think it's a reasonable explanation of basic ML and an IMO excellent exploration of what the evolution analogy suggests about AI goals (though I think that there are a bunch of disanalogies that the authors don’t discuss, and I imagine I’d dislike their discussion of that if they did write it). I agreed with most of this section.
- I thought the exploration of the evolution analogy was great – very clearly stated and thoughtful. I don’t remember if I’ve previously read other versions of this argument that also made all the points here (though there are many important subtleties to the argument that it doesn’t discuss; for example, it almost totally ignores instrumental alignment).
- Overall, I thought this part does a great job of articulating arguments. The text does respond to a bunch of counterarguments, but they mostly felt like really naive and rudimentary counterarguments to me, and I felt like most of the counterarguments that I see in the wild (e.g. from people on Twitter, who are mostly much more informed about AI than the audience of this book) were left unaddressed. I have no idea whether the authors’ prioritization of counterarguments was right for that audience, and I do think it would be handy to have a version of this book somewhat more appropriate for AI twitter people.

Part 2, where they tell a story of AI takeover, is solid; I only have one footnoted quibble^[1].
- In general, they try to tell the story as if the AI company involved is very responsible, but IMO they fail to discuss some countermeasures the AI company should take (e.g. I would take those actions if I were in charge of a ten-person team, assuming the rest of the company is being reasonably cooperative with my team). This doesn't hurt the argument very much, because it's easy to instead read it as a story about a real, not-impressively-responsible AI company.
Part 3, where they try to talk about the state of countermeasures and how the risk should be responded to, varied between okay and awful, and overall felt pretty useless to me. If I were recommending the book to someone, I would plausibly recommend that they skip it.
- They give a general discussion of how engineering problems are often hard when you don't have good feedback loops or good understanding of the underlying science, and when the technology involves fast-moving components (e.g. nuclear reactors); this content was fine.
- They very briefly discuss automated AI alignment research as a proposal for mitigating AI risk, but their arguments against that plan do not respond to the most thoughtful versions of these plans. (In their defense, the most thoughtful versions of these plans basically haven't been published, though Ryan Greenblatt is going to publish a detailed version of this plan soon. And I think that there are several people who have pretty thoughtful versions of these plans, haven't written them up (at least publicly), but do discuss them in person.)
- (They also criticize some other naive plans for mitigating AI risk, like “train it to be curious, so that it preserves humanity”; I think their objections to those are fine.)
- They propose that GPU clusters (perhaps as small as 8 GPUs!) be banned or restricted somehow and suggest some other calls to action; I don’t think the ideas here are very good. (I’m told that the online resources go into more detail on their proposals; my concern isn’t that the proposals aren’t detailed enough, but that they aren’t very good interventions to push for.)

I personally really liked the writing throughout (unlike e.g. Shakeel I didn't find the sentences torturous at all). I'm a huge fan of Eliezer's fiction and most of his non-fiction that doesn't talk about AI, so maybe this is unsurprising. I often find it annoying to read things Eliezer and Nate write about AI, but I genuinely enjoyed the experience of listening to the book. (Also, the narrator for the audiobook does a hilarious job of rendering the dialogues and parables.)

My big disagreement

In the text, the authors often state a caveated version of the title, something like "If anyone builds it (with techniques like those available today), everyone dies". But they also frequently state or imply the uncaveated title. I'm quite sympathetic to something like the caveated version of the title^[2]. But I have a huge problem with equivocating between the caveated and uncaveated versions.

There are two possible argument structures that I think you can use to go from the caveated thesis to the uncaveated one, and both rely on steps that are IMO dubious:

Argument structure one:

- If anyone built ASI with current techniques in a world that looked like today's, everyone would die.
- Tricky hypothesis 1: ASI will in fact be developed in a world that looks very similar to today's (e.g. because sub-ASI AIs will have negligible effect on the world; this could also be because ASI will be developed very soon).
- Therefore, everyone will die.

This is the argument that I (perhaps foolishly and incorrectly) understood Eliezer and Nate to be making when I worked with them, and the argument I made when I discussed AI x-risk five years ago, right before I started changing my mind on takeoff speeds.

I think Eliezer and Nate aren’t trying to make this argument—they are agnostic on timelines and they don’t want to argue that sub-ASI AI will be very unimportant for the world. I think they are using what I’ll call “argument structure two”:

- If anyone built ASI with current techniques in a world that looked like today's, everyone would die.
- The big complication: However, ASI might be built in a world that looks very different from today's: it might be several decades in the future, pretty powerful AI might be available for a while before ASI is developed, and researchers might be way more experienced at getting AIs to do stuff than they currently are.
- Tricky hypothesis 2: But the differences between the world of today and the world where ASI will be developed don't matter for the prognosis.
- Therefore, everyone will die.

The authors are (unlike me) confident in tricky hypothesis 2. The book says almost nothing about either the big complication or tricky hypothesis 2, and I think that’s a big hole in their argument that a better book would have addressed.^[3]

I think that explicitly mentioning the big complication is pretty important for giving your audience an accurate picture of what you're expecting. Whenever I try to picture the development of ASI, it's really salient in my picture that that world already has much more powerful AI than today’s, and the AI researchers will be much more used to seeing their AIs take unintended actions that have noticeably bad consequences. Even aside from the question of whether it changes the bottom line, it’s a salient-enough part of the picture that it feels weird to neglect discussing it. (See also Lukas's articulation of this complaint.)

And of course, the core disagreement that leads me to disagree so much with Eliezer and Nate on both P(AI takeover) and on what we should do to reduce it: I don't agree with tricky hypothesis 2. I think that the trajectory between here and ASI gives a bunch of opportunities for mitigating risk, and most of our effort should be focused on exploiting those opportunities. If you want to read about this, you could check out the back-and-forth my coworkers and I had with some MIRI people here, or the back-and-forth Scott Alexander and Eliezer had here.

(This is less relevant given the authors’ goal for this book, but from my perspective, another downside of not discussing tricky hypothesis 2 is that, aside from being relevant to estimating P(AI takeover), understanding the details of these arguments is crucial if you want to make progress on mitigating these risks.)

If they wanted to argue a weaker claim, I'd be entirely on board. For example, I’d totally get behind:

- It is pretty unclear whether we will survive or not. There are various reasons to think we might be able to prevent AI takeover. But none of those reasons are airtight, and many of them require that all AI companies with dangerous models implement safety measures competently, and it's very unclear that that will happen.
- We should demand an extremely low probability of extinction from AI developers, because extinction would be really bad. And we are not on track to getting to justified confidence in the safety of powerful AI.

But instead, they propose a much stronger thesis that they IMO fail to justify.

This disagreement leads to my disagreement with their recommendations—relatively incremental interventions seem much more promising to me.

(There’s supplementary content online. I only read some of this content, but it seemed somewhat lower quality than the book itself. I'm not sure how much of that is because the supplementary content is actually worse, and how much of it is because the supplementary content gets more into the details of things—I think that the authors and MIRI staff are very good at making simple conceptual arguments clearly, and are weaker when arguments require attention to detail.)

(I will also parenthetically remark that superintelligence is less central in my picture than theirs. I think that there is substantial risk posed by AIs that are not wildly superintelligent, and it's plausible that humans purposefully or involuntarily cede control to AIs that are less powerful than the wildly superintelligent ones the authors describe in this book. This causes me to disagree in a bunch of places.)

I tentatively support this book

I would like it if more people read this book, I think. The main downsides are:

- Some people will be persuaded by the parts of the book that I think are wrong, which will have slightly bad consequences but is not a huge problem, and seems overall better than them never engaging with a serious argument about AI x-risk.
- Some people will be turned off by the book, especially the most unreasonable parts of it, and we will have missed the opportunity to have someone more reasonable (according to me) than Eliezer and Nate write a similar book and then do a tour etc. I'm less worried about this after reading the book, because the book was good enough that it's hard for me to imagine someone else writing a much better one.
  - Relatedly, success of this book will lead Eliezer and Nate to be more prominent public intellectuals on the topic of AI. I don't know whether this is good or bad. It really depends on who they're displacing.
    - I don't love them as representatives of AI safety to the public for a few reasons. Despite the book being impressively cleaned up compared to Eliezer’s usual writing style, I expect them to be somewhat worse at being likable and persuasive to mass audiences in unscripted settings. I think their arguments are often unpersuasive to informed audiences (partially because of the flaws in the arguments that I complained about above, and partially because they don’t know much about empirical ML or empirical evidence about alignment and sometimes come across as blowhards to ML researchers). And I disagree with their recommended actions.
    - I think it would be suboptimal if important stakeholders tried to get advice from them (though again, it depends who they’re displacing), because I don't think that they have good recommendations for what people should do.

Despite my complaints, I’m happy to recommend the book, especially with the caveat that I think it's wrong about a bunch of stuff including the thesis. Even given all the flaws, I don't know of a resource for laypeople that’s half as good at explaining what AI is, describing superintelligence, and making the basic case for misalignment risk. After reading the book, it feels like a shocking oversight that no one wrote it earlier.

^[1]
In their story, the company figures out a way to scale the AI in parallel, and then the company suddenly massively increases the parallel scale and the AI starts plotting against them. This seems somewhat implausible—probably the parallel scale would be increased gradually, just for practical reasons. But if that scaling had happened more gradually, the situation probably still wouldn't have gone that well for humanity if the AI company was as incautious as I expect, so whatever. (My objection here is different from what Scott complained about and Eliezer responded to here—I’m not saying it’s hugely unrealistic for parallel scaling to pretty suddenly lead to capabilities improving as rapidly as depicted in the book, I’m saying that if such a parallel scaling technique was developed, it would probably be tested out with incrementally increasing amounts of parallelism, if nothing else just for practical engineering reasons.)

^[2]
My main problems with the caveated version of the title:
- I again think they’re inappropriately reasoning about what happens for arbitrarily intelligent models instead of reasoning about what happens with AIs that are just barely capable enough to count as ASI. Their arguments (that AIs will learn goals that are egregiously misaligned with human goals and then conspire against us) are much stronger for wildly galaxy-brained AIs than for AIs that are barely smart enough to count as superhuman.
- I don't think it's clear that misaligned superintelligent AI would kill everyone as part of taking over; see discussion here. Note that the expected fatalities from getting taken over by wildly superintelligent AI are probably lower than the fatalities from getting taken over by an AI that is barely able to take over, because in the latter case the AI might have to kill us in order to take our stuff despite not wanting to do so.
^[3]
I don't think Eliezer and Nate are capable of writing this better book, because I think their opinions on this topic are pretty poorly thought through.

46 comments

Aaron_Scher (5mo)

I feel a bit surprised by how much you dislike Section 3. I agree that it does not address 'the strongest counterarguments and automated-alignment plans that haven't been written down publicly'; this is a weakness but seems too demanding given what’s public.

I particularly like the analogy to alchemy presented in Chapter 11. I think it is basically correct (or as correct as analogies get) that the state of AI alignment research is incredibly poor and the field is in its early stages where we have no principled understanding of anything (my belief here is based on reading or skimming basically every AI safety paper in 2024). The next part of the argument is like "we're not going to be able to get from the present state of alchemy to a 'mature scientific field that doesn't screw up certain crucial problems on the first try' in time". That is, 1: the field is currently very early stages without principled understanding, 2: we're not going to be able to get from where we are now to a sufficient level by the time we need.

My understanding is that your disagreement is with 2? You think that earlier AIs are going to be able to dramatically speed up alignment research (and by using control methods we can get more alignment research out of better AIs, for some intermediate capability levels), getting us to the principled, doesn't-mess-up-the-first-try-on-any-critical-problem place before ASI.

Leaning into the analogy, I would describe what I view as your position as "with AI assistance, we're going to go from alchemy to first-shot-moon-landing in ~3 years of wall clock time". I think it's correct for people to think this position is very crazy at first glance. I've thought about it some and think it's only moderately crazy. I am glad that Ryan is working on better plans here (and excited to potentially update my beliefs, as I did when you all put out various pieces about AI Control), but I think the correct approach for people hearing about this plan is to be very worried about this plan.

I really liked Section 3, especially Ch 11, because it makes this (IMO) true and important point about the state of the AI alignment field. I think this argument stands on its own as a reason to have an AI moratorium, even absent the particular arguments about alignment difficulty in Section 1. Meanwhile, it sounds like you don't like this section because, to put it disingenuously, "they don't engage with my favorite automating-alignment plan that tries to get us from alchemy to first-shot-moon-landing in ~3 years of wall clock time and that hasn't been written down anywhere".

Also, if you happen to disagree strongly with the analogy to alchemy or 1 above (e.g., think it's an incorrect frame), that would be interesting to hear! Perhaps the disagreement is in how hard alignment problems will be in the development of ASI; for example, if the alchemists merely had to fly a blimp first try, rather than land a rocket on the moon? Perhaps you don't expect there to be any significant discontinuities and this whole "first try" claim is wrong and we'll never need a principled understanding?

I found this post and your review to be quite thoughtful overall!

Vaniver (5mo)

Some people will be turned off by the book, especially the most unreasonable parts of it, and we will have missed the opportunity to have someone more reasonable (according to me) than Eliezer and Nate write a similar book and then do a tour etc. I'm less worried about this after reading the book, because the book was good enough that it's hard for me to imagine someone else writing a much better one.

I want to register some "perfect is the enemy of the good" complaint here? Like--if you want to say "oh person X should totally write a book about AI risk", sure, let's all get together and encourage person X to write a book about AI risk. (I would probably be willing to fund more of these, and I'm probably not alone in that.) But I don't actually think there's anyone who seems more reasonable than them according to you who's willing to write a book. (Are you?)

Separately--I think this is the wrong model of book demand / opportunities to talk in public. I think when Superintelligence was published, there was more appetite for books like this as a result, instead of less. Similarly, I think IABIED is going to increase the appetite for books and articles and interviews by people who disagree with Eliezer and Nate, and so your hypothetical more reasonable author is more likely to get a book deal or speaking engagements as a result of IABIED increasing the temperature and salience of this issue.

Buck (5mo)

Why did I like the book so much more than I expected? I think it's a mix of:

- I like the authors' writing on basic AI risk stuff but I don't like their writing on more in-the-weeds questions, and I run across their in-the-weeds writing much more in my day-to-day life, so it's surprisingly pleasant to read them writing intro materials.
- Their presentation of the arguments was cleaner here than I've previously seen.

sjadler (5mo)

I agree re cleaner presentation & thought the parables here were much easier to follow than some of Eliezer’s past two-people-having-a-conversation pieces

I also thought that chapters generally opened with interesting ledes and that their endings flowed well into the chapter that followed. I was impressed by the momentum / throughline of the book in that sense

Nina Panickssery (5mo)

Just finished the book and agree that I’d recommend it to laypeople and predict it would improve the average layperson’s understanding of MIRI’s AI risk arguments.

RobertM (5mo)

They very briefly discuss automated AI alignment research as a proposal for mitigating AI risk, but their arguments against that plan do not respond to the most thoughtful versions of these plans. (In their defense, the most thoughtful versions of these plans basically haven't been published, though Ryan Greenblatt is going to publish a detailed version of this plan soon. And I think that there are several people who have pretty thoughtful versions of these plans, haven't written them up (at least publicly), but do discuss them in person.)

Am a bit confused by this section - did you think that part 3 was awful because it didn't respond to (as yet unpublished) plans, or for some other reason?

Holly_Elmore (5mo)

The book is right that the real answer is to try to pause AI development. It’s not the killer objection you think it is that it could be theoretically possible to build safe ASI with future technology. I also think it’s possible we could end up okay by accident. It’s still a foolish plan and I assume you’re only doing Superalignment/AI Control stuff bc you feel forced to, and that you wouldn’t choose for your safety work to have to play catch-up to a racing industry.

Buck (5mo)

It depends on what you mean by "foolish plan." In a world where the median person had my beliefs, I totally agree that AI risk would be handled extremely differently (though idk if "pause now" is a good description of what we'd do). I totally agree that this world is making a huge error according to my values and most human values by not being much more cautious here.

I don't think "we might be okay" is a valid objection to the claim "AI might cause an existential catastrophe"—those claims are totally consistent. But I do think:

- It's a valid objection to "AI is almost guaranteed to cause an existential catastrophe."
- It's a very important aspect of the situation, and descriptions of AI risk that don't mention it seem to me like they're failing to discuss a crucial part of how it all goes down.

habryka (5mo)

Just FWIW, I have never heard of a remotely realistic-seeming story for how things will be OK, without something that looks like coordination to not build ASI for quite a while. The only difference in belief I might have from some of the vibes in the book is that I believe that maybe^[1] we get lucky and we can use the AIs to coordinate humanity to not build ASI for quite a while.

But even from my epistemic vantage point, I think it's kind of OK for the book to not talk through that specific plan, because it's ex-ante pretty crazy. Hoping that the mildly superhuman AIs will turn out to be useful enough to prevent us from building ASI is really not a good plan. We might get lucky, but I do think you should really try very hard to not rely on getting that lucky.

AI is almost guaranteed to cause an existential catastrophe.

Relatedly, the book argues pretty clearly for "ASI, if built using the current methods we know about, is almost guaranteed to cause an existential catastrophe", which is very importantly different from the quote above, and this still seems just really straightforwardly true to me. My guess is even with lots of AI assistance we will first end up doing something that looks more like "really slowing down a lot" if things end up fine.

I personally do think it would have been good for the book to say some more about how there is some chance we might end up being able to coordinate a pause using somewhat superhuman systems, but it does seem really hard to get across while also being clear that this would be a really bad and terrible plan to rely on, and the first and obvious thing to do is to not build these systems in the first place. Like, I feel like the reasonable reaction most people reading the book would have to such a section would be "why go into this in the first place if it seems like such a bad idea? I feel like you should have used this space fleshing out a plan that isn't crazily risky".

^[1]
Like 5-15%?

Lukas Finnveden (5mo)

I think the discussion wouldn't have to be like "here's a crazy plan".

I think there could have been something more like: "Important fact to understand about the situation: Even if superintelligence comes within the next 10 years, it's pretty likely that sub-ASI systems will have had a huge impact on the world by then — changing the world in a few-year period more than any technology ever has changed the world in a few-year period. It's hard to predict what this would look like [easy calls, hard calls, etc]. Some possible implications could be: [long list: ..., automated alignment research, AI-enabled coordination, people being a lot more awake to the risks of ASI, lots of people being in relationships with AIs and being supportive of AI rights, not-egregiously-misaligned AIs that are almost as good at bio/cyber/etc as the superintelligences...]. Some of these things could be helpful, some could be harmful. Through making us more uncertain about the situation, this lowers our confidence that everyone will die. In particular, some chance that X, Y, Z turns out really helpful. But obviously, if we see humanity as an agent, it would be a dumb plan for humanity to just assume that this crazy, hard-to-predict mess will save the whole situation."

I.e. it could be presented as an important thing to understand about the strategic situation rather than as a proposed plan.

habryka (5mo)

I agree that a section like this would be good!

Modulo the sentence "this lowers our confidence that everyone will die", since I don't think it's what they believe, or what I believe, though it's messy. My guess is this period is also majorly responsible for increasing risks by creating tons of economic momentum that then makes it hard to stop when you get to really risky AI, and so my best guess is the overall technological diffusion will make things riskier instead of less, though I don't have a strong take either way.

Having the economic incentives plus other things explained, and being like "and yep, this seems like it might make things worse or better, it makes it harder to be confident about how things go, though the core difficulty remains", would be good.

cousin_it (5mo)

Through making us more uncertain about the situation, this lowers our confidence that everyone will die.

This seems to rely on the assumption that "there's nowhere to go but up": that we're pretty certain of doom, so wildcards in the future can only make us less certain. But I don't think that works. Wildcards in the future can also increase s-risks, and there's no limit to how bad things can get, potentially much worse than extinction.

ryan_greenblatt (5mo)

I have never heard of a remotely realistic-seeming story for how things will be OK, without something that looks like coordination to not build ASI for quite a while.

I wonder if we should talk about this at some point. This perspective feels pretty wild to me and I don't immediately understand where the crux lives.

- Do you think it will be extremely hard to avoid scheming in human-ish level AIs?
- Do you think it will be extremely hard to get not-scheming-against-us human-ish level AIs aligned enough that handing over safety work to them is competitive with emulated humans (at the same speed and cost)?
- Do you think that AIs trying hard to ongoingly ensure alignment and given some lead time will fail because alignment is much, much harder than capabilities? (E.g., full AGI can't align +1 SD AGI given 1 wall clock year, or +1 SD AGI can't align +2 SD AGI given 6 wall clock months, or so on.)
- Maybe you're including ">1 years of lead time spent on safety" under "coordination to not build ASI for quite a while" and you think this is extremely unlikely?

habryka (5mo)

I wonder if we should talk about this at some point.

I would definitely be interested!

Some things to respond now:

One of the key things is just that you end up with recursive self-improvement/automated AI software development and then everything happens much quicker. I think at the very least you need to intervene to stop that feedback loop. Like, we aren't then talking about "scheming in human-ish level AIs", we are then talking about "scheming in galaxy-brain level AIs", and yes, I think it's extremely unlikely that if you take anything remotely close to current AI systems and let them recursively self-improve/automate AI development at 1000x human speeds, you end up with a system that is aligned with humanity.

Do you think it will be extremely hard to get not-scheming-against-us human-ish level AIs aligned enough that handing over safety work to them is competitive with emulated humans (at the same speed and cost)?

This does seem very hard! But most importantly, it seems to me that among the first things such human-ish level AIs would do is to coordinate to not have any subset of them build a much smarter system than themselves (including other actors that might be at different companies or datacenters).

Like, yeah, their work on facilitating a slowdown/pause might be at a kind of similar level to emulated humans. It again seems extremely unlikely that they would succeed at aligning a runaway intelligence explosion.

Maybe you're including ">1 years of lead time spent on safety" under "coordination to not build ASI for quite a while" and you think this is extremely unlikely?

Wall clock time and subjective time come apart a good amount here, and it's a bit confusing which one to care about. A few thoughts:

- A lead time of >1 year does seem pretty unlikely at this point. My guess would be like 25% likely? So already this isn't going to work in 75% of worlds.
- Again, the key thing here is you need some force that prevents people and AIs from building vastly superhuman ASI systems.
- It's not implausible to me that then, with lots of AI assistance that isn't at a critical level, you can start inching slowly into the strongly superhuman domain without everything going badly. I think it's very unlikely you can do it very quickly.
- This might allow us to get to something like safe ASI on the scale of single-digit years, but man, this just seems like such an insane risk to take, that I really hope we instead use the AI systems to coordinate a longer pause, which seems like a much easier task.

ryan_greenblatt (5mo)

I think at the very least you need to intervene to stop that feedback loop.

There's probably at least some disagreement here. I think even if you let takeoff proceed at the default rate with a small fraction (e.g. 5%) explicitly spent on reasonably targeted alignment work at each point (as in, 5% beyond what is purely commercially expedient), you have a reasonable chance of avoiding AI takeover (maybe 50% chance of misaligned AI takeover?). Some of this is due to the possibility of takeoff being relatively slower and more compute constrained (which you might think is very unlikely?). I also think that there is a decent chance that you get a higher fraction spent on safety after handing off to AIs or after getting advice from highly capable AIs even if this doesn't happen before this.

It again seems extremely unlikely that they would succeed at aligning a runaway intelligence explosion.

I don't feel so confident--these AIs might have a decent amount of subjective time and total cognitive labor between each unit of increase in capabilities as the intelligence explosion continues such that they can keep things on track. Intuitively, capabilities might be more compute bottlenecked than alignment, so alignment should pull ahead if we can start with actually aligned (and wise) AIs (which is not easy to achieve to be clear!).

A lead time of >1 year does seem pretty unlikely at this point. My guess would be like 25% likely? So already this isn't going to work in 75% of worlds.

I agree with around 25% likely.

This might allow us to get to something like safe ASI on the scale of single-digit years, but man, this just seems like such an insane risk to take, that I really hope we instead use the AI systems to coordinate a longer pause, which seems like a much easier task.

I agree that coordinating a longer pause looks pretty good, but I'm not so sure about the relative feasibility given only the use of AIs that are somewhat more capable than top human experts (regardless of whether these AIs are running things). I think it might be much harder to buy 10 years of time than 2 years given the constraints at the time (including limited political will) and I'm not so sure aligning somewhat more powerful AIs will be harder (and then these somewhat more powerful AIs can align even more powerful AIs and this either bottoms out in a scalable solution to alignment or in powerful enough capabilities that they actually can buy more time).

One general note: I do think that "buying time along the way" (either before handing off to AIs or after) is quite helpful for making the situation go well. However, I can also imagine worlds where things go fine and we didn't buy much/any time (especially if takeoff is naturally on the slower side).

Hastings, 5mo

Do you have a realistic seeming story in mind?

ryan_greenblatt, 5mo

Sure, several. E.g.:

- USG cares a decent amount and leading AI companies are on board, so they try to buy several additional years to work on safety.
- We scale to roughly top human expert level while ensuring control.
- Over time, we lower the risk of scheming at this level of capability through a bunch of empirical experiments and new interventions developed using a bunch of AI labor.
- We relax our control measures and increasingly work on making AIs generally very trustworthy, including on hard-to-check open-ended tasks. We do a bunch of studies of this.
- This ends up not being that hard due to somewhat favorable generalization.
- We hand off to AIs and they align their successors and so on.

habryka, 5mo

USG cares a decent amount and leading AI companies are on board, so they try to buy several additional years to work on safety.

Your first paragraph is an example of "something that looks like coordination to not build ASI for quite a while"! "Several additional years" is definitely "quite a while"!

I am not sure whether the other bullet points are all supposed to take place within those few years, or whether you are expecting further cautious actions that slow things down. It sounds like at least within the USG we are coordinating to not build ASI, and are generally succeeding at going carefully and slowly.

And then even after these bullet lists are over, my best guess is the AIs we "handed over" to would still decide to go quite slowly themselves, probably establishing some global coordination to go sufficiently slowly. My best guess is we also will have just wanted to do that earlier in collaboration with those AI systems.

ryan_greenblatt, 5mo

Your first paragraph is an example of "something that looks like coordination to not build ASI for quite a while"! "Several additional years" is definitely "quite a while"!

Ok, if you count several additional years as quite a while, then we're probably closer to agreement.

For this scenario, I was imagining all these actions happen within 2 years of lead time. In practice, we should keep trying to buy additional lead time prior to it making sense to hand off to AIs, and the AIs we hand off to will probably want to try to buy lead time (especially if there are strategies which are easier post-handoff, e.g. due to leveraging labor from more powerful systems).

I'm unsure about the difficulty of buying different amounts of lead time and it seems like it might be harder to buy lead time than to ongoingly ensure the alignment of later AIs. Eventually, we have to do some kind of a handoff and I think it's safer to do this handoff to AIs that aren't substantially more capable than top human experts in general purpose qualitative capabilities (like I think you want to hand off at roughly the minimum level of capability where the AIs are clearly capable enough to dominate humans, including at conceptually tricky work).

xpym, 5mo

a remotely realistic-seeming story for how things will be OK, without something that looks like coordination to not build ASI for quite a while

My mainline scenario is something like:

LLM scaling and tinkering peters out in the next few years without reaching capacity for autonomous R&D. LLMs end up being good enough to displace some entry-level jobs, but the hype bubble bursts and we enter a new AI winter for at least a couple of decades.

The "intelligence" thingie turns out to be actually hard and not amenable to a bag of simple tricks with a mountain of compute, for reasons gestured at in Realism about rationality. Never mind ASI, we're likely very far from being able to instantiate an AGI worthy of the name, which won't happen while we remain essentially clueless about this stuff.

I also expect that each subsequent metaphorical AI "IQ point" will be harder to achieve, not easier, so no foom or swift takeover. Of course, even assuming all that, it still doesn't guarantee that "things will be OK", but I'm sufficiently uncertain either way.

Joe Rogero, 5mo

I felt like most of the counterarguments that I see in the wild (e.g. from people on Twitter, who are mostly much more informed about AI than the audience of this book) were left unaddressed. I have no idea whether the authors’ prioritization of counterarguments was right for that audience, and I do think it would be handy to have a version of this book somewhat more appropriate for AI twitter people.

PSA: The online resources do indeed contain quite a few counter-counterarguments that didn't fit into the book. (Buck probably knows this already, some readers might not.)

Darren McKee, 5mo

Buck, did you read my book "Uncontrollable"?

Given your review, it's possible my book is the response to what you're alluding to here: "I don't know of a resource for laypeople that’s half as good at explaining what AI is, describing superintelligence, and making the basic case for misalignment risk."

I'm only 40 pages into the new book, and inherently conflicted of course, so it is better to have the thoughts of someone who has read both and isn't me, but people have said it is the best introduction to AI risk for laypeople.
I had hoped EY's book would clearly supplant mine but the more reviews I read, I think that isn't clearly the case.

(happy to get you a copy, physical or audio, if desired).

Buck, 5mo

I've seen physical copies around, but I actually haven't read it. It's possible that you're totally right, in which case I apologize and should have finished my review with "I'm an idiot for not realizing it was worth my time to read Uncontrollable so that I could recommend it to people".

I would appreciate a digital copy and audio copy if you wanted to email them to me! I'm not sure I'll consume it because I don't know if it's that decision-relevant to me.

WilliamKiely, 5mo

FWIW Darren's book Uncontrollable is my current top recommended book on AI.

While I expected (75% chance) IABIED to overtake it, after listening to the audiobook Tuesday I don't think IABIED is better (though I'll wait until I receive and reread my hardcopy to declare that definitively).

As I wrote on Facebook 10 months ago:

The world is not yet as concerned as it should be about the impending development of smarter-than-human AI. Most people are not paying enough attention.
What one book should most people read to become informed and start to remedy this situation?
"Uncontrollable: The Threat of Artificial Superintelligence and the Race to Save the World" by Darren McKee is now my top recommendation, ahead of:
- "Superintelligence" by Nick Bostrom,
- "Human Compatible" by Stuart Russell, and
- "The Alignment Problem" by Brian Christian
It's a short, easy read (6 hours at ~120wpm / 2x speed on Audible) covering all of the most important topics related to AI, from what's happening in the world of AI, to what risks from AI humanity faces in the near future, to what each and every one of us can do to help with the most important problem of our time.

Darren McKee, 5mo

Just posted my review: "IABIED Review - An Unfortunate Miss"

Raemon, 5mo

(fyi, I almost replied yesterday with "my shoulder Darren McKee is kinda sad about the 'no one else tried writing a book like this' line", and didn't get around to it because I was busy. I did get a copy of your book recently to see how it compared. Haven't finished reading it yet)

Darren McKee, 5mo

Oh well, but you were correct.

You aren't really my target audience but I'd be curious to hear what you think. I'm re-reading it myself.

Karl von Wendt, 5mo

Tricky hypothesis 2: But the differences between the world of today and the world where ASI will be developed don't matter for the prognosis.

I don't think that the authors implied this. Right in the first chapter, they write:

If any company or group, anywhere on the planet, builds an artificial superintelligence using anything remotely like current techniques, based on anything remotely like the present understanding of AI, then everyone, everywhere on Earth, will die.

(emphasis by me). Even if it is not always clearly stated, I think they don't believe that ASI should never be developed, or that it is impossible in principle to solve alignment. Their major statement is that we are much farther from solving alignment than from building a potentially uncontrollable AI, so we need to stop trying to build it.

Their suggested measures in part III (whether helpful/feasible or not) are meant to prevent ASI under the current paradigms, with the current approaches to alignment. Given the time gap, I don't think this matters very much, though - if we can't prevent ASI from being built as soon as it is technically possible, we won't be in a world that differs enough from today's to render the book title wrong.

Buck, 5mo

Since writing this post, I have updated towards thinking it's a bigger problem that this book's core thesis is wrong. It's pretty annoying that me and similar people who recommend the book have to caveat with "obviously this is overblown and the arguments are skipping a bunch of important steps, but I think the issue described is real and important".

Vaniver, 5mo

I am not actually convinced that this annoyance should be fed instead of starved? I'm thinking mostly about Your Price for Joining, here, and a comment by Nate at a recent event that he wished people who thought the book was all correct except for part i would argue with people who thought the book was all correct except for part j. Like, I think the overlap between your criticisms and the criticisms of other people in the community is actually pretty low, and the correct interpretation of that is more like "yes we agree with the core thesis but disagree on nuance" rather than "we all think the core thesis is wrong in identifiable way X."

[To be clear, the core thesis is "if anybody builds it, everybody dies", and that the default path is ruin. If you think your alignment agenda has a shot, and also everyone will implement your alignment agenda on the default path--then it makes sense to disagree, because you're part of the "it"? But I think this is probably an unreasonable expectation, and if you think there's a company that isn't going to implement your alignment agenda, then maybe you're not in the "it".]

Buck, 5mo

Like, I think the overlap between your criticisms and the criticisms of other people in the community is actually pretty low,

I disagree; I think my core complaints about their arguments are very similar to Will MacAskill, Kelsey Piper, and the majority of people whose ideas on AI safety I have a moderate amount of respect for. I agree that some other LW people agree with the parts I think are wrong, but that's not what I'm talking about.

Vaniver, 5mo

I think my core complaints about their arguments are very similar to Will MacAskill, Kelsey Piper, and the majority of people whose ideas on AI safety I have a moderate amount of respect for.

So taking this tweet as representative of MacAskill's thoughts, and this as representative of Kelsey Piper's, I see:

- The evolution analogy in part I.
  - You like it but think how the authors would handle the disanalogies would probably be bad; MacAskill complains that they don't handle the disanalogies; Piper doesn't discuss it.
- Discontinuous capability growth.
  - MacAskill doesn't like it; you don't seem to comment on it; Piper doesn't seem to comment on it. (I think MacAskill also misunderstands its role and relevance in the argument.)
    - In particular, MacAskill quotes PC's summary of EY as "you can’t learn anything about alignment from experimentation and failures before the critical try" but I think EY's position is closer to "you can't learn enough about alignment from experimentation and failures before the critical try".
- The world in which we make our first crucial try will be significantly different from the current world.
  - I think both you and MacAskill think this is a significant deficiency (this is your tricky hypothesis #2); I think Piper also identifies this as a point that the authors don't adequately elaborate on, but as far as I can tell she doesn't think this is critical. (That is, yes, the situation might be better in the future, but not obviously better enough that we shouldn't attempt a ban now.)
- Catastrophic misalignment.
  - MacAskill thinks we have lots of evidence that AIs will not do what the user wanted, but not very much evidence that AIs will attempt to take over. I think both you and Piper think it's likely that there will be at least one AI of sufficient capability that attempts to take over.
- Part 3.
  - You and MacAskill both seem to dislike their policy proposals. Piper seems much more pro-ban than you or MacAskill are; I don't get a good sense of whether MacAskill actually thinks a ban is bad (what catchup risk is there if neither frontrunners nor laggards can train AIs?) or just unlikely to be implemented.
    - I don't think MacAskill is thinking thru the "close substitutes for agentic superintelligence" point. If they are close substitutes, then they have enough of the risks of agentic superintelligence that it still makes sense to ban them!

So, at least on this pass, I didn't actually find a specific point that all three of you agreed on. (I don't count "they should have had a better editor" as a specific point, because it doesn't specify the direction; an editing choice Piper liked more could easily have been an editing choice that MacAskill liked less.)

The closest was that the book isn't explicit or convincing enough when talking about iterative alignment strategies (like in chapter 11). Are there other points that I missed (or should I believe your agreement on that point is actually much clearer than I think it is)?

Buck, 5mo

I think me and Will and Kelsey have similar positions (having talked with both of them about this quite a lot), we just emphasized different disagreements.

(Except that I am less sold than Kelsey on her "maybe we can have non-agentic substitutes" point.)

Vaniver, 5mo

Do you agree with the "types of misalignment" section of MacAskill's tweet? (Or, I guess, is it 'similar to your position'?)

If not, I think it would be neat to see the two of you have some sort of public dialogue about it.

Zvi, 5mo

Question for Buck: What changes do you anticipate happening between now and the world where we create ASI, that you believe matter for the prognosis here?

Max H, 5mo

Tricky hypothesis 1: ASI will in fact be developed in a world that looks very similar to today's (e.g. because sub-ASI AIs will have negligible effect on the world; this could also be because ASI will be developed very soon).

Tricky hypothesis 2: But the differences between the world of today and the world where ASI will be developed don't matter for the prognosis.

Both of these hypotheses look relatively more plausible than they did 4y ago, don't they? Looking back at this section from the 2021 takeoff speed conversation gives a sense of how people were thinking about this kind of thing at the time.

AI-related investment and market caps are exploding, but not really due to actual revenue being "in the trillions" - it's mostly speculation and investment in compute and research.

Deployed AI systems can already provide a noticeable speed-up to software engineering and other white-collar work broadly, but it's not clear that this is having much of an impact on AI research (and especially a differential impact on alignment research) specifically.

Maybe we will still get widely deployed / transformative robotics, biotech, research tools etc. due to AI that could make a difference in some way prior to ASI, but SoTA AIs of today are routinely blowing through tougher and tougher benchmarks before they have widespread economic effects due to actual deployment.

I think most people in 2021 would have been pretty surprised by the fact we have widely available LLMs in 2025 with gold medal-level performance on the IMO, but which aren't yet having much larger economic effects. But in relative terms it seems like you and Christiano should be more surprised than Yudkowsky and Soares.

jelly, 5mo

Even given all the flaws, I don't know of a resource for laypeople that’s half as good at explaining what AI is, describing superintelligence, and making the basic case for misalignment risk.

You might not have read aisafety.dance. Although it doesn't explain in detail what AI and superintelligence are, it does a really good job of describing the specifics of AI safety, possibly on par with the book (I haven't read the book yet, so this is an educated guess).

Vaniver, 5mo

I’m saying that if such a parallel scaling technique was developed, it would probably be tested out with incrementally increasing amounts of parallelism, if nothing else just for practical engineering reasons.)

I think this isn't my impression of how things have gone in the past, so I'm not sure where your 'probably' is coming from. Like, this reminds me of the OpenAI Five story, where (if I remember correctly) an exploratory model ran for a week without much supervision and then people discovered it had performed surprisingly well.

But even if this is true, I don't think it changes the story much? If they run Sable with two hundred GPUs, then two thousand, then twenty thousand, then two hundred thousand, what is different about the story? It seems like extrapolating Sable's performance on 200k GPUs from 20k GPUs is fraught, and if the worrisome behavior starts appearing at 40k GPUs, the step size of 10x means they skip over the zone where the bad behavior is happening but the model is maybe unable to hide it, and land directly in the zone where the bad behavior is happening and the model is able to hide it. "That's why real AI labs use a step size of 2x", one might reply, but--how confident are we that 2x is enough?

I do think there's some tension in the story between elements that make it likely all of the GPUs are put onto one project (something like "it's the CEO's pet project to throw all the compute at a project") and "no one is watching"--if nothing else, probably someone will get fired if the model stopped thinking one hour into a long run--but I think it's not hard to add additional details that make that sensible, or to postulate that the watching is just a devops engineer making sure that the GPUs are being utilized. ("Oh neat, it reformulated its internal language to be more efficient!" the observers might think, as Sable slips out of their control.)

Gunnar_Zarncke, 5mo

(caveat: I'm still reading the book)

The book takes a risk by - and I assume it is intentional - ignoring some of the more nuanced arguments (esp. your Tricky hypothesis 2). I think they are trying to shock the Overton Window to the very real risk of death by alignment failure if society continues with business as usual. The risk management seems to be:

A) Yet another carefully hedged warning call (like this one). Result:

- 95%: a few people update, but the majority continues business as usual.
- 5%: it brings the topic over the tipping point.

B) If Anyone Builds It, Everyone Dies. Result:

- 50%: the topic becomes a large discussion point, and the Overton Window includes the risk.
- 50%: critical voices point out technical weaknesses of part 3 and the effort fizzles out.

If these numbers are halfway right, B seems advisable? And you can still do A if it fails!

Raemon, 5mo

I think Buck and Eliezer both agree you should only say shocking things if they are true. I think if Eliezer believed what Buck believes, he would have found a title that was still aimed at the Overton-smashing strategy but honest.

Gunnar_Zarncke, 5mo

I don't think you are arguing only about the title. Titles naturally have to simplify, but the book content has to support it. The "with techniques like those available today" in "If anyone builds it (with techniques like those available today), everyone dies" sure is an important caveat, but arguably it is the default. And, as Buck agrees, the authors do qualify it that way in the book. You don't have to repeat the qualification each time you mention it.

The core disagreement doesn't seem to be about that but about leaving out Tricky hypothesis 2. I'm less sure that is an intentional omission by the authors. Yudkowsky sure has argued many times that alignment is tricky and hard and may feel that the burden of proof is on the other side now.

WilliamKiely, 5mo

"If anyone builds it (with techniques like those available today), everyone dies"

One could argue that the parenthetical caveat is redundant if the "it" means something like "superintelligent AI built with techniques like those available today".

I also listened to the book and don't have the written text available yet, so I'll need to revisit it when my hardcopy arrives to see if I agree that there are problematic uncaveated versions of the title throughout the text.

(At first I disliked the title because it seemed uncaveated, but again, the "it" in the title is ambiguous and can be interpreted as including the caveats, so now I'm more neutral about the title.)

WilliamKiely, 5mo

I'm less worried about this after reading the book, because the book was good enough that it's hard for me to imagine someone else writing a much better one.

I was really hoping you'd say "after reading the book, I updated toward thinking that I could probably help a better book get written."

My view is still that a much better Intro to AI risk can still get written.

I currently lean toward Darren McKee's Uncontrollable still being a better intro than IABIED, though I'm going to reread IABIED once my hardcopy arrives before making a confident judgment.

WilliamKiely, 5mo

I independently had this same thought when listening to the book on Tuesday, and think it's worth emphasizing:

I again think they’re inappropriately reasoning about what happens for arbitrarily intelligent models instead of reasoning about what happens with AIs that are just barely capable enough to count as ASI. Their arguments (that AIs will learn goals that are egregiously misaligned with human goals and then conspire against us) are much stronger for wildly galaxy-brained AIs than for AIs that are barely smart enough to count as superhuman.

ryan_greenblatt, 5mo

I'm quite sympathetic to something like the caveated version of the title

Presumably, another problem with your caveated version of the title is that you don't expect literally everyone to die (at least not with high confidence) even if AIs take over.

Buck, 5mo

oh thanks, fixed. I just internally substitute "AI takeover" anytime anyone says "AI kills everyone" because this comes up constantly, and I'd forgotten that I'd done so here :P
