The title is reasonable

by Raemon
20th Sep 2025
19 min read

I'm annoyed by various people who seem to be complaining about the book title being "unreasonable". i.e. who don't merely disagree with the title of "If Anyone Builds It, Everyone Dies", but, think something like: "Eliezer/Nate violated a Group-Epistemic-Norm." 

I think the title is reasonable. 

I think the title is probably true. I'm less confident than Eliezer/Nate, but I don't think it's unreasonable for them to be confident in it given their epistemic state. So I want to defend several decisions about the book I think were: 

  1. Actually pretty reasonable from a meta-group-epistemics/comms perspective
  2. Very important to do.

I've heard different things from different people and maybe am drawing a cluster where there is none, but, some things I've heard:

Complaint #1: "They really shouldn't have exaggerated the situation like this."

Complaint #2: "Eliezer and Nate are crazy overconfident, and it's going to cost them/us credibility."

Complaint #3: "It sucks that the people with the visible views are going to be more extreme, eye-catching and simplistic. There's a nearby title/thesis I might have agreed with, but it matters a lot not to mislead people about the details."

"Group epistemic norms" includes both how individuals reason, and how they present ideas to a larger group for deliberation. 

Complaint #1 emphasizes culpability about dishonesty (by exaggeration). I agree that'd be a big deal. But, this is just really clearly false. Whatever else you think, it's pretty clear from loads of consistent writing that Eliezer and Nate do just literally believe the title, and earnestly think it's important.

Complaint #2 emphasizes culpability in terms of "knowingly bad reasoning mistakes." i.e, "Eliezer/Nate made reasoning mistakes that led them to this position, it's pretty obvious that those are reasoning mistakes, and people should be held accountable for major media campaigns based on obvious mistakes like that." 

(I do think it's sometimes important to criticize people for something like that. But, not this time, because I don't think they made obvious reasoning mistakes).

I have the most sympathy for Complaint #3. I agree there's a memetic bias towards sensationalism in outreach. (Although there are also major biases towards "normalcy" / "we're gonna be okay" / "we don't need to change anything major". One could argue about which bias is stronger, but mostly I think they're both important to model separately).

It does suck if you think something false is propagating. If you think that, seems good to write up what you think is true and argue about it. 

If people-more-optimistic-than-me turn out to be right about some things, I'd agree the book and title may have been a mistake.

I think it'd be great for someone who earnestly believes "If anyone builds it, everyone probably dies but it's hard to know" to publicly argue for that instead.

I. Reasons the "Everyone Dies" thesis is reasonable

What the book does and doesn't say

The book says, confidently, that:

If any company or group, anywhere on the planet, builds an artificial superintelligence using anything remotely like current techniques, based on anything remotely like the present understanding of AI, then everyone, everywhere on Earth, will die.

The book does not claim confidently that AI will come soon, or that it will be shaped any particular way. (It does make some guesses about what is likely, but, those are guesses and the book is pretty clear about the difference in epistemic status).

The book doesn't say you can't build something that's not "It", that is useful in some ways. (It specifically expresses some hope in using narrow biomedical-AI to solve various problems).

The book says if you build it, everyone dies.

"It" means AI that is actually smart enough to confidently defeat humanity. This can include, "somewhat powerful, but with enough strategic awareness to maneuver into more power without getting caught." (Which is particularly easy if people just straightforwardly keep deploying AIs as they scale them up).

The book is slightly unclear about what "based on current techniques" means (which feels like a fair complaint). But, I think it's fairly obvious that they mean the class of AI training that is "grown" more than "crafted" – i.e. any techniques that involve a lot of opaque training, where you can't make at least a decently confident guess about how powerful the next training run will turn out to be, and how it'll handle various edge cases.

Do you think interpretability could advance to where we can make reasonably confident predictions about what the next generation would do? Cool. (I'm more skeptical it'll happen fast enough, but, it's not a disagreement with the core thesis of the book, since it'd change the "based on anything like today's understanding of AI" clause)[1]

Do you think it's possible to control somewhat-strong-AI with a variety of techniques that make it less likely that it would be able to take over all humanity? I think there is some kind of potential major disagreement somewhere around here (see below), but it's not automatically a disagreement. 

Do you think there will be at least one company that's actually sufficiently careful as we approach more dangerous levels of AI, with enough organizational awareness to (probably) stop when they get to a run more dangerous than they know how to handle? Cool. I'm skeptical about that too. And this one might lead to disagreement with the book's secondary thesis of "And therefore, Shut It Down," but, it's not (necessarily) a disagreement with "*If* someone built AI powerful enough to destroy humanity based on AI that is grown in unpredictable ways with similar-to-current understanding of AI, then everyone will die."

The book is making a (relatively) narrow claim. 

You might still disagree with that claim. I think there are valid reasons to disagree, or at least assign significantly less confidence to the claim. 

But none of the reasons listed so far are disagreements with the thesis. And, remember, if the reason you disagree is because you think our understanding of AI will improve dramatically, or there will be a paradigm shift specifically away from "unpredictably grown" AI, this also isn't actually a disagreement with the sentence.

I think a lot of people just don't really believe in AI that is smart enough to outmaneuver all humanity. I do think they're wrong. But, if you don't really believe in this, and think the book title is false, I... roll to disbelieve on you actually really simulating the world where there's an AI powerful enough to outmaneuver humanity?

The claims are presented reasonably

A complaint I have about Realtime Conversation Eliezer, or Comment-Thread Eliezer, is that he often talks forcefully, unwilling to change frames, with a tone of "I'm talking to idiots", and visibly not particularly listening to any nuanced arguments anyone is trying to make. 

But, I don't have that sort of complaint about this book. 

Something I like about the book is it lays out disjunctive arguments, like “we think ultimately, a naively developed superintelligence would want to kill everyone, for reasons A, B, C and D. Maybe you don’t buy reasons B, C and D. But that still leaves you with A, and here’s an argument that although Reason A might not lead to literally everyone dying, the expected outcome is still something horrifying.”

(An example of that was: For “might the AI keep us as pets?”, the book answers (paraphrased) “We don’t think so. But, even if they did… note that, while humans keep dogs as pets, we don’t keep wolves as pets. Look at the transform from wolf to dog. An AI might keep us as pets, but, if that’s your hope, imagine the transform from Wolves-to-Dogs and equivalent transforms on humans.”) 

Similarly, I like that in the AI Takeoff scenario, there are several instances where it walks through "Here are several different things the AI could try to do next. You might imagine that some of them aren't possible, because the humans are doing X/Y/Z. Okay, let's assume X/Y/Z rule out options 1/2/3. But, that leaves options 4/5/6. Which of them does the AI do? Probably all of them, and then sees which one works best."

Reminder: All possible views of the future are wild.

@Scott Alexander described the AI Takeoff story thus:

It doesn’t just sound like sci-fi [specifically compared to "hard sci fi"]; it sounds like unnecessarily dramatic sci-fi. I’m not sure how much of this is a literary failure vs. different assumptions on the part of the authors.

I... really don't know what Scott expected a story that featured actual superintelligence to look like. I think the authors bent over backwards giving us one of the least-sci-fi stories you could possibly tell that includes superintelligence doing anything at all, without resorting to "superintelligence just won't ever exist." 

Eliezer and Nate make sure the takeover scenario doesn't depend on technologies that we don't have some existing examples of. The amount of "fast takeoff" seems like the amount of scaleup you'd expect if the graphs just kept going up the way they're currently going up, by approximately the same mechanisms they currently go up (i.e. some algorithmic improvements, some scaling). 

Sure, Galvanic would first run Sable on smaller amounts of compute. And... then they will run it on larger amounts of compute (and as I understand it, it'd be a new, surprising fact if they limited themselves to scaling up slowly/linearly rather than by a noticeable multiplier or order-of-magnitude. If I am wrong about current lab practices here, please link me some evidence).

If this story feels crazy to you, I want to remind you that all possible views of the future are wild. Either some exponential graphs suddenly stop for unclear reasons, or some exponential graphs keep going and batshit crazy stuff can happen that your intuitions are not prepared for. You can believe option A if you want, but, it's not like "the exponential graphs that have been consistent over hundreds of years suddenly stop" is a viewpoint that you can safely point to as a "moderate" and claim to give the other guy burden of proof.

You don't have the luxury of being the sort of moderate who doesn't have to believe something pretty crazy sounding here, one way or another. 

(If you haven't yet read the Holden post on Wildness, I ask you do so before arguing with this. It's pretty short and also fun to read fwiw)

The Online Resources spell out the epistemic status more clearly.

In the FAQ question, "So there's at least a chance of the AI keeping us alive?", they state more explicitly:

It’s overwhelmingly more likely that [superintelligent] AI kills everyone.

In these online resources, we’re willing to engage with a pretty wide variety of weird and unlikely scenarios, for the sake of spelling out why we think they’re unlikely and why (in most cases) they would still be catastrophically bad outcomes for humanity.

We don’t think that these niche scenarios should distract from the headline, however. The most likely outcome, if we rush into creating smarter-than-human AI, is that the AI consumes the Earth for resources in pursuit of some end, wiping out humanity in the process.

The book title isn’t intended to communicate complete certitude. We mean the book title in the manner of someone who sees a friend lifting a vial of poison to their lips and shouts, “Don’t drink that! You’ll die!”

Yes, it’s technically possible that you’ll get rushed to the hospital and that a genius doctor might concoct an unprecedented miracle cure that merely leaves you paralyzed from the neck down. We’re not saying there’s no possibility of miracles. But if even the miracles don’t lead to especially good outcomes, then it seems even clearer that we shouldn’t drink the poison.

The book doesn't actually overextend the arguments and common discourse norms.

This all adds up to it seeming to me that:

  • The book makes a reasonable case for why Eliezer and Nate are personally pretty confident in the title.
  • The book, I think, does a decent job giving you some space to think “well, I don’t buy that particular argument."
  • The book acknowledges “if you don’t buy some of these arguments, yeah, maybe everyone might not literally die and maybe the AI might care about humans in some way, but we still think it's very unlikely to care about humans in a way that should be comforting."

If a book in the 50s was called "Nuclear War would kill us all", I think that book would have been incorrect (based on my most recent read of Nuclear war is unlikely to cause human extinction), but I wouldn't think the authors were unreasonable for arguing it, especially if they pointed out things like "and yeah, if our models of nuclear winter are wrong, everyone wouldn't literally die, but civilization would still be pretty fucked", and I would think the people giving the authors a hard time about it were being obnoxious pedants, not heroes of epistemic virtue.

(I would think people arguing "but, the nuclear winter models are wrong, so, yeah, we're more in the 'civilization would be fucked' world than the 'everyone literally dies' world" would be doing a good valuable service. But I wouldn't think it'd really change the takeaways very much).

II. Specific points to maybe disagree on

There are some opinions that seem like plausible opinions to hold, given humanity's current level of knowledge, that lead to actual disagreement with "If anyone builds [an AI smart enough to outmaneuver humanity] [that is grown in unpredictable ways] [based on approximately our current understanding of AI]".

And the book does have a secondary thesis of "And therefore, Shut It Down", and you can disagree with that separately from "If anyone builds it, everyone dies."

Right now, the arguments that I've heard sophisticated enough versions of to seem worth acknowledging include:

  1. Very slightly nice AIs would find being nice cheap.
    • (argument against "everyone literally dies.")
  2. AI-assisted alignment is reasonably likely to work. Misuse or dumber-AI-run-amuck is likely enough to be comparably bad to superintelligence. And it's meanwhile easier to coordinate now with smaller actors. So, we should roll the dice now rather than try for a pause.
    • (argument against "Shut It (completely) Down")
  3. We can get a lot of very useful narrow-ish work out of somewhat-more-advanced-models that'll help us learn enough to make significant progress on alignment.
    • (argument against "Shut It Down (now)")
  4. We can keep finding ways to increase the cost of taking over humanity. There's no boolean between "superintelligent enough to outthink humanity" and "not", and this is a broken frame that is preventing you from noticing alternative strategies.
    • (argument against "It" being the right concept to use)

I disagree with the first two being very meaningful (as counterarguments to the book). More on that in a sec.

Argument #3 is somewhat interesting, but, given that it'd take years to get a successful Global Moratorium, I don't see any reason not to start pushing for a long global pause now.

I think the fourth one is fairly interesting. While I strongly disagree with some major assumptions in the Redwood Plan as I understand it, various flavors of "leverage narrow / medium-strength controlled AIs to buy time" feel like they might be an important piece of the gameboard. Insofar as Argument #3 helped Buck step outside the MIRI frame and invent Control, and insofar as that helps buy time, yep, seems important.

This is complicated by "there is a giant Cope Memeplex that really doesn't want to have to slow down or worry too much", so while I agree it's good to be able to step outside the Yudkowsky frame, I think most people doing it are way more likely to end up slipping out of reality and believing nonsense than getting anywhere helpful.

I won't get into that much detail about either topic, since that'd pretty much be a post to itself. But, I'll link to some of the IABED Online Resources, and share some quick notes about why even the sophisticated versions of these arguments so far don't seem very useful to me.

On the meta-level: It currently feels plausible to me to have some interesting disagreements with the book here, but I don't see any interesting disagreements that add up to "Eliezer/Nate particularly fucked up epistemically or communicatively" or "you shouldn't basically hope the book succeeds at its goal."

Notes on Niceness

There are some flavors of "AI might be slightly nice" that are interesting. But, they don't seem like they change any of our decisions. They just make us a bit more hopeful about the end result.

Given the counterarguments, I don't see a reason to think this is more than single-digit-percent likely to be especially relevant. (I can see >9% likelihood the AIs are "nice enough that something interesting-ish happens" but not >9% likelihood that we shouldn't think the outcome is still extremely bad. The people who think otherwise seem extremely motivatedly-cope-y to me).

Note also that it's very expensive for the AI to not boil the oceans / etc as fast as possible, since that means losing many galaxies' worth of resources, so it seems like it's not enough to be "very slightly" nice – it has to be, like, pretty actively nice.

Which plan is Least Impossible?

A lot of x-risk disagreements boil down to "which pretty impossible-seeming thing is only actually Very Hard instead of Impossibly Hard."

There's an argument I haven't heard a sophisticated version of, which is "there's no way you're getting a Global Pause."

I certainly believe that this is an extremely difficult goal, and a lot of major things would need to change in order for it to happen. I haven't heard any real argument we should think it's more impossible than, say, Trump winning the presidency and going on to do various Trumpy things. 

(Please don't get into arguing about Trump in the comments. I'm hoping that whatever you think of Trump, you agree he's doing a bunch of stuff most people would previously have probably expected to be outside the overton window. If this turns out to be an important substantive disagreement I'll make a separate container post for it)

Meanwhile, the counter-impossible-thing I've heard several people putting hope on is "We can run a lot of controlled AIs, where (first) we have them do fairly straightforward automation of not-that-complex empirical work, which helps us get to a point where we trust them enough to give them more open-ended research tasks."

Then, we run a lot of those real fast, such that they substantially increase the total amount of alignment-research-months happening during a not-very-long-slowdown.

The arguments for why this is extremely dangerous, from the book and online resources and maybe some past writing, are, recapped:

There's no good training data. 

We don't even know how to verify alignment work is particularly useful among humans, let alone in an automatedly gradable way.

Goal Directedness is pernicious. Corrigibility is anti-natural.

The way an AI would develop the ability to do the kind of extended, useful, creative research thinking you might fully outsource to it, is via becoming perniciously goal directed. You can't do months or years of open-ended research without fractally noticing subproblems, figuring out new goals, and relentlessly finding new approaches to tackle them.

Once you do that, it's a fact of the universe, that the programmers can't change, that "you'd do better at these goals if you didn't have to be fully obedient", and while programmers can install various safeguards, those safeguards are pumping upstream and will have to pump harder and harder as the AI gets more intelligent. And if you want it to make at least as much progress as a decent AI researcher, it needs to be quite smart.

Security is very difficult

The surface area of ways an AI can escape and maneuver is enormous. (I think it's plausible to have a smallish number of carefully controlled, semi-powerful AIs if you are paying a lot of attention. The place I completely get off the train is where you then try to get lots of subjective hours of research time out of thousands of models).

Alignment is among the most dangerous tasks

"Thinking about how to align AIs" requires both for the AI to think how "how would I make smarter version of myself" and "how would I make it aligned to humans?". The former skillset  directly helps them recursively self-improve. The latter skillset helps them manipulate humans.

MIRI did make a pretty substantive try.

One of the more useful lines for me, in the Online Resources, is in their extended discussions about corrigibility:

We ran some workshops, and the workshops had various mathematicians of various stripes (including an International Mathematical Olympiad gold medalist), but nobody came up with a really good idea.

This does not mean that the territory has been exhausted. Earth has not come remotely near to going as hard on this problem as it has gone on, say, string theory, nor offered anything like the seven-digit salaries on offer for advancing AI capabilities.

But we learned something from the exercise. We learned not just about the problem itself, but also about how hard it was to get outside grantmakers or journal editors to be able to understand what the problem was. A surprising number of people saw simple mathematical puzzles and said, “They expect AI to be simple and mathematical,” and failed to see the underlying point that it is hard to injure an AI’s steering abilities, just like how it’s hard to injure its probabilities.

If there were a natural shape for AIs that let you fix mistakes you made along the way, you might hope to find a simple mathematical reflection of that shape in toy models. All the difficulties that crop up in every corner when working with toy models are suggestive of difficulties that will crop up in real life; all the extra complications in the real world don’t make the problem easier.

There was a related quote I can't find now, that maybe was just in an earlier draft of the Online Resources, to the effect of "this [our process of attempting to solve corrigibility] is the real reason we have this much confidence about this being quite hard and our current understanding not being anywhere near adequate." 

(Fwiw I think it is a mistake that this isn't at least briefly mentioned in the book. The actual details would go over most people's heads, but, having any kind of pointer to "why are these guys so damn confident?" seems like it'd be quite useful)

III. Overton Smashing, and Hope

Or: "Why is this book really important, not just 'reasonable?'"

I, personally, believe in this book. [2]

If you don't already believe in it, you're probably not going to because of my intuitions here. But, I want to say why it's deeply important to me that the book is reasonable, and that I'm not just arguing on the internet because I'm triggered and annoyed about some stuff.

I believe in the book partly because it looks like it might work. 

The number (and hit-rate) of NatSec endorsements surprised me. More recently some senators seem to have been bringing up existential risk on their own initiative. When I showed the website to a (non-rationalist) friend who lives near DC and has previously worked for a think-tank-ish org, I expected them to have a knee-jerk reaction of ‘man that’s weird and a bit cringe’, or ‘I’d be somewhat embarrassed to share this website with colleagues’, and instead they just looked worried and said “okay, I’m worried”, and we had a fairly matter-of-fact conversation about it.

It feels like the world is waking up to AI, and is aware that it is some kind of big deal that they don’t understand, and that there’s something unsettling about it. 

I think the world is ready for this book.

I also believe in the book because, honestly, the entire rest of the AI safety community’s output just does not feel adequate to me to the task of ensuring AI goes well. 

I’m personally only like 60% on “if anyone built It, everyone would die.” But I’m like 80% on “if anyone built It, the results would be unrecoverably catastrophic,” and the remaining 20% is a mix of model uncertainty and luck. Nobody has produced counterarguments that feel compelling, just "maybe something else will happen?", and the way people choose their words almost always suggests some kind of confusion or cope.

The plans that people propose mostly do not seem to be counter-arguing the actual difficult parts of the problem. 

The book gives me more hope than anything else has in the past few years. 

Overton Smashing is a thing. I really want at least some people trying.

It’s easy to have the idea “try to change the Overton window.” Unfortunately, changing the Overton window is very difficult. It would be hard for most people to pull it off. I think it helps to have a mix of conviction backed by deep models, and some existing notoriety. There are only a few other people who seem to me like they might be able to pull it off. (It'd be cool if at least one of Bengio, Hinton, Hassabis or Amodei ends up trying. I think Buck actually might do a good job if he tried.)

Smashing an overton window does not look like "say the careful measured thing, but, a bit louder/stronger." Trying to do it halfway won't work. But going all in with conviction and style, seems like it does work. It looks like Bengio, Hinton, Hassabis and Amodei are each trying to do some kind of measured/careful strategy, and it's salient that if they shifted a bit, things would get worse instead of better. 

(Sigh... I think I might need to talk about Trump again. This time it seems more centrally relevant to talk about in the comments. But, like, dude, look at how bulletproof the guy seems to be. He also, like, says falsehoods a lot and I'm not suggesting emulating him-in-particular, but I point to him as an existence proof of what can work)

People keep asking "why can't Eliezer tone it down." I don't think Eliezer is the best possible spokesperson. I acknowledge some downside risk to him going on a major media campaign. But I think people are very confused about how loadbearing the things some people find irritating are. How many fields and subcultures have you founded, man? Fields and subcultures and major new political directions are not founded (generally) by people without some significant fraction of haters.

You can't file off all the edges, and still have anything left that works. You can only reroll on which combination of inspiring and irritating things you're working with.

I want there to be more people who competently execute on "overton smash." The next successful person would probably look pretty different from Eliezer, because part of overton smashing is having a unique style backed by deep models and taste and each person's taste/style/models are pretty unique. It'd be great to have people with more diversity of "ways they are inspiring and also grating."

Meanwhile, we have this book. It's the Yudkowsky version of the book. If you don't like that, find someone who actually could write a better one. (Or, rather, find someone who could execute on a successful overton smashing strategy, which would probably look pretty different than a book since there already is a book, but would still look and feel pretty extreme in some way).

I don't currently buy that it costs much to have this book asking for total shutdown.

My sense is it's pretty common for political groups to have an extreme wing and a less extreme wing, and for them to be synergistic. Good cop, bad cop. Martin Luther King and Malcolm X. 

If what you want is some kind of global coordination that isn't a total shutdown, I think it's still probably better to have Yudkowsky over there saying "shut it all down" while you say "Well, I dunno about that guy. I don't think we need to shut it all down, but I do think we want some serious coordination."

I believe in the book.

Please buy a copy if you haven't yet. 

Please tell your friends about it. 

And, disagree where appropriate, but, please don't give it a hard time for lame pedantic reasons, or jump to assuming you disagree because you don't like something about the vibe. Please don't awkwardly distance yourself because it didn't end up saying exactly the things you would have said, unless it's actually fucking important.

You can buy the book here.

  1. ^

    Paul Christiano and Buck both complain about (paraphrased) "Eliezer equivocates between 'we have to get it right on the first critical try' and 'we can't learn anything important before the first critical try.'" 

    I agree something-in-this-space feels like a fair complaint, especially in combination with Eliezer not engaging that much with the more thoughtful critics, and tending to talk down to them in a way that doesn't seem to really listen to the nuances they're trying to point to, rounding them to the nearest strawman of themselves.

    I think this is a super valid thing to complain about Eliezer. But, it's not the title or thesis of the book. (because, if we survive because we learned useful things, I'd say that doesn't count as "anywhere near our current understanding").

  2. ^

    "Believing in" doesn't mean "assign >50% chance to working", it means "assign enough chance (~20%?) that it feels worth investing substantially in and coordinating around." See Believing In by Anna Salamon.