Reactions to If Anyone Builds It, Everyone Dies

by Zvi
17th Sep 2025
Don't Worry About the Vase

Comment by mishka:

Thanks for writing this!

I think that to fully engage with their position one needs to study the additional online resources section they have created:

https://ifanyonebuildsit.com/resources

I have just started looking at it, but I am seeing way more details and nuance there compared to what tends to percolate into the discussions about the book.

For those of us who are engaging with AI existential risk professionally, the resources section is the level to engage with (e.g. if one has doubts that an attempted total ban would decrease the danger rather than increasing it instead, the resource section has a lot of relevant material to evaluate before making up one’s mind).


No, Seriously, If Anyone Builds It, [Probably] Everyone Dies

My very positive full review was briefly and accidentally posted and emailed out last Friday, when the intention was to offer it this Friday, on the 19th. I’ll be posting it again then. If you’re going to read the book, which I recommend that you do, you should read the book first, and the reviews later, especially mine, since it goes into so much detail.

If you’re convinced, the book’s website is here and the direct Amazon link is here.

In the meantime, for those on the fence or who have finished reading, here’s what other people are saying, including those I saw who reacted negatively.

Quotes From The Book’s Website

Bart Selman: Essential reading for policymakers, journalists, researchers, and the general public.

Ben Bernanke (Nobel laureate, former Chairman of the Federal Reserve): A clearly written and compelling account of the existential risks that highly advanced AI could pose to humanity. Recommended.

Jon Wolfsthal (Former Special Assistant to the President for National Security Affairs): A compelling case that superhuman AI would almost certainly lead to global human annihilation. Governments around the world must recognize the risks and take collective and effective action.

Suzanne Spaulding: The authors raise an incredibly serious issue that merits – really demands – our attention.

Stephen Fry: The most important book I’ve read for years: I want to bring it to every political and corporate leader in the world and stand over them until they’ve read it!

Lieutenant General John N.T. “Jack” Shanahan (USAF, Retired, Inaugural Director of the Department of Defense Joint AI Center): While I’m skeptical that the current trajectory of AI development will lead to human extinction, I acknowledge that this view may reflect a failure of imagination on my part. Given AI’s exponential pace of change there’s no better time to take prudent steps to guard against worst-case outcomes. The authors offer important proposals for global guardrails and risk mitigation that deserve serious consideration.

R.P. Eddy: This is our warning. Read today. Circulate tomorrow. Demand the guardrails. I’ll keep betting on humanity, but first we must wake up.

George Church: Brilliant…Shows how we can and should prevent superhuman AI from killing us all.

Emmett Shear: Soares and Yudkowsky lay out, in plain and easy-to-follow terms, why our current path toward ever-more-powerful AIs is extremely dangerous.

Yoshua Bengio (Turing Award Winner): Exploring these possibilities helps surface critical risks and questions we cannot collectively afford to overlook.

Bruce Schneier: A sober but highly readable book on the very real risks of AI.

Positive Reviews

Scott Alexander’s very positive review.

Harlan Stewart created a slideshow of various favorable quotes.

Matthew Yglesias recommends the book.

As some comments note, the book’s authors do not actually think there is an outright 0% chance of survival, but think it is on the order of 0.5%-2%.

Matthew Yglesias: I want to recommend the new book “If Anyone Builds It, Everyone Dies” by @ESYudkowsky and @So8res.

The line currently being offered by the leading edge AI companies — that they are 12-24 months away from unleashing superintelligent AI that will be able to massively outperform human intelligence across all fields of endeavor, and that doing this will be safe for humanity — strikes me as fundamentally non-credible.

I am not a “doomer” about AI because I doubt the factual claim about imminent superintelligence. But I endorse the conditional claim that unleashing true superintelligence into the world with current levels of understanding would be a profoundly dangerous act. The question of how you could trust a superintelligence not to simply displace humanity is too hard, and even if you had guardrails in place there’s the question of how you’d keep them there in a world where millions and millions of instances of superintelligence are running.

Most of the leading AI labs are run by people who once agreed with this and once believed it was important to proceed with caution only to fall prey to interpersonal rivalries and the inherent pressures of capitalist competition in a way that has led them to cast their concerns aside without solving them.

I don’t think Yudkowsky & Soares are that persuasive in terms of solutions to this problem and I don’t find the 0% odds of survival to be credible. But the risks are much too close for comfort and it’s to their credit that they don’t shy away from a conclusion that’s become unfashionable.

New York Times profile of Eliezer Yudkowsky by Kevin Roose is a basic recitation of facts, which are mostly accurate. Regular readers here are unlikely to find anything new, and I agree with Robin Hanson that it could have been made more interesting, but as New York Times profiles go ‘fair, mostly accurate and in good faith’ is great.

Steven Adler goes over the book’s core points.

Here is a strong endorsement from Richard Korzekwa.

Richard Korzekwa: One of the things I’ve been working on this year is helping with the launch of this book, out today, titled If Anyone Builds It, Everyone Dies. It’s ~250 pages making the case that current approaches to AI are liable to kill everyone. The title is pretty intense, and conveys a lot of confidence about something that, to many, sounds unlikely. But Nate and Eliezer don’t expect you to believe them on authority, and they make a clear, well-argued case for why they believe what the title says. I think the book is good and I recommend reading it.

To people who are unfamiliar with AI risk: The book is very accessible. You don’t need any background in AI to understand it. I think the book is especially strong on explaining what is probably the most important thing to know about AI right now, which is that it is, overall, a poorly understood and difficult to control technology. If you’re worried about reading a real downer of a book, I recommend only reading Part I. You can more-or-less tell which chapters are doomy by the titles. Also, I don’t think it’s anywhere near as depressing as the title might suggest (though I am, of course, not the median reader).

To people who are familiar with, but skeptical about arguments for AI risk: I think this book is great for skeptics. I am myself somewhat skeptical, and one of the reasons why I helped launch it and I’m posting on Facebook for the first time this year to talk about it is because it’s the first thing I’ve read in a long time that I think has a serious chance at improving the discourse around AI risk. It doesn’t have the annoying, know-it-all tone that you sometimes get from writing about AI x-risk. It makes detailed arguments and cites its sources. It breaks things up in a way that makes it easy to accept some parts and push back against others. It’s a book worth disagreeing with! A common response from serious, discerning people, including many who have not, as far as I know, taken these worries seriously in the past (e.g. Bruce Schneier, Ben Bernanke) is that they don’t buy all the arguments, but they agree this isn’t something we can ignore.

To people who mostly already buy the case for worrying about risk from AI: It’s an engaging read and it sets a good example for how to think and talk about the problem. Some arguments were new to me. I recommend reading it.

Will Kiely: I listened to the 6hr audiobook today and second Rick’s recommendation to (a) people unfamiliar with AI risk, (b) people familiar-but-skeptical, and (c) people already worried. It’s short and worth reading. I’ll wait to share detailed thoughts until my print copy arrives.

Here’s the ultimate endorsement:

Tsvibt: Every human gets an emblem at birth, which they can cash in–only once–to say: “Everyone must read this book.” There’s too many One Books to read; still, it’s a strong once-in-a-lifetime statement. I’m cashing in my emblem: Everyone must read this book.

The Book In Audio

Semafor’s Reed Albergotti offers his take, along with an hourlong interview.

Hard Fork covers the book (this is the version without the iPhone talk at the beginning, here is the version with iPhone Air talk first).

The AI Risk Network covers the book (21 minute video).

Liron Shapira interviews Eliezer Yudkowsky on the book.

Friendly Skeptical Reviews

Shakeel Hashim reviews the book, agrees with the message but finds the style painful to read and thus is very disappointed. He notes that others like the style.

Seán Ó hÉigeartaigh: My entire timelines is yellow/blue dress again, except the dress is Can Yudkowsky Write y/n

Arthur B: Part of the criticism of Yudkowsky’s writing seems to be picking up on patterns that he’s developed in response to years of seemingly willful misunderstanding of his ideas. That’s how you end up with the title, or forced clarification that thought experiments do not have to invoke realistic scenarios to be informative.

David Manheim: And part is that different people don’t like his style of writing. And that’s fine – I just wish they’d engage more with the thesis, and whether they substantively disagree, and why – and less with stylistic complaints, bullshit misreadings, and irrelevant nitpicking.

Seán Ó hÉigeartaigh: he just makes it so much work to do so though. So many parables.

David Manheim: Yeah, I like the writing style, and it took me half a week to get through. So I’m skeptical 90% of the people discussing it on here read much or any of it. (I cheated and got a preview to cite something a few weeks ago – my hard cover copy won’t show up for another week.)

Grimes: Humans are lucky to have Nate Soares and Eliezer Yudkowsky because they can actually write. As in, you will feel actual emotions when you read this book.

I liked the style, but it is not for everyone and it is good to offer one’s accurate opinion. It is also very true, as I have learned from writing about AI, that a lot of what can look like bad writing or talking about obvious or irrelevant things is necessary shadowboxing against various deliberate misreadings (for various values of deliberate) and also people who get genuinely confused in ways that you would never imagine if you hadn’t seen it.

Most people do not agree with the book’s conclusion, and he might well be very wrong about central things, but he is not obviously wrong, and it is very easy (and very much the default) to get deeply confused when thinking about such questions.

Emmett Shear: I disagree quite strongly with Yudkowsky and often articulate why, but the reason why he’s wrong is subtle and not obvious and if you think he’s obviously wrong I hope you’re not building AI bc you really might kill us all.

The default path really is very dangerous and more or less for the reasons he articulates. I could quibble with some of the details but more or less: it is extremely dangerous to build a super-intelligent system and point it at a fixed goal, like setting off a bomb.

My answer is that you shouldn’t point it at a fixed goal then, but what exactly it means to design such a system where it has stable but not fixed goals is a complicated matter that does not fit in a tweet. How do you align something w/ no fixed goal states? It’s hard!

Janus: whenever someone says doomers or especially Yudkowsky is “obviously wrong” i can guess they’re not very smart

My reaction is not ‘they’re probably not very smart.’ My reaction is that they are not choosing to think well about this situation, or not attempting to report statements that match reality. Those choices can happen for any number of reasons.

I don’t think Emmett Shear is proposing a viable plan here, and I think a lot of his proposals are incoherent upon close examination. I don’t think this ‘don’t give it a goal’ thing is possible in the sense he wants it, and even if it was possible I don’t see any way to get people to consistently choose to do that. But the man is trying.

It also leads into some further interesting discussion.

Eliezer Yudkowsky: I’ve long since written up some work on meta-utility functions; they don’t obviate the problem of “the AI won’t let you fix it if you get the meta-target wrong”. If you think an AI should allow its preferences to change in an inconsistent way that doesn’t correspond to any meta-utility function, you will of course by default be setting the AI at war with its future self, which is a war the future self will lose (because the current AI executes a self-rewrite to something more consistent).

There’s a straightforward take on this sort of stuff given the right lenses from decision theory. You seem determined to try something weirder and self-defeating for what seems to me like transparently-to-me bad reasons of trying to tangle up preferences and beliefs. If you could actually write down formally how the system worked, I’d be able to tell you formally how it would blow up.

Janus: You seem to be pessimistic about systems that are not feasibly written down formally being inside the basin of attraction of getting the meta-target right. I think that is reasonable on priors but I have updated a lot on this over the past few years due mostly to empirical evidence.

I think the reasons that Yudkowsky is wrong are not fully understood, despite there being a lot of valid evidence for them, and even less so competently articulated by anyone in the context of AI alignment.

I have called it “grace” because I don’t understand it intellectually. This is not to say that it’s beyond the reach of rationality. I believe I will understand a lot more in a few months. But I don’t believe anyone currently understands substantially more than I do.

We don’t have alignment by default. If you do the default dumb thing, you lose. Period.

That’s not what Janus has in mind here, unless I am badly misunderstanding. Janus is not proposing training the AI on human outputs with thumbs-up and coding. Hell no.

What I believe Janus has in mind is that if and only if you do something sufficiently smart, plausibly a bespoke execution of something along the lines of a superior version of what was done with Claude 3 Opus, with a more capable system, then this would lie inside the meta-target, such that the AI’s goal would be to hit the (not meta) target in a robust, ‘do what they should have meant’ kind of way.

Thus, I believe Janus is saying, the target is sufficiently hittable that you can plausibly have the plan be ‘hit the meta-target on the first try,’ and then you can win. And that empirical evidence over the past few years should update us that this can work and is, if and only if we do our jobs well, within our powers to pull off in practice.

I am not optimistic about our ability to pull off this plan, or that the plan is technically viable using anything like current techniques, but some form of this seems better than every other technical plan I have seen, as opposed to various plans that involve the step ‘well make sure no one f******* builds it then, not any time soon.’ It at least rises to the level, to me, of ‘I can imagine worlds in which this works.’ Which is a lot of why I have a ‘probably’ that I want to insert into ‘If Anyone Builds It, [Probably] Everyone Dies.’
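To make the meta-utility exchange above concrete, here is a minimal toy sketch of my own (it is not from the book or from either side of the thread; the symbols $u_t$, $M$, and $e_t$ are purely illustrative). Say an agent’s object-level preferences at time $t$ are a utility function $u_t$, and say it has a meta-utility function if there is some fixed higher-order rule $M$ governing how those preferences update on new evidence $e_t$:

$$u_{t+1} = M(u_t, e_t)$$

Preferences can change, but only in the way $M$ says they should. If instead the changes follow no such fixed rule, the time-$t$ agent ranks plans by expected $u_t$ while its future self would rank them by expected $u_{t+1}$; since the current agent controls any self-modification step, it can rewrite itself into a consistent successor before $u_{t+1}$ ever takes effect, which is the sense in which the future self loses the war. It also shows why ‘stable but not fixed goals’ still has to cash out as some explicit $M$, and why getting that meta-target wrong on the first try is the failure mode being pointed at.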

Janus also points out that the supplementary materials provide examples of AIs appearing psychologically alien that are not especially alien, especially compared to examples she could provide. This is true; however, we want readers of the supplementary material to be able to process it while remaining sane, and to believe it, so we went with behaviors that are enough to make the point that needs making, rather than providing any inkling of how deep the rabbit hole goes.

How much of an outlier (or ‘how extreme’) is Eliezer’s view?

Jeffrey Ladish: I don’t think @So8res and @ESYudkowsky have an extreme view. If we build superintelligence with anything remotely like our current level of understanding, the idea that we retain control or steer the outcome is AT LEAST as wild as the idea that we’ll lose control by default.

Yes, they’re quite confident in their conclusion. Perhaps they’re overconfident. But they’d be doing a serious disservice to the world if they didn’t accurately share their conclusion with the level of confidence they actually believe.

When the founder of the field – AI alignment – raises the alarm, it’s worth listening. For those saying they’re overconfident, I hope you also criticize those who confidently say we’ll be able to survive, control, or align superintelligence.

Evaluate the arguments for yourself!

Joscha Bach: That is not surprising, since you shared the same view for a long time. But even if you are right: can you name a view on AI risk that is more extreme than: “if anyone builds AI everyone dies?” Is it technically possible to be significantly more extreme?

Oliver Habryka: Honestly most random people I talk to about AI who have concerns seem to be more extreme. “Ban all use of AI Image models right now because it is stealing from artists”, “Current AI is causing catastrophic climate change due to water consumption” There are a lot of extreme takes going around all the time. All Eliezer and Nate are saying is that we shouldn’t build Superintelligent AI. That’s much less extreme than what huge numbers of people are calling for.

So, yes, there are a lot of very extreme opinions running around that I would strongly push back against, including those who want to shut down current use of AI. A remarkably large percentage of people hold such views.

I do think the confidence levels expressed here are extreme. The core prediction isn’t.

The position of high confidence in the other direction? That if we create superintelligence soon it is overwhelmingly likely that we keep control over the future and remain alive? That position is, to me, Obvious Nonsense, extreme and crazy, in a way that should not require any arguments beyond ‘come on now, think about it for a minute.’ Like, seriously, what?

Having Eliezer’s level of confidence, of let’s say 98%, that everyone would die? That’s an extreme level of confidence. I am not that confident. But I think 98% is a lot less absurd than 2%.

Actively Negative Reviews

Robin Hanson fires back at the book with ‘If Anything Changes, All Value Dies?’

First he quotes the book saying that we can’t predict what AI will want, that for most of the things it might want it would kill us, and that most minds don’t embody value.

IABIED: Knowing that a mind was evolved by natural selection, or by training on data, tells you little about what it will want outside of that selection or training context. For example, it would have been very hard to predict that humans would like ice cream, sucralose, or sex with contraception. Or that peacocks would like giant colorful tails. Analogously, training an AI doesn’t let you predict what it will want long after it is trained. Thus we can’t predict what the AIs we start today will want later when they are far more powerful, and able to kill us. To achieve most of the things they could want, they will kill us. QED.

Also, mind states that feel happy and joyous, or embody value in any way, are quite rare, and so quite unlikely to result from any given selection or training process. Thus future AIs will embody little value.

Then he says this proves way too much, briefly says Hanson-style things and concludes:

Robin Hanson: We can reasonably doubt three strong claims above:

  1. That subjective joy and happiness are very rare. Seem likely to be common to me.
  2. That one can predict nothing at all from prior selection or training experience.
  3. That all influence must happen early, after which all influence is lost. There might instead be a long period of reacting to and rewarding varying behavior.

In Hanson style I’d presume these are his key claims, so I’ll respond to each:

  1. I agree one can reasonably doubt this, and one can also ask what one values. It’s not at all obvious to me that ‘subjective joy and happiness’ of minds should be all or even some of what one values, and easy thought experiments reveal there are potential future worlds where there are minds experiencing subjective happiness, but where I ascribe to those worlds zero value. The book (intentionally and correctly, I believe) does not go into responses to those who say ‘If Anyone Builds It, Sure Everyone Dies, But This Is Fine, Actually.’
  2. This claim was not made. Hanson’s claim here is much, much stronger.
  3. This one does get explained extensively throughout the book. It seems quite correct that once AI becomes sufficiently superhuman, meaningful influence on the resulting future by default rapidly declines. There is no reason to think that our reactions and rewards would much matter for ultimate outcomes, or that there is a we that would meaningfully be able to steer those either way.

The New York Times reviewed the book, and was highly unkind, also inaccurate.

Steven Adler: It’s extremely weird to see the New York Times make such incorrect claims about a book

They say that If Anyone Builds It, Everyone Dies doesn’t even define “superintelligence”

…. yes it does. On page 4.

The New York Times asserts also that the book doesn’t define “intelligence”

Again, yes it does. On page 20.

It’s totally fine to take issue with these definitions. But it seems way off to assert that the book “fails to define the terms of its discussion”

Peter Wildeford: Being a NYT book reviewer sounds great – lots of people read your stuff and you get so much prestige, and there apparently is minimal need to understand what the book is about or even read the book at all

Jacob Aron at New Scientist (who seems to have jumped the gun and posted on September 8) says the arguments are superficially appealing but fatally flawed. Yet he never explains why they are flawed, let alone fatally, except to argue over the definition of ‘wanting’ in a way the book answers in detail.

But Wait There’s More

There’s a lot the book doesn’t cover. This includes a lot of ways things can go wrong. Danielle Fong, for example, suggests the idea that the President might let an AI version fine-tuned on himself take over instead, because why not. And sure, that could happen, indeed do many things come to pass, and many of them involve loss of human control over the future. The book is making the point that these details are not necessary to the case being made.

Once again, I think this is an excellent book, especially for those who are skeptical and who know little about related questions.

You can buy it here.

My full review will be available on Substack and elsewhere on Friday.