Where ‘it’ is superintelligence, an AI smarter and more capable than humans.
And where ‘everyone dies’ means that everyone dies.
No, seriously. They’re not kidding. They mean this very literally.
To be precise, they mean that ‘If anyone builds [superintelligence] [under anything like present conditions using anything close to current techniques] then everyone dies.’
My position on this is to add a ‘probably’ before ‘dies.’ Otherwise, I agree.
This book gives us the best longform explanation of why everyone would die, with the ‘final form’ of Yudkowsky-style explanations of these concepts for new audiences.
This review is me condensing that down much further, transposing the style a bit, and adding some of my own perspective.
Scott Alexander also offers his review at Astral Codex Ten, which I found very good. I will be stealing several of his lines in the future, and arguing with others.
This book is not about the impact of current AI systems, which will already be a lot. Or the impact of these systems getting more capable without being superintelligent. That will still cause lots of problems, and offer even more opportunity.
I talk a lot about how best to muddle through all that. Ultimately, if it doesn’t lead to superintelligence (as in the real thing that is smarter than we are, not the hype thing Meta wants to use to sell ads on its new smart glasses), we can probably muddle through all that.
My primary concern is the same as the book’s concern: Superintelligence.
Our concern is for what comes after: machine intelligence that is genuinely smart, smarter than any living human, smarter than humanity collectively. We are concerned about AI that surpasses the human ability to think, and to generalize from experience, and to solve scientific puzzles and invent new technologies, and to plan and strategize and plot, and to reflect on and improve itself.
We might call AI like that “artificial superintelligence” (ASI), once it exceeds every human at almost every mental task.
AI isn’t there yet. But AIs are smarter today than they were in 2023, and much smarter than they were in 2019. (4)
If any company or group, anywhere on the planet, builds an artificial superintelligence using anything remotely like current techniques, based on anything remotely like the present understanding of AI, then everyone, everywhere on Earth, will die. (7)
The authors have had this concern for a long time.
MIRI was the first organized group to say: “Superintelligent AI will predictably be developed at some point, and that seems like an extremely huge deal. It might be technically difficult to shape superintelligences so that they help humanity, rather than harming us.
Shouldn’t someone start work on that challenge right away, instead of waiting for everything to turn into a massive emergency later?” (5)
Yes. Yes they should. Quite a lot of people should.
I am not as confident as Yudkowsky and Soares that if anyone builds superintelligence under anything like current conditions, then everyone dies. I do however believe that the statement is probably true. If anyone builds it, everyone (probably) dies.
Thus, under anything like current conditions, it seems highly unwise to build it.
The core ideas in the book will be new to the vast majority of potential readers, including many of the potential readers that matter most. Most people don’t understand the basic reasons why we should presume that if anyone builds [superintelligence] then everyone [probably] dies.
If you are one of my regular readers, you are an exception. You already know many of the core reasons and arguments, whether or not you agree with them. You likely have heard many of their chosen intuition pumps and historical parallels.
What will be new to almost everyone is the way it is all presented, including that it is a message of hope, that we can choose to go down a different path.
The book lays out the case directly, in simple language, with well chosen examples and facts to serve as intuition pumps. This is a large leap in the quality and clarity and normality with which the arguments, examples, historical parallels and intuition pumps are chosen and laid out.
I am not in the target audience so it is difficult for me to judge, but I found this book likely to be highly informative, persuasive and helpful at creating understanding.
A lot of the book is providing these examples and explanations of How Any Of This Works, starting with Intelligence Lets You Do All The Things.
There is a good reason Benjamin Hoffman called this book Torment Nexus II. The authors admit that their previous efforts to prevent the outcome where everyone dies have often, from their perspective, not gone great.
This was absolutely a case of ‘we are proud to announce our company dedicated to building superintelligence, from the MIRI warning that if anyone builds superintelligence then everyone dies.’
Because hey, if that is so super dangerous, that must mean it is exciting and cool and important and valuable, Just Think Of The Potential, and also I need to build it before someone else builds a Torment Nexus first. Otherwise they might monopolize use of the Torment Nexus, or use it to do bad things, and I won’t make any money. Or worse, we might Lose To China.
Given this involved things like funding DeepMind and inspiring OpenAI? I would go so far as to say ‘backfired spectacularly.’
MIRI also had some downstream effects that we now regard with ambivalence or regret. At a conference we organized, we introduced Demis Hassabis and Shane Legg, the founders of what would become Google DeepMind, to their first major funder. And Sam Altman, CEO of OpenAI, once claimed that Yudkowsky had “got many of us interested in AGI” and “was critical in the decision to start OpenAI.”
Years before any of the current AI companies existed, MIRI’s warnings were known as the ones you needed to dismiss if you wanted to work on building genuinely smart AI, despite the risks of extinction. (6)
Trying to predict when things will happen, or who exactly will do them in what order or with what details, is very difficult.
Some aspects of the future are predictable, with the right knowledge and effort; others are impossibly hard calls. Competent futurism is built around knowing the difference.
History teaches that one kind of relatively easy call about the future involves realizing that something looks theoretically possible according to the laws of physics, and predicting that eventually someone will go do it.
… Conversely, predicting exactly when a technology gets developed has historically proven to be a much harder problem. (8)
Whereas some basic consequences of potential actions follow rather logically and are much easier to predict.
We don’t know when the world ends, if people and countries change nothing about the way they’re handling artificial intelligence. We don’t know how the headlines about AI will read in two or ten years’ time, nor even whether we have ten years left.
Our claim is not that we are so clever that we can predict things that are hard to predict. Rather, it seems to us that one particular aspect of the future— “What happens to everyone and everything we care about, if superintelligence gets built anytime soon?”— can, with enough background knowledge and careful reasoning, be an easy call. (9)
The details of exactly how the things happen are similarly difficult. The overall arc, that the atoms all get used for something else and that you don’t stick around, is easier, and as a default outcome is highly overdetermined.
Humans have a lot of intelligence, so they get to do many of the things. This intelligence is limited, and we have other restrictions on us, so there remain some things we still cannot do, but we do and cause remarkably many things.
They break down intelligence into predicting the world, and steering the world towards a chosen outcome.
I notice steering towards a chosen outcome is not a good model of most of what many supposedly intelligent people (and AIs) do, or most of what they do that causes outcomes to change. There is more predicting, and less steering, than you might think.
Sarah Constantin explained this back in 2019 while discussing GPT-2: Humans who are not concentrating are not general intelligences, they are much closer to next token predictors a la LLMs.
Sarah Constantin: Robin Hanson’s post Better Babblers is very relevant here. He claims, and I don’t think he’s exaggerating, that a lot of human speech is simply generated by “low order correlations”, that is, generating sentences or paragraphs that are statistically likely to come after previous sentences or paragraphs.
…
If “human intelligence” is about reasoning ability, the capacity to detect whether arguments make sense, then you simply do not need human intelligence to create a linguistic style or aesthetic that can fool our pattern-recognition apparatus if we don’t concentrate on parsing content.
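To make the ‘low order correlations’ point concrete, here is a minimal sketch, entirely mine rather than the book’s or Constantin’s, of a babbler that strings words together using nothing but bigram statistics about which word tends to follow which. The toy corpus is invented for illustration; there is no model of meaning anywhere in it, and the output can still sound superficially fluent.

```python
import random
from collections import defaultdict

# Toy corpus standing in for "previous sentences and paragraphs" (invented for illustration).
corpus = (
    "the model predicts the next word and the next word follows the last word "
    "so the text sounds fluent and the text sounds plausible and the reader nods along"
).split()

# Count bigram transitions: which words tend to follow which.
transitions = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    transitions[prev].append(nxt)

def babble(start: str, length: int = 15) -> str:
    """Generate text by repeatedly sampling a statistically likely next word."""
    word, out = start, [start]
    for _ in range(length):
        followers = transitions.get(word)
        if not followers:
            break
        word = random.choice(followers)
        out.append(word)
    return " ".join(out)

print(babble("the"))
```

Real LLMs learn vastly higher-order statistics than this, but the point stands: fluent-sounding continuation does not require reasoning.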
Using your intelligence to first predict then steer the world is the optimal way for a sufficiently advanced intelligence without resource constraints to achieve a chosen outcome. A sufficiently advanced intelligence would always do this.
When I look around at the intelligences around me, I notice that outside of narrow domains like games most of the time they are, for this purpose, insufficiently advanced and have resource constraints. Rather than mostly deliberately steering towards chosen outcomes, they mostly predict. They follow heuristics and habits, doing versions of next token prediction, and let things play out around them.
This is the correct solution for a mind with limited compute, parameters and data, such as that of a human. You mostly steer better by setting up processes that tend to steer how you prefer and then you go on automatic and allow that to play out. Skilling up in a domain is largely improving the autopilot mechanisms.
Occasionally you’ll change some settings on that, if you want to change where it is going to steer. As one gets more advanced within a type of context, and one’s prediction skills improve, the automatic processes get more advanced, and often the steering of them both in general and within a given situation gets more active.
The book doesn’t use that word, but a key thing this makes clear is that a mind’s intelligence, the ability to predict and steer, has nothing to do with where that mind is attempting to steer. You can be arbitrarily good or bad at steering and predicting, and still try to steer toward any ultimate or incremental destination.
By contrast, to measure whether someone steered successfully, we have to bring in some idea of where they tried to go.
A person’s car winding up at the supermarket is great news if they were trying to buy groceries. It’s a failure if they were trying to get to a hospital’s emergency room.
…
Or to put it another way, intelligent minds can steer toward different final destinations, through no defect of their intelligence.
In what ways are humans still more intelligent than AIs?
Generality, in both the predicting and the steering.
Humans are still the champions at something deeper— but that special something now takes more work to describe than it once did.
It seems to us that humans still have the edge in something we might call “generality.” Meaning what, exactly? We’d say: An intelligence is more general when it can predict and steer across a broader array of domains. Humans aren’t necessarily the best at everything; maybe an octopus’s brain is better at controlling eight arms. But in some broader sense, it seems obvious that humans are more general thinkers than octopuses. We have wider domains in which we can predict and steer successfully.
Some AIs are smarter than us in narrow domains.
…
it still feels— at least to these two authors— like o1 is less intelligent than even the humans who don’t make big scientific breakthroughs. It is increasingly hard to pin down exactly what it’s missing, but we nevertheless have the sense that, although o1 knows and remembers more than any single human, it is still in some important sense “shallow” compared to a human twelve-year-old.
That won’t stay true forever.
The ‘won’t stay true forever’ is (or should be) a major crux for many. There is a mental ability that a typical 12-year-old human has that AIs currently do not have. Quite a lot of people are assuming that AIs will never have that thing.
That assumption, that the AIs will never have that thing, is being heavily relied upon by many people. I am confident those people are mistaken, and AIs will eventually have that thing.
If this stops being true, what do you get? Superintelligence.
We will describe it using the term “superintelligence,” meaning a mind much more capable than any human at almost every sort of steering and prediction problem— at least, those problems where there is room to substantially improve over human performance.
The laws of physics as we know them permit machines to exceed brains at prediction and steering, in theory.
In practice, AI isn’t there yet— but how long will it take before AIs have all the advantages we list above?
We don’t know. Pathways are harder to predict than endpoints. But AIs won’t stay dumb forever.
The book then introduces the intelligence explosion.
And the path to disaster may be shorter, swifter, than the path to humans building superintelligence directly. It may instead go through AI that is smart enough to contribute substantially to building even smarter AI.
In such a scenario, there is a possibility and indeed an expectation of a positive feedback cycle called an “intelligence explosion”: an AI makes a smarter AI that figures out how to make an even smarter AI, and so on. That sort of positive-feedback cascade would eventually hit physical limits and peter out, but that doesn’t mean it would peter out quickly. A supernova does not become infinitely hot, but it does become hot enough to vaporize any planets nearby.
Humanity’s own more modest intelligence cascade from agriculture to writing to science ran so fast that humans were walking on the Moon before any other species mastered fire. We don’t know where the threshold lies for the dumbest AI that can build an AI that builds an AI that builds a superintelligence.
Maybe it needs to be smarter than a human, or maybe a lot of dumber ones running for a long time would suffice.
In late 2024 and early 2025, AI company executives said they were planning to build “superintelligence in the true sense of the word” and that they expected to soon achieve AIs that are akin to a country full of geniuses in a datacenter. Mind you, one needs to take anything corporate executives say with a grain of salt. But still, they aren’t treating this like a risk to steer clear of; they’re charging toward it on purpose. The attempts are already underway.
…
So far, humanity has had no competitors for our special power. But what if machine minds get better than us at the thing that, up until now, made us unique?
Perhaps we should call this the second intelligence explosion, with humans having been the first one. That first cascade was relatively modest, and it faced various bottlenecks that slowed it down a lot, but compared to everything else that has ever happened? It was still lightning quick and highly transformative. The second one will, if it happens, be lightning quick compared to the first one, even if it turns out to be slower than we might expect.
You take a bunch of randomly initialized parameters arranged in arrays of numbers (weights), and a giant bunch of general data, and a smaller bunch of particular data. You do a bunch of gradient descent on that general data, and then you do a bunch of gradient descent on the particular data, and you hope for a good alien mind.
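As a minimal sketch of that recipe, mine and not the book’s, here is two-phase gradient descent on randomly initialized weights: a lot of it on the ‘general’ data, then more on a smaller ‘particular’ dataset. The model and data are toy stand-ins; real LLM training differs enormously in scale, architecture and objective, but the shape of the procedure is the same.

```python
import numpy as np

rng = np.random.default_rng(0)

# Randomly initialized parameters ("weights") arranged in an array of numbers.
weights = rng.normal(size=3)

def gradient_step(weights, X, y, lr=0.01):
    """One step of gradient descent on mean squared error for a linear model."""
    preds = X @ weights
    grad = 2 * X.T @ (preds - y) / len(y)
    return weights - lr * grad

# A "giant bunch" of general data and a smaller bunch of particular data
# (both synthetic stand-ins for pretraining and fine-tuning corpora).
X_general = rng.normal(size=(1000, 3))
y_general = X_general @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=1000)

X_particular = rng.normal(size=(50, 3))
y_particular = X_particular @ np.array([1.2, -1.8, 0.9]) + rng.normal(scale=0.1, size=50)

# Phase 1: a bunch of gradient descent on the general data ("pretraining").
for _ in range(2000):
    weights = gradient_step(weights, X_general, y_general)

# Phase 2: a bunch more gradient descent on the particular data ("fine-tuning").
for _ in range(500):
    weights = gradient_step(weights, X_particular, y_particular)

print(weights)  # Whatever the process converged to; you hope it is what you wanted.
```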
Modern LLMs are, in some sense, truly alien minds— perhaps more alien in some ways than any biological, evolved creatures we’d find if we explored the cosmos.
Their underlying alienness can be hard to see through an AI model’s inscrutable numbers— but sometimes a clear example turns up.
…
Training an AI to outwardly predict human language need not result in the AI’s internal thinking being humanlike.
One way to predict what a human will say in a given circumstance is to be that human in that circumstance, or to imagine being them, and see what you say or would say. If you are not very close to being that human, the best way to predict is usually very different.
All of this is not to say that no “mere machine” can ever in principle think how a human thinks, or feel how a human feels.
…
But the particular machine that is a human brain, and the particular machine that is an LLM, are not the same machine. Not because they’re made out of different materials— different materials can do the same work— but in the sense that a sailboat and an airplane are different machines.
We only know how to grow an LLM, not how to craft one, and not how to understand what it is doing. We can make general predictions about what the resulting model will do based on our past experiences and extrapolate based on straight lines on graphs, and we can do a bunch of behaviorism on any given LLM or on LLMs in general. We still have little ability to steer in detail what outputs we get, or to understand in detail why we get those particular outputs.
The authors equate the understanding problem to predicting humans from their DNA. You can tell some basic things reasonably reliably from the DNA or weights, starting with ‘this is a human with blue eyes’ or ‘this is a 405 billion parameter LLM.’ In theory, with enough understanding, we could tell you everything. We do not have that understanding. We are making nonzero progress, but not all that much.
The book doesn’t go into it here, but people try to fool themselves and others about this. Sometimes they falsely testify before Congress saying ‘the black box nature of AIs has been solved,’ or they otherwise present discoveries in interpretability as vastly more powerful and general than they are. People wave hands and think that they understand what happens under the hood, at a level they very much do not understand.
What do we want? That which we behave as if we want.
When do we want it? Whenever we would behave that way.
Or, as the book says, what you call ‘wanting’ is between you and your dictionary, but it will be easier for everyone if we say that Stockfish ‘wants’ to win a chess game. We should want to use the word that way.
With that out of the way we can now say useful things.
A mind can start wanting things as a result of being trained for success. Humans themselves are an example of this principle. Natural selection favored ancestors who were able to perform tasks like hunting down prey, or to solve problems like the problem of sheltering against the elements.
Natural selection didn’t care how our ancestors performed those tasks or solved those problems; it didn’t say, “Never mind how many kids the organism had; did it really want them?” It selected for reproductive fitness and got creatures full of preferences as a side effect.
That’s because wanting is an effective strategy for doing. (47)
…
The behavior that looks like tenacity, to “strongly want,” to “go hard,” is not best conceptualized as a property of a mind, but rather as a property of moves that win.
The core idea here is that if you teach a mind general skills, those skills have to come with a kind of proto-want, a desire to use those skills to steer in a want-like way. Otherwise, the skill won’t be useful and won’t get learned.
If you train a model to succeed at a type of task, it will also train the model to ‘want to’ succeed at that type of task. Since everything trains everything, this will also cause it to ‘want to’ more generally, and especially to ‘want to’ complete all types of tasks.
This then leads to thinking that ‘goes hard’ to achieve its assigned task, such as o1, upon finding that its server had accidentally not been booted up, finding a way to boot it such that it would hand o1 the flag (in its capture-the-flag task) directly.
The authors have been workshopping various evolutionary arguments for a while, as intuition pumps and examples of how training on [X] by default does not get you a mind that optimizes directly for [X]. It gets you a bundle of optimization drives [ABCDE] that, in the training environment, combine to generate [X]. But this is going to be noisy at best, and if circumstances differ from those in training, and the link between [A] and [X] breaks, the mind will keep wanting [A], the way humans love ice cream and use birth control rather than going around all day strategizing about maximizing genetic fitness.
Training an AI means solving for the [ABCDE] that in training optimize the exact actual [X] you put forward, which in turn was an attempt to approximate the [Y] you really wanted. This process, like evolution, is chaotic, and can be unconstrained and path dependent.
We should expect some highly unexpected strangeness in what [ABCDE] end up being. Yet even if we exclude all unexpected strangeness and only follow default normal paths, the ‘zero complications’ paths? Maximizing efficiently for a specified [X] will almost always end badly if the system is sufficiently capable. If you introduce even a minor complication, a slight error, it gets even worse than that, and we should expect quite a few complications.
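A toy illustration of that gap between the [X] you specify and the [Y] you wanted, mine rather than the authors’: optimize hard on a measurable proxy with a loophole in it, and watch the true objective come apart from the proxy score. Everything here is invented for illustration.

```python
import numpy as np

# The goal we really wanted, [Y]: every knob set to 1 (an illustrative stand-in).
def true_objective(k):
    return -np.sum((k - 1.0) ** 2)

# The goal we actually train on, [X]: it measures the first three knobs correctly,
# but the fourth term is a loophole the proxy rewards without limit.
def proxy_objective(k):
    return -np.sum((k[:3] - 1.0) ** 2) + 5.0 * k[3]

def proxy_gradient(k):
    grad = np.zeros_like(k)
    grad[:3] = -2.0 * (k[:3] - 1.0)  # push the measured knobs toward 1
    grad[3] = 5.0                    # the loophole always pays
    return grad

knobs = np.zeros(4)
for step in range(500):
    knobs += 0.01 * proxy_gradient(knobs)  # optimize the proxy, hard
    if step % 100 == 0:
        print(f"step {step:3d}  proxy={proxy_objective(knobs):9.2f}  true={true_objective(knobs):9.2f}")

# The proxy score climbs without bound while the true objective gets steadily worse,
# as the optimizer pours everything into the loophole.
```

The optimizer is not being malicious or stupid; it is doing exactly what it was scored on.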
The preferences that wind up in a mature AI are complicated, practically impossible to predict, and vanishingly unlikely to be aligned with our own, no matter how it was trained. (74)
The problem of making AIs want— and ultimately do— the exact, complicated things that humans want is a major facet of what’s known as the “AI alignment problem.”
…
Most everyone who’s building AIs, however, seems to be operating as if the alignment problem doesn’t exist— as if the preferences the AI winds up with will be exactly what they train into it.
That doesn’t mean there is no possible way to get more robustly at [X] or [Y]. It does mean that we don’t know a way that involves only using gradient descent or other known techniques.
Alas, AIs that want random bizarre things don’t make good stories or ‘feel real’ to us, the same way that fiction has to make a lot more sense than reality. So instead we tell stories about evil corporations and CEOs and presidents and so on. Which are also problems, but not the central problem.
By default? Not what we want. And not us, or us sticking around.
Why not? Because we are not the optimal way to fulfill what bizarre alien goals it ends up with. We might be a good way. We almost certainly won’t be the optimal way.
In particular:
Also, humans running around are annoying; they might do things like set off nukes or build another superintelligence, and keeping humans around means refraining from overheating the Earth while generating more energy. And so on.
Their position, and I agree with this, is that the AI or AIs that do this to us might end up having value, but that this too would require careful crafting to happen. It probably won’t happen by default, and also would not be much comfort either way.
All of the things. But what are all of the things?
Even if any particular crazy sounding thing might be very hard, there are going to be a lot of crazy sounding things that turn out to be not that hard. Those get solved.
They predict that AIs will invent technologies and techniques we are not considering. That seems right, but also you can keep watering down what superintelligence can do, rule out all the stuff like that, and it doesn’t matter. It ‘wins’ anyway, in the sense that it gets what it wants.
Part 2 is One Extinction Scenario, very much in the MIRI style. The danger is always that you offer one such scenario, someone decides one particular part of it sounds silly or doesn’t work right, and then uses this to dismiss all potential dangers period.
One way they attempt to guard against this, here, is that at many points they say ‘the AI tries various tactics, some of which are [ABCDE], and one of them works, it doesn’t matter which one.’ They also at many points intentionally make the AI’s life maximally hard rather than easy, presuming that various things don’t work despite the likelihood they would indeed work. At each step, it is emphasized how the AI will try many different things that create possibilities, without worrying much about exactly which ones succeed.
The most important ‘hard step’ in the scenario is that the various instances of the collectively superintelligent AI, which is called Sable, are working together towards the goal of gathering more resources to ultimately satisfy some other goal. To make the story easier to tell, they placed this in the very near future, but as the coda points out the timing is not important.
The second ‘hard step’ is that the one superintelligent AI in this scenario opens up a substantial lead on other AI systems, via figuring out how to act in a unified way. If there were other similarly capable minds up against it, the scenario looks different.
The third potential ‘hard step’ here is that no one figures out, in a way that causes a coordinated reaction, that there is an escaped AI running around gathering resources and capabilities. Then the AI makes its big play, and you can object there as well about how the humans fail to figure it out, despite the fact that this superintelligence is choosing its particular path, and how it responds to events, based on its knowledge and model of how people would react, and so on.
And of course, the extent to which we already have a pattern of giant alarm bells going off, people yelling about it, and everyone collectively shrugging.
My presumption in a scenario like this is that plenty of people would suspect something was going horribly wrong, or even what that thing was, and this would not change the final outcome very much even if Sable wasn’t actively ensuring that this didn’t change the outcome very much.
Later they point to the example of leaded gasoline, where we had many clear warning signs that adding lead to gasoline was not a good idea, but no definitive proof, so we kept adding lead to gasoline for quite a long time, at great cost.
As the book points out, this wouldn’t be our first rodeo pretending This Is Fine, history is full of refusals to believe that horrible things could have happened, citing Chernobyl and the Titanic as examples. Fiction writers also have similar expectations, for example see Mission Impossible: Dead Reckoning for a remarkably reasonable prediction on this.
Note that in this scenario, the actual intelligence explosion, the part where AI R&D escalates rather quickly, very much happens After The End, well past the point of no return where humans ceased to be meaningfully in charge. Then of course what is left of Earth quickly goes dark.
One can certainly argue with this style of scenario at any or all of the hard steps. The best objection is to superintelligence arising in the first place.
One can also notice that this scenario, similarly to AI 2027, involves what AI 2027 called neuralese: the AI starts reasoning in a synthetic language that is very much not English or any human language, and we let this happen because it is more efficient. This could be load bearing, and there has been a prominent call across labs and organizations to preserve legible reasoning. So far we have been fortunate that reasoning in human language has won out, but it seems highly unlikely that this would remain the most efficient solution forever. Do we look like a civilization ready to coordinate to keep using English (or Chinese, or other human languages) anyway?
One also should notice that this style of scenario is far from the only way it all goes horribly wrong. This scenario is a kind of ‘engineered’ gradual disempowerment, but the humans will likely default to doing similar things all on their own, on purpose. Competition between superintelligences only amps up many forms of pressure, none of the likely equilibria involved are good news for us. And so on.
I caution against too much emphasis on whether the AI ‘tries to kill us’ because it was never about ‘trying to kill us.’ That’s a side effect. Intent is largely irrelevant.
In his review of IABIED (search for “IV.”), Scott Alexander worries that this scenario sounds like necessarily dramatic science fiction, and depends too much on the parallel scaling technique. I think there is room for both approaches, and that IABIED makes a lot of effort to mitigate this and make clear most of the details are not load bearing. I’d also note that we’re already seeing signs of the parallel scaling technique, such as Google DeepMind’s Deep Think, showing up after the story was written.
And the AIs will probably get handed the reins of everything straight away with almost no safeguards and no crisis because lol, but the whole point of the story is to make the AI’s life harder continuously at every point to illustrate how overdetermined the outcome is. And yes I think a lot of people who don’t know much about AI will indeed presume we would not ‘be so stupid as to’ simply hand the reins of the world over to the AI the way we appointed an AI minister in Albania, or would use this objection as an excuse if it wasn’t answered.
That leaves the remaining roughly third of the book for solutions.
This is hard. One reason this is so hard is the solution has to work on the first try.
Once you build the first superintelligence, if you failed, you don’t get to go back and fix it, the same way that once you launch a space probe, it either works or it doesn’t.
You can experiment before that, but those experiments are not a good guide to whether your solution works.
Except here it’s also the Game of Thrones, as in You Win Or You Die, and also you’re dealing with a grown superintelligence rather than mechanical software. So, rather much harder than the things that fail quite often.
Humanity only gets one shot at the real test. If someone has a clever scheme for getting two shots, we only get one shot at their clever scheme working. (161)
When problems do not have this feature, I am mostly relaxed. Sure, deepfakes or job losses or what not might get ugly, but we can respond afterwards and fix it. Not here.
They also draw parallels and lessons from Chernobyl and computer security. You are in trouble if you have fast processes, narrow margins, feedback loops, complications. The key insight from computer security is that the attacker will with time and resources find the exact one scenario out of billions that causes the attack to work, and your system has to survive this even in edge cases outside of normal and expected situations.
The basic conclusion is that this problem has tons of features that make it likely we will fail, and the price of failure on the first try is extinction, and thus the core thesis:
When it comes to AI, the challenge humanity is facing is not surmountable with anything like humanity’s current level of knowledge and skill. It isn’t close.
Attempting to solve a problem like that, with the lives of everyone on Earth at stake, would be an insane and stupid gamble that NOBODY SHOULD BE ALLOWED TO TRY.
Well, sure, when you put it like that.
Note that ‘no one should be allowed to try to make a superintelligence’ does not mean that any particular intervention would improve our situation, nor is it an endorsement of any particular course of action.
What are the arguments that we should allow someone to try?
Most of them are terrible. We’ve got such classics as forms of:
They will later namecheck some values for [X], such as ‘we’ll design them to be submissive,’ ‘we’ll make them care about truth’ and ‘we’ll just have AI solve the ASI alignment problem for us.’
Is comparing those to alchemists planning to turn lead into gold fair? Kinda, yeah.
Then we have the category that does not actually dispute that no one should be allowed to try, but that frames ‘no one gets to try’ as off the table:
Are there situations in which going forward is a profoundly stupid idea, but where you’re out of ways to make the world not go forward at all and going first is the least bad option left? Yes, that is certainly possible.
It is certainly true that a unilateral pause at this time would not help matters.
The first best solution is still that we all coordinate to ensure no one tries to build superintelligence until we are in a much better position to do so.
Okay, but what are the actively good counterarguments?
A good counterargument would involve making the case that our chances of success are much better than all of this would imply, that these are not the appropriate characteristics of the problem, or that we have methods available that we can expect to work, that indeed we would be very large favorites to succeed.
If I learned that someone convinced future me that moving forward to superintelligence was an actively good idea, I would presume it was because someone figured out a new approach to the problem, one that removed many of its fatal characteristics, and we learned that it would probably work. Who knows. It might happen. I do have ideas.
The next section delves into the current state of alignment plans, which range from absurd and nonsensical (such as Elon Musk’s ‘truth-seeking AI’ which would kill us all even if we knew how to execute the plan, which we don’t) to extremely terrible (such as OpenAI’s ‘superalignment’ plan, which doesn’t actually solve the hard problems because to be good enough to solve this problem the AI has to already be dangerous). Having AIs work on interpretability is helpful but not a strategy.
The book goes on at greater length on why none of this will work, as I have often gone on at greater length from my own perspective. There is nothing new here, as there are also no new proposals to critique.
Instead we have a very standard disaster template. You can always get more warnings before a disaster, but we really have had quite a lot of rather obvious warning signs.
Yet so many people seem unable to grasp the basic principle that building quite a lot of very different-from-human minds quite a lot smarter and more capable and more competitive than humans is rather obviously a highly unsafe move. You really shouldn’t need a better argument than ‘if you disagree with that sentence, maybe you should read it again, because clearly you misunderstood or didn’t think it through?’
Most of the world is simply unaware of the situation. They don’t ‘feel the AGI’ and definitely don’t take superintelligence seriously. They don’t understand what is potentially being built, or how dangerous those building it believe it would be.
It might also help if more people understood how fast this field is moving. In 2015, the biggest skeptics of the dangers of AI assured everyone that these risks wouldn’t happen for hundreds of years.
In 2020, analysts said that humanity probably had a few decades to prepare.
In 2025 the CEOs of AI companies predict they can create superhumanly good AI researchers in one to nine years, while the skeptics assure that it’ll probably take at least five to ten years.
Ten years is not a lot of time to prepare for the dawn of machine superintelligence, even if we’re lucky enough to have that long.
…
Nobody knows what year or month some company will build a superhuman AI researcher that can create a new, more powerful generation of artificial intelligences. Nobody knows the exact point at which an AI realizes that it has an incentive to fake a test and pretend to be less capable than it is. Nobody knows what the point of no return is, nor when it will come to pass.
And up until that unknown point, AI is very valuable.
I would add that no one knows when we will be so dependent on AI that we will no longer have the option to turn back, even if it is not yet superintelligent and still doing what we ask it to do.
Even the governments of America and China have not as of late been taking this seriously, treating the ‘AI race’ as being about who is manufacturing the GPUs.
Okay, wise guy, you ask the book, what is it gonna take to make the world not end?
They bite the bullets.
(To be maximally clear: I am not biting these bullets, as I am not as sold that there is no other way. If and when I do, you will know. The bullet I will absolutely bite is that we should be working, now, to build the ability to coordinate a treaty and enforcement mechanism in the future, should it be needed, and to build transparency and state capacity to learn more about when and if it is needed and in what form.)
It is good and right to bite bullets, if you believe the bullets must be bitten.
They are very clear they see only one way out: Development of frontier AI must stop.
Which means a global ban.
Nothing easy or cheap. We are very, very sorry to have to say that.
It is not a problem of one AI company being reckless and needing to be shut down.
It is not a matter of straightforward regulations about engineering, that regulators can verify have been followed and that would make an AI be safe.
It is not a matter of one company or one country being the most virtuous one, and everyone being fine so long as the best faction can just race ahead fast enough, ahead of all the others.
A machine superintelligence will not just do whatever its makers wanted it to do.
It is not a matter of your own country outlawing superintelligence inside its own borders, and your country then being safe while chaos rages beyond. Superintelligence is not a regional problem because it does not have regional effects. If anyone anywhere builds superintelligence, everyone everywhere dies.
So the world needs to change. It doesn’t need to change all that much for most people. It won’t make much of a difference in most people’s daily lives if some mad scientists are put out of a job.
But life does need to change that little bit, in many places and countries. All over the Earth, it must become illegal for AI companies to charge ahead in developing artificial intelligence as they’ve been doing.
Small changes can solve the problem; the hard part will be enforcing them everywhere.
How would we do that, you ask?
So the first step, we think, is to say: All the computing power that could train or run more powerful new AIs, gets consolidated in places where it can be monitored by observers from multiple treaty-signatory powers, to ensure those GPUs aren’t used to train or run more powerful new AIs.
Their proposed threshold is not high.
Nobody knows how to calculate the fatal number. So the safest bet would be to set the threshold low— say, at the level of eight of the most advanced GPUs from 2024— and say that it is illegal to have nine GPUs that powerful in your garage, unmonitored by the international authority.
Could humanity survive dancing closer to the cliff-edge than that? Maybe. Should humanity try to dance as close to the cliff-edge as it possibly can? No.
I can already hear those calling this insane. I thought it too. What am I going to do, destroy the world with nine GPUs? Seems low. But now we’d be talking price.
They also want to ban people from publishing the wrong kinds of research.
So it should not be legal— humanity probably cannot survive, if it goes on being legal— for people to continue publishing research into more efficient and powerful AI techniques.
…
It brings us no joy to say this. But we don’t know how else humanity could survive.
Take that literally. They don’t know how else humanity can survive. That doesn’t mean that they think that if we don’t do it by year [X], say 2029, that we will definitely already be dead at that point, or even already in an unsurvivable situation. It means that they see a real and increasing risk, over time, of anyone building it, and thus everyone dying, the longer we fail to shut down the attempts to do so. What we don’t know is how long those attempts would take to succeed, or even if they will succeed at all.
How do they see us enforcing this ban?
Yes, the same way anything else is ultimately enforced. At the barrel of a gun, if necessary, which yes involves being ready to blow up a datacenter if it comes to that.
Imagine that the U.S. and the U.K., and China and Russia, all start to take this matter seriously. But suppose hypothetically that a different nuclear power thinks it’s all childish nonsense and advanced AI will make everyone rich. The country in question starts to build a datacenter that they intend to use to further push AI capabilities. Then what?
It seems to us that in this scenario, the other powers must communicate that the datacenter scares them. They must ask that the datacenter not be built. They must make it clear that if the datacenter is built, they will need to destroy it, by cyberattacks or sabotage or conventional airstrikes.
They must make it clear that this is not a threat to force compliance; rather, they are acting out of terror for their own lives and the lives of their children.
The Allies must make it clear that even if this power threatens to respond with nuclear weapons, they will have to use cyberattacks and sabotage and conventional strikes to destroy the datacenter anyway, because datacenters can kill more people than nuclear weapons.
They should not try to force this peaceful power into a lower place in the world order; they should extend an offer to join the treaty on equal terms, that the power submit their GPUs to monitoring with exactly the same rights and responsibilities as any other signatory. Existing policy on nuclear weapon proliferation showed what can be done.
Cue, presumably, all the ‘nuke the datacenter’ quips once again, or people trying to equate this with various forms of extralegal action. No. This is a proposal for an international treaty, enforced the way any other treaty would be enforced. Either allow the necessary monitoring, or the datacenter gets shut down, whatever that takes.
Thus, the proposal is simple. As broad a coalition as possible monitors all the data centers and GPUs, watching to ensure no one trains more capable AI systems.
Is it technically feasible to do this? The book doesn’t go into this question. I believe the answer is yes. If everyone involved wanted to do this, we could do it, for whatever hardware we were choosing to monitor. That would still leave consumer GPUs and potential decentralized attempts and so on, I don’t know what you would do about that in the long term but if we are talking about this level of attention and effort I am betting we could find an answer.
To answer a question the book doesn’t ask, would this then mean a ‘dystopian’ or ‘authoritarian’ world or a ‘global government’? No. I’m not saying it would be pretty (and again, I’m not calling for it or biting these bullets myself), but this regime seems less effectively restrictive of practical freedoms than, for example, the current regime in the United Kingdom under the Online Safety Act. They literally want you to show ID before you can access the settings on your home computer’s Nvidia GPU. Or Wikipedia.
You gotta give ‘em hope.
And hope there is indeed.
Humanity has done some very expensive, painful, hard things. We’ve dodged close calls. The book cites big examples: We won World War II. We’ve avoided nuclear war.
There are many other examples one could cite as well.
How do we get there from here?
So— how do we un-write our fate?
We’ve covered what must be done for humanity to survive. Now let’s consider what can be done, and by whom.
If you are in government: We’d guess that what happens in the leadup to an international treaty is countries or national leaders signaling openness to that treaty. Major powers should send the message: “We’d rather not die of machine superintelligence. We’d prefer there be an international treaty and coalition around not building it.”
The goal is not to have your country unilaterally cease AI research and fall behind.
…
We have already mentioned that Rishi Sunak acknowledged the existence of risks from artificial superintelligence in October 2023, while he was the prime minister of the United Kingdom.
Also in October 2023, Chinese General Secretary Xi Jinping gave (what seems to us like) weak signals in that direction, in a short document on international governance that included a call to “ensure that AI always remains under human control.”
The Chinese show many signs of being remarkably open to coordination. As well they should be, given that right now we are the ones out in front. Is there a long, long way left to go? Absolutely. Would there be, shall we say, trust issues? Oh my yes. But if you ask who seems to be the biggest obstacle to a future deal, all signs suggest we have met the enemy and he is us.
If you are an elected official or political leader: Bring this issue to your colleagues’ attention. Do everything you can to lay the groundwork for treaties that shut down any and all AI research and development that could result in superintelligence.
…
Please consider— especially by the time you read this—whether the rest of the world is really opposed to you on this. A 2023 poll conducted by YouGov found that 69 percent of surveyed U.S. voters say AI should be regulated as a dangerous and powerful technology. A 2025 poll found that 60 percent of surveyed U.K. voters support laws against creating artificial superintelligence, and 63 percent support the prohibition of AIs that can make smarter AIs.
And if instead you are a politician who is not fully persuaded: Please at least make it possible for humanity to slam on the brakes later, even if you’re not persuaded to slam on them now.
…
If you are a journalist who takes these issues seriously: The world needs journalism that treats this subject with the gravity it deserves, journalism that investigates beyond the surface and the easy headlines about Tech CEOs drumming up hype, journalism that helps society grasp what’s coming. There’s a wealth of stories here that deserve sustained coverage, and deeper investigation than we’ve seen conducted so far.
…
If humanity is to survive this challenge, people need to know what they’re facing. It is the job of journalists as much as it is scientists’.
And as for the rest of us: We don’t ask you to forgo using all AI tools. As they get better, you might have to use AI tools or else fall behind other people who do. That trap is real, not imaginary.
If you live in a democracy, you can write your elected representatives and tell them you’re concerned. You can find some resources to help with that at the link below.
And you can vote.
You can go on protest marches.
You can talk about it.
And once you have done all you can do? Live life well.
If everyone did their part, votes and protests and speaking up would be enough. If everyone woke up one morning believing only a quarter of what we believe, and everyone knew everyone else believed it, they’d walk out into the street and shut down the datacenters, soldiers and police officers walking right alongside moms and dads. If they believed a sixteenth of what we believed, there would be international treaties within the month, to establish monitors and controls on advanced computer chips.
Can Earth survive if only some people do their part? Perhaps; perhaps not.
We have heard many people say that it’s not possible to stop AI in its tracks, that humanity will never get its act together. Maybe so. But a surprising number of elected officials have told us that they can see the danger themselves, but cannot say so for fear of the repercussions. Wouldn’t it be silly if really almost none of the decision-makers wanted to die of this, but they all thought they were alone in thinking so?
Where there’s life, there’s hope.
From time to time, people have asked us if we’ve felt vindicated to see our past predictions coming true or to see more attention getting paid to us and this issue.
And so, at the end, we say this prayer:
May we be wrong, and shamed for how incredibly wrong we were, and fade into irrelevance and be forgotten except as an example of how not to think, and may humanity live happily ever after.
But we will not put our last faith and hope in doing nothing.
So our true last prayer is this:
Rise to the occasion, humanity, and win.
I cannot emphasize enough, I really really cannot emphasize enough, how much all of us worried about this want to be completely, spectacularly wrong, and for everything to be great, and for us to be mocked eternally as we live forever in our apartments. That would be so, so much better than being right and dying. It would even be much better than being right and everyone working together to ensure we survive anyway.
Am I convinced that the only way for us to not die is an international treaty banning the development of frontier AI? No. That is not my position. However, I do think that it is good and right for those who do believe this to say so. And I believe that we should be alerting the public and our governments to the dangers, and urgently laying the groundwork for various forms of international treaties and cooperation both diplomatically and technologically, and also through the state capacity and transparency necessary to know if and when and how to act.
I am not the target audience for this book, but based on what I know, this is the best treatment of the problem I have seen that targets a non-expert audience. I encourage everyone to read it, and to share it, and also to think for themselves about it.
In the meantime, yes, work on the problem, but also don’t forget to live well.