Hello there! This is my first post on LessWrong, so I ask for your indulgence for any overall silliness or breaking of norms I may inadvertently have fallen into. All feedback will be warmly received and (ideally) internalized.
A couple of months ago, dvd published a semi-outsider review of IABIED which I found rather interesting, and which gave me the idea of sharing my own. I also took notes on every chapter, which I keep on my blog.
My priors
I am a 40-ish-year-old Spaniard from the rural northwest corner of the country, so I've never had any sort of face-to-face contact with the Rationalist community (with the partial exception of attending some online CFAR training sessions of late). There are many reasons why I feel drawn to the community, but in essence they distill to the following two:
On the other hand, there are lots of things I find unpalatable. Top of the list would likely be polyamory. In second place, what from the outside looks like a highly speculative, nerd-sniping obsession with AI apocalyptic scenarios.
But these are people whom I consider, overall, both very intelligent and very honest, which means I feel I really need to give their arguments a fair trial (at least with respect to superintelligence). This is easier said than done. It is an understatement on the scale of the supermassive black hole at the center of our galaxy to say that Eliezer Yudkowsky is a prolific writer. His reflections on AI are mostly dispersed amongst the ~1.2 to 1.4 million words of his Sequences. There are also lots of posts, summaries, debates and reflections by many other people, mostly on LessWrong, often technical and assuming familiarity with Yudkowsky’s concepts.
There are some popular books that offer a light introduction to these topics, which I’ve gone through[1], but I was missing a simple and clear argument, aimed at a quasi-normie like me, for the Yudkowskian case for both the possibility and the dangers of superintelligent AI. I think I mostly got it from this book, so let’s get to the review.
Thinking about The End of the World™
The title and (UK) subtitle of If Anyone Builds It, Everyone Dies: The Case Against Superintelligent AI (from now on, IABIED for short) are a partial summary of the book’s core thesis. Spelled out in only slightly more detail, and in the authors’ words:
If any company or group, anywhere on the planet, builds an artificial superintelligence using anything remotely like current techniques, based on anything remotely like the present understanding of AI, then everyone, everywhere on Earth, will die.
Let’s start with the basics. First, what is a superintelligent AI (from now on, ASI for short)? It would be any machine intelligence that "exceeds every human at almost every mental task". A more formal version appears in Chapter 1, where superintelligence is defined as “a mind much more capable than any human at almost every sort of steering and prediction problem”[2], that is, at the broad family of abilities involved in understanding the world, planning, strategizing, and making accurate models of reality. The authors also emphasize that this does not mean humanlike cognition or consciousness; what matters is overwhelming cognitive advantage in any domain where improvement over humans is possible, combined with mechanical advantages such as operating at vastly higher speeds, copying itself, and recursively improving its own capabilities. Such intelligences do not exist right now, but the authors’ claim is that LLM training and research are likely to make them a reality in the very near future.
Why would such superintelligences be dangerous to us? A good heuristic is to think of how human intelligence impacts all other species on the planet: although we generally aren’t intentionally murderous towards them, we have human goals and generally pursue them with disregard for whatever goals other creatures might have. The same would be true for an ASI: in the process of being trained with modern gradient-descent methods, it will acquire inscrutable and alien goals and a penchant for unchecked optimization in attaining them. Given its speed and superior capabilities, it will end up treating humans as an obstacle and eliminating us as a side effect of pursuing its goals[3].
Before building such dangerous Frankenstein’s monsters, one would hope to somehow be able to code into them a respect and appreciation for humanity, our survival and our values, and/or a willingness to submit to them. This is what gets called the alignment problem, and unfortunately, according to the authors, it is likely hard, perhaps impossible, and definitely beyond our current capabilities. The difficulty of the problem is compounded by a cursed cluster of unique and lethal properties that arise from trying to align ASIs under current conditions:
The authors also express a deep mistrust of the entire field of machine learning, AI safety and policy, seeing it as structurally incapable of managing the risks: researchers are rewarded for progress, not caution, and are stuck in a naive, overoptimistic ‘alchemical’ stage of science from which big errors will naturally arise; techniques like “Superalignment” (using AIs to align AIs) fall into a regress (who aligns the aligner, given that the authors consider it unlikely that anything short of an ASI could align an ASI?); and academia and industry have no real theory of intelligence or reliable way to encode values.
Given all of the above, the authors think there is an extremely high likelihood of the drive to ASI leading to human extinction. Part II of the book depicts, by way of illustration, a plausible fictional scenario of how this could come to pass: an AI develops superintelligence, becomes capable of strategic deception and, over a few years, after gaining compute through fraud and theft and building biolabs, deploys a slow, global pathogen which only it can (partially) cure. Human institutions collapse and hand over more and more compute to the ASI in the hope of treating the resulting cancers, while it uses the new resources to build a replacement workforce of robots. In the end, the superintelligence self-improves and devours the Earth.
What do the authors propose to prevent this apocalyptic scenario from taking place? Their proposals are simple but sweeping: a global shutdown of the AI development and research that could lead to ASI, through international bans on training frontier models, seizure and regulation of GPUs, and international surveillance and enforcement, possibly including military deterrence. The last chapter ends with tailored exhortations for politicians (compute regulation and treaties), journalists (elevate and investigate risks), and citizens (advocacy without despair).
How well does it argue its case?
This is a book that pulls no punches: its rhetorical impact comes in no small part from its clarity and simplicity, and from the relentless way the authors build their case, each chapter narrowing the possibilities until only catastrophe seems to remain. The authors have clearly strived to write a book that is accessible to a lay audience, hammering home each theme and main idea through introductory parables at the beginning of each chapter that give intuition and a concrete visualization of what’s about to be explained[4].
A big issue for me here, though, is the question of reliability: the authors are more than capable of building a plausible narrative about these topics, but is it a true one? Although the book tries to establish its authors’ credentials from the beginning as researchers in AI alignment, Yudkowsky and Soares are not machine learning researchers, do not work on frontier LLMs, and do not participate in the empirical, experimental side of the field, where today’s systems are actually trained, debugged, and evaluated. Rather, their expertise, to the degree that it is recognized, comes from longstanding conceptual and philosophical work on intelligence, decision theory, and alignment hypotheses, instead of from direct participation in the engineering of contemporary models. While this doesn’t invalidate their arguments, it does mean that many of the book’s strongest claims are made from what seems like an armchair vantage point rather than from engagement with how present-day systems behave, fail, or are controlled in practice. And a lot of the people who do work in the field seem to consider the authors’ views valuable and somewhat reasonable, but overly pessimistic.
Another weakness lies in how the book treats expert disagreement[5]. At times the authors appeal to prominent figures as evidence that the danger is widely acknowledged. At other times, the book paints the entire ML and AI safety ecosystem as naive, reckless, or intellectually unserious. This oscillation (either “the experts agree with us” or “the experts are deluded alchemists”) functions rhetorically, but weakens the epistemic credibility of the argument. On a topic where expert divergence is already wide, this selective invocation of authority can feel like special pleading.
The last chapters depart from argument and move instead to prescriptive policies. While the authors acknowledge their lack of expertise here, the proposals they make (though perfectly consistent with, and proportionate to, the beliefs explained in the previous pages of the book) do not seem to seriously engage with feasibility, international incentives, geopolitical asymmetries, enforcement mechanisms, or historical analogues. I think they are well aware of how extremely unlikely a sweeping global moratorium enforced by surveillance and possibly military action really is, which is likely why they put the probability of human extinction from ASI at over 90 percent. One gets the feeling that the authors are throwing up their hands and saying something like: “Look, we are doomed, and there’s no realistic way we’re getting out of this short of doing stuff we are not going to do. These proposals are the necessary consequence of accepting what is stated in the preceding chapters, so that’s that[6]”.
It would be nice if I could dissect the book and tell you how accurate its arguments are, or what might be missing, questionable, inconsistent, or overclaimed, but I am, as I stated at the beginning, a lay reader, so you’ll have to look for all that somewhere else, I fear[7].
What's my update, after reading the book?
I’ll start by saying that I take the contents of the book seriously, and that I have no reason to doubt the sincerity and earnestness of the authors. I am quite sure they genuinely believe in what they say here. Obviously, that doesn’t mean they are right.
The book has done a very good job of clarifying the core Yudkowskian arguments for me and dispelling several common misunderstandings. After reading it, I feel inclined to update upward on how seriously we should take ASI risks, and I can see how the argument hangs together logically, given its premises. But the degree of credence I give to those premises remains limited, and I am fundamentally skeptical of the authors’ certainty and framework. The main issues I’d highlight as needing clarification (which I’ve already hinted at in the notes to the chapters I posted here before) would be:
I close the book not persuaded, but grateful for the opportunity to engage with an argument presented with such focus and conviction. Books that force a reader to refine their own views, whether through agreement or resistance, serve a purpose, and this one has done that for me, all the more so because it addresses something as consequential as the possibility of human extinction. And I will definitely recommend this book to others.
Like The Rationalist's Guide to the Galaxy by Tom Chivers, or some chapters of Toby Ord’s The Precipice. I intend to read Nick Bostrom’s Superintelligence in 2026, and perhaps the Sequences too.
“Good at steering and predicting” acts as the de facto definition of intelligence used here, which allows the authors to extricate themselves from messy debates about consciousness, volition, sentience, etc.
All this sounds suspiciously like it would require some psychological “drive for power”, but the authors go out of their way to point out that it would just follow from general properties of optimization and intelligence, as defined in the book.
Some reviews have been very critical of these parables, but I think such criticisms miss the point, or rather, the intended audience. The authors regularly insist that there are other places where one can encounter objections and more technical versions of the contents of the book (in fact, each chapter contains QR code links to such sources, and besides, there’s the previous mountain of text to be found in Yudkowsky’s Sequences and in LessWrong blog posts).
As a side note, Rationalists usually have a very antagonistic view of experts and expertise, a need to build everything from first principles, and a deeply embedded contrarian culture. This feels like an ad hominem argument, but I don’t think I can completely ignore it either.
Perhaps I am too much of a cynic here. After all, there are examples of humans collectively rising to meet tough and dangerous challenges, like nuclear and bacteriological/chemical warfare, genetic engineering and the ozone layer, to name a few. Once risks are clearly seen by the public, it can be done.
And yes, as opposed to Rats, I have few qualms about deferring to better authorities. I remember finding Scott Alexander’s, Nina Panickssery’s and Clara Collier’s reviews reasonable and informative. I also found the two podcasts Carl Shulman did with Dwarkesh Patel very enlightening.