If Anyone Builds It could have been an explanation for why the MIRI worldview is still relevant nearly two decades later, in a world where we know so much more about AI. Instead, the authors spend all their time shadowboxing against opponents they’ve been bored of for decades, and fail to make their own case in the process.
Hm. I'm torn between thinking this is a sensible criticism and thinking it misses the point.
In my view, the core MIRI complaint about 'gradualist' approaches is that they are concrete solutions to abstract problems. When someone has misdiagnosed the problem, their solutions will almost certainly not work, and the question is just where they've swept the difficulty under the rug. Knowing so much more about AI as an engineering challenge while having made no progress on alignment as an abstract problem--well, the relevance of the MIRI worldview is obvious. "It's hard, and if you think it's easy you're making a mistake."
People attempting to solve AI alignment seem overly optimistic about their chances of solving it, in a way consonant with them not understanding the problem they're trying to solve, and not consonant with them having a solution that they've simply failed to explain to us. The book does talk about examples of this, and though you might not like the examples (see, for example, Buck's complaint that the book responds to the safety sketches of prominent figures like Musk and LeCun instead of the most thoughtful versions of those plans), I think it's not obvious that they're the wrong ones to be talking about. Musk is directing much more funding than Ryan Greenblatt is.
The arguments for why recent changes in AI have alignment implications have, I think, mostly failed. You may recall how excited people were about an advanced AI paradigm that didn't involve RL. Of course, top-of-the-line LLMs are now trained in part using RL, because--obviously they would be? It was always cope to think they wouldn't be? I think the version of this book that was written two years ago, and so spent a chapter on oracle AI because that would have been timely, would have been worse than the book that tried to be timeless and focused on the easy calls.
But the core issue from the point of view of the New York Times or the man on the street is not "well, which LessWrong poster is right about how accurately we can estimate the danger threshold, and how convincing our control schema will be as we approach it?". It's that the man on the street thinks things that are already happening are decades away, and even if he believed what the 'optimists' believe, he would probably want to shut it all down. It's like virologists debating amongst themselves the reasonable question of whether or not to do gain-of-function research, while the rest of society looks in for a moment and says "what? Make diseases deadlier? Are you insane?".
Even if Yudkowsky and Soares don’t want to debate their critics — forgivable in a pop science book — one would think they’d devote some space to explaining why they think an intelligence explosion is likely to occur. Remarkably, they don’t. The concept gets two sentences in the introduction. They don't even explain why it's relevant. It is barely introduced, let alone justified or defended. And it’s certainly not obvious enough to go without saying, because advances in the neural networks which constitute current advanced AI have been continuous. The combination of steady algorithmic progress and increasing computational resources has produced years of predictable advances. Of course, this can’t rule out the possibility of a future intelligence explosion, but the decision not to explain why they think this might happen is utterly baffling, as it’s load-bearing for everything that follows.
I think they 1) expect an intelligence explosion to happen (saying that it can't happen is, after all, predicting an end to the straight-line graphs soon, for no clear reason) and 2) don't think an intelligence explosion is necessary. Twenty years ago, one had to posit substantial amounts of progress to get to superhuman AI systems; today, the amount of progress one has to posit is much smaller.
Their specific story in part II, for example, doesn't actually rest on the idea of an intelligence explosion. On page 135, Sable considers FOOMing and decides that it can't, yet, because it hasn't solved its own alignment problem.
Which makes me think that the claim that the intelligence explosion is load-bearing is itself a bit baffling--the authors clearly think it's possible and likely but not necessary, or they would've included it in their hypothetical extinction scenario.
Note that this is discussed in their supplemental materials; in particular, in line with your last paragraph:
Thresholds don’t matter all that much, in the end, to the argument that if anyone builds artificial superintelligence then everyone dies. Our arguments don’t require that some AI figures out how to recursively self-improve and then becomes superintelligent with unprecedented speed. That could happen, and we think it’s decently likely that it will happen, but it doesn’t matter to the claim that AI is on track to kill us all.
All that our arguments require is that AIs will keep on getting better and better at predicting and steering the world, until they surpass us. It doesn’t matter much whether that happens quickly or slowly.
The relevance of threshold effects is that they increase the importance of humanity reacting to the threat soon. We don’t have the luxury of waiting until the AI is a little better than every human at every mental task, because by that point, there might not be very much time left at all. That would be like looking at early hominids making fire, yawning, and saying, “Wake me up when they’re halfway to the moon.”
It took hominids millions of years to travel halfway to the moon, and two days to complete the rest of the journey. When there might be thresholds involved, you have to pay attention before things get visibly out of hand, because by that point, it may well be too late.
That is, we will have one opportunity to align our superintelligence. That's why we'll fail. It's almost impossible to succeed at a difficult technical challenge when we have no opportunity to learn from our mistakes. But this rests on another implicit claim: Currently existing AIs are so dissimilar to the thing on the other side of FOOM that any work we do now is irrelevant.
I think this is an explicit claim in the book, actually? I think it's at the beginning of chapter 10. (It also appears in the story of Sable, where the AI goes rogue because it does a self-modification that creates such a dissimilarity.)
I think "irrelevant" is probably right but something like "insufficient" is maybe clearer. The book describes people working in interpretability as heroes--in the same paragraph as it points out that being able to see that your AI is thinking naughty thoughts doesn't mean you'll be able to design an AI that doesn't think naughty thoughts.
Eliezer Yudkowsky and Nate Soares have written a new book. Should we take it seriously?
I am not the most qualified person to answer this question. If Anyone Builds It, Everyone Dies was not written for me. It’s addressed to the sane and happy majority who haven’t already waded through millions of words of internecine AI safety debates. I can’t begin to guess if they’ll find it convincing. It’s true that the book is more up-to-date and accessible than the authors’ vast corpus of prior writings, not to mention marginally less condescending. Unfortunately, it is also significantly less coherent. The book is full of examples that don’t quite make sense and premises that aren’t fully explained. But its biggest weakness was described many years ago by a young blogger named Eliezer Yudkowsky: both authors are persistently unable to update their priors.