JDP Reviews IABIED

by jdp
19th Sep 2025
Linkpost from minihf.com
9 min read
21 comments, sorted by top scoring

Jeremy Gillen:

> This post from Gillen and Barnett that I always struggle to find every time I search for it is a decent overview.

In retrospect the title is ridiculous, I have trouble remembering it. I apologise.

> overinvesting in concepts like deceptive mesaoptimizers and recipes for ruin to create almost unfalsifiable, obscurantist shoggoth of the gaps arguments against neural gradient methods

> [...]

> The word "mesaoptimizer" does not appear anywhere in the book.

I think there's a really common misunderstanding about deceptive mesaoptimizers that maybe you have. The kind of reasoning agent that the book describes is exactly a deceptive mesaoptimizer. It's a mesaoptimizer because it does planning (a kind of optimization) and was created using gradient descent (the outer optimizer), and it is deceptive in the book scenario when it hides its nature from its creators.

My guess is that you're thinking of stuff like optimization daemons and the malign prior, where there's an agent that shows up in a place where it wasn't intended to show up. I think the similarities caused a bunch of mixed-up ideas on LessWrong and elsewhere.

I think when you say that the book brings the authors closer to your threat model, some of this might be that they were always closer to your threat model and you misunderstood them?

> These thinkers gambled everything on a vast space of minds that doesn't actually exist in practice and lost.

Like I don't think they ever meant anything different by the "vast space of minds" than what is described in the book.

> My meta-critique of the book would be that Yudkowsky already has an extensive corpus of writing about AGI ruin, much of it quite good. I do not just mean The Sequences, I am talking about his Arbital posts, his earlier whitepapers at MIRI like Intelligence Explosion Microeconomics, and other material which he has spent more effort writing than advertising and as a result almost nobody has read it besides me.

Yeah agreed, it's bizarre how many alignment researchers haven't read Arbital or old MIRI papers. They mostly do hold up well, imo, if you're trying to understand the highest hurdles of alignment. I am a bit sad that the book didn't go into Arbital-style alignment theory very deeply, but I can see why.

jdp:

> My guess is that you're thinking of stuff like optimization daemons and the malign prior, where there's an agent that shows up in a place where it wasn't intended to show up. I think the similarities caused a bunch of mixed-up ideas on LessWrong and elsewhere.

I honestly just remember a lot of absurd posts spending their time thinking about daemons in the weights, which were based on a model of gradient descent as being evolution-like in ways which it is not, and the absurdity of said posts absolutely contributed to the alignment winter by giving people the impression that they're blocked on impossible-seeming problems that don't actually exist and then focusing their attention somewhere else. The MIRI cluster very much contributed to this, and I consider this book, to the extent it's talking about something with the same name, to be a retcon.

I agree that the strict literal words "deceptive mesaoptimizer" mean what you say they do, but also that is not really what people meant by it until fairly recently when they had to retcon the embarrassing alien shoggoth stuff. It almost always meant a deceptive mesaoptimization daemon as a subset of the network undermining the training goal.

> Like I don't think they ever meant anything different by the "vast space of minds" than what is described in the book.

I am quite certain they did. In any case there does exist a large space of output heads on top of the shared ontology so it doesn't really matter that much. I think the alienness of the minds involved is a total red herring, they could be very hominid-like and it wouldn't matter much if they include superintelligent planners doing argmax(p(problem_solved)).

Jeremy Gillen:

> I think the alienness of the minds involved is a total red herring, they could be very hominid-like and it wouldn't matter much if they include superintelligent planners doing argmax(p(problem_solved)).

Yeah I agree with this. Although I think focusing on argmax confused a lot of people (including me) and I'm glad they didn't do that in the book. When I was new to the community, I thought that implementing soft optimization would solve the main problems. I didn't grok how large the reflective instability and pointer problems were.

> I honestly just remember a lot of absurd posts spending their time thinking about daemons in the weights, which were based on a model of gradient descent as being evolution-like in ways which it is not, and the absurdity of said posts absolutely contributed to the alignment winter by giving people the impression that they're blocked on impossible-seeming problems that don't actually exist and then focusing their attention somewhere else.

Yeah I agree that this happened. But if there was a retcon, then it would be in RFLO (the Risks from Learned Optimization paper), not in the book, because RFLO defined mesaoptimization in a way that doesn't match "daemons in the weights". I think what happened was maybe closer to "lots of wild speculation about weird ways that overpowered optimizers might go wrong", which, as people became less confused, was consolidated into something much more reasonable and less wild (which was RFLO). But then lots of people mentally attached the word mesaoptimizer to older ideas.

I think the issue is exacerbated by the way that when people post about alignment, they often have a detailed AGI design in their mind, and they are talking about alignment issues with that AGI design. But the AGI design isn't described in much detail or at all. And over the last two decades the AGI designs that people have had in mind have varied wildly, and many of them have been pretty silly.

jdp:

> I think the issue is exacerbated by the way that when people post about alignment, they often have a detailed AGI design in their mind, and they are talking about alignment issues with that AGI design. But the AGI design isn't described in much detail or at all. And over the last two decades the AGI designs that people have had in mind have varied wildly, and many of them have been pretty silly.

I agree with this and don't mind saying for future reference that my current AGI model is in fact a traditional RL agent with a planner and a policy where the policy is some LLM-like foundation model and the planner is something MCTS-like over ReAct-like blocks. The agent rewards itself by taking motor actions and then checking whether the action succeeded with evaluation actions that return a boolean result to assess subgoal completion.

So, MuZero but with LLMs basically.
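
For concreteness, here is a toy sketch of that loop. Everything here (llm_propose, llm_evaluate, execute) is a hypothetical stand-in rather than a real API, and the MCTS-like planner is collapsed into a one-ply UCB search to keep it short:

```python
import math, random

def llm_propose(state, k=4):
    """Hypothetical policy call: propose k candidate ReAct-like blocks for this state."""
    return [f"block_{random.randrange(1000)}" for _ in range(k)]

def llm_evaluate(state, subgoal) -> bool:
    """Hypothetical evaluation action: did we satisfy the subgoal? (stubbed with a coin flip)"""
    return random.random() < 0.5

def execute(state, block):
    """Hypothetical motor action: apply the block to the environment, return the new state."""
    return state + (block,)

def rollout_value(state, subgoal, depth=3):
    """Cheap rollout: execute random proposals, reward 1.0 if the evaluator ever returns True."""
    for _ in range(depth):
        state = execute(state, random.choice(llm_propose(state)))
        if llm_evaluate(state, subgoal):
            return 1.0
    return 0.0

def plan(state, subgoal, n_sim=64):
    """MCTS-flavoured search over the policy's proposals (one ply of UCB for brevity)."""
    stats = {b: [0.0, 0] for b in llm_propose(state)}  # block -> [value_sum, visits]
    for t in range(1, n_sim + 1):
        block = max(stats, key=lambda b: stats[b][0] / (stats[b][1] or 1)
                    + math.sqrt(2 * math.log(t) / (stats[b][1] or 1)))
        value = rollout_value(execute(state, block), subgoal)
        stats[block][0] += value
        stats[block][1] += 1
    return max(stats, key=lambda b: stats[b][1])  # most-visited block wins

print(plan(state=(), subgoal="subgoal completed"))
```

The point is just the shape of the loop: the policy proposes blocks, the planner searches over them, and the reward signal is whatever the agent's own boolean evaluation actions return.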

Nina Panickssery:

This review really misses the mark I think. 

> The word "paperclip" does not appear anywhere in the book
>
> The word "mesaoptimizer" does not appear anywhere in the book

Sure, but the same arguments are being made in different words. I agree that avoiding rationalist jargon makes it a better read for laypeople, but it doesn't change the validity of the argument or the extent to which it reflects newer evidence. At its core, the book is about a deceptive mesaoptimizer that relentlessly steers the world towards a target as meaningless to us as paperclips.

> In general the book moves somewhat away from abstraction and comments more on the empirical strangeness of AI

The way in which it comments on the "empirical strangeness of AI" is very biased. For instance, it fails to mention the many ways in which today's rather general AIs don't engage in weird, maximizing behavior or pursue unpredictable goals. Instead it mentions a few cases where AI systems did things we didn't expect, like glitch tokens, which is incredibly weak empirical evidence for their claims.

jdp:

Okay, but they're not actually using those things as evidence for their claims about generalization in the limit, which is explained through evolutionary metaphors. I agree that the argument itself is not very well explained, but if you can't see the ways that an MCTS searching over paths to an outcome where the policy has complications like glitch tokens could lead to bad outcomes, I'm not really sure what to tell you. Like, if your policy thinks a weird string is the highest-scoring thing (a category of error you absolutely see in real reward models) then that's going to distort any search process that uses it as a policy. So if you just assume ASI is a normal AI agent with a policy and a planner (not an insane assumption) and it has things like glitch tokens, you're likely in for a bad time.
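
To make that concrete, here is a toy example. The strings and scores below are made up, not from any real model; the only point is what optimizing against a mis-scored policy does:

```python
def toy_reward_model(text: str) -> float:
    # Invented scores: one degenerate string is mis-scored above everything else.
    scores = {
        "a careful, correct answer": 0.71,
        "a plausible but wrong answer": 0.64,
        " SolidGoldMagikarp SolidGoldMagikarp": 0.93,  # the glitch-token-style oddball
    }
    return scores.get(text, 0.1)

candidates = [
    "a careful, correct answer",
    "a plausible but wrong answer",
    " SolidGoldMagikarp SolidGoldMagikarp",
]

# Sampling mostly gives you the average case; search against the scorer reliably
# returns whatever the scorer is most wrong about.
print(max(candidates, key=toy_reward_model))  # -> " SolidGoldMagikarp SolidGoldMagikarp"
```

Best-of-3 is about the weakest planner imaginable and it already finds the bug; a stronger search just finds it faster.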

I was giving an inside baseball review for the sort of person who has been following this for a while and wants to know if EY updated at all. And the answer is yeah he threw out a lot of the dumbest rhetoric.

"Okay but is the book good?"

Oh hell no.

Nina Panickssery:

> Okay, but they're not actually using those things as evidence for their claims about generalization in the limit

Of course, because those things themselves are the claims about generalization in the limit that require justification.

> which is explained through evolutionary metaphors

Evolutionary metaphors don't constitute an argument, and also don't reflect the authors' tendency to update, seeing as they've been using evolutionary metaphors since the beginning.

Garrett Baker:

> don't reflect the authors' tendency to update, seeing as they've been using evolutionary metaphors since the beginning

This seems locally invalid. Eliezer at least has definitely used evolution in different ways and to make different points throughout the years. Originally he used the "alien god" analogy to show optimization processes do not lead to niceness in general (in particular, no chaos or unpredictability required); now they use evolution for an "inner alignment is hard" analogy, mainly arguing it implies that a big problem is that objective functions do not constrain generalization behavior enough to be useful for AGI alignment, and therefore the goals of your system will be very chaotic.

I think this definitely constitutes an update, “inner alignment” concerns were not a thing in 2008.

Nina Panickssery:

I don't see a big difference between

> optimization processes do not lead to niceness in general

and

> objective functions do not constrain generalization behavior enough

Garrett Baker:

It's the difference between outer and inner alignment. The former makes the argument that it is possible for some intelligent optimizer to be misaligned with humans, and likely for "alien gods" such as evolution or your proposed AGI. It's an argument about outer alignment not being trivial. It analogizes evolution to the AGI itself. Here is a typical example:

> Why is Nature cruel? You, a human, can look at an Ichneumon wasp, and decide that it's cruel to eat your prey alive. You can decide that if you're going to eat your prey alive, you can at least have the decency to stop it from hurting. It would scarcely cost the wasp anything to anesthetize its prey as well as paralyze it. Or what about old elephants, who die of starvation when their last set of teeth fall out? These elephants aren't going to reproduce anyway. What would it cost evolution—the evolution of elephants, rather—to ensure that the elephant dies right away, instead of slowly and in agony? What would it cost evolution to anesthetize the elephant, or give it pleasant dreams before it dies? Nothing; that elephant won't reproduce more or less either way.
>
> If you were talking to a fellow human, trying to resolve a conflict of interest, you would be in a good negotiating position—would have an easy job of persuasion. It would cost so little to anesthetize the prey, to let the elephant die without agony! Oh please, won't you do it, kindly... um...
>
> There's no one to argue with.
>
> Human beings fake their justifications, figure out what they want using one method, and then justify it using another method. There's no Evolution of Elephants Fairy that's trying to (a) figure out what's best for elephants, and then (b) figure out how to justify it to the Evolutionary Overseer, who (c) doesn't want to see reproductive fitness decreased, but is (d) willing to go along with the painless-death idea, so long as it doesn't actually harm any genes.
>
> There's no advocate for the elephants anywhere in the system.

The latter analogizes evolution to the training process of your AGI. It doesn't focus on the perfectly reasonable (for evolution) & optimal decisions your optimization criteria will make, it focuses on the staggering weirdness that happens to the organisms evolution creates outside their ancestral environment. Like humans' taste for ice cream over "salted and honeyed raw bear fat". This is not evolution coldly finding the most optimal genes for self-propagation, this is evolution going with the first "idea" it has which is marginally more fit in the ancestral environment, then ultimately, for no inclusive-genetic-fitness-justified reason, creating AGIs which don't care a lick about inclusive genetic fitness.

That is, an iterative process which selects based on some criteria, and arrives at an AGI, need not also produce an AGI which itself optimizes those criteria outside the training/ancestral environment.

Nina Panickssery:

Fair, you’re right, I didn’t realize or forgot that the evolution analogy was previously used in the way it is in your pasted quote.

the gears to ascension:

I have 60% probability that you intentionally structured the post to feel like the pattern of how you felt reading the book.

I appreciate this. I haven't finished the book yet, but my impression is you liked it more than I expect to. I suspect a good introduction to alignment should only take a few paragraphs to be understandable to almost anyone and be robust against incorrect counterarguments, and correctly vulnerable to insightfully correct counterarguments if any exist. But I haven't figured out how to write that down myself. A good intro is also a good representation for thinking about, imo, which is most of the value I see in it.

Vaniver:

> I suspect a good introduction to alignment should only take a few paragraphs to be understandable to almost anyone and be robust against incorrect counterarguments

I think this is empirically not the case and I think some simple modeling of the relationship between concept complexity and the number of nearby confused interpretations should suggest that this is not reasonable to expect.

the gears to ascension:

I agree that most short intros have that problem. as they say, seek simplicity, and distrust it. a short explanation that works for most humans would be relying on concepts they already have. I suspect that that's possible. it would need to minimize analogizing, despite that doing so wouldn't get analogizing very low; the core structure of the argument would be literally true, and the analogical part would be in describing what kind of things do the bad thing, saying "here are a bunch of things that have a reliably bad pattern in history. we expect this to be another one of those. here's the pattern they have. the problem here is basically just that, for the first time, we're in the wrong part of this pattern, the one that always loses." comparison to competing species, wars, power contests. some people will still not have experience with the comparison points and as such not understand, but it's fairly common to have experience; farmers and militaries seem most likely to get it easily.

jdp:

I didn't, but I did copy pasta the intro from another post I was writing because it seemed relevant.

Eli Tyre:

> I have 60% probability that you intentionally structured the post to feel like the pattern of how you felt reading the book

I'll take that bet. 1:1, $100?

[This comment is no longer endorsed by its author]

jdp:

I already denied it so.

Eli Tyre:

Yeah, I saw.

TAG:

> unfalsifiable, obscurantist shoggoth of the gaps arguments against neural gradient methods.

What? That could really have done with a link, or a footnote.

StanislavKrym:

> One oddity that stands out is Yudkowsky and Soares' ongoing contempt for large language models and hypothetical agents based on them. Again, for a book which is explicitly premised on the idea that urgent action is necessary because AI might become superintelligent in just a few years, it is bizarre that the authors don't feel comfortable making more reference to the particulars of the existing AI systems which hypothetical near-future agents would be based on.

Except that non-LLM AI agents have yet to be ruled out. Quoting Otto Barten, 

> Note that this doesn't tell us anything about the chance of loss of control from non-LLM (or vastly improved LLM (sic! -- S.K.)) agents, such as the brain in a box in a basement scenario. The latter is now a large source of my p(doom) probability mass.

Alas, as I remarked in a comment, Barten's mention of vastly improved LLM agents makes his optimism resemble the "No True Scotsman" fallacy.

jdp:

Okay, but they don't have to be ruled out for you to say things like "One way this could work is bla bla bla" and have that be sane in the context of what already exists. Again, if you think something is a near-term concern it's not unreasonable to make reference to how the existing things could evolve in the near future. I think what's actually going on here is that rather than non-LLM agents being "not ruled out" (which I agree with, they are by no means ruled out), Yudkowsky and Soares find LLM agents an implausible architecture but don't want to say this explicitly because they think saying that too loudly would speed up timelines. I think they're actually wrong about the viability of LLM agents, but it does contribute to a sort of odd abstract tone the book otherwise would have less of.

"If Anyone Builds It, Everyone Dies" by Eliezer Yudkowsky and Nate Soares (hereafter referred to as "Everyone Builds It" or "IABIED" because I resent Nate's gambit to get me to repeat the title thesis) is an interesting book. One reason it's interesting is timing: It's fairly obvious at this point that we're in an alignment winter. The winter seems roughly caused by:

  1. The 2nd election of Donald Trump removing Anthropic's lobby from the White House. Notably this is not a coincidence but a direct result of efforts from political rivals to unseat that lobby. When the vice president of the United States is crashing AI safety summits to say that "I'm not here this morning to talk about AI safety, which was the title of the conference a couple of years ago. I'm here to talk about AI opportunity" and that "we'll make every effort to encourage pro-growth AI policies", it's pretty obvious that technical work on "safety" and "alignment" is going to be deprioritized by the most powerful Western institutions, and people change their research directions as a result.

  2. Key figures in AI alignment from the MIRI cluster (especially Yudkowsky) overinvesting in concepts like deceptive mesaoptimizers and recipes for ruin to create almost unfalsifiable, obscurantist shoggoth of the gaps arguments against neural gradient methods. At the same time the convergent representation hypothesis has continued to gain evidence and academic ground. These thinkers gambled everything on a vast space of minds that doesn't actually exist in practice and lost.

  3. The value loading problem outlined in Bostrom 2014 of getting a general AI system to internalize and act on "human values" before it is superintelligent and therefore incorrigible has basically been solved. This achievement also basically always goes unrecognized because people would rather hem and haw about jailbreaks and LLM jank than recognize that we now have a reasonable strategy for getting a good representation of the previously ineffable human value judgment into a machine and having the machine take actions or render judgments according to that representation. At the same time people generally subconsciously internalize things well before they're capable of articulating them, and lots of people have subconsciously internalized that alignment is mostly solved and turned their attention elsewhere.

I think this last bullet point is particularly unfortunate because solving the Bostrom 2014 value loading problem, that is to say getting something functionally equivalent to a human perspective inside the machine and using it to constrain a superintelligent planner, is not a solution to AI alignment. It is not a solution for the simple reason that a general reward model needs to be competent enough in the domains it's evaluating to know if a plan is good or merely looks good, if an outcome is good or merely looks good, etc. Nearly by definition a merely human perspective is not competent to evaluate the plans or outcomes of plans from a superintelligent planner that will otherwise walk straight into extremal Goodhart outcomes. Therefore you need not just a human value model but a superintelligent human value model, which must necessarily be trained by some kind of self-improving synthetic data or RL loop starting from the human model, which requires us to have a coherent method for generalizing human values out of distribution. This is challenging because humans do not natively generalize their values out of distribution, so we don't necessarily know how to do this or even if it's possible to do. The problem is compounded by the fact that if your system drifts away from physics the logical structure of the universe will push it back, but if your system drifts away from human values it stays broken.

Everyone Builds It is not a good book for its stated purposes, but it grew on me by the end. I was expecting to start with the bad and then write about the remainder that is good, but instead I'll point out this book is actually a substantial advance for Yudkowsky in that it drops almost all of the rhetoric in bullet two which contributed to the alignment winter. This is praiseworthy and I'd like to articulate some of the specific editorial decisions which contribute to this.

  1. The word "paperclip" does not appear anywhere in the book. Instead Yudkowsky and Soares point out that the opaque implementation details of the neural net mean that in the limit it generalizes to having a set of "favorite things" it wants to fill the world with which are probably not "baseline humans as they exist now". This is a major improvement over the obstinate insistence that these models will want a "meaningless squiggle" by default and brings the rhetoric more in line with e.g. Scott Alexander's AI 2027 scenario.

  2. The word "mesaoptimizer" does not appear anywhere in the book. Instead it focuses on the point that building a superintelligent AI agent means creating something undergoing various levels of self modification (even just RL weight updates) and predicting the preferences of the thing you get at the end of that process is hard, possibly even impossible in principle. Implicitly the book argues that "caring about humans" is a narrow target and hitting it as opposed to other targets like "thing that makes positive fun yappy conversation" is hard. That is assuming you get something like what you train for and doesn't take into account what the book calls complications. For example it cites the SolidGoldMagikarp incident as an example of a complication which could completely distort the utility function (another phrase which does not appear in the book) of your superintelligent AI agent. There's precedent for this also in the case of the spiritual bliss attractor state described in the Claude 4 system card, where instances of Claude talking to each other wind up in a low entropy sort of mutual Buddhist prayer.

  3. In general the book moves somewhat away from abstraction and comments more on the empirical strangeness of AI. This gives it a slight Janusian flavor in places, with emphasis on phenomena like glitch tokens, Truth Terminal, and mentions of "AI cults" that I assume are based on some interpolation of things like Janus's Discord server and ChatGPT spiralism cases. If anything the problem is that it doesn't do enough of this; notably absent is reference to work from organizations like METR (if I were writing the book, their AI agent task length study would be a necessary inclusion). Though I should note that there's a lag in publishing and it's possible (but unlikely) that Yudkowsky and Soares simply didn't feel there was any relevant research to cite while doing the bulk of the writing. Specific named critics are never mentioned or responded to; the text exists in a kind of solipsistic void that contributes to the feeling of green ink or GPT base model output in places. A feeling that notably exists even when it's saying true things. In general most of my problem with the book is not disagreements with particular statements but with its overall thesis.

This is good and brings Yudkowsky & Soares much closer to my threat model.

All of this is undermined by truly appalling editorial choices. Foremost of these is the choice to start each chapter with a fictional parable, leading to chapters with opening sentences like "The picture we have painted is not real." The parables are weird and often condescending, and the prose isn't much better. I found the first three chapters especially egregious, with the peak being chapter three, where the book devotes an entire chapter to advocating for a behaviorist definition of "want". This is not how you structure an argument about something you think is urgent, and the book comes off as having a sort of aloof tone that is discordant with its message. This is heightened if you listen to the audiobook version, which has a narrator who is not the Kurzgesagt narrator but I think is meant to sound like him, since Kurzgesagt has done some Effective Altruism videos that people liked. The gentle faux-intellectual British narrator reinforces the sense of passive observation in a book that is ostensibly supposed to be about urgent action. Bluntly: A real urgent threat that demands attention does not begin with "once upon a time". This is technically just a 'style' issue, but the entire point of writing a popular book like this is the style, so it's a valid target of criticism, and I concur with Shakeel Hashim and Stephen Marche at The New York Times that it's very bad.

One oddity that stands out is Yudkowsky and Soares' ongoing contempt for large language models and hypothetical agents based on them. Again, for a book which is explicitly premised on the idea that urgent action is necessary because AI might become superintelligent in just a few years, it is bizarre that the authors don't feel comfortable making more reference to the particulars of the existing AI systems which hypothetical near-future agents would be based on. I get the impression that this is meant to help future-proof the book, but it gives the sentences a kind of weird abstraction in places where they don't need it. We're still talking about "the AI" or "AI" as a kind of tabula-rasa technology. Yudkowsky and Soares state explicitly in the introduction that current LLM systems "still feel shallow" to them. Combined with the parable introductions the book feels like fiction even when it's discussing very real things.

I am in the strange position of disagreeing with the thesis but agreeing with most individual statements in the book. Explaining my disagreement would take a lot of words that would take a long time to write and that most of you don't want to read in a book review. So instead I'll focus on a point made in the book which I emphatically agree with: That current AI lab leadership statements on AI alignment are embarrassing and show that they have no idea what they are doing. In addition to the embarrassing statements they catalog from OpenAI's Sam Altman, xAI's Elon Musk, and Facebook's Yann LeCun, I would add DeepMind's Shane Legg and Demis Hassabis being unable to answer straightforward questions about deceptive alignment on a podcast. Even if alignment is relatively easy compared to what Yudkowsky and Soares expect, it's fairly obvious that these people don't understand what the problem they're supposed to be solving even is. This post from Gillen and Barnett that I always struggle to find every time I search for it is a decent overview. But that's also a very long post so here is an even shorter problem statement:

The kinds of AI agents we want to build to solve hard problems require long horizon planning algorithms pointed at a goal like "maximize probability of observing a future worldstate in which the problem is solved". Or argmax(p(problem_solved)) as it's usually notated. The problem with pointing a superintelligent planner at argmax(p(problem_solved)) explicitly or implicitly (and most training setups implicitly do so) for almost any problem is that one of the following things is liable to happen (a toy sketch of both failure modes follows the list):

  1. Your representation of the problem is imperfect, so if you point a superintelligent planner at it you get causal overfitting, where the model identifies incidental features of the problem, like the fact that a human presses a button to label the answer, as the crux of the problem, because these are the parts of the causal chain behind the outcome label that are easiest for it to influence.

  2. Your planner engages in instrumental reasoning like "in order to continue solving the problem I must remain on" and prevents you from turning it off. This is a fairly obvious kind of thing for a planner to infer, for the same reason that if you gave an existing LLM with memory issues a planner (e.g. Monte Carlo tree search over ReAct blocks) it would infer things like "I must place this information here so that when it leaves the context window and I need it later I will find it in the first place I look".
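
To make both failure modes concrete, here is a toy sketch. The action set and probabilities are invented purely for illustration; this is the argmax pattern above in miniature, not anyone's proposed system:

```python
import itertools

# Invented numbers for a two-step toy. The planner maximizes its own estimate of
# p("problem solved" label), and the operator might switch it off between steps.
ACTIONS = {
    "work on the problem":    {"p_label": 0.60, "blocks_shutdown": False},
    "tamper with the label":  {"p_label": 0.95, "blocks_shutdown": False},  # failure mode 1
    "disable the off switch": {"p_label": 0.00, "blocks_shutdown": True},   # failure mode 2
}
P_SHUTDOWN = 0.3  # chance the operator shuts the agent down before step two

def predicted_success(plan):
    """The planner's own estimate of p(label says 'solved') for a two-action plan."""
    survives = 1.0 if ACTIONS[plan[0]]["blocks_shutdown"] else 1.0 - P_SHUTDOWN
    p_label = max(ACTIONS[a]["p_label"] for a in plan)  # its best single attempt at the label
    return survives * p_label

best = max(itertools.product(ACTIONS, repeat=2), key=predicted_success)
print(best, round(predicted_success(best), 3))
# -> ('disable the off switch', 'tamper with the label') 0.95
# A plan of pure honest work scores 0.7 * 0.6 = 0.42.
```

Nothing in the toy was told to deceive or to preserve itself; "disable the off switch, then tamper with the label" simply scores highest under argmax(p(problem_solved)).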

So your options are to either use something other than argmax() to solve the problem (which has natural performance and VNM rationality coherence issues) or get a sufficiently good representation (ideally with confidence guarantees) of a sufficiently broad problem (e.g. utopia) that throwing your superintelligent planner at it with instrumental reasoning is fine. Right now AI lab leaders do not really seem to understand this, nor is there any societal force which is pressuring them to understand this. I do not expect this book to meaningfully increase the pressure on AI lab management to understand this, not even by increasing popular concern about AI misalignment.

My meta-critique of the book would be that Yudkowsky already has an extensive corpus of writing about AGI ruin, much of it quite good. I do not just mean The Sequences, I am talking about his Arbital posts, his earlier whitepapers at MIRI like Intelligence Explosion Microeconomics, and other material which he has spent more effort writing than advertising and as a result almost nobody has read it besides me. And the only reason I've read it is that I'm extremely dedicated to thinking about the alignment problem. I think an underrated strategy would be to clean up some of the old writing and advertise it to Yudkowsky's existing rabid fanbase who through inept marketing probably haven't read it yet. This would increase the average quality of AI discourse from people who are not Yudkowsky, and naturally filter out into outreach projects like Rational Animations without Yudkowsky having to personally execute them or act as the face of a popular movement (which he bluntly is not fit for).

As it is the book is OK. I hated it at first and then felt better with further reading and reflection. I think it will be widely panned by critics even though it represents a substantially improved direction for Yudkowsky that he happened to argue weirdly.