So, The Possessed Machines. There's been some discussion already. It is a valuable piece -- it has certainly provoked some thought in me! -- but it has some major flaws. It (sneakily!) dismisses specific arguments about AI existential risk, and broad swaths of the discourse altogether, without actually arguing against them. Also, the author is untrustworthy at the moment; readers should be skeptical of purported first-person information in the piece.
[Image: from a different "book review" of Demons. It's an excellent piece. I highly recommend it.]
Before getting into it, I want to praise the title. "Possessed" has four relevant meanings: demonic; ideologically possessed; frenzied/manic/mad; belonging to someone. "Machines" has three possible referents: AI; people; an efficient group of powerful people/institutions. That's twelve combinations. I see the following eight (!) as being applicable.
1. Demonic machines; machines that are intelligent and evil.
2. Machines that belong to us; AI is something humanity currently possesses.
3a. Frenzied, manically productive people (AI-folk).
3b. Demonic, machine-like people.
3c. Ideologically possessed people. (They are machines for their ideology).
4a. The accelerationist AI industry.[1]
4b. The out-of-control technocapitalist machine.[2]
4c. The cabal of AI tech elites and their political allies.
Okay, let's get into it.
Dismissal of pivotal acts
An important idea in Possessed Machines is the Shigalyovian system/argument.[3] This system/argument is defined as follows:
No one can quite refute the argument. And this is Dostoevsky's point: the argument cannot be refuted on its own terms because its premises, once accepted, do indeed lead to its conclusions. The error is in the premises, but the premises are hidden behind such a mass of reasoning that they are difficult to locate.
I want to be very direct about the contemporary relevance of this passage. The AI safety community has developed its own versions of Shigalyovism—systems of thought that begin with freedom and end with despotism, proposals that would sacrifice almost everything to preserve what they define as valuable.
In theory, then, one should be able to dismantle a Shigalyovian system by identifying its hidden premises and arguing against them. Now, part of what makes an argument Shigalyovian is that its premises are difficult to locate, so this might be difficult. Indeed, part of the rhetorical and memetic success of these systems comes from this difficulty. Nevertheless, these premises exist and can be discovered.
The author then gives an example of this in today's AI world:
The concept of a "pivotal act" is perhaps the clearest example. A pivotal act, in AI safety discourse, is an action taken by a powerful AI system that permanently prevents certain catastrophic outcomes. The canonical example is using an aligned AI to prevent all other AI development—establishing a kind of permanent monopoly on artificial intelligence.
This is Shigalyovism in digital form. It begins with the desire to protect humanity and ends with a proposal for a single point of failure controlling all future technological development. The reasoning is internally consistent: if unaligned AI would destroy humanity, and if many independent AI projects increase the probability of unaligned AI, then preventing independent AI development reduces existential risk. QED.
But the conclusion is monstrous. A world in which a single entity controls all AI development is a world without meaningful freedom, without the possibility of exit, without any check on the power of whoever controls that entity. It is Shigalyov's one-tenth ruling over his nine-tenths, with the moral framework of "preventing extinction" replacing the moral framework of "achieving paradise."
The implied conclusion here is that we shouldn't use an aligned AI to prevent all other AI development. But the author doesn't actually argue for this. In this Shigalyovian framework, what they need to do to rebut the pivotal act argument is find the hidden premises that are objectionable and argue against those. But the author doesn't do this.
To put this another way: the argument here is of the form:
1. X->Y is a system that begins with freedom and ends with despotism.
2. Thus X->Y is Shigalyovian.
3. [Implied] Thus X->Y is valid but unsound.
4. [Implied] Thus Y is wrong.
Where X->Y is the "pivotal act" system: <<The desire to protect humanity -> using an aligned AI to prevent all other AI development.>>
There's something wrong with this argument. The problem here is that the author hasn't actually shown that X->Y is unsound. Just because a valid argument starts from freedom and ends with despotism doesn't mean it's wrong! To figure out if the conclusion is wrong, you have to look at the assumptions -- the "hidden premises".
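To spell out the logic (this gloss is mine, not the essay's): a valid argument guarantees only that its premises entail its conclusion,

$$(P_1 \land P_2 \land \cdots \land P_n) \rightarrow C,$$

while a sound argument is a valid argument whose premises are all actually true. If you grant validity but want to reject $C$, you are therefore obligated to exhibit at least one false $P_i$. That identification step is exactly what the essay skips.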
What are the hidden premises in the pivotal act argument? And what is the error in them? I don't know! But if you want to argue against pivotal acts... you need to engage with these questions substantively. Merely pointing out that the system starts from freedom and ends with "despotism" and that its conclusion is "monstrous" to you... is not enough. It's not a real argument.
Dismissal of calm, rational discourse
A central claim of Possessed Machines is that the heart of the problem is the moral deficit of certain powerful people in the industry. In the chapter titled "What Is to Be Done?", the author writes:
The core problem is that the people making the key decisions are, many of them, damaged in ways that disqualify them from making these decisions wisely.
They continue:
This damage is not primarily intellectual. The people I am thinking of are intelligent, often extraordinarily so. It is something more like moral—a failure of the channels that connect knowledge to action, that make abstract truths feel binding, that generate appropriate emotional responses to contemplated harms.
There are two characters in Demons that have this moral deficit: Pyotr Verkhovensky and Stavrogin.
Verkhovensky is charming, clever, and absolutely without moral content. He believes in nothing except his own power and the excitement of watching things burn...
Stavrogin is brilliant, beautiful, charismatic, and utterly empty... He is capable of intellectual engagement at the highest level but experiences it as performance rather than connection.
Possessed Machines makes a specific point: that some of the most powerful people in AI are Verkhovenskys and Stavrogins. I have no qualms with this.
Then there's a related, broader point that the essay makes, which is something like: most of the calm, "rational" discussion on AI existential risk comes from a deprived place. In one case, this deprivation is an emotional/moral "numbness":
Some of the people who speak most calmly about human extinction are not calm because they have achieved wisdom but because they have achieved numbness. They have looked at the abyss so long that they no longer see it. Their equanimity is not strength; it is the absence of appropriate emotional response.
In another case, this deprivation is "the aestheticization of darkness" or "performance":
Stavrogin's confession fails because he has turned it into a performance. He wants the shock value without the repentance. He wants to be seen as someone who has done terrible things and faces them without flinching—but this desire is itself a form of flinching, a way of converting a moral reality into an aesthetic pose.
I see this dynamic throughout the rationalist-adjacent world. The willingness to discuss existential risk, to contemplate human extinction, to reason about torture and genocide and civilizational collapse—all of this is valuable insofar as it helps us think more clearly about these topics. But it becomes dangerous when the willingness to discuss becomes the primary thing, when people compete to be the most willing to face the darkest topics, when the pose of unflinching analysis substitutes for genuine moral engagement.
The implication in these passages is that this deprivation makes the calm, "rational" AI existential risk discourse fundamentally unsound. The discourse (so the author claims) comes from numbness, not wisdom. The discourse is performance, not genuine moral engagement.
But wait. Does that mean the arguments are wrong? Again, we have a case in which the author does not actually engage with the arguments. Like in their discussion of the "pivotal act", the author provides meta-reasons for dismissing the arguments of the other side but does not actually engage with these arguments. The reader is left with the feeling that a valid rebuttal has been made, but it hasn't.
What is it, exactly, that the author wants? Reading those passages again, I gather they want "appropriate emotional response" and "genuine moral engagement". As opposed to... equanimity? Unflinching analysis?
Call me crazy, but I think that equanimity and unflinching analysis are good. Now, perhaps the author isn't criticizing the presence of these things but rather the lack of the other things. Why not both?
Okay. What is the correct emotional response to all this? That's actually a good question; seriously, ask yourself that. And what about genuine moral engagement? "When the pose of unflinching analysis substitutes for genuine moral engagement" sounds nice, but what does it mean? What is the alternative to unflinching analysis?
Can we trust the author?
No. If the author is indeed who they say they are, they should provide verification. Why do I think this?
1. I think the author is being dishonest about how this piece was written.
There is a lot of AI in the writing of Possessed Machines. The bottom of the webpage states "To conceal stylistic identifiers of the authors, the above text is a sentence-for-sentence rewrite of an original hand-written composition processed via Claude Opus 4.5." As I wrote in a comment:
Ah, this [statement] was not there when I read the piece (Jan 23). You can see an archived version here in which it doesn't say that.
I don't actually believe that this is how the document was made. A few reasons. First, I don't think this is what a sentence-for-sentence rewrite looks like; I don't think you get that much of the AI style that this piece has with that^. Second, the stories in the interlude are superrrrr AI-y, not just in sentence-by-sentence style but in other ways. Third, the chapter and part titles seem very AI generated...
The piece has 31 uses of “genuine”/“genuinely” in ~17000 words. One “genuine” every 550 words.
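For what it's worth, this kind of count is easy to reproduce. A minimal sketch, assuming you've saved the essay's text locally (the filename is hypothetical):

```python
import re

# Count whole-word uses of "genuine"/"genuinely" in a locally saved copy
# of the essay. "possessed_machines.txt" is a hypothetical filename.
with open("possessed_machines.txt", encoding="utf-8") as f:
    text = f.read()

hits = re.findall(r"\bgenuine(?:ly)?\b", text, flags=re.IGNORECASE)
words = re.findall(r"\b\w+\b", text)
print(f"{len(hits)} uses in {len(words)} words; "
      f"one every ~{len(words) // max(len(hits), 1)} words")
```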
2. Fishiness
From kaiwilliams:
There's some stuff that feels a little bit weird here. The author says they left in early 2024 and then spent the "following months" reading Dostoevsky and writing this essay. Was the essay a bit older and only got put up? (It has to have been edited relatively recently, if it was run through 4.5.) Who are the editors alluded to at the very end? Is it supposed to be Tim Hwang? A little bit more transparency would be much appreciated (the disclaimer about Opus 4.5 being used for anonymization was only added on the 24th, after some people had pointed out that it sounded rather AI-written).
Another weirdness: why did Hwang put up another microsite about Demons that's written by an anonymous author "still working in industry" that has clear LLM-writing patterns at basically the same time? https://shigalyovism.com/. Though this one is much less in-depth.
At the bottom of the webpage in an "About the Author" box, we are told "Correspondence may be directed to the editors." This is weird, because we don't know who the editors are. Probably this was something that Claude added and the human author didn't check.
Richard_Kennaway points out:
There are some anomalies in the chapter numbering:
Part IV ends with Chapter 18; Part V begins with Chapter 21... [etc.]
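Such gaps are easy to surface mechanically if someone transcribes the table of contents. A minimal sketch; the part-to-chapter mapping below is illustrative (built from the quoted anomaly), not the essay's actual table of contents:

```python
# Find gaps in chapter numbering across parts. The mapping is illustrative,
# not a full transcription of the essay's table of contents.
chapters_by_part = {"IV": [16, 17, 18], "V": [21, 22]}

seen = sorted(n for nums in chapters_by_part.values() for n in nums)
missing = sorted(set(range(seen[0], seen[-1] + 1)) - set(seen))
print("missing chapters:", missing)  # -> missing chapters: [19, 20]
```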
3. This piece could have been written by someone who wasn't an AI insider
If you're immersed in 2025/2026 ~rationalist AI discourse, you have the information needed to write Possessed Machines. That is, there's no "inside information" in the piece. There is a lot of "I saw people at the lab do this [thing that I, a non-insider, already thought that people at the lab did]". Leogao has made this same point: "it seems plausible that the piece was written by someone who only has access to public writings."
[1] From the essay: "Not by ideology, not by any single vision, but by the spirit of acceleration itself—the drive toward "more" and "faster" that has no end point and no criterion for success except continued motion."
[2] "Technocapitalist machine" = the system made up of VCs, startups, labs, government, etc. The machine is out of control in the sense that it has goals of its own, we can't control it, and it's creating something evil. It's a possessed machine.
[3] I understand "system" as meaning something like "system of beliefs"; synonyms I'd use are "worldview" or "memeplex".