One cannot believe that AI development should stop entirely. One cannot believe that the risks are so severe that no level of benefit justifies them. One cannot believe that the people currently working on AI are not the right people to be making these decisions. One cannot believe that traditional political processes might be better equipped to govern AI development than the informal governance of the research community.
FWIW, I believe all those things, especially #3. (well, with nuance. Like, it's not my ideal policy package, I think if I were in charge of the whole world we'd stop AI development temporarily and then figure out a new, safer, less power-concentrating way to proceed with it. But it's significantly better by my lights than what most people in the industry and on twitter and in DC are advocating for. I guess I should say I approximately believe all those things, and/or I think they are all directionally correct.)
But I am not representative of the 'uniparty' I guess. I think the 'uniparty' idea is a fairly accurate description of how frontier AI labs are, including the people in the labs who think of themselves as AI safety people. There are exceptions of course. I don't think the 'uniparty' as described by this anonymous essay is an accurate description of the AI safety community more generally. Basically I think it's pretty accurate at describing the part of the community that inhabits and is closely entangled with the AI companies, but inaccurate at describing e.g. MIRI or AIFP or most of the orgs in Constellation, or FLI or ... etc. It's unclear whether it's claiming to describe those groups; the essay wasn't super clear about its scope.
You are obviously not in the AGI uniparty (e.g. you chose to leave despite great financial cost).
Basically I think it's pretty accurate at describing the part of the community that inhabits and is closely entangled with the AI companies, but inaccurate at describing e.g. MIRI or AIFP or most of the orgs in Constellation, or FLI or ... etc.
I agree with most of these, though my vague sense is some Constellation orgs are quite entangled with Anthropic (e.g. sending people to Anthropic, Anthropic safety teams coworking there, etc.), and Anthropic seems like the cultural core of the AGI uniparty.
well, with nuance. Like, it's not my ideal policy package, I think if I were in charge of the whole world we'd stop AI development temporarily and then figure out a new, safer, less power-concentrating way to proceed with it. But it's significantly better by my lights than what most people in the industry and on twitter and in DC are advocating for. I guess I should say I approximately believe all those things, and/or I think they are all directionally correct
With all due respect, I'm pretty sure that the existence of this very long string of qualifiers and very carefully reasoned hedges is precisely what the author means when he talks about intellectualised but not internalised beliefs.
Can you elaborate? What do you think I should be doing or saying differently, if I really internalized the things I believe?
To be honest, I wasn't really pointing at you when I made the comment, more at the practice of the hedges and the qualifiers. I want to emphasise that (from the evidence available to me publicly) I think that you have internalised your beliefs a lot more than those the author collects into the "uniparty". I think that you have acted bravely in support of your convictions, especially in the face of the NDA situation, for which I hold immense respect. It could not have been easy to leave when you did.
However, my interpretation of what the author is saying is that beliefs like "I think what these people are doing might seriously end the world" are in a sense fundamentally difficult to square with measured reasoning and careful qualifiers. The end of the world and existential risk are by their nature such totalising and awful ideas that any "sane" interaction with them (as in, trying to set measured bounds and make sensible models) is extremely epistemically unsound, the equivalent of arguing over whether 1e8 + 14 people or 1e8 + 17 people (3 extra lives!) will be the true number of casualties in some kind of planetary extinction event when the error bars are themselves ±1e5 or 1e6. (We are, after all, dealing with never-before-seen black swan events.)
In this sense, detailed debates about which metrics to include in a takeoff model, the precise slope of the METR exponential curve, and which combination of chip trade and export policies increases tail risk the most/least are themselves a kind of deception. This is because arguing over the details implies that our world and risk models have more accuracy and precision than they actually do, and in turn that we have more control over events than we actually do. "Directionally correct" is in fact the most accuracy we're going to get, because (per the author) Silicon Valley isn't actually executing some kind of carefully calculated compute-optimal RSI takeoff launch sequence with a well understood theory of learning. The AGI "industry" is more like a group of people pulling the lever of a slot machine over and over and over again, egged on by a crowd of eager onlookers, spending down the world's collective savings accounts until one of them wins big. By "win big", of course, I mean "unleashes a fundamentally new kind of intelligence into the world". And each of them may do it for different reasons, and some of them may in their heads actually have some kind of master plan, but all it looks like from the outside is ka-ching, ka-ching, ka-ching, ka-ching...
@Daniel Kokotajlo Addendum:
Finally, my interpretation of "Chapter 18: What Is to Be Done?" (and the closest I will come to answering your question based on the author's theory/frame) is something like "the AGI-birthing dynamic is not a rational dynamic, therefore it cannot be defeated by policies or strategies that are focused around rational action". Furthermore, since each actor wants to believe that their contribution to the dynamic is locally rational (if I don't do it someone else will/I'm counterfactually helping/this intervention will be net positive/I can use my influence for good at a pivotal moment [...] pick your argument), further arguments about optimally rational policies only encourage the delusion that everyone is acting rationally, making them dig in their heels further.
The core emotions the author points to as motivating the AGI dynamic are: the thrill of novelty/innovation/discovery; paranoia and fear about "others" (other labs/other countries/other people) achieving immense power; distrust of the institutions, philosophies, and systems that underpin the world; and a sense of self-importance/destiny. All of these can be justified with intellectual arguments, but they are often the bottom line that comes before such arguments are written. On the other hand, the author also shows how poor emotional understanding and estrangement from one's emotions and intuitions lead to people getting trapped by faulty but extremely sophisticated logic. Basically, emotions and intuitions offer first-order heuristics in the massively high-dimensional space of possible actions/policies, and when you cut off the heuristic system you are vulnerable to high-dimensional traps/false leads that your logic or deductive abilities are insufficient to extract you from.
Therefore, the answer the author is pointing at is something like an emotional or frame realignment challenge. You don't start arguing with a suicidal person about why the logical reasons they have offered for jumping don't make sense (at least, you don't do this if you want them to stay alive); you try to point them to a different emotional frame or state (i.e. calming them down and showing them there is a way out). Though he leaves it very vague, it seems that he believes the world will also need such a fundamental frame shift or belief-reinterpretation to actually exit this destructive dynamic, the magnitude of which he likens to a religious revelation and compares to the redemptive power of love. Beyond this point I would be filling in my own interpretation and I will stop there, but I have a lot more thoughts about this (especially the idea of love/coordination/ends to moloch).
I am surprised you didn't mention the fact that the whole thing was paraphrased to preserve anonymity by Opus 4.5. (Which really stood out to me! When I first read it, I assumed it was AI-generated, and I was disconcerted to see such quality of thought coming with such a slop-reek to the prose.)
Fair, I should've mentioned this. I speculated about this on Twitter yesterday. I also found the prose slightly off-putting. Will edit to mention.
The Possessed Machines is one of the most important AI microsites.
It is brand new; the domain was only registered two days ago according to WHOIS data. So this is a surprising claim.
I suspect that you are either the creator, or you read about it on /r/slatestarcodex on Reddit today and were really impressed by it. Please elucidate.
I am obviously not the creator; I have not worked at a frontier lab (as you can verify through online stalkery if you must). (I also have not even read Demons, but that's harder to verify)
I think I first saw this through the highly-viewed Tim Hwang tweet, but also have had several people in-person mention it to me. I am not on Reddit at all.
The microsites that stand out to me are Gradual Disempowerment, Situational Awareness, and (this one is half my fault) the Intelligence Curse. It's not a large set. Gradual Disempowerment talks about cultural and psychological dynamics in the abstract and as affected by future AIs, but does not concretely analyze the current cultural & social state of the field. I don't remember seeing a substantive cultural/psychological/social critique of the AGI uniparty before. I think this alone justifies that statement.
The Possessed Machines is one of the most important AI microsites. It was published anonymously by an ex-lab employee, and does not seem to have spread very far, likely at least partly due to this anonymity (e.g. there is no LessWrong discussion at the time I'm posting this). This post is my attempt to fix that.
(The piece was likely substantially human-directed but laundered through an AI due to anonymity or laziness. Thanks to Malcolm MacLeod for reminding me to mention this in the comments. See here for Pangram-on-X analysis claiming 67.5% AI. The prose is not its strength.)
I do not agree with everything in the piece, but I think cultural critiques of the "AGI uniparty" are vastly undersupplied and incredibly important in modeling & fixing the current trajectory.
The piece is a long but worthwhile analysis of some of the cultural and psychological failures of the AGI industry. The frame is Dostoevsky's Demons (alternatively translated The Possessed), a novel about ruin in a small provincial town. The author argues it's best read as a detailed description of earnest people causing a catastrophe by following tracks laid down by the surrounding culture that have gotten corrupted:
The piece is rich in good shorthands for important concepts, many taken from Dostoevsky, which I try to summarize below.
First: how to generalize from fictional evidence, correctly
The author argues for literature as a source of limited but valuable insight into questions of culture and moral intuition:
Stavroginism: the human orthogonality thesis
Stavrogin is a character for whom moral considerations have become a parlor game. He can analyze everything and follow the threads of moral logic, but he is not moved or compelled by them at a level beyond curiosity.
Kirillovan reasoning: reasoning to suicide
Closely related is Kirillov. Whereas Stavrogin is the detached curious observer to long chains of off-the-rails moral reasoning, Kirillov is the true believer.
The author compares Kirillov to people who accept Pascal's-wager-type EV calculations about positive singularities. A better example might be the successionists, some of whom want humanity to collectively commit suicide as the ultimate act of human moral concern towards future AIs.
Shigalyovism: reasoning to despotism
If Stavrogin is the intellectually entranced x-risk spectator & speculator, and Kirillov is the self-destructive whacko, Shigalyov is the political theorist who has rederived absolute despotism and Platonic totalitarianism for the AGI era.
Hollowed institutions
Possession
The AGI uniparty
The liberal father as creator of the nihilist son
Liberal Stepan's son Pyotr Stepanovich is a chief nihilist character in Demons. The author of The Possessed Machines argues this sort of thing - EA altruism turning into either outright nihilism or power-hunger - is a core cultural mechanic. I think they are directionally right, but I don't follow their main example of this, which argues "technology ethics frameworks that are supposed to govern AI—fairness, accountability, transparency, the whole FAccT constellation—are the Stepan Trofimovich liberalism of our moment", and "the serious people [...] have moved past these frameworks" because they are obsolete. My read of the intellectual history is that AGI-related concerns and galaxy-brained arguments about the future of galaxies preceded that cluster of more prosaic AI concerns, and they're different branches on the intellectual tree rather than successors of each other.
Handcuffed Shatov
The solution is fundamentally spiritual