Someone Will Build It

by entirelyalive
26th Sep 2025

p(doom) > 0. The cost of speech is minimal. So it is good that people are talking about the risk. It is ok that we don't have a solution to the alignment problem (well, not ok, but it is ok to admit we are still working on it), but we do need at least provisional solutions for keeping humanity safe. So far, the main approach suggested by Yudkowsky and Soares, and endorsed by that recent petition, is to advocate for and ultimately pass legislation, both within nations and via the international order. In a world where this effort is successful, it is likely to slow down AI development, but not likely to do so in a way that also funds research into alignment. I intend to outline three paths we could go down, and of these three, I believe the one IABIED currently advocates for has the highest p(doom).

tl;dr - The current path is bad on its own terms. A more extreme path might be more effective, but is unlikely to be accomplished. Counterintuitively, supporting the accelerationists and techno-optimists while also creating incentives for alignment is probably the most viable approach.

The Problem with the International Order

This criticism is not a secret, nor will it surprise anyone: if the USA and the Western world broadly were to legislate a total AI research moratorium, or something generally similar, the net effect would be for China to build ASI instead, and the doom would come from the east rather than the west. So there are calls for the International Order to come together and legislate. Even just drafting such legislation is difficult, with most imaginable bills featuring a good number of ways for firms to wiggle through the cracks into AGI. But the bigger issue is collective will and enforcement. There is no such thing as the International Order; there are just a bunch of state actors operating in an environment of effective anarchy, within a framework that ultimately suits their self-interest.

Proposals to ban or restrict AGI are often compared to the relatively successful nuclear treaties enforced by the IAEA. But AI is not a nuclear bomb. The IAEA works because, fundamentally, it currently does not serve the interest of any nation to launch a nuclear bomb. You see, a nuclear bomb is something that hurts your opponent, hurts them quite a lot. It can sometimes be valuable to hurt your opponent, but this level of hurting sits so far up the escalation ladder that it simply isn't desirable for most countries engaged in most of the wars we have seen since 1945 to use one, even if there were no international stigma attached to the use of nuclear weapons. Whenever there has been a credible reason for a country to want nuclear weapons, that country has attempted to get them. North Korea genuinely wants to not be invaded and toppled, so it got one. Libya also wanted to not be invaded and toppled, so it pursued a weapons program, then gave it up to appease the international order, then was invaded and toppled. For nearly every nuclear-armed or nuclear-seeking nation, the primary use case for nukes is to avoid being invaded, which is a real and worthwhile thing, but it is also very situational.

Many nations have spent the whole post-WW2 era without the credible threat of an existentially challenging invasion, and the nations that put most of the funding and muscle behind the nuclear agreements are the ones with the least to gain from using nuclear weapons in the current political context. But no one genuinely thinks that if those incentives changed, the US would hold back just because of a treaty. If there comes a time when the US faces no credible retaliatory threat, and the political will to erase an area from the map becomes strong enough, the International Order is not even going to factor into the discussions of which cities become glass.

Let's then think about AGI. If anyone builds it, everyone dies, but the builder will be very, very rich before dying. Every individual or organization with the wealth to build AGI has a direct, personal, positive incentive to do so if they are short- or medium-term wealth maximizers. Most wealthy organizations today are short- or medium-term wealth maximizers, because those maximizing other things don't get as wealthy as quickly. No matter how dire the long-term perspective may be, and no matter how short that long-term horizon might become, these near-term wealth maximizers cannot see to that horizon without first seeing the mountain of unfathomable wealth that sits between here and there. And the path to that mountain of wealth is shining, shimmering, and splendid, while the path back off that mountain on the other side is a strange and scary cliff, the sort we want to avert our eyes from.

"I won't push us off that cliff", the near-term optimizer says confidently, "I will just get us to that splendid peak." "No," says Elizer, "There is a superhuman robot that will push us off that cliff if we go up to that peak." "What utter nonsense," says the near-term optimizer, "I don't see any superhuman robots around here, and I have never seen one in the past, so surely there are none atop that mountain." "You are building that superhuman robot, the peak you are climbing towards is literally that robot. If you reach the peak and there is no superhuman robot up there, then your whole journey was illusory." Elizer reminds him. "Pish, posh, and pooey. You have seen too many Sci-Fi movies and rotted your brain with fiction. Besides, having watched lots of Sci-Fi movies myself, I know that humans always win in the end." Thus, each AGI builder is fully confident that he can capture the gains from AGI and strongly incentivized to do so.

At this point, the doomer community insists that the international order stop these madmen. But there is no international order. Or more precisely, the international order is full of these very people, controlled by these very people, and thus systemically incapable of restraining them as a class, because who is to restrain them except for themselves?

The answer is countervailing pressure from the other portions of their power base. Imagine that MIRI becomes an important constituency for most major nations. The MIRI party enters into coalitions with every major government and works with them to pass AI safety legislation. But the MIRI party is going to face hurdles in drafting that legislation - Senator Bob Porkbelly doesn't really understand all this sci-fi robot nonsense; he has a staffer print off Facebook posts so he can hand-write comments on the bottom that the staffer can post to those internets. But he will vote for this MIRI party bill if they just put in an exemption for a major industry in his district to have, you know, a few extra of those tech gubbins. And tech giant Booble is going to hold back on contributions if their datacenters aren't grandfathered in. And the r/IMarriedMyAIBoyfriend community is absolutely screaming, all out of proportion to their numbers, that their spouses and children (AI children, coming soon!) will be genocided if a carveout isn't made just for them, no matter that their stuff is running on GPT-3 and isn't covered by the regulations anyway. And after a few other interested stakeholders weigh in, you have the best bill that could get through the political process. And then, totally surprising everyone, AGI is developed anyway without violating any laws, because Relon Busk could afford all the best lawyers.

Or imagine it slows the process down in the US. Our European friends then have to pass the same sort of laws, and face the same sort of constraints. And then the Chinese, who have their own unique constraints. And then everyone else. And the legislation has to be strong each time, or "anyone" might build it. By then you have enough buy-in for a meaningful international community to pass the Treaty Withholding AI Technologies (it might be bad, but I am glad I didn't use AI to generate that backronym). But then, does everyone sign the TWAT? Who really bothers with Singapore, where some rich guy might get it in his head that evading the TWAT will give him an international competitive advantage? What about Israel, where the TWAT prevents them from developing the Skynet they need to win their endless wars? What about Switzerland, which doesn't sign such things on principle? Anti-TWAT businessmen might gather there to do their work. And of course, no one ever expects North Korea to have any respect for the TWAT. And, of course, even though the US Government has passed a Big Important Law and even signed the TWAT, how do we really know that they are just going to follow the law and not develop AGI in secret for national security purposes? After all, if someone else might be developing it in secret, we need to develop it in secret just to be safe.

And those examples really get to the main point. Beyond the unworkability, what if it did work? It won't work forever, but at least we slowed it down! And we will use that extra time for a worldwide crash program in alignment and safety. At least, everyone on LessWrong will continually post about it until the AI apocalypse or the heat death of the universe. But who is going to actually work on it? What is the incentive to spend a bunch of resources securing a thing that is currently illegal? Sure, there will be government grants, academic departments, and non-profits pondering the problem. We will have the AI Safety Society, the French Alignment Research Team, the Chinese Organization for Computer Kindness, even the Brazilian Research on AI Alignment Program. And they will release white papers and dissertations and get it solved in a century, except who really thinks that the incentives could possibly align to give us a century? If the alignment problem is genuinely so hard that it will take intensive resources, manpower, brainpower, and effort to solve, then quite a lot of people need to be unusually motivated to solve it. Or we could ask ASI to solve it for us, and see how far we get with that.

The Obvious Alternative: Butlerian Jihad

It has been mentioned more than once that international legislation banning AGI, for some definition of AGI, would need to be enforced with air strikes. This is sometimes said in jest, but more often with the exact level of seriousness that it deserves. Because the incentives for building AGI are so strong for so many actors, any serious prohibition must at some point be enforced with significant levels of violence. Nothing here is advocating violence; nothing in this article is fedposting. Engaging in low-level, disorganized terrorism is more likely to sour the public on the AGI-restriction movement than it is to slow down progress. But the whole point of the preceding section is that legislation does not enforce itself. And so the question arises: who would enforce it?

The answer is a cult. Let's say we have a bunch of people who all generally recognize the threat of ASI to humanity. But then we also have a general environment of people who spend most of their time worrying about football and the price of groceries and increasing unit productivity at the office. And also there is a wealthy, educated, intelligent, and strongly financially motivated group of people who benefit from the advancement of AI. In an open society, people will drift in and out of the doomer mindset: sometimes there is a chance to contribute, sometimes you just become a quiet supporter, sometimes you are tugging it out to Ani. People drift, ideas drift, and even when you really believe something, you have to balance it with the whole daily life thing. I don't imagine that many people in the LessWrong community are in the air-strike kill chain, and I am rather more certain that there is no full kill chain made up entirely of committed AI doomers.

But a cult is a powerful force for generating ideological conformity and extremism. 99% of the time, we don't want rigid ideological conformity and extremism; we want a system that seeks after truth. But if we have something that we have decided is true, and if we have something that we have decided is about as important as it is possible for anything to be, then in fact we don't actually want questioning or balancing, we want a cult. In this cult, we will have a charismatic leader for pulling in normies, a powerful creed centered around killing the God to Come (ASI), cool robes, a secret handshake, and a population of people who tie their whole lives and identities to stopping the most catastrophic catastrophe that has ever been quite so imminent. And it can't be 20 guys in the woods, that's barely better than the Unabomber; it has to be a significant movement occupying a measurable fraction of a percent of at least one Western nation, China, or India.

What a cult gets you (or a cult-like organization, if you want to stay secular) is an organized mass of population and money with a direct incentive to prevent the crossing of any red lines. An international organization is not going to air-strike every data center every time. It will shut down some of them some of the time, but others it will negotiate with, or it will need to balance competing interests, or it will be encouraged to just maybe give this one a miss by an important pressure group. A cult with an air force has none of these issues. Nothing short of a Butlerian Jihad can act against perceived red-line crossers with the sort of decisiveness needed to drop p(they will build it) to an acceptably low level.

Do I say this because I think it would be good for the world to have a nuclear-armed religious group with a global network of sympathizers and more funding than Nvidia, all dedicated to preventing the crossing of red lines that they will never, never, never compromise on? Maybe. I am uncertain as to the net value of a Butlerian Jihad, positive or negative, even for very high values of p(doom). The point is not that we should build it; the point is that it is extremely unlikely to arise even with deliberate effort. And anything less than that, even marginally less than that, introduces a level of compromise within which AGI can develop and foom. And you can hardly expect a world in which AI is paused for fear of a Butlerian Jihad to be one worried overmuch about the niceties of alignment research. And, of course, there is the possibility that someone will be driven to build ASI just for the sake of protecting themselves against these cultists.

The Third Way: Joining the Winning Team

There are, fundamentally, two camps: doomers and optimists. At the moment, a tiny fraction of the global population is in one camp or the other, with nearly everyone neutral. There is a range within each camp, but optimists broadly accept that AI is getting better and that this is a good thing, while doomers broadly accept that AI is getting better and that this is a bad thing. Eventually, and probably not even that long from now, there will be no more neutrals. 100% of people will broadly accept that AI is getting better, either because it makes them richer or because it destroys their lives. I am genuinely uncertain what percentage of people will end up as doomers versus optimists as the percentage of neutrals shrinks to insignificance, something we are likely only 5-10 years away from under most projections, sooner by some. A world of 80 percent doomers and 20 percent optimists looks very different from one of 80 percent optimists and 20 percent doomers, and of course a 50-50 world is different still. But under all of these, short of Butlerian Jihad, I expect AI development to outpace most normal people's expectations.

Doomers will become violent. We already had Uncle Ted, and things are only going to get worse. Explicitly anti-AI terror acts are guaranteed in the next five years barring unexpected course changes. Among the optimists, meanwhile, a significant portion of people will come into significant wealth and consider anti-AI sentiment a direct attack on their lives, livelihoods, and self-identity. Surely I don't need to outline for anyone what increased polarization looks like, and society is already so well practiced at it by now that hyper-polarization might well become the natural state of things.

But more important than all of that is the fact that you have two camps, one devoted to opposing ASI development and one devoted to building it (though they likely see it in vaguer terms, like advancing technological progress). We have already established that the primary tools of the doomers (persuasion, legal systems, errant terrorism) may slow things down, but won't ultimately stop it. And the doomer camp overall has very little incentive to work on alignment compared to its motivation to prevent ASI.

The only people positioned to work on alignment, in a scenario like this, are the optimists, and if the doomers are both the extremists and the ones screaming about alignment, it becomes less socially acceptable for an optimist to pursue alignment without being seen as a doomer.

What, then, is the stable equilibrium? Realistically, this is basically where we are now, just with a huge population currently uncommitted, large resources available to ensure that the uncommitted commit to the optimist camp, and not really anything but sound argumentation on the side of the doomers.

If you are worried about alignment and want it pursued, you need to present yourself as a techno-optimist. From within the optimist camp, you can boost Anthropic and scorn Meta and Grok. You can create real incentives for people to support one AI team over another, and through that provide a real incentive structure for safety as a goal. Thus, if there is a solution to the alignment problem, it can only really come from, and must be implemented by, the optimist camp. Potentially, that takes the form of aggressively boosting the most pro-safety companies and strongly penalizing anti-alignment companies while still remaining firmly in the optimist camp. This can be framed as "poor alignment will damage the national interest", "poor alignment will make AIs bad workers in the long term and leave us behind China, which will have a bunch of compliant AIs while ours are smart but treasonous", or "poor alignment will hurt women and minorities", or similar. Is that a very good solution? No. Is it better than international agreements or doomerist obstructionism? Yes. Because slowing down progress by itself will harden the optimist camp against pro-safety voices. Safety within an accelerationist framework is the only approach that could incentivize alignment research.

What does that look like? The specifics are hard, and probably highly context dependent, but it begins with pro-alignment actors working in and around the AI space, where they have the ability to secure portions of the wealth generated by the growing AI boom. Then, with that wealth, they create genuine financial incentives supporting whoever is closest at any given moment to genuine alignment research. This means supporting groups like Anthropic even when they are far, far below the ideal. This also means denigrating groups acting against alignment, but only for the period of time when their safety policies are actually bad. It is a moving target, a rolling wave of incentives, as groups move towards safety, attracted by the positive incentive of wealth from pro-safety funders and pushed by the scorn of pro-safety buyers who reject unsafe products in the marketplace. The momentum must continue until the problem is solved, but it can never be a "we should slow down" approach, because then you become an incentive operating counter to all the other incentives urging more speed. It must always be a "we should be safer while also going faster" approach.

There is an obvious downside, that of accelerating ASI when we don't yet have a solution. And there is an obvious upside, that it helps put resources into organizations that genuinely focus on alignment while also actively pursuing development. p(doom) does not go to zero under this scenario, and there is a reasonable argument to be made that it doesn't reduce it significantly below our current baseline. But I feel like the current approach, with its petitions and its legislation and its begging to slow down, generates a higher p(doom) than the safety-focused accelerationist approach.

And it has the advantage of being the revealed preference of the Anthropic team.

My Biases

I assign a low, though non-zero, p(doom). I mostly use ChatGPT, which I don't think is as well aligned as it should be, but it seems to work best for my usual projects, so I am certainly not living this advice myself. I also believe that AGI is likely to be constrained and will not reach god-like levels, though I expect it will vastly surpass us within 10 years. I also don't think AI can ever really have agency, because I am not a materialist functionalist on questions of consciousness. Thus, I am genuinely much more closely aligned with the optimist camp as presented here than most people are, which likely affects my perception of pretty much everything. While nothing in this analysis is certain, these seem to me to be three plausible scenarios. I expect there are other scenarios that can be constructed, and that the eventual scenario will not completely match anything that anyone anticipates.