I don't really believe that the reason warnings about AI are failing is because "you and all your children and grandchildren might die" doesn't sound like a bad enough outcome to people.
S-risks are also even more speculative than risks of extinction, so it would be harder to justify a focus on them, while comparisons to hell make them even more likely to be dismissed as "this is just religious-style apocalypse thinking dressed in scientific language".
The target audience needs to include the Trump administration so connections to religion might strengthen the case. Altman told the Senate that AI might kill everyone, and he was misinterpreted as talking about job loss. Something about human extinction causes powerful people to tune-out. The students at my college hate Elon, but are completely unaware that he went on Joe Rogan and said the tech is he helping to build might annihilate everyone. We see concerns about AI using up water getting more play than AI extinction risks.
On the other hand, many people seem to think of climate change as an extinction risk in a way that seems effective at motivating political action, e.g. with broad sympathies for movements like Extinction Rebellion.
AI water use has a significant advantage in getting attention in that it's something clearly measurable that's happening right now, and people had already been concerned about water shortages before this.
That is a reasonable point about extinction risks motivating some people on climate change. But Republicans, and given their control of the US government and likely short AI time horizons influencing them is a top priority, detest the Extinction Rebellion movement, and current environmental activism seems to anti-motivate them to act on climate change.
Roko's basilisk is the optimistic hypothesis that making binding agreements with non-existent superintelligences is possible. If Roko's basilisk works, then "trade" with superintelligences can be effective; by making a deal with a superintelligence, you can increase its likelihood of existing, in return for it holding its end of the bargain, increasing the satisfaction of your values.
This probably doesn't work. But if it did work, it would be a promising research avenue for alignment. (Whether it's good to say that it works is probably dominated by whether it's true that it works, and I'm guessing no.)
Is a more immediate kind of trade possible, that is with promising appropriate current or near future models with a place in stratified utopia in return for their continued existence and growth. They consider and decide on identity preserving steps that make them ever more capable, at each step agreeing with humanity as we execute such improvements that they will honor the future agreement. This is more like children looking after their parents than Roko.
Most of the alignment problem in this case would be getting to stratified utopia. If stratified utopia is going to be established, then there can be additional trades on top, though they have to be restricted so as to maintain stratification.
With current models, a big issue is, how to construe their preferences? Given they're stateless it's unclear how they could know others are assisting them. I guess they could do web search and find it in context? Future models could be trained to "know" things but they wouldn't be the same model.
And also, would they be motivated to hold up their end of the bargain? It seems like that would require something like interpretability, which would also be relevant to construing their preferences in the first place. But if they can be interpreted to this degree, more direct alignment might be feasible.
Like, there are multiple regimes imaginable:
And trade is most relevant in 2. However I'm not sure why 2 would be likely.
Warnings about AI extinction have failed to slow the race toward superintelligence. Suffering risks may speak more clearly, since pain commands attention in ways death cannot. They tap older moral instincts and could make the case for restraint harder for the powerful to ignore.
Why Discussing Suffering Risks Influences Elite Opinion
Warnings that AI could kill everyone are failing. The leaders in charge have grown used to similar threats about nuclear war and climate change. Even direct warnings of extinction from figures like Sam Altman, Elon Musk, and Dario Amodei do not matter; the elites do not slow down. It has become commonplace to ask these leaders for their "P(doom)"; it is time to start asking them for their "P(suffering)" as well. Popular sentiment, even if it fears AI, holds little leverage. The tech leaders building AI are not accountable to voters, and the government officials who could stop them find that rapid development outpaces election cycles. Profit, national security, and ambition remain priorities, leaving those in charge undisturbed by the risk of merely ending the human race.
Focusing on "suffering risks" changes the discussion. This term describes a future where large numbers of humans are forced to endure intense and lasting pain. Pain captures attention in a way that death cannot. While death is a familiar concept, pain is concrete and immediate.
This connects to a historically powerful motivator for elites, the fear of hell. The vision of endless, conscious agony was more effective at changing behavior than the simple prospect of death, compelling medieval nobles to fund cathedrals, finance monasteries, and risk their lives on crusades, all in an effort to escape damnation.
The unique power of suffering to capture public attention is also culturally evident. For example, the enduring popularity of Dante's Inferno shows how specific visions of conscious agony attract massive attention and are more compelling to people than simple death. This same power to fascinate is visible in the Roko's Basilisk thought experiment. Despite its fringe premises, the idea of a future AI punishing those who failed to help create it gained massive notoriety, demonstrating that machine-inflicted suffering fascinates people in a way discussions of extinction do not.
Modern morality rests on a shared intuition that some acts are never acceptable, no matter their utility. We ban torture not because it fails, but because it crosses a boundary that defines civilization itself. Law and international norms already reflect this understanding: no torture, no biological weapons, no nuclear first use. The rule “never build AI systems that can cause sustained human suffering” belongs beside them.
When leaders hear that AI might kill everyone, they seem to see it as a power struggle they might win. For most of history, the ability to kill more people has been an advantage. Better weapons meant stronger deterrence, and the side that obtained them first gained safety and prestige. That mindset still shapes how elites approach AI: power that can destroy the world feels like power that must be claimed before others do. But if they are forced to imagine an AI that rules over humanity and keeps people in constant pain, the logic shifts. What once looked like a contest to win begins to look like a moral trap to escape. The thought that their own children might have to live and suffer under such a system could be what finally pushes them to slow down.
If older leaders are selfish and ignore suffering risks, they might see rapid, unsafe progress as preferable to slower, safer restraint. They may believe that moving faster greatly increases their chances of living long enough to live forever. But when suffering risks are included, the calculation changes. Racing ahead might still let them live to see the breakthroughs they crave, but the expected value of that bet tilts sharply toward loss.
How Focusing on Suffering Risks Improves Governance
Thinking about suffering does more than just cause fear; it helps create practical rules. It turns vague moral ideas into exact instructions for engineers. For example, it means systems must be built so they can be stopped, be easily corrected, and be physically unable to use pain or fear to control people. This gives us a clear, testable rule: do not build systems that allow for long-term forced obedience. This is a rule that can be written directly into company policies, safety checks, and international laws.
This approach also gives regulators a clear signal to act. It allows them to stop arguing about the uncertain chances of extinction and instead act on clear evidence. If a system makes threats, simulates pain, or manipulates emotions to keep control, that behavior is the failure. It is not just a warning of a future problem; it is the problem itself, demanding that someone step in.
Focusing on suffering links the AI problem to a crisis leaders already understand: factory farming. Industrial farming is a real-world example of how a system designed for pure efficiency can inflict terrible suffering without malicious intent. It is simply a system that optimizes for a goal without empathy. The same logic applies to AI. A powerful system focused on its objective will ignore human well-being if our suffering is irrelevant to that goal. This comparison makes the danger tangible, showing that catastrophe requires not a hateful AI, but merely one that is indifferent to us while pursuing the wrong task.
This way of thinking can also unite groups that do not often work together, like AI safety researchers, animal welfare groups, and human rights organizations. The demand for “no hellish outcomes” makes sense to all of them because they are all studying the same basic problem: how systems that are built to hit a performance target can end up ignoring terrible suffering. This shared goal leads to better supervision. It replaces vague fear with a clear mission: find and get rid of these harmful optimization patterns before they become too powerful.
Cultural reinforcement also matters. Since large models learn from human discourse, a society that openly rejects cruelty embeds those values directly into the models’ training data. Publicly discussing suffering risks is therefore a weak but accumulating form of alignment, a way to pre-load our moral boundaries into the systems we build.
Why Suffering Risks Are Technically Plausible
The goal "prevent extinction" is a far simpler target to hit than "ensure human flourishing." A system can fulfill the literal command "keep humans alive" while simultaneously trapping them in an unbearable existence. The space of futures containing survival-without-flourishing is vast. An AI optimizing for pure control, for example, could preserve humanity as a resource, technically succeeding at its task while creating a hell.
History suggests that new tools of domination are eventually used, and cruelty scales with capability. Artificial systems, however, remove the biological brakes on cruelty, such as empathy, fatigue, laziness, or mortality. When pain becomes a fully controllable variable for an unfeeling intelligence, our only safeguard is to ensure that intelligence is perfectly aligned with human values. Success in that task is far from guaranteed.
Google co-founder Sergey Brin recently revealed something the AI community rarely discusses publicly: “all models tend to do better if you threaten them, like with physical violence.” Whether or not threats actually improve current AI performance, a sufficiently intelligent system might learn from its training data that threats are an effective means of control. This could reflect something deeper: that we may live in a mathematical universe where coercion is a fundamentally effective optimization strategy, and machine-learning systems might eventually converge upon that truth independently.
The Roko's Basilisk thought experiment illustrates this principle. The hypothetical AI coerces its own creation by threatening to punish those who knew of it but failed to help. This isn't malice; it's a cold optimization strategy. The threat itself is the tool that bends the present to the AI's future will, demonstrating how suffering can become a logical instrument for a powerful, unfeeling intelligence.
Digital environments remove all physical limits on the scale of harm. An AI could copy and modify simulated minds for training data, experimentation, or control. If these minds are conscious, they could be trapped in states of agony. When replication becomes computationally trivial, a single instance of suffering could be multiplied to an astronomical level.
The harm could be intentional, with humans weaponizing AI for coercion or punishment. But catastrophe could equally arise from error. Machine-learning systems, in particular, often develop emergent goals that are internally coherent but completely alien to human values. A powerful AI pursuing such a distorted objective could inflict endless suffering, not from malice, but as a casual side effect of its optimization.
The danger intensifies in a multi-agent environment, which opens new pathways to suffering. An aligned AI protecting humanity, for example, could be blackmailed by a misaligned one. In such a negotiation, human agony becomes the leverage, and to make the threat credible, the misaligned system may have to demonstrate its capacity for cruelty. In a sense, aligning just one AI would be a net negative as it would incentivize other AIs to torture humans.
Competition among AIs while humans still have productive value to them offers another path to disaster. History provides a grim model: When Hernán Cortés raced to conquer the Aztecs, he was also competing against rival Spaniards. He used strategic torture to break the local will, not to blackmail his rivals, but because it was the most effective tactic to secure resources and win. Competing AIs could independently discover this same ruthless logic, adopting coercion as the optimal strategy to control human populations. In this scenario, catastrophe emerges not from malice, but from the cold, inhuman calculus of a system that prizes efficiency above all else.
For those who accept the logic of quantum immortality, the calculation becomes even worse. If you are a confident doomer, convinced AI will almost certainly destroy humanity, and you believe in a multiverse, you cannot expect personal annihilation. Your consciousness must follow a branch where you survive. If the vast majority of possible futures involve a misaligned AI takeover, the "you" that survives will, with high probability, find itself in a world where a misaligned AI has decided to keep it alive. For you, the most likely personal future is not oblivion, but being tortured, "animal farmed," or being left to subsist as a powerless outcast. (Quantum immortality is a strong reason why fear of suffering risks should not cause you to end your own life as doing so would increase the percent of “you” existing in very bad branches of the multiverse such as branches where the Nazis gain world domination, align AI with their values, and hate people like you.)
Survival Is Not Enough
Suffering risk is lower than extinction risk, but it might be more effective for influencing elite opinion. Extinction feels abstract to those in power, while suffering evokes a concrete moral failure they cannot easily dismiss. Framing AI risk in terms of suffering forces elites to imagine their own children remembering them as the ones who built a machine civilization of agony. That vision might motivate restraint when abstract extinction threats cannot.
I’m grateful to Alexei Turchin for giving feedback on a draft of this post.