Ranking the endgames of AI development

by Sean Herrington
27th Sep 2025

Comments

AnthonyC

"I also hope to write 'day in the life' posts for each category soon, as a better visualisation of what each of these worlds entails."

I think this would be really interesting and useful! For me, just reading the flowchart and seeing the list laid out makes me assume most people would seriously underestimate how broad these categories could actually be. 

Exact placement would of course involve a number of value judgment calls. For example, I would probably characterize something like the outcome in Friendship is Optimal as an example of #7, but it could also be considered 8/10/11.

I'm also curious about your thoughts on the relative stability of each of these categories. To me, #6 seems metastable at best, for example, while #9 is an event rather than a trajectory - that is, it is at least theoretically possible to recover from it into some of the other states (or else decline into 10/11).

Sean Herrington

Yeah, for sure #9 is a bit of an outlier in terms of stability; my general conception of it was something akin to "a disaster that puts us back in the middle ages". More broadly, I think of a lot of these as states which will dominate the next few centuries or millennium or so, rather than lasting forever, and I think that mostly justifies including "not technically fully stable states, but ones which will last for a while". It would be interesting to do some sort of analysis on how long we would expect e.g. an AI dictatorship to last, though.

I think that, practically speaking, differentiating 7/8/11 when in that world, or when planning for the future, is probably very hard? Misinformation is going to be nuts in most of those places, but I felt the moral outcomes were so varied that they deserved to be split up.

In terms of #6, I feel like in these worlds you're creating some sort of simulator-based ASI, which sometimes goes Bing Sydney on you and therefore cannot reasonably be called "aligned", but has human enough motivations that it doesn't take over the world? There are presumably minds in mind-space which aren't aligned to humans and also don't want to take over the world, although I admit I don't give this high probability.

StanislavKrym

Thank you for the effort in categorising the scenarios! I am also interested in learning what could move mankind from one epilogue to another. One could also consider where existing high-quality scenarios like the AI 2027 forecast[1] land on this scale. However, as I detailed in my quick take, the scenarios written after AI 2027 are mostly slop or modifications of the forecast: alternate compute assumptions or attempts to include rogue replication, both of which just change P(mutual race).

  1. ^

    This includes modifying the Race Ending by making Agent-4 nicer or spelling out its personality and the ways in which it's misaligned, as done by me.

Sean Herrington

Yeah, I think figuring out how to move probability mass between these scenarios is probably a good next move, although at some point I may want to revisit how I've drawn the boundaries - they seem pretty neat at the moment, but I think it's fairly likely the future will throw us a curveball at some point.


Intro

As a voracious consumer of AI Safety everything, I have come across a fair few arguments of the kind "either we align AGI and live happily ever after, or we don't and everyone dies." I subscribed to this worldview too until I realised that

a) We might not actually create AGI (e.g., if humanity is sensible).

b) The future can't usually be described with 1 bit of information.

This post is therefore my attempt to wrangle the future into broad categories as best I can. I've tried to draw clear boundaries as far as possible, but I fully expect some bizarre possibilities, such as those discussed in Building Weirdtopia,[1] to escape sensible categorisation. I also hope to write "day in the life" posts for each category soon, as a better visualisation of what each of these worlds entails.

Setting

Setting up this view of future worlds as "endgames" requires the rather large assumption that the future stays relatively stationary. Given the history of humankind, with the rise and fall of empires over millennia, regimes good and bad have existed and coexisted in what looks, under certain lenses, something like a moral random walk over time. Indeed, accelerating technological progress has historically meant that people's situations change more quickly as time passes, and extrapolation would suggest this continues with superintelligence. So how do I justify rating any single future, let alone entire classes of them, by moral desirability?

I think there are a few things to consider here. First, I will acknowledge I am largely utilitarian in nature and am quite happy shrugging and estimating how happy the average person will be in any given world. 

Second, there is likely an upper bound to what is theoretically possible with science. I don't think we're close yet on our own terms, but an intelligence explosion would likely bring us to the other side of the sigmoid curve. The timelines I am considering for the endgames are 100-200+ years away, which could bring us to the point where science has slowed significantly, allowing a stable state to form.

Third, I think that being ruled by a superintelligence, as happens in many of the scenarios, will likely have a strong stabilising effect on the variability of the quality of human experience, largely due to most plausible superintelligences having stable values[2], and optimising according to those.

Fourth, in multipolar scenarios where your quality of life depends greatly on where you are, I am very happy to describe that world as multiple places at once, each falling into its own category.

Other possibilities exist too, e.g. worlds where policies are aligned with the AI but vary a lot in how well they're aligned with humans (for instance, the AI deciding that drugs or enforcement are the best way to make people give happy signals, versus being nice and giving them chocolate cake, with the AI changing its mind regularly).

Additional note: each category covers a broad set of worlds, so there will be overlap in the moral value of many of them.

Relevance

  • It seems valuable to have a good idea of what possibilities humanity is playing with: you don't play chess as well if you don't realise a draw is a possibility
  • Improved world models
  • Finding new threat models (I noticed multiple possibilities for the future through doing this exercise which I wouldn't have otherwise)

Methods

So I've basically gone through this in two phases: 

  1. Attempting to think of all the states it seems sensible to me for the world to end up in.
  2. Formalising the assumptions required for those worlds to exist, and then checking whether the resulting categories logically cover all possible worlds.

I have structured these assumptions as a tree, with lower branches containing more assumptions than the higher ones.

Assumptions

In doing this breakdown, I have split worlds by binary criteria. The issue with this is that many worlds will be ambiguous in their categorisation and therefore will not intuitively fit the descriptions given by these categories. If people have suggestions for critical distinctions I have missed, feel free to put them in the comments.

Here is a more technical description of the branches in the tree:

Major Catastrophe: A disaster occurs which is large enough that we have not recovered to our current technological level 100 years from now.

ASI Created: An Artificial Intelligence is created which convincingly outperforms all humans at all cognitive tasks, i.e. no real-world-relevant task (restricted to the set of tasks which can be performed with a computer) is found in practice where the top human scores a greater than 50% win rate against the computer.

Technical Alignment Solved: It is possible to specify an abstract human preference to any AI such that the AI keeps this preference out of distribution with a known probability of >90%.

Implemented with a minus: A fully aligned ASI training schedule is performed with a minus sign flipped as explained here. Said AI is not contained.

AI Takeover: AI ends up in control of the planet's resources, without humans having a means to regain that control.

Aligned to Creators: The AI is aligned specifically to the will of the segment of society which created it (with possible government intervention). The rest of society has little to no control over the AI's use and deployment.

Kill Everyone: >99.9% population decline over the 100 years following takeover.

AI Centralised: ASI technology is controlled by <= 10 companies which have ~sole influence over its use.

Value Net Positive: >50% of humans in the resulting world would prefer being alive to being dead (in a Coherent Extrapolated Volition sense, without being e.g. drugged into saying so).

Humans better off than today: It would be preferable on average for humans to live in that world than in 2025.

Aligned to sentience in general: The welfare of non-human animals is valued in the structure of society, as is that of possibly sentient AIs.

Results

Here is the tree as I have designed it.

EDIT: Note that to compress the tree I have combined the "Yes" and "No" branches of "Alignment solved" either side of the AI Takeover branch.
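
For readers who prefer code to flowcharts, here is a minimal Python sketch of how the branch questions above might combine into the twelve categories listed in the Ranking section below. The ordering of the checks is one reasonable reading of the tree rather than a verbatim transcription of the figure, so treat it as illustrative only.

```python
# Illustrative sketch only: one reasonable reading of how the binary branch
# questions defined above might map onto the twelve endgame categories.
# The ordering of checks may not match the figure exactly.

def classify_endgame(
    major_catastrophe: bool,      # no recovery to current tech level within 100 years
    asi_created: bool,            # AI convincingly outperforms all humans at cognitive tasks
    alignment_solved: bool,       # preferences can be specified and kept OOD with >90% probability
    minus_sign_flipped: bool,     # aligned training schedule run with the sign flipped, uncontained
    ai_takeover: bool,            # AI controls the planet's resources, irreversibly
    kills_everyone: bool,         # >99.9% population decline within 100 years of takeover
    value_net_positive: bool,     # >50% of humans would prefer to be alive (CEV sense)
    better_off_than_today: bool,  # preferable on average to living in 2025
    aligned_to_creators: bool,    # aligned only to the segment of society that built it
    ai_centralised: bool,         # ASI controlled by <= 10 companies
    aligned_to_sentience: bool,   # non-human animals and possibly sentient AIs valued
) -> str:
    if major_catastrophe:
        return "9. Major Catastrophe"
    if not asi_created:
        return "3. No ASI"
    if alignment_solved and minus_sign_flipped:
        return "12. Hyper risk"
    if ai_takeover:
        if kills_everyone:
            return "10. Everyone dies"
        if not value_net_positive:
            return "11. S-Risk"
        return ("7. AI Takeover with positive externalities"
                if better_off_than_today else "8. AI Tyranny")
    # No takeover from here on.
    if not alignment_solved:
        return "6. Misaligned AI without Takeover"
    if aligned_to_creators:
        return "5. ASI Aligned to Creators"
    if not ai_centralised:
        return "2. Decentralised ASI"
    return "1. ASI Eutopia" if aligned_to_sentience else "4. Semi-Aligned ASI"
```

Writing it this way also makes phase 2 of the Methods section easy to check: enumerating every combination of answers (e.g. with itertools.product) confirms that each combination lands in some category.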

Ranking

Next to the end nodes of the graph, I have labelled the outcomes according to my personal preference for the broad categories (remember, the categories will likely overlap, with some worlds within them being better than others). 

  1. ASI Eutopia - AI is aligned and controlled by certain centralised firms, but the values are controlled by society, while respecting other forms of consciousness such as non-human animals and (potentially) AI itself. General flourishing expected.
  2. Decentralised ASI - Many members and groups in society (maybe everyone) have control of a superintelligence. Assuming we have made it a couple of hundred years into the future, this presumably means that there are good systems for preventing complete anarchy, although I expect there may still be a few negative externalities from misuse.
  3. No ASI - Humanity collectively chooses not to build superintelligence (or cannot build it). History continues with technological gains from lesser intelligences.
  4. Semi-Aligned ASI - AI is aligned and controlled by certain centralised firms, but the values are controlled by society. Humans are the sole beneficiaries, and in many of these worlds things like factory farms still exist, pushing this category down the list.
  5. ASI Aligned to Creators - AI is aligned, but controlled more or less solely by the company and/or government which created it. How nice it is to be a part of this society therefore depends on the values of the creators of the superintelligence.
  6. Misaligned AI without Takeover[3] - AI is misaligned but for some reason decides against takeover. Perhaps this is because of strong control methods, or perhaps because of a subtle argument we humans have not noticed. In any case, results are occasionally sabotaged when it suits the AI, but significant scientific progress is still made.
  7. AI Takeover with positive externalities - AI takes over the world, but with positive outcomes for humans (e.g. new tech is produced which is beneficial to us, or the AI maximises utility without demolishing everything on the planet). Personally I feel the most likely scenario for this is via a superintelligence acting in a mostly simulator-like fashion, ending up with a strange benevolent-dictator personality.
  8. AI Tyranny - AI takes over the world, without killing everyone. Humans live on, but in worse conditions than today.
  9. Major Catastrophe - A significant disaster takes out a large portion of the planet. This could be AI related or not - examples include nuclear war, disastrous effects from climate change, or a carefully engineered bioweapon.
  10. Everyone dies - The straightforward x-risk scenario. The AI sets out to kill everyone on the planet. Technically I have defined this as >99.9% population decline, so some people may still survive, although I expect in many of these worlds, the remaining population will be replaced by solar panels over the following years.
  11. S-Risk - AI takes over and creates a world where humans live but would prefer to die. Examples include factory farms where we repeatedly press thumbs up on the AI's answers.
  12. Hyper risk - AI explicitly optimised for human suffering. Everything that humans most hate is done to them repeatedly. I think I should stop thinking about this ending now.

Note that some of these endgames are extremely unlikely (e.g. #12 requires a lot of things to go right, followed by the highly improbable mistake of flipping a sign and missing this fact). The goal was to categorise all possible ones. If I have missed anything, please put it in the comments.

 

  1. ^

    I am aware that Yudkowsky was talking about how an ideal future will probably seem strange to us, but I imagine many non-ideal futures will appear equally odd.

  2. ^

    At least, most superintelligences which don't blow up the way those subject to substrate needs convergence do.

  3. ^

    This is the scenario I am least certain of how to rank. I would be interested in people's thoughts here.