@So8res[1] responded on Twitter; I'm copying his response here for completeness:
I appreciate your emphasis on the bottom line that (IIUC) we agree upon: rogue superintelligence would permanently ruin the future and (at best) relegate humanity to something like zoo animals.
As I understand your arguments, they boil down to (a) maybe the AI will care a little and (b) maybe the AI will make backups of humans and sell them to aliens.
My basic response to (a) is: weak prefs about humans (if any) would probably get distorted, and existing ppl probably wouldn't be the optimum (and if they were, the result probably wouldn't be pretty). cf
My basic response to (b) is: I'm skeptical of your apparent high probabilities (>50%?) on this outcome, which looks somewhat specific and narrow to me. I also expect most implementations to route through a step most people would call "death".
In case (b), the traditional concept of "death" gets dodgy. (What if copies of you are sold multiple times? What if you only start running in an alien zoo billions of years later? What if the aliens distort your mind before restoring you?) I consider this topic a tangent.
Mostly I consider these cases to be exotic enough and death-flavored enough that I don't think "maybe AIs will sell backups of us to aliens" merits a caveat when communicating the basic danger. But I'm happy to acknowledge the caveat, and that some people think it's likely.
If Nate cross-posts himself, I'll delete this comment.
I think there's a fallacy in going from "slight caring" to "slight percentage of resources allocated."
Suppose that preserving earth now costs one galaxy that could be colonized later. Even if that one galaxy is merely one billionth of the total reachable number, it's still an entire galaxy ("our galaxy itself contains a hundred billion stars..."), and its usefulness in absolute terms is very large.
So there's a hidden step where you assume that the AIs that take over have diminishing returns (strong enough to overcome the hundred-billion-suns thing) on all their other desires for what they're going to do with the galaxies they'll reach, allowing a slight preference for saving Earth to seem worthwhile. Or maybe they have some strong intrinsic drive for variety, like how if my favorite fruit is peaches I still don't buy peaches every single time I go to the supermarket.
If I had no such special pressures for variety, and simply valued peaches at $5 and apples at $0.05, I would not buy apples 1% of the time, or dedicate 1% of my pantry to apples. I would just eat peaches. Or even if I had some modest pressure for variety but apples were only my tenth favorite fruit, I might be satisfied just eating my five favorite fruits and would never buy apples.
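To make the hidden step explicit, here is a minimal numerical sketch (my own construction, with made-up numbers; nothing here is from the comment): with linear returns, the slightly-valued option gets essentially nothing, while sufficiently diminishing returns turn the same 100x preference ratio into a ~1% allocation.

```python
import numpy as np

# Toy sketch of the point above (my own construction, with made-up numbers):
# under linear returns, a preference that is 100x weaker gets essentially zero
# resources; under strongly diminishing returns, the same preference ratio
# translates into a small but nonzero share of the budget.

value_peach, value_apple = 5.0, 0.05          # "cares 100x more about peaches"
fractions = np.linspace(0.0, 1.0, 10_001)     # candidate budget shares for apples

def best_apple_share(utility):
    totals = [utility(1.0 - f, f) for f in fractions]
    return fractions[int(np.argmax(totals))]

# Linear returns: spend everything on peaches.
linear = lambda p, a: value_peach * p + value_apple * a
# Diminishing returns (log utility): apples get a share roughly proportional
# to their relative value (~1%).
diminishing = lambda p, a: value_peach * np.log(p + 1e-9) + value_apple * np.log(a + 1e-9)

print(f"apple share, linear returns:      {best_apple_share(linear):.4f}")       # 0.0000
print(f"apple share, diminishing returns: {best_apple_share(diminishing):.4f}")  # ~0.0099
```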
You describe this as "there are some reasons why small amounts of motivation don't suffice (given above) which are around 20% likely", but I think that's backwards. Small amounts of motivation by default don't suffice, but there's some extra machinery AIs could have that would make them matter.
Or to try a different analogy: Suppose a transformer model is playing a game where it gets to replace galaxies with things, and one of the things it can replace a galaxy with is "a preserved Earth." If actions are primitive, then there's some chance of preserving Earth, which we might call "the exponential of how much it cares about Earth, divided by the partition function" under a Boltzmann-rational modeling assumption. But if a model with similar caring has to execute multi-step plans to replace each galaxy, then the probability of preserving Earth goes down dramatically, because it will have chances to change its mind and do the thing it cares for more (applying the Boltzmann-rationality assumption at each step). So in this toy example, a slight "caring", in the sense of what the model says it would pick when quickly asked, isn't represented in the distribution of results of many-step plans.
If small motivations do matter, I think you can't discount "weird" preferences to do other things with Earth than preserve it. "Optimize Earth according to proxy X, which will kill all humans but really grow the economy / save the ecosystem / fill it with intelligent life / cure cancer / make it beautiful / maximize law-abiding / create lots of rewarding work for a personal assistant / really preserve Earth." Such motivations sound like they'd be small unless fairly directly optimized for, but if the AI is supposed to be acting on small motivations, why not those bad ones rather than the one we want?
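Here is a small numerical sketch of that toy model (my own illustration; the specific utilities, temperature, and step count are assumptions, not anything from the comment): a Boltzmann-rational single-step choice gives slight caring a modest probability, but requiring the same slight caring to win at every step of a multi-step plan suppresses it to near zero.

```python
import numpy as np

# A crude numerical version of the toy model above (my own illustration; the
# utilities, temperature, and number of steps are all assumptions).

def boltzmann_choice_prob(utilities, index, temperature=1.0):
    """P(choose option `index`) = exp(u_i / T) / sum_j exp(u_j / T)."""
    logits = np.array(utilities, dtype=float) / temperature
    logits -= logits.max()                      # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return probs[index]

# The model cares slightly less about preserving Earth than about the alternative.
utilities = [0.0, 2.0]                          # [preserve Earth, replace with computronium]
p_single = boltzmann_choice_prob(utilities, 0)  # ~0.12 when actions are primitive

# One crude way to model a multi-step plan: the slight caring has to win at
# every step where the model could switch to the option it cares about more.
n_steps = 10
p_multi = p_single ** n_steps                   # ~6e-10

print(f"single-step P(preserve Earth): {p_single:.3f}")
print(f"{n_steps}-step P(preserve Earth): {p_multi:.1e}")
```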
I think there's a fallacy in going from "slight caring" to "slight percentage of resources allocated."
Suppose that preserving earth now costs one galaxy that could be colonized later. Even if that one galaxy is merely one billionth of the total reachable number, it's still an entire galaxy ("our galaxy itself contains a hundred billion stars..."), and its usefulness in absolute terms is very large.
So there's a hidden step where you assume that the AIs that take over have diminishing returns
No, I'm not assuming diminishing returns or a preference for variety. When I talk about sufficient caring to result in an "extremely small amount of motivation", I just mean "caring enough to allocate ~1/billionth to 1/trillionth of our resources to it". This is a pretty natural notion of slight caring IMO. I agree that 1/billionth of resources will be very large to an AI in absolute terms, and that there is a type of caring which gets divided out by vast cosmic scale.
IMO, it's pretty natural for slight caring to manifest in this way, and the thing you describe with peaches and apples is more like "slight preferences". I'm not saying "the AI will slightly prefer humans to survive over humans not surviving (but not more than what it could do with a marginal galaxy)"; I mean that it will actually pay small costs at the margin of the actual decisions it has to make. I agree that AIs might prefer to preserve Earth but have this small preference swamped by the vastness of the galactic resources they would have to give up in exchange. I just wouldn't call this an amount of caring which suffices for paying 1/billionth of resources.
I agree that the way this manifests will depend on the motivational structure of the AI, and motivational structures which purely care about the ultimate arrangement of matter without regard to anything else will end up not caring at all about keeping humans alive (after all, why privilege humans, they just happened to be using the matter/energy to start with). (In the same way, these motivations would kill aliens to tile that matter with something that seems more optimal, even though keeping these aliens alive might be insanely cheap as a fraction of resources.) Humans demonstrate motivations which don't just care about the ultimate arrangement of matter all the time; e.g., even relatively utilitarian people probably wouldn't kill everyone on earth to get an additional galaxy, or kill aliens to get their stuff when it would be this cheap to leave them alone.
I think the relevant type of "kindness" is pretty natural, though obviously not guaranteed.
If small motivations do matter, I think you can't discount "weird" preferences to do other things with Earth than preserve it.
I don't discount this, see discussion in the section "How likely is it that AIs will actively have motivations to kill (most/many) humans". Note that it's pretty cheap to keep humans alive while also doing something that destroys earth.
Another way to put this: some types of caring are the types where you're willing to allocate 1/billionth of your resources to the thing even if you learn that you are extremely rich (so these resources could buy a lot of what you want in absolute terms). But you can also have preferences for what to spend your money on which depend strongly on what options are available; you'd be willing to trade some absolute amount of value for the thing, and if it turns out that you're wealthier and can buy more in absolute terms, you'd be willing to spend a smaller fraction of your money on it.
I'm predicting a type of caring/kindness which is the first rather than a type of preference which cares about questions like "exactly how big is the cosmic endowment".
It's unclear what fraction of people die due to takeover because this is expedient for the AI, but it seems like it could be the majority of people and could also be almost no one. If AIs are less powerful, this is more likely (because AIs would have a harder time securing a very high chance of takeover without killing more humans).
Yeah, this (extinction to facilitate takeover) seems like the most plausible pathway to total or near-total extinction by far. An AI that is only a little bit smarter than humanity collectively has to worry about humans making a counter-move - launching missiles, building a competing AI, or various kinds of sabotage. If you're a rogue AI, engineering a killer virus (something that smart humans can already do or almost do, if they wanted to) as soon as you or humanity has built out sufficient robotics infrastructure makes all the subsequent parts of your takeover / expansion plan much less contingent and more straightforward to reason about. (And I think the analogy to historical relatively-bloodless coups here is a pretty weak counter / faint hope - for one, because human coup instigators generally still need humans to rule over, whereas AIs wouldn't.)
If there are a large number of different rogue AIs, it becomes more likely that one of them would benefit from massive fatalities (e.g. due to a pandemic) making this substantially more likely.
I don't see how the number of AIs makes a big difference here, rather than the absolute power level of the leading AI? An extinction or near-extinction event seems beneficial to just about any unaligned AI that is not all-powerful enough to not have to worry about humanity at all.
Put another way, the scenarios where extinction doesn't happen due to takeover only feel plausible in scenarios where a single AI fooms so fast and so hard that it can leave humanity alive without really sweating it. But if I understand the landscape of the discourse / disagreement here, these fast and discontinuous takeoff scenarios are exactly the ones that you and some others find the least plausible.
human coup instigators generally still need humans to rule over, whereas AIs wouldn't
Sometimes? In countries where most wealth is mineral wealth, they actually need very few citizens. The more basic point is that human coups aren't particularly helped by indiscriminate killing while AI takeover might be. But I still think coups are surprisingly bloodless and often keeping coups bloodless is the optimal strategy for the person doing the coup. I think this transfers to AI.
I don't see how the number of AIs makes a big difference here, rather than the absolute power level of the leading AI?
My argument here is basically just that even if a given rogue AI wouldn't benefit from mass fatalities (which maybe you think is implausible, fair enough), if there are many rogue AIs in different situations, then mass fatalities become more likely.
If you think most humans dying as part of the takeover is overdetermined, then fair enough.
An extinction or near-extinction event seems beneficial to just about any unaligned AI that is not all-powerful enough to not have to worry about humanity at all.
I think this is pretty unclear because of the possibility of responses from human actors and the potential for slowly building up control over time. E.g., in the AI 2027 scenario, killing (large fractions of) humans at the end barely mattered at all for the AI's takeover prospects, and at each earlier point, killing tons of humans wouldn't be helpful, as this would just add variance and the potential for stronger responses when the AI can instead just build up power over time.
Edit: there is also a live question of "marginal returns to death"; e.g. does it increase your odds of takeover to kill 90% instead of 30%?
I don't feel like the "slightly caring" imagery coherently describes any plausible form of caring in misaligned AIs.
I think that if Earth-originated AI gets to choose between a state of the world without humans and exactly the same world but with humans added, a supermajority of AIs is going to choose the latter, because of the central role humans are likely to play in the training data. But "humans are more valuable than no humans, ceteris paribus" doesn't say anything about whether a configuration of matter that contains humans is literally the best possible configuration. Take "there are galaxies of computronium and a solar system with humans" vs "there is one more solar system of computronium, no humans".
To put things on a scale: on a napkin estimate, the plausible lower bound on achievable density of computation (in ops/sec per kg) is enormous. If we assume that for minimally comfortable lives humans will need 100 times their own weight in supporting matter, and one human weighs 80 kg, we get that one second of existence of a single human costs an amount of computation roughly equal to 3 billion years of computation on all currently existing computers. I think this sends the idea of non-uploaded humans existing after a misaligned takeover straight out the window.
Suppose humans retained control and came across some aliens who didn't want to be replaced with computronium and wanted to live on their planet. Do you think that humans would do something to these aliens as bad (from their perspective) as killing them?
I think this type of caring (where you respect the actual preferences of existing intelligent agents) seems reasonably plausible to me.
I think it's plausible that an AI that has this kind of caring could exist, but actually getting this AI instead of one that doesn't care at all seems very unlikely.
IMO "respect the actual preferences of existing intelligent agents" is a very narrow target in mind-space. I.e. if we had any reason to believe the AI has a decent chance of being this kind of mind, the alignment problem would be 90% solved. The hard part is going from "AI that kills everyone" to "AI that doesn't kill everyone". Once you're there, getting to"AI that benefits humanity, or at least leaves for another star system" is comparatively trivial.
My claim is that you're reasonably likely to get a small preference for something like this by default even in the absence of serious effort to ensure alignment beyond what is commercially expedient. For instance, note that humans put some weight on "respect the actual preferences of existing intelligent agents" despite this being a "narrow target in mind-space".
Insofar as I buy this argument, I would apply it to my own CEV, hence concluding that on reflection I would most prefer a universe without biological humans. Then I notice I'm probably not special and probably a lot of other people's CEV generates this conclusion too. This seems kind of heretical but I'm not convinced it's wrong.
Other entities that care trade with the AI (or with entities that the AI trades with) to keep humans alive. This includes acausal trade, ECL, simulations/anthropic capture (to be able to effectively acausally trade with decision theory naive AIs), and causal trade with aliens that the AI ends up encountering. It's unclear how simulations/anthropic capture works out. It seems plausible that this happens in a way which doesn't result in all relevant (acausal) trades happening. It seems plausible that other entities (including humans in other Everett branches) pretty universally don't want to bail humans (in this branch) out because they have better things to spend their resources on, but even a tiny fraction of entities spending some non-trivial fraction of resources could suffice. This depends on there being some beings with power who care about things like human survival despite alignment difficulties, but this feels very likely to me (even if alignment is very hard, there may well be more competent aliens or AIs who care a small amount about this sort of thing). Note that this requires the AI to care about at least some of these mechanisms which isn't obvious.
How much of your plausibility-of-caring-slightly routes through this, vs the "actually just slightly intrinsically nice" thing?
In the previous discussion, you said:
I'm happy to consider non-consensual uploading to be death and I'm certainly happy to consider "the humans are modified in some way they would find horrifying (at least on reflection)" to be death. I think "the humans are alive in the normal sense of alive" is totally plausible and I expect some humans to be alive in the normal sense of alive in the majority of worlds where AIs takeover.
Making uploads is barely cheaper than literally keeping physical humans alive after AIs have fully solidified their power I think, maybe 0-3 OOMs more expensive or something, so I don't think non-consensual uploads are that much of the action. (I do think rounding humans up into shelters is relevant.)
I'm surprised at the "only 3 OOMs more expensive." I haven't done a calculation but that seems really implausible. Maybe I am not imagining how big an OOM is properly.
In particular, comparing "keep humans alive indefinitely, multigenerationally" vs "store them on ice without even running them, turn them back on when/if you trade them."
The sort of entity that might pay extra for "keep them actually alive instead of on-ice"... probably also cares about them being alive and getting some kind of standard of living or something. (fyi it's not even obvious most people would prefer being alive in minimally-satisfying-shelters vs stored, or run in simulation at a higher standard of living)
Insofar as this option is load-bearing for "you put lowish odds on them killing everyone", I think a crux is not finding it very plausible that the AI would choose "keep fully alive in a way that is clearly better than death" vs "store digitally" or "upload." It's just a very narrow target for the game theory to work out such that this exact thing is what turns out to matter.
I hadn't seen this footnote yet when writing the above:
Consensual uploads, or uploads people are fine with on reflection, don't count as death. ↩︎
There's a range of stuff like "the AI maneuvered us into a situation where we either die, or live in a world with X standard of living, or get uploaded and get 1000x (or whatever) standard of living."
Most people on reflection choose uploading. I think it's reasonable to disagree about whether this counts as "death", but this seems like pretty ambiguous consent at best to me, and while I think most fully informed people wouldn't count it as "death" per se, most people with their current set of beliefs/values would count it more like death than not-death, or not be very impressed by arguments that it doesn't count as death.
(Curious if there is any kind of mechanical turk poll we could run that would change either of our minds about this? Not sure that it matters that much)
I think this sort of thing is also why I don't think the "AI may be very slightly nice" is likely to result in something that's clearly "not death". It seems really unlikely to me that very slightly nice things would go the "preserve parochial humans exactly as is" route; it's just such a narrow target to hit even within niceness.
It's kinda messy how to think about non-consensual uploads that the person is totally fine with after some reflection (especially if the reason the AI did the upload is that it knew the person would be fine with this on reflection). I also don't think considering uploads without informed consent to be death makes a big difference to my numbers.
I think this sort of thing is also why I don't think the "AI may be very slightly nice" is likely to result in something that's clearly "not death". It seems really unlikely to me that very slightly nice things would go the "preserve parochial humans exactly as is" route; it's just such a narrow target to hit even within niceness.
I don't really agree? It seems like "don't upload people without their consent when they consider this to be death unless there are no better options" is pretty natural and the increase in resources for keeping people physically alive is pretty small.
(FYI I think I am sold by your other reply that the main question is "how much is it slowed down initially, and how much does it value the resources that it loses by doing so?". I agree leaving us with one star isn't that big a deal, modulo making sure we can't somehow mess up the rest of its plans, which doesn't seem that hard)
How much of your plausibility-of-caring-slightly routes through this, vs the "actually just slightly intrinsically nice" thing?
Maybe 60% trade, 40% intrinsic? Idk though.
I'm surprised at the "only 3 OOMs more expensive." I haven't done a calculation but that seems really implausible. Maybe I am not imagining how big an OOM is properly.
In particular, comparing "keep humans alive indefinitely, multigenerationally" vs "store them on ice without even running them, turn them back on when/if you trade them."
The dominant cost of "keeping humans alive physically" from a linear-returns+patient industrial expansion perspective is a small delay from rounding humans up into shelters (that you have to build and supply etc) or avoiding boiling the oceans (and other catastrophic environmental damage). The dominant cost of uploads is the small delay from rounding up and scanning people before they die. They both seem small, but the delays seem comparable.
Giving humans an entire star long term (e.g. after a few decades) is negligible in cost (as a fraction of all galactic resources) relative to this delay (edit: for patient AIs which don't prefer resources closer to earth), so I think keeping humans physically alive longer term is totally fine and just has higher upfront costs.
There's also the cost of not killing humans as part of a takeover attempt and not using them for some other purpose (reasons (1) and (3)), but these are equal between uploads and physically keeping humans alive. I wasn't intending to include this as I said "after AIs have fully solidified their power", but if I did, then this makes the gap much smaller depending on how important reason (1) is.
Mmm, nod. I maybe see it.
RE: "how long are you delaying?"
It seems like a major early choice is "is the AI basically using the Earth to bootstrap the intergalactic probe process" or "is the AI only using the amount of Earth resources that leaves it mostly functional from the humans' perspective, and then mostly using the rest of the planets."
You note "seems like it'd only slow down the AI a bit, to first round up humans". I haven't done a calculation about it, but, seemed potentially like a very big deal to decide between "full steam ahead on beginning the dyson sphere using Earth Parts" vs "start the process from the moon and then mercury", since there are harder-to-circumvent delays there.
You note "seems like it'd only slow down the AI a bit, to first round up humans". I haven't done a calculation about it, but, seemed potentially like a very big deal to decide between "full steam ahead on beginning the dyson sphere using Earth Parts" vs "start the process from the moon and then mercury", since there are harder-to-circumvent delays there.
Can't the AI proceed full steam ahead until it has enough industrial capacity to build shelters, then build shelters and put humans in the shelters (while pausing as needed at this point to avoid fatalities from environmental damage while humans are being rounded up), and then finish the industrial expansion (possibly upgrading shelters along the way as needed as the AI gets more resources)? Seems like naively this only delays you for as long as is needed to round up humans and put them in shelters, which seems probably <1 year and probably <1 month. (At least if takeoff is pretty fast.)
Separately, not destroying the earth (and instead doing more of the growth in space) seems like it should cost <3 years of delay and probably <1 year of delay which is still pretty small as an absolute fraction of resources (for patient AIs). Like we're talking 1 / billion or something.
I agree it's small as a fraction of resources, but it still seems very expensive in terms of total resources since that's a lotta galaxies falling outside the lightcone.
We have a ~25% chance of extinction
Maybe add the implied 'conditional on AI takeover' to the conclusion so people skimming don't come away with the wrong bottom line? I had to go back through the post to check whether this was conditional or not.
I mean, everyone is going to die, absent singularity-like advances in medical/upload capabilities. There are two different doom scenarios:
1 implies 2, but not the reverse. I give pretty high chances of #2, but much lower of #1, assuming my "median case" of ASI being fairly slow and not magically super.
Note, I already had a pretty high probability of #1 from non-AI sources. The current revolution in modeling has increased it, but not by much.
Interesting and good breakdown.
I place much higher odds on the "death due to takeover" for a pretty specific reason. We seem to have an excellent takeover mechanism in place which kills all or most of us: nukes. We have a gun pointed at our collective heads, and it's deadlier to humans than to AGIs.
Igniting a nuclear exchange and having just enough working robots, power sources, and industrial resources (factories, etc) to rebuild seems like a pretty viable route to fast takeover.
Igniting that exchange could be done via software intrusion or perhaps more easily by spoofing human communications to launch. This is basically the only use of deepfakes that really concerns me, but it concerns me a lot.
This becomes increasingly concerning to me in a multipolar scenario, which in turn seems all too likely at this point. Then every misaligned AGI is incentivized to take over as quickly as possible. A risky plan becomes more appealing if you have to worry that another AGI with different goals may launch their coup at any point.
This logic also applies if we solve alignment and have intent-aligned AGIs in a multipolar scenario with different human masters. I guess it also favors everyone who can do so getting themselves or a backup to a safe location.
This is also conditional on progress in robotics vs. timelines; even with short timelines it seems like robotics will probably be far enough along for clumsy robots to build better robots. But here my knowledge of robotics, EMP effects, etc, fails. It does seem like a nontrivial chance that triggering a nuclear exchange is the easiest/fastest route to takeover.
One very well-informed individual told me nobody knows if any humans would survive a nuclear winter; but that's probably less important than whether the "winning" AGI/human wanted them to survive.
The number of likely deaths given takeover seems higher than your estimate if that logic about nukes as a route to takeover mostly goes through.
But the question of extinction still hinges largely on whether AGI has any interest in humanity surviving. I think you're assuming it will have a distribution of interests like humans do; I don't think that's a safe assumption at all, even given LLM-based AGI and noting that LLMs do really seem to have a distribution of motivations, including kindness toward humans.
I think how they reason about their goals once they're capable of doing so is very hard to predict, so I'd give it more like a 50% chance they wind up being effectively maximizers for whatever their top motivation happens to be. There seems to be some instrumental pressure toward reflective stability, and that might favor one motivation winning outright vs. sharing power within that mind. I wrote a little about that in Section 5 and more in the remainder of "LLM AGI may reason about its goals and discover misalignments by default", but it's pretty incomplete.
Like the rest of alignment, it's uncertain. LLMs have some kindness, but that doesn't mean that LLM-based AGI will retain it. If we could be confident they would, we'd be a lot better set to solve alignment.
I place much higher odds on the "death due to takeover" for a pretty specific reason. We seem to have an excellent takeover mechanism in place which kills all or most of us: nukes. We have a gun pointed at our collective heads, and it's deadlier to humans than to AGIs.
I ended up feeling moderately persuaded by an argument that human coups typically kill a much smaller fraction of people in that country. I think some of the relevant reference classes don't look that deadly and then there are some specific arguments for thinking AIs will kill lots of people as part of takeover, and I overall come to a not-that-high bottom line.
The number of likely deaths given takeover seems higher than your estimate if that logic about nukes as a route to takeover mostly goes through.
My understanding is that a large-scale nuclear exchange would probably kill well less than half of people (I'd guess <25%). The initial exchange would probably directly kill <1 billion, and then I don't see a strong argument for nuclear winter killing far more people. This winter would be occurring during takeoff, which has an unclear effect on the number of deaths.
But the question of extinction still hinges largely on whether AGI has any interest in humanity surviving. I think you're assuming it will have a distribution of interests like humans do
I don't think I'm particularly assuming this. I guess I think there is a roughly 50% chance that AI will be slightly kind in the sense needed to want to keep humans alive. Then, the rest is driven by trade. (Edit: I think trade is more important than slight kindness, but both are >50% likely.)
GPT5T roughly agrees with your estimate of nuclear winter deaths, so I'm probably way off. Looks like my information was well out of date; that conversation was from 2017 or so. And yes, deaths from winter-induced famine might be substantially mitigated if someone with AGI-developed tech bothered.
Thanks for the clarifications on trade vs kindness. I am less optimistic about trade, causal or acausal, than you, but it's a factor.
One difficulty I have with these posts is I never have any idea how seriously to take any of the percentages or even the overall conclusion. My overall update is to doubt there's a reasonable threshold for saying "we don't know enough to put a number on this".
Yeah, my own instinct is to just see if the results are interesting in such a way that if I believed them, it would meaningfully change what I thought was the best strategy. In this case, don't think so. Even what I see as a very optimistic set of assumptions still results in what I see as an unacceptably high risk of very bad outcomes. I do find the exploration itself interesting, though.
Suppose misaligned AIs take over. What fraction of people will die? I'll discuss my thoughts on this question and my basic framework for thinking about it. These are some pretty low-effort notes, the topic is very speculative, and I don't get into all the specifics, so be warned.
I don't think moderate disagreements here are very action-guiding or cruxy on typical worldviews: it probably shouldn't alter your actions much if you end up thinking 25% of people die in expectation from misaligned AI takeover rather than 90% or end up thinking that misaligned AI takeover causing literal human extinction is 10% likely rather than 90% likely (or vice versa). (And the possibility that we're in a simulation poses a huge complication that I won't elaborate on here.) Note that even if misaligned AI takeover doesn't cause human extinction, it would still result in humans being disempowered and would likely result in the future being much less good (e.g. much worse things happen with the cosmic endowment). But regardless I thought it might be useful for me to quickly write something up on the topic.[1]
(By "takeover", I mean that misaligned AIs (as in, AIs which end up egregiously going against the intentions of their developer and which aren't controlled by some other human group) end up seizing effectively all power[2] in a way which doesn't involve that power being acquired through payment/trade with humans where the deals are largely upheld on the side of the AI. I'm not including outcomes where an appreciable fraction of the power is held by digital post-humans, e.g., emulated minds.)
While I don't think this is very action-guiding or cruxy, I do find the view that "if misaligned AIs take over, it's overwhelmingly likely everyone dies" implausible. I think there are quite strong counterarguments to this and I find the responses I've heard to these counterarguments not very compelling.
My guess is that, conditional on AI takeover, around 50% of currently living people die[3] in expectation and literal human extinction[4] is around 25% likely.[5]
The basic situation is: if (misaligned) AIs aren't motivated to keep humans alive (as in, they don't care and there aren't external incentives) and these AIs had effectively all power, then all the humans would die (probably as a byproduct of industrial expansion, but there are other potential causes). But even if AIs are misaligned enough to take over, they might still care a bit about keeping humans alive, due to either intrinsically caring at least a bit or because other entities with power would compensate the AI for keeping humans alive (either aliens the AI encounters in this universe or acausal compensation). However, there are also some reasons why AIs might kill people: takeover strategies which involve killing large numbers of people might be more effective, or the AI might actively want to kill people or to do something to people that is effectively killing them.
Ultimately, we have to get into the details of these considerations and make some guesses about how they compare.
There is a variety of prior discussion on this topic, see here, here, and here for some discussion that seems reasonable and relevant to me. This linked content is roughly as relevant as the notes here, and I'm not sure these notes add that much value over these links.
Now I'll get into the details. Expanding on the basic situation, humans might die due to takeover for three main reasons:
1. Killing humans (or letting them die) is expedient for executing and securing the takeover itself.
2. Humans die as a byproduct of industrial expansion once the AI has effectively all power.
3. The AI actively has motivations to kill (most/many) humans or to do something to them that is effectively killing them.
Reason (2), industrial expansion (combined with the AI maintaining full power), would cause extinction of humans in the absence of the AI being motivated to keep humans alive for whatever reason. However, the costs of keeping physical humans alive are extremely low: it probably only slows the industrial expansion by a small amount (and this slowdown is probably the dominant effect from a resource perspective), probably by less than 1 month and very likely less than 1 year. Delaying this long costs a pretty tiny fraction of long-run cosmic resources, probably well less than one billionth of the resources that will ultimately be accessible to the AIs in aggregate.
Thus, extremely small amounts of motivation to keep humans alive could suffice for avoiding large fractions of humans dying due to industrial expansion (depending on the exact preferences of the AI). But, extremely small amounts of motivation to keep humans alive are unlikely to suffice for substantially reducing fatalities due to (1) and (3), though large amounts of motivation (e.g., it's one of the AI's top desires/priorities/aims) could suffice for greatly reducing expected fatalities from (1) or (3).
Note that an extremely small amount of motivation wouldn't necessarily stop the AI from (e.g.) boiling the oceans and destroying the biosphere while keeping humans alive in a shelter (or potentially scanning their brains and uploading them, especially if they would consent or would consent on reflection). Preserving earth (as in, not causing catastrophic environmental damage due to industrial expansion) is more expensive than keeping physical humans alive which is more expensive than only keeping humans alive as uploads. Preserving earth still seems like it would probably cost less than one billionth of resources by default. Keeping humans alive as just uploads might cost much less, e.g. more like one trillionth of resources (and less than this is plausible depending on the details of industrial expansion).
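As a rough cross-check on the resource numbers above, here is a toy calculation. The model and the 10-billion-year horizon are my own assumptions (not from the post), and it ignores relativistic and cosmological detail; it just treats reachable resources as scaling with the cube of the remaining expansion window, so a small startup delay loses roughly the outermost shell.

```python
# A toy sanity check of the "short delay costs well under a billionth of
# resources" claim (my own model; the 10-billion-year horizon is an assumed
# round number, and relativistic/cosmological details are ignored).

HORIZON_YEARS = 10e9   # assumed effective window for reaching distant resources

def fraction_lost(delay_years: float, horizon_years: float = HORIZON_YEARS) -> float:
    """Fraction of ultimately reachable volume lost to a startup delay,
    treating reachable volume as scaling like (remaining window)^3."""
    return 1.0 - ((horizon_years - delay_years) / horizon_years) ** 3

for label, delay in [("1 month", 1 / 12), ("1 year", 1.0), ("3 years", 3.0)]:
    print(f"delay of {label:>7}: fraction of resources lost ~ {fraction_lost(delay):.1e}")
# ~2.5e-11, ~3.0e-10, ~9.0e-10 respectively with these assumptions
```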
Extremely small amounts of motivation to keep humans alive wouldn't suffice for avoiding large fractions of humans dying due to industrial expansion if:
Why might we end up with small amounts of motivation to keep humans alive?
Note that these reasons could also result in large amounts of motivation to keep humans alive.
(I won't discuss these reasons in detail in my notes due to running out of time.)
Another factor is that AIs might not want to do a (fast) industrial expansion for whatever reason (e.g., they don't care about this, they intrinsically dislike change). However, if there are multiple AI systems with some power and at least some subset want to do a fast industrial expansion this would happen in the absence of coordination or other AIs actively fighting the AIs that do an industrial expansion. Even if AIs don't want to do a fast industrial expansion, humans might be killed as a side effect of their other activities, so I don't think this makes a huge difference to the bottom line.
My guess is that small amounts of motivation are substantially more likely than not (perhaps 85% likely?), but there are some reasons why small amounts of motivation don't suffice (given above) which are around 20% likely. This means that we overall end up with around a 70% chance that small motivations are there and suffice.
However, there is some chance (perhaps 30%) of large amounts of motivation to keep humans alive (>1/1000) and this would probably overwhelm the reasons why small amounts of motivation don't suffice. Moderate amounts of motivation could also suffice in some cases. I think this cuts the reasons why small amounts of motivation don't suffice by a bit, perhaps down to 15% rather than 20% which makes a pretty small difference to the bottom line.
Then, the possibility that AIs don't want to do a fast industrial expansion increases the chance of humans remaining alive by a little bit more.
Thus, I overall think a high fraction of people dying due to industrial expansion is maybe around 25% likely and literal extinction due to industrial expansion is 15% likely. (Note that the 15% of worlds with extinction are included in the 25% of worlds where a high fraction of people die.) Perhaps around 30% of people die in expectation (15% from extinction, another 7% from non-extinction worlds where a high fraction die, and maybe another 7% or so from other worlds where only a smaller fraction die).
Active motivations to kill (most/many) humans could arise from:
Overall, active motivations to kill humans seem pretty unlikely, but not impossible. I think the "proxies from training" story is made less likely because AIs on reflection probably would endorse something less specific than caring in this way about the state of (most/many) currently alive humans. Note that the proxies from training story could result in various types of somewhat net negative universes from the longtermist perspective (though this is probably much less bad than optimized suffering / maximal s-risk).
I think this contributes a small fraction of fatalities. Perhaps this contributes an 8% chance of many fatalities and a 4% chance of extinction.
It's unclear what fraction of people die due to takeover because this is expedient for the AI, but it seems like it could be the majority of people and could also be almost no one. If AIs are less powerful, a higher fraction of people dying is more likely (because AIs would have a harder time securing a very high chance of takeover without killing more humans).
Note that the AI's odds of retaining power after executing some takeover that doesn't result in full power might be slightly or significantly increased by exterminating most people (because the AI doesn't have total dominance and is (at least) slightly threatened by humans) and this could result in another source of fatalities. (I'm including this in the "death due to takeover" category.)
Failed takeover attempts along the path to takeover could also kill many people. If there are a large number of different rogue AIs, it becomes more likely that one of them would benefit from massive fatalities (e.g. due to a pandemic) making this substantially more likely. Interventions which don't stop AI takeover in the long run could reduce these fatalities.
It's plausible that killing fewer people is actively useful in some way to the AI (e.g., not killing people helps retain human allies for some transitional period).
Conditional on not seeing extinction due to industrial expansion, I'd guess this kills around 25% of people in expectation with a 5% chance of extinction.
We have a ~25% chance of extinction conditional on AI takeover. In the 75% chance of non-extinction, around 35% of people die due to all the factors given above. So, we have an additional ~25% of fatalities for a total of around 50% expected fatalities conditional on AI takeover.
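Spelled out as arithmetic, combining only the figures stated above (no new numbers):

```python
# The bottom-line arithmetic from the paragraph above, using only the headline
# figures already stated in this post (no new estimates introduced here).

p_extinction_given_takeover = 0.25   # ~25% chance of literal extinction given takeover
frac_dead_if_not_extinct = 0.35      # ~35% of people die in non-extinction takeover worlds

expected_fraction_dead = (
    p_extinction_given_takeover * 1.0
    + (1 - p_extinction_given_takeover) * frac_dead_if_not_extinct
)
print(f"expected fraction of currently living people who die, given takeover: "
      f"{expected_fraction_dead:.2f}")   # ~0.51, i.e. around 50%
```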
I originally wrote this post to articulate why I thought the chance of literal extinction and number of expected fatalities were much higher than someone else thought, but it's also pretty relevant to ongoing discussion of the book "If anyone builds it, everyone dies". ↩︎
As in, all power of earth-originating civilizations. ↩︎
Within "death", I'm including outcomes like "some sort of modification happens to a person without their informed consent that they would consider similarly bad (or worse than death) or which effectively changes them into being an extremely different person, e.g., they are modified to have wildly different preferences such that they do very different things than they would otherwise do". This is to include (unlikely) outcomes like "the AI rewires everyone's brain to be highly approving of it" or similarly strange things that might happen if the AI(s) have strong preferences over the state of existing humans. Death also includes non-consensual uploads (digitizing and emulating someone's brain) insofar as the person wouldn't be fine with this on reflection (including if they aren't fine with it because they strongly dislike what happens to them after being uploaded). Consensual uploads, or uploads people are fine with on reflection, don't count as death. ↩︎
Concretely, literally every human in this universe is dead (under the definition of dead included in the prior footnote, so consensual uploads don't count). And, this happens within 300 years of AI takeover and is caused by AI takeover. I'll put aside outcomes where the AI later ends up simulating causally separate humans or otherwise instantiating humans (or human-like beings) which aren't really downstream of currently living humans. I won't consider it extinction if humans decide (in an informed way) to cease while they could have persisted or decide to modify themselves into very inhuman beings (again with informed consent etc.). ↩︎
As in, what would happen if we were in base reality, or, if we were in a simulation, what would happen within the simulation if it were continued in a faithful way. ↩︎
Outcomes for currently living humans which aren't death but which are similarly bad or worse than death are also possible. ↩︎