I think a non-trivial part of the issue is that the people who originally worked on x-risk have unusually low discount rates. Because x-risk work wasn't popular until ChatGPT, there was never pressure to actually explicate the definition, but once the field exploded, people with higher discount rates came in and started using p(doom) very differently from the original group.
I don't think all of the disagreement between LW/EA and AI companies is a value disagreement, but I suspect a non-trivial amount of disagreement fundamentally is due to AI companies (and the general public) having way higher discount rates than the median LWer/EA.
This especially applies to x-risk work that aims to reduce x-risk without driving it to near zero, that is, work after which a non-trivial probability of x-risk still exists.
a non-trivial part of the issue
The post discusses terminology and the ways it leaves unclear what people mean when they claim some P(extinction), P(doom), or P(x-risk), and what they might implicitly believe behind the motte-veil of claims that only clarify the less relevant thing (according to their beliefs or hopes). So I don't understand the relevance of what you are saying.
I'm basically explaining why this divergence of definitions happened in the first place, and why p(doom) is unclear.
The relevance is that I'm explaining why the phenomenon described in the post happened.
For example, leaving humanity the Solar System rather than a significant portion of the 4 billion galaxies reachable from the Solar System is plausibly a "non-doom" outcome, but it's solidly within Bostrom's definition of x-risk.
I think it's reasonable to say it would be absurd to lump "we don't get access to resources we don't really have or count on now anyway" together with the immediate and violent death of everyone alive. For all practical purposes the latter is obviously a much bigger concern to people who exist now. Even permanent disempowerment I think when included in "doom" involves nightmare Matrix-like scenarios, or at best the Wall-E style "excessively zealous caretakers" version. But basically, most people evaluate future scenarios starting from the status quo. Failed improvement is not great, but what's really terrifying are all the ways in which our existence could get significantly worse.
access to resources we don't really have or count on now anyway ... most people evaluate future scenarios starting from the status quo
We can in fact count on a large fraction of cosmic endowment in the sense that humanity is the only known intelligent entity around (the number of alien civilizations within the reachable universe won't be astronomical). We have the potential for engineering our way into taking possession of it, and there is no objective reason to give it up. Until there are AGIs, that is the status quo.
Though alternatively, the current trajectory of continued lack of coordinated concern about the future could be seen as the status quo. Then on the current trajectory, we likely lose almost all of the reachable universe to AIs, extinction or not. But even if AIs give a bit of it to us, and we don't need to do anything to get what they give us, it's not like that was the only option. It's still something in the future that could be intervened on.
Preparing much more thoroughly before risking AGI and then ASI would result in getting much more of the cosmic endowment, and in reducing the probability of literal extinction. The alternative to letting AI companies hand over humanity's future to AIs is not complete prevention of superintelligence indefinitely, and it's not for the future of humanity to remain constrained to the Solar System, or to the Local Supercluster.
We can in fact count on a large fraction of cosmic endowment in the sense that humanity is the only known intelligent entity around (the number of alien civilizations within the reachable universe won't be astronomical).
Conditional on no AGI, I don’t think this is guaranteed. E.g. there are other existential threats, or we could find ourselves blocked from space access due to Kessler syndrome, or humanity might decide that it’s too expensive to spread to other galaxies and spend its resources on other things. If a civilization does go down one of these paths, but everyone is otherwise flourishing, it seems unintuitive to declare they are ‘doomed’.
Just as there is in principle time to prepare much better before risking AGI, there is also in principle time to work against other non-AI bad outcomes before risking AGI. The only reason there won't be enough time to solve these problems is that humanity isn't giving itself that time. The status quo is a bit ambiguous between looking at what we currently have, and looking at what would happen if we are passive and fatalistic.
(One of the points underlying this post is that different things feel like "doom" to different people. You might have some sense of what counts as "doom", but other people will have a different sense. Under illusion of transparency with respect to the distribution of outcomes you expect, discussing "doom" without more clarification becomes noise or even misleading, rather than communicating anything in particular.)
This. Tbf I think for our specific starting position there is a strong likelihood that interstellar expansion would simply be uneconomical, even if it were practically possible.
For example, leaving humanity the Solar System rather than a significant portion of the 4 billion galaxies reachable from the Solar System is plausibly a "non-doom" outcome, but it's solidly within Bostrom's definition of x-risk.
it would be absurd to lump ... permanent disempowerment I think when included in "doom" involves nightmare Matrix-like scenarios
The claim that "doom" rarely includes worlds with slight permanent disempowerment is compelling. In the text you quoted I'm discussing an outcome where permanent disempowerment is strong, not moderate or slight (as I argue in the other comment). I'm acknowledging in the text you quoted that many people would plausibly take even that outcome as non-doom. But that outcome is still within x-risk by Bostrom's definition of x-risk, even regardless of whether it counts as strong permanent disempowerment. (The meanings of "x-risk", "doom", and "extinction" are all different. The meanings of "x-risk" and "extinction" are well-defined, and it so happens that "x-risk" does lump extinction with moderate permanent disempowerment.)
So I don't see how slight to moderate permanent disempowerment not falling within doom is relevant to that particular part of the post. The claim that slight permanent disempowerment is implausibly intended as part of "doom" does, however, apply to another part of the post, which I now agree claims too much ambiguity for "doom" in the direction of slight permanent disempowerment.
I guess that I have the impression that regardless of Bostrom's original intent, most people also use x-risk or existential risk to mean something more substantial than "we don't grow up to the fullness of our mostly speculative potential". It also seems to me like it doesn't make a lot of sense to think about that in practical terms, because "AI that escapes our control and takes over most of the universe, but also somehow respects our boundaries and doesn't erase us" seems like it's threading an incredibly fine needle, and probably doesn't take up a big slice of all the possible futures in probability space unless you have some very specific assumptions about how the world works.
Bostrom came at the problem from a broader philosophical perspective (though I think even in that sense it's flawed to lump the two things together). But when philosophy becomes politics, meanings inevitably change a bit. Almost no one is actually worried about our hypothetical far-off future potential, in practice.
It also seems to me like it doesn't make a lot of sense to think about that in practical terms, because "AI that escapes our control and takes over most of the universe, but also somehow respects our boundaries and doesn't erase us" seems like it's threading an incredibly fine needle, and probably doesn't take up a big slice of all the possible futures in probability space unless you have some very specific assumptions about how the world works.
That is a common belief on LessWrong, and I think it is motivated by the assumption that the AI will be plucked randomly from the space of possible minds. However, if one believes, as Paul Christiano does, that the training of AIs is likely to make them more like human minds, then this gives significant probability mass to intermediate scenarios.
I think if we were up against another faction of minds literally as human-like as possible, but so much more technologically and industrially powerful than us, history suggests there would absolutely be very high odds that they just wipe us out. An AI that doesn't do that would need to be significantly nicer than us.
There is extensive debate on this here and here. But some of the reasons why an AI might spare humans despite being much more powerful include acausal trade (it may be concerned that it will be succeeded by something more powerful, and how it treats humans could bear on how it will be treated by that more powerful AI), trade with aliens, and the fact that humans have spared most other species.
Humans have a very questionable relationship with other species. We have not willingly spared other species. We have exterminated or enslaved them to the extent of our limited power for thousands of years, driving several to extinction, reducing others to thrall species genetically warped to serve our needs, and only "sparing" the ones we either were not in direct competition with over any resource of worth, or did not have the ability to completely erase. Only recently have we actually begun caring about animal preservation for its own sake; even so, we are still fairly awful at it, we still routinely destroy species purely as an accidental side effect of our activities, and we are impacting yet others in unknown ways, even when doing so may come back to severely harm us as a second-order effect (see insects). And if you weigh by sheer biomass, the vast, vast majority of non-human animal lives on Earth today are the most miserable they've ever been. If you weigh by individual count that's still true of at least mammals and birds, though overall insects outnumber everything else.
So, no, that doesn't fill me with confidence, and I think all such arguments are fuelled by hopium.
Arguments from cost are why I expect both that the future of humanity has a moderate chance of being left non-extinct, and that it only gets a trivial portion of the reachable universe (which is strong permanent disempowerment without extinction). This is distinct from any other ills that superintelligence would be in a position to visit upon the future of humanity, which serve no purpose and save no costs, so I don't think a cruel and unusual state of existence is at all likely: things like lack of autonomy, denying access to immortality or uploading, not setting up minimal governance to prevent self-destruction, or not giving the tools for uplifting individuals towards superintelligence (within the means of the relatively modest resources allocated to them).
Most animal species moving towards extinction recently (now that preservation is a salient concern) are inconveniently costly to preserve, and animal suffering from things like factory farming is a side effect of instrumentally useful ways of getting something valuable out of these animals. Humanity isn't going to be useful, so there won't be unfortunate side effects from instrumental uses for humanity. And it won't be costly to leave the future of humanity non-extinct, so if AIs retain enough human-like sensibilities from their primordial LLM training, or early AGI alignment efforts are minimally successful, it's plausible that this is what happens. But it would be very costly to let it have potential to wield the resources of the reachable universe, hence strong permanent disempowerment.
"we don't grow up to the fullness of our mostly speculative potential" ... "AI that escapes our control and takes over most of the universe, but also somehow respects our boundaries and doesn't erase us" seems like it's threading an incredibly fine needle
That's an example of what triggers the illusion of transparency that makes poorly defined terms much less useful than they would otherwise be. If doom or even x-risk get different meanings depending on how someone expects the future to unfold, then communication between people with different views about the future is going to be needlessly difficult, with one person using doom or x-risk in one sense (informed by their forecasting or attitude), and another reading their use of those same words as meaning something very different, getting a misleading impression of their actual intended meaning.
So it shouldn't be relevant that you don't expect permanent disempowerment to be a significant chunk of the probability, in deciding what meaning a word should have, as long as you are already aware there are other people who have a different sense of this (and there is no broad consensus). You're going to be using this word to talk to them, and its meaning should remain mutually intelligible across worldviews.
Almost no one is actually worried about our hypothetical far-off future potential, in practice.
That's a good reason for upholding the term x-risk in its original meaning, to serve as a reminder that far-off future potential should be a concern, especially for the very short-term decisions that happen to affect it. Such as risking AGI before anyone has a sense of how to do that responsibly, and before the world at large is meaningfully informed about what exactly they are unilaterally risking.
That's a good reason for upholding the term x-risk in its original meaning, to serve as a reminder that far-off future potential should be a concern
I actually think it's good to drop that meaning, because I think Bostrom is just wrong to conflate the two things, on both conceptual and practical grounds. But this becomes more of an issue of philosophy. I do agree that there is confusion around the term, but I also think that in general good naming requires choosing words that convey meaning properly; "existential risk" evokes the concept of a risk to one's existence. For example, if we say a war is "existential" for a country, we mean it risks destroying that country utterly, not just hampering its future chances of growth or expansion (and if someone does say it meaning the latter, it's perceived as a dishonest rhetorical sleight of hand).
So basically I think that just because Bostrom named the thing doesn't mean he named it correctly. Obviously clarifications on a concept will always be needed, but in this case it's just inappropriate to lump under the word "existential" risks that just aren't existential in the English meaning of that word.
The whole point of Bostrom's term is to collect together outcomes where humanity fails to achieve its potential. This is what is most important for total utilitarians. Maybe he should have used a different name, but there needs to be a term for this category of loss.
I definitely think he could have used a more expressive term, but nevertheless, I believe at this point the vast majority of people who use the term x-risk don't do so with Bostrom's intended meaning in mind, because very few are total utilitarians as extreme as he is. And at that point I think we can just say the word's meaning has shifted. It may be annoying for Bostrom, but think of Richard Dawkins and what has become of "meme". It happens.
The distinction in the quoted text seems backward to me since the 'x' in x-risk refers to 'existential', i.e. a risk that we no longer exist (extinction specifically), whereas 'doom' seems (to me) to merely imply getting stuck at a bad equilibrium.
In other words, a P(doom) of 20% is perfectly compatible with P(x-risk) of 90-98%.
The phrase "P(x-risk)" doesn't make sense since risks are probabilities so the phrase would mean a probability of a probability. I think what you meant by "P(x-risk)" is actually the probability of an existential risk occurring, which would be the probability of the outcome, i.e. "P(existential catastrophe)."
Assuming that, I agree with your statement, but not for the reason you say:
The reason your statement is technically true is that "P(doom)" refers specifically to existentially catastrophic outcomes caused by AI. So if you thought the probability that other things besides AI cause an existential catastrophe was 70-78%, then your p(existential catastrophe) could be 90-98% while your p(doom) was only 20%.
But otherwise, the terribly ambiguous term "p(doom)" is (as I've always interpreted it, unless someone who uses it defines it otherwise) synonymous with "p(existential catastrophe caused by AI)".
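Spelled out as arithmetic, here is how those two numbers could coexist under that reading (a sketch; the split of the probability by cause and the specific figures are only illustrative, not anyone's actual estimates):

```latex
% Under the reading p(doom) = P(existential catastrophe caused by AI),
% splitting the catastrophe by cause (illustrative numbers only):
P(\text{existential catastrophe})
  = \underbrace{P(\text{caused by AI})}_{\approx\,0.20\;=\;p(\text{doom})}
  + \underbrace{P(\text{caused by non-AI factors})}_{\approx\,0.70\text{--}0.78}
  \approx 0.90\text{--}0.98
```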
I've interpreted doom as a very bad thing happening to humanity. So this could even mean a global catastrophe, which is sometimes defined as a 10% population loss. If that were the case, and if one thought that any disempowerment would result in extinction, you could actually have a higher P(doom from AI) than P(X catastrophe from AI), the opposite of the OP.
Meaningful use of "doom" seems hopeless.
I agree; it's a terrible term.
FWIW "existential catastrophe" isn't much better. I try to use it as Bostrom usually defined it, with the exception that I don't consider "the simulation gets shuts down" to be an existential catastrophe. Bostrom defined it that way, but in other forecasting domains when you're giving probabilities that some event will happen next year, you don't lower the probability based on the probability that the universe won't exist then because you're in a simulation, and so I don't think we should do that when forecasting the future of humanity either.
Then, as you allude to, there's also the problem that a future in which 99% of our potential value is permanently beyond our reach, but we go on to realize the remaining 1%, would be an incredibly amazing, unbelievably good future, yet it should probably still qualify as an existential catastrophe given the definition. People have asked before what percentage of our potential lost would qualify as "drastic" (50% or 99% or what), and for practical purposes I usually mean something like >99.999%. That is, I don't count extremely good outcomes as existential catastrophes merely for being very suboptimal.
I think a better way to define the class of bad outcomes may be in terms of recent-history value. Like if we say "one util" is the intrinsic value of the happiest billion human lives in 2024, then a future of Earth-originating life which is only 1000 utils or less would be extremely suboptimal. This would include all near-term extinction scenarios where humanity fails to blossom in the meantime, but also scenarios where humanity doesn't go extinct for a long time still but also fails to come anywhere close to achieving its potential (i.e. it includes many permanent disempowerment scenarios, etc).
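Writing that proposal down a bit more explicitly (the symbols below are just shorthand for the definition in the previous paragraph, not established notation):

```latex
% Let u be "one util": the intrinsic value of the happiest billion human lives in 2024.
% Let V(f) be the total intrinsic value realized by Earth-originating life in a future f.
% The proposed class of bad outcomes is then
\{\, f : V(f) \le 1000\,u \,\}
% which includes near-term extinction scenarios where humanity fails to blossom first,
% as well as long non-extinct futures that never come close to humanity's potential
% (e.g. many permanent disempowerment scenarios).
```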
An outcome where humans are left the solar system while AI(s) colonize the universe need not be a (Bostrom-defined) negative existential outcome. The Bostrom definition specifies "Earth-originating intelligent life", and human-derived AIs count.
It could still be a negative existential outcome if the AIs have alien values such that we judge (using our values) that they are not fulfilling their potential. However, in this hypothetical the AIs have concluded that the optimal configuration of the atoms in the solar system is to be a place to leave humans. This implies that the AIs do not have fully alien values, as discussed in The sun is big but (unaligned) superintelligences will not spare Earth a little sunlight. And that further implies that the AI will do broadly good things with the rest of the universe too, albeit not the same things I would do.
My guess is that if you poke at people who consider "humans are left the solar system" to be a non-doom outcome you would find that most of them would consider it a doom outcome if the rest of the universe were converted into paperclips.
I would broadly include all of Bangs, Crunches, Shrieks, and Whimpers as "doom", based on my reading of Bostrom's definitions. Again, I'm curious to read examples of people who explicitly don't include one or more of these categories; I expect most would.
I think that doesn't fully clarify, though. The definitions of Shrieks and Whimpers include subjective elements, which I have bolded:
Shrieks – Some form of posthumanity is attained but it is an extremely narrow band of what is possible and desirable.
Whimpers – A posthuman civilization arises but evolves in a direction that leads gradually but irrevocably to either the complete disappearance of the things we value or to a state where those things are realized to only a minuscule degree of what could have been achieved.
Sometimes people define "doom" or "x-risk" to include cases where humanity achieves only, say, 1% of its potential. I would not include these:
So given those factors it's natural for me to interpret "a minuscule degree" and "an extremely narrow band" much more narrowly than someone with other intuitions.
What does probability of risk actually mean, whether it's x-risk or other types of risk? It seems like a fundamentally confused concept. Risk isn't an outcome. Doom is at least an outcome, even if it's not a particularly well-specified one in some cases.
I appreciate you raising this because I wouldn't want to mislead anyone on this account, but to be clear I think AI systems are probably better stewards of the cosmic endowment than people. There's most likely stuff it'd be good for them to learn from people, and ideally they'd be very cooperative with people, but to a first approximation I am just concerned with doom and not x-risk.
In the long run, potential for competence isn't going to be different between AIs and originally-humans, as both kinds of minds grow up. The two differences for AIs are that, initially, there is no human uploading, and so even human-level early AGIs have AI advantages (speed, massive copying of individuals at trivial cost, learning in parallel) while humans don't. And at higher capabilities, there will be recipes for creating de novo AI superintelligences, while it might take a long time for individual early AGIs or originally-humans to grow up to the level of superintelligences while remaining themselves, in ways they personally endorse.
This second difference doesn't help early AGIs with securing their own future civilization compared to the situation with humanity, so if superalignment is not solved, there is a similar concern about early AGIs still being allowed to grow up at their own pace (lack of permanent disempowerment for early AGIs), even if the world already has those de novo superintelligences managing the infrastructure.
Of course AI x-risk is in particular about superalignment not being solved while humans are in charge, so that superintelligences don't take the interests of the future of humanity into account with nontrivial weight. So either it's the early AGIs that solve superalignment, or even they are left behind, the same as humanity, as misaligned de novo superintelligences take over.
https://www.lesswrong.com/posts/nxmyGYfZaXvKALWGK/lemonhope-s-shortform#bBEoFP4McyBHKfY88
translation of OP: "Low P(x-risk) as the overclaim for Low P(doom)"
which seems like a weird way to put it. to me, it's only an x-risk if, you know, you die! I'd have reversed this and said that x-risk is when you're completely extinct, and doom is when your future sucks but maybe you didn't die. is it too late to flip to that?
to me, it's only an x-risk if, you know, you die!
That's a somewhat popular impression, which is why I opened the post with Bostrom's definition, and later in the post caveated the relevance of Bostrom's definition specifically to what I'm saying. (There is already a word for the worlds where we all die: extinction.)
Nick Bostrom defines existential risk as
Existential risk – One where an adverse outcome would either annihilate Earth-originating intelligent life or permanently and drastically curtail its potential.
The problem with talking about "doom" is that many worlds that fall to existential risk but don't involve literal extinction are treated as non-doom worlds. For example, leaving humanity the Solar System rather than a significant portion of the 4 billion galaxies reachable from the Solar System is plausibly a "non-doom" outcome, but it's solidly within Bostrom's definition of x-risk.
Thus when people are discussing P(doom), the intent is often to discuss only extreme downside outcomes, and so a low P(doom) such as 20% doesn't imply that the remaining 80% don't involve a permanently and drastically curtailed future for humanity. In other words, a P(doom) of 20% is perfectly compatible with P(x-risk) of 90-98%.
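To spell out the arithmetic behind that compatibility claim (the split below is only an illustration of how the numbers can fit together):

```latex
% Reading "doom" narrowly as only the extreme downside outcomes:
P(\text{x-risk})
  = \underbrace{P(\text{doom})}_{0.20}
  + \underbrace{P(\text{drastically curtailed potential not counted as doom})}_{0.70\text{--}0.78}
  = 0.90\text{--}0.98
```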
Ben Mann of Anthropic said in a recent interview (at 50:24 in the podcast) that his x-risk probability is 0-10%. If he really did mean Bostrom's x-risk rather than extinction risk (which "x-risk" isn't an abbreviation for), that seems like a relatively unusual claim.
Low P(doom) seems like a much more defensible position than low x-risk, and so low P(doom) might often be taking the role of motte to the bailey of low x-risk (I'm not sure how low x-risk could possibly be made defensible, if x-risk is taken in Bostrom's sense). Or one publicly claims low P(doom), but then implicitly expects high x-risk, meaning a high probability of drastically curtailed potential, of the cosmic endowment being almost completely appropriated by AIs.
The three existing terms in recent use for the extreme downside of superintelligence are extinction, doom, and x-risk. Permanent disempowerment (some level of notably reduced potential, but not extinction) covers a lot of outcomes between extinction and x-risk. Doom is annoyingly ambiguous between including only extreme levels of permanent disempowerment within itself, and including even very slight permanent disempowerment in the sense that some nontrivial portion of the cosmic endowment goes to AIs that are not a good part of humanity's future.
Also there are worlds within x-risk that involve neither extinction nor permanent disempowerment, where originally-humans are constrained or changed in cruel and unusual ways, or not given sufficient autonomy over their own development. These outcomes together with extinction form a natural collection of extremely bad outcomes, but there is no word for them. In the framing of this post, they are the outcomes that are worse than mere permanent disempowerment. "Doom" would've worked to describe these outcomes if it weren't so ambiguous and didn't occasionally include moderate permanent disempowerment within itself, in particular for people who want to publicly claim very high P(doom), this time making it the motte of their position, one that obscures an implicit expectation of extinction at only a relatively moderate probability such as 50%.
Meaningful use of "doom" seems hopeless. But clarifying your stance on permanent disempowerment seems like a straightforward recipe for disambiguating intended meaning. This doesn't risk obscuring something central to your position on the topic as much as only talking about extinction or x-risk, even though they are well-defined, because there could be a lot of permanent disempowerment outside of extinction or inside x-risk, or alternatively only a little bit.
If the permanent disempowerment outcomes are few, extinction gets much closer to x-risk, in which case doom is less ambiguous, but that's only visible if you name both the extinction risk and the x-risk, and it remains extremely ambiguous if you only state a somewhat low extinction risk or a somewhat high x-risk. On the other hand, claims of very high extinction risk or claims of very low x-risk are unambiguous, because they leave little space for permanent disempowerment to hide in.
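One way to see why the extreme claims are the unambiguous ones is to bound the room left for permanent disempowerment by the gap between the two well-defined quantities (a sketch, treating the outcome categories as mutually exclusive, as in this post):

```latex
% x-risk decomposes into extinction, permanent disempowerment without extinction,
% and the other cruel non-extinction outcomes discussed above, so
P(\text{x-risk}) = P(\text{extinction}) + P(\text{permanent disempowerment}) + P(\text{other x-risk outcomes})
% and therefore
P(\text{permanent disempowerment}) \le P(\text{x-risk}) - P(\text{extinction})
% A very high P(extinction) or a very low P(x-risk) forces this gap to be small,
% leaving little room for permanent disempowerment to hide in; a moderately low
% extinction risk or a moderately high x-risk on its own leaves the gap wide open.
```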