I’m often asked: “What’s the probability of a really bad outcome from AI?”

There are many different versions of that question, with different answers. In this post I’ll try to answer a bunch of them in one place.

Two distinctions

Two distinctions often lead to confusion about what I believe:

  • One distinction is between dying (“extinction risk”) and having a bad future (“existential risk”). I think there’s a good chance of bad futures without extinction, e.g. that AI systems take over but don’t kill everyone.
  • An important subcategory of “bad future” is “AI takeover”: an outcome where the world is governed by AI systems, and we weren’t able to build AI systems that share our values or care a lot about helping us. This need not result in humans dying, and it may not even be an objectively terrible future. But it does mean that humanity gave up control over its destiny, and I think in expectation it’s pretty bad.
  • A second distinction is between dying now and dying later. I think there’s a good chance that we don’t die from AI, but that AI and other technologies greatly accelerate the rate of change in the world, so that something else kills us shortly afterward. I wouldn’t call that death “from AI,” but it does happen soon in calendar time, and I’m not sure the distinction is comforting to most people.

Other caveats

I’ll give my beliefs in terms of probabilities, but these really are just best guesses — the point of numbers is to quantify and communicate what I believe, not to claim I have some kind of calibrated model that spits out these numbers.

Only one of these guesses is even really related to my day job (the 15% probability that AI systems built by humans will take over). For the other questions I’m just a person who’s thought about them a bit in passing. I wouldn’t recommend deferring to the 15%, but I definitely wouldn’t recommend deferring to any of the others.

A final source of confusion is that I give different numbers on different days. Sometimes that’s because I’ve considered new evidence, but usually it’s just because these numbers are an imprecise quantification of my beliefs, which shift from day to day. One day I might say 50%, the next I might say 66%, the next I might say 33%.

I’m giving percentages but you should treat these numbers as having 0.5 significant figures.

My best guesses

Probability of an AI takeover: 22%

  • Probability that humans build AI systems that take over: 15%
    (Including anything that happens before human cognitive labor is basically obsolete.)
  • Probability that the AI we build doesn’t take over, but that it builds even smarter AI and there is a takeover some day further down the line: 7%

Probability that most humans die within 10 years of building powerful AI (powerful enough to make human labor obsolete): 20%

  • Probability that most humans die because of an AI takeover: 11%
  • Probability that most humans die for non-takeover reasons (e.g. more destructive war or terrorism) either as a direct consequence of building AI or during a period of rapid change shortly thereafter: 9%

Probability that humanity has somehow irreversibly messed up our future within 10 years of building powerful AI: 46%

  • Probability of AI takeover: 22% (see above)
  • Additional extinction probability: 9% (see above)
  • Probability of messing it up in some other way during a period of accelerated technological change (e.g. driving ourselves crazy, creating a permanent dystopia, making unwise commitments…): 15%
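
These headline numbers are just sums of the disjoint sub-cases listed above. Here is a minimal sketch of that bookkeeping, treating the point estimates as exact (which, per the caveats above, they are not):

```python
# Sanity check: each headline estimate is the sum of its disjoint sub-cases.
# The numbers are the point estimates from this post, treated as exact here.

p_takeover_by_human_built_ai = 0.15
p_takeover_further_down_line = 0.07
p_takeover = p_takeover_by_human_built_ai + p_takeover_further_down_line
assert abs(p_takeover - 0.22) < 1e-9

p_most_die_from_takeover = 0.11
p_most_die_non_takeover = 0.09
p_most_die_within_10y = p_most_die_from_takeover + p_most_die_non_takeover
assert abs(p_most_die_within_10y - 0.20) < 1e-9

p_messed_up_some_other_way = 0.15
p_irreversibly_messed_up = p_takeover + p_most_die_non_takeover + p_messed_up_some_other_way
assert abs(p_irreversibly_messed_up - 0.46) < 1e-9

print(p_takeover, p_most_die_within_10y, p_irreversibly_messed_up)
```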

Comments (33)

I made a visualization of Paul's guesses to better understand how they overlap:

https://docs.google.com/spreadsheets/d/1x0I3rrxRtMFCd50SyraXFizSO-VRB3TrCRxUiWe5RMU/edit#gid=0

I made another visualization using a Sankey diagram, which handles the cases where we don't really know how things split (different takeover scenarios) and lets you recombine probabilities at the end (for "most humans die within 10 years").

More geometric (but less faithful):

Interesting that you think the risk of us "going crazy" in some way after getting AI is roughly comparable to the overall AI takeover risk. I'd be interested to hear more if you have more detailed thoughts here. On this view it also seems like it could be a great x-risk-reduction opportunity if there are any tractable strategies, given how neglected it is compared to takeover risk.

I think there are some conditional and unconditional probabilities that are worth estimating to distinguish between disagreements about:

  • the technical nature of intelligence and alignment, as technical problems
  • how the future is likely to play out, given geopolitical considerations and models of how people and organizations with power are likely to act, effectiveness of AI governance efforts, etc.

The conditional probability is the risk of extinction / takeover / disempowerment, given that AI governance is nonexistent or mostly ineffective. In notation, p(doom | no societal shift or effective intervention).

My estimate (and I think the estimate of many other doom-ier people) is that this probability is very high (95%+) - it is basically overdetermined that things will go pretty badly, absent radical societal change. This estimate is based on an intuition that the technical problem of building TAI seems to be somewhat easier than the technical problem of aligning that intelligence.

The next probability estimate is p(relevant actors / government / society etc. react correctly to avert the default outcome).

This probability feels much harder to estimate and shifts around much more, because it involves modeling human behavior on a global scale, in the face of uncertain future events. Worldwide reaction to COVID was a pretty big negative update; the developing AI race dynamics between big organizations is another negative update; some of the reactions to the FLI letter and the TIME article are small positive updates.

This probability also depends on the nature of the technical problem - for example, if aligning a superintelligence is harder than building one, but not much harder, then the size and precision of the intervention needed for a non-default outcome is probably a lot smaller.

Overall, I'm not optimistic about this probability, but like many, I'm hesitant to put down a firm number. I do think, though, that in worlds where things do not go badly, lots of things look pretty radically different from how they do now. And we don't seem to be on track for that to happen.

(You can then get an unconditional p(doom) by multiplying these two probabilities (or the appropriate complements) together. But since my estimate of the first conditional probability is very close to 1, shifts and updates come entirely from the fuzzy / shifty second probability.)
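
Spelled out, that combination is just the law of total probability (a sketch; the only number pinned down above is the ~95% conditional, so everything else is left symbolic):

```latex
\[
P(\text{doom})
  = P(\text{doom} \mid \text{ineffective response}) \cdot \bigl(1 - P(\text{effective response})\bigr)
  + P(\text{doom} \mid \text{effective response}) \cdot P(\text{effective response})
\]
```

With P(doom | ineffective response) estimated above at 95%+ and the second term treated as comparatively small, this reduces to roughly P(doom) ≈ 1 - P(effective response), which is why the updates come almost entirely from the second, fuzzier probability.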

I'm somewhat optimistic that AI takeover might not happen (or might be very easy to avoid) even given no policy interventions whatsoever, i.e. that the problem is easily enough addressed that it can be done by firms in the interests of making a good product and/or based on even a modest amount of concern from their employees and leadership. Perhaps I'd give a 50% chance of takeover with no policy effort whatsoever to avoid it, compared to my 22% chance of takeover with realistic efforts to avoid it.

I think it's pretty hard to talk about "no policy effort whatsoever," or to distinguish voluntary measures from government regulation, or so on. So it's not totally clear what the "conditioned on no intervention" number means and I think that's actually a pretty serious ambiguity.

That said, I do think my 50% vs. your 95% points at a real disagreement. I feel like I have very little idea how real a problem takeover will be, and have so far been unpersuaded by arguments that takeover is a very strong default. If you are confident that it's a real problem that will be hard to fix, it might be reasonable to just double my takeover probabilities to take that into account.

Actually I think my view is more like 50% from AI systems built by humans (compared to 15% unconditionally), if there is no effort to avoid takeover.

If you continue assuming "no effort to avoid takeover at all" into the indefinite future then I expect eventual takeover is quite likely, maybe more like 80-90% conditioned on nothing else going wrong, though in all these questions it really matters a lot what exactly you mean by "no effort" and it doesn't seem like a fully coherent counterfactual.

To clarify, the conditional probability in the parent comment is not conditioned on no policy effort or intervention, it's conditional on whatever policy / governance / voluntary measures are tried being insufficient or ineffective, given whatever the actual risk turns out to be.

If a small team hacking in secret for a few months can bootstrap to superintelligence using a few GPUs, the necessary level of policy and governance intervention is massive. If the technical problem has a somewhat different nature, then less radical interventions are plausibly sufficient.

I personally feel pretty confident that:

  • Eventually, and maybe pretty soon (within a few years), the nature of the problem will indeed be that it is plausible a small team can bootstrap to superintelligence in secret, without massive resources.
  • Such an intelligence will be dramatically harder to align than it is to build, and this difficulty will be non-obvious to many would-be builders.

And believe somewhat less confidently that:

  • The governance and policy interventions necessary to robustly avert doom given these technical assumptions are massive and draconian.
  • We are not on track to see such interventions put in place.

Given different views on the nature of the technical problem (the first two bullets), you can get a different level of intervention which you think is required for robust safety (the third bullet), and different estimate that such an intervention is put in place successfully (the fourth bullet).

I think it's also useful to think about cases where policy interventions were (in hindsight) obviously not sufficient to prevent doom robustly, but by luck or miracle (or weird anthropics) we make it through anyway. My estimate of this probability is that it's really low - on my model, we need a really big miracle, given actually-insufficient intervention. What "sufficient intervention" looks like, and how likely we are to get it, I find much harder to estimate.

Are you assuming that avoiding doom in this way will require a pivotal act? It seems that, absent policy intervention and societal change, even if some firms exhibit a proper amount of concern, many others will not.

It's unclear whether some people being cautious and some people being incautious leads to an AI takeover.

In this hypothetical, I'm including AI developers selling AI systems to law enforcement and militaries, which are used to enforce the law and win wars against competitors using AI. But I'm assuming we wouldn't pass a bunch of new anti-AI laws (and that AI developers don't become paramilitaries).

i.e. that the problem is easily enough addressed that it can be done by firms in the interests of making a good product and/or based on even a modest amount of concern from their employees and leadership

I'm curious how contingent this prediction is on (1) timelines and (2) the rate of alignment research progress. On (2), how much of your P(no takeover) comes from expectations about future research output from ARC specifically?

If tomorrow, all alignment researchers stopped working on alignment (and went to become professional tennis players or something) and no new alignment researchers arrived, how much more pessimistic would you become about AI takeover?

These predictions are not very related to any alignment research that is currently occurring. I think it's just quite unclear how hard the problem is, e.g. does deceptive alignment occur, do models trained to honestly answer easy questions generalize to hard questions, how much intellectual work are AI systems doing before they can take over, etc.

I know people have spilled a lot of ink over this, but right now I don't have much sympathy for confidence that the risk will be real and hard to fix (just as I don't have much sympathy for confidence that the problem isn't real or will be easy to fix).

Do you consider S-risk relevant too? If so, what probability do you assign to it under the current ML paradigm?

Quickly sketching out some of my views - deliberately quite basic because I don't typically try to generate very accurate credences for this sort of question:

  • When I think about the two tasks "solve alignment" and "take over the world", I feel pretty uncertain which one will happen first. There are a bunch of considerations weighing in each direction. On balance I think the former is easier, so let's say that conditional on one of them happening, 60% that it happens first, and 40% that the latter happens first. (This may end up depending quite sensitively on deployment decisions.)
  • There are also a bunch of possible misuse-related catastrophes. In general I don't see particularly strong arguments for any of them, but AIs are just going to be very good at generating weapons etc, so I'm going to give them a 10% chance of being existentially bad.
  • I feel like I have something like 33% Knightian uncertainty when reasoning in this domain - i.e. that there's a 33% chance that what ends up happening doesn't conform very well to any of my categories.
  • So if we add this up we get numbers something like: P(takeover) = 24%, P(doom) = 31%, P(fine) = 36%, P(???) = 33%
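
One possible composition that reproduces these numbers (a sketch only; not necessarily how they were actually derived - it assumes misuse is independent of the alignment-vs-takeover race and that "doom" includes takeover):

```python
# A hypothetical reconstruction of how the percentages above might combine.
# Set aside 33% Knightian uncertainty; within the remaining 67%, apply the
# 60/40 split on "alignment solved first" vs "takeover first" and an
# independent 10% chance of existentially bad misuse.

p_in_model = 1 - 0.33          # outcomes that fit the listed categories
p_takeover_first = 0.40
p_misuse_xrisk = 0.10

p_takeover = p_in_model * p_takeover_first * (1 - p_misuse_xrisk)
p_doom = p_in_model * (p_takeover_first * (1 - p_misuse_xrisk) + p_misuse_xrisk)
p_fine = p_in_model * (1 - p_takeover_first) * (1 - p_misuse_xrisk)
p_unknown = 0.33

print(round(p_takeover, 2), round(p_doom, 2), round(p_fine, 2), p_unknown)
# -> 0.24 0.31 0.36 0.33
```

On this reading P(doom) already includes P(takeover), which is why P(doom) + P(fine) + P(???) = 100% even though the four listed numbers don't.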

I think that in about half of the takeover scenarios I'm picturing, many humans end up living pretty reasonable lives, and so the probability of everyone dying ends up more like 15-20% than 30%.

I notice that many of these numbers seem pretty similar to Paul's. My guess is mostly that this reflects the uninformativeness of explicit credences in this domain. I usually don't focus much on generating explicit credences because I don't find them very useful.

Compared with Paul, I think the main difference (apart from the Knightian uncertainty bit) is that I'm more skeptical of non-takeover xrisk. I basically think the world is really robust and it's gonna take active effort to really derail it.

I think the substance of my views can be mostly summarized as:

  • AI takeover is a real thing that could happen, not an exotic or implausible scenario.
  • By the time we build powerful AI, the world will likely be moving fast enough that a lot of stuff will happen within the next 10 years.
  • I think that the world is reasonably robust against extinction but not against takeover or other failures (for which there is no outer feedback loop keeping things on the rails).

I don't think my credences add very much except as a way of quantifying that basic stance. I largely made this post to avoid confusion after quoting a few numbers on a podcast and seeing some people misinterpret them.

Yepp, agree with all that.

Probability that humanity has somehow irreversibly messed up our future within 10 years of building powerful AI: 46%

What's a short phrase that captures this? I've been using "AI-related x-risk" or just "AI x-risk" or "AI risk" but it sounds like you might disagree with using some or all of these phrases for this purpose (since most of this 46% isn't "from AI" in your perspective)?

(BTW it seems that we're not as far apart as I thought. My own number for this is 80-90% and I thought yours was closer to 20% than 50%.)

Maybe x-risk driven by explosive (technological) growth?

Edit: though some people think the AI point of no return might happen before the growth explosion.

What does the distribution of these non-death dystopias look like? There’s an enormous difference between 1984 and maximally efficient torture; for example, do you have a rough guess of what the probability distribution looks like if you condition on an irreversibly messed up but non-death future?

[...] bad futures without extinction, e.g. that AI systems take over but don’t kill everyone.

What probability would you assign to humans remaining but not being able to kill themselves; i.e., to inescapable dystopias (vs. dystopias whose badness for any individual is bounded by death-by-suicide)?

Do these numbers take into account that we will do our best at AI risk prevention, or are they expressing a probability conditional on no prevention happening?

These are all-things-considered estimates, including the fact that we will do our best to prevent AI risk.

Interesting. Your numbers imply a pretty good chance of everyone not dying soon after an AI takeover. I'd imagine that's either from a slow transition period in which humans are still useful, or from partially aligned AI. Partially successful alignment isn't discussed much; it's generally been assumed that either we'll get alignment right or we won't.

But it seems much more possible to get partial alignment with systems based on deep networks with complex representations. These might be something like an AI that won't kill humans but will let us die out, or more subtle or arbitrary mixes of aligned and unaligned behavior.

That's not particularly helpful, but it does point to a potentially important and relatively unaddressed question: how precise (and stable) does alignment need to be to get good results?

If anyone could point me to work on partial alignment, or the precision necessary for alignment, I'd appreciate it.

The probability of human survival is primarily driven by AI systems caring a small amount about humans (whether due to ECL, commonsense morality, complicated and messy values, acausal trade, or whatever; I find all of those plausible).

I haven't thought deeply about this question, because a world where AI systems don't care very much about humans seems pretty bad for humans in expectation. I do think it matters whether the probability we all literally die is 10% or 50% or 90%, but it doesn't matter very much to my personal prioritization.


Here's a question:

If you were a superforecaster and reviewed all the public information and thought really hard, how much would you expect the probabilities to move? Of course it doesn't make sense to predict which direction, but could you estimate the magnitude of the change? Vaguely like a “VIX” on your beliefs. Or, to phrase it another way: how many bits of information do you think are missing but are in theory available right now?

This question is basically asking how much of this is your own imprecision vs what you think nobody knows.

Probability that most humans die because of an AI takeover: 11%

This 11% is for "within 10 years" as well, right?

Probability that the AI we build doesn’t take over, but that it builds even smarter AI and there is a takeover some day further down the line: 7%

Does "further down the line" here mean "further down the line, but still within 10 years of building powerful AI"? Or do you mean it unqualified?

I think almost all the cumulative takeover probability is within 10 years of building powerful AI. Didn't draw the distinction here, but my views aren't precise enough to distinguish.

When you say that you'd give different probability estimates on different days, do you think you can represent that as you sampling on different days from a probability distribution over your "true" latent credence? If yes, do you think it would be useful to try to estimate what that distribution looks like, and then report the mean or perhaps the 90% CI or something like that? So for example, if your estimate typically ranges between 33% and 66% depending on the day with a mean of say 50%, then instead of reporting what you think today (the equivalent of taking a single random sample from the distribution), maybe you could report 50% because that's your mean and/or report that your estimate typically ranges from 33% to 66%.
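
For concreteness, a minimal sketch of the kind of summary being suggested, with made-up daily numbers (the 33%/50%/66% example above plus a few hypothetical extra days):

```python
import numpy as np

# Hypothetical daily credence reports; not anyone's actual numbers.
daily_estimates = np.array([0.33, 0.50, 0.66, 0.45, 0.55, 0.40, 0.60])

mean = daily_estimates.mean()
lo, hi = np.percentile(daily_estimates, [5, 95])  # empirical 90% interval

print(f"mean: {mean:.2f}, 90% interval: [{lo:.2f}, {hi:.2f}]")
```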

Epistemic Status: First read. Moderately endorsed.

I appreciate this post and I think it's generally good for this sort of clarification to be made.

 

One distinction is between dying (“extinction risk”) and having a bad future (“existential risk”). I think there’s a good chance of bad futures without extinction, e.g. that AI systems take over but don’t kill everyone.

This still seems ambiguous to me. Does "dying" here mean literally everyone? Does it mean "all animals," "all mammals," "all humans," or just "most humans"? If it's all humans dying, do all humans have to be killed by the AI? Or is it permissible that (for example) the AI leaves N people alive, and N is low enough that human extinction follows at the end of these people's natural lifespans?

I think I understand your sentence to mean "literally zero humans exist X years after the deployment of the AI as a direct causal effect of the AI's deployment."

It's possible that this specific distinction is just not a big deal, but I thought it was worth noting.

I think these questions are all still ambiguous, just a little bit less ambiguous.

I gave a probability for "most" humans killed, and I intended P(>50% of humans killed). This is fairly close to my estimate for E[fraction of humans killed].
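
A toy illustration of why those two quantities can be close, under the (hypothetical) assumption that outcomes are roughly bimodal - either few humans are killed or most are:

```python
# Toy bimodal model with made-up numbers, only to illustrate the point above:
# with probability 0.2 roughly 90% of humans are killed, otherwise roughly 1%.
p_most_die = 0.20
expected_fraction_killed = p_most_die * 0.90 + (1 - p_most_die) * 0.01
print(expected_fraction_killed)  # ~0.188, close to P(>50% killed) = 0.20
```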

I think if humans die it is very likely that many non-human animals die as well. I don't have a strong view about the insects and really haven't thought about it.

In the final bullet I implicitly assumed that the probability of most humans dying for non-takeover reasons shortly after building AI was very similar to the probability of human extinction. I was being imprecise; I think that's kind of close to true, but I'm not sure exactly what my view is.