(I'm writing this mainly so I can reference it from another article I'm working on, which keeps that article from getting even longer.)

Currently, my best guess is that P(AI doom by 2100) ≈ 20%. That is, there’s a 20% chance that strong AIs will pose an existential challenge to humanity. But “existential challenge” is academic phrasing that doesn’t drive home what this really means: there’s a 20% chance that AI will either literally eliminate humanity or permanently disempower it from choosing its future as a species, plausibly within our lifetimes.

Why 20%?

I don’t think it’s possible to calculate this exactly. Still, giving a number is more meaningful than just saying “wellll maybe yes, maybe no”, because a number lets you start discussing “Why 20%? Why not 80%? Why not 1%?”

So here's why:

My main reason is that I can find no strong counterargument that resolves all the challenges of aligning AIs. This seems to be a fundamentally hard problem, possibly one of the hardest we’ll ever face, if it’s solvable at all. I don't currently see any permanent solution that holds up against all the challenges of aligning AGIs/ASIs.

So I start from a default of 100% and then adjust downward for a few reasons:

  1. -15%: I am fallible, and there’s a chance my specific arguments are wrong. Even though notable experts are increasingly speaking out about AI risk, there are still smart people who disagree with both me and them. Almost no argument is bulletproof.
  2. -20%: I don’t know everything about the topic, and there could be some big part that I'm missing. I hope so!
  3. -30%: We might wake up (possibly because of an early, large-scale catastrophe) and 1) invest the right amount of effort, for long enough, into figuring out alignment and 2) slow capability gains enough that alignment significantly exceeds capabilities by the time we hit AGI.
  4. -15%: Maybe permanent alignment turns out to be easy in a really unexpected and lucky way. Maybe?
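Spelled out, the list above treats each adjustment as a simple percentage-point deduction from the 100% starting point, which is how the 20% estimate falls out:

```latex
100\% - 15\% - 20\% - 30\% - 15\% = 20\%
```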

Here's what worries me: a really big chunk of the probability of "humanity survives" comes from some form of "the AI existential risk arguments turn out to be wrong", and currently I'm not seeing where they could realistically be wrong. If I keep learning more about alignment but keep failing to find a way out of the hard problems, the probability I can assign to "I turn out to be wrong" keeps shrinking. Under the basic "model" above, the core probability of "humanity wakes up and figures it out" is 30%. That's not very reassuring! But our best hope really lies in that outcome space: we speed up alignment research by 10-100x and simultaneously slow down capability gain 3-5x until alignment comfortably exceeds capabilities.

Could I have picked 10% or 30%? I think both of these numbers are defensible. I increasingly don't think the lower end, like 5%, is defensible, but if 5% is your foothold into AI safety, one that lets you say "I'm concerned about the possibility but not ready to say I'm confident in the topic", then by all means, pick 5% as your starting estimate of AI risk. Even a 5% chance of wiping out the sum total value of humanity's future is more than enough reason to spend a lot of time, resources, and effort avoiding that outcome.


1 comment:

If you want counterarguments, here's one good place to look: Object-Level AI Risk Skepticism - LessWrong

I expect we might get more today, as it's the deadline for the Open Philanthropy AI Worldview Contest.