The PauseAI (⏸️) movement often gets this pushback:

“You're not factoring in all the benefits of good AI!”

“Stopping AI progress is also a doom scenario!”

To which I reply: if you agree that P(doom) from building superintelligent AI before we know how to align or control it is 5% or more, try doing the basic expected-value calculation; you'll see why these objections are misguided.

First, we need to estimate a few key probabilities and values. These can vary by many orders of magnitude. I'll pick values that AI optimists hopefully agree are fair: 

Probability that AI goes right if capabilities scale to superintelligence by 2034

This is an immediate "fast takeoff" scenario where state-of-the-art AI remains near-inscrutable, yet within a decade it becomes vastly more intelligent than humans on every dimension. I'd personally give this scenario much less than a 50% chance of going right for humanity, but I'm trying to be generous to AI optimists, so let's call it 50%.

Probability that AI goes right if we delay superintelligence to 2100

An important premise of PauseAI is that if we can give ourselves a few extra years or decades to thoroughly research the fundamental principles of how to align AI — how to robustly specify preferences, how to capture the delicate structure of human values as self-consistent preferences, etc — then we can significantly increase the probability that superintelligent AI goes well.

If you agree that more time for safety research helps safety catch up to capabilities, you can take whatever probability you gave to superintelligent AI going right in 2034 and add 20 percentage points (or more) to arrive at the probability that it goes right in 2100.
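Spelled out, the two probabilities used in the rest of this post (generous-to-optimists assumptions, not measured quantities) are:

\[
P(\text{goes right} \mid \text{ASI by 2034}) = 50\%, \qquad P(\text{goes right} \mid \text{ASI delayed to 2100}) = 50\% + 20\% = 70\%
\]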

Value of baseline future, where AI never gets beyond human intelligence

Let's define this as our baseline scenario, with a value of $0 by definition, because it's how normies who've never even heard of superintelligent AI currently imagine the future. We'll define the value of other scenarios in relation to the value of this scenario.

If we never let ourselves get superintelligent AI (or it turns out to be too hard to build), there'll probably still be at least a trillion future human lives worth living.

Value of future where AI goes wrong

If superintelligent AI goes wrong, it could very plausibly wipe out the entire future potential value of Earth-originating life. Compared to the baseline no-ASI scenario, we lose out on at least a trillion future human lives, which I'll estimate are worth at least a $million each.
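Multiplying those lower-bound estimates together, the loss relative to the baseline scenario is roughly:

\[
\text{Value(AI goes wrong)} \approx -\left(10^{12}\ \text{lives}\right) \times \left(\$10^{6}\ \text{per life}\right) = -\$10^{18}
\]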

Value if superintelligent AI by 2034 goes right

I've estimated this as the combined GDP of a trillion current Earths. High enough for you, AI optimists?
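To put a rough number on that, taking current gross world product to be on the order of $100 trillion per year (my approximation, not a figure from this post):

\[
10^{12}\ \text{Earths} \times \$10^{14}\ \text{per Earth} \approx \$10^{26}
\]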

This number could plausibly even be MUCH higher, but it doesn't matter; it won't change the decision-relevant calculation.

Value if superintelligent AI by 2100 goes right

I subtracted $10^24 from the 2034 estimate because, in this scenario, the extra 66 years it takes us to reach a "good singularity" could forgo a $trillion(trillion) worth of additional value when we factor in how the delay would cause billions of people on the margin to die of cancer and old age, and endure countless other kinds of preventable suffering.

But $10^24 is a tiny fraction of $10^26, just 1% to be exact. So even after subtracting that 66-year delay penalty from {value if superintelligent AI by 2034 goes right}, we still get a similar total value estimate of about $10^26.

Naturally, when we're evaluating a decision with the whole future value of the universe at stake, its impact on a particular 66-year time interval barely tilts the scale.
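In numbers, the delay penalty barely dents the total:

\[
\$10^{26} - \$10^{24} = 0.99 \times \$10^{26} \approx \$10^{26}, \qquad \frac{\$10^{24}}{\$10^{26}} = 1\%
\]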


Now we plug the above numbers into the well-known formula for expected value:

Expected Value of Superintelligent AI in 2034

\[
\text{EV}_{2034} \approx 50\% \times \$10^{26} + 50\% \times (-\$10^{18}) \approx \$5 \times 10^{25}
\]

Expected Value of Superintelligent AI in 2100

\[
\text{EV}_{2100} \approx 70\% \times (\$10^{26} - \$10^{24}) + 30\% \times (-\$10^{18}) \approx \$7 \times 10^{25}
\]

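Here's a minimal sketch of the same calculation in Python, using the rough figures assumed above; every input is one of this post's assumptions, not a measured quantity:

```python
# Two-outcome expected value: "rush to ASI by 2034" vs. "pause until 2100".
# All inputs are rough, assumption-laden placeholders from the estimates above.

def expected_value(p_good: float, value_good: float, value_bad: float) -> float:
    """Return p_good * value_good + (1 - p_good) * value_bad."""
    return p_good * value_good + (1 - p_good) * value_bad

VALUE_GOOD_2034 = 1e26   # ~combined GDP of a trillion current Earths (assumed)
DELAY_PENALTY   = 1e24   # ~value lost to 66 extra years of preventable suffering (assumed)
VALUE_BAD       = -1e18  # ~a trillion lost lives at ~$1 million each, relative to baseline (assumed)

ev_2034 = expected_value(0.5, VALUE_GOOD_2034, VALUE_BAD)
ev_2100 = expected_value(0.7, VALUE_GOOD_2034 - DELAY_PENALTY, VALUE_BAD)

print(f"EV(rush to ASI by 2034): {ev_2034:.2e}")  # ~5.0e25
print(f"EV(pause until 2100):    {ev_2100:.2e}")  # ~6.9e25
```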

In this calculation, the extra probability of a good outcome that we get by taking more time with our ASI efforts — e.g. 70% chance of a good outcome by pausing until 2100, instead of only 50% chance by rushing it in 2034 — flows straight to the final expected value.

That's because the stakes of prolonging current-level suffering by 66 years are much smaller than the stakes of accidentally throwing the entire future in a dumpster, foreclosing the long-term positive outcome of good AI entirely.

Note: The number I used for a bad AI future (relative to the no-AI baseline future), -$10^18, got drowned out in the calculation by the potential value of a future where AI goes right. If you're worried about an S-risk scenario (the risk of creating unprecedented astronomical suffering as a result of ASI), then the "value of future where AI goes wrong" term tips the scale even more toward pausing or stopping AI development.

The original objections—

“You're not factoring in all the benefits of good AI!”
“Stopping AI progress is also a doom scenario!”

—don't correspond to any choice of numbers you could reasonably plug into a basic expected-value calculation and end up concluding that we shouldn't pause AI capabilities progress right now (or soon).

Feel free to try this calculation with your own numbers instead of mine. The orders of magnitude involved are ridiculously uncertain and wide-ranging. And yet, I don't think any reasonable choice of numbers will change the conclusion that pausing AI is the right decision.
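For example, plugging deliberately gloomier numbers into the expected_value sketch above (hypothetical values chosen only to stress-test the conclusion, not figures from this post):

```python
# Even with much gloomier probabilities and a 10x larger delay penalty,
# the probability gap still dominates the delay cost.
ev_2034_gloomy = expected_value(0.20, 1e26, -1e18)         # ~2.0e25
ev_2100_gloomy = expected_value(0.25, 1e26 - 1e25, -1e18)  # ~2.25e25
```

Roughly speaking (the tiny bad-outcome term aside), the ordering flips only when the extra probability of a good outcome, weighted by the value of that outcome, is smaller than the probability-weighted delay penalty.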

Comments (2)

In your framing here, the negative value of AI going wrong is due to wiping out potential future value. Your baseline scenario (0 value) thus assumes away the possibility that civilization permanently collapses (in some sense) in the absence of some path to greater intelligence (whether via AI or whatever else), which would also wipe out any future value. This is a non-negligible possibility. 

The other big issue I have with this framing: "AI going wrong" can dereference to something like paperclips, which I deny have 0 value. To be clear, it could also dereference to mean s-risk, which I would agree is the worst possibility. But if the paperclipper-esque agents have even a little value, filling the universe with them is a lot of value. To be honest, the only thing preventing me from granting paperclippers as much or more value than humans is uncertainty/conservatism about my metaethics; human-value is the only value we have certainty about, and so should be a priority as a target. We should be hesitant to grant paperclippers or other non-human agents value, but I don't think that hesitancy can translate into granting them 0 value in calculations such as these.

With these two changes in mind, being anti-pause doesn't sound so crazy. It paints a picture more like:  

  • dead lightcone: 0 value 
  • paperclipped lightcone: +100-1100 value
  • glorious transhumanist lightcone: +1000-1100 value
  • s-risked lightcone: -10000 value 


This calculus changes when considering aliens, but it's not obvious to me in which direction. We could consider this a distributed/iterated game whereby all alien civilizations are faced with this same choice, or we could think "better that life/AI originating from our planet ends, rather than risking paperclips, so that some alien civilization can have another shot at filling up some of our lightcone". Or some other reasoning about aliens, or perhaps disregarding the alien possibility entirely. 

“Your baseline scenario (0 value) thus assumes away the possibility that civilization permanently collapses (in some sense) in the absence of some path to greater intelligence (whether via AI or whatever else), which would also wipe out any future value. This is a non-negligible possibility.”

Yes, my mainline no-superintelligence-by-2100 scenario is that the trend toward a better world continues to 2100.

You're welcome to set the baseline number to a negative, or tweak the numbers however you want to reflect any probability of a non-ASI existential disaster happening before 2100. I doubt it'll affect the conclusion.

“To be honest, the only thing preventing me from granting paperclippers as much or more value than humans is uncertainty/conservatism about my metaethics.”

Ah OK, the crux of our disagreement is how much you value the paperclipper-type scenario, which I'd consider a very bad outcome. If you think that outcome is good, then yeah, that licenses you to plug numbers into this formula and conclude that rushing toward AI is good.