When people say "AI safety can’t be a Pascal’s mugging, because p(doom) is high" I think they are typically responding to a perspective like "sure, taking strong actions to reduce risk from misaligned AI would be doable, but isn't doing this a Pascal's mugging (implicitly responding to how much people have emphasized the stakes while less so arguing for the risk)". In this case, risk being high is sufficient to defeat this argument. I think it's typically implicit that societal actions to greatly reduce risk are possible. (I certainly think this, though it isn't overdetermined this is true.) And these arguments are typically about "should society (or the US) do X" rather than "should I personally work on XYZ".
"sure, taking strong actions to reduce risk from misaligned AI would be doable, but isn't doing this a Pascal's mugging (implicitly responding to how much people have emphasized the stakes while less so arguing for the risk)"
I don't really understand what this perspective is saying. Is the idea that people tend to grant the premise 'If p(doom) is high, then p(you avert doom) is high'? I agree p(doom) being high would be sufficient in that case.
Is the idea that people tend to grant the premise 'If p(doom) is high, then p(you avert doom) is high'?
Yes,[1] and also that the argument often isn't about "what should I do" and is more about "what would be good policy / what should the world do".
Or idk about high, but typically people aren't thinking about the marginal probability change from their exact actions and are instead thinking about it similarly to other problems. Like, people aren't typically arguing "doom is high, but it's actually very different from other problems you might work on: it's extremely hard to avert relative to other problems, so it's a Pascal's mugging and people shouldn't work on reducing risk". ↩︎
Okay, that's good to know. I've mostly encountered the argument as a reply to individuals worrying that they're getting Pascal's-mugged into working on AI safety. In that sort of case,
AI safety can't be a Pascal's mugging because p(doom) is high
is invalid, and the premise needed to make it valid --
If p(doom) is high, then p(you can avert doom) is high
-- is way too doubtful to leave implicit.
But if the argument is a reply to people worried that the world/US government is getting Pascal's-mugged into working on AI safety, then the premise needed to make it valid is
If p(doom) is high, then p(the world/USG can avert doom) is high
and I agree that premise is safe/uncontroversial enough to leave implicit.
I dunno man, if I were in the specific situation you described, I might just hand over my wallet. If I'm ever in a situation that crazy, it would mean I've gone far, far outside what my existing priors are capable of reasoning about. Sure, hand over the wallet, what the hell. Maybe it's also God doing some kind of test. Probably the whole thing is a simulated test of character. Am I a human in this scenario? Could I tell whether I was a human or a persona being simulated in superposition by a particularly large LLM?
Wait, is God flipping the coin load-bearing for the craziness? Because strangers making wild promises isn't that crazy.
Yes, and I think I see your point: if you replace God with an asteroid that's on a 50:50 collision course with Earth, or something like that, then the mugging is still a mugging.
I think I agree with basically everything you say in the post, but I think there's a further important point against [working on AI risk] = [getting Pascal's mugged] that isn't discussed in your post. The point I have in mind is that the Pascal's mugging worry is also clearly defeated if there are many actions such that they add up to a significant change in p(doom), even if each individual action contributes only a tiny change. The rest of my comment makes this point in more detail.
Consider your dark alley example again, but with the following modification: by default everyone ends up in hell, and there are a bajillion costly actions available to you,[1] each of which raises the probability that everyone goes to heaven instead by 1/bajillion, so that taking all of them together guarantees heaven.
I think it's now clear that (supposing you are fully altruistic, which I think we're supposing here anyway) it is massively better to take all bajillion actions — you're suffering a deterministic cost of much less than a bajillion heaven-vs-hell differences for a deterministic gain of 10 bajillion heaven-vs-hell differences. If one has a Pascal's mugging worry that is telling one not to take each individual action (because it's giving one 10 bajillion utils with probability only 1/bajillion, just like in your original example), and so telling one not to take any of the actions (with everyone ending up in hell), then this worry is just dumb, at least in this case where one has the option to take many actions that add up to a macroscopic change.[2]
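To make the arithmetic here explicit (my notation, not the commenter's: $N$ for a bajillion, $u$ for one person's heaven-vs-hell utility difference, and $c \ll u$ for the cost of a single action):

$$\mathrm{EV}(\text{one action}) \;=\; \underbrace{\frac{1}{N}}_{\Delta p(\text{heaven})} \cdot \underbrace{10N\,u}_{\text{total stakes}} \;-\; c \;=\; 10u - c \;>\; 0$$

$$\mathrm{EV}(\text{all } N \text{ actions}) \;=\; 10N\,u \;-\; N c \;=\; N\,(10u - c) \;\gg\; 0$$

The per-action term is the "10 bajillion utils with probability 1/bajillion" from the original example; summing it over all $N$ actions turns the same tiny-probability bets into a guaranteed, massively positive trade.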
To bring this back to AI safety:
A caveat: there's some difference between humanity collectively assigning a person's worth of additional resources to AI safety vs. each individual's decision to work on AI safety. Still, it's clear that 10000 altruists would want to be in a world where they are collectively reducing p(doom)[3] by a significant amount, even if each individual's marginal contribution is tiny.
To state the same point another way:
if we want to make it concrete what these costly actions are, maybe imagine that you're agreeing to the stranger magically getting the wallets of each of a bajillion other people ↩︎
I'm aware that this can be extended to an argument for agreeing to be Pascal's mugged even in the case where you have only a single action available, but I think the case where you actually have many actions available that add up to a significant change in total is more clear than this extension, so I think what I'm stating should count as a separate argument against AI safety being a Pascal's mugging. ↩︎
and for other good stuff like there being a grand human future ↩︎
I would give a strong +1 to this specifically:
All that said, I think p(you — yes, you — avert doom) is high, or at least high enough. The whole doom situation is really up-in-the-air right now, and you’re at most like 4 degrees of separation from the big players: presidents, lab CEOs, and the like. You can influence someone who influences someone who influences someone. Your chances are way higher than 1 in a bajillion.
Policy makers are reasonably easy to reach, and are most likely to respond and take meetings with affluent and educated people who could be potential donors (likely a heavy overlap with the LW audience).
Policy makers, in my experience, are often also inherently skeptical of AI & Big Tech. One conversation with a policy maker or their staff (local, state, or federal) is far more impactful than almost any other intervention, and it’s not close.
Right now it’s not a politically polarized issue. Republicans don’t like Big Tech because they perceive that Big Tech has helped Democrats for the past 20 years. Democrats don’t like Big Tech because they feel Big Tech betrayed the party in 2024. The public absolutely loathes AI across the political spectrum.
Except that no one is asking for your wallet to avert the Doom scenario.
What if instead we imagine that you are the Microsoft CEO, and a stranger stops you in a dark alley and tells you that if you give them 140 billion dollars they will create God and you get to be God's boss?
That seems closer to the standard Pascal's mugging format.
People sometimes say that AI safety is a Pascal’s mugging. Other people sometimes reply that AI safety can’t be a Pascal’s mugging, because p(doom) is high. Both these people are wrong.
The second group of people are wrong because Pascal’s muggings are about the probability that you make a difference, not about baseline risk. The first group of people are wrong because the probability that you personally avert AI catastrophe isn’t that small.
Here’s a story to show that Pascal’s muggings are about the probability that you make a difference. Imagine that God will flip a coin at the end of time. If the coin lands heads, He’ll send everyone to heaven. If the coin lands tails, He’ll send everyone to hell. Everyone knows this is what will happen.
In a dark alley, a stranger approaches you and tells you that he can make God’s coin land heads, thereby ensuring that everyone goes to heaven. He says he’ll do it if you give him your wallet. You assign a very low probability to this stranger telling the truth — 1 in a bajillion — but the stranger reminds you that 10 bajillion people will have their fates determined by God’s coin.
‘Hang on,’ you say, ‘This seems a lot like a Pascal’s mugging.’
‘Au contraire,’ says the stranger, ‘It can’t be a Pascal’s mugging. The outcome I’m promising to avert — everyone going to hell — is not low probability at all. p(hell) is 50%.’
Would this reply convince you to hand over your wallet? Of course not. Even though the baseline risk of everyone going to hell is high, the probability that you make a difference — getting everyone to heaven when they otherwise would have gone to hell — is extremely low. And it’s this latter probability that determines whether your situation is a Pascal’s mugging.
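To put the same point in expected-value terms (notation mine): with $u$ for the total value of everyone going to heaven rather than hell, the expected gain from handing over the wallet is

$$\Delta\mathrm{EV} \;=\; \underbrace{\frac{1}{\text{bajillion}}}_{p(\text{stranger truthful})} \;\times\; \underbrace{(1 - 0.5)}_{\text{rise in } p(\text{heads})} \;\times\; u$$

The 50% baseline risk enters only through the middle factor, which can never contribute more than a factor of 2 either way; the 1-in-a-bajillion first factor is what makes this a mugging.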
So when people say that AI safety is a Pascal’s mugging, you can’t just reply that p(doom) is high. You have to argue that p(you avert doom) is high.
All that said, I think p(you — yes, you — avert doom) is high, or at least high enough. The whole doom situation is really up-in-the-air right now, and you’re at most like 4 degrees of separation from the big players: presidents, lab CEOs, and the like. You can influence someone who influences someone who influences someone. Your chances are way higher than 1 in a bajillion.