When people say "AI safety can’t be a Pascal’s mugging, because p(doom) is high" I think they are typically responding to a perspective like "sure, taking strong actions to reduce risk from misaligned AI would be doable, but isn't doing this a Pascal's mugging (implicitly responding to how much people have emphasized the stakes while less so arguing for the risk)". In this case, risk being high is sufficient to defeat this argument. I think it's typically implicit that societal actions to greatly reduce risk are possible. (I certainly think this, though it isn't overdetermined this is true.) And these arguments are typically about "should society (or the US) do X" rather than "should I personally work on XYZ".
"sure, taking strong actions to reduce risk from misaligned AI would be doable, but isn't doing this a Pascal's mugging (implicitly responding to how much people have emphasized the stakes while less so arguing for the risk)"
I don't really understand what this perspective is saying. Is the idea that people tend to grant the premise 'If p(doom) is high, then p(you avert doom) is high'? I agree p(doom) being high would be sufficient in that case.
Is the idea that people tend to grant the premise 'If p(doom) is high, then p(you avert doom) is high'?
Yes,[1] and also that the argument often isn't about "what should I do" and is more about "what would be good policy / what should the world do".
Or idk about high, but typically people aren't thinking about the marginal probability change from their exact actions and are instead thinking about it similarly to other problems. Like, people aren't typically arguing "doom is high, but it's actually very different from other problems you might work on: it's extremely hard to avert relative to other problems, so it's a Pascal's mugging and people shouldn't work on reducing risk". ↩︎
Okay, that's good to know. I've mostly encountered the argument as a reply to individuals worrying that they're getting Pascal's-mugged into working on AI safety. In that sort of case,
AI safety can't be a Pascal's mugging because p(doom) is high
is invalid, and the premise needed to make it valid --
If p(doom) is high, then p(you can avert doom) is high
-- is way too doubtful to leave implicit.
But if the argument is a reply to people worried that the world/US government is getting Pascal's-mugged into working on AI safety, then the premise needed to make it valid is
If p(doom) is high, then p(the world/USG can avert doom) is high
and I agree that premise is safe/uncontroversial enough to leave implicit.
I dunno man, if I were in the specific situation you described, I might just hand over my wallet. If I'm ever in a situation that crazy, it would mean I've gone far, far outside what my existing priors are capable of reasoning about. Sure, hand over the wallet, what the hell. Maybe it's also God doing some kind of test. Probably the whole thing is a simulated test of character. Am I a human in this scenario? Could I tell whether I was a human or a persona being simulated in superposition by a particularly large LLM?
Wait, is God flipping the coin load-bearing for the craziness? Because strangers making wild promises isn't that crazy.
Yes, and I think I see your point: if you replace God with an asteroid that's on a 50:50 collision course with Earth, or something like that, then the mugging is still a mugging.
I think I agree with basically everything you say in the post, but I think there's a further important point against [working on AI risk] = [getting Pascal's mugged] that isn't discussed in your post. The point I have in mind is that the Pascal's mugging worry is also clearly defeated if there are many actions such that they add up to a significant change in p(doom), even if each individual action contributes only a tiny change. The rest of my comment makes this point in more detail.
Consider your dark alley example again, but with the following modification: by default everyone ends up in hell, and there are a bajillion costly actions available to you,[1] each of which raises the probability that everyone goes to heaven instead by 1/bajillion, so that taking all of them together guarantees heaven.
I think it's now clear that (supposing you are fully altruistic, which I think we're supposing here anyway) it is massively better to take all bajillion actions — you're suffering a deterministic cost of much less than a bajillion heaven-vs-hell differences for a deterministic gain of 10 bajillion heaven-vs-hell differences. If one has a Pascal's mugging worry that is telling one not to take each individual action (because it's giving one 10 bajillion utils with probability only 1/bajillion, just like in your original example), and so telling one not to take any of the actions (with everyone ending up in hell), then this worry is just dumb, at least in this case where one has the option to take many actions that add up to a macroscopic change.[2]
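To make the arithmetic here explicit (my notation, not the commenter's: $N$ for a bajillion, $u$ for one person's heaven-vs-hell utility difference, and $c \ll u$ for the cost of a single action):

$$\mathrm{EV}(\text{one action}) \;=\; \underbrace{\frac{1}{N}}_{\Delta p(\text{heaven})} \cdot \underbrace{10N\,u}_{\text{total stakes}} \;-\; c \;=\; 10u - c \;>\; 0$$

$$\mathrm{EV}(\text{all } N \text{ actions}) \;=\; 10N\,u \;-\; N c \;=\; N\,(10u - c) \;\gg\; 0$$

The per-action term is the "10 bajillion utils with probability 1/bajillion" from the original example; summing it over all $N$ actions turns the same tiny-probability bets into a guaranteed, massively positive trade.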
To bring this back to AI safety:
A caveat: there's some difference between humanity collectively assigning a person's worth of additional resources to AI safety vs. each individual's decision to work on AI safety. Still, it's clear that 10000 altruists would want to be in a world where they are collectively reducing p(doom)[3] by a significant amount, even if each individual's marginal contribution is tiny.
To state the same point another way:
if we want to make it concrete what these costly actions are, maybe imagine that you're agreeing to the stranger magically getting the wallets of each of a bajillion other people ↩︎
I'm aware that this can be extended to an argument for agreeing to be Pascal's mugged even in the case where you have only a single action available, but I think the case where you actually have many actions available that add up to a significant change in total is more clear than this extension, so I think what I'm stating should count as a separate argument against AI safety being a Pascal's mugging. ↩︎
and for other good stuff like there being a grand human future ↩︎
I would give a strong +1 to this specifically:
All that said, I think p(you — yes, you — avert doom) is high, or at least high enough. The whole doom situation is really up-in-the-air right now, and you’re at most like 4 degrees of separation from the big players: presidents, lab CEOs, and the like. You can influence someone who influences someone who influences someone. Your chances are way higher than 1 in a bajillion.
Policy makers are reasonably easy to reach, and are most likely to respond and take meetings with affluent and educated people who could be potential donors (likely a heavy overlap with the LW audience).
Policy makers, in my experience, are often also inherently skeptical of AI & Big Tech. One conversation with a policy maker or their staff (local, state, or federal) is far more impactful than almost any other intervention, and it’s not close.
Right now it’s not a politically polarized issue. Republicans don’t like Big Tech because they perceive that Big Tech has helped Democrats for the past 20 years. Democrats don’t like Big Tech because they feel Big Tech betrayed the party in 2024. The public absolutely loathes AI across the political spectrum.
Except that no one is asking for your wallet to avert the Doom scenario.
What if instead we imagine that you are the Microsoft CEO, and a stranger stops you in a dark alley and tells you that if you give them 140 billion dollars they will create God and you get to be God's boss?
That seems closer to the standard Pascal's mugging format.
People sometimes say that AI safety is a Pascal’s mugging. Other people sometimes reply that AI safety can’t be a Pascal’s mugging, because p(doom) is high. Both these people are wrong.
The second group of people are wrong because Pascal’s muggings are about the probability that you make a difference, not about baseline risk. The first group of people are wrong because the probability that you personally avert AI catastrophe isn’t that small.
Here’s a story to show that Pascal’s muggings are about the probability that you make a difference. Imagine that God will flip a coin at the end of time. If the coin lands heads, He’ll send everyone to heaven. If the coin lands tails, He’ll send everyone to hell. Everyone knows this is what will happen.
In a dark alley, a stranger approaches you and tells you that he can make God’s coin land heads, thereby ensuring that everyone goes to heaven. He says he’ll do it if you give him your wallet. You assign a very low probability to this stranger telling the truth — 1 in a bajillion — but the stranger reminds you that 10 bajillion people will have their fates determined by God’s coin.
‘Hang on,’ you say, ‘This seems a lot like a Pascal’s mugging.’
‘Au contraire,’ says the stranger, ‘It can’t be a Pascal’s mugging. The outcome I’m promising to avert — everyone going to hell — is not low probability at all. p(hell) is 50%.’
Would this reply convince you to hand over your wallet? Of course not. Even though the baseline risk of everyone going to hell is high, the probability that you make a difference — getting everyone to heaven when they otherwise would have gone to hell — is extremely low. And it’s this latter probability that determines whether your situation is a Pascal’s mugging.
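To put the same point in expected-value terms (notation mine): with $u$ for the total value of everyone going to heaven rather than hell, the expected gain from handing over the wallet is

$$\Delta\mathrm{EV} \;=\; \underbrace{\frac{1}{\text{bajillion}}}_{p(\text{stranger truthful})} \;\times\; \underbrace{(1 - 0.5)}_{\text{rise in } p(\text{heads})} \;\times\; u$$

The 50% baseline risk enters only through the middle factor, which can never contribute more than a factor of 2 either way; the 1-in-a-bajillion first factor is what makes this a mugging.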
So when people say that AI safety is a Pascal’s mugging, you can’t just reply that p(doom) is high. You have to argue that p(you avert doom) is high.
All that said, I think p(you — yes, you — avert doom) is high, or at least high enough. The whole doom situation is really up-in-the-air right now, and you’re at most like 4 degrees of separation from the big players: presidents, lab CEOs, and the like. You can influence someone who influences someone who influences someone. Your chances are way higher than 1 in a bajillion.