[Epistemic Status: confident considering the outside view]
The Pascal’s Mugging dilemma is this: a random person walks up to you in the street and says that if you don’t give them a dollar, they’ll destroy the earth tomorrow. Do you pay? Since the probability of them doing so cannot plausibly be low enough to make giving the dollar negative expected utility, decision theory (whether causal, evidential, or functional) says you do.
The dominant view is:
Paying is obviously wrong. Therefore, this is an as-yet-unsolved problem. There is probably a theoretical insight by which Decision Theory should be amended to avoid this behavior.
But this has never made sense to me. My view has always been:
Decision Theory is almost certainly going to produce the correct output on this problem because it is conceptually simple. Therefore, either there are reasons why decision theory doesn’t actually tell you to pay, or paying is correct.
It’s strange to me that the intuition “you shouldn’t pay” is apparently valued much more highly than the intuition “FDT is correct,” to the point that the idea that it could possibly be correct to pay isn’t even on the table. So far, I have never read a satisfying way to deal with this problem, which strengthens my suspicion that there is none. Moreover, the proposals I have seen mostly seem very misguided to me, particularly anything that involves rounding small numbers to zero or treating utility as non-linear. Therefore:
If your decision theory pays, then you can be exploited heavily, by being mugged repeatedly.
No, I can’t. The probability that the mugger is telling the truth doesn’t plausibly increase if they ask multiple times, and the cost of being mugged repeatedly is high. A single dollar already has a chance to save the earth in other ways, and the chance forgone grows roughly linearly with the number of dollars handed over, both in the literal case as a human and in the metaphorical case as an aspiring AI. Finally, there is the fact that appearing muggable is negative utility.
Yeah, but that’s just a hack, and if your argument relies on this, then that’s terrible.
I don’t personally feel uncomfortable paying, or having an AI that would pay in such a scenario. As I said, the probability doesn’t increase with repeated asks, so asking repeatedly is very similar to just asking for more in the first place.
So what if they do ask for more in the first place? If the same homeless person asked you for $200, would you still pay?
I would, but only because the fear of being responsible for someone else’s death would impact me personally. Otherwise, I don’t think it’d be correct. If they threaten to destroy the earth, I wouldn’t pay. I think giving the $200 to MIRI has better odds of saving it.
You’ve been avoiding part of the problem. What about really high numbers? Don’t your explanations break apart there?
I don’t think so. I hold that the chance to save a googolplex people is also higher by donating than by paying. When people discuss this issue, noting that paying can have arbitrarily high payoffs, they always forget that other ways of spending the money can also have arbitrarily high payoffs. I don’t think this changes in the case of an AI, either. Yes, there is always some chance that the AI is misprogrammed in a way that precludes it from seeing how the mugger could save a googolplex people. But there is also a chance that giving up whatever resources the mugger asked for will prevent it from figuring that out itself.
What about infinite payoffs?
I’m not sure. I think that is a separate issue, and I want to explicitly exclude it from this post.
And what if the sum asked for is sufficiently small, huh?
Then you pay.
I realize there aren’t any arguments here that anyone else couldn’t probably come up with in 5 minutes. However, as it stands, no one else is making them. It just seems really obvious to me that this is talking in circles: either there is a reason why FDT wouldn’t pay, or it is correct to pay. Like, come on! If paying the mugger is -actually- the highest utility option you have, then why wouldn’t you take it? Doesn’t that seem weird? I find it weird, much weirder than the idea that paying might sometimes be correct. I think it is useful to look at a mugging scenario as simply providing you with an additional option to spend money. If there is a better option, ignore the mugger. If not, then there is no reason not to pay.
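To make the "additional option to spend money" framing concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the Option class, the best_option helper, and especially the probabilities and the payoff, which are placeholders rather than estimates of anything. It only shows the shape of the comparison: the mugger's offer goes into the same list as every other use of the dollar, and the highest expected utility wins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    """One way of spending (or keeping) the money. Hypothetical helper type."""
    name: str
    cost: float        # resources given up, in utility units
    p_success: float   # assumed probability that the payoff materializes
    payoff: float      # utility gained if it does

    def expected_utility(self) -> float:
        return self.p_success * self.payoff - self.cost


def best_option(options):
    """Return the option with the highest expected utility."""
    return max(options, key=lambda o: o.expected_utility())


# Placeholder numbers only: the point is the structure, not the values.
PAYOFF = 1e10  # utility of the earth not being destroyed (made up)

keep = Option("keep the dollar", cost=0.0, p_success=0.0, payoff=0.0)
pay = Option("pay the mugger", cost=1.0, p_success=1e-9, payoff=PAYOFF)
donate = Option("donate the dollar", cost=1.0, p_success=1e-8, payoff=PAYOFF)

# With a better alternative on the table, the mugger is ignored.
with_donation = best_option([keep, pay, donate])

# Without one, paying is simply the best remaining option.
without_donation = best_option([keep, pay])

print(with_donation.name)
print(without_donation.name)
```

Under these made-up inputs the mugger loses whenever a better use of the dollar is available and wins otherwise, which is all the framing claims. Whether such a better use really exists is the substantive question, and the code says nothing about it.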
Another thing that I’ve never seen anyone point out is that the total amount of damage caused by being mugged seems to be naturally bounded above. It doesn’t matter how high the utility at stake gets, there is always some amount of resources that will have a better chance of gaining as much utility by being used in other ways. It makes no sense to consider the maximum utility bounded in the latter case but not in the former. The mugger claiming that they can affect a googolplexplex lives doesn’t give them exclusive access to a non-zero probability of affecting a googolplexplex lives; other ways do exist. It will never be positive expected utility to pay a really significant amount of resources in response to threats, because at some point, the probability that further resources will help in negotiating with the people running the simulation just takes over, for any arbitrarily large number.
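One way to write this down, as a rough sketch rather than a proof: let U > 0 be the utility at stake, r the resources demanded, p_m the probability that paying the mugger secures U, and p_a(r) the probability that spending the same r in some other way secures a comparable U. All of these symbols are my own shorthand, and treating the two payoffs as equal is a simplifying assumption.

```latex
\begin{align*}
\mathrm{EU}_{\text{pay}}(r) &= p_m \, U - r \\
\mathrm{EU}_{\text{alt}}(r) &= p_a(r) \, U - r \\
\mathrm{EU}_{\text{pay}}(r) - \mathrm{EU}_{\text{alt}}(r) &= \bigl(p_m - p_a(r)\bigr) \, U
\end{align*}
```

The sign of the difference depends only on whether p_m exceeds p_a(r), not on how large U is. If p_a(r) grows with r while p_m stays put, then past some r paying is the worse option no matter how big a number the mugger names.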
Pascal’s Mugger has never seemed to be anything else but “hey, here is an unintuitive result of correct decision theory” to me, and I believe the correct response would be to say “okay, interesting” and move on.