Pascal's Mugging Solved


"Solving" Pascal's Mugging involves giving an explicit reasoning system and showing that it makes the right decision.

It's not enough to just say "your confidence has to go down more than their claimed reward goes up". That part is obvious. The hard part is coming up with *actual explicit rules that do that*. Particularly ones that don't fall apart in other situations (e.g. the decision system "always do nothing" can't be pascal-mugged, but has serious problems).

Another thing not addressed here is that the mugger may be a hypothetical. For example, if the AI generates hypotheses where the universe affects 3^^^^3 people then all decisions will be dominated by these hypotheses because their outcomes outweigh their prior by absurd margins. How do you detect these bad hypotheses? How do you penalize them without excluding them? Should you exclude them?

Please give a more concrete situation with actual numbers and algorithms.

I think you'll find the argument is clear without any formalization if you recognize that it is NOT the usual claim that confidence goes down. Rather, it's that the confidence falls *below* its contrary.

In philH's terms, you're engaging in pattern matching rather than taking the argument on its own terms.

How have I not addressed the argument on its own terms? I agree with basically everything you said, except calling it a solution. You'll run into non-trivial problems when you try to turn it into an algorithm.

For example, the case of there being an actual physical mugger is meant to be an example of the more general problem of programs with tiny priors predicting super-huge rewards. A strategy based on "probability of the mugger lying" has to be translated to the general case somehow. You have to prevent the AI from *mugging itself*.

This article would benefit from working through a concrete example.

If you become super-exponentially more skeptical as the mugger invokes super-exponentially higher utilities, how do you react if the mugger tears the sky asunder?

You become less skeptical, but that doesn't affect the issue presented, which concerns only the evidential force of the claim itself.

If someone tears the sky asunder, you will be more inclined to believe the threat. But after a point of increasing threat, increasing it further should *decrease* your expectation.

> But after a point of increasing threat, increasing it further should decrease your expectation.

OK, so after a certain point, the mugger *increasing his threat* will cause you to *decrease your belief* faster. After a certain point, the mugger increasing his threat will cause (threat badness * probability) to decrease.

That implies that if he threatens you with a super-exponentially bad outcome, you will assign a super-exponentially small probability to his threat.

But super-exponentially small probabilities are a tricky thing. Once you've assigned a super-exponentially small probability to an event, **no amount of evidence in the visible universe can make you change your mind**. It doesn't matter if the mugger grows wings of fire or turns passersby into goats; no amount of evidence your eyes and ears are capable of receiving can sway a super-exponentially small probability. If the city around you melts into lava, should you believe the mugger then? How do you quantify whether you should or should not?

I should define *super-exponentially large*, since I'm using it as a hand-wavy term. Let's call a number *super-exponentially large* if it's bigger than anything you could write down in scientific notation using the matter available on earth. Numbers like a googol (=10^100) or 1,000,000,000^1,000,000,000 (=10^9,000,000,000) are not super-exponentially large, since I can write them as a power of ten in a reasonable amount of space. A googolplex (10^10^100, or 1 followed by a googol zeros) *is* super-exponentially large because there is not enough matter on earth to write a googol zeros.

A *super-exponentially small probability* is a probability less than 1 over a super-exponentially large number.

Speaking loosely, if you express a probability in scientific notation, a given amount of evidence lets you add or remove a constant number of digits from the exponent. For most things, that's pretty powerful. But if you're dealing with super-exponentially small probabilities, to hold real sway you would need more evidence than you could write down with the matter available on earth.
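To put rough numbers on that, here is a minimal sketch. It assumes each independent observation multiplies the odds by a fixed power of ten; the function name `observations_needed` and the figure of six digits per observation are made up for illustration.

```python
GOOGOL = 10**100

def observations_needed(prior_exponent: int, digits_per_observation: int) -> int:
    """Observations needed to lift odds of 10**-prior_exponent to even odds.

    Assumes (hypothetically) that every observation multiplies the odds
    by 10**digits_per_observation, i.e. removes that many digits from
    the exponent.
    """
    # Ceiling division on plain integers, so huge exponents stay exact.
    return -(-prior_exponent // digits_per_observation)

# Odds of one in a googol: the exponent is only 100.
few = observations_needed(100, 6)

# Odds of one in a googolplex: the exponent is itself a googol.
many = observations_needed(GOOGOL, 6)

print(few)             # a small number of strong observations
print(len(str(many)))  # digit count of the number of observations needed
```

With a prior of one in a googol, a handful of strong observations is enough. With a prior of one in a googolplex, the count of observations is itself a number about a hundred digits long.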

> Once you've assigned a super-exponentially small probability to an event, no amount of evidence in the visible universe can make you change your mind.

I don't see why this is necessarily a problem.

The claim that the mugger will torture 3^^^3 people, unless you give them $100, is so implausible that there *should* be no possible evidence that will convince you of it.

Any possible evidence is more plausibly explained by possibilities such as you being in a computer game, and the mugger being a player who's just being a dick because they find it funny.

So I was pattern matching this as an argument for "why the probability decreases more than we previously acknowledged, as the threat increases", but that isn't what you're going for. Attempting to summarize it in my own words:

There are three relevant events: (A) the threat will not happen; (B) not giving in to blackmail will trigger the threat; (C) giving in to blackmail will trigger the threat (or worse). As the threat increases, P(B) and P(C) both decrease, but P(C) begins to dominate P(B).

Is this an accurate summary?

It's accurate. But it's crucial, of course, to see why P(C) comes to dominate P(B), and I think this is what most commenters have missed. (But maybe I'm wrong about that; maybe it's because of pattern matching.) As the threat increases, P(C) comes to dominate P(B) because the threat, when large enough, is evidence *against* the threatened event occurring.
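A rough sketch of why the comparison between P(B) and P(C) settles the decision, assuming a simple expected-loss model that is not spelled out in the thread. The function names and all the numbers are hypothetical.

```python
from fractions import Fraction

def expected_loss_pay(cost, p_c, harm):
    """Loss from paying: the payment itself, plus the harm if event C holds
    (giving in to the blackmail triggers the threat)."""
    return cost + p_c * harm

def expected_loss_refuse(p_b, harm):
    """Loss from refusing: the harm if event B holds
    (not giving in triggers the threat)."""
    return p_b * harm

def should_pay(cost, p_b, p_c, harm):
    """Pay only if paying has the strictly smaller expected loss."""
    return expected_loss_pay(cost, p_c, harm) < expected_loss_refuse(p_b, harm)

# Illustrative values only: when P(B) dominates, paying looks attractive...
print(should_pay(5, Fraction(1, 10**6), Fraction(1, 10**9), 10**12))
# ...but once P(C) >= P(B), paying can never win, whatever the harm is.
print(should_pay(5, Fraction(1, 10**9), Fraction(1, 10**6), 10**12))
```

Under this model, once P(C) is at least P(B), the harm term only makes paying look worse, so the size of the threat stops helping the mugger.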

I like your solution to Pascal's mugging, but as some people mentioned, it breaks down with superexponential numbers. This is caused by the extreme difficulty of doing meaningful calculations once such a number is present (similar to infinity or a division by zero).

I propose the following modification:

- Given a problem that contains huge payoffs or penalties, try common_law's solution.
- Should any number above a googol be in the calculation, refuse to calculate!
- Try to reformulate the problem in a way that doesn't contain such a big number.
- Should this fail, do nothing.
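The steps above could be sketched roughly as follows. This is only one reading of the proposal: `evaluate` stands in for whatever ordinary calculation you would otherwise run, and `reformulate` is a hypothetical hook that either returns smaller quantities or `None` when it fails.

```python
GOOGOL = 10**100

def within_limit(quantities):
    """True if no quantity in the problem exceeds a googol in magnitude."""
    return all(abs(q) <= GOOGOL for q in quantities)

def decide(quantities, evaluate, reformulate=None):
    """Apply the too-big-number rule to a list of quantities.

    evaluate:    callable taking the list and returning a decision
                 (placeholder for the ordinary calculation).
    reformulate: optional callable returning a restated list of
                 quantities, or None if no restatement is found.
    """
    if within_limit(quantities):
        return evaluate(quantities)
    # A number above a googol is present: refuse to calculate as stated.
    if reformulate is not None:
        restated = reformulate(quantities)
        if restated is not None and within_limit(restated):
            return evaluate(restated)
    # Reformulation failed or was not available.
    return "do nothing"

print(decide([5, 10**3], lambda q: "evaluated"))
print(decide([5, 10**101], lambda q: "evaluated"))
```

Note that everything interesting is hidden inside `reformulate`; the sketch only shows where the cutoff sits in the control flow.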

I would go so far as to treat any claim with such numbers in it as fictional.

Another LW classic containing such numbers is the Dust Speck vs. Torture paradox. I think that just trying to calculate in the presence of such numbers is a fallacy. Has someone formulated a Number-Too-Big-Fallacy already?

> As the amount of harm threatened gets larger, the probability that the mugger is maximizing approaches unity.

It seems entirely possible that some finite nonzero fraction of "matrix lords" is capable of carrying out arbitrarily large threats, or of providing arbitrarily large rewards (provided that the utility function by which you judge such things is unbounded).

> As the probability that the mugger is engaged in maximizing approaches unity, the likelihood that the mugger’s claim is true approaches zero.

It is possible that the mugger is maximizing while still telling the truth.

Pascal's mugging against an actual opponent is easy. If they are able to carry out their threat, they don't need anything you would be able to give them. If the threat is real, you're at their mercy and you have no way of knowing if acceding to their demand will actually make anyone safer, whereas if they're lying you don't want to be giving resources to that sort of person. This situation is a special case of privileging the hypothesis, since for no reason you're considering a nearly impossible event while ignoring all the others.

If we're talking about a metaphor for general decision-making, e.g. an AI whose actions could well affect the entirety of the human race, it's much harder. I'd probably have it ignore any probabilities below x%, where x is calculated as a probability so small that the AI would be paralyzed from worrying about the huge number of improbable things. Not because it's a good idea, but because as probability approaches zero, the number of things to consider approaches infinity, yet processing power is limited.
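As a toy illustration of that cutoff, here is a sketch that drops every hypothesis below a probability threshold before computing expected utility. The hypothesis list, the threshold, and the function names are all invented for the example, and the pruned probabilities are not renormalized.

```python
from fractions import Fraction

def prune_hypotheses(hypotheses, threshold):
    """Keep only (probability, utility) pairs at or above the threshold."""
    return [(p, u) for p, u in hypotheses if p >= threshold]

def expected_utility(hypotheses):
    """Plain probability-weighted sum of utilities."""
    return sum(p * u for p, u in hypotheses)

tiny = Fraction(1, 10**20)
hypotheses = [
    (Fraction(7, 10), 10),         # ordinary good outcome
    (Fraction(3, 10) - tiny, -5),  # ordinary bad outcome
    (tiny, -10**30),               # mugging-style hypothesis
]

print(expected_utility(hypotheses) < 0)   # the tiny hypothesis dominates
pruned = prune_hypotheses(hypotheses, Fraction(1, 10**9))
print(expected_utility(pruned) > 0)       # ordinary outcomes decide again
```

The threshold here is arbitrary, which is exactly the weakness admitted above: it is chosen for tractability, not because it is principled.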

It seems to me that you've pretty much solved the literal interpretation of the hypothetical, putting into words what everyone was already thinking.

The more relevant one is where you generated the probability and utility estimates yourself and are trying to figure out what to do about it.

I intended nothing more than to solve the literal interpretation. This isn't my beaten path. I don't intend more on the subject besides speculation about why an essentially trivial problem of "literal interpretation" has resisted articulation.

I have a general question about Pascal's mugging: Does the problem still work if the mugger demands substantially more than five dollars?

Let's say the Pascal's Mugger demands you accept a Simpleton Gambit. Is there any line of thought that says 'I believe a properly constructed mugging could mug me for 5 dollars, but couldn't mug me into a Simpleton Gambit?'

My current understanding is that if you are a utilitarian who accepts a Pascal's Mugger can mug you at all, then you accept a Pascal's Mugger can mug you for anything by properly constructing the math behind the mugging. (Although if the mugger is demanding you accept a Simpleton Gambit, it might be more accurate to call that entity a Pascal's Slaver.)

But I only recently came to that understanding, and it doesn't directly address whether or not you should accept Pascal's muggings, just what seems to be a plausible consequence if you are a utilitarian who does, so I thought I should verify it first.

Reference note for Simpleton Gambits: http://wiki.lesswrong.com/wiki/Simpleton_Gambit

Thanks. I think I see something about Pascal's Mugging that is confusing me, then, but I'll try to build out the logic step by step so that if I go wrong I'll better know where.

Even if someone has a value system that can be made to accept multiple types of Simpleton Gambits, they can still only be made to accept one Simpleton Gambit that has, as a requirement for accepting that type of gambit, not accepting any other Simpleton Gambits.

It doesn't appear to matter that not accepting those other types of Simpleton Gambits goes against their previous value system: Once they accept that gambit, they are now a Simpleton and don't get to decide.

A similar problem occurs with them giving up the 5 dollars at all: Once they do that, that limits their ability to make future decisions (that might require them to pay), even if they want to, in those circumstances. Not as much as a Simpleton Gambit, of course.

In both cases, agreeing to the threat now limits your options later, right?

Perhaps, but performing every action to prepare for Pascal's mugging hardly counts as avoiding Pascal's mugging. Quite the opposite: now you're constraining everything you do even if nobody threatens you.

There's actually a larger problem. If you don't have some way of avoiding Pascal's mugging, your expected utility is almost certainly divergent. You can find a risk with expected value of arbitrarily high magnitude in either direction.

Thank you! This makes me glad that I had been going through my logic item by item, because I had not been considering that Pascal's mugging had mathematical similarities to the St. Petersburg Paradox.
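For anyone else who hadn't made the connection, a small sketch of the St. Petersburg-style divergence. It assumes the textbook form of the gamble, where outcome k has probability 2^-k and payoff 2^k; the function name is made up.

```python
from fractions import Fraction

def partial_expected_value(n_terms):
    """Expected value contributed by the first n_terms outcomes.

    Outcome k has probability 2**-k and payoff 2**k, so every term
    contributes exactly 1 and the partial sums grow without bound.
    """
    return sum(Fraction(1, 2**k) * 2**k for k in range(1, n_terms + 1))

for n in (10, 100, 1000):
    print(n, partial_expected_value(n))
```

Each extra outcome is half as likely but twice as large, so the sum never settles. Pascal's mugging has the same shape whenever the claimed payoff grows faster than the probability shrinks.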

Short version:

Even just considering the a priori probability of them honestly threatening you before you even take into account that they threatened you, it's still enough to get Pascal mugged. The probability that they're lying increases, but not fast enough.

Long version, with math:

Consider three possibilities:

A: Nobody makes the threat

B: Someone dishonestly makes the threat

C: Someone honestly makes the threat

Note that A, B, and C form a partition. That is to say exactly one of them is true. Technically it's possible that more than one person makes the threat, and only one of them is honest, so B and C happen, so let's say that the time and place of the threat are specified as well.

C is highly unlikely. If you tried to write a program simulating that, it would be long. It would still fit within the confines of this universe. Let's be hyper-conservative, and call it K-complexity of a googol.

There is also a piece of evidence E, consisting of someone making the threat. E is consistent with B and C, but not A. This means:

P(A&E) = 0, P(B&E) = P(B), P(C&E) = P(C)

Calculating from here:

P(C|E) = P(C&E)/P(E)

= P(C)/P(E)

≥ P(C)   (since P(E) ≤ 1)

= 2^-googol

So the probability of the threatener being honest is at least 2^-googol. Since the disutility of not paying him if he is honest is more than 2^googol times the utility of paying him, it's worthwhile to pay.
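Numbers like 2^-googol can't be held in a float, but the comparison can be done on exponents. A minimal sketch, assuming everything is expressed as a base-2 logarithm; the function name and the stand-in magnitudes are hypothetical, not figures from the derivation above.

```python
GOOGOL = 10**100

def worth_paying(log2_prior, log2_harm, log2_cost):
    """True when prior * harm > cost, compared via base-2 exponents.

    log2_prior is a lower bound on log2 P(C|E), here taken from
    P(C|E) >= P(C). Python integers are unbounded, so exponents the
    size of a googol are handled exactly.
    """
    return log2_prior + log2_harm > log2_cost

# Prior of 2**-googol, threatened harm of 2**(2*googol) (a stand-in for
# any harm larger than 2**googol), cost of at most 2**3 units.
print(worth_paying(-GOOGOL, 2 * GOOGOL, 3))

# A threat well below 2**googol does not clear the bar.
print(worth_paying(-GOOGOL, GOOGOL // 2, 3))
```

The point of the worst-case bound is visible here: the prior only subtracts a googol from the exponent, and the threat can always add more than that.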

What you present is the basic fallacy of Pascal's Mugging: treating the probability of B and of C as independent of the fact that a threat of given magnitude is made.

Your formalism, in other words, doesn't model the argument. The basic point is that Pascal's Mugging can be solved by the same logic that succeeds with Pascal's wager. Pascal concluded that believing in god A was instrumentally rational by ignoring that there might, with equal consequences, be a god B instead who hated people who worshiped god A.

Pascal's Mugging ignores that giving to the mugger might cause the calamity threatened to be more likely if you accede to the mugger than if you don't. The point of inflection is that point where the mugger's making the claim becomes evidence against it rather than for it.

No commenters have engaged the argument!

> What you present is the basic fallacy of Pascal's Mugging: treating the probability of B and of C as independent of the fact that a threat of given magnitude is made.

The prior probability of X is 2^-(K-complexity of X). There are more possible universes where they carry out smaller threats, so the K-complexity is lower. What I showed is that, even if there were only a single possible universe where the threat was carried out, it's still simple enough that the K-complexity is small enough that it's worth paying the threatener.

> No commenters have engaged the argument!

You gave a vague argument. Rather than giving a vague counterargument along the same lines, I just ran the math directly. You can argue that P(C|E) decreases all you want, but since I found that the actual value is still too high, it clearly doesn't decrease fast enough.

If you want the vague counterargument, it's simple: The probability that it's a lie approaches unity. It just doesn't approach it fast enough. It's a heck of a lot less likely that someone who threatens 3^^^3 lives is telling the truth than someone who's threatening one. It's just not 3^^^3 times less likely.

I'm not sure what you mean.

If you mean what I think you mean, I'm ignoring it because I'm going with worst case. Rather than tracking how the probability of someone making the threat reduces slower than the probability of them carrying it out (which means a lower probability of them carrying it out), I'm showing that even if we assume that the probability is one, it's not enough to discount the threat.

P(Person is capable of carrying out the threat) is high enough for you to pay it off on its own. The only way for P(Person is capable of carrying out the threat | Person makes the threat) to be small enough to ignore is if P(Person makes the threat) > 1.

Let's consider two scenarios S1 and S2, with S1 having a lesser harm and S2 a greater harm. M1 and M2 are the events where a mugger presents S1 and S2 respectively. Our Bayesian update for either takes the form

$$P(S1 \mid M1) = \frac{P(M1 \mid S1)\,P(S1)}{P(M1 \mid S1)\,P(S1) + P(M1 \mid \neg S1)\,P(\neg S1)}$$

Your argument is that (edit: maybe not; I'm going to leave this comment here anyway, hope that's OK)

)

is greater than

)

and this increase offsets increases in the utility part of the expected utility equation. I'm pretty tired right now, but it seems possible that you could do some algebra and determine what the shape of the function determining this conditional probability as a function of the harms amount would have to be in order to achieve this. In any case, while it might be convenient for your updating module to have the property that this particular function possesses this particular shape, it's not obvious that it's the correct way to update. (And in particular, if you were programming an AI and you gave it what you thought was a reasonably good Bayesian updating module, it's not obvious that the module would update the way you would like on this particular problem, or that you could come up with a sufficiently rigorous description of the scenarios under which it should abandon its regular updating module in order to update in the way you would like.)
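To make the update concrete, here is a small sketch of the Bayes formula for a scenario S and the event M that a mugger presents it. The function name and the probabilities are hypothetical; it only shows how the posterior moves as P(M|¬S) grows relative to P(M|S).

```python
from fractions import Fraction

def posterior(prior_s, p_m_given_s, p_m_given_not_s):
    """P(S|M) by Bayes' theorem, given P(S), P(M|S) and P(M|not S)."""
    numerator = p_m_given_s * prior_s
    return numerator / (numerator + p_m_given_not_s * (1 - prior_s))

prior = Fraction(1, 100)

# Equal likelihoods: hearing the claim changes nothing.
print(posterior(prior, Fraction(1, 2), Fraction(1, 2)))

# Claim much more likely to be made when false: posterior drops below prior.
print(posterior(prior, Fraction(1, 10), Fraction(9, 10)) < prior)
```

Whether a real updating module would produce likelihoods of the second kind for larger harms is the open question; the sketch does not answer it.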

Even if this argument works, I'm not sure it completely solves the problem, because it seems possible that there are scenarios with extreme harms and relatively high (but still extremely minuscule) prior probabilities. I don't know how we would feel about our ideal expected utility maximizer assigning most of its weight to some runaway hypothetical.

Another thought: if a mugger had read your post and understood your argument, perhaps they would choose to mug for smaller amounts. So would that bring ) above )? What if the mugger can show you their source code and prove that for them the two probabilities are equal? (E.g. they do occasionally play this mugging strategy and when they do they do it with equal probability for any of three harm amounts, all astronomically large.)

This solution seems to implicitly assume that all real needs are pretty moderate. That is, the only plausible reason to state a meganumber-class high utility is to beat someone else's number. A matrix lord could have a different sense of time in their "home dimension", and for all I know what I do might be the culmination of a bet involving the creation and destruction of whole classes of universes.

Also, why would what the mugger says have anything to do with how big a threat the conversation is? If someone came up to me and said "give me $5 or I will do everything in my power to screw with you" without saying how much screwage would happen or what their power level is, it wouldn't seem that problematic. Why would the threat be more potent for stating that information? It seems it should limit the threat it poses. It would appear to me that what people say only has relevance within the limits of how I have established agenthood, i.e. it maxes out. If I were really on the lookout for freaky eldritch gods, I should worry that the sky would fall on me, i.e. there is no need for the sky/might to state any threat for them to be one.

> That is [it is assumed that] the only plausible reason to state a meganumber-class high utility is to beat someone else's number.

It's the only reason that doesn't cancel out, because it's the only one about which we have any knowledge. The higher the number, the more likely it is that the mugger is playing the "pick the highest number" game. You can imagine scenarios in which picking the highest number has some unknown significance, but they cancel out, in the same way as Pascal's God is canceled by the possibility of contrary gods.

> Also, why would what the mugger says have anything to do with how big a threat the conversation is?

Same question (formally) as why should failure to confirm a theory be evidence against it.

Okay, I'll play along. Let's see where this takes us. The math here is not going to be strict and I'm going to use infinities to mean "sufficiently large", but it will hopefully help us make some sense of this proposition.

- P(W) = Probability of the mechanism working
- P(T) = Probability that the mugger is being truthful
- P(M) = Probability that the mugger is "maximizing"
- h = Amount of harm threatened
- a = Amount being asked for in the mugging

We want to know if PT(h) * PW * h > a for sufficiently large h, without really specifying a.

> As the amount of harm threatened gets larger, the probability that the mugger is maximizing approaches unity.

Since you're claiming that the probability that the mugger is maximizing is dependent on the amount of harm threatened, we can rewrite P(M) as a function of h, so let's call it PM(h), such that lim h->∞ PM(h) = 1

> As the probability that the mugger is engaged in maximizing approaches unity, the likelihood that the mugger’s claim is true approaches zero.

We can compose the functions to get our relationship between PM and PT: lim h->∞ PT(PM(h)) = 0 which simplifies to: lim h->∞ PT(h) = 0

If we express the original question using this notation, we want to know if PT(h) * PW * h > a for large enough values of h. If we take our limits, we get: lim h->∞ PT(h) * PW * h > a which evaluates to 0 * PW * ∞ > a

This doesn't really work, since we have 0 * ∞, but remember that our infinity means "sufficiently large" and our zero therefore has to mean "very very low probability".

> ...the evidence that the mugger is maximizing can lower the probability below that of the same harm when no mugger has claimed it. [...] the claim can become less believable than if it hadn’t been expressed.

This part here has to mean that for sufficiently large h, PT(h) * PW * h < PW * h, which is not hard to believe since it's just adding another probability, but it also doesn't solve the original problem of telling us that we can be justified in not paying the mugger. In order to do that, we'd need some assurance that for sufficiently large h and some a, PT(h) * PW * h < a. To get that assurance, PT(h) * h would have to have some upper bound, and the theory you presented doesn't give us that.
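To show why the shape of PT(h) matters, here is a sketch with two made-up decay rates. Neither comes from the post; `fast_decay` and `slow_decay` are hypothetical stand-ins, and h is restricted to perfect squares so that the square root stays exact.

```python
from fractions import Fraction
from math import isqrt

def expected_harm(p_truthful, p_works, harm):
    """PT(h) * PW * h for a given truthfulness function PT."""
    return p_truthful(harm) * p_works * harm

def fast_decay(h):
    """Hypothetical PT(h) = 1 / h**2, so PT(h) * h shrinks toward zero."""
    return Fraction(1, h**2)

def slow_decay(h):
    """Hypothetical PT(h) = 1 / sqrt(h), so PT(h) * h grows without bound.
    Assumes h is a perfect square."""
    return Fraction(1, isqrt(h))

p_works = Fraction(1, 1000)
for h in (10**2, 10**4, 10**6):
    print(h, expected_harm(fast_decay, p_works, h),
          expected_harm(slow_decay, p_works, h))
```

Both functions go to zero as h grows, so both satisfy the limit above, yet only the first keeps PT(h) * PW * h below a fixed a. That is the missing upper bound.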

It's a fun theory to play with and I would encourage you to try to flesh it out more and see if you can find a good mathematical relationship to model it.

Since Pascal’s Mugging is well known on LW, I won’t describe it at length. Suffice it to say that a mugger tries to blackmail you by threatening enormous harm by a completely mysterious mechanism. If the harm is great enough, a sufficiently large threat eventually dominates doubts about the mechanism.

I have a reasonably simple solution to Pascal’s Mugging. In four steps, here it is:

Pascal’s Mugging induces us to look at the likelihood of the claim in abstraction from the fact that the claim is made. The paradox can be solved by breaking the probability that the mugger’s claim is true into two parts: the probability of the claim itself (its simplicity) and the probability that the mugger is truthful. Even if the probability of magical harm doesn’t decrease when the amount of harm increases, the probability that the mugger is truthful decreases continuously as the amount of harm predicted increases.

Solving the paradox in Pascal’s Mugging depends on recognizing that, if the logic were sound, it would engage muggers in a game where they try to pick the highest practicable number to represent the amount of harm. But this means that the higher the number, the more likely they are to be playing this game (undermining the logic believed sound).

But solving Pascal’s Mugging also depends on recognizing that the evidence that the mugger is maximizing can lower the probability below that of the same harm when no mugger has claimed it. It involves recognizing that, when it is almost certain that the claim is motivated by something unrelated to the claim’s truth, the claim can become less believable than if it hadn’t been expressed.

The mugger’s maximizing motivation is evidence against his claim. If someone presents you with a number representing the amount of threatened harm 3^3^3..., continued as long as a computer can print out when the printer is allowed to run for, say, a decade, you should think this result less probable than if someone had never presented you with the tome. While people are more likely to be telling the truth than to be lying, if you are sufficiently sure they are lying, their testimony counts against their claim.

The proof is the same as the proof of the (also counter-intuitive) proposition that failure to find (some definite amount of) evidence for a theory constitutes negative evidence. The mugger has elicited your search for evidence, but because of the mugger’s clear interest in falsehood, you find that evidence wanting.