# 4

For those who aren't familiar, Pascal's Mugging is a simple thought experiment that seems to demonstrate an intuitive flaw in naive expected utility maximization.  In the classic version, someone walks up to you on the street, and says, 'Hi, I'm an entity outside your current model of the universe with essentially unlimited capabilities.  If you don't give me five dollars, I'm going to use my powers to create 3^^^^3 people, and then torture them to death.'  (For those not familiar with Knuth up-arrow notation, see here).  The idea being that however small your probability is that the person is telling the truth, they can simply state a number that's grossly larger -  and when you shut up and multiply, expected utility calculations say you should give them the five dollars, along with pretty much anything else they ask for.
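To make the arithmetic of the mugging concrete, here is a minimal sketch. The function names, the 1-in-10^50 credence, the 10^100 stand-in for the number of victims, and the "one unit of disutility per victim" scale are all assumptions made for illustration; 3^^^^3 itself is far too large to compute.

```python
from fractions import Fraction


def up_arrow(a: int, n: int, b: int) -> int:
    """Knuth up-arrow with n arrows (a ^...^ b); only feasible for tiny inputs."""
    if n == 1:
        return a ** b
    if b == 0:
        return 1
    return up_arrow(a, n - 1, up_arrow(a, n, b - 1))


def naive_should_pay(p_truth: Fraction, victims: int, cost: int) -> bool:
    """Naive rule: pay whenever the expected harm averted exceeds the cost.

    Assumption: one unit of disutility per victim, on the same scale as dollars.
    """
    return p_truth * victims > cost


# 3^^3 is already 3**27; each extra arrow makes the number explode further.
tower = up_arrow(3, 2, 3)

# Hypothetical numbers: a tiny credence against an even larger claimed harm.
p = Fraction(1, 10**50)
pays = naive_should_pay(p, 10**100, cost=5)
print(tower, pays)
```

However small the credence is made, the mugger can simply name a larger number, so the naive rule keeps returning `True`.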

Intuitively, this is nonsense.  However, an AI under construction doesn't have a piece of code that lights up when exposed to nonsense.  Not unless we program one in.  And formalizing why, exactly, we shouldn't listen to the mugger is not as trivial as it sounds.  The actual underlying problem has to do with how we handle arbitrarily small probabilities.  There are a number of variations you could construct on the original problem that present the same paradoxical results.  There are also a number of simple hacks you could undertake that produce the correct results in this particular case, but these are worrying (not to mention unsatisfying) for a number of reasons.

So, with the background out of the way, let's move on to a potential approach to solving the problem which occurred to me about fifteen minutes ago while I was lying in bed with a bad case of insomnia at about five in the morning.  If it winds up being incoherent, I blame sleep deprivation.  If not, I take full credit.

Let's take a look at a new thought experiment.  Let's say someone comes up to you and tells you that they have magic powers, and will make a magic pony fall out of the sky.  Let's say that, through some bizarrely specific priors, you decide that the probability that they're telling the truth (and, therefore, the probability that a magic pony is about to fall from the sky) is exactly 1/2^100.  That's all well and good.

Now, let's say that later that day, someone comes up to you, and hands you a fair quarter and says that if you flip it one hundred times, the probability that you'll get a straight run of heads is 1/2^100.  You agree with them, chat about math for a bit, and then leave with their quarter.

I propose that the probability value in the second case, while superficially identical to the probability value in the first case, represents a fundamentally different kind of claim about reality than the first case.  In the first case, you believe, overwhelmingly, that a magic pony will not fall from the sky.  You believe, overwhelmingly, that the probability (in underlying reality, divorced from the map and its limitations) is zero.  It is only grudgingly that you inch even a tiny morsel of probability into the other hypothesis (that the universe is structured in such a way as to make the probability non-zero).

In the second case, you also believe, overwhelmingly, that you will not see the event in question (a run of heads).  However, you don't believe that the probability is zero.  You believe it's 1/2^100.  You believe that, through only the lawful operation of the universe that actually exists, you could be surprised, even if it's not likely.  You believe that if you ran the experiment in question enough times, you would probably, eventually, see a run of one hundred heads.  This is not true for the first case.  No matter how many times somebody pulls the pony trick, a rational agent is never going to get their hopes up.
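The coin half of this claim can be checked numerically. The sketch below assumes each block of 100 flips is independent of every other block; the helper name `p_at_least_one` is made up for illustration.

```python
import math

p_run = 2.0 ** -100   # chance of 100 heads in one block of 100 fair flips


def p_at_least_one(p: float, trials: float) -> float:
    """P(at least one success) over `trials` independent attempts.

    Uses log1p/expm1 so that a tiny p is not rounded away to zero.
    """
    return -math.expm1(trials * math.log1p(-p))


# Repeat the whole 100-flip experiment 2^100 times.
p_after_many_blocks = p_at_least_one(p_run, 2.0 ** 100)
print(p_after_many_blocks)   # close to 1 - 1/e
```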

I would like, at this point, to talk about the notion of metaconfidence.  When we talk to the crazy pony man, and to the woman with the coin, what we leave with are two identical numerical probabilities.  However, those numbers do not represent the sum total of the information at our disposal.  In the two cases, we have differing levels of confidence in our levels of confidence.  And, furthermore, this difference has actual ramifications for what a rational agent should expect to observe.  In other words, even from a very conservative perspective, metaconfidence intervals pay rent.  By treating the two probabilities as identical, we are needlessly throwing away information.  I'm honestly not sure if this topic has been discussed before.  I am not up to date on the literature on the subject.  If the subject has already been thoroughly discussed, I apologize for the waste of time.

Disclaimer aside, I'd like to propose that we push this a step further, and say that metaconfidence should play a role in how we calculate expected utility.  If we have a very small probability of a large payoff (positive or negative), we should behave differently when metaconfidence is high than when it is low.

From a very superficial analysis, lying in bed, metaconfidence appears to be directional.  A low metaconfidence, in the case of the pony claim, should not increase the probability that the probability of a pony dropping out of the sky is HIGHER than our initial estimate.  It works the other way as well: if we have a very high degree of confidence in some event (the sun rising tomorrow), and we get some very suspect evidence to the contrary (an ancient civilization predicting the end of the world tonight), and we update our probability downward slightly, our low metaconfidence should not make us believe that the sun is less likely to rise tomorrow than we thought.  Low metaconfidence should move our effective probability estimate against the direction of the evidence that we have low confidence in: the pony is less likely, and the sunrise is more likely, than a naive probability estimate would suggest.

So a claim like the pony claim (or Pascal's mugging), for which we have a very low estimated probability and a very low metaconfidence, should be treated as dramatically less likely to actually happen, in the real world, than a claim for which we have a low estimated probability but a very high confidence in that probability.  See the pony versus the coins.  Rationally, we can only mathematically justify so low a confidence in the crazy pony man's claims.  However, in the territory, you can add enough coins that the two probabilities are mathematically equal, and you are still more likely to get a run of heads than you are to have a pony magically drop out of the sky.  I am proposing metaconfidence weighting as a way to get around this issue, and allow our map to more accurately reflect the underlying territory.  It's not perfect, since metaconfidence is still, ultimately, calculated from our map of the territory, but it seems to me, based on my extremely brief analysis, that it is at least an improvement on the current model.

Essentially, this idea is based on the understanding that the numbers that we generate and call probability do not, in fact, correspond to the actual rules of the territory.  They are approximations, and they are perturbed by observation, and our finite data set limits the resolution of the probability intervals we can draw.  This causes systematic distortions at the extreme ends of the probability spectrum, and especially at the small end, where the scale of the distortion rises dramatically as a function of the actual probability.  I believe that the apparently absurd behavior demonstrated by an expected-utility agent exposed to Pascal's mugging is a result of these distortions.  I am proposing we attempt to compensate by filling in the missing information at the extreme ends of the bell curve with data from our model about our sources of evidence, and about the underlying nature of the territory.  In other words, this is simply a way to use our available evidence more efficiently, and I suspect that, in practice, it eliminates many of the Pascal's-mugging-style problems we encounter currently.

I apologize for not having worked the math out completely.  I would like to reiterate that it is six thirty in the morning, and I've only been thinking about the subject for about a hundred minutes.  That said, I'm not likely to get any sleep either way, so I thought I'd jot the idea down and see what you folks thought.  Having outside eyes is very helpful, when you've just had a Brilliant New Idea.


If your metaconfidence serves only to push your confidence interval further in the direction it was already at, it seems likely you're double-counting somewhere; you're weighing the same evidence (or lack thereof) to generate your metaconfidence as your confidence.

I am skeptical about "metaconfidence."

If you have all the relevant probabilities, you have all the information you need to calculate expected utilities for all possible choices--you don't need to decide on how "metaconfident" you are in these probabilities. Your probabilities may not be based on very good data, and so you might anticipate that these probabilities will change drastically when you update on new observations--I think you would call this "low metaconfidence." But your strategy for updating based on new evidence is already encoded in your priors for what evidence you expect to observe. I don't think metaconfidence is useful as a new, independent concept, apart from indicating that your probabilities are susceptible to change based on future evidence.

In particular, if "metaconfidence" considerations seem to be prompting you to distrust your probability as being either systematically too high or too low, then you should just immediately update your probability in that direction, by conservation of expected evidence. So when you say

the pony is less likely, and the sunrise is more likely, than a naive probability estimate would suggest.

then if you really believe that, you ought to preemptively update your "naive" probabilities until you no longer believe they are systematically biased.

I think a particular misconception may be leading you astray:

When we talk to the crazy pony man, and to the woman with the coin, what we leave with are two identical numerical probabilities. However, those numbers do not represent the sum total of the information at our disposal. In the two cases, we have differing levels of confidence in our levels of confidence. And, furthermore, this difference has actual ramifications for what a rational agent should expect to observe.

I think what you are referring to here is that if you repeated the 100-coin-flip experiment 2^100 times, you expect to see 100 heads on average once. But even if the pony guy tried 2^100 times, you do not expect an average of one pony to fall from the sky. You expect him to simply lack the power, and so expect him to fail every time.

The real difference here is not metaconfidence, it's that each set of 100 coin flips is completely independent of any other set of 100 coin flips. But the pony guy's first attempt to summon a pony is not independent of his second attempt to summon a pony. If he fails on his first attempt, you strongly expect him to fail on every future attempt.
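A toy model may make the independence point concrete. In the sketch below, the probability 1/2^10 stands in for 1/2^100 so the numbers stay readable, and the all-or-nothing pony model (either the man has the power and every attempt works, or he lacks it and every attempt fails) is an assumption made for illustration.

```python
from fractions import Fraction

p = Fraction(1, 2**10)   # stand-in for 1/2^100
n = 2**10                # number of repetitions of each experiment

# Coin: each block of flips is independent of the others.
coin_expected_hits = n * p
coin_p_any_hit = 1 - (1 - p) ** n

# Pony: attempts are perfectly correlated through one hidden fact.
pony_expected_hits = p * n + (1 - p) * 0
pony_p_any_hit = p


def p_power_given_failure(prior: Fraction) -> Fraction:
    """Bayes' rule in the toy model, where P(failure | power) = 0."""
    p_fail = prior * 0 + (1 - prior) * 1
    return prior * 0 / p_fail


print(coin_expected_hits, pony_expected_hits)           # same expectation
print(float(coin_p_any_hit), float(pony_p_any_hit))     # very different spread
print(p_power_given_failure(p))                         # one failure settles it
```

Both cases have the same expected number of successes; they differ only in how the attempts are correlated, which an ordinary joint probability distribution already captures.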

In more detail:

If you are assigning a probability of 1/2^100 to the pony guy's claim, you think that the claim is pretty improbable. How improbable? You are saying, "I expect that if I heard 2^100 similarly improbable, but completely independent and uncorrelated claims, on average one such claim would be true and all the rest would be false." One such completely independent claim to which you might assign the same probability might be if I predicted that "tomorrow, physicists will discover a new sentient fundamental particle that vacations in Bermuda every August." If your probability is 1/2^100, you really should expect that if you heard 2^100 equally astonishing claims, one of them would actually turn out to be true. But in particular you don't expect that if the pony guy repeated his claim 2^100 times, on average he'd be telling the truth once. Repetitions of the same claim by the same person are clearly not independent claims.

Superficially this may look different from the case of the coin flips. There, you expect that if you make 2^100 tests of the claim "flipping this coin 100 times will give all heads," it would be true on average once. To symmetrize the situation, let me make the coin claim more specific without changing it in any meaningful way: "The first 100 flips of this coin will all be heads." This, like "I can summon ponies from the sky" is a claim that is simply either true or false. When you assign a probability of 1/2^100 to the coin claim, you are again saying, "I expect that if I heard 2^100 similarly improbable, but completely independent and uncorrelated claims, on average one of them would be true." Because different flips of the same coin happen to have the property of statistical independence, similarly improbable but completely independent claims are easy to construct. For instance, "The second 100 flips of this coin will all be heads," etc.

This is all to say that I disagree that

the probability value in the second case, while superficially identical to the probability value in the first case, represents a fundamentally different kind of claim about reality than the first case.

The two probabilities represent exactly analogous claims, and there is no need for a notion of metaconfidence to distinguish them.

Jaynes deals with "metaconfidence" somewhat in Chapter 18. One of the more interesting chapters, but I haven't seen people take notice, or do anything with it. If anyone knows where people have pushed along these lines further, please let me know.

As for Pascal's mugging, I'll say three things.

First, you have to decide how you want to respond to terroristic threats in game theory terms. What's to stop the guy from playing this game again and again and again? Nothing that I see.

Second, I've recently been trying to get at the issue of how to deal with fundamentally arbitrary assertions. This particular claim of a being "outside my model of the universe" would seem to apply in spades. It's one of an infinite number of equivalent arbitrary claims. Maybe P should be zero for these.

Third, if you don't pretend you care about every sentient being equally, or that you have an unbounded amount of "give a shit" to dole out to them, "shut up and multiply" just does not apply, and the ever increasing number of people in the threat is not a trump card against every low probability assignment.

Maybe P should be zero for these.

One could model this as "there are infinite claims with 0 evidence, so given any finite resource X, I should spread that X evenly over all infinite claims - therefore I am willing to spend at most epsilon resources on any given cause." Given that listening to the person STATE the claim is WELL above epsilon resources, one has already over-invested, and can safely move on to the next infinitesimally unlikely claim.

First, you have to decide how you want to respond to terroristic threats in game theory terms. What's to stop the guy from playing this game again and again and again? Nothing that I see.

This, I think, is the biggest confounding issue on the Pascal's Mugging problem: that one's probability of being approached with that claim is not independent of how one responds to such claims. So if you're "the type to pay up" you will encounter such muggers far out of proportion to the "true" occurrence of beings that are willing and able to create a bunch of minds to torture (whatever that true frequency might be!).

In order to modify the problem to get to the "core" issue here [1], you would have to remove the part where the mugger benefits at your expense, which is what draws in the game-theoretic effects (and, of course, makes them analogous to a mugger in the first place).

So how does the problem look when the "mugger" "only" asks of you something that (as best you can determine?) doesn't cost you anything and doesn't benefit them either? But this part isn't so easily abstracted away either, because whether the mugger can conceal the fact that the proposed course of action benefits them at your expense is itself a game! Argh!

[1] the core issue being how to handle utilities that increase faster than probabilities, relative to the length of the message received

Why even bother with a mugger?

H1 = X sentient creatures will be created and tortured unless you go to the nearest window and throw a quarter into the street.

It's a conceivable hypothesis. If you set the P(H1) >0, you're subject to the same mugging without a mugger as long as you "shut up and multiply". Who needs a mugger to be mugged? Mug yourself! There's almost nothing you can't accomplish by giving credence to arbitrary ideas!

H2 = X sentient creatures will be created and tortured unless you don't go to the nearest window and throw a quarter into the street.

Oh noes! What do I do now?

If you expose pigeons to intermittent random reward, you can observe the birth of superstition: the animal will try to correlate the most disparate things to the reward. I long thought this was the kind of mechanism underlying human superstitions, but now it seems more likely that it can also be a self counterfactual mugging: if I open an umbrella in the house, I'm inviting death inside (I don't know if this belief exists outside Italy). Inviting death has such a large disutility that, even given the tiny probability that the superstition is real, you're better off not opening an umbrella in your house.

I've always heard it said that it's "bad luck" to open an umbrella inside (and live in the US), though not specifically that it invites death inside.

I always assumed it originated as a tale to get children not to open umbrellas indoors.

Oh noes! What do I do now?

Unless you have some reason to believe H1 over H2 (or vice versa), stop wasting time pondering whether to throw quarters in the street. Introducing spurious infinities that just cancel out doesn't help you reason any.
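The cancellation can be written out explicitly. This is a sketch under stated assumptions: the credence 10^-30, the harm 10^100, and the function name `expected_utility` are all invented for illustration, and H1 and H2 are given exactly equal credence.

```python
from fractions import Fraction


def expected_utility(throw_quarter: bool, p_h1: Fraction, p_h2: Fraction,
                     harm: int, quarter_cost: Fraction) -> Fraction:
    """Expected utility of an action under the two mirror-image hypotheses.

    H1: the harm happens unless you throw the quarter.
    H2: the harm happens unless you do NOT throw the quarter.
    """
    if throw_quarter:
        return -p_h2 * harm - quarter_cost
    return -p_h1 * harm


p = Fraction(1, 10**30)    # hypothetical, equal credence in H1 and H2
harm = 10**100             # stand-in for "X sentient creatures"
quarter = Fraction(1, 4)

gap = (expected_utility(True, p, p, harm, quarter)
       - expected_utility(False, p, p, harm, quarter))
print(gap)   # the enormous terms cancel; only the quarter is left
```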

Or, in other words, a belief is only useful if you have evidence to support it over a belief in the opposite direction.

So how does the problem look when the "mugger" "only" asks of you something that (as best you can determine?) doesn't cost you anything and doesn't benefit them either?

Attention is a scarce resource. Listening to the "mugger" already costs me something.

True, but you don't know what the person (not yet known to be a mugger) is saying until you listen! And you can't quite avoid that cost unless you avoid all communication with strangers (though some consider that route reasonable).

I suppose, on further reflection, that the metaconfidence concept is simply a heuristic for how we can more accurately compute our probabilities. What I'm actually saying is that, when presented with very suspicious, contradictory, or weak evidence, the probability that Bayes' theorem computes is not the real value: it is an approximation of a value that is very probably less than or equal to the calculated value, and the probability distribution below the calculated value, on an asymptotic scale, grows wider exponentially and unboundedly as an inverse function of the size of the calculated probability. Put another way, I think there's a rational, justifiable reason to exponentially discount evidence at the small end of the probability spectrum.

So what's the membership test for "suspicious, contradictory, or weak evidence" and what update rule should be used for that kind of evidence if Bayes is biased when dealing with it?

Nice post. I think directionality of metaconfidence is a consequence of Markov's inequality.

Can you elaborate?

Markov's inequality says that, if 0 <= X <= 1, then

P[X >= a] <= E[X] / a

and that

P[X <= a] <= E[1-X] / (1-a)

These are directional statements, hence directionality of metaconfidence.
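Both bounds can be checked on samples. In the sketch below, the Beta(2, 50) distribution is an arbitrary stand-in for an uncertain probability X in [0, 1], and the helper names are made up; the bounds are evaluated against the empirical distribution of the samples.

```python
import random

random.seed(0)
# Hypothetical distribution over an uncertain probability X in [0, 1].
samples = [random.betavariate(2, 50) for _ in range(100_000)]
mean_x = sum(samples) / len(samples)


def upper_tail(a: float):
    """Empirical P[X >= a] and the Markov bound E[X] / a."""
    freq = sum(x >= a for x in samples) / len(samples)
    return freq, mean_x / a


def lower_tail(a: float):
    """Empirical P[X <= a] and the bound E[1 - X] / (1 - a)."""
    freq = sum(x <= a for x in samples) / len(samples)
    return freq, (1 - mean_x) / (1 - a)


for a in (0.05, 0.1, 0.5):
    print(a, upper_tail(a), lower_tail(a))
```

With a small mean, the upper-tail bound is tight while the lower-tail bound is nearly vacuous, which is one way to read the asymmetry.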

The two scenarios are more different than you present them. In one case, the probability is that, in any given 100 coin flips, all of them come up heads. In the other, it's that, of all possible worlds, you live in one where ponies fall from the sky. So the second is a probability about the type of universe we live in.

So, when you say

No matter how many times somebody pulls the pony trick, a rational agent is never going to get their hopes up.

It's not accurate. Rather, your confidence in the pony man should be exactly as high as your confidence in getting all heads the first time you flip the coin 100 times, no matter how many times he makes that claim.

Isn't your whole argument merely a rediscovery of the mathematical concept of uncertainty? You admit at the outset that the priors for the magically falling pony must be bizarrely specific to render the exact probability of 1/2^100. In contrast, given a fair coin and a hundred coinflips, there's nothing at all bizarre about the fact that the probability that you'll get a straight run of heads is exactly 1/2^100. It's just math. The problem, of course, is that the uncertainty as to whether magically falling ponies are possible isn't a question that is amenable to quantitative study, given that such events so seldom repeat... Even so, my intuition is exactly opposite to yours: I know how bizarrely unlikely it would be to get a hundred heads in a row, so if that were to happen (under circumstances that ruled out foul play) I would be astonished. If a magically falling pony appeared I would be likewise astonished, but less so, since my model of reality is more uncertain to me than my model of abstract math.

That said, the idea of "metaconfidence" reminds me somewhat of Yvain's post Confidence levels inside and outside an argument.

Edited to add: And the concept of falling ponies reminds me of this.

Wait, what?

I don't see how your certainty in your model of abstract math is even relevant.

If I flip a coin a hundred times and it comes up heads each time, that does not in any way shake my confidence that the probability of 100 independent binary choices resulting in a specific bitstring is 1/2^100. What it does do is shake my confidence that flipping a coin a hundred times is a physical act that can be reliably modeled as 100 independent binary choices.

That second thing isn't a statement about abstract mathematics at all, it's a statement about the physical world... and one I'm significantly less confident of than I am in the continued nonexistence of magically falling ponies.

What sequences of heads and tails would NOT shake your confidence that flipping a coin is reliable enough to model independent binary choices? Any sequence with a prior probability of 1/2^100? 1 tails followed by 99 heads? 49H 1T 50H? Alternating heads and tails? 50 heads first, followed by 50 tails? Alternating chains of heads and tails of equal length? Exactly 50 heads and 50 tails, regardless of order?

If you flip that coin 100 times, you will get a sequence with prior probability of roughly 8*10^-31.
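That figure, and the difference between naming one sequence and naming a whole class of sequences, can be checked directly. This is a small sketch assuming a fair coin and independent flips; the helper name `p_exactly` is made up.

```python
import math
from fractions import Fraction

N_FLIPS = 100
p_single = Fraction(1, 2**N_FLIPS)   # any one fully specified sequence


def p_exactly(k_heads: int) -> Fraction:
    """Chance of exactly k_heads heads in N_FLIPS fair flips, in any order."""
    return math.comb(N_FLIPS, k_heads) * p_single


print(float(p_single))        # roughly 7.9e-31
print(float(p_exactly(50)))   # "exactly 50 heads" covers a huge set of sequences
```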

What sequences of heads and tails would NOT shake your confidence that flipping a coin is reliable enough to model independent binary choices?

One that wasn't specifically identified ahead of time.

Fair enough. How many different sequences can be included in a given 'called set'? If I said "99 heads out of 100", then I'm identifying 100 different sequences.

In the end, though, I'm trying to set a ceiling: What's the most likely prediction I could make which would cause you to reevaluate the math behind odds? So far the lower limit is 1/2^100. Would you accept the call that at least 80% of the coin flips would be heads? My powers of telekinetic manipulation of coin flips are limited, you see, and both exhausting and unreliable.

It depends a lot.

For example, if I approached you and offered you a bet that you could not predict the flip of a series of coins, and you got it right three times in a row, that wouldn't particularly shake my confidence that each coin-flip could be modeled as an independent binary choice with equal chances on both sides.
OTOH if you approached me and offered me a bet that you could not predict the flip of a series of coins, and you got it right three times in a row, that would indeed shake my confidence.

But if I leave all the real-world stuff out of it, sure, a coin that comes up heads 80% of the time on, say, 100 flips would certainly make me suspicious.

Here's a bet then- Flip the coin nearest to you 100 times, and report the results. If you get 79 or fewer heads, then I will donate \$20US to the cause of your choice (which may be you, personally). If you get 80 or more heads, then consider the possibility that I have the ability to alter the results of coin flips in a way which is unexplained by modern physics.

Or maybe I'm willing to gamble \$20US on a very small chance (~half of six standard deviations, if I have the math right) that I can mindscrew you.
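As a rough check on that estimate, the sketch below computes the exact chance of 80 or more heads in 100 fair, independent flips and compares it with the one-sided normal tail at the same number of standard deviations. The variable names are made up for illustration.

```python
import math
from fractions import Fraction

N, THRESHOLD = 100, 80

# Exact binomial tail: P(at least 80 heads in 100 fair flips).
tail = Fraction(sum(math.comb(N, k) for k in range(THRESHOLD, N + 1)), 2**N)

mean = N / 2
std = math.sqrt(N) / 2
z = (THRESHOLD - mean) / std   # how many standard deviations above the mean

# One-sided normal tail beyond z standard deviations, for comparison.
normal_tail = 0.5 * math.erfc(z / math.sqrt(2))

print(z, float(tail), normal_tail)
```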

I used to date a girl who had a favorite card trick: she would hand you a deck of cards, ask you to pick one, and say "is it the three of clubs?"

Her theory was that she'd be wrong most of the time, but when she was right it would be really impressive.

It would be a lot more than \$50 more impressive (the few times it works) if she said "I bet you a dollar that it's the three of clubs."

I was also considering the 'cider in my ear' angle. Just because you don't see any possible way that I could rig the bet, the fact that I proposed it is an indication above baseline that I might have.

I instantly thought about that, too.

If I flip a coin 100 times and get all heads, there are many, many hypotheses that suddenly get a lot more plausible. Perhaps the coin is strongly biased. In fact, a weak prior of slight bias will, post-update, seem much more plausible than the fair coin. Perhaps it's a double headed coin, and in my inspection I failed to notice that. It seems vanishingly unlikely that I would miss that on repeated inspection... but I'm still more inclined to believe that, than a mathematically fair coin coming up heads every time. Perhaps I've unconsciously learned how to predictably flip a coin. Perhaps I've failed horribly at counting, and actually only flipped the coin 10 times. Perhaps any of the above are combined with my not noticing a tails or ten appear in the result string.
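A two-hypothesis version of this update can be sketched as follows. The one-in-a-million prior on a double-headed coin is an invented number; only its rough size relative to 1/2^100 matters.

```python
from fractions import Fraction

# Hypothetical priors over what kind of coin this is.
priors = {
    "fair": Fraction(999_999, 1_000_000),
    "double_headed": Fraction(1, 1_000_000),
}
# Likelihood of seeing 100 heads in 100 flips under each hypothesis.
likelihoods = {
    "fair": Fraction(1, 2**100),
    "double_headed": Fraction(1),
}


def posterior(priors, likelihoods):
    """Bayes' rule over a finite set of mutually exclusive hypotheses."""
    evidence = sum(priors[h] * likelihoods[h] for h in priors)
    return {h: priors[h] * likelihoods[h] / evidence for h in priors}


post = posterior(priors, likelihoods)
print(float(post["fair"]), float(post["double_headed"]))
```

Any alternative whose prior is much larger than 1/2^100 and which predicts the run of heads will swamp the fair-coin hypothesis after the update.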

In other words, exceedingly unlikely events will make me doubt my sanity before they make me start doubting math.

Agreed with all of that. (Not sure if you think we disagree on any of it.)

I assumed you did. I just thought it worth explicitly adding to the discussion that considering only some extraordinarily weird ideas when discussing extraordinarily weird events is a form of bias that seems to run rampant in hypotheticals around here. It's not just the one aspect you mentioned where our confidence should be shaken by such a result.

Ah, gotcha... sure, agreed on all counts.

I may be wrong here.

What does an arbitrarily powerful entity need with (a starship) \$5? \$5 once won't break me, but it sets a precedent. You should never negotiate with terrorists as though they are playing a fair game; your probability that this is a lie should scale faster than the claim. That includes the possibility that they'll do it anyway even if you relent, or that they will make the same demand over and over until you have nothing left to pay them with.

"You believe, overwhelmingly, that the probability (in underlying reality, divorced from the map and its limitations) is zero. It is only grudgingly that you inch even a tiny morsel of probability into the other hypothesis (that the universe is structured in such a way as to make the probability non-zero)."

There is a definitive, non-zero probability that, when throwing a tennis ball at a wall, it will simply "quantum tunnel" through it and fall out the other side. There's a definitive, non-zero probability that random chance will leave all the air molecules in the room on the east side, suffocating anyone caught on the west.

It's worth noting that larger effects are exponentially less likely - just like getting heads once is 50%, but twice is only 25%, and three times is 12.5%.

Therefore, there's a minimum computable plausibility for pretty much any claim. Your prior should never be zero, and (if you are a strict rationalist AI) you should presumably never want it to be zero.

(Your further point about metaconfidence was actually covered by GiveWell on the site previously. It's also worth noting that what I've said here doesn't prevent muggings at all; it just establishes that Pascal's Mugging has a definitive non-zero probability by our current understanding of the universe.)

Don't update your priors on the basis of unreliable sources. If someone walks up to you and says "This is a fair coin, and the odds of it coming up heads the next 100 times in a row are 1/2^20, provided it is flipped 100 times in a row.", you don't update based on that, at all. That statement does not make a run of heads more or less likely.

Similarly, when someone makes a claim about magical falling ponies, I don't update my estimate of the odds of magical flying ponies unless their claim is credible.

As a check- I say that the (fair) coin nearest to you will turn up heads the next 100 times it is flipped, as of the time you read this. Do you think the odds of that happening are greater, less than, or exactly equal to 1/2^100? Did you update your estimate based on my claim? Why or why not?

(You could suggest that the odds of a particular post being by Omega, who has specific unexplainable knowledge about the future, are sigma, and so the odds are 1/2^100+sigma. Now compare that with the odds that a particular post is by Omega, and that he is lying. That makes the odds 1/2^100 + sigma - sigma; the odds provided that neither alternative is true, plus the odds that all-heads will be forced, plus the odds that not-all-heads will be forced. Which sigma is greater, or are they exactly equal? On what basis do you judge that?)

So a claim like the pony claim (or Pascal's mugging), for which we have a very low estimated probability and a very low metaconfidence, should be treated as dramatically less likely to actually happen, in the real world, than a claim for which we have a low estimated probability but a very high confidence in that probability.

So? Unless it's on the order of 1/3^^^3, it doesn't matter how unlikely it is, and while my metaconfidence may be low for the exact value (insomuch as that means anything), it's clearly more likely than that. The human genome only takes about eight megabytes. If you want 3^^^3 of them, you'll have to get more general than "human", although only if you're certain that killing the same one 3^^^3 times doesn't count. Even if you do so, there's no way a generic pattern for producing sapient entities takes that much information.

I suspect that, in practice, it eliminates many of the Pascal's-mugging-style problems we encounter currently.

When have you last encountered one?

Personally, I think the bigger problem is a slightly related paradox in that if you take decision A, it could kill 3^^^3 people, but if you take decision B, it could kill 3^^^^3 people, but then again A could kill 3^^^^^3 people etc., so you could never make a decision. This would come up on every decision you make.

So? Unless it's on the order of 1/3^^^3, it doesn't matter how unlikely it is, and while my metaconfidence may be low for the exact value (insomuch as that means anything), it's clearly more likely than that.

Actually, you're absolutely right. I don't think it's possible to resist Pascal's mugging by discounting probabilities at the edge. I thought, initially, that you could use busy-beaver to put an upper bound on the size of claim they could express, and simply discount, at the extreme end, according to 1/BB(message length). Busy beaver would be larger than any normal mathematical function you could express in the message. Then it occurred to me that the mugger has a trivial solution:

"If you don't give me five dollars, I'm going to create (the busy beaver function of the length of this message's bitstring factorial) people, and torture them to death."

Plus, busy beaver is uncomputable, so that's not exactly trivially implementable.

EDIT: I should point out that doing what I initially proposed would be mathematical nonsense with no justification. I was just checking to see if it was possible in the trivial case.

There's a post from GiveWell along these lines that you'll find very informative.