For those who aren't familiar, Pascal's Mugging is a simple thought experiment that seems to demonstrate an intuitive flaw in naive expected utility maximization. In the classic version, someone walks up to you on the street and says, 'Hi, I'm an entity outside your current model of the universe with essentially unlimited capabilities. If you don't give me five dollars, I'm going to use my powers to create 3^^^^3 people, and then torture them to death.' (For those not familiar with Knuth up-arrow notation, see here.) The idea is that however small your probability that the person is telling the truth, they can simply state a number that's grossly larger - and when you shut up and multiply, expected utility calculations say you should give them the five dollars, along with pretty much anything else they ask for.
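The shut-up-and-multiply failure is easy to make concrete. Here's a minimal sketch; the specific numbers (a probability of 10^-30, a stake of 10^100 lives) are illustrative assumptions, chosen only because even these modest stand-ins for 3^^^^3 are enough to swamp the calculation:

```python
def expected_utility(probability, payoff):
    """Naive expected utility: probability times payoff, nothing else."""
    return probability * payoff

# Assign the mugger's claim an absurdly tiny probability...
p_mugger_truthful = 1e-30

# ...but the mugger names a stake far larger than its reciprocal.
# (3^^^^3 dwarfs this; even a googol of lives makes the point.)
lives_at_stake = 1e100

cost_of_paying = 5  # utility lost by handing over five dollars

# The expected loss from refusing still dwarfs the cost of paying,
# so the naive maximizer hands over the money.
expected_loss_if_refuse = expected_utility(p_mugger_truthful, lives_at_stake)
assert expected_loss_if_refuse > cost_of_paying
```

Whatever probability you assign, the mugger just names a bigger number; nothing in the formula pushes back.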
Intuitively, this is nonsense. However, an AI under construction doesn't have a piece of code that lights up when exposed to nonsense. Not unless we program one in. And formalizing why, exactly, we shouldn't listen to the mugger is not as trivial as it sounds. The actual underlying problem has to do with how we handle arbitrarily small probabilities. There are a number of variations you could construct on the original problem that present the same paradoxical results. There are also a number of simple hacks you could undertake that produce the correct results in this particular case, but these are worrying (not to mention unsatisfying) for a number of reasons.
So, with the background out of the way, let's move on to a potential approach to solving the problem which occurred to me about fifteen minutes ago while I was lying in bed with a bad case of insomnia at about five in the morning. If it winds up being incoherent, I blame sleep deprivation. If not, I take full credit.
Let's take a look at a new thought experiment. Let's say someone comes up to you and tells you that they have magic powers, and will make a magic pony fall out of the sky. Let's say that, through some bizarrely specific priors, you decide that the probability that they're telling the truth (and, therefore, the probability that a magic pony is about to fall from the sky) is exactly 1/2^100. That's all well and good.
Now, let's say that later that day, someone comes up to you, and hands you a fair quarter and says that if you flip it one hundred times, the probability that you'll get a straight run of heads is 1/2^100. You agree with them, chat about math for a bit, and then leave with their quarter.
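That 1/2^100 figure, unlike the pony one, is straightforward arithmetic you can check, along with the repeated-trials intuition that comes up again below:

```python
from fractions import Fraction

# Probability of one hundred consecutive heads with a fair coin.
p_run = Fraction(1, 2) ** 100
assert p_run == Fraction(1, 2**100)

# For independent repetitions, the expected number of attempts before
# the first all-heads run is the reciprocal: 2^100 attempts.
expected_attempts = 1 / p_run

print(float(p_run))       # roughly 7.9e-31
print(expected_attempts)  # 1267650600228229401496703205376
```

Astronomically many attempts, but a finite, lawful number; that finiteness is exactly what the pony case lacks.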
I propose that the probability value in the second case, while superficially identical to the probability value in the first case, represents a fundamentally different kind of claim about reality than the first case. In the first case, you believe, overwhelmingly, that a magic pony will not fall from the sky. You believe, overwhelmingly, that the probability (in underlying reality, divorced from the map and its limitations) is zero. It is only grudgingly that you inch even a tiny morsel of probability into the other hypothesis (that the universe is structured in such a way as to make the probability non-zero).
In the second case, you also believe, overwhelmingly, that you will not see the event in question (a run of heads). However, you don't believe that the probability is zero. You believe it's 1/2^100. You believe that, through only the lawful operation of the universe that actually exists, you could be surprised, even if it's not likely. You believe that if you ran the experiment in question enough times, you would probably, eventually, see a run of one hundred heads. This is not true for the first case. No matter how many times somebody pulls the pony trick, a rational agent is never going to get their hopes up.
I would like, at this point, to talk about the notion of metaconfidence. When we talk to the crazy pony man, and to the woman with the coin, what we leave with are two identical numerical probabilities. However, those numbers do not represent the sum total of the information at our disposal. In the two cases, we have differing levels of confidence in our levels of confidence. And, furthermore, this difference has actual ramifications for what a rational agent should expect to observe. In other words, even from a very conservative perspective, metaconfidence intervals pay rent. By treating the two probabilities as identical, we are needlessly throwing away information. I'm honestly not sure if this topic has been discussed before. I am not up to date on the literature on the subject. If the subject has already been thoroughly discussed, I apologize for the waste of time.
Disclaimer aside, I'd like to propose that we push this a step further, and say that metaconfidence should play a role in how we calculate expected utility. If we have a very small probability of a large payoff (positive or negative), we should behave differently when metaconfidence is high than when it is low.
From a very superficial analysis, lying in bed, metaconfidence appears to be directional. Low metaconfidence in the pony claim should not make us think the probability of a pony dropping out of the sky is HIGHER than our initial estimate. It also works the other way: if we have a very high degree of confidence in some event (the sun rising tomorrow), and we get some very suspect evidence to the contrary (an ancient civilization predicting the end of the world tonight), and we update our probability downward slightly, our low metaconfidence should not make us believe that the sun is less likely to rise tomorrow than we thought. Low metaconfidence should move our effective probability estimate against the direction of the evidence we have low confidence in: the pony is less likely, and the sunrise is more likely, than a naive probability estimate would suggest.
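I haven't worked out the math, but one crude way to formalize that directional pull (purely a hypothetical sketch, not a worked-out proposal) is to treat metaconfidence as a weight in [0, 1] that shrinks the estimate back toward whatever we believed before the suspect evidence arrived:

```python
def metaconfidence_adjusted(prior, estimate, metaconfidence):
    """
    Hypothetical directional adjustment: with metaconfidence 1 we trust
    the evidence-based estimate fully; with metaconfidence 0 we fall
    back to the prior. Low metaconfidence therefore always pulls AGAINST
    the direction the suspect evidence moved us, never further along it.
    """
    return prior + metaconfidence * (estimate - prior)

# Pony claim: the prior (before the stranger spoke) was essentially zero,
# so low metaconfidence drags the effective probability below 2^-100.
pony = metaconfidence_adjusted(prior=0.0, estimate=2**-100, metaconfidence=0.01)
assert pony < 2**-100

# Sunrise: the prior was near one; suspect doomsday evidence nudged the
# estimate down, and low metaconfidence pulls it most of the way back up.
sunrise = metaconfidence_adjusted(prior=0.999999, estimate=0.99, metaconfidence=0.01)
assert sunrise > 0.99
```

Note this captures both directions with one rule: the adjustment opposes whichever way the low-confidence evidence moved us, which is exactly the asymmetry described above.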
So a claim like the pony claim (or Pascal's mugging), in which you have both a very low estimated probability and a very low metaconfidence, should be treated as dramatically less likely to actually happen, in the real world, than a case in which you have a low estimated probability but very high confidence in that probability. See the pony versus the coins. Rationally, we can only mathematically justify so low a confidence in the crazy pony man's claims; but in the territory, you can add enough coins that the two probabilities are mathematically equal, and you are still more likely to get a run of heads than to have a pony magically drop out of the sky. I am proposing metaconfidence weighting as a way to get around this issue and allow our map to more accurately reflect the underlying territory. It's not perfect, since metaconfidence is still, ultimately, calculated from our map of the territory, but it seems to me, based on my extremely brief analysis, to be at least an improvement on the current model.
Essentially, this idea is based on the understanding that the numbers we generate and call probability do not, in fact, correspond to the actual rules of the territory. They are approximations, they are perturbed by observation, and our finite data set limits the resolution of the probability intervals we can draw. This causes systematic distortions at the extreme ends of the probability spectrum, especially at the small end, where the scale of the distortion rises dramatically as a function of the actual probability. I believe that the apparently absurd behavior demonstrated by an expected-utility agent exposed to Pascal's mugging is a result of these distortions. I am proposing we attempt to compensate by filling in the missing information at the extreme ends of the bell curve with data from our model about our sources of evidence, and about the underlying nature of the territory. In other words, this is simply a way to use our available evidence more efficiently, and I suspect that, in practice, it eliminates many of the Pascal's-mugging-style problems we encounter currently.
I apologize for not having worked the math out completely. I would like to reiterate that it is six thirty in the morning, and I've only been thinking about the subject for about a hundred minutes. That said, I'm not likely to get any sleep either way, so I thought I'd jot the idea down and see what you folks thought. Having outside eyes is very helpful, when you've just had a Brilliant New Idea.