This article is the first in a sequence that will consider situations where probability estimates are not, by themselves, adequate to make rational decisions. This one introduces a "meta-probability" approach, borrowed from E. T. Jaynes, and uses it to analyze a gambling problem. This situation is one in which reasonably straightforward decision-theoretic methods suffice. Later articles introduce increasingly problematic cases.

## A surprising decision anomaly

Let’s say I’ve recruited you as a subject in my thought experiment. I show you three cubical plastic boxes, about eight inches on a side. There’s two green ones—identical as far as you can see—and a brown one. I explain that they are gambling machines: each has a faceplate with a slot that accepts a dollar coin, and an output slot that will return either two or zero dollars.

I unscrew the faceplates to show you the mechanisms inside. They are quite simple. When you put a coin in, a wheel spins. It has a hundred holes around the rim. Each can be blocked, or not, with a teeny rubber plug. When the wheel slows to a halt, a sensor checks the nearest hole, and dispenses either zero or two coins.

The brown box has 45 holes open, so it has probability p=0.45 of returning two coins. One green box has 90 holes open (p=0.9) and the other has none (p=0). I let you experiment with the boxes until you are satisfied these probabilities are accurate (or very nearly so).

Then, I screw the faceplates back on, and put all the boxes in a black cloth sack with an elastic closure. I squidge the sack around, to mix up the boxes inside, and you reach in and pull one out at random.

I give you a hundred one-dollar coins. You can put as many into the box as you like. You can keep as many coins as you don’t gamble, plus whatever comes out of the box.

If you pulled out the brown box, there’s a 45% chance of getting $2 back, and the expected value of putting a dollar in is $0.90. Rationally, you should keep the hundred coins I gave you, and not gamble.

If you pulled out a green box, there’s a 50% chance that it’s the one that pays two dollars 90% of the time, and a 50% chance that it’s the one that never pays out. So, overall, there’s a 45% chance of getting $2 back.

Still, rationally, you should put some coins in the box. If it pays out at least once, you should gamble all the coins I gave you, because you know that you got the 90% box, and you’ll nearly double your money.

If you get nothing out after a few tries, you’ve probably got the never-pay box, and you should hold onto the rest of your money. (Exercise for readers: how many no-payouts in a row should you accept before quitting?)

What’s interesting is that, when you have to decide whether or not to gamble your first coin, the probability is exactly the same in the two cases (p=0.45 of a $2 payout). However, the rational course of action is different. What’s up with that?

Here, a single probability value fails to capture everything you **know** about an uncertain event. And, it’s a case in which that failure matters.

Such limitations have been recognized almost since the beginning of probability theory. Dozens of solutions have been proposed. In the rest of this article, I’ll explore one. In subsequent articles, I’ll look at the problem more generally.

## Meta-probability

To think about the green box, we have to reason about *the probabilities of probabilities*. We could call this **meta-probability**, although that’s not a standard term. Let’s develop a method for it.

Pull a penny out of your pocket. If you flip it, what’s the probability it will come up heads? 0.5. Are you sure? Pretty darn sure.

What’s the probability that my local junior high school sportsball team will win its next game? I haven’t a ghost of a clue. I don’t know anything even about professional sportsball, and certainly nothing about “my” team. In a match between two teams, I’d have to say the probability is 0.5.

My girlfriend asked me today: “Do you think Raley’s will have dolmades?” Raley’s is our local supermarket. “I don’t know,” I said. “I guess it’s about 50/50.” But unlike sportsball, I know something about supermarkets. A fancy Whole Foods is very likely to have dolmades; a 7-11 almost certainly won’t; Raley’s is somewhere in between.

How can we model these three cases? One way is by assigning probabilities to each possible probability between 0 and 1. In the case of a coin flip, 0.5 is much more probable than any other probability:

We can’t be *absolutely sure* the probability is 0.5. In fact, it’s almost certainly not *exactly* that, because coins aren’t perfectly symmetrical. And, there’s a very small probability that you’ve been given a tricky penny that comes up tails only 10% of the time. So I’ve illustrated this with a tight Gaussian centered around 0.5.

In the sportsball case, I have no clue what the odds are. They might be anything between 0 to 1:

In the Raley’s case, I have *some* knowledge, and extremely high and extremely low probabilities seem unlikely. So the curve looks something like this:

Each of these curves averages to a probability of 0.5, but they express different degrees of confidence in that probability.

Now let’s consider the gambling machines in my thought experiment. The brown box has a curve like this:

Whereas, when you’ve chosen one of the two green boxes at random, the curve looks like this:

Both these curves give an average probability of 0.45. However, a rational decision theory has to distinguish between them. Your optimal strategy in the two cases is quite different.

With this framework, we can consider another box—a blue one. It has a fixed payout probability somewhere between 0 and 0.9. I put a random number of plugs in the holes in the spinning disk—leaving between 0 and 90 holes open. I used a noise diode to choose; but you don’t get to see what the odds are. Here the probability-of-probability curve looks rather like this:

This isn’t quite right, because 0.23 and 0.24 are much more likely than 0.235—the plot should look like a comb—but for strategy choice the difference doesn’t matter.

What *is* your optimal strategy in this case?

As with the green box, you ought to spend some coins gathering information about what the odds are. If your estimate of the probability is less than 0.5, when you get confident enough in that estimate, you should stop. If you’re confident enough that it’s more than 0.5, you should continue gambling.

If you enjoy this sort of thing, you might like to work out what the exact optimal algorithm is.

In the next article in this sequence, we’ll look at some more complicated and interesting cases.

## Further reading

The “meta-probability” approach I’ve taken here is the A_{p} distribution of E. T. Jaynes. I find it highly intuitive, but it seems to have had almost no influence or application in practice. We’ll see later that it has some problems, which might explain this.

The green and blue boxes are related to “multi-armed bandit problems.” A “one-armed bandit” is a casino slot machine, which has defined odds of payout. A multi-armed bandit is a hypothetical generalization with several arms, each of which may have different, unknown odds. In general, you ought to pull each arm several times, to gain information. The question is: what is the optimal algorithm for deciding which arms to pull how many times, given the payments you have received so far?

If you read the Wikipedia article and follow some links, you’ll find the concepts you need to find the optimal green and blue box strategies. But it might be more fun to try on your own first! The green box is simple. The blue box is harder, but the same general approach applies.

Wikipedia also has an accidental list of formal approaches for problems where ordinary probability theory fails. This is far from complete, but a good starting point for a browser tab explosion.

## Acknowledgements

Thanks to Rin’dzin Pamo, St. Rev., Matt_Simpson, Kaj_Sotala, and Vaniver for helpful comments on drafts. Of course, they may disagree with my analyses, and aren’t responsible for my mistakes!

It may be helpful to read some related posts (linked by lukeprog in a comment on this post): Estimate stability, and Model Stability in Intervention Assessment, which comments on Why We Can't Take Expected Value Estimates Literally (Even When They're Unbiased). The first of those motivates the A_p (meta-probability) approach, the second uses it, and the third explains intuitively why it's important in practice.