This article is the first in a sequence that will consider situations where probability estimates are not, by themselves, adequate to make rational decisions. This one introduces a "meta-probability" approach, borrowed from E. T. Jaynes, and uses it to analyze a gambling problem. This situation is one in which reasonably straightforward decision-theoretic methods suffice. Later articles introduce increasingly problematic cases.

## A surprising decision anomaly

Let’s say I’ve recruited you as a subject in my thought experiment. I show you three cubical plastic boxes, about eight inches on a side. There are two green ones (identical as far as you can see) and a brown one. I explain that they are gambling machines: each has a faceplate with a slot that accepts a dollar coin, and an output slot that will return either two or zero dollars.

I unscrew the faceplates to show you the mechanisms inside. They are quite simple. When you put a coin in, a wheel spins. It has a hundred holes around the rim. Each can be blocked, or not, with a teeny rubber plug. When the wheel slows to a halt, a sensor checks the nearest hole, and dispenses either zero or two coins.

The brown box has 45 holes open, so it has probability p=0.45 of returning two coins. One green box has 90 holes open (p=0.9) and the other has none (p=0). I let you experiment with the boxes until you are satisfied these probabilities are accurate (or very nearly so).

Then, I screw the faceplates back on, and put all the boxes in a black cloth sack with an elastic closure. I squidge the sack around, to mix up the boxes inside, and you reach in and pull one out at random.

I give you a hundred one-dollar coins. You can put as many into the box as you like. You can keep as many coins as you don’t gamble, plus whatever comes out of the box.

If you pulled out the brown box, there’s a 45% chance of getting $2 back, and the expected value of putting a dollar in is $0.90. Rationally, you should keep the hundred coins I gave you, and not gamble.

If you pulled out a green box, there’s a 50% chance that it’s the one that pays two dollars 90% of the time, and a 50% chance that it’s the one that never pays out. So, overall, there’s a 45% chance of getting $2 back.

Still, rationally, you should put some coins in the box. If it pays out at least once, you should gamble all the coins I gave you, because you know that you got the 90% box, and you’ll nearly double your money.

If you get nothing out after a few tries, you’ve probably got the never-pay box, and you should hold onto the rest of your money. (Exercise for readers: how many no-payouts in a row should you accept before quitting?)
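As a starting point for that exercise, here is a minimal sketch of the Bayesian update involved. It assumes the 50/50 prior over the two green boxes described above; the function names are illustrative and not taken from any library.

```python
def posterior_good_box(failures: int, prior_good: float = 0.5,
                       p_good: float = 0.9) -> float:
    """P(this is the 90% box | `failures` coins in a row returned nothing)."""
    # Likelihood of the observed losing streak under each hypothesis.
    like_good = (1.0 - p_good) ** failures  # the 90% box loses with p = 0.1 each time
    like_bad = 1.0                          # the never-pay box always loses
    evidence = prior_good * like_good + (1.0 - prior_good) * like_bad
    return prior_good * like_good / evidence


def next_payout_probability(failures: int) -> float:
    """Chance that the next coin pays out, given the losing streak so far."""
    return posterior_good_box(failures) * 0.9


for k in range(5):
    print(k, round(posterior_good_box(k), 4), round(next_payout_probability(k), 4))
```

Each no-payout cuts the odds in favor of the 90% box by a factor of ten, so the chance that the next coin pays falls quickly. Whether one more coin is still worth it also depends on how many coins you have left to exploit a good box.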

What’s interesting is that, when you have to decide whether or not to gamble your first coin, the probability is exactly the same in the two cases (p=0.45 of a $2 payout). However, the rational course of action is different. What’s up with that?

Here, a single probability value fails to capture everything you **know** about an uncertain event. And, it’s a case in which that failure matters.

Such limitations have been recognized almost since the beginning of probability theory. Dozens of solutions have been proposed. In the rest of this article, I’ll explore one. In subsequent articles, I’ll look at the problem more generally.

## Meta-probability

To think about the green box, we have to reason about *the probabilities of probabilities*. We could call this **meta-probability**, although that’s not a standard term. Let’s develop a method for it.

Pull a penny out of your pocket. If you flip it, what’s the probability it will come up heads? 0.5. Are you sure? Pretty darn sure.

What’s the probability that my local junior high school sportsball team will win its next game? I haven’t a ghost of a clue. I don’t know anything even about professional sportsball, and certainly nothing about “my” team. In a match between two teams, I’d have to say the probability is 0.5.

My girlfriend asked me today: “Do you think Raley’s will have dolmades?” Raley’s is our local supermarket. “I don’t know,” I said. “I guess it’s about 50/50.” But unlike sportsball, I know something about supermarkets. A fancy Whole Foods is very likely to have dolmades; a 7-11 almost certainly won’t; Raley’s is somewhere in between.

How can we model these three cases? One way is by assigning probabilities to each possible probability between 0 and 1. In the case of a coin flip, 0.5 is much more probable than any other probability:

We can’t be *absolutely sure* the probability is 0.5. In fact, it’s almost certainly not *exactly* that, because coins aren’t perfectly symmetrical. And, there’s a very small probability that you’ve been given a tricky penny that comes up tails only 10% of the time. So I’ve illustrated this with a tight Gaussian centered around 0.5.

In the sportsball case, I have no clue what the odds are. They might be anything between 0 and 1:

In the Raley’s case, I have *some* knowledge, and extremely high and extremely low probabilities seem unlikely. So the curve looks something like this:

Each of these curves averages to a probability of 0.5, but they express different degrees of confidence in that probability.

Now let’s consider the gambling machines in my thought experiment. The brown box has a curve like this:

Whereas, when you’ve chosen one of the two green boxes at random, the curve looks like this:

Both these curves give an average probability of 0.45. However, a rational decision theory has to distinguish between them. Your optimal strategy in the two cases is quite different.
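To make the comparison concrete, here is a minimal sketch that idealizes each curve as a few point masses. The names `brown`, `green`, `mean`, and `spread` are illustrative assumptions, not part of any standard method.

```python
from math import sqrt

# Each meta-probability curve, idealized as point masses:
# payout probability -> weight of that probability.
brown = {0.45: 1.0}            # certainly the 45% box
green = {0.0: 0.5, 0.9: 0.5}   # either the never-pay box or the 90% box


def mean(dist):
    """Average payout probability: the single number used for one bet."""
    return sum(p * w for p, w in dist.items())


def spread(dist):
    """Standard deviation of the payout probability under the curve."""
    m = mean(dist)
    return sqrt(sum(w * (p - m) ** 2 for p, w in dist.items()))


for name, dist in (("brown", brown), ("green", green)):
    print(name, round(mean(dist), 3), round(spread(dist), 3))
```

Both means are 0.45, but the spread is zero for the brown box and 0.45 for the green one. That spread is what leaves room for learning from the first few coins.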

With this framework, we can consider another box—a blue one. It has a fixed payout probability somewhere between 0 and 0.9. I put a random number of plugs in the holes in the spinning disk—leaving between 0 and 90 holes open. I used a noise diode to choose; but you don’t get to see what the odds are. Here the probability-of-probability curve looks rather like this:

This isn’t quite right, because 0.23 and 0.24 are much more likely than 0.235—the plot should look like a comb—but for strategy choice the difference doesn’t matter.

What *is* your optimal strategy in this case?

As with the green box, you ought to spend some coins gathering information about what the odds are. If your estimate of the probability is less than 0.5, when you get confident enough in that estimate, you should stop. If you’re confident enough that it’s more than 0.5, you should continue gambling.
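The information-gathering step can be sketched as a Bayesian update over the number of open holes. This is only a sketch under stated assumptions: the prior is taken to be uniform over 0 to 90 open holes out of 100, and the function names are made up for illustration. It does not decide when to stop, which is the harder part.

```python
def blue_box_posterior(wins: int, losses: int,
                       max_open: int = 90, holes: int = 100):
    """Posterior over the number of open holes after the observed results."""
    # With a uniform prior, the posterior is proportional to the
    # likelihood p**wins * (1 - p)**losses for each candidate p.
    weights = []
    for n_open in range(max_open + 1):
        p = n_open / holes
        weights.append(p ** wins * (1.0 - p) ** losses)
    total = sum(weights)
    return [w / total for w in weights]


def expected_payout_probability(posterior, holes: int = 100) -> float:
    """Posterior mean of the payout probability."""
    return sum(n / holes * w for n, w in enumerate(posterior))


def probability_above_break_even(posterior, holes: int = 100) -> float:
    """Posterior weight on payout probabilities strictly above 0.5."""
    return sum(w for n, w in enumerate(posterior) if n / holes > 0.5)


post = blue_box_posterior(wins=3, losses=1)
print(expected_payout_probability(post), probability_above_break_even(post))
```

Before any coins go in, the posterior mean is 0.45, the same as for the brown and green boxes. A stopping rule would compare quantities like these against the number of coins still in hand.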

If you enjoy this sort of thing, you might like to work out what the exact optimal algorithm is.

In the next article in this sequence, we’ll look at some more complicated and interesting cases.

## Further reading

The “meta-probability” approach I’ve taken here is the A_{p} distribution of E. T. Jaynes. I find it highly intuitive, but it seems to have had almost no influence or application in practice. We’ll see later that it has some problems, which might explain this.

The green and blue boxes are related to “multi-armed bandit problems.” A “one-armed bandit” is a casino slot machine, which has defined odds of payout. A multi-armed bandit is a hypothetical generalization with several arms, each of which may have different, unknown odds. In general, you ought to pull each arm several times, to gain information. The question is: what is the optimal algorithm for deciding which arms to pull how many times, given the payments you have received so far?
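For a feel of what a bandit strategy looks like in code, here is a short sketch of Thompson sampling. It is a well-known heuristic for this kind of problem, not the optimal algorithm the question asks about and not something this article relies on. The arm probabilities, pull count, and function name below are arbitrary assumptions for illustration.

```python
import random


def thompson_sampling(true_probs, pulls, seed=0):
    """Simulate Thompson sampling on Bernoulli arms; return pulls per arm."""
    rng = random.Random(seed)
    n_arms = len(true_probs)
    wins = [0] * n_arms
    losses = [0] * n_arms
    for _ in range(pulls):
        # Draw one plausible payout probability per arm from its
        # Beta(wins + 1, losses + 1) posterior (uniform prior).
        samples = [rng.betavariate(wins[i] + 1, losses[i] + 1)
                   for i in range(n_arms)]
        arm = max(range(n_arms), key=lambda i: samples[i])
        # Pull the arm whose sampled probability is highest.
        if rng.random() < true_probs[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
    return [w + l for w, l in zip(wins, losses)]


counts = thompson_sampling([0.2, 0.5, 0.8], pulls=2000)
print(counts)
```

Arms that look bad early still get sampled occasionally, because their posteriors remain wide. That is the "pull each arm several times" idea in automatic form.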

If you read the Wikipedia article and follow some links, you’ll find the concepts you need to find the optimal green and blue box strategies. But it might be more fun to try on your own first! The green box is simple. The blue box is harder, but the same general approach applies.

Wikipedia also has an accidental list of formal approaches for problems where ordinary probability theory fails. This is far from complete, but a good starting point for a browser tab explosion.

## Acknowledgements

Thanks to Rin’dzin Pamo, St. Rev., Matt_Simpson, Kaj_Sotala, and Vaniver for helpful comments on drafts. Of course, they may disagree with my analyses, and aren’t responsible for my mistakes!

## Comments

Ordinary probability theory and expected utility are sufficient to handle this puzzle. You just have to calculate the expected utility of each strategy before choosing a strategy. In this puzzle a strategy is more complicated than simply putting some number of coins in the machine: it requires deciding what to do after each coin either succeeds or fails to succeed in releasing two coins.

In other words, a strategy is a choice of what you'll do at each point in the game tree - just like a strategy in chess.

We don't expect to do well at chess if we decide on a course of action that ignores our opponent's moves. Similarly, we shouldn't expect to do well in this probabilistic game if we only consider strategies that ignore what the machine does. If we consider *all* strategies, compute their expected utility based on the information we have, and choose the one that maximizes this, we'll do fine.

I'm saying essentially the same thing Jeremy Salwen said.

The exposition of meta-probability is well done, and shows an interesting way of examining and evaluating scenarios. However, I would take issue with the first section of this article in which you establish single probability (expected utility) calculations as insufficient for the problem, and present meta-probability as the solution.

In particular, you say

I do not believe that this is a failure of applying a single probability to the situation, but merely calculating the probability wrongly, by ignoring future effects of your choice. I think this is most clearly illustrated by scaling the problem down to the case where you are handed a green box, and only two coins. In this simplified problem, we can clearly examine all possible strategies.
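The two-coin version is small enough to enumerate. The sketch below is one way to do it, assuming that winnings are kept rather than re-inserted, so at most the two original coins are ever played. The function and strategy names are illustrative.

```python
P_GOOD_BOX = 0.5   # chance the green box in hand is the 90% one
P_WIN_GOOD = 0.9   # payout probability of the 90% box


def p_win_first() -> float:
    return P_GOOD_BOX * P_WIN_GOOD  # 0.45 before any evidence


def p_win_second_given(first_won: bool) -> float:
    if first_won:
        return P_WIN_GOOD  # a payout proves it is the 90% box
    # Bayes: P(90% box | first coin lost), then its chance of paying out.
    p_good_given_loss = P_GOOD_BOX * (1.0 - P_WIN_GOOD) / (1.0 - p_win_first())
    return p_good_given_loss * P_WIN_GOOD


def expected_coins(play_after_win: bool, play_after_loss: bool) -> float:
    """Expected final coins when the first coin is played and the second
    is played or kept depending on the first result."""
    total = 0.0
    for first_won, p_first in ((True, p_win_first()), (False, 1.0 - p_win_first())):
        coins = 2.0 if first_won else 0.0
        play_second = play_after_win if first_won else play_after_loss
        coins += 2.0 * p_win_second_given(first_won) if play_second else 1.0
        total += p_first * coins
    return total


strategies = {
    "keep both coins": 2.0,
    "play one, keep the other": expected_coins(False, False),
    "play both regardless": expected_coins(True, True),
    "play second only after a win": expected_coins(True, False),
    "play second only after a loss": expected_coins(False, True),
}
for name, value in sorted(strategies.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.2f}")
```

Under these assumptions, the only strategy that beats keeping both coins is the one that conditions the second play on the first result. Every strategy that ignores the outcome of the first coin has an expected value below 2.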

I think a much better approach is to assign models to the problem (e.g. "it's a box that has 100 holes, 45 open and 55 plugged, the machine picks one hole, you get 2 coins if the hole is open and nothing if it's plugged."), and then have a probability distribution over models. This is better because it keeps probabilities assigned to facts about the world.

It's true that probabilities-of-probabilities are just an abstraction of this (when used correctly), but I've found that people get confused really fast if you ask them to think in terms of probabilities-of-probabilities. (See every confused discussion of "what's the standard deviation of the standard deviation?")

Suppose we're using Laplace's Rule of Succession on a coin. On the zeroth round before we have seen any evidence, we assign probability 0.5 to the first coinflip coming up heads. We also assign marginal probability 0.5 to the second flip coming up heads, the third flip coming up heads, and so on. What distinguishes the Laplace epistemic state from the 'certainty of a fair coin' epistemic state is that they represent different probability distributions over *sequences* of coinflips.

Since some probability distributions over events are correlated, we must represent our states of knowledge by assigning probabilities to sequences or sets of events, and our states of knowledge cannot be represented by stating marginal probabilities for all events independently.
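A small sketch can make the contrast between the two epistemic states concrete. It uses exact fractions; the function names are illustrative, and sequences are written as strings of `H` and `T`.

```python
from fractions import Fraction


def laplace_sequence_probability(seq: str) -> Fraction:
    """Probability of an exact heads/tails sequence under Laplace's rule."""
    prob = Fraction(1)
    heads = tails = 0
    for flip in seq:
        # Rule of succession: (heads so far + 1) / (flips so far + 2).
        p_heads = Fraction(heads + 1, heads + tails + 2)
        if flip == "H":
            prob *= p_heads
            heads += 1
        else:
            prob *= 1 - p_heads
            tails += 1
    return prob


def fair_sequence_probability(seq: str) -> Fraction:
    """Probability of the same sequence for a coin known to be fair."""
    return Fraction(1, 2) ** len(seq)


for seq in ("HH", "HT", "TH", "TT"):
    print(seq, laplace_sequence_probability(seq), fair_sequence_probability(seq))
```

Both states give the second flip a marginal probability of 1/2 of heads, yet Laplace assigns 1/3 to `HH` and 1/6 to `HT`, while the fair coin assigns 1/4 to each. The difference lives entirely in the joint distribution over sequences.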

We could also try to summarize some features of such epistemic states by talking about the instability of estimates - the degree to which they are easily updated by knowledge of other events - though of course this will be a derived feature of the probability distribution, rather than an ontologically extra feature of probability.

I reject that this is a good reason for probability theorists to panic.

On the meta level I remark that...

Thanks for writing this up! I've been wanting to write something on the Ap distribution since April, but hadn't gotten around to it. I look forward to your forthcoming posts.

There aren't many citations of Jaynes on the Ap distribution, but model uncertainty gets discussed a lot, and is modeling the same kind of thing in a Bayesian way.

On the subject of applied rationality being a lot more than probability estimates, see also When Not to Use Probabilities, Explicit and tacit rationality, and... well, The Sequences.

On the Ap distribution and model uncertainty more generally, see also Model Stability in Intervention Assessment, Model Combination and Adjustment, Why We Can't Take Expected Value Estimates Literally, and The Optimizer's Curse and How to Beat It.

Why on earth should we expect the long-term expected value of all future consequences of a choice to be equal to the immediate payoffs? They are two different things. Learning is the most obvious example of when these can be expected to differ: in this case learning information, and in other cases learning skills.

There's more than one event. If you assign a single probability to winning the first, third, and seventh times and failing the second, fourth, fifth, and sixth times given that you put in seven coins, etc. that captures everything you need to know and does not involve meta-probabilities.

More succinctly, the probability of winning on the second try given that you win on the first try is different depending on the color of the machine.
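That conditional probability can be computed directly from the distribution over payout probabilities. A minimal sketch, assuming each box is idealized as point masses (the names are illustrative):

```python
def p_second_win_given_first_win(dist) -> float:
    """dist maps payout probability -> weight. Returns P(win 2 | win 1)."""
    p_first = sum(p * w for p, w in dist.items())      # P(win 1)
    p_both = sum(p * p * w for p, w in dist.items())   # P(win 1 and win 2)
    return p_both / p_first


brown = {0.45: 1.0}            # known 45% box
green = {0.0: 0.5, 0.9: 0.5}   # never-pay box or 90% box, equally likely

print(p_second_win_given_first_win(brown))
print(p_second_win_given_first_win(green))
```

For the brown box a first win changes nothing, so the answer stays at 0.45. For the green box it rises to 0.9, even though the first-coin probability was the same.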

That's pretty trivial.

The expected payout of putting a coin into a brown box is 0.90.

The expected payout of putting a coin into a green box is 0.90 *plus* valuable information about what kind of a green box it is. It is a *different payout*.

The term "metaprobability" strikes me as adding confusion. The two layers are *not* the same thing applied to itself, but are in fact *different* questions. "What fraction of the time does this box pay out?" is a different question from "Is this box going to pay out on the next coin?".

Often it takes a lot of questions to fully describe a situation. Using the term "probability" for all of them hides the distinction.

The statement "probability estimates are not, by themselves, adequate to make rational decisions" could apparently have been replaced with the statement "my definition of the phrase 'probability estimates' is less inclusive than yours" - what you call a "meta-probability" I would have just called a "probability". In a world where both epistemic and aleatory uncertainty exist, your expectation of events in that world is going to look like a probability distribution over a space of probability distributions over outputs; this is still a probability distribution, just a much more expensive one to do approximate calculations with.

Of course it doesn't. Who ever said it does? Decisions are made on the basis of expected value, not probability. And your analysis of the first bet ignores the value of the information gained from it in executing your options for further play thereafter.

I think you're just fundamentally confusing the probability of a win on the first coin with the expected long run frequency of wins for the different boxes. Enti...

I don't like your use of the word "probability". Sometimes, you use it to describe subjective probabilities, but sometimes you use it to describe the frequency properties of putting a coin in a given box.

When you say,

"The brown box has 45 holes open, so it has probability p=0.45 of returning two coins."

you are really saying that knowing that I have the brown box in front of me, and I put a coin in it, I would assign a 0.45 probability of that coin yielding 2 coins. And, as far as I know, the coin tosses are all independent: no amount ...

So perhaps this is for the next post, but are these 'metaprobabilities' just regular hyperparameters?

Thanks for posting this! :D I'm curious to see where you go next.

It seems odd to me that the mode for the left mixture is to the right of 0. I would have put it at 0, and made that mixture twice as tall so the area underneath would still be the same.

I guess this is a joke. From wikipedia: "Originally considered by Allied scientists in World War II, it proved so intractable that, according to Peter Whittle, it was proposed the problem be dropped over Germany so that German scientists could also waste their time on it.[10]" (note that your wikipedia-link is broken)

See Judea Pearl's Probabilistic Reasoning in Intelligent Systems, section 7.3, for a discussion of "metaprobabilities" in the context of graphical models.

Although it's true that you could compute the correct decision by directly putting a distribution on all possible futures, the computational complexity of this strategy grows combinatorially as the scenario gets longer. This isn't a minor point; generalizing the brute force method gets you AIXI. That is why you need something like the A_p distribution or Pearl's "contingencies" to store evidence and reason efficiently.

I don't see how this differs from how anyone else ever handles this problem. I hope you explain the difference in this example, before going on to other examples.

I really liked the article. So allow me to miss the forest for a moment; I want to chop down this tree:

Let's solve the green box problem:

Try zero coins: EV: 100 coins.

Try one coin, give up if no payout: 45% of 180.2 + 55% of 99= c. 135.5 (I hope.)

(I think this is right, but welcome corrections; 90%x50%x178, +.2 for first coin winning (EV of that 2 not 1.8), + keeper coins. I definitely got this wrong the first time I wrote it out, so I'm less confident I got it right this time. Edit before posting: Not just once.)

Try two coins, give up if no payout:

45% o...
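The arithmetic above can be checked with a short sketch. It covers one family of strategies only: play one coin at a time, play every remaining coin after the first payout, and quit after a fixed number of straight losses. It assumes winnings are kept and never re-inserted, which matches the 178.2 figure used above. The function name is illustrative.

```python
def expected_coins_green(max_failures: int, coins: int = 100) -> float:
    """Expected final coins for the quit-after-`max_failures`-losses strategy."""
    p_good, p_win = 0.5, 0.9
    ev = 0.0
    # 90% box: the first payout arrives on coin j (j = 1..max_failures).
    for j in range(1, max_failures + 1):
        p_first_win_at_j = (1 - p_win) ** (j - 1) * p_win
        # This win pays 2; each remaining coin is then played at 1.8 expected.
        payoff = 2 + (coins - j) * 2 * p_win
        ev += p_good * p_first_win_at_j * payoff
    # No payout within max_failures coins (either box): quit, keep the rest.
    p_no_win = p_good * (1 - p_win) ** max_failures + (1 - p_good)
    ev += p_no_win * (coins - max_failures)
    return ev


for k in range(6):
    print(k, round(expected_coins_green(k), 3))
```

With `max_failures=1` this reproduces the figure of roughly 135.5, and `max_failures=0` gives the 100 coins you started with. Comparing larger values shows where additional probing stops paying for itself.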

Your link to Ap is broken :( Overall, this was really interesting and understandable. Thank you.

Then why use it instead of learning the standard terms and using those? This might sound pedantic, but it matters because this kind of thing leads to proliferation of unnecessary jargon and sometimes reinventing the wheel.

Are we talking about conditional probability? Joint probability?

Also, a minor nitpick about your next-to-last figure: given what's said about the boxes, it's not two bell curves centered at 0 and 0.9. It should be a point mass (vertical line) at 0 and a bell curve centered at 0.9.

Agree with John Baez, Jeremy Salwen and others. Standard tools are enough to solve this problem. You don't need probabilities over probabilities, just probabilities over states of the world, and probabilities over what might happen in each state of the world.

Has anyone used meta-probabilities, or something similar, to analyze the Pascal Mugger problem?