Interpretations of "probability"

So8res

(Written for Arbital in 2016.)

What does it mean to say that a flipped coin has a 50% probability of landing heads?

Historically, there are two popular types of answers to this question, the "frequentist" and "subjective" (aka "Bayesian") answers, which give rise to radically different approaches to experimental statistics. There is also a third "propensity" viewpoint which is largely discredited (assuming the coin is deterministic). Roughly, the three approaches answer the above question as follows:

The propensity interpretation: Some probabilities are just out there in the world. It's a brute fact about coins that they come up heads half the time. When we flip a coin, it has a fundamental propensity of 0.5 for the coin to show heads. When we say the coin has a 50% probability of being heads, we're talking directly about this propensity.
The frequentist interpretation: When we say the coin has a 50% probability of being heads after this flip, we mean that there's a class of events similar to this coin flip, and across that class, coins come up heads about half the time. That is, the frequency of the coin coming up heads is 50% inside the event class, which might be "all other times this particular coin has been tossed" or "all times that a similar coin has been tossed", and so on.
The subjective interpretation: Uncertainty is in the mind, not the environment. If I flip a coin and slap it against my wrist, it's already landed either heads or tails. The fact that I don't know whether it landed heads or tails is a fact about me, not a fact about the coin. The claim "I think this coin is heads with probability 50%" is an expression of my own ignorance, and 50% probability means that I'd bet at 1 : 1 odds (or better) that the coin came up heads.

For a visualization of the differences between these three viewpoints, see Correspondence visualizations for different interpretations of "probability". For examples of the difference, see Probability interpretations: Examples. See also the Stanford Encyclopedia of Philosophy article on interpretations of probability.

The propensity view is perhaps the most intuitive view, as for many people, it just feels like the coin is intrinsically random. However, this view is difficult to reconcile with the idea that once we've flipped the coin, it has already landed heads or tails. If the event in question is decided deterministically, the propensity view can be seen as an instance of the mind projection fallacy: When we mentally consider the coin flip, it feels 50% likely to be heads, so we find it very easy to imagine a world in which the coin is fundamentally 50%-heads-ish. But that feeling is actually a fact about us, not a fact about the coin; and the coin has no physical 0.5-heads-propensity hidden in there somewhere — it's just a coin.

The other two interpretations are both self-consistent, and give rise to pragmatically different statistical techniques, and there has been much debate as to which is preferable. The subjective interpretation is more generally applicable, as it allows one to assign probabilities (interpreted as betting odds) to one-off events.

Frequentism vs subjectivism

As an example of the difference between frequentism and subjectivism, consider the question: "What is the probability that Hillary Clinton will win the 2016 US presidential election?", as analyzed in the summer of 2016.

A stereotypical (straw) frequentist would say, "The 2016 presidential election only happens once. We can't observe a frequency with which Clinton wins presidential elections. So we can't do any statistics or assign any probabilities here."

A stereotypical subjectivist would say: "Well, prediction markets tend to be pretty well-calibrated about this sort of thing, in the sense that when prediction markets assign 20% probability to an event, it happens around 1 time in 5. And the prediction markets are currently betting on Hillary at about 3 : 1 odds. Thus, I'm comfortable saying she has about a 75% chance of winning. If someone offered me 20 : 1 odds against Clinton — they get $1 if she loses, I get $20 if she wins — then I'd take the bet. I suppose you could refuse to take that bet on the grounds that you Just Can't Talk About Probabilities of One-off Events, but then you'd be pointlessly passing up a really good bet."
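The arithmetic behind the subjectivist's bet can be sketched as follows. This is only an illustration: the function names are made up for this example, and the bet is read as "win $20 if Clinton wins, lose $1 if she loses".

```python
from fractions import Fraction

def odds_to_probability(in_favor, against):
    """Convert odds of in_favor : against into a probability."""
    return Fraction(in_favor, in_favor + against)

def expected_value(p_win, payout_if_win, stake_if_lose):
    """Expected profit of a bet that pays payout_if_win with probability
    p_win and loses stake_if_lose otherwise."""
    return p_win * payout_if_win - (1 - p_win) * stake_if_lose

# Market odds of 3 : 1 in favor correspond to a 3/4 probability.
p_clinton = odds_to_probability(3, 1)

# Offered bet: win $20 if she wins, lose $1 if she loses.
ev = expected_value(p_clinton, payout_if_win=20, stake_if_lose=1)

print(p_clinton, float(ev))  # 3/4 14.75
```

At a 75% credence the bet is worth about $14.75 in expectation per dollar risked, which is why passing it up looks pointless to the subjectivist.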

A stereotypical (non-straw) frequentist would reply: "I'd take that bet too, of course. But my taking that bet is not based on rigorous epistemology, and we shouldn't allow that sort of thinking in experimental science and other important venues. You can do subjective reasoning about probabilities when making bets, but we should exclude subjective reasoning in our scientific journals, and that's what frequentist statistics is designed for. Your paper should not conclude "and therefore, having observed thus-and-such data about carbon dioxide levels, I'd personally bet at 9 : 1 odds that anthropogenic global warming is real," because you can't build scientific consensus on opinions."

...and then it starts getting complicated. The subjectivist responds "First of all, I agree you shouldn't put posterior odds into papers, and second of all, it's not like your method is truly objective — the choice of "similar events" is arbitrary, abusable, and has given rise to p-hacking and the replication crisis." The frequentists say "well your choice of prior is even more subjective, and I'd like to see you do better in an environment where peer pressure pushes people to abuse statistics and exaggerate their results," and then down the rabbit hole we go.

The subjectivist interpretation of probability is common among artificial intelligence researchers (who often design computer systems that manipulate subjective probability distributions), Wall Street traders (who need to be able to make bets even in relatively unique situations), and common intuition (where people feel like they can say there's a 30% chance of rain tomorrow without worrying about the fact that tomorrow only happens once). Nevertheless, the frequentist interpretation is commonly taught in introductory statistics classes, and is the gold standard for most scientific journals.

A common frequentist stance is that it is virtuous to have a large toolbox of statistical tools at your disposal. Subjectivist tools have their place in that toolbox, but they don't deserve any particular primacy (and they aren't generally accepted when it comes time to publish in a scientific journal).

An aggressive subjectivist stance is that frequentists have invented some interesting tools, and many of them are useful, but that refusing to consider subjective probabilities is toxic. Frequentist statistics were invented in a (failed) attempt to keep subjectivity out of science in a time before humanity really understood the laws of probability theory. Now we have theorems about how to manage subjective probabilities correctly, and how to factor personal beliefs out from the objective evidence provided by the data, and if you ignore these theorems you'll get in trouble. The frequentist interpretation is broken, and that's why science has p-hacking and a replication crisis even as all the Wall Street traders and AI scientists use the Bayesian interpretation. This "let's compromise and agree that everyone's viewpoint is valid" thing is all well and good, but how much worse do things need to get before we say "oops" and start acknowledging the subjective probability interpretation across all fields of science?

The most common stance among scientists and researchers is much more agnostic, along the lines of "use whatever statistical techniques work best at the time, and use frequentist techniques when publishing in journals because that's what everyone's been doing for decades upon decades upon decades, and that's what everyone's expecting."

Which interpretation is most useful?

Probably the subjective interpretation, because it subsumes the propensity and frequentist interpretations as special cases, while being more flexible than both.

When the frequentist "similar event" class is clear, the subjectivist can take those frequencies (often called base rates in this context) into account. But unlike the frequentist, she can also combine those base rates with other evidence that she's seen, and assign probabilities to one-off events, and make money in prediction markets and/or stock markets (when she knows something that the market doesn't).
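One way to picture "combining base rates with other evidence" is the odds form of Bayes' rule. The sketch below is a minimal illustration; the 30% base rate and the likelihood ratio of 4 are hypothetical numbers, not figures from any real dataset.

```python
def update(prior, likelihood_ratio):
    """Posterior probability of H from a prior probability and a likelihood
    ratio P(evidence | H) / P(evidence | not H), via the odds form of
    Bayes' rule."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

base_rate = 0.30        # hypothetical frequency in the "similar event" class
likelihood_ratio = 4.0  # hypothetical: the new evidence is 4x as likely if H is true

posterior = update(base_rate, likelihood_ratio)
print(round(posterior, 3))  # 0.632
```

The base rate enters as the prior, and each further piece of evidence multiplies the odds by its own likelihood ratio.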

When the laws of physics actually do "contain uncertainty", such as when they say that there are multiple different observations you might make next with differing likelihoods (as the Schrodinger equation often will), a subjectivist can combine her propensity-style uncertainty with her personal uncertainty in order to generate her aggregate subjective probabilities. But unlike a propensity theorist, she's not forced to think that all uncertainty is physical uncertainty: She can act like a propensity theorist with respect to Schrodinger-equation-induced uncertainty, while still believing that her uncertainty about a coin that has already been flipped and slapped against her wrist is in her head, rather than in the coin.

This fully general stance is consistent with the belief that frequentist tools are useful for answering frequentist questions: The fact that you can personally assign probabilities to one-off events (and, e.g., evaluate how good a certain trade is on a prediction market or a stock market) does not mean that tools labeled "Bayesian" are always better than tools labeled "frequentist". Whatever interpretation of "probability" you use, you're encouraged to use whatever statistical tool works best for you at any given time, regardless of what "camp" the tool comes from. Don't let the fact that you think it's possible to assign probabilities to one-off events prevent you from using useful frequentist tools!

The idea that "probability" is some preexisting thing that needs to be "interpreted" as something always seemed a little bit backwards to me. Isn't it more straightforward to say:

1. Beliefs exist, and obey the Kolmogorov axioms (at least, "correct" beliefs do, as formalized by generalizations of logic (Cox's theorem), or by possible-world-counting). This is what we refer to as "Bayesian probabilities", and code into AIs when we want them to represent beliefs.
2. Measures over imaginary event classes / ensembles also obey the Kolmogorov axioms. "Frequentist probabilities" fall into this category.

Personally I mostly think about #1 because I'm interested in figuring out what I should believe, not about frequencies in arbitrary ensembles. But the fact is that both of these obey the same "probability" axioms, the Kolmogorov axioms. Denying one or the other because "probability" must be "interpreted" as exclusively either #1 or #2 is simply wrong (but that's what frequentists effectively do when they loudly shout that you "can't" apply probability to beliefs).

Now, sometimes you do need to interpret "probability" as something -- in the specific case where someone else makes an utterance containing the word "probability" and you want to figure out what they meant. But the answer there is probably that in many cases people don't even distinguish between #1 and #2, because they'll only commit to a specific number when there's a convenient instance of #2 that makes #1 easy to calculate. For instance, saying 1/6 for a roll of a "fair" die.

People often act as though their utterances about probability refer to #1 though. For instance when they misinterpret p-values as the post-data probability of the null hypothesis and go around believing that the effect is real...
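To see why a small p-value is not the post-data probability of the null, here is a rough sketch. It computes the probability that the null is true given only that a result was "significant", under assumed values for the share of true nulls, the significance threshold, and the test's power. All three numbers are hypothetical.

```python
def prob_null_given_significant(prior_null, alpha, power):
    """P(null is true | result is significant), by Bayes' rule.

    prior_null: assumed fraction of tested hypotheses where the null is true
    alpha:      significance threshold, i.e. P(significant | null true)
    power:      P(significant | null false)
    """
    false_positives = prior_null * alpha
    true_positives = (1 - prior_null) * power
    return false_positives / (false_positives + true_positives)

p = prob_null_given_significant(prior_null=0.9, alpha=0.05, power=0.8)
print(round(p, 2))  # 0.36
```

Under these assumptions a result with p < 0.05 still leaves a 36% chance that the null is true, which is far from the 5% that the misreading suggests.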

You might be interested in some work by Glenn Shafer and Vladimir Vovk about replacing measure theory with a game-theoretic approach. They have a website here, and I wrote a lay review of their first book on the subject here.

I have also just now discovered that a new book is due out in May, which presumably captures the last 18 years or so of research on the subject.

This isn't really a direct response to your post, except insofar as I feel broadly the same way about the Kolmogorov axioms as you do about interpreting their application to phenomena, and this is another way of getting at the same intuitions.

There's a Q&A with one of the authors here which explains a little about the purpose of the approach, mainly talks about the new book.

I clicked this because it seemed interesting, but reading the Q&A:

In a typical game we consider, one player offers bets, another decides how to bet, and a third decides the outcome of the bet. We often call the first player Forecaster, the second Skeptic, and the third Reality.

How is this any different from the classical Dutch Book argument, that unless you maintain beliefs as probabilities you will inevitably lose money?

It's just a different way of arriving at the same conclusions. The whole project is developing game-theoretic proofs for results in probability and finance.

The pitch is, rather than using a Dutch Book argument as a separate singular argument, they make those intuitions central as a mechanism of proof for all of probability (or at least the core of it, thus far).

The claim "I think this coin is heads with probability 50%" is an expression of my own ignorance, and 50% probability means that I'd bet at 1 : 1 odds (or better) that the coin came up heads.

Just a minor quibble - using this interpretation to define one's subjective probabilities is problematic because people are not necessarily indifferent about placing a bet that has an expected value of 0 (e.g. due to loss aversion).

Therefore, I think the following interpretation is more useful: Suppose I win [some reward] if the coin comes up heads. I'd prefer to replace the winning condition with "the ball in a roulette wheel ends up in a red slot" for any roulette wheel in which more than 50% of the slots are red.

(I think I first came across this type of definition in this post by Andrew Critch)

Frequentist statistics were invented in a (failed) attempt to keep subjectivity out of science in a time before humanity really understood the laws of probability theory

I'm a Bayesian, but do you have a source for this claim? It was my understanding that Frequentism was mostly promoted by Ron Fisher in the 20th century, well after the work of Bayes.

Synthesised from Wikipedia:

While the first cited frequentist work (the weak law of large numbers, 1713, Jacob Bernoulli, Frequentist probability) predates Bayes' work (edited by Price in 1763, Bayes' Theorem), it's not by much. Further, according to the article on "Frequentist Probability", "[Bernoulli] is also credited with some appreciation for subjective probability (prior to and without Bayes theorem)."

The ones that pushed frequentism in order to achieve objectivity were Fisher, Neyman and Pearson. From "Frequentist probability": "All valued objectivity, so the best interpretation of probability available to them was frequentist". Fisher did other nasty things, such as using the fact that causality is really hard to soundly establish to argue that tobacco was not proven to cause cancer. But nothing indicates that this was done out of not understanding the laws of probability theory.

AI scientists use the Bayesian interpretation

Sometimes yes, sometimes not. Even Bayesian AI scientists use frequentist statistics pretty often.

This post makes it sound like frequentism is useless, and that is not true. The concepts of a stochastic estimator for a quantity, its bias, and its variance were developed by frequentists to look at real-world data. AI scientists use them to analyse algorithms like gradient descent, or approximate Bayesian inference schemes, and the tools are definitely useful.
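As a small illustration of what "checking whether an estimator is biased" means, the sketch below enumerates every equally likely sample of three rolls of a fair die and computes the exact expectation of two variance estimators. The die, the sample size, and the function names are choices made only for this example.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)  # a fair six-sided die, chosen only for illustration
N = 3                # sample size

def variance_estimate(sample, ddof):
    """Sum of squared deviations from the sample mean, divided by n - ddof."""
    n = len(sample)
    mean = Fraction(sum(sample), n)
    return sum((x - mean) ** 2 for x in sample) / (n - ddof)

def sampling_mean(ddof):
    """Exact expectation of the estimator over all equally likely samples."""
    samples = list(product(FACES, repeat=N))
    return sum(variance_estimate(s, ddof) for s in samples) / len(samples)

true_variance = Fraction(35, 12)  # variance of a single fair die roll

bias_divide_by_n = sampling_mean(ddof=0) - true_variance
bias_divide_by_n_minus_1 = sampling_mean(ddof=1) - true_variance

print(bias_divide_by_n, bias_divide_by_n_minus_1)  # -35/36 0
```

Dividing by n underestimates the variance on average, while dividing by n - 1 is unbiased. The same enumeration could be extended to compute the variance of each estimator.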

The subjective interpretation: Uncertainty is in the mind, not the environment. If I flip a coin and slap it against my wrist, it’s already landed either heads or tails. The fact that I don’t know whether it landed heads or tails is a fact about me, not a fact about the coin. The claim “I think this coin is heads with probability 50%” is an expression of my own ignorance, and 50% probability means that I’d bet at 1 : 1 odds (or better) that the coin came up heads.

Hold on, you’re pulling a fast one here: you’ve substituted the question of “what is the probability that this coin which I have already flipped but haven’t looked at yet has already landed heads” for the question of “what is the probability that this coin which I am about to flip will land heads”!

It is obviously easy to see what the subjective interpretation means in the case of the former question—as you say, the coin is already heads or tails, no matter that I don’t know which it is. But it is not so easy to see how the subjective interpretation makes sense when applied to the latter question—and that is what people generally have difficulty with, when they have trouble accepting subjectivism.

Doesn't it mean the same thing in either case? Either way, I don't know which way the coin will land or has landed, and I have some odds at which I'll be willing to make a bet. I don't see the problem.

(Though my willingness to bet at all will generally go down over time in the "already flipped" case, due to the increasing possibility that whoever is offering the bet somehow looked at the coin in the intervening time.)

The difference is (to the naive view; I don’t necessarily endorse it) that in the case where the coin has landed, I do not know how it landed, but there’s a sense in which I could, in theory, know; there is, in any case, something to know; there is a fact of the matter about how the coin has landed, but I do not know that fact. So the “probability” of it having landed heads, or tails—the uncertainty—is, indeed, entirely in my mind.

But in the case where the coin has yet to be tossed, there is as yet no fact of the matter about whether it’s heads or tails! I don’t know whether it’ll land heads or tails, but nor could I know; there’s nothing to know! (Or do you say the future is predetermined? asks the naive interlocutor. How else may one talk about probability being merely “in the mind”, for something which has not happened yet?)

Whatever the answers to these questions may be, they are certainly not obvious or simple answers… and that is my objection to the OP: that it attempts to pass off a difficult and confusing conceptual question as a simple and obvious one, thereby failing to do justice to those who find it confusing or difficult.

the coin is already heads or tails, no matter that I don’t know which it is

It's worse than that. All you know is that the coin has landed. You need further observations to learn more. Maybe it will slip from your hand and fall on the ground. Maybe you will be distracted with reading LW and forget to check. Maybe you don't remember which side to check, the wrist or the hand side. You can insist that the coin has already landed and therefore it has landed either heads or tails, but that is not a useful supposition until you actually look. Think just a little way back: the coin is about to land, but not quite yet. Is that the same as the coin having landed? Almost, but not quite. What about a little way further back? The uncertainty about the outcome is even greater. So, there is nothing special about the landed coin until you actually look, beyond a certain level of probabilities. A pragmatic approach (I refuse to wade into the ideological debate between militant frequentists and militant Bayesians) would be to use all available information to make the best prediction possible, depending on the question asked.

He never said "will land heads", though. He just said "a flipped coin has a chance of landing heads", which is not a timeful statement. EDIT: no longer confident that this is the case

Didn't the post already counter your second paragraph? The subjective interpretation can be a superset of the propensity interpretation.

Actually, the assignment of probability 1 to an event that has happened is also subjective. You don't know that it had to occur with complete inevitability, i.e. you don't know that it had a conditional probability of 1 relative to the preceding state of the universe. You are setting it to 1 because it is a given as far as you are concerned.

The question is not “what is the probability that the coin would have landed heads”. The question is, “what is the probability that the coin has in fact landed heads”!

If you are interested in the objective probability of the coin flip, then it only has one value because it is only one event. In a deterministic universe the objective probability is 1; in a suitably indeterministic universe it is always 0.5.

If you think the questions "what will it be" and "what was it" are different, you are dealing with subjective probability, because the difference the passage of time makes is a difference in the information available to you, the subject.

Failing to distinguish objective and subjective probability leads to confusion. For instance, the sleeping beauty paradox is only a paradox if you expect all observers to calculate the same probability despite the different information available to them.

The subjectivist interpretation of probability is common [in] … common intuition (where people feel like they can say there’s a 30% chance of rain tomorrow without worrying about the fact that tomorrow only happens once)

Why do you say that “there’s a 30% chance of rain tomorrow” is an example of the subjective interpretation? Isn’t it just as readily interpreted as saying “on 30% of all days similar to this one [in meteorological conditions, etc.], it rains”?

Besides, “this coin flip that I am going to do right now” only happens once, too (any subsequent coin flips will be other, different, coin flips, and not that specific coin flip). Surely you don’t conclude from this that when someone says “this coin has a 50% chance of coming up heads”, it means they’re taking the subjectivist view of the coin’s behavior?

There are 0 other days "similar to" this one in Earth's history, if "similar to" is strict enough (e.g. the exact pattern of temperature over time, cloud patterns, etc). You'd need a precise, more permissive definition of "similar to" for the statement to be meaningful.

But the same is true of coin flips.

When you say "all days similar to this one", are you talking about all real days or all possible days? If it's "all possible days", then this seems like summing over the measures of all possible worlds compatible with both your experiences and the hypothesis, and dividing by the sum of the measures of all possible worlds compatible with your experiences. (Under this interpretation, jessicata's response doesn't make much sense; "similar to" means "observationally equivalent for observers with as much information as I have", and doesn't have a free variable.)

There are also two schools of Bayesian thinking: "It is popular to divide Bayesians into two main categories, “objective” and “subjective” Bayesians. The divide is sometimes made formal, there are conferences labelled as one but not the other, for example.

A caricature of subjective Bayes is that all probabilities are just opinion, and the best we can do with an opinion is make sure it isn’t self contradictory, and satisfying the rules of probability is a way of ensuring that. A caricature of objective Bayes is that there exists a correct probability for every hypothesis given certain information, and that different people with the same information should make exactly the same probability judgments."

Must the frequentist refuse to assign probabilities to one-off events? Consider the question 'will it rain tomorrow'. The frequentist can define some abstract class of events, say the class of possible weathers. She can then assume that every day the actual weather is randomly sampled from this imaginary population. She can then look at some past weather records and treat them as a random sample from this hypothetical population. Suppose that in this large sample 30% of days were sunny; we can then say that approximately 30% of the hypothetical weathers in this population are sunny and hence the probability of drawing a sunny day tomorrow is approx 30%.
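A minimal sketch of that estimate, assuming the past days really are independent draws from the hypothetical population (a strong assumption) and using made-up counts of 300 sunny days out of 1000. The interval is the simple normal-approximation one, chosen for brevity rather than because it is the best available.

```python
import math

def proportion_estimate(successes, n, z=1.96):
    """Sample proportion with an approximate 95% normal-approximation
    (Wald) confidence interval."""
    p_hat = successes / n
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, (p_hat - z * standard_error, p_hat + z * standard_error)

# Hypothetical record: 300 sunny days out of 1000 observed days.
p_hat, (low, high) = proportion_estimate(300, 1000)
print(round(p_hat, 3), round(low, 3), round(high, 3))  # 0.3 0.272 0.328
```

The point estimate is the "approx 30%" from the text, and the interval expresses how far the sample frequency might sit from the frequency in the hypothetical population.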

Obviously the answer she gets hinges on the model assumptions she specifies. She can, for instance, model the weather as some stationary, autoregressive process (then the actual weather is sampled from an abstract population of weather time series), run her regression, calculate the estimates and arrive at a completely different answer. That is still the case for Bayesians though, since they also have to specify their priors and models and their answers depend on how they do it. My point is only that the above line of thought would allow a frequentist to make statements about probabilities of one-offs.

It seems to me that this kind of philosophy is often employed in social science. When political scientists estimate the effect of democracy on GDP, what they are trying to find, statistically speaking, is the expected difference in GDP between a democratic and a non-democratic country drawn from their respective populations, all else equal. Those populations are not the *real world* populations of democratic and non-democratic countries, but some abstract populations which real-world countries are assumed to be drawn from. I have never seen this logic explicitly spelled out, but it seems to be implicitly assumed and is required for applying frequentist techniques to social science questions.

I don't think using likelihoods when publishing in journals is tractable.

Where did your priors come from? What if other scientists have different priors? Justifying the chosen prior seems difficult.
Where did your likelihood ratios come from? What if other scientists disagree?

P values may have been a failed attempt at objectivity, but they're a better attempt than moving towards subjective probabilities (even though the latter is more correct).