Today's post, 0 And 1 Are Not Probabilities, was originally published on 10 January 2008. A summary (taken from the LW wiki):


In the ordinary way of writing probabilities, 0 and 1 both seem like entirely reachable quantities. But when you transform probabilities into odds ratios, or log-odds, you realize that in order to get a proposition to probability 1 would require an infinite amount of evidence.
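The transformation the summary describes is easy to see numerically. Here is a minimal sketch; the decibel scaling (ten times the base-10 log of the odds) follows the convention used in the original post, and the function name is illustrative:

```python
import math

def log_odds_db(p):
    """Convert a probability to log-odds, measured in decibels of evidence."""
    return 10 * math.log10(p / (1 - p))

# Each step closer to certainty costs ever more evidence:
for p in [0.5, 0.9, 0.99, 0.999, 0.999999]:
    print(f"p = {p:<8} -> {log_odds_db(p):7.2f} dB")

# p = 1 would require infinitely many decibels: the odds p / (1 - p)
# diverge as p approaches 1, so no finite amount of evidence gets you there.
```

On this scale, moving from 0.9 to 0.99 costs about the same evidence as moving from 0.5 to 0.9, and so on without end.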

Discuss the post here (rather than in the comments to the original post).

This post is part of the Rerunning the Sequences series, where we'll be going through Eliezer Yudkowsky's old posts in order so that people who are interested can (re-)read and discuss them. The previous post was Infinite Certainty, and you can use the sequence_reruns tag or rss feed to follow the rest of the series.

Sequence reruns are a community-driven effort. You can participate by re-reading the sequence post, discussing it here, posting the next day's sequence reruns post, or summarizing forthcoming articles on the wiki. Go here for more details, or to have meta discussions about the Rerunning the Sequences series.

10 comments:

Well, I have a problem with attributing a non-1 probability to the laws of probability. Not that I couldn't conceive of them being false, but that if they are false, any reasoning done on probabilities is wrong anyway.

Put another way: P(A|A) = 1 is true by definition. And I claim that when you write P(A) and apply probability theorems to it, you're in fact manipulating P(A|the laws of probability). So P(an axiom of probability theory) is in fact P(an axiom of probability theory|the laws of probability) = 1.

For theorems, you can say that P(Bayes' Theorem) is not 1, because even if the axioms of probability theory are true, we may be wrong in our proof of Bayes' Theorem from them. But as soon as you actually use Bayes' Theorem to obtain a P(A), what you really obtain is P(A|Bayes' Theorem).

Successful use would count as evidence for the laws of probability providing "good" values, right? So if we use these laws quite a bit and they always work, we might have P(the laws of probability do what we think they do) = .99999, and we could discount our output accordingly. We could also be more constructive and discount based on the complexity of the derivation, using the principle that long proofs are less likely to be correct: each derivation can be expressed as a combination of various sub-derivations, so we could get probability bounds for new, longer derivations from our priors over the derivations from which they are assembled. (Here "derivation" means the general form of the computation rather than the value-specific one.)

ETA: Wait, were you sort of diagonalizing on Bayes Theorem because we need to use that to update P(Bayes Theorem)? If so I might have misread you.

I think this is kind of funny considering that the second axiom of probability states that the probability that some elementary event in the sample space occurs is 1. It's just a simple way to define the system, like how the axioms of Euclidean geometry are simpler if you add a point at infinity. It doesn't necessarily mean anything; I just find it kind of funny.

the probability that some elementary event in the entire sample space will occur is 1

I believe that a part of the post's point is that the entire sample space is hard to find in most real-life cases. From the post:

However, in the real world, when you roll a die, it doesn't literally have infinite certainty of coming up some number between 1 and 6. The die might land on its edge; or get struck by a meteor; or the Dark Lords of the Matrix might reach in and write "37" on one side.

EDIT: Another example, this time from Martin Gardner's excellent book, Mathematical Games:

The hotel's cocktail lounge before the dinner hour was noisy with prestidigitators. At the bar I ran into my old friend "Bet a Nickel" Nick, a blackjack dealer from Las Vegas who likes to keep up with the latest in card magic. The nickname derives from his habit of perpetually making five-cent bets on peculiar propositions. Everybody knows his bets have "catches" to them, but who cares about a nickel? It was worth five cents just to find out what he was up to. "Any new bar bets, Nick?" I asked. "Particularly bets with probability angles?"

Nick slapped a dime on the counter beside his glass of beer. "If I hold this dime several inches above the top of the bar and drop it, chances are one-half it falls heads, one-half it falls tails, right?"

"Right," I said.

"Betcha a nickel," said Nick, "it lands on its edge and stays there."

"O.K.," I said.

Nick dunked the dime in his beer, placed it against the side of his glass and let it go. It slid down the straight side, landed on its edge and stayed on its edge, held to the glass by the beer's adhesion. I handed Nick a nickel. Everybody laughed.

Nick tore a paper match out of a folder, marked one side of the match with a pencil. "If I drop this match, chances are fifty-fifty it falls marked side up, right?" I nodded. "Betcha a nickel," he went on, "that it falls on its edge, like the dime."

"It's a bet," I said.

Nick dropped the match. But before doing so, he bent it into the shape of a V. Of course it fell on its edge and I lost another nickel.

Jaynes didn't like Kolmogorov's axioms, and I expect Eliezer would agree. I remember he mentioned somewhere in the sequences that he thought probability could be axiomatized without reference to probabilities of 0 or 1, but it wouldn't have much practical use to do so.

Jaynes definitely believed in 0 and 1 probabilities. In Probability Theory: The Logic of Science, equation (2.71), he gives

P(B | A, (A implies B)) = 1

P(A | not B, (A implies B)) = 0

Remember that probabilities are relative to a state of information. If X is a state of information from which we can infer A via deductive logic, then P(A | X) = 1 necessarily. Some common cases of this are

  • A is a tautology,

  • we are doing some sort of case analysis and X represents one of the cases being considered, or

  • we are investigating the consequences of some hypothesis and X represents the hypothesis.

However, Eliezer's fundamental point is correct when we turn to the states of information of rational beings and propositions that are not tautologies or theorems. If a person's state of information is X, and P(A | X) = 1, then no amount of contrary evidence can move that person away from full belief in A. This does not sound like rational behavior, unless A is necessarily true (in the mathematical sense of being a tautology or theorem).

Jaynes definitely believed in 0 and 1 probabilities.

I did not say that he didn't. I said that he didn't like Kolmogorov's axioms. You can also derive Bayes' rule from Kolmogorov's axioms; that doesn't mean Jaynes didn't believe in Bayes' rule.

I said that he didn't like Kolmogorov's axioms.

I don't know what it means, exactly, to not like axioms, so I'm not sure what you mean.

I meant that he didn't think they were the best way to describe probability. IIRC, he thought that they didn't make it clear why the structure they described is the right way to handle uncertainty. He also may have said that they allow you to talk about certain objects that don't really correspond to any epistemological concepts. You can find his criticism in one of the appendices to Probability Theory: the Logic of Science.


I think this idea is overrated by LWers. It's true that if you make an argument that P(A) = 1 then it does not follow that P(A) = 1 because you might be wrong. There is nothing really special about 1 here: it's also true that if you make an argument that P(A) = 2/3 then it does not follow that P(A) = 2/3 because you might be wrong. The only reason to even mention it is that it's a common special case: many arguments, in particular most mathematical proofs, do not involve probability, and so their output consists of P(A) = 1 or P(A) = 0; also, mathematical proofs tend to be correct with a very high probability, so P(A|proof of A) is very close to 1.

So does it follow that we should avoid probabilities of 0 and 1 in our reasoning? I don't think it does, and I think that doing so becomes more and more pointless as your arguments become more mathematically rigorous. The concept of 0 and 1 probabilities is just too useful to discard merely because someone might get confused. Sure, if you're manually setting priors for your Bayesian AI, you should be aware that giving a prior of 0 or 1 for a statement means it will never update. But to how many of us is that relevant?
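The "a prior of 0 or 1 never updates" point can be checked directly with Bayes' rule. A minimal sketch, where the function name and the example likelihoods are illustrative rather than from the post:

```python
def bayes_update(prior, p_e_given_a, p_e_given_not_a):
    """Posterior P(A|E) from prior P(A) and likelihoods P(E|A), P(E|not A)."""
    numerator = p_e_given_a * prior
    return numerator / (numerator + p_e_given_not_a * (1 - prior))

# Strong evidence against A moves an ordinary prior substantially...
print(bayes_update(0.99, 0.01, 0.99))  # drops from 0.99 to 0.5
# ...but a prior of exactly 1 is immovable: the (1 - prior) term is zero,
# so the posterior is 1 regardless of the likelihoods.
print(bayes_update(1.0, 0.01, 0.99))   # stays 1.0
```

Symmetrically, a prior of exactly 0 makes the numerator zero, so the posterior stays 0 no matter how strong the evidence for A.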

A similar idea is much better explained in Confidence Levels Inside and Outside an Argument. In my opinion, any part of this post that is not also covered there is not worth reading.