I had been under the impression that Cox's theorem said something pretty strong about the consistent ways to represent uncertainty, relying only on very plausible assumptions. However, I recently found this 1999 paper by Halpern, which claims that Cox's result actually requires some stronger assumptions. I am curious what people here think of this. Has there been subsequent work that relaxes the stronger assumptions?

The Wikipedia article on Cox's theorem mentions Halpern's 1999 paper and links to some subsequent work which seems to restore something like the status quo. But I haven't yet looked at any of the papers.

ETA: I've looked at the papers. I think I can recommend both the original 1999 paper by Halpern and this 2002 paper by Hardy.

To answer your title question, I would say that you shouldn't take the problems very seriously at all. Cox's theorem basically doesn't work for "small worlds" - i.e. models in which only a finite number of events exist. Cox's theorem does work if your model consists of a small world plus a fair coin which can be flipped an arbitrary number of times.

Somewhere in between those two points (small world and small world + coin), Cox's theorem switches from not working to working. Describing exactly where the switchover takes place may interest mathematicians, but it probably won't interest most Bayesians - or at least not Bayesians who are willing to carry coins in their pockets.
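To make the coin's role concrete, here is a small sketch (my own illustration, not from the papers): n flips of a fair coin let you manufacture an event of probability k/2^n for any k, so the achievable plausibility values fill in the gaps of the small world.

    # My illustration (not from the papers): n flips of a fair coin yield
    # events of probability k / 2**n, so the achievable plausibility
    # values become dense in [0, 1] as n grows.
    from fractions import Fraction

    def coin_probabilities(n_flips):
        """All event probabilities constructible from n fair coin flips."""
        return sorted({Fraction(k, 2 ** n_flips) for k in range(2 ** n_flips + 1)})

    for n in (1, 3, 5):
        probs = coin_probabilities(n)
        gap = max(b - a for a, b in zip(probs, probs[1:]))
        print(f"{n} flips: {len(probs)} values, largest gap {gap}")
    # The gaps shrink like 1 / 2**n: between any two plausibility values
    # of the small world you can always slot a coin event, which is the
    # density that Cox's functional-equation argument needs.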

Interesting. Do they give a good intuition for why this change occurs?

The missing ingredient in a "small world" is roughly the continuity conditions that Jaynes calls "qualitative correspondence with common sense" in Chapter 2 of PT:TLoS. In terms of model theory, adding the coin means that the model now "has enough points".

Here is one way to think about it: One of the consequences of Cox's theorem is that

  • P(X) = 1 - P(~X)

Suppose you decided to graph P(X) against P(~X). But in a small world, there are only a finite number of events you can substitute in for X. So your graph is just a finite set of collinear points - not a line. Many continuous functions can be made to fit those points. Add a coin to your world, and you can interpolate an event between any two events. You get a dense infinity of points between 0 and 1. And that is all you need: only a single continuous function, y = 1 - x, fits this data.

That was hand-waving, but I hope it helped.
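If you want the hand-waving made concrete, here is a toy version (my own construction, with made-up numbers, not taken from the papers). Suppose the small world only ever realizes the plausibility values 0, 1/4, 1/2, 3/4, 1; then y = 1 - x is not the only continuous curve through the graph of P(X) against P(~X):

    import math

    points = [k / 4 for k in range(5)]   # the only values a tiny world hits

    def f(x):
        return 1 - x                     # the Bayesian answer

    def g(x):                            # a rival that agrees on all five points
        return 1 - x + 0.05 * math.sin(4 * math.pi * x)

    assert all(abs(f(x) - g(x)) < 1e-12 for x in points)

    # Add the coin: dyadic probabilities k / 2**n are dense in [0, 1],
    # and the rival curve is now visibly wrong.
    dyadic = [k / 2 ** 6 for k in range(2 ** 6 + 1)]
    print(max(abs(f(x) - g(x)) for x in dyadic))   # ~0.05, so only f survives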

It did help. I was expecting something like this. I still have to go look at the paper for some more clarification.

Roko:

Failures of Cox's theorem are more likely to come from unstated implicit assumptions than from this kind of mathematical pedantry.

Unstated implicit assumptions in Cox's theorem? That's exactly what this was about.

Roko:

Now that the assumption of an infinite set of events has been made explicit, I don't think it's a problem. I think other subtle violations of the axioms, e.g. likelihoods not always being comparable, would be more of a problem.

I'd like to see an example of a non-Bayesian probability function in a finite world, btw.

OK, fair enough, I guess the value of this paper was making that assumption explicit. Halpern's 1999 paper (which Perplexed linked) constructs such an example.
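I haven't reproduced Halpern's actual table here, but to give the flavor: one of Cox's hypotheses is that a single function F satisfies Bel(A∧B|C) = F(Bel(A|B∧C), Bel(B|C)), and in a finite world this only pins F down at finitely many points, which is the loophole. Here is a sketch (my own code, hypothetical names) of the well-definedness check you would run on such a table; an honest probability passes it, and the point of Halpern's example is that a non-probabilistic table can pass checks like this too:

    from fractions import Fraction
    from itertools import chain, combinations, product

    def f_is_well_defined(bel):
        """bel maps (A, C) pairs of frozenset-events to Bel(A | C).
        Check that Bel(A&B | C) depends only on (Bel(A | B&C), Bel(B | C)),
        i.e. that some single function F could generate this table."""
        seen = {}
        events = {a for (a, _) in bel}
        for a, b, c in product(events, repeat=3):
            ab, bc = a & b, b & c
            if (ab, c) not in bel or (a, bc) not in bel or (b, c) not in bel:
                continue
            key = (bel[(a, bc)], bel[(b, c)])
            if key in seen and seen[key] != bel[(ab, c)]:
                return False   # two triples demand different F values
            seen[key] = bel[(ab, c)]
        return True

    # Demo on an honest probability (uniform measure on three worlds):
    worlds = (0, 1, 2)
    events = [frozenset(s) for s in chain.from_iterable(
        combinations(worlds, r) for r in range(len(worlds) + 1))]
    bel = {(a, c): Fraction(len(a & c), len(c))
           for a in events for c in events if c}
    print(f_is_well_defined(bel))   # True, as it must be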

Roko:

And is it in any way interesting? Does it allow you to do great inference beyond the ken of Bayesianism? Or is it just some annoying corner-case?

I haven't spent time understanding the example, but Perplexed's explanation of the need for an infinite event space suggests it's not very interesting.