How seriously should I take the supposed problems with Cox's theorem?

by jsalvatier
6th Dec 2010
Perplexed (15y, 12 points)

The Wikipedia article on Cox's theorem mentions Halpern's 1999 paper and links to some subsequent work which seems to restore something like the status quo. But I haven't yet looked at any of the papers.

ETA: I've looked at the papers. I think I can recommend both the original 1999 paper by Halpern and this 2002 paper by Hardy.

To answer your title question, I would say that you shouldn't take the problems very seriously at all. Cox's theorem basically doesn't work for "small worlds" - i.e. models in which only a finite number of events exist. Cox's theorem does work if your model consists of a small world plus a fair coin which can be flipped an arbitrary number of times.

Somewhere in between those two points (small world and small world + coin), Cox's theorem switches from not working to working. Describing exactly where the switchover takes place may interest mathematicians, but it probably won't interest most Bayesians - or at least not Bayesians who are willing to carry coins in their pockets.
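To make the "small world + coin" picture concrete, here is a rough sketch. It uses ordinary probabilities purely for illustration (so it assumes the Bayesian answer rather than deriving it), and the three-atom world with probabilities 1/2, 1/3 and 1/6 is a hypothetical example, not one taken from Halpern's or Hardy's papers. It counts which probability values are attainable by some event once each atom can be split by n fair coin flips.

```python
from fractions import Fraction
from itertools import product


def attainable_probabilities(atom_probs, n_flips):
    """Sorted probabilities of all events in the small world plus n_flips coin flips.

    Each atom is split into 2**n_flips equally likely pieces (one per coin
    sequence). An event picks some number of pieces from each atom.
    """
    pieces = 2 ** n_flips
    values = set()
    for counts in product(range(pieces + 1), repeat=len(atom_probs)):
        values.add(sum(Fraction(k, pieces) * p for k, p in zip(counts, atom_probs)))
    return sorted(values)


def largest_gap(values):
    """Largest distance between neighbouring attainable probabilities."""
    return max(b - a for a, b in zip(values, values[1:]))


# Hypothetical small world: three atoms (an assumption for illustration).
atoms = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]

for n in range(4):
    vals = attainable_probabilities(atoms, n)
    print(f"flips={n}: {len(vals)} attainable values, largest gap {largest_gap(vals)}")
```

With no coin there are only a handful of attainable values; each extra flip halves the largest gap, which is the sense in which the coin fills in the interval between 0 and 1.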

jsalvatier (15y, 1 point)

Interesting. Do they give a good intuition for why this change occurs?

Perplexed (15y, 8 points)

The missing ingredient in a "small world" is roughly the continuity conditions that Jaynes calls "qualitative correspondence with common sense" in Chapter 2 of PT:TLoS. In terms of model theory, adding the coin means that the model now "has enough points".

Here is one way to think about it: One of the consequences of Cox's theorem is that

  • P(X) = 1 - P(~X)

Suppose you decided to graph P(X) against P(~X). In a small world, there are only a finite number of events you can substitute in for X, so your graph is just a finite set of collinear points - not a line. Many continuous functions can be made to fit those points. Add a coin to your world, and you can interpolate an event between any two events in your world. You get a dense infinity of points between 0 and 1. And that is all you need. Only one continuous function (y = 1 - x) fits this data.

That was hand-waving, but I hope it helped.
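The curve-fitting picture above can be sketched numerically. This is only an illustration of the hand-waving, not Halpern's construction: the function `wiggly_negation`, the grid spacings, and the use of "strictly decreasing" as a stand-in for the regularity conditions are all assumptions made for the example. The rival function agrees with y = 1 - x at every grid point, yet differs in between; the finer the grid, the less room it has to differ.

```python
import math


def wiggly_negation(x, spacing, amplitude):
    """A rival to y = 1 - x that vanishes-in-difference on multiples of `spacing`."""
    return 1 - x + amplitude * math.sin(math.pi * x / spacing)


def fits_grid(f, spacing, tol=1e-9):
    """True if f matches 1 - x (up to float error) at every grid point in [0, 1]."""
    n = round(1 / spacing)
    return all(abs(f(k * spacing) - (1 - k * spacing)) < tol for k in range(n + 1))


def max_deviation(f, samples=10001):
    """Largest sampled distance between f and 1 - x on [0, 1]."""
    xs = (i / (samples - 1) for i in range(samples))
    return max(abs(f(x) - (1 - x)) for x in xs)


for spacing in (1 / 4, 1 / 8, 1 / 16, 1 / 32):
    # Amplitude below spacing/pi keeps the rival strictly decreasing,
    # so it still "looks like" a sensible negation function.
    amplitude = spacing / (2 * math.pi)
    f = lambda x, s=spacing, a=amplitude: wiggly_negation(x, s, a)
    print(f"spacing={spacing}: fits grid={fits_grid(f, spacing)}, "
          f"max deviation from 1 - x = {max_deviation(f):.5f}")
```

Every rival passes the grid test, but its allowed deviation shrinks in proportion to the spacing. In the limit of a dense set of points, nothing but y = 1 - x is left.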

jsalvatier (15y, 1 point)

It did help. I was expecting something like this. I still have to go look at the paper for some more clarification.

Roko (15y, 0 points)

Failures of Cox's theorem are more likely to come from unstated implicit assumptions than from this kind of mathematical pedantry.

jsalvatier (15y, 1 point)

Unstated implicit assumptions in Cox's theorem? That's exactly what this was about.

Roko (15y, 3 points)

Now that the assumption of an infinite set of events has been made explicit, I don't think it's a problem. I think other subtle violations of the axioms, e.g. likelihoods not always being comparable, would be more of a problem.

I'd like to see an example of a non-Bayesian probability function in a finite world, btw.

jsalvatier (15y, 2 points)

OK, fair enough. I guess the value of this paper was in making that assumption explicit. Halpern's 1999 paper (which Perplexed links) constructs such an example.

Roko (15y, 0 points)

And is it in any way interesting? Does it allow you to do great inference beyond the ken of Bayesianism? Or is it just some annoying corner-case?

jsalvatier (15y, 0 points)

I haven't spent time understanding the example, but Perplexed's explanation of the need for an infinite event space suggests it's not very interesting.


I had been under the impression that Cox's theorem said something pretty strong about the consistent ways to represent uncertainty, relying on very plausible assumptions. However, I recently found this 1999 paper, which claims that Cox's result actually requires some stronger assumptions. I am curious what people here think of this. Has there been subsequent work which relaxes the stronger assumptions?