In Defense of Objective Bayesianism by Jon Williamson was mentioned recently in a post by lukeprog as the sort of book that people on Less Wrong should be reading. I have been reading it, and have found some of it quite bizarre. One point in particular seems obviously false. If it’s just me, I’ll be glad to be enlightened as to what was meant. If collectively we don’t understand, that’d be pretty strong evidence that we should read more academic Bayesian stuff.
Williamson advocates use of the Maximum Entropy Principle. In short, you should take account of the limits placed on your probability by the empirical evidence, and then choose a probability distribution closest to uniform that satisfies those constraints.
So, if asked to assign a probability to an arbitrary A, you’d say p = 0.5. But if you were given evidence in the form of some constraint on p, say that p ≥ 0.8, you’d set p = 0.8, as that is the entropy-maximising value within the permitted range. Williamson restricts attention to affine constraints. I found this somewhat counter-intuitive already, but I do follow what he means.
But now for the confusing bit. I quote directly:
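To make the p ≥ 0.8 case concrete, here is a minimal sketch of my own (not from the book) that treats A as a single binary proposition and grid-searches for the entropy-maximising probability inside the allowed interval. The function names and the grid-search approach are my assumptions, chosen for illustration only.

```python
import math

def binary_entropy(p):
    """Shannon entropy (in bits) of a yes/no proposition held with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def maxent_probability(lower=0.0, upper=1.0, steps=1000):
    """Hypothetical helper: grid-search the entropy-maximising p in [lower, upper]."""
    candidates = [lower + (upper - lower) * i / steps for i in range(steps + 1)]
    return max(candidates, key=binary_entropy)

# No constraints: the uniform value maximises entropy.
unconstrained = maxent_probability()

# Constraint p >= 0.8: entropy falls as p moves away from 0.5,
# so the maximiser sits on the boundary nearest to 0.5.
constrained = maxent_probability(lower=0.8)

print(unconstrained, constrained)
```

Entropy for a binary proposition peaks at 0.5 and falls off on either side, which is why the constrained answer lands on the edge of the permitted interval rather than somewhere inside it.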
“Suppose A is ‘Peterson is a Swede’, B is ‘Peterson is a Norwegian’, C is ‘Peterson is a Scandinavian’, and ε is ‘80% of all Scandinavians are Swedes’. Initially, the agent sets P(A) = 0.2, P(B) = 0.8, P(C) = 1, P(ε) = 0.2, P(A & ε) = P(B & ε) = 0.1. All these degrees of belief satisfy the norms of subjectivism. Updating by maxent on learning ε, the agent believes Peterson is a Swede to degree 0.8, which seems quite right. On the other hand, updating by conditionalizing on ε leads to a degree of belief of 0.5 that Peterson is a Swede, which is quite wrong. Thus, we see that maxent is to be preferred to conditionalization in this kind of example because the conditionalization update does not satisfy the new constraints X’, while the maxent update does.”
p. 80, 2010 edition. Note that this example is actually from Bacchus et al. (1990), but Williamson quotes it approvingly.
His calculation for the Bayesian update is correct; you do get 0.5. What’s more, this seems to be intuitively the right answer; the update has caused you to ‘zoom in’ on the probability mass assigned to ε, while maintaining relative proportions inside it.
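For anyone who wants to check the arithmetic, here is a short sketch of the conditionalization step using the numbers from the quoted prior. The variable names and the helper function are mine, not Williamson's.

```python
from fractions import Fraction

# Prior values taken from the quoted passage.
p_eps = Fraction(2, 10)        # P(ε)
p_a_and_eps = Fraction(1, 10)  # P(A & ε)
p_b_and_eps = Fraction(1, 10)  # P(B & ε)

def conditionalize(joint, evidence):
    """Bayesian conditionalization: P(H | E) = P(H & E) / P(E)."""
    return joint / evidence

posterior_a = conditionalize(p_a_and_eps, p_eps)
posterior_b = conditionalize(p_b_and_eps, p_eps)

print(posterior_a, posterior_b)  # 1/2 1/2
```

The prior gives A and B equal mass inside ε, so conditionalizing on ε keeps them equal at 0.5 each.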
As far as I can see, you get 0.8 only if we assume that Peterson is a randomly chosen Scandinavian. But if that were true, the prior given is bizarre. If he were a randomly chosen individual, the prior should have been something like P(A & ε) = 0.16, P(B & ε) = 0.04. The only way I can make sense of the prior is if constraints simply “don’t apply” until they have p = 1.
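Here is a sketch of where 0.16 and 0.04 come from. It rests on my own assumption that Peterson is treated as a randomly chosen Scandinavian, so that P(A | ε) = 0.8 is already built into the prior; nothing in the book states this.

```python
from fractions import Fraction

p_eps = Fraction(2, 10)          # P(ε), as in the quoted prior
p_a_given_eps = Fraction(8, 10)  # assumption: random Scandinavian, so P(A | ε) = 0.8

# Joint probabilities implied by that assumption.
p_a_and_eps = p_a_given_eps * p_eps        # 0.16
p_b_and_eps = (1 - p_a_given_eps) * p_eps  # 0.04

# Conditionalizing on ε with this prior recovers 0.8.
posterior_a = p_a_and_eps / p_eps

print(float(p_a_and_eps), float(p_b_and_eps), float(posterior_a))  # 0.16 0.04 0.8
```

With this prior, ordinary conditionalization and the maxent update agree on 0.8, so the disagreement in the quoted example seems to come entirely from the choice of prior.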
Can anyone explain the reasoning behind a posterior probability of 0.8?