Follow up to: A Proof of Occam's Razor
In my post on Occam’s Razor, I showed that a certain weak form of the Razor follows necessarily from standard mathematics and probability theory. Naturally, the Razor as used in practice is stronger and more concrete, and cannot be proven to be necessarily true. So rather than attempting to give a necessary proof, I pointed out that we learn by induction what concrete form the Razor should take.
But what justifies induction? Like the Razor, some aspects of it follow necessarily from standard probability theory, while other aspects do not.
Suppose we consider the statement S, “The sun will rise every day for the next 10,000 days,” assigning it a probability p, between 0 and 1. Then suppose we are given evidence E, namely that the sun rises tomorrow. What is our updated probability for S? According to Bayes’ theorem, our new probability will be:
P(S|E) = P(E|S)P(S)/P(E) = p/P(E), because given that the sun will rise every day for the next 10,000 days, it will certainly rise tomorrow. So our new probability is greater than p. So this seems to justify induction, showing it to work of necessity. But does it? In the same way we could argue that the probability that “every human being is less than 10 feet tall” must increase every time we see another human being less than 10 feet tall, since the probability of this evidence (“the next human being I see will be less than 10 feet tall”), given the hypothesis, is also 1. On the other hand, if we come upon a human being 9 feet 11 inches tall, our subjective probability that there is a 10 foot tall human being will increase, not decrease. So is there something wrong with the math here? Or with our intuitions?
In fact, the problem is neither with the math nor with the intuition. Given that every human being is less than 10 feet tall, the probability that “the next human being I see will be less than 10 feet tall” is indeed 1, but the probability that “there is a human being 9 feet 11 inches tall” is definitely not 1. So the math updates on a single aspect of our evidence, while our intuition is taking more of the evidence into account.
But this math seems to work because we are trying to induce a universal which includes the evidence. Suppose instead we try to go from one particular to another: I see a black crow today. Does it become more probable that a crow I see tomorrow will also be black? We know from the above reasoning that it becomes more probable that all crows are black, and one might suppose that it therefore follows that it is more probable that the next crow I see will be black. But this does not follow. The probability of “I see a black crow today”, given that “I see a black crow tomorrow,” is certainly not 1, and so the probability of seeing a black crow tomorrow, given that I see one today, may increase or decrease depending on our prior – no necessary conclusion can be drawn. Eliezer points this out in the article Where Recursive Justification Hits Bottom.
On the other hand, we would not want to draw a conclusion of that sort: even in practice we don’t always update in the same direction in such cases. If we know there is only one white marble in a bucket, and many black ones, then when we draw the white marble, we become very sure the next draw will not be white. Note however that this depends on knowing something about the contents of the bucket, namely that there is only one white marble. If we are completely ignorant about the contents of the bucket, then we form universal hypotheses about the contents based on the draws we have seen. And such hypotheses do indeed increase in probability when they are confirmed, as was shown above.