In this post, I would like to share a simple observation of mine which is about deriving some crude form of Occam's razor from a small set of assumptions.
Assumptions:
When we accept these assumptions it is easy to prove the following statement.
For any there are only finitely many hypotheses in that are more complex and more probable or less complex and less probable than .
Proof.
The set of all hypotheses which are more probable than is finite, since otherwise probabilities cannot sum to 1. The set of all hypotheses that are simpler than is also finite by assumptions 3 and 4. Intersection of a finite set with another set is a finite set, and union of two finite sets is a finite set.
Note that this result is independent from the choice of function that assigns beliefs to hypotheses. In addition, while it assumes some properties about the notion of simplicity, it does not require a full formalization of it.
Edit 1:
Assumption 3 can be thought of as two distinct assumptions combined together:
As a remark, the notion of Kolmogorov complexity (the length of the shortest computer program that generates an object) satisfies these properties.
Edit 2:
Here is an even simpler version of the blunt razor, that does not require any reference to beliefs. The assumptions are:
The proof then follows easily. By definition, every wrong hypothesis has to contradict with some observation (otherwise we consider it correct). Therefore, it is guaranteed that making enough observations and removing the so far detected wrong hypotheses from the list, one of the true hypotheses (or the true hypothesis if it is unique) will appear in the first position thereafter. In other words, it'll be among the simplest hypotheses that agree with the observations. The quantity of "enough" depends on the definition of simplicity.
Note that while the second assumption still assumes the one-sided infinite nature of simplicity, we don't really need the other assumptions we made earlier. Also note that simplicity is not a special ingredient in this algorithm, any arbitrary ordering will be fine.
What if we have not observed enough? Can we still argue that the razor will probably yield a good hypothesis? What needs to be assumed additionally, to prove a result like that?