In this post, I would like to share a simple observation of mine which is about deriving some crude form of Occam's razor from a small set of assumptions.

Assumptions:

  1.  Subjective Bayesian probability is used to model an agent's degree of belief in hypotheses.
  2. The set of hypotheses  that we assign beliefs is countably infinite.
  3. Complexity of a hypothesis is represented with a positive integer where bigger integers represent more complex hypotheses.
  4. For any  there exists finitely many hypotheses in  that have the same complexity as .

When we accept these assumptions it is easy to prove the following statement.

For any  there are only finitely many hypotheses in  that are more complex and more probable or less complex and less probable than .

Proof.

The set of all hypotheses which are more probable than  is finite, since otherwise probabilities cannot sum to 1. The set of all hypotheses that are simpler than  is also finite by assumptions 3 and 4. Intersection of a finite set with another set is a finite set, and union of two finite sets is a finite set. 

 

Note that this result is independent from the choice of function that assigns beliefs to hypotheses. In addition, while it assumes some properties about the notion of simplicity, it does not require a full formalization of it.

 

Edit 1:

Assumption 3 can be thought of as two distinct assumptions combined together:

  1. The one-sided infinite nature of complexity, that is, there is always a more complex hypothesis while there exists "the simplest level".
  2. The finite gradation of complexity, that is, for any hypothesis , there exists a hypothesis  such that there is no hypothesis which is more complex than  and less complex than .

As a remark, the notion of Kolmogorov complexity (the length of the shortest computer program that generates an object) satisfies these properties.

 

Edit 2:

Here is an even simpler version of the blunt razor, that does not require any reference to beliefs. The assumptions are:

  1. Set of all hypotheses is countable, and at least one hypothesis is correct.
  2. The definition of simplicity, whatever it is, allows us to order hypotheses so that hypothesis is no more complex than the  hypothesis for all .

The proof then follows easily. By definition, every wrong hypothesis has to contradict with some observation (otherwise we consider it correct). Therefore, it is guaranteed that making enough observations and removing the so far detected wrong hypotheses from the list, one of the true hypotheses (or the true hypothesis if it is unique) will appear in the first position thereafter. In other words, it'll be among the simplest hypotheses that agree with the observations. The quantity of "enough" depends on the definition of simplicity.

Note that while the second assumption still assumes the one-sided infinite nature of simplicity, we don't really need the other assumptions we made earlier. Also note that simplicity is not a special ingredient in this algorithm, any arbitrary ordering will be fine.

What if we have not observed enough? Can we still argue that the razor will probably yield a good hypothesis? What needs to be assumed additionally, to prove a result like that?

New Comment