# How do low level hypotheses constrain high level ones? The mystery of the disappearing diamond.



Consider two friends, Alice and Bob, trying to figure out what happened to a diamond that disappeared from a museum. They do so in the form of a game that is kind of an approximation to Solomonoff induction: they will work together to come up with the smallest possible explanations that conform to the data, for some intuitive notion of smallness.

This helps to eliminate fake explanations; the hypothesis "a witch caused Henry to fall ill" can be simplified to "Henry fell ill". But "Sally touched Henry and Bob, and Sally is sick and Henry is sick and Bob is sick" is beaten by "Sally touched Henry and Bob and Sally is contagiously sick". An explanation is good if it is smaller than just hard-coding the answer.

Bob knows that there are four diamond thieves in the city, so he comes up with four hypotheses:

1. The diamond was stolen by thief number 1.
2. The diamond was stolen by thief number 2.
3. The diamond was stolen by thief number 3.
4. The diamond was stolen by thief number 4.

These are all roughly the same complexity (depending on how you encode numbers), so this provides a uniform distribution over the four thieves.
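As a rough sketch of why equal-sized hypotheses yield a uniform distribution: assume each hypothesis gets weight 2^(-length) and the weights are then normalized. Character count stands in for complexity here, which is an assumption made only for illustration; the game itself uses an intuitive notion of size, and `length_prior` is a hypothetical helper name.

```python
def length_prior(hypotheses):
    """Weight each hypothesis by 2^-(length in characters), then normalize.

    Character count is only a crude stand-in for description length.
    """
    weights = {h: 2.0 ** -len(h) for h in hypotheses}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


# Bob's four hypotheses differ only in a single digit, so they have equal length.
thieves = [f"The diamond was stolen by thief number {i}." for i in range(1, 5)]
prior = length_prior(thieves)

for hypothesis, p in prior.items():
    print(f"{p:.2f}  {hypothesis}")
```

Because all four strings are the same length, the weights are identical and each thief ends up with the same share of the probability.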

Alice comes up with one hypothesis:

1. The diamond spontaneously ceased existing.

and declares victory.

Bob: What, that makes no sense? Physical objects can't stop existing.
Alice: We aren't doing physics; we are playing a game.
Bob: ͠° ͟ʖ ͡°
Alice: ¯\_(ツ)_/¯

But there is an additional rule; you can add other data from the real world to the challenge. For example, for "Henry falling ill", you might get better hypotheses if you try to compress info about all the sick people in the village, so that a slightly more complex hypothesis that can explain all of them wins!

Bob: I hereby add all physics experiments to the data set!

Alice then comes up with the following hypothesis: all physical experiments are explained by the standard model of physics, except the diamond spontaneously ceased existing.

Bob: That bit about the diamond ceasing to exist is arbitrary!
Alice: Do you have a better hypothesis?
Bob: Uhm, idk. All physical experiments are explained by the standard model of physics, and the diamond was stolen by thief 3?
Alice: ͠° ͟ʖ ͡°

Bob's new hypothesis is bigger than Alice's new hypothesis.

And thus we run into the problem. Due to Alice's and Bob's bounded rationality, they can't determine which thief stole the diamond from the laws of physics alone. So the laws of physics at a low level don't help compress hypotheses at a high level, and thus can't constrain the smallest possible hypotheses they can consider.

How do you modify the game so that the spontaneously disappearing diamond hypothesis doesn't win?


If we are trying to approximate Solomonoff induction, only the complexity in the overall description of the universe counts directly, and a universe in which thief 3 stole the diamond isn't any more complex in terms of overall description than one in which the diamond stayed put. Instead, we account for the complexity of Bob's specific hypothesis in terms of ordinary probability, which accounts for the fact that there are more universes which are compatible with some theories than are compatible with other theories. E.g. in this particular case there will be some base rate for theft, for a locally prominent thief being involved, etc, and we can use that to penalize Bob's hypotheses instead. As part of that calculation, the fact that there are 4 thieves applies a factor of four penalty (2 bits) to any particular thief.

Regarding Alice's hypotheses, I think the "the diamond spontaneously disappeared" hypothesis is actually a much larger hypothesis (in terms of bits) than you are giving it credit for. If you don't gerrymander your descriptions to make this smaller, then the same number of bits should describe any other comparable object disappearing. Also, your bits need to specify the time of disappearance as well up to the observed precision, so the number of bits should be (ignoring additional details such as the precise manner of disappearance) around log2((number of comparable objects in universe)*(age of the universe)/(observed time window of disappearance)), which should I think be pretty decent in size.
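To get a feel for the sizes involved, here is a minimal sketch of the estimate above next to the 2-bit cost of picking one thief out of four. Every number is an illustrative assumption, not a measurement: the count of comparable objects and the observation window are made up, and only the age of the universe is a rough real figure.

```python
import math

# Illustrative assumptions only.
comparable_objects = 1e50     # hypothetical count of diamond-like objects in the universe
age_of_universe_s = 4.35e17   # roughly 13.8 billion years, in seconds
time_window_s = 3600.0        # hypothetical: disappearance pinned down to within an hour


def disappearance_bits(n_objects, total_time, window):
    """Bits needed to single out one object and one time window."""
    return math.log2(n_objects * total_time / window)


alice_bits = disappearance_bits(comparable_objects, age_of_universe_s, time_window_s)
thief_bits = math.log2(4)  # choosing one of four thieves

print(f"disappearing diamond: about {alice_bits:.0f} bits")
print(f"choosing a thief:     {thief_bits:.0f} bits")
```

Under these made-up numbers the disappearance costs far more bits than choosing a thief, though a fair comparison would also charge the theft hypothesis for its own base rate and time window.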

Now, this may not be a particularly satisfying answer since I am only addressing your particular example, and not the general question of "how do low level hypotheses constrain high level ones?" AFAIK assessing how compatible any given high level hypothesis is with simple low level physics might in general be a complex issue.

Then, doesn't the theft hypothesis also need to account for the specific timeframe in which the diamond was stolen?

Yes, it would.

(In writing the original comment, I actually wrote the second paragraph first and then reordered them, which may have affected the consistency. I do think, however, that it would be easy to forget to take this into account when calculating bits for Alice's hypothesis, while automatically taking it into account (via a base rate that includes the number of thefts per unit time) for Bob's.)


> they can't determine which thief stole the diamond from the laws of physics alone

"thieves 1, 2, and 4 were all observed by credible sources to be in a jail at the other side of the city at the time the diamond went missing, and the laws of physics say that none of them can have been in two places at once" has a similar appeal-to-physics shape to it ;)

Or even worse is when you get into less clear science, like biological research that you aren't certain of. Then you get uncertainty on multiple levels.

I kind of don't understand the setup, because the Standard Model implies conservation of energy, which implies strictly zero probability for the hypothesis "SM and the diamond ceased to exist". Yes, because you are not a Solomonoff inductor and probably not a physicist, you can't deduce conservation of energy from the Lagrangian of the SM alone, but I would expect someone who can write it down from memory to know about conservation of energy.

This seems very interesting but I'm having trouble understanding something. Can you specify what is meant by:

> An explanation is good if it is smaller than just hard-coding the answer.

What does 'just hard-coding the answer' mean and look like?

Let's say that you are trying to model the data 3,1,4,1,5,9.

The hypothesis "The data is 3,1,4,1,5,9" would be hard-coding the answer. It is better than the hypothesis "a witch wrote down the data, which was 3,1,4,1,5,9". (This example is just ruled out by Occam's razor, but more generally we want our explanations to be less data than the data itself, lest it just sneak in a clever encoding of the data.)

Thanks, that makes sense! And to be clear, would an 'explanation' be a program which could generate the data 3,1,4,1,5,9? And a good explanation would be one which took up fewer bits of information than just the list 3,1,4,1,5,9?

Yes! In fact, ideally the explanations would be computer programs; the game is based on Solomonoff induction, which works with programs written in a fixed programming language. In this post I'm exploring the idea of using informal human language instead of programming languages, but explanations should be thought of as informal programs.
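A small sketch of the "explanation as program" idea, using a hypothetical data set (the first 200 square numbers) because six digits like 3,1,4,1,5,9 are too short for any program to beat the literal list. Source length in characters stands in for program size, and `is_good_explanation` is a made-up helper, not part of any formal definition.

```python
# Hypothetical data set: the first 200 square numbers.
data = [i * i for i in range(200)]

# "Hard-coding the answer": the program is just the data written out.
hard_coded = repr(data)

# A candidate explanation: a short expression that regenerates the data.
program = "[i * i for i in range(200)]"


def is_good_explanation(source, observed):
    """True if `source` reproduces the data and is shorter than the literal data."""
    reproduces = eval(source) == observed
    return reproduces and len(source) < len(repr(observed))


print(len(program), "characters for the program")
print(len(hard_coded), "characters for the hard-coded list")
print(is_good_explanation(program, data))
```

The hard-coded list reproduces the data but is no smaller than the data itself, so it does not count as a good explanation under this test.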

I see, thanks for taking the time to explain!