Description complexity: an apology and note on terminology

by cousin_it2 min read10th Nov 201014 comments


Personal Blog

This post is an amendment and sort of apology for my LW posts dealing with complexity (1, 2). In these posts I got the terms all wrong. Now I'll try to set it right. Many thanks to taw, JGWeissman and Daniel_Burfoot.

The Solomonoff prior works like this: every bit string is assigned a real number that's equal to the probability of a "random program" outputting that string and then halting, when the "probability" of choosing each program is assumed to be inverse-exponential in its length. (Alternatively, you talk about a "universal prefix Turing machine" that consumes fair coin flips on the input tape.)

In this setup, the words "event", "hypothesis", "probability", "complexity" etc. are all ambiguous. For example, the word "hypothesis" can mean a) an individual program, b) an equivalence class of programs that output the same bit string, or c) a statement about output bit strings, like "the third bit will be 0". The word "event" has exactly the same problem.

Now the trouble is, you can give reasonable-sounding definitions of "probability" and "Kolmogorov complexity" to objects of all three types: (a), (b), and (c). But it's not at all clear what real-world implications these values should have. Does it make sense to talk about prior probability for objects of type (a), given that we can never distinguish two (a)'s that are part of the same (b)? (This is the mistake in saying that MWI has a higher prior probability than collapse interpretations.) Is it useful to talk about K-complexity for objects of type (c), given that a very long statement of type (c) can still have probability close to 1? (This is the mistake in saying Islam must be improbable because the Qur'an is such a thick book.) For that matter, is it at all useful to talk about K-complexity for objects of type (b) which are after all just bit strings, or should we restrict ourselves to talking about their prior probabilities for some reason?

At the moment it seems uncontroversial that we can talk about K-complexity for objects of type (a) - let's call them "programs" - and talk about prior probability for objects of types (b) and (c) - let's call them "sequences of observations" and "statements about the world", respectively. Curiously, this means there's no object for which we have defined both "complexity" and "prior probability", which makes all arguments of the form "complexity is high => probability is low" automatically suspect.

There's another worrying point. Given that "programs" get glued into equivalence classes - "sequences of observations" - anyway, it seems the K-complexity of an individual program is a completely inconsequential variable. You're better off just talking about the length. And the other two kinds of objects don't seem to have useful notions of K-complexity (according to my esteemed commentators who I thanked above), so we can adopt the general rule that mentioning K-complexity in a discussion of physics is always a sign of confusion :-)

What do you say? Is this better? I'm honestly trying to get better!


15 comments, sorted by Highlighting new comments since Today at 9:58 PM
New Comment

Curiously, this means there's no object for which we have defined both "complexity" and "prior probability", which makes all arguments of the form "complexity is high => probability is low" automatically suspect.

This is an interesting pattern to keep in mind.

The above is not as strong as it seems. It can be shown that on any given Turing machine, the prior probability of an equivalence class of programs is at most an O(1) factor greater than the part of that probability due to just the simplest program in the class.

Arguments can still be suspect if you don't know which program is the simplest. However, it is often possible to show that a program is probably the simplest; since there are many more complex programs than simple ones, a randomly drawn complex hypothesis is extremely unlikely to have a shorter equivalent.

Suppose I actually have a machine that chooses a program with probabilities dependant on length as described by Solomonoff Induction, and then runs the selected program and displays its output, but it does not say what program it used. Do you think that just because you will never learn what program it used, that it doesn't make sense to talk about the probability that it used a particular program?

That's a subtle point. Do you think it makes sense to ask whether reality uses Lagrangian of Hamiltonian mechanics "in truth"? Or, to take an example from the Feynman Lectures: Maxwell's equations look very nice when expressed in terms of the four-potential. Does that mean we should believe in its "physical reality" over the "reality" of the electric and magnetic vector fields?

Maxwell's equations look very nice when expressed in terms of the four-potential. Does that mean we should believe in its "physical reality" over the "reality" of the electric and magnetic vector fields?

If you consider which concepts are used in Quantum Mechanics, it seems right to favor a formulation of Classical Electromagnetism that focuses on potentials rather than fields.

One reason I'm wary of such questions is on John Baez's crackpot index:

17. 10 points for arguing that while a current well-established theory predicts phenomena correctly, it doesn't explain "why" they occur, or fails to provide a "mechanism".

I understand being wary of the question, but imagine a Bayesian rationalist learning physics. First he learns about Classical Electromagnetism using fields, and he assigns this theory a prior probability based on the length of a program implementing this, and updates to a posterior probability based on the experimental evidence available. He then learns about the formulation based on potentials, which he finds makes all the same predictions, but has a shorter program length, so he adds some number of bits to his log-odds for the equivalence class, and he thinks that it is meaningless to ask whether reality uses potentials or fields, because no experiment could distinguish between them. Then he learns about Quantum Mechanics, which indicates that his classical understanding of potentials is an approximation of how they really work, that works on large scale, and that on the small scale where the approximation does not hold, fields are not a useful concept.

Should he be surprised to learn, having thought that experiment could not distinguish the two formulations, that there is an observable underlying reality in which uses potentials and not fields, and it is only in the domains he has previously encountered that fields make sense?

And then he learns Quantum Field Theory, and the shit really hits the fan.

Note that while it is on the Crackpot index it is one of the weaker parts there. Many actually good scientists try to search for explanatory mechanisms and get annoyed when theories lack explanatory mechanisms.

I don't know about potentials vs fields, but this seems backwards if you mean it to apply to MWI. That interpretation tries to say that the quantum math describes reality and we need assume no further mechanism.

Beyond that, it seems to me that we don't care about Occam's Razor due to William of Occam's philosophical fapping, nor due to someone proposing the ideal of Solomonoff Induction. We care because Isaac Newton claimed to use a version of the Razor (applied to causes) in making his practical discoveries. Unlike those other formulations, Newton's four Rules for Natural Philosophy inspired the rest of modern science. And it looks to me like Einstein made his most famous discoveries by interpreting Rule 2 more strictly than Newton himself did.

I mention this because Newton wrote these rules specifically to reduce the appeal of taking his mathematical theory of gravity and adding Cartesian or Aristotelian "mechanisms", without adding predictions. The analogy with MWI seems inexact (if MWI still has a mathematical hole) but precise enough to encompass your objection. Newton made one possibly-new physical observation that I can recall, and it related to one of his laws of motion rather than his actual law of gravitation. He justified the latter part of his theory solely on the grounds of simplicity, since it allowed the derivation of pre-existing rules drawn from pre-existing observations. (I think the 'good' kind of mechanism, the kind JoshuaZ mentions, also brings together previously separate postulates or allows us to make novel predictions.) You'll notice that Rule 4, the one that says we need new evidence to justify "contrary hypotheses", technically doesn't say the preferred theory has to come first (at least not in this English version). The author was no fool. He just said we should take "propositions inferred by general induction from phenomena as accurately or very nearly true," which means that either induction includes Newtonian simplicity, or effective science has to judge theories by some criteria beyond truth, or this method contradicts itself.

If Solomonoff Induction doesn't at least give us Newton's Rules in Newton's case, then Solomonoff Induction fails at its job.

Let's see if I've got this straight. An invocation of Occam's razor would look like:

"MWI predicts one sequence of observations and Copenhagen predicts another. The programs that make the same predictions as the Copenhagen interpretation are generally so much longer than the ones that agree with many-worlds that the K-complexity of the sequence of observations that Copenhagen predicts is greater than that of MWI's predictions. So we should a priori expect MWI's predictions to be vindicated by experiment."

Nope, it looks more like this:

"Given our observations so far, either many worlds or Copenhagen interpretations could be the 'right'(1) program. Since MWI is simpler to write down(2), we should assign it a higher probability of being 'right.'"

I'm not entirely sure what 1 means for programs that output the same bit strings for all possible measurements (e.g. Hamiltonian vs. Lagrangian formulation, but not evolution vs. God did it, since those make different predictions). But we can use JGWeissman's idea of the simpler one contributing more probability to a hypothesis, and therefore being "more important" rather than "right."

2 is a little more problematic, since it seems pretty tough to prove. "Just look at them!" is not a very good method, you would actually need to write the computer programs down, barring some elegant argument that involves deep knowledge of how the computer programs would turn out.

EDIT: Actually, that's not really an application of Occam's razor, which is pretty vague, but rather an application of the more specific Solomonoff prior.

MWI is only simpler to write down in a form that doesn't allow you to make predictions.

"but (the claim is) that the computer program for Copenhagen would have to have an extra section that specified how collapse upon observation worked that many-worlds wouldn’t need."

What exactly are we doing here? Calculating the complexity of a WM ontology versus a Copenhagen ontology, or figuring out the simplest way to predict observations?

The minimal subset of calculation you need to do in order to predict observation is in fact going to be the same whatever interpretation you hold to -- it's just the subset that "shut up and calculate" uses. Even many worlders would go through cycles of renormalising according to observed data, and discarding unoberserved data, which is to say, behaving "as if" collapse were occurring. Even though they don't interpret it that way. So just predicting observation doesn't tell you which ontology is simplest.

On the other hand, modelling ontology without bothering about prediction can differentiate the complexity of ontologies. But why would you want to do that? What you are interested in is the simplest correct theory , not the simplest theory. Its easy to come up with simple theories that are not predictive.

In particular, if you just model the wave function, the only results you will get represent every possible outcome. In order to match observation , you will have to keep discarding unobserved outcomes and renormalising as you do in every interpretation. It's just that that extra stage is performed manually, not by the programme.

[+][comment deleted]9mo 1