Shane Legg's text Solomonoff Induction has helped me a lot over the last few days. I was trying to iron out the kinks in my understanding of Eliezer's argument in this old thread:

Solomonoff's universal distribution isn't trying to assume the universe is computable.  It's trying to beat any computable probability distribution you could put over the universe. In other words, if your attempt to calculate, to talk about and reason about the universe is computable - even if you are talking about systems that you think are uncomputable - then your ratiocinations appear somewhere in Solomonoff induction.

Eliezer's statement is correct (more precisely, a computable human cannot beat Solomonoff in accumulated log scores by more than a constant, even if the universe is uncomputable and loves the human), but understanding his purported proof is tricky. Legg's text doesn't give any direct answer to the question at hand, but all the technical details in there, like the difference between "measures" and "semimeasures", are really damn important if you want to work out the answer for yourself. I know mathematics has many areas where an "intuitive understanding" kinda sorta suffices. This is not one of those areas.

New to LessWrong?

New Comment
8 comments, sorted by Click to highlight new comments since: Today at 4:45 AM

There are straightforward proofs of this fact, I think. For example, observe that Solomonoff induction does only a constant worse than multiplicative weights over the space of computable predictors. Thinking about it this way seems more productive, unless I am missing something important.

The tricky part for me was proving the equivalence of two views of the Solomonoff prior: as a probability distribution over programs that print bitstrings and never make mistakes, and as a mixture of all "lower semi-computable" probability distributions over finite prefixes of bitstrings that are allowed to make mistakes. Once you derive the second view from the first, the conclusion is indeed trivial.

I started from the view of multiplicative weights over computable predictors, and then when I tried to describe it as an improvement on Solomonoff induction realized it was the same thing. If you write down what the two algorithms actually do, the equivalence to within a constant is clear.

I mentioned to you a few days ago (in chat) that Scholarpedia stated:

The universal a priori probability m has many remarkable properties. A theorem of L.A. Levin states that among the lower semi-computable semi-measures it is the largest such function in the following sense: If p is a lower semi-computable semi-measure, then there is a constant c such that cm(x) >= p(x) for all x.

When you say "The tricky part for me was proving the equivalence of two views of the Solomonoff prior" do you mean you tried to re-derive this result?

For the discrete version, doesn't this boil down to converting a program that semi-computes a semi-measure into a program that "semi-samples from" a semi-measure?

Which is pretty straightforward:

We interpret the remainder of our (infinite) input as a real number y in [0,1]. Every time the measure that we're semi-computing adds an amount p of probability to one of its possible values v, we increment a variable x (which was initialised to 0) by p. If and when x finally exceeds y, we return the value v whose probability was just incremented.

(The continuous version, which is the one we actually need, looks harder, but I guess the same general idea should work. Perhaps we just repeat the algorithm above once for every bit of the output?)

Yes. I eventually succeeded, but not without a tiny bit of help from the texts.

Link is broken, here's another.

Legg's text doesn't give any direct answer to the question at hand, but all the technical details in there, like the difference between "measures" and "semimeasures", are really damn important if you want to work out the answer for yourself. I know mathematics has many areas where an "intuitive understanding" kinda sorta suffices. This is not one of those areas.

Is this .pdf relevant? It certainly seems to be relevant, though I admittedly haven't read it all (or really even a large portion of it) yet. I've been learning about Solomonoff induction and found Legg's write up to be handy, but that paper seems to go into much more relevant detail.