Solomonoff Induction seems clearly "on the right track", but there are a number of problems with it that I've been puzzling over for several years and have not made much progress on. I think I've talked about all of them in various comments in the past, but never collected them in one place.
Apparent Unformalizability of “Actual” Induction
Argument via Tarski’s Indefinability of Truth
Informally, the theorem states that arithmetical truth cannot be defined in arithmetic. The theorem applies more generally to any sufficiently strong formal system, showing that truth in the standard model of the system cannot be defined within the system.
Suppose we define a generalized version of Solomonoff Induction based on some second-order logic. The truth predicate for this logic can’t be defined within the logic and therefore a device that can decide the truth value of arbitrary statements in this logical has no finite description within this logic. If an alien claimed to have such a device, this generalized Solomonoff induction would assign the hypothesis that they're telling the truth zero probability, whereas we would assign it some small but positive probability.
Argument via Berry’s Paradox
Consider an arbitrary probability distribution P, and the smallest integer (or the lexicographically least object) x such that P(x) < 1/3^^^3 (in Knuth's up-arrow notation). Since x has a short description, a universal distribution shouldn't assign it such a low probability, but P does, so P can't be a universal distribution.
Is Solomonoff Induction “good enough”?
Given the above, is Solomonoff Induction nevertheless “good enough” for practical purposes? In other words, would an AI programmed to approximate Solomonoff Induction do as well as any other possible agent we might build, even though it wouldn’t have what we’d consider correct beliefs?
Is complexity objective?
Solomonoff Induction is supposed to be a formalization of Occam’s Razor, and it’s confusing that the formalization has a free parameter in the form of a universal Turing machine that is used to define the notion of complexity. What’s the significance of the fact that we can’t seem to define a parameterless concept of complexity? That complexity is subjective?
Is Solomonoff an ideal or an approximation?
Is it the case that the universal prior (or some suitable generalization of it that somehow overcomes the above "unformalizability problems") is the “true” prior and that Solomonoff Induction represents idealized reasoning, or does Solomonoff just “work well enough” (in some sense) at approximating any rational agent?
How can we apply Solomonoff when our inputs are not symbol strings?
Solomonoff Induction is defined over symbol strings (for example bit strings) but our perceptions are made of “qualia” instead of symbols. How is Solomonoff Induction supposed to work for us?
What does Solomonoff Induction actually say?
What does Solomonoff Induction actually say about, for example, whether we live in a creatorless universe that runs on physics? Or the Simulation Argument?
Solomonoff Induction would not consider this a description of x as it cannot be used to compute x.
It seems to me that the paradox... (read more)
This one bother me as well. Turing machine basis for complexity favours certain types of theories over others. For example, how many bits of turing machine does it take to predict, say, maxwells equations, which involve continuous fields?
(warning: original "research" following)
One temptation is to sweep this under the rug by saying that the complexity required to talk about continuous math is just a constant-sized interpreter that you put on the turing machine, but this can't get around the fact that some thoeries won't... (read more)
This is the second mention of second-order logical Solomonoff Induction, but I can't imagine what such a thing would look like.
All universal Turing machines can simulate each other with logarithmic slowdown. Saying that the parameter means that complexity "subjective" is like saying the time complexity of Quicksort is "subjective" because the algorithm doesn't specify which programming language to implement it in.
For aliens with a halting oracle:
Suppose the aliens have this machine that may or may not be a halting oracle. We give them a few Turing machine programs and they decide which ones halt and which ones don't. Then we run the programs. Sure enough, none of the ones they say run forever halt, and some of them they say don't run forever will halt at some point. Suppose we repeat this process a few times with different programs.
Now what method should we use to predict the point at which new programs halt? The best strategy seems to be to ask the aliens whi... (read more)
The description of x has to include the description of P, and that has to be computable if a universal distribution is going to assign positive probability to x.
If P has a short computable description, then yes, you can conclude ... (read more)
Actually, what Tarski seems to s... (read more)
If I understand Solomonoff Induction correctly, for all n and p the the sum of the probabilities of all the hypotheses of length n equals the sum of the probabilities of hypotheses of length p. If this is the case, what normalization constant could you possibility use to make all the probabilities sum to one? It seems there are none.
A kilobit of improbability requires only a kilobit of data to offset it, which isn't very crackpot at all. Proving minimum length is impossible, but proving an upper bound on length is very easy, and that proves a lower bound on probability.
I take your point re: length vs speed. The theorem that I think justifies calling Kolmogorov Complexity objective is this:
"If K1 and K2 are the complexity functions relative to description languages L1 and L2, then there is a constant c (which depends only on the languages L1 and L2) such that |K1(s) - K2(s)| <= c for all strings s."
(To see why this is true, note that you can write a compiler for L2 in L1 and vice versa)
I don't see why code modelling symmetric laws should be longer than code modelling asymmetric laws (I'd expect the reverse; ... (read more)
Here's an idea for a modification of Solomonoff induction that no longer has a subjectively-chosen machine to encode hypotheses in. One could instead simply consider how many bits it would take to encode a solution on all universal Turing machines, and making each hypotheses’ prior equal to the average of its prior on each machine. This makes somewhat intuitive sense, as one doesn’t know which “machine” the universe in “programmed” on, so it’s best to just assume a random machine. Unless I’m mistaken, there’s an infinite number of universal Turing machines, but I think algorithms could still approximate the induction.
Is there any justification that Solomonoff Induction is accurate, other than intuition?
I'm not sure I understand you c... (read more)
I believe this one has been closed ages ago by Alan Turing, and practically demonstrated for approximations by the investigation into busy beaver function for example. We won't be able to know BB(10) from God almighty. Ever.
I'm very confused about this myself, having just read this introductory paper on the subject.
My understanding is that an "ideal" reference UTM would be a universal turing machine with no bias towards any arbitrary string, but rigorously defining such a machine is an open problem. Based on our observation of UTMs, the more... (read more)
Re: Tarski’s Indefinability of Truth - I don't get this one. Solomonoff Induction estimates probabilities of bitstring sequences - but it's uncomputable - and the best we can do is finite approximations - which of course have limitations and will be able to be improved upon - just as Tarski’s "Indefinability of Truth" says.
Re: Argument via Berry’s Paradox - but the description of x includes P - which could be complex. So the description of x is not short - if you are not given P.
Kinda. Some notions of complexity are more useful than others. However, condition your machine on some real world sense data for a while, and any reasonable prior soon gets swamped by data - so in practice, this is not a big deal.
Inputs can always be represented by bit sequences. Qualia are an irrelevance. Q.E.D.
I'm not sure what you have in mind by basing Solomonoff induction on a second-order logic. Can you give a reference?
Here is what I'd suggest as a baseline. Use John Tromp's Binary Lambda Calculus with some variation of Levin Search to explore time/complexity bounded programs that output some initial sequence of data.
Perhaps we could formalize some argument that for an arbitrarily poor definition of complexity, the error becomes negligible for the average sufficiently complex universe?
Our perceptions have some neurological representation.
The entirely depends on your definition of creator. Traditional creators such as the Christian god could potentially have enough explanatory power once properly defined, yet would end up horrendously complex (encoding an entire human-like mind into the primitive natural law) unless you further reduce the construction of the creator to a natural process.
Solomonoff Induction is empirical: unless the ... (read more)
The interesting bit is that if you could actually use Solomonoff induction as prior, I'm pretty sure you'd be an irreparable aether believer and 'relativity sceptic', with confidence that has so many nines nothing would ever convince you. That's because the absolute-speed irreversible aether universes like 3d versions of Conway's game of life are so much more computationally compact than highly symmetric universe with many space-time symmetries (to the point of time and space intermixing in the equations) and the fully reversible fundamental laws of physic... (read more)
In the Kolmogorov-Chaitin Minimum description length approach, the subject must pick a Turing machine whose operations describe the basic operations believed to represent 'simplicity' by the subject... (read more)