How Bayes' theorem is consistent with Solomonoff induction



Huh? All of that applies to any choice of priors whatsoever, not just Solomonoff's. Or am I missing something?

I'm saying that Solomonoff induction doesn't contradict Bayes' theorem. The purpose of Solomonoff induction was to find an objective prior, but then after they discovered it, it included a way of updating too. Bayes' theorem turned out to be redundant. But since we're pretty sure Bayes' theorem is correct, it's nice to see that they don't contradict.

Solomonoff induction as opposed to what? Is there any choice of priors which does contradict Bayes' theorem?

Solomonoff induction is more than a choice of priors. It's also a method of finding all possible hypotheses, and a method of computing likelihoods. It's an entire system of reasoning.

Also worth noting: a possible misunderstanding arising from "0 and 1 are not probabilities".

I guess I made the conversational assumption that when Bayes' name is used rather than 'Aristotelian logic', we are talking about non-binary probabilities, rather than the limit in which Bayes does not contradict Aristotelian logic of the form 'if the hypothesis does not match the data exactly, the hypothesis is wrong'.

A clarification if I might:

"is the probability that we will see data sequence E, given that we run program H on the universal Turing machine."

I think it would help to word it as "output **begins** with the data sequence E", because it is a very common misconception that it suffices to see E *somewhere* within the output, or that it suffices for H to "explain" the data (the original article used "explains").

When thinking of e.g. the universe, "explains" is typically taken to mean "the universe contains me somewhere", which is a form of anthropic reasoning and can lead to a substantially different concept from Solomonoff induction.

As a side note, one can obtain a kind of anthropic-reasoning prior by including a self-description on an extra tape that the program can read; the code can then search for instances of itself within the models at only a constant cost, but it still needs to be predictive, i.e. output a string that begins with the observed data. This seems no different (up to a constant) from simply including the self-description as part of the data sequence E. Edit: on second thought, the extra tape differs in one major, fallible way: a sufficiently complete self-description on the extra tape makes it possible to construct a god in your own image for 'goddidit'. One should just add the self-description as part of the data sequence E. It is still no different up to a constant, though.

You've read the introduction to Bayes' theorem. You've read the introduction to Solomonoff induction. Both describe fundamental theories of epistemic rationality. But how do they fit together?

It turns out that it’s pretty simple. Let’s take a look at Bayes’ theorem.
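In its standard form, with the denominator written as a sum over every hypothesis (the form assumed in the rest of this post), the theorem reads:

$$
P(H \mid E) = \frac{P(E \mid H)\, P(H)}{\sum_{i} P(E \mid H_i)\, P(H_i)}
$$

Here H stands for the particular hypothesis in question and E for the observed data; the index i ranges over all hypotheses.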

For a review:

- H_{i} is an arbitrary element in the set of all hypotheses.

In terms of Solomonoff induction:

- H_{i} is an arbitrary binary sequence.

The denominator has the same meaning as the numerator, except that it is summed over every possible hypothesis. This essentially normalizes the probability in the numerator. Any hypothesis that does not match the data E exactly will have P(E|H_{i}) = 0, and therefore that term will contribute nothing to the sum. If the hypothesis does output E exactly, then P(E|H_{i}) = 1, and the matching hypothesis contributes its weight to the renormalizing sum in the denominator.

Let's see an example with these things substituted. Here, the set of H_{i} is the set of hypotheses that match, and H is one of them.

$$
P(H \mid E) = \frac{P(H)}{\sum_{i} P(H_i)}
$$

In summary: Bayes’ theorem says that once we find all matching hypotheses, we can find the probability of each one by dividing its individual weight by the sum of the weights of all the matching hypotheses.
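The summary can be checked on a toy example. The sketch below is only an illustration under stated assumptions: the lookup table `TOY_OUTPUTS` is a hypothetical stand-in for running programs on a universal Turing machine (which cannot be done exhaustively in practice), each program's weight is assumed to be 2^-length as in the usual Solomonoff prior, and "matching" is taken to mean that the program's output begins with E.

```python
from fractions import Fraction

# Hypothetical toy table: program (bit string) -> output (bit string).
# It stands in for "run the program on the universal Turing machine".
TOY_OUTPUTS = {
    "0": "0000",
    "10": "0101",
    "110": "0110",
    "1110": "0100",
}


def prior(program):
    """Assumed weight of a program: 2^-length."""
    return Fraction(1, 2 ** len(program))


def likelihood(program, evidence):
    """P(E|H_i): 1 if the program's output begins with the evidence, else 0."""
    return 1 if TOY_OUTPUTS[program].startswith(evidence) else 0


def posterior(evidence):
    """Posterior over the matching programs, computed via Bayes' theorem."""
    # Numerator of Bayes' theorem for every program; non-matching ones get 0.
    weights = {p: likelihood(p, evidence) * prior(p) for p in TOY_OUTPUTS}
    # Denominator: only matching programs contribute to the sum.
    total = sum(weights.values(), Fraction(0))
    if total == 0:
        raise ValueError("no program in the toy table matches the evidence")
    return {p: w / total for p, w in weights.items() if w > 0}


if __name__ == "__main__":
    for program, prob in posterior("01").items():
        print(program, prob)
```

With evidence `"01"`, the program `"0"` is eliminated, and the remaining weights 1/4, 1/8 and 1/16 are divided by their sum 7/16, giving posteriors 4/7, 2/7 and 1/7. Exact fractions are used so that the normalization is visible without floating-point noise.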

This is intuitive, and matches Bayes’ theorem both mathematically and philosophically. Updating will occur when you get more bits of evidence E. This will eliminate some of the hypotheses H_{i}, which will cause the renormalizing sum in the denominator to get smaller.
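The shrinking denominator can be seen in a small sketch. As before, this is an illustration rather than a real implementation: `TOY_OUTPUTS` is a hypothetical table standing in for a universal Turing machine, weights are assumed to be 2^-length, and a program matches when its output begins with the evidence seen so far.

```python
from fractions import Fraction

# Hypothetical toy table: program (bit string) -> output (bit string).
TOY_OUTPUTS = {"0": "0000", "10": "0101", "110": "0110", "1110": "0100"}


def weight(program):
    """Assumed weight of a program: 2^-length."""
    return Fraction(1, 2 ** len(program))


def normalizer(evidence):
    """Sum of the weights of all programs whose output begins with the evidence."""
    return sum(
        (weight(p) for p, out in TOY_OUTPUTS.items() if out.startswith(evidence)),
        Fraction(0),
    )


def denominators(bits):
    """Normalizer after each successive bit of evidence arrives."""
    return [normalizer(bits[:n]) for n in range(1, len(bits) + 1)]


def posterior(program, evidence):
    """Posterior of one program; 0 if it has been eliminated by the evidence."""
    if not TOY_OUTPUTS[program].startswith(evidence):
        return Fraction(0)
    return weight(program) / normalizer(evidence)


if __name__ == "__main__":
    print(denominators("0101"))
    print([posterior("10", "0101"[:n]) for n in range(1, 5)])
```

Feeding in the bits `0101` one at a time gives denominators 15/16, 7/16, 5/16 and 1/4, while the posterior of the surviving program `"10"` rises through 4/15, 4/7 and 4/5 to 1. Its weight never changes; only the set of hypotheses it is normalized against does.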