
This post is part of the Asymptotic Logical Uncertainty series. In this post, I give a concrete example of how the Solomonoff Induction Inspired Approach (SIA) fails to converge to the correct probabilities when predicting the output of a simple Turing machine.

Let f:N→{0,1} be a function such that (for large n) f(n) can be computed in time 2^n, but there is no algorithm that can predict f(n) in time less than 2^n consistently better than a coin which outputs 1 with probability 1/3.

Construct a Turing machine E as follows:

E(2k)=f(2k), and

E(2k+1)=f(2k+1) XOR f(10^k).
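The construction of E can be sketched in code. The post leaves f abstract; the hash-based stand-in below is purely illustrative (it is not hard to predict, it only mimics a function that outputs 1 about 1/3 of the time):

```python
import hashlib

def f(n):
    # Illustrative stand-in for the hard-to-predict f: N -> {0,1}.
    # The real f is only assumed to exist; this placeholder just
    # outputs 1 roughly 1/3 of the time.
    h = int(hashlib.sha256(str(n).encode()).hexdigest(), 16)
    return 1 if h % 3 == 0 else 0

def E(m):
    # E copies f on even bits and XORs two values of f on odd bits.
    if m % 2 == 0:          # m = 2k
        return f(m)
    k = (m - 1) // 2        # m = 2k + 1
    return f(2 * k + 1) ^ f(10 ** k)
```

Note that the entangled bit f(10^k) appears at the odd location 2k+1 long before location 10^k itself is reached, which is what drives the failure described below.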

Imagine trying to predict E(k) with an algorithm which takes time at most k^2. If k is even, then E(k) is determined by an event indistinguishable in time k^2 from a 1/3 coin, so the predictor should output 1 with probability 1/3. If k is odd, then E(k) is the XOR of two bits that are indistinguishable from bits that are independently 1 with probability 1/3, so the predictor should output 1 with probability 4/9.
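The 4/9 figure follows from the fact that the XOR of two independent bits is 1 exactly when the bits differ. A quick exact check:

```python
from fractions import Fraction

p = Fraction(1, 3)  # probability that each bit is 1
# XOR is 1 when exactly one of the two independent bits is 1
p_xor = p * (1 - p) + (1 - p) * p
print(p_xor)  # -> 4/9
```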

Therefore, if we let T(k)=k^3 and R(k)=k^2, then if SIA worked correctly, the probability that SIA(E,T,R) outputs 1 on the even bits would converge to 1/3.

However, this is not the case. In particular, the probability that the 10^k-th bit output by SIA(E,T,R) is a 1 will not converge to 1/3, but will instead switch between 4/9 and 5/9.

This is because by the time a predictor needs to predict E(10^k), it will have already output a 1 with probability 4/9 for E(2k+1), and it will have had enough time to compute f(2k+1). Therefore, if f(2k+1)=0, then any predictor that does not make its prediction for E(10^k) match its prediction for E(2k+1) will contradict itself.

The machines that will do well when sampled in SIA(E,T,R) are the ones that output 1 independently with probability 1/3 for all even bits and with probability 4/9 for all odd bits, EXCEPT when predicting a bit whose location is a power of 10. In that case, they copy (or negate) their earlier answer for the corresponding odd bit, and so output 1 with probability either 4/9 or 5/9, both of which are greater than the desired 1/3.
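A small Monte Carlo sketch (illustrative only, not part of the post's formalism) shows where 4/9 and 5/9 come from. A consistent machine copies its earlier answer when f(2k+1)=0 and negates it when f(2k+1)=1, since f(10^k) = E(2k+1) XOR f(2k+1):

```python
import random
from collections import Counter

random.seed(0)
trials = 100_000
counts = Counter()

for _ in range(trials):
    # The machine's earlier prediction for E(2k+1): a 1 with probability 4/9.
    earlier = 1 if random.random() < 4 / 9 else 0
    for f_odd in (0, 1):  # the now-computable value f(2k+1)
        # Stay consistent with f(10^k) = E(2k+1) XOR f(2k+1):
        # copy the earlier bit if f(2k+1)=0, negate it if f(2k+1)=1.
        pred = earlier if f_odd == 0 else 1 - earlier
        counts[f_odd] += pred

print(counts[0] / trials)  # close to 4/9 (copying)
print(counts[1] / trials)  # close to 5/9 (negating)
```

Either way, the probability assigned to the 10^k-th bit is pulled away from the correct 1/3.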

Further, a machine that outputs a 1 with probability 1/3 on E(10^k) for all k will have a likelihood going to 0 relative to the machines which just copy or negate their old answers.

Note that the correct probability with which to believe E(10^k)=1 really is 1/3, not 4/9 or 5/9. The prediction given for E(2k+1) was calculated from a state of ignorance about f(2k+1), and this prediction should have no effect on the prediction for E(10^k).