Fixing Resource Bounded Solomonoff Induction (draft)

Scott Garrabrant

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

This post is very sketchy, but it is a big idea that I have been wanting to talk with a lot of people about. I wanted to get a post out with all the key ideas. I promise that either there is an error in this post, or I will write it up much more clearly, so if you are not in a rush, you can wait for that. (I already found an error, I will see if I can patch it.)

The rough idea of this algorithm is that the Solomonoff induction inspired approach fails because sampled Turing machines have an incentive to double down on past mistakes when trying to correlate their behavior with past weaker versions of themselves.

This algorithm gets past that by only requiring that sampled machines do well on a very sparse training set. This training set is so sparse that when a machine has to choose what to do for one input, it has enough time to actually compute the correct answer to all previous values of the training set.

However, we cant just use any sparse sequence. If we did, we would have a positive proportion of sampled Turing machines do well on the training sequence, but identify but identify inputs that it knows are not in the training set, and have bad behavior on those inputs it knows it wont be judged on.

The training set therefore has to diagonalize across all fast Turing machines, so that no Turing machine (within some runtime constraints) can identify elements that are consistently not in the training set. To do this, we end up having to add some randomness to the training set.

To make the proofs work out, we end up wanting our whole logical predictor to be in the set of Turing machines we diagonalize against, so that if the logical predictor has bad behavior, the places where it has bad behavior is guaranteed to be represented in the training set. We achieve this by making the randomness used in the training set available to all Turing machines we diagonalize against, but in a limited way so that they cannot use the access to the random bits to dodge the training set.

These special Turing machines with shared randomness are used in the algorithm, but in the end, we just have a normal randomized Turing machine which passes the generalized Benford test (and much more).

Let $g$ be increasing function. Let $A$ be a fast growing function.

I will be presenting an algorithm that will use a type of randomized Turing machine, I will call an $R T M$ . These machines will have access to a large number of shared random bits. Let $F$ be a function $N \times N \to {0, 1}$ chosen at random. An $R T M$ is a Turing machine which has a read only input tape containing a natural number, $n$ . It also has the ability to look up $F (a, b)$ , as long as $A (a) < n$ .

Consider the following $R T M$ , $L_{F} (n)$ , which outputs an increasing sequence of natural numbers.

b=1
i=1
while A(i)<n:
  if i>=b:
    T=the RTM encoded by the least j>0 s.t. F(b,j)=1
    b=A(b)
  if T accepts i in time g(i):
    if F(i,0)=1:
      output i
      i=A(i)+1
    else:
      i=b
  else:
    i=i+1

Let $S_{F}$ be the set of all natural numbers output by $L_{F} (n)$ for any $n$ . Since $L_{F} (n)$ only uses $n$ to know when to stop the while loop, any number output by $L_{F} (n)$ is also output by $L_{F} (m)$ for $m > n$ .

Theorem: $S_{F}$ has the following properties, when $F$ is chosen uniformly at random.

$S_{F}$ is infinite with probability 1.
With probability 1, for every $R T M$ $T$ such that there exist infinitely many $n$ such that $T$ accepts $n$ in time at most $g (n)$ , there are infinitely many $n \in S_{F}$ such that $T$ accepts $n$ in time at most $g (n)$ .
If $i, j \in S_{F}$ and $i < j$ , then $A (i) < j$ .
$L_{F}$ is an $R T M$ , which runs in time in time $O (n)$ .

Proof Sketch:

The first if statement is clearly satisfied infinitely often. Each time it is satisfied, an $R T M$ $T$ is sampled. The sampled $R T M$ will be a machine that immediately accepts every input with some fixed probability. Every time this $R T M$ is sampled, we immediately output $i$ if $F (i, 0) = 1$ . This $R T M$ is sampled infinitely many times with probability 1, and a number is output infinitely many times with probability 1.
It suffices to show that this is true with probability 1 for each fixed $R T M$ $T$ . Fix a $T$ which is encoded by the natural number $j$ such that there exist infinitely many $m$ such that $T$ accepts $m$ in time at most $g (m)$ . Note that if $A (ℓ) \geq m,$ then $T$ cannot reference $F (ℓ, \cdot)$ when simulating $T$ on input $m$ . For each $m$ such that $T$ accepts $m$ , consider the greatest $B \leq m$ of the form $A^{ℓ} (1)$ for some $ℓ$ . WLOG assume that $m$ is the minimal thing at least $B$ which $T$ accepts in time $g$ . Then with probability $2^{- j - 2}$ , $m$ will be output. This is because there is a probability $1 / 2^{j}$ chance of sampling $j$ with $T (B, \cdot)$ , and a probability 1/2 chance of not skipping over $m$ (and skipping to $B$ instead), and a 1/2 chance of outputting $m$ when it gets there. Note that while these random variables come from $F$ , they come from parts of $F$ which cannot be referenced when calculating $T (m)$ .
This is clear because every time we output $i$ , we immediately increase $i$ to $A (i) + 1$ .
The algorithm only ever looks up $F (i, j)$ where $A (i) < n$ . Every part of the algorithm runs in an amount of time that is polynomial in $i$ , and therefore much smaller than $O (n)$ .

$□$

We now use the this in a new algorithm:

Let $E$ be a Turing machine which either accepts or rejects an input $n$ in time at most $A (n)$ .

We wish to construct a Turing machine $M_{F}$ which runs quickly and on input $n$ accepts with some probability that represents a good belief about how likely $E$ is to accept the same input.

$M_{F} (n)$

Let $m$ be the greatest integer with $A (m) < n$ .
Compute the set $S$ elements of output by $S_{F} (n)$ .
Run $E$ on input $i$ for all $i \in S$ .
Sample a Turing machine $T$ . Let $t$ be the number of bits necessary to encode $T$ .
Run $T$ on input $i$ for all $i \in S$ .
If for all $i \in S_{m}$ , $T$ accepts $i$ in time $t \cdot f (i)$ if and only if $E$ accepts $i$ , then run $T$ on input $n$ for time $t \cdot f (n)$ , and output the result.
Otherwise, goto 4.

Theorem: $M_{F}$ passes the generalized Benford Test.

Proof Sketch: Let $Q$ be a quickly running Turing machine which accepts the numbers $q_{1}, q_{2} \dots$ , and the output of $E$ on inputs $q_{i}$ is indistinguishable from a coin that accepts with probability $p$ .

Assume by way of contradiction that the probability that $M_{F}$ assigns to $q_{i}$ does not converge to $p$ , and there exists an $ε$ , and infinitely many $q_{i}$ such that the probability that $M_{F}$ accepts $q_{i}$ is at least $p + ε$ .

Consider $M_{F}^{'}$ , a derandomized version of $M_{F}$ which outputs a probability $p$ on input $n$ which is within $ε / 3$ of the probability that $M_{F}$ accepts $n$ . This can be done by running $M_{F}$ on all possible sequences of random inputs until the probability that $M_{F}$ has not halted yet is less than $ε / 3$ . Note that we do not derandomize the bits coming from $F$ , only the extra bits in sampling Turing machines. This derandomized version is now an $R T M$ . By assuming that $g$ grows at least slightly faster than $f$ ( $f log f = o (g)$ should be sufficient), it can be made into an $R T M$ which runs in time $g$ on all inputs (with probability 1).

Consider the Turing machine $R$ which accepts $n$ if and only if $Q$ accepts $n$ , and $M_{F}^{'}$ assigns a probability that is at least $p + 2 ε / 3$ . Then $R$ accepts infinitely inputs, and each input that $R$ accepts is some $q_{i}$ such that $M_{F}$ accepts $q_{i}$ with probability at least $p + ε / 3$ . Again, $R$ is an $R T M$ which runs in time $g$ or all inputs.

Therefore, with probability 1, $S_{F}$ has infinitely many elements $k_{1}, k_{2}, \dots$ such that $Q$ accepts $k_{i}$ and that $M_{F}$ accepts $k_{i}$ with probability at least $p + ε / 3$ .

Note that if $n \in S_{F}$ , then the probability that $M_{F}$ gets $n$ correct is exactly the probability that a randomly chosen Turing machine gets $n$ correct given that it got all of the previous elements of $S_{F}$ correct.

Let $B (j)$ denote the probability that a randomly chosen Turing machine gets the correct answer on all elements of $S_{F}$ less than $j$ . Let $C (j)$ denote the probability that a randomly chosen Turing machine gets the correct answer on all $k_{i}$ less than $j$ .

Note that the $k_{i}$ is quickly computable (random) subsequence of the things accepted by $Q$ , so by assumption of $Q$ representing an irreducible pattern (or some similar definition), $C (j) \in O ((p (p + ε / 3) + (1 - p) (1 - p - ε / 3))^{r}),$ where $r$ is the number of $k_{i}$ less than $j$ . (i.e. The probability of getting them all right is at most a constant times the probability of getting them right if they were actually coming from a $p$ -coin.)

Let $B^{'} (j)$ denote the probability that a randomly chosen Turing machine gets the correct answer on all elements of $S_{F}$ less than $j$ , if that coin were modified to accept with a fixed probability between $p$ and $p + ε / 6$ on all inputs which $Q$ accepts in time $f$ , and let $B^{'} (j)$ denote the probability such a modified Turing machine gets the correct answer on all of the $k_{i}$ .

Observe that $B^{'} (j) / C^{'} (j) = B (j) / C (j)$ ,

(THIS STEP IS WRONG. $B^{'} (j) / C^{'} (j)$ is the probability that a randomly chosen TM gets all the $S_{L}$ which are not $k_{i}$ right, while $B (j) / C (j)$ is the probability that it gets those right given that it got the $k_{i}$ right. I will see if I can fix it soon.)

and that $C (j) \in o (C^{'} (j))$ . Therefore $B (j) \in o (B^{'} (j))$ .

However, a randomly chosen Turing machine has some probability $s$ of already having the above surgery done to it where all the answers to $Q$ are changed, which implies that $B (j) \geq s \cdot B^{'} (j)$ , which is a contradiction.

$□$

Random Thoughts:

Note that while $M_{F}$ is an $R T M$ , we can just run $M_{F}$ on a randomly chosen $F$ and pass the generalized Benford test with a normal Turing machine with probability 1.

I believe that this algorithm is correctly implementing resource bounded Solomonoff induction, and will thus have all the reasonable desired properties you could ask of such an algorithm. i.e. Unless I am missing something and there is a hole in the proof, this algorithm is really awesome.

This algorithm is only able to predict the output of Turing machines which are guaranteed to halt on every input. (We made the assumption that they halt in $A (n)$ time, but if we do not know how long they take to halt, we could define $A (n)$ to be the maximum amount of time it takes to halt on any input less than $n$ , which is a computable function). It is unclear whether or not we can extend it to general logical uncertainty.

Amusingly, this algorithm takes even longer to converge than other similar ALU algorithms, because we are restricting ourselves to only train the algorithm on a very sparse set of inputs.

I am not sure what I want to call this algorithm, but I have been referring to $M_{F}$ as the Universal Heuristic Generator (UHG) and $S_{F}$ as a Universal Training Set (UTS). Alternative ideas are welcome.

[-]Vanessa Kosoy9yΩ110

I haven't understood the details yet, but just a basic question. Your "randomized Turing machines", do they have a random oracle or a random advice? In other words, do you assume the lookup of $F (i, j)$ takes time $p o l y (log i, log j)$ or $p o l y (i, j)$ ?

[-]Scott Garrabrant9yΩ000

It does not matter. Let's say $p o l y (i, j)$ . The most it is going change is how large a gap is needed between $f$ and $g$ .

LESSWRONG
LW

Fixing Resource Bounded Solomonoff Induction (draft)

4

Ω 2

4

Ω 2