Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

This post is very sketchy, but it describes a big idea that I have been wanting to talk about with a lot of people, so I wanted to get a post out with all the key ideas. I promise that either there is an error in this post, or I will write it up much more clearly, so if you are not in a rush, you can wait for that. (I already found an error; I will see if I can patch it.)


The rough motivation for this algorithm is that the Solomonoff-induction-inspired approach fails because sampled Turing machines have an incentive to double down on past mistakes when trying to correlate their behavior with past, weaker versions of themselves.

This algorithm gets past that by only requiring that sampled machines do well on a very sparse training set. This training set is so sparse that by the time a machine has to choose what to do on one input, it has had enough time to actually compute the correct answers on all previous elements of the training set.

However, we can't just use any sparse sequence. If we did, a positive proportion of sampled Turing machines would do well on the training sequence, but identify inputs that they know are not in the training set, and behave badly on those inputs they know they won't be judged on.

The training set therefore has to diagonalize across all fast Turing machines, so that no Turing machine (within some runtime constraints) can identify elements that are consistently not in the training set. To do this, we end up having to add some randomness to the training set.

To make the proofs work out, we end up wanting our whole logical predictor to be in the set of Turing machines we diagonalize against, so that if the logical predictor has bad behavior, the places where it has bad behavior are guaranteed to be represented in the training set. We achieve this by making the randomness used in the training set available to all Turing machines we diagonalize against, but in a limited way, so that they cannot use their access to the random bits to dodge the training set.

These special Turing machines with shared randomness are used in the algorithm, but in the end, we just have a normal randomized Turing machine which passes the generalized Benford test (and much more).


Let $g$ be an increasing function. Let $A$ be a fast-growing function.

I will be presenting an algorithm that uses a type of randomized Turing machine I will call an RTM. These machines have access to a large number of shared random bits. Let $F:\mathbb{N}\times\mathbb{N}\to\{0,1\}$ be a function chosen at random, with each bit independent and uniform. An RTM is a Turing machine which has a read-only input tape containing a natural number, $n$. It also has the ability to look up $F(m,k)$, as long as $A(m)<n$.
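
Read concretely, the interface might look like the following minimal Python sketch; the names, the stand-in for $A$, and the visibility rule $A(m)<n$ are all my reconstruction of the definition above.

import random

def A(m):
    # stand-in for the fast-growing function from above
    return 2 ** (2 ** m)

class SharedF:
    # lazily materialized random function F : N x N -> {0,1}
    def __init__(self, seed=0):
        self.rng, self.bits = random.Random(seed), {}
    def F(self, m, k):
        return self.bits.setdefault((m, k), self.rng.randint(0, 1))

def rtm_lookup(shared, m, k, n):
    # an RTM running on input n may only read F(m, k) when A(m) < n
    if A(m) >= n:
        raise ValueError("F(%s,%s) is not visible on input %s" % (m, k, n))
    return shared.F(m, k)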

Consider the following RTM, $W$, which on input $n$ outputs an increasing sequence of natural numbers.

b=1                # next checkpoint at which a new machine is sampled
i=1
while A(i)<n:
  if i>=b:
    T=the RTM encoded by the least j>0 s.t. F(b,j)=1   # chosen with probability 2^-j
    b=A(b)         # the following checkpoint
  if T accepts i in time g(i):
    if F(i,0)=1:   # a coin flip T cannot see on input i
      output i
      i=A(i)+1     # jump past A(i), keeping the output sparse
    else:
      i=b          # skip ahead to the next checkpoint
  else:
    i=i+1

Let $D$ be the set of all natural numbers output by $W(n)$ for any $n$. Since $W$ only uses $n$ to know when to stop the while loop, any number output by $W(n)$ is also output by $W(n')$ for $n'\geq n$.
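
For concreteness, here is a runnable toy rendering of $W$. Everything is a stand-in: $A$ is deliberately small so the demo produces output, and "the RTM encoded by $j$" is replaced by a trivial family where machine $j$ accepts $i$ iff $j$ divides $i$. Only the control flow is faithful to the pseudocode above.

import random

def A(m):
    # toy stand-in for the fast-growing function, small enough to demo
    return m * m + 1

def toy_accepts(j, i):
    # stand-in for "the RTM encoded by j accepts i in time g(i)"
    return i % j == 0

def W(n, F):
    # enumerate the training set below n; F is a dict of shared random bits
    def f(m, k):
        return F.setdefault((m, k), random.randint(0, 1))
    out = []
    b, i, T = 1, 1, None
    while A(i) < n:
        if i >= b:
            j = 1
            while f(b, j) != 1:  # machine j is chosen with probability 2**-j
                j += 1
            T = j
            b = A(b)             # the following checkpoint
        if toy_accepts(T, i):
            if f(i, 0) == 1:     # the coin flip T cannot see on input i
                out.append(i)
                i = A(i) + 1     # jump past A(i), keeping the output sparse
            else:
                i = b            # skip ahead to the next checkpoint
        else:
            i += 1
    return out

print(W(10**6, {}))  # a sparse increasing set; gaps exceed A of each element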

Theorem: $W$ has the following properties, when $F$ is chosen uniformly at random.

  1. $D$ is infinite with probability 1.

  2. With probability 1, for every RTM $T$ such that there exist infinitely many $i$ such that $T$ accepts $i$ in time at most $g(i)$, there are infinitely many $i\in D$ such that $T$ accepts $i$ in time at most $g(i)$.

  3. If $i,i'\in D$ and $i<i'$, then $i'>A(i)$.

  4. $W$ is an RTM, which runs in time at most $g(n)$.

Proof Sketch:

  1. The first if statement is clearly satisfied infinitely often. Each time it is satisfied, an RTM is sampled. With some fixed positive probability, the sampled RTM is a machine that immediately accepts every input. Every time this machine is sampled, we immediately output the current $i$ if $F(i,0)=1$. This machine is sampled infinitely many times with probability 1, and so a number is output infinitely many times with probability 1.

  2. It suffices to show that this is true with probability 1 for each fixed $T$. Fix a $T$ which is encoded by the natural number $j$ and such that there exist infinitely many $i$ such that $T$ accepts $i$ in time at most $g(i)$. Note that on input $i$, $T$ cannot reference $F(m,\cdot)$ for any $m$ with $A(m)\geq i$. For each $i$ such that $T$ accepts $i$, consider the greatest $b\leq i$ of the form $A^k(1)$ for some $k$. WLOG assume that $i$ is the minimal number at least $b$ which $T$ accepts in time $g(i)$. Then with probability at least $2^{-j-2}$, $i$ will be output. This is because there is a probability $2^{-j}$ chance of sampling $T$ at $b$, a probability at least 1/2 chance of not skipping over $b$ (skipping to exactly $b$ instead), and a 1/2 chance of outputting $i$ when it gets there; the bound is worked out after this sketch. Note that while these random variables come from $F$, they come from parts of $F$ which $T$ cannot reference when run on input $i$.

  3. This is clear because every time we output $i$, we immediately increase $i$ to $A(i)+1$.

  4. The algorithm only ever looks up $F(m,k)$ where $A(m)<n$. Every part of the algorithm runs in an amount of time that is polynomial in $n$, and therefore much smaller than $g(n)$.
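
Under this reconstruction, the bound in step 2 of the sketch multiplies three independent events:

$$\Pr[i\text{ is output}]\;\geq\;\underbrace{2^{-j}}_{T\text{ sampled at }b}\cdot\underbrace{\tfrac{1}{2}}_{b\text{ not skipped}}\cdot\underbrace{\tfrac{1}{2}}_{F(i,0)=1}\;=\;2^{-j-2}.$$

Since fresh bits of $F$ are used at infinitely many checkpoints, the second Borel–Cantelli lemma then gives infinitely many accepted outputs with probability 1.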


We now use this training set in a new algorithm:

Let $U$ be a Turing machine which either accepts or rejects each input $n$ in time at most $f(n)$.

We wish to construct a Turing machine $M$ which runs quickly, and on input $n$ accepts with some probability that represents a good belief about how likely $U$ is to accept the same input. (A toy code sketch of this loop follows the numbered steps.)

  1. Let $m$ be the greatest integer with $A(m)<n$.
  2. Compute the set $D_n$ of elements of $D$ output by $W(n)$; these are exactly the elements of $D$ that are at most $m$.
  3. Run $U$ on input $i$ for all $i\in D_n$.
  4. Sample a Turing machine $T$. Let $k$ be the number of bits necessary to encode $T$; a machine encoded by $k$ bits is sampled with probability proportional to $2^{-k}$.
  5. Run $T$ on input $i$ for all $i\in D_n$.
  6. If for all $i\in D_n$, $T$ accepts $i$ in time $g(i)$ if and only if $U$ accepts $i$, then run $T$ on input $n$ for time $g(n)$, and output the result.
  7. Otherwise, goto 4.
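
Here is a toy Python sketch of this rejection-sampling loop, continuing the stand-ins used for $W$ above. The sampler `sample_machine` is a hypothetical substitute for choosing a machine with probability proportional to $2^{-k}$, and the time bounds $g(i)$ and $g(n)$ are omitted because the toy machines always halt.

import random

def U(i):
    # toy stand-in for the machine being predicted
    return i % 3 == 0

def sample_machine(rng):
    # geometric choice of j plays the role of the 2**-k prior;
    # machine (j, r) accepts i iff i % j == r
    j = 1
    while rng.random() < 0.5:
        j += 1
    r = rng.randrange(j)
    return lambda i: i % j == r

def M(n, D, rng=None):
    rng = rng or random.Random()
    # steps 1-3: restrict to training elements below n and compute U there
    D_n = [i for i in D if i < n]
    answers = [U(i) for i in D_n]
    # steps 4-7: resample until T matches U on all of D_n, then ask T about n
    while True:
        T = sample_machine(rng)
        if all(T(i) == a for i, a in zip(D_n, answers)):
            return T(n)

print(M(100, [1, 3, 11, 47]))  # accepts with the conditional probability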

Theorem: $M$ passes the generalized Benford test.

Proof Sketch: Let $E$ be a quickly running Turing machine which accepts exactly the numbers $s_1<s_2<s_3<\ldots$, such that the output of $U$ on the inputs $s_1,s_2,\ldots$ is indistinguishable from a coin that accepts with probability $p$.

Assume by way of contradiction that the probability that $M$ assigns to $s_k$ does not converge to $p$, so that there exist an $\varepsilon>0$ and infinitely many $k$ such that the probability that $M$ accepts $s_k$ is at least $p+\varepsilon$ (the case $p-\varepsilon$ is symmetric).

Consider $M'$, a derandomized version of $M$ which outputs a probability on input $n$ which is within $\varepsilon/4$ of the probability that $M$ accepts $n$. This can be done by running $M$ on all possible sequences of random inputs until the probability that $M$ has not halted yet is less than $\varepsilon/4$. Note that we do not derandomize the bits coming from $F$, only the extra bits used in sampling Turing machines. This derandomized version is now an RTM. By assuming that $g$ grows at least slightly faster than the runtime of $M$ (a gap of a factor of $2^n$ should be sufficient), it can be made into an RTM which runs in time $g(n)$ on all inputs (with probability 1).
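
One concrete reading of this derandomization step, as a hedged sketch: enumerate the tree of coin flips, and stop once the weight of branches on which $M$ has not yet halted is below the target precision. Here `run_with_bits` is a hypothetical deterministic version of $M$ that consumes an explicit bit string and reports whether it halted within that budget.

def derandomized_probability(run_with_bits, n, eps):
    # estimate M's acceptance probability on input n to within eps
    frontier = [("", 1.0)]         # (coin flips consumed so far, branch weight)
    p_accept, unhalted = 0.0, 1.0
    while unhalted >= eps:
        bits, w = frontier.pop()
        halted, accepted = run_with_bits(n, bits)
        if halted:
            unhalted -= w
            if accepted:
                p_accept += w
        else:                      # M wants more random bits: split the branch
            frontier.append((bits + "0", w / 2))
            frontier.append((bits + "1", w / 2))
    return p_accept                # within eps of the true probability

# toy usage: a "machine" that halts after two flips and accepts on "11"
toy = lambda n, bits: (len(bits) >= 2, bits[:2] == "11")
print(derandomized_probability(toy, 0, 0.01))  # -> 0.25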

Consider the Turing machine $T'$ which accepts $n$ if and only if $E$ accepts $n$, and $M'$ assigns a probability that is at least $p+\varepsilon/2$. Then $T'$ accepts infinitely many inputs, and each input that $T'$ accepts is some $n$ such that $M$ accepts $n$ with probability at least $p+\varepsilon/4$. Again, $T'$ is an RTM which runs in time $g(n)$ on all inputs.

Therefore, with probability 1, $D$ has infinitely many elements $n$ such that $E$ accepts $n$ and $M$ accepts $n$ with probability at least $p+\varepsilon/4$.

Note that if $n\in D$, then the probability that $M$ gets $n$ correct is exactly the probability that a randomly chosen Turing machine gets $n$ correct, given that it got all of the previous elements of $D$ correct.

Let $P_n$ denote the probability that a randomly chosen Turing machine gets the correct answer on all elements of $D$ less than $n$. Let $Q_n$ denote the probability that a randomly chosen Turing machine gets the correct answer on all elements of $D\cap\{s_1,s_2,\ldots\}$ less than $n$.

Note that $D\cap\{s_1,s_2,\ldots\}$ is a quickly computable (random) subsequence of the things accepted by $E$, so by the assumption that $U$ on the $s_k$ represents an irreducible pattern (or some similar definition), $Q_n\leq c\,q^{r_n}$, where $r_n$ is the number of elements of $D\cap\{s_1,s_2,\ldots\}$ less than $n$, $c$ is a constant, and $q$ is the probability of correctly guessing a single flip of a $p$-coin. (I.e., the probability of getting them all right is at most a constant times the probability of getting them right if they were actually coming from a $p$-coin.)
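
For instance, with the original Benford probability $p=\log_{10}2\approx 0.301$, the best single guess is always "reject", succeeding with probability $q=\max(p,1-p)\approx 0.699$, so the bound reads

$$Q_n \;\leq\; c\cdot\max(p,1-p)^{r_n} \;\approx\; c\cdot 0.699^{r_n},$$

decaying exponentially in the number of training points the pattern hits. The identification $q=\max(p,1-p)$ is my reading of the missing quantity.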

Let $P'_n$ denote the probability that a randomly chosen Turing machine gets the correct answer on all elements of $D$ less than $n$, if that machine were modified to accept with a fixed probability between $p-\varepsilon/8$ and $p+\varepsilon/8$ on all inputs which $E$ accepts in time $g$, and let $Q'_n$ denote the probability that such a modified Turing machine gets the correct answer on all of the elements of $D\cap\{s_1,s_2,\ldots\}$ less than $n$.

Observe that $P'_n/Q'_n=P_n/Q_n$,

(THIS STEP IS WRONG. $P'_n/Q'_n$ is the probability that a randomly chosen TM gets all the elements of $D$ which are not accepted by $E$ right, while $P_n/Q_n$ is the probability that it gets those right given that it got the $s_k$ right. I will see if I can fix it soon.)

and that $Q'_n\geq c'\,q^{r_n}$ for some constant $c'>0$, so that $Q'_n\geq(c'/c)\,Q_n$. Therefore $P'_n\geq(c'/c)\,P_n$.

However, a randomly chosen Turing machine has some fixed positive probability of already having the above surgery done to it, where all the answers on the inputs accepted by $E$ are changed, which implies that $M$ cannot assign probability at least $p+\varepsilon/4$ to infinitely many elements of $D\cap\{s_1,s_2,\ldots\}$, which is a contradiction.


Random Thoughts:

Note that while $M$ is an RTM, we can just run $M$ on a randomly chosen $F$, and pass the generalized Benford test with a normal randomized Turing machine with probability 1.

I believe that this algorithm is correctly implementing resource-bounded Solomonoff induction, and will thus have all the reasonable desired properties you could ask of such an algorithm. I.e., unless I am missing something and there is a hole in the proof, this algorithm is really awesome.

This algorithm is only able to predict the output of Turing machines which are guaranteed to halt on every input. (We made the assumption that they halt in time $f(n)$, but if we do not know how long they take to halt, we could define $f(n)$ to be the maximum amount of time the machine takes to halt on any input less than $n$, which is a computable function.) It is unclear whether or not we can extend it to general logical uncertainty.

Amusingly, this algorithm takes even longer to converge than other similar ALU algorithms, because we are restricting ourselves to only train the algorithm on a very sparse set of inputs.

I am not sure what I want to call this algorithm, but I have been referring to $M$ as the Universal Heuristic Generator (UHG) and to $D$ as a Universal Training Set (UTS). Alternative ideas are welcome.

Comments:

I haven't understood the details yet, but just a basic question: your "randomized Turing machines", do they have a random oracle or random advice? In other words, do you assume the lookup of $F(m,k)$ takes time $O(1)$, or time that grows with $m$ and $k$?

It does not matter. Let's say $O(1)$. The most it is going to change is how large a gap is needed between $g$ and $A$.