Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

This article describes an idea of Paul Christiano's. Paul described it to Benja Fallenstein and Benja described it to me. It's a way to use reflective oracle machines as an aid in answering philosophical questions.

Edit, 7am Pacific Time, 17 February 2015: Made a major correction.

Recall that a reflective oracle machine is a pair (M, O), where M is a probabilistic Turing machine with an advance-only output tape and the ability to make calls to the external oracle function O, and O is a function satisfying certain properties. O takes as input a representation of any such Turing machine M', a finite bitstring s, and a rational number p. It outputs a 0 or a 1 depending on whether p is greater than or less than the probability that the output of M' begins with s. If the probability is exactly p, then the output of O is random. If M' fails to write an infinite sequence of bits, the oracle is allowed to pretend that it does. See Benja's post for details.
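To make the oracle's contract concrete, here is a toy Python sketch. Everything in it is illustrative: each "machine" is represented by a function giving the probability that its output begins with a given prefix, whereas a genuine reflective oracle must also answer queries about machines that themselves call the oracle, which no ordinary program can do.

```python
import random

# Toy stand-in for the oracle's *interface*.  Each "machine" is just a
# known output distribution over infinite bitstrings, represented by a
# function s -> Pr[output begins with s].  A true reflective oracle also
# answers queries about machines that call the oracle itself; this toy
# version cannot do that.
def oracle(prefix_prob, s, p):
    """Return 1 if Pr[machine's output begins with s] > p,
    0 if it is < p, and a fair coin flip if it is exactly p."""
    q = prefix_prob(s)
    if q > p:
        return 1
    if q < p:
        return 0
    return random.randint(0, 1)

# Example machine: each output bit is an independent fair coin,
# so Pr[output begins with s] = 2 ** -len(s).
fair_coin = lambda s: 2.0 ** -len(s)

print(oracle(fair_coin, "01", 0.1))  # Pr = 0.25 > 0.1, so prints 1
print(oracle(fair_coin, "01", 0.5))  # Pr = 0.25 < 0.5, so prints 0
```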

# Solomonoff induction

Benja's post also mentions that we can make a Solomonoff inductor out of a reflective oracle machine: Write a program that simply iterates over all machines M and uses O to determine what the output of each M is. This program will be able to update a universal prior on observations and sample predictions from the posterior. The program will be very slow; call it S. We can then write another program that queries O about the output of S, and so get our answers very fast.
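The construction above cannot literally be run, but a scaled-down sketch shows the shape of the computation. Here a small explicit hypothesis class (Bernoulli coins, chosen only for illustration) stands in for "all oracle machines", and halved prior weights stand in for the universal prior; the real construction would instead use O to evaluate every machine.

```python
from fractions import Fraction

# Scaled-down sketch: enumerate a small hypothesis class, weight the
# hypotheses by a 2^-k universal-style prior, and update on observed bits.
hypotheses = [Fraction(1, 2), Fraction(1, 4), Fraction(3, 4)]  # Pr[bit = 1]
prior = [Fraction(1, 2 ** (k + 1)) for k in range(len(hypotheses))]

def posterior_prob_next_one(bits):
    """Pr[next bit = 1 | observed bits], mixing over the hypothesis class."""
    weights = list(prior)
    for b in bits:
        # Bayesian update: multiply each weight by the likelihood of b.
        weights = [w * (h if b == 1 else 1 - h)
                   for w, h in zip(weights, hypotheses)]
    total = sum(weights)
    return sum(w * h for w, h in zip(weights, hypotheses)) / total

print(posterior_prob_next_one([1, 1, 1, 1]))  # shifts toward the 3/4 coin
```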

Suppose we had such an inductor. If we feed it enough real-world data, it will be able to make very good predictions, assuming a version of the Church-Turing thesis that admits the existence of stochastic phenomena and the existence of reflective oracle machines.

We will suppose our inductor is implemented as a software object with two callable methods, observe(s) and query(q). observe(s) appends the bitstring s, plus a reserved end-of-message code, to an internal list of observations. query(q) returns the message m maximizing the probability of x ⌢ q ⌢ m, subject to the constraint that m must be terminated by the end-of-message code, where x is the internal list of observations and ⌢ indicates concatenation. In other words, it conditions on x ⌢ q and returns the maximum-likelihood message following x ⌢ q. An alternative approach would be to condition on x ⌢ q and sample m from the posterior.
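As a concrete (and deliberately naive) illustration of this interface, the sketch below uses simple frequency counts over past observations in place of the universal posterior; the Inductor class, the end-of-message byte, and the example dialogue are all invented for this example.

```python
# Minimal sketch of the observe()/query() interface, with a naive empirical
# model standing in for the universal posterior: query(q) returns the
# message that most often followed q in the observation stream.
EOM = "\x00"  # reserved end-of-message code

class Inductor:
    def __init__(self):
        self.history = []  # list of observed messages, each EOM-terminated

    def observe(self, s):
        self.history.append(s + EOM)

    def query(self, q):
        # Among past occurrences of q, pick the maximum-likelihood
        # follow-up message.  (The alternative mentioned in the text
        # would sample a follow-up from the posterior instead.)
        counts = {}
        for prev, nxt in zip(self.history, self.history[1:]):
            if prev == q + EOM:
                counts[nxt] = counts.get(nxt, 0) + 1
        if not counts:
            return ""
        best = max(counts, key=counts.get)
        return best.rstrip(EOM)

ind = Inductor()
for a in ["yes", "yes", "no"]:
    ind.observe("is snow white?")
    ind.observe(a)
print(ind.query("is snow white?"))  # prints "yes"
```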

# Automated philosophy

Suppose we want to solve a philosophical problem, P, in a month. It would be nice if a philosopher had access to a device which, given a question Q, creates an alternative universe containing a copy of that philosopher initialized to some initial state, except that instead of being given the original problem P to work on, the copy of the philosopher is given Q. After the copy works on Q for a month, they submit their answer, the alternate universe is terminated, and the answer is returned to the original philosopher, with no time having passed in the original universe. Thus the philosopher can freely converse with a device that is qualitatively as smart as the philosopher, except that it can do a month's worth of thinking instantaneously. It would be even better if the copy of the philosopher in the alternate universe could submit questions of their own to their own copy of the device.
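The thought experiment has a natural recursive shape, which a short sketch can make explicit. Here consult and toy_think are hypothetical stand-ins: toy_think plays the role of a month of deliberation, answering arithmetic questions by splitting off a sub-question to its own copy of the device.

```python
# The device thought-experiment as a recursive procedure: a depth-d
# philosopher may consult instantaneous depth-(d-1) copies of themselves.
def consult(question, depth, think):
    if depth == 0:
        return think(question, ask=None)   # deepest copy has no device
    # Each copy gets its own device, one level shallower.
    ask = lambda subq: consult(subq, depth - 1, think)
    return think(question, ask=ask)

# Toy "philosopher": answers sums by splitting off one sub-question.
def toy_think(question, ask):
    if ask is not None and "+" in question:
        left, right = question.split("+", 1)
        return str(int(left) + int(ask(right)))
    return str(int(question))

print(consult("1+2+3", depth=2, think=toy_think))  # prints "6"
```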

We are going to describe a way to effectively build such a device out of the inductor from the previous section. Suppose we have a question of interest Q0, a human, and an inductor. We choose a recursion depth D. The protocol is as follows:

  1. A question and recursion depth (Q, d) are sampled from a certain distribution described below.
  2. The human is presented with (Q, d) and thinks about it for a month. If d > 0, the human may freely submit any number of questions Q' to the inductor:
     a. For each Q', we call inductor.query((Q', d-1)). I.e., conditioning on observing the pair (Q', d-1), what is the inductor's prediction about the next observation?
     b. The answer A' is displayed to the human, where A' is the value returned by inductor.query((Q', d-1)).
  3. After a month, the human settles on a final answer A to the question Q.
  4. We call inductor.query((Q0, D)). The return value is the predicted answer A, and A provides the answer to our question Q0.
  5. We call inductor.observe(((Q, d), A)), where (Q, d) is the pair sampled in step 1 and A is the human's final answer from step 3.
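Sketched as code, the five steps might look like the following; run_protocol, sample_question, human, and the stub inductor are all hypothetical stand-ins, and a real run would use the inductor from the previous section together with a truly random sample in step 1.

```python
# Sketch of the five protocol steps.  `sample_question` plays the role of
# the distribution in step 1, and `human` plays the role of a month of
# human work (it may call `ask` when d > 0).
def run_protocol(inductor, Q0, D, sample_question, human):
    Q, d = sample_question()                       # step 1: truly random draw
    ask = (lambda Qp: inductor.query((Qp, d - 1))) if d > 0 else None
    A = human(Q, d, ask)                           # steps 2-3: a month of work
    answer = inductor.query((Q0, D))               # step 4: read off our answer
    inductor.observe(((Q, d), A))                  # step 5: always record,
    return answer                                  #   whatever the answer was

class StubInductor:
    """Trivial stand-in for the inductor: exact lookup of past episodes."""
    def __init__(self):
        self.seen = {}
    def observe(self, msg):
        qd, a = msg
        self.seen[qd] = a
    def query(self, qd):
        return self.seen.get(qd, "")

ind = StubInductor()
ind.observe((("Q0", 1), "42"))   # pretend a past episode already answered Q0
human = lambda Q, d, ask: "42"
print(run_protocol(ind, "Q0", 1, lambda: ("Q0", 1), human))  # prints 42
```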

Note that even though we get our answer in step 4, we still have to reliably perform step 5, no matter what the answer is. Otherwise the inductor won't give us an accurate answer in step 4.

Note also that we could perform step 4 at the very beginning of the protocol. But we would have to be careful to prevent the human from learning the result until the end of the protocol.

The random sample in step 1 should be truly random, so that the inductor cannot predict it in advance. Sampling (Q, d) from a uniform distribution might have strange effects, because conditioning on (Q0, D) in step 4 effectively discards the hypothesis that (Q, d) is drawn from a uniform distribution. Instead we want a distribution that favors the original question Q0 and sub-questions that naturally follow from it. So we use the following distribution instead:

(ε is the empty string.)
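For concreteness, a sampler with this flavor might look like the sketch below. It is an illustrative guess at the intended shape, not the exact definition: sub_questions stands in for "questions the human tends to ask while working on Q at depth d", and the 1/2 branching probability is arbitrary.

```python
import random

# Illustrative sampler favoring (Q0, D) and the sub-questions that follow
# from it.  With probability 1/2 it returns (Q0, D); otherwise it descends
# into a sub-question at one lower depth, and so on recursively.
def sample_question(Q0, D, sub_questions):
    Q, d = Q0, D
    while d > 0 and random.random() < 0.5:
        subs = sub_questions(Q, d)   # hypothetical: likely human sub-questions
        if not subs:
            break
        Q, d = random.choice(subs), d - 1
    return Q, d
```

Under this sampler the top-level pair (Q0, D) keeps probability at least 1/2, so conditioning on it in step 4 never amounts to conditioning on a vanishingly unlikely event, as it would under a uniform distribution.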

The reason we use a recursion depth limit is that otherwise the result might be a fixed point of rumination chosen by the oracle inside the inductor rather than the result of a lot of thought.

Care would have to be taken to make sure that the human is motivated to work on the assigned query Q rather than the bigger-picture question Q0; they would have to be prevented from querying the inductor when d = 0 and from making other queries not allowed by the protocol; perhaps they should be discouraged from making frivolous queries that would test their own patience; and of course, the protocol won't produce results that are qualitatively wiser than the human at its heart.


The inductor is an impossible object. But any machine that's good at making predictions is in some sense an approximate inductor. If we had a theory of what it means for a machine to be approximately reflective, and we had a machine that was good at predicting people and which we judged to be sufficiently reflective, we might implement a version of the above protocol to augment philosophical work. Thus, while the advent of powerful predictive AI will hasten the advent of dangerous artificial agents, it might also hasten progress on philosophical problems like those in MIRI's technical research agenda.

# Comments

Thanks for writing this up.

I think that sampling from the posterior is a much safer bet. For example, suppose that you may answer a query either with the empty string (because you had a heart attack immediately) or with a complex philosophical treatise with many degrees of freedom. Then maximum likelihood will always give you the empty string! (Sorry if I'm misunderstanding this.)

My original post is here, though I'm afraid it's somewhat less precise and clear, and it may be too ambitious given what the technique can actually deliver.

Here is a first attempt at scaling down to more realistic predictors.