Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Summary: This is a quick expository recap, with links, of writing (in papers and on this forum) on reflective oracles, through 3/21/15. Read this if you want to learn more about reflective oracles, or if you’re curious about what we’ve been working on lately!

Introduction

Solomonoff induction can be used to predict a sequence of bits produced by a computable environment. However, if the environment that produces the bits contains the Solomonoff inductor (or other Solomonoff inductors), then the environment will be uncomputable, and Solomonoff induction will no longer have any guarantee of accuracy. This motivates the study of the naturalized induction problem. This
is still an open problem, but we have recently made progress
using the framework of reflective oracles.

In this framework, programs are allowed to call a reflective
oracle that (probabilistically) predicts
the output of other programs (which may also call reflective oracles).
Roughly speaking, the oracle answers questions of the form:
"Is the probability that machine M returns 1 greater than
p?"? If the probability that M returns 1 is less than p, then the oracle
must return 0; if it is greater than p, it must return 1; and if
it is exactly p, then it may randomize between 0 and 1 arbitrarily.

It is
due to the randomization that paradoxes do not arise.
For example, we could define a program that asks the oracle if it
returns 1 with "greater than" 50% probability, and then
outputs 0 if this is true or 1 if it is false.
In this case, the oracle returns 1 half the time and 0 the
other half of the time for the query the program makes. Then the program also
returns 1 half the time, so the oracle was indeed allowed to randomize its
answer to the query.

While reflective oracles aren't a complete solution to the naturalized
induction problem, they are a useful model that lets us define naturalized algorithms and prove interesting properties about them. In particular, it is possible to define reflective variants of causal decision theory, Solomonoff induction, and AIXI.

Papers

Reflective oracles as a Foundation for Classical Game Theory,
Benja Fallenstein, Jessica Taylor, and Paul Christiano.
This paper formalizes reflective oracles, shows that they always exist, and defines
causal decision theory for agents in a world containing reflective oracles. When
multiple agents use this reflective version of CDT, they will play a Nash equilibrium.
In fact, there is a one-to-one correspondence between reflective oracles
and Nash equilibria.

Reflective variants of Solomonoff Induction and AIXI,
Benja Fallenstein, Nate Soares, and Jessica Taylor.
This paper uses reflective oracles to define
reflective variants of Solomonoff induction and AIXI.
A reflective Solomonoff inductor will make predictions
consistent with the fact that its output affects
its environment, and a reflective AIXI will
take actions based on predictions consistent with
AIXI being part of the environment.

Development of reflective oracles on the forum

Although you only need to read the papers to know about the current
state of research in reflective oracles, forum posts on the subject
show how the research developed.

Topological truth predicates: Towards a model of perfect Bayesian agents, Benja Fallenstein. This post introduces the setting where agents can predict
each other using reflective truth predicates, and shows that a reflective
truth predicate exists. This is similar to reflective oracles, but using
logic instead of Turing machines.
Paul commented on this post with a suggestion
to use Turing machines instead of logic.

Oracle machines instead of topological truth predicates, Benja Fallenstein. This version of the reflective oracles formalism
is similar to the one that is currently used, but is different in that it specifies that
the oracle returns 0 for any program that does not halt for every oracle.
The post shows that these reflective oracles always exist.

Simplicity priors with reflective oracles, Benja Fallenstein. It is possible to formalize a variant of Solomonoff induction
using the reflective oracles from the previous post. There are some complications here
that require "protecting" queries.

Multibit reflective oracles, Benja Fallenstein. This is a new version of reflective oracles that (a)
extends the formalism to programs outputting a stream of bits (not just
a single bit) and (b) changes how the oracles behave on non-halting
machines. Instead of answering 0 in response to a query about a machine that does not halt for all oracles, a reflective oracle will now have somewhat arbitrary behavior for programs that do not halt (depending on the probability that the program does not halt). Formalizing Solomnoff induction is significantly simpler
with these new oracles.

Single-bit reflective oracles are enough, Benja Fallenstein. It turns out that we can use single-bit reflective
oracles to get the same functionality as multi-bit reflective oracles by
asking them for conditional probabilities given previous bits.

Exploratory applications of reflective oracles

Oracle machines for automatic philosophy, Nisan Stiennon (writing up Paul Christiano's idea). We can use reflective
oracles to define a method to automate philosophy by setting up a thought
experiment where a person can ask themselves questions and get responses
back immediately, in a recursive fashion. This is similar to an earlier approach that has been
dubbed "Paul-boxing",
but seems to have some nicer properties.
See Paul's comment for more
of his posts on the subject.

UDT in the land of probabilistic oracles, Jessica Taylor. This is a first step towards trying to define a UDT variant
using reflective oracles. However, the formalism in this post is unsatisfactory because when making a decision, an agent will
only act as if it its decision determines the actions of exact copies of itself.

Summary: This is a quick expository recap, with links, of writing (in papers and on this forum) on reflective oracles, through 3/21/15. Read this if you want to learn more about reflective oracles, or if you’re curious about what we’ve been working on lately!## Introduction

Solomonoff induction can be used to predict a sequence of bits produced by a computable environment. However, if the environment that produces the bits contains the Solomonoff inductor (or other Solomonoff inductors), then the environment will be uncomputable, and Solomonoff induction will no longer have any guarantee of accuracy. This motivates the study of the naturalized induction problem. This is still an open problem, but we have recently made progress using the framework of reflective oracles.

In this framework, programs are allowed to call a reflective oracle that (probabilistically) predicts the output of other programs (which may also call reflective oracles). Roughly speaking, the oracle answers questions of the form: "Is the probability that machine M returns 1 greater than p?"? If the probability that M returns 1 is less than p, then the oracle must return 0; if it is greater than p, it must return 1; and if it is exactly p, then it may randomize between 0 and 1 arbitrarily.

It is due to the randomization that paradoxes do not arise. For example, we could define a program that asks the oracle if it returns 1 with "greater than" 50% probability, and then outputs 0 if this is true or 1 if it is false. In this case, the oracle returns 1 half the time and 0 the other half of the time for the query the program makes. Then the program also returns 1 half the time, so the oracle was indeed allowed to randomize its answer to the query.

While reflective oracles aren't a complete solution to the naturalized induction problem, they are a useful model that lets us define naturalized algorithms and prove interesting properties about them. In particular, it is possible to define reflective variants of causal decision theory, Solomonoff induction, and AIXI.

## Papers

Reflective oracles as a Foundation for Classical Game Theory, Benja Fallenstein, Jessica Taylor, and Paul Christiano. This paper formalizes reflective oracles, shows that they always exist, and defines causal decision theory for agents in a world containing reflective oracles. When multiple agents use this reflective version of CDT, they will play a Nash equilibrium. In fact, there is a one-to-one correspondence between reflective oracles and Nash equilibria.

Reflective variants of Solomonoff Induction and AIXI, Benja Fallenstein, Nate Soares, and Jessica Taylor. This paper uses reflective oracles to define reflective variants of Solomonoff induction and AIXI. A reflective Solomonoff inductor will make predictions consistent with the fact that its output affects its environment, and a reflective AIXI will take actions based on predictions consistent with AIXI being part of the environment.

## Development of reflective oracles on the forum

Although you only need to read the papers to know about the current state of research in reflective oracles, forum posts on the subject show how the research developed.

Topological truth predicates: Towards a model of perfect Bayesian agents, Benja Fallenstein. This post introduces the setting where agents can predict each other using reflective truth predicates, and shows that a reflective truth predicate exists. This is similar to reflective oracles, but using logic instead of Turing machines. Paul commented on this post with a suggestion to use Turing machines instead of logic.

Oracle machines instead of topological truth predicates, Benja Fallenstein. This version of the reflective oracles formalism is similar to the one that is currently used, but is different in that it specifies that the oracle returns 0 for any program that does not halt for every oracle. The post shows that these reflective oracles always exist.

Simplicity priors with reflective oracles, Benja Fallenstein. It is possible to formalize a variant of Solomonoff induction using the reflective oracles from the previous post. There are some complications here that require "protecting" queries.

Multibit reflective oracles, Benja Fallenstein. This is a new version of reflective oracles that (a) extends the formalism to programs outputting a stream of bits (not just a single bit) and (b) changes how the oracles behave on non-halting machines. Instead of answering 0 in response to a query about a machine that does not halt for all oracles, a reflective oracle will now have somewhat arbitrary behavior for programs that do not halt (depending on the probability that the program does not halt). Formalizing Solomnoff induction is significantly simpler with these new oracles.

Single-bit reflective oracles are enough, Benja Fallenstein. It turns out that we can use single-bit reflective oracles to get the same functionality as multi-bit reflective oracles by asking them for conditional probabilities given previous bits.

## Exploratory applications of reflective oracles

Oracle machines for automatic philosophy, Nisan Stiennon (writing up Paul Christiano's idea). We can use reflective oracles to define a method to automate philosophy by setting up a thought experiment where a person can ask themselves questions and get responses back immediately, in a recursive fashion. This is similar to an earlier approach that has been dubbed "Paul-boxing", but seems to have some nicer properties. See Paul's comment for more of his posts on the subject.

UDT in the land of probabilistic oracles, Jessica Taylor. This is a first step towards trying to define a UDT variant using reflective oracles. However, the formalism in this post is unsatisfactory because when making a decision, an agent will only act as if it its decision determines the actions of exact copies of itself.