Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

In this post, I'll introduce a new kind of self-referential "truth predicate" (of sorts), which avoids diagonalization by placing a certain topological condition on the formulas it can be applied to. In future posts, I'll show how this can be used to model perfect Bayesian agents that are able to reason about a world containing other, equally powerful agents, and how, in particular, this yields a variant of AIXI that can reason about a world containing other instances of the same kind of AIXI.

This is inspired by the way that classical game theory avoids diagonalization by its use of mixed strategies, and in fact I'll show that if they have enough common knowledge, these agents will play Nash equilibria against each other; but this framework doesn't require players to be special kinds of objects, or at least not to the degree that classical game theory does. (Besides classical game theory, Paul Christiano's reflection principle for probabilistic logic was the other main inspiration for this work.)

Perfect Bayesian agents

The fact that AIXI can't model environments containing other AIXIs isn't as specific to AIXI as it may seem; the way I see it, it's really a problem with the decision-theoretic ideal of a logically omniscient, perfect Bayesian agent. A perfect Bayesian may have uncertainty about which possible world it's living in, but given any possible world in its hypothesis space, its supposed to know everything that happens in that world (so that it can calculate expected utility). For example, if the world is some sort of deterministic cellular automaton, and our agent has uncertainty about the initial state of that automaton, then for every particular initial state, the agent needs to figure out what will happen later---despite the fact that it itself lives inside that automaton.

Of course, in reality, that isn't literally possible. But having models that idealize some aspects of reality is good, because it allows us to ignore complications that are irrelevant to the problem we're trying to solve, and because it allows us to see which complications are due to which aspects of reality. So it would be nice to have a model of a world with unlimited computing power, or a world containing oracles, which could contain perfect Bayesians that are able to reason about each other. But unlimited computing power and oracles can't avoid diagonalization.

Suppose, say, that our agents are Turing machines with an oracle that allows them to determine the behavior of other Turing machines with the same kind of oracle (so that they can compute the utility they'll get in every possible world). If the laws of physics of our toy world allow for oracles like this, we could construct a machine that uses its oracle to figure out whether it (the machine) will output "heads" or "tails", and which outputs "heads" iff the oracle said it would output "tails". The usual way to avoid this kind of diagonalization is to consider oracles which can only answer questions about worlds that contain no oracles, or strictly weaker oracles---which is exactly the way AIXI solves the problem, leading to the familiar limitations.

Classical game theory faces a version of the same diagonalization problem, but uses a rather different way to resolve it.

Mixed strategies

Consider the game of Matching Pennies. Two players, A and B, simultaneously choose between two actions, "heads" and "tails". If they make the same choice, A wins; otherwise, B wins.

A Nash equilibrium is an assignment of strategies to players such that every player's strategy maximizes their expected utility, given the other players' strategies. So what are the Nash equilibria of Matching Pennies? If A plays heads, then B wants to play tails; but if B wants to play tails, then A wants to play tails as well; but if A plays tails, then B wants to play heads, and so on...

The answer, of course, is that Matching Pennies has a Nash equilibrium, but only in mixed strategies, where each player chooses their action probabilistically (and independently of the other player). If a player flips a fair coin to determine their action, then their opponent is indifferent between choosing heads and tails, and is thus fine with also flipping a coin. There's all sorts of reasons to suspect that Nash equilibria may not be a great model for real-world players (including superintelligent AI systems), but they do avoid diagonalization in a rather general way: There's a theorem that every game with finitely many players and finitely many pure strategies has at least one Nash equilibrium, if we consider mixed strategies.

The "truth predicate" I propose in this post solves the diagonalization problem in a way that's closely related to classical game theory's solution: First, I show the consistency of its reflection principle by using the same kind of fixed point theorem that is usually used to show the existence of Nash equilibria. (So does Paul's reflection principle, which lead to this idea, but in this case the math is even closer to that used in the Nash equilibria case.) And second, I propose a way to model perfect Bayesian agents as machines with access to a probabilistic oracle, which answers some queries randomly; this randomness will lead to randomness in the agents' choices, and this will resolve diagonalization issues in the same way as it does in classical game theory. In fact, I will show in a future post that if you have these agents play a game against each other, and they have enough mutual knowledge, they will play a Nash equilibrium.

The new "truth predicate"

The "truth predicate" we'll discuss in this post will actually be a real-valued two-argument function, . Let be the language of set theory, and let be extended with a single constant symbol, also written (because this constant symbol will represent the function inside our object language); then the arguments to that we're interested in will be the Gödel numbers of sentences in . Roughly speaking, the reflection principle for will be as follows: If and are mutually exclusive sentences in which satisfy a certain topological condition, then will equal if is true, will equal if is true, and will be arbitrary otherwise.

(Why a constant symbol and not a function symbol? Because it will be more convenient to treat as a set-theoretic function, rather than as a function in the logic. In other words, will be an element of the set .)

Before going into the technical details, let me sketch how we'll use this "truth predicate". We'll imagine a world which allows probabilistic oracles for to be built; such an oracle takes two Gödel numbers, and , and returns "true" with probability . Thus, the "laws of physics" of this world are stochastic, and determine a probability distribution over outcomes (rather than a single outcome), which is parametrized by .

Suppose that we have a utility function over these outcomes, which satisfies a certain continuity condition. Then, given two actions and that an agent could take, the sentences " leads to strictly higher expected utility than " and " leads to strictly higher expected utility than " will satisfy the conditions of the reflection principle, which means that will be if is true, and if is true. (I'm glossing over the details, but the and I have in mind will be implemented using causal counterfactuals.)

We can now implement a perfect Bayesian (CDT) agent by building a machine which queries a probabilistic oracle on , and does if that oracle returns "true", otherwise. There are three possible cases:

  • Either is strictly better than ; then is true, , and the machine will take action with probability .
  • Or is strictly better than ; then is true, , and the machine will definitely take action .
  • Or both actions lead to the same expected utility; then neither nor is true, is arbitrary, and the machine will take action with probability , action otherwise.

In each case, the machine takes an action which maximizes expected utility. Diagonalization will be avoided by mixed strategies: If two such agents play a game of Matching Pennies against each other, both will choose heads or tails with 50% probability, making the other one indifferent between its options.

Isn't it a bit weird to call a "predicate"? Well, here's one way to think about it. If is a sentence in ---the unextended language of set theory---then and are obviously mutually exclusive, and also happen to satisfy the topological condition on the reflection principle; hence, if and only if is true. Thus, gives you a truth predicate for . For sentences in , it can't exactly give you a truth predicate (because of Tarski's undefinability theorem), but the reflection principle for is a way of coming close enough that we can model perfect Bayesian agents. (I'll say a bit more on this topic below.)

Formal set-up

Enough vague description; now it's time to fill in the mathematical details (at least of the truth predicate; I'll leave the application to Bayesian agents to future posts).

We'll work in ZFC + the existence of a strongly inaccessible cardinal as our metatheory. Fix a Grothendieck universe ; a Grothendieck universe is a set such that is a "nice" model of set theory, and the existence of a Grothendieck universe is equivalent to the existence of a strongly inaccessible cardinal. One of the ways in which Grothendieck universes are "nice" is that normal mathematical notions such as the real numbers or the arithmetic operations (and most other things) are absolute for , which means that when you interpret the definition of inside the model , you get the same set as when you interpret it in the class of all sets, and similarly for the set , and so on.

(You should probably be familiar with the notion of absoluteness and the basic results about it if you want to check the math in this post in detail. Sections IV.2--IV.5 of Kunen's set theory book give an introduction. The other main prerequisite is an understanding of the product topology.)

We call a function an assignment, and consider the space of all assignments to be endowed with the product topology. Then is an -structure which is a model of ZFC (and, moreover, a model of ZFC with the axiom schemas ranging over ). Thus, given an assignment and a sentence in the extended language, denotes that the sentence is true in this model. We define the extension of as the set of all assignments that make true.

A query is a pair of Gödel numbers such that and are disjoint and open in the product topology. (This is the topological condition we're placing on our reflection principle.) We say that an assignment reflects an assignment if for every query , we have if , and if . Intuitively, this means that when we ask questions of a probabilistic oracle which behaves according to , then this oracle gives answers that are true about .

An assignment is called reflective if it reflects itself; reflective assignments give rise to oracles whose answers are true about the oracle itself.

A reflective assignment can be seen as a kind of truth predicate. Note, for example, that if is any sentence in , the unextended language of set theory, then is either the whole space, (if ), or the empty set, (if ); thus, and are open and disjoint. Hence, a reflective assignment provides us with a truth predicate for : We have iff .

For sentences in the extended language, the information provided by is more restricted. However, can still be seen as a kind of truth predicate.

Suppose that is a query, i.e., that and are open and disjoint. Then does not quite tell us the truth values of and : if , then it guaranteed that , but this is only a sufficient condition; it is also possible that if neither nor is true. However, we are guaranteed that if , then , and if , then ; hence, we have the following weaker reflection property: If , then by looking at we can tell that , and if , then we can tell that . This weak reflection property may seem odd, but as we will see in later posts, it is exactly what we need in order to reproduce the notion of Nash equilibrium from classical game theory.

(Remark: It is easy to show that given any sentence in the extended language, we can find sentences and such that and are the interior and closure of , respectively. Thus, for any sentence , the pair is a query; by the above argument, then, if lies in the interior of , looking at will tell us that it does not lie in the exterior, and if it lies in the exterior, looking at will tell us that it does not lie in the interior.)

We now want to show that reflective assignments exist. However, in a future post, we will want to show that for any Nash equilibrium of a finite game, there is a reflective assignment that gives rise to this Nash equilibrium; for this, it will be helpful to have a slightly stronger result available, which shows that there are reflective assignments satisfying certain constraints. (The result about Nash equilibria, incidentally, is also the reason why we need to make a two-parameter function, instead of a one-parameter function which behaves like our .)

These constraints will take the form of a partial function , which specifies the values  should take on a certain set . We call a function of this type a partial assignment, and say that extends  if for all . We call a partial assignment reflective if for every assignment extending , there is an assignment , also extending , which reflects .

An existence theorem

With these preliminaries, we can state our existence result for reflective assignments:

Theorem.

(i) There is a reflective assignment . ​ (ii) For every reflective partial assignment , there is a reflective assignment  which extends .

Proof. We begin by showing that the empty partial assignment (i.e., the partial assignment satisfying ) is reflective, since (i) then follows from (ii). Thus, let  be any assignment (since every assignment extends ), and define  as follows: For any query such that , set ; for all other pairs , set . Then, clearly both reflects  and (trivially) extends . This finishes the proof that is a reflective partial assignment.

It remains to show (ii). For this, we use the infinite-dimensional generalization of Kakutani’s fixed point theorem:

Suppose that is a non-empty, convex and compact subset of a locally convex topological vector space. Suppose further that is a set-valued function such that is non-empty, convex and compact for all , and such that the graph of , is a closed set. Then  has a fixed point; that is, there is an  such that .

We let be the set of all assignments extending ; then , which is a locally convex topological vector space when endowed with the product topology (since this is true of any power of ). is clearly non-empty, convex, and closed, and it is a subset of  which, by Tychonoff’s theorem, is compact.

We choose to consist of all assignments  that reflect  and extend . By the assumption that  is reflective, is non-empty for every . Moreover, it is easy to see that is both closed and convex; hence, if we can also show that  has closed graph, then by the fixed point theorem, there is a  such that , which is exactly what we want to show.

Thus, assume that we have sequences and of assignments extending  such that for every , and suppose that these sequences converge to limits ; then we need to show that , i.e., that reflects . To see this, we must show that for every query , implies , and implies .

Without loss of generality, assume that . Since this is an open set and converges to , it follows that there is some such that for all , whence . But since converges to , and convergence in the product topology is pointwise, this implies , as desired. □

New Comment
8 comments, sorted by Click to highlight new comments since: Today at 1:05 PM

I think this is a nice model, thank you for writing it up!

I'm afraid that not many people will be able to wrap their head around it in this format. It seems like there is a more computer sciencey presentation which has a similar upside, which Scott Aaronson and I discussed in March 2013 and Jessica Taylor and I worked out that April after a MIRI workshop. (Sorry I didn’t mention this earlier, I vaguely recalled it when I saw this post and found it while rooting around in my emails.)

I think there exists an oracle O, which takes as input the description of a probabilistic oracle turing machine M, and has the following property. O(<M>, n) outputs a (possibly stochastic) rational number within 1/n of the interval [P(M[O] = 1), 1 - P(M[O] = 0)], where M[O] is M run with the oracle O. If M[O] outputs something in {0, 1} with probability 1, then this gives us an approximation to the distribution of its outputs. Not halting or outputting something outside of {0, 1} can be interpreted arbitrarily. If we want to make things total, we can also use the lim inf of O(<M>, n) to define a machine's output distribution, and I think this lets us reproduce the desired results on Nash equilibria.

This can then be used to define rational agents in a way that might be more comprehensible to computer scientists. Do you think there are big advantages of the more mathematical approach?

So you're saying that there is an oracle that can predict the outputs of any oracle you give it through a probability distribution of the possible outputs?

I'd love to see what this approach looked like worked out in more detail. I think there would be a lot of benefit to making this kind of result accessible to computer scientists without strong math backgrounds.

This may be the most underappreciated post on LessWrong that I've come across thus far.

That's a very beautiful and imaginative idea! I have a couple questions about the next post you're going to write, hope they're not too far off the mark.

  • For games with multiple Nash equilibria, do the different reflective assignments correspond to different methods of equilibrium selection? Does that mean that bargaining problems, like the ultimatum game, get "outsourced" to the oracle?

  • In the symmetric Prisoner's Dilemma, will your framework lead to mutual defection?

Thanks! :-)

The reflective assignments correspond very directly to Nash equilibria (albeit in an -to- manner, because given any finite game, a reflective assignment contains information about that game but also information about many other things). So I wouldn't quite say that they correspond to methods of equilibrium selection---e.g., if two methods of equilibrium selection give you the same Nash equilibrium, they're not distinguishable on the level of reflective assignments. But yeah, the question of what equilibrium gets played gets outsourced to the oracle.

And yeah, since these agents play Nash equilibria, they will defect on the symmetric PD. (I'm not advocating this as a desirable feature. :-))

Now I'm wondering how this compares to Paul's probabilistic reflection. Are there some nice "consistency" axioms that are satisfied by one approach but not the other, or vice versa?