Topological truth predicates: Towards a model of perfect Bayesian agents

Benya_Fallenstein

In this post, I'll introduce a new kind of self-referential "truth predicate" (of sorts), which avoids diagonalization by placing a certain topological condition on the formulas it can be applied to. In future posts, I'll show how this can be used to model perfect Bayesian agents that are able to reason about a world containing other, equally powerful agents, and how, in particular, this yields a variant of AIXI that can reason about a world containing other instances of the same kind of AIXI.

This is inspired by the way that classical game theory avoids diagonalization by its use of mixed strategies, and in fact I'll show that if they have enough common knowledge, these agents will play Nash equilibria against each other; but this framework doesn't require players to be special kinds of objects, or at least not to the degree that classical game theory does. (Besides classical game theory, Paul Christiano's reflection principle for probabilistic logic was the other main inspiration for this work.)

Perfect Bayesian agents

The fact that AIXI can't model environments containing other AIXIs isn't as specific to AIXI as it may seem; the way I see it, it's really a problem with the decision-theoretic ideal of a logically omniscient, perfect Bayesian agent. A perfect Bayesian may have uncertainty about which possible world it's living in, but given any possible world in its hypothesis space, it's supposed to know everything that happens in that world (so that it can calculate expected utility). For example, if the world is some sort of deterministic cellular automaton, and our agent has uncertainty about the initial state of that automaton, then for every particular initial state, the agent needs to figure out what will happen later, despite the fact that it itself lives inside that automaton.

Of course, in reality, that isn't literally possible. But having models that idealize some aspects of reality is good, because it allows us to ignore complications that are irrelevant to the problem we're trying to solve, and because it allows us to see which complications are due to which aspects of reality. So it would be nice to have a model of a world with unlimited computing power, or a world containing oracles, which could contain perfect Bayesians that are able to reason about each other. But unlimited computing power and oracles can't avoid diagonalization.

Suppose, say, that our agents are Turing machines with an oracle that allows them to determine the behavior of other Turing machines with the same kind of oracle (so that they can compute the utility they'll get in every possible world). If the laws of physics of our toy world allow for oracles like this, we could construct a machine that uses its oracle to figure out whether it (the machine) will output "heads" or "tails", and which outputs "heads" iff the oracle said it would output "tails". The usual way to avoid this kind of diagonalization is to consider oracles which can only answer questions about worlds that contain no oracles, or strictly weaker oracles---which is exactly the way AIXI solves the problem, leading to the familiar limitations.
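The diagonal argument can be made concrete with a tiny Python sketch. It rests on a simplifying assumption: a deterministic oracle's answer about the machine is reduced to a single claimed output, "heads" or "tails". The names `diagonal_machine` and `consistent_claims` are hypothetical. The sketch checks every possible answer and finds that none is true about the machine.

```python
# Toy model (an assumption for illustration): a deterministic oracle's answer
# about the machine is reduced to a single claimed output, "heads" or "tails".

def diagonal_machine(oracle_claim: str) -> str:
    """Output "heads" iff the oracle says this machine will output "tails"."""
    return "heads" if oracle_claim == "tails" else "tails"

def consistent_claims() -> list:
    """Return every oracle answer that would be true about the machine's output."""
    return [claim for claim in ("heads", "tails") if diagonal_machine(claim) == claim]

print(consistent_claims())  # []: every deterministic answer is falsified
```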

Classical game theory faces a version of the same diagonalization problem, but uses a rather different way to resolve it.

Mixed strategies

Consider the game of Matching Pennies. Two players, A and B, simultaneously choose between two actions, "heads" and "tails". If they make the same choice, A wins; otherwise, B wins.

A Nash equilibrium is an assignment of strategies to players such that every player's strategy maximizes their expected utility, given the other players' strategies. So what are the Nash equilibria of Matching Pennies? If A plays heads, then B wants to play tails; but if B wants to play tails, then A wants to play tails as well; but if A plays tails, then B wants to play heads, and so on...
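The cycle of best responses above can be checked by brute force. The following sketch assumes the normalization win = 1, lose = 0, since the text does not fix utilities. It enumerates all four pure-strategy profiles and confirms that in each one some player would rather switch.

```python
from itertools import product

ACTIONS = ("heads", "tails")

def payoff(a: str, b: str) -> tuple:
    """Return (A's utility, B's utility): A wins on a match, B otherwise."""
    return (1, 0) if a == b else (0, 1)

def is_pure_nash(a: str, b: str) -> bool:
    """True iff neither player can gain by unilaterally switching its action."""
    ua, ub = payoff(a, b)
    a_ok = all(payoff(alt, b)[0] <= ua for alt in ACTIONS)
    b_ok = all(payoff(a, alt)[1] <= ub for alt in ACTIONS)
    return a_ok and b_ok

pure_equilibria = [(a, b) for a, b in product(ACTIONS, ACTIONS) if is_pure_nash(a, b)]
print(pure_equilibria)  # []
```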

The answer, of course, is that Matching Pennies has a Nash equilibrium, but only in mixed strategies, where each player chooses their action probabilistically (and independently of the other player). If a player flips a fair coin to determine their action, then their opponent is indifferent between choosing heads and tails, and is thus fine with also flipping a coin. There are all sorts of reasons to suspect that Nash equilibria may not be a great model for real-world players (including superintelligent AI systems), but they do avoid diagonalization in a rather general way: There's a theorem that every game with finitely many players and finitely many pure strategies has at least one Nash equilibrium, if we consider mixed strategies.
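The indifference argument can be checked with exact arithmetic. This sketch again assumes win = 1, lose = 0, and the helper names `eu_a` and `eu_b` are hypothetical. It shows that a fair coin leaves the opponent indifferent, while any biased coin gives the opponent a strict preference.

```python
from fractions import Fraction

def eu_a(action: str, q_heads: Fraction) -> Fraction:
    """A's expected utility when B plays heads with probability q_heads (A wins on a match)."""
    return q_heads if action == "heads" else 1 - q_heads

def eu_b(action: str, p_heads: Fraction) -> Fraction:
    """B's expected utility when A plays heads with probability p_heads (B wins on a mismatch)."""
    return 1 - p_heads if action == "heads" else p_heads

half = Fraction(1, 2)
# Against a fair coin, both actions give the same expected utility, so the
# fair coin is itself a best response: (1/2, 1/2) is a mixed Nash equilibrium.
print(eu_a("heads", half), eu_a("tails", half))  # 1/2 1/2
print(eu_b("heads", half), eu_b("tails", half))  # 1/2 1/2

# Against a biased coin, A strictly prefers one action, so A would not mix.
q = Fraction(3, 4)
print(eu_a("heads", q) > eu_a("tails", q))  # True
```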

The "truth predicate" I propose in this post solves the diagonalization problem in a way that's closely related to classical game theory's solution: First, I show the consistency of its reflection principle by using the same kind of fixed point theorem that is usually used to show the existence of Nash equilibria. (So does Paul's reflection principle, which led to this idea, but in this case the math is even closer to that used in the Nash equilibria case.) And second, I propose a way to model perfect Bayesian agents as machines with access to a probabilistic oracle, which answers some queries randomly; this randomness will lead to randomness in the agents' choices, and this will resolve diagonalization issues in the same way as it does in classical game theory. In fact, I will show in a future post that if you have these agents play a game against each other, and they have enough mutual knowledge, they will play a Nash equilibrium.

The new "truth predicate"

The "truth predicate" we'll discuss in this post will actually be a real-valued two-argument function, $θ : \mathbb{N} \times \mathbb{N} \to [0, 1]$. Let $L$ be the language of set theory, and let $L_{θ}$ be $L$ extended with a single constant symbol, also written $θ$ (because this constant symbol will represent the function $θ(\cdot, \cdot)$ inside our object language); then the arguments to $θ(\cdot, \cdot)$ that we're interested in will be the Gödel numbers of sentences in $L_{θ}$. Roughly speaking, the reflection principle for $θ(\cdot, \cdot)$ will be as follows: If $φ$ and $ψ$ are mutually exclusive sentences in $L_{θ}$ which satisfy a certain topological condition, then $θ(\ulcorner φ \urcorner, \ulcorner ψ \urcorner)$ will equal $0$ if $φ$ is true, will equal $1$ if $ψ$ is true, and will be arbitrary otherwise.

(Why a constant symbol and not a function symbol? Because it will be more convenient to treat $θ(\cdot, \cdot)$ as a set-theoretic function, rather than as a function in the logic. In other words, $θ$ will be an element of the set $[0, 1]^{\mathbb{N} \times \mathbb{N}}$.)

Before going into the technical details, let me sketch how we'll use this "truth predicate". We'll imagine a world which allows probabilistic oracles for $L$ to be built; such an oracle takes two Gödel numbers, $\ulcorner φ \urcorner$ and $\ulcorner ψ \urcorner$, and returns "true" with probability $θ(\ulcorner φ \urcorner, \ulcorner ψ \urcorner)$. Thus, the "laws of physics" of this world are stochastic, and determine a probability distribution over outcomes (rather than a single outcome), which is parametrized by $θ(\cdot, \cdot)$.

Suppose that we have a utility function $u(\cdot)$ over these outcomes, which satisfies a certain continuity condition. Then, given two actions $a$ and $a'$ that an agent could take, the sentences $φ \equiv$ "$a$ leads to strictly higher expected utility than $a'$" and $ψ \equiv$ "$a'$ leads to strictly higher expected utility than $a$" will satisfy the conditions of the reflection principle, which means that $θ(\ulcorner φ \urcorner, \ulcorner ψ \urcorner)$ will be $0$ if $φ$ is true, and $1$ if $ψ$ is true. (I'm glossing over the details, but the $φ$ and $ψ$ I have in mind will be implemented using causal counterfactuals.)

We can now implement a perfect Bayesian (CDT) agent by building a machine which queries a probabilistic oracle on $(\ulcorner φ \urcorner, \ulcorner ψ \urcorner)$, and does $a'$ if that oracle returns "true", $a$ otherwise. There are three possible cases:

Either $a$ is strictly better than $a'$; then $φ$ is true, $θ(\ulcorner φ \urcorner, \ulcorner ψ \urcorner) = 0$, and the machine will take action $a$ with probability $1$.
Or $a'$ is strictly better than $a$; then $ψ$ is true, $θ(\ulcorner φ \urcorner, \ulcorner ψ \urcorner) = 1$, and the machine will definitely take action $a'$.
Or both actions lead to the same expected utility; then neither $φ$ nor $ψ$ is true, $θ(\ulcorner φ \urcorner, \ulcorner ψ \urcorner)$ is arbitrary, and the machine will take action $a'$ with probability $θ(\ulcorner φ \urcorner, \ulcorner ψ \urcorner)$, action $a$ otherwise.

In each case, the machine takes an action which maximizes expected utility. Diagonalization will be avoided by mixed strategies: If two such agents play a game of Matching Pennies against each other, both will choose heads or tails with 50% probability, making the other one indifferent between its options.
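The three cases, and the Matching Pennies claim, can be sketched in Python under strong simplifying assumptions. In the construction above, the agent never computes expected utilities itself; only the oracle's answer is consulted. Here, the expected utilities are passed in directly, and `oracle_probability` is a hypothetical stand-in for the value $θ(\ulcorner φ \urcorner, \ulcorner ψ \urcorner)$ that the reflection principle permits. For Matching Pennies, the sketch assumes win = 1, lose = 0, and takes $a$ = heads and $a'$ = tails for both agents. It then searches an illustrative grid of mixed strategies for pairs consistent with both oracles obeying the reflection principle.

```python
import random
from fractions import Fraction

def oracle_probability(eu_a, eu_a_prime, tie_value):
    """Hypothetical stand-in for θ(⌜φ⌝, ⌜ψ⌝), where φ says a is strictly better
    than a' and ψ says a' is strictly better than a.  tie_value plays the role
    of the value θ is free to take when neither sentence is true."""
    if eu_a > eu_a_prime:
        return Fraction(0)   # φ is true, so θ(⌜φ⌝, ⌜ψ⌝) = 0
    if eu_a_prime > eu_a:
        return Fraction(1)   # ψ is true, so θ(⌜φ⌝, ⌜ψ⌝) = 1
    return tie_value         # neither is true: θ is unconstrained

def cdt_agent(eu_a, eu_a_prime, tie_value=Fraction(1, 2), rng=random):
    """Query the probabilistic oracle once; do a' if it answers "true", else a."""
    theta = oracle_probability(eu_a, eu_a_prime, tie_value)
    return "a'" if rng.random() < theta else "a"

def reflection_allows(eu_a, eu_a_prime, prob_a_prime):
    """Could prob_a_prime be the oracle's value, given these expected utilities?"""
    return oracle_probability(eu_a, eu_a_prime, prob_a_prime) == prob_a_prime

def consistent(x, y):
    """Matching Pennies with a = heads, a' = tails for both agents.
    x = P(A plays tails), y = P(B plays tails); A wins on a match."""
    a_ok = reflection_allows(1 - y, y, x)   # A: EU(heads) = 1 - y, EU(tails) = y
    b_ok = reflection_allows(x, 1 - x, y)   # B: EU(heads) = x, EU(tails) = 1 - x
    return a_ok and b_ok

grid = [Fraction(i, 20) for i in range(21)]
print([(x, y) for x in grid for y in grid if consistent(x, y)])
# [(Fraction(1, 2), Fraction(1, 2))]
```

The grid is only illustrative. The same case analysis shows that for arbitrary real $x, y \in [0, 1]$, the only consistent pair is the fair coin for both agents.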

Isn't it a bit weird to call $θ$ a "predicate"? Well, here's one way to think about it. If $φ$ is a sentence in $L$ (the unextended language of set theory), then $φ$ and $\neg φ$ are obviously mutually exclusive, and also happen to satisfy the topological condition on the reflection principle; hence, $θ(\ulcorner \neg φ \urcorner, \ulcorner φ \urcorner) = 1$ if and only if $φ$ is true. Thus, $θ$ gives you a truth predicate for $L$. For sentences in $L_{θ}$, it can't exactly give you a truth predicate (because of Tarski's undefinability theorem), but the reflection principle for $θ$ is a way of coming close enough that we can model perfect Bayesian agents. (I'll say a bit more on this topic below.)

Formal set-up

Enough vague description; now it's time to fill in the mathematical details (at least of the truth predicate; I'll leave the application to Bayesian agents to future posts).

We'll work in ZFC + the existence of a strongly inaccessible cardinal as our metatheory. Fix a Grothendieck universe $U$; a Grothendieck universe is a set such that $U := (U, \in)$ is a "nice" model of set theory, and the existence of a Grothendieck universe is equivalent to the existence of a strongly inaccessible cardinal. One of the ways in which Grothendieck universes are "nice" is that normal mathematical notions such as the real numbers or the arithmetic operations (and most other things) are absolute for $U$, which means that when you interpret the definition of $\mathbb{R}$ inside the model $U$, you get the same set as when you interpret it in the class of all sets, and similarly for the set $(+) \in \mathbb{R}^{\mathbb{R} \times \mathbb{R}}$, and so on.

(You should probably be familiar with the notion of absoluteness and the basic results about it if you want to check the math in this post in detail. Sections IV.2--IV.5 of Kunen's set theory book give an introduction. The other main prerequisite is an understanding of the product topology.)

We call a function $θ \in [0, 1]^{\mathbb{N} \times \mathbb{N}}$ an assignment, and consider the space of all assignments to be endowed with the product topology. Then $(U, θ) = (U, \in, θ)$ is an $L_{θ}$-structure which is a model of ZFC (and, moreover, a model of ZFC with the axiom schemas ranging over $L_{θ}$). Thus, given an assignment $θ$ and a sentence $φ$ in the extended language, $(U, θ) \models φ$ denotes that the sentence $φ$ is true in this model. We define the extension $X_{φ}$ of $φ$ as the set $X_{φ} := \{θ \in [0, 1]^{\mathbb{N} \times \mathbb{N}} : (U, θ) \models φ\}$ of all assignments that make $φ$ true.

A query is a pair $(\ulcorner φ \urcorner, \ulcorner ψ \urcorner)$ of Gödel numbers such that $X_{φ}$ and $X_{ψ}$ are disjoint and open in the product topology. (This is the topological condition we're placing on our reflection principle.) We say that an assignment $θ'$ reflects an assignment $θ$ if for every query $(\ulcorner φ \urcorner, \ulcorner ψ \urcorner)$, we have $θ'(\ulcorner φ \urcorner, \ulcorner ψ \urcorner) = 0$ if $θ \in X_{φ}$, and $θ'(\ulcorner φ \urcorner, \ulcorner ψ \urcorner) = 1$ if $θ \in X_{ψ}$. Intuitively, this means that when we ask questions of a probabilistic oracle which behaves according to $θ'$, then this oracle gives answers that are true about $θ$.

An assignment is called reflective if it reflects itself; reflective assignments give rise to oracles whose answers are true about the oracle itself.
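A minimal self-referential example shows how reflective assignments escape the diagonal argument, and why the topological condition matters. Assume, via a diagonal lemma (an assumption not spelled out in the text), sentences $φ$ and $ψ$ with $φ$ equivalent to $θ(\ulcorner φ \urcorner, \ulcorner ψ \urcorner) > 1/2$ and $ψ$ equivalent to $θ(\ulcorner φ \urcorner, \ulcorner ψ \urcorner) < 1/2$. Both extensions depend only on the single coordinate $t = θ(\ulcorner φ \urcorner, \ulcorner ψ \urcorner)$, and they are open and disjoint, so the pair is a query. The Python sketch searches a grid of values for $t$ that satisfy the reflection condition at this one coordinate. The grid is only illustrative, since the same case analysis applies to every real $t \in [0, 1]$. The name `self_reflecting` is hypothetical.

```python
from fractions import Fraction
from typing import Callable

Pred = Callable[[Fraction], bool]

def self_reflecting(t: Fraction, in_phi: Pred, in_psi: Pred) -> bool:
    """Does t = θ(⌜φ⌝, ⌜ψ⌝) satisfy the reflection condition about itself?"""
    if in_phi(t) and t != 0:   # θ ∈ X_φ would force the value 0
        return False
    if in_psi(t) and t != 1:   # θ ∈ X_ψ would force the value 1
        return False
    return True

grid = [Fraction(i, 100) for i in range(101)]
half = Fraction(1, 2)

# Open, disjoint extensions: a genuine query.
open_fixed = [t for t in grid
              if self_reflecting(t, lambda s: s > half, lambda s: s < half)]
print(open_fixed)  # [Fraction(1, 2)]

# X_φ closed instead of open: not a query, and no value satisfies the condition.
closed_fixed = [t for t in grid
                if self_reflecting(t, lambda s: s >= half, lambda s: s < half)]
print(closed_fixed)  # []
```

Only $t = 1/2$ survives in the open case, at which point neither $φ$ nor $ψ$ is true. This mirrors the fair coin in Matching Pennies. Replacing $>$ by $\geq$ turns the liar-style pair back into a genuine paradox, which is exactly what the openness requirement rules out.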

A reflective assignment can be seen as a kind of truth predicate. Note, for example, that if $φ$ is any sentence in $L$, the unextended language of set theory, then $X_{φ}$ is either the whole space, $[0, 1]^{\mathbb{N} \times \mathbb{N}}$ (if $U \models φ$), or the empty set, $\emptyset$ (if $U \not\models φ$); thus, $X_{φ}$ and $X_{\neg φ}$ are open and disjoint. Hence, a reflective assignment $θ$ provides us with a truth predicate for $L$: We have $θ(\ulcorner \neg φ \urcorner, \ulcorner φ \urcorner) = 1$ iff $U \models φ$.

For sentences in the extended language, the information provided by $θ$ is more restricted. However, $θ$ can still be seen as a kind of truth predicate.

Suppose that $(\ulcorner φ \urcorner, \ulcorner ψ \urcorner)$ is a query, i.e., that $X_{φ}$ and $X_{ψ}$ are open and disjoint. Then $θ$ does not quite tell us the truth values of $φ$ and $ψ$: if $(U, θ) \models ψ$, then it is guaranteed that $θ(\ulcorner φ \urcorner, \ulcorner ψ \urcorner) = 1$, but this is only a sufficient condition; it is also possible that $θ(\ulcorner φ \urcorner, \ulcorner ψ \urcorner) = 1$ if neither $φ$ nor $ψ$ is true. However, we are guaranteed that if $θ \in X_{φ}$, then $θ(\ulcorner φ \urcorner, \ulcorner ψ \urcorner) = 0$, and if $θ \in X_{ψ}$, then $θ(\ulcorner φ \urcorner, \ulcorner ψ \urcorner) = 1$; hence, we have the following weaker reflection property: If $θ \in X_{φ}$, then by looking at $θ(\ulcorner φ \urcorner, \ulcorner ψ \urcorner)$ we can tell that $θ \notin X_{ψ}$, and if $θ \in X_{ψ}$, then we can tell that $θ \notin X_{φ}$. This weak reflection property may seem odd, but as we will see in later posts, it is exactly what we need in order to reproduce the notion of Nash equilibrium from classical game theory.

(Remark: It is easy to show that given any sentence $φ$ in the extended language, we can find sentences $φ^{\circ}$ and $\overline{φ}$ such that $X_{φ^{\circ}}$ and $X_{\overline{φ}}$ are the interior and closure of $X_{φ}$, respectively. Thus, for any sentence $φ$, the pair $(\ulcorner φ^{\circ} \urcorner, \ulcorner \neg \overline{φ} \urcorner)$ is a query, since the interior and the exterior (the complement of the closure) of $X_{φ}$ are open and disjoint. By the above argument, then, if $θ$ lies in the interior of $X_{φ}$, looking at $θ(\ulcorner φ^{\circ} \urcorner, \ulcorner \neg \overline{φ} \urcorner)$ will tell us that it does not lie in the exterior, and if it lies in the exterior, looking at $θ(\ulcorner φ^{\circ} \urcorner, \ulcorner \neg \overline{φ} \urcorner)$ will tell us that it does not lie in the interior.)
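As a concrete instance of the remark (an illustrative choice of sentence, not one used elsewhere in this post), fix a pair $(m, n)$ and let $φ$ be the sentence $θ(m, n) \geq 1/2$. Because the coordinate projection $θ \mapsto θ(m, n)$ is continuous, open, and surjective, interiors and closures of such sets can be computed in that single coordinate:

$$X_{φ} = \{θ : θ(m, n) \geq \tfrac{1}{2}\}, \qquad X_{φ^{\circ}} = \{θ : θ(m, n) > \tfrac{1}{2}\}, \qquad X_{\neg \overline{φ}} = \{θ : θ(m, n) < \tfrac{1}{2}\}.$$

Here $X_{φ}$ is already closed, so $\overline{φ}$ can be taken to be $φ$ itself. The query $(\ulcorner φ^{\circ} \urcorner, \ulcorner \neg \overline{φ} \urcorner)$ thus asks whether $θ(m, n)$ lies strictly above or strictly below $1/2$, and places no constraint when $θ(m, n) = 1/2$ exactly.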

We now want to show that reflective assignments exist. However, in a future post, we will want to show that for any Nash equilibrium of a finite game, there is a reflective assignment that gives rise to this Nash equilibrium; for this, it will be helpful to have a slightly stronger result available, which shows that there are reflective assignments satisfying certain constraints. (The result about Nash equilibria, incidentally, is also the reason why we need to make $θ$ a two-parameter function, instead of a one-parameter function $θ(\ulcorner φ \urcorner)$ which behaves like our $θ(\ulcorner φ^{\circ} \urcorner, \ulcorner \neg \overline{φ} \urcorner)$.)

These constraints will take the form of a partial function $π : \mathbb{N} \times \mathbb{N} \nrightarrow [0, 1]$, which specifies the values $θ$ should take on a certain set $\mathrm{dom}(π) \subseteq \mathbb{N} \times \mathbb{N}$. We call a function of this type a partial assignment, and say that $θ$ extends $π$ if $θ(m, n) = π(m, n)$ for all $(m, n) \in \mathrm{dom}(π)$. We call a partial assignment $π$ reflective if for every assignment $θ$ extending $π$, there is an assignment $θ'$, also extending $π$, which reflects $θ$.

An existence theorem

With these preliminaries, we can state our existence result for reflective assignments:

Theorem.

(i) There is a reflective assignment $θ : \mathbb{N} \times \mathbb{N} \to [0, 1]$.
(ii) For every reflective partial assignment $π : \mathbb{N} \times \mathbb{N} \nrightarrow [0, 1]$, there is a reflective assignment $θ$ which extends $π$.

Proof. We begin by showing that the empty partial assignment $π_{\emptyset}$ (i.e., the partial assignment satisfying $\mathrm{dom}(π_{\emptyset}) = \emptyset$) is reflective, since (i) then follows from (ii). Thus, let $θ$ be any assignment (since every assignment extends $π_{\emptyset}$), and define $θ' : \mathbb{N} \times \mathbb{N} \to [0, 1]$ as follows: For any query $(\ulcorner φ \urcorner, \ulcorner ψ \urcorner)$ such that $θ \in X_{ψ}$, set $θ'(\ulcorner φ \urcorner, \ulcorner ψ \urcorner) = 1$; for all other pairs $(m, n)$, set $θ'(m, n) = 0$. Then, $θ'$ clearly both reflects $θ$ (if $θ \in X_{φ}$, then $θ \notin X_{ψ}$ by disjointness, so $θ'(\ulcorner φ \urcorner, \ulcorner ψ \urcorner) = 0$) and (trivially) extends $π_{\emptyset}$. This finishes the proof that $π_{\emptyset}$ is a reflective partial assignment.
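The construction of $θ'$ in this step is concrete enough to sketch in Python on a finite toy. The sketch assumes finitely many named coordinates and represents membership in $X_{φ}$ and $X_{ψ}$ by Python predicates. It cannot check openness or capture infinitely many Gödel numbers, and the names `reflect`, `reflects`, `q1` and `q2` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Toy stand-ins (assumptions): an assignment has finitely many named coordinates,
# and each query is (coordinate standing for (⌜φ⌝, ⌜ψ⌝), "θ ∈ X_φ?", "θ ∈ X_ψ?").
Assignment = Dict[str, float]
Member = Callable[[Assignment], bool]
Query = Tuple[str, Member, Member]

def reflect(theta: Assignment, queries: List[Query]) -> Assignment:
    """θ' from the proof: 1 on queries with θ ∈ X_ψ, and 0 on every other coordinate."""
    theta_prime = {key: 0.0 for key in theta}
    for key, _in_phi, in_psi in queries:
        if in_psi(theta):
            theta_prime[key] = 1.0
    return theta_prime

def reflects(theta_prime: Assignment, theta: Assignment, queries: List[Query]) -> bool:
    """Check the definition: value 0 wherever θ ∈ X_φ, value 1 wherever θ ∈ X_ψ."""
    for key, in_phi, in_psi in queries:
        if in_phi(theta) and theta_prime[key] != 0.0:
            return False
        if in_psi(theta) and theta_prime[key] != 1.0:
            return False
    return True

queries: List[Query] = [
    ("q1", lambda t: t["q1"] > 0.5, lambda t: t["q1"] < 0.5),
    ("q2", lambda t: t["q2"] > t["q1"], lambda t: t["q2"] < t["q1"]),
]
theta = {"q1": 0.2, "q2": 0.9, "other": 0.4}
theta_prime = reflect(theta, queries)
print(theta_prime)                                  # {'q1': 1.0, 'q2': 0.0, 'other': 0.0}
print(reflects(theta_prime, theta, queries))        # True
print(reflects(theta_prime, theta_prime, queries))  # False: θ' is not itself reflective
```

The last line illustrates why the rest of the proof is needed: the $θ'$ built this way reflects $θ$ but generally not itself, so a fixed point argument is required to obtain a reflective assignment.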

It remains to show (ii). For this, we use the infinite-dimensional generalization of Kakutani’s fixed point theorem:

Suppose that $A$ is a non-empty, convex and compact subset of a locally convex Hausdorff topological vector space. Suppose further that $f : A \to \mathrm{Pow}(A)$ is a set-valued function such that $f(x)$ is non-empty, convex and compact for all $x \in A$, and such that the graph of $f$, $\{(x, y) : x \in A, y \in f(x)\}$, is a closed set. Then $f$ has a fixed point; that is, there is an $x \in A$ such that $x \in f(x)$.

We let $A$ be the set of all assignments extending $π$; then $A \subseteq \mathbb{R}^{\mathbb{N} \times \mathbb{N}}$, which is a locally convex Hausdorff topological vector space when endowed with the product topology (since this is true of any power of $\mathbb{R}$). $A$ is clearly non-empty, convex, and closed, and it is a subset of $[0, 1]^{\mathbb{N} \times \mathbb{N}}$, which is compact by Tychonoff's theorem; as a closed subset of a compact set, $A$ is therefore compact.

We choose $f(θ)$ to consist of all assignments $θ'$ that reflect $θ$ and extend $π$. By the assumption that $π$ is reflective, $f(θ)$ is non-empty for every $θ \in A$. Moreover, it is easy to see that $f(θ)$ is both closed and convex, since each condition in the definitions of "reflects $θ$" and "extends $π$" fixes the value of a single coordinate; being a closed subset of the compact set $A$, it is also compact. Hence, if we can also show that $f$ has closed graph, then by the fixed point theorem, there is a $θ \in A$ such that $θ \in f(θ)$, which is exactly what we want to show.

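Since $\mathbb{N} \times \mathbb{N}$ is countable, $[0, 1]^{\mathbb{N} \times \mathbb{N}}$ is metrizable, so it suffices to check that the graph is closed using sequences. Thus, assume that we have sequences $θ_{n}$ and $θ'_{n}$ of assignments extending $π$ such that $θ'_{n} \in f(θ_{n})$ for every $n \in \mathbb{N}$, and suppose that these sequences converge to limits $θ, θ' \in A$; then we need to show that $θ' \in f(θ)$, i.e., that $θ'$ reflects $θ$. To see this, we must show that for every query $(\ulcorner φ \urcorner, \ulcorner ψ \urcorner)$, $θ \in X_{φ}$ implies $θ'(\ulcorner φ \urcorner, \ulcorner ψ \urcorner) = 0$, and $θ \in X_{ψ}$ implies $θ'(\ulcorner φ \urcorner, \ulcorner ψ \urcorner) = 1$.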

Without loss of generality, assume that $θ \in X_{ψ}$ (the case $θ \in X_{φ}$ is symmetric). Since this is an open set and $θ_{n}$ converges to $θ$, it follows that there is some $n_{0}$ such that $θ_{n} \in X_{ψ}$ for all $n \geq n_{0}$, whence $θ'_{n}(\ulcorner φ \urcorner, \ulcorner ψ \urcorner) = 1$. But since $θ'_{n}$ converges to $θ'$, and convergence in the product topology is pointwise, this implies $θ'(\ulcorner φ \urcorner, \ulcorner ψ \urcorner) = 1$, as desired. □
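To see exactly where openness is used, consider an illustrative pair chosen for this example, with $π$ empty. Fix a coordinate $(m, n)$ and let $φ \equiv θ(m, n) < 1/2$ and $ψ \equiv θ(m, n) \geq 1/2$, so that $X_{ψ}$ is closed but not open. Suppose we imposed the reflection condition on this pair even though it is not a query. Take any $θ$ with $θ(m, n) = 1/2$, and let $θ_k$ agree with $θ$ everywhere except $θ_k(m, n) = 1/2 - 1/(k+2)$, so that $θ_k \to θ$. Then

$$θ_k \in X_{φ} \text{ for all } k, \qquad θ \in X_{ψ}, \qquad θ'_k(\ulcorner φ \urcorner, \ulcorner ψ \urcorner) = 0 \text{ for every } θ'_k \in f(θ_k),$$

so any limit $θ'$ of such $θ'_k$ has $θ'(\ulcorner φ \urcorner, \ulcorner ψ \urcorner) = 0$. It therefore fails to reflect $θ$, which would require the value $1$, and the graph of $f$ would not be closed.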
