Finding reflective oracle distributions using a Kakutani map

jessicata

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

(follow-up to: A correlated analogue to reflective oracles)

Motivation

Suppose players are playing in a correlated equilibrium using a reflective oracle distribution. How does the equilibrium they play vary as a function of the parameters of the game, or of the players' policies? It turns out that the set of equilibria is a Kakutani map of the parameters of the game. This is a lot like it being a continuous map.

This might make it possible for players to reason about the effects of their policy on the equilibrium that they play (since the equilibrium is now a Kakutani map of the players' policies).

Definitions

Let $k$ be some natural number. We will consider reflective oracle distributions whose queries are parameterized by some vector $θ \in \mathbb{R}^{k}$.

To do this, let the Turing machines $M$, instead of outputting a raw query, output a continuous function from $θ \in \mathbb{R}^{k}$ to the query. (The details of representing continuous functions don't seem that important.) The reflectivity condition on oracle distributions is now relative to $θ$ (since the queries depend on $θ$).

Define the map

$\mathrm{ParamsToDistrs}(θ) := \{D \in \mathcal{D} \mid D \text{ is reflective relative to } θ\}$

which maps the parameters $θ$ to the set of reflective oracle distributions for $θ$.

Theorem: $\mathrm{ParamsToDistrs}$ is a Kakutani map.

Proof:

From the previous post, we have:

For each $θ$, $\mathrm{ParamsToDistrs}(θ)$ is nonempty (Theorem 1)
For each $θ$, $\mathrm{ParamsToDistrs}(θ)$ is convex (Theorem 2)

So it is sufficient to show that $\mathrm{ParamsToDistrs}$ has a closed graph.

Let $θ_{1}, θ_{2}, \ldots$ be an infinite sequence of $\mathbb{R}^{k}$ values converging to $θ_{\infty}$. Let $D_{1}, D_{2}, \ldots$ be an infinite sequence of oracle distributions in $\mathcal{D}$ converging to $D_{\infty}$. Assume that for each natural $n$, $D_{n}$ is reflective relative to $θ_{n}$. We will show that $D_{\infty}$ is reflective relative to $θ_{\infty}$; this is sufficient to show that $\mathrm{ParamsToDistrs}$ has a closed graph.

Let $M \in \mathcal{M}$. Let $a$ be such that $D_{\infty}(O(M) = a) > 0$. Then $D_{\infty}(O(M) = a) > ϵ$ for some $ϵ > 0$. By convergence, there is some $N$ such that for all $n \geq N$, $D_{n}(O(M) = a) > ϵ$. By reflectivity of each $D_{n}$ relative to $θ_{n}$, we have, for each $n \geq N$,

$a \in \mathrm{Eval}(M)(θ_{n})(\mathrm{Condition}(D_{n}, M, a))$

Let $q^{θ} := \mathrm{Eval}(M)(θ)$. Rewriting the above statement:

$a \in \arg\max_{i \in \{1, \ldots, l(q)\}} E_{D_{n}}[q_{i}^{θ_{n}}(O)]$

Consider the set $\{(θ, D) \mid a \in \arg\max_{i \in \{1, \ldots, l(q)\}} E_{D}[q_{i}^{θ}(O)]\}$. This is the intersection of a finite number of sets of the form $\{(θ, D) \mid E_{D}[q_{a}^{θ}(O) - q_{i}^{θ}(O)] \geq 0\}$. Each of these sets is closed because $(θ, D) \mapsto E_{D}[q_{a}^{θ}(O) - q_{i}^{θ}(O)]$ is continuous. Therefore the set $\{(θ, D) \mid a \in \arg\max_{i \in \{1, \ldots, l(q)\}} E_{D}[q_{i}^{θ}(O)]\}$ is closed.
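The closedness step can be checked on a toy instance. The following is only a sketch under made-up assumptions: two outcomes, two query options, and hypothetical payoff functions `q[0]` and `q[1]` that are continuous in $θ$ (none of these come from the post). It uses exact fractions to show a sequence inside the set whose limit sits exactly on the boundary, where the gap is zero, and is still inside because the inequality is weak.

```python
from fractions import Fraction

# Hypothetical toy setup: outcomes o in {0, 1}, query options i in {0, 1}.
# q[i](theta, o) stands in for q_i^theta(O); both are continuous in theta.
q = [
    lambda theta, o: theta * o,  # option 0
    lambda theta, o: 1 - o,      # option 1
]

def expected(i, theta, dist):
    """E_D[q_i^theta(O)] for a distribution given as [P(o=0), P(o=1)]."""
    return sum(p * q[i](theta, o) for o, p in enumerate(dist))

def gap(a, i, theta, dist):
    """E_D[q_a^theta(O) - q_i^theta(O)], continuous in (theta, dist)."""
    return expected(a, theta, dist) - expected(i, theta, dist)

def in_argmax_set(a, theta, dist):
    """True iff a is in the argmax, i.e. every gap against a is >= 0."""
    return all(gap(a, i, theta, dist) >= 0 for i in range(len(q)))

half = Fraction(1, 2)
# theta_n = 1 + 1/n converges to 1; D_n is held fixed at the uniform distribution.
sequence = [(1 + Fraction(1, n), [half, half]) for n in range(1, 50)]
limit = (Fraction(1), [half, half])

all_members = all(in_argmax_set(0, th, d) for th, d in sequence)
limit_member = in_argmax_set(0, *limit)
limit_gap = gap(0, 1, *limit)
print(all_members, limit_member, limit_gap)  # True True 0
```

Had the set been defined with a strict inequality, the same sequence would lie inside it while the limit would not, which is why the weak inequality matters for closedness.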

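Since $(θ_{n}, D_{n})$ lies in this closed set for every $n \geq N$ and $(θ_{n}, D_{n})$ converges to $(θ_{\infty}, D_{\infty})$, the limit lies in the set as well. That is, $a \in \arg\max_{i \in \{1, \ldots, l(q)\}} E_{D_{\infty}}[q_{i}^{θ_{\infty}}(O)]$, as desired.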

$□$

An immediate consequence of this theorem is that the set of correlated equilibria of a normal-form game is a Kakutani map of the parameters of the game.
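To make the closed-graph part of this concrete, here is a small sketch that checks the standard obedience constraints of a correlated equilibrium in a 2x2 game with one payoff parameter. The game (a Chicken variant), the payoff numbers, and the names `chicken` and `is_correlated_eq` are all hypothetical and chosen for illustration. For the fixed distribution `mu` below, the constraints work out to $5 \leq θ \leq 7$, so a sequence $θ_n \to 5$ of games where `mu` is an equilibrium has a limit game where it still is one.

```python
from fractions import Fraction
from itertools import product

def chicken(theta):
    """Hypothetical symmetric 2x2 game; action 0 = dare, 1 = chicken.
    u[i][a1][a2] is player i's payoff; theta is the payoff when both chicken out."""
    u1 = [[Fraction(0), Fraction(7)], [Fraction(2), theta]]
    u2 = [[u1[b][a] for b in range(2)] for a in range(2)]  # u2[a][b] = u1[b][a]
    return [u1, u2]

def is_correlated_eq(u, mu):
    """Check the obedience constraints: for each player, recommended action r,
    and deviation d, the mu-weighted gain from obeying r instead of d is >= 0."""
    for player, r, d in product(range(2), repeat=3):
        gain = Fraction(0)
        for other in range(2):
            # Joint profiles are (player 0's action, player 1's action).
            obey = (r, other) if player == 0 else (other, r)
            dev = (d, other) if player == 0 else (other, d)
            gain += mu[obey[0]][obey[1]] * (
                u[player][obey[0]][obey[1]] - u[player][dev[0]][dev[1]]
            )
        if gain < 0:
            return False
    return True

third = Fraction(1, 3)
mu = [[Fraction(0), third], [third, third]]  # never recommends (dare, dare)

along_sequence = all(
    is_correlated_eq(chicken(5 + Fraction(1, n)), mu) for n in range(1, 50)
)
at_limit = is_correlated_eq(chicken(Fraction(5)), mu)
below = is_correlated_eq(chicken(Fraction(4)), mu)
print(along_sequence, at_limit, below)  # True True False
```

Each constraint is a weak inequality that is linear in `mu` and continuous in the payoffs, which is the same shape as the sets used in the proof.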
