Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

The long-awaited continuation of this sequence.

Credit to Sam, Tsvi, Nisan, Scott, me, and Giles (who helped make the figures)

It may or may not become a paper, but if it doesn't, it's best for it to not be locked in a folder of drafts. This will also recap material from previous posts and papers.


Refresher on Reflective Oracles

Let denote an arbitrary probabilistic Turing machine with oracle access, and refer to the same machine equipped with the oracle . It is important to remember that refers to a randomized algorithm, while refers to the algorithm equipped with a particular oracle, so it doesn't make sense to ask what the output of is, only what the output of is.

Reflective oracles as originally defined here are oracles which accurately answer queries about whether a probabilistic Turing machine with access to the same oracle, , output a 1 with probability greater than or less than , by outputting 1 for "yes", 0 for "no", and randomizing between the two if the probability of outputting 1 is exactly . This is more expressive than it may seem at first, because it is possible to query the oracle about a probabilistic oracle machine that runs and checks whether the output has some arbitrary computable property.

It is also possible to use reflective oracles to make the probability distribution over the output of one algorithm vary continuously with respect to the probability distribution over the output of another algorithm, via repeated oracle calls and randomization, as demonstrated here.

Reflective oracles can be thought of as akin to halting oracles, except that they accurately answer questions about the class of probabilistic Turing machines with access to the same oracle, which halting oracles cannot do. If the diagonalization argument against the existence of a halting oracle is applied to a reflective oracle, where an algorithm invokes the oracle on itself, and outputs the bit which it is the least likely to output, a reflective oracle will randomize between outputting 0 and 1, but for any , if the oracle is asked whether the algorithm outputs 1 with greater than probability , it will respond "yes".

This ability to accurately answer questions about themselves makes reflective oracles well-suited for studying the unbounded computing power limit of game theory, and situations where an agent reasons about a surrounding environment which contains itself. In particular, if a set of agents in a game all use a reflective oracle to perfectly predict the behavior of all other agents in the game, and then select the optimal action, a Nash equilibrium will be played in the game. This will be called the shortsighted best response algorithm.

To begin, there is a canonical way to translate any game theory problem into the setting of probabilistic oracle machines. As a concrete example, take a game of Prisoner's Dilemma, with an exchange of source code between agents. Let and refer to the source code of the two players in a Prisoner's Dilemma game. Then the game may be thought of as the two probabilistic oracle machines and calling the reflective oracle on each other, and outputting 0 for "defect" and 1 for "cooperate". In general, any game can be expressed by expressing all players as a pair of two algorithms, one of which outputs a strategy, and one of which takes no input and computes the utility via oracle calls. By fixing a suitable input for the algorithm which generates the strategy, which represents the information a player has, all strategy algorithms become inputless probabilistic oracle machines which make oracle calls to other players.

More formally, let a player be a pair of a probabilistic oracle machine , and a probabilistic oracle machine which halts with probability 1 regardless of the choice of oracle, and outputs 0 or 1. is interpreted as the algorithm which produces the strategy of the player, and is interpreted as the utility function. Fixing an oracle , the probability of outputting 1 is the utility of the player .

There are many different variants of a reflective oracle, which are essentially equivalent. There is the standard single-bit reflective oracle, presented here but multi-bit reflective oracles may also be defined. Here, I used the semimeasure formulation of reflective oracles, which accurately answers any question about prefixes of a Turing machine output, and assume that the outputs of the Turing machine form a semimeasure over the set of finite bitstrings.

Instead of the queries consisting of a pair, they consist of a triple, asking about the probability of the component of halting with a bitstring that has as a prefix. Let be the probability that halts with a bitstring that has as a prefix, and let be the probability that halts with a bitstring that doesn't have as a prefix. Let be the probability of the oracle returning a 1 in response to the query . Then one of the defining properties of a reflective oracle is that, for all bitstrings , rational probabilities , and players :

If a probabilistic oracle machine halts with a certain probability distribution, it will be accurately captured by this variant of reflective oracle, but nonhalting probability mass may be assigned to an arbitrary distribution over bitstrings, as long as it makes a semimeasure.

Cooperative Oracles:

In the particular case where the two agents in a Prisoner's Dilemma both use shortsighted best response, the set of fixed points is the set of Nash equilibria, so they will mutually defect.

However, different algorithms may be used by the players. As long as each player's set of responses to everyone else's strategies is nonempty, convex, and has closed graph, Kakutani's fixed point theorem guarantees the existence of an "equilibrium set" where all players respond to knowledge of all other player strategies with the exact strategy that all other players predicted. This is the same as the fair set from here.

If we consider two agents in the Prisoner's Dilemma implementing the algorithm "cooperate with the same probability as my opponent cooperates", there is a continuum of reflective oracles for all probabilities of cooperation in . Intuitively, mutual cooperation should be selected. This splits the problem of formalizing Functional Decision Theory in the unbounded case into the problem of finding an algorithm for the agent to use, and the problem of finding an oracle which causes a Pareto optimal element of the equilibrium set to be played.

A naive implementation, where we select a reflective oracle that is Pareto optimal for all players, fails. This is because all utility functions have an opposing one, which makes all outcomes Pareto optimal, because there is no outcome that makes one player better off without making another player worse off. In the Prisoner's Dilemma, if there is a third player, a jailer who wants to maximize jail time, but cannot otherwise influence the outcome of the game, all outcomes are Pareto optimal, because there is no choice which makes a prisoner better off that doesn't also leave the jailer worse off, and vice-versa. Therefore, the optimality notion must somehow restrict to the set of players who are actually participating in a game, instead of the set of all players.

To clarify what it means to talk about the set of players who participate in a game, we can look at the structure of which players make oracle calls to other players. If there is a game with finitely many players, the utility functions of all players should make oracle calls to the output of all players, to compute a utility.

Let the dependency set of player be the set of all players that either the strategy algorithm or the utility function of may query the oracle about, as well as itself. Let the relation be the minimal relation such that if is in 's dependency set, . The transitive closure of this relation, , induces a poset over equivalence classes of players.

Minimal elements in this poset correspond to sets of players whose strategies and utility functions only depend on each other. Players in a self-contained game have this property. A non-minimal equivalence class in this poset corresponds to a game where the utility functions of some participants are affected by what happens in a different game they cannot influence. Due to these properties, is a natural choice to define the boundaries of a game, and to formalize the desideratum that our notion of optimality should not be Pareto optimal for spectators.

Now, the notion of stratified Pareto optimality may be defined. Let be the utility of the player , upon being equipped with the oracle . A reflective oracle is a stratified Pareto improvement on another reflective oracle , for a player , iff:

(1):

(2):

(3):

The first condition is that a stratified Pareto improvement for must make better off.

The second condition is that all players that depends on, even indirectly, must be unaffected or better off. In the case of a finite and minimal equivalence class of players under, this says that all other players in the game must be unaffected or better off, so all stratified Pareto improvements are Pareto improvements for this game. In the case of the jailer in the Prisoner's Dilemma, it says that both prisoners must receive the same or greater utility, because the jailer's utility function makes an oracle query about their output.

The third condition is that a change in oracles should only affect players whose strategy algorithm or utility function depends on the outcome of a game, and leave all other players unaffected. This condition is used to forbid cases where selecting a better oracle in a game of Stag Hunt makes the players in a totally unrelated game of Chicken worse off, because the same oracle is used by all players.

A point is stratified Pareto optimal (SPO) for a set of players if there are no stratified Pareto improvements for any players in . In the particular case of a Prisoner's Dilemma with a jailer, a stratified Pareto optimum will involve both prisoners selecting a Pareto optimal outcome in their equilibrium set, and ignoring the jailer.

However, even stratified Pareto optima aren't enough in all cases, by the case of the chain of infinitely many players detailed here

Let an almost stratified Pareto optimum for a set of players be a reflective oracle that is in the closure of the set of stratified Pareto optimal oracles for all players in . In the case of the infinitely descending chain of players, an almost stratified Pareto optimal reflective oracle would report that all players had probability 0 of outputting 1, because that is the limit of a sequence of SPO oracles for a player, which move the "first player which outputs 1" arbitrarily far back in the sequence.

Theorem 1 (weak form): There exists a reflective oracle that is almost stratified Pareto optimal for all players.

A proof sketch for Theorem 1 is as follows:

Given an arbitrary finite set of players, , there is a finite poset of equivalence classes of players in under . Given some initial set of reflective oracles, it is possible to construct a subset that is Pareto optimal for all players in a minimal equivalence class , and a Pareto optimal subset of that for a minimal equivalence class in the poset with removed from it, and so on. This process repeats until there is a set of SPO reflective oracles for all players in . Now that existence has been shown, take the closure of the set of SPO reflective oracles for . The resulting set is a compact set of ASPO reflective oracles for . This argument works for all finite . Because there is a compact and nonempty set of ASPO reflective oracles for all finite sets of players, by compactness, there must be a compact and nonempty set of ASPO reflective oracles for all players.

This is pretty much the old proof of "there's an ASPO point for all players", the only difference is that this proof takes place in reflective-oracle space, instead of the space of the actions of all players. Let a weak cooperative oracle be a reflective oracle with this property.

Let a well-founded equivalence class be an equivalence class of players under composed of finitely many players, such that players within only make oracle calls to other well-founded equivalence classes. Minimal equivalence classes under , of finitely many players, are well-founded. It is possible to strengthen Theorem 1 to:

Theorem 1: There exists a reflective oracle that is ASPO for all players, and SPO for all players in a well-founded equivalence class.

This is proved by constructing a suitable set of reflective oracles that are SPO for all players in a well-founded equivalence class, and then using this as the initial set in the previous proof sketch. The resulting set of ASPO reflective oracles is a subset, and inherits this property.

An oracle of this type is called a strong cooperative oracle. Almost all games of interest in game theory involve finitely many players playing against each other. This is the condition for a minimal finite equivalence class under , all of which are well-founded. Therefore, a strong cooperative oracle is an SPO oracle for any reasonable game that doesn't have non-well-founded chains of infinitely many players.

However, the proof of the existence of a cooperative oracle involves the infinite-dimensional space of reflective oracles, so it is not immediately linked to practical game theory. By imposing several natural conditions, it is possible to link the two settings. (Also, it is unclear whether the final condition of strategy continuity is necessary).

Theorem 2: In a game of finitely many players, which all have finitely many possible outputs, have oracle access to each other and no other players, halt with probability 1 given oracle access to the other players, and have continuous strategies w.r.t everyone's probability distribution over outputs, all points in the equilibrium set have reflective oracles which result in that point being played in the game.

Corollary 2: If the conditions of Theorem 2 hold, then for any point on the Pareto frontier of the equilibrium set, there is a strong cooperative oracle which results in that point being played in the game.

Therefore, the equilibrium selection problem is solved by the use of a special type of reflective oracle which results in Pareto optimal outcomes within the equilibrium set whenever it is used, aka a strong cooperative oracle.

However, there is still an open question about which algorithms result in something "FDT-like" when used with a cooperative oracle. To make partial progress towards a result, we will now look at the Ultimatum Game. In the Ultimatum Game, there is a pool of money, such as 100 dollars, and player A specifies a split of money between player A and player B, and then player B can either choose to accept the offer, or reject it, in which case both players get 0 dollars.

Imagine player B is implementing the strategy "I will reject the money unless I get 90 dollars or more". If player A would offer a 10/90 split in response to this strategy, and this is known to player B, this provides an incentive for player B to implement this extortionate strategy. Similarly, by viewing the game from the perspective of player B, accepting a split of money where 90 dollars goes to player A incentivises player A to offer a 90/10 split.

Additionally, if the strategies of player A and player B act to acquire 90 dollars, by proposing a 90/10 split, and rejecting all offers short of a 10/90 split, the game outcome will be A offering a 90/10 split, and B rejecting it, so both players walk away with no money. Extortionate strategies fail when they go against each other.

Is there a strategy that does not incentivize extortion? There is, in a certain sense, as long as there is some notion of a "fair outcome", such as the Nash bargaining solution. As an example, in the Ultimatum Game, if a 50/50 split is considered to be a fair outcome by player B, they could implement the strategy "if the money I am offered, , is less than 50 dollars, reject the offer with a probability, where ". Against this strategy, the optimal offer is a 50/50 split of money, because in expectation, money is lost by offering any split that is more self-favoring than 50/50. Each marginal dollar that goes towards player A in the initial split is compensated by an increase in the probability of player B rejecting the offer, making the expected value of taking one more dollar from the split zero or negative. Therefore, extortion beyond the "fair outcome" is disincentivized.

Another advantage of this strategy is that it is robust against different agents having different notions of a "fair outcome". If player A thinks that a 60/40 split is a fair outcome, and player B thinks that a 50/50 split is a fair outcome, and , then player B will reject the offer with a probability, leading to an expected gain of 50 dollars for player A and dollars for player B, instead of a guaranteed rejection of the offer.

Red is the strategy of player A, blue is the strategy of player B, the display is over utility-space, player B is implementing a coercion-proof strategy.

However, this class of strategies does not have an elegant way of a-priori selecting the "fair point" that it will attempt to enforce. Intuitively, in the Ultimatum Game, there's a tradeoff where enforcing a strict demand, like an 30/70 split, means that there are more possible opponents which will reject the offer, while enforcing a weak demand, like a 70/30 split, means that there are more possible opponents which will exploit this. The Nash bargaining solution is a candidate "fair point", but it is unknown whether there is an algorithm that would naturally attain it under appropriate circumstances.

The insight of "be increasingly vengeful when the opposing players cause you to lose utility" will likely be a component of the true solution, and generalizes to other games such as the Prisoner's Dilemma and Chicken, but it is not a complete account, as will be shown in the next section. Humans also seem to act according to this strategy to some degree, instead of following the Homo Economicus strategy of accepting any offer greater than 0 dollars.

On the Prisoner's Dilemma, a natural choice for an extortion-proof strategy is "cooperate with the same probability that the opponent cooperates". This leads to mutual cooperation against a copy of itself, when equipped with a cooperative oracle, as seen in Figure 2.

Prisoner's Dilemma, mutual cooperation is the Pareto-optimal point in the equilibrium set where the strategies intersect.

If the probability of player A cooperating is , then if player B implements the strategy "cooperate with probability for some ", mutual defection will occur, as that is the only fixed point. This appears suboptimal, but permitting exploitation, even in a mild form, incentivizes opponents to modify their strategy to an exploiting strategy. If Player B uses shortsighted best response, then mutual defection will occur. This is because player B will always play defect because it is a dominant strategy, and player A will play defect as retaliation. If Player B always cooperates, then Player A will cooperate as well. This shows that implementing an extortion-proof strategy is not enough. A satisfactory strategy should lead to defection against a player which cooperates regardless of opponent.

A similar strategy applies to the game of Chicken. The strategy is "play Straight with the same probability that the opponent plays Straight, unless the probability of them playing Swerve is greater than , in which case, play Straight." This leads to crashing into an opponent which goes straight, which incentivizes the opponent to avoid strategies such as tossing their steering wheel out the window. If two agents of this type play against each other, then by Figure 3, they both get expected utility with a probability of swerving. This is a Pareto improvement on the mixed-strategy Nash equilibrium where both players get 2 expected utility with a probability of swerving. If this player plays against a player B using shortsighted best-response, a cooperative oracle will result in the outcome where player A goes straight and player B swerves, for an expected utility of 3 and 2, respectively.

The game of chicken.

Consider the cooperation game where player A has a preference for going to a movie, and player B has a preference for going home to play a board game. They also have a preference for going to the same event, and going to a movie leads to nonzero utility even if the moviegoer is alone.

If both players implement the strategy "consult the oracle, and go to whichever event the other player has a higher probability of going to, with a rapid change of strategy around 50 percent", then the Pareto frontier of the equilibrium set consists of the two points where both players coordinate on an action. The choice of cooperative oracle determines which action is selected, and coordination occurs because both players are using the same oracle. This is similar to the choice of reflective oracle determining which Nash equilibrium is played in a game.

A cooperation game. Of the 3 points in the equilibrium set, the choice of cooperative oracle ensures that the outcome is coordination on an activity.

By looking at these examples, there are two further directions for designing a strategy which effectively makes use of a cooperative oracle. The first direction is developing a general criteria which allows a player to exploit some exploitable opponents. In the Prisoner's Dilemma, against an opponent which cooperates with 10 percent greater probability than you cooperate, the desired action is to cooperate with 90 percent probability. The second direction for improvement is finding a single decision rule, like for Nash equilibria, that produces an exploitation-proof strategy incentivizing opponents to play at a Pareto optimal point, instead of customizing a strategy of that type for each game separately.

(Proofs were omitted because they're quite long. I can post them if there is interest.)

New to LessWrong?

New Comment
9 comments, sorted by Click to highlight new comments since: Today at 11:19 AM

How did you create your visualizations of game payoffs? I really like the way they show the relevant details of payoffs, equilibrium points, and the strategies of the players.

Giles Edkins coded up a thing which lets you plug in numbers for a 2-player, 2-move game payoff matrix and it automatically displays possible outcomes in utility-space. It may be found here. The equilibrium points and strategy lines were added later in MS Paint.

The definition of stratified Pareto improvement doesn't seem right to me. You are trying to solve the problem that there are too many Pareto optimal outcomes. So, you need to make the notion of Pareto improvement weaker. That is, you want more changes to count as Pareto improvements so that less outcomes count as Pareto optimal. However, the definition you gave is strictly stronger than the usual definition of Pareto improvement, not strictly weaker (because condition 3 has equality instead of inequality). What it seems like you need is dropping condition 3 entirely.

The definition of almost stratified Pareto optimum also doesn't make sense to me. What problem are you trying to solve? The closure of a set can only be larger than the set. Also, the closure of an empty set is empty. So, on the one hand, any stratified Pareto optimum is in particular an almost stratified Pareto optimum. On the other hand, if there exists an almost stratified Pareto optimum, then there exists a stratified Pareto optimum. So, you neither refine the definition of an optimum nor make existence easier.

It actually is a weakening. Because all changes can be interpreted as making some player worse off if we just use standard Pareto optimality, the second condition mean that more changes count as improvements, as you correctly state. The third condition cuts down on which changes count as improvements, but the combination of conditions 2 and 3 still has some changes being labeled as improvements that wouldn't be improvements under the old concept of Pareto Optimality.

The definition of an almost stratified Pareto optimum was adapted from this , and was developed specifically to address the infinite game in that post involving a non-well-founded chain of players, where nothing is a stratified Pareto optimum for all players. Something isn't stratified Pareto optimal in a vacuum, it's stratified Pareto optimal for a particular player. There's no oracle that's stratified Pareto optimal for all players, but if you take the closure of everyone's SPO sets first to produce a set of ASPO oracles for every player, and take the intersection of all those sets, there are points which are ASPO for everyone.

"the combination of conditions 2 and 3 still has some changes being labeled as improvements that wouldn't be improvements under the old concept of Pareto Optimality."

Why? Condition 3 implies that U_{RO,j} \leq U_{RO',j}. So, together with condition 2, we get that U_{RO,j} \leq U_{RO',j} for any j. That precisely means that this is a Pareto improvement in the usual sense.

[This comment is no longer endorsed by its author]Reply

The game fixes each player's utility function, and then we fix all strategies, yes? It doesn't seem right to bundle one strategy and one utility function, which aren't fixed at the same time, into one "player".

I'm not sure what you mean by fixing the utility function occurring before fixing the strategy. In the problem setup of a game, you specify a utility function machine and a strategy machine for everyone, and there isn't any sort of time or order on this (there's just a set of pairs of probabilistic oracle machines) and you can freely consider things such as "what happens when we change some player's strategies/utility function machines"

I expected that we want to ask questions such as which strategy in a game dominates all others, which requires that the utility functions are known but the strategies aren't.

Should the dependency relation care about oracle queries from strategies? After all, a strategy could preserve its behavior while removing a dependency by simulating what it was going to ask the oracle about, deferring all new queries to the oracle again. Or is this ruled out by interesting query simulations sometimes diverging? Worse, you can make an arbitrary utility function part of the pareto optimization by asking the oracle about the player with particular utility and trivial strategy.

The basic reason for the dependency relation to care about oracle queries from strategies is that, when you have several players all calling the oracle on each other, there's no good way to swap out the oracle calls with the computation. The trick you describe does indeed work, and is a reason to not call any more turing machines than you need to, but there's several things it doesn't solve. For instance, if you are player 1, and your strategy depends on oracle calls to player 2 and 3, and the same applies to the other two players, you may be able to swap out an oracle call to player two with player two's actual code (which calls players 1 and 3), but you can't unpack any more oracle calls into their respective computations without hitting an infinite regress.