Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

This is how I currently think about higher-order game theory, the study of agents thinking about agents thinking about agents....

This post doesn't add any new big ideas beyond what was already in the post by Diffractor linked above. I just have a slightly different perspective that emphasizes the "metathreat" approach and the role of nondeterminism.

This is a work in progress. There's a bunch of technical work that must be done to make this rigorous. I'll save the details for the last section.

Multiple levels of strategic thinking

Suppose you're an agent with accurate beliefs about your opponents. It doesn't matter where your beliefs come from; perhaps you have experience with these opponents, or perhaps you read your opponents' source code and thought about it. Your beliefs are accurate, although for now we'll be vague about what exactly "accurate" means.

In a game of Chicken, you may want to play Swerve, since that's the maximin strategy. This is zeroth-order thinking because you don't need to predict what your opponent will do.

Or maybe you predict what your opponent will do and play the best response to that, Swerving if they'll go Straight and going Straight if they'll Swerve. This is first-order thinking.

Or maybe you know your opponent will use first-order thinking. So you resolve to go Straight, your opponent will predict this, and they will Swerve. This is second-order thinking.

Beyond second-order, Chicken turns into a commitment race.

In Prisoner's Dilemma, zeroth- and first-order thinking recommend playing Defect, as that's a dominant strategy. Second-order thinking says that it's good to Cooperate on the margin if by doing so you cause your opponent to also Cooperate on the margin. Let's say it's worth it to Cooperate with probability if your opponent thereby Cooperates with probability more than (although the exact numbers depend on the payoff matrix).

If you're a third-order agent, you might commit to Cooperating with probability equal to 51% times the probability that your opponent Cooperates. This causes your opponent to Cooperate with certainty, and thus you're bound to Cooperate with probability 51%. This is a Pareto-optimal outcome that favors you heavily.

Fixed points

So far we've talked about agents of order playing against agents of order . What about games between agents of equal order?

If two first-order agents play the best response to each other, then they play a Nash equilibrium. Let's see how this works in Matching Pennies, which has a mixed-strategy Nash equilibrium:

If the Even player thinks Odd is more likely to play Heads, then Even will play Heads. If the Even player thinks Odd is more likely to play Tails, then Even will play Tails. If Even thinks Odd is equally likely to play Heads or Tails, then Even will be indifferent. So if both players have well-calibrated beliefs, we must be in the equilibrium where both players are indifferent and both players play Heads, Tails.

(What if Even always plays Heads when it's indifferent? In that case, there is no equilibrium, which is awkward for our theory. But this behavior requires Even to represent its beliefs as real numbers. If we make the more realistic assumption that Even's beliefs are arbitrary-precision numbers, then it can never be sure that it's on the decision boundary. It gets stuck in a for loop, querying its belief for ever more precision in case it's not exactly . (Then some other process, like infinitesimal errors in the beliefs, causes the agent to output the equilibrium strategy of Heads, Tails.))

The space of all agents

Now we're ready to define our types. (I'll leave some details to the last section.) The space of 0th-order agents is just the set of actions or pure strategies.[1]

The space of st-order agents is the space of functions on probability distributions on . That is, .

So an st-order agent is a stochastic function from its th-order beliefs to a successor agent of order . Creating a successor agent is a form of currying.

Playing a game

Two agents in interact as follows:

  1. Find distributions on such that and .

    So is a fixpoint representing consistent th-order beliefs that the agents have about each other.[2]

  2. Sample from and sample from .

  3. Repeat steps 1 and 2 on and until we end up with in .

  4. Calculate the payoff to the first player.

Now take the expectation over all samples in step 2, and take the minimum over all choices of fixpoint in step 1. This gives a lower bound on the first player's expected payoff.

Some agents

Now we can write down formulas for the kind of higher-order reasoning discussed earlier.

The first-order agent that always plays the best response to its opponent is defined by . It achieves a Nash equilibrium when playing against itself.

A second-order agent that always plays the best response to its opponent is .[3] This agent also achieves a Nash equilibrium when playing against itself. In a future post I'll construct a different 2nd-order agent that both plays the best response to any first-order agent, and also achieves a Pareto-optimal outcome when playing against itself.


Order | Beliefs about other agent | Strategic behavior | n vs. n equilibrium
0     | None                      | Maximin            | Maximin
1     | 0th order                 | Best response      | Nash equilibrium
2     | 1st order                 | Argmax/optimize    | Pareto optimum


There's a sense in which the 0th-order maximin agent is an approximation of the 1st-order best-response agent, which is itself an approximation of the 2nd-order argmaxing agent. Standard domain-theoretical constructions should allow construction of infinite-order agents, including an infinite-order strategic agent that subsumes all finite-order best-response behavior. Fortunately the hierarchy ends at infinity; has type . However, whether and how this works depends a lot on the technical details I haven't worked out, so I hesitate to say any more about it.

Correlated strategies

This framework doesn't have correlated strategies built in. But I have a hunch that agents of sufficiently high order can play correlated strategies anyways.

Technical details

Everything is a domain; in particular, a dcpo. is also a lower semilattice. The mapping spaces are spaces of Scott-continuous functions.

is some kind of probabilistic powerdomain. An approach similar to (Jones & Plotkin 1989) might just work: would be the set of probability measures on the Borel -algebra of which restrict to continuous evaluations on open sets.

But it's possible will have to contain sets of probability measures, in order to express nondeterministic choice of fixpoint. Possible approaches include (Mislove 2000), (Tix, Keimel, Plotkin 2009), (Goubault-Larrecq 2007), and of course (Appel and Kosoy 2021). There's also the field of quantitative semantics, which I don't know anything about.

I'd like to have fixpoints which are maximal elements of . This might follow easily from the Kakutani fixed point theorem, but it really depends on how is defined.

Another issue is that higher-order strategic agents have to compute fixpoints themselves (in order to calculate the expected value of playing a lower-order policy). I'm not sure if this can be done continuously. It might require the use of a reflective oracle somehow.

This work was supported by a grant from the Center on Long-Term Risk. Thanks to Alex Appel, Adele Dewey-Lopez, and Patrick LaVictoire for discussions about these ideas.

  1. Actually, we want the set of nonempty sets of actions, so an agent can express indifference between actions. ↩︎

  2. Really we want and to be maximal elements of the poset such that and . See the section on technical details. ↩︎

  3. Actually you can't compute an argmax over a function of a continuous variable. The best you can do is a distribution that's biased towards higher-payoff outcomes, like quantilization or best-of- sampling. I'll write more about this in the future. ↩︎


Ω 15

6 comments, sorted by Click to highlight new comments since: Today at 1:06 AM
New Comment

Tangential, but did you ever happen to read statistical physics of human cooperation?

No, I just took a look. The spin glass stuff looks interesting!

Yep, I skimmed it by looking at the colorful plots that look like Ising models and reading the captions. Those are always fun.

I have a question about this entirely divorced from practical considerations. Can we play silly ordinal games here?

If you assume that the other agent will take the infinite-order policy, but then naively maximize your expected value rather than unrolling the whole game-playing procedure, this is sort of like . So I guess my question is, if you take this kind of dumb agent (that still has to compute the infinite agent) as your baseline and then re-build an infinite tower of agents (playing other agents of the same level) on top of it, does it reconverge to  or does it converge to some weird ?

I think you're saying , right? In that case, since embeds into , we'd have embedding into . So not really a step up.

If you want to play ordinal games, you could drop the requirement that agents are computable / Scott-continuous. Then you get the whole ordinal hierarchy. But then we aren't guaranteed equilibria in games between agents of the same order.

I suppose you could have a hybrid approach: Order is allowed to be discontinuous in its order- beliefs, but higher orders have to be continuous? Maybe that would get you to .