
Rereading Hofstadter's essays on superrationality prompted me to wonder what strategies superrational agents would want to commit to in asymmetric games. In symmetric games, everyone can agree on the outcome they'd like to jointly achieve, leaving the decision-theoretic question of whether the players can commit or not. In asymmetric games, life becomes murkier. There are typically many Pareto-efficient outcomes, and we enter the wilds of cooperative game theory and bargaining solutions trying to identify the right one. While, say, the Nash bargaining solution is appealing on many levels, I have a hard time connecting the logic of superrationality to any particular solution. Recently though, I found some insight in "Cooperation in Strategic Games Revisited" by Adam Kalai and Ehud Kalai (working paper version and three-page summary version) for the special case of two-player games with side transfers.

Just to make sure everyone's on common ground, the prototypical game examined in the argument for superrationality is the prisoners' dilemma:

| Alice / Bob | Cooperate | Defect |
|---|---|---|
| Cooperate | 10 / 10 | 0 / 12 |
| Defect | 12 / 0 | 4 / 4 |

The unique dominant-strategy equilibrium is (Defect, Defect). However, Hofstadter argues that "superrational" players would recognize the symmetry in reasoning processes between each other and thus conclude that cooperating is in their interest. The argument is not in favor of unconditional cooperation. Instead, the reasoning is closer to "I cooperate if and only if I expect you to cooperate if and only if I cooperate". Many bits have been devoted to formalizing this reasoning in timeless decision theory and other variants.

The symmetry in the prisoners' dilemma makes it easy to pick out (Cooperate, Cooperate) as the action profile each player ideally wants to see happen. Consider instead the following skewed prisoners' dilemma:

| Alice / Bob | Cooperate | Defect |
|---|---|---|
| Cooperate | 2 / 18 | 0 / 12 |
| Defect | 12 / 0 | 4 / 4 |

The (Cooperate, Cooperate) outcome still has the highest total benefit, but (Defect, Defect) is also Pareto-efficient. With this asymmetry, it seems reasonable for Alice to Defect, even as someone who would cooperate in the original prisoners' dilemma. Suppose, however, that the players can also agree to transfer utility between themselves on a 1-to-1 basis (for instance, if they value cash equally and can make side payments). Then (Cooperate, Cooperate) with a transfer between 2 and 14 from Bob to Alice Pareto-dominates (Defect, Defect). The size of the transfer is still up in the air, although a transfer of 8 (leaving both with a payoff of 10) is appealing since it takes us back to the original symmetric game. I feel confident suggesting this as an outcome the players should commit to if possible.
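
To spell out where those bounds come from, write $t$ for the transfer from Bob to Alice (a symbol introduced here purely for illustration). Each player must do strictly better than the payoff of 4 from (Defect, Defect):

$2+t>4\quad\textrm{and}\quad 18-t>4\quad\Longleftrightarrow\quad 2<t<14.$

The equal split solves $2+t=18-t$, which gives $t=8$ and a payoff of 10 for each player.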

While that game could be symmetrized in a nice way, what about more general games where payoffs could look even more askew or strategy sets could be completely different?

Let A be the payoff matrix for Alice and B be the payoff matrix for Bob in any given game. Kalai and Kalai point out that the game (A, B) can be decomposed into the sum of two games:

$(A,B)=\left(\frac{A+B}{2},\frac{A+B}{2}\right)+\left(\frac{A-B}{2},\frac{B-A}{2}\right),$

where payoffs are identical in the first game (the team game) and zero-sum in the second (the advantage game). Consider playing these games separately. In the team game, Alice and Bob both agree on the action profile that maximizes their payoff with no controversy. In the advantage game, preferences are exactly opposed, so each can play their maximin strategy, again with no controversy. Of course, the rub is that the team game strategy profile could be very different from the advantage game strategy profile.
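
As a minimal sketch of the decomposition, here it is applied to the skewed prisoners' dilemma from earlier. The function name `decompose` and the list-of-rows representation are my own choices for illustration, and the sketch assumes integer payoffs so that exact fractions can be used.

```python
from fractions import Fraction

def decompose(A, B):
    """Split the bimatrix game (A, B) into its team and advantage parts.

    A[i][j] and B[i][j] are Alice's and Bob's payoffs when Alice plays row i
    and Bob plays column j (integer payoffs assumed). Returns (team, advantage):
    both players receive team[i][j] in the team game, while Alice receives
    advantage[i][j] and Bob receives its negative in the advantage game.
    """
    team = [[Fraction(a + b, 2) for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]
    advantage = [[Fraction(a - b, 2) for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]
    return team, advantage

# Skewed prisoners' dilemma; rows and columns are ordered (Cooperate, Defect).
A = [[2, 0], [12, 4]]    # Alice's payoffs
B = [[18, 12], [0, 4]]   # Bob's payoffs

team, advantage = decompose(A, B)
```

Adding the two parts cell by cell recovers Alice's payoffs, and subtracting the advantage part from the team part recovers Bob's.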

Suppose Alice and Bob could commit to playing each game separately. Kalai and Kalai define the payoffs each gets between the two games as

$\textrm{coco-value}(A,B)=\textrm{maxmax}\left(\frac{A+B}{2},\frac{A+B}{2}\right)\;+\;\textrm{maximin}\left(\frac{A-B}{2},\frac{B-A}{2}\right)$

where coco stands for cooperative/competitive. We don't actually have two games to be played separately, so the way to achieve these payoffs is for Alice and Bob to actually play the team game actions and hypothetically play the advantage game. Transfers then even out the gains from the team game results and add in the hypothetical advantage game results. Even though the original game might be asymmetric, this simple decomposition allows players to cooperate exactly where interests are aligned and compete exactly where interests are opposed.
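
A rough sketch of the computation follows. The name `coco_value` is hypothetical, and the sketch makes a simplifying assumption that is not part of the definition: it only handles advantage games with a saddle point in pure strategies and raises an error otherwise, since the general case needs mixed strategies. It also assumes integer payoffs.

```python
from fractions import Fraction

def coco_value(A, B):
    """Coco-value of the bimatrix game (A, B), assuming a pure saddle point.

    Returns ((alice_value, bob_value), team_profile), where team_profile is
    the (row, column) pair the players actually play.
    """
    rows, cols = range(len(A)), range(len(A[0]))
    team = [[Fraction(A[i][j] + B[i][j], 2) for j in cols] for i in rows]
    adv = [[Fraction(A[i][j] - B[i][j], 2) for j in cols] for i in rows]

    # Team game: pick the cell with the largest shared payoff.
    team_profile = max(((i, j) for i in rows for j in cols),
                       key=lambda ij: team[ij[0]][ij[1]])
    team_payoff = team[team_profile[0]][team_profile[1]]

    # Advantage game (zero-sum): Alice receives adv, Bob receives -adv.
    maximin = max(min(adv[i][j] for j in cols) for i in rows)  # Alice's guarantee
    minimax = min(max(adv[i][j] for i in rows) for j in cols)  # Bob's cap on Alice
    if maximin != minimax:
        raise ValueError("no pure saddle point; mixed strategies are needed")

    return (team_payoff + maximin, team_payoff - maximin), team_profile

# Skewed prisoners' dilemma; rows and columns are ordered (Cooperate, Defect).
A = [[2, 0], [12, 4]]
B = [[18, 12], [0, 4]]
(alice, bob), (i, j) = coco_value(A, B)
transfer_bob_to_alice = alice - A[i][j]  # what Alice is owed beyond her played payoff
```

For the skewed prisoners' dilemma this gives a coco-value of (10, 10), reached by playing (Cooperate, Cooperate) with a transfer of 8 from Bob to Alice, which matches the symmetrizing transfer suggested earlier.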

For example, consider two hot dog vendors. There are 40 potential customers at the airport and 100 at the beach. If both choose the same location, they split the customers there evenly. Otherwise, the vendor at each location sells to everyone at that place. Alice turns a profit of $2 per customer, while Bob turns a profit of $1 per customer. Overall this yields the payoffs:

| Alice / Bob | Airport | Beach |
|---|---|---|
| Airport | 40 / 20 | 80 / 100 |
| Beach | 200 / 40 | 100 / 50 |

The game decomposes into the team game:

| Alice / Bob | Airport | Beach |
|---|---|---|
| Airport | 30 / 30 | 90 / 90 |
| Beach | 120 / 120 | 75 / 75 |

and the advantage game:

| Alice / Bob | Airport | Beach |
|---|---|---|
| Airport | 10 / -10 | -10 / 10 |
| Beach | 80 / -80 | 25 / -25 |

The maximizing strategy profile for the team game is (Beach, Airport) with payoffs (120, 120). The maximin strategy profile for the advantage game is (Beach, Beach) with payoffs (25, -25). In total, this game has a coco-value of (145, 95), which would be realized by Alice selling at the beach, Bob selling at the airport, and Alice transferring 55 to Bob. Alice generates most of the profits in this situation, but Bob has to be compensated for his credible threat to start selling at the beach too.
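
The numbers in this example can be rebuilt from the story itself. The sketch below is only a check of the arithmetic; the helper `served` and the variable names are made up for illustration, and it relies on the fact that this particular advantage game has a saddle point in pure strategies.

```python
from fractions import Fraction

customers = {"Airport": 40, "Beach": 100}
margin = {"Alice": 2, "Bob": 1}
places = ["Airport", "Beach"]

def served(mine, theirs):
    """Customers a vendor serves: all at their spot, or half if it is shared."""
    return Fraction(customers[mine], 2 if mine == theirs else 1)

# Rows are Alice's location, columns are Bob's.
A = [[margin["Alice"] * served(a, b) for b in places] for a in places]
B = [[margin["Bob"] * served(b, a) for b in places] for a in places]

cells = [(i, j) for i in range(2) for j in range(2)]
team = {(i, j): (A[i][j] + B[i][j]) / 2 for i, j in cells}
adv = {(i, j): (A[i][j] - B[i][j]) / 2 for i, j in cells}

# Team game: the jointly best cell is what the vendors actually play.
played = max(cells, key=team.get)

# Advantage game: Alice's maximin over rows, Bob's minimax over columns.
alice_guarantee = max(min(adv[i, j] for j in range(2)) for i in range(2))
bob_cap = min(max(adv[i, j] for i in range(2)) for j in range(2))
assert alice_guarantee == bob_cap  # a pure saddle point exists in this example

coco = (team[played] + alice_guarantee, team[played] - alice_guarantee)
transfer_alice_to_bob = A[played[0]][played[1]] - coco[0]

print([places[k] for k in played], tuple(map(int, coco)), int(transfer_alice_to_bob))
```

The transfer is simply the gap between what Alice earns in the cell that is actually played and what the coco-value assigns her.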

The bulk of the Kalai and Kalai article is extending the coco-value to incomplete information settings. For instance, each vendor might have some private information about the weather tomorrow, which will affect the number of customers at the airport and the beach. The Kalais prove that being able to publicly observe the payoffs for the chosen actions is sufficient for agents to commit themselves to the coco-value ex-ante (before receiving any private information) and that being able to publicly observe all hypothetical payoffs from alternative action profiles is sufficient for commitment even after agents have private information.

The Kalais provide an axiomatization of the coco-value, showing it is the payoff pair that uniquely satisfies all of the following:

1. Pareto optimality: The sum of the values is maximal.
2. Shift invariance: Increasing a player's payoff by a constant amount in each cell increases their value by the same amount.
3. Payoff dominance: If one player always gets more than the other in each cell, that player's value for the game can't be smaller than the other's.
4. Invariance to redundant strategies: Adding a new action that is a convex combination of the payoffs of two other actions can't change the value.
5. Monotonicity in actions: Removing an action from a player can't increase their value for the game.
6. Monotonicity in information: Giving a player strictly less information can't increase their value for the game.
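
As a quick sanity check on shift invariance (a sketch of why the definition satisfies it, not the Kalais' proof), add a constant $c$ to every one of Alice's payoffs. Writing $A+c$ for the shifted matrix, the decomposition becomes

$(A+c,B)=\left(\frac{A+B}{2}+\frac{c}{2},\frac{A+B}{2}+\frac{c}{2}\right)+\left(\frac{A-B}{2}+\frac{c}{2},\frac{B-A}{2}-\frac{c}{2}\right).$

Adding a constant to every cell leaves the optimal strategies unchanged and shifts the maxmax and maximin values by that constant. Alice's value therefore rises by $\frac{c}{2}+\frac{c}{2}=c$, while Bob's changes by $\frac{c}{2}-\frac{c}{2}=0$.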

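The coco-value is also easy to compute, unlike Nash equilibria in general: the team game only needs a search over cells, and the maximin value of the zero-sum advantage game can be found by linear programming. I'm hard-pressed to think of anything more I could want from it (aside from easy extensions to bigger classes of games). Given its simplicity, I'm surprised it wasn't hit upon earlier.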