Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

This blog post is organized as follows:

  1. Review of Prisoner's Dilemma
  2. Explanation of Game of Chicken by comparing it to Prisoner's Dilemma
  3. Blackmail is a Game of Chicken
  4. Why we should care about blackmail/Game of Chicken
  5. What to do? Iterated Game of Chicken?

You are encouraged to skip ahead to the part that interests you.

1. Review of Prisoner's Dilemma

Prisoner's Dilemma is a class of two-player games that can represent, for example, mutually beneficial cooperation or the tragedy of the commons. I don't think it is controversial to say that this class of games is important in almost any multi-agent scenario.

In a Prisoner's Dilemma, each player gets to choose between two actions, usually called "cooperate" and "defect". Furthermore, the payoffs have to fulfill the following:

  • Holding my action constant, it is better for me if you cooperate.
  • Holding your action constant, it is better for me if I defect.
  • Cooperate-cooperate is Pareto optimal (even when including mixed strategies).

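Example of a payout matrix for Prisoner's Dilemma (each cell lists my payoff, then yours):

                 You cooperate   You defect
  I cooperate        1, 1          -1, 2
  I defect           2, -1          0, 0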

In this particular example, cooperate corresponds to spending one of your own utility to give the other player two utility, and defect corresponds to doing nothing. This can represent a situation with the possibility of mutual benefit from cooperation, but where it is possible to win even more (at the other player's expense) by cheating.

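But we can also consider a negative game (each cell lists my payoff, then yours):

                 You cooperate   You defect
  I cooperate        0, 0          -2, 1
  I defect           1, -2         -1, -1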

Here cooperate is doing nothing, while defect corresponds to gaining one utility for yourself while costing the other player two utility. This can represent burning the commons (if the players defect) or not (if they cooperate).
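The three conditions can be checked mechanically. The following is a minimal sketch, not a definitive test: the payoff numbers are written out from the verbal descriptions of the two examples, the function name `is_prisoners_dilemma` is made up for illustration, and the game is assumed to be symmetric so that only "my" payoffs need checking. The Pareto condition is replaced by a simpler sufficient one (assumption): if cooperate-cooperate has the highest total utility of all pure outcomes, then no mixture of outcomes can make both players at least as well off and one strictly better off.

```python
C, D = "cooperate", "defect"

# Keys are (my_action, your_action); values are (my_payoff, your_payoff).
# Positive example: cooperating costs me 1 and gives you 2.
pd_positive = {(C, C): (1, 1), (C, D): (-1, 2), (D, C): (2, -1), (D, D): (0, 0)}
# Negative example: defecting gains me 1 and costs you 2.
pd_negative = {(C, C): (0, 0), (C, D): (-2, 1), (D, C): (1, -2), (D, D): (-1, -1)}


def is_prisoners_dilemma(game):
    """Check the Prisoner's Dilemma conditions for a symmetric 2x2 game."""

    def mine(my_action, your_action):
        return game[(my_action, your_action)][0]

    # 1. Holding my action constant, it is better for me if you cooperate.
    you_cooperating_helps = all(mine(a, C) > mine(a, D) for a in (C, D))
    # 2. Holding your action constant, it is better for me if I defect.
    defecting_helps = all(mine(D, b) > mine(C, b) for b in (C, D))
    # 3. Sufficient (not necessary) test for Pareto optimality of (C, C),
    #    including mixtures: (C, C) maximizes total utility.
    cc_max_total = sum(game[(C, C)]) == max(sum(p) for p in game.values())
    return you_cooperating_helps and defecting_helps and cc_max_total
```

Under these assumptions, both `is_prisoners_dilemma(pd_positive)` and `is_prisoners_dilemma(pd_negative)` hold.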

2. Explanation of Game of Chicken by comparing it to Prisoner's Dilemma

Just like Prisoner's Dilemma, Game of Chicken is a two-player game where each player can choose between two actions. These actions are typically called "swerve" and "straight", but in this blog post I will instead call them "cooperate" and "defect" to make the comparison with Prisoner's Dilemma easier.

Also the same as Prisoner's Dilemma: in Game of Chicken, I get the best payout if I defect and you cooperate (and vice versa). The difference is that, conditional on you defecting, it is better for me if I cooperate.

A two action, two player game is a Game of Chicken if:

  • Holding my action constant, it is better for me if you cooperate.
  • If you cooperate, it is better for me if I defect.
  • If you defect, it is better for me if I cooperate.
  • Cooperate-cooperate is Pareto optimal (even when including mixed strategies).

Furthermore, defect-defect is traditionally super bad for both players. But I would not say that this is a necessary condition for something to be a Game of Chicken.

Example payoff matrix:

The interesting part here is that I can pressure you to cooperate by credibly convincing you that I will defect. In other words, there is a first-mover advantage: the first one to precommit to defecting will win against a rational player. However, this fact is of course known by every rational agent, so it might be a rational move to precommit to always defect in such games, no matter what. Then again, if two players with such commitments meet, they will both lose.
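The precommitment argument can be illustrated with a small sketch. The payoff numbers below are hypothetical (they are not the post's original matrix): they reuse the positive Prisoner's Dilemma example with defect-defect made much worse. The names `is_chicken` and `best_response` are made up for illustration, the game is assumed symmetric, and the Pareto condition is again replaced by the simpler sufficient check that cooperate-cooperate maximizes total utility.

```python
C, D = "cooperate", "defect"

# Keys are (my_action, your_action); values are (my_payoff, your_payoff).
# Hypothetical numbers chosen to satisfy the conditions listed above.
chicken = {(C, C): (1, 1), (C, D): (-1, 2), (D, C): (2, -1), (D, D): (-10, -10)}


def my_payoff(game, my_action, your_action):
    return game[(my_action, your_action)][0]


def is_chicken(game):
    """Check the Game of Chicken conditions for a symmetric 2x2 game."""
    # Holding my action constant, it is better for me if you cooperate.
    you_cooperating_helps = all(
        my_payoff(game, a, C) > my_payoff(game, a, D) for a in (C, D)
    )
    # If you cooperate, it is better for me if I defect.
    defect_against_cooperator = my_payoff(game, D, C) > my_payoff(game, C, C)
    # If you defect, it is better for me if I cooperate.
    cooperate_against_defector = my_payoff(game, C, D) > my_payoff(game, D, D)
    # Sufficient (not necessary) Pareto check: (C, C) maximizes total utility.
    cc_max_total = sum(game[(C, C)]) == max(sum(p) for p in game.values())
    return (
        you_cooperating_helps
        and defect_against_cooperator
        and cooperate_against_defector
        and cc_max_total
    )


def best_response(game, your_action):
    """My payoff-maximizing action once your action is fixed."""
    return max((C, D), key=lambda a: my_payoff(game, a, your_action))
```

With these numbers, `best_response(chicken, D)` is to cooperate: once the other player is credibly committed to defecting, giving way is the better reply, which is exactly the first-mover advantage described above.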

3. Blackmail is a Game of Chicken

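I think this is most easily explained by writing out an example payout matrix (each cell lists the blackmailer's payoff, then the blackmailed player's):

                      Gives in    Doesn't give in
  Blackmails            1, -2        -10, -10
  Doesn't blackmail     0, 0           0, 0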

If the blackmailed player gives in, then they pay two utility to give the other player one utility. If the blackmailed player doesn't give in, the blackmailer will carry out the threat, which costs both players ten utility. If the blackmailer doesn't actually blackmail, then nothing happens.

Compare this to the example payout matrix of Game of Chicken. The blackmail payout matrix is not exactly the same, but I claim that in essence this is the same game. If you can handle Game of Chicken, then you can handle blackmail, both as the blackmailer and as the blackmailed.
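One way to see how close blackmail is to a Game of Chicken is to test each Chicken condition for each player and note whether it holds strictly, only weakly (a tie), or fails. The sketch below is illustrative: the payoffs come from the verbal description of the blackmail example, the mapping of actions is an assumption (for the blackmailer, "cooperate" means not blackmailing and "defect" means blackmailing; for the blackmailed player, "cooperate" means giving in and "defect" means refusing), and `chicken_report` is a made-up helper name.

```python
C, D = "cooperate", "defect"

# Keys are (blackmailer_action, blackmailed_action);
# values are (blackmailer_payoff, blackmailed_payoff).
blackmail = {(C, C): (0, 0), (C, D): (0, 0), (D, C): (1, -2), (D, D): (-10, -10)}


def compare(better, worse):
    if better > worse:
        return "strict"
    return "weak" if better == worse else "fails"


def chicken_report(game):
    """Report how each Chicken condition holds for each of the two players."""
    report = {}
    for player in (0, 1):

        def u(own, other):
            # Look up this player's payoff from their own point of view.
            key = (own, other) if player == 0 else (other, own)
            return game[key][player]

        report[player] = {
            "other cooperating helps when I cooperate": compare(u(C, C), u(C, D)),
            "other cooperating helps when I defect": compare(u(D, C), u(D, D)),
            "defect against a cooperator": compare(u(D, C), u(C, C)),
            "cooperate against a defector": compare(u(C, D), u(D, D)),
        }
    return report
```

For this matrix no condition fails, but two hold only as ties, both in the row where no blackmail is attempted. That is one concrete sense in which the matrix is not exactly a Game of Chicken while keeping the same strategic structure.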

Not all blackmail is a Game of Chicken. If there is no cost in carrying out the threat, then we are in a different type of situation. However, I expect this to be rare. It seems unlikely to me that there is no opportunity cost at all in carrying out the threat. Furthermore, even if costless threats exist in some situations, this does not invalidate the argument for considering those blackmail situations where there is a cost to the blackmailer of carrying out the threat.

If the blackmailer gains utility by carrying out the threat then I would argue that it is not exactly blackmail anymore. If I have an action that I can take that would help me but hurt you and I ask you for some compensation for refraining from taking this action, then this is more like a value trade than a blackmail.

4. Why we should care about blackmail/Game of Chicken

Prisoner's Dilemma receives a lot of attention because this class of games represents an important type of situation in most multiplayer environments. I claim that this is also true for Game of Chicken.

In any situation where one agent (A) has the ability to use up some of its own resources to impose a cost on another agent (B), A can choose to blackmail B, thus creating a Game-of-Chicken-like situation. And if A thinks that it can win this game, then it will be tempted to engage in blackmail.

If you expect that:

  • It is important to build AIs that can act well in multi-agent situations (e.g. because there will be several simultaneous AIs that are similarly powerful, or there will be acausal trade and threats between agents in different universes simulating each other)
  • Toy models such as Prisoner's Dilemma are useful

then you should also care about Game of Chicken.

5. What to do? Iterated Game of Chicken?

What should we do about these insights? I am not sure yet. But one possible direction is to study iterated Game of Chicken.

Abram Demski argues that "In Logical Time, All Games are Iterated Games". Basically, if agents are simulating each other, then this is sort of equivalent to the agents playing an iterated game.

Question for the comment section: What would be the winning strategy in iterated Game of Chicken?

I might run a tournament with different strategies.
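As a starting point, such a tournament could look roughly like the sketch below. Everything in it is an assumption made for illustration: the payoff numbers (the same hypothetical Chicken matrix as the positive Prisoner's Dilemma example with defect-defect set to -10), the four example strategies, the round-robin format without self-play, and the fixed number of rounds.

```python
import itertools

C, D = "C", "D"

# Keys are (move_a, move_b); values are (payoff_a, payoff_b). Hypothetical numbers.
PAYOFF = {(C, C): (1, 1), (C, D): (-1, 2), (D, C): (2, -1), (D, D): (-10, -10)}


# Each strategy sees its own past moves and the opponent's past moves.
def always_cooperate(my_history, their_history):
    return C


def always_defect(my_history, their_history):
    return D


def tit_for_tat(my_history, their_history):
    return their_history[-1] if their_history else C


def yield_to_defectors(my_history, their_history):
    # Opens with defect, then best-responds to the opponent's last move:
    # cooperate after they defected, defect after they cooperated.
    if not their_history:
        return D
    return C if their_history[-1] == D else D


def play_match(strategy_a, strategy_b, rounds):
    """Play an iterated game and return the two total scores."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(history_a, history_b)
        move_b = strategy_b(history_b, history_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b


def tournament(strategies, rounds=10):
    """Round-robin over all pairs of distinct strategies; returns total scores."""
    totals = {s.__name__: 0 for s in strategies}
    for a, b in itertools.combinations(strategies, 2):
        score_a, score_b = play_match(a, b, rounds)
        totals[a.__name__] += score_a
        totals[b.__name__] += score_b
    return totals
```

For example, `tournament([always_cooperate, always_defect, tit_for_tat, yield_to_defectors])` returns a dictionary of total scores by strategy name. The ranking will depend heavily on the chosen payoffs, the pool of strategies, and the number of rounds, so any result from a sketch like this should be read as a property of those choices rather than as an answer to the question above.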

This post was written with the support of the EA Hotel.

17 comments

"If I have an action that I can take that would help me but hurt you and I ask you for some compensation for refraining from taking this action, then this is more like a value trade than a blackmail" - Maybe. What if an action gives you 1 utility but costs me 100, and you demand 90? That sounds a lot like blackmail!

I would decompose that into a value trade + a blackmail.

The default for me would be to take the action that gives me 1 utility. But you can offer me a trade where you give me something better in return for me not taking that action. This would be a value trade.

Let's now take me agreeing to your proposition as the default. If I then choose to threaten to call the deal off unless you pay me an even higher amount, then this is blackmail.

I don't think that these parts (the value trade and the blackmail) should be viewed as sequential. I wrote it that way for illustrative purposes. However, I do think that any value trade has a Game of Chicken component, where each player can threaten to call off the trade if they don't get the more favorable deal.

As Dagon said, blackmail is a sequential game. And the chicken payoff matrix is a poor fit: if the blackmailer faces a large penalty for revealing their information to the world, then the blackmailer's threat is not credible.

I did not mean to imply that the choices had to be made simultaneously, or in any other particular order, just that this is the type of payoff matrix. But I also think that "simultaneous choice" vs. "sequential game" is a false dichotomy. If both players are UDT, every game is a simultaneous-choice game (where the choices are over complete policies).

I know that according to what I describe, the blackmailer's threat is not credible in the game theory sense of the word. So what? It is still possible to make credible threats in the common-use meaning of the word, which is what matters.

Threatening to crash your car unless the passenger gives you a dollar is also not credible in the common meaning of the word...

This mapping does not match any actual decisions in blackmail. First, it's not a simultaneous choice, it's a branching multi-turn decision tree. Second, there are more than 2 actions available at various stages. Either of these would make prisoner's dilemma analysis suspect; together, it becomes much more like multi-street multi-bet poker than like PD.

The "victim" first makes choices (or is born into a situation) susceptible to blackmail. The blackmailer learns of this, and has at least 3 choices: publish the information, threaten to publish, or bury the information. The "victim" in the threaten-to-publish (blackmail) case offers incentives (which may be the same as the requested fee, or may not) to bury rather than publish, and the blackmailer chooses which action to take. Even leaving out true defection cases (accept the money and publish anyway, or killing the blackmailer), this is a fairly complex payout tree, and the correct choices are specific to the situation. In fact, since parts of the payout tree are unknown to one or both players, it's likely that mixed strategies come into play, to prevent exploitation of the unknowns.

This is a good point, but perhaps precommitting to give in/not give in vs. precommitting to blackmail/not blackmail is a simultaneous choice.

If there are reliable precommitment mechanisms for the topic, this makes it simultaneous. It still has a more complicated payout structure (publish/blackmail/bury X pay/no-pay in the blackmail case (and perhaps in the publish case, if the "victim" makes an unsolicited offer), and different payouts for different amounts asked/offered in the blackmail box). It's not clear that the values in any part of the matrix correspond to PD for any specific piece of information.

Additionally, the same blackmail can be brought up again at a later date; giving in to blackmail can itself be used as blackmail material in the future (e.g. giving information to a foreign government); and giving in to blackmail tells the blackmailer (and perhaps others) that you are a good target for future blackmail.

Schelling's The Strategy of Conflict seems very relevant here; a major focus is precommitment as a bargaining tool. See here for an old review by cousin_it.

Iterated chicken seems fine to test, just as a spinoff of the IPD that maps to slightly different situations. (I believe that the iterated game of mutually modeling each other's single-shot strategy is different from iterating the game itself, so I don't think Abram's post necessarily implies that iterated chicken is relevant to ASI blackmail solutions.)

Speaking of iterated games, one natural form of blackmail is for the blackmailee to pay an income stream to the blackmailer; that way, at each time-step they're paying their fair price for the good of [not having their secret revealed between time t and time t+1]. Here's a well-cited paper that discusses this idea in the context of nuclear brinksmanship: Schwarz & Sonin 2007.

In a repeat game of blackmail I think that the optimal strategy of the blackmailee is the MAD counteroffer.

What about a situation where the threat would be considered non-credible because carrying it out would cost A more than not carrying it out, regardless of whether B gives in, but A decides to carry it out anyway with the sole objective of inflicting more damage on B? Example: A doesn't carry out the threat, B doesn't give in: payout is 0, 0; A doesn't carry out the threat, B gives in: payout is 0, -5; A carries out the threat, B gives in: payout is 0, -10; A carries out the threat, B doesn't give in: payout is -10, -100. These situations happen often. A big flaw of game theory is that it assumes agents act rationally.

I'm confused about your suggested payoffs. It looks to me like in your example, A is indifferent between no one does anything, and B gives in to A's threats. In this situation there is not even an incentive for A to threaten B.
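The indifference can be read off directly from the numbers in the comment above. A minimal sketch, using only those payoffs (the action labels `hold_back`, `carry_out`, `give_in`, and `refuse` are made-up names for the four cases):

```python
# Keys are (a_action, b_action); values are (payoff_a, payoff_b),
# copied from the example in the parent comment.
payoffs = {
    ("hold_back", "refuse"): (0, 0),
    ("hold_back", "give_in"): (0, -5),
    ("carry_out", "give_in"): (0, -10),
    ("carry_out", "refuse"): (-10, -100),
}

# A's payoff if nobody does anything.
status_quo_for_a = payoffs[("hold_back", "refuse")][0]

# The best A can do in any outcome where B gives in.
best_for_a_if_b_gives_in = max(
    payoff_a for (_, b_action), (payoff_a, _) in payoffs.items() if b_action == "give_in"
)

# A gains nothing from B giving in, so A has no incentive to threaten.
a_gains_from_threat = best_for_a_if_b_gives_in > status_quo_for_a
```

With these payoffs, `a_gains_from_threat` is false.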

A blackmail situation is one where A prefers B giving in over nothing happening. But also, if B doesn't give in, A has to give up some utility in order to follow through with their threat. For this situation, see my other response.

I agree that classical game theory (to the extent I understand it) doesn't describe blackmail. I disagree that acting "rationally" in the game theory sense is always the best action.

Game theory (again, to the best of my understanding) assumes optimal behaviour according to CDT. The problem with CDT is that it predictably doesn't stick to commitments. This is predictably bad in many ways. See for example Parfit's Hitchhiker, and there is a similar situation around blackmail or commitment to costly retaliation.

Because CDT predictably fails in these situations, I think it is wrong to claim that it's always rational to act according to CDT. Rationality is winning after all. 

Furthermore, defect-defect is traditionally super bad for both players. But I would not say that this is a necessary condition for something to be a Game of Chicken.

The traditional game of chicken, with cars racing at each other or toward a cliff edge, has likely death in the defect-defect box. If you're considering iterated games, an early death stops the series (and, depending on your modeling of utility, wipes out all prior gains in any other games). I would say this is a necessary condition, and is the primary thing which makes Chicken different from PD.

And this distinction makes modeling it trickier - the game is mostly about the unknown chance that one will be unable to defect when one decides to (due to physical constraints). It's best modeled as a series of decisions, with known ending (death), and increasing chance of accidental defection.

Cooperate-cooperate is Pareto optimal (even when including mixed strategies).

Am I right in thinking cooperate-defect is also Pareto optimal for both games (although obviously not optimal for total utility)? If they are iterated then a set of results is Pareto optimal provided at least one person cooperated in every round.

That’s right, only (Defect, Defect) is Pareto dominated in PD and Chicken games.
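This can be checked for concrete matrices. The sketch below only looks at pure outcomes and uses example numbers: the positive Prisoner's Dilemma example from the post, and a hypothetical Chicken matrix that differs from it only in the defect-defect cell. The helper name `pareto_dominated` is made up for illustration.

```python
C, D = "cooperate", "defect"

# Keys are (my_action, your_action); values are (my_payoff, your_payoff).
prisoners_dilemma = {(C, C): (1, 1), (C, D): (-1, 2), (D, C): (2, -1), (D, D): (0, 0)}
# Hypothetical Chicken payoffs, assumed for illustration.
chicken = {(C, C): (1, 1), (C, D): (-1, 2), (D, C): (2, -1), (D, D): (-10, -10)}


def dominates(p, q):
    """True if payoff pair p is at least as good as q for both, and better for one."""
    return all(x >= y for x, y in zip(p, q)) and any(x > y for x, y in zip(p, q))


def pareto_dominated(game):
    """Return the pure outcomes that some other pure outcome Pareto-dominates."""
    return {
        actions
        for actions, payoff in game.items()
        if any(dominates(other, payoff) for other in game.values())
    }
```

For both example matrices the result is the single outcome `(defect, defect)`.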