(This article is cross-posted from my blog)
Imagine a world in which agents may interact, but may not coordinate in an enforceable way. Agents may communicate, but there is no mechanism to hold another agent accountable to commitments reached while communicating. In this world, all conflicts of interest between agents would resolve to a Nash Equilibrium.
A Nash Equilibrium, sometimes described as a "self-enforcing law", is an outcome from which no agent has an incentive to unilaterally deviate.
I will talk about the Prisoner's Dilemma, a simple scenario that is famous because its Nash Equilibrium is bad. If the players could coordinate to escape the pull towards the Nash Equilibrium, they would both be better off.
Scott Alexander's excellent work Meditations on Moloch has demonstrated that the world is full of negative Nash Equilibria, but there are also those that yield positive real-world consequences. This essay concludes by describing how free markets under perfect competition yield a Nash Equilibrium that is positive from the point of view of consumers.
Motivating example: Prisoner's Dilemma
Two criminals are detained by the police and interrogated separately. The police do not have strong evidence to convict the criminals of the full crime, and at most can put each criminal behind bars for one year, so they need the criminals to snitch on each other to prosecute the full extent of the crime. To incentivize snitching, the officers offer the prisoners a deal. If one of the prisoners "defects" from his criminal partnership and snitches, while the other continues to "cooperate" in crime, the police will offer the snitch the opportunity to walk free, but the prisoner that does not snitch will serve nine years in prison. If both criminals defect from the criminal partnership, both will serve six years in prison.
We can write down the penalties in a convenient table form. The rows represent the action of Prisoner/Player 1, and the columns represent the action of Prisoner/Player 2. The first number is the "payoff" given to Player 1 and the second is the "payoff" to Player 2. The payoffs are represented as negative numbers because the players "lose" years of their life rotting in prison.
| | Player 2 Coop | Player 2 Defect |
|---|---|---|
| Player 1 Coop | -1, -1 | -9, 0 |
| Player 1 Defect | 0, -9 | -6, -6 |
Time travel analogy
Suppose that the players play a single round of prisoner's dilemma, expecting that the outcome of the game will be permanent. Now, suppose that we select one of the players at random and, unbeknownst to the other player, give this player the option to travel back in time to change their decision, without coordinating with the other player. Neither player knew that time machines existed, much less that one of them might be given a chance to use such a device. If neither player would change their decision, meaning that neither player has any regrets, then the two players have played a Nash Equilibrium.
Imagine that both of the players choose to cooperate. Both of the prisoners face the prospect of spending one year in prison. Then we surprise Player 1 by secretly giving them a time machine, and give them the choice to use it to change their action, but Player 2 must keep their action. In this case, Player 1 would change their action to defect, since then Player 1 gets to walk free at the expense of Player 2. If instead we offer the time machine to Player 2, they should change their action to defect for the same reason.
Now suppose that Player 1 chooses to cooperate but Player 2 chooses to defect. In this case, Player 1 faces nine years of prison and Player 2 gets to walk free. Now choose a player at random, and secretly offer them a chance to use a time machine. If the chosen player is Player 2, this player has no incentive to switch to cooperating, since that would cause them to spend one year behind bars instead of walking free. On the other hand, if we secretly offer the time machine to Player 1, they would choose to change their action to defect. The reason is that they are currently facing nine years behind bars, but by defecting they would reduce their sentence to six years. A similar analysis holds for the case when Player 1 defects and Player 2 cooperates.
Finally, suppose that both of the players defect. Now, if we secretly offer a time machine to either of the players at random, neither would choose to change their action to cooperate, since that would increase their sentence from six years to nine years. For this reason, defect-defect is a Nash Equilibrium: neither player would have an incentive to change their action if offered the opportunity. Intuitively, Nash Equilibrium means that the players have no regrets. It is the only one of the four outcomes that remains stable under the random, unexpected appearance of time machines. To avoid regrets, always play Nash.
In the Prisoner's Dilemma, the Nash Equilibrium is considered to be bad because it causes both players to spend six years behind bars, even though there is a way that both players could spend only one year behind bars. However, this optimal situation is not attainable, because both prisoners have an incentive to try to attain the zero-year prison sentence and simultaneously to avoid the nine-year sentence.
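The time machine test can be checked mechanically. The following is a minimal Python sketch, assuming the payoff table above; the names `PAYOFFS`, `has_regret` and `pure_nash_equilibria` are made up for this illustration. It offers the time machine to each player in each of the four outcomes and keeps the outcomes in which nobody would use it.

```python
from itertools import product

ACTIONS = ("cooperate", "defect")

# PAYOFFS[(action of Player 1, action of Player 2)] = (payoff to Player 1, payoff to Player 2)
# Payoffs are years in prison, written as negative numbers.
PAYOFFS = {
    ("cooperate", "cooperate"): (-1, -1),
    ("cooperate", "defect"): (-9, 0),
    ("defect", "cooperate"): (0, -9),
    ("defect", "defect"): (-6, -6),
}


def has_regret(payoffs, profile, player):
    """True if `player` (0 or 1) would use the time machine: some other action
    pays strictly more while the other player's action stays fixed."""
    current = payoffs[profile][player]
    for alternative in ACTIONS:
        changed = list(profile)
        changed[player] = alternative
        if payoffs[tuple(changed)][player] > current:
            return True
    return False


def pure_nash_equilibria(payoffs):
    """Return every pair of actions in which neither player has a regret."""
    return [
        profile
        for profile in product(ACTIONS, repeat=2)
        if not any(has_regret(payoffs, profile, player) for player in (0, 1))
    ]


print(pure_nash_equilibria(PAYOFFS))  # [('defect', 'defect')]
```

The sketch only looks at pure actions (no randomizing between cooperate and defect), which is all the argument above needs.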
The role of moral values
A common objection is that negative Nash Equilibria might not occur if the players placed a moral value on cooperation. This moral value could be represented by adding a positive quantity to the payoff matrix at the cooperate-cooperate cell. In the particular case of this model of Prisoner's Dilemma, a high value on cooperation might be enough to offset the attractiveness of defection, but this may not always be the case. Even worse, in a general scenario, adding a new value to the payoff matrix may simply place the Nash Equilibrium at a different pair of actions. This new Nash Equilibrium may again be negative for other reasons.
In general, we assume that the payoff matrix models all of the things that the agents may care about, including moral values. The existence of Nash Equilibria is a feature of the numerical values of payoff matrices, independent of the considerations that produced those numbers.
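To make the moral-value objection concrete, here is a small Python sketch. It assumes the Prisoner's Dilemma payoffs from earlier in this post, and `moral_bonus` is a hypothetical quantity (measured in "years") added for both players in the cooperate-cooperate cell only. The function names are invented for this example.

```python
from itertools import product

ACTIONS = ("cooperate", "defect")


def prisoners_dilemma(moral_bonus=0):
    """Payoff table from this post, with a hypothetical `moral_bonus` added
    for both players in the cooperate-cooperate cell."""
    return {
        ("cooperate", "cooperate"): (-1 + moral_bonus, -1 + moral_bonus),
        ("cooperate", "defect"): (-9, 0),
        ("defect", "cooperate"): (0, -9),
        ("defect", "defect"): (-6, -6),
    }


def pure_nash_equilibria(payoffs):
    """Keep the action pairs from which no single player gains by deviating."""
    equilibria = []
    for profile in product(ACTIONS, repeat=2):
        stable = True
        for player in (0, 1):
            for alternative in ACTIONS:
                changed = list(profile)
                changed[player] = alternative
                if payoffs[tuple(changed)][player] > payoffs[profile][player]:
                    stable = False
        if stable:
            equilibria.append(profile)
    return equilibria


for bonus in (0, 0.5, 2):
    print(bonus, pure_nash_equilibria(prisoners_dilemma(bonus)))
```

Under these assumptions, a bonus of 0.5 changes nothing, because walking free (0) still beats a moralized one-year sentence (-0.5). A bonus of 2 makes cooperate-cooperate stable, but in this sketch defect-defect stays stable as well, so the bad equilibrium does not disappear.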
Communication is not enough
The prisoners cannot solve their dilemma by having a discussion ahead of time and agreeing to both cooperate, because, if each believes that the other prisoner will cooperate, they have an incentive to defect and walk free.
Similarly, it isn't enough if Player 1 makes their choice first and Player 2 can hear it. Assuming Player 1 cooperates, Player 2 still has an incentive to defect. Obviously if Player 1 defects, then Player 2 should also defect to shorten their sentence.
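The sequential case can be worked through by reasoning backwards from Player 2's reply. The Python sketch below assumes the same payoff table as before; `best_reply_of_player_2` and `backward_induction` are names made up for this illustration.

```python
ACTIONS = ("cooperate", "defect")

# PAYOFFS[(action of Player 1, action of Player 2)] = (payoff to Player 1, payoff to Player 2)
PAYOFFS = {
    ("cooperate", "cooperate"): (-1, -1),
    ("cooperate", "defect"): (-9, 0),
    ("defect", "cooperate"): (0, -9),
    ("defect", "defect"): (-6, -6),
}


def best_reply_of_player_2(action_1):
    """Player 2 hears Player 1's action and picks the reply with the best payoff for themselves."""
    return max(ACTIONS, key=lambda action_2: PAYOFFS[(action_1, action_2)][1])


def backward_induction():
    """Player 1 anticipates Player 2's reply and picks the action that leaves
    Player 1 best off once that reply is taken into account."""
    action_1 = max(
        ACTIONS,
        key=lambda a1: PAYOFFS[(a1, best_reply_of_player_2(a1))][0],
    )
    return action_1, best_reply_of_player_2(action_1)


print(best_reply_of_player_2("cooperate"))  # defect
print(best_reply_of_player_2("defect"))     # defect
print(backward_induction())                 # ('defect', 'defect')
```

Since Player 2 defects whatever they hear, Player 1 is really choosing between nine years (cooperate) and six years (defect), and so defects too.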
Escaping bad Nash Equilibria
Nash Equilibria occur in the real world. Some of them lead to positive outcomes, and some of them lead to negative outcomes.
The players require not only an agreement, but a mechanism to enforce that agreement. For example, there might be a mob boss, known on the street as Leviathan, who hates snitches, and Leviathan may torture defectors with enough pain to be worth 10 years of prison. Leviathan has a reputation for always finding out when people snitch, and always catching them. In that case, the payoff matrix changes to become:
| | Player 2 Coop | Player 2 Defect |
|---|---|---|
| Player 1 Coop | -1, -1 | -9, -10 |
| Player 1 Defect | -10, -9 | -16, -16 |
Now, all prisoners have a strong incentive to cooperate on any occasion that they are detained. In fact, cooperate-cooperate becomes the new Nash Equilibrium.
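The effect of Leviathan can be reproduced by subtracting his punishment from the payoff of anyone who defects. This is a rough Python sketch under that assumption, starting from the original Prisoner's Dilemma payoffs; `add_enforcer` and `pure_nash_equilibria` are hypothetical helper names.

```python
from itertools import product

ACTIONS = ("cooperate", "defect")

# Original payoffs, without any enforcer:
# BASE_PAYOFFS[(action of Player 1, action of Player 2)] = (payoff 1, payoff 2)
BASE_PAYOFFS = {
    ("cooperate", "cooperate"): (-1, -1),
    ("cooperate", "defect"): (-9, 0),
    ("defect", "cooperate"): (0, -9),
    ("defect", "defect"): (-6, -6),
}


def add_enforcer(payoffs, penalty):
    """Subtract `penalty` from the payoff of every player who defects."""
    punished = {}
    for profile, pays in payoffs.items():
        punished[profile] = tuple(
            pay - penalty if action == "defect" else pay
            for action, pay in zip(profile, pays)
        )
    return punished


def pure_nash_equilibria(payoffs):
    """Keep the action pairs from which no single player gains by deviating."""
    equilibria = []
    for profile in product(ACTIONS, repeat=2):
        stable = True
        for player in (0, 1):
            for alternative in ACTIONS:
                changed = list(profile)
                changed[player] = alternative
                if payoffs[tuple(changed)][player] > payoffs[profile][player]:
                    stable = False
        if stable:
            equilibria.append(profile)
    return equilibria


leviathan = add_enforcer(BASE_PAYOFFS, penalty=10)
print(leviathan[("defect", "defect")])  # (-16, -16)
print(pure_nash_equilibria(leviathan))  # [('cooperate', 'cooperate')]
```

With a penalty worth 10 years, the sketch reproduces the table above, and cooperate-cooperate is the only outcome that survives the no-regrets test.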
The prisoners may strongly dislike Leviathan, who inspires fear and restricts freedom of the criminals to act as they otherwise would. It's very likely that given the opportunity, criminals would prefer to live in a world without Leviathan, even though he allows the criminals to coordinate their actions in a way that allows them to attain better results when they are detained by the police.
It is also possible to change the location of the Nash Equilibrium by having the players repeat the game an unknown number of times. This will be a topic for a future post.
Real-world Nash Equilibria
Positive examples of real-world Nash Equilibria fall into two cases. In the first case, the nature of the situation produces the best outcome without the need for coordination. In the second case, coordination is required, but existing structures, for example those provided by government or custom, ensure that the participants coordinate effectively.
Perfect competition in a free market
In economics, the concept of perfect competition represents a business sector in which suppliers provide exactly the same commodity and have no way to compete with each other except on price. For example, agricultural products, such as corn or oranges, are functionally identical from supplier to supplier. Some suppliers may produce dramatically inferior crops, but above a certain quality bar, all oranges are the same. This is even more the case for products, such as corn or soy, which will be ground down, and for which the tolerance on quality is much broader.
For these commodities, suppliers must charge the lowest possible price that covers the cost of inputs and of their labor. The price must be high enough to prevent the supplier from moving to a different industry, but no higher. Producers might want to coordinate to agree on setting a higher price, but would be vulnerable to being undercut by any defectors. Suppliers may be tempted to set up an enforcement mechanism to punish defectors, but behavior that undercuts market competition is illegal under antitrust regulation.
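The pricing story has the same shape as the Prisoner's Dilemma, seen from the suppliers' side. The Python sketch below is a toy model with two suppliers and two prices. The profit numbers are invented purely for illustration and do not come from any real market; the only assumption that matters is that undercutting a high-priced rival captures most of the sales.

```python
from itertools import product

ACTIONS = ("high price", "low price")

# Hypothetical profits: PROFITS[(price of Supplier 1, price of Supplier 2)] = (profit 1, profit 2)
PROFITS = {
    ("high price", "high price"): (5, 5),  # both keep to the price agreement
    ("high price", "low price"): (0, 8),   # Supplier 2 undercuts and takes the customers
    ("low price", "high price"): (8, 0),   # Supplier 1 undercuts and takes the customers
    ("low price", "low price"): (1, 1),    # competitive price, thin margins
}


def pure_nash_equilibria(payoffs):
    """Keep the price pairs from which no single supplier gains by deviating."""
    equilibria = []
    for profile in product(ACTIONS, repeat=2):
        stable = True
        for player in (0, 1):
            for alternative in ACTIONS:
                changed = list(profile)
                changed[player] = alternative
                if payoffs[tuple(changed)][player] > payoffs[profile][player]:
                    stable = False
        if stable:
            equilibria.append(profile)
    return equilibria


print(pure_nash_equilibria(PROFITS))  # [('low price', 'low price')]
```

In this toy model the only stable outcome is the low price. That is the worse outcome for the two suppliers, who would both earn more by holding the high price, but it is the better one for consumers.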
The main source for this note was the excellent Reinforcement Learning Course by Michael Littman and Charles Isbell.