LESSWRONG

Game Theory
Frontpage
Nash equilibriums can be arbitrarily bad

by Stuart_Armstrong
1st May 2019
AI Alignment Forum
3 min read
24 comments
cousin_it:

Yeah. Usually the centipede game is used to teach this lesson, but your game is also very nice :-)

Stuart_Armstrong:

Cool! I prefer my example, though; it feels more intuitive (and has a single equilibrium).

Taymon Beal:

Nit: I think this game is more standardly referred to in the literature as the "traveler's dilemma" (Google seems to return no relevant hits for "almost free lunches" apart from this post).

Stuart_Armstrong:

That's useful; I added a link to the other game in the main text (as far as I can tell, I came up with this independently).

Shmi:
> So the situation is as bad as it could possibly be.

You mean, it is as bad as it could possibly be for the Nash equilibrium to be a good strategy and a good predictor in this setup? Yep, absolutely. All models tend to have their domain of validity, and this game shows the limits of the Nash equilibrium model of decision making.

Stuart_Armstrong:

That is true, but I meant it as "as close as you want to the worst possible outcome, and as far as you want from the best mutual outcome".

Unnamed:

Unilateral precommitment lets people win at "Almost Free Lunches".

One way to model precommitment is as a sequential game: first player 1 chooses a number, then player 1 has the option of either showing that number to player 2 or keeping it hidden, then player 2 chooses a number. Optimal play is for player 1 to pick £1,000,000 and show that number, and then for player 2 to choose £999,999.99.

An interesting feature of this is that player 1's precommitment helped player 2 even more than it helped player 1. Player 1 is "taking one for the team", but still winning big. This distinguishes it from games like chicken, where precommitment is a threat that allows the precommitter to win the larger share. Though this means that if either player can precommit (rather than one being pre-assigned to go first as player 1) then they'd both prefer to have the other one be the precommitter.

This benefit of precommitment does not extend to the two option version (n2 vs. n1). In that version, player 2 is incentivized to say "n1" regardless of what player 1 commits to, so unilateral precommitment doesn't help them avoid the Nash Equilibrium. As in the prisoner's dilemma.
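A small-scale sketch of this sequential version (amounts in pence, capped at 100p to stand in for £1,000,000; the `payoffs` and `best_response` helpers are my own scaffolding, not anything from the post):

```python
def payoffs(a, b):
    """Almost Free Lunches payoffs in pence: both get the lower
    amount, and the higher namer pays 2p to the lower namer."""
    low = min(a, b)
    if a == b:
        return (low, low)
    return (low + 2, low - 2) if a < b else (low - 2, low + 2)

MAX = 100  # pence; a scaled-down stand-in for £1,000,000

def best_response(committed):
    """Player 2's utility-maximizing reply to player 1's shown number."""
    return max(range(MAX + 1), key=lambda b: payoffs(committed, b)[1])

print(best_response(MAX))  # 99 -- one penny under the commitment
print(payoffs(MAX, 99))    # (97, 101): the committer "takes one for the team"
```

At full scale this is the £1,000,000 commitment answered by £999,999.99: the responder undercuts by a penny and ends up slightly ahead of the committer, exactly as described above.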

Zvi:

Is there a name for this type of equilibrium, where a player can pre-commit in a way where the best response leaves the first player very well-off, but not quite optimally well-off? What about when it is a mixed strategy (e.g. consider the version of this game where the player who gave the larger number gets paid nothing)?

Stuart_Armstrong:

I think another key difference between PD and traveller/AFL is that in the PD variant, (n2, n1) is a Pareto outcome - you can't improve the first player's outcome without making the second one worse off. However, in the other problem, (0,0) is very very far from being Pareto.

quanticle:

Doesn't the existence of the rule that says that no money changes hands if there's a tie alter the incentives? If we both state that we want 1,000,000 pounds, then we both get it and we both walk away happy. What incentive is there for either of the two agents to name a value that is lower than 1,000,000?

SarahNibs:

If your strategy remains unchanged, I can change my strategy to "999,999.99 please" and come away with 1,000,000.01 in total, so that's not a Nash equilibrium.

quanticle:

I see. And from there it follows the same pattern as a dollar auction, until the "winning" bid goes to zero.

Dagon:

The maximum theoretical payout is £1,000,000.01. And since both players know this, and for any given tie amount a player's net can be increased by a penny by reducing their bid by a penny, they will recursively calculate their way down to £0.00.

Unless they have a theory of mind and can model ways to take risks in order to increase results in the cases where the other player ALSO takes risks. We call this "trust", and it can be greatly increased with communication, empathy, and side-agreements.

I think it's long been known that Nash equilibria are not necessarily optimal, only guaranteed that the other player's choices can't hurt you. It's perfectly defensive in adversarial games. This is great for zero-sum games, where the other player's increase is exactly your decrease. It's nearly irrelevant (except as a lower bound) for cooperative (variable-sum) games.

Charlie Steiner:

Yup. Though you don't necessarily need to imagine the money "changing hands" - if both people get paid 2 extra pennies if they tie, and the person who bids less gets paid 4 extra pennies, the result is the same.

The point is exactly what it says in the title. Relative to the maximum cooperative payoff, the actual equilibrium payoff can be arbitrarily small. And as you change the game, the transition from low payoff to high payoff can be sharp - jumping straight from pennies to millions just by changing the payoffs by a few pennies.

Dagon:
> Relative to the maximum cooperative payoff, the actual equilibrium payoff can be arbitrarily small.

Please be careful to specify "Nash equilibrium" here, rather than just "equilibrium". Nash is not the only possible result that can be obtained, especially if players have some ability to cooperate or to model their opponent in a way where Nash's conditions don't hold.

In actual tests (admittedly not very complete, usually done in psychology or economics classes), almost nobody ends up at the Nash equilibrium in this style game (positive-sum where altruism or trust can lead one to take risks).

Jay Molstad:

It checks out empirically. War is a Nash semi-equilibrium - both sides would be better off if they could coordinate an end to it, but they usually can't (in the near term). If Stanislav Petrov had been a bit less chill, the Cold War would have ended in 1983 in the "everybody launches all the nukes; cockroaches win" Nash equilibrium. If that's not arbitrarily bad, it's plenty bad enough.

RocksBasil:

I think "everybody launches all nukes" might not be a Nash Equilibrium.

We can argue that once one side has launched their nukes, the other side does not necessarily have an incentive to retaliate: they won't really care whether the enemy got nuked after they themselves are nuked, and they probably have an incentive not to launch, to prevent the "everybody dies" outcome, which is arguably negative even for someone who is about to die.

Jay Molstad:

It seems to me that both parties to the Cold War favored the defect-defect outcome (launch all the nukes) over the cooperate-defect outcome (we die, they don't). It's hard to tell, though, because both sides had an incentive to signal that preference regardless of the truth.

But that's an extreme case. Any war you choose will have each side choosing between continuing to fight and surrendering. The cooperate-cooperate outcome (making peace in a way that approximates the likely outcome of a war) is probably best for all, but it's hard to achieve in practice. And it seems to me that at least part of the problem is that, if one side chooses to cooperate (sue for peace and refrain from maximally fighting), they run the risk that the other side will continue to defect (fight) and seize an advantage.

RocksBasil:

It is interesting that experimental results for the traveller's dilemma seem to deviate strongly from the Nash equilibrium, and in fact land quite close to the Pareto-optimal solution.

This is pretty strange for a game that has only one round and no collusion (you'd expect it to end like the Prisoner's Dilemma, no?).

It is rather different from the dollar auction, which has no Nash equilibrium but always deviates far from the Pareto-optimal solution.

I suspect that this game being one-round-only actually improves the Pareto efficiency of its outcomes:

Maybe if both participants are allowed to change their bid after seeing each other's bid they WILL go into a downward spiral one cent by one cent until they reach zero or one player gives up at some point with a truce, just like how dollar auctions always stop at some point.

When there is only one round, however, there is no way for a player to make their bid exactly 1 or 2 cents less than the other player's, and bidding any less than that is suboptimal compared to bidding more than the other player. So perhaps there is an incentive against lowering one's bid all the way to zero before the game even starts, just as no one would bid $1000 in the dollar auction's first round.

Stuart_Armstrong:

> you'd expect it to end as Prisoner's Dilemma, no?

I think a key difference is that in PD, (Defect, Cooperate) is a Pareto outcome (you can't make it better for the cooperator without making it worse for the defector). While (0, 0) is far from the Pareto boundary. So people can clearly see that naming numbers around 0 is a massive loss, so they focus on avoiding that loss rather than optimising their game vs the other player.

RocksBasil:

I haven't found any information yet, but I suspect there is a mixed Nash somewhere in TD.

[This comment is no longer endorsed by its author]

Stuart_Armstrong:

There is no mixed Nash equilibrium in the TD example above (see the proof above).

RocksBasil:

Thanks, I forgot the proof before replying to your comment.

You are correct that in PD (D,C) is Pareto, and so the Nash equilibrium (D,D) is much closer to a Pareto outcome than the Nash equilibrium (0,0) of TD is to its Pareto outcomes (somewhere along each person getting a million pounds, give or take a cent).

It is still strange to see a game with only one round and no collusion land pretty close to the optimal, while its repeated version (the dollar auction) deviates badly from the Pareto outcome.

Stuart_Armstrong:

> It is still strange to see a game with only one round and no collusion land pretty close to the optimal, while its repeated version (the dollar auction) deviates badly from the Pareto outcome.

It is a bit strange. It seems this is because in the dollar auction, you can always make your position slightly better unilaterally, in a way that will make it worse once the other player reacts. Iterate enough, and all value is destroyed. But in a one-round game, you can't slide down that path, so you pick by looking at the overall picture.


Go hungry with Almost Free Lunches

Consider the following game, called "Almost Free Lunches" (EDIT: this seems to be a variant of the traveller's dilemma). You name any pound-and-pence amount between £0 and £1,000,000; your opponent does likewise. Then you will both get the lower of the two amounts named.

On top of that, the person who named the highest amount must give £0.02 to the other. If you tie, no extra money changes hands.
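For concreteness, the payoff rule can be sketched in Python (amounts in pence to avoid floating point; the function name is my own):

```python
def payoffs(a, b):
    """Payoffs of Almost Free Lunches, in pence.

    Both players receive the lower of the two named amounts;
    the player who named the higher amount then pays 2p to the other.
    """
    low = min(a, b)
    if a == b:
        return (low, low)          # tie: no transfer
    elif a < b:
        return (low + 2, low - 2)  # A named lower, receives 2p from B
    else:
        return (low - 2, low + 2)

# Naming £999,999.99 against £1,000,000.00:
print(payoffs(99_999_999, 100_000_000))  # (100000001, 99999997)
```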

What's the Nash equilibrium of this game? Well:

  • The only Nash equilibrium of Almost Free Lunches is for both of you to name £0.00.

Proof: Suppose player A has a probability distribution pA over possible amounts to name, and player B has a probability distribution pB over possible amounts. Let mA be the highest amount such that pA(mA) is non-zero; let mB be the same, for B. Assume that (pA,pB) is a Nash equilibrium.

Assume further that mA≥mB (if that's not the case, then just switch the labels A and B). Then either mA> £0.00 or mA= £0.00 (and hence both players select £0.00).

We'll now rule out mA> £0.00. If mB> £0.00, then player A can improve their score by replacing mA with m′A=mB−£0.01. To see this, assume that player B has said nB, and player A has said mA. If nB<m′A<mA, then player A can say m′A just as well as mA - either choice gives them the same amount (namely, nB− £0.02).

There remain two other cases. If nB=m′A, then m′A is superior to mA, getting m′A (rather than m′A− £0.02). And if nB=mB, then m′A gets m′A+ £0.02 =mB+ £0.01, rather than mB (if mA=mB) or mB−£0.02 (if mA>mB).

Finally, if mB= £0.00, then player A gets -£0.02 unless they also say £0.00.

Hence if mA > £0.00, then pA cannot be part of a Nash equilibrium. Thus mA = £0.00, and hence the only Nash equilibrium is at both players saying £0.00.
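The pure-strategy case can also be sanity-checked by brute force on a scaled-down version of the game (a sketch of my own, with amounts capped at 100p rather than £1,000,000; the payoff rule is restated locally):

```python
def payoffs(a, b):
    """Pence payoffs: both get the lower amount; higher namer pays 2p."""
    low = min(a, b)
    if a == b:
        return (low, low)
    return (low + 2, low - 2) if a < b else (low - 2, low + 2)

MAX = 100  # pence; a small stand-in for £1,000,000

def is_pure_nash(a, b):
    ua, ub = payoffs(a, b)
    # Nash: no profitable unilateral deviation for either player
    return (all(payoffs(x, b)[0] <= ua for x in range(MAX + 1)) and
            all(payoffs(a, y)[1] <= ub for y in range(MAX + 1)))

equilibria = [(a, b) for a in range(MAX + 1)
              for b in range(MAX + 1) if is_pure_nash(a, b)]
print(equilibria)  # [(0, 0)]
```

Only (0, 0) survives: from any tie above zero, undercutting by a penny gains a penny, and against any positive number the best reply is to undercut, unravelling all the way down.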

Pareto optimal

There are three Pareto-optimal outcomes: (£1,000,000.00, £1,000,000.00), (£1,000,000.01, £999,999.97), and (£999,999.97, £1,000,000.01). All of them are very much above the Nash Equilibrium.

Minmax and maximin

The minmax and maximin values are also both terrible, and also equal to £0.00. This is not surprising, though, as minmax and maximin implicitly assume the other players are antagonistic to you, and are trying to keep your profits low.

Arbitrary badness with two options

This shows that choosing the Nash Equilibrium can be worse than almost every other option. We can of course increase the maximal amount, and get the Nash Equilibrium to be arbitrarily worse than any reasonable solution (I would just say either £1,000,000.00 or £999,999.99, and leave it at that).

But we can also make the Nash Equilibrium arbitrarily close to the worst possible outcome, and that without even requiring more than two options for each player.

Assume that there are four ordered amounts of money/utility: n3>n2>n1>n0. Each player can name n2 or n1. If they both name the same amount, they each get that amount of utility. If they name different ones, then the player naming n2 gets n0, and the player naming n1 gets n3.

By the same argument as above, the only Nash equilibrium is for both to name n1. The maximum possible amount is n3; the maximum they can get if they both coordinate is n2, the Nash equilibrium is n1, and the worst option is n0. We can set n1=n0+ϵ and n3=n2+ϵ for arbitrarily tiny ϵ>0, while setting n2 to be larger than n1 by some arbitrarily high amount.
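This claim can be checked by enumerating both players' options, with illustrative numbers of my own choosing (n0=0, n1=1, n2=1000, n3=1001, so ϵ=1 and n2 is far above n1):

```python
# Illustrative values: n0 = 0, n1 = 1, n2 = 1000, n3 = 1001.
N0, N1, N2, N3 = 0, 1, 1000, 1001

def payoff(me, other):
    """Utility to `me` when each player names N1 or N2."""
    if me == other:
        return me                   # same name: both get that amount
    return N3 if me == N1 else N0   # lower namer gets n3, higher gets n0

equilibria = []
for a in (N1, N2):
    for b in (N1, N2):
        a_stays = all(payoff(x, b) <= payoff(a, b) for x in (N1, N2))
        b_stays = all(payoff(y, a) <= payoff(b, a) for y in (N1, N2))
        if a_stays and b_stays:
            equilibria.append((a, b))
print(equilibria)  # [(1, 1)] -- the equilibrium pays n1, a whisker above the worst outcome n0
```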

So the situation is as bad as it could possibly be.

Note that this is a variant of the prisoner's dilemma with different numbers. You could describe it as "Your companion goes to a hideous jail if and only if you defect (and vice versa). Those that don't defect will also get a dust speck in their eye."

Mentioned in:
- Self-confirming predictions can be arbitrarily bad
- Why the empirical results of the Traveller's Dilemma deviate strongly away from the Nash Equilibrium and seems to be close to the social optimum?