Real World Solutions to Prisoners' Dilemmas

by Scott Alexander, 3rd Jul 2012

Why should there be real world solutions to Prisoners' Dilemmas? Because such dilemmas are a real-world problem.

If I am assigned to work on a school project with a group, I can either cooperate (work hard on the project) or defect (slack off while reaping the rewards of everyone else's hard work). If everyone defects, the project doesn't get done and we all fail - a bad outcome for everyone. If I defect but you cooperate, then I get to spend all day on the beach and still get a good grade - the best outcome for me, the worst for you. And if we all cooperate, then it's long hours in the library but at least we pass the class - a “good enough” outcome, though not quite as good as me defecting against everyone else's cooperation. This exactly mirrors the Prisoner's Dilemma.
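To make the structure of the group-project example concrete, here is a minimal sketch in Python. The payoff numbers are illustrative assumptions (the text gives only the ordering of outcomes), and the helper names `is_prisoners_dilemma` and `best_reply` are made up for this example. It checks that the outcomes are ordered the way a Prisoner's Dilemma requires, and that slacking off is the best reply whatever the other person does.

```python
# Illustrative payoffs (higher is better). The numbers are assumptions;
# only their ordering comes from the group-project story.
# Keys are (my_move, your_move); values are my payoff.
PAYOFF = {
    ("D", "C"): 5,  # I slack off, you do the work: best for me
    ("C", "C"): 3,  # we both work: "good enough"
    ("D", "D"): 1,  # nobody works and we all fail
    ("C", "D"): 0,  # I work, you slack off: worst for me
}


def is_prisoners_dilemma(payoff):
    """True if temptation > reward > punishment > sucker's payoff."""
    temptation = payoff[("D", "C")]
    reward = payoff[("C", "C")]
    punishment = payoff[("D", "D")]
    sucker = payoff[("C", "D")]
    return temptation > reward > punishment > sucker


def best_reply(payoff, other_move):
    """My payoff-maximizing move if the other player's move is held fixed."""
    return max("CD", key=lambda my_move: payoff[(my_move, other_move)])


print(is_prisoners_dilemma(PAYOFF))  # True
print(best_reply(PAYOFF, "C"))       # D
print(best_reply(PAYOFF, "D"))       # D
```

Defecting is the best reply in both cases, even though mutual cooperation (3 each) beats mutual defection (1 each). That gap is the dilemma.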

Diplomacy - both the concept and the board game - involves Prisoners' Dilemmas. Suppose Ribbentrop of Germany and Molotov of Russia agree to a peace treaty that demilitarizes their mutual border. If both cooperate, they can move their forces to other theaters, and have moderate success there - a good enough outcome. If Russia cooperates but Germany defects, it can launch a surprise attack on an undefended Russian border and enjoy spectacular success there (for a while, at least!) - the best outcome for Germany and the worst for Russia. But if both defect, then neither has any advantage at the German-Russian border, and they lose the use of those troops in other theaters as well - a bad outcome for both. Again, the Prisoner's Dilemma.

Civilization - again, both the concept and the game - involves Prisoners' Dilemmas. If everyone follows the rules and creates a stable society (cooperates), we all do pretty well. If everyone else works hard and I turn barbarian and pillage you (defect), then I get all of your stuff without having to work for it and you get nothing - the best solution for me, the worst for you. If everyone becomes a barbarian, there's nothing to steal and we all lose out. Prisoner's Dilemma.

If everyone who worries about global warming cooperates in cutting emissions, climate change is averted and everyone is moderately happy. If everyone else cooperates in cutting emissions, but one country defects, climate change is still mostly averted, and the defector is at a significant economic advantage. If everyone defects and keeps polluting, the climate changes and everyone loses out. Again a Prisoner's Dilemma.

Prisoners' Dilemmas even come up in nature. In baboon tribes, when a female is in “heat”, males often compete for the chance to woo her. The most successful males are those who can get a friend to help fight off the other monkeys, and who then help that friend find his own monkey loving. But these monkeys are tempted to take their friend's female as well. Two males who cooperate each seduce one female. If one cooperates and the other defects, the defector has a good chance at both females. But if the two can't cooperate at all, then they will be beaten off by other monkey alliances and won't get to have sex with anyone. Still a Prisoner's Dilemma!

So one might expect the real world to have produced some practical solutions to Prisoners' Dilemmas.

One of the best known such systems is called “society”. You may have heard of it. It boasts a series of norms, laws, and authority figures who will punish you when those norms and laws are broken.

Imagine that the two criminals in the original example were part of a criminal society - let's say the Mafia. The Godfather makes Alice and Bob an offer they can't refuse: turn against one another, and they will end up “sleeping with the fishes” (this concludes my knowledge of the Mafia). Now the incentives are changed: defecting against a cooperator doesn't mean walking free, it means getting murdered.

Both prisoners cooperate, and amazingly the threat of murder ends up making them both better off (this is also the gist of some of the strongest arguments against libertarianism: in Prisoner's Dilemmas, threatening force against rational agents can increase the utility of all of them!)
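A rough sketch of how the Godfather's threat changes the game, in Python. Everything numeric here is an assumption: payoffs are negative years in jail, and `MURDER` is an arbitrary large penalty applied to anyone who defects. The helper names `with_godfather` and `pure_nash` are hypothetical. The code lists the outcomes from which neither player gains by switching moves on their own.

```python
from itertools import product

# Payoffs as negative years in jail (assumed numbers).
# Keys are (my_move, other_move); values are my payoff.
BASE = {
    ("D", "C"): 0,    # I talk, my partner stays silent: I walk free
    ("C", "C"): -1,   # we both stay silent
    ("D", "D"): -5,   # we both talk
    ("C", "D"): -10,  # I stay silent, my partner talks
}

MURDER = -1000  # hypothetical cost of "sleeping with the fishes"


def with_godfather(payoff, penalty):
    """Return a copy of the game in which every defection incurs the penalty."""
    return {
        (mine, theirs): value + (penalty if mine == "D" else 0)
        for (mine, theirs), value in payoff.items()
    }


def pure_nash(payoff):
    """Pure-strategy Nash equilibria of a symmetric two-player game."""
    equilibria = []
    for a, b in product("CD", repeat=2):
        a_content = all(payoff[(a, b)] >= payoff[(alt, b)] for alt in "CD")
        b_content = all(payoff[(b, a)] >= payoff[(alt, a)] for alt in "CD")
        if a_content and b_content:
            equilibria.append((a, b))
    return equilibria


print(pure_nash(BASE))                          # [('D', 'D')]
print(pure_nash(with_godfather(BASE, MURDER)))  # [('C', 'C')]
```

Under these assumed numbers the only stable outcome moves from mutual defection (five years each) to mutual cooperation (one year each), which is the sense in which the threat leaves both prisoners better off.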

Even when there is no godfather, society binds people by concern about their “reputation”. If Bob got a reputation as a snitch, he might never be able to work as a criminal again. If a student gets a reputation for slacking off on projects, she might get ostracized on the playground. If a country gets a reputation for backstabbing, others might refuse to make treaties with them. If a person gets a reputation as a bandit, she might incur the hostility of those around her. If a country gets a reputation for not doing enough to fight global warming, it might...well, no one ever said it was a perfect system.

Aside from humans in society, evolution is also strongly motivated to develop a solution to the Prisoner's Dilemma. The Dilemma troubles not only lovestruck baboons, but ants, minnows, bats, and even viruses. Here the payoff is denominated not in years of jail time, nor in dollars, but in reproductive fitness and number of potential offspring - so evolution will certainly take note.

Most people, when they hear the rational arguments in favor of defecting every single time on the iterated 100-crime Prisoner's Dilemma, will feel some kind of emotional resistance. Thoughts like “Well, maybe I'll try cooperating anyway a few times, see if it works”, or “If I promised to cooperate with my opponent, then it would be dishonorable for me to defect on the last turn, even if it helps me out”, or even “Bob is my friend! Think of all the good times we've had together, robbing banks and running straight into waiting police cordons. I could never betray him!”

And if two people with these sorts of emotional hangups play the Prisoner's Dilemma together, they'll end up cooperating on all hundred crimes, getting out of jail in a mere century and leaving rational utility maximizers to sit back and wonder how they did it.

Here's how: imagine you are a supervillain designing a robotic criminal (who's that go-to supervillain Kaj always uses for situations like this? Dr. Zany? Okay, let's say you're him). You expect to build several copies of this robot to work as a team, and expect they might end up playing the Prisoner's Dilemma against each other. You want them out of jail as fast as possible so they can get back to furthering your nefarious plots. So rather than have them bumble through the whole rational utility maximizing thing, you just insert an extra line of code: “in a Prisoner's Dilemma, always cooperate with other robots”. Problem solved.

Evolution followed the same strategy (no it didn't; this is a massive oversimplification). The emotions we feel around friendship, trust, altruism, and betrayal are partly a built-in hack to succeed in cooperating on Prisoner's Dilemmas where a rational utility-maximizer would defect a hundred times and fail miserably. The evolutionarily dominant strategy is commonly called “Tit-for-tat” - basically, cooperate on the first round, and after that cooperate if and only if your opponent did so last time.
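As a minimal sketch of the hundred-round game, the Python below pits Tit-for-tat (open with cooperation, then copy the opponent's previous move) against an always-defect player. The per-round point values are illustrative assumptions, and the function names (`tit_for_tat`, `always_defect`, `play`) are invented for this example.

```python
# Points per round, higher is better. The numbers are assumptions.
# Keys are (my_move, other_move); values are my points.
PAYOFF = {("D", "C"): 5, ("C", "C"): 3, ("D", "D"): 1, ("C", "D"): 0}


def tit_for_tat(my_history, their_history):
    """Cooperate first, then repeat whatever the opponent did last round."""
    return "C" if not their_history else their_history[-1]


def always_defect(my_history, their_history):
    """Defect every round, as the backward-induction argument recommends."""
    return "D"


def play(strategy_a, strategy_b, rounds=100):
    """Run an iterated Prisoner's Dilemma and return both total scores."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(history_a, history_b)
        move_b = strategy_b(history_b, history_a)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b


print(play(tit_for_tat, tit_for_tat))      # (300, 300)
print(play(always_defect, always_defect))  # (100, 100)
print(play(tit_for_tat, always_defect))    # (99, 104)
```

Two Tit-for-tat players end up with three times the score of two constant defectors, while a Tit-for-tat player who meets a defector loses only the first round's worth of points. This says nothing about which strategy wins in a mixed population; that depends on what other strategies are in the pool.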

This so-called “superrationality” appears even more clearly in the Ultimatum Game. Two players are given $100 to distribute among themselves in the following way: the first player proposes a distribution (for example, “Fifty for me, fifty for you”) and then the second player either accepts or rejects the distribution. If the second player accepts, the players get the money in that particular ratio. If the second player refuses, no one gets any money at all.

The first player's reasoning goes like this: “If I propose $99 for myself and $1 for my opponent, that means I get a lot of money and my opponent still has to accept. After all, she prefers $1 to $0, which is what she'll get if she refuses.”
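The first player's reasoning is a small backward induction, sketched below in Python. It assumes whole-dollar offers and a second player who accepts any offer that gives her more than nothing. The names `accepts_any_positive_offer` and `best_offer` are hypothetical.

```python
TOTAL = 100  # dollars to be split


def accepts_any_positive_offer(offer):
    """A purely self-interested responder: $1 beats $0, so she accepts."""
    return offer > 0


def best_offer(accepts, total=TOTAL):
    """The whole-dollar offer that maximizes what the proposer keeps,
    given a rule predicting which offers the responder accepts."""
    def proposer_keeps(offer):
        return total - offer if accepts(offer) else 0
    return max(range(total + 1), key=proposer_keeps)


offer = best_offer(accepts_any_positive_offer)
print(offer, TOTAL - offer)  # 1 99
```

The same function can be handed a different acceptance rule, for example `lambda offer: offer >= 50`, in which case the proposer's best offer becomes 50. The hard part is making the proposer believe that rule.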

In the Prisoner's Dilemma, when players were able to communicate beforehand they could settle upon a winning strategy of precommitting to reciprocate: to take an action beneficial to their opponent if and only if their opponent took an action beneficial to them. Here, the second player should consider the same strategy: precommit to an ultimatum (hence the name) that unless Player 1 distributes the money 50-50, she will reject the offer.

But as in the Prisoner's Dilemma, this fails when you have no reason to expect your opponent to follow through on her precommitment. Imagine you're Player 2, playing a single Ultimatum Game against an opponent you never expect to meet again. You dutifully promise Player 1 that you will reject any offer less than 50-50. Player 1 offers 80-20 anyway. You reason “Well, my ultimatum failed. If I stick to it anyway, I walk away with nothing. I might as well admit it was a good try, give in, and take the $20. After all, rejecting the offer won't magically bring my chance at $50 back, and there aren't any other dealings with this Player 1 guy for it to influence.”

This is seemingly a rational way to think, but if Player 1 knows you're going to think that way, she offers 99-1, same as before, no matter how sincere your ultimatum sounds.

Notice all the similarities to the Prisoner's Dilemma: playing as a "rational economic agent" gets you a bad result, it looks like you can escape that bad result by making precommitments, but since the other player can't trust your precommitments, you're right back where you started.

If evolutionary solutions to the Prisoners' Dilemma look like trust or friendship or altruism, solutions to the Ultimatum Game involve different emotions entirely. The Sultan presumably does not want you to elope with his daughter. He makes an ultimatum: “Touch my daughter, and I will kill you.” You elope with her anyway, and when his guards drag you back to his palace, you argue: “Killing me isn't going to reverse what happened. Your ultimatum has failed. All you can do now by beheading me is get blood all over your beautiful palace carpet, which hurts you as well as me - the equivalent of pointlessly passing up the last dollar in an Ultimatum Game where you've just been offered a 99-1 split.”

The Sultan might counter with an argument from social institutions: “If I let you go, I will look dishonorable. I will gain a reputation as someone people can mess with without any consequences. My choice isn't between bloody carpet and clean carpet, it's between bloody carpet and people respecting my orders, or clean carpet and people continuing to defy me.”

But he's much more likely to just shout an incoherent stream of dreadful Arabic curse words. Because just as friendship is the evolutionary solution to a Prisoner's Dilemma, so anger is the evolutionary solution to an Ultimatum Game. As various gurus and psychologists have observed, anger makes us irrational. But this is the good kind of irrationality; it's the kind of irrationality that makes us pass up a 99-1 split even though the decision costs us a dollar.

And if we know that humans are the kind of life-form that tends to experience anger, then if we're playing an Ultimatum Game against a human, and that human precommits to rejecting any offer less than 50-50, we're much more likely to believe her than if we were playing against a rational utility-maximizing agent - and so much more likely to give the human a fair offer.
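The role of believability can be put in rough numbers. In the Python sketch below, `p_follow_through` is the proposer's estimate of the chance that the responder really will reject a lowball offer. The 99-1 and 50-50 splits come from the text, while the probabilities, the function names, and the assumption that a fair offer is always accepted are illustrative.

```python
FAIR_KEEP = 50    # proposer's share under a 50-50 offer
GREEDY_KEEP = 99  # proposer's share under a 99-1 offer


def expected_keep(keep, p_rejected):
    """Expected dollars kept by the proposer if the offer is rejected
    with probability p_rejected."""
    return (1 - p_rejected) * keep


def proposer_choice(p_follow_through):
    """Pick the offer with the higher expected payoff for the proposer,
    assuming a fair offer is never rejected."""
    greedy = expected_keep(GREEDY_KEEP, p_follow_through)
    fair = expected_keep(FAIR_KEEP, 0.0)
    return "fair" if fair > greedy else "greedy"


# Probability of rejection at which the two offers are equally good.
break_even = 1 - FAIR_KEEP / GREEDY_KEEP

print(proposer_choice(0.1))  # greedy
print(proposer_choice(0.9))  # fair
print(round(break_even, 3))  # 0.495
```

Under these assumptions the responder does not need to be certain to retaliate. A little under a coin flip's chance of an angry rejection is enough to make the fair offer the better bet.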

It is distasteful and a little bit contradictory to the spirit of rationality to believe it should lose out so badly to simple emotion, and the problem might be correctable. Here we risk crossing the poorly charted border between game theory and decision theory and reaching ideas like timeless decision theory: that one should act as if one's choices determined the output of the algorithm one instantiates (or more simply, you should assume everyone like you will make the same choice you do, and take that into account when choosing.)
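The parenthetical simplification ("assume everyone like you will make the same choice you do") can be illustrated with a toy comparison in Python. This is only a sketch of that simplification, not an implementation of timeless decision theory. The payoffs are assumed numbers and both function names are invented for the example.

```python
# Points, higher is better. The numbers are assumptions.
# Keys are (my_move, other_move); values are my points.
PAYOFF = {("D", "C"): 5, ("C", "C"): 3, ("D", "D"): 1, ("C", "D"): 0}


def choice_holding_other_fixed(payoff, other_move):
    """Best move if my choice has no bearing on what the other player does."""
    return max("CD", key=lambda move: payoff[(move, other_move)])


def choice_if_other_mirrors_me(payoff):
    """Best move if the other player is assumed to choose exactly as I do,
    so that only (C, C) and (D, D) are reachable outcomes."""
    return max("CD", key=lambda move: payoff[(move, move)])


print(choice_holding_other_fixed(PAYOFF, "C"))  # D
print(choice_holding_other_fixed(PAYOFF, "D"))  # D
print(choice_if_other_mirrors_me(PAYOFF))       # C
```

The payoff table is the same in both cases. What changes is which outcomes the chooser treats as possible.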

More practically, however, most real-world solutions to Prisoner's Dilemmas and Ultimatum Games still hinge on one of three things: threats of reciprocation when the length of the game is unknown, social institutions and reputation systems that make defection less attractive, and emotions ranging from cooperation to anger that are hard-wired into us by evolution. In the next post, we'll look at how these play out in practice.

88 comments

Yay Dr. Zany! And a good post in general.

However, Western behavior in the Ultimatum Game seems to be a cultural, not biological, phenomenon.

By the mid‐1990s researchers were arguing that a set of robust experimental findings from behavioral economics were evidence for set of evolved universal motivations (Fehr & Gächter 1998, Hoffman et al. 1998). Foremost among these experiments, the Ultimatum Game, provides a pair of anonymous subjects with a sum of real money for a one‐shot interaction. One of the pair—the proposer—can offer a portion of this sum to a second subject, the responder. Responders must decide whether to accept or reject the offer. If a responder accepts, she gets the amount of the offer and the proposer takes the remainder; if she rejects both players get zero. If subjects are motivated purely by self‐interest, responders should always accept any positive offer; knowing this, a self‐interested proposer should offer the smallest non‐zero amount. Among subjects from industrialized populations—mostly undergraduates from the U.S., Europe, and Asia—proposers typically offer an amount between 40% and 50% of the total, with a modal offer of usually 50% (Camerer 2003).

[...]
shminux (9 points, 9y): I'm guessing that the results would be significantly affected by the perceived relative status, as the offer can be more about signaling than rational choice. If the two players happen to perceive the relative status similarly, or if the second player perceives equal or larger status disparity, s/he will likely accept. Maybe even think of the first player as foolish for offering too much and being a lousy bargainer. A rejection would often be due to the status-related outrage ("Who does s/he think s/he is to offer me only a pittance?") So, if you think that being in control of how much to offer raises your status, you are likely to offer less, and if you think that not having any say in the amount automatically makes you lower status, you would be likely to accept a low offer. Thus I would expect that in a society where equality is not considered an unalienable right, but is rather determined by material possessions, the average accepted offer would be lower. Not sure if this matches the experimental results.
Vaniver (5 points, 9y): Thanks for posting one of the comparative ultimatum game studies! I knew they were out there but didn't remember quite where.

There are many problems here.

At the end of paragraph 2 and the other examples, you say

This exactly mirrors the Prisoner's Dilemma.

But it doesn't, as you point out later in the post, because the payoff matrix isn't D-C > C-C > D-D, as you explain, but rather C-C > D-C > C-D, because of reputational effects, which is not a prisoner's dilemma. "Prisoner's dilemma" is a very specific term, and you are inflating it.

evolution is also strongly motivated [...] evolution will certainly take note.

I doubt that quite strongly!

The evolutionarily dominant strategy is commonly called “Tit-for-tat” - basically, cooperate if and only if you expect your opponent to do so.

That is not tit-for-tat! Tit-for-tat is start with cooperate and then parrot the opponent's previous move. It does not do what it "expects" the opponent to do. Furthermore, if you categorically expect your opponent to cooperate, you should defect (just like you should if you expect him to defect). You only cooperate if you expect your opponent to cooperate if he expects you to cooperate ad nauseam.

This so-called "superrationality” appears even more [...]

That is not superrationalit...

I agree with pretty much everything you've said here, except:

You only cooperate if you expect your opponent to cooperate if he expects you to cooperate ad nauseam.

You don't actually need to continue this chain - if you're playing against any opponent which cooperates iff you cooperate, then you want to cooperate - even if the opponent would also cooperate against someone who cooperated no matter what, so your statement is also true without the "ad nauseam" (provided the opponent would defect if you defected).

Grognor (2 points, 9y): You're right. I assumed symmetry, which was wrong.
shokwave (9 points, 9y): You're reading this uncharitably. There are also parts that are unclear on Yvain's part, sure, but not to the extent that you claim. The original group project situation Yvain explores does mirror the Prisoner's Dilemma. Then, later, he introduces reputational effects to illustrate one of the Real World Solutions to the Prisoner's Dilemma [http://lesswrong.com/lw/del/real_world_solutions_to_prisoners_dilemmas/] that we have already developed. It's not made crystal clear.... Well, actually it is. ... I understood Yvain to be speaking metaphorically, or perhaps tongue-in-cheek, when talking about what evolution would take note of. I believe this was his intention, and furthermore is a reasonable reading given our knowledge of Yvain. I expect that Yvain used 'rational' against the theme of LW on purpose, to create a tension - rationality failing to outperform emotional hangups is a contradiction, that would motivate readers to find the false premise or re-analyse the situation. I do concur with your point about tit-for-tat. Similarly for super-rationality; although it's possible Yvain is not familiar with Hofstadter's definition and was using 'super' as an intensifier, it seems unlikely.
wedrifid (7 points, 9y): I had this downvoted based on form and irritating tone before I looked closely and decided enough of the quotes from Yvain are, indeed, plainly wrong and I encourage hearty dismissal. Agree. Who is he and what has he done to the real Yvain?
cousin_it (4 points, 9y): Great critique! The first time I read the post, I stopped reading when "tit-for-tat" and "superrationality" were misused in two consecutive sentences. Sadly, that part seems to be still inaccurate after Yvain edited it, because TFT is not dominant in the 100-fold repeated PD, if the strategy pool contains strategies that feed on TFT [http://lesswrong.com/lw/7l3/fixedlength_selective_iterative_prisoners_dilemma/].
wedrifid (0 points, 9y): To be fair he doesn't seem to make the claim that TFT is dominant in the fixed length iterated PD. (I noticed how outraged I was that Yvain was making such a basic error so I thought I should double check before agreeing emphatically!) Even so I'm not comfortable with just saying TFT is "evolutionarily dominant" in completely unspecified circumstances.
Scott Alexander (1 point, 9y): You'll notice I used scare quotes around most of the words you objected to. I'm trying to point out the apparent paradox, using the language that game theorists and other people not already on this website would use, without claiming that the paradox is real or unsolvable.
Vladimir_Nesov (4 points, 9y): "This so-called "superrationality” " in the post is still wrong, I think. Would work without "so-called", since the meaning is clear from the context, but it's not conventional usage.
MarkusRamikin (1 point, 9y): This is disturbing. I had been looking forward to the sequence, but based on this comment and some others I'm starting to wonder how accurate it is overall.

This is disturbing. I had been looking forward to the sequence, but based on this comment and some others I'm starting to wonder how accurate it is overall.

Overall, it is accurate. There are nits to pick, sure, and some loose language, but the actual math and theory is solid so far. The generalisations to real life

If I am assigned to work on a school project with a group ... Diplomacy - both the concept and the board game ... Civilization - again, both the concept and the game ... global warming ... etc

are correct as well.

sixes_and_sevens (1 point, 9y): By definition [http://lesswrong.com/lw/nz/arguing_by_definition/]? I am largely appreciative of your overall comment, but "rational" is a historically legitimate term to describe naive utility-maximisers in this manner. The original post introduced it in inverted commas, suggesting a special usage of the term. While there are less ambiguous ways this could have been expressed, it seems to me the main benefit of doing so would be to pre-empt people complaining about an unfavourable usage of the term "rational". Your response to it seems excessive.
VincentYu (7 points, 9y): Agreed. I want to point out that Eliezer's (and LW's general) use of the word 'rationality' [http://lesswrong.com/lw/31/what_do_we_mean_by_rationality/] is entirely different from the use of the word in the game theory literature, where it usually means VNM-rationality [https://en.wikipedia.org/wiki/Von_Neumann%E2%80%93Morgenstern_utility_theorem], or is used to elaborate concepts like sequential rationality [http://scholar.google.com/scholar?q=%22sequential+rationality%22] in SPNE [https://en.wikipedia.org/wiki/Subgame_perfect_equilibrium]-type equilibria. ETA: Reading Grognor's reply to the parent, it seems that much of the negative affect is due to inconsistent use of the word 'rational(ity)' on LW. Maybe it's time to try yet again [http://lesswrong.com/lw/2z/taboo_rationality_please/] to taboo LW's 'rationality' to avoid the namespace collision with academic literature.

I want to point out that Eliezer's (and LW's general) use of the word 'rationality' is entirely different from the use of the word in the game theory literature

And the common usage of 'rational' on lesswrong should be different to what is used in a significant proportion of game theory literature. Said literature gives advice, reasoning and conclusions that are epistemically, instrumentally and normatively bad. According to the basic principles of the site it is in fact stupid and not-rational to defect against a clone of yourself in a true Prisoner's Dilemma. A kind of stupidity that is not too much different to being 'rational' like Spock.

ETA: Reading Grognor's reply to the parent, it seems that much of the negative affect is due to inconsistent use of the word 'rational(ity)' on LW. Maybe it's time to try yet again to taboo LW's 'rationality' to avoid the namespace collision with academic literature.

No. The themes of epistemic and instrumental rationality are the foundational premise of the site. It is right there in the tagline on the top of the page. I oppose all attempts to replace instrumental rationality with something that involves doing stupid things.

I do endorse avoiding excessive use of the word.

VincentYu (6 points, 9y): This is a recurring issue, so perhaps my instructor and textbooks were atypical: we never discussed or even cared whether someone should defect on PD in my game theory course. The bounds were made clear to us in lecture – game theory studies concepts like Nash equilibria and backward induction (using the term 'rationality' to mean VNM-rationality) and applies them to situations like PD; that is all. The use of any normative language in homework sets or exams was pretty much automatically marked incorrect. What one 'should' or 'ought' to do were instead relegated to other courses in, e.g., economics, philosophy, political science. I'd like to know from others if this is a typical experience from a game theory course (and if anyone happens to be working in the field: if this is representative of the literature). Upon reflection, I tend to agree with these statements. In this case, perhaps we should taboo 'rationality' in its game theoretic meaning – use the phrase 'VNM-rationality' whenever that is meant instead of LW's 'rationality'.
[anonymous] (-1 points, 9y): How about let's not taboo anything, just make it clear up front what is meant when really necessary. I would prefer that because I think such taboos contribute to the entry barrier for every LW newcomer; I don't want newcomers used to the game-theoretic jargon to keep unwittingly running afoul of this and getting downvoted. Perhaps it would be useful if Yvain inserted a clarification of this early in the Sequence.
wedrifid (-1 points, 9y): The normative claim is one I am making now about the 'rationality' theories in question. It is the same kind of normative claim I make when I say "empirical tests are better than beliefs from ad baculum". I could agree to that---conditional on confirmation from one of the Vladimirs that the axioms in question do, in fact, imply the faux-rational (CDT like) conclusions the term would be used to represent. I don't actually see it at a glance and would expect another hidden assumption to be required. I wouldn't be comfortable using the term without confirmation.
VincentYu (-1 points, 9y): I quoted badly; I believe there was a misunderstanding. The first quote in the parent to this should be taken in the context of your sentence segment that "Said literature gives advice". In my paragraph, I was objecting to this from my experiences in my course, where I did not receive any advice on what to do in games like PD. Instead, the type of advice that I received was on how to calculate Nash equilibria and find SPNEs. Otherwise, I am mostly in agreement with the latter part of that sentence. (ETA: That is, I agree that if current game theoretic equilibrium solutions are taken as advice on what one ought to do, then that is often epistemically, instrumentally, and normatively bad.) More ETA: You are correct – VNM-rationality is incredibly weak (though humans don't satisfy it [https://en.wikipedia.org/wiki/Allais_paradox]). It is, after all, logically equivalent to the existence of a utility function (the proof of this by von Neumann and Morgenstern led to the eponymous VNM theorem [https://en.wikipedia.org/wiki/Von_Neumann%E2%80%93Morgenstern_utility_theorem]). The faux-rationality on LW and in popular culture requires much stronger assumptions. But again, I don't think these assumptions are made in the game theory literature – I think that faux-rationality is misattributed to game theory. The game theory I was taught used only VNM-rationality, and gave no advice.
drnickbone (0 points, 9y): I remember hearing about studies where economics and game theory students ended up less "moral" by many usual measures after completing their courses. Less inclined to co-operate, more likely to lie and cheat, more concerned about money, more likely to excuse overtly selfish behaviour and so on. And then these fine, new, upstanding citizens, went on to become the next generation of bankers, traders, stock-brokers, and advisers to politicians and industry. The rest as they say is history.
PhilosophyTutor (-2 points, 9y): Said literature makes statements about what is game-theory-rational. Those statements are only epistemically, instrumentally or normatively bad if you take them to be statements about what is LW-rational or "rational" in the layperson's sense. Ideally we'd use different terms for game-theory-rational and LW-rational, but in the meantime we just need to keep the distinction clear in our heads so that we don't accidentally equivocate between the two.
wedrifid (0 points, 9y): Disagree on instrumentally and normatively. Agree regarding epistemically---at least when the works are careful with what claims are made. Also disagree with the "game-theory-rational", although I understand the principle you are trying to get at. A more limited claim needs to be made or more precise terminology.
PhilosophyTutor (-2 points, 9y): I would be interested in reading about the bases for your disagreement. Game theory is essentially the exploration of what happens if you postulate entities who are perfectly informed, personal utility-maximisers who do not care at all either way about other entities. There's no explicit or implicit claim that people ought to behave like those entities, thus no normative content whatsoever. So I can't see how the game theory literature could be said to give normatively bad advice, unless the speaker misunderstood the definition of rationality being used, and thought that some definition of rationality was being used in which rationality is normative. I'm not sure what negative epistemic or instrumental outcomes you foresee either, but I'm open to the possibility that there are some. Is there a term you prefer to "game-theory-rational" that captures the same meaning? As stated above, game theory is the exploration of what happens when entities that are "rational" by that specific definition interact with the world or each other, so it seems like the ideal term to me.
wedrifid (0 points, 9y): Under this definition you can't claim epistemic accuracy either. In particular the 'perfectly informed' assumption when combined with the personal utility maximization leads to different behaviors to those described as 'rational'. (It needs to be weakened to "perfectly informed about everything except those parts of the universe that are the other agent".) This isn't about the agents having selfish desires (in fact, they don't even have to "not care at all about other entities"---altruism determines what the utility function is, not how to maximise it.) No, this is about shoddy claims about decision theory that are either connotatively misleading or erroneous depending on how they are framed. All those poor paperclip maximisers who read such sources and take them at face value will end up producing less paperclips than they could have if they knew the correct way to interact with the staples maximisers in contrived scenarios.
Grognor (3 points, 9y): I think this was a legitimate use of "by definition", since it's the definition we use on this website. You're right that "rational" has often meant "blindly crunching numbers without looking at all available information &c." but I thought we had a widespread agreement here not to use the word like that. You're right that my response seems excessive, but I don't know if it actually is excessive rather than merely seeming so.
Viliam_Bur (1 point, 9y): The term "bad rationality [http://wiki.lesswrong.com/wiki/Valley_of_bad_rationality]" is also used on this website. It is a partial rationality, and it may be harmful. On the other hand, as humans, partial rationality is all we have, isn't it? But now I am discussing labels on the map, not the territory.
sixes_and_sevens (-1 points, 9y): You're attaching a negative connotation where there doesn't have to be one. In econ and game theory literature, "rational" means something else, not necessarily something bad. It also refers to something specific. If we want to talk about that specific referent, we have limited options. I would propose suffixing alternative uses of the word "rational" with a disambiguating particle. Thus above, Yvain could have used "econ-rational". If we ever have cause to talk about the Rationalist philosophical tradition, they can be "p-Rationalists". Annoyingly, I don't actually believe we need to do this for disambiguation purposes.
Andreas_Giger (1 point, 9y): Upvoted. By the way, I think you meant to type "C-C > D-C > D-D" instead of "C-C > D-C > D-C", and you might also want to include C-D. Not sure whether C-D is supposed to be last or second last in Yvain's example, because of the reputational effects.

This sequence is really great - thank you for writing it!

Kaj_Sotala (1 point, 9y): Seconded - game theory is awfully useful, and this sequence is doing a great job of explaining it in a clear and engaging way.
wedrifid (2 points, 9y): Do you say this after reading this particular post? The others were good, this one is embarrassing.
Kaj_Sotala (1 point, 9y): I thought this was good, even if he could have been a little more precise with regard to some terms.

Game theory is particularly interesting because it adds up to normalcy so fast - very simple math on very simple situations very rapidly describes real life, macro-level behaviour.

Tom Siegfried has a powerful quote: Game theory captures something about how the world works. (note: bizarre HTML book setup, read pg 73-5)

Tangentially, I now understand exactly what I don't like about Eric S. Raymond's morality:

I am among those who fear... that the U.S. response to 9/11 was not nearly as violent and brutal as it needed to be. To prevent future acts of this kind, it is probably necessary that those who consider them should shit their pants with fear at the mere thought of the U.S.’s reaction.

... the correct response to a person who says “You do not own yourself, but are owned by society (or the state), and I am society (or the state) speaking.” is to injure him as gravely as you think you can get away with ... In fact, I think if you do not do violence in that situation you are failing in a significant ethical duty.

A precommitment to retribution is effective when dealing with "rational" agents or CDT agents. In fact a self-interested TDT agent in a world of CDT agents would do well by retaliating against all injuries with disproportionate force. (And also issuing extortionate threats; to be fair, ESR doesn't advocate this.) If you buy Gary Drescher's reduction of morality to decision theory, this is where the moral duty of revenge comes from. But a superrational agent in a world of supe... (read more)

6Desrtopa9yI think the utility-maximizing reasons to avoid disproportionate punishment in such cases among humans have more to do with likely perpetrators being somewhat blind to such disincentives (remember, these are people who attack others by killing themselves,) and the fact that nations operate in a reputation system where acts of disproportionality which are too large tend to attract negative reputation. Since humans tend to operate under friendship/anger/fairness formulations rather than utility maximizing ones, a sultan who honors a precommitment to cut off the head of a man who elopes with his daughter is seen as reasonable in circumstances (including a historically common degree of protectiveness of one's offspring) where one who massacred the man's entire village to be even more sure of deterring other attempts to cross him would be viewed as cruel and tyrannical.
4[anonymous]7yI'd be curious to see what the results of a tournament of iterated PD with noise (i.e., each move is flipped with probability 5% -- and the opponent will never know the pre-noise move) would be.
4gwern7yhttp://en.wikipedia.org/wiki/Trembling_hand_perfect_equilibrium [http://en.wikipedia.org/wiki/Trembling_hand_perfect_equilibrium] may be a useful starting point.
2[anonymous]8yThe link doesn't go to where I think it was supposed to [http://esr.ibiblio.org/?p=1068].
2Nisan8yThanks!
0[anonymous]8yNot exactly about the same thing, but see this [http://lesswrong.com/lw/24o/eight_short_studies_on_excuses/].
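The noisy iterated PD suggested a few comments up can be sketched in a few lines. This is only a rough sketch under stated assumptions: the payoff numbers (3/0/5/1) are the conventional illustrative ones, not taken from the post, and the function names and the two sample strategies are hypothetical. Each executed move is flipped with probability `noise`, and each player only ever sees the opponent's post-noise move.

```python
import random

# Assumed illustrative payoffs: (my move, their move) -> my score.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def tit_for_tat(seen):
    """Cooperate first, then copy the opponent's last observed move."""
    return seen[-1] if seen else "C"

def always_defect(seen):
    """Defect regardless of what the opponent was seen to do."""
    return "D"

def flip(move):
    """Turn a cooperation into a defection and vice versa."""
    return "D" if move == "C" else "C"

def play_noisy(strat_a, strat_b, rounds, noise, rng):
    """Play an iterated PD in which each executed move is flipped
    with probability `noise`. Returns (score_a, score_b)."""
    seen_by_a, seen_by_b = [], []  # opponent moves as observed (post-noise)
    score_a = score_b = 0
    for _ in range(rounds):
        a = strat_a(seen_by_a)
        b = strat_b(seen_by_b)
        if rng.random() < noise:
            a = flip(a)
        if rng.random() < noise:
            b = flip(b)
        score_a += PAYOFF[(a, b)]
        score_b += PAYOFF[(b, a)]
        seen_by_a.append(b)
        seen_by_b.append(a)
    return score_a, score_b

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    print(play_noisy(tit_for_tat, tit_for_tat, 100, 0.05, rng))
```

A real tournament would pit many strategies against each other over many seeds; this only shows the mechanics of a single noisy match.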

The evolutionarily dominant strategy is commonly called “Tit-for-tat” - basically, cooperate if and only if you expect your opponent to do so.

No, Tit-for-Tat co-operates if and only if the other player co-operated last time. It works only in an iterated Prisoner's Dilemma, where you have multiple interactions with the same player.

Co-operate if and only if you expect the other player to co-operate (because of reputation, emotional behaviour etc.) is a quite different strategy. Strategies with some reputational or prediction element like this will work i... (read more)

The evolutionarily dominant strategy is commonly called “Tit-for-tat” - basically, cooperate if and only if you expect your opponent to do so.

That strategy is neither evolutionarily dominant nor "tit-for-tat". Tit-for-tat is applicable in the Iterated Prisoner's Dilemma with unknown duration and involves cooperating on the first round and thereafter doing whatever the opponent did in the previous round. As the name implies, it is somewhat like a specific implementation of "eye for an eye".

As for evolutionary dominance the... (read more)
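The distinction drawn in the two comments above can be made concrete with a small sketch. It is only illustrative: the function names are hypothetical, and `predicted_move` stands in for whatever reputation or prediction mechanism a player might have.

```python
def tit_for_tat(opponent_history):
    """Cooperate on the first round; afterwards repeat the opponent's previous move."""
    if not opponent_history:
        return "C"
    return opponent_history[-1]

def cooperate_if_expected(predicted_move):
    """The strategy described in the post: cooperate if and only if the
    opponent is predicted to cooperate (the prediction is an assumed input)."""
    return "C" if predicted_move == "C" else "D"

def responses(opponent_moves):
    """Moves tit-for-tat makes against a fixed sequence of opponent moves."""
    return [tit_for_tat(opponent_moves[:i]) for i in range(len(opponent_moves))]

if __name__ == "__main__":
    print(responses(["C", "D", "D", "C"]))  # ['C', 'C', 'D', 'D']
```

Tit-for-tat needs only the observed history, so it is defined only for repeated play; the expectation-based rule needs a prediction, which is why it can also apply to one-shot games with reputation.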

I like it, but suggest that you link back to the previous entry in the sequence and/or the sequence index.

0Kenny6yThe sequence index [http://wiki.lesswrong.com/wiki/Introduction_to_Game_Theory_(Sequence] The previous post – Introduction to Prisoner's Dilemma [http://lesswrong.com/lw/dd3/introduction_to_prisoners_dilemma/] The following post – Interlude for Behavioral Economics [http://lesswrong.com/lw/dgc/interlude_for_behavioral_economics/]
[-][anonymous]9y 7

You may want to be more careful about using game theory on real-world problems. Game theory makes a lot of assumptions (some explicit, others implicit) that most of the time do not hold in real life.

You will even have a hard time finding good examples of real-life prisoners who are in a game-theoretic PD. In reality, most of the time the prisoner's dilemma looks more like this: same payoff matrix as the classical PD, BUT both prisoners may choose to break their silence at any time. Once a prisoner has confessed, there is no going back to silence. T... (read more)

1[anonymous]9yOf course, you can still model this with game theory, but you need to break "turns" into smaller units (Planck seconds, if you want to go all the way). As for iteration versus a one-shot game, you could say either is universally the case, depending on how you define what counts as a repetition of the same game and what is different enough to qualify as a new scenario. So game theory is not broken for real-world problems, but like any theory I have seen, when you scale it up from a simple puzzle to interactions with the universe you make the problem more difficult.

Many uses of the word "rational" here were fine ("rational economic agent" is understandable), but others really bothered me ("It is distasteful and a little bit contradictory to the spirit of rationality to believe it should lose out so badly to simple emotion" -- why perpetuate the Spock myth? I want to show this to my friends!). I have no specific suggestion at hand, but circumlocuting around the word in some of the cases above would bring the article from excellent to perfection.

Excellent article, but the image is too wide so it gets cropped by LW's fixed-width content section. Looks like this. Using Firefox 9.0.

0Vladimir_Nesov9yReduced the width from 800px to 700px (fixed the issue for me).

We all benefit when a salesman defects: instead of cooperating with his competitors to hold the price high, he defects and lowers it for his own gain.

Defection in the economy is what drives this pricing mechanism.

Defecting is just as crucial as cooperating!

2Raoul5898yIn this way, defection seems to have two social meanings: Defecting proactively is betrayal. Defecting reactively is punishment. We seem to have strong negative opinions of the former and somewhat positive opinions of the latter. I think in your salesman example you're talking about punishment being crucial. In fact, the defection of the customer is only necessary as a response to the salesman's original defection. I am curious as to whether you have a similarly real life example of where proactive defection (i.e. betrayal) is crucial (for some societal or group benefit)?
4wedrifid8yAnd for this reason we tend to be predisposed to interpreting the behavior of enemies as 'proactive/betrayal' and our own as 'reactive/punishment' (where we acknowledge that we have defected at all).

Prisoners' Dilemmas even come up in nature. In baboon tribes, when a female is in “heat”, males often compete for the chance to woo her. The most successful males are those who can get a friend to help fight off the other monkeys, and who then helps that friend find his own monkey loving. But these monkeys are tempted to take their friend's female as well. Two males who cooperate each seduce one female. If one cooperates and the other defects, he has a good chance at both females. But if the two can't cooperate at all, then they will be beaten off by othe

... (read more)

Reputation is a way to change many one-time Prisoner's Dilemmas into one big Iterated Prisoner's Dilemma, where mutual cooperation is the best strategy for rational players. But how exactly does it work in real life?

I guess it works better when a small group of people interact again and again; and it works worse in a large group of people where many interactions are with strangers. So we should expect more cooperation in a village than in a big city.

Even in big cities people can create smaller units and interact more frequently within these units. So the... (read more)

8RichardKennaway9yAnd they are -- itinerant people are universally less trusted than the ones with home addresses.
0AspiringRationalist9yPeople are hard-wired to obey social norms and cooperate with members of their communities even in the absence of personal consequences like reputation. Most people would not steal from a stranger even if they knew they would not be caught.
1wedrifid9yPeople are hard-wired to manage reputation through some forms of cooperation without explicitly thinking about the consequences to reputation in each instance.
0TimS9yCan you give any example of people saying the two types of judgment are comparable? As you say, there's a sense in modern society that unchosen traits should not be treated with moral disdain. But the analysis is totally different for chosen traits.
3Viliam_Bur9yTwo real-world examples, but both can also be interpreted differently: * SlutWalks [http://en.wikipedia.org/wiki/SlutWalk] * hoodies [http://www.contracostatimes.com/news/ci_20360913/richmond-marchers-wear-hoodies-call-end-discrimination] In the first case, people are protesting against the claim that "dressing like a slut increases the probability that the woman will be raped". Of course the discussion is not strictly Bayesian, but mostly about connotations. In the second case, people are protesting the fact that looking like a criminal, while not being a criminal, increases their probability of being killed in a supposed self-defense. (The second example seems like an ad-absurdum version of anti-discrimination, but apparently those people mean it.)
0TimS9yBehaviors can change in frequency. Debates about whether to punish behaviors are debates about whether a decrease in frequency of the behavior (dressing sexually provocatively or conforming to the norms of a lower-status subgroup) is desired. By contrast, non-behavior characteristics don't change frequency. Productive social reactions are about whether the characteristic should be accommodated (red heads - yes, ax-crazy murderers - no). The difference in the topics of the two debates makes me think that attempting to draw them in parallel is misleading.
0AspiringRationalist9yWhether a decrease in the frequency of the behavior is desired is only one piece of the debate. Other important pieces (from a consequentialist perspective) include how effective the punishment will be, how costly it will be to implement the punishment and what the side effects will be. Even if, for example, society collectively decides that if fewer women dressed like sluts there would be fewer rapes, it does not immediately follow that dressing that way should be a punishable offense.
-1wedrifid9yInteresting. Source?
1Viliam_Bur9yJust my prediction. An example in my mind was an interaction between a state (represented by some person) and an individual: e.g. if you are entitled to receive unemployment support, you will get it, even if common sense makes it obvious that you are just abusing the rules, as long as you pretend to follow them. This is open to interpretation, but my understanding is: "help unemployed people" = cooperate, "let them die" = defect; "inform truthfully about your employment" = cooperate, "falsely pretend to be unemployed (while making money illegally)" = defect. I suppose there are more examples like that, which could be generalized as: a state (or other big organization) becomes a CooperateBot when trying to achieve a win/win situation, and is abused later.

Another, mostly unrelated comment: the ultimatum game can actually tell you two different things. First, what divisions do people propose, and second, what divisions do people accept?

Presumably, everyone accepts fair divisions. Different groups of people have different percentages that reject unfair divisions, and different percentages that offer unfair divisions (a simplification, since the degree of fairness can also be varied). There are four potential clusters: groups that propose fair and accept unfair, groups that propose fair and reject unfair, grou... (read more)

If I defect but you cooperate, then I get to spend all day on the beach and still get a good grade - the best outcome for me, the worst for you.

! No, it's not. The "you" in this example prefers getting a good grade and privately fuming about having to do the work themselves to failing and not having to do the work. (And actually, for many of the academic group projects I've been involved in, it's less and happier work for the responsible member to do everything themselves, because there's too little work for too many people otherwise.)

The bas... (read more)

0mwengler9yI don't understand this, please explain. Suggesting to "add other considerations to the game until it is no longer the prisoner's dilemma when denominated in utility" seems about equivalent to lying to yourself about what you really want (what your utility is) in order to justify doing something. So if you have a chance to explain what you mean I would appreciate it.
2Vaniver9y"Features" might be a clearer word than "considerations." [edit] For example, consider the group project example. If you just look at the amount of work done, it's a prisoner's dilemma, with the result that no one does any work. When you look at the whole situation, you notice that, oh, grades are involved- and so it's not a prisoner's dilemma at all, because everyone (presumably) prefers passing the project and working to failing the project and not working. That's a 'consideration' or 'feature' that gets added to the game to make it not a PD. "Utility" is the numerical score you assign to the desirability of a particular future, and it has some neat properties when it comes to probabilistic reasoning. For example, if going to the beach has a utility of 5 and hitting yourself in the head with a hammer has a utility of -20, then a 80% chance of going to the beach and a 20% chance of hitting yourself in the head with a hammer has a utility of 0, and you should be indifferent between that gamble and some other action with a utility of 0. If you aren't indifferent, then you mismeasured your utilities! Game theory is the correct way to go from payoff matrices to courses of action, but the issue is that the payoff matrices are subjective. Consider one gamble, in which you and your partner both go free with 95% probability and both go to prison for 20 years with 5% probability. Would you be indifferent between that and it being certain that you would go to prison for 1 year and your partner would go to prison for 20 years? The Alice and Bob Yvain described would be. It's what they really want. Is it any surprise that the right play for monsters like that is to defect? For most humans, that's not what they really want. They do actually care about the well-being of others; they care about having a good reputation and high standing in their community; they care about being proud of themselves and their actions. 
Their utility score is more than just 0 minus the number of y
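The expected-utility arithmetic in the beach/hammer example above can be checked with a short sketch. The function name and the list-of-pairs representation are assumptions made for illustration; the numbers (utility 5 at 80%, utility -20 at 20%) are the ones given in the comment.

```python
def expected_utility(lottery):
    """Expected utility of a lottery given as (probability, utility) pairs.

    Raises ValueError if the probabilities do not sum to 1.
    """
    total_p = sum(p for p, _ in lottery)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * u for p, u in lottery)

if __name__ == "__main__":
    # 80% beach (utility 5), 20% hammer to the head (utility -20):
    # 0.8 * 5 + 0.2 * (-20) = 4 - 4 = 0
    gamble = [(0.8, 5), (0.2, -20)]
    print(expected_utility(gamble))
```

An agent with these utilities should be indifferent between this gamble and any sure outcome worth 0, which is the point the comment is making about mismeasured utilities.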

Most people, when they hear the rational arguments in favor of defecting every single time on the iterated 100-crime Prisoner's Dilemma, will feel some kind of emotional resistance. Thoughts like “Well, maybe I'll try cooperating anyway a few times, see if it works”, or “If I promised to cooperate with my opponent, then it would be dishonorable for me to defect on the last turn, even if it helps me out”, or even “Bob is my friend! Think of all the good times we've had together, robbing banks and running straight into waiting police cordons. I could never

... (read more)
3Scott Alexander9yNo.

though not quite as good as me cooperating against everyone else's defection.

Shouldn't it be the other way around? (you defecting while everyone else cooperates)

ETA: liking this sequence so far, feels like I'm getting the concepts better now.

0[anonymous]9yShhhh! We can totally defect now without feeling the least bit guilty!

What is your source for your baboon anecdote? It is contrary to what I have read, e.g., in Baboon Metaphysics. Or here:

Lower-ranking males form alliances and can harass newly immigrated but dominant males and protect adult females with which they have bonds. Overall, though, dominance rank of a male indicates reproductive success so that high-ranking males have more mating opportunities, more offspring, and increased fitness compared to lower-ranking males

[-][anonymous]9y 0

So far as I can analyse, isn't a hostage negotiation a Prisoner's Dilemma too? (Terrorists can spare (C) or kill (D), Government can pay (C) or raid (D))

4shokwave9yNo, it's a game of Chicken. It's not a Prisoner's Dilemma because when the government pays, the terrorists gain no extra value from killing a hostage.
2Kaj_Sotala9yAlso, if the government cooperates, that will encourage other terrorists to take more hostages later on, making the CC payoff unsymmetrical.
2wedrifid9yI'm actually curious as to whether this has been studied in practice. This is the kind of thing I expect people with big egos to do regardless of whether the actual expected value is positive.
2Andreas_Giger9yIt's not Chicken either, because of the reason you just gave. Edit: Seeing how this post got downvoted with no reply being posted, I have to assume it was someone who doesn't know much about game theory, so let me explain: If the terrorists gain no extra value from killing a hostage if the government pays, then DC > CC is false for the terrorist side; however both PD and Chicken are symmetrical problems that require DC > CC to be true for all sides. Therefore, this problem is neither PD nor Chicken.
0wedrifid9yBut the hostages are (presumably) infidels or something enemy-like.
2Andreas_Giger9yYes, in that case it would actually be PD.
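The ordering argument in this thread can be written down as a small check. This is a sketch using the standard textbook labels, which do not appear in the thread itself: T is the payoff for defecting against a cooperator (DC), R for mutual cooperation (CC), P for mutual defection (DD), and S for cooperating against a defector (CD). The function name and the example numbers are hypothetical.

```python
def classify(T, R, P, S):
    """Classify one player's payoff ordering in a symmetric 2x2 game.

    Prisoner's Dilemma: T > R > P > S
    Chicken:            T > R > S > P
    Both orderings require T > R, i.e. DC > CC.
    """
    if T > R > P > S:
        return "Prisoner's Dilemma"
    if T > R > S > P:
        return "Chicken"
    return "neither"

if __name__ == "__main__":
    print(classify(T=5, R=3, P=1, S=0))  # Prisoner's Dilemma
    print(classify(T=5, R=3, P=0, S=1))  # Chicken
    # No extra value from killing once the ransom is paid: T == R
    print(classify(T=3, R=3, P=1, S=0))  # neither
```

If the terrorists do gain something from killing even after being paid, T rises above R and the same check can come out as a Prisoner's Dilemma, which matches the last reply above.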
[-][anonymous]9y -4

Most people, when they hear the rational arguments in favor of defecting every single time on the iterated 100-crime Prisoner's Dilemma, will feel some kind of emotional resistance.

The rationalist strategy is not to defect from the beginning, but to cooperate until somewhere into the 90s.

1wedrifid9yThat certainly isn't implied by the problem and the agent. It depends entirely on what the opponent is expected to do. I would use that strategy when playing against an average lesswrong user. I wouldn't use it when playing against some overconfident kid who has done some first year economics classes. I wouldn't use it when playing against a CDT bot or a TDT agent.