"Extortionate" strategy beats tit-for-tat in iterated Prisoner's Dilemma

by teageegeepea1 min read26th Jun 201212 comments

4

Personal Blog

Less Wrong had a Prisoner's Dilemma contest some time back, whose results I've forgotten. Perhaps it should be rerun with William H. Press & Freeman Dyson's proposed extortionate strategies.

I hope Pinker gives a response at Edge.org, since P.D played a significant role his book "The Better Angels of Our Nature" as a source of morality embedded in the nature of logic/reality.

Hat-tip to Marginal Revolution.

12 comments, sorted by Highlighting new comments since Today at 1:27 PM
New Comment

The linked article is a complete waste of time as the authors don't bother to explain what the extortionate strategy is, only insist that it turns the game into an ultimatum. And the title must be a lie, since halfway through, it explicitly says TFT gets the same score as its opponent. (In other words, it doesn't get "beat" by anything.) So the parts of the article that are true are useless, the parts that are supposedly interesting are asserted, unexplained, and the title is certainly false. Downvoted.

There was a previous post about this topic that actually linked to the paper, which I think you'll be happier with.

In particular, what the extortionate strategy does is the following: if player 2 accepts that player 1 will play the extortionate strategy, and there's nothing to be done about that, then there is a linear relation between their scores, and he can only maximize his score by giving an even higher score to player 1. In particular, if player 2 plays TFT (which is also an extortionate strategy, in a degenerate sense, with extortion factor 1) then the two players eventually end up in the (Defect, Defect) state, and get 0 points per turn, which satisfies both relations.

How does this actually get implemented in code?

All of the "ZD strategies" are described by 4-tuples of probabilities: the probabilities of cooperation given the outcome of the previous turn, which can be one of (CC, CD, DC, DD). In comments to the previous post I calculated two examples, and the paper contains the general formulas in equations [8] and [12].

Ah, thank you. Made that much clearer for me; I had the slightly incorrect impression that a ZD strategy was any strategy that could be described by such a 4-tuple, but I didn't make the connection that the evolution could apply directly to the probabilities instead of the strategy that generated the probabilities.

Perhaps it should be rerun

It is, in fact. I'm finalising the code and the announcement post. Expect it in a couple of days.

William H. Press & Freeman Dyson's proposed extortionate strategies.

As I understand them (which is admittedly not much; the paper is surprisingly dense and metaphorical) can take advantage of 'evolutionary' strategies but not strategies with a 'theory of mind' - and it appears that tit-for-tat falls under the 'theory of mind' category. Although I would state once more that I'm not sure I understand the paper.

How's your progress? You said you were "finalising the code and the announcement post", so I'm wondering whether you ran into any unexpected problems since then?

Unexpected, yes; problems, no. I'm currently re-implementing the entire thing in Clojure, as it gives me a really elegant, simple, "simulate my opponent" function (which is exciting!) and makes everything else much neater to boot. I've also gotten a friend interested in this project; he's probably going to help me build a semi-permanent results / interaction page for the project - something like this and this put together. On that second page, about shuffle algorithms, you can enter your own custom shuffling method. We are excited about having something like that for bots - visualise the data, and enter your own custom strategy to see how it performs. This lets us have a sort of long-running 'informal' tournament. But it is still on its way!

Fantastic! I'm very much interested in trying my hand at developing a strategy, thank you for putting this work together and I'm sure you will get a lot of positive feedback, once you finish it and release it.

Press and Dyson's setup had two areas where 'strategies' come into play.

The first area is the set of four probabilities you provide to the game, which determine your score when combined with the other player's set of four probabilities. Tit-for-tat is one particular choice of four probabilities (and, based on the nature of the game, should actually be represented as "slightly forgiving tit-for-tat", which cooperates with probability epsilon in the defect-defect or defect-cooperate case, so that when playing against itself all states will terminate with cooperate-cooperate).

The second area is how the players modify those sets over time. Here, 'theory of mind' is relevant: both players with and without theories of mind can play any particular 4-probability set, like tit-for-tat. Players with theory of mind think (at least) two steps ahead- when I change my probabilities, how will my opponent change their probabilities? Players without theory of mind think only one step ahead- given my opponent's probabilities, which play maximizes my score?

I am skeptical about the more complex meta-strategies discussed in the interview. If you play a ZD strategy, but switch to a different ZD strategy if after 100 moves you're not doing as well as you should be, then you're not playing a ZD strategy: you're playing some different strategy that has a 100-move memory. The extortion arguments only go through if you set a strategy at the beginning of time and never touch it again, in which case there is no place for bargaining.