Announcing the 2014 program equilibrium iterated PD tournament

by tetronian21 min read31st Jul 201463 comments

38

ProgrammingPrisoner's Dilemma
Personal Blog

Last year, AlexMennen ran a prisoner's dilemma tournament with bots that could see each other's source code, which was dubbed a "program equilibrium" tournament. This year, I will be running a similar tournament. Here's how it's going to work: Anyone can submit a bot that plays the iterated PD against other bots. Bots can not only remember previous rounds, as in the standard iterated PD, but also run perfect simulations of their opponent before making a move. Please see the github repo for the full list of rules and a brief tutorial.

There are a few key differences this year:

1) The tournament is in Haskell rather than Scheme.

2) The time limit for each round is shorter (5 seconds rather than 10) but the penalty for not outputting Cooperate or Defect within the time limit has been reduced.

3) Bots cannot directly see each other's source code, but they can run their opponent, specifying the initial conditions of the simulation, and then observe the output.

All submissions should be emailed to pdtournament@gmail.com or PM'd to me here on LessWrong by September 15th, 2014. LW users with 50+ karma who want to participate but do not know Haskell can PM me with an algorithm/psuedocode, and I will translate it into a bot for them. (If there is a flood of such requests, I would appreciate some volunteers to help me out.)

63 comments, sorted by Highlighting new comments since Today at 7:24 AM
New Comment

I don't think this game is a prisoner's dilemma, since the tournament implies that all pure deterministic strategies yield the same outcome (an all-players tie) if applied by everyone, while the interesting thing about PD is that mutual cooperation is preferable to mutual defection.

Also, it is not a "program equilibrium" game, since running simulations gives much less information than analyzing the source code. For instance, clique strategies, which are often occur in the Nash equilibria of program equilibrium games, are impossible in your game.

Good points. Simulations do give much less information than visible source code--this was a deliberate design choice on my part, but you are probably right that this kind of setup is not really "program equilibrium." I admit, I don't have a very good understanding of where the limits of the PD game really are, and I hadn't considered the problem of the all-players tie (fortunately I don't think it is going to happen in practice). Mutual cooperation and cliques are still viable strategies here, but they may not be the Nash equilibrium anymore. Regardless, I agree that this tournament is horribly misnamed.

Is there a particular reason you've decided on using exactly 100 rounds per bout? Having a fixed end-point can cause some unusual behaviour to crop up at the end turn, and depending on how many levels of recursion are applied, multiple turns before that.

(One alternative I've seen is that after any given round, there's a 99.30925% chance there will be another, which gives close to 50% odds that the game will last 100 turns or less.)

I figured that this game is sufficiently different from the classical PD that the same Nash equilibrium/style of thinking isn't going to apply, and I was interested to see what participants can do with the fixed-size rounds. So it's just an arbitrary experimental parameter that I'm curious about.

I am glad to see this trend continuing!

You seem to have neglected to include the Bitcoin address to contribute to the prize pool.

I pledge to contribute 10 mBTC to the prize pool and 1 mBTC tip to the organizer once the relevant addresses are publicly posted. This is conditional on a vaguely sane prize distribution policy, and a pledge to return the contribution if there are insufficient submissions to be interesting. (I'm hoping that 11 mBTC is a small enough amount that no one feels a need to quibble over the fact that my pledge is pretty vague. If you'd like me to make it less vague, please say so.)

Found something slightly amusing:

-- Simulate my opponent playing a round against me, and to the opposite of
-- whatever my opponent does. Limit my opponent to 10ms, defect if they go
-- over.
trollBot :: Bot
trollBot = Bot run where
    run op hist = do
        simulation <- time 10000 . runBot op mirrorBot $ invert hist
        return (case simulation of
                Nothing   -> Cooperate
                Just Cooperate -> Defect
                Just Defect -> Cooperate)

When I enter trollBot into the simulation tournament, it actually ends up doing better than any of the default bots during the first round, pretty consistently. It also results in a win for defectBot.

I'm puzzled as to why trollBot does as well as it does. Is it just a function of the particular players it's up against, or is that actually a viable strategy?

Limit my opponent to 10ms, defect if they go over.

You actually cooperate in this case.

Quick analysis: you're going to defeat CooperateBot 500 points), lose against DefectBot (0 points), and tie against TitForTatBot (250 points from alternating D/C and C/D). Against RandomBot, you are also RandomBot, both of you scoring 225 on average.

When you simulate MirrorBot, the infinite recursion makes ver time out, so you cooperate. So MirrorBot cooperates against you as well (300 points). SmarterMirrorBot and JusticeBot both time out as well. SmarterMirrorBot can't work out what you'll do, and cooperates (300 points). JusticeBot may or may not be able to work out what you'll do against CooperateBot, and defects either way (0 points).

But I think TitForTatBot should beat that, at least: 300 against CooperateBot, 99 against DefectBot, 300 against JusticeBot, 223.5 against RandomBot, and all other scores the same.

So, I'm puzzled too, if TrollBot is getting the highest score in the first round.

Limit my opponent to 10ms, defect if they go over.

You actually cooperate in this case.

Whoops. Effect goes away if I fix it, too.

Here are the average results for the first round:

http://pastebin.com/qN5H2A25

For some reason, TrollBot always wins 500 / 0 against SmarterMirrorBot. DefectBot actually beats TrollBot by a narrow margin (1604 - 1575 = 29 points) on average, but there is quite a bit of randomness from RandomBot, so TrollBot often comes out ahead of even DefectBot, and they both come out over 100 points ahead of the next bot (TitForTatBot).

Since I built TrollBot as a sanity check on my actual bot to make sure that it would defect against TrollBot, I was definitely surprised by the fact that TrollBot outperformed not only my attempt at a serious bot, but also most of the other bots... :/

Oh! You're also running your opponent playing a game against MirrorBot, not against TrollBot.

Which I still don't understand... you run SMB versus MB, time limit 10000. SMB runs MB versus MB, time limit 10000. MB versus MB times out, which meas SMB runs for > 10000 us, which means that SMB should time out and you should cooperate.

Meanwhile, SMB runs you versus MB, time limit 10000. You run MB versus MB, time limit 10000. That times out, so you time out, so SMB cooperates.

But you're defecting, which means that you're running SMB versus MB to completion, which seems like it shouldn't happen.

That does seem exploitable, if one can figure out exactly what's happening here.

Yes, it's an issue with how the "time" command is set up. Basically, timeouts are done via threads an exceptions; there's a detailed description of the "timeout" function in System.Timeout here: http://chimera.labs.oreilly.com/books/1230000000929/ch09.html#sec_timeout

However, tetronian2's "time" command catches all exceptions, which means that if multiple timeouts are nested, you can get some pretty strange behaviour.

In the TB vs SMB scenario, here's what happens:

  1. The world runs TB vs SMB
  2. TB runs SMB vs MB (timeout #1, 10000us)
  3. SMB runs MB vs MB (timeout #2, 10000us)
  4. Timeout #1 triggers an exception, which is caught by timeout #2 and handled, causing (3) to return Nothing.
  5. Before the exception for timeout #2 can actually trigger, (2) finishes normally and kills the timeout thread; this means that (2) return Just Cooperate.
  6. Since (2) returned Just Cooperate, TrollBot returns Defect.

I've brought this up as an issue on the GitHub issues page here: https://github.com/pdtournament/pdtournament/issues/3

I've suggested a solution that fixes this problem, but unfortunately race conditions are ultimately still present.

Thank you very much for catching this and for explaining it in detail here and on the github page; I'm going to play around with your proposed solution today and brainstorm any other ways of dealing with this.

Last year, AlexMennen ran an iterated prisoner's dilemma tournament

Correction: my tournament was one-shot prisoner's dilemma.

Anyway, I'm glad to see this is being tried again.

Whoops, fixed. And thanks, I'm curious to see what happens with this ruleset.

I am so excited. When will you post results?

I'm glad! It depends on how long it takes me to write up all of the results, but probably within 1 week.

Great! I can't wait to see how thoroughly I was trounced by everyone!

I noticed my bot has a typo. (I do not expect you to change it or anything, just letting you know for your write up if you are confused by my code.)

The two times it says "(fst y,Cooperate)" should have said "(Cooperate, fst y)" I wanted to simulate them next round assuming my previous prediction of their move was correct and I cooperate, not assuming they cooperate and I repeat my last move.

Should have taken you up on the offer to code it for me :( oh well.

Thank you, I was a bit confused, and that did help me understand what you were going for. I'm actually very curious to see how the correct version performs; all of the bots will be open-sourced soon, so you're welcome to experiment. I will probably run some tests as well.

How many entries were there?

There are 11 entries; I am very impressed by the amount of effort everyone put into their bots.

[-][anonymous]7y 0

This is very exciting

[This comment is no longer endorsed by its author]Reply

Submitted! I'm excited to see how this pans out.

Really cool project, thanks a lot! I actually like your choice of haskell - the example bots are beautiful. I thought about implementing something like this myself some time ago, but I wouldn't have been able to do it as well as you did.

Just one small concern about the tournament rules:

One round could take up to 10 second if both players make full use of their time, so one match of a 100 rounds could take up to 1000 seconds, right? And let's say there are at a very rough estimate 100 matches per tournament. And you want to run 1000 tournaments. Maybe I just made a stupid mistake, I am quite tired right now, but this could theoretically require years of computation time, couldn't it?

Oh, and I could probably translate bots into haskell, if you need help.

In the examples you show how to run the opponent, but how do you access the source code? For example, how do you distinguish a cooperate bot from a (if choice < 0.0000001 then Defect else Cooperate) bot without a million simulations?

Unlike last year's tournament, you can't, and this is part of the challenge. Simulations are a precious resource, and you have to make an educated guess about your opponent based on the limited simulations you have.

Thanks for running this!

One thought however is that Haskell is a lot less well known then python, so if you want a large stable of bots, you may wind up needing to give some coding help to aspiring entrants

I agree. I chose Haskell because I was intrigued by how easy it was to restrict what bots are able to do, using only Haskell's type system. I would actually love it if a large number of non-Haskellers participated--I enjoy coding Haskell, and I don't mind putting aside some time to help if the result is a larger, more interesting tournament.

How is the tournament going to work in terms of what plays what? Is every bot going to play against every other bot? Is there a big pool, where successful bots get more copies and then we run for a while? Can a person submit multiple bots?

The best strategy depends heavily on the expected opponents. For example, a bot that tries to detect itself in order to cooperate only makes sense if it might play itself.

Excellent questions. Here are the rules pertaining to the tournament structure (from the readme):

One submission per person.

The tournament will be round-robin elimination: Each bot will play one match against all other bots, where a match consists of 100 rounds of the prisoner's dilemma. At the end of the round-robin round, the lower-scoring half of the tournament pool will be eliminated. This process will be repeated until only one bot remains, or there is a tie. The whole tournament will be run 1000 times, and the bot that places first most frequently will be declated the overall winner.

In other words, bots do not play against themselves, but they will play against all other bots at least once per tournament iteration. Let me know if this is unclear. (This setup was suggested to me by James Miller.)

To clarify: your score in each round-robin round is the sum of your scores from each of the matches; and this score doesn't carry over to the next round-robin round?

Exactly right.

Thanks for running this, and for translating my psuedocode bot into Haskell!

I assume we are only allowed to enter one bot, correct?

Thank you for submitting! Yes, I am accepting one bot per person.

What is the deadline?

This post says September 1st, while the linked github repo says September 15, 2014 13:59 UTC.

The correct deadline is Sept. 15th--I extended it by 15 days because of an error in the tournament code that went unfixed for two weeks, which delayed some people from writing their bots. I've updated the post to reflect this, thanks for catching it. Anyone who already submitted a bot is welcome to re-submit if they want to make changes to their bot before the 15th.

OK; I'm reasonably sure I've come up with a strategy that is a simulation equilibrium (a la "program equilibrium") and cooperates with itself.

I'm having trouble comprehending what running one of these bots would consist of. Suppose MyBot is put against OpponentBot. So MyBot is going to run a simulation of OpponentBot, and that instance of OpponentBot is going to run a simulation of MyBot, and that instance is going to run a simulation of OpponentBot ... how does this process terminate? My head hurts.

I'm having trouble comprehending what running one of these bots would consist of. Suppose MyBot is put against OpponentBot. So MyBot is going to run a simulation of OpponentBot, and that instance of OpponentBot is going to run a simulation of MyBot, and that instance is going to run a simulation of OpponentBot ... how does this process terminate? My head hurts.

I would guess that in that case, you run out of time and get penalized. Which suggests a strategy: Program your own bot to wait until the last possible millisecond before transmitting its choice.

What do you mean, "in that case"? My issue is that I don't see how two bots can both simulate each other. I don't see what I presented as a special case, but the general case.

What do you mean, "in that case"? My issue is that I don't see how two bots can both simulate each other. I don't see what I presented as a special case, but the general case.

Just because your bot has the option of simulating its opponent doesn't mean that you would necessarily program it to do so. Evidently there is a time limit in the game so if your bot went overboard simulating it would end up running out of time and being penalized a lot of the time.

For example one possible strategy is "transparent tit for tat," i.e. program your bot not to simulate anything at all but to simply play tit for tat. With the hope that other bots will simulate yours and figure out that the best strategy is repeated cooperation.

Transparent tit-for-tat is a decent enough strategy, but it is easily beaten 1-on-1 by "cooperate on every turn except the very last one", and against many other simple bots by "TFT but defect on the last turn".

That being said, when the game is 100 turns long, it's pretty hard to do significantly better than Tit-for-Tat - the aforementioned strategies only beat TFT with a score of 302-297.

I think that's probably a good reason to shorten the round length from 100 down to, say, 10 or 20 - otherwise the variance from a single RandomBot would drown out that score difference.

Transparent tit-for-tat is a decent enough strategy, but it is easily beaten 1-on-1 by "cooperate on every turn except the very last one", and against many other simple bots by "TFT but defect on the last turn".

I agree, I was simply giving an example of a strategy which would not lead to the sort of infinite regress envisioned by ThisSpaceAvailable.

I do think it's worth noting that "TFT but defect on the last turn" would be beaten 1-on-1 by "TFT but defect on the last two turns." And both of them would be beaten 1-on-1 by "TFT but defect on the last 50 turns."

In fact, in a 1-on-1 game, it seems to me the best strategy would seem to be to always defect and hope the other fellow is stupid enough to cooperate at least once. Actually it seems pretty much pointless to talk about 1-on-1 prisoner's dilemma, because it turns things back into a zero sum game.

If you think about it as win/lose, then yes, of course. However, it's still instructive to ask the question "what's the best response strategy"? In other words, "what strategy will maximise my utility against this given, fixed opponent strategy?".

In that sense, the best response to TFT is indeed "TFT but defect on the last turn", because "TFT but defect on the last two turns" does strictly worse when playing against TFT.

Fortunately, you can indeed do significantly better than TFT in this game!

In that sense, the best response to TFT is indeed "TFT but defect on the last turn", because "TFT but defect on the last two turns" does strictly worse when playing against TFT.

But what if you are playing against "TFT but defect on the last turn"? In that case, the best strategy is TFT-2. And if the other fellow is playing TFT-2, the best strategy is TFT-3. And so on. Agreed?

Fortunately, you can indeed do significantly better than TFT in this game!

Seems to me it depends on what strategies the other contestants are playing.

Oh, yes, of course; the best response to TFT-1 is clearly TFT-2, and so on.

As for how well strategies do, while it's clear that it will depend on the strategies of other contestants and in that sense there cannot be a "best strategy", I think one can do better - for example, if there's a Nash Equilibrium strategy that isn't simply (Defect, Defect).

At a bare minimum, you can improve upon TFT by also making sure it defects against CooperateBots, and doesn't wait until the second turn to defect against DefectBots. Of course, there may indeed be JusticeBots out there who punish you for defecting against CooperateBots...

At a bare minimum, you can improve upon TFT by also making sure it defects against CooperateBots,

That's assuming that the new algorithm can correctly identify its opponents. Also, if other algorithms correctly sense that yours is opportunistic, they might change their strategy to your detriment.

"TFT but defect on the last turn"

That's fixable by not letting the bot know how many round are there other than, say, more than 20 and less than a hundred.

In that case the game becomes significantly less interesting, because there's far less you can do to improve on TFT and similar strategies, because TFT is a Nash Equilibrium. Granted, you can still include provisions to detect against RandomBots, CooperateBots, and DefectBots, but the finitely repeated PD is significantly more interesting.

"Interesting" is in the eye of the beholder. Not knowing the exact number of rounds adds complexity. I doubt that adding complexity means "there's far less you can do to improve".

(1) Saying "complexity" is not an argument.
(2) The unknown horizon case is actually strictly less complex, because now each round is the same w.r.t. the future, whereas for the finite horizon case each round has a distinctly different future.
(3) You haven't countered the point that TFT is a Nash Equilibrium in the infinite-horizon case - that makes it fairly trivial and obvious.
(4) http://lesswrong.com/lw/to/the_truly_iterated_prisoners_dilemma/

(1) It's a about the same kind of argument as saying "less interesting" :-P

(2) No, each round is not the same because the horizon is not unknown, it's just that instead of a fixed number you have a probability distribution to deal with.

(3) We are not talking about the infinite-horizon case.

(4) Yes, and?

My main point is that the NE for the fixed-length IPD is "always defect", but the fact that we really ought to be able to do much better than "always defect" is what makes that case particularly interesting.

(2 & 3) Sorry; I misunderstood. was thinking about the infinite-horizon case. If you have a probability distribution over possible lengths of the game then the problem is indeed more complex, but I don't see that much benefit to it - it really doesn't change things all that much.

In particular, if you still have a limit on the number of rounds, then the same reasoning by backwards induction still applies (i.e .the optimal strategy is to always defect), and so the same interesting aspect of the problem is still there.

Similarly, the optimal counter-strategy to TFT stays mostly the same. It will simply be "TFT, but always defect starting from round N" where N is some number that will depend on the specific probabilities in question.

The interesting aspect of the problem is still the part that comes from the finite limit, regardless of whether it's fixed or has some kind of distribution over a finite number of possibilities.

then the same reasoning by backwards induction still applies (i.e .the optimal strategy is to always defect)

Not so fast.

Once a prisoner condemned to death was brought before the king to set the date of the execution. But the king was in a good mood, having just had a tasty breakfast, and so he said: "You will be executed next week, but to spare you the agony of counting down the hours of your life, I promise you that you will not know the day of your execution until the jailers come to take you to the gallows".

The prisoner was brought back to his cell where he thought for a while and then exclaimed: "But I cannot be executed if the king is to keep his word! I cannot be executed on Sunday because if I'm alive on Saturday I'll know the day of my execution and that breaks the king's promise. And I cannot be executed on Saturday because I know I cannot be executed on Sunday so if Friday comes around and I'm still alive, I'll know I have to be executed on Saturday. Etc., etc. This perfect reasoning by backwards induction says I cannot be executed during any day. And since the king always keeps his word, I'm safe!". Much relieved, he lay down on his cot whistling merrily.

And so, when on Tuesday the guards came for him, he was very surprised. The king kept his word.

Sorry; I meant to say the "optimal" strategy is to defect. I don't agree with the backwards induction; my point was that that argument is precisely what makes the problem interesting.

EDIT: By the way, I'm pretty sure I've come up with a strategy that is a Nash equilibrium (in IPD + simulations) and always cooperates with itself, so I very strongly agree that always defecting is not optimal.

Each bot can run a simulation of the other if the circumstances under which they run their opponents are not the same.

For example, JusticeBot does indeed run a simulation of its opponent, but this is not a problem because JusticeBot simulates what the opponent will do against CooperateBot, not against JusticeBot.

Similarly, an agent could run a simulation of its opponent against itself, but with a different history.

But that different history would have to result in not running a further simulation.

Not immediately, but yes, the recursion would eventually need to hit a base case.

If you run out of time, you defect. If your opponent simulates you without a time limit, and you take long enough to cause ver to run out of time, ve'll defect, which isn't what you want.

But there's a function that lets you run a function for up to a specified number of microseconds, and tells you what it returns or whether it times out.

If you run out of time, you defect. If your opponent simulates you without a time limit, and you take long enough to cause [him] to run out of time, [he'll] defect, which isn't what you want.

I'm not sure about that - the original post says this:

the penalty for not outputting Cooperate or Defect within the time limit has been reduced.

So presumably most bots will be set up to make a choice within the time limit.

But there's a function that lets you run a function for up to a specified number of microseconds, and tells you what it returns or whether it times out.

Right, so the practical effect of the strategy I proposed would be to deny opponents knowledge. Of course, one can envision situations where you want to be transparent.

This looks fun! I will participate.

[-][anonymous]7y 0

Sounds like fun!

Throwing this idea out there. If instead of returning Nothing on a timeout, it would be nice if a "current Choice" state were returned. Bots could update this state until time ran out.

[This comment is no longer endorsed by its author]Reply