**Prerequisites:** Familiarity with decision theories (in particular, Eliezer's Timeless Decision Theory) and of course the Prisoner's Dilemma.

**Summary:** I show an apparent paradox in a three-agent variant of the Prisoner's Dilemma: despite full knowledge of each others' source codes, TDT agents allow themselves to be exploited by CDT, and lose completely to another simple decision theory. Please read the post and think for yourself about the Exercises and the Problem below before reading the comments; this is an opportunity to become a stronger expert at and on decision theory!

We all know that in a world of one-shot Prisoner's Dilemmas with read-access to the other player's source code, it's good to be Timeless Decision Theory. A TDT agent in a one-shot Prisoner's Dilemma will correctly defect against an agent that always cooperates (call this CooperateBot) or always defects (call this DefectBot, and note that CDT trivially reduces to this agent), and it will cooperate against another TDT agent (or any other type of agent whose decision depends on TDT's decision in the appropriate way). In fact, if we run an evolutionary contest as Robert Axelrod famously did for the Iterated Prisoner's Dilemma, and again allow players to read the other players' source codes, TDT will annihilate both DefectBot and CooperateBot over the long run, and it beats or ties any other decision theory.^{1} But something interesting happens when we take players in threes...

Consider a population of agents in a simulated world. Omega, being the trickster from outside the Matrix that ze is, decides to spend a couple of eons playing the following game with these agents: ze selects three of them at random (call them X, Y and Z), wipes their memories,^{2} gives them each others' source codes, and privately asks each whether they cooperate or defect. If X defects, then Omega will create 2 "children" of X (distinct near-copies of X, with the same decision theory as X) and return them to the simulation. If X cooperates, then Omega will return 3 "children" of Y and 3 "children" of Z to the simulation. Simultaneously, Y and Z make the analogous decisions.

(Just to reiterate: cooperating gives +3 to each other player, nothing to oneself; defecting gives +2 to oneself, nothing to anyone else. The analogy to the Prisoner's Dilemma should be obvious.)

Assume maximal selfishness: each agent is motivated solely to maximize its own number of children (the agent itself doesn't get returned!), and doesn't care about the other agents using the same decision theory, or even about its other "relatives" in the simulation.^{3} Although we've had to explicitly state quite a few conditions, this seems like a pretty simple and fair evolutionary tournament.

It's clear that CDT agents will simply defect each time. What about TDT agents?

**Exercise 1:** Prove that if the population consists of TDT agents and DefectBots, then a TDT agent will cooperate precisely when at least one of the other agents is also TDT. *(Difficulty: 1 star.)*

Notice that we've created a free-rider problem. Any DefectBot paired with two TDT agents gets 8 children- even better than the 6 that each of three TDT agents get in their best case! As you might expect, this bonus balances against the fact that three TDTs played together will fare much better than three DefectBots played together, and so it turns out that the population settles into a nontrivial equilibrium:

**Exercise 2:** Prove that if a very large population starts with equal numbers of TDTs and DefectBots, then the expected population growth in TDTs and DefectBots is practically equal. (If Omega samples with replacement– assuming that the agents don't care about their exact copy's children– then the expected population growth is precisely equal.) *(Difficulty: 2 stars.)*

**Exercise 3: **Prove that if the initial population consists of TDTs and DefectBots, then the ratio of the two will (with probability 1) tend to 1 over time. *(Difficulty: 3 stars.)*

This should already perplex the reader who believes that rationalists should win, and that in particular TDT should beat the socks off of DefectBots in any fair fight. The DefectBots aren't harmless parasites, either: the TDTs' rate of reproduction in equilibrium with DefectBots is less than 30% of their rate of reproduction in a population of TDTs alone!* (Easy to verify if you've done Problem 2.)*

And it gets worse, in two ways.

First, if we adjust the payoff matrix so that defecting gets (+200,+0,+0) and cooperating gets (+0,+201,+201), then any population of TDTs and DefectBots ends up (with probability 1) with the DefectBots outnumbering TDTs by a ratio of 100:1. *(Easy to verify if you've done Exercise 3.)*

Secondly and more damningly, we can introduce CliqueBots, who cooperate only if both other agents are CliqueBots. These, and not TDTs, are the champions of the three-way Prisoner's Dilemmas:

**Exercise 4: **If the initial population consists of CliqueBots, DefectBots and TDTs^{4} in any proportion, then the ratio of both others to CliqueBots approaches 0 (with probability 1). *(Difficulty: 4 stars.)*

**Problem: **The setup looks perfectly fair for TDT agents. So why do they lose? *(Difficulty: 2+3i stars.)*

**Note:** This problem is solved in a more general followup post; but do try and think about it yourself first! Also, I've posted solutions to the exercises in the discussion section. It's worth noting that I asked Eliezer, Vladimir Nesov, Wei Dai, cousin_it, and other decision-theory heavyweights to avoid posting spoilers on the main problem below, and they obliged; many of the other commenters figured out the problem, but nobody posted solutions to the exercises in the comments (as of 9/5/11).

**Footnotes:**

**1.** What's meant by "beat or tie" is sort of complicated- there are decision theories that (with high probability) crush TDT when starting out as a majority, and (with equally high probability) get crushed by TDT when starting out as a minority. Also, it gets intractably messy if you allow populations consisting of three or more decision theories (because then A can decide, not just based on how B will decide to act towards A, but also on how B would act towards C, etc).

**2.** No playing Tit-for-Tat in this game! Omega will make sure there's no funny business steganographically hidden in the source codes, either.

**3.** I'm intentionally blocking the analogue of kin altruism among related populations, in order to exclude any reasons for cooperation besides the purely decision-theoretical ones.

**4.** Yeah, I know I said that populations with three different decision theories are basically intractable. That's true in general, but here we have the simplifying factor that two of them (CliqueBots and DefectBots) are simple decision theories that don't bother to simulate the other agents.

If each TDT agent cares about its number of descendants, then they will instrumentally care about having a greater proportion of TDT agents to mutually cooperate with, and will therefore defect when one of the three players is a CDT agent, like the clique bot.

They only behave as you describe if they consider only the immediate, but not long term, consequences.

Seems correct to me. Judging agents with a different utility function than they use to decide will lead to them looking stupid.

I mentioned this to JGWeissman privately at the time, but I want to confirm now that this was exactly the solution I intended.

The TDT agents are maximizing their number of descendants, you have no right to criticize them for failing to maximize their share of the population. The whole point of the prisoner's dilemma is that it's not a zero sum game, but if you count the population of Defectbots against the TDT agents, you are treating it as a zero sum game.

And yet if they all switched to being cliquebots they would not only drive out the defectbots, but would have far more children in the long run.

Theywould not have far more children in the long run. Their descendants would have more children, but their utility function doesn't care about that.Edit: and if they do start caring about grandchildren the problem stops being a straightforward Prisoners' Dilemma.

So can we show that TDT would play perfectly in such a scenario? I think yes.

Your decisions interact in complex ways with your peer's decisions, your descendant's decisions, etc. There might be a nice simple formula that classifies this, but:

These are TDT agents. So they all behave the same. At least, the ones during the same timestep behave the same. The ones during different time steps are in different situations, but have the same values. Since they (overall) have he same values as past generations, they will together implement a coherent strategy.

And if they all switched to being paperclip maximisers they would make more paperclips. Neither of these are going to help them maximise their actual utility function.

Fair point. I failed to spot initially that the trick lay in equivocating between "maximise direct descendants" and "maximise all descendants".

There is an easy proof that no strategy can win against every field of opponents: just make two bots, CliqueBotA and CliqueBotB, which each cooperate with copies of themselves and defect otherwise. No strategy can possibly win against both. Environments containing only one or the other are evolutionarily stable equilibria.

It gets weirder. It's also possible to construct an environment such that there is no optimal solution for that particular environment. Design an agent LargeNumberRewarder, which generates a random number from a geometric distribution, inspects its opponent's source code, picks out the largest integer, and cooperates iff the number it found was greater than the random number. There is no optimal agent against this field, because there is no largest number. Hybridize this with Cliquebot, and you get an evolutionarily stable population except that the agents contain a numeric parameter that increases with every generation. Play with the parameters a bit and you can make the derivative of that numeric parameter fall off quickly so that it converges to an almost-optimal equilibrium.

As I said in the first footnote, sometimes the best that an agent can do is tie with another strategy (

ETA:Or, more precisely, go to war with the other strategy, at symmetric odds.)Wouldn't the LNRs die out fairly quickly against any kind of CliqueBot, or TDT, or even DefectBot?

I think jim was

hybridizingthis with CliqueBot. So it plays as a CliqueBot except that among other CliqueBot hybrids it does the large number thing. So it (roughly speaking, with the right details) first just dominates then it plays silly games. :)...

Um, they

don'tlose. What the TDT agents care about is the number of children they have. If they cared about the total number of descendants they have the cost of cooperating with DefectBots or CliqueBots would be exponentially higher (and TDT would act as CliqueBots).Edit: Come to think of it, they wouldn't act like CliqueBots. Unlike CliqueBots they would also cooperate with CooperateBots. And if you include CooperateBots in the simulation TDT would beat CliqueBots.

Oh, good point. CliqueBot is a fail strategy! CliqueBot++ would need to allow exceptions such that it cooperates with (CliqueBoT++ && anything that it does not consider a serious rival).

TDT wouldn't cooperate with CooperateBot- that would be throwing away utility.

Er, it cooperates with two TDTs and one CooperateBot, it defects with two CooperateBots and one TDT. That TDT cooperates even when one of the other agents is a CooperateBot and not TDT gives it an edge over CliqueBot in a population that includes all three.

Sorry, I was thinking 2-player games. Your solution doesn't work, though; the CooperateBots vanish first, followed by the TDTs.

Maybe it wasn't clear: I'm proposing that if the TDT agents cared about their total descendants the strategy they would adopt would be the same as the CliqueBot except that they would cooperate with another TDT and a CooperateBot. Once the CooperateBots disappear TDT and CliqueBots would be using the same strategy except that there would be more TDTs (since they cooperated with the CooperateBots while they were around).

Ah! I hadn't thought of that wrinkle before- it makes my analogy (for next time) even stronger than I'd thought. Thanks!

I like the pun in the title.

I don't doubt it, but when was this proven?

I would go as far as to actually doubt it. TDT seems to be insufficiently well-specified for this to clearly follow one way or the other, and in some cases I expect that TDT should be designed so that it won't unconditionally cooperate with other TDTs, picking other mixed strategies on the Pareto frontier instead.

(I assume for me to discuss stuff about this post not engaging the solution to its riddles directly is fine; I haven't even read it yet.)

Of course.

In the simplest possible case, where two agents in a one-shot PD have isomorphic TDT algorithms maximizing different utility functions, they should cooperate. Let me know if I overstated the case in my first paragraph or Footnote 1.

The question is of course in what counts as "isomorphic TDT algorithms" and how do the agents figure out if that's the case. However this post appears conclusively free of these problems.

Uh, do you mean "this post wrongly sweeps these problems under the rug", or "this post sweeps these problems under the rug, and that's OK"?

Anyway, although we don't have a coding implementation of any of these decision theories, Eliezer's description of TDT seems to keep the utility function separate from the causal network.

These problems don't affect this post, so far as we assume the TDT agents to be suitably identical, since the games you consider are all symmetrical with respect to permutations of TDT agents, so superrationality (that TDT agents know how to apply) does the trick.

(Don't understand what you intended to communicate by this remark.)

Ah, good.

In retrospect, that remark doesn't apply to multiplayer games; I was thinking of the way that in Newcomb's Problem, the Predictor only cares what you choose and doesn't care about your utility function, so that the only place a TDT agent's utility function enters into its calculation there is at the very last stage, when summing over outcomes. But that's not the case for the Prisoner's Dilemma, it seems.

Right, for TDT agents to expect each other acting identically from the symmetry argument, we need to be able to permute not just places of TDT agents in the game, but also simultaneously places of TDT agents in TDT agents' utility functions without changing the game, which accomodates the difference in agents' utility functions.

I assume you are not expecting these cases to include the simple one shot prisoner's dilemma with full code access? I would be skeptical.

If you doubt it, then I doubt it as well. I thought I saw a formal specification a while back, but perhaps that was UDT.

In a simple special case where everything is symmetric, they will cooperate if the problem is formalized in the spirit of TDT, but this is basically good old superrationality, not something TDT-specific. The doubt I expressed is about the case where the TDT agents are not exactly symmetric, so that each of them can't automagically assume that the other will do exactly the same thing. In the context of this post, this assumption may be necessary.

I think it is unfair to TDT to say that it is

justHofstadter's superrationality. If TDT is an actual algorithm to which Hofstadter's argument applies, even just in the purely symmetric version, that is a great advance. I would definitely say that about UDT.Yes, TDT is underspecified. But is it a class of fully specified algorithms, all of which cooperate with pure clones, or is it not clear if there is any way of specifying which logical counterfactuals it can consider?

Two relevant links: Gary Drescher on a problem with (a specification of?) TDT; you on underspecification.

The assumption of symmetry is

notnecessary in the context of this post. The ability to read the code and know that if they read your code and find that you would cooperate if they do (etc) is all that is necessary. Being the same as you isn't privileged at all. It's justconvenient.Code access is basically just

betterthan knowledge they will do exactly the same thing.I think Vladimir is saying that TDT agents with a superior bargaining position might extract further concessions from TDTs with an inferior bargaining position- or, rather, that we can't yet rigorously show that they wouldn't do such things. In the world of one-shot PDs, numerical superiority of one kind of TDT agent over another might be such a bargaining advantage.

I had been considering a whole population of agents doing lots of prisoner's dilemmas among themselves to

not be a one shot prisoner's dilemma. It does make sense for all sorts of other plays to be made when the situation becomes political.Omega can wipe their memories of past interactions with other particular agents, as in the example I made up. That would make each interaction a one-shot, and it wouldn't prevent the sort of leverage we're talking about.

I wouldn't call a game one shot just because memory constraints are applied. What matters is that the game that is being played is so much bigger than one prisoner's dilemma. Again, I don't dispute that there are all sorts of potential considerations that can be made, even if very little evidence about the external political environment is available to the agents, as in this case. Given this it seems likely that I don't disagree with Vlad significantly.

You're probably thinking of cousin_it's proof sketch of cooperation in PD. That was ADT/UDT. TDT talking about formal proofs is not part of its theory that was discussed anywhere that I know of.

V yvxr ubj lbh engrq gur qvssvphygl bs guvf dhrfgvba. Gung vf,

vg'f zbfgyl vzntvanel. Gur GQG ntragfqba'g ybfr. Lbh tnir gurz fbzrguvat gb znkvzvfr naq gurl znkvzvfrq vg. Gurl qvq abg znkvzvfr na ragveryl qvssrerag ceboyrzf bs znxvat gurve qrfpraqnagf unir gur uvturfg cebcbegvba bs gur cbchyngvba be znxvat gurve trareny pynff bs qrpvfvba gurbevfg unir gur uvturfg cebcbegvba. Vs gubfr jrer gur tbnyf gura gur npghny hgvyvgl shapgvbaf tvira gb gur ntragf ercerfrag ybfg checbfrf. Gur 'cnenqbk', nf vf fb bsgra gur pnfr jvgu fhpu cnenqbkrf, vf bar bs ireony naq pbaprcghny fyrvtug bs unaq.For the purpose of this problem I may as well be so I'll go as far as rot13.

You should distinguish between problems 3 and 4. Your answer would confuse someone who only understood one of those.

EDIT: like me.

Errr... neither? I was answering the 'problem' part, as quoted. It has its own rating and everything! I will go along with whatever association with 3 and 4 that ortho intended 'Problem' to have. It applies to 3 in as much as it involves the question "why do they only draw?" but far more interestingly to the variant in 4 when they get thrashed.

"Gur GQG ntragf qba'g ybfr. Lbh tnir gurz fbzrguvat gb znkvzvfr naq gurl znkvzvfrq vg. Gurl qvq abg znkvzvfr na ragveryl qvssrerag ceboyrzf bs znxvat gurve qrfpraqnagf unir gur uvturfg cebcbegvba bs gur cbchyngvba be znxvat gurve trareny pynff bs qrpvfvba gurbevfg unir gur uvturfg cebcbegvba." is the natural answer to 3, and question 4 is distinguished from question 3 and I think elicits a better answer that adds to the discussion in a way the answer to question 3 wouldn't if one were to say "that which was wrong with question 3, is also wrong with question 4". An autistic reading of that answer applies to question 4, and isn't wrong, but there's more to say.

"Gurl qvqa'g ybbx nurnq" is something you can't say about 3, that you can say about 4, for had they done that in 3, it wouldn't increase their score, while they need to do that in 4 to increase their score (above zero).

Gung'f abg dhvgr evtug, yrffqnmrq; va gur fvghngvba jvgu GQGf naq QrsrpgObgf, gur GQGf jbhyq vapernfr gurve cbchyngvba snfgre va gur ybat eha vs gurl ershfrq gb nyybj QrsrpgObgf gb serr-evqr. Nyfb, vs gurl qvq fb, gurve cebcbegvba jbhyq evfr gb bar vafgrnq bs fgvpx ng bar unys.

Good point. I missed that since the number goes to infinity as the trials do regardless.

If you could devise a problem where the obvious flaw in "GQG fubhyq orng gur fbpxf bss bs QrsrpgObgf va nal snve svtug." alone would be implicated, that would be good, since it really is a different one than snvyvat gb ybbx nurnq.

Lrf, V qvq erpragyl qvfnterr jvgu lbh ba n qvssrerag guernq. Ubjrire zl pbzzrag urer fgvyy nccyvrf gb "Ceboyrz: ", abg "Rkrepvfr 3" be "Rkrepvfr 4". Gur eryngvbafuvc bs zl ercyl gb rvgure bs gubfr bgure rkrepvfrf vf jungrire eryngvbafuvc vg vaurevgrq sebz "Ceboyrz: " vgfrys naq jungrire ryfr unccraf gb nevfr cheryl ol pbvapvqrapr. Vs V jnagrq gb nyfb gnyx nobhg Rkrepvfr 3 naq 4 gurer ner znal bgure guvatf V pbhyq jevgr nobhg gurz!

The OP appears to conflate them, so I did too, see: "(Easy to verify if you've done Problem 2.)", but there is no problem 2, just an exercise 2. My comment "distinguish between problems 3 and 4" should really have read "distinguish between exercises 3 and 4", sorry.

I hadn't realized that for some number less than infinite iterations (and every higher number) in problem 3, snvyher gb ybbx nurnq would be important, so my comment is based on that error on my part. I overlooked it because for all iterations less than that high number, it isn't at all important. It is a legitimate thing to say from one with that specific wrong thought, so...whence "Lrf, V qvq erpragyl qvfnterr jvgu lbh ba n qvssrerag guernq"? I was going to say "you conflated the cases, I think you are wrong" but soft-pedaled it as much as I could, to be "you should distinguish" rather than "You are wrong" or "I think you are wrong". It turns out I was wrong, so...?

I could not understand why you had been continuing to target which of 3 and 4 I was responding to when I was actually targeting "P: ". It especially didn't make sense to me since "Problem: " seemed to be relevant to both 3 and 4 with 4 (just) being a somewhat exaggerated version of 3 that really emphasized the 'paradox'. It did not occur to me that you could just be wrong about 3. Take that as flattery of your intellect!

I also didn't pick up that you were trying to say that I was wrong in a nice way. Because the possibility that I was wrong about something so obvious just wasn't primed as a likely intent. ;)

That is sad for past you but doubly good for present you because you updated when ortho explained!

But see:

If that wasn't stipulated, one could look ahead and see by acting like a defectbot you could tie, and self-modify into a defectbot/imitate a defectbot. Rationalists win provided rationalists don't mind others doing even better.

Likewise:

Because they only think one step ahead.

Upvoted for title.

As I argued here, that is precisely the behaviour you don't want for your copies/descendants.

The problem is underspecified. I think the post is really operating under the assumption that each generation Omega takes the population, divides it in triples, kills the remaining 0-2 without issue, runs the PD, and takes the result as the population for the next generation.* But if (as the post seems to say) Omega replaces each triple by its children before picking the next triple, exercise 1 is considerably more difficult than the others. In the first version, the agents care only about their children and not the children of other agents or of defectbot. But in the second case, an agent might encounter someone else's child when it plays the game, so it does care about the proportion of population. Then the situation is not a 1-shot PD and cannot be analyzed so simply. In particular, the answer depends on the utility function. It is not enough to say that the agent prefers more children to fewer; the answer depends on how much.

Consider the modified situation in which I claim that a homogeneous population of TDT/UDT agents

defect. In this version, Omega picks replaces a triple with its children before picking the new triples. The payoffs depend on on the number of triples Omega has run: defection is always 1 copy, but cooperation by the N-th triple is worth N copies. If everyone cooperates, the population grows quadratically. There is a chance that an agent will never get picked. A population of sufficiently risk-averse agents will prefer to defect and get 1 child with probability 1 over cooperating and risking having no children. (There are some anthropic issues here, but I don't think that they matter.)I had to change a lot of details to get there, but that means that the answer to Exercise 1 must depend on those details. If Omega replaces population every trial, then it cannot be analyzed like a triple in isolation, but must consider the population and utility function explicitly. If Omega runs it generation by generation, then exercise does reduce to an isolated triple. But since the agents only care about their children, they only care about what happens the first generation, and don't care whether Omega runs any more generations, making the other questions about later generations seem out of place.

* If you change the payoffs from 2/3 to 6/9, then there are no stragglers to kill. Or you could let the stragglers survive to to play next generation; if the population is large enough, this will not affect strategy.

I don't understand what you mean by this.

Here, I believe you're saying that you can pick a rate of growth in the payoffs such that, with probability 1, if everyone cooperates, then eventually one particular lineage (chosen randomly) comes to dominate to the extent that nobody else even gets selected to reproduce, and if the starting utility functions are all sufficiently long-term and risk-averse, they might prefer not to undergo that lottery. Is that right?

Your restatement of my example is correct, though the utility functions being long-term is not supposed to distinguish my example from yours. In a generational model where all the children happen at a predetermined time, discounting is not an issue. Your example seems to say that the agents care about the number of children (not descendents) that they have, regardless of when they occur. If they care about their infinite descendents they need discounting to get a finite number. If they care about exponentially discounted children, even with your payoffs and linear utility, they might choose to defect if the discount rate is high enough.

My version is exaggerated so that the question is whether an agent gets to play the game at all. With the original constant payoffs, every agent gets put in some triple (with probability 1), so a homogeneous population of TDT agents cooperate. Now go back to your payoffs and consider a mix of TDT and defectors with Omega doing the triples one at a time. The only interesting strategy question for the TDT agent is whether to cooperate against 1 defector or to act cliquey. To decide the expected utility of a strategy, the agent must compute the probability of the different triples it might encounter. These depend on the population when it gets picked, which depend on the strategy it chose. In order to answer exercise 1, it must do exercises 2-4 and more. It is unlikely to encounter the asymptotic population, but it is also unlikely to encounter the original population.

Here's a rough calculation. Let's start with 1 defector. Let's assume that it encounters 2 of the original agents, not their descendents. Let's further assume its immediate children encounter original agents, not descendents (a quite unreasonable assumption); and that they don't get picked together, but face 2 TDT agents. Then the difference between cooperating and acting cliquey is 3 encounters with a defector against 9 such encounters. The number of children lost to such an encounter (compared to 3 TDTs) is 5 or 6, depending on cooperating or acting cliquey, which is quite large compared to the 1 that is gained by cooperating rather than acting cliquey. The assumption is ridiculous and the answer with linear utility is probably to cooperate, but it requires calculation and depends on the utility function (and maybe the initial population).

This is the most interesting (to me) post to appear on lesswrong in ages and I have

downvotedit until such time as the "exercises" have worked solutions provided.I'll write up my explanations and post them to Discussion this weekend.

Interesting post, but I think the answer to the paradox being missing made it worse.