This discussion article was provoked in part by Yvain's post on Main a few weeks ago, and some of the follow-up comments.
EDIT: I've also just noticed that there was a recent sequence rerun on the point about finite iterations. My bad: I simply didn't see the rerun article, as it had already slipped down a couple of pages when I posted. If you down-voted (or didn't read) out of a feeling of "Didn't we just do this?" then sorry.
In any case, one of my main motivations for writing this article was point 5 (Does an environment of commitment and reputation create the background against which TDT - or something like it - can easily evolve?). I didn't get any responses on that point, so I might try raising it again in a future article.
It is well known that in a one-shot prisoner's dilemma, the only stable solution (Nash equilibrium) is for both parties to defect. Perhaps less well known, the same is true for any finitely iterated version of the dilemma, or any version with a finite upper bound on the number of iterations. For instance, a strategy more sophisticated than Tit For Tat (TFT) would detect when it has reached the last iteration, and then defect. Call this TFT-1. Once TFT-1 has established itself, a strategy which detects and defects on the last two iterations (TFT-2) would establish itself in turn, and so on, unravelling co-operation by backward induction.
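The unravelling argument can be sketched in a short simulation. This is an illustration only; the payoff values (the conventional T=5, R=3, P=1, S=0) and all the function names here are my own assumptions, not anything canonical:

```python
# Sketch of the backward-induction argument: in a game of known, finite
# length, TFT-1 (defect on the last round) out-scores plain Tit For Tat.
# Payoff values are the conventional T=5, R=3, P=1, S=0 (an assumption).

PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def play(strat_a, strat_b, rounds):
    """Run an iterated PD of known length; return (score_a, score_b)."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for t in range(rounds):
        move_a = strat_a(hist_b, t, rounds)
        move_b = strat_b(hist_a, t, rounds)
        pa, pb = PAYOFF[(move_a, move_b)]
        score_a += pa
        score_b += pb
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

def tft(opp_history, t, rounds):
    """Plain Tit For Tat: co-operate first, then copy the opponent."""
    return opp_history[-1] if opp_history else 'C'

def tft_minus(k):
    """TFT-k: play TFT, but defect unconditionally on the last k rounds."""
    def strat(opp_history, t, rounds):
        if t >= rounds - k:
            return 'D'
        return opp_history[-1] if opp_history else 'C'
    return strat

print(play(tft, tft, 10))           # (30, 30): mutual co-operation
print(play(tft_minus(1), tft, 10))  # (32, 27): last-round defector gains
```

And once TFT-1 is established, `play(tft_minus(2), tft_minus(1), 10)` gives (30, 25), beating the (28, 28) that TFT-1 earns against itself - exactly the invasion pathway described above.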
Since prisoners' dilemmas are always finite in practice, and always have been (we are mortal, and the Sun will blow up at some point), this raises the question of why we actually co-operate in practice. Why is TFT, or something very like it, still around?
Somehow, evolution (biological, cultural or both) has engineered into us a strategy which is not a Nash equilibrium. Since any "evolutionarily stable strategy" (as usually defined) must be a Nash equilibrium, we have evolved a strategy which is not strictly evolutionarily stable. How could that have happened?
I can think of a few possibilities, and have a view about which of these are more realistic. I'm also wondering if other Less Wrong contributors have seriously thought through the problem, and have alternative suggestions.
1. Strategies like TFT succeed because they are very simple, and the alternatives are too complicated to replace them.
The argument here is that there are big costs to a strategy in "hardware" or "software" complexity, so that a crude strategy will out-compete a more sophisticated strategy. In particular TFT-1 is more complex than TFT and the additional computational costs outweigh the benefits. This is most plausibly the case where there is a very large upper bound on iterations (such as 100 years), but the upper bound is so rarely (if ever) reached in practice, that strategies which do something different in the final phase just don't have a selective advantage compared to the cost of the additional complexity. So the replacement of TFT by TFT-1 never happens.
The difficulty with this explanation is that humans can (often) recognize when "this time is the last", and the computational cost of doing something different in that case is not great. Yet we either don't change, or we change in ways that TFT-1 would not predict. For instance, we can tell when we are visiting a restaurant we will never visit again (on a trip abroad say), but are still likely to tip. Also, it is striking that people co-operate about 50% of the time in known one-shot prisoners' dilemmas and similar games (see this analysis of Split or Steal?). Why 50%, rather than nearly 0%, or nearly 100%? And we often change our behaviour radically when we know we are going to die soon, but this change rarely involves antisocial behaviour like stealing, mugging, running up huge debts we'll never have to pay back and so on.
So I'm not convinced by this "alternatives are too complicated" explanation.
2. Emotional commitments change the pay-offs
Victims of defection don't take it lying down. They react angrily, and vengefully. Even if there are no obvious opportunities for future co-operation, and even where it involves further cost, victims will go out of their way to attempt to hurt the defector. On the nicer side, emotions of friendliness, indebtedness, duty, loyalty, admiration or love can cause us to go out of our way to reward co-operators, again even if there are no obvious opportunities for future co-operation.
Given these features of human nature as a background, the pay-offs change in a one-shot or finite-bound prisoner's dilemma, and may convert it to a non-dilemma. The pay-off for co-operating becomes greater than the pay-off for defection. This "solves" the problem of why we co-operate in a PD by denying it - effectively there wasn't a true Prisoner's Dilemma in the first place.
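That payoff shift can be made concrete with a toy calculation. The retaliation-cost variable and the helper name below are my own illustrative assumptions:

```python
# If defecting against a co-operator triggers an expected retaliation cost v,
# the one-shot temptation payoff becomes T - v. With the conventional values
# T=5, R=3 (an assumption), any v > T - R = 2 removes the dilemma entirely.
T, R, P, S = 5, 3, 1, 0  # temptation, reward, punishment, sucker

def best_reply_to_cooperator(v):
    """Best one-shot reply to 'C' once retaliation cost v is priced in."""
    return 'D' if T - v > R else 'C'

print(best_reply_to_cooperator(0))  # 'D': the ordinary PD incentive
print(best_reply_to_cooperator(3))  # 'C': vengeance converts it to a non-dilemma
```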
There are a number of difficulties with this "solution", one being that even allowing for emotional reactions, there are some true PDs, and we can usually recognize them: scenarios such as the foreign restaurant, where we know we will not be pursued across the world by a vengeful waiter demanding payback for a missing tip. So why don't we always defect in such cases? Why is there a voice of conscience telling us not to? Perhaps this objection could be met by the "too complicated" response. For example, a strategy which could reliably detect when it is safe to defect (no vengeful payback) would in principle work, but it is likely to carry a large complexity overhead. And a strategy which almost works (sometimes thinks it can "get away with it" but actually can't) may have a big negative payoff, so there is no smooth evolutionary pathway towards the "perfect" strategy.
A further difficulty is to explain why humans react in this convenient pay-off-shifting fashion anyway. On one level, it is obvious: we are committed to doing so by strong emotions. Even when we suspect that emotions of vengeance and duty are "irrational" (all pain to us from now on, no gain) we can't help ourselves. Yet, it is this emotional commitment that increases the likelihood that others co-operate with us in the first place. So we can tell a plausible-sounding story about how ancestors with emotional commitments induced more co-operation from their fellows than those without, and hence the "irrationally emotional" ancestors out-competed the "coldly rational" non-ancestors.
But there is a major problem with this story: the "emotionally committed" ancestors could be out-competed in turn by bluffers. Anyone who could fake the emotional signals would be able to elicit the benefits of co-operation (they would successfully deter defection), but without having to follow through on the (costly) commitments in case the co-operation failed. Bluffing out-competes commitment.
Ahh, but if the bluff has been called, and the threatened vengeance (or promised loyalty) doesn't materialise, won't this lead to more defection? So won't people who genuinely follow through on their commitments succeed at the expense of the bluffers? The answer is yes, but only for iterated interactions, and only in a potentially infinite scenario. The problem of the finite bound returns: it is always better to "bluff" a commitment on the very last interaction. And once bluffing on the last turn has been established, it is better to bluff on the next-to-last, and so on, leading to bluffing on all turns. Then there is no advantage in believing the bluffs, so no deterrent effect, and (in the final equilibrium) no advantage in making the bluffs either. The only true equilibrium has no commitment, no deterrence and no co-operation.
Again, we can try to rescue the "commitment" theory by recourse to the "too complicated" theory. Quite possibly, alternatives to true commitment are very costly in hardware or software: it is just too hard to bluff convincingly and successfully. That might be true, but on the other hand, there are plenty of poker players and con artists who would say differently.
3. Social pressures and reputational effects change the pay-offs
Human decisions to co-operate or defect are very rarely made in isolation, and this could help explain why we co-operate even though we know (or can predict) "this time is the last". We won't benefit from defection if we simultaneously gain reputations as defectors.
As in explanation 2, the effect of this social pressure is to change the pay-off matrix. Although there may appear to be a benefit from one-shot/last-shot defecting, in a social context where our actions are known (and defections by us will lead to defections by third parties against us), then there is a greater pay-off from co-operating rather than defecting.
Once again this "solves" the problem of why we co-operate in PDs by denying it. Once again it faces the objection that there are true PDs (involving secret defection) and we can recognize them, but often don't defect in them. Again, perhaps this objection could be met by the "too complicated" response; it is just too hard to tell when the defection is really secret.
A second objection is that this reputational theory still doesn't cover end-of-life effects: why are we worried at all about our reputation when death is near? (Why do we even worry more about our reputation in such cases?)
But a more basic objection is "How did we ever get into a social environment where third party reputation matters like this?" Consider for instance a small society involving Anne, Bob, and Charles. Anne and Bob are engaging in an iterated prisoners' dilemma, and regularly co-operating. Bob and Charles meet in a one-shot prisoners' dilemma, and Bob defects. Anne sees this. How does it help Anne in this situation to start defecting against Bob? Generally it doesn't. A reputational system only helps if it identifies and isolates people who won't co-operate at all (the pure defectors). But Bob is not a pure defector, so why does he end up being penalized by Anne?
Perhaps the relevant model is one where Anne hasn't interacted with Bob at all yet, but a new opportunity for iterated co-operation is coming up. By seeing Bob defect against Charles, Anne gets evidence that Bob is a defector rather than a co-operator, so she won't even start to co-operate with him. If Anne could discriminate a bit more clearly, she would see that Bob is not a pure defector, but she can't. And this is enough to penalize Bob for defecting against Charles. Possibly that works, but I'm doubtful whether these "new opportunity to co-operate" cases occur often enough in practice to really penalize one-shot defection (which would have to be observed at exactly the right time to spoil the opportunity). Or more to the point, did they occur often enough in human history and pre-history to matter?
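One way to make that doubt concrete is to price the trade-off. This is a rough sketch under stated assumptions: the observation probability p, the partnership horizon n, the conventional payoffs T=5, R=3, and the function name are all mine:

```python
# Defecting in a one-shot game gains T - R immediately, but with probability
# p an observer like Anne sees it and withholds a future iterated partnership
# worth R per round for n rounds. Conventional payoffs T=5, R=3 assumed.
T, R = 5, 3

def defection_pays(p, n):
    """True if one-shot defection beats co-operation in expectation."""
    immediate_gain = T - R        # gain from exploiting this co-operator
    expected_loss = p * n * R     # forgone future co-operation if observed
    return immediate_gain > expected_loss

print(defection_pays(0.1, 2))   # True: observation rare, future short
print(defection_pays(0.5, 10))  # False: reputation makes defection costly
```

On this toy model, reputation only deters one-shot defection when observation is likely and the forgone partnerships are long - which is exactly the question of whether such opportunities arose often enough in practice.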
But suppose for the moment that we have an explanation for how the reputational system arises and persists. Then the reputational effect will apply to commitments as well: individuals won't benefit if they are identified as bluffers, so truly committed individuals (with strong emotions) benefit over those who are known to fake emotions, or to "coldly" override their emotions. So a reputational explanation for co-operation can strengthen a commitment explanation for co-operation. Or in the other direction, any emotional commitments (to principles of justice, disgust at exploitation etc.) can reinforce the reputational system. So it seems we have two somewhat dubious mechanisms which could nevertheless reinforce each other and build to a strong mechanism. Perhaps.
4. Group selection
There have been different societies / social groups through history. Perhaps some have had reputational systems which successfully converted Prisoners' Dilemmas into non-Prisoners' Dilemmas, while others haven't, and their members were left with lots of true PDs (and lots of defection). The societies which avoided true PDs experienced less defection, and out-competed the others.
This has a ring of plausibility about it, but suffers from many of the same general problems as any Group-selection theory. Human groups aren't isolated from each other like separate organisms, and don't reproduce like organisms: they exchange members too often.
Still, this solution might address one of the main theoretical objections to Group selection, that "co-operating" groups are unstable to defection (either arising from internal changes, or brought in by new members), and the defection will spread through the group faster than the group can out-reproduce rival groups. Groups with the right reputational systems are - by hypothesis - stable against defection. So it might work.
Or perhaps reputational systems aren't quite stable against defection - they eventually collapse because of secret defections, "last time" defections which can't be punished by other members, laziness of other members in enforcing the co-operation, false reputations and so on. This slow erosion eventually kills the group, but not before it has established child groups of some sort. Again perhaps this might work.
5. Prediction and Omegas: from TFT to TDT
One striking feature about both the commitment explanation (2) and the reputational explanation (3) is how they reward successful prediction of human behaviour. This is obvious for commitments: it is the predictable emotional commitment that creates the deterrent against defection (or the lure towards co-operation). And being able to predict who is really vengeful and loyal, and who is just bluffing, gives individuals a further strong advantage.
But this extends to the reputational system too. Suppose Bob defects against Charles, while Charles co-operates. Anne sees this and is disgusted (how could Bob exploit poor Charles like that?). Yet suppose Charles defects as well. Then Anne admires Bob for his prudence (rather than being taken for a ride by that evil Charles). So Bob gets the reputational pay-off precisely when he can successfully predict how Charles will behave, and do the same. If the reputational pay-off is high, then there is a strong pressure towards a "mirror" strategy (try to predict whether the other person will co-operate or defect and then do likewise).
This is rather interesting, since it is starting to sound like Newcomb's problem, where we have a (hypothetical) predictor who can't be outwitted. Why is that a believable story at all? Why don't we just stare in bemusement at the very idea? Well, suppose we model "co-operation" as the human player taking one box, which Omega fills with $1 million, versus "defection" as the human player taking both boxes (and Omega not filling the opaque one). Or suppose we treat a resolution to take one box as a "commitment" and an after-the-fact decision to take two boxes (because it no longer makes a difference) as bluffing on a commitment. And of course the rationale for a human player to "co-operate" or to truly "commit" is Omega's reputation for always predicting correctly!
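The mapping can be made concrete with expected values, treating Omega's accuracy q as the analogue of a reputation. The function names are mine; the $1,000,000 and $1,000 amounts follow the usual statement of Newcomb's problem:

```python
# Expected value of one-boxing ("co-operating") vs two-boxing ("defecting")
# against a predictor who is correct with probability q.
def one_box_ev(q):
    return q * 1_000_000                 # opaque box filled iff one-boxing predicted

def two_box_ev(q):
    return (1 - q) * 1_000_000 + 1_000   # keep the visible $1,000 either way

print(one_box_ev(0.99) > two_box_ev(0.99))  # True: trust a near-perfect predictor
print(one_box_ev(0.5) > two_box_ev(0.5))    # False: a coin-flip predictor
```

One-boxing wins whenever q exceeds 0.5005: the predictor's "reputation" does all the work, just as in the commitment and reputational stories above.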
So, here is a story about how "Timeless Decision Theory" (or something like it) could emerge from "Tit for Tat". A combination of commitment effects (2) and reputational effects (3) leads to an environment where successful prediction of human behaviour is rewarded. Such an environment is - possibly - maintained by group selection (4).
People get rather good at prediction. When meeting a successful predictor who will co-operate if you co-operate, and defect if you defect, it is better to co-operate. When the successful predictor will defect if he suspects you are bluffing on a commitment, it is better to have a true commitment. But it is still not obvious what to do in a one-shot prisoner's dilemma, because you don't know how the other party's prediction will go, and don't know what will enhance your own reputation (so sometimes people co-operate, sometimes defect).
All this favours a style of reasoning rather like TDT. But it can also favour a rather "superstitious" approach to justifying the reasoning, since there is no causal connection between our action and the prediction. Instead we get weird pseudo-causal explanations/justifications like gods who are always watching, ancestral spirits who can be angered, bad karma, what goes around comes around etc. and a general suspicion of those who don't go along with the local superstition (since they can't be predicted to co-operate with those who do).
Does this sound familiar?