In this post I’ll discuss three reasons to cooperate in a truly one-shot prisoner’s dilemma:

- **Kindness**: You care about the other player.
- **Correlation**: Your decision is correlated with the other player’s decision.
- **Reciprocity**: Your decision is correlated with the other player’s belief about your decision.

Kindness makes common sense, but correlation and reciprocity are often lumped together under “weird acausal stuff.” I think they are worth distinguishing because they work quite differently.

I’ll talk about details of correlation and reciprocity, and then argue that most incentives to cooperate in the true prisoner’s dilemma are likely to come from interactions of multiple mechanisms.

## Setup: the true prisoner’s dilemma

Imagine that one day our civilization sees another through a wormhole—we’ve never met before, and we are far enough away from each other that after today we will never meet again. From each civilization, a single person is chosen to play a game.

The two players then play a one-shot prisoner’s dilemma with survival as the stakes. Each player chooses to **cooperate**, in which case they die with probability 1%, or **defect**, in which case the other player dies with probability 2%. If both players cooperate they each die with 1% chance; if both defect they each die with 2% chance.

No matter what the other player does, you have a lower probability of death if you defect. In this post we analyze reasons that you might cooperate anyway.

I **will not** discuss the incentives to make commitments in advance based on the possibility that we may play a prisoner’s dilemma in the future (even though I think those incentives are important). I will analyze the question mostly for an agent who uses EDT or UDT.

You could just as well replace the ratio 2%/1% with any X>1, and ask “how large does X need to be before I’d cooperate?” I think it is pretty robust that you should cooperate at some sufficiently large scale (unless you specifically *want* the other player to die) and so this is ultimately a quantitative question. For this post I’ll just keep X=2 and leave it up to the reader to generalize.
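To make the payoffs concrete, here is a minimal sketch of the death probabilities. The function name is invented, and treating the two sources of risk as additive is a simplifying assumption (fine at probabilities this small):

```python
# Minimal sketch of the game's death probabilities. Treating the two
# sources of risk as additive is a simplifying assumption.
def death_probability(my_move, their_move, x=2.0, base=0.01):
    """P(I die): my cooperation risks my life with probability `base`;
    the other player's defection risks my life with probability x * base."""
    p = 0.0
    if my_move == "C":
        p += base
    if their_move == "D":
        p += x * base
    return p

# Defection dominates: whatever the other player does, defecting
# strictly lowers my own death probability.
for theirs in ("C", "D"):
    assert death_probability("D", theirs) < death_probability("C", theirs)
```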

## Kindness

The most boring reason to cooperate is that I don’t want the other player to die.

That might be because I value their life and experiences for its own sake, in the same way that I value mine. Or it could be that I mostly care about other consequences of survival—the effect a death will have on the people around us, the things we will accomplish if we survive, and so on—and I think their survival will have nearly as many good consequences as mine.

Let K be how much I value the other player’s survival. If K > 0.5 then I should cooperate. (Since then 2% * K > 1%.)

## Correlation

When my deliberation begins, I’m not sure what I’ll end up deciding. I’m also not sure what the other player will decide. Those two unknown quantities are slightly correlated because our decisions have some common structure: what happens when someone is whisked away from their everyday life and spends a day deciding how nice to be to someone from an alien civilization?

A causal decision theorist doesn’t care about this correlation, but an evidential decision theorist considers it a reason to cooperate. We can measure the strength of the correlation as C = P(they cooperate | I cooperate) – P(they cooperate | I defect). If C > 0.5 then I should cooperate.

Smaller correlations give us a partial reason to cooperate, and they combine linearly with kindness: if K + C > 0.5 I should cooperate.
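As a toy illustration of how these combine, here is a rough evidential expected-utility calculation. The utility scale (my death costs 1, the other player’s death costs K) is an assumption for illustration; the sketch recovers the 0.5 thresholds in the pure-kindness and pure-correlation cases:

```python
# Toy evidential expected-utility model. Conditioning on "I cooperate"
# rather than "I defect":
#   * my death risk rises by `base` (the cost of cooperating),
#   * the other player's death risk falls by x * base (I don't defect on them),
#   * their cooperation probability rises by C, which lowers my death risk
#     by C * x * base and raises their own by C * base.
# Utility scale (an assumption): my death costs 1, theirs costs K.
def edt_gain_from_cooperating(k, c, x=2.0, base=0.01):
    my_death_change = base - c * x * base
    their_death_change = -x * base + c * base
    return -my_death_change - k * their_death_change

# The pure cases recover the thresholds above: indifference at K = 0.5
# (kindness alone) and at C = 0.5 (correlation alone).
assert edt_gain_from_cooperating(k=0.5, c=0.0) == 0.0
assert edt_gain_from_cooperating(k=0.0, c=0.5) == 0.0
```

In the mixed case this toy model also picks up a small C * K cross term, so the linear rule K + C > 0.5 should be read as a first-order approximation.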

It’s really hard to estimate C. It’s probably not zero but it’s plausible that it’s extremely small. Much of the correlation is likely to be screened off by other stuff I’ve seen (e.g. observing other people playing the prisoner’s dilemma).

The effective correlation is likely to be (much) larger for someone using UDT. I think altruistic agents should use UDT (see *EDT with updating double counts* and *Decision theory and dynamic consistency*), and therefore I think this can be a significant consideration.

## Reciprocity

If the other player is very different from me, my decision may not be very correlated with their decision. But if they understand me well enough, my decision might be very correlated with their *prediction* about my decision.

If the other player is using a policy like “I cooperate only with people who I predict will cooperate with me” then I would care a lot about their prediction and so this could give an evidential decision-theorist reason to cooperate.

You can think of “I cooperate only with people who I predict will cooperate with me” as a kind of logical tit for tat. I know what I predict the other player would do. That prediction involves predicting what they’d predict I’d do, which in turn involves predicting what they’d predict that I’d predict they’d do, and so on. If the real world is just another representative entry in this sequence, then cooperating because I predict my opponent will cooperate is a lot like cooperating because they “cooperated” at the previous step of the sequence.

I find it more convincing to think of this as a kind of logical trade. In a usual physical trade, I have a policy like “I will give you a toaster if you give me $1.” I have that policy because I believe you will give me $1 if I have that policy and not otherwise. The logical version is just the same. It’s easiest to make that arrangement when you and I can actually talk. But we can also do the same thing with lower fidelity based only on predictions. To an EDT agent the size of the incentive depends on the quality of predictions. To a CDT agent, the incentive only kicks in when the other player’s reasoning about me takes the form of a simulation such that “I” might be in the simulation. For the rest of the post I’ll consider only EDT or UDT agents.

Let R be my overall estimate for the difference between P(other player cooperates | I cooperate) and P(other player cooperates | I defect) that comes through the channel of my decision affecting their prediction, which in turn affects their action. We’ll redefine C to be the “direct” correlation that comes from the similarity of our decisions rather than through this prediction channel.

This factor combines additively with kindness and correlation, so I should cooperate if R+C+K > 0.5.

(This is a simplification, because I should also *myself* use a reciprocity-like strategy, where my cooperation is based on my prediction about whether they are cooperating based on a reciprocity-like strategy… but I’m going to keep setting aside those complexities. *Program equilibrium via provability logic* gives a taste of some of the issues.)

## Correlation + Kindness

Correlation and kindness also have an important nonlinear interaction, which is often discussed under the heading of “evidential cooperation in large worlds” or ECL.

Consider all of the pairs of civilizations and people who might have been randomly selected to play this prisoner’s dilemma. For each of them, we can define K and C as in the last sections. We can also think of K=C=1 for ourselves.

Let’s imagine that this crazy wormhole-prisoner’s-dilemma situation arises periodically throughout the universe, and that the universe is very large (e.g. because of quantum branching). Even if K and C are extremely small on average, the *sum* of K and C across all other players can be astronomically large.

To an evidential decision theorist, that means:

- Because the sum of K is large, most of what I care about in the universe is not *this particular* game of the prisoner’s dilemma, but the sum over all the other games.
- Because the sum of C is large, most of the difference between the world conditioned on “I cooperate” and the world conditioned on “I defect” is not the fact that *I in particular* cooperate but the correlation with all the other games.

So when I cooperate, the costs and benefits are effectively distributed across the universe, rather than being concentrated on me and the other player. The calculus is no longer “Is R+C+K > 0.5?”. Instead, conditioned on me cooperating:

- The extra fraction of players who cooperate is E[C].
- The extra fraction of players who survive from cooperation is 2% * E[C].
- My utility from that is 2% * E[C] * E[K], because the cost is distributed across all players.
- The extra fraction of players who die because they cooperated is 1% * E[C].
- My disutility from that is 1% * E[C * K], because the cost is distributed based on C.
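This calculus can be sketched numerically with an invented population. Only the structure (benefit 2% * E[C] * E[K] versus cost 1% * E[C*K]) follows the bullet points above; the distributions of C and K are made up, and are drawn independently across players:

```python
import random

# Toy population-level version of the calculus above. The distributions
# of C and K are invented; only the benefit/cost structure follows the
# bullet points (benefit 2% * E[C] * E[K] versus cost 1% * E[C*K]).
random.seed(0)
players = [(random.uniform(0, 0.01), random.uniform(0, 0.01))  # (C, K) pairs
           for _ in range(100_000)]

mean_c = sum(c for c, _ in players) / len(players)
mean_k = sum(k for _, k in players) / len(players)
mean_ck = sum(c * k for c, k in players) / len(players)

benefit = 0.02 * mean_c * mean_k  # extra survivals, spread across everyone
cost = 0.01 * mean_ck             # extra deaths, concentrated on correlated players
print("benefit:", benefit, "cost:", cost, "cooperate:", benefit > cost)
```

Because C and K are drawn independently here, E[C*K] is close to E[C] * E[K] and cooperation wins by the 2%/1% factor; a strong enough positive correlation between C and K would flip the conclusion.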

Thus the question is whether the *correlation* between C and K is large enough. In a small universe where K and C are usually small, the correlation between K and C can be huge—I myself am an example of a player with K=C=1, and all of E[C], E[K] and E[C*K] may be equal to “what fraction of all players do I represent?”. But once the sum of K and C get very large, I personally make almost no contribution to E[C] and E[K], and so the correlation becomes dominated by features of other civilizations.

Overall I think this may give impartial players (who mostly care about people other than themselves) a strong reason to cooperate with each other even if they have very different values and don’t intrinsically care about one another. For selfish players, i.e. those who mostly care about the future experiences of observers very similar to themselves, it’s a much weaker reason to cooperate. Most humans are a mix of the two, and so it depends on the parameters.

This calculation is pretty subtle and I’ve glossed over a lot of details. One particularly important issue is that C is not just a property of a civilization. In fact it makes most sense for me to cooperate with players who have the kind of values that *someone who is correlated with me might have had*. And in that case, I could end up making very different decisions than a copy of me who was playing against a rock. Sorting through those complexities is involved and does change the recommended policy, but the net effect seems to be a generally stronger reason to cooperate with other impartial players.

## Reciprocity + correlation + kindness

Intuitively there are two factors limiting the strength of reciprocity motivations for cooperation:

- Are the other player’s predictions correlated with my behavior?
- Does the other player implement a reciprocity-like policy? (Or would they, if they thought I might?)

Condition 1 seems quite hard to achieve, especially in civilizations like ours that don’t have sophisticated technology for making predictions. So I expect reciprocity on its own to not be much of a motive in the true prisoner’s dilemma unless the players have an extremely good ability to reason about one another—it doesn’t require going all the way to literal simulations, but does require much more ability than we have today. (Note that the situation changes if we are able to make commitments: in that case it may be natural to commit to a trade-like policy, allowing us to cooperate with other people who make similar commitments by making the prediction problem easier.)

But I think that combining reciprocity with correlation+kindness can create a significant reason to cooperate even when players are not able to make very accurate predictions about one another. In this section we’ll try to analyze that effect. We’ll assume that we are playing against someone who is using a reciprocity-like strategy such that they are more likely to cooperate if they think that I will cooperate. If the other player is only possibly using a reciprocity-like strategy, or only has a weak tendency to cooperate if they think I will cooperate, then the conclusions will get scaled down.

We can model the other players’ limited prediction ability as uncertainty about exactly who I am: they’ve seen some facts about me, but there are lot of possibilities consistent with what they’ve seen. Some of those possibilities cooperate and some of them defect. (This is only an approximation, which I’m using here to give a rough sketch. It does apply to logical uncertainty as well as empirical uncertainty, though in some situations we will care about correlation+kindness across logically impossible worlds which is a more complicated discussion.)

Suppose that from the other player’s perspective I belong to a set of possible players S, and that they have well-calibrated beliefs about S. Then conditioned on one additional player from S cooperating, the other player has a 1/|S| higher probability that each player in S cooperates—the effect of their limited prediction ability is to “smear out” the update across all of the indistinguishable-to-them players.
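The “smearing out” can be sketched with a toy calibrated predictor; all numbers here are invented:

```python
# Toy model of a weak but calibrated predictor: it only sees which coarse
# set S a player belongs to, so its belief about any member of S is the
# cooperation base rate within S. All numbers are invented.
def predicted_p_cooperate(cooperators_in_S, size_of_S):
    return cooperators_in_S / size_of_S

# One additional cooperator in S shifts the prediction for *every* member
# of S up by exactly 1/|S|: the update is "smeared out" across the set.
size_of_S = 1000
before = predicted_p_cooperate(300, size_of_S)
after = predicted_p_cooperate(301, size_of_S)
assert abs((after - before) - 1 / size_of_S) < 1e-12
```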

Conditioned on my cooperation, the fraction of cooperating players in S increases by E[C], which we now define as the average correlation between my decision and the decision of someone else in S who is facing a player using a reciprocity-like strategy (where the correlation is computed from *my* perspective). That then increases the frequency with which people cooperate with players in S by E[C].

The benefits of this cooperation are distributed across all the players in S, and I care based on E[K], resulting in a total benefit I care about of 2% * E[C] * E[K]. The costs are distributed across the players in S proportional to C, and so the cost I care about is 1% * E[C * K].

This leads us to a calculus very similar to the correlation+kindness analysis in the last section: it’s worth cooperating only if the correlation between C and K is weaker than 0.5 * (the extent to which the other player reciprocates). However the correlation is now being computed over the set S, of people who look like me as far as the other player is concerned.
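Putting the pieces together, here is a rough sketch of this comparison. The parameter names are mine, and `reciprocation` scales how strongly the other player’s action tracks their prediction:

```python
# Rough sketch of the benefit/cost comparison for reciprocity combined
# with correlation + kindness. Expectations are taken over the set S of
# players who look like me to the other player; `reciprocation` scales
# how strongly their action tracks their prediction.
def worth_cooperating(mean_c, mean_k, mean_ck, reciprocation=1.0,
                      x=2.0, base=0.01):
    benefit = x * base * reciprocation * mean_c * mean_k
    cost = base * mean_ck
    return benefit > cost

# If C and K are uncorrelated within S, cooperation wins...
assert worth_cooperating(0.005, 0.005, 0.005 * 0.005)
# ...but a strong C-K correlation within S flips the answer.
assert not worth_cooperating(0.005, 0.005, 10 * 0.005 * 0.005)
```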

Conditioning on what the other player knows about me will tend to reduce the correlation between C and K, potentially greatly. One way to think about this is that the C-K correlation is driven by some common causes—perhaps my decision and values are both more correlated with overthinking things. To compute the C-K correlation conditioned on S, we remove any of those common causes that are observed by the other player. This won’t significantly reduce correlations coming from selfishness, but is likely to be a very big deal for impartial values.

The bottom line is that if I am facing a player who uses a reciprocity-like strategy, such that they will cooperate more if they expect me to cooperate, and if I have impartial values and the other player knows this fact, then reciprocity+correlation+kindness can give a very big reason to cooperate: if the other player is a weak predictor then the benefits of cooperation will spill over to lots of other people who “look like me” from their perspective, but the *costs* are also shared across the other people who look like me, and I probably care about as much about the people who get the benefits as the people who pay the costs.

I think the best reason for the other person to run a reciprocity-like strategy is because they expect that you are more likely to cooperate if they do. In practice both players are using a policy like X = “cooperate if the other player uses a policy like X.” So I expect this kind of cooperation to get off the ground when *both* players have sufficiently impartial values and are inclined to follow this argument (and learn enough about each other to verify those facts).

I think this can greatly strengthen the correlation+kindness effect even while you have weak predictors and may often lead to a net incentive to cooperate for impartial updateless decision theorists. For agents who update / have selfish values, such cooperation probably requires the ability to carry out more detailed simulations.

I think the diciest step of this argument was modeling the other player’s uncertainty as ignorance about my identity; we’ve modeled them as thinking that I could be anyone in S but having correct beliefs about S on average. Concretely, we’re imagining something like: all the other player can tell is that they are playing against someone who tends to overthink things. I believe that my decision is **very slightly** correlated with the average behavior of players who overthink things, and I think that the other player will end up with well-calibrated beliefs about the class of players who overthink things. So my decision has the same strength of correlation with their beliefs as it does with the truth, which gives us reason to cooperate no matter how small in absolute value both correlations are. But if that 1:1 assumption doesn’t hold (because the other player can’t reason accurately about people who overthink things), then we broaden the set S, potentially including logically impossible versions of “players who overthink things.” I think this modeling choice is much more plausible for a UDT than an EDT agent, because a UDT agent hasn’t yet updated on any of the factors driving the other player’s belief. But even for a UDT agent this might be wrong and requires more thought.

## Conclusion

There seems to be pretty rich decision-theoretic structure in the true prisoner’s dilemma, and some reasons to cooperate will involve a multiplicative interaction between different mechanisms that are often lumped together as “weird acausal stuff.” In particular, I think understanding the strength of reciprocity-like arguments for cooperation requires thinking about the dynamics of evidential cooperation in large worlds (at least until you reach the limit involving literal simulations).

This post informally describes my current best guesses and the very rough analysis behind them; those have a minor effect on my attitudes to some odd philosophical questions, but I wouldn’t personally trust them to be correct or robust. Some of the underlying correlations and frequencies across the universe seem almost impossible to assess, but I do think it’s feasible to analyze the basic situation much more carefully, and that could still make a real difference in our best guesses. There’s a good chance that the basic picture in this post may be completely wrong, and conversely if a more careful analysis had similar conclusions then I’d feel much more comfortable with them.

Could you say more about why you think this? (Or, have you written about this somewhere else?) I think I agree if by "UDT" you mean something like "EDT + updatelessness"[^1]; but if you are essentially equating UDT with FDT, I would expect the "correlation"/"logi-causal effect on your opponent" to be pretty minor in practice due to the apparent brittleness of "logical causation".

This is not how I would characterize ECL. Rather, ECL is about correlation + caring about what happens in your opponent's universe, i.e. not specifically about the welfare/life of your opponent.

[^1]: Because updatelessness can arguably increase the game-theoretic symmetry of many kinds of interactions, which is exactly what is needed to get EDT to cooperate.

Yeah, by UDT I mean an updateless version of EDT.

I'm not sure I understand the distinction (or what you mean by "your opponent's universe"), and I might have miscommunicated. What I mean is:

Thanks for clarifying. I still don't think this is exactly what people usually mean by ECL, but perhaps it's not super important what words we use. (I think the issue is that your model of the acausal interaction—i.e. a PD with survival on the line—is different to the toy model of ECL I have in my head, where cooperation consists in benefitting the values of the other player [without regard for their life per se]. As I understand it, this is essentially the principal model used in the original ECL paper as well.)

The toy model seems like an example, though maybe I misunderstand. I'm just using survival as an example of a thing that someone could care about, and indeed you only have an ECL reason to cooperate if you care about the survival of other agents.

I've been using ECL, and understanding others (including the original paper) using ECL to mean:

Someone should correct me if this is wrong.

Ah, okay, got it. Sorry about the confusion. That description seems right to me, fwiw.

I think this is too weak. Even without precommitments I am pretty good at predicting which of my friends will defect/cooperate in prisoner's-dilemma-like scenarios, and I think you are underestimating the ability of humans to predict one another.

I at least make a lot of decisions per week that I think are structurally pretty similar to a prisoner's dilemma where I expect the correlation part to check out, because I am pretty good at predicting what other people will do (in general my experience is that people are pretty honest about how they reason in situations like this, so you can just ask them; predicting that they will do what the algorithm they described to you in the past would do gets you pretty high accuracy).

I'm definitely more skeptical of this mechanism than you are (at least on its own, i.e. absent either correlation+kindness or the usual causal+moral reasons to cooperate). I don't think it's obvious.

The relevant parameter is something like R = E[my opponent's probability that I'll cooperate | I cooperate] vs E[my opponent's probability that I'll cooperate | I defect]. My feeling is that this parameter can't be that close to 1 because there is a bunch of other stuff that affects my opponent's probability (like my past behavior, their experience with agents similar to me, etc.) beyond my actual decision about which I'm currently uncertain.

For an EDT agent I think it's pretty clear that R << 1. For a UDT agent I think it's less clear, since everything is so confusing, and R > 0.5 is not crazy.

Shouldn't this be "... where I based my prediction ..."?

Thanks. It should be "I should also myself use a reciprocity-like strategy where my cooperation is based on my prediction that...". It's a very confusing sentence even after the correction, I probably won't fix it further though.