Three reasons to cooperate

paulfchristiano

In this post I’ll discuss three reasons to cooperate in a truly one-shot prisoner’s dilemma:

Kindness. You care about the other player.
Correlation. Your decision is correlated with the other player’s decision.
Reciprocity. Your decision is correlated with the other player’s belief about your decision.

Kindness makes common sense, but correlation and reciprocity are often lumped together under “weird acausal stuff.” I think they are worth distinguishing because they work quite differently.

I’ll talk about details of correlation and reciprocity, and then argue that most incentives to cooperate in the true prisoner’s dilemma are likely to come from interactions of multiple mechanisms.

Setup: the true prisoner’s dilemma

Imagine that one day our civilization sees another through a wormhole—we’ve never met before, and we are far enough away from each other that after today we will never meet again. From each civilization, a single person is chosen to play a game.

The two players then play a one-shot prisoner’s dilemma with survival as the stakes. Each player chooses either to cooperate, in which case they die with probability 1%, or to defect, in which case the other player dies with probability 2%. If both players cooperate they each die with 1% chance; if both defect they each die with 2% chance.

No matter what the other player does, you have a lower probability of death if you defect. In this post we analyze reasons that you might cooperate anyway.

I will not discuss the incentives to make commitments in advance based on the possibility that we may play a prisoner’s dilemma in the future (even though I think those incentives are important). I will analyze the question mostly for an agent who uses EDT or UDT.

You could just as well replace the ratio 2%/1% with any X>1, and ask “how large does X need to be before I’d cooperate?” I think it is pretty robust that you should cooperate once X is sufficiently large (unless you specifically want the other player to die), and so this is ultimately a quantitative question. For this post I’ll just keep X=2 and leave it up to the reader to generalize.

Kindness

The most boring reason to cooperate is that I don’t want the other player to die.

That might be because I value their life and experiences for its own sake, in the same way that I value mine. Or it could be that I mostly care about other consequences of survival—the effect a death will have on the people around us, the things we will accomplish if we survive, and so on—and I think their survival will have nearly as many good consequences as mine.

Let K be how much I value the other player’s survival. If K > 0.5 then I should cooperate. (Since then 2% * K > 1%.)

Correlation

When my deliberation begins, I’m not sure what I’ll end up deciding. I’m also not sure what the other player will decide. Those two unknown quantities are slightly correlated because our decisions have some common structure: what happens when someone is whisked away from their everyday life and spends a day deciding how nice to be to someone from an alien civilization?

A causal decision theorist doesn’t care about this correlation, but an evidential decision theorist considers it a reason to cooperate. We can measure the strength of the correlation as C = P(they cooperate | I cooperate) – P(they cooperate | I defect). If C > 0.5 then I should cooperate.

Smaller correlations give us a partial reason to cooperate, and they combine linearly with kindness: if K + C > 0.5 I should cooperate.
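As a sanity check on the K + C > 0.5 rule, here is a minimal expected-utility sketch. It rests on assumptions that are not in the post: death risks simply add (reasonable when the stakes are a few percent), my utility is -P(I die) - K * P(they die), and the benefit is written as X times the 1% cost, so X=2 matches the setup. The exact difference also has a small K*C cross term: when my cooperation makes the other player more likely to cooperate, they bear more of their own 1% cost. The linear rule drops that term. The function names are illustrative.

```python
def cooperation_advantage(K: float, C: float, X: float = 2.0, cost: float = 0.01) -> float:
    """Expected-utility gain from cooperating instead of defecting (positive => cooperate).

    Assumptions for this sketch: death risks add, my utility is
    -P(I die) - K * P(they die), and
    C = P(they cooperate | I cooperate) - P(they cooperate | I defect).
    """
    benefit = X * cost  # the risk my defection would impose on the other player
    # My own risk: I pay `cost` by cooperating, but the other player cooperates
    # C more often, which spares me `benefit` that much more often.
    delta_my_risk = cost - benefit * C
    # Their risk: my cooperation spares them `benefit`, but they also
    # cooperate C more often and pay their own `cost` each time.
    delta_their_risk = cost * C - benefit
    return -delta_my_risk - K * delta_their_risk


def linear_rule(K: float, C: float, X: float = 2.0) -> bool:
    """The first-order rule: cooperate if K + C > 1/X, i.e. K + C > 0.5 when X = 2."""
    return K + C > 1.0 / X


# Expanding the terms: advantage = cost * (X * (K + C) - K * C - 1).
print(cooperation_advantage(0.6, 0.0) > 0)  # True: kindness alone, K > 0.5
print(cooperation_advantage(0.0, 0.6) > 0)  # True: correlation alone, C > 0.5
print(linear_rule(0.26, 0.26), cooperation_advantage(0.26, 0.26) > 0)  # True False
```

When K or C is small, the cross term is negligible and the two functions agree. When both are moderate, they can disagree right at the boundary, as the last line shows.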

It’s really hard to estimate C. It’s probably not zero but it’s plausible that it’s extremely small. Much of the correlation is likely to be screened off by other stuff I’ve seen (e.g. observing other people playing the prisoner’s dilemma).

The effective correlation is likely to be (much) larger for someone using UDT. I do think altruistic agents should use UDT (see “EDT with updating double counts” and “Decision theory and dynamic consistency”), and therefore I think this can be a significant consideration.

Reciprocity

If the other player is very different from me, my decision may not be very correlated with their decision. But if they understand me well enough, my decision might be very correlated with their prediction about my decision.

If the other player is using a policy like “I cooperate only with people who I predict will cooperate with me” then I would care a lot about their prediction and so this could give an evidential decision-theorist reason to cooperate.

You can think of “I cooperate only with people who I predict will cooperate with me” as a kind of logical tit for tat. I know what I predict the other player would do. That prediction involves predicting what they’d predict I’d do, which in turn involves predicting what they’d predict that I’d predict they’d do, and so on. If the real world is just another representative entry in this sequence, then cooperating based on whether I predict my opponent will cooperate is a lot like cooperating based on whether they cooperated in the last step, i.e. if I predict they’d cooperate.

I find it more convincing to think of this as a kind of logical trade. In a usual physical trade, I have a policy like “I will give you a toaster if you give me $1.” I have that policy because I believe you will give me $1 if I have that policy and not otherwise. The logical version is just the same. It’s easiest to make that arrangement when you and I can actually talk. But we can also do the same thing with lower fidelity based only on predictions. To an EDT agent the size of the incentive depends on the quality of predictions. To a CDT agent, the incentive only kicks in when the other player’s reasoning about me takes the form of a simulation such that “I” might be in the simulation. For the rest of the post I’ll consider only EDT or UDT agents.

Let R be my overall estimate for the difference between P(other player cooperates | I cooperate) and P(other player cooperates | I defect) that comes through the channel of my decision affecting their prediction, which in turn affects their action. We’ll change the definition of C to be the “direct” correlation that comes from the similarity of our decisions rather than through this prediction channel.

This factor combines additively with kindness and correlation, so I should cooperate if R+C+K > 0.5.

(This is a simplification, because I should also myself use a reciprocity-like strategy where my action is based on my prediction about whether they are cooperating based on a reciprocity-like strategy… but I’m going to keep setting aside those complexities. Program equilibrium via provability logic gives a taste for some of the issues.)
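The toy model below makes the split between C and R concrete. Its assumptions are hypothetical and not from the post. The opponent's cooperation probability is linear in my action (a "similarity" channel with weight s) and in their binary prediction of me (a reciprocity channel with weight r). Their prediction has a hit rate and a false-alarm rate. C is measured by switching reciprocation off, and R is whatever gap remains.

```python
def their_coop_prob(my_action: int, prediction: int, base: float, s: float, r: float) -> float:
    """Hypothetical opponent: cooperates with probability base + s*my_action + r*prediction.

    `s` is a direct similarity channel (their choice tracks mine without going
    through any model of me); `r` is how strongly they reciprocate their
    prediction of me. Actions and predictions are 1 = cooperate, 0 = defect.
    """
    p = base + s * my_action + r * prediction
    if not 0.0 <= p <= 1.0:
        raise ValueError("parameters must keep the probability in [0, 1]")
    return p


def coop_gap(base: float, s: float, r: float, hit: float, false_alarm: float) -> float:
    """P(they cooperate | I cooperate) - P(they cooperate | I defect).

    hit = P(they predict I cooperate | I cooperate);
    false_alarm = P(they predict I cooperate | I defect).
    """
    def p_coop(my_action: int) -> float:
        p_predict_coop = hit if my_action == 1 else false_alarm
        return (p_predict_coop * their_coop_prob(my_action, 1, base, s, r)
                + (1 - p_predict_coop) * their_coop_prob(my_action, 0, base, s, r))
    return p_coop(1) - p_coop(0)


def split_channels(base, s, r, hit, false_alarm):
    """Return (C, R): C is the gap with reciprocation switched off, R is the rest."""
    C = coop_gap(base, s, 0.0, hit, false_alarm)
    R = coop_gap(base, s, r, hit, false_alarm) - C
    return C, R


C, R = split_channels(base=0.2, s=0.05, r=0.5, hit=0.6, false_alarm=0.4)
print(round(C, 6), round(R, 6))  # 0.05 0.1
```

In this model R works out to r * (hit - false_alarm). Reciprocity only helps to the extent that the opponent both reciprocates and can tell cooperators from defectors.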

Correlation + Kindness

Correlation and kindness also have an important nonlinear interaction, which is often discussed under the heading of “evidential cooperation in large worlds” or ECL.

Consider all of the pairs of civilizations and people who might have been randomly selected to play this prisoner’s dilemma. For each of them, we can define K and C as in the last sections. We can also think of K=C=1 for ourselves.

Let’s imagine that this crazy wormhole-prisoner’s-dilemma situation arises periodically throughout the universe, and that the universe is very large (e.g. because of quantum branching). Even if K and C are extremely small on average, the sum of K and C across all other players can be astronomically large.

To an evidential decision theorist, that means:

Because the sum of K is large, most of what I care about in the universe is not this particular game of the prisoner’s dilemma, but the sum over all the other games.
Because the sum of C is large, most of the difference between the world conditioned on “I cooperate” and the world conditioned on “I defect” is not the fact that I in particular cooperate but the correlation with all the other games.

So when I cooperate, the costs and benefits are effectively distributed across the universe, rather than being concentrated on me and the other player. The calculus is no longer “Is R+C+K > 0.5?”. Instead, conditioned on me cooperating:

The extra fraction of players who cooperate is E[C].
The extra fraction of players who survive from cooperation is 2% * E[C].
My utility from that is 2% * E[C] * E[K], because the benefit is distributed across all players.
The extra fraction of players who die because they cooperated is 1% * E[C].
My disutility from that is 1% * E[C * K], because the cost is distributed based on C.

Thus the question is whether the correlation between C and K is small enough: I should cooperate only if 2% * E[C] * E[K] > 1% * E[C * K]. In a small universe where K and C are usually small, the correlation between K and C can be huge. I myself am an example of a player with K=C=1, and all of E[C], E[K] and E[C*K] may be equal to “what fraction of all players do I represent?”. But once the sums of K and C get very large, I personally make almost no contribution to E[C] and E[K], and so the correlation becomes dominated by features of other civilizations.
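The following sketch plugs a made-up population into the post's accounting: a benefit of 2% * E[C] * E[K] against a cost of 1% * E[C * K]. The population is hypothetical. It contains four player types whose C and K are independent, plus me with K = C = 1. It reproduces the effect described in the paragraph above. With few other players, my own (1, 1) entry dominates E[C * K] and defection wins. With many, E[C * K] approaches E[C] * E[K] and cooperation wins.

```python
def ecl_terms(groups, X=2.0, cost=0.01):
    """Large-world accounting, conditioned on my cooperating.

    groups: list of (C, K, count) for other players, plus myself as (1.0, 1.0, 1).
    Returns (benefit I care about, cost I care about), as fractions of all players:
    benefit = X * cost * E[C] * E[K], cost_term = cost * E[C * K].
    """
    total = sum(n for _, _, n in groups)
    mean_c = sum(c * n for c, _, n in groups) / total
    mean_k = sum(k * n for _, k, n in groups) / total
    mean_ck = sum(c * k * n for c, k, n in groups) / total
    return X * cost * mean_c * mean_k, cost * mean_ck


def toy_world(copies_per_type):
    """Hypothetical world: me plus equal numbers of four player types whose
    C and K are independent (each taking values in {0.01, 0.03})."""
    me = (1.0, 1.0, 1)
    others = [(c, k, copies_per_type) for c in (0.01, 0.03) for k in (0.01, 0.03)]
    return [me] + others


for copies in (1, 10**6):
    benefit, paid = ecl_terms(toy_world(copies))
    print(copies, "cooperate" if benefit > paid else "defect")
# 1 defect
# 1000000 cooperate
```

Making the other types' C and K positively correlated (so that high-C players are also high-K) raises E[C * K] relative to E[C] * E[K] and shrinks the large-world margin.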

Overall I think that this may give impartial players (who mostly care about people other than themselves) a strong reason to cooperate with each other even if they have very different values and don’t intrinsically care about one another. For selfish players, i.e. those who mostly care about the future experiences of observers very similar to themselves, it’s a much weaker reason to cooperate. Most humans would be a mix of the two and so it depends on the parameters.

This calculation is pretty subtle and I’ve glossed over a lot of details. One particularly important issue is that C is not just a property of a civilization. In fact it makes most sense for me to cooperate with players who have the kind of values that someone who is correlated with me might have had. And in that case, I could end up making very different decisions than a copy of me who was playing against a rock. Sorting through all those complexities is complicated and does change the recommended policy, but the net effect seems to be to create a generally stronger reason to cooperate with other impartial players.

Reciprocity + correlation + kindness

Intuitively there are two factors limiting the strength of reciprocity motivations for cooperation:

Are the other player’s predictions correlated with my behavior?
Does the other player implement a reciprocity-like policy? (Or would they, if they thought I might?)

Condition 1 seems quite hard to achieve, especially in civilizations like ours that don’t have sophisticated technology for making predictions. So I expect reciprocity on its own to not be much of a motive in the true prisoner’s dilemma unless the players have an extremely good ability to reason about one another—it doesn’t require going all the way to literal simulations, but does require much more ability than we have today. (Note that the situation changes if we are able to make commitments: in that case it may be natural to commit to a trade-like policy, allowing us to cooperate with other people who make similar commitments by making the prediction problem easier.)

But I think that combining reciprocity with correlation+kindness can create a significant reason to cooperate even when players are not able to make very accurate predictions about one another. In this section we’ll try to analyze that effect. We’ll assume that we are playing against someone who is using a reciprocity-like strategy such that they are more likely to cooperate if they think that I will cooperate. If the other player is only possibly using a reciprocity-like strategy, or only has a weak tendency to cooperate if they think I will cooperate, then the conclusions will get scaled down.

We can model the other player’s limited prediction ability as uncertainty about exactly who I am: they’ve seen some facts about me, but there are a lot of possibilities consistent with what they’ve seen. Some of those possibilities cooperate and some of them defect. (This is only an approximation, which I’m using here to give a rough sketch. It does apply to logical uncertainty as well as empirical uncertainty, though in some situations we will care about correlation+kindness across logically impossible worlds, which is a more complicated discussion.)

Suppose that from the other player’s perspective I belong to a set of possible players S, and that they have well-calibrated beliefs about S. Then conditioned on one additional player from S cooperating, the other player has a 1/|S| higher probability that each player in S cooperates—the effect of their limited prediction ability is to “smear out” the update across all of the indistinguishable-to-them players.

Conditioned on my cooperation, the fraction of cooperating players in S increases by E[C], which we now define as the average correlation between my decision and the decision of someone else in S who is facing a player using a reciprocity-like strategy (where the correlation is computed from my perspective). That then increases the frequency with which people cooperate with players in S by E[C].

The benefits of this cooperation are distributed across all the players in S, and I care based on E[K], resulting in a total benefit I care about of 2% * E[C] * E[K]. The costs are distributed across the players in S proportional to C, and so the cost I care about is 1% * E[C * K].

This leads us to a calculus very similar to the correlation+kindness analysis in the last section: it’s worth cooperating only if the correlation between C and K is weak enough, namely if E[C * K] < 2 * r * E[C] * E[K], where r is the extent to which the other player reciprocates. However, the correlation is now being computed over the set S of people who look like me as far as the other player is concerned.
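A small sketch can check the smearing argument and the cost-benefit accounting from the last few paragraphs. Everything in it is a hypothetical model. The opponent is calibrated about S but cannot tell its members apart, r is an assumed linear reciprocation strength, and the population of types is made up. Under these assumptions, cooperating pays exactly when 2 * r * E[C] * E[K] > E[C * K].

```python
def reciprocity_over_S(members, r, X=2.0, cost=0.01):
    """Benefit and cost I care about, summed over S, conditioned on my cooperating.

    members: (C, K) for every player the opponent cannot tell apart from me,
             including myself as (1.0, 1.0).
    r:       hypothetical reciprocation strength: how much the opponent's chance of
             cooperating rises per unit rise in their predicted cooperation rate.
    """
    # Conditioned on my cooperating, sum(C) extra members of S cooperate.
    extra_cooperators = sum(c for c, _ in members)
    # A calibrated opponent who cannot tell members of S apart spreads this
    # update evenly: each member's predicted cooperation rises by the same amount.
    per_member_update = extra_cooperators / len(members)
    # Each member receives r * update extra cooperation, each unit of which spares
    # them X * cost of risk; I weight their survival by K.
    benefit = sum(k * r * per_member_update * X * cost for _, k in members)
    # Member j cooperates with extra probability C_j and pays `cost`, weighted by K_j.
    paid = sum(c * k * cost for c, k in members)
    return benefit, paid


# Hypothetical population: four types with independent C and K, plus me.
types = [(c, k) for c in (0.01, 0.03) for k in (0.01, 0.03)]
me = [(1.0, 1.0)]
cases = {
    "opponent knows me exactly": me,
    "small S": me + types * 10,
    "large S": me + types * 10**5,
}
for name, members in cases.items():
    verdicts = []
    for r in (0.45, 0.6, 1.0):
        benefit, paid = reciprocity_over_S(members, r)
        verdicts.append("coop" if benefit > paid else "defect")
    print(name, verdicts)
# opponent knows me exactly ['defect', 'coop', 'coop']
# small S ['defect', 'defect', 'defect']
# large S ['defect', 'coop', 'coop']
```

When C and K are independent across the other members, a large S gives roughly the same threshold (r a bit above 0.5) as an opponent who knows me exactly. This is the spillover effect: the benefits and costs are both shared across people who look like me. A small S, where my own (1, 1) entry dominates E[C * K], does not get this benefit.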

Conditioning on what the other player knows about me will tend to reduce the correlation between C and K, potentially greatly. One way to think about this is that the C-K correlation is driven by some common causes—perhaps my decision and values are both more correlated with overthinking things. To compute the C-K correlation conditioned on S, we remove any of those common causes that are observed by the other player. This won’t significantly reduce correlations coming from selfishness, but is likely to be a very big deal for impartial values.

The bottom line is that if I am facing a player who uses a reciprocity-like strategy, such that they will cooperate more if they expect me to cooperate, and if I have impartial values and the other player knows this fact, then reciprocity+correlation+kindness can give a very big reason to cooperate: if the other player is a weak predictor then the benefits of cooperation will spill over to lots of other people who “look like me” from their perspective, but the costs are also shared across the other people who look like me, and I probably care about as much about the people who get the benefits as the people who pay the costs.

I think the best reason for the other person to run a reciprocity-like strategy is because they expect that you are more likely to cooperate if they do. In practice both players are using a policy like X = “cooperate if the other player uses a policy like X.” So I expect this kind of cooperation to get off the ground when both players have sufficiently impartial values and are inclined to follow this argument (and learn enough about each other to verify those facts).

I think this can greatly strengthen the correlation+kindness effect even while you have weak predictors and may often lead to a net incentive to cooperate for impartial updateless decision theorists. For agents who update / have selfish values, such cooperation probably requires the ability to carry out more detailed simulations.

I think the most dicey step of this argument was modeling the other player’s uncertainty as ignorance about my identity; we’ve modeled them as thinking that I could be anyone in S but having correct beliefs about S on average. Concretely, we’re imagining something like: all the other player can tell is that they are playing against someone who tends to overthink things. I believe that my decision is very slightly correlated with the average behavior of players who overthink things, and I think that the other player will end up with well-calibrated beliefs about the class of players who overthink things. So my decision has the same strength of correlation with their beliefs as it does with the truth, which gives us reason to cooperate no matter how small in absolute value both correlations are. But if that 1:1 assumption doesn’t hold (because the other player can’t reason accurately about people who overthink things), then we broaden the set S, potentially including logically impossible versions of “players who overthink things.” I think this modeling choice is much more plausible for a UDT agent than for an EDT agent, because a UDT agent hasn’t yet updated on any of the factors driving the other player’s belief. But even for a UDT agent this might be wrong and requires more thought.

Conclusion

There seems to be pretty rich decision-theoretic structure in the true prisoner’s dilemma, and some reasons to cooperate will involve a multiplicative interaction between different mechanisms that are often lumped together as “weird acausal stuff.” In particular, I think understanding the strength of reciprocity-like arguments for cooperation requires thinking about the dynamics of evidential cooperation in large worlds (at least until you reach the limit involving literal simulations).

This post informally describes my current best guesses and the very rough analysis behind them; those have a minor effect on my attitudes to some odd philosophical questions, but I wouldn’t personally trust them to be correct or robust. Some of the underlying correlations and frequencies across the universe seem almost impossible to assess, but I do think it’s feasible to analyze the basic situation much more carefully, and that could still make a real difference in our best guesses. There’s a good chance that the basic picture in this post may be completely wrong, and conversely if a more careful analysis had similar conclusions then I’d feel much more comfortable with them.

The effective correlation is likely to be (much) larger for someone using UDT.

Could you say more about why you think this? (Or, have you written about this somewhere else?) I think I agree if by "UDT" you mean something like "EDT + updatelessness"^[1]; but if you are essentially equating UDT with FDT, I would expect the "correlation"/"logi-causal effect" to be pretty minor in practice due to the apparent brittleness of "logical causation".

Correlation and kindness also have an important nonlinear interaction, which is often discussed under the heading of “evidential cooperation in large worlds” or ECL.

This is not how I would characterize ECL. Rather, ECL is about correlation + caring about what happens in your opponent's universe, i.e. not specifically about the welfare/life of your opponent.

[1] Because updatelessness can arguably increase the game-theoretic symmetry of many kinds of interactions, which is exactly what is needed to get EDT to cooperate.

Yeah, by UDT I mean an updateless version of EDT.

Rather, ECL is about correlation + caring about what happens in your opponent's universe, i.e. not specifically about the welfare/life of your opponent.

I'm not sure I understand the distinction (or what you mean by "your opponent's universe"), and I might have miscommunicated. What I mean is:

Correlation in a large world means that conditioned on you cooperating, a lot of people cooperate.
What matters is how much you care about the welfare of the people who cooperate relative to the welfare of the beneficiaries.
So you can end up cooperating owing to a combination of correlation+kindness even if you are neither correlated with nor care about your opponent.

Thanks for clarifying. I still don't think this is exactly what people usually mean by ECL, but perhaps it's not super important what words we use. (I think the issue is that your model of the acausal interaction—i.e. a PD with survival on the line—is different to the toy model of ECL I have in my head where cooperation consists in benefitting the values of the other player [without regard for their life per se]. As I understand it, this is essentially the principal model used in the original ECL paper as well.)

The toy model seems like an example, though maybe I misunderstand. I'm just using survival as an example of a thing that someone could care about, and indeed you only have an ECL reason to cooperate if you care about the survival of other agents.

I've been using ECL, and understanding others (including the original paper) using ECL to mean:

In a large world there are lots of people whose decisions are correlated with mine.
Conditioned on me doing something that is bad for me and good for someone else, more of those correlated people will do the same.
I will be a beneficiary of many of those decisions---perhaps nearly as often as I pay a cost.
This indirect update dwarfs the direct consequences of my decision.

Someone should correct me if this is wrong.

Wait, why is ECL lumped under Correlation + Kindness instead of just Correlation? I think this thread is supposed to answer that question but I don't get it.

It's not true that you only have an ECL reason to cooperate if you care about the survival of other agents. Paperclippers, for example, have ECL reason to cooperate.

I think you have to care about what happens to other agents. That might be "other paperclippers."

If you only care about what happens to you personally, then I think the size of the universe isn't relevant to your decision.

Ahhh, I see. I think that's a bit misleading, I'd say "You have to care about what happens far away," e.g. you have to want there to be paperclips far away also. (The current phrasing makes it seem like a paperclipper wouldn't want to do ECL)

Also, technically, you don't actually have to care about what happens far away either, if anthropic capture is involved.

Ah, okay, got it. Sorry about the confusion. That description seems right to me, fwiw.

I don't think I understand the principled difference between correlation and reciprocity; the latter seems like a subset of the former. Let me try to say some things and see where you disagree. This is super messy and probably doesn't make sense, sorry.

There are many factors which could increase the correlation between two agents' decisions. For agents that are running reciprocity-like policies, the predictions they make about other agents are a particularly big factor.
In picking out reciprocity as a separate phenomenon, you seem to be saying "we can factorize the correlation into two parts: the correlation that would arise if we weren't predicting each other's decisions, and then the additional correlation that arises from us predicting each other's decisions".*
But I don't think that "predicting each other's decisions" constitutes a clear-cut category. For example, maybe I partly have kindness genes because my ancestors kept running into people that are kind for roughly the same reasons as you might be kind (e.g. facts about how evolution works and how neural networks work). So in some sense, being kind counts as a prediction about your decision. Or more directly: I have a bunch of cached heuristics about how to interact with other agents, which I'll draw on when trying to make my decision, and which implicitly involve predicting that you'll be like other agents.
Maybe instead the factorization you're using is "fix a time T, everything that I knew before T is background correlation, and all the thinking I do after time T can count as part of reciprocity". This seems like a reasonable factorization but again it doesn't seem like it's very clear-cut, and I'm not actually sure that's what you're doing.

Maybe this is still the most useful factorization regardless. But here's a tentative guess at what a more principled way of thinking about this might look like:

Identify the different components of my decision-making algorithm, like:
1. Evolutionarily-ingrained drives and instinctive decision theories
2. Heuristics shaped by my experiences
3. More cerebral conclusions based on doing high-level reasoning (e.g. "reciprocity good")
4. Predictions about what you'll do given your predictions of me
5. Predictions about what you'll do given your predictions of what I'll do given my predictions of what you'll do
6. Etc...
7. My current best-guess decision
Search for modifications to each of these components such that, in the nearest worlds where the modification holds, the overall outcome would be better by my current standards (both via influencing my traits and decisions and via influencing your traits and decisions). E.g. if I had evolved to be kinder, then maybe you would have evolved to be kinder too; if I had decided on a different decision theory maybe you would have too; etc.
Use my modified decision-making algorithm to make a decision.

Ofc our lack of a good way of reasoning about "nearest worlds" means that only pretty small changes will in practice be justifiable.

* In fact you could think of this as infinite regress: there's some level-0 correlation before we predict each other, then there's some level-1 correlation given our predictions of each other's predictions of the level-0 correlation, and so on. But that doesn't seem important here.

I think it was confusing for me to use "correlation" to refer to a particular source of correlation. I probably should have called it something like "similarity." But I think the distinction is very real and very important, and crisp enough to be a natural category.

More precisely, I think that:

Alice and Bob are correlated because Alice is similar to Bob (produced by similar process, running similar algorithm, downstream of the same basic truths about the universe...)

is qualitatively and crucially different from:

Alice and Bob are correlated because Alice is more likely to cooperate if Bob cooperates (so Alice is correlated with her model of Bob, which she constructed to be similar to Bob)

I don't think either one is a subset of the other. I don't think these are an exhaustive taxonomy of reasons that two people can be correlated, but I think they are the two most important ones.

For example, maybe I partly have kindness genes because my ancestors kept running into people that are kind for roughly the same reasons as you might be kind (e.g. facts about how evolution works and how neural networks work). So in some sense, being kind counts as a prediction about your decision. Or more directly: I have a bunch of cached heuristics about how to interact with other agents, that I'll draw on when trying to make my decision, which implicitly involves predicting that you'll be like other agents.

On its own I don't see why this would lead me to be kind (if I generally deal with kind people, why does that mean I should be kind?) I think you have to fill in the remaining details somehow, e.g.: maybe I dealt with people who are kind if and only if X is true, and so I have learned to be kind when X is true.

In my taxonomy this is a central example of reciprocity---the correlation flows through a pressure for me to make predictions about when you will be kind, and then be kind when I think that you will be kind, rather than from us using similar procedures to make decisions. I don't think I would call any version of this story "correlation" (the concept I should have called "similarity").

Condition 1 seems quite hard to achieve, especially in civilizations like ours that don’t have sophisticated technology for making predictions. So I expect reciprocity on its own to not be much of a motive in the true prisoner’s dilemma unless the players have an extremely good ability to reason about one another—it doesn’t require going all the way to literal simulations, but does require much more ability than we have today.

I think this is too weak. Even without precommitments, I think I am pretty good at predicting which of my friends will defect or cooperate in prisoner's dilemma-like scenarios, and I think you are underestimating the ability of humans to predict one another.

At least, I make a lot of decisions per week that I think are structurally pretty similar to a prisoner's dilemma, and where I expect the correlation part to check out because I am pretty good at predicting what other people will do. In general my experience is that people are pretty honest about how they reason in situations like this, so you can just ask them. Predicting that they will do what the algorithm they described to you at some point in the past would do then gets you pretty high accuracy.

I'm definitely more skeptical of this mechanism than you are (at least on its own, i.e. absent either correlation+kindness or the usual causal+moral reasons to cooperate). I don't think it's obvious.

The relevant parameter is something like R = E[my opponent's probability that I'll cooperate | I cooperate] - E[my opponent's probability that I'll cooperate | I defect]. My feeling is that this parameter can't be that close to 1, because a bunch of other stuff affects my opponent's probability (like my past behavior, their experience with agents similar to me, etc.) beyond my actual decision, which I'm currently uncertain about.

For an EDT agent I think it's pretty clear that R<<1. For a UDT agent I think it's less clear, since everything is so confusing, and R > 0.5 is not crazy.
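Here is a toy Bayesian model that roughly illustrates why R can stay well below 1. The model is an assumption, not part of the comment. The opponent's credence starts from a prior built from the "other stuff" (past behavior, similar agents). It then updates on a single binary signal that matches my actual decision with some accuracy. R is computed exactly as defined above: the expected credence if I cooperate, minus the expected credence if I defect.

```python
def posterior_coop(prior: float, accuracy: float, signal_says_coop: bool) -> float:
    """Opponent's credence that I cooperate after one binary signal (Bayes' rule).

    prior:    credence from everything else (my past behavior, similar agents, ...).
    accuracy: hypothetical P(signal matches my actual decision).
    """
    like_coop = accuracy if signal_says_coop else 1 - accuracy
    like_defect = 1 - accuracy if signal_says_coop else accuracy
    return prior * like_coop / (prior * like_coop + (1 - prior) * like_defect)


def reciprocity_R(prior: float, accuracy: float) -> float:
    """E[their credence | I cooperate] - E[their credence | I defect]."""
    if_signal_coop = posterior_coop(prior, accuracy, True)
    if_signal_defect = posterior_coop(prior, accuracy, False)
    given_i_coop = accuracy * if_signal_coop + (1 - accuracy) * if_signal_defect
    given_i_defect = (1 - accuracy) * if_signal_coop + accuracy * if_signal_defect
    return given_i_coop - given_i_defect


for prior, accuracy in ((0.5, 0.7), (0.9, 0.7), (0.5, 0.99)):
    print(prior, accuracy, round(reciprocity_R(prior, accuracy), 3))
```

At the same accuracy (0.7), a strong prior of 0.9 cuts R from about 0.16 to about 0.06. R approaches 1 only as the signal becomes nearly perfect (about 0.96 at accuracy 0.99).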

because I should also myself use a reciprocity-like strategy where my based on my prediction

Shouldn't this be "... where I based my prediction ..."?

Thanks. It should be "I should also myself use a reciprocity-like strategy where my cooperation is based on my prediction that...". It's a very confusing sentence even after the correction, I probably won't fix it further though.

The effective correlation is likely to be (much) larger for someone using UDT.

Correlation and kindness also have an important nonlinear interaction, which is often discussed under the heading of “evidential cooperation in large worlds” or ECL.

This is not how I would characterize ECL. Rather, ECL is about correlation + caring about what happens in your opponent's universe, i.e. not specifically about the welfare/life of your opponent.

^{^}
Because updatelessness can arguably increase the game-theoretic symmetry of many kinds of interactions, which is exactly what is needed to get EDT to cooperate.

Yeah, by UDT I mean an updateless version of EDT.

Rather, ECL is about correlation + caring about what happens in your opponent's universe, i.e. not specifically about the welfare/life of your opponent.

I'm not sure I understand the distinction (or what you mean "your opponent's universe"), and I might have miscommunicated. What I mean is:

Correlation in a large world means that conditioned on you cooperating, a lot of people cooperate.
What matters is how much you care about the welfare of the people who cooperate relative to the welfare of the beneficiaries.
So you can end up cooperating owing to a combination of correlation+kindness even if you are neither correlated with nor care about your opponent.

I've been using ECL, and understanding others (including the original paper) using ECL to mean:

In a large world there are lots of people whose decisions are correlated with mine.
Conditioned on me doing something that is bad for me and good for someone else, more of those correlated people will do the same.
I will be a beneficiary of many of those decisions---perhaps nearly as often as I pay a cost.
This indirect update dwarfs the direct consequences of my decision.

Someone should correct me if this is wrong.

I think you have to care about what happens to other agents. That might be "other paperclippers."

If you only care about what happens to you personally, then I think the size of the universe isn't relevant to your decision.

Ah, okay, got it. Sorry about the confusion. That description seems right to me, fwiw.

There are many factors which could increase the correlation between two agents' decisions. For agents that are running reciprocity-like policies, the predictions they make about other agents are a particularly big factor.
In picking out reciprocity as a separate phenomenon, you seem to be saying "we can factorize the correlation into two parts: the correlation that would arise if we weren't predicting each other's decisions, and then the additional correlation that arises from us predicting each other's decisions".*
But I don't think that "predicting each other's decisions" constitutes a clear-cut category. For example, maybe I partly have kindness genes because my ancestors kept running into people that are kind for roughly the same reasons as you might be kind (e.g. facts about how evolution works and how neural networks work). So in some sense, being kind counts as a prediction about your decision. Or more directly: I have a bunch of cached heuristics about how to interact with other agents, that I'll draw on when trying to make my decision, which implicitly involves predicting that you'll be like other agents.
Maybe instead the factorization you're using is "fix a time T, everything that I knew before T is background correlation, and all the thinking I do after time T can count as part of reciprocity". This seems like a reasonable factorization but again it doesn't seem like it's very clear-cut, and I'm not actually sure that's what you're doing.

Maybe this is still the most useful factorization regardless. But here's a tentative guess at what a more principled way of thinking about this might look like:

Identify the different components of my decision-making algorithm, like:
1. Evolutionarily-ingrained drives and instinctive decision theories
2. Heuristics shaped by my experiences
3. More cerebral conclusions based on doing high-level reasoning (e.g. "reciprocity good")
4. Predictions about what you'll do given your predictions of me
5. Predictions about what you'll do given your predictions of what I'll do given my predictions of what you'll do
6. Etc...
7. My current best-guess decision
Search for modifications to each of these components which have the property that, in the nearest worlds where this modification were true, the overall outcome would be better by my current standards (both via influencing my traits and decisions and via influencing your traits and decisions). E.g. if I had evolved to be kinder, then maybe you would have evolved to be kinder too; if I had decided on a different decision theory maybe you would have too; etc.
Use my modified decision-making algorithm to make a decision.
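A minimal sketch of this search, under heavy simplifying assumptions that are mine rather than the commenter's. Each component is a single number in [0, 1] (its push toward cooperating), and my cooperation probability is their average. The "nearest world" where I change component i by d is assumed to be one where your cooperation probability shifts by `corr[i] * d / n`. The names `corr`, `base_you`, `k_opp`, and `step` are hypothetical, and "current standards" means minus my expected deaths minus `k_opp` times yours, with additive risks.

```python
def clip01(v):
    return max(0.0, min(1.0, v))


def probs(components, baseline, corr, base_you):
    """Cooperation probabilities (mine, yours) in the nearest world with these components."""
    n = len(components)
    p_me = sum(components) / n
    # Hypothetical: changing my component i by d shifts your cooperation by corr[i] * d / n.
    shift = sum(c * (new - old) for c, new, old in zip(corr, components, baseline)) / n
    return p_me, clip01(base_you + shift)


def utility(p_me, p_you, k_opp, x=2.0):
    """My current standards: minus my expected deaths minus k_opp times yours."""
    my_death = 1.0 * p_me + x * (1.0 - p_you)
    your_death = 1.0 * p_you + x * (1.0 - p_me)
    return -my_death - k_opp * your_death


def search(baseline, corr, base_you, k_opp=0.0, step=0.1, rounds=20):
    """Greedy local search: keep any single-component step that improves utility."""
    comps = list(baseline)

    def score(cs):
        return utility(*probs(cs, baseline, corr, base_you), k_opp)

    for _ in range(rounds):
        for i in range(len(comps)):
            for d in (step, -step):
                cand = list(comps)
                cand[i] = clip01(cand[i] + d)
                if score(cand) > score(comps) + 1e-12:
                    comps = cand
                    break
    return comps
```

The final decision is then the average of the returned components. In this linear toy a selfish agent (`k_opp = 0`) pushes component i toward cooperation exactly when `corr[i] > 0.5`, so `search([0.5, 0.5], [0.9, 0.2], 0.5)` moves the first component to about 1 and the second to about 0. The greedy search is only a stand-in for whatever principled "nearest worlds" reasoning would actually be used.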

Of course, our lack of a good way of reasoning about "nearest worlds" means that only pretty small changes will in practice be justifiable.

More precisely, I think that:

Alice and Bob are correlated because Alice is similar to Bob (produced by similar process, running similar algorithm, downstream of the same basic truths about the universe...)

is qualitatively and crucially different from:

Alice and Bob are correlated because Alice is more likely to cooperate if Bob cooperates (so Alice is correlated with her model of Bob, which she constructed to be similar to Bob)

I don't think either one is a subset of the other. I don't think these are an exhaustive taxonomy of reasons that two people can be correlated, but I think they are the two most important ones.
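To make the two mechanisms concrete, here is a pair of toy models. They are mine, not from the thread, and every name is hypothetical. In the first, Alice and Bob are correlated only through a shared cause that each follows with probability `fidelity`. In the second, Bob decides independently and Alice simply copies her model of Bob, which is right with probability `accuracy`. Both are measured by the same quantity: P(Alice cooperates | Bob cooperates) minus P(Alice cooperates | Bob defects).

```python
def conditional_shift(joint):
    """P(A=C | B=C) - P(A=C | B=D) for a joint distribution {(a, b): prob}, where 1 = cooperate."""
    p_b_c = joint[(1, 1)] + joint[(0, 1)]
    p_b_d = joint[(1, 0)] + joint[(0, 0)]
    return joint[(1, 1)] / p_b_c - joint[(1, 0)] / p_b_d


def similarity_joint(fidelity, p_shared=0.5):
    """Correlation via similarity: each agent independently follows a shared cause s."""
    joint = {(a, b): 0.0 for a in (0, 1) for b in (0, 1)}
    for s, p_s in ((1, p_shared), (0, 1.0 - p_shared)):
        for a in (0, 1):
            for b in (0, 1):
                pa = fidelity if a == s else 1.0 - fidelity
                pb = fidelity if b == s else 1.0 - fidelity
                joint[(a, b)] += p_s * pa * pb
    return joint


def modeling_joint(accuracy, p_bob_c):
    """Correlation via modeling: Alice cooperates iff her model predicts Bob cooperates."""
    joint = {}
    for b, p_b in ((1, p_bob_c), (0, 1.0 - p_bob_c)):
        for a in (0, 1):
            # Alice's model matches Bob's actual choice with probability `accuracy`.
            joint[(a, b)] = p_b * (accuracy if a == b else 1.0 - accuracy)
    return joint
```

With the default `p_shared = 0.5`, `similarity_joint(0.9)` gives a shift of 0.64, i.e. (2f - 1)^2. `modeling_joint(0.9, 0.3)` gives 0.8, i.e. 2a - 1, regardless of how often Bob cooperates. The first shift depends only on how faithfully each agent tracks the shared cause; the second depends only on how accurate Alice's model is.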


Condition 1 seems quite hard to achieve, especially in civilizations like ours that don’t have sophisticated technology for making predictions. So I expect reciprocity on its own to not be much of a motive in the true prisoner’s dilemma unless the players have an extremely good ability to reason about one another—it doesn’t require going all the way to literal simulations, but does require much more ability than we have today.

I'm definitely more skeptical of this mechanism than you are (at least on its own, i.e. absent either correlation+kindness or the usual causal+moral reasons to cooperate). I don't think it's obvious.

For an EDT agent I think it's pretty clear that R << 1. For a UDT agent I think it's less clear, since everything is so confusing, and R > 0.5 is not crazy.

because I should also myself use a reciprocity-like strategy where my based on my prediction

Shouldn't this be "... where I based my prediction ..."?
