"The man who is cheerful and merry has always a good reason for being so,—the fact, namely, that he is so." The Wisdom of Life, Schopenhauer (1851)

TL;DR

Descriptions of the Prisoner's Dilemma typically suggest that the optimal policy for each prisoner is to selfishly defect instead of to cooperate. I disagree with the traditional analysis and present a case for cooperation.

The core issue is the assumption of independence between the players. Articulations of the game painstakingly describe how the prisoners are in explicitly separate cells with no possibility of communication. From this, it's assumed that one's action can have no causal effect on the decision of the other player. However, (almost) everything is correlated, and this significantly changes the analysis.

Imagine the case where the prisoners are clones and make exactly the same decision. Then, when they compare the expected payout for each possible action, their payout will be higher in the case where they cooperate because they are certain the other player is having the same thoughts and deterministically will make the same choice. This essay generalizes and formalizes this line of reasoning.

Here's what to expect in what follows. In the first section, we begin by introducing the standard causal decision theory analysis suggesting (Defect, Defect). Then, we introduce the machinery for mixed strategies in the following section. From there we discuss the particular case where both participants are clones, which motivates our new framework. Then, we introduce a bit more formalism around causal modeling and dependence. We proceed to analyze a more general case where both players converge to the same mixed strategy. Then we discuss the most general model, where the players' mixed strategies have some known correlation. Finally, we conclude the analysis. In summary, given some dependency structure due to upstream causal variables, we uncover the cases where the game theory actually suggests cooperation as the optimal policy.

The Traditional Analysis

In Game Theory 101, here's how the Prisoner's Dilemma analysis traditionally goes. Alice and Bob are conspirators in a crime. They're both caught and brought to separate interrogation rooms. They're presented with a Faustian bargain: to snitch or not to snitch. If neither snitches on the other, they both get a 1-year sentence. If one snitches and the other does not, then the snitch goes home free while the non-snitch has to serve 3 years. If they both snitch, they serve 2 years.

Here's the payoff diagram corresponding with this setup:

       BC        BD
AC   (−1,−1)   (−3,0)
AD   (0,−3)    (−2,−2)

If Alice cooperates, Bob is better off defecting to get 0 years instead of 1 year. If Alice defects, Bob is better off defecting to get 2 years instead of 3 years. So in either case Bob is better off defecting. A strategy which is optimal regardless of the choices of an opponent is called a dominant strategy. Symmetrically, Alice is better off defecting no matter what Bob does. This means that even though they're both happier in the case where they both cooperate, serving just one year each, the "Nash equilibrium" is the case where they both defect, serving two years each. A state is called a Nash equilibrium if no player can benefit by changing their action, holding the actions of all other players constant.

We can represent each player's preferences and optimal choices with an arrow diagram.

Nash equilibria are represented by a state with no arrows pointing away from it, meaning no player would prefer to switch their choice, holding the other players' choices the same. In the example above, (Defect, Defect) is the single Nash equilibrium.

We can generalize the payoff matrix a bit to represent all situations that capture the structure of a "Prisoner's Dilemma"-like scenario.

       BC      BD
AC   (R,R)   (S,T)
AD   (T,S)   (Q,Q)

I use the variables Q,R,S,T to keep organized. These can be remembered in the following way. Q is the Quarrel payout when they rat each other out. R is the Reward for mutual cooperation. S is the Sucker's payout if the other player snitches and they do not. T is the Temptation payout for snitching while the other does not. The necessary and sufficient conditions for a prisoner's dilemma structure are that S<Q<R<T.
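As a quick sanity check, the ordering condition can be encoded in a few lines of Python (a sketch; `is_prisoners_dilemma` is a name I made up, not standard library code):

```python
# Check that a payoff set has Prisoner's Dilemma structure: S < Q < R < T.
# Q = Quarrel (both defect), R = Reward (both cooperate),
# S = Sucker's payout, T = Temptation payout.
def is_prisoners_dilemma(Q, R, S, T):
    return S < Q < R < T

# The toy example from above, with prison years as negative utilities.
print(is_prisoners_dilemma(Q=-2, R=-1, S=-3, T=0))  # True
```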

Probabilistic Play

Now, let's make our model a bit more rigorous and extend our binary action space to a probabilistic strategy model.

       BC      BD
AC   (R,R)   (S,T)
AD   (T,S)   (Q,Q)

Instead of discrete actions, let's suppose Alice and Bob each choose mixed strategy vectors [p(Ac),p(Ad)] and [p(Bc),p(Bd)], respectively, which represent their probabilities of cooperation or defection, such that p(Ac)+p(Ad)=1 and p(Bc)+p(Bd)=1.

We formalize the analysis by noting that Alice wants to maximize her expected (VNM) utility. We now compute her optimal cooperation fraction p∗(Ac|B∗) given Bob's optimal policy.

E[UA(G)]=E[UA(G|Ac)]p∗(Ac)+E[UA(G|Ad)]p∗(Ad)

We decompose the expected value of the game to Alice E[UA(G)] into the expected value of the game given that she cooperates E[UA(G|Ac)] times the probability that she cooperates p∗(Ac) plus the expected value of the game given that she defects E[UA(G|Ad)] times the probability she defects p∗(Ad).

We can further clarify Alice's expected utility given her action E[UA(G|Ac)] into the cases where Bob cooperates and does not cooperate.
E[UA(G|Ac)]=E[UA(G|Ac,Bc)]p∗(Bc)+E[UA(G|Ac,Bd)]p∗(Bd)

E[UA(G|Ad)]=E[UA(G|Ad,Bc)]p∗(Bc)+E[UA(G|Ad,Bd)]p∗(Bd)

Bringing this back into our equation for E[UA(G)] yields:

E[UA(G)]=E[UA(G|Ac,Bc)]p∗(Ac)p∗(Bc) +E[UA(G|Ac,Bd)]p∗(Ac)p∗(Bd) +E[UA(G|Ad,Bc)]p∗(Ad)p∗(Bc) +E[UA(G|Ad,Bd)]p∗(Ad)p∗(Bd)

We can reduce the mess a bit by substituting in our variables Q,R,S,T.

E[UA(G)]=R p∗(Ac)p∗(Bc) +S p∗(Ac)p∗(Bd) +T p∗(Ad)p∗(Bc) +Q p∗(Ad)p∗(Bd)

Now we apply p(Ac)+p(Ad)=1 and p(Bc)+p(Bd)=1.

E[UA(G)]=R p∗(Ac)p∗(Bc) +S p∗(Ac)(1−p∗(Bc)) +T (1−p∗(Ac))p∗(Bc) +Q (1−p∗(Ac))(1−p∗(Bc))

Normally to find the p∗(Ac) which maximizes E[UA(G)] we would differentiate E[UA(G)] with respect to p∗(Ac) and set this result to zero. But because the equation is linear in p∗(Ac), the derivative degenerates. This means that there won't be an optimal "mixed strategy" which is not a "pure strategy", meaning that given the prisoner's dilemma payout structure presented thus far Alice and Bob are better off making a decision to 100% cooperate or 100% defect.
Expanding out E[UA(G)] a bit further we see:

E[UA(G)]=R p∗(Ac)p∗(Bc) +S p∗(Ac)−S p∗(Ac)p∗(Bc) +T p∗(Bc)−T p∗(Bc)p∗(Ac) +Q−Q p∗(Ac)−Q p∗(Bc)+Q p∗(Ac)p∗(Bc)

Now we isolate p∗(Ac):

E[UA(G)]=p∗(Ac)[R p∗(Bc)+S−S p∗(Bc)−T p∗(Bc)−Q+Q p∗(Bc)] + [T p∗(Bc)+Q−Q p∗(Bc)]

E[UA(G)]=p∗(Ac)[p∗(Bc)(Q+R−S−T)+(S−Q)] + [p∗(Bc)(T−Q)+Q]

To see if Alice's optimal cooperation percentage p∗(Ac) is 100% or 0%, we just need to see if the term p∗(Bc)(Q+R−S−T)+(S−Q) is greater than or less than zero. If it's greater than zero, then Alice best maximizes E[UA(G)] when p∗(Ac)=1, and correspondingly when it's less than zero then Alice should always defect (p∗(Ac)=0).

R p∗(Bc)+S−S p∗(Bc)−T p∗(Bc)−Q+Q p∗(Bc) ?< 0

Negating both sides and regrouping:

0 ?< (T p∗(Bc)−R p∗(Bc))+(S p∗(Bc)−S)+(Q−Q p∗(Bc))

0 ?< p∗(Bc)(T−R)+Q(1−p∗(Bc))−S(1−p∗(Bc))

0 ?< p∗(Bc)(T−R)+(1−p∗(Bc))(Q−S)

We now show that each piece is nonnegative: p∗(Bc)≥0, T−R>0 because T>R, 1−p∗(Bc)≥0, and Q−S>0 because Q>S. Therefore, 0<p∗(Bc)(T−R)+(1−p∗(Bc))(Q−S), so the coefficient on p∗(Ac) is negative and Alice's optimal policy is to always defect. The same analysis can be performed for Bob as well.
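We can verify this conclusion numerically as well (a small sketch using the toy payoffs from the opening example; the variable names are mine):

```python
import numpy as np

# Toy payoffs from the example above: Q = Quarrel, R = Reward,
# S = Sucker's payout, T = Temptation.
Q, R, S, T = -2, -1, -3, 0

# Sweep Bob's cooperation probability and evaluate the quantity derived above.
# If it stays positive for every p*(B_c), the coefficient on p*(A_c) is
# negative, and Alice maximizes expected utility by always defecting.
p_B_c = np.linspace(0, 1, 101)
gap = p_B_c * (T - R) + (1 - p_B_c) * (Q - S)
print(bool(np.all(gap > 0)))  # True
```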

This is how the analysis typically goes. The common perspective is that, given the payoff diagram and the constraint that T>R>Q>S, in a two-player simultaneous game where the players cannot communicate with one another, the game-theory-optimal policy is unfortunately always to defect.

       BC      BD
AC   (R,R)   (S,T)
AD   (T,S)   (Q,Q)

I disagree with this conclusion and present my own analysis in the subsequent section.

The Clone Case for Cooperation

In the typical analysis of the prisoner's dilemma, we consider the choices of Alice and Bob to be independent of each other. This is because they make their decisions simultaneously and have no way of affecting the decision of the other. But, there is an underlying dependence structure between A and B that we must include in our model.

To illustrate this, imagine the case where Alice is cloned and given exactly the same environment such that both clones will make the same decisions. If we label clone A and clone B, in this case we now have p(BC|AC)=1, implying p(BD|AC)=0, p(BD|AD)=1, and p(BC|AD)=0. This is the case where there's perfect correlation between the decisions of A and B, which we analyze in this section.

Let's now recompute A's expected utility E[UA(G)] which she wants to maximize.

E[UA(G)]=E[UA(G|AC)]p∗(AC)+E[UA(G|AD)]p∗(AD)

Variable definitions:

E[UA(G)] is the expected utility of the game to player A.

E[UA(G|AC)] is the expected utility of the game to player A given that A chooses to cooperate.

E[UA(G|AD)] is the expected utility of the game to player A given that A chooses to defect.

p∗(AC) is the probability that A cooperates, which is optimized to maximize A's expected utility given her knowledge.

p∗(AD) is the probability that A defects, which is optimized to maximize A's expected utility given her knowledge.

Given B's potentially mixed strategy, the expected utility of the game to A given each of A's possible choices can be decomposed into the cases for each of B's choices given A's choice.
E[UA(G|AC)]=E[UA(G|AC,BC)]^pA(BC|AC)+E[UA(G|AC,BD)]^pA(BD|AC)

E[UA(G|AD)]=E[UA(G|AD,BC)]^pA(BC|AD)+E[UA(G|AD,BD)]^pA(BD|AD)

Notice that we condition B's probabilities on A's choice. In the previous analysis, we claimed that B has no way of seeing A's choice so we make them independent, claiming p(BC|AC)=p(BC). But this is NOT RIGHT. Also, notice we place a hat on ^pA(Bx|Ay) to denote that these are A's estimates of B's action probabilities based on A's own knowledge. We will come back to discuss this step in more detail in a later section, but note that this is the heart of the difference between my perspective and the traditional analysis.

Academic economists would likely claim (and have claimed) that this approach doesn't sufficiently separate belief formation and decision making. The type of recursive relationship I'm suggesting between estimating a counterparty's policy and our own action selection seems "irrational", in the sense that it doesn't follow the typical pre-defined "rational decision making" algorithm described by economists. However, I point out that the policy presented here wins at games. That is, it is the policy which results in both agents obtaining maximum expected utility.

It's been recommended by an econ friend (whom I love dearly) that, while this approach might not fit the classification of "rational decision making", it could find a home in the behavioral economics literature as a form of *motivated reasoning*. The claim is that the agent wants to maximize their ex-ante expected utility, and this model assumes their beliefs are malleable. A motivated reasoner wants to believe their counterparty will act as they do, so they "trick" themselves into believing it even though they haven't rationally deduced that the counterparty will actually cooperate. The academic argument against cooperation continues: "but this cannot be fully rational behavior, because my ex-post decision has to be fully independent of the other agent's. My decision is unknown to them at the time of their own decision by construction, so it cannot influence their beliefs and should not influence my own beliefs about their action." It's this notion that decisions are made independently that I take issue with. I discuss this further in the Everything is Correlated section which follows.

Notice that, if the choices of the players are truly independent, then ^pA(Bx|Ay)=^pA(Bx), yielding the traditional analysis of the last section.

Let's now explore the case where A and B are clones who will make exactly the same choice, so ^pA(Bx|Ax)=1 and ^pA(Bx|A¬x)=0. Let's now update our E[UA(G)] calculation. Substituting these values in gives E[UA(G|AC)]=R and E[UA(G|AD)]=Q, so:

E[UA(G)]=R p∗(AC)+Q p∗(AD)=R p∗(AC)+Q (1−p∗(AC))

Because Reward >Quarrel, E[UA(G)] is maximized when p∗(AC)=1! This means A will cooperate and receive E[UA(G)]=Reward instead of Quarrel, and similarly for B. This seems trivial, but if A and B can make the same decision, they can obtain the globally optimal solution instead of the strictly worse typical Nash equilibrium which is reached when they believe they act independently.
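A tiny numerical sketch (with `expected_utility_clone` as my own name, and the toy payoffs from the opening example) confirms that under the clone assumption Alice's expected utility R p∗(AC)+Q(1−p∗(AC)) is maximized at full cooperation:

```python
import numpy as np

Q, R = -2, -1  # Quarrel and Reward payouts from the toy example

# Under the clone assumption, B mirrors A exactly, so A's expected utility
# is R with probability p (both cooperate) and Q with probability 1 - p.
def expected_utility_clone(p, Q, R):
    return R * p + Q * (1 - p)

p = np.linspace(0, 1, 101)
utilities = expected_utility_clone(p, Q, R)
best_p = p[np.argmax(utilities)]
print(best_p)  # 1.0, i.e. always cooperate, since R > Q
```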

Everything is Correlated

Alright, if Alice knows that Bob will make exactly the same decision, she can fearlessly cooperate, knowing for certain that he will too, dissolving the dilemma. So now, let's reincorporate uncertainty back into our model.

Let's say Alice and Bob are not clones but are instead siblings with similar family values, experiences, dispositions, genes, neurological structures, etc. We know that many, many, many things in the world are correlated in some way. The dependency structure of the world is extremely interconnected. There's also an evolutionary argument for this that I won't go into in much detail: when picking the prisoners, we're actually sampling players who have played an iterated version of this game many, many times. If you believe that humans are not purely rational agents and make decisions based on instinct, then there's likely to be downstream similarity in their policies. And if you believe that the agents are purely rational, they also have exactly the same game-theoretic payouts, so their GTO play should be exactly the same. It would be extremely improbable for Alice and Bob's choices here to be perfectly independent. So let's model some dependence structure.

(I will note that there's a semi-compelling argument suggesting that we don't even need to introduce more subtle forms of dependence. Because they have identical payoff structures, the game-theoretic optimal play should be exactly the same. Given their GTO play should be exactly the same and mixed strategies degenerate to pure strategies, we get correlation 1, and so they should both cooperate 100% of the time. But the real world is messy, so we proceed with slightly more subtlety.)

Causal Modeling 101

To better understand the dependencies and correlations between Alice and Bob's choices, we can frame the problem in the light of causal modeling a la Judea Pearl. In causal modeling, we represent the dependencies between variables using a directed acyclic graph (DAG), where nodes represent variables and directed edges represent causal relationships.

In the traditional analysis of the Prisoner's Dilemma, Alice and Bob are assumed to make their decisions independently. However, in reality, their decisions will definitely be influenced by common upstream factors. For example, Alice and Bob may have similar upbringings, values, experiences, and expectations about loyalty and cooperation that influence their decision-making. And if that's not compelling, they also have access to similar information about the situation, and have exactly the same payout structure, so there's a strong case that they will end up making similar decisions.

We can represent these upstream factors with a variable V. The causal model can be depicted as follows:

V
/ \
A B

Here, V is a common cause that affects both Alice's and Bob's decisions. This creates a dependency between A and B.

Given the upstream factor V, the choices of Alice and Bob are conditionally independent. But if we do not observe V, the path A ← V → B is open and the choices A and B become dependent. In the language of causal graphs, conditioning on V d-separates A and B; left unobserved, V d-connects them.

Mathematically, this can be expressed as:

P(A,B)=∑vP(A∣V=v)P(B∣V=v)P(V=v)

This shows that the joint probability of A and B depends on their conditional probabilities given V and the distribution of V.
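To make this concrete, here's a tiny numerical sketch (the distributions for P(V), P(A|V), and P(B|V) are made-up illustrative numbers): conditioned on V the choices are independent, but marginalizing over V leaves A and B correlated.

```python
import numpy as np

# A tiny common-cause model. V is a binary upstream factor (e.g. a
# "cooperative disposition"). All numbers below are illustrative only.
p_V = np.array([0.5, 0.5])            # P(V = v)
p_A_given_V = np.array([0.9, 0.2])    # P(A = cooperate | V = v)
p_B_given_V = np.array([0.9, 0.2])    # P(B = cooperate | V = v)

# Joint probability of (A = cooperate, B = cooperate), marginalizing over V:
# P(A, B) = sum_v P(A|V=v) P(B|V=v) P(V=v)
p_AB = np.sum(p_A_given_V * p_B_given_V * p_V)

# Marginals
p_A = np.sum(p_A_given_V * p_V)
p_B = np.sum(p_B_given_V * p_V)

print(p_AB)        # 0.425
print(p_A * p_B)   # ~0.3025 -- not equal, so A and B are dependent
```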

Independent Case (Optional)

Suppose A and B were truly independent. Here's the causal model we might associate with this hypothesis.

V_1 V_2
| |
A B

Here V1 and V2 are separate upstream factors that affect Alice's and Bob's decisions, respectively. This creates a scenario where A and B are conditionally independent given V1 and V2. Mathematically, this can be expressed as:

P(A,B)=∑v1∑v2P(A∣V1=v1)P(B∣V2=v2)P(V1=v1)P(V2=v2)

This equation shows that the joint probability of A and B depends on their individual probabilities given V1 and V2, and the distribution of V1 and V2.
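By contrast, a quick numerical check (again with made-up illustrative numbers) shows that with separate upstream variables the joint really does factor into the product of the marginals:

```python
import numpy as np

# Separate upstream factors for Alice and Bob (illustrative numbers only).
p_V1 = np.array([0.5, 0.5])
p_V2 = np.array([0.3, 0.7])
p_A_given_V1 = np.array([0.9, 0.2])
p_B_given_V2 = np.array([0.8, 0.1])

# Joint: P(A, B) = sum_{v1, v2} P(A|v1) P(B|v2) P(v1) P(v2)
p_AB = sum(
    p_A_given_V1[v1] * p_B_given_V2[v2] * p_V1[v1] * p_V2[v2]
    for v1 in range(2) for v2 in range(2)
)

# Marginals: P(A) = sum_{v1} P(A|v1) P(v1), and similarly for B.
p_A = np.sum(p_A_given_V1 * p_V1)
p_B = np.sum(p_B_given_V2 * p_V2)

print(abs(p_AB - p_A * p_B) < 1e-12)  # True: P(A, B) = P(A) P(B)
```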

To prove the typical independence equation P(A,B)=P(A)P(B) from the more complicated-seeming equation above, we first find the marginal probabilities P(A) and P(B) by summing over the respective upstream variables:

P(A)=∑v1P(A∣V1=v1)P(V1=v1)

P(B)=∑v2P(B∣V2=v2)P(V2=v2)

Then we expand the product of the sums and notice that this expression is the same as the expression for the joint probability P(A,B) above:

P(A)P(B)=∑v1∑v2P(A∣V1=v1)P(V1=v1)P(B∣V2=v2)P(V2=v2)

Because the two expressions are identical, we've shown that P(A,B)=P(A)P(B) assuming that Alice's and Bob's decisions are influenced by separate, independent upstream variables V1 and V2. With this extra machinery in hand, we proceed into the heart of our analysis.

Incorporating Dependence into Expected Utility

In The Clone Case for Cooperation section, I make the assumption that Alice and Bob deterministically make exactly the same choice. This time, let's relax that assumption and suppose that because they're rational agents presented with identical payoff matrices, they'll conclude the same mixed strategies to be optimal. Though we're not sure what their cooperate/defect probabilities will be yet, we just suppose that their GTO policies will be the same given the same payout structures.

To model this formulation, we go back to the start and split A's expected utility based on her two possible actions.

E[UA(G)]=E[UA(G|AC)]p∗(AC)+E[UA(G|AD)]p∗(AD)

Then, we split each of those cases for each of B's possible choices.
E[UA(G|AC)]=E[UA(G|AC,BC)]^pA(BC|AC)+E[UA(G|AC,BD)]^pA(BD|AC)

E[UA(G|AD)]=E[UA(G|AD,BC)]^pA(BC|AD)+E[UA(G|AD,BD)]^pA(BD|AD)

Now we need thoughtful ways of modeling ^pA(BC|AC) and ^pA(BC|AD). For Alice's approximation of Bob's probability of cooperating, ^pA(BC|AC), we can suppose that this matches Alice's own probability of cooperating p∗(AC) because of our assumption that rational agents will come to the same mixed strategy policies given identical payoff matrices. So, ^pA(BC|AC)=p∗(AC) and ^pA(BC|AD)=1−^pA(BD|AD)=1−p∗(AD)=p∗(AC).
Substituting ^pA(BC|AC)=^pA(BC|AD)=p∗(AC) into the expected utility and collecting terms in p∗(AC) gives:

E[UA(G)]=(R−S−T+Q) p∗(AC)²+(S+T−2Q) p∗(AC)+Q

All that's left to do from here is to find the p∗(AC) which maximizes E[UA(G)] subject to the constraints that 0≤p∗(AC)≤1 and S<Q<R<T. (The cases get slightly hairy so this section is skippable.)

Unlike the previous case, where E[UA(G)] was linear in p∗(AC), this time we actually can optimize via the derivative method suggested earlier. To do so, we compute:

dE[UA(G)]/dp∗(AC)=2(Q+R−S−T) p∗(AC)+(S+T−2Q)=0

Because E[UA(G)] is a second-degree polynomial in p∗(AC), the parabola it sweeps out will have its vertex at p∗(AC)=(2Q−S−T)/(2(Q+R−S−T)). While there's still some tidying up to do to determine which cases apply to which solutions, excitingly, the possible solution set is constrained to {0, (2Q−S−T)/(2(Q+R−S−T)), 1}. This vertex won't always be the maximum, and sometimes this vertex will be outside the valid probability range [0,1]. In either of these countercases, the optimal value for p∗(AC) will be either 0 or 1.

We now describe the cases where the vertex is a valid maximum. For the vertex to be a maximum, the parabola must be concave. As we know, a y=ax²+bx+c parabola is concave when a<0. So, Q+R−S−T<0, or Q+R<S+T. For the vertex to be valid, we say it must lie within the domain 0≤p∗(AC)≤1. First, to find the inequalities implied by 0≤p∗(AC)=(2Q−S−T)/(2(Q+R−S−T)), we note that because the parabola is concave down, the denominator Q+R−S−T will be negative, so 0≥2Q−S−T⟹S+T≥2Q. Second, p∗(AC)=(2Q−S−T)/(2(Q+R−S−T))≤1 implies (multiplying through by the negative denominator and flipping the inequality) that 2Q−S−T≥2(Q+R−S−T)⟹S+T≥2R. We've found that, when Q+R<S+T, S+T≥2Q, and S+T≥2R, the optimal cooperation fraction for Alice will be p∗(AC)=(1/2)((Q−R)/(Q+R−S−T))+1/2.

We still have to sort out when p∗(AC) will be 1 or 0, and there's a bit of tidying up to do when the denominator 2(R−S−T+Q) is zero (which actually was the case in the first example in the Traditional Analysis section, where R=−1, S=−3, T=0, and Q=−2). To resolve this indeterminate case when R−S−T+Q=0, we take the second derivative of E[UA(G)]: d²E[UA(G)]/d(p∗(AC))²=2(Q+R−S−T). Because in this case Q+R−S−T=0, the second derivative is zero, implying that the expected utility curve will be linear.

So, when R−S−T+Q=0, E[UA(G)]=(S+T−2Q)p∗(AC)+Q, where here E[UA(G)] is a linear function of p∗(AC). In this edge case, we find that:

If S+T−2Q>0, E[UA(G)] increases with p∗(AC). Therefore, p∗(AC)=1 is optimal.

If S+T−2Q<0, E[UA(G)] decreases with p∗(AC). Therefore, p∗(AC)=0 is optimal.

If S+T−2Q=0, E[UA(G)] is constant, and any p∗(AC) is optimal.

For the other cases where we're deciding between 0 or 1, we just need to compare E[UA(G)] at p(AC)=0 with its value at p(AC)=1. That is, we need to compare (R−S−T+Q)·0²+(S+T−2Q)·0+Q=Q with (R−S−T+Q)·1²+(S+T−2Q)·1+Q=(R−S−T+Q)+(S+T−2Q)+Q=R. That is, in these cases, if R>Q then p∗(AC)=1, and if Q>R then p∗(AC)=0.

Putting this all together:

If Q+R<S+T, S+T≥2Q, and S+T≥2R, then the optimal cooperation fraction for Alice will be p∗(AC)=(1/2)((Q−R)/(Q+R−S−T))+1/2.

If Q+R=S+T then:

If S+T−2Q>0, then p∗(AC)=1.

If S+T−2Q<0, then p∗(AC)=0.

If S+T−2Q=0, then any p∗(AC) is optimal.

Otherwise:

If R<Q then p∗(AC)=0.

If R=Q then p∗(AC)=0 and p∗(AC)=1 are equally optimal (the endpoints tie at R=Q).

If R>Q then p∗(AC)=1.

Represented as one piecewise expression:

p∗(AC) =
  (1/2)((Q−R)/(Q+R−S−T))+1/2   if Q+R<S+T, S+T≥2Q, and S+T≥2R;
  0                             if Q+R=S+T and S+T−2Q<0;
  any value in [0,1]            if Q+R=S+T and S+T−2Q=0;
  1                             if Q+R=S+T and S+T−2Q>0;
  0                             otherwise, when R<Q;
  0 or 1 (equally optimal)      otherwise, when R=Q;
  1                             otherwise, when R>Q.

At last, if we make the assumption that A and B will conclude the same mixed strategy fractions to be optimal, given that they have identical payoffs, we now have a complete policy for determining their choices of optimal p∗(AC) and p∗(BC).
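This policy can be sketched in code (the function names `optimal_p_cooperate` and `brute_force_p` are mine, and the vertex branch applies the concavity and validity checks derived above) and cross-checked against a brute-force grid maximization of E[UA(G)]=(R−S−T+Q)p²+(S+T−2Q)p+Q:

```python
import numpy as np

# A sketch of the piecewise policy: Alice's optimal cooperation probability.
def optimal_p_cooperate(Q, R, S, T):
    a = Q + R - S - T          # quadratic coefficient of E[U_A(G)]
    b = S + T - 2 * Q          # linear coefficient
    if a < 0 and 0 <= -b / (2 * a) <= 1:
        return -b / (2 * a)    # concave with an interior vertex: mixed strategy
    if a == 0:                 # degenerate (linear) case
        if b > 0:
            return 1.0
        if b < 0:
            return 0.0
        return 0.5             # constant utility: any p is optimal
    # Otherwise compare the endpoints: E(0) = Q versus E(1) = R.
    return 1.0 if R >= Q else 0.0

# Brute-force check: maximize E[U_A(G)] over a fine grid of p values.
def brute_force_p(Q, R, S, T):
    p = np.linspace(0, 1, 100001)
    E = (R - S - T + Q) * p**2 + (S + T - 2 * Q) * p + Q
    return p[np.argmax(E)]

# A few payoff sets (Q, R, S, T), including the toy example (-2, -1, -3, 0).
for payoffs in [(-2, -1, -3, 0), (0, 1, -1, 4), (1, 3, 0, 5)]:
    print(optimal_p_cooperate(*payoffs), brute_force_p(*payoffs))
```

The closed-form answer and the brute-force answer agree on each payoff set, including a genuinely mixed optimum for (0, 1, −1, 4).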

Interpretation

Let's interpret this solution a bit to make sure it accords with our intuitions. We start with the complicated-seeming expression, p∗(AC)=(1/2)((Q−R)/(Q+R−S−T))+1/2, in the case where Q+R<S+T, S+T≥2Q, and S+T≥2R.

This equation indicates that the optimal cooperation probability depends on the difference between the Quarrel and Reward payoffs, normalized by the overall difference between the players' payoffs when they perform the same action and when they perform different actions. In this regime the fraction (Q−R)/(Q+R−S−T) lies between −1 and 1, which bounds p∗(AC) between 0 and 1, as we would expect a probability to behave.

We continue to examine some important boundary conditions:

If Quarrel =Reward, then p∗(AC)=12, indicating that the player is indifferent between cooperation and defection.

If Quarrel >Reward, then p∗(AC)<12, indicating a higher probability of defection. (Remember the denominator is negative in this case.)

If Quarrel <Reward, then p∗(AC)>12, indicating a higher probability of cooperation. (Again the denominator is negative.)

This result seems reasonable because if the quarrel payoff Q is better than the reward for mutual cooperation R, Alice is more likely to defect. Conversely, if R is better, Alice is more likely to cooperate.

Let's now try to apply this policy to our initial toy example. To see if our solution holds, we test the case when R=−1, S=−3, T=0, Q=−2.

       BC        BD
AC   (−1,−1)   (−3,0)
AD   (0,−3)    (−2,−2)

First, we calculate R−S−T+Q=−1−(−3)−0+(−2)=0. Then, because R−S−T+Q=0, we use the indeterminate case formula. We calculate S+T−2Q=−3+0−2(−2)=1. Finally, since S+T−2Q>0, the optimal policy is p∗(AC)=1. This aligns with the interpretation that Alice should always cooperate in this particular setup.

We can confirm this computationally as well.

import matplotlib.pyplot as plt
import numpy as np
# Define the payoffs from the toy example (R = -1, S = -3, T = 0, Q = -2)
Q = -2
R = -1
S = -3
T = 0
# Define the cooperation probabilities
p_A_C = np.linspace(0, 1, 1000)
# Calculate the expected utility for Alice:
# E[U_A(G)] = (R - S - T + Q) p^2 + (S + T - 2Q) p + Q
E_U_A_G = (R - S - T + Q) * p_A_C**2 + (S + T - 2 * Q) * p_A_C + Q
# Plot the expected utility
plt.figure(figsize=(10, 6))
plt.plot(p_A_C, E_U_A_G, label=r'$E[U_A(G)]$', color='blue')
plt.xlabel(r'Cooperation Probability $p^*(A_C)$')
plt.ylabel(r'Expected Utility $E[U_A(G)]$')
plt.title('Expected Utility of the Game for Alice')
plt.axhline(0, color='black', linewidth=0.5)
plt.axvline(0, color='black', linewidth=0.5)
plt.grid(color='gray', linestyle='--', linewidth=0.5)
plt.ylim([np.min(E_U_A_G), np.max(E_U_A_G)])
plt.legend()
plt.show()

Overall, the analysis and derived equations seem to correctly capture the dependence and provide a policy for determining the optimal cooperation probability in this correlated Prisoner's Dilemma. From what I can tell based on the tests I've performed, the solution works as intended and aligns with my theoretical expectations from the payoffs.

Incorporating Correlation into Expected Utility

If you're not convinced by the case where we assume that two rational agents make exactly the same decision given identical payoffs or by the extended case where the two rational agents converge on the same optimal mixed strategy policy, I present a third generalization which introduces noise into our model.

To model the dependence structure between the agents' decisions alongside some natural uncertainty, let's introduce a correlation parameter ρ that captures the degree of correlation between Alice's and Bob's choices due to the common upstream factors, V. We can think of ρ as a measure of how likely it is that if Alice cooperates, Bob will also cooperate, and similarly for defection. This parameter will range from -1 to 1, where:

ρ=1 indicates perfect positive correlation (Alice and Bob always make the same choice). When ρ=1, this model degenerates to our clone case.

ρ=0 indicates no correlation (Alice and Bob's choices are independent).

ρ=−1 indicates perfect negative correlation (Alice and Bob always make opposite choices).

With this in mind, let's redefine the probabilities ^pA(BC|AC), ^pA(BD|AC), ^pA(BC|AD), and ^pA(BD|AD) to incorporate ρ.

We assume the following relationships for the conditional probabilities based on ρ:

^pA(BC|AC)=1+ρ2

^pA(BD|AC)=1−ρ2

^pA(BC|AD)=1−ρ2

^pA(BD|AD)=1+ρ2

These expressions ensure that the probabilities are consistent with the correlation parameter.
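These four conditionals can be packaged into a small helper (a sketch; `conditional_probs` is my own name) whose consistency we can check directly: each conditional distribution sums to one, ρ=1 recovers the clone case, and ρ=−1 gives perfect anti-correlation:

```python
# Conditional probabilities of B's action given A's action, as a function of
# the correlation parameter rho in [-1, 1].
def conditional_probs(rho):
    p_Bc_given_Ac = (1 + rho) / 2
    p_Bd_given_Ac = (1 - rho) / 2
    p_Bc_given_Ad = (1 - rho) / 2
    p_Bd_given_Ad = (1 + rho) / 2
    return p_Bc_given_Ac, p_Bd_given_Ac, p_Bc_given_Ad, p_Bd_given_Ad

# Each conditional distribution sums to 1 for any rho.
a, b, c, d = conditional_probs(0.5)
print(a + b == 1 and c + d == 1)  # True

# rho = 1 recovers the clone case: B always mirrors A.
print(conditional_probs(1))   # (1.0, 0.0, 0.0, 1.0)

# rho = -1 is perfect anti-correlation: B always does the opposite.
print(conditional_probs(-1))  # (0.0, 1.0, 1.0, 0.0)
```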

We can now substitute these probabilities into our expected utility equation for Alice from earlier:

E[UA(G)]=p∗(AC)[R (1+ρ)/2+S (1−ρ)/2]+(1−p∗(AC))[T (1−ρ)/2+Q (1+ρ)/2]

Because this expression is again linear in p∗(AC), the optimum is a pure strategy: Alice should cooperate exactly when the cooperation bracket beats the defection bracket, i.e. when R(1+ρ)+S(1−ρ)>T(1−ρ)+Q(1+ρ). At ρ=1 this reduces to R>Q, recovering the clone case's unconditional cooperation, and at ρ=0 it reduces to R+S>T+Q.

TL;DRDescriptions of the Prisoner's Dilemma typically suggest that the optimal policy for each prisoner is to selfishly defect instead of to cooperate. I disagree with the traditional analysis and present a case for cooperation.

The core issue is the assumption of independence between the players. Articulations of the game painstakingly describe how the prisoners are in explicitly separate cells with no possibility of communication. From this, it's assumed that one's action can have no causal effect on the decision of the other player. However, (almost) everything is correlated, and this significantly changes the analysis.

Imagine the case where the prisoners are clones and make exactly the same decision. Then, when they compare the expected payout for each possible action, their payout will be higher in the case where they cooperate because they are certain the other player is having the same thoughts and deterministically will make the same choice. This essay generalizes and formalizes this line of reasoning.

Here's what to expect in what follows. In the first section, we begin by introducing the standard causal decision theory analysis suggesting (Defect, Defect). Then, we introduce the machinery for mixed strategies in the following section. From there we discuss the particular case where both participants are clones, which motivates our new framework. Then, we introduce a bit more formalism around causal modeling and dependence. We proceed to analyze a more general case where both players converge to the same mixed strategy. Then we discuss the most general model, where the players' mixed strategies have some known correlation. Finally, we conclude the analysis. In summary, given some dependency structure due to upstream causal variables, we uncover the cases where the game theory actually suggests cooperation as the optimal policy.

The Traditional AnalysisIn Game Theory 101, here's how the Prisoner's Dilemma analysis traditionally goes. Alice and Bob are conspirators in a crime. They're both caught and brought to separate interrogation rooms. They're presented with a Faustian bargain: to snitch or not to snitch. If neither snitches on the other, they both get a 1-year sentence. If one snitches and the other does not, then the snitch goes home free while the non-snitch has to serve 3 years. If they both snitch, they serve 2 years.

Here's the payoff diagram corresponding with this setup:

BCBDAC(−1,−1)(−3,0)AD(0,−3)(−2,−2)

If Alice cooperates, Bob is better off defecting to get 0 years instead of 1 year. If Alice defects, Bob is better off defecting to get 2 years instead of 3 years. So in either case Bob is better off defecting. A strategy which is optimal regardless of the choices of an opponent is called a dominant strategy. Symmetrically, Alice is better off defecting no matter what Bob does. This means that even though they're both happier in the case where they both cooperate, serving just one year each, the "Nash equilibrium" is the case where they both defect, serving two years each. A state is called a Nash equilibrium if no player can benefit by changing their action, holding the actions of all other players constant.

We can represent each player's preferences and optimal choices with an arrow diagram.

Nash equilibria are represented by a state with no arrows pointing away from it, meaning no player would prefer to switch their choice, holding the other players' choices the same. In the example above, (Defect, Defect) is the single Nash equilibrium.

We can generalize the payoff matrix a bit to represent all situations that capture the structure of a "Prisoner's Dilemma"-like scenario.

BCBDAC(R,R)(S,T)AD(T,S)(Q,Q)

I use the variables Q,R,S,T to keep organized. These can be remembered in the following way. Q is the Quarrel payout when they rat each other out. R is the Reward for mutual cooperation. S is the Sucker's payout if the other player snitches and they do not. T is the Temptation payout for snitching while the other does not. The necessary and sufficient conditions for a prisoner's dilemma structure are that S<Q<R<T.

Probabilistic PlayNow, let's make our model a bit more rigorous and extend our binary action space to a probabilistic strategy model.

BCBDAC(R,R)(S,T)AD(T,S)(Q,Q)

Instead of discrete actions, let's suppose Alice and Bob each choose mixed strategy vectors [p(Ac),p(Ad)] and [p(Bc),p(Bd)], respectively, which represent their probabilities of cooperation or defection, such that p(Ac)+p(Ad)=1 and p(Bc)+p(Bd)=1.

We formalize the analysis by noting that Alice wants to maximize her expected (VNM) utility. We now compute her optimal cooperation fraction p∗(Ac|B∗) given Bob's optimal policy.

E[UA(G)]=E[UA(G|Ac)]p∗(Ac)+E[UA(G|Ad)]p∗(Ad)

We decompose the expected value of the game to Alice E[UA(G)] into the expected value of the game given that she cooperates E[UA(G|Ac)] times the probability that she cooperates p∗(Ac) plus the expected value of the game given that she defects E[UA(G|Ad)] times the probability she defects p∗(Ad).

We can further clarify Alice's expected utility given her action E[UA(G|Ac)] into the cases where Bob cooperates and does not cooperate.

E[UA(G|Ac)]=E[UA(G|Ac,Bc)]p∗(Bc)+E[UA(G|Ac,Bd)]p∗(Bd)

E[UA(G|Ad)]=E[UA(G|Ad,Bc)]p∗(Bc)+E[UA(G|Ad,Bd)]p∗(Bd)

Bringing this back into our equation for E[UA(G)] yields:

E[UA(G)]=E[UA(G|Ac,Bc)]p∗(Ac)p∗(Bc) +E[UA(G|Ac,Bd)]p∗(Ac)p∗(Bd) +E[UA(G|Ad,Bc)]p∗(Ad)p∗(Bc) +E[UA(G|Ad,Bd)]p∗(Ad)p∗(Bd)

We can reduce the mess a bit by substituting in our variables Q,R,S,T.

E[UA(G)]=R p∗(Ac)p∗(Bc) +S p∗(Ac)p∗(Bd) +T p∗(Ad)p∗(Bc) +Q p∗(Ad)p∗(Bd)

Now we apply p(Ac)+p(Ad)=1 and p(Bc)+p(Bd)=1.

E[UA(G)]=R p∗(Ac)p∗(Bc) +S p∗(Ac)(1−p∗(Bc)) +T (1−p∗(Ac))p∗(Bc) +Q (1−p∗(Ac))(1−p∗(Bc))

Normally to find the p∗(Ac) which maximizes E[UA(G)] we would differentiate E[UA(G)] with respect to p∗(Ac) and set this result to zero. But because the equation is linear in p∗(Ac), the derivative degenerates. This means that there won't be an optimal "mixed strategy" which is not a "pure strategy", meaning that given the prisoner's dilemma payout structure presented thus far Alice and Bob are better off making a decision to 100% cooperate or 100% defect.

Expanding out E[UA(G)] a bit further we see:

E[UA(G)]=R p∗(Ac)p∗(Bc) +S p∗(Ac)−S p∗(Ac) p∗(Bc) +T p∗(Bc)−T p∗(Bc) p∗(Ac) +Q−Q p∗(Ac)−Q p∗(Bc)+Q p∗(Ac) p∗(Bc)

Now we isolate p∗(Ac):

E[UA(G)]=p∗(Ac)[R p∗(Bc) +S−S p∗(Bc) −T p∗(Bc)−Q +Q p∗(Bc)] +[T p∗(Bc)+Q−Q p∗(Bc)]

E[UA(G)]=p∗(Ac)[p∗(Bc)(Q+R−S−T)+(S−Q)] + [p∗(Bc) (T−Q)+Q]

To see if Alice's optimal cooperation percentage p∗(Ac) is 100% or 0%, we just need to see if the term p∗(Bc)(Q+R−S−T)+(S−Q) is greater than or less than zero. If it's greater than zero, then Alice best maximizes E[UA(G)] when p∗(Ac)=1, and correspondingly when it's less than zero then Alice should always defect (p∗(Ac)=0).

R p∗(Bc) +S−S p∗(Bc) −T p∗(Bc)−Q +Q p∗(Bc)?<0

0?<(T p∗(Bc)−R p∗(Bc))+(S p∗(Bc)−S)+(Q−Q p∗(Bc))

0?<p∗(Bc) (T−R)+Q (1−p∗(Bc))−S (1−p∗(Bc))

0?<p∗(Bc) (T−R)+(1−p∗(Bc)) (Q−S)

We now show that this sum is positive: p∗(Bc)≥0 and T−R>0 because T>R, while 1−p∗(Bc)≥0 and Q−S>0 because Q>S, and p∗(Bc) and 1−p∗(Bc) cannot both be zero. Therefore, 0<p∗(Bc) (T−R)+(1−p∗(Bc)) (Q−S), meaning the coefficient on p∗(Ac) is negative, and Alice's optimal policy is to always defect. The same analysis can be performed for Bob as well.
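As a quick numerical check, here's a short sketch using the toy payoffs from later in this essay (R=−1, S=−3, T=0, Q=−2, which satisfy T>R>Q>S): it confirms that the coefficient on p∗(Ac) is negative for every value of p∗(Bc), so under the independence assumption Alice maximizes her expected utility by always defecting.

```python
# Toy payoffs satisfying T > R > Q > S.
T, R, Q, S = 0, -1, -2, -3

# The coefficient on p*(Ac) in E[U_A(G)] is p*(Bc)(Q + R - S - T) + (S - Q).
# If it is negative for every p*(Bc) in [0, 1], Alice should always defect.
for i in range(101):
    p_bc = i / 100
    coeff = p_bc * (Q + R - S - T) + (S - Q)
    assert coeff < 0, f"coefficient non-negative at p*(Bc) = {p_bc}"

print("coefficient is negative for all p*(Bc): always defect")
```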

This is how the analysis typically goes. The common perspective is that, given the payoff diagram and the constraint that T>R>Q>S, in a two-player simultaneous game where the players cannot communicate with one another, then unfortunately the game theory optimal policy is always to defect.

|     | BC     | BD     |
|-----|--------|--------|
| AC  | (R, R) | (S, T) |
| AD  | (T, S) | (Q, Q) |

I disagree with this conclusion and present my own analysis in the subsequent section.

## The Clone Case for Cooperation

In the typical analysis of the prisoner's dilemma, we consider the choices of Alice and Bob to be independent of each other. This is because they make their decisions simultaneously and have no way of affecting the decision of the other. But there is an underlying dependence structure between A and B that we must include in our model.

To illustrate this, imagine the case where Alice is cloned and given exactly the same environment, such that both clones will make the same decision. If we label the clones A and B, we now have p(BC|AC)=1, implying p(BD|AC)=0, p(BD|AD)=1, and p(BC|AD)=0. This is the case of perfect correlation between the decisions of A and B, which we analyze in this section.

Let's now recompute A's expected utility E[UA(G)] which she wants to maximize.

E[UA(G)]=E[UA(G|AC)]p∗(AC)+E[UA(G|AD)]p∗(AD)

Variable definitions:

- p∗(AX): the probability that Alice plays action X under her (possibly mixed) strategy.
- ^pA(BX|AY): Alice's estimate of the probability that Bob plays X, given that Alice plays Y.

Given B's potentially mixed strategy, the expected utility of the game to A given each of A's possible choices can be decomposed into the cases for each of B's choices given A's choice.

E[UA(G|AC)]=E[UA(G|AC,BC)]^pA(BC|AC)+E[UA(G|AC,BD)]^pA(BD|AC)

E[UA(G|AD)]=E[UA(G|AD,BC)]^pA(BC|AD)+E[UA(G|AD,BD)]^pA(BD|AD)

We now recombine this into E[UA(G)]:

E[UA(G)]=(E[UA(G|AC,BC)]^pA(BC|AC)+E[UA(G|AC,BD)]^pA(BD|AC))p∗(AC)+(E[UA(G|AD,BC)]^pA(BC|AD)+E[UA(G|AD,BD)]^pA(BD|AD))p∗(AD)

E[UA(G)]=E[UA(G|AC,BC)]^pA(BC|AC)p∗(AC)+E[UA(G|AC,BD)]^pA(BD|AC)p∗(AC)+E[UA(G|AD,BC)]^pA(BC|AD)p∗(AD)+E[UA(G|AD,BD)]^pA(BD|AD)p∗(AD)

Now we substitute in Q,R,S,T and use p∗(AC)+p∗(AD)=1 and ^pA(BC|Ax)+^pA(BD|Ax)=1.

E[UA(G)]=^pA(BC|AC) p∗(AC) R+ (1−^pA(BC|AC)) p∗(AC) S+ ^pA(BC|AD) (1−p∗(AC)) T+ (1−^pA(BC|AD)) (1−p∗(AC)) Q

Notice that if the choices of the players are truly independent, then ^pA(Bx|Ay)=^pA(Bx), yielding the traditional analysis of the last section.

Now let's explore the case where A and B are clones who will make exactly the same choice, so ^pA(Bx|Ax)=1 and ^pA(Bx|A¬x)=0, and update our E[UA(G)] calculation accordingly.

E[UA(G)]=1 p∗(AC) R+ (1−1) p∗(AC) S+ 0 (1−p∗(AC)) T+ (1−0) (1−p∗(AC)) Q

Which we simplify into:

E[UA(G)]= p∗(AC) R+(1−p∗(AC)) Q

E[UA(G)]= p∗(AC) R+Q−Q p∗(AC)

E[UA(G)]=p∗(AC)(R−Q)+Q

Because Reward > Quarrel, E[UA(G)] is maximized when p∗(AC)=1! This means A will cooperate and receive E[UA(G)]=Reward instead of Quarrel, and similarly for B. This seems trivial, but if A and B can make the same decision, they can obtain the globally optimal solution instead of the strictly worse typical Nash equilibrium which is reached when they believe they act independently.
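A quick sketch of this clone-case conclusion, using the same toy payoffs (R=−1, Q=−2):

```python
R, Q = -1.0, -2.0  # Reward > Quarrel

# Clone case: E[U_A(G)] = p*R + (1 - p)*Q is linear in p with slope R - Q > 0,
# so it is maximized at p = 1 (always cooperate).
expected_utility = lambda p: p * R + (1 - p) * Q
best_p = max((i / 100 for i in range(101)), key=expected_utility)
print(best_p)  # 1.0
```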

## Everything is Correlated

Alright, if Alice knows that Bob will make exactly the same decision, she can fearlessly cooperate, knowing for certain that he will too, dissolving the dilemma. So now, let's reincorporate uncertainty back into our model.

Let's say Alice and Bob are not clones but are instead siblings with similar family values, experiences, dispositions, genes, neurological structures, etc. Many, many things in the world are correlated in some way; the dependency structure of the world is extremely interconnected. There's also an evolutionary argument for this that I won't go into in much detail: when picking the prisoners, we're actually sampling players who have played an iterated version of this game many, many times. If you believe that humans are not purely rational agents and make decisions based on instinct, then there's likely to be downstream similarity in their policies. And if you believe that the agents are purely rational, they also have exactly the same game-theoretic payouts, so their GTO play should be exactly the same. It would be extremely improbable for Alice and Bob's choices here to be perfectly independent. So let's model some dependence structure.

(I will note that there's a semi-compelling argument suggesting that we don't even need to introduce more subtle forms of dependence. Because they have identical payoff structures, the game-theoretic optimal play should be exactly the same. Given that their GTO play should be identical and that mixed strategies degenerate to pure strategies, we get a correlation of 1, and so they should both cooperate 100% of the time. But the real world is messy, so we proceed with slightly more subtlety.)

## Causal Modeling 101

To better understand the dependencies and correlations between Alice and Bob's choices, we can frame the problem in the light of causal modeling à la Judea Pearl. In causal modeling, we represent the dependencies between variables using a directed acyclic graph (DAG), where nodes represent variables and directed edges represent causal relationships.

In the traditional analysis of the Prisoner's Dilemma, Alice and Bob are assumed to make their decisions independently. However, in reality, their decisions will definitely be influenced by common upstream factors. For example, Alice and Bob may have similar upbringings, values, experiences, and expectations about loyalty and cooperation that influence their decision-making. And if that's not compelling, they also have access to similar information about the situation, and have exactly the same payout structure, so there's a strong case that they will end up making similar decisions.

We can represent these upstream factors with a variable V. The causal model can be depicted as a fork, with V pointing into both decisions: V → A and V → B.

Here, V is a common cause that affects both Alice's and Bob's decisions. This creates a dependency between A and B.

Given the upstream factor V, the choices of Alice and Bob could be conditionally independent. However, if we do not observe V, the choices A and B become dependent: in the language of d-separation in causal graphs, A and B are d-separated given V but d-connected when V is unobserved.

Mathematically, this can be expressed as:

P(A,B)=∑vP(A∣V=v)P(B∣V=v)P(V=v)

This shows that the joint probability of A and B depends on their conditional probabilities given V and the distribution of V.
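To make this concrete, here's a small sketch with a hypothetical binary common cause V (the distributions are illustrative assumptions): conditioned on V, the choices are independent, yet marginally P(A,B) ≠ P(A)P(B).

```python
# Hypothetical binary common cause V; given V = v, Alice and Bob independently
# cooperate with the same probability p_coop[v] (illustrative numbers).
p_v = {0: 0.5, 1: 0.5}      # P(V = v)
p_coop = {0: 0.9, 1: 0.1}   # P(A_C | V = v) = P(B_C | V = v)

# P(A_C, B_C) = sum_v P(A_C | V = v) P(B_C | V = v) P(V = v)
p_joint = sum(p_coop[v] ** 2 * p_v[v] for v in p_v)

# Marginal: P(A_C) = sum_v P(A_C | V = v) P(V = v); P(B_C) is identical here.
p_a = sum(p_coop[v] * p_v[v] for v in p_v)

print(round(p_joint, 3), round(p_a * p_a, 3))  # 0.41 vs 0.25: marginally dependent
```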

## Independent Case (Optional)

Suppose A and B were truly independent. Here's the causal model we might associate with this hypothesis: two disconnected forks, V1 → A and V2 → B.

Here V1 and V2 are separate upstream factors that affect Alice's and Bob's decisions, respectively. This creates a scenario where A and B are conditionally independent given V1 and V2. Mathematically, this can be expressed as:

P(A,B)=∑v1∑v2P(A∣V1=v1)P(B∣V2=v2)P(V1=v1)P(V2=v2)

This equation shows that the joint probability of A and B depends on their individual probabilities given V1 and V2, and the distribution of V1 and V2.

To derive the familiar independence equation P(A,B)=P(A)P(B) from the more complicated-seeming equation above, we first write the marginal probabilities P(A) and P(B) by summing over the respective upstream variables:

P(A)=∑v1P(A∣V1=v1)P(V1=v1)

P(B)=∑v2P(B∣V2=v2)P(V2=v2)

Now we multiply P(A) and P(B) together:

P(A)P(B)=(∑v1P(A∣V1=v1)P(V1=v1))(∑v2P(B∣V2=v2)P(V2=v2))

Then we expand the product of the sums and notice that this expression is the same as the expression for the joint probability P(A,B) obtained above:

P(A)P(B)=∑v1∑v2P(A∣V1=v1)P(V1=v1)P(B∣V2=v2)P(V2=v2)
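This factorization can be checked numerically with hypothetical distributions for V1 and V2 (the numbers below are illustrative assumptions):

```python
import itertools

p_v1 = [0.3, 0.7]    # P(V1 = v1), illustrative
p_v2 = [0.6, 0.4]    # P(V2 = v2), illustrative
p_a_v1 = [0.9, 0.2]  # P(A_C | V1 = v1)
p_b_v2 = [0.8, 0.1]  # P(B_C | V2 = v2)

# Joint: P(A_C, B_C) = sum_{v1, v2} P(A_C|v1) P(B_C|v2) P(v1) P(v2)
joint = sum(p_a_v1[v1] * p_b_v2[v2] * p_v1[v1] * p_v2[v2]
            for v1, v2 in itertools.product(range(2), repeat=2))

# Product of marginals.
p_a = sum(p_a_v1[v] * p_v1[v] for v in range(2))
p_b = sum(p_b_v2[v] * p_v2[v] for v in range(2))

assert abs(joint - p_a * p_b) < 1e-12
print("P(A,B) = P(A)P(B): independent, as derived")
```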

Because the two expressions are identical, we've shown that P(A,B)=P(A)P(B) assuming that Alice's and Bob's decisions are influenced by separate, independent upstream variables V1 and V2. With this extra machinery in hand, we proceed into the heart of our analysis.

## Incorporating Dependence into Expected Utility

In *The Clone Case for Cooperation* section, I made the assumption that Alice and Bob deterministically make exactly the same choice. This time, let's relax that assumption and suppose that, because they're rational agents presented with identical payoff matrices, they'll conclude the same mixed strategies to be optimal. We're not yet sure what their cooperate/defect probabilities will be; we just suppose that their GTO policies will be the same given the same payout structures.

To model this formulation, we go back to the start and split A's expected utility based on her two possible actions.

E[UA(G)]=E[UA(G|AC)]p∗(AC)+E[UA(G|AD)]p∗(AD)

Then, we split each of those cases for each of B's possible choices.

E[UA(G)]=E[UA(G|AC,BC)]^pA(BC|AC)p∗(AC)+E[UA(G|AC,BD)]^pA(BD|AC)p∗(AC)+E[UA(G|AD,BC)]^pA(BC|AD)p∗(AD)+E[UA(G|AD,BD)]^pA(BD|AD)p∗(AD)

Now we substitute in Q,R,S,T and use p∗(AC)+p∗(AD)=1 and ^pA(BC|Ax)+^pA(BD|Ax)=1.

E[UA(G)]=^pA(BC|AC) p∗(AC) R+ (1−^pA(BC|AC)) p∗(AC) S+ ^pA(BC|AD) (1−p∗(AC)) T+ (1−^pA(BC|AD)) (1−p∗(AC)) Q

Now we need thoughtful ways of modeling ^pA(BC|AC) and ^pA(BC|AD). For Alice's estimate of Bob's probability of cooperating given that she cooperates, ^pA(BC|AC), we can suppose that this matches Alice's own probability of cooperating p∗(AC), because of our assumption that rational agents will come to the same mixed-strategy policies given identical payoff matrices. So ^pA(BC|AC)=p∗(AC), and similarly ^pA(BD|AD)=p∗(AD), which gives ^pA(BC|AD)=1−^pA(BD|AD)=1−p∗(AD)=p∗(AC).

This turns our E[UA(G)] into:

E[UA(G)]=p∗(AC) p∗(AC) R+ (1−p∗(AC)) p∗(AC) S+ p∗(AC) (1−p∗(AC)) T+ (1−p∗(AC)) (1−p∗(AC)) Q

Which simplifies to:

E[UA(G)]=p∗(AC)2 R+ p∗(AC) (1−p∗(AC)) S+ p∗(AC) (1−p∗(AC)) T+ (1−p∗(AC))2 Q

E[UA(G)]=p∗(AC)2 R+ (p∗(AC)−p∗(AC)2) S+ (p∗(AC)−p∗(AC)2) T+ (1−2p∗(AC)+p∗(AC)2) Q

E[UA(G)]=R p∗(AC)2+ S p∗(AC)−S p∗(AC)2+ T p∗(AC)−T p∗(AC)2+ Q−2 Q p∗(AC)+Q p∗(AC)2

E[UA(G)]=(R−S−T+Q) p∗(AC)2+(S+T−2Q) p∗(AC)+Q
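We can sanity-check this expansion by comparing the collected quadratic against the original four-term expression at several points (payoffs below are the toy values from the traditional analysis):

```python
S, Q, R, T = -3.0, -2.0, -1.0, 0.0  # toy payoffs with S < Q < R < T

def eu_four_term(p):
    # E[U_A(G)] = p^2 R + (1-p) p S + p (1-p) T + (1-p)^2 Q
    return p * p * R + (1 - p) * p * S + p * (1 - p) * T + (1 - p) ** 2 * Q

def eu_quadratic(p):
    # E[U_A(G)] = (R - S - T + Q) p^2 + (S + T - 2Q) p + Q
    return (R - S - T + Q) * p ** 2 + (S + T - 2 * Q) * p + Q

for i in range(11):
    p = i / 10
    assert abs(eu_four_term(p) - eu_quadratic(p)) < 1e-12

print("collected quadratic matches the four-term expansion")
```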

## Optimizing (Optional)

All that's left to do from here is to find the p∗(AC) which maximizes E[UA(G)], subject to the constraints that 0≤p∗(AC)≤1 and S<Q<R<T.

(The cases get slightly hairy, so this section is skippable.)

Unlike the previous case, where E[UA(G)] was linear in p∗(AC), this time we actually can optimize via the derivative method suggested earlier. To do so, we compute:

dE[UA(G)]/dp∗(AC) = 2(Q+R−S−T) p∗(AC) + (S+T−2Q) = 0

Because E[UA(G)] is a second-degree polynomial in p∗(AC), the parabola it sweeps out has its vertex at p∗(AC)=(2Q−S−T)/(2(Q+R−S−T)). While there's still some tidying up to do to determine which cases apply to which solutions, excitingly, the possible solution set is constrained to {0, (2Q−S−T)/(2(Q+R−S−T)), 1}. This vertex won't always be a maximum, and sometimes it will lie outside the valid probability range [0,1]. In either of these countercases, the optimal p∗(AC) will be either 0 or 1.

We now describe the cases where the vertex is a valid maximum.

For the vertex to be a maximum, the parabola must be concave. As we know, a parabola y=ax²+bx+c is concave when a<0; here that means Q+R−S−T<0, or Q+R<S+T.

For the vertex to be valid, it must lie within the domain 0≤p∗(AC)≤1. First, for 0≤p∗(AC)=(2Q−S−T)/(2(Q+R−S−T)), we note that because the parabola is concave the denominator 2(Q+R−S−T) is negative, so we need 0≥2Q−S−T ⟹ S+T≥2Q. Second, for p∗(AC)=(2Q−S−T)/(2(Q+R−S−T))≤1, multiplying both sides by the negative denominator flips the inequality, giving 2Q−S−T≥2(Q+R−S−T) ⟹ S+T≥2R. (Since R>Q, the condition S+T≥2R already implies both S+T>2Q and Q+R<2R≤S+T, so it is the binding constraint.)

We've found that when Q+R<S+T, 2Q≤S+T, and 2R≤S+T, the optimal cooperation fraction for Alice is p∗(AC)=(1/2)(Q−R)/(Q+R−S−T)+1/2.

We still have to sort out when p∗(AC) will be 1 or 0, and there's a bit of tidying up to do when the denominator 2(Q+R−S−T) is zero (which was actually the case in the first example in the *Traditional Analysis* section, where R=−1, S=−3, T=0, and Q=−2). To handle this indeterminate case, when R−S−T+Q=0 we take the second derivative of E[UA(G)]: d²E[UA(G)]/d(p∗(AC))² = 2(Q+R−S−T). Because Q+R−S−T=0 in this case, the second derivative is zero, implying that the expected utility curve is linear.

So, when R−S−T+Q=0, E[UA(G)]=(S+T−2Q) p∗(AC)+Q, a linear function of p∗(AC). In this edge case, we find that:

- if S+T−2Q>0, the slope is positive and p∗(AC)=1;
- if S+T−2Q<0, the slope is negative and p∗(AC)=0;
- if S+T−2Q=0, then E[UA(G)]=Q regardless, and any p∗(AC)∈[0,1] is optimal.

For the other cases, where we're deciding between 0 and 1, we just need to compare E[UA(G)] at p∗(AC)=0 with E[UA(G)] at p∗(AC)=1. That is, we compare (R−S−T+Q)·0²+(S+T−2Q)·0+Q = Q against (R−S−T+Q)·1²+(S+T−2Q)·1+Q = R.

That is, in these cases, if R>Q then p∗(AC)=1, and if Q>R then p∗(AC)=0.

Putting this all together:

Represented as one piecewise expression:

p∗(AC) =

- (1/2)(Q−R)/(Q+R−S−T) + 1/2, if Q+R<S+T and 2R≤S+T (which, since R>Q, implies 2Q<S+T);
- 1, if Q+R<S+T and S+T<2R;
- 0, if Q+R=S+T and S+T−2Q<0;
- any value in [0,1], if Q+R=S+T and S+T−2Q=0;
- 1, if Q+R=S+T and S+T−2Q>0;
- 0, if Q+R>S+T and R<Q;
- 0 or 1 (the endpoints tie), if Q+R>S+T and R=Q;
- 1, if Q+R>S+T and R>Q.
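Rather than trusting the case analysis alone, a brute-force sketch can recover the optimal p∗(AC) for any payoffs by scanning the quadratic directly; here it's applied to two illustrative payoff sets (the specific numbers are assumptions for demonstration):

```python
def optimal_coop_prob(S, Q, R, T, grid=10_001):
    """Brute-force argmax over p in [0, 1] of
    E[U_A(G)] = (R - S - T + Q) p^2 + (S + T - 2Q) p + Q."""
    eu = lambda p: (R - S - T + Q) * p ** 2 + (S + T - 2 * Q) * p + Q
    return max((i / (grid - 1) for i in range(grid)), key=eu)

# Concave case with a valid interior vertex: optimum near
# (2Q - S - T) / (2(Q + R - S - T)) = 4/7.
print(optimal_coop_prob(S=0, Q=1, R=2, T=10))    # ~0.5714

# The toy payoffs (R - S - T + Q = 0): linear with positive slope, so p* = 1.
print(optimal_coop_prob(S=-3, Q=-2, R=-1, T=0))  # 1.0
```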

At last, if we make the assumption that A and B will conclude the same mixed strategy fractions to be optimal, given that they have identical payoffs, we now have a complete policy for determining their choices of optimal p∗(AC) and p∗(BC).

## Interpretation

Let's interpret this solution a bit to make sure it accords with our intuitions. We start with the complicated-seeming expression p∗(AC)=(1/2)(Q−R)/(Q+R−S−T)+1/2, in the case where Q+R<S+T and 2R≤S+T.

This equation indicates that the optimal cooperation probability depends on the difference between the Quarrel and Reward payoffs, normalized by the difference between the players' payoffs when they perform the same action and when they perform different actions. The ratio (Q−R)/(Q+R−S−T) lies between −1 and 1, which bounds p∗(AC) between 0 and 1, as we would expect of a probability.

We continue to examine some important boundary conditions:

- If Q=R, the numerator Q−R is zero, so p∗(AC)=1/2: Alice cooperates half the time.
- If Q>R, the ratio (Q−R)/(Q+R−S−T) is negative, pulling p∗(AC) below 1/2, toward defection. (Remember, the denominator is negative in this case.)
- If R>Q, the ratio is positive, pulling p∗(AC) above 1/2, toward cooperation. (Again, the denominator is negative.)

This result seems reasonable: if the quarrel payoff Q is better than the reward for mutual cooperation R, Alice is more likely to defect; conversely, if R is better, Alice is more likely to cooperate.

Let's now try to apply this policy to our initial toy example. To see if our solution holds, we test the case when R=−1, S=−3, T=0, Q=−2.

|     | BC       | BD       |
|-----|----------|----------|
| AC  | (−1, −1) | (−3, 0)  |
| AD  | (0, −3)  | (−2, −2) |

First, we calculate R−S−T+Q=−1−(−3)−0+(−2)=0. Because R−S−T+Q=0, we use the indeterminate-case formula. We calculate S+T−2Q=−3+0−2(−2)=1. Finally, since S+T−2Q>0, the optimal policy is p∗(AC)=1. This aligns with the interpretation that Alice should always cooperate in this particular setup.

We can also confirm this computationally.
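For instance, a quick grid search over p∗(AC) (a minimal sketch of such a computational check) reproduces the result:

```python
R, S, T, Q = -1.0, -3.0, 0.0, -2.0  # the toy payoffs

def eu(p):
    # Same-mixed-strategy model:
    # E[U_A(G)] = p^2 R + (1-p) p S + p (1-p) T + (1-p)^2 Q
    return p * p * R + (1 - p) * p * S + p * (1 - p) * T + (1 - p) ** 2 * Q

best_p = max((i / 1000 for i in range(1001)), key=eu)
print(best_p, eu(best_p))  # 1.0 -1.0 -> always cooperate, earning R
```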

Overall, the analysis and derived equations seem to correctly capture the dependence and provide a policy for determining the optimal cooperation probability in this correlated Prisoner's Dilemma. From what I can tell based on the tests I've performed, the solution works as intended and aligns with my theoretical expectations from the payoffs.

## Incorporating Correlation into Expected Utility

If you're not convinced by the case where we assume that two rational agents make exactly the same decision given identical payoffs, or by the extended case where the two rational agents converge on the same optimal mixed strategy, I present a third generalization which introduces noise into our model.

To model the dependence structure between the agents' decisions alongside some natural uncertainty, let's introduce a correlation parameter ρ that captures the degree of correlation between Alice's and Bob's choices due to the common upstream factors V. We can think of ρ as a measure of how likely it is that if Alice cooperates, Bob will also cooperate, and similarly for defection. This parameter ranges from −1 to 1, where:

- ρ=1 means perfect correlation: Alice and Bob always make the same choice;
- ρ=0 means independence: Alice's choice tells us nothing about Bob's;
- ρ=−1 means perfect anti-correlation: they always make opposite choices.

When ρ=1, this model degenerates to our clone case. With this in mind, let's redefine the probabilities ^pA(BC|AC), ^pA(BD|AC), ^pA(BC|AD), and ^pA(BD|AD) to incorporate ρ.

We assume the following relationships for the conditional probabilities based on ρ (the exact parameterization is a modeling choice; a simple linear interpolation between the independent case and the clone case, for ρ≥0, is one natural option):

^pA(BC|AC) = p∗(BC) + ρ (1−p∗(BC))

^pA(BD|AC) = (1−ρ)(1−p∗(BC))

^pA(BC|AD) = (1−ρ) p∗(BC)

^pA(BD|AD) = (1−p∗(BC)) + ρ p∗(BC)

These expressions ensure that the probabilities are consistent with the correlation parameter: each conditional pair sums to 1, ρ=0 recovers independence, and ρ=1 recovers the clone case.
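One concrete way to realize ρ-consistent conditionals (a sketch that assumes a simple linear interpolation between the independent case at ρ=0 and the clone case at ρ=1, restricted to ρ≥0, and not necessarily the exact parameterization intended) is:

```python
def conditional_probs(p_bc, rho):
    """Alice's estimates of Bob's choice given her own, under an assumed
    linear-interpolation model for rho in [0, 1]:
      rho = 0 -> independence: p-hat(B_C | A_x) = p(B_C)
      rho = 1 -> clone case:   p-hat(B_C | A_C) = 1, p-hat(B_C | A_D) = 0."""
    p_bc_given_ac = p_bc + rho * (1 - p_bc)   # pulled toward certain cooperation
    p_bc_given_ad = (1 - rho) * p_bc          # pulled toward certain defection
    return p_bc_given_ac, p_bc_given_ad

ac, ad = conditional_probs(0.3, rho=1.0)
print(round(ac, 10), round(ad, 10))  # 1.0 0.0 -> the clone case is recovered
```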

We can now substitute these probabilities into our expected utility equation for Alice from earlier:

E[UA(G)]=(E[UA(G|AC,BC)]^pA(BC|AC)+E[UA(G|AC,BD)]^pA(BD|AC))p∗(AC)+(E[UA(G|AD,BC)]^pA(BC|AD)+E[UA(G|AD,BD)]^pA(BD|AD))p∗(AD)

E[UA(G)]=(E[U