@Jonah Wilberg's post on the Evolutionary One-shot Prisoner's Dilemma explains how the Moloch-like equilibrium of mutual betrayal becomes the standard when agents who can only cooperate or defect interact with each other and those who receive greater payoffs reproduce. That post had a recent follow-up in which he invoked the Goddess of Everything Else, who somehow allows the agents to receive bigger payoffs from mutual coordination. However, the Goddess of Everything Else has two fairly natural ways to act: long-term interactions and acausal trade-like interactions with one's kin.
Evolution of the Iterated Prisoner's Dilemma
It is natural to modify the model in which Moloch emerges from the Evolutionary Prisoner's Dilemma. Suppose that the agents play $N$ turns of the Iterated Prisoner's Dilemma, and each of them precommits to cooperate on the very first turn and then imitate what the other player did on the previous turn, unless the number of turns left is at most $m$, in which case the agent always defects. Suppose also that an agent receives $C$ when both cooperate, $d$ when both defect, $c$ when it cooperates and the other agent defects, and $D$ when it defects and the other agent cooperates (so that $D>C>d>c$, as usual in the Prisoner's Dilemma). Denote the fraction of agents who precommit to defect during the last $m$ turns by $x_m$. Suppose that two agents who precommitted to defect during the last $m_1$ and $m_2\ge m_1$ turns meet each other. If $m_1=m_2$, then they cooperate for the first $N-m_1$ turns and defect for the rest, each receiving a reward of $C(N-m_1)+dm_1$. If $m_1<m_2$, then both agents cooperate for the first $N-m_2$ turns, the second agent defects on the next turn, and both agents defect on all turns after that. As a result, the second agent receives $C(N-m_2)+d(m_2-1)+D$, while the first agent receives $C(N-m_2)+d(m_2-1)+c$.
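Thus, if two agents who precommitted to defect during the last $m$ and $n$ turns meet, then the first agent's reward is
$$R(m,n)=C\,(N-\max(m,n))+d\,(\max(m,n)-1)+\begin{cases}d, & m=n\\ c, & m<n\\ D, & m>n\end{cases}$$
Alternatively, one could use the shifted reward $\tilde R(m,n)=R(m,n)-CN$, which differs from $R$ only by a constant:
$$\tilde R(m,n)=(d-C)\max(m,n)+\begin{cases}0, & m=n\\ c-d, & m<n\\ D-d, & m>n\end{cases}$$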
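The closed form for the reward can be checked against a direct simulation of the game. The sketch below is only an illustration: the function names and the numerical payoffs are assumptions chosen for the example and do not come from the post, and "turns left" is assumed to count the current turn.

```python
def simulate(m_a, m_b, N, C, d, c, D):
    """Play N turns between agents who defect in their last m_a / m_b turns.

    Returns the total rewards (reward_a, reward_b).
    """
    def payoff(me, other):
        # True means "cooperate", False means "defect".
        if me:
            return C if other else c
        return D if other else d

    total_a = total_b = 0
    prev_a = prev_b = True  # nothing to retaliate against before turn 1
    for turn in range(1, N + 1):
        turns_left = N - turn + 1  # assumption: counts the current turn
        # Imitate the opponent's previous move unless inside the last m turns.
        act_a = prev_b and turns_left > m_a
        act_b = prev_a and turns_left > m_b
        total_a += payoff(act_a, act_b)
        total_b += payoff(act_b, act_a)
        prev_a, prev_b = act_a, act_b
    return total_a, total_b


def closed_form(m, n, N, C, d, c, D):
    """R(m, n): reward of the agent with threshold m against threshold n."""
    longest = max(m, n)
    last = d if m == n else (c if m < n else D)
    return C * (N - longest) + d * (longest - 1) + last


# Illustrative payoffs with D > C > d > c (assumed, not from the post).
N, C, d, c, D = 10, 3, 1, 0, 5
mismatches = [
    (m, n)
    for m in range(N + 1)
    for n in range(N + 1)
    if simulate(m, n, N, C, d, c, D)
    != (closed_form(m, n, N, C, d, c, D), closed_form(n, m, N, C, d, c, D))
]
print(mismatches)
```

Any pair `(m, n)` listed in `mismatches` would indicate a disagreement between the simulation and the formula for these particular payoffs.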
Consider the difference in rewards between Agent A, who defects during the last $m$ turns, and Agent B, who defects during the last $m+1$ turns. An encounter with an agent who defects during the last $m+2$ or more turns gives both of them the same reward. An encounter with an agent who defects during the last $m-1$ or fewer turns has A win $C-d$ more than B. An encounter with an agent who defects during the last $m$ turns has B win $D-C$ more; an encounter with an agent who defects during the last $m+1$ turns has B win $d-c$ more. Defining $p_m:=x_0+\dots+x_{m-1}$, we find that on average A wins $(C-d)p_m+(C-D)x_m+(c-d)x_{m+1}$ more than B. This quantity is positive when $p_m$ is sufficiently large, so cooperating for one more turn is beneficial when enough of the population is more honest than the agent, but not for the few most honest agents, whose $p_m$ is small.
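The averaged advantage of A over B can also be computed numerically, both from the formula and directly from the reward function. This is a minimal sketch under assumed inputs: the payoffs, the distribution `x`, and all function names are hypothetical, and exact fractions are used only to avoid rounding issues.

```python
from fractions import Fraction


def reward(m, n, N, C, d, c, D):
    """R(m, n) from the text: reward of threshold m against threshold n."""
    longest = max(m, n)
    last = d if m == n else (c if m < n else D)
    return C * (N - longest) + d * (longest - 1) + last


def advantage_direct(m, x, N, C, d, c, D):
    """Average of R(m, n) - R(m + 1, n) over opponents n drawn from x."""
    return sum(
        x_n * (reward(m, n, N, C, d, c, D) - reward(m + 1, n, N, C, d, c, D))
        for n, x_n in enumerate(x)
    )


def advantage_formula(m, x, C, d, c, D):
    """(C - d) p_m + (C - D) x_m + (c - d) x_{m+1}, with p_m = x_0 + ... + x_{m-1}."""
    p_m = sum(x[:m])
    x_m = x[m] if m < len(x) else 0
    x_m1 = x[m + 1] if m + 1 < len(x) else 0
    return (C - d) * p_m + (C - D) * x_m + (c - d) * x_m1


# Illustrative payoffs and population (assumed, not from the post).
N, C, d, c, D = 10, 3, 1, 0, 5
x = [Fraction(k, 20) for k in (2, 4, 6, 5, 3)]  # x_0 .. x_4, sums to 1
advantages = [advantage_formula(m, x, C, d, c, D) for m in range(len(x))]
for m, value in enumerate(advantages):
    print(m, value, advantage_direct(m, x, N, C, d, c, D))
```

In this example the advantage is negative for the most honest agents (small `m`) and positive for the agents with the largest `m`, in line with the remark about $p_m$.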
Were the evolutionary model to hold perfectly, deception would establish itself at a logarithmically slow rate, since at any step $m$ the amount of time during which $p_m$ was close to 1 and cooperation was beneficial would have to be comparable with the amount of time during which $x_m$ or $x_{m+1}$ was large enough for the defectors to benefit.
How distortions affect the IPD's evolution
The next modification of the model is rapid diffusion or low situational awareness. Suppose that instead of precommitting to defect during the last $m$ turns, the agent precommits to set a number $n$ equal to $m$, flip $2k$ fair coins, increase $n$ by $1/2$ for every tail and decrease it by $1/2$ for every head, and then defect during the last $n$ turns. Call such an agent $k$-quasi-precommitted to defect during the last $m$ turns. Such an agent ends up with the number $n$ with probability $2^{-2k}\binom{2k}{n-m+k}$. Therefore, the probability $X_n$ of encountering an agent who ends up defecting during the last $n$ turns unless forced to retaliate earlier is
$$X_n=\sum_m 2^{-2k}\binom{2k}{n-m+k}\,x_m\le 2^{-2k}\binom{2k}{k}.$$
For a large enough value of $k$, the analogue of $p_m$, that is, the probability of encountering an agent who defects during fewer than the last $m$ turns unless forced to retaliate earlier, would rise from 0 to 1 slowly as $m$ grows, and the values of $X_m$ and $X_{m+1}$ would be small, making the agents unlikely to receive big benefits from defection.
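The smoothing effect of the coin flips can be illustrated by computing $X_n$ for a sample population. The sketch below is an assumption-laden example: the population `x` and the function names are made up, and, as in the formula, the resulting $n$ is not clipped to the range from 0 to $N$.

```python
from fractions import Fraction
from math import comb


def smoothed(x, k):
    """Return {n: X_n} with X_n = sum_m 2^(-2k) * binom(2k, n - m + k) * x[m]."""
    X = {}
    for m, x_m in enumerate(x):
        for tails in range(2 * k + 1):
            n = m + tails - k  # each tail adds 1/2, each head subtracts 1/2
            weight = Fraction(comb(2 * k, tails), 4 ** k)
            X[n] = X.get(n, 0) + weight * x_m
    return X


def bound(k):
    """Upper bound 2^(-2k) * binom(2k, k) on every X_n."""
    return Fraction(comb(2 * k, k), 4 ** k)


# Hypothetical population: a quarter of the agents at each of m = 0, 1, 2, 3.
x = [Fraction(1, 4)] * 4
for k in (1, 4, 16):
    X = smoothed(x, k)
    print(k, float(max(X.values())), float(bound(k)))
```

As $k$ grows, both the largest $X_n$ and the bound shrink, which is the sense in which $X_m$ and $X_{m+1}$ become small.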
Alternatively, the agents can mutate, changing the value of $m$ by a large random number during reproduction, with the same result of making $x_m$ and $x_{m+1}$ small and $p_m$ change slowly. Once $p_m$ exceeds $1/3$ while $x_m$ and $x_{m+1}$ are sufficiently small, the value of coordination at step $N-m$, which is $(C-d)p_m+(C-D)x_m+(c-d)x_{m+1}$, exceeds $(C-d)/6$, and all agents who coordinate at the median level or lower face severe pressure to coordinate compared to the agents with $p_m\in[1/4,1/3]$, ensuring evolution towards coordination and not against it.
Interaction with Kin
An alternative source of cooperation is acausal trade-like[1] interaction with one's kin. Suppose that an agent simply cooperates with probability $p_c$, and that an agent who gets to replicate always produces two descendants whose probabilities of cooperating are $p_c+\epsilon$ and $p_c-\epsilon$. In a finite-dimensional setup, each agent's descendants after at most $k$ generations will not have probabilities of cooperating bigger than $p_c+k\epsilon$. Then wholesale deception would cause the group of descendants to lose $C-d$ of value per kin interaction while gaining at most $D-C$ or $d-c$ of value per non-kin interaction. Were the interactions with one's kin to severely outnumber the interactions with non-kin, the group would lose value and have a smaller chance of reproduction, meaning that patches of deception cannot grow big even in a cooperator-filled environment. On the other hand, patches of uncertain coordination would be able to outgrow the defectors if the episodes where one agent cooperates and the other defects are beneficial[2] for the group as a whole: if $(D+c)/2>d$, then for $\epsilon\ll p\ll 1$ a patch of agents who cooperate with probability $p$ would start outgrowing its defecting neighbors, since the patch would lose at most $p(d-c)$ per non-kin interaction while gaining about $2p((D+c)-2d)+2p^2(C-d)$ per kin interaction compared to the counterfactual of the patch being filled with defectors.
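The gain of a patch of probabilistic cooperators over a patch of pure defectors can be computed exactly for one kin interaction and compared with the expression in the text. This is a sketch under assumptions: the payoffs and function names are illustrative, both kin are assumed to cooperate independently with the same probability $p$, and the gain is summed over both participants. The exact probability that exactly one of the two cooperates is $2p(1-p)$ rather than $2p$, so the two expressions differ by $2p^2((D+c)-2d)$, which is negligible for $p\ll 1$.

```python
from fractions import Fraction


def kin_gain_exact(p, C, d, c, D):
    """Joint payoff of two kin cooperating with probability p, minus 2d."""
    both = p * p              # both cooperate: joint payoff 2C
    one = 2 * p * (1 - p)     # exactly one cooperates: joint payoff D + c
    none = (1 - p) ** 2       # both defect: joint payoff 2d
    return both * 2 * C + one * (D + c) + none * 2 * d - 2 * d


def kin_gain_in_text(p, C, d, c, D):
    """The expression 2p((D + c) - 2d) + 2p^2 (C - d) from the text."""
    return 2 * p * ((D + c) - 2 * d) + 2 * p * p * (C - d)


def non_kin_loss_bound(p, d, c):
    """Upper bound p(d - c) on the loss per non-kin interaction."""
    return p * (d - c)


# Illustrative payoffs with (D + c) / 2 > d (assumed, not from the post).
C, d, c, D = 3, 1, 0, 5
p = Fraction(1, 100)
exact = kin_gain_exact(p, C, d, c, D)
approx = kin_gain_in_text(p, C, d, c, D)
print(float(exact), float(approx), float(non_kin_loss_bound(p, d, c)))
```

With these numbers the gain per kin interaction exceeds the bound on the loss per non-kin interaction, so the patch comes out ahead as long as kin interactions are not too rare.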
Conditions of Emergent Coordination
Similar considerations imply that in practice coordination emerges either from acausal trade-like interactions with one's kin, provided that these outnumber interactions with the rest of the world[3], or from long-term interactions combined with the agents' ability to retaliate against each other and to mutate, so that new, holier agents constantly emerge and prevent the bulk of the sinners from degrading further. What undermines coordination is the ability to attack each other or to exploit a common resource without risk of retaliation, and the agents' inability to relate to each other.
[1] Unlike actual acausal trade, where two agents think and each of them decides to coordinate in the hope that the counterpart coordinates as well, the mechanism described here doesn't involve anything like moral reflection or decisions.
[2] However, some variants of the Prisoner's Dilemma have the cooperator receive a far more severe punishment than what the group would receive if both defected.
[3] A similar mechanism related to kinship and/or mutual retaliation might also explain nostalgia for small-scale environments where everyone knows everyone else, unlike large-scale ones (e.g. modern cities) where peer groups are far less stable.