Epistemic status: I consider everything written here pretty obvious, but I haven't seen it anywhere else. It would be cool if you could provide sources on the topic!
Reason to write: I once saw a pretty confused discussion on Twitter about how multiple superintelligences will predictably end up in a Defect-Defect equilibrium, and I suspect that discussion would have been better if I could have thrown in this toy example.
PrudentBot cooperates with an agent whose source code is known if that agent cooperates with PrudentBot and doesn't cooperate with DefectBot. It's unexploitable and doesn't leave an outrageous amount of utility on the table. But can we do better? How can we formalize the notion of "both agents understand what program equilibrium is, but they predictably end up in a Defect-Cooperate situation because one agent is vastly smarter"?
Let's start with a toy model. Imagine that you are going to play against either PrudentBot or CooperateBot, with probabilities p and 1−p respectively. The payoff matrix is 5;5, 10;0, 2;2 (mutual cooperation, defection against a cooperator, mutual defection). The bots can't play with you directly, but you can write a program to play. Your goal is to maximize expected value.
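If you cooperate, you are always going to get 5, so you should defect only if defecting gets you more than 5 in expectation. Defecting yields 2 against PrudentBot and 10 against CooperateBot, so the condition is 2p + 10(1−p) > 5, that is, p < 5/8.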
Thus, our UncertainBot should take the probability distribution, defect if the probability of encountering PrudentBot is less than 5/8, and cooperate otherwise. The same goes for a mixture of PrudentBot and DefectBot: you are guaranteed to get 2 if you defect, so you should cooperate only if 5p + 0·(1−p) > 2, that is, p > 2/5.
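The two thresholds (5/8 against a PrudentBot/CooperateBot mixture, 2/5 against a PrudentBot/DefectBot mixture) can be checked with a small expected-value calculation. This is only a sketch under loud assumptions: there is no source-code inspection here, PrudentBot is crudely modeled as mirroring your move, and the names `PAYOFF`, `expected_value`, and `uncertain_bot` are made up for illustration.

```python
# Payoff to "you" for (your_move, opponent_move), read off the post's
# matrix 5;5, 10;0, 2;2. The 0 for a cooperator facing a defector comes
# from the "10;0" entry.
PAYOFF = {
    ("C", "C"): 5,
    ("D", "C"): 10,
    ("C", "D"): 0,
    ("D", "D"): 2,
}

# Opponents are modeled as functions from your move to their move.
# ASSUMPTION: PrudentBot is reduced to "cooperate iff you cooperate";
# the real bot reasons about your source code instead.
def prudent_bot(your_move):
    return your_move

def cooperate_bot(your_move):
    return "C"

def defect_bot(your_move):
    return "D"

def expected_value(your_move, mixture):
    """mixture: list of (probability, opponent) pairs summing to 1."""
    return sum(prob * PAYOFF[(your_move, opponent(your_move))]
               for prob, opponent in mixture)

def uncertain_bot(mixture):
    """Pick the move with the higher expected payoff (ties go to cooperation)."""
    ev_cooperate = expected_value("C", mixture)
    ev_defect = expected_value("D", mixture)
    return "C" if ev_cooperate >= ev_defect else "D"
```

For example, `uncertain_bot([(0.5, prudent_bot), (0.5, cooperate_bot)])` defects because 0.5 is below 5/8, while `uncertain_bot([(0.5, prudent_bot), (0.5, defect_bot)])` cooperates because 0.5 is above 2/5.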
Can we invent a better version of DefectBot? We can imagine TraitorBot, which takes UncertainBot's state of beliefs, predicts whether it can get away with defection, and cooperates otherwise. Given the previous analysis of the PrudentBot and DefectBot mixture, it's clear that TraitorBot defects if the probability of PrudentBot is higher than 2/5 and cooperates otherwise, yielding utility no lower than that of Cooperate;Cooperate.
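A literal reading of the TraitorBot rule can be sketched as follows. Assumptions: UncertainBot's beliefs are collapsed into a single number `p_prudent` (its credence that the opponent is PrudentBot rather than DefectBot), TraitorBot can read that number exactly, and the function names are hypothetical.

```python
from fractions import Fraction

# Threshold from the PrudentBot/DefectBot analysis: cooperating beats the
# guaranteed 2 from defecting only when 5 * p > 2.
COOPERATION_THRESHOLD = Fraction(2, 5)

def uncertain_bot_move(p_prudent):
    """UncertainBot's move given its credence that the opponent is PrudentBot."""
    return "C" if p_prudent > COOPERATION_THRESHOLD else "D"

def traitor_bot_move(p_prudent):
    """Defect exactly when UncertainBot is predicted to cooperate anyway.

    ASSUMPTION: TraitorBot has perfect access to UncertainBot's beliefs.
    """
    predicted = uncertain_bot_move(p_prudent)
    return "D" if predicted == "C" else "C"
```

With `p_prudent = Fraction(1, 2)` UncertainBot cooperates and TraitorBot defects; with `p_prudent = Fraction(1, 5)` TraitorBot falls back to cooperating, as the rule in the text prescribes.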
Such a setup provides an amazing number of possibilities to explore.
Possibilities to explore how defection can happen between sufficiently smart agents:
Important theoretical points:
Ideally, I would like to have a theory of deception in the Prisoner's Dilemma that shows under which conditions smart agents can get away with defection against less smart agents, and whether we can prevent such conditions from emerging in the first place.
With randomness, including logical uncertainty, (C,C) is not a privileged point of the Pareto frontier. All points on lines between (C,C) and (C,D), and between (C,C) and (D,C), are fair game. This turns the problem into bargaining over a particular point, with the threat of damaging Pareto efficiency. The points in the convex hull of pure outcomes can be thought of as contracts. If both players agree on the contract of 0.7(C,C)+0.3(C,D), that's still better than a lot of less coordinated alternatives.
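To make the contract idea concrete, here is a minimal sketch that computes both players' expected payoffs for a mixture of pure outcomes. It assumes the post's payoff matrix (5;5, 10;0, 2;2) and reads (C,D) as "first player cooperates, second defects"; `PAYOFFS` and `contract_payoffs` are illustrative names.

```python
from fractions import Fraction

# (first player's payoff, second player's payoff) for each pure outcome.
# ASSUMPTION: the defector gets 10 and the cooperator gets 0, per "10;0".
PAYOFFS = {
    ("C", "C"): (5, 5),
    ("C", "D"): (0, 10),
    ("D", "C"): (10, 0),
    ("D", "D"): (2, 2),
}

def contract_payoffs(contract):
    """Expected payoffs of a contract.

    contract: dict mapping a pure outcome such as ("C", "D") to its
    probability weight; the weights must sum to 1.
    """
    if sum(contract.values()) != 1:
        raise ValueError("contract weights must sum to 1")
    first = sum(weight * PAYOFFS[outcome][0] for outcome, weight in contract.items())
    second = sum(weight * PAYOFFS[outcome][1] for outcome, weight in contract.items())
    return first, second

# The contract 0.7(C,C) + 0.3(C,D), written with exact fractions.
example = {("C", "C"): Fraction(7, 10), ("C", "D"): Fraction(3, 10)}
first, second = contract_payoffs(example)
```

Under these assumptions the first player gets 7/2 and the second gets 13/2, so both do better than the 2;2 of mutual defection even though the split is uneven.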