Thanks to Sylvester Kollin and Nicolas Macé for fruitful discussions, as well as for benevolently teaching me some of the maths/game theory I used (mainly in the Appendix).
Thanks to Caspar Oesterheld, Johannes Treutlein, Lukas Gloor, Matīss Apinis, and Antonin Broi for very helpful feedback, suggestions, and discussions. Credits to Johannes Treutlein and Oscar Delaney for spotting a few crucial math and/or notation errors in earlier drafts.
Most of the work put into this post has been funded by CERI (now ERA) through their summer research fellowship. I’ve also benefited quite a lot from being welcome to work from the office of the Center on Long-Term Risk. I’m grateful to those two organizations, to their respective teams, as well as to all their summer research fellows with whom I had a very nice and productive time.
All assumptions/claims are my own. No organization or individual other than me is responsible for my potential inaccuracies, mistakes, or omissions.
It has been argued that, if two very similar agents follow decision theories allowing for superrationality (e.g., EDT and FDT), they would cooperate in a prisoner’s dilemma (PD) (see e.g., Oesterheld 2017). But how similar do they need to be exactly? In what way? This post is an attempt at addressing these questions. This is, I believe, particularly relevant to the work of the Center on Long-Term Risk on acausal reasoning and the foundations of rational agency (see section 7 of their research agenda).
I’d be very interested in criticism/comments/feedback. This is the main reason why I’m posting this here. :)
Consider this traditional PD between two agents:
| Alice \ Bob | C | D |
|---|---|---|
| C | 3, 3 | 0, 5 |
| D | 5, 0 | 1, 1 |
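We can compute the expected payoffs of Alice and Bob ($U_{Alice}$ and $U_{Bob}$) as a function of $p$ (the probability that Alice plays C) and $q$ (the probability that Bob plays C):

$$U_{Alice}(p, q) = 4q - pq - p + 1$$

$$U_{Bob}(p, q) = 4p - pq - q + 1$$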
Now, Alice wants to find $p^*$ (the optimal $p$, i.e., the $p$ that will maximize her payoff). Symmetrically, Bob wants to find $q^*$. They do some quick math and find that $p^* = q^* = 0$, i.e., they should both play D. This is the unique Nash equilibrium of this game.
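The "quick math" can be checked numerically. Here is a minimal sketch, assuming Alice is the row player in the payoff table above. The names `PAYOFFS`, `expected_payoffs`, and `best_response_alice` are purely illustrative, and the grid search is just one convenient way to find the maximum, not something taken from the post.

```python
# Payoffs (Alice, Bob), indexed by (Alice's move, Bob's move).
# Assumption: Alice is the row player in the table above.
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def expected_payoffs(p, q):
    """Expected (Alice, Bob) payoffs when Alice plays C with probability p
    and Bob plays C with probability q, chosen independently."""
    probs = {
        ("C", "C"): p * q,
        ("C", "D"): p * (1 - q),
        ("D", "C"): (1 - p) * q,
        ("D", "D"): (1 - p) * (1 - q),
    }
    u_alice = sum(probs[k] * PAYOFFS[k][0] for k in PAYOFFS)
    u_bob = sum(probs[k] * PAYOFFS[k][1] for k in PAYOFFS)
    return u_alice, u_bob


def best_response_alice(q, steps=100):
    """Grid-search the p in [0, 1] that maximizes Alice's payoff for a fixed q."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda p: expected_payoffs(p, q)[0])


if __name__ == "__main__":
    for q in (0.0, 0.5, 1.0):
        print(f"q = {q}: Alice's best p = {best_response_alice(q)}")
```

Whatever value of q is held fixed, the best p found is 0, which is what makes D a dominant strategy here.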
Now, say Alice and Bob are perfect copies. How does it change the game presented above? We still have the same payoffs:

$$U_{Alice}(p, q) = 4q - pq - p + 1$$

$$U_{Bob}(p, q) = 4p - pq - q + 1$$

However, this time, $p = q$. Whatever one does, that’s evidence that the other does the exact same. They are decision-entangled.
What does that mean for the payoff functions of Alice and Bob? Well, decision theorists disagree. Let’s see what the two most popular decision theories (CDT and EDT) say, according to my (naive?) understanding:
- EDT: “Alice should replace q with p in her formula. Symmetrically, Bob should replace p with q in his.”
- CDT: “Alice should hold q fixed. Same for Bob and p. They should behave as if they could change their action unilaterally through some kind of magic.” Therefore, CDT computes the dominant strategy from the original payoffs, ignoring the fact that p=q.
For CDT, $p^* = q^* = 0$, just like in the Normal PD above. For EDT, however, we now get $p^* = q^* = 1$ (Alice and Bob should both cooperate). EDT is one of the decision theories that allow for superrationality: cooperation via entangled decision-making (Hofstadter 1983), or basically “factoring in the possibility that $p = q$”, as I understand it. So the difference between the Normal PD and the Perfect-copy PD matters only if both players have at least some credence in superrationality.
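As a worked check of the two results above, here is a short derivation for Alice (Bob’s case is symmetric). It assumes only the payoff table, with Alice as the row player; the expanded formula is reconstructed from that table.

```latex
\begin{align*}
% Expected payoff from the table: (C,C)=3, (C,D)=0, (D,C)=5, (D,D)=1 for Alice
U_{Alice}(p, q) &= 3pq + 0 \cdot p(1-q) + 5(1-p)q + 1 \cdot (1-p)(1-q) \\
                &= 4q - pq - p + 1 \\[6pt]
% CDT: hold q fixed and differentiate with respect to p
\frac{\partial U_{Alice}}{\partial p} &= -(1 + q) < 0
  \quad\Rightarrow\quad p^* = 0 \\[6pt]
% EDT: impose p = q before maximizing
U_{Alice}(p, p) &= 4p - p^2 - p + 1 = 1 + 3p - p^2 \\
\frac{d U_{Alice}}{d p} &= 3 - 2p > 0 \ \text{ for } p \in [0, 1]
  \quad\Rightarrow\quad p^* = 1, \quad U_{Alice}(1, 1) = 3
\end{align*}
```

So holding $q$ fixed makes defection optimal for any $q$, while imposing $p = q$ first makes cooperation optimal, with a payoff of 3 instead of 1.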
Formalizing the Conditions for Superrationality-motivated Cooperation in a one-shot PD
Given the above, we can hypothesize that Alice will (superrationally) cooperate with Bob in a one-shot PD iff:
- She has a significant credence in the possibility that they’re playing a Perfect-copy PD – as opposed to a Normal PD – (i.e., that they are decision-entangled), and
- She has a significant credence in superrationality, such that she takes into account this decision-entanglement when she does the math. (This is assuming Alice might have decision-theoretic uncertainty.)
We then get those two neat conditions for cooperation:
- Significant credence in decision-entanglement
- Significant credence in superrationality
But what makes two agents decision-entangled?
Conditions for decision-entanglement
How does/should Alice form her credence in decision-entanglement? What are the required elements for two agents to have entangled decisions in a particular game?
First of all, you obviously need them to have compatible decision theories (DTs). Here’s (I think) a somewhat representative instance of what happens if you don’t:
Now, replace Hilary with some EDT players, such that the compatible DTs condition is met. Does that mean the players have entangled decisions? No! Here’s an example proving that this doesn’t suffice:
Although they both follow EDT, their beliefs regarding decision-entanglement diverge. In addition to “I believe we have compatible DTs”, Arif thinks there are other requirements that are not met, here.
To identify what those requirements are, it is important to clarify what outputs the players’ beliefs: their epistemic algorithms (which themselves take some pieces of evidence as inputs).
It then becomes clear what the requirements are besides “I believe we have compatible DTs” for Arif to believe there is decision-entanglement:
- “I believe we have entangled epistemic algorithms (or that there is epistemic-entanglement, for short)”, and
- “I believe we have been exposed to compatible pieces of evidence”.
Since rational Arif doesn’t believe he’s decision-entangled with John, that means he must think that at least one of the two latter statements is false.
Now, what is the evidence John and Arif should be looking for?
First, they want to compare their DTs to see if they’re compatible, as well as their epistemics to see if they’re entangled.
Then, if they have compatible DTs and entangled epistemics, they also need common knowledge of that fact, which means that they need to somehow check whether they have been exposed to compatible evidence regarding those two things, and to check that they have been exposed to compatible evidence regarding their exposure to evidence, and so on ad infinitum. If they don’t verify all of this, they would end up with non-entangled beliefs and non-entangled decisions.
So here is how, I tentatively think, one-shot-PD players should reason:
Recall our conditions for superrationality-motivated cooperation in a one-shot PD:
- Significant credence in decision-entanglement
- Significant credence in superrationality
Assuming God doesn’t tell Alice whether her counterpart is decision-entangled with her, Alice would have a significant credence regarding #1 iff:
- Significant credence in compatible DTs
- Significant credence in epistemic-entanglement
- Significant credence in the possibility that they have been exposed to some compatible pieces of evidence
Therefore, (again, assuming God doesn’t tell her whether her counterpart is decision-entangled with her) Alice would cooperate iff she has:
1. Significant credence in decision-entanglement
1.1 Significant credence in compatible DTs
1.2 Significant credence in epistemic-entanglement
1.3 Significant credence in the possibility that they have been exposed to some compatible pieces of evidence
2. Significant credence in superrationality
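To illustrate how “significant” can depend on the payoffs, here is a rough sketch for condition 1 only. It rests on simplifying assumptions that are mine and are not necessarily those of the Appendix: Alice fully accepts superrationality (condition 2 is met with certainty), entanglement is binary and Alice assigns it credence `c`, and if the agents are not entangled, Bob cooperates with some independent probability `q`. All function names are hypothetical.

```python
def eu_cooperate(c, q):
    """Alice's expected payoff from C.
    With credence c she is entangled (Bob also plays C, payoff 3);
    otherwise Bob plays C with independent probability q (payoff 3 or 0)."""
    return c * 3 + (1 - c) * (3 * q + 0 * (1 - q))


def eu_defect(c, q):
    """Alice's expected payoff from D.
    With credence c she is entangled (Bob also plays D, payoff 1);
    otherwise Bob plays C with independent probability q (payoff 5 or 1)."""
    return c * 1 + (1 - c) * (5 * q + 1 * (1 - q))


def credence_threshold(q):
    """Credence c at which Alice is indifferent, obtained by solving
    eu_cooperate(c, q) == eu_defect(c, q) for c (specific to these payoffs)."""
    return (1 + q) / (3 + q)


def alice_cooperates(c, q=0.0):
    """True iff cooperating has strictly higher expected payoff for Alice."""
    return eu_cooperate(c, q) > eu_defect(c, q)


if __name__ == "__main__":
    for q in (0.0, 0.5, 1.0):
        print(f"q = {q}: cooperate iff c > {credence_threshold(q):.3f}")
```

Under these assumptions, and with these particular payoffs, the threshold sits between 1/3 (an unentangled Bob always defects) and 1/2 (an unentangled Bob always cooperates). Different payoffs would move it.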
Remaining open questions
- In our Normal PD and Perfect-copy PD games, we took two extreme examples where the credences were maximally low and maximally high, respectively. But what if Alice has uncertain beliefs when it comes to these conditions? What should she do?
- For what it’s worth, the Appendix addresses the case where Alice is uncertain about #1 (without specifying credences about 1.1, 1.2, 1.3, though).
- Alice now knows (thanks to me; you’re welcome Alice) that, in order to estimate the probability that she’s decision-entangled with Bob, she should factor in the probability of i) Bob also being superrational, ii) Bob and she being epistemically entangled, and iii) Bob and she having been exposed to compatible pieces of evidence. Coming up with a credence regarding i) doesn’t seem insuperable. The distinction between DTs that allow for superrationality and those that don’t is pretty clear. Coming up with a credence regarding ii) and iii), however, seems much more challenging. How would she do that? Where should she even look? What about the infinite recursion when looking for relevant pieces of evidence?
Appendix: What if Alice is uncertain whether she and Bob are decision-entangled?
A few clarifications on this notion of decision-entanglement and my use of it:
- I am, here, assuming that the presence of decision-entanglement is an objective fact about the world, i.e., that there is something that does (or doesn’t) make the decisions of two agents entangled, and that it is not up to our interpretation (this doesn’t mean that decision-entanglement doesn’t heavily rely on the subjectivity of the two agents). This assumption is non-obvious and controversial. However, I am using this “entanglement realist” framework throughout the post, and think the takeaways would be the same if I were adopting an “anti-realist” view. This is the reason why I don’t wanna bother thinking too much about this “entanglement (anti-)realism” thing. It doesn’t seem useful. Nonetheless, please let me know if you think my framework leads me to conclusions that are peculiar to it, such that they would be more questionable.
- Note that, although we took an example with perfect copies here, two agents do not need to have entangled decisions in absolutely every possible situation, in order to be (relevantly) decision-entangled. We only care about the decision they make in the PD presented here, so they could as well be imperfect copies and make unentangled decisions in other situations.
- Unless specified otherwise, I assume decision-entanglement with regard to one decision to be something binary (on a given problem, the decisions of two agents are entangled or they aren’t; no in between), for the sake of simplicity.
As demonstrated in the Appendix, what "significant" exactly means depends on the payoffs of the game. This applies to every time I use that term in this post.
By “compatible”, I mostly mean something like “similar”, although it’s sort of arbitrary what counts as “similar” or not (e.g., Alice and Bob could have two DTs that seem widely different from our perspective, although they’re compatible in the sense that they both allow or don’t allow for superrationality).
Thanks to Sylvester Kollin for suggesting to me to clearly differentiate between decision and epistemic algorithms in such games.
John and Arif are epistemically entangled iff 1) in the particular situation they’re in, their epistemic algos output similar results, given similar inputs, and 2) in the particular situation they’re in, they can’t unilaterally modify their epistemic algos.
Thanks to Caspar Oesterheld for informing me that the infinite recursion I was gesturing at was known as “common knowledge” in game theory.
Nice, I think I followed this post (though how this fits in with questions that matter is mainly only clear to me from earlier discussions).
I think something can't be both neat and so vague as to use a word like 'significant'.
In the EDT section of Perfect-copy PD, you replace some p's with q's and vice versa, but not all. Is there a principled reason for this? Maybe it is just a mistake and it should be U_Alice(p) = 4p-pp-p+1 = 1+3p-p^2 and U_Bob(q) = 4q-qq-q+1 = 1+3q-q^2.
I am unconvinced of the utility of the concept of compatible decision theories. In my mind I am just thinking of it as 'entanglement can only happen if both players use decision theories that allow for superrationality'. I am worried your framing would imply that two CDT players are entangled, when I think they are not; they just happen to both always defect.
Also, if decision-entanglement is an objective feature of the world, then I would think it shouldn't depend on what decision theory I personally hold. I could be a CDTer who happens to have a perfect copy and so be decision-entangled, while still refusing to believe in superrationality.
Sorry I don't have any helpful high-level comments, I think I don't understand the general thrust of the research agenda well enough to know what next directions are useful.
Thanks a lot for these comments, Oscar! :)
I forgot to copy-paste a footnote clarifying that "as made explicit in the Appendix, what "significant" exactly means depends on the payoffs of the game"! Fixed. I agree this is vague, still, although I guess it has to be since the payoffs are unspecified?
Also a copy-pasting mistake. Thanks for catching it! :)
This may be an unimportant detail, but -- interestingly -- I opted for this concept of "compatible DT" precisely because I wanted to imply that two CDT players may be decision-entangled! Say CDT-agent David plays a PD against a perfect copy of himself. Their decisions to defect are entangled, right? Whatever David does, his copy does the same (although David sort of "ignores" that when he makes his decision). David is very unlikely to be decision-entangled with any random CDT agent, however (in that case, the mutual defection is just a "coincidence" and is not due to some dependence between their respective reasoning/choices). I didn't mean the concept of "decision-entanglement" to pre-assume superrationality. I want CDT-David to agree/admit that he is decision-entangled with his perfect copy. Nonetheless, since he doesn't buy superrationality, I know that he won't factor the decision-entanglement into his expected value optimization (he won't "factor in the possibility that p=q".) That's why you need significant credence in both decision-entanglement and superrationality to get cooperation, here. :)
Agreed, but if you're a CDTer, you can't be decision-entangled with an EDTer, right? Say you're both told you're decision-entangled. What happens? Well, you don't care so you still defect while the EDTer cooperates. Different decisions. So... you two weren't entangled after all. The person who told you you were was mistaken.
So yes, decision-entanglement can't depend on your DT per se, but doesn't it have to depend on its "compatibility" with the other's for there to be any dependence between your algos/choices? How could a CDTer and an EDTer be decision-entangled in a PD?
Not very confident about my answers. Feel free to object. :) And thanks for making me rethink my assumptions/definitions!