Logical Counterfactuals & the Cooperation Game

by Chris_Leong, 14th Aug 2018

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Logical counterfactuals (as in Functional Decision Theory) are more about your state of knowledge than the actual physical state of the universe. I will illustrate this with a relatively simple example.

Suppose there are two players in a game where each can choose A or B, with the following payoffs for each combination of choices:

AA: 10, 10

BB: 0, 0

AB or BA: -10, -10

Situation 1:

Suppose you are told that you will make the same decision as the other player. You can quickly conclude that A provides the highest utility.

Situation 2:

Suppose you are told that the other player chooses A. You then reason that A provides the highest utility.
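
As a rough sketch, both situations can be modelled as filtering the joint outcomes by what you have been told and then maximising over your own choice. The names `PAYOFF`, `same_decision`, `other_chooses_a` and `best_choice` are illustrative assumptions rather than anything from the original argument.

```python
# Payoffs from the table above, keyed by (your choice, other player's choice).
# Both players receive the same payoff, so a single number per outcome suffices.
PAYOFF = {
    ("A", "A"): 10,
    ("B", "B"): 0,
    ("A", "B"): -10,
    ("B", "A"): -10,
}


def same_decision(mine, theirs):
    # Situation 1: you are told both players make the same decision.
    return mine == theirs


def other_chooses_a(mine, theirs):
    # Situation 2: you are told the other player chooses A.
    return theirs == "A"


def best_choice(knowledge):
    """Return (choice, payoff) with the highest payoff among outcomes consistent with `knowledge`."""
    options = {}
    for mine in ("A", "B"):
        consistent = [
            PAYOFF[(mine, theirs)]
            for theirs in ("A", "B")
            if knowledge(mine, theirs)
        ]
        # In both situations exactly one outcome per choice is consistent.
        if len(consistent) == 1:
            options[mine] = consistent[0]
    best = max(options, key=options.get)
    return best, options[best]


print(best_choice(same_decision))    # ('A', 10)
print(best_choice(other_chooses_a))  # ('A', 10)
```

The payoff table is the same in both calls. Only the knowledge filter changes, and with it the set of outcomes each of your choices is compared against.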

Generalised Situation:

This situation combines elements of the previous two. Player 1 is an agent that will choose A, although Player 2 does not know this unless they are told b) below. Player 2 is told one of the following:

a) They will inevitably make the same decision as Player 1

b) Player 1 definitely will choose A

If Player 2 is a rational timeless agent, then they will choose A regardless of which one they are told. This means that both agents will choose A, making both a) and b) true statements.

Analysis:

Consider the Generalised Situation, where you are Player 2. Comparing the two cases, we can see that the physical situation is identical, apart from the information you (Player 2) are told. Even the information Player 1 is told is identical. But in one situation we model Player 1's decision as counterfactually varying with yours, while in the other situation, Player 1 is treated as a fixed element of the universe.

On the other hand, if you were told that the other player would choose A and that they would make the same choice as you, then the only choice compatible with that would be to choose A. We could easily end up in all kinds of tangles trying to figure out the logical counterfactuals for this situation. However, the decision problem is really just trivial in this case and the only (non-strict) counterfactual is what actually happened. There is simply no need to attempt to figure out logical counterfactuals given perfect knowledge of a situation.
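
A minimal sketch of this point, assuming knowledge is represented as a filter over joint outcomes (the names `told_both` and `consistent_outcomes` are hypothetical): once both statements are known, only one outcome survives, so nothing is left to vary.

```python
from itertools import product

CHOICES = ("A", "B")


def told_both(mine, theirs):
    # You know the other player chooses A AND that you make the same choice.
    return theirs == "A" and mine == theirs


def consistent_outcomes(knowledge):
    """List every (your choice, other's choice) pair compatible with `knowledge`."""
    return [(m, t) for m, t in product(CHOICES, repeat=2) if knowledge(m, t)]


print(consistent_outcomes(told_both))  # [('A', 'A')]
```

With only one of the two statements, two outcomes remain consistent, which is what leaves room for comparing alternatives.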

It is a mistake to focus too much on the world itself because, given precisely what happened, all (strict) counterfactuals are impossible. The only thing that is possible is what actually happened. This is why we need to focus on your state of knowledge instead.

Resources:

A useful level distinction: A more abstract argument that logical counterfactuals are about mutations of your model rather than an attempt to imagine an external inconsistent universe.

What a reduction of "could" could look like: A conception of "could" in terms of what the agent can prove.

Reducing collective rationality to individual optimization in common-payoff games using MCMC: Contains a similar game.
