Thoughts on counterfactual reasoning

These examples of counterfactuals, from the 'Decision Theory' post by abramdemski and Scott Garrabrant, are presented there as equivalent, but they seem meaningfully distinct:

What if the sun suddenly went out?
What if 2+2=3?

Specifically, they don't seem equally difficult to evaluate. I can easily imagine the sun going out, but I'm not even sure what it would mean for 2+2 to equal 3. Presenting these two examples as equivalent confuses me, because they seem to be instances of meaningfully distinct classes of counterfactual. I spent some time trying to characterize why the sun example is intuitively easy for me and the math example is intuitively difficult. I came up with some ideas, but I won't go into detail yet, because they seem like the obvious sorts of things that anyone who has read The Sequences (a.k.a. Rationality: A-Z) would have thought of. I strongly suspect there's prior work, and it's also possible that I don't yet fully understand the problem.
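
One candidate explanation for part of the asymmetry, which I'd guess is standard in the prior work: the antecedent "2+2=3" is not merely false but refutable, so under ordinary classical logic anything at all follows from assuming it (the principle of explosion), and the conditional becomes vacuous. A minimal Lean 4 sketch of that point:

```lean
-- The antecedent is refutable: Lean checks 2 + 2 ≠ 3 by computation.
example : 2 + 2 ≠ 3 := by decide

-- From a refutable antecedent, *any* proposition P follows
-- (ex falso quodlibet), so "if 2 + 2 = 3 then P" holds vacuously
-- for every P, and the counterfactual carries no content.
example (P : Prop) (h : 2 + 2 = 3) : P := absurd h (by decide)
```

Nothing analogous happens with the sun example, whose antecedent is contingent rather than contradictory, which may be part of why the two feel so different.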

Questions about counterfactual reasoning

The two counterfactual reasoning examples above (and others like them) are presented as equivalent, but they do not seem to be.

1. Is this an intentional simplification for the benefit of new readers?

2. If so, can someone point me to prior work exploring the omitted nuances of counterfactuals? I don't want to reinvent the wheel.

3. If not, would exploration of the characteristics of different kinds of counterfactuals be a fruitful area of research?
