Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

*(A longer text-based version of this post is also available on MIRI's blog here, and the bibliography for the whole sequence can be found here.)*

*The next post in this sequence, 'Embedded Agency', will come out on Friday, November 2nd.*

*Tomorrow’s AI Alignment Forum sequences post will be 'What is Ambitious Value Learning?' in the sequence 'Value Learning'.*

## Thoughts on counterfactual reasoning

These examples of counterfactuals are presented as equivalent, but they seem meaningfully distinct:

Specifically, they don't seem equally difficult for *me* to evaluate. I can easily imagine the sun going out, but I'm not even sure what it would mean if 2+2=3. It confuses me that these two different examples are presented as equivalent, because they seem to be instances of meaningfully distinct classes of *something*. I spent some time trying to characterize why the sun example is intuitively easy for me and the math example is intuitively difficult for me. I came up with some ideas, but I won't go into details yet because they seem like the obvious sorts of things that anyone who has read The Sequences (a.k.a. Rationality: A-Z) would have thought of. I strongly suspect there's prior work. It is also possible that I don't fully understand the problem yet.

## Questions about counterfactual reasoning

The two counterfactual reasoning examples above (and others) are presented as equivalent, but they do not seem to be.

1. Is this an intentional simplification for the benefit of new readers?

2. If so, can someone point me to the prior work exploring the omitted nuances of counterfactuals? I don't want to re-invent the wheel.

3. If not, would exploration of the characteristics of *different kinds of counterfactuals* be a fruitful area of research?