Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

*(A longer text-based version of this post is also available on MIRI's blog* *here, and the bibliography for the whole sequence can be found* *here.)*

*The next post in this sequence, 'Embedded Agency', will come out on Friday, November 2nd.*

*Tomorrow’s AI Alignment Forum sequences post will be 'What is Ambitious Value Learning?' in the sequence 'Value Learning'.*

Cross-posting some comments from the MIRI Blog:

Konstantin Surkov:

Abram Demski:

Sure, one can imagine hypothetically taking $5, even if in reality they would take $10. That's a spurious output from a different algorithm altogether. it assumes the world where you are not the same person who takes $10. So, it would make sense to examine which of the two you are, if you don't yet know that you will take $10, but not if you already know it. Which of the two is it?