Cross-posting some comments from the MIRI Blog:

Konstantin Surkov:

Re: 5/10 problem
I don't get it. Human is obviously (in that regard) an agent reasoning about his actions. Human also will choose 10 without any difficulty. What in human decision making process is not formalizable here? Assuming we agree that 10 is rational choice.

Abram Demski:

Suppose you know that you take the $10. How do you reason about what would happen if you took the $5 instead? It seems easy if you know how to separate yourself from the world, so that you only think of external consequences (getting $5). If you think about yourself as well, then you run into contradictions when you try to imagine the world where you take the $5, because you know it is not the sort of thing you would do. Maybe you have some absurd predictions about what the world would be like if you took the $5; for example, you imagine that you would have to be blind. That's alright, though, because in the end you are taking the $10, so you're doing fine.
Part of the point is that an agent can be in a similar position, except it is taking the $5, knows it is taking the $5, and unable to figure out that it should be taking the $10 instead due to the absurd predictions it makes about what happens when it takes the $10. It seems kind of hard for a human to end up in that situation, but it doesn't seem so hard to get this sort of thing when we write down formal reasoners, particularly when we let them reason about themselves fully (as natural parts of the world) rather than only reasoning about the external world or having pre-programmed divisions (so they reason about themselves in a different way from how they reason about the world).

Sure, one can imagine hypothetically taking $5, even if in reality they would take $10. That's a spurious output from a different algorithm altogether. it assumes the world where you are not the same person who takes $10. So, it would make sense to examine which of the two you are, if you don't yet know that you will take $10, but not if you already know it. Which of the two is it?

Decision Theory

by abramdemski, Scott Garrabrant 1 min read31st Oct 201837 comments


Ω 24

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

(A longer text-based version of this post is also available on MIRI's blog here, and the bibliography for the whole sequence can be found here.)

The next post in this sequence, 'Embedded Agency', will come out on Friday, November 2nd.

Tomorrow’s AI Alignment Forum sequences post will be 'What is Ambitious Value Learning?' in the sequence 'Value Learning'.