For example, in the 5&10 game an agent would examine its own algorithm, see that it leads to taking the $10, and stop there.

Why do even that much if this reasoning could not be used? The question is about reasoning that could contribute to the decision, reasoning that could describe the algorithm and so has the option not to "stop there". What if you see that your algorithm leads to taking the $10 and, instead of stopping there, you take the $5?

Nothing stops you. This is the "chicken rule": it solves some issues, but more importantly it illustrates one way a decision algorithm can function. The fact that this is possible is evidence that there may be something wrong with the "stop there" proposal. Specifically, you usually don't know that your reasoning is actual, or even that it is logically possible rather than part of an impossible counterfactual, but that does not make it a hopeless hypothetical where nothing matters. Nothing compels you to affirm what you know about your actions or conclusions; a decision-making algorithm does not require it. Yet the different things you do may affect what happens, because the situation may turn out to be actual after all, depending on what happens or what you decide, or it may be predicted from within an actual situation and influence what happens there. This motivates learning to reason in and about possibly impossible situations.
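As a rough illustration of the chicken rule described above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the names `PAYOFFS`, `chicken_rule_agent`, and `is_self_consistent` are hypothetical, and the `self_prediction` argument is a stand-in for "what examining my own algorithm told me I will do". A real formulation would search for proofs in a formal theory rather than receive a string.

```python
from typing import Optional

# Payoffs in the 5-and-10 game: action name -> dollars received.
PAYOFFS = {"take_5": 5, "take_10": 10}


def chicken_rule_agent(self_prediction: Optional[str]) -> str:
    """Pick an action given an optional conclusion about our own action.

    `self_prediction` is a hypothetical simplification of the result of
    the agent examining its own algorithm (None means no conclusion).
    """
    if self_prediction is not None:
        # Chicken rule: act against the conclusion, so that any situation
        # in which the conclusion holds is rendered impossible.
        alternatives = [a for a in PAYOFFS if a != self_prediction]
        return max(alternatives, key=PAYOFFS.get)
    # No conclusion about our own action: fall back to the best payoff.
    return max(PAYOFFS, key=PAYOFFS.get)


def is_self_consistent(self_prediction: str) -> bool:
    """A conclusion is consistent only if the agent actually acts on it."""
    return chicken_rule_agent(self_prediction) == self_prediction


if __name__ == "__main__":
    for prediction in (None, "take_5", "take_10"):
        print(prediction, "->", chicken_rule_agent(prediction))
```

In this toy version, no conclusion about the agent's own action survives contact with the agent: whichever action is "seen", a different one is taken, so the only reachable branch is the one where no such conclusion was drawn.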

What if you examine your algorithm and find that it takes the $5 instead? It could be the same algorithm that takes the $10, but you don't know that. Instead, you arrive at the $5 conclusion using reasoning that could be impossible, but that you don't know to be impossible and haven't yet decided to make impossible. One way to solve the issue is to render the situation where that conclusion holds impossible, by contradicting the conclusion with your action or in some other way. To know when to do that, you should be able to reason about and within situations that could be impossible, or could be made impossible, including by the decisions made in them. This makes the way you reason in them relevant even when these situations don't occur in the end, because you don't know a priori that they don't occur.

(The 5-and-10 problem is not specifically about this issue, and explicit reasoning about impossible situations may be avoided, and perhaps should be avoided. My guess, though, is that the crux in this comment thread is about things like the usefulness of reasoning from within possibly impossible situations, where even your own knowledge arrived at by pure computation isn't necessarily correct.)

Decision Theory

by abramdemski, Scott Garrabrant | 1 min read | 31st Oct 2018 | 37 comments


Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

(A longer text-based version of this post is also available on MIRI's blog here, and the bibliography for the whole sequence can be found here.)

The next post in this sequence, 'Embedded Agency', will come out on Friday, November 2nd.

Tomorrow’s AI Alignment Forum sequences post will be 'What is Ambitious Value Learning?' in the sequence 'Value Learning'.