If you know your own actions, why would you reason about taking different actions? Wouldn't you reason about someone who is almost like you, but just different enough to make a different choice?

Showing 3 of 6 replies (Click to show all)
2shminux2yThank you for your explanation! Still trying to understand it. I understand that there is no point examining one's algorithm if you already execute it and see what it does. I don't understand that point. you say "nothing stops you", but that is only possible if you could act contrary to your own algorithm, no? Which makes no sense to me, unless the same algorithm gives different outcomes for different inputs, e.g. "if I simply run the algorithm, I take $10, but if I examine the algorithm before running it and then run it, I take $5". But it doesn't seem like the thing you mean, so I am confused. How can it be possible? if your examination of your algorithm is accurate, it gives the same outcome as mindlessly running it, with is taking $10, no? So your reasoning is inaccurate, in that you arrive to a wrong conclusion about the algorithm output, right? You just don't know where the error lies, or even that there is an error to begin with. But in this case you would arrive to a wrong conclusion about the same algorithm run by a different agent, right? So there is nothing special about it being your own algorithm and not someone else's. If so, the issue is reduced to finding an accurate algorithm analysis tool, for an algorithm that demonstrably halts in a very short time, producing one of the two possible outcomes. This seems to have little to do with decision theory issues, so I am lost as to how this is relevant to the situation. I am clearly missing some of your logic here, but I still have no idea what the missing piece is, unless it's the libertarian free will thing, where one can act contrary to one's programming. Any further help would be greatly appreciated.
4Vladimir_Nesov2yRather there is no point if you are not going to do anything with the results of the examination. It may be useful if you make the decision based on what you observe (about how you make the decision). You can, for a certain value of "can". It won't have happened, of course, but you may still decide to act contrary to how you act, two different outcomes of the same algorithm. The contradiction proves that you didn't face the situation that triggers it in actuality, but the contradiction results precisely from deciding to act contrary to the observed way in which you act, in a situation that a priori could be actual, but is rendered counterlogical as a result of your decision. If instead you affirm the observed action, then there is no contradiction and so it's possible that you have faced the situation in actuality. Thus the "chicken rule", playing chicken with the universe, making the present situation impossible when you don't like it. You don't know that it's inaccurate, you've just run the computation and it said $5. Maybe this didn't actually happen, but you are considering this situation without knowing if it's actual. If you ignore the computation, then why run it? If you run it, you need responses to all possible results, and all possible results except one are not actual, yet you should be ready to respond to them without knowing which is which. So I'm discussing what you might do for the result that says that you take the $5. And in the end, the use you make of the results is by choosing to take the $5 or the $10. This map from predictions to decisions could be anything. It's trivial to write an algorithm that includes such a map. Of course, if the map diagonalizes, then the predictor will fail (won't give a prediction), but the map is your reasoning in these hypothetical situations, and the fact that the map may say anything corresponds to the fact that you may decide anything. The map doesn't have to be identity, decision doesn't have to reflect predic
You can, for a certain value of "can". It won't have happened, of course, but you may still decide to act contrary to how you act, two different outcomes of the same algorithm.

This confuses me even more. You can imagine act contrary to your own algorithm, but the imagining different possible outcomes is a side effect of running the main algorithm that takes $10. It is never the outcome of it. Or an outcome. Since you know you will end up taking $10, I also don't understand the idea of playing chicken with the universe. Are there any re... (read more)

Decision Theory

by abramdemski, Scott Garrabrant 1 min read31st Oct 201837 comments


Ω 24

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

(A longer text-based version of this post is also available on MIRI's blog here, and the bibliography for the whole sequence can be found here.)

The next post in this sequence, 'Embedded Agency', will come out on Friday, November 2nd.

Tomorrow’s AI Alignment Forum sequences post will be 'What is Ambitious Value Learning?' in the sequence 'Value Learning'.