Thank you for your explanation! I'm still trying to understand it. I understand that there is no point in examining one's algorithm if you have already executed it and seen what it does.

> What if you see that your algorithm leads to taking the $10 and instead of stopping there, you take the $5?

I don't understand that point. You say "nothing stops you", but that is only possible if you can act contrary to your own algorithm, no? That makes no sense to me, unless the same algorithm gives different outcomes for different inputs, e.g. "if I simply run the algorithm, I take the $10, but if I examine the algorithm before running it and then run it, I take the $5". But that doesn't seem to be what you mean, so I am confused.
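The hypothetical in the previous paragraph can be sketched as a toy function (entirely my own illustration, not anything from the post), where the outcome depends on whether the algorithm was examined before being run:

```python
def algorithm(examined_first: bool) -> int:
    # Toy illustration of "same algorithm, different outcomes for
    # different inputs": the run that follows self-examination takes
    # the $5, while the plain run takes the $10.
    if examined_first:
        return 5
    return 10

# Simply running it takes the $10; examining-then-running takes the $5.
plain_run = algorithm(examined_first=False)
examined_run = algorithm(examined_first=True)
```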

> What if you examine your algorithm and find that it takes the $5 instead?

How can that be possible? If your examination of your algorithm is accurate, it gives the same outcome as mindlessly running it, which is taking the $10, no?
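A minimal sketch of what "accurate examination" would mean here (the names are my own, hypothetical): for an algorithm known to halt quickly, examination can simply be faithful simulation, which by construction agrees with actually running it.

```python
def algorithm() -> int:
    # Toy 5-and-10 agent: compare the two payoffs and take the larger bill.
    return max(5, 10)

def examine(alg) -> int:
    # An accurate examination of a fast-halting algorithm: simulate it,
    # so the prediction cannot differ from the real run.
    return alg()

# Examination matches execution; both take the $10.
assert examine(algorithm) == algorithm() == 10
```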

> It could be the same algorithm that takes the $10, but you don't know that, instead you arrive at the $5 conclusion using reasoning that could be impossible, but that you don't know to be impossible, that you haven't decided yet to make impossible.

So your reasoning is inaccurate, in that you arrive at a wrong conclusion about the algorithm's output, right? You just don't know where the error lies, or even that there is an error to begin with. But in that case you would arrive at the same wrong conclusion about the same algorithm run by a different agent, right? So there is nothing special about it being your own algorithm rather than someone else's. If so, the issue reduces to finding an accurate algorithm-analysis tool for an algorithm that demonstrably halts in a very short time, producing one of two possible outcomes. That seems to have little to do with decision theory, so I am lost as to how it is relevant to the situation.
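To make the "nothing special about it being your own algorithm" point concrete (a sketch with hypothetical names, not a claim about how the post's formalism works): an analysis tool, whether accurate or not, is just a function of the analyzed algorithm, and so it cannot distinguish which agent runs that algorithm.

```python
def algorithm() -> int:
    # The same fast-halting 5-and-10 algorithm, whoever runs it.
    return max(5, 10)

def accurate_analyze(alg) -> int:
    # Accurate tool: simulate the algorithm and report its output.
    return alg()

def buggy_analyze(alg) -> int:
    # Inaccurate tool: concludes $5 without ever consulting alg,
    # mirroring a wrong conclusion about the algorithm's output.
    return 5

# Neither tool distinguishes "my" copy of the algorithm from "someone
# else's"; the error, if any, lives in the analysis itself.
assert accurate_analyze(algorithm) == 10
assert buggy_analyze(algorithm) == 5
```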

I am clearly missing some of your logic here, but I still have no idea what the missing piece is, unless it's the libertarian free will thing, where one can act contrary to one's programming. Any further help would be greatly appreciated.

Decision Theory

by abramdemski, Scott Garrabrant · 1 min read · 31st Oct 2018 · 37 comments


Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

(A longer text-based version of this post is also available on MIRI's blog here, and the bibliography for the whole sequence can be found here.)

The next post in this sequence, 'Embedded Agency', will come out on Friday, November 2nd.

Tomorrow’s AI Alignment Forum sequences post will be 'What is Ambitious Value Learning?' in the sequence 'Value Learning'.