You can, for a certain value of "can". It won't have happened, of course, but you may still decide to act contrary to how you act, two different outcomes of the same algorithm.

This confuses me even more. You can imagine acting contrary to your own algorithm, but imagining different possible outcomes is a side effect of running the main algorithm that takes $10. It is never the outcome of it, or even an outcome. Since you know you will end up taking $10, I also don't understand the idea of playing chicken with the universe. Are there any references for it?

You don't know that it's inaccurate, you've just run the computation and it said $5.

Wait, what? We started with the assumption that examining the algorithm, or running it, shows that you will take $10, no? I guess I still don't understand how

What if you see that your algorithm leads to taking the $10 and instead of stopping there, you take the $5?

is even possible, or worth considering.

This map from predictions to decisions could be anything.

Hmm, maybe this is where I miss some of the logic. If the predictions are accurate, the map is bijective. If the predictions are inaccurate, you need a better algorithm analysis tool.

The map doesn't have to be identity, decision doesn't have to reflect prediction, because you may write an algorithm where it's not identity.

To me this screams "get a better algorithm analyzer!" and has nothing to do with whether it's your own algorithm, or someone else's. Can you maybe give an example where one ends up in a situation where there is no obvious algorithm analyzer one can apply?
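
For concreteness, here is a toy sketch of the quoted claim that the map from predictions to decisions need not be the identity. It is only an illustration under my own assumptions: the names `identity_map`, `contrarian_map`, and `consistent_predictions` are hypothetical and do not come from the post or the thread. The agent is modeled as a function from "the prediction it sees about itself" to "the action it takes", and a prediction counts as accurate only if it is a fixed point of that function.

```python
# Toy model (hypothetical names): the agent sees a prediction of its own
# action and then chooses an action. The two actions are taking $5 or $10.
ACTIONS = (5, 10)


def identity_map(prediction):
    """Agent that always does what the prediction says."""
    return prediction


def contrarian_map(prediction):
    """Agent that 'plays chicken': it does whatever was NOT predicted."""
    return 5 if prediction == 10 else 10


def consistent_predictions(decision_map):
    """Return the predictions p that stay accurate once the agent sees them,
    i.e. the fixed points where decision_map(p) == p."""
    return [p for p in ACTIONS if decision_map(p) == p]


if __name__ == "__main__":
    print(consistent_predictions(identity_map))
    print(consistent_predictions(contrarian_map))
    print(consistent_predictions(lambda p: 10))
```

In this toy model, the contrarian agent has no fixed point at all, so no prediction shown to it can be accurate, however good the analyzer. Whether that carries over to the setup discussed in the post is a separate question; the sketch only shows that such a map can be written down.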

Decision Theory

by abramdemski, Scott Garrabrant · 1 min read · 31st Oct 2018 · 37 comments


Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

(A longer text-based version of this post is also available on MIRI's blog here, and the bibliography for the whole sequence can be found here.)

The next post in this sequence, 'Embedded Agency', will come out on Friday, November 2nd.
