Yeah, so it's like you have this private data, which is an infinite sequence of bits, and if you see all 0s you take an exploration action. I think that by giving the agent these private bits and promising that the bits do not change the rest of the world, you are essentially giving the agent access to a causal counterfactual that you constructed. You don't even have to mix with what the agent actually does: you can explore with every action and ask whether it is better to explore and take 5 or explore and take 10. This works because conditioning on these infinitesimal-probability events is basically like coming in and changing what the agent does. I think giving the agent a true source of randomness actually does let you implement CDT.
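To make the exploration idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the infinite bit sequence is replaced by a finite `k` bits (so exploration has probability 2^-k rather than an infinitesimal one), and `payoff`, `private_bits_all_zero`, and `estimate_exploration_values` are hypothetical names, not anything from the post. The environment is a toy 5-and-10 setup whose payoff depends only on the action, which is the "bits do not change the rest of the world" promise.

```python
import random

ACTIONS = ["take_5", "take_10"]

def private_bits_all_zero(rng, k):
    """Draw k private bits; return True when all are 0 (probability 2**-k)."""
    return all(rng.getrandbits(1) == 0 for _ in range(k))

def payoff(action):
    # Toy environment (assumed): payoff depends only on the action taken,
    # never on the agent's private bits.
    return {"take_5": 5, "take_10": 10}[action]

def estimate_exploration_values(episodes=20000, k=3, seed=0):
    """Average payoff of each action, conditional on the exploration event."""
    rng = random.Random(seed)
    totals = {a: 0.0 for a in ACTIONS}
    counts = {a: 0 for a in ACTIONS}
    for _ in range(episodes):
        if private_bits_all_zero(rng, k):
            # Exploration episode: the action is set by the randomness,
            # not by the agent's usual deliberation.
            action = rng.choice(ACTIONS)
            totals[action] += payoff(action)
            counts[action] += 1
        # Non-exploration episodes are ignored: only "explore and take 5"
        # versus "explore and take 10" is being compared.
    return {a: totals[a] / counts[a] for a in ACTIONS if counts[a] > 0}

values = estimate_exploration_values()
best = max(values, key=values.get)
```

Because the exploration action is chosen by the private bits rather than by the agent's reasoning, comparing the conditional averages behaves like comparing interventions, which is the sense in which this resembles a causal counterfactual. The sketch does not cover the case where the environment predicts or reacts to the exploration itself.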

If the environment learns from the other possible worlds, it might punish or reward you in one world for things you do in another world, so you can't just ask which world is best to figure out what to do.

I agree that that is how you want to think about the matching pennies problem. However, the point is that your proposed solution assumed linearity; it didn't empirically observe linearity. You have to be able to tell the difference between the situations in order to know not to assume linearity in the matching pennies problem. The method for telling the difference is how you determine whether, and in what ways, you have logical control over Omega's prediction of you.

Decision Theory

by abramdemski, Scott Garrabrant | 1 min read | 31st Oct 2018 | 37 comments



Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

(A longer text-based version of this post is also available on MIRI's blog here, and the bibliography for the whole sequence can be found here.)

The next post in this sequence, 'Embedded Agency', will come out on Friday, November 2nd.

Tomorrow’s AI Alignment Forum sequences post will be 'What is Ambitious Value Learning?' in the sequence 'Value Learning'.