Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

*(A longer text-based version of this post is also available on MIRI's blog* *here, and the bibliography for the whole sequence can be found* *here.)*

*The next post in this sequence, 'Embedded Agency', will come out on Friday, November 2nd.*

*Tomorrow’s AI Alignment Forum sequences post will be 'What is Ambitious Value Learning?' in the sequence 'Value Learning'.*

In the matching pennies game, U() would be proven to be ∫A()(p)∗min(p,1−p)dp. A could maximize this by returning ε when p isn't 12, and 1−∫ ε dp (where ε is so small that this is still infinitesimally close to 1) when p is 12.

The linearity is always in the function between ε-adjoined open affine spaces. Whether the utilities also end up linear in the closed affine space (ie nobody cares about our reasoning process) is for the object-level information gathering process to deduce from the environment.

You never prove that you will with certainty decide p=12. You always leave a so-you're-saying-there's-a chance of exploration, which produces a grain of uncertainty. To execute the action, you inspect the ceremonial Boltzmann Bit (which is implemented by being constantly set to "discard the ε"), but which you treat as having an ε chance of flipping.

The self-modification module could note that inspecting that bit is a no-op, see that removing it would make the counterfactual reasoning module crash, and leave up the Chesterton fence.