Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

*(A longer text-based version of this post is also available on MIRI's blog* *here, and the bibliography for the whole sequence can be found* *here.)*

*The next post in this sequence, 'Embedded Agency', will come out on Friday, November 2nd.*

*Tomorrow’s AI Alignment Forum sequences post will be 'What is Ambitious Value Learning?' in the sequence 'Value Learning'.*

I'm not convinced that an inconsequential grain of uncertainty couldn't handle this 5-10 problem. Consider an agent whose actions are probability distributions on {5,10} that are nowhere 0. We can call these points in the open affine space spanned by the points 5 and 10. U is then a linear function from this affine space to utilities. The agent would search for proofs that U is some particular such linear function. Once it finds one, it uses that linear function to compute the optimal action. To ensure that there is an optimum, we can adjoin infinitesimal values to the possible probabilities and utilities.

If the agent were to find a proof that the linear function is the one induced by mapping 5 to 5 and 10 to 0, it would return (1-ε)⋅5+ε⋅10 and get utility 5+5ε instead of the expected 5-5ε, so Löb's theorem wouldn't make this self-fulfilling.

## It sounds similar to the matrices in the post:

## A solvable Newcomb-like problem