Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

*(A longer text-based version of this post is also available on MIRI's blog* *here, and the bibliography for the whole sequence can be found* *here.)*

*The next post in this sequence, 'Embedded Agency', will come out on Friday, November 2nd.*

*Tomorrow’s AI Alignment Forum sequences post will be 'What is Ambitious Value Learning?' in the sequence 'Value Learning'.*

(There was a LaTeX error in my comment, which made it totally illegible. But I think you managed to resolve my confusion anyway).

I see! It's not

provablethat Provable(A=10⇒U=10) implies A=10. It seems like it should be provable, but the obvious argument relies on the assumption that, if *A=10⇒U=0 is provable, then it's not also provable that A=10⇒U=10 - in other words, that the proof system is consistent! Which may betrue, but is notprovable.The asymmetry between 5 and 10 is that, to choose 5, we only need a proof that 5 is optimal, but to choose 10, we need to

notfind a proof that5is optimal. Whichseemseasier than finding a proof that 10 is optimal, but is notprovablyeasier.