I think I don't understand the Löb's theorem example.

If "5 is the best possible action" (i.e. "if I take 5 I get 5, and if I take 10 I get 0") is provable, then the agent takes 5, so the statement is true (because the part about taking 10 is vacuously true). Hence by Löb's theorem, it's provable, so we get that the agent takes 5.

If "10 is the best possible action" (i.e. "if I take 10 I get 10, and if I take 5 I get 0") is provable, then it's true, for the dual reason. So by Löb, it's provable, so the agent takes 10.
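
In both cases the Löb step I have in mind is the same. Spelling it out for the 5 case, with P standing for "5 is the best possible action" and □ for provability in the agent's proof system (this is just my reading of the example, not a quote from the post):

```latex
\[
\begin{aligned}
&\vdash \Box P \rightarrow (A = 5)
  && \text{the agent takes 5 as soon as it finds a proof of } P\\
&\vdash (A = 5) \rightarrow P
  && \text{taking 5 yields 5, and the clause about taking 10 is vacuous}\\
&\vdash \Box P \rightarrow P
  && \text{chaining the two lines above}\\
&\vdash P
  && \text{L\"ob's theorem}\\
&A = 5
  && \text{the agent finds the proof of } P \text{ and acts on it}
\end{aligned}
\]
```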

The broader point about being unable to reason yourself out of a bad decision if your prior for your own decisions doesn't contain a "grain of truth" makes sense, but it's not clear we can show that the agent in this example will definitely get stuck on the bad decision - if anything, the above argument seems to show that the system has to be inconsistent! If that's true, I would guess that the source of this inconsistency is assuming the agent has sufficient reflective capacity to prove "If I can prove that an action is best, then it is best", which would suggest the lesson that it's hard for agents to reason about their own behaviour with logical consistency.

Gurkenglas: The agent has been constructed such that Provable("5 is the best possible action") implies that 5 is the best (only!) possible action. Then by Löb's theorem, 5 is the only possible action. It cannot also be simultaneously constructed such that Provable("10 is the best possible action") implies that 10 is the only possible action, because then it would also follow that 10 is the only possible action. That's not just our proof system being inconsistent, that's false!
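
A toy sketch of what I mean by "constructed such that" - the `provable` function below is only a stand-in for the agent's proof search (per the Löb argument, "5 is the best possible action" does end up provable), not a real theorem prover:

```python
# Toy rendering of the agent construction, just to make the asymmetry visible.
# `provable` is a stand-in for proof search in the agent's theory, not a real
# prover: by the Löbian argument above, "5 is the best possible action" does
# end up provable, so the stub returns True exactly for that sentence.

def provable(sentence: str) -> bool:
    return sentence == "5 is the best possible action"


def agent() -> int:
    # The construction only ever checks for a proof about 5; returning 10 is
    # the fallback branch, not the result of a symmetric check. The program
    # cannot also be one for which Provable("10 is the best possible action")
    # guarantees returning 10, because the same Löb argument would then force
    # it to return 10 as well as 5.
    if provable("5 is the best possible action"):
        return 5
    return 10


print(agent())  # prints 5
```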

(There was a LaTeX error in my comment, which made it totally illegible. But I think you managed to resolve my confusion anyway).

I see! It's not provable that Provable("10 is the best possible action") implies that the agent takes 10. It seems like it should be provable, but the obvious argument relies on the assumption that, if "10 is the best possible action" is provable, then it's not also provable that "5 is the best possible action" - in other words, that the proof system is consistent! Which may be true, but is not provable.
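
Spelling out why the obvious argument needs consistency, with P5 and P10 standing for "5 is the best possible action" and "10 is the best possible action", and □ for provability (my own notation for the step above):

```latex
\[
\begin{aligned}
&\vdash \neg(P_5 \wedge P_{10})
  && \text{5 and 10 cannot both be the best action}\\
&\vdash (\Box P_5 \wedge \Box P_{10}) \rightarrow \Box\bot
  && \text{so if both were provable, a contradiction would be provable}\\
&\stackrel{?}{\vdash} \Box P_{10} \rightarrow \neg\Box P_5
  && \text{what the obvious argument needs; it would follow from } \neg\Box\bot
\end{aligned}
\]
```

And ¬□⊥ is just the system's own consistency statement, which by Gödel's second incompleteness theorem it cannot prove.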

The asymmetry between 5 and 10 is that, to choose 5, we only need a proof that 5 is optimal, but to choose 10, w

...

Decision Theory

by abramdemski, Scott Garrabrant · 31st Oct 2018



Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

(A longer text-based version of this post is also available on MIRI's blog here, and the bibliography for the whole sequence can be found here.)

The next post in this sequence, 'Embedded Agency', will come out on Friday, November 2nd.

Tomorrow’s AI Alignment Forum sequences post will be 'What is Ambitious Value Learning?' in the sequence 'Value Learning'.