Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

*(A longer text-based version of this post is also available on MIRI's blog* *here, and the bibliography for the whole sequence can be found* *here.)*

*The next post in this sequence, 'Embedded Agency', will come out on Friday, November 2nd.*

*Tomorrow’s AI Alignment Forum sequences post will be 'What is Ambitious Value Learning?' in the sequence 'Value Learning'.*

But how do you avoid proving with certainty that p=1/2?

Since your proposal does not say what to do if you find inconsistent proofs that the linear function is two different things, I will assume that if it finds multiple different proofs, it defaults to 5 for the following.

Here is another example:

You are in a 5 and 10 problem. You have twin that is also in a 5 and 10 problem. You have exactly the same source code. There is a consistency checker, and if you and your twin do different things, you both get 0 utility.

You can prove that you and your twin do the same thing. Thus you can prove that the function is 5+5p. You can also prove that your twin takes 5 by Lob's theorem. (You can also prove that you take 5 by Lob's theorem, but you ignore that proof, since "there is always a chance") Thus, you can prove that the function is 5-5p. Your system doesn't know what to do with two functions, so it defaults to 5. (If it is provable that you both take 5, you both take 5, completing the proof by Lob.)

I am doing the same thing as before, but because I put it outside of the agent, it does not get flagged with the "there is always a chance" module. This is trying to illustrate that your proposal takes advantage of a separation between the agent and the environment that was snuck in, and could be done incorrectly.

Two possible fixes:

1) You could say that the agent, instead of taking 5 when finding inconsistency takes some action that exhibits the inconsistency (something that the two functions give different values). This is very similar to the chicken rule, and if you add something like this, you don't really need the rest of your system. If you take an agent that whenever it proves it does something, it does something else. This agent will prove (given enough time) that if it takes 5 it gets 5, and if it takes 10 it gets 10.

2) I had one proof system, and just ignored the proofs that I found that I did a thing. I could instead give the agent a special proof system that is incapable of proving what it does, but how do you do that? Chicken rule seems like the place to start.

One problem with the chicken rule is that it was developed in a system that was deductively closed, so you can't prove something that passes though a proof of P without proving P. If you violate this, by having a random theorem prover, you might have an system that fails to prove "I take 5" but proves "I take 5 and 1+1=2" and uses this to complete the Lob loop.