Allowing a formal proof system to self improve while avoiding Lobian obstacles.

byDonald Hobson3mo23rd Jan 20194 comments

6

Ω 2


Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

[Epistemic status, can't spot a mistake, but am not confidant that there isn't one, if you find anything please say so. Posting largely because the community value of a new good idea is larger than any harm that might be caused by a flawed proof. ]

Suppose you have an automatic proof checker. Its connected to a source of statements that are sometimes correct proofs, and sometimes not. The proof checker wants to reject all false proofs, and accept as many of the true proofs as possible. It also wants to be able to update its own proof framework.

Define to be a set of statements in a particular formalism, say those that are grammatically well defined in PA. Let be any sequence from some alphabet of symbols. Let

and be the set of programs that have the property that

In other words, is the set of all programs that never prove false statements. We should never leave or need to talk about any program not in it.

For write to mean . Ie proves

A simple setup would consist of a starting program and a rule that says,

If then you can add to your list of trusted provers. If proves the soundness of , then you can accept any statement when given a proof of it in .

The lobian obstacle is that must be strictly weaker than , in that can prove any statement that can, but can prove the soundness of and can't prove its own soundness. This means that each trusted prover has to be strictly weaker than the one that generated it. You could start with PA+3^^^3 and say that a few weakenings aren't a problem, but that isn't an elegant solution.

Note: You can't get around this problem using cycles. Suppose

This would imply

So any cycle could be shrunk by 1, and inductively, shrunk to a self trusting system.

I propose instead that you use the rule.

If then accept any future proofs that are accepted by , and give all rights given to , including taking to mean that is accepted, and so on recursively. If is sound, then so anything that is proven by can be proven by . If isn't sound, it can prove anything anyway.

Note that if is a straightforward isomorphism of then they are easily proven equivalent. However, if says "run Turing machine for steps, if it doesn't halt, check if is a valid proof of , if does halt, return " then it could be equivalent to from a second order logic perspective, but can't prove that never halts.

Still, this rule allows us to prove anything provable in and only things provable in , while also allowing the user to add shorthands and semantic sugar.

Note that the "proof" could just be the number of processor cycles you want to run a proof search for before giving up. In fact, this framework lets you swap and mix between hand proved results, and automatically proved results (with search time cut off) as you see fit.

This formalism allows a any system containing a proof checker to automatically upgrade itself to a version that has the same Godelian equivalence class, but is more suited to the hardware available.