Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

[Epistemic status, can't spot a mistake, but am not confidant that there isn't one, if you find anything please say so. Posting largely because the community value of a new good idea is larger than any harm that might be caused by a flawed proof. ]

Suppose you have an automatic proof checker. Its connected to a source of statements that are sometimes correct proofs, and sometimes not. The proof checker wants to reject all false proofs, and accept as many of the true proofs as possible. It also wants to be able to update its own proof framework.

Define to be a set of statements in a particular formalism, say those that are grammatically well defined in PA. Let be any sequence from some alphabet of symbols. Let

and be the set of programs that have the property that

In other words, is the set of all programs that never prove false statements. We should never leave or need to talk about any program not in it.

For write to mean . Ie proves

A simple setup would consist of a starting program and a rule that says,

If then you can add to your list of trusted provers. If proves the soundness of , then you can accept any statement when given a proof of it in .

The lobian obstacle is that must be strictly weaker than , in that can prove any statement that can, but can prove the soundness of and can't prove its own soundness. This means that each trusted prover has to be strictly weaker than the one that generated it. You could start with PA+3^^^3 and say that a few weakenings aren't a problem, but that isn't an elegant solution.

Note: You can't get around this problem using cycles. Suppose

This would imply

So any cycle could be shrunk by 1, and inductively, shrunk to a self trusting system.

I propose instead that you use the rule.

If then accept any future proofs that are accepted by , and give all rights given to , including taking to mean that is accepted, and so on recursively. If is sound, then so anything that is proven by can be proven by . If isn't sound, it can prove anything anyway.

Note that if is a straightforward isomorphism of then they are easily proven equivalent. However, if says "run Turing machine for steps, if it doesn't halt, check if is a valid proof of , if does halt, return " then it could be equivalent to from a second order logic perspective, but can't prove that never halts.

Still, this rule allows us to prove anything provable in and only things provable in , while also allowing the user to add shorthands and semantic sugar.

Note that the "proof" could just be the number of processor cycles you want to run a proof search for before giving up. In fact, this framework lets you swap and mix between hand proved results, and automatically proved results (with search time cut off) as you see fit.

This formalism allows a any system containing a proof checker to automatically upgrade itself to a version that has the same Godelian equivalence class, but is more suited to the hardware available.

New Comment
4 comments, sorted by Click to highlight new comments since: Today at 12:54 PM

I've spotted a mistake, I think: the relationship inside the first prover should be an if and only if relationship. This is because if says everything is a theorem, then trivially holds. Thus, the condition to give "proof privileges" should be . There might still be some problems from the modal logic, I'll check when I have some more time.

I can't believe that I was the first one to spot this. The post also has very few upvotes. Did nobody that knows this stuff see this and spot the mistake immediately? Was the post dismissed to start with? (in my opinion unfairly, but perhaps not).

Let f map each prover p1 to one adding (at least) the rule of inference of "If _(p1) proves that _(p1) proves all that p2 does, then f(p1) proves all that p2 does."

It is unclear which blanks are filled with f and which with the identity function to match your proposal. The last f must be there because we can only have the new prover prove additional things. If all blanks are filled with f, f(p1) is inconsistent by Löb's theorem and taking p2 to be inconsistent.

Both blanks are the identity function.

Here is some psudo code

class Prover:

____def new(self):

________self.ps=[PA]

____def prove(self, p, s, b):

________assert p in self.ps

________return p(s,b)

____def upgrade(self, p1, p2, b):

________if self.prove(p1,"forall s:(exists b2: p2(s,b2))=> (exists b1: p2(s,b1))", b)

____________self.ps.append(p2)

prover=Prover()

prover.upgrade(PA, nPA, proof)

Where PA is a specific peano arithmatic proof checker. nPA is another proof checker. and 'proof' is a proof that anything nPA can prove, PA can prove too.

PA+1 can already provide this workflow: Given that nPA proves s and that PA proves all that nPA does, we can get that PA can prove s, and then use the +1 to prove s. And then nnPA can still be handled by PA+1.