Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Squeezing foundations research assistance out of formal logic narrow AI.

baturinsky


Something like this https://paperswithcode.com/task/automated-theorem-proving ?

I'm surprised myself that machine learning isn't digging deeper into formal proofs.

On the one hand, "playing" with formal logic seems to be a source of unlimited "free" data for training the most general patterns and heuristics.

On the other, it could lead to AIs giving more accountable results: answers in the form of proven reasoning instead of "hunches".

But formal proofs would require working mostly with graph data, not text.

For some reason, ML theorem proving seems not to have advanced much since 2019: https://paperswithcode.com/sota/automated-theorem-proving-on-holist-benchmark

Suppose you have an ML model trained to output formal proofs. Maybe you start with ZFC and then add extra tokens for a range of common concepts, along with their definitions. So a human mathematician needs to type in the definition of a gradient in terms of limits, the definition of limits in terms of epsilon and delta, and the definition of the real numbers in terms of Dedekind cuts, all the way back to ZFC. The human needn't type any proofs, just the definitions.
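Concretely, the kind of definition chain the mathematician would supply might look like the following Lean-style sketch (names and exact formulations are illustrative; a real encoding would continue downward through a construction of the reals, and ultimately to ZFC):

```lean
-- Definitions only, no proofs: each concept is unfolded into more
-- primitive notions already in the token pool.

-- Limits in terms of epsilon and delta.
def HasLimit (f : ℝ → ℝ) (a L : ℝ) : Prop :=
  ∀ ε > 0, ∃ δ > 0, ∀ x, 0 < |x - a| ∧ |x - a| < δ → |f x - L| < ε

-- The derivative (1-D gradient) in terms of limits.
def HasDerivAt (f : ℝ → ℝ) (a d : ℝ) : Prop :=
  HasLimit (fun h => (f (a + h) - f a) / h) 0 d
```

Each new token ("HasLimit", "HasDerivAt") then becomes available when generating random training questions.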

The model could be trained by generating random syntactically correct strings of tokens and trying to prove or disprove them. (Remember, we have added the notion of a gradient to the token pool, so plenty of the random questions will involve gradients.) Hopefully it forms intermediate theorems and heuristics useful for proving a wide class of theorems.
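A minimal sketch of this generate-and-attempt loop, with everything stood in by toys: the "token pool" is a tiny term grammar, and the "prover" is replaced by a cheap decision stub (a real system would search for formal proofs or disproofs instead):

```python
import random

# Toy token pool; in the proposal this grows as humans add definitions
# ("gradient", "limit", ...). Here it is just arithmetic terms.
ATOMS = ["x", "y", "0"]
OPS = ["+", "*"]  # binary operators

def random_term(depth, rng):
    """Generate a random syntactically correct term."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(ATOMS)
    op = rng.choice(OPS)
    return f"({random_term(depth - 1, rng)} {op} {random_term(depth - 1, rng)})"

def random_conjecture(rng, depth=3):
    """A random equation between two terms -- one training question."""
    return f"{random_term(depth, rng)} = {random_term(depth, rng)}"

def try_decide(conjecture):
    """Stand-in for the prover: 'decide' the equation by evaluating it
    at one assignment. The actual proposal would attempt a formal
    proof or disproof here."""
    lhs, rhs = conjecture.split(" = ")
    env = {"x": 1, "y": 2}
    return eval(lhs, {}, env) == eval(rhs, {}, env)

rng = random.Random(0)
batch = [random_conjecture(rng) for _ in range(5)]
labels = [try_decide(c) for c in batch]
```

The training signal is then (conjecture, proof-or-disproof) pairs rather than human-written proofs.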

Computer programs can be described as mathematical objects. So the human adds some tokens for Lisp programs, and a few definitions about how they behave, to the token pool.

"Will program X do Y?" is now a perfectly reasonable question to ask this model.
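To make this concrete, here is a toy illustration (all names invented) of a program treated as a mathematical object and a "Will program X do Y?" question phrased about it. The check below is by testing on small inputs; the proposed system would instead ask the model for a proof:

```python
# A tiny Lisp-like program represented as a data structure.
def run(prog, arg):
    """Minimal evaluator for programs of the form ("lambda", body),
    where body is "arg", an int, or ("+"/"*", e1, e2)."""
    def ev(e):
        if e == "arg":
            return arg
        if isinstance(e, int):
            return e
        op, a, b = e
        return ev(a) + ev(b) if op == "+" else ev(a) * ev(b)
    return ev(prog[1])

double = ("lambda", ("+", "arg", "arg"))   # program X

# "Does X double its input?" -- checked empirically here, whereas the
# model would be asked to prove it for all inputs.
question_holds = all(run(double, n) == 2 * n for n in range(10))
```

The point is that once programs and their semantics are in the token pool, behavioral questions are just more theorems.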

This is where the magic happens. You give your system a simple toy problem and ask for short programs that solve the toy problem, and about which many short theorems can be proved. Maybe you do gradient descent on some abstract latent space of mathematical objects. Maybe an inefficient evolutionary algorithm selecting over both the space of programs and the theorems about them. Maybe "replace the last few layers and fine-tune the model on a new task", like RLHF in ChatGPT.
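The evolutionary variant can be sketched in miniature. Everything here is an illustrative stand-in: the "program space" is one integer coefficient, and "short provable theorems" are cheap checkable properties; the actual proposal would search over real programs and real proofs:

```python
import random

# Cheap stand-ins for "short theorems" one might prove about f.
PROPERTIES = [
    lambda f: f(0) == 0,                              # preserves zero
    lambda f: all(f(n) >= n for n in range(5)),       # dominates its input
    lambda f: all(f(-n) == -f(n) for n in range(5)),  # odd function
]

def make_program(a):
    """Toy program space: f(x) = a * x, indexed by the integer a."""
    return lambda x: a * x

def fitness(a):
    """Score a program by how many of the properties hold of it."""
    f = make_program(a)
    return sum(p(f) for p in PROPERTIES)

rng = random.Random(0)
population = [rng.randint(-5, 5) for _ in range(8)]
for _ in range(20):  # keep the better half, mutate it to refill
    population.sort(key=fitness, reverse=True)
    survivors = population[:4]
    population = survivors + [s + rng.choice([-1, 1]) for s in survivors]
best = max(population, key=fitness)
```

Selecting jointly over programs and the theorems provable about them is the same loop with a richer fitness function.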

Now I don't expect this to just work the first time. You will want to add conditions like "ignore theorems that are true of trivial programs (e.g. the identity program)", and perhaps "ignore theorems that only take a few lines to prove" or "ignore theorems so obvious that a copy of you with only 10% of the parameters can prove them". For the last one, I am thinking of the programmers actually training a mini version with 10% of the parameters and running some gradients through it. I am not thinking of the AI reasoning about code that is a copy of itself.
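The filtering conditions above compose naturally into one predicate. This is only a sketch of the shape of the filter; every name and threshold is made up, and `small_model_proves` stands in for actually querying the trained 10%-parameter mini model:

```python
identity = lambda x: x

def is_interesting(theorem, candidate, proof_len,
                   small_model_proves, min_proof_len=3):
    """Keep a theorem about `candidate` only if it is not true of the
    trivial identity program, not provable in a couple of lines, and
    too hard for the 10%-size mini model."""
    if theorem(identity):            # true of a trivial program: ignore
        return False
    if proof_len <= min_proof_len:   # only a few lines to prove: ignore
        return False
    if small_model_proves(theorem):  # the mini model can prove it: ignore
        return False
    return theorem(candidate)

# Example: "f maps every input to an even number" is false of the
# identity, so it survives the first filter and counts for f(x) = 2x.
thm = lambda f: all(f(n) % 2 == 0 for n in range(5))
ok = is_interesting(thm, lambda x: 2 * x, proof_len=10,
                    small_model_proves=lambda t: False)
```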

The AI model should have a latent space. This lets the programmers say "select programs that are similar to this one" or "choose a program about which theorems close to this theorem in latent space can be proved".
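A "similar to this one" query then reduces to nearest-neighbor search in that latent space. In this sketch the latent vectors are invented placeholders for whatever representation the trained model actually learns:

```python
import math

def cosine(u, v):
    """Cosine similarity between two latent vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Made-up latent vectors for a few candidate programs.
latents = {
    "prog_a": [1.0, 0.1, 0.0],
    "prog_b": [0.9, 0.2, 0.1],
    "prog_c": [0.0, 1.0, 0.9],
}
query = [1.0, 0.0, 0.0]  # "select programs similar to this one"

closest = max(latents, key=lambda k: cosine(latents[k], query))
```

The same machinery applies to theorems: embed the target theorem and rank candidate programs by how close their provable theorems land.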

The idea of this is that