Integrating LLMs with Lean prover scaffolding.
Summary provided by Xixidu:
Hermes introduces an architecture that couples a language model to an external, high-reliability verifier: the Lean4 proof assistant.
Instead of just asking the AI “Does this look right?”, it translates key mathematical steps into Lean4 code and asks a proof engine “Can this be formally proved?” If the autoformalization is accurate and Lean finds a proof, this gives a strong mathematical guarantee for that formalized step.
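As a toy illustration (my own, not an example from the Hermes paper): if the model asserts the informal step "for natural numbers a and b, a + b equals b + a", a faithful Lean4 formalization that the proof engine can certify might be:

```lean
-- Hypothetical autoformalizer output for the informal step
-- "for natural numbers a and b, a + b equals b + a".
theorem step_add_comm (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b  -- closes the goal with a core library lemma
```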
Steps:
1. Reasoning (LLM): The model proposes the next logical step of a proof in natural language.
2. Translation Module: An autoformalizer converts that step into a Lean4 statement. A back-translation check renders the Lean statement back into natural language and compares it with the original text to ensure the two match in meaning (see the first sketch after this list).
3. Prover Module: A prover, working inside Lean4, attempts to prove the statement (or, failing that, its negation). It returns a three-valued signal: "proved", "disproved" (negation proved), or "not decided / failed" (see the second sketch after this list).
4. Feedback & Memory:
- If the step is proved, it is stored in a Memory Block (a database of verified facts) and can be retrieved to support later reasoning.
- If the step is disproved or cannot be justified, the LLM is informed of this outcome and is prompted to revise its reasoning rather than continuing from a shaky premise.
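A minimal sketch of the translation step in Python, assuming a hypothetical `llm()` callable and using a crude lexical `similarity()` as a stand-in for whatever semantic check Hermes actually uses:

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Crude lexical proxy; a real back-translation check would use
    an LLM judge or embedding similarity instead."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def autoformalize(informal_step: str, llm) -> str | None:
    """Translate an informal step into a candidate Lean4 statement,
    then back-translate and check that the meaning survived."""
    lean_stmt = llm(f"Translate into a Lean4 statement:\n{informal_step}")
    back = llm(f"State this Lean4 statement in plain English:\n{lean_stmt}")
    if similarity(informal_step, back) < 0.8:  # threshold is an assumption
        return None  # reject formalizations that drifted in meaning
    return lean_stmt
```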
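The prover's three-valued verdict could be wrapped like this, assuming a hypothetical `check_lean()` that runs the Lean4 toolchain on a bare proposition and reports whether any proof was found:

```python
from enum import Enum

class Verdict(Enum):
    PROVED = "proved"
    DISPROVED = "disproved"           # the negation was proved
    UNDECIDED = "not decided / failed"

def verify(prop: str, check_lean) -> Verdict:
    """Try the statement, then its negation; a prover timeout or
    failure proves nothing, so it maps to UNDECIDED."""
    if check_lean(prop):
        return Verdict.PROVED
    if check_lean(f"¬ ({prop})"):     # assumes prop is a bare Lean proposition
        return Verdict.DISPROVED
    return Verdict.UNDECIDED
```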
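Putting the pieces together, the feedback-and-memory loop might look like the following (reusing `autoformalize` and `verify` from the sketches above; every prompt string and helper name here is my guess, not the paper's API):

```python
def hermes_style_loop(goal: str, llm, check_lean, max_steps: int = 20) -> list[str]:
    memory: list[str] = []  # Memory Block: database of Lean-verified facts
    feedback = ""
    for _ in range(max_steps):
        step = llm(
            f"Goal: {goal}\nVerified facts so far: {memory}\n"
            f"{feedback}Propose the next proof step in plain English."
        )
        lean_stmt = autoformalize(step, llm)
        verdict = verify(lean_stmt, check_lean) if lean_stmt else Verdict.UNDECIDED
        if verdict is Verdict.PROVED:
            memory.append(lean_stmt)  # retrievable to support later reasoning
            feedback = ""
        else:
            # Disproved or undecided: tell the model and ask it to revise
            # instead of building on a shaky premise.
            feedback = f"Your last step was rejected ({verdict.value}); revise it.\n"
    return memory  # termination / final-goal check omitted for brevity
```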
In this way, Hermes interleaves informal LLM reasoning with formal Lean4 checking, using the proof assistant as an external source of ground truth for crucial steps.