Hermes introduces an architecture that couples a language model with an external, high-reliability verifier: the Lean4 proof assistant.
Instead of just asking the AI “Does this look right?”, it translates key mathematical steps into Lean4 code and asks a proof engine “Can this be formally proved?” If the autoformalization is accurate and Lean finds a proof, this gives a strong mathematical guarantee for that formalized step.
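To make this concrete, here is a toy example of the kind of statement Lean4 can certify (a hypothetical illustration, not drawn from the paper):

```lean
-- A toy formalized step: "for every natural number n, n + 0 = n".
-- If Lean4 accepts this proof term, the step is formally verified;
-- `rfl` works here because `n + 0` reduces to `n` by definition.
theorem step_holds (n : Nat) : n + 0 = n := rfl
```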
Steps (a code sketch of the full loop follows the list):
1. Reasoning (LLM): The model proposes the next logical step of a proof in natural language.
2. Translation Module: An autoformalizer converts that step into a Lean4 statement. A back-translation check compares the Lean statement to the original text to ensure they match in meaning.
3. Prover Module: A prover, working inside Lean4, attempts to prove the statement (or sometimes its negation). It returns a signal such as “proved”, “disproved” (negation proved), or “not decided / failed”.
4. Feedback & Memory:
- If the step is proved, it is stored in a Memory Block (a database of verified facts) and can be retrieved to support later reasoning.
- If the step is disproved or cannot be justified, the LLM is informed of this outcome and is prompted to revise its reasoning rather than continuing from a shaky premise.
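Putting the four steps together, the control loop might look roughly like the Python sketch below. All names here (propose_step, autoformalize, back_translate_matches, run_lean_prover, MemoryBlock) are hypothetical placeholders for illustration, not the system's actual API; the real components would call an LLM and the Lean4 toolchain.

```python
# A minimal sketch of a Hermes-style verify-as-you-go loop, assuming
# hypothetical stand-ins for the LLM and Lean4 components.
from dataclasses import dataclass, field
from enum import Enum, auto


class Verdict(Enum):
    PROVED = auto()      # Lean proved the statement
    DISPROVED = auto()   # Lean proved the statement's negation
    UNDECIDED = auto()   # prover failed or timed out


@dataclass
class MemoryBlock:
    """Database of verified facts, retrievable for later reasoning."""
    facts: list[str] = field(default_factory=list)

    def store(self, lean_stmt: str) -> None:
        self.facts.append(lean_stmt)


def propose_step(context: str, memory: MemoryBlock) -> str:
    """Placeholder: LLM proposes the next proof step in natural language."""
    return "For every natural number n, n + 0 = n."


def autoformalize(step_nl: str) -> str:
    """Placeholder: LLM translates the natural-language step into Lean4."""
    return "theorem step (n : Nat) : n + 0 = n := rfl"


def back_translate_matches(step_nl: str, lean_stmt: str) -> bool:
    """Placeholder: translate the Lean statement back to English and
    check it means the same thing as the original step."""
    return True


def run_lean_prover(lean_stmt: str) -> Verdict:
    """Placeholder: invoke Lean4 on the statement (and its negation)."""
    return Verdict.PROVED


def hermes_step(context: str, memory: MemoryBlock) -> str:
    step_nl = propose_step(context, memory)          # 1. Reasoning
    lean_stmt = autoformalize(step_nl)               # 2. Translation
    if not back_translate_matches(step_nl, lean_stmt):
        return "retranslate"   # formalization drifted from the text
    verdict = run_lean_prover(lean_stmt)             # 3. Prover
    if verdict is Verdict.PROVED:                    # 4. Feedback & Memory
        memory.store(lean_stmt)   # verified fact, reusable later
        return "continue"
    # Disproved or undecided: tell the LLM so it revises its reasoning
    # rather than continuing from a shaky premise.
    return "revise"


if __name__ == "__main__":
    memory = MemoryBlock()
    print(hermes_step("prove something about Nat", memory), memory.facts)
```

The key design choice this sketch tries to capture is that only statements Lean actually proves enter the memory, so later reasoning retrieves exclusively verified facts.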
In this way, Hermes interleaves informal LLM reasoning with formal Lean4 checking, using the proof assistant as an external source of ground truth for crucial steps.
In short: integrating LLMs with Lean prover scaffolding. (Summary provided by Xixidu.)