Hermes introduces an architecture that couples a language model with an external, high-reliability verifier: the Lean4 proof assistant.
Instead of just asking the AI “Does this look right?”, it translates key mathematical steps into Lean4 code and asks a proof engine “Can this be formally proved?” If the autoformalization is accurate and Lean finds a proof, this gives a strong mathematical guarantee for that formalized step.
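To make this concrete, here is a toy example of the kind of statement Lean4 can certify (a hypothetical illustration, not drawn from the paper):

```lean
-- A toy formalized step: "for every natural number n, n + 0 = n".
-- If Lean4 accepts this proof term, the step is formally verified;
-- `rfl` works here because `n + 0` reduces to `n` by definition.
theorem step_holds (n : Nat) : n + 0 = n := rfl
```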
Steps (a code sketch of the full loop follows the list):
1. Reasoning (LLM): The model proposes the next logical step of a proof in natural language.
2. Translation Module: An autoformalizer converts that step into a Lean4 statement. A back-translation check compares the Lean statement to the original text to ensure they match in meaning.
3. Prover Module: A prover, working inside Lean4, attempts to prove the statement (or sometimes its negation). It returns a signal such as “proved”, “disproved” (negation proved), or “not decided / failed”.
4. Feedback & Memory:
- If the step is proved, it is stored in a Memory Block (a database of verified facts) and can be retrieved to support later reasoning.
- If the step is disproved or cannot be justified, the LLM is informed of this outcome and is prompted to revise its reasoning rather than continuing from a shaky premise.
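Putting the four steps together, the control loop might look roughly like the Python sketch below. All names here (propose_step, autoformalize, back_translate_matches, run_lean_prover, MemoryBlock) are hypothetical placeholders for illustration, not the system's actual API; the real components would call an LLM and the Lean4 toolchain.

```python
# A minimal sketch of a Hermes-style verify-as-you-go loop, assuming
# hypothetical stand-ins for the LLM and Lean4 components.
from dataclasses import dataclass, field
from enum import Enum, auto


class Verdict(Enum):
    PROVED = auto()      # Lean proved the statement
    DISPROVED = auto()   # Lean proved the statement's negation
    UNDECIDED = auto()   # prover failed or timed out


@dataclass
class MemoryBlock:
    """Database of verified facts, retrievable for later reasoning."""
    facts: list[str] = field(default_factory=list)

    def store(self, lean_stmt: str) -> None:
        self.facts.append(lean_stmt)


def propose_step(context: str, memory: MemoryBlock) -> str:
    """Placeholder: LLM proposes the next proof step in natural language."""
    return "For every natural number n, n + 0 = n."


def autoformalize(step_nl: str) -> str:
    """Placeholder: LLM translates the natural-language step into Lean4."""
    return "theorem step (n : Nat) : n + 0 = n := rfl"


def back_translate_matches(step_nl: str, lean_stmt: str) -> bool:
    """Placeholder: translate the Lean statement back to English and
    check it means the same thing as the original step."""
    return True


def run_lean_prover(lean_stmt: str) -> Verdict:
    """Placeholder: invoke Lean4 on the statement (and its negation)."""
    return Verdict.PROVED


def hermes_step(context: str, memory: MemoryBlock) -> str:
    step_nl = propose_step(context, memory)          # 1. Reasoning
    lean_stmt = autoformalize(step_nl)               # 2. Translation
    if not back_translate_matches(step_nl, lean_stmt):
        return "retranslate"   # formalization drifted from the text
    verdict = run_lean_prover(lean_stmt)             # 3. Prover
    if verdict is Verdict.PROVED:                    # 4. Feedback & Memory
        memory.store(lean_stmt)   # verified fact, reusable later
        return "continue"
    # Disproved or undecided: tell the LLM so it revises its reasoning
    # rather than continuing from a shaky premise.
    return "revise"


if __name__ == "__main__":
    memory = MemoryBlock()
    print(hermes_step("prove something about Nat", memory), memory.facts)
```

The key design choice this sketch tries to capture is that only statements Lean actually proves enter the memory, so later reasoning retrieves exclusively verified facts.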
In this way, Hermes interleaves informal LLM reasoning with formal Lean4 checking, using the proof assistant as an external source of ground truth for crucial steps.
In short: integrating LLMs with Lean prover scaffolding. (Summary provided by Xixidu.)