Readers who enjoyed this might also like: https://lawrencecpaulson.github.io/2026/01/15/Broken_proofs.html
I agree with everything you said here.
I also think -- and I believe that the Harmonic folks are on the same page here -- that we need a variety of (purely static) techniques for compressing proof-history context about things like, "what tactics have we tried and how did they fail? How is the (potentially very large) proof in which this particular thm lives structured?" etc.
In short, I agree that there is a lot of work to be done around improving ITPs (especially Lean which seems to be the dominant choice for this kind of work) to work well in the kind of RL environments we are discussing :)
That's a good point. Maybe I'm naive, but I think my (very strong) prior is that we are basically locked in to Python and Typescript as the de norm programming languages for normie vibe-coders for the foreseeable future, for a number of reasons including ease of novice human code review, model expertise in these languages, and rapidly proliferating infrastructure and libraries to support vibing in these languages, and thus the challenge is to take such pre-existing code and somehow "bring it in" to the ITP. But you are of course correct that someone could find a way to make it easy and fun for non-FM-pilled engineers to vibe traditional software directly in Lean, and this would simplify things greatly.
Right, those are all kinds of semantic gaps. Absolutely agreed! Mike Dodds is focused/actively working on these specific kinds of questions, if you'd like to chat about them I recommend emailing him directly :)