This one is even more surprising than OpenAI's entry, at least in its details. Since models can now write proofs well automatically (even if it costs a lot of money and time), in a few months regular reasoning models might get enough training data to reliably understand proofs directly, and that's an important basic ingredient for STEM capabilities.
I think it is important to note that Gemini 2.5 Pro is capable of winning gold at IMO 2025, given good enough scaffolding and prompt engineering.
I wonder if a single model wrote its solutions itself, or if it had a messier CoT like o3, which humans (or another instance of Gemini) translated into LaTeX.
Google DeepMind announces that they've also achieved a gold medal at the IMO.
They've exactly matched OpenAI, getting perfect scores on the first five problems and flunking the sixth.
They're using what sounds like an experimental general version of Gemini, which they then fine-tuned for the IMO, rather than a maths-specific model.
Their solutions were checked by the IMO (unlike OpenAI's) and read much more like neat mathematical proofs than the raw scratchpad output that OpenAI turned in.