Why the new generation of “reasoning LLMs” is so exciting, and how the authors trained their model to actually think. (This is a translation of my original post in Russian).
Overview
In recent months, there’s been a growing buzz around “reasoning models,” i.e., large language models (LLMs) that can do more than just continue a text in a plausible style: they solve complex tasks step by step, using an internal chain-of-thought (CoT). The first notable demonstration of this technique was OpenAI’s “o1” (though many of its details remain under wraps). Recently, the DeepSeek team stirred the community by releasing two open-weight variants, R1 and R1-Zero, built on top of their own large Mixture-of-Experts (MoE) model, DeepSeek-V3.
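To make the idea of a visible chain-of-thought concrete: R1-style models emit their internal reasoning between <think>…</think> tags before the user-facing answer. Below is a minimal sketch of splitting such a response into its reasoning trace and final answer; the helper name and the demo response text are my own illustration, not part of DeepSeek's tooling.

```python
import re

def split_reasoning(response: str) -> tuple[str, str]:
    """Split an R1-style response into (chain-of-thought, final answer).

    R1 wraps its internal reasoning in <think>...</think> tags
    before emitting the user-facing answer.
    """
    match = re.search(r"<think>(.*?)</think>", response, flags=re.DOTALL)
    if match is None:
        return "", response.strip()          # no visible reasoning trace
    reasoning = match.group(1).strip()
    answer = response[match.end():].strip()  # everything after the trace
    return reasoning, answer

# Invented example response, for illustration only.
demo = "<think>2 + 2 is 4, times 3 is 12.</think>The answer is 12."
cot, answer = split_reasoning(demo)
print(cot)     # → 2 + 2 is 4, times 3 is 12.
print(answer)  # → The answer is 12.
```

The point is simply that the reasoning is ordinary generated text, separated from the answer by a convention the model was trained to follow.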
I won’t...