What Happens When You Try to Merge AR Reasoning Into Diffusion Models
Diffusion language models have been getting a lot of attention lately. Models like LLaDA, Dream 7B, Mercury, and SDAR perform on par with autoregressive models on standard benchmarks while offering 2-4x faster inference through parallel token generation. For a deeper dive into how diffusion LMs work, see our previous post. SDAR...
Jan 16