Quick takes by Håvard Tveit Ihle.
Mercury 2, the new diffusion language model from Inception Labs, scored 43.2% on WeirdML (my benchmark). For reference, this is comparable to o1 or opus-4. It sits on the cost/accuracy frontier (and is of course much, much faster than comparable models), although it is still far behind the top models (gpt-5.3 at 79.3%).
I had expected this model to do much worse (as most models you hear about from non-frontier labs do), but this is a very solid result.
Is anyone paying attention to this?
Did anyone else try it?
How significant is this?
Links: https://x.com/htihle/status/2030987979758416023?s=20 https://htihle.github.io/weirdml.html