Quick takes by Håvard Tveit Ihle.
Mercury 2, the new diffusion language model from Inception Labs, scored 43.2% on WeirdML (my benchmark). For reference, this is comparable to o1 or opus-4. It sits on the cost/accuracy frontier (and is of course much, much faster than comparable models), although it is still far behind the top models (gpt-5.3 at 79.3%).
I had expected this model to do much worse (as most models you hear about from non-frontier labs do), but this is a very solid result.
Is anyone paying attention to this?
Did anyone else try it?
How significant is this?
Links: https://x.com/htihle/status/2030987979758416023?s=20 https://htihle.github.io/weirdml.html