How far do open weights trail the frontier?

RobinHa

18th Jun 2026

This is a linkpost for https://robinhaselhorst.com/blog/open-source-lag

I saw this Twitter post today and really liked the idea. However, I think the AA Index is a rather crude measure, and I much prefer Epoch's ECI, which is built on item response theory (IRT). The resulting graph diverges meaningfully from the one in the Twitter post (which seems to collapse oddly at the end, maybe because it makes no logistic assumptions):

*Each open-weight frontier model and how many months earlier a closed model had reached its ECI.*
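As a rough illustration of how a lag like this could be computed (a minimal sketch, not the code behind the linked site): each model is assumed to be a `(release_date, score)` pair, a month is approximated as 30.44 days, and the function names and scores below are made up for the example.

```python
from datetime import date

def frontier(models):
    """Keep only models that set a new running-best score, in release order."""
    best, out = float("-inf"), []
    for released, score in sorted(models):
        if score > best:
            best = score
            out.append((released, score))
    return out

def months_between(earlier, later):
    """Approximate month difference, assuming an average month of 30.44 days."""
    return (later - earlier).days / 30.44

def open_weight_lag(open_models, closed_models):
    """For each open-weight frontier model, months since a closed model first reached its score."""
    closed_frontier = frontier(closed_models)
    lags = []
    for released, score in frontier(open_models):
        # Earliest closed release whose score is at least this model's score.
        reached = next((d for d, s in closed_frontier if s >= score), None)
        # None: no closed model has reached this score, so no lag is defined.
        lag = None if reached is None else months_between(reached, released)
        lags.append((released, score, lag))
    return lags

# Made-up scores for illustration only, not real ECI values.
closed = [(date(2024, 1, 1), 100.0), (date(2024, 7, 1), 120.0), (date(2025, 1, 1), 140.0)]
open_ = [(date(2024, 6, 1), 95.0), (date(2025, 1, 1), 118.0), (date(2025, 3, 1), 110.0)]
result = open_weight_lag(open_, closed)
```

In this toy data the third open model scores below the previous open best, so it is not a frontier model and gets no lag entry.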

[See the linkpost to interact with the graphs, e.g. to see which model is which.]

For context, here are the two raw frontiers, i.e. the running-best ECI over time for open-weight vs closed models:

*Running-best ECI over time, open weights vs closed; the horizontal gap is the lag above.*

Sadly, GLM-5.2 has not been scored yet, but I'll update the website when it is.

You can also generalize this to other comparisons (though open vs closed is probably the most interesting one). One example is the OpenAI vs Anthropic rivalry:

*Each OpenAI frontier model and how many months ahead of (or behind, below zero) Anthropic it was at that ECI.*
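One plausible reading of this signed version, sketched under assumptions (it is not the site's actual code): for each frontier model of lab A, take the date lab B first reached the same score and subtract A's release date, so positive values mean A was ahead and negative values mean A was behind. The `(release_date, score)` layout, the 30.44-day month, the function names, and the scores are all hypothetical.

```python
from datetime import date

def running_best(models):
    """Keep only models that set a new running-best score, in release order."""
    best, out = float("-inf"), []
    for released, score in sorted(models):
        if score > best:
            best = score
            out.append((released, score))
    return out

def lead_months(lab_a, lab_b):
    """Months by which lab A led lab B at each of A's frontier scores (negative: A trailed)."""
    b_frontier = running_best(lab_b)
    leads = []
    for released, score in running_best(lab_a):
        # Date on which lab B first matched or exceeded this score, if ever.
        reached = next((d for d, s in b_frontier if s >= score), None)
        # None: lab B has not reached this score yet, so the lead is still open-ended.
        lead = None if reached is None else (reached - released).days / 30.44
        leads.append((released, score, lead))
    return leads

# Made-up scores for illustration only, not real ECI values.
lab_a = [(date(2024, 3, 1), 110.0), (date(2024, 9, 1), 130.0), (date(2025, 2, 1), 150.0)]
lab_b = [(date(2024, 1, 1), 115.0), (date(2024, 12, 1), 135.0)]
leads = lead_months(lab_a, lab_b)
```

Here the first lab A model trails by about two months, the second leads by about three, and the third has no value yet because lab B has not matched it.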

3 comments

StanislavKrym (9h ago):

Thank you! I always wanted a site like this to exist so that we could track Chinese progress against American progress. Additionally, I'd like the ability to track the progress of untrustworthy companies like xAI against that of trustworthy ones.

RobinHa (9h ago):

I like the idea, but would you also consider companies like DeepSeek untrustworthy, or should this focus on closed-weight companies?

StanislavKrym (8h ago):

I would extend the untrustworthy label to open-weight companies, which operate at far lower security standards. And that's ignoring the threat of rogue replication, which is easier for open-sourced models to carry out...
