This is a linkpost for https://robinhaselhorst.com/blog/open-source-lag
Thank you! I always wanted a site like this to exist so that we could track Chinese progress against American progress. I'd also like to be able to track the progress of untrustworthy companies like xAI against that of trustworthy ones.
I like the idea, but would you also consider companies like DeepSeek untrustworthy, or should this focus on closed-weight companies?
I would extend untrustworthiness to open-weight companies which operate at far lower security standards. And that's ignoring the threat of rogue replication, which open-sourced models find easier to enact...
I saw this Twitter post today and really liked the idea. But I think the AA Index is a rather crude measure, and I much prefer ECI from Epoch, which uses IRT (item response theory). The resulting graph diverges meaningfully from the one in the Twitter post (which weirdly seems to collapse at the end, maybe because it makes no logistic assumptions):
Each open-weight frontier model and how many months earlier a closed model had reached its ECI.
[See the linkpost to interact with the graphs, e.g. to see which model is which.]
For context, the two raw frontiers - the running best ECI over time for open-weight vs closed models:
Running-best ECI over time. Open weights vs closed; the horizontal gap is the lag above.
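For anyone who wants to reproduce the idea, here is a minimal sketch of how a running-best frontier and the horizontal lag between two frontiers could be computed. It is not the site's actual code: the function names, the step-function treatment of the frontier (no interpolation between releases), the 30.44-days-per-month conversion, and the toy scores are all assumptions made for illustration.

```python
from datetime import date

DAYS_PER_MONTH = 30.44  # assumption: average month length


def running_best(releases):
    """Return the frontier: releases that strictly beat every earlier score."""
    frontier = []
    best = float("-inf")
    for day, score in sorted(releases):
        if score > best:
            best = score
            frontier.append((day, score))
    return frontier


def first_reached(frontier, score):
    """Date on which the frontier first reached at least `score`, else None."""
    for day, best in frontier:
        if best >= score:
            return day
    return None


def lag_months(follower, leader):
    """For each follower frontier model, months since the leader hit its score.

    Negative values mean the follower got there first. Models whose score
    the leader never reached are skipped.
    """
    leader_frontier = running_best(leader)
    lags = []
    for day, score in running_best(follower):
        reached = first_reached(leader_frontier, score)
        if reached is not None:
            lags.append((day, score, (day - reached).days / DAYS_PER_MONTH))
    return lags


# Toy, made-up (release date, score) pairs; not real ECI values.
closed_models = [
    (date(2024, 1, 1), 100),
    (date(2024, 7, 1), 120),
    (date(2025, 1, 1), 140),
]
open_models = [
    (date(2024, 4, 1), 95),
    (date(2024, 10, 1), 100),
    (date(2025, 4, 1), 120),
    (date(2025, 5, 1), 115),  # below the open frontier, so it is ignored
]

lags = lag_months(open_models, closed_models)
for day, score, months in lags:
    print(f"{day}  score={score}  lag={months:.1f} months")
```

Swapping the two arguments, or passing any other pair of groups, gives the same comparison for other splits; flip the sign if you want "months ahead" rather than "months behind".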
Sadly, GLM-5.2 has not been scored yet, but I'll update the website when it is.
You can also generalize to other criteria (though this is probably the most interesting one). One such example would be the OpenAI vs Anthropic rivalry:
Each OpenAI frontier model and how many months ahead of (or behind, below zero) Anthropic it was at that ECI.