I just found out that METR released an updated version of their time horizons work with extra tasks and different evaluation infrastructure. This was released on 29th Jan and I think has been overshadowed by the Moltbook stuff.
Main points:
Similar overall trend since 2021
Over the period since 2023, the doubling time for the 50% time horizon went from 165 days with version 1.0 to 131 days with version 1.1
The top model, Claude Opus 4.5, has gone from a time horizon of 4h49 to 5h20
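To get a feel for what those numbers imply, here is a rough back-of-the-envelope sketch. It is my own arithmetic, not something taken from METR's write-up: it assumes a clean exponential trend and a 365-day year, and the function names are made up for illustration.

```python
def yearly_growth_factor(doubling_time_days: float, days_per_year: float = 365.0) -> float:
    """Factor by which the time horizon multiplies in one year.

    Assumes the horizon grows as a clean exponential with a fixed doubling time.
    """
    return 2.0 ** (days_per_year / doubling_time_days)


def to_minutes(hours: int, minutes: int) -> int:
    """Convert an 'XhYY' style duration into minutes."""
    return hours * 60 + minutes


# Doubling times quoted above (days), for the period since 2023.
old_factor = yearly_growth_factor(165)  # version 1.0
new_factor = yearly_growth_factor(131)  # version 1.1

# Top model's 50% time horizon under each version.
old_horizon = to_minutes(4, 49)
new_horizon = to_minutes(5, 20)
relative_change = new_horizon / old_horizon - 1

print(f"Implied yearly growth: {old_factor:.2f}x (1.0) vs {new_factor:.2f}x (1.1)")
print(f"Top model horizon: {old_horizon} min -> {new_horizon} min ({relative_change:+.1%})")
```

Under those assumptions, the shorter doubling time is the bigger change: the implied yearly multiplier moves much more than the top model's own horizon estimate does.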