Exploring the capabilities spike with METR's time horizon data: no clear signal
Fair warning: this is mostly a null result. I tried to figure out what drives the capabilities spike using METR’s time horizon data, and didn’t find much signal. I’m sharing it because it seems good to share null results.

Key takeaways

* The “capabilities spike” refers to the observation that...