While I think it is plausible the results would have been different if the devs had had e.g. 100 more hours of experience with Cursor, it is also worth noting that:
- 14/16 of the devs rated themselves as 'average' or better Cursor users at the end of the study
- The METR staff working on the project thought the devs were qualitatively reasonable Cursor users (based on screen recordings etc.)
So I think it is unlikely the devs were using Cursor in an unusually unskilled way.
The forecasters were told that only 25% of the devs had prior Cursor experience (the actual number ended up being 44%), and they still predicted a substantial speedup. So if there is a steep Cursor learning curve here, it seems to be something people didn't expect.
With all that said, the skill ceiling for using AI tools is clearly at least *not being slowed down* (since the devs could simply choose not to use the AI tools), so it would be reasonable to expect that some level of experience would eventually lead to that result.
(I consulted with METR on the stats in the paper, so I am quite familiar with it.)