Basically, the linkpost argues that benchmark progress became disconnected from broader progress because AI advanced so quickly that benchmark designers did not anticipate their benchmarks becoming obsolete so soon.

More detail is in the link above.

Discuss!

Relatedly, another part of the story may be that researchers underestimated the rate of AI progress. For instance, researchers working on autoregressive language models in 2016 tended not to expect AI systems to perform economically useful tasks, viewing these systems as insufficiently capable of doing so. As a result, benchmarks may have been deliberately designed as relatively cheap and simple proxies for real-world tasks.
