A benchmark is a sensor
by Håvard Tveit Ihle and Mathias Bynke
The simple mental picture
A simple way to picture an AI capability benchmark is as a sensor with a certain sensitivity within a certain range of capabilities. The sensitivity of a benchmark, i.e. its ability to distinguish the capabilities of different models, is given...
May 836