This is a special post for quick takes by gum1h0x. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.
Most benchmarks face inherent problem of goodhart's law: as soon as they become a target metric, efforts converge on optimizing for the benchmark itself, potentially diverging from the capabilities it was meant to measure.