Eval Hacking: A new frontier in AI eval with a unified taxonomy
TLDR: We present a simple, practical taxonomy for eval hacking (unfaithful eval results) that cleans up a tangle of existing terms (specification/task gaming, reward/proxy hacking, benchmaxxing...) and offers a holistic framework for this emerging frontier.

Motivation

When we talk about AI eval, there are many messy terms flying around...