Where we are on evaluation awareness
TL;DR: Evaluation awareness (a model recognizing that it is being tested rather than deployed) is being reported in frontier models with increasing frequency and has started appearing in external safety reviews, such as those of Opus 4.6 by Apollo and METR. It is also becoming harder to detect, with the most...
Apr 25