Eval-Detection Dominates the L12 Sandbagging Mechanism in Gemma 2 2B IT — LessWrong