x
Eval-Awareness Steering detects the Test, Not the Sabotage — LessWrong