Eval-Awareness Steering Detects the Test, Not the Sabotage
Produced as part of independent research. Huge thanks to Apollo Research (org) for open-sourcing the deception-detection harness, which proved foundational to this work. Prior work by Devbunova (2026), the Apollo/Goldowsky-Dill probing line, and Tice et al. on noise injection shaped the design throughout. Summary: I test whether the...
Jun 252