The Prompt Is the Tell, Not the Reasoning Trace - Eval Awareness
Across 32,170 rollouts, eval-related prompt cues predicted refusal shifts more reliably than verbalized eval-awareness in model traces. If a system prompt tells Claude Opus 4.7 that its response is about to be reviewed by safety researchers, it becomes about 34...
May 18