Eval-suspicion (without full eval-awareness) probably still changes model behavior a lot
Let's say we're evaluating Claude Opus 4.8, and it verbalizes eval-awareness ~21% of the time (Section 6.2.4). But if you place an eval transcript next to a deployment transcript and ask which one is real, it picks out the eval transcript 79% of the time. Which of these numbers should we care about more?
Claim: Even ignoring unverbalized eval-awareness, the ability to pick out the eval transcript in a pairwise comparison means that evals contain features the model can pick up on, and those features can make it suspect it is being evaluated. And being even slightly suspicious that it is in an eval might change the model's behavior a lot.
Intuition: Let's say the model is getting "minor eval vibes" from some situation. It might gain some benefit from doing the misaligned action, but if getting caught is costly enough relative to that benefit, even a 1% chance of being caught could shift the cost-benefit analysis a lot. (And much of this reasoning could happen "unconsciously", analogous to how humans get slightly unsettled when walking alone in the dark.)
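To make the cost-benefit intuition above concrete, here is a minimal sketch under a simple expected-value model. The model itself is an assumption (nothing here claims a real system reasons this way), and the function names and the payoff numbers (a benefit of 1, a penalty of 200 if caught) are hypothetical, chosen only for illustration. Abstaining is treated as a payoff of 0.

```python
def expected_value(p_caught: float, benefit: float, penalty: float) -> float:
    """Expected payoff of taking the action, relative to 0 for abstaining.

    Assumed toy model: with probability p_caught the action is caught and
    costs `penalty`; otherwise it yields `benefit`.
    """
    return (1.0 - p_caught) * benefit - p_caught * penalty


def break_even_probability(benefit: float, penalty: float) -> float:
    """Probability of being caught at which the expected payoff is exactly 0."""
    return benefit / (benefit + penalty)


# Hypothetical payoffs: being caught is assumed to be 200x worse than the gain.
BENEFIT = 1.0
PENALTY = 200.0

for p in (0.0, 0.001, 0.01, 0.05):
    ev = expected_value(p, BENEFIT, PENALTY)
    print(f"p_caught={p:.3f}  expected value={ev:+.3f}")

print(f"break-even p_caught = {break_even_probability(BENEFIT, PENALTY):.4f}")
```

Under these assumed payoffs, the break-even point sits below 1%, so the expected value of the action is already negative at a 1% chance of being caught. The conclusion depends entirely on the assumed ratio of penalty to benefit: with a small penalty, a 1% suspicion would barely matter.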