Emotion and authorization steering both move cheat; trained-probe suppression doesn't undo it: a mechanistic study in Gemma-2-2B
> Emotion steering moves cheat. Suppressing the probe that predicts cheat doesn't move it back. Epistemic status: an independent mechanistic follow-up to Sofroniew et al. (2026), run on a single 2B model (Gemma-2-2B) on a personal workstation. What I'd defend: (1) emotion-direction steering moves cheat substantially — calm+confident swings it...
Jun 17