Exploratory: a steering vector in Gemma-2-2B-IT boosts context fidelity on subtraction, goes manic on addition
First LessWrong post / early mech-interp experiment. I’m a software engineer entering this field; feedback on methodology and framing is very welcome. I started this as a hunt for a steering vector for paltering (deception using technically true statements), motivated by the Machine Bullshit paper and prior work on activation steering....