Why Residual Streams Are the Wrong Place to Probe for Safety Signals
Author: David Cappelli (VecP Labs)
Epistemic Status: Empirical result cross-validated on 7 model families (3B–14B). Draft assistance by AI; data and code are original.

The TL;DR
Most safety probing defaults to monitoring the Residual Stream (layer_output). My testing across 7 models suggests this is a mistake. The residual stream acts...
Jan 17