Power Steering: Behavior Steering via Layer-to-Layer Jacobian Singular Vectors
Cross-posted from my blog. TL;DR: The map of how the activations of one ‘source’ layer in an LLM affect the activations of some later ‘target’ layer can provide vectors for steering LLM behavior. Computing this map (the Jacobian) in full is costly, but its top components can be determined...
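To make the idea in the teaser concrete, here is a minimal sketch in NumPy. Everything in it is an assumption made for illustration: a toy two-layer tanh map stands in for the source-to-target portion of an LLM, and power iteration on Jacobian-vector products is used as one standard way to get the top singular vector without forming the Jacobian. The excerpt above is truncated, so this should not be read as the post's actual method. All function names are hypothetical.

```python
import numpy as np

def make_toy_layers(dim=16, seed=0):
    """Random weights for a toy source->target map (assumption, not an LLM)."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(size=(dim, dim)) / np.sqrt(dim)
    W2 = rng.normal(size=(dim, dim)) / np.sqrt(dim)
    return W1, W2

def local_slopes(W1, W2, x):
    """tanh derivatives at x for the toy map f(x) = tanh(W2 @ tanh(W1 @ x))."""
    h1 = np.tanh(W1 @ x)
    h2 = np.tanh(W2 @ h1)
    return 1.0 - h1**2, 1.0 - h2**2

def jvp(W1, W2, d1, d2, v):
    """Jacobian-vector product J @ v, computed without building J."""
    return d2 * (W2 @ (d1 * (W1 @ v)))

def vjp(W1, W2, d1, d2, u):
    """Transposed product J.T @ u, computed without building J."""
    return W1.T @ (d1 * (W2.T @ (d2 * u)))

def top_singular_vector(W1, W2, x, iters=2000, seed=1):
    """Power iteration on J.T @ J: returns (sigma, v) for the top singular pair."""
    d1, d2 = local_slopes(W1, W2, x)
    v = np.random.default_rng(seed).normal(size=x.shape)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        v = vjp(W1, W2, d1, d2, jvp(W1, W2, d1, d2, v))
        v /= np.linalg.norm(v)
    sigma = np.linalg.norm(jvp(W1, W2, d1, d2, v))
    return sigma, v

def full_jacobian(W1, W2, x):
    """Dense Jacobian diag(d2) @ W2 @ diag(d1) @ W1, used only as a check."""
    d1, d2 = local_slopes(W1, W2, x)
    return (d2[:, None] * W2) @ (d1[:, None] * W1)

if __name__ == "__main__":
    W1, W2 = make_toy_layers()
    x = np.random.default_rng(2).normal(size=16)
    sigma, v = top_singular_vector(W1, W2, x)
    # Compare against the dense SVD, which is affordable only at toy scale.
    S = np.linalg.svd(full_jacobian(W1, W2, x), compute_uv=False)
    print("power iteration sigma:", sigma, "dense SVD sigma:", S[0])
```

In this toy setting, a steering vector would be a scaled copy of `v` added to the source activations, for example `x + alpha * v` with a hypothetical scale `alpha`. In a real model the two products would come from automatic differentiation rather than hand-written formulas.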
Mar 132