Unsupervised Discovery of Steering Vectors
Searches for steering vectors are often confirmatory: we find “honesty” or “politeness” vectors only because we know to look for them. This work introduces an unsupervised method for discovering interesting, steerable behaviours directly from the activation space. I trained 20 LoRAs on Qwen1.5-14b-instruct, of which five correspond...
Mar 111