SAE features for refusal and sycophancy steering vectors — LessWrong