Bypassing Refusal Behavior in Qwen Models via Activation Steering — LessWrong