x
Single Direction vs Low-Rank Refusal in Small LLMs — LessWrong