Geometric Policy Heads for LLM Safety: A Negative Adversarial Transfer Result — LessWrong