(Mis)generalization of Helpful-Only Fine-tuning — LessWrong