How do LLMs generalize when we do training that is intuitively compatible with two off-distribution behaviors? — LessWrong