Why does off-model SFT degrade capabilities? — LessWrong