How load-bearing is KL divergence from a known-good base model in modern RL? — LessWrong