x
Predicting When RL Training Breaks Chain-of-Thought Monitorability — LessWrong