Exploring Reinforcement Learning Effects on Chain-of-Thought Legibility — LessWrong