Investigating the consequences of accidentally grading CoT during RL — LessWrong