Vestigial reasoning in RL — LessWrong