Preventing Language Models from hiding their reasoning — LessWrong