From Confession to Inhibition: A Temporal Curriculum for Self-Monitoring in Language Models — LessWrong