Reasoning Long Jump: Why we shouldn’t rely on CoT monitoring for interpretability
In Part One, we explore how models sometimes rationalise a preconceived answer when generating a chain of thought, rather than using the chain of thought to produce the answer. In Part Two, we discuss the implications of this finding for frontier models and their reasoning efficiency. TLDR: Reasoning models are...
Jan 269