Can Reasoning Models Obfuscate Reasoning? Stress-Testing Chain-of-Thought Monitorability
TL;DR CoT monitoring works well when models aren't trying to hide their intents, but it's fragile under pressure and highly model‑dependent. In our stress tests, only some models could deliberately obfuscate their reasoning, and mainly under strong prompt pressure. We also find a widening gap between internal and external CoT...
Oct 24, 202518