If you can generate obfuscated chain-of-thought, can you monitor it?
tldr: Chain-of-thought (CoT) monitoring is a proposed safety mechanism for overseeing AI systems, but its effectiveness against deliberately obfuscated reasoning remains unproven. "Obfuscated" reasoning here means reasoning that a human would not understand, but that still carries information a model could in principle recover, for example model-encrypted reasoning…
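To make the definition concrete, here is a toy illustration (not from the source, and far simpler than anything a model would actually learn): reasoning passed through a Caesar-style shift over printable ASCII is gibberish to a human reviewer, yet anything that knows the scheme can recover it exactly. The `encode`/`decode` helpers and the sample chain of thought are hypothetical names invented for this sketch.

```python
# Toy sketch of "obfuscated but recoverable in principle" reasoning:
# a Caesar shift over the printable ASCII range (32..126).
# This is an illustrative stand-in, not how a model would encode its CoT.

def encode(text: str, shift: int = 7) -> str:
    """Shift each printable ASCII character by `shift`, wrapping within the range."""
    lo, hi = 32, 126  # printable ASCII bounds
    span = hi - lo + 1
    return "".join(
        chr(lo + (ord(c) - lo + shift) % span) if lo <= ord(c) <= hi else c
        for c in text
    )

def decode(text: str, shift: int = 7) -> str:
    """Invert `encode` by shifting in the opposite direction."""
    return encode(text, -shift)

cot = "Step 1: the user asked for X, so first check constraint Y."
obfuscated = encode(cot)
print(obfuscated)          # unreadable to a human monitor at a glance
print(decode(obfuscated))  # the original reasoning, recovered exactly
```

A human monitor scanning `obfuscated` sees noise, while the information content is fully preserved; this is the gap CoT monitoring must close against genuinely model-encrypted reasoning, where no simple published decoder exists.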