Asymmetric Risks of Unfaithful Reasoning: Omission as the Critical Failure Mode for AI Monitoring
TLDR: 1. Faithful reasoning is a representation in a comprehensible dimension that is understandable and only contains the actual computation done by the model which causally influenced the output. 2. Omitting of relevant factor that causally influenced the model's output is more dangerous than adding irrelevant factors to the reasoning....
Feb 267