Divyansh Singhvi — LessWrong

8mo

Asymmetric Risks of Unfaithful Reasoning: Omission as the Critical Failure Mode for AI Monitoring

TLDR: 1. Faithful reasoning is a representation, in a form humans can comprehend, that contains only the computation the model actually performed and that causally influenced its output. 2. Omitting a relevant factor that causally influenced the model's output is more dangerous than adding irrelevant factors to the reasoning....