Unfaithful chain-of-thought as nudged reasoning
This piece is based on work conducted during MATS 8.0 and is part of a broader aim of interpreting chain-of-thought in reasoning models.

tl;dr
* Research on chain-of-thought (CoT) unfaithfulness shows how models’ CoTs may omit information that is relevant to their final decision.
* Here, we sketch hypotheses for...