Introduction
Some reasoning steps in a large language model’s chain of thought, such as those involving generating plans or managing uncertainty, have disproportionately more influence on final answer correctness and downstream reasoning than others. Recent work (Macar & Bogdan) discovered that models contain attention heads ("receiver heads") which attend predominantly to these important sentences.
Counterfactual importance is a resampling-based measure of how much a given sentence contributes to the model reaching the correct answer. This post extends Thought Anchors by testing whether ablating the receiver heads changes the counterfactual importance of plan-generation or uncertainty-management sentences.
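To make the resampling idea concrete, here is a minimal sketch of estimating counterfactual importance for a single sentence. The `sample_rollout` helper is hypothetical (it would continue the chain of thought from a prefix and report whether the final answer is correct), and the accuracy-difference formulation below is a simplification of the full resampling procedure, not the exact metric used in Thought Anchors.

```python
# Sketch: counterfactual importance of one sentence via resampling.
from typing import Callable, List

def counterfactual_importance(
    sentences: List[str],
    index: int,
    sample_rollout: Callable[[str], bool],  # hypothetical: prefix -> answer correct?
    n_rollouts: int = 100,
) -> float:
    """Estimate how much sentence `index` shifts the chance of a correct answer.

    Compares rollouts continued from a prefix that includes the sentence
    against rollouts continued from the same prefix without it.
    """
    prefix_without = " ".join(sentences[:index])
    prefix_with = " ".join(sentences[: index + 1])

    p_with = sum(sample_rollout(prefix_with) for _ in range(n_rollouts)) / n_rollouts
    p_without = sum(sample_rollout(prefix_without) for _ in range(n_rollouts)) / n_rollouts

    # Importance = shift in answer accuracy attributable to the sentence.
    return p_with - p_without
```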
Methodology
Using the original full-text chain of thought for problem 6481 from the Math Rollouts dataset, I created new rollouts with DeepSeek R1...
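As a rough illustration of the rollout step (not the exact setup used here), the sketch below assumes a Hugging Face text-generation pipeline, a distilled R1 checkpoint, and a simple truncate-at-a-sentence-then-resample scheme; the model name and sampling parameters are assumptions.

```python
# Sketch: resampling continuations of a chain of thought from a chosen cut point.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-14B",  # assumed checkpoint, for illustration
)

def make_rollouts(problem: str, cot_sentences: list[str], cut_index: int, n: int = 20) -> list[str]:
    """Sample `n` continuations from the CoT prefix ending just before `cut_index`."""
    prefix = problem + "\n" + " ".join(cot_sentences[:cut_index])
    outputs = generator(
        prefix,
        num_return_sequences=n,
        do_sample=True,
        temperature=0.6,       # illustrative sampling settings
        max_new_tokens=2048,
    )
    # The pipeline returns the prompt plus the continuation; keep only the new text.
    return [out["generated_text"][len(prefix):] for out in outputs]
```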