Realmbird

7mo

Exploration of Counterfactual Importance and Attention Heads

Sep 30, 2025 • 12

11 karma

1 post

1 comment

Member for 7 months

Exploration of Counterfactual Importance and Attention Heads

Realmbird

4mo

Introduction

Some reasoning steps in a large language model’s chain of thought, such as those that generate plans or manage uncertainty, have a disproportionate influence on final-answer correctness and downstream reasoning. Recent work (Macar & Bogdan) found that models contain attention heads that attend solely to important sentences.

Counterfactual importance is a method that uses resampling to measure how much a sentence contributes to generating the correct answer. This post extends Thought Anchors and tests whether ablating the receiver heads changes the counterfactual importance of plan generation or uncertainty management. [1]
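To make the resampling idea concrete, here is a minimal sketch. It assumes that a sentence's importance is scored by comparing the final answers of rollouts that keep the sentence against rollouts in which the sentence was resampled. The function names, the two scores (accuracy gap and KL divergence), and the smoothing constant are illustrative assumptions, not necessarily the exact metric used in this post or in Thought Anchors.

```python
import math
from collections import Counter


def answer_distribution(answers):
    """Turn a list of final answers into an empirical probability distribution."""
    counts = Counter(answers)
    total = sum(counts.values())
    return {answer: count / total for answer, count in counts.items()}


def accuracy(answers, correct_answer):
    """Fraction of rollouts whose final answer matches the correct one."""
    return sum(a == correct_answer for a in answers) / len(answers)


def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) over final answers; eps (an assumed smoothing constant)
    avoids division by zero when q never produced an answer that p did."""
    return sum(
        prob * math.log((prob + eps) / (q.get(answer, 0.0) + eps))
        for answer, prob in p.items()
    )


def counterfactual_importance(kept_answers, resampled_answers, correct_answer):
    """Compare rollouts that keep a sentence with rollouts where it was resampled.

    Both inputs are lists of final answers, one per rollout (hypothetical data
    layout). Larger values suggest the sentence matters more to the outcome.
    """
    kept = answer_distribution(kept_answers)
    resampled = answer_distribution(resampled_answers)
    return {
        "accuracy_delta": accuracy(kept_answers, correct_answer)
        - accuracy(resampled_answers, correct_answer),
        "kl": kl_divergence(kept, resampled),
    }
```

Under these assumptions, the ablation experiment would amount to computing these scores once with the receiver heads intact and once with them ablated, then comparing the two for each sentence category.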

Methodology

Using the original full-text chain of thought for problem 6481 from the Math Rollouts dataset, I created new rollouts with DeepSeek R1...

Replying to Thought Anchors: Which LLM Reasoning Steps Matter?

Realmbird, 6mo

How did you learn that vertical attention corresponded to sentences?
