The Self-Hating Attention Head: A Deep Dive into GPT-2
> TL;DR: gpt2-small's head L1H5 directs attention to semantically similar tokens while actively suppressing self-attention. The head computes attention purely from token identity, independent of position. The mechanism is driven by a symmetric bilinear form whose negative eigenvalues enable the suppression. We cluster tokens semantically, interpret the weights to...
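
The bilinear-form claim in the TL;DR can be checked directly from the weights. Below is a minimal sketch (assuming the TransformerLens library and its `W_Q`/`W_K` weight conventions; the source does not specify its tooling) that extracts L1H5's effective QK matrix and inspects the eigenvalues of its symmetric part. For a token attending to itself, the attention score is $x^\top M x = x^\top M_{\text{sym}} x$, so only the symmetric part matters on the diagonal, and negative eigenvalues are what allow the score to go down, i.e. self-suppression.

```python
# Sketch: inspect the QK bilinear form of gpt2-small head L1H5.
# Assumes TransformerLens; "gpt2" loads the 124M gpt2-small checkpoint.
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")

layer, head = 1, 5
W_Q = model.W_Q[layer, head]  # [d_model, d_head]
W_K = model.W_K[layer, head]  # [d_model, d_head]

# Effective bilinear form on the residual stream: score(x_q, x_k) = x_q^T M x_k
M = W_Q @ W_K.T  # [d_model, d_model]

# On the diagonal (a token attending to itself) only the symmetric part
# of M contributes, since x^T M x == x^T (M + M^T)/2 x.
M_sym = 0.5 * (M + M.T)

# Negative eigenvalues of the symmetric part are what permit the head to
# drive self-attention scores down rather than up.
eigvals = torch.linalg.eigvalsh(M_sym)  # ascending order
print("most negative eigenvalues:", eigvals[:5])
print("most positive eigenvalues:", eigvals[-5:])
print("fraction negative:", (eigvals < 0).float().mean().item())
```

A spectrum skewed toward large negative eigenvalues along embedding-aligned directions would be consistent with the suppression story; the sections below make this precise.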