Transformers Don't Need LayerNorm at Inference Time: Implications for Interpretability
This work was produced during MARS and SPAR. An arXiv version is available at https://arxiv.org/abs/2507.02559. Code is on GitHub and models are on HuggingFace.

TL;DR: we scaled LayerNorm (LN) removal by fine-tuning to GPT-2 XL:

* We improve training stability by regularizing the activation standard deviation across token positions (see the sketch after this list) and improve the training code.
* ...
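One plausible reading of the std-regularization bullet, as a minimal PyTorch sketch: penalize each token position's residual-stream standard deviation for deviating from a shared value, so that LN's per-position rescaling has less work to do and can be removed. The function name, the exact penalty form, and the coefficient below are illustrative assumptions, not the paper's implementation.

```python
import torch

def std_uniformity_penalty(resid: torch.Tensor) -> torch.Tensor:
    """Penalize variation of the residual-stream std across token positions.

    resid: activations of shape (batch, seq_len, d_model).
    """
    stds = resid.std(dim=-1)                   # one std per (batch, position)
    return ((stds - stds.mean()) ** 2).mean()  # push every position toward a common std

# Toy usage: add the penalty to the usual LM fine-tuning loss.
resid = torch.randn(2, 16, 1600)               # GPT-2 XL residual width is 1600
lm_loss = torch.tensor(0.0)                    # placeholder for the real LM loss
loss = lm_loss + 0.1 * std_uniformity_penalty(resid)  # 0.1 is an arbitrary weight
```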
Jul 23, 2025