
LESSWRONG


Derek Larson — LessWrong


Expanding the Scope of Superposition

Overview: One of the active research areas in interpretability involves distilling neural network activations into clean, labeled features. This is made difficult by superposition, where a neuron may fire in response to multiple, disparate signals, making that neuron polysemantic. To date, research has focused on one type of such...

Sep 13, 2023•10
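The polysemanticity described in the excerpt above can be illustrated with a minimal toy sketch. This is not the post's method. It assumes a made-up layer that compresses 3 input features into 2 ReLU neurons. The weight matrix `W`, the 0.5 threshold, and the helper names `neuron_activations`, `features_driving`, and `is_polysemantic` are all hypothetical. A neuron counts as polysemantic here if more than one feature, presented alone, pushes it above the threshold.

```python
from typing import List

# Hypothetical toy layer: 3 features squeezed into 2 neurons.
# Each row is one neuron; each column is one feature's weight into it.
# These weights are illustrative assumptions, not values from the post.
W: List[List[float]] = [
    [1.0, 0.8, 0.0],  # neuron 0
    [0.0, 0.7, 1.0],  # neuron 1
]


def relu(x: float) -> float:
    return max(0.0, x)


def neuron_activations(features: List[float]) -> List[float]:
    """Compute ReLU(W @ features) for the toy layer."""
    return [relu(sum(w * f for w, f in zip(row, features))) for row in W]


def features_driving(neuron: int, threshold: float = 0.5) -> List[int]:
    """Indices of features that, presented alone (one-hot),
    push this neuron's activation above the threshold."""
    n_features = len(W[0])
    driving = []
    for j in range(n_features):
        one_hot = [1.0 if k == j else 0.0 for k in range(n_features)]
        if neuron_activations(one_hot)[neuron] > threshold:
            driving.append(j)
    return driving


def is_polysemantic(neuron: int, threshold: float = 0.5) -> bool:
    """A neuron is polysemantic (in this toy sense) if more than one feature drives it."""
    return len(features_driving(neuron, threshold)) > 1


for n in range(len(W)):
    print(n, features_driving(n), is_polysemantic(n))
```

With these assumed weights, neuron 0 responds to features 0 and 1 and neuron 1 responds to features 1 and 2, so both neurons are polysemantic: reading either neuron alone cannot tell you which feature was present.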