Sparse MLP Distillation
This is a research report about my attempt to extract interpretable features from a transformer MLP by distilling it into a larger student MLP, while encouraging sparsity by applying an L1 penalty to the activations, as depicted in Figure 1. I investigate the features learned by the distilled MLP, compare...
Jan 15, 2024
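The setup described above can be sketched as a single training objective: the student MLP matches the teacher MLP's outputs (a distillation MSE term) while an L1 penalty on its hidden activations encourages sparsity. The sketch below is illustrative, not the report's actual code; the dimensions, ReLU nonlinearity, and `l1_coeff` value are assumptions.

```python
import torch
import torch.nn as nn

# Illustrative dimensions (assumptions): in practice the teacher MLP is taken
# from a trained transformer layer, and the student hidden layer is wider.
d_model, d_teacher, d_student = 512, 2048, 8192

# Frozen teacher MLP (stand-in for the transformer's MLP block).
teacher = nn.Sequential(
    nn.Linear(d_model, d_teacher), nn.GELU(), nn.Linear(d_teacher, d_model)
)

# Larger student MLP whose hidden activations we want to be sparse.
student_in = nn.Linear(d_model, d_student)
student_out = nn.Linear(d_student, d_model)

def distill_loss(x: torch.Tensor, l1_coeff: float = 1e-3) -> torch.Tensor:
    """Distillation MSE plus L1 sparsity penalty on student activations."""
    with torch.no_grad():
        target = teacher(x)                 # teacher outputs are the targets
    acts = torch.relu(student_in(x))        # student hidden activations
    pred = student_out(acts)
    mse = (pred - target).pow(2).mean()     # match the teacher's outputs
    l1 = acts.abs().mean()                  # push activations toward zero
    return mse + l1_coeff * l1
```

In training, `distill_loss` would be minimized over activations `x` sampled from the transformer's residual stream; the `l1_coeff` trades off reconstruction fidelity against sparsity.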