Kanishk Tantia — LessWrong

HDBSCAN is Surprisingly Effective at Finding Interpretable Clusters of the SAE Decoder Matrix

by Jaehyuk Lim, Kanishk Tantia, and Sinem

*All authors contributed equally.* This is a short informal post about recent discoveries our team made when we: 1. Clustered feature vectors in the decoder matrices of Sparse Autoencoders (SAEs) trained on the residual stream of each layer of GPT-2 Small and Gemma-2B. 2. Visualized the clusters in 2D...

Oct 11, 2024