LESSWRONG

Nick Jiang

Working to advance our understanding of models!

Comments

Towards data-centric interpretability with sparse autoencoders
Nick Jiang · 2mo

We chose max pooling over average pooling as our aggregation method because we primarily care about the largest extent to which a property appears in a data sample, rather than its average presence across the sample. In this work, we use SAE features to construct a document-level embedding that is independent of sample-specific information such as sample length. That said, we did not thoroughly investigate other aggregation methods, and we think they are worth exploring. For instance, if we care about distinguishing short samples where a property occurs once from long samples where it occurs over and over, a different aggregation method would work better.
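
To make the trade-off concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the code used in the post: the function name `pool_features`, the toy activation values, and the layout (one row per token, one column per SAE feature) are all hypothetical.

```python
def pool_features(activations, method="max"):
    """Aggregate per-token feature activations into one document-level vector.

    `activations` is a list of rows, one row per token and one column per
    feature (a hypothetical layout chosen for this sketch).
    """
    columns = list(zip(*activations))  # regroup values by feature
    if method == "max":
        # Largest extent to which each feature appears anywhere in the sample.
        return [max(col) for col in columns]
    if method == "mean":
        # Average presence of each feature across the sample.
        return [sum(col) / len(col) for col in columns]
    raise ValueError(f"unknown pooling method: {method}")


def make_doc(n_tokens, active_positions, strength=5.0):
    """Toy document with two features; feature 0 fires at the given positions."""
    return [[strength if i in active_positions else 0.0, 0.0]
            for i in range(n_tokens)]


short_once = make_doc(4, {1})                # short sample, property occurs once
long_once = make_doc(40, {7})                # long sample, property occurs once
long_repeated = make_doc(40, set(range(40))) # long sample, property on every token

# Max pooling ignores sample length and repetition.
print(pool_features(short_once, "max"))      # [5.0, 0.0]
print(pool_features(long_once, "max"))       # [5.0, 0.0]
print(pool_features(long_repeated, "max"))   # [5.0, 0.0]

# Mean pooling depends on length and separates "once" from "over and over".
print(pool_features(short_once, "mean"))     # [1.25, 0.0]
print(pool_features(long_once, "mean"))      # [0.125, 0.0]
print(pool_features(long_repeated, "mean"))  # [5.0, 0.0]
```

In this toy setup, max pooling gives all three samples the same embedding, which is the length-independence described above. Mean pooling separates the short single-occurrence sample from the long repeated one, at the cost of mixing in sample length.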

Posts

Towards data-centric interpretability with sparse autoencoders (53 karma, 2mo, 2 comments, crossposted to the Alignment Forum)
On the Practical Applications of Interpretability (4 karma, 1y, 1 comment)
A gentle introduction to sparse autoencoders (21 karma, 1y, 2 comments)