LESSWRONG

Nick Jiang
Working to advance our understanding of models!

Comments

Sorted by
Newest
Towards data-centric interpretability with sparse autoencoders
Nick Jiang, 15d

We chose max pooling (vs. average pooling) as our aggregation method because we primarily care about the largest extent to which a property exists in a data sample (vs. the average presence of that property in the sample). In this work, we're using SAE features to construct a document-level embedding that is independent of sample-specific information like the sample length. However, we did not thoroughly investigate other aggregation methods, and we think further investigation would be interesting. For instance, if we care about distinguishing short samples where a property occurs once from long samples where the property occurs over and over, then a different aggregation method would work better.
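To make the trade-off concrete, here is a minimal sketch of the two aggregation methods, assuming token-level SAE activations are stored as one row per token and one column per feature. The function names and the toy activations are hypothetical and are not taken from the post's code.

```python
from typing import List

# Assumed layout: one row per token, one column per SAE feature.
Activations = List[List[float]]


def max_pool(acts: Activations) -> List[float]:
    """Per-feature maximum over tokens: the largest extent to which each feature fires."""
    if not acts:
        raise ValueError("need at least one token")
    return [max(col) for col in zip(*acts)]


def mean_pool(acts: Activations) -> List[float]:
    """Per-feature average over tokens: the average presence of each feature."""
    if not acts:
        raise ValueError("need at least one token")
    n = len(acts)
    return [sum(col) / n for col in zip(*acts)]


# Toy data (made up): two features; feature 0 fires once with strength 4.0.
short_doc = [[4.0, 0.0], [0.0, 1.0]]
# The same single firing, followed by six more tokens where feature 0 is silent.
long_doc = short_doc + [[0.0, 1.0]] * 6
# Feature 0 fires on every one of eight tokens.
repeated_doc = [[4.0, 1.0]] * 8

for name, doc in [("short", short_doc), ("long", long_doc), ("repeated", repeated_doc)]:
    print(name, "max:", max_pool(doc), "mean:", mean_pool(doc))
```

In this toy setup, max pooling gives the same embedding for `short_doc` and `long_doc`, so sample length does not leak into it, while mean pooling dilutes feature 0 as the sample grows. The flip side is that max pooling also gives `short_doc` and `repeated_doc` the same embedding, which is the case where a different aggregation method would be needed.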

Posts
47 · Towards data-centric interpretability with sparse autoencoders · Ω · 17d · 2
4 · On the Practical Applications of Interpretability · 11mo · 1
21 · A gentle introduction to sparse autoencoders · 1y · 2