x

LESSWRONG

LW

baimamboukar — LessWrong

baimamboukar

baimamboukar

Message

10

1

1mo

baimamboukar

10

1mo

How Matryoshka Sparse AutoEncoders Recover Feature Hierarchies That Vanilla SAEs Lose

A walkthrough of the core findings and guided replication of the concepts from the original research on “Multi-level features discovery with Matryoshka Sparse AutoEncoders”. TL;DR Sparse AutoEncoders (SAEs) are a cornerstone of mechanistic interpretability, but they struggle with scalability. As we increase the dictionary size to capture more features, we...