How Matryoshka Sparse AutoEncoders Recover Feature Hierarchies That Vanilla SAEs Lose
A walkthrough of the core findings, and a guided replication of the concepts, from the original research on “Multi-level features discovery with Matryoshka Sparse AutoEncoders”.

TL;DR: Sparse AutoEncoders (SAEs) are a cornerstone of mechanistic interpretability, but they struggle with scalability. As we increase the dictionary size to capture more features, we...
Jun 1511