Towards data-centric interpretability with sparse autoencoders
Nick and Lily are co-first authors on this project. Lewis and Neel jointly supervised this project. Check out our updated paper here: https://arxiv.org/abs/2512.10092.

TL;DR
* We use sparse autoencoders (SAEs) for four textual data analysis tasks: data diffing, finding correlations, targeted clustering, and retrieval.
* We care especially about gaining insights...
Aug 15, 2025