Sparse Autoencoders: Future Work
Mostly my own writing, except for the 'Better Training Methods' section, which was written by @Aidan Ewart. We made a lot of progress in 4 months working on Sparse Autoencoders, an unsupervised method to scalably find monosemantic features in LLMs, but there's still plenty of work to do. Below I...