(tentatively) Found 600+ Monosemantic Features in a Small LM Using Sparse Autoencoders — LessWrong