Sparse Autoencoders (SAEs)
• Applied to Towards Multimodal Interpretability: Learning Sparse Interpretable Features in Vision Transformers by Joseph Bloom 17d ago
• Applied to Transcoders enable fine-grained interpretable circuit analysis for language models by Jacob Dunefsky 17d ago
• Applied to Improving Dictionary Learning with Gated Sparse Autoencoders by Neel Nanda 22d ago
• Applied to ProLU: A Nonlinearity for Sparse Autoencoders by Noa Nabeshima 24d ago
• Applied to Discriminating Behaviorally Identical Classifiers: a model problem for applying interpretability to scalable oversight by Neel Nanda 26d ago
• Applied to Past Tense Features by Can 1mo ago
• Applied to [Summary] Progress Update #1 from the GDM Mech Interp Team by Joseph Bloom 1mo ago
• Applied to [Full Post] Progress Update #1 from the GDM Mech Interp Team by Joseph Bloom 1mo ago
• Applied to Experiments with an alternative method to promote sparsity in sparse autoencoders by Eoin Farrell 1mo ago
• Applied to Normalizing Sparse Autoencoders by Fengyuan Hu 1mo ago
• Applied to Improving SAE's by Sqrt()-ing L1 & Removing Lowest Activating Features by Joseph Bloom 1mo ago
• Applied to SAE-VIS: Announcement Post by Joseph Bloom 1mo ago
• Applied to SAE reconstruction errors are (empirically) pathological by Joseph Bloom 1mo ago
• Applied to My best guess at the important tricks for training 1L SAEs by Joseph Bloom 1mo ago
• Applied to Addressing Feature Suppression in SAEs by Joseph Bloom 1mo ago
• Applied to Transformer Debugger by Joseph Bloom 1mo ago
• Applied to Explainer - AutoInterpretation Finds Sparse Coding Beats Alternatives by Joseph Bloom 1mo ago
• Applied to Sparse Coding, for Mechanistic Interpretability and Activation Engineering by Joseph Bloom 1mo ago