Sparse Autoencoders (SAEs)
• Applied to Quick Thoughts on Scaling Monosemanticity by Joel Burget 9d ago
• Applied to SAE sparse feature graph using only residual layers by crayhippo 9d ago
• Applied to Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning by Dan Braun 15d ago
• Applied to Towards Multimodal Interpretability: Learning Sparse Interpretable Features in Vision Transformers by Joseph Bloom 1mo ago
• Applied to Transcoders enable fine-grained interpretable circuit analysis for language models by Jacob Dunefsky 1mo ago
• Applied to Improving Dictionary Learning with Gated Sparse Autoencoders by Neel Nanda 1mo ago
• Applied to ProLU: A Nonlinearity for Sparse Autoencoders by Noa Nabeshima 1mo ago
• Applied to Discriminating Behaviorally Identical Classifiers: a model problem for applying interpretability to scalable oversight by Neel Nanda 1mo ago
• Applied to Past Tense Features by Can 1mo ago
• Applied to [Summary] Progress Update #1 from the GDM Mech Interp Team by Joseph Bloom 1mo ago
• Applied to [Full Post] Progress Update #1 from the GDM Mech Interp Team by Joseph Bloom 1mo ago
• Applied to Experiments with an alternative method to promote sparsity in sparse autoencoders by Eoin Farrell 2mo ago
• Applied to Normalizing Sparse Autoencoders by Fengyuan Hu 2mo ago
• Applied to Improving SAE's by Sqrt()-ing L1 & Removing Lowest Activating Features by Joseph Bloom 2mo ago
• Applied to SAE-VIS: Announcement Post by Joseph Bloom 2mo ago
• Applied to SAE reconstruction errors are (empirically) pathological by Joseph Bloom 2mo ago
• Applied to My best guess at the important tricks for training 1L SAEs by Joseph Bloom 2mo ago
• Applied to Addressing Feature Suppression in SAEs by Joseph Bloom 2mo ago