Produced as part of the ML Alignment & Theory Scholars Program - Summer 2024 Cohort
0. Summary
To recover all the relevant features from a superintelligent language model, we will likely need to scale sparse autoencoders (SAEs) to billions of features. Using current architectures, training extremely wide SAEs across multiple layers and sublayers at various sparsity levels is computationally intractable. Conditional computation has been used to scale transformers (Fedus et al.) to trillions of parameters while retaining computational efficiency. We introduce the Switch SAE, a novel architecture that leverages conditional computation to efficiently scale SAEs to many more features.
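To make the conditional-computation idea concrete, here is a minimal PyTorch sketch of a Switch-style SAE layer: a router assigns each activation vector to a single expert SAE (top-1 routing, as in the Switch Transformer), so only that expert's encoder and decoder weights are touched. The names (`SwitchSAE`, `n_experts`, `expert_dim`, `k`) and the TopK activation inside each expert are illustrative assumptions, not the exact implementation described in this post, and a practical version would group inputs by expert rather than gathering per-example weight slices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SwitchSAE(nn.Module):
    """Sketch of an SAE split into expert sub-SAEs with top-1 routing (assumed design)."""

    def __init__(self, d_model: int, n_experts: int, expert_dim: int, k: int):
        super().__init__()
        self.k = k
        # Router assigns each activation vector to a single expert SAE.
        self.router = nn.Linear(d_model, n_experts, bias=False)
        # Each expert is a small SAE; together the experts hold n_experts * expert_dim features.
        self.W_enc = nn.Parameter(torch.randn(n_experts, d_model, expert_dim) * 0.01)
        self.W_dec = nn.Parameter(torch.randn(n_experts, expert_dim, d_model) * 0.01)
        self.b_dec = nn.Parameter(torch.zeros(d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d_model) residual-stream activations.
        probs = F.softmax(self.router(x), dim=-1)      # (batch, n_experts)
        expert_prob, expert_idx = probs.max(dim=-1)    # top-1 routing
        # Gather only the selected expert's weights (conditional computation);
        # a real implementation would instead batch inputs by expert for efficiency.
        W_enc = self.W_enc[expert_idx]                 # (batch, d_model, expert_dim)
        W_dec = self.W_dec[expert_idx]                 # (batch, expert_dim, d_model)
        pre = torch.einsum("bd,bdf->bf", x - self.b_dec, W_enc)
        # Keep only the top-k feature activations per input (TopK-style sparsity).
        topk = torch.topk(pre, self.k, dim=-1)
        acts = torch.zeros_like(pre).scatter_(-1, topk.indices, F.relu(topk.values))
        recon = torch.einsum("bf,bfd->bd", acts, W_dec) + self.b_dec
        # Scale by the router probability so the routing decision receives gradients.
        return recon * expert_prob.unsqueeze(-1)
```

Scaling the reconstruction by the router probability is the standard trick (borrowed from the Switch Transformer) for letting the otherwise discrete routing decision be trained through the reconstruction loss; whether the post uses exactly this mechanism is an assumption of the sketch.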
1. Introduction
The internal computations of large language models are inscrutable to humans. We can observe the...
Thanks for your comment! I believe your concern was echoed by Lee and Arthur in their comments and is completely valid. This work is primarily a proof of concept that we can successfully scale SAEs by directly applying a Mixture-of-Experts (MoE) approach, but I suspect we will need to make tweaks to the architecture.