
Haoran Ye — LessWrong


Training Superior Sparse Autoencoders for Instruct Models

| Resource | Link |
|---|---|
| Paper | https://arxiv.org/abs/2506.07691 |
| Code | https://github.com/Geaming2002/FAST |
| SAEs | Llama-3.1-8B-Instruct_SAEs🤗, Llama-3.2-3B-Instruct_SAEs🤗, Llama-3.2-1B-Instruct_SAEs🤗, Qwen2.5-7B-Instruct_SAEs🤗, Qwen2.5-3B-Instruct_SAEs🤗, Qwen2.5-1.5B-Instruct_SAEs🤗, Qwen2.5-0.5B-Instruct_SAEs🤗 |

> 💡 TL;DR
>
> In this paper, we identify problems in previous SAE training approaches for instruct models:
>
> * 📚 Suboptimal dataset selection, which hurts SAE performance.
> * ✂️ Semantic discontinuity, caused by block training truncating samples mid-content....

Jun 14, 2025
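The "semantic discontinuity" point is easier to see with a toy example. The sketch below assumes that "block training" means concatenating tokenized samples into one stream and slicing it into fixed-length blocks regardless of sample boundaries. This is a common packing scheme, and the paper's exact pipeline may differ. All function names are hypothetical. `per_sample_sequences` is an illustrative contrast only and is not presented as the paper's method.

```python
from typing import Dict, List, Set, Tuple

# Each token is tagged with (sample_id, position_within_sample) so we can
# see where sample boundaries fall after chunking.
Token = Tuple[int, int]


def pack_into_blocks(sample_lengths: List[int], block_size: int) -> List[List[Token]]:
    """Assumed block training: concatenate all samples into one token stream,
    then slice it every `block_size` tokens, ignoring sample boundaries."""
    stream = [(sid, pos) for sid, n in enumerate(sample_lengths) for pos in range(n)]
    return [stream[i:i + block_size] for i in range(0, len(stream), block_size)]


def per_sample_sequences(sample_lengths: List[int], block_size: int) -> List[List[Token]]:
    """Hypothetical contrast: one sequence per sample, so no sample is split
    across sequences (tokens beyond `block_size` are simply dropped)."""
    return [[(sid, pos) for pos in range(min(n, block_size))]
            for sid, n in enumerate(sample_lengths)]


def samples_cut_mid_content(sequences: List[List[Token]]) -> List[int]:
    """Return ids of samples whose tokens end up in more than one sequence."""
    where: Dict[int, Set[int]] = {}
    for seq_idx, seq in enumerate(sequences):
        for sid, _ in seq:
            where.setdefault(sid, set()).add(seq_idx)
    return sorted(sid for sid, idxs in where.items() if len(idxs) > 1)


if __name__ == "__main__":
    lengths = [5, 7, 3, 6]  # made-up toy sample lengths, in tokens
    blocks = pack_into_blocks(lengths, block_size=8)
    print(samples_cut_mid_content(blocks))  # [1, 3]
    print(blocks[1][0])  # (1, 3): the second block opens mid-sample
    print(samples_cut_mid_content(per_sample_sequences(lengths, 8)))  # []
```

In this toy run, samples 1 and 3 straddle block boundaries, and the second block begins at position 3 of sample 1. Anything trained on that block sees the sample without its opening context. The per-sample variant avoids splits, but it produces variable-length sequences and, as written here, drops tokens past `block_size`.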