Attention Output SAEs — LessWrong