Rico Angell — LessWrong

23

2y

Evaluating Sparse Autoencoders with Board Game Models

by Adam Karvonen, Sam Marks, Can, Benjamin Wright, Jannik Brinkmann, Logan Riggs, and Rico Angell

This blog post discusses a collaborative research paper on sparse autoencoders (SAEs), specifically focusing on SAE evaluations and a new training method we call p-annealing. As the first author, I primarily contributed to the evaluation portion of our work. The views expressed here are my own and do not necessarily...

Aug 2, 2024 • 38