Aidan Ewart
Undergraduate student studying Mathematics @ University of Bristol.
Interested in & pursuing a career in technical AI safety.
I basically think that ~most of the misalignment risk comes from models significantly smarter than our current ones.
Why is this your intuition?
At the moment they seem to just make it imitate normal-ish CoT, which would presumably improve accuracy because the model has more token-positions/space/capacity to do things like check for self-consistency. You're still scaling up a compute dimension that the model can use for solving things, and you can still do normal RL things to it from that point.
It's just maybe worse in this case because the causality from reasoning chains -> the part of the response containing the answer is worse (it was bad before, but now it is horrible).
GPT-4o and GPT-4 have different tokenizers (see https://platform.openai.com/tokenizer ); as such it doesn’t seem implausible that they are derived from different models/training runs.
Are you guys aware of the task arithmetic line of work (e.g. this paper and related works following it)? It seems extremely relevant/useful for this line of work (e.g. linear decomposition of the parameter space, some follow-up work ties in with NTK theory and identifies regimes where linearity might be more expected), but you guys don't appear to have cited it.
If you are aware and didn't cite it for another reason, fair enough!
It might be worth pointing out ‘ICA with Reconstruction Cost for Efficient Overcomplete Feature Learning’ (@NeurIPS 2011) which argues that the reconstruction cost |WW^Tx - x| can be used as a form of soft orthonormalization loss.
https://proceedings.neurips.cc/paper/2011/file/233509073ed3432027d48b1a83f5fbd2-Paper.pdf
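The penalty being described can be sketched numerically. A minimal sketch, assuming the squared-norm form of the cost and a dictionary W of shape (n_features, n_components); the function name and variable shapes are my own, not the paper's:

```python
import numpy as np

def soft_orthonormalization_loss(W, X):
    """Reconstruction cost ||W W^T x - x||^2 averaged over a batch X of
    shape (n_samples, n_features). When the columns of W form an
    orthonormal basis for the data (W W^T acts as the identity on X),
    the penalty is zero, hence 'soft orthonormalization'."""
    recon = X @ W @ W.T  # project onto the dictionary and back
    return np.mean(np.sum((recon - X) ** 2, axis=1))

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(8, 8)))  # an orthonormal 8x8 matrix
X = rng.normal(size=(32, 8))
loss_orthonormal = soft_orthonormalization_loss(Q, X)  # ~0 up to float error
```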
Cool post! I often find myself confused/unable to guess why people I don't know are excited about SAEs (there seem to be a few vaguely conflicting reasons), and this was a very clear description of your agenda.
I'm a little confused by this point:
> The reconstruction loss trains the SAE features to approximate what the network does, thus optimizing for mathematical description accuracy
It's not clear to me that framing reconstruction loss as 'approximating what the network does' is the correct framing of this loss. In my mind, the reconstruction loss...
Edit: As per @Logan Riggs’s comment, I seem to have misunderstood what was being meant by ‘loss recovered’, so this comment is not relevant.
Cool post! However, it feels a little early to conclude that
> Conceptually, loss recovered seems a worse metric than KL divergence.
In toy settings (i.e. trying to apply SAEs to a standard sparse coding setting where we know the ground truth factors, like in https://www.lesswrong.com/posts/z6QQJbtpkEAX3Aojj/interim-research-report-taking-features-out-of-superposition/ ), SAEs do not achieve zero reconstruction loss e...
Note: My take does not necessarily represent the takes of my coauthors (Hoagy, Logan, Lee, Robert). Or it might, but they may frame it differently. Take this as strictly my take.
My take is that the goal isn't strictly to get maximum expressive power under the assumptions detailed in Toy Models of Superposition; for instance, Anthropic found that FISTA-based dictionaries didn't work as well as sparse autoencoders, even though they are better in that they can achieve lower reconstruction loss at the same level of sparsity. We might find that the spars...
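For reference, the sparse autoencoder objective being compared against FISTA-style sparse coding here is roughly the following. This is a sketch; the variable names and exact shapes are mine, not necessarily those used in any particular implementation:

```python
import numpy as np

def sae_loss(x, W_enc, b_enc, W_dec, l1_coeff=1e-3):
    """Sparse autoencoder objective: reconstruction MSE plus an L1
    sparsity penalty on the ReLU feature activations.
    Shapes: x is (d,), W_enc is (k, d), b_enc is (k,), W_dec is (d, k),
    where k > d gives an overcomplete dictionary."""
    f = np.maximum(0.0, W_enc @ x + b_enc)  # sparse feature coefficients
    x_hat = W_dec @ f                       # reconstruction of the activation
    loss = np.sum((x_hat - x) ** 2) + l1_coeff * np.sum(np.abs(f))
    return loss, f
```

Unlike FISTA, which solves an inner optimization for the coefficients of a fixed dictionary, the SAE amortizes inference into a single encoder pass, which is one candidate explanation for the differing results at equal sparsity.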
Yep! We are planning to do exactly that for (at least) the models we focus on in the paper (Pythia-70m + Pythia-410m), and probably also GPT2 small. We are also working on cleaning up our codebase (https://github.com/HoagyC/sparse_coding) and implementing some easy dictionary training solutions.
Hi David, co-author of the 'Sparse Autoencoders Find Highly Interpretable Directions in Language Models' paper here.
I think this might be of interest to you:
We are currently in the process of re-framing section 4 of the paper to focus more on model steering & activation editing; in line with what you hypothesise, we find that editing a small number of relevant features on e.g. the IOI task can steer the model from its predictions on one token to its predictions on a counterfactual token.
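The kind of feature-level edit described can be sketched as follows. This is a toy illustration of the general idea, not the paper's actual intervention; the function name and the delta-style edit (which preserves the SAE's reconstruction error) are my own choices:

```python
import numpy as np

def steer_activation(act, W_enc, b_enc, W_dec, feat_idx, new_coeff):
    """Decompose an activation into SAE features, overwrite one feature's
    coefficient, and apply the change as a delta on the original activation.
    Shapes: act is (d,), W_enc is (k, d), b_enc is (k,), W_dec is (d, k).
    Editing via a delta leaves the SAE's reconstruction residual intact."""
    f = np.maximum(0.0, W_enc @ act + b_enc)  # current feature coefficients
    delta = (new_coeff - f[feat_idx]) * W_dec[:, feat_idx]
    return act + delta
```

In an IOI-style setting, ablating or boosting a small number of such features at the relevant token positions is what would steer the model between the factual and counterfactual completions.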
Mostly my own writing, except for the 'Better Training Methods' section which was written by @Aidan Ewart.
We made a lot of progress in 4 months working on Sparse Autoencoders, an unsupervised method to scalably find monosemantic features in LLMs, but there's still plenty of work to do. Below I (Logan) give both research ideas, as well as my current, half-baked thoughts on how to pursue them.
This is a linkpost for Sparse Autoencoders Find Highly Interpretable Directions in Language Models
We use a scalable and unsupervised method called Sparse Autoencoders to find interpretable, monosemantic features in real LLMs (Pythia-70M/410M) for both residual stream and MLPs. We showcase monosemantic features, feature replacement for Indirect Object Identification (IOI), and use OpenAI's automatic interpretation protocol to demonstrate a significant improvement in interpretability.
To reverse engineer a neural network, we'd like to first break it down into smaller units (features) that can be analysed in isolation. Using individual neurons as these units can be useful, but neurons are often polysemantic, activating for several unrelated types of feature, so just looking at neurons is insufficient. Also, for some types of network activations, like the residual stream...
Really awesome work, although I felt a bit frustrated that more of the stuff about independence etc wasn't in e.g. an appendix or something. When is part 2(/3) scheduled for?
I'm slightly confused as to why red-teaming via activation additions should be preferred over e.g. RAT; it seems to be possible that RAT better/more robustly models out-of-test-distribution-but-still-in-deployment-distribution activations than directly adding some steering vector. Cool work though!
Seems worth noting that the ECI might be biased away from the ways Claude is good; per this post by Epoch, the first two PCs of their benchmark data correspond to "general capability" and "claudiness", so the ECI (which is another, different, 1-dimensional compression of the same benchmark data) seems like it should also underrate Claude.
h/t @jake_mendel for discussion