Adam Karvonen's Shortform

by Adam Karvonen
18th Jan 2025

This is a special post for quick takes by Adam Karvonen. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.
Adam Karvonen (8mo ago)

If you're looking for a hackable SAE training repo for experiments, I'd recommend our dictionary_learning repo. It's been around for a few months, but we've recently spent some time cleaning it up and adding additional trainer types.

It's designed to be simple and hackable - you can add a new SAE type in a single file (~350 lines). We have 8 tested implementations, including JumpReLU, TopK, BatchTopK, Matryoshka, Gated, and others, with BatchTopK recommended as a good default. Training is quick and cheap - training six 16K-width SAEs on Gemma-2-2B for 200M tokens takes ~6 RTX 3090 hours in total, or ~$1.20 (roughly $0.20 per GPU-hour).
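
For readers unfamiliar with the recommended default, here is a rough sketch of how BatchTopK sparsity differs from per-sample TopK. It is a toy illustration in plain Python, not code from the repo: the function names `topk_mask` and `batch_topk_mask` are hypothetical, and a real SAE would apply this to encoder activations as tensors rather than nested lists.

```python
from typing import List

Matrix = List[List[float]]


def topk_mask(acts: Matrix, k: int) -> Matrix:
    """Per-sample TopK: keep the k largest activations in each row, zero the rest."""
    out = []
    for row in acts:
        # Indices of the k largest values in this row.
        order = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        keep = set(order[:k])
        out.append([v if j in keep else 0.0 for j, v in enumerate(row)])
    return out


def batch_topk_mask(acts: Matrix, k: int) -> Matrix:
    """BatchTopK: keep the k * batch_size largest activations across the whole batch."""
    n_keep = k * len(acts)
    # Flatten to (value, row, col) so selection is global, not per row.
    flat = [(v, i, j) for i, row in enumerate(acts) for j, v in enumerate(row)]
    flat.sort(key=lambda t: t[0], reverse=True)
    keep = {(i, j) for _, i, j in flat[:n_keep]}
    return [
        [v if (i, j) in keep else 0.0 for j, v in enumerate(row)]
        for i, row in enumerate(acts)
    ]


# Toy batch: the first sample has three strong features, the second only one.
acts = [
    [0.9, 0.8, 0.7, 0.0],
    [0.1, 0.0, 0.6, 0.0],
]

per_sample = topk_mask(acts, k=2)
per_batch = batch_topk_mask(acts, k=2)
print(per_sample)
print(per_batch)
```

With k=2, per-sample TopK forces exactly two active features on each row, so the second sample keeps a weak 0.1 activation while the first loses its 0.7. BatchTopK keeps the same total budget (four activations) but lets the first sample use three and the second use one.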

The repo integrates with SAE Bench and includes reproducible baselines trained on Pythia-160M and Gemma-2-2B. It isn't optimized for large models the way Eleuther's library is (no CUDA kernels or multi-GPU support), and it has fewer features than SAE Lens, but it's great for experiments and trying new architectures.

Here is a link to the repo: https://github.com/saprmarks/dictionary_learning
