Fact Finding: Attempting to Reverse-Engineer Factual Recall on the Neuron Level
Mechanistic Interpretability Puzzles
Interpreting Othello-GPT
200 Concrete Open Problems in Mechanistic Interpretability
My Overview of the AI Alignment Landscape

Wiki Contributions


Thanks for the clear explanation, Mamba is more cursed and less Transformer like than I realised! And thanks for creating and open sourcing Mamba Lens, it looks like a very useful tool for anyone wanting to build on this stuff

Each element of the  matrix, denoted as , is constrained to the interval . This means that for all , where  indexes the query positions and  indexes the key positions:

Why is this strictly less than 1? Surely if the dot product is 1.1 and you clamp, it gets clamped to exactly 1

Oh nice, I didn't know Evan had a YouTube channel. He's one of the most renowned olympiad coaches and seems highly competent

Thanks! I read and enjoyed the book based on this recommendation

I'm in favour of people having hobbies and fun projects to do in their downtime! That seems good and valuable for impact over the longterm, rather than thinking that every last moment needs to be productive

Thanks for open sourcing this! We've already been finding it really useful on the DeepMind mech interp team, and saved us the effort of writing our own :)

Great post! I'm pretty surprised by this result, and don't have a clear story for what's going on. Though my guess is closer to "adding noise with equal norm to the error is not a fair comparison, for some reason" than "SAEs are fundamentally broken". I'd love to see someone try to figure out WTF is going on.

You may be able to notice data points where the SAE performs unusually badly at reconstruction? (Which is what you'd see if there's a crucial missing feature)

Load More