How To Become A Mechanistic Interpretability Researcher
Last updated Sept 2 2025

Note - if you want to pursue a career in this kind of research, apply to my MATS stream! Due Dec 23

TL;DR

* This post is about the mindset and process I recommend if you want to do mechanistic interpretability research. I aim to give a clear sense of direction, so I give opinionated advice and concrete recommendations.
* Mech interp is high-leverage, impactful, and learnable on your own with short feedback loops and modest compute.
* Learn the minimum viable basics, then do research. Mech interp is an empirical science.
* Three stages:
  * Learn the ropes (≤1 month): learn the essentials, go breadth-first.
  * Learn with research mini-projects: practice basic research skills with 1-5 day mini-projects, focusing on fast feedback loop skills.
  * Work up to full projects: do 1-2 week research sprints and continue the best ones. Explore deeper skills and the mindset of a great researcher.
* Stage 1: Learning the Ropes
  * Breadth over depth; get a good baseline, not perfection.
  * Learn the basics: code a transformer from scratch, key mech interp techniques, the landscape of the field, linear algebra intuitions, and how to write mech interp code (ARENA is your friend).
  * Get your hands dirty: do not just read things. Mech interp is a fundamentally empirical science.
  * Move on after a month. Don’t expect to feel “done” or to have covered all of the ropes; learn more when needed. You won’t stumble across great research insights without starting to do something real.
  * Use LLMs extensively - they’re not perfect, but they are better at mech interp than you right now! They’re a crucial learning tool (when used right!).
* Unpacking the research process:
  * There are many skills; categorise them by their feedback loops.
    * Fast skills (minutes-hours), like writing, running, and debugging experiments
    * Slow skills (weeks), like how to prioritise and when to pivot
    * Very slow skills (months), like generating good research ideas
  * Do not try to learn all skills at once. Focus o