I’m mapping out how people enter mechanistic interpretability, and it seems to me that there isn’t a single agreed-upon route. Some people begin by reproducing classic experiments, some come through theory or causal ML, and others build tools or jump in from bio/RL/vision.
I would appreciate stories that cover:
How did you start?
What worked?
What didn’t?
What would you do differently if starting today?
Are there alternative routes you’ve seen that people underestimate?
All perspectives are welcome, even partial experiences.