How did you get started in mechanistic interpretability? What other paths have you seen work?
I’m mapping out how people enter mechanistic interpretability, and it seems to me that there isn’t a single agreed-upon route. Some people begin by reproducing classic experiments, some come in through theory or causal ML, and others build tools or jump in from bio/RL/vision. I would appreciate stories that tell:...