200 Concrete Open Problems in Mechanistic Interpretability

Dec 28, 2022 by Neel Nanda
Concrete Steps to Get Started in Transformer Mechanistic Interpretability
200 Concrete Open Problems in Mechanistic Interpretability: Introduction
200 COP in MI: The Case for Analysing Toy Language Models
200 COP in MI: Looking for Circuits in the Wild
200 COP in MI: Interpreting Algorithmic Problems
200 COP in MI: Exploring Polysemanticity and Superposition
200 COP in MI: Analysing Training Dynamics
200 COP in MI: Techniques, Tooling and Automation
200 COP in MI: Image Model Interpretability
200 COP in MI: Interpreting Reinforcement Learning
200 COP in MI: Studying Learned Features in Language Models