200 Concrete Open Problems in Mechanistic Interpretability

Dec 28, 2022 by Neel Nanda
Concrete Steps to Get Started in Transformer Mechanistic Interpretability
200 Concrete Open Problems in Mechanistic Interpretability: Introduction
200 COP in MI: The Case for Analysing Toy Language Models
200 COP in MI: Looking for Circuits in the Wild
200 COP in MI: Interpreting Algorithmic Problems
200 COP in MI: Exploring Polysemanticity and Superposition
200 COP in MI: Analysing Training Dynamics
200 COP in MI: Techniques, Tooling and Automation
200 COP in MI: Image Model Interpretability
200 COP in MI: Interpreting Reinforcement Learning
200 COP in MI: Studying Learned Features in Language Models