LESSWRONG

200 Concrete Open Problems in Mechanistic Interpretability

Dec 28, 2022 by Neel Nanda

Concrete Steps to Get Started in Transformer Mechanistic Interpretability
200 Concrete Open Problems in Mechanistic Interpretability: Introduction
200 COP in MI: The Case for Analysing Toy Language Models
200 COP in MI: Looking for Circuits in the Wild
200 COP in MI: Interpreting Algorithmic Problems
200 COP in MI: Exploring Polysemanticity and Superposition
200 COP in MI: Analysing Training Dynamics
200 COP in MI: Techniques, Tooling and Automation
200 COP in MI: Image Model Interpretability
200 COP in MI: Interpreting Reinforcement Learning
200 COP in MI: Studying Learned Features in Language Models