200 Concrete Open Problems in Mechanistic Interpretability

Dec 28, 2022 by Neel Nanda
Concrete Steps to Get Started in Transformer Mechanistic Interpretability
200 Concrete Open Problems in Mechanistic Interpretability: Introduction
200 COP in MI: The Case for Analysing Toy Language Models
200 COP in MI: Looking for Circuits in the Wild
200 COP in MI: Interpreting Algorithmic Problems
200 COP in MI: Exploring Polysemanticity and Superposition
200 COP in MI: Analysing Training Dynamics
200 COP in MI: Techniques, Tooling and Automation
200 COP in MI: Image Model Interpretability
200 COP in MI: Interpreting Reinforcement Learning
200 COP in MI: Studying Learned Features in Language Models