Which Circuit is it?
This sequence explores the foundations of interpretability through toy-model experiments.
We seek methodological clarity at the smallest scale so that we are better equipped to reason about the opacity within the largest models.
Toy models grant us a certain intimacy. Sometimes, we can even exhaustively catalog all possible subcircuits and carefully modify them. This meticulous experimentation invites reflection, anchoring our big philosophical questions in something legible.
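To make "exhaustively catalog" concrete, here is a minimal sketch, not the setup used in this sequence. It assumes a hand-set 2-3-1 ReLU network that computes XOR, and it assumes that a "subcircuit" means a subset of the network's weighted edges, with the biases left untouched. The weights, the edge-mask definition, and the names `forward` and `behaviour` are all hypothetical choices made for illustration.

```python
from itertools import product

# Hypothetical hand-set weights for a 2-3-1 ReLU network that computes XOR.
W1 = [[1.0, 1.0], [1.0, 1.0], [1.0, -1.0]]  # one row per hidden unit
B1 = [0.0, -1.0, 0.0]                       # hidden biases (never masked)
W2 = [1.0, -2.0, 0.0]                       # output weights; unit 3 is unused

INPUTS = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]


def forward(x, mask):
    """Run the network with a 0/1 mask over its 9 edges (6 in W1, then 3 in W2)."""
    m1 = [mask[0:2], mask[2:4], mask[4:6]]
    m2 = mask[6:9]
    hidden = [
        max(0.0, sum(w * m * xi for w, m, xi in zip(W1[j], m1[j], x)) + B1[j])
        for j in range(3)
    ]
    return sum(w * m * h for w, m, h in zip(W2, m2, hidden))


def behaviour(mask):
    """Outputs of the masked network on every possible input."""
    return tuple(forward(x, mask) for x in INPUTS)


# The full model keeps every edge; its behaviour is the reference.
target = behaviour((1,) * 9)

# Assumption: a subcircuit is any subset of edges, so there are 2**9 of them.
all_masks = list(product((0, 1), repeat=9))
matching = [m for m in all_masks if behaviour(m) == target]

print(len(all_masks), len(matching))  # prints "512 8"
```

Even in this contrived case, eight of the 512 edge subsets reproduce the full model's behaviour exactly, because the three edges touching the unused hidden unit can be kept or dropped freely. At this scale the whole catalog fits in a list; the count doubles with every added edge, so the exhaustive approach is only available for very small models.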