Which Circuit is it?
This sequence explores the foundations of interpretability through toy-model experiments.
We seek methodological clarity at the smallest scale so that we are better equipped to reason about the opacity within the largest models.
Toy models grant us a certain intimacy. Sometimes, we can even exhaustively catalog all possible subcircuits and carefully modify them. This meticulous experimentation invites reflection, anchoring our big philosophical questions in something legible.