Minimizing components used per input vs. minimizing the number of global components
Many mechanistic interpretability methods minimize the left-hand side of this comparison: circuit analysis such as IOI, SAEs, and APD.
Optimizing the LHS is useful if you want to be able to "tell a story" about any input: you want the fewest latents active on any given input, so that your story about a particular input is not too long.
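As a concrete illustration of what optimizing the LHS looks like in practice, here is a minimal sketch of an SAE-style objective (the dimensions and coefficient are hypothetical, not taken from any particular paper): the L1 penalty acts on each input's latent activations, so it pushes down the number of latents used per input rather than the total number of latents in the dictionary.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Toy SAE: many latents exist globally, but the loss only penalizes
    how many of them fire on each individual input."""

    def __init__(self, d_model: int = 512, d_latent: int = 4096):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_latent)
        self.decoder = nn.Linear(d_latent, d_model)

    def forward(self, x: torch.Tensor):
        z = torch.relu(self.encoder(x))   # latents active on this input
        x_hat = self.decoder(z)
        return x_hat, z

def sae_loss(x, x_hat, z, l1_coeff: float = 1e-3):
    reconstruction = ((x - x_hat) ** 2).mean()
    per_input_sparsity = z.abs().sum(dim=-1).mean()  # the "LHS": latents used per input
    return reconstruction + l1_coeff * per_input_sparsity
```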
But there's no reason to think that the model is actually using a sparse set of components/features on any given forward pass. A model might be using hundreds of different features on a single input, combining components in harmony to produce the output. If we don't just want to tell a short story about an input, but want to understand faithfully what's going on, then we have to accept this. And we can't just make a new feature for each of the exponentially many ways that components can be combined.
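To put a rough number on "exponentially many" (the counts here are hypothetical, just for scale): even a modest global dictionary with a modest number of components active per input yields far too many combinations to enumerate as separate features.

```python
from math import comb

n_components = 300   # hypothetical global number of components
k_per_input = 20     # hypothetical number active on one forward pass

# roughly 7e30 distinct ways to choose which 20 of the 300 components co-occur
print(comb(n_components, k_per_input))
```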
Just because a model may combine lots of components on any given input does not make it intractable to understand at a high level. We know this because people have a good understanding of operating systems, even though completing a single task may require the operating system to use thousands of different parts of the system. But the global number of parts is small enough for a human to comprehend, and any given "forward pass" of the operating system just looks like a combination of these well-understood parts.
> But there's no reason to think that the model is actually using a sparse set of components/features on any given forward pass.
I contest this. If a model wants to implement more computations (for example, logic gates) in a layer than that layer has neurons, the known methods for doing so rely on only a few of those computations being used (that is, receiving a non-baseline input) on any given forward pass.
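A toy version of this argument (a hedged numpy sketch with made-up sizes, not any specific construction from the superposition literature): pack many boolean features into a layer with fewer neurons using random directions, then try to read each feature back out. The readout is reliable only when a small number of features are active at once; as more features fire simultaneously, interference between their directions swamps the signal.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 64, 512                                  # 64 neurons, 512 features in superposition
W = rng.standard_normal((m, d)) / np.sqrt(d)    # one roughly-unit random direction per feature

def readout_accuracy(k_active: int, trials: int = 500) -> float:
    """Activate k features, superpose their directions, then try to recover
    every feature by thresholding its dot product with the resulting vector."""
    correct, total = 0, 0
    for _ in range(trials):
        active = rng.choice(m, size=k_active, replace=False)
        x = W[active].sum(axis=0)               # the layer's activation on this input
        recovered = (W @ x) > 0.5               # simple threshold readout per feature
        truth = np.zeros(m, dtype=bool)
        truth[active] = True
        correct += (recovered == truth).sum()
        total += m
    return correct / total

for k in [2, 8, 32, 128]:
    print(f"{k:4d} features active -> readout accuracy {readout_accuracy(k):.3f}")
```

The point is only qualitative: fitting more computations than neurons into a layer buys capacity precisely because few of those computations are exercised on any one input.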