Minimizing components used per input vs. minimizing the number of global components
Many mechanistic interpretability methods minimize the left-hand side of this comparison: circuit analysis such as IOI, SAEs, and APD.
Optimizing the LHS is useful if you want to be able to "tell a story" about any input: you want the fewest latents active on any given input, so that your story about a particular input is not too long.
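As a concrete illustration of what optimizing the LHS looks like in practice, here is a minimal sketch of an SAE-style objective (the dimensions and coefficient are hypothetical, not taken from any particular paper): the L1 penalty acts on each input's latent activations, so it pushes down the number of latents used per input rather than the total number of latents in the dictionary.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Toy SAE: many latents exist globally, but the loss only penalizes
    how many of them fire on each individual input."""

    def __init__(self, d_model: int = 512, d_latent: int = 4096):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_latent)
        self.decoder = nn.Linear(d_latent, d_model)

    def forward(self, x: torch.Tensor):
        z = torch.relu(self.encoder(x))   # latents active on this input
        x_hat = self.decoder(z)
        return x_hat, z

def sae_loss(x, x_hat, z, l1_coeff: float = 1e-3):
    reconstruction = ((x - x_hat) ** 2).mean()
    per_input_sparsity = z.abs().sum(dim=-1).mean()  # the "LHS": latents used per input
    return reconstruction + l1_coeff * per_input_sparsity
```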
But there's no reason to think that the model is actually using a sparse set of components/features on any given forward pass. A model might be using hundreds of different features on a single input, combining components in harmony to produce the output. If we don't just want to tell a short story about an input, but want to understand faithfully what's going on, then we have to accept this. And we can't just make a new feature for each of the exponentially many ways that components can be combined.
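To put a rough number on "exponentially many" (the counts here are hypothetical, just for scale): even a modest global dictionary with a modest number of components active per input yields far too many combinations to enumerate as separate features.

```python
from math import comb

n_components = 300   # hypothetical global number of components
k_per_input = 20     # hypothetical number active on one forward pass

# roughly 7e30 distinct ways to choose which 20 of the 300 components co-occur
print(comb(n_components, k_per_input))
```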
Just because a model may combine lots of components on any given input does not make it intractable to understand at a high level. We know this because people have a good understanding of operating systems, even though completing a single task may require the operating system to use thousands of different parts of the system. But the global number of parts is small enough for a human to comprehend, and any given "forward pass" of the operating system just looks like a combination of these well-understood parts.
> But there's no reason to think that the model is actually using a sparse set of components/features on any given forward pass.
I contest this. If a model wants to implement more computations (for example, logic gates) in a layer than that layer has neurons, the known methods for doing so rely on only a few of those computations being used (that is, receiving a non-baseline input) on any given forward pass.
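A toy version of this argument (a hedged numpy sketch with made-up sizes, not any specific construction from the superposition literature): pack many boolean features into a layer with fewer neurons using random directions, then try to read each feature back out. The readout is reliable only when a small number of features are active at once; as more features fire simultaneously, interference between their directions swamps the signal.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 64, 512                                  # 64 neurons, 512 features in superposition
W = rng.standard_normal((m, d)) / np.sqrt(d)    # one roughly-unit random direction per feature

def readout_accuracy(k_active: int, trials: int = 500) -> float:
    """Activate k features, superpose their directions, then try to recover
    every feature by thresholding its dot product with the resulting vector."""
    correct, total = 0, 0
    for _ in range(trials):
        active = rng.choice(m, size=k_active, replace=False)
        x = W[active].sum(axis=0)               # the layer's activation on this input
        recovered = (W @ x) > 0.5               # simple threshold readout per feature
        truth = np.zeros(m, dtype=bool)
        truth[active] = True
        correct += (recovered == truth).sum()
        total += m
    return correct / total

for k in [2, 8, 32, 128]:
    print(f"{k:4d} features active -> readout accuracy {readout_accuracy(k):.3f}")
```

The point is only qualitative: fitting more computations than neurons into a layer buys capacity precisely because few of those computations are exercised on any one input.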