When Are Two Networks the Same? Tensor Similarity for Mechanistic Interpretability
by Logan Riggs, tdooms, Conflux, lwroe, and MLNissenGonzalez
We've found a method that tells you:
* How functionally similar two neural networks are across ALL inputs,
* Computed solely from the weights (i.e. no data),
* Using a principled generalization of cosine similarity.

There's only one catch: you have to use a tensor network. We've already shown that...
May 29