Thanks for this write-up! In case it’s of interest, we have also performed some exploratory interpretability work using the SVD of model weights.
We examine convolutional layers in models on a couple of common vision tasks (CIFAR-10, ImageNet). In short, we similarly take the SVD of the weights in a CNN layer and project the hidden layer activations onto the i-th singular vector. These singular direction “neurons” can then be studied with interpretability methods: we use hypergraphs, feature visualizati...
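
For readers who want to try the projection step, here is a minimal NumPy sketch. It is not the exact procedure from our work: the function name `singular_direction_activations`, the choice to flatten the kernel into a (C_out, C_in*kH*kW) matrix, and the use of the left singular vectors over output channels are all assumptions made for illustration, and the weights and activations are random placeholders.

```python
import numpy as np

def singular_direction_activations(weight, activations, i):
    """Project conv-layer output activations onto the i-th left singular vector.

    weight:      (C_out, C_in, kH, kW) convolution kernel
    activations: (N, C_out, H, W) outputs of that layer
    Returns the (N, H, W) projected activation map and the i-th singular value.
    """
    c_out = weight.shape[0]
    # Assumption: flatten the kernel to a (C_out, C_in*kH*kW) matrix before the SVD.
    w_mat = weight.reshape(c_out, -1)
    u, s, vt = np.linalg.svd(w_mat, full_matrices=False)
    # u[:, i] is a direction in output-channel space; contract it with the channel axis.
    projected = np.einsum("c,nchw->nhw", u[:, i], activations)
    return projected, s[i]

# Random placeholders standing in for a trained kernel and real activations.
rng = np.random.default_rng(0)
weight = rng.normal(size=(8, 3, 3, 3))
acts = rng.normal(size=(2, 8, 5, 5))
proj, sigma = singular_direction_activations(weight, acts, i=0)
print(proj.shape, sigma)
```

Each projected map then plays the role of one singular direction "neuron": a single spatial activation map per input, which can be passed to whatever interpretability tool you would otherwise apply to an ordinary channel.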
You probably meant "convex" here.