Thanks for this write-up! In case it’s of interest, we have also performed some exploratory interpretability work using the SVD of model weights.
We examine convolutional layers in models trained on a couple of common vision tasks (CIFAR-10, ImageNet). In short, we similarly take the SVD of the weights in a CNN layer, $W_L = USV^T$, and project the hidden-layer activations $x_l$ onto the $i$-th singular vector, $V_{[i,:]} x_l$. These singular-direction "neurons" can then be studied with interpretability methods: we use hypergraphs, feature visualizations, and exemplary images. More detail can be found in The SVD of Convolutional Weights: A CNN Interpretability Framework, and you can explore the OpenAI Microscope-inspired demo we created for a VGG-16 trained on ImageNet here (under the...
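To make the projection concrete, here is a minimal sketch of the idea in PyTorch. It assumes one particular choice of reshaping (flattening the kernel to a $c_{out} \times c_{in} k_H k_W$ matrix; the paper considers several reshapings of the weight tensor), and the function name and unfold-based patch extraction are illustrative, not the paper's actual code:

```python
import torch
import torch.nn.functional as F

def singular_direction_activations(conv_weight, x, i):
    """Project hidden activations onto the i-th right singular vector
    of a (flattened) convolutional weight matrix.

    conv_weight: (c_out, c_in, kH, kW) kernel tensor
    x:           (batch, c_in, H, W) activations entering the layer
    i:           index of the singular direction to inspect

    NOTE: flattening the kernel to (c_out, c_in*kH*kW) is one of
    several possible reshapings; this sketch assumes that choice.
    """
    c_out, c_in, kH, kW = conv_weight.shape
    W = conv_weight.reshape(c_out, -1)           # (c_out, c_in*kH*kW)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)

    # Unfold x into the patch space the kernel acts on: each column
    # of `patches` is one flattened (c_in*kH*kW) receptive field.
    patches = F.unfold(x, kernel_size=(kH, kW))  # (batch, c_in*kH*kW, n_locations)

    # V_{[i,:]} x_l : inner product of the i-th right singular vector
    # with every patch, giving one scalar per spatial location.
    v_i = Vh[i]                                  # (c_in*kH*kW,)
    return torch.einsum("d,bdn->bn", v_i, patches)
```

The resulting per-location map can be reshaped back to the spatial grid and fed into the same feature-visualization and exemplary-image pipelines one would use on ordinary channel neurons.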