I haven't fully read through your paper, but from the parts I have read it sounds like it might be similar to the neural tangent kernel applied to the case of ReLU networks.
Actually, it is more similar to the lesser-known Neural Path Kernel :) Indeed, there is a specific product kernel associated with the path space, in the sense that the path space is the RKHS of that kernel.
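For readers who haven't seen that construction, here is a rough sketch in my own notation (ignoring biases, so take it as an approximation of what the paper actually defines). Every input-to-output path $p$ in a ReLU network contributes the product of its weights, times the input coordinate $x_{i(p)}$ it starts from, gated by whether every ReLU along the path is active. Collecting the data-dependent part into a feature map indexed by paths,

$$\phi_p(x) = x_{i(p)} \prod_{u \in p} \mathbb{1}[\text{unit } u \text{ is active at } x],$$

the network output is linear in that space, $f(x) = \sum_p w_p \, \phi_p(x)$ with $w_p$ the product of weights along $p$, and the associated kernel

$$k(x, x') = \langle \phi(x), \phi(x') \rangle = \sum_i x_i x'_i \cdot \#\{p \text{ starting at } i \text{ that are active for both } x \text{ and } x'\}$$

couples input correlations with path co-activity, which is the "product kernel" structure mentioned above.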
Hmm, I have a couple of questions. Quoting from the abstract,
In this paper we argue that ReLU networks learn an implicit linear model we can actually tap into.
What do you mean by "linear model"? In particular, do you mean "Actually, DNNs are linear"? Because that's importantly not true; linear models cannot do the things we care about.
We describe that alleged model formally and show that we can approximately pull its decision boundary back to the input space with certain simple modification to the backward pass. The resulting gradients (called excitation pullbacks) reveal high-resolution input- and target-specific features of remarkable perceptual alignment [...]
Figure 1 is weaker evidence than one might initially think. Various saliency-map techniques have fallen for interpretability illusions in the past; see the canonical critique here.
That said, I haven't read your full paper. Do you still think your method is working after considering the saliency map illusions?
Also, I want to reward people for thinking of new interpretability ideas and talking about them, so thank you for doing that!
Hi, thanks for the comment!
By “linear” I mean linear in the feature space, just like kernel machines are considered “linear” under a specific data embedding.
Regarding saliency maps, I still think my method can be considered faithful. In fact, the whole theoretical toolset I develop serves to argue for the faithfulness of excitation pullbacks, in particular Hypothesis 1. I argue that the network approximates a kernel machine in the path space precisely in order to motivate why excitation pullbacks might be faithful, i.e. why they reveal the decision boundary of the more regular underlying model, and to point out exactly where the gradient noise comes from (in short, I claim that gradients are noisy because they correspond to rank-1 tensors in the feature space, whereas the network actually learns a higher-rank feature map).
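To make the "modified backward pass" idea concrete for readers, here is a minimal, hypothetical PyTorch sketch of what such a modification could look like: a ReLU whose forward pass is unchanged but whose backward pass replaces the hard 0/1 gate with a soft, activation-dependent gate. The specific gating function below is my own illustrative choice, not necessarily the modification used in the paper.

```python
import torch

class SoftGateReLU(torch.autograd.Function):
    """ReLU with an unchanged forward pass and a modified backward pass.

    Instead of multiplying the incoming gradient by the hard gate 1[x > 0],
    the backward pass uses a soft, activation-dependent gate. This is only
    an illustrative stand-in for a "simple modification to the backward
    pass"; the paper's actual modification may differ.
    """

    @staticmethod
    def forward(ctx, x, temperature=1.0):
        ctx.save_for_backward(x)
        ctx.temperature = temperature
        return torch.relu(x)  # forward computation is the ordinary ReLU

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Hypothetical soft gate: strongly excited units pass more gradient
        # than barely-active ones, and mildly negative ones still pass a little.
        soft_gate = torch.sigmoid(x / ctx.temperature)
        return grad_output * soft_gate, None  # no gradient w.r.t. temperature


def soft_relu(x, temperature=1.0):
    return SoftGateReLU.apply(x, temperature)
```

Swapping something like this in for the ReLUs of a pretrained network and backpropagating a class logit yields a modified input gradient of the general kind discussed here; again, this is a sketch of the mechanism, not a reproduction of the paper's exact procedure.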
Also, notice that I perform just 5 steps of rudimentary gradient ascent in the pixel space, with no additional regularisation, and immediately get very sensible-looking results that are both input- and target-specific. Arguably, the highlighted features are exactly the ones humans would highlight when asked to accentuate the most salient features predicting a given class.
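For concreteness, the loop being described is roughly the following: a handful of plain gradient-ascent steps on the input pixels toward a target logit, with nothing else added. The model, step size and preprocessing are placeholders of my choosing rather than the paper's exact settings.

```python
import torch

def pixel_space_ascent(model, image, target_class, steps=5, step_size=1e-2):
    """Rudimentary gradient ascent in pixel space toward a target class.

    No regularisation, no priors: just a few gradient steps on the raw
    pixels. The step size here is an illustrative value.
    """
    x = image.clone().detach().requires_grad_(True)
    for _ in range(steps):
        logits = model(x.unsqueeze(0))          # add a batch dimension
        target_logit = logits[0, target_class]  # score of the class to accentuate
        grad, = torch.autograd.grad(target_logit, x)
        with torch.no_grad():
            x += step_size * grad               # plain ascent step, no projection or penalty
    return x.detach()
```

If the network's ReLUs were swapped for a modified-backward variant like the sketch above, `grad` here would be the modified gradient rather than the raw one.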
It's only locally linear, and the nonlinearity is a lot of what's interesting. But this does seem like a pretty cool general idea.
To be precise, what I meant by "implicitly linear" is a model that is globally linear in the feature space, i.e. linear after transforming the inputs with a fixed map. In other words: a kernel machine. The claim is that ReLU networks approximate a particular, computable kernel machine during training.
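Spelling that out with standard kernel-machine notation (nothing here is specific to the paper): a model is "globally linear in the feature space" if it can be written as

$$f(x) = \langle w, \phi(x) \rangle_{\mathcal{H}},$$

for a fixed feature map $\phi$ with kernel $k(x, x') = \langle \phi(x), \phi(x') \rangle$; when $w$ lies in the span of the training features this is equivalently $f(x) = \sum_i \alpha_i \, k(x_i, x)$. The map $\phi$ can be arbitrarily nonlinear in $x$, so this doesn't contradict the point above that the network is nonlinear in input space: the linearity is in $w$, over features produced by a fixed transformation.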
That’s fair - the current work focuses on vision models. But I’d argue it provides a concrete mechanism that could generalize to other domains. The idea that a hidden linear model lives within the activation structure isn’t specific to images; what’s needed next is to recover and validate it in text or multimodal settings.
In other words: yes, it’s scoped, but it opens a door.
I recently posted a paper suggesting that deep networks may harbor an implicitly linear model, recoverable via a form of gradient denoising. The method - called excitation pullback - produces crisp, human-aligned features and offers a structural lens on generalization. Just look at the explanations for ImageNet-pretrained ResNet50 on the front page of the paper.