I haven't fully read through your paper, but from the parts I have read it sounds like it might be similar to the neural tangent kernel applied to the case of ReLU networks.
Actually, it is more similar to the lesser-known Neural Path Kernel :) Indeed, there is a specific product kernel associated with the path space, in the sense that the path space is the RKHS of that kernel.
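For readers who haven't seen that construction, here is a rough sketch in my own notation (ignoring biases, so take it as an approximation of what the paper actually defines). Every input-to-output path $p$ in a ReLU network contributes the product of its weights, times the input coordinate $x_{i(p)}$ it starts from, gated by whether every ReLU along the path is active. Collecting the data-dependent part into a feature map indexed by paths,

$$\phi_p(x) = x_{i(p)} \prod_{u \in p} \mathbb{1}[\text{unit } u \text{ is active at } x],$$

the network output is linear in that space, $f(x) = \sum_p w_p \, \phi_p(x)$ with $w_p$ the product of weights along $p$, and the associated kernel

$$k(x, x') = \langle \phi(x), \phi(x') \rangle = \sum_i x_i x'_i \cdot \#\{p \text{ starting at } i \text{ that are active for both } x \text{ and } x'\}$$

couples input correlations with path co-activity, which is the "product kernel" structure mentioned above.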
Hmm, I have a couple of questions. Quoting from the abstract,
In this paper we argue that ReLU networks learn an implicit linear model we can actually tap into.
What do you mean by "linear model"? In particular, do you mean "Actually, DNNs are linear"? Because that's importantly not true; linear models cannot do the things we care about.
We describe that alleged model formally and show that we can approximately pull its decision boundary back to the input space with certain simple modification to the backward pass. The resulting gradients (called excitation pullbacks) reveal high-resolution input- and target-specific features of remarkable perceptual alignment [...]
Figure 1 is weaker evidence than one might initially think. Various saliency-map techniques have fallen for interpretability illusions in the past; see the canonical critique here.
That said, I haven't read your full paper. Do you still think your method is working after considering the saliency map illusions?
Also, I want to reward people for thinking of new interpretability ideas and talking about them, so thank you for doing that!
Hi, thanks for the comment!
By “linear” I mean linear in the feature space, just like kernel machines are considered “linear” under a specific data embedding.
Regarding saliency maps, I still think my method can be considered faithful. In fact, the whole theoretical toolset I develop serves to argue for the faithfulness of excitation pullbacks, in particular Hypothesis 1. I argue that the network approximates a kernel machine in the path space precisely in order to motivate why excitation pullbacks might be faithful, i.e. why they reveal the decision boundary of the more regular underlying model, and to point out exactly where the gradient noise comes from (in short, I claim that gradients are noisy because they correspond to rank-1 tensors in the feature space, whereas the network actually learns a higher-rank feature map).
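To make the "modified backward pass" idea concrete for readers, here is a minimal, hypothetical PyTorch sketch of what such a modification could look like: a ReLU whose forward pass is unchanged but whose backward pass replaces the hard 0/1 gate with a soft, activation-dependent gate. The specific gating function below is my own illustrative choice, not necessarily the modification used in the paper.

```python
import torch

class SoftGateReLU(torch.autograd.Function):
    """ReLU with an unchanged forward pass and a modified backward pass.

    Instead of multiplying the incoming gradient by the hard gate 1[x > 0],
    the backward pass uses a soft, activation-dependent gate. This is only
    an illustrative stand-in for a "simple modification to the backward
    pass"; the paper's actual modification may differ.
    """

    @staticmethod
    def forward(ctx, x, temperature=1.0):
        ctx.save_for_backward(x)
        ctx.temperature = temperature
        return torch.relu(x)  # forward computation is the ordinary ReLU

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Hypothetical soft gate: strongly excited units pass more gradient
        # than barely-active ones, and mildly negative ones still pass a little.
        soft_gate = torch.sigmoid(x / ctx.temperature)
        return grad_output * soft_gate, None  # no gradient w.r.t. temperature


def soft_relu(x, temperature=1.0):
    return SoftGateReLU.apply(x, temperature)
```

Swapping something like this in for the ReLUs of a pretrained network and backpropagating a class logit yields a modified input gradient of the general kind discussed here; again, this is a sketch of the mechanism, not a reproduction of the paper's exact procedure.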
Also, notice that I perform just 5 steps of rudimentary gradient ascent in the pixel space, with no additional regularisation, and immediately get very sensible-looking results that are both input- and target-specific. Arguably, the highlighted features are exactly the ones humans would highlight when asked to accentuate the most salient features predicting a given class.
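For concreteness, the loop being described is roughly the following: a handful of plain gradient-ascent steps on the input pixels toward a target logit, with nothing else added. The model, step size and preprocessing are placeholders of my choosing rather than the paper's exact settings.

```python
import torch

def pixel_space_ascent(model, image, target_class, steps=5, step_size=1e-2):
    """Rudimentary gradient ascent in pixel space toward a target class.

    No regularisation, no priors: just a few gradient steps on the raw
    pixels. The step size here is an illustrative value.
    """
    x = image.clone().detach().requires_grad_(True)
    for _ in range(steps):
        logits = model(x.unsqueeze(0))          # add a batch dimension
        target_logit = logits[0, target_class]  # score of the class to accentuate
        grad, = torch.autograd.grad(target_logit, x)
        with torch.no_grad():
            x += step_size * grad               # plain ascent step, no projection or penalty
    return x.detach()
```

If the network's ReLUs were swapped for a modified-backward variant like the sketch above, `grad` here would be the modified gradient rather than the raw one.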
It's only locally linear, and the nonlinearity is a lot of what's interesting. But this does seem like a pretty cool general idea.
To be precise, what I meant by "implicitly linear" is a model that is globally linear in the feature space, i.e. linear after transforming the inputs with a fixed map. In other words: a kernel machine. The claim is that ReLU networks approximate a particular, computable kernel machine during training.
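Spelling that out with standard kernel-machine notation (nothing here is specific to the paper): a model is "globally linear in the feature space" if it can be written as

$$f(x) = \langle w, \phi(x) \rangle_{\mathcal{H}},$$

for a fixed feature map $\phi$ with kernel $k(x, x') = \langle \phi(x), \phi(x') \rangle$; when $w$ lies in the span of the training features this is equivalently $f(x) = \sum_i \alpha_i \, k(x_i, x)$. The map $\phi$ can be arbitrarily nonlinear in $x$, so this doesn't contradict the point above that the network is nonlinear in input space: the linearity is in $w$, over features produced by a fixed transformation.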
That’s fair - the current work focuses on vision models. But I’d argue it provides a concrete mechanism that could generalize to other domains. The idea that a hidden linear model lives within the activation structure isn’t specific to images; what’s needed next is to recover and validate it in text or multimodal settings.
In other words: yes, it’s scoped, but it opens a door.
I recently posted a paper suggesting that deep networks may harbor an implicitly linear model, recoverable via a form of gradient denoising. The method - called excitation pullback - produces crisp, human-aligned features and offers a structural lens on generalization. Just look at the explanations for ImageNet-pretrained ResNet50 on the front page of the paper.