Maciej Satkiewicz

Comments
It turns out that DNNs are remarkably interpretable.
Maciej Satkiewicz · 1mo · 10

Actually it is more similar to the lesser-known Neural Path Kernel :) Indeed, there is a specific product kernel associated with the path space, in the sense that the path space is the RKHS of that kernel.
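For concreteness, a rough sketch of the kind of decomposition involved, in generic notation for a bias-free ReLU network (the conventions here are illustrative and may differ from the post's):

$$f(x) \;=\; \sum_{p \in \mathcal{P}} x_{p_0}\, A_p(x) \prod_{l=1}^{L} W^{(l)}_{p_l\, p_{l-1}}, \qquad A_p(x) \;=\; \prod_{l=1}^{L-1} \mathbf{1}\!\left[z^{(l)}_{p_l}(x) > 0\right],$$

where $\mathcal{P}$ is the set of input-to-output paths $p = (p_0, \dots, p_L)$ and $z^{(l)}_{p_l}(x)$ is the pre-activation of the neuron that $p$ visits at layer $l$. Taking $\phi_p(x) = x_{p_0} A_p(x)$ as a path-indexed feature map yields the product kernel $k(x, x') = \sum_{p} \phi_p(x)\, \phi_p(x')$, and the path space can then be viewed as the RKHS of $k$.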

Reply
It turns out that DNNs are remarkably interpretable.
Maciej Satkiewicz · 1mo · 10

Hi, thanks for the comment!

By “linear” I mean linear in the feature space, just as kernel machines are considered “linear” under a specific data embedding.


Regarding saliency maps, I still think my method can be considered faithful; in fact, the whole theoretical toolset I develop serves to argue for the faithfulness of excitation pullbacks, in particular Hypothesis 1. I argue that the model approximates a kernel machine in the path space precisely to motivate why excitation pullbacks might be faithful, i.e. they reveal the decision boundary of the more regular underlying model and point to exactly where the gradient noise comes from (in short, I claim that gradients are noisy because they correspond to rank-1 tensors in the feature space, whereas the network actually learns a higher-rank feature map).

Also notice that I perform just 5 steps of rudimentary gradient ascent in the pixel space, with no additional regularisation, and immediately obtain very sensible-looking results that are both input- and target-specific. Arguably, the highlighted features are exactly those that humans would highlight when asked to accentuate the most salient features predicting a given class.
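For readers curious what those 5 unregularised steps might look like in code, here is a minimal sketch assuming a pretrained PyTorch classifier; the step size, normalisation and the use of the raw gradient are illustrative assumptions of mine, and the post's visualisations substitute the excitation pullback for the raw gradient:

```python
import torch

def pixel_space_ascent(model, x, target_class, steps=5, lr=0.5):
    """Rudimentary gradient ascent on a target logit, directly in pixel space.

    Note: this sketch uses the ordinary input gradient; the post's method
    replaces it with the excitation pullback. Step size and normalisation
    are illustrative choices, not taken from the post.
    """
    model.eval()
    x = x.clone().detach().requires_grad_(True)
    for _ in range(steps):
        logit = model(x)[:, target_class].sum()   # score of the target class
        (grad,) = torch.autograd.grad(logit, x)   # d(logit)/d(pixels)
        # normalised ascent step, no extra regularisation
        x = (x + lr * grad / (grad.norm() + 1e-8)).detach().requires_grad_(True)
    return x.detach()
```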

Reply
It turns out that DNNs are remarkably interpretable.
Maciej Satkiewicz · 1mo · 10

To be precise, what I meant by "implicitly linear" is a model that is globally linear in the feature space, i.e. after transforming inputs with a fixed map; in other words, a kernel machine. The claim is that ReLU networks approximate a particular, computable kernel machine during training.
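In generic notation (not the post's), a kernel machine in this sense is a model of the form

$$f(x) \;=\; \langle w, \varphi(x) \rangle_{\mathcal{H}}, \qquad k(x, x') \;=\; \langle \varphi(x), \varphi(x') \rangle_{\mathcal{H}},$$

where $\varphi$ is a fixed feature map into a Hilbert space $\mathcal{H}$: all the nonlinearity sits in $\varphi$, and the model itself is globally linear in the transformed inputs.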

Reply
It turns out that DNNs are remarkably interpretable.
Maciej Satkiewicz · 1mo · 20

That’s fair - the current work focuses on vision models. But I’d argue it provides a concrete mechanism that could generalize to other domains. The idea that a hidden linear model lives within the activation structure isn’t specific to images; what’s needed next is to recover and validate it in text or multimodal settings.

In other words: yes, it’s scoped, but it opens a door.

Reply
Towards White Box Deep Learning
Maciej Satkiewicz · 1y · 40

These are interesting considerations! I haven't put much thought into this yet, but I have some preliminary ideas.

Semantic features are intended to capture meaning-preserving variations of structures. In that sense the "next word" problem seems ill-posed, as some permutations of words preserve meaning; in reality it's hardly a natural problem from the human perspective either.

The question I'd ask here is "what are the basic semantic building blocks of text for us humans?", and then I'd try to model these blocks using the machinery of semantic features, i.e. model the invariants of these semantic blocks. Only then would I think about adequate formulations of useful problems regarding text understanding.

So I'd say that these semantic atoms of text are actually thoughts (encoded by certain sequences of words/sentences that enjoy a certain permutation-invariance). Semantic features would thus aim to capture thoughts-at-locations by finding these sequences (up to their specific permutations), and deeper layers would capture higher-level thoughts-at-locations composed of the former. This could potentially uncover some Euclidean structure in the text (which makes sense, as humans arguably think within the space-time framework, after Kant's famous "Space and time are the framework within which the mind is constrained to construct its experience of reality").
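As a toy illustration of the kind of local permutation-invariance meant here (my own example, not something from the paper), one could mean-pool token embeddings over local windows, so that reordering words within a window leaves the "thought-at-location" feature unchanged, and then stack such pooling to compose higher-level features:

```python
import numpy as np

def local_permutation_invariant_features(token_embeddings, window=8, stride=4):
    """Toy 'thought-at-location' features: mean-pool embeddings over local windows.

    Reordering tokens *within* a window leaves its feature unchanged, a crude
    stand-in for the permutation-invariance of meaning discussed above.
    token_embeddings: array of shape (num_tokens, dim).
    Returns an array of shape (num_windows, dim).
    """
    feats = []
    for start in range(0, max(1, len(token_embeddings) - window + 1), stride):
        feats.append(token_embeddings[start:start + window].mean(axis=0))
    return np.stack(feats)

# Deeper "layers" could pool these features again, composing higher-level
# thoughts-at-locations out of lower-level ones.
```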

That being said, the problems I'd consider would be some form of translation (to another language or another modality) rather than the artificial next-word prediction.

The MVD for this problem could very well consist of 0's and 1's, provided that they encoded some simple yet sensible semantics. I'd have to think about a specific example more; it's a nice question :)


Reply
Towards White Box Deep Learning
Maciej Satkiewicz · 1y · 20

Thank you! The quote you picked is on point; I added an extended summary based on it. Thanks for the suggestion!

Reply
Posts

It turns out that DNNs are remarkably interpretable. · 1mo
Towards White Box Deep Learning · 1y