[ Question ]

Previous Work on Recreating Neural Network Input from Intermediate Layer Activations

by bglass
12th Oct 2022
1 min read
Interpretability (ML & AI) · AI · Frontpage

Recently I've been experimenting with recreating a neural network's input layer from intermediate layer activations.

The possibility has implications for interpretability. For example, if you can recreate a certain type of input from the activations of certain neurons, you know those neurons carry information 'about' that type of input.

My question is: Does anyone know of prior work/research in this area?

I'd appreciate even distantly-related work. I may write a blog post about my experiments if there is interest and if there isn't already adequate research in this area.
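For concreteness, here is a minimal sketch of one common approach to this kind of reconstruction: gradient-based activation matching, where a candidate input is optimized until its intermediate activations match the target activations. The toy PyTorch MLP, layer split, and hyperparameters below are illustrative assumptions, not the setup from these experiments.

```python
# Minimal sketch: recover an input from intermediate-layer activations by
# gradient-based activation matching. The architecture, layer split, and
# hyperparameters are illustrative assumptions, not any particular setup.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy MLP: `trunk` produces the intermediate activations we treat as observed.
trunk = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 64), nn.ReLU())

# Pretend this is the true input that we only see through its activations.
x_true = torch.rand(1, 784)
with torch.no_grad():
    target_acts = trunk(x_true)

# Optimize a candidate input so that its activations match the target's.
x_hat = torch.rand(1, 784, requires_grad=True)
opt = torch.optim.Adam([x_hat], lr=0.05)
for _ in range(2000):
    opt.zero_grad()
    loss = ((trunk(x_hat) - target_acts) ** 2).mean()
    loss = loss + 1e-4 * x_hat.pow(2).mean()  # mild regularizer to keep x_hat bounded
    loss.backward()
    opt.step()

print("activation-matching loss:", loss.item())
print("mean reconstruction error:", (x_hat - x_true).abs().mean().item())
```

Because the hidden layer here is lower-dimensional than the input and ReLU discards the sign of its pre-activations, the recovered input is generally only approximate, which is part of why the inversion papers linked in the answers below lean on priors and regularizers.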

2 Answers

the gears to ascension

Oct 13, 2022


Search quality: skimmed the abstracts. Search method: Semantic Scholar + browsing. Note that many of these results are kind of old.

  • https://www.semanticscholar.org/paper/Explaining-Neural-Networks-by-Decoding-Layer-Schneider-Vlachos/0de6c8de9154a0db199aa433fc19cdfef2a62076
  • ... is cited by https://www.semanticscholar.org/paper/Toward-Transparent-AI%3A-A-Survey-on-Interpreting-the-Raukur-Ho/108a4000b32e3f6eb566151790bfea69c1f3a9db (fun: it cites the EA forum for one of its 300 cites)
  • ... which cites https://www.semanticscholar.org/paper/Understanding-deep-image-representations-by-them-Mahendran-Vedaldi/4d790c8fae40357d24813d085fa74a436847fb49
  • ... which is heavily cited, e.g. by https://www.semanticscholar.org/paper/Inverting-Visual-Representations-with-Convolutional-Dosovitskiy-Brox/125f7b539e89cd0940ff89c231902b1d4023b3ba (a sketch of this decoder-style inversion follows the list)
  • ... https://www.semanticscholar.org/paper/Inverting-face-embeddings-with-convolutional-neural-Zhmoginov-Sandler/e44fc62f9fba4c9ad276544901fd1e82caaf7baa
  • ... https://www.semanticscholar.org/paper/Inverting-Convolutional-Networks-with-Convolutional-Dosovitskiy-Brox/993c55eef970c6a11ec367dbb1bf1f0c1d5d72a6
  • ... hmm interesting, here's a branch off into doing it on the human visual system apparently https://www.semanticscholar.org/paper/Using-deep-learning-to-reveal-the-neural-code-for-Kindel-Christensen/e79b56303a29114762f458d338d0f3b03348d618
  • ... https://www.semanticscholar.org/paper/Visualizing-and-Comparing-AlexNet-and-VGG-using-Yu-Bai/dae981902b1f6d869ef2d047612b90cdbe43fd1e
  • ... https://www.semanticscholar.org/paper/Understading-Image-Restoration-Convolutional-Neural-Protas-Bratti/0c807815ceaa186e99519f59ae6c3ff1ac7defdd
  • https://www.semanticscholar.org/paper/Towards-Understanding-the-Invertibility-of-Neural-Gilbert-Zhang/487489253b03948a1b1c581986c086d577222e0a
  • https://www.semanticscholar.org/paper/Analysis-of-Invariance-and-Robustness-via-of-Behrmann-Dittmer/0c11435e0b97b90dfc3928ce242c68289bc757f2
  • https://www.semanticscholar.org/paper/Deep-Neural-Networks-are-Surprisingly-Reversible%3A-A-Dong-Yin/e8e5f0db724d65f761bd2d415ee46281f8ba751a
  • https://www.semanticscholar.org/paper/Large-capacity-Image-Steganography-Based-on-Neural-Lu-Wang/d1485d298906364c4434454d25c0ed4389420892
  • https://www.semanticscholar.org/paper/Robust-Invertible-Image-Steganography-Xu-Mou/786736d89d5bbfa674fabe42ecec32ed8f67901e
  • https://www.semanticscholar.org/paper/Understanding-and-mitigating-exploding-inverses-in-Behrmann-Vicol/8c0b75099f577cc009065e985cae6986cf755d4d
  • https://www.semanticscholar.org/paper/The-Effects-of-Invertibility-on-the-Complexity-of-Pareek-Risteski/7bb65e9167e5d21f04ebaacdd7bc59f7c4972bb7
  • https://www.semanticscholar.org/paper/Evaluating-generalization-through-interval-based-Adam-Likas/f7843d212ddd65de3dc376bb6c146ce78eacf3e0
  • https://www.semanticscholar.org/paper/Landscape-Learning-for-Neural-Network-Inversion-Liu-Mao/5dad3748e8d4d8c659005903062e5d8e855fa86c <= bold claims, might even read this one properly to see if they hold up
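Several of the links above (the Dosovitskiy & Brox papers in particular) invert representations differently: instead of optimizing a single input, they train a separate decoder network that maps activations back to inputs. Here is a minimal sketch of that decoder-style approach on a toy MLP; the shapes, data, and training details are illustrative assumptions, not those papers' actual setups.

```python
# Minimal sketch: train a decoder to invert intermediate activations back to inputs,
# in the spirit of the Dosovitskiy & Brox papers above (toy MLP instead of a convnet;
# shapes, data, and hyperparameters are illustrative assumptions).
import torch
import torch.nn as nn

torch.manual_seed(0)

encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 64), nn.ReLU())
decoder = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 784))

# Stand-in data; in practice you would use the distribution the network was trained on.
data = torch.rand(4096, 784)

opt = torch.optim.Adam(decoder.parameters(), lr=1e-3)
for epoch in range(20):
    perm = torch.randperm(data.shape[0])
    for i in range(0, data.shape[0], 128):
        x = data[perm[i:i + 128]]
        with torch.no_grad():
            acts = encoder(x)  # the encoder stays frozen; only the decoder is fit
        loss = ((decoder(acts) - x) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

print("final reconstruction MSE:", loss.item())
```

The decoder amortizes the inversion: once trained, reconstructing from any new activation vector is a single forward pass rather than a fresh optimization loop.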
the gears to ascension

Interesting to me, but not what you asked for:

  • https://www.semanticscholar.org/paper/The-learning-phases-in-NN%3A-From-Fitting-the-to-a-Schneider/f0c5f3e254b3146199ae7d8feb888876edc8ec8b
  • https://www.semanticscholar.org/paper/Deceptive-AI-Explanations%3A-Creation-and-Detection-Schneider-Handali/54560c7bce50e57d2396cbf485ff66e5fda83a13
  • https://www.semanticscholar.org/paper/TopKConv%3A-Increased-Adversarial-Robustness-Through-Eigen-Sadovnik/fd5a74996cc5ef9a6b866cb5608064218d060d16
  • https://www.semanticscholar.org/paper/This-Looks-Like-That...-Does-it-Shortcoming... (read more)


Garrett Baker

Oct 12, 2022


Some others and I did some work looking at the mutual information between the intermediate layers of a network and its input here.
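For context (this is a rough sketch, not the method from the linked work): one simple way to estimate such a mutual information is a histogram plug-in estimator applied to samples of an input and an intermediate activation. Everything below, including the toy one-neuron "network", is an illustrative assumption.

```python
# Rough sketch: histogram-based estimate of mutual information between a scalar
# input and a scalar intermediate activation. Illustrative only -- not the method
# used in the linked work, and histogram MI estimators are known to be biased.
import numpy as np

rng = np.random.default_rng(0)

# Toy "network": one weight plus a ReLU, treated as the intermediate layer.
x = rng.normal(size=100_000)
h = np.maximum(0.0, 1.5 * x + rng.normal(scale=0.1, size=x.shape))  # noisy activation

def mutual_information(a, b, bins=64):
    """Plug-in MI estimate (in nats) from a 2D histogram of paired samples."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p_ab = joint / joint.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)
    p_b = p_ab.sum(axis=0, keepdims=True)
    mask = p_ab > 0
    return float((p_ab[mask] * np.log(p_ab[mask] / (p_a @ p_b)[mask])).sum())

print("estimated I(x; h) in nats:", mutual_information(x, h))
```

Histogram estimators like this are biased and scale poorly with dimension; the point is only to illustrate what "mutual information between a layer and the input" means operationally.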
