Zoom Out: Distributions in Semantic Spaces

by TristanTrim
6th Aug 2025
5 min read
4 comments, sorted by top scoring
StefanHex · 1mo

Thanks for posting this! Your description of transformations between layers, squashing & folding etc., reminds me of some old-school ML explanations about "how do multi-layer perceptrons work" (this is not meant as a bad thing, but as a potential direction to look into!). I can't think of references right now.

It also reminds me of Victor Veitch's group's work, e.g. Park et al., though pay special attention to the refutation(?) of this particular paper.

Finally, I can imagine connecting what you say to my own research agenda around "activation plateaus" / "stable regions". I'm in the process of producing a better write-up to explain my ideas, but essentially I have the impression that NNs map discrete regions of activation space to specific activations later on in a model (squashing?), and wonder whether we can make use of these regions.

TristanTrim · 26d

Hey! Thanks for the links. I'll look into them.

Your description of "old-school ML explanations" makes me think of this Chris Olah article. It, along with the work of Mingwei Li (grand tour, umap tour) and a bunch of time spent trying to reason about the math and geometry of NNs, is what I base my current POV on.

"map discrete regions of activation space to specific activations later on"

If I understand correctly, this corresponds with one of my key claims: that "position", not "direction", is fundamental to semantics in activation spaces. If "direction" is relevant, it is possible for it to be local and to distort over distance, more like a vector field than a single vector that applies to the whole space.

In this and the following section of this video, I give some more description of the idea, if you are interested.

I'm planning to do self-study from next week, after I finish my final exam, until November. One of the things I want to do is a deep dive on the transformer architecture, attempting to extend my understanding of these concepts, as they apply to vanilla and conv nets, to transformer-based nets.

I look forward to seeing your future work!

Brendan Long · 24d

I've read something similar to this for explaining simple neural networks (wish I could remember where), but does this still work with transformers?

TristanTrim · 24d

You aren't thinking of this Chris Olah article, are you? If not, I'd love to hear if you ever remember what it was.

As for applying this theory to transformers: that is a very good question! I wish I had a good answer, but that is very much one of my next goals. I want to become more familiar with the transformer architecture, getting some actual hands-on MI experience (I've only worked with vanilla and conv nets), and analyzing their structure from a math / geometry / topology perspective.

I will say that it seems to me that all "special" architectures I've looked at so far can be viewed mathematically as a very big vanilla network with specific rules for the weights. For example, LSTMs can be unrolled to make very deep networks with the rule that every one of the "unrolled" networks shares its weights with each of the others. For conv layers, it is as if we have taken an FC layer, set every weight connected to a far-away pixel to 0, and then set each output pixel to share the kernel weights with all of the others.
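A minimal sketch of that conv-as-constrained-FC view (my own illustration in PyTorch, using a 1-D convolution for brevity; none of the names come from the thread):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
length, ksize = 8, 3
kernel = torch.randn(ksize)   # the shared kernel weights
x = torch.randn(length)

# Convolution view (no padding, stride 1).
conv_out = F.conv1d(x.view(1, 1, -1), kernel.view(1, 1, -1)).flatten()

# Fully-connected view: the equivalent weight matrix is mostly zeros
# (weights to "far away" pixels), and every row reuses the same kernel.
out_len = length - ksize + 1
W = torch.zeros(out_len, length)
for i in range(out_len):
    W[i, i:i + ksize] = kernel

fc_out = W @ x
print(torch.allclose(conv_out, fc_out))  # True: same map, different bookkeeping
```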

I suspect something like this is true for transformers as well, but I haven't yet studied the architecture well enough to say so with confidence, much less to draw useful implications about transformers from the concept. Still, I'm optimistic about it, and plan to pursue the idea over the coming months.


(This article is edited and expanded from a comment I made to someone in the newly starting BAIF Slack community. Thanks for the inspiration 🙏)

Introduction

In this article I present an alternative paradigm for Mechanistic Interpretability (MI). This paradigm may turn out to be better or worse than, or to naturally combine with, the standard paradigm I often see implicitly extended from Chris Olah's "Zoom-In".

I've talked about this concept before, in various places. Someday I may collect them and try to present a strong case including a survey of paradigms in MI literature. For now, here is a relatively short introduction to the concept assuming some familiarity with ML and MI.

As far as I know, Chris Olah originally introduced the concepts of "features" and "circuits" in "Zoom-In" as a suggested direction for exploration, not as a certainty. It worked very well for thinking about things like "circle" and "texture" detectors, which I think are a natural, but incorrect, way of understanding what is going on.

New Mechanistic Interpretability Paradigm?

I have been developing an alternative paradigm that I'm not currently sure anyone else is talking about.

It is now common to think of the collective inputs or outputs of network layers as vectors rather than as individual signals. The concept I am uncertain anyone is focusing on is that each such vector is a point in a semantic space in which distributions live.

Input Space

For example, in a cat-dog-labeling net, the input space is images, and there are two distributions living in this space. The cat-distribution is all possible images that are of cats. We can make some claims about that distribution, such as the idea that it is continuous and connected. The same is true of the dog-distribution, but additionally, the dog-distribution may be connected to the cat-distribution in several places, containing the set of images that are ambiguous: maybe a cat, maybe a dog. There is also, implicitly, a distribution of images that are neither dogs nor cats, but this can be ignored in simple examples.
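A hypothetical probe of the "continuous and connected" intuition, sketched in Python; `model`, `cat_a`, and `cat_b` are placeholders of my own, not anything from the post:

```python
import numpy as np

def linear_path(cat_a: np.ndarray, cat_b: np.ndarray, steps: int = 10):
    """Yield points on the straight line between two images in input space."""
    for t in np.linspace(0.0, 1.0, steps):
        yield (1.0 - t) * cat_a + t * cat_b

# for img in linear_path(cat_a, cat_b):
#     print(model(img))
# If the intermediate images stay "cat-like", this particular straight line
# stays inside the cat-distribution; a dip only shows that *this* line leaves
# the region, not that the region is disconnected.
```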

Output Space

The output space has very different semantics. It is meant to label images as either a cat or a dog, so it may be a 1-dim space where being near (1,) corresponds to "highly cat-like", being near (0,) corresponds to "highly dog-like", and anything in between could be "ambiguous" or "neither cat nor dog". If there is also training for "not cat or dog", the space might be 2-dim, with the point (1,0) meaning cat-like and (0,1) meaning dog-like. Then "neither" would be (0,0) and "ambiguous" would be (0.5, 0.5). This kind of semantic space seems somehow intuitive to me. If you gave me a set of pictures to post on a cork-board, I feel like I could do a fair approximation of this.
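A minimal sketch of reading off that 2-dim output space (my own illustration; the target points are the ones named above):

```python
import numpy as np

TARGETS = {
    "cat":     np.array([1.0, 0.0]),
    "dog":     np.array([0.0, 1.0]),
    "neither": np.array([0.0, 0.0]),
}

def describe(output: np.ndarray) -> str:
    """Interpret a point in the 2-dim output space by its nearest target."""
    name, _ = min(TARGETS.items(), key=lambda kv: np.linalg.norm(output - kv[1]))
    return name

print(describe(np.array([0.9, 0.1])))    # -> "cat"
print(describe(np.array([0.05, 0.02])))  # -> "neither"
# A point like (0.5, 0.5) is equidistant from the cat and dog targets,
# which is the "ambiguous" region described above.
```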

Details of Semantics

It seems noteworthy that the actual input and output semantic spaces, as understood by the network, may differ from the preceding semantic descriptions; they are instead based on the training dataset, the possible distributions it could imply, and the dynamics of the training. For example, if the training puts no constraint on mapping inputs to locations like (10,0) or (-1,0) in the output, then what would the semantics of those locations be? Would (10,0) correspond to something like "10 times as cat-like"? My intuition is that there would instead be a messy distribution extending out in the (1,0) direction, and that distribution would be determined by the shape of the input distribution, the network architecture, and the training dynamics. In other words, it would be a result of whatever was the easiest way to separate the parts of the input distribution that need to be separated. The same applies to the (-1,0) direction. I do not expect this to have semantics meaning "the opposite of cat-like". There may be something similar, especially if unsupervised rather than label-based methods are used, but it still has to do with the semantic distribution, not semantic directions.
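One way to check this empirically, rather than assume it, is to look at where actual inputs land in the output space. A rough sketch, where `model` and `dataloader` are placeholders of my own:

```python
import torch

@torch.no_grad()
def collect_outputs(model, dataloader):
    """Gather the raw 2-dim outputs for every input in a dataset."""
    return torch.cat([model(x) for x, _ in dataloader])

# outputs = collect_outputs(model, dataloader)
# print(outputs.min(dim=0).values, outputs.max(dim=0).values)
# If training never rewards points like (10, 0) or (-1, 0), the empirical
# spread of outputs shows which directions the distribution actually
# extends in, and how messily.
```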

Latent Semantic Spaces

Each of the latent spaces of the network could be understood as some step between the semantics of the two spaces. That might be an oversimplification. For example, there may be movement into a seemingly unrelated semantic space either to "untangle" distributions, or for some other reason. Even if it's not an oversimplification, there is a lot to be understood about what it means to step from the semantics of image-space to label-space.

Semantic Mappings

With this paradigm, the network is not understood in terms of neuron connections or circuits at all, but instead as "semantic mappings". Each layer of the network is an affine transformation (rotating, sliding, scaling, and skewing) that prepares the space for the activation, which "folds", "bends", "collapses", or "squashes" the parts of the space that have been moved into a negative orthant (the generalization of a quadrant to n dimensions). This also squashes any distributions that existed in that space: any input sampled from the distribution of possible inputs gets mapped to the corresponding location in the new, squashed distribution. The result is a new, slightly transformed semantic space as the output of the layer.
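A minimal numerical sketch of one such mapping (my own illustration, using ReLU as the activation):

```python
import numpy as np

rng = np.random.default_rng(0)

# A cloud of points standing in for a distribution in the layer's input space.
points = rng.normal(size=(1000, 2))

# Affine part: rotate / scale / skew / slide the whole space.
W = np.array([[1.5, 0.4],
              [-0.3, 0.8]])
b = np.array([0.2, -0.1])
moved = points @ W.T + b

# Activation part: anything pushed into a negative orthant gets "folded"
# onto the orthant boundary, squashing the distribution along with it.
squashed = np.maximum(moved, 0.0)

# Fraction of the cloud that got collapsed onto a boundary:
print((squashed != moved).any(axis=1).mean())
```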

I wish to be clear: by "folds", "bends", "collapses", or "squashes", I mean to list words which give intuition into a geometric understanding of the activation function. Most accurately, what the layer is doing is applying the activation function, whatever that may be, but thinking in that way inspires thoughts of neurons firing and circuits. I wish instead to inspire thoughts of untangling semantic distributions, or rather, of morphing one semantic space into a different semantic space.

The goal given to the network during its training is to find the semantic mappings that transform the input semantic space (defined by the input dataset) into the output semantic space (defined by the labels of the dataset). What I think we have found, empirically, with neural networks, is that they are up to this task. They can, through a sequence of squashings, transform semantic spaces into very different-looking semantic spaces.
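A toy illustration of that claim (my own, not from the post): train a small ReLU net to morph a 2-D input distribution into the 2-dim label space from the earlier example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(512, 2)
is_cat = x[:, 0] * x[:, 1] > 0                               # an arbitrary toy "cat" region
y = torch.stack([is_cat.float(), (~is_cat).float()], dim=1)  # (1,0)=cat, (0,1)=dog

net = nn.Sequential(nn.Linear(2, 32), nn.ReLU(),   # affine + squash
                    nn.Linear(32, 32), nn.ReLU(),  # affine + squash
                    nn.Linear(32, 2))              # final affine into label space
opt = torch.optim.Adam(net.parameters(), lr=1e-2)

for _ in range(300):
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(x), y)
    loss.backward()
    opt.step()

print(loss.item())  # a sequence of squashings has (approximately) mapped the
                    # input distribution onto the target output semantics
```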

Then, to me, the question MI sets out to solve is "what are the semantics of the input space, output space, and intermediate latent spaces, and what transformation is each layer making to step from one semantic space to another?"


From within this paradigm, the answers to these questions are:

  • "What is a concept?" -- A boundary, or set of boundaries, around specific locations in semantic space. (I'm quite happy with this definition. It feels like it could be empirically accurate.)
  • "What is a feature" -- An aspect of a latent semantic space that helps us understand the transformation of one semantic space into another. (this is fuzzy)
  • "What is meaning" -- This is the semantics of the semantic spaces we are interested in. The existence of the distribution of cat pictures. The parts of that distribution corresponding to cats looking left vs right. The parts of that distribution that make a cat black vs orange. The distribution is there and all of our ideas of meaning exist within it.

If you finished reading this, thanks!

Let me know what you think, and if you know of any work that seems related, please send me a link.