Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Eric is a PhD student in the Department of Physics at MIT working with Max Tegmark on improving our scientific/theoretical understanding of deep learning -- understanding what deep neural networks do internally and why they work so well. 

We mostly talk about Eric's paper The Quantization Model of Neural Scaling, but also about two papers he recently published on grokking, Towards Understanding Grokking: An Effective Theory of Representation Learning and Omnigrok: Grokking Beyond Algorithmic Data.

Below are some highlighted quotes from our conversation (available on Youtube, Spotify, Google Podcasts, Apple Podcasts). For the full context of each quote, see the accompanying transcript.

On The Quantization Of Neural Scaling

"The name of the paper is the quantization model of neural scaling. And the one-tweet summary is that it's possible for smooth loss curves on average to average over lots of small, discrete phase changes in the network performance.

What if there were a bunch of things that you need to learn to do prediction well in something like language? And so these things could be pieces of knowledge or different abilities to perform certain types of specific computations.

We can imagine enumerating this set of things that you need to learn to do prediction well. And we call these the quanta of the prediction problem. And then what if the frequency in natural data that these were useful, each of these quanta, each of these pieces of knowledge or computational ability, what if the frequency that they were useful for prediction followed a power law?" (context)
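To make the idea in the quote above concrete, here is a minimal Python sketch of how many discrete all-or-nothing jumps can average into a smooth curve. Everything specific in it is an assumption chosen for illustration, not taken from the conversation: the exponent form `k ** -(alpha + 1)`, the value `alpha=0.5`, the fixed per-sample losses of 1.0 (unlearned) and 0.0 (learned), and the premise that quanta are learned in order of how often they are used.

```python
def quanta_frequencies(num_quanta, alpha):
    """Use frequencies p_k proportional to k^-(alpha + 1), normalized to sum to 1.

    The exponent form is an illustrative assumption; the quote only says
    the frequencies follow a power law.
    """
    weights = [k ** -(alpha + 1.0) for k in range(1, num_quanta + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def expected_loss(freqs, num_learned, loss_unlearned=1.0, loss_learned=0.0):
    """Mean loss when the `num_learned` most frequent quanta are learned.

    Each quantum is all-or-nothing: samples that need it incur
    `loss_learned` once it is learned and `loss_unlearned` before that.
    """
    learned = sum(freqs[:num_learned]) * loss_learned
    unlearned = sum(freqs[num_learned:]) * loss_unlearned
    return learned + unlearned


# Hypothetical setup: 10,000 quanta, alpha = 0.5.
freqs = quanta_frequencies(num_quanta=10_000, alpha=0.5)

# Every individual quantum flips from "unlearned" to "learned" in one step,
# yet the average loss shrinks gradually as more of them are learned.
losses = [expected_loss(freqs, n) for n in (10, 100, 1000)]
for n, loss in zip((10, 100, 1000), losses):
    print(f"quanta learned: {n:5d}  mean loss: {loss:.4f}")
```

In this toy setup no single sample ever improves gradually; the smoothness of the average comes entirely from the long tail of rarely used quanta.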

Quanta are the smallest clusters for simple subtasks

"In order to predict the new line, [the model] has to count line lengths for the previous lines in the document. And then it's able to use that to accurately predict when a new line should be present.

And you can find just a large number of clusters where the thing that is common between the clusters just seems to be that it's the same type of problem, or doing prediction on those samples requires the same piece of knowledge. And so you might call these the quanta, or evidence of there being quanta, although it's a little bit tricky, because we, in doing the clustering, enforce this discreteness, where everything is a member of a cluster, a particular cluster, and not another cluster. 

Anyway, it's complicated and weird. Who knows whether this is even the right model for thinking about the networks." (context)
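As a rough illustration of the newline example in the quote above, the following Python sketch spells out the kind of computation being described: measure the lengths of earlier lines, then use that to decide when the current line should break. It is a hand-written heuristic under stated assumptions, not a description of what a network actually implements; the function names and the rule "wrap width equals the longest previous line" are hypothetical.

```python
def estimate_wrap_width(previous_lines):
    """Guess the document's wrap width as the longest line seen so far.

    Returns None when there are no previous lines to count.
    """
    return max((len(line) for line in previous_lines), default=None)


def should_break(previous_lines, current_line, next_word):
    """Predict a newline if appending `next_word` would exceed the width."""
    width = estimate_wrap_width(previous_lines)
    if width is None:
        return False  # no evidence about line lengths yet
    # +1 accounts for the space that would precede the next word.
    return len(current_line) + 1 + len(next_word) > width


# Hypothetical document wrapped at 19 characters.
previous = ["the quick brown fox", "jumps over the lazy"]
print(should_break(previous, "dog and then runs", "away"))  # word does not fit
print(should_break(previous, "dog and then runs", "a"))     # word still fits
```

The point of the sketch is only that this prediction depends on one specific, reusable piece of computation, which is the sense in which such a cluster of samples might correspond to a single quantum.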

What the existence of quanta would mean for interpretability

"It would be very exciting if it was the true model, because it would maybe tell you that there was this set of things where, if you enumerated them, you could understand the network's performance and understand what it has learned. It's just like, ah, there's this set of pieces of knowledge or pieces of computation that are needed.

And you could describe what these are. You could find them in the network and maybe hope to mechanistically understand the whole network by decomposing it into how it implements each one of these things, how it learns each piece of knowledge or each piece of computation."

How Quantization of Neural Scaling relates to other lines of research like Grokking, or interpretability

"With both the quanta scaling stuff and with the grokking stuff, we sort of hope to identify these maybe mechanisms in the model that are responsible for certain behaviors or for the model generalizing. And in the case of grokking, there's sort of multiple circuits or multiple mechanisms that are going on in the model or something where there's a memorizing mechanism and a generalizing mechanism. [...]

And maybe just in general beyond grokking, but in large language models and otherwise, we might hope to sort of decompose their behavior in terms of a bunch of these mechanisms. And like, if you could do this, then you could hope to do interpretability, but maybe other things like mechanistic anomaly detection or something you might hope to, you know, eventually be able to say like, ah, yes, when the network did prediction on this problem, it used this and this and this mechanism or something, or these were relevant."

