The authors analogize this to the relationship between physics (learning mechanics), biology (mech interp), and psychology (behavioral evals). The implication is that, just as understanding physics is useful for biology, which in turn is useful for psychology, so too are learning mechanics and mech interp useful for model evaluations.
I like this comparison. Will there be a theory of biology? Will there be a formal theory of psychology?
I find the framing in your review somewhat odd - I think the state of 'deep learning theory' is fairly impressive, and its sterility vis-a-vis frontier LLMs is a hint that we are looking in the wrong place. Early-2026 DLT is a major piece of evidence that we need more data-centric theory, precisely because sophisticated theory has had so much trouble connecting to the frontier. If we had worse theory, we would be more uncertain about whether the relevant complexities are located in the data or in the learning process.
Two analogies I have in mind that guide my thinking here:
Of note to me is that most of the successes of DLT have been at the level of structural depth you'd expect from studying neurons as they relate to brain function. E.g., each neuron firing must, on average, activate exactly one downstream neuron (lest activity die out, or cascade into a grand mal seizure; comparable to the 'edge of chaos' in DLT, sketched below). These are pretty coarse results, more to do with signal processing than with structured computation. It is still illuminating to see that these coarse results hold, because they validate our mental models ('however this neural network is computing stuff, it still has to navigate some kind of signal/noise tradeoff in its activations').
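For concreteness, the 'edge of chaos' claim has a compact form in the mean-field analysis of random tanh networks (Poole et al., 2016). Here is a minimal Monte Carlo sketch of that criterion; all the parameter choices are mine, not from the comment:

```python
import numpy as np

# Mean-field "edge of chaos" for deep random tanh networks: at the fixed
# point q* of the pre-activation variance recursion, small perturbations
# grow per layer by chi = sigma_w^2 * E[phi'(h)^2]. chi = 1 is the
# critical line where signals neither die out nor blow up.

rng = np.random.default_rng(0)
z = rng.standard_normal(200_000)  # samples for the Gaussian averages

def fixed_point_q(sigma_w, sigma_b, iters=200):
    """Iterate q <- sigma_w^2 E[tanh(sqrt(q) z)^2] + sigma_b^2 to convergence."""
    q = 1.0
    for _ in range(iters):
        q = sigma_w**2 * np.mean(np.tanh(np.sqrt(q) * z) ** 2) + sigma_b**2
    return q

def chi(sigma_w, q):
    """Perturbation growth factor at variance q (tanh'(h) = 1 - tanh(h)^2)."""
    return sigma_w**2 * np.mean((1.0 - np.tanh(np.sqrt(q) * z) ** 2) ** 2)

for sigma_w in [0.8, 1.0, 1.5]:
    q = fixed_point_q(sigma_w, sigma_b=0.05)
    print(f"sigma_w={sigma_w}: q*={q:.3f}, chi={chi(sigma_w, q):.3f}")
# chi < 1: ordered phase (signals decay); chi > 1: chaotic phase
# (perturbations explode); chi ~ 1: the trainable "edge of chaos".
```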
Insofar as the intelligence of LLMs is the ability to generalize, the no-free-lunch theorems tell us that this generalization has to reflect common structure of the pre-training data and the fine-tuning task (duh!). But our theory of data isn't yet advanced enough to talk more than proleptically about that structure - e.g. a claim that fine-tuning is 'conditioning nodes in the common sparse hierarchical latent world model' is descriptive and substantive, but not sharp enough to be easily falsifiable. "Write Ruby code" and "Write Python code" are obviously more similar to each other than either is to "design a jet turbine", but given only a black-box loss function for each of those three tasks, it's not clear how we could determine that similarity a priori, in a principled way, from the 'geometry' of the functions alone.
Thanks for the response!
If by "impressive" in "the state of 'deep learning theory' is fairly impressive" you mean "there's a big mathematical edifice, but not much of it is useful", then I don't disagree. Insofar as by "impressive" you mean it has concrete evidence of being "useful" or of being "at the right level of analysis", I suspect we would strongly disagree.
(Note that I use the word "impressive" zero times in my post.)
It seems that you think deep learning theory is "impressive" because it contains sophisticated machinery that has done little in practice.
My first response is that deep learning theory in 2016 also had plenty of sophisticated machinery, but all of it turned out to be even more inapposite than current theory. Do you think that, a priori, without looking at the empirical results, you'd find the mathematical machinery of 2026 meaningfully more sophisticated than that of 2016? A lack of sophistication in the theory itself was not the reason people started abandoning deep learning theory in 2016-2018; in what sense does the new work deserve additional credit for revealing that a line of work people didn't think would work in 2016 also doesn't work in 2026? In fact, a classic result from 2016 was that neural networks + SGD can easily fit random labels (which cannot generalize) -- surely this already conclusively demonstrates that any theory of neural network generalization must make reference to the structure of the specific hypotheses neural networks are asked to learn, if not the data itself (if the earlier no-free-lunch theorems from learning theory don't count).
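That random-label result (Zhang et al., 2016) is easy to reproduce at toy scale. A minimal sketch, with the architecture and sizes chosen arbitrarily by me:

```python
# A small MLP drives training loss to ~zero on randomly labeled data,
# so capacity-based generalization arguments alone cannot explain why
# the same networks generalize on real labels.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(512, 32)             # arbitrary inputs
y = torch.randint(0, 10, (512,))     # labels with no relation to X

model = nn.Sequential(nn.Linear(32, 512), nn.ReLU(), nn.Linear(512, 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(2000):             # full-batch training
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()

print(f"final training loss: {loss.item():.4f}")  # ~0: random labels memorized
```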
My second response is: was the mathematical machinery really necessary for the insights you point out? The toy models around superposition have arguably been more productive than all of deep learning theory, in that techniques derived from them (SAE variants) are used in production at at least one frontier lab. This is despite the much lower degree of sophistication of such work. (Even math-heavy pieces like the Comp in Sup line of work are far less mathematically tedious than the tensor program calculations, and the productive part of the superposition work did not come from the Comp in Sup line of work!) Similarly, you can derive much of the scaling results for muP with a simple toy model of multi-layered linear networks (as Yang does in either the 4th or 5th tensor programs paper, iirc), just as you can derive the 'each firing neuron activates on average exactly one neuron' result with hardly any mathematical model of brain activity at all.
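To illustrate the kind of linear-network derivation meant here: with 1/sqrt(fan_in) initialization, the per-coordinate activation scale of a deep linear network is width-invariant, the coarse bookkeeping fact that muP-style arguments extend to the input/output layers and to update sizes. A toy sketch of my own construction (not taken from Yang's papers):

```python
import numpy as np

# In a deep *linear* network with 1/sqrt(fan_in) init, the typical
# activation entry stays O(1) as width grows. muP extends this width
# bookkeeping to input/output layers and to the scale of SGD/Adam
# updates, which is what makes hyperparameters transfer across widths.
rng = np.random.default_rng(0)

def rms_activation(width, depth=5):
    h = rng.standard_normal(width)          # O(1) entries going in
    for _ in range(depth):
        W = rng.standard_normal((width, width)) / np.sqrt(width)
        h = W @ h                           # variance preserved per layer
    return np.sqrt(np.mean(h**2))           # typical entry size coming out

for n in [64, 256, 1024, 4096]:
    print(f"width={n}: rms activation ~ {rms_activation(n):.3f}")
# rms stays O(1) across a 64x range of widths.
```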
My third response is that, if deep learning theory has indeed not been useful in practice (and indeed, might have failed so conclusively that we should look elsewhere), what makes you think a theory of data would be useful in practice?
The implicit argument you seem to give is that some theory must exist that can adequately explain the success of large deep learning models. And it's clearly not something that a theory of the learning process or network architecture can explain, so it must be a theory of data that can explain this. But why think there's an adequate explanation in the first place, let alone one that must be a result of the data or the learning process?
In the final paragraph, you give an explicit argument for this.
Insofar as the intelligence of LLMs is the ability to generalize, the no-free-lunch theorems tell us that this generalization has to reflect common structure of the pre-training data and the fine-tuning task (duh!).
No-free-lunch theorems also tell us that this generalization has to reflect the inductive bias of LLMs and their training procedures! After all, a rock can't generalize from any amount of pretraining or finetuning data to the real environment in any way but the trivial one.
e.g. a claim that fine-tuning is 'conditioning nodes in the common sparse hierarchical latent world model' is descriptive and substantive, but not sharp enough to be easily falsifiable.
This really does seem like you're including a claim about how fine-tuning works on neural networks, not just what we're fine-tuning them on!
I do think that, if deep learning theory had indeed failed so completely that it ruled out the relevance of the learning mechanism to generalization, the people who demonstrated this would deserve credit. But I don't think it has failed to nearly that extent. And even if it had, I think I'd still respond critically to the paper, because the piece doesn't make that case either; it makes a case in favor of learning mechanics being a field worth investing in.
h/t Eric Michaud for sharing his paper with me.
There’s a tradition of high-impact ML papers using short, punchy categorical sentences as their titles: Understanding Deep Learning Requires Rethinking Generalization, Attention Is All You Need, Language Models Are Few-Shot Learners, and so forth.
A new paper by Simon et al. seeks to expand on this tradition with not a present-tense claim but a prophetic, future-tense sentence: “There Will Be a Scientific Theory of Deep Learning”.
There’s a lot of pessimism toward deep learning theory basically everywhere: the people building the AIs are pretty pessimistic, academic AI researchers are as a general rule pessimistic (even people who used to do theory!), and, with the exception of maybe 3-4 research groups, the independent AI safety ecosystem has long since given up hope for a theory that understands deep learning.
The paper is less a neutral assessment of the evidence and more a manifesto arguing for a particular theoretical deep learning research agenda. Given the overall sense of doom and gloom, its form makes sense: anything less forceful might not shine through the general pessimism toward all of deep learning theory.
So what’s in the paper?
The authors start by introducing what they believe to be the new emerging theory of deep learning: “learning mechanics” (its name is a deliberate nod to physics theories such as statistical mechanics or quantum mechanics). In the authors’ words, learning mechanics is a theory that concerns itself with “the dynamics of the training process”, studies them using “coarse aggregate statistics of learning”, and has the goal of generating “accurate average-case predictions”.
(In this sense, this is less a theory of deep learning as a whole than a theory that describes important aspects of deep learning. I’ll return to this later in this piece.)
The authors lay out why such a theory is important. First, there’s the scientific reason: understanding the dynamics may help us better understand the nature of intelligence and the natural world. Second, there’s the practical, engineering reason: a clear characterization of learning dynamics would provide guidance for LLM training. Third, there’s the AI safety reason: understanding the systems better may help with regulation and AI governance, and it’s possible that learning dynamics may contribute to mech interp.
The authors then present five lines of evidence for why learning mechanics both exists and is likely to become a “theory of deep learning”:
The authors then spend a small number of words outlining the relationship between learning mechanics and each of: classical learning theory, information theory, physics of deep learning, neuroscience, SLT/dev interp, and empirical science of deep learning. They then spend a much larger number of words outlining the connections between learning mechanics and mechanistic interpretability: learning mechanics may be able to help mech interp by formalizing core assumptions or explaining how mechanisms arise during training, while mechanistic interpretability may be able to inspire phenomena to study with learning mechanics (as it has done in the past).
Next, the authors respond to arguments that they anticipate from critics:
Finally, the authors lay out 10 directions of research in learning dynamics, and provide some tips for research in this area.
The paper is clearly valuable as an overview for anyone getting into interpretability. I think it’s especially useful for people who aren’t familiar with recent academic deep learning theory work. I’d suggest that people who are serious about doing mech interp skim the paper at the very least.
But does the main claim hold up? Does the paper convince me that there will be a scientific theory of deep learning?
I think the authors make a stronger case that there will be some theory than they do for the theory’s usefulness or breadth.
For all the confidence displayed by the paper’s title, I find it ironic that the applications they point to are so weak. The main use of learning mechanics research so far has been in producing new learning mechanics research to retrodict known empirical phenomena; learning dynamics as a field has yielded little practical fruit. The notable exception is hyperparameter scaling techniques such as mu-parameterization. But even then, it’s possible to derive these techniques either empirically or heuristically with simple toy models. From talking to deep learning engineers, I gather that these theories (at least the ones that belong to academic learning mechanics) have not been useful in practice for LLMs.
I also think it’s worth noting what is not included in learning mechanics. Learning mechanics is far less ambitious than even the moderate versions of rigorous model internals/ambitious mech interp agendas: there is no hope of understanding the algorithms learned by any particular network, let alone of serving as a rigorous tool for auditing.
Learning mechanics, as the authors note, is intended to be the physics to mech interp’s biology and behavior evaluations’ psychology. But I’d go further than this analogy suggests: learning mechanics is not even trying to be a theory of all of deep learning; while it may be a metaphorical physical theory, it does not endeavor to be a theory of everything. So even if learning dynamics lives up to the authors' hopes, I think it'd still fall short of being a scientific theory of deep learning.
Maybe there will be a scientific theory of deep learning. Maybe learning mechanics will become a theory covering some important aspects of deep learning. Maybe it will even prove useful. But I don’t think the paper has convinced me of these claims.
For all my criticism, I still really like the piece, and I’m glad the authors wrote it. Too often, believers in fields do not lay out their arguments to be challenged by others; the learning dynamics people have done so with clear language and concrete examples. Insofar as the authors failed to justify their ambitious claim in the title, it’s the result of the titular claim’s ambition as opposed to a lack of effort or evidence on their part.
At the end of the introduction, the authors lay out some hopes in their piece:
I doubt this piece will convince many practitioners that deep learning theory is on the path to finally delivering its long-promised utility. I think some AI safety/mech interp researchers may feel heartened by the theory, though I doubt it will change the minds of mech interp skeptics. But despite these quibbles, I think the authors have done a great service by clearly laying out their hopes and evidence in a way that will help more junior researchers understand the academic field of deep learning theory.