The authors analogize this to the relationship between physics (learning mechanics), biology (mech interp), and psychology (behavioral evals). The implication is that, just as understanding physics is useful for biology, which in turn is useful for psychology, so too are learning mechanics and mech interp useful for model evaluations.
I like this comparison. Will there be a theory of biology? Will there be a formal theory of psychology?
I find the framing in your review somewhat odd - I think the state of 'deep learning theory' is fairly impressive, and its sterility vis-a-vis frontier LLMs is a hint that we are looking in the wrong place. Early-2026 DLT is a major piece of evidence that we need more data-centric theory, precisely because sophisticated theory has had so much trouble connecting to the frontier. If we had worse theory, we would be more uncertain about whether the relevant complexities are located in the data or in the learning process.
Two analogies I have in mind that guide my thinking here:
Of note to me is that most of the successes of DLT have been at the level of structural depth you'd expect from studying neurons as they relate to brain function. E.g., each neuron firing must, on average, activate exactly one downstream neuron (lest activity die out, or cascade into a grand mal seizure; comparable to the 'edge of chaos' in DLT, sketched below). These are pretty coarse results, more to do with signal processing than with structured computation. It is still illuminating to see that these coarse results hold, because they validate our mental models ('however this neural network is computing stuff, it still has to navigate some kind of signal/noise tradeoff in its activations').
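For concreteness, the 'edge of chaos' claim has a compact form in the mean-field analysis of random tanh networks (Poole et al., 2016). Here is a minimal Monte Carlo sketch of that criterion; all the parameter choices are mine, not from the comment:

```python
import numpy as np

# Mean-field "edge of chaos" for deep random tanh networks: at the fixed
# point q* of the pre-activation variance recursion, small perturbations
# grow per layer by chi = sigma_w^2 * E[phi'(h)^2]. chi = 1 is the
# critical line where signals neither die out nor blow up.

rng = np.random.default_rng(0)
z = rng.standard_normal(200_000)  # samples for the Gaussian averages

def fixed_point_q(sigma_w, sigma_b, iters=200):
    """Iterate q <- sigma_w^2 E[tanh(sqrt(q) z)^2] + sigma_b^2 to convergence."""
    q = 1.0
    for _ in range(iters):
        q = sigma_w**2 * np.mean(np.tanh(np.sqrt(q) * z) ** 2) + sigma_b**2
    return q

def chi(sigma_w, q):
    """Perturbation growth factor at variance q (tanh'(h) = 1 - tanh(h)^2)."""
    return sigma_w**2 * np.mean((1.0 - np.tanh(np.sqrt(q) * z) ** 2) ** 2)

for sigma_w in [0.8, 1.0, 1.5]:
    q = fixed_point_q(sigma_w, sigma_b=0.05)
    print(f"sigma_w={sigma_w}: q*={q:.3f}, chi={chi(sigma_w, q):.3f}")
# chi < 1: ordered phase (signals decay); chi > 1: chaotic phase
# (perturbations explode); chi ~ 1: the trainable "edge of chaos".
```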
Insofar as the intelligence of LLMs is the ability to generalize, the no-free-lunch theorems tell us that this generalization has to reflect common structure of the pre-training data and the fine-tuning task (duh!). But our theory of data isn't yet advanced enough to talk more than proleptically about that structure - e.g. a claim that fine-tuning is 'conditioning nodes in the common sparse hierarchical latent world model' is descriptive and substantive, but not sharp enough to be easily falsifiable. "Write Ruby code" and "Write Python code" are obviously more similar to each other than either is to "design a jet turbine", but given only a black-box loss function for each of those three tasks, it's not clear how we could determine that similarity a priori, in a principled way, from the 'geometry' of the functions alone.
Thanks for the response!
If by "impressive" in "the state of 'deep learning theory' is fairly impressive" you mean "there's a big mathematical edifice, but not much of it is useful", then I don't disagree. Insofar as by "impressive" you mean it has concrete evidence of being "useful" or of being "at the right level of analysis", I suspect we would strongly disagree.
(Note that I use the word "impressive" zero times in my post.)
It seems that you think deep learning theory is "impressive" because it contains sophisticated machinery that has done little in practice.
My first response is that deep learning theory in 2016 also had plenty of sophisticated machinery, but all of it turned out to be even more inapposite than current theory. Do you think that, a priori, without looking at the empirical results, you'd find the mathematical machinery of 2026 meaningfully more sophisticated than that of 2016? A lack of sophistication in the theory itself was not the reason people started abandoning deep learning theory in 2016-2018; in what sense does the new work deserve additional credit for revealing that a line of work people didn't think would work in 2016 also doesn't work in 2026? In fact, a classic result from 2016 was that neural networks + SGD can easily fit random labels (which cannot generalize) -- surely this already conclusively demonstrates that any theory of neural network generalization must make reference to the structure of the specific hypotheses neural networks are asked to learn, if not the data itself (if the earlier no-free-lunch theorems from learning theory don't count).
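That random-label result (Zhang et al., 2016) is easy to reproduce at toy scale. A minimal sketch, with the architecture and sizes chosen arbitrarily by me:

```python
# A small MLP drives training loss to ~zero on randomly labeled data,
# so capacity-based generalization arguments alone cannot explain why
# the same networks generalize on real labels.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(512, 32)             # arbitrary inputs
y = torch.randint(0, 10, (512,))     # labels with no relation to X

model = nn.Sequential(nn.Linear(32, 512), nn.ReLU(), nn.Linear(512, 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(2000):             # full-batch training
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()

print(f"final training loss: {loss.item():.4f}")  # ~0: random labels memorized
```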
My second response is: was the mathematical machinery really necessary for the insights you point out? The toy models around superposition have arguably been more productive than all of deep learning theory, in that techniques derived from them (SAE variants) are used in production at at least one frontier lab. This is despite the much lower degree of sophistication of such work. (Even math-heavy pieces like the Comp in Sup line of work are far less mathematically tedious than the tensor program calculations, and the productive part of the superposition work did not come from the Comp in Sup line of work!) Similarly, you can derive much of the scaling results for muP with a simple toy model of multi-layered linear networks (as Yang does in either the 4th or 5th tensor programs paper, iirc), just as you can derive the 'each firing neuron activates on average exactly one neuron' result with hardly any mathematical model of brain activity at all.
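To illustrate the kind of linear-network derivation meant here: with 1/sqrt(fan_in) initialization, the per-coordinate activation scale of a deep linear network is width-invariant, the coarse bookkeeping fact that muP-style arguments extend to the input/output layers and to update sizes. A toy sketch of my own construction (not taken from Yang's papers):

```python
import numpy as np

# In a deep *linear* network with 1/sqrt(fan_in) init, the typical
# activation entry stays O(1) as width grows. muP extends this width
# bookkeeping to input/output layers and to the scale of SGD/Adam
# updates, which is what makes hyperparameters transfer across widths.
rng = np.random.default_rng(0)

def rms_activation(width, depth=5):
    h = rng.standard_normal(width)          # O(1) entries going in
    for _ in range(depth):
        W = rng.standard_normal((width, width)) / np.sqrt(width)
        h = W @ h                           # variance preserved per layer
    return np.sqrt(np.mean(h**2))           # typical entry size coming out

for n in [64, 256, 1024, 4096]:
    print(f"width={n}: rms activation ~ {rms_activation(n):.3f}")
# rms stays O(1) across a 64x range of widths.
```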
My third response is that, if deep learning theory has indeed not been useful in practice (and indeed, might have failed so conclusively that we should look elsewhere), what makes you think a theory of data would be useful in practice?
The implicit argument you seem to give is that some theory must exist that can adequately explain the success of large deep learning models. And it's clearly not something that a theory of the learning process or network architecture can explain, so it must be a theory of data that can explain this. But why think there's an adequate explanation in the first place, let alone one that must be a result of the data or the learning process?
In the final paragraph, you give an explicit argument for this.
Insofar as the intelligence of LLMs is the ability to generalize, the no-free-lunch theorems tell us that this generalization has to reflect common structure of the pre-training data and the fine-tuning task (duh!).
No-free-lunch theorems also tell us that this generalization has to reflect the inductive bias of LLMs and their training procedures! After all, a rock can't generalize from any amount of pretraining or finetuning data to the real environment in any way but the trivial one.
e.g. a claim that fine-tuning is 'conditioning nodes in the common sparse hierarchical latent world model' is descriptive and substantive, but not sharp enough to be easily falsifiable.
This really does seem like you're including a claim about how fine-tuning works on neural networks, not just what we're fine-tuning them on!
I do think that, if deep learning theory had indeed failed so completely that it ruled out the relevance of the learning mechanism to generalization, the people who demonstrated this would deserve credit. But I don't think it has failed to nearly that extent. And even if it had, I think I'd still respond critically to the paper, because the piece doesn't make that case either; it makes a case in favor of learning mechanics being a field worth investing in.
h/t Eric Michaud for sharing his paper with me.
There’s a tradition of high-impact ML papers using short, punchy categorical sentences as their titles: Understanding Deep Learning Requires Rethinking Generalization, Attention Is All You Need, Language Models Are Few-Shot Learners, and so forth.
A new paper by Simon et al. seeks to expand on this tradition with not a present-tense claim but a prophetic, future-tense sentence: “There Will Be a Scientific Theory of Deep Learning”.
There’s a lot of pessimism toward deep learning theory basically everywhere: the people building the AIs are pretty pessimistic, academic AI researchers are as a general rule pessimistic (even people who used to do theory!), and, with the exception of maybe 3-4 research groups, the independent AI safety ecosystem has long since given up hope for a theory that understands deep learning.
The paper is less a neutral assessment of the evidence and more a manifesto arguing for a particular theoretical deep learning research agenda. Given the overall sense of doom and gloom, its form makes sense: anything less forceful might not shine through the general pessimism toward all of deep learning theory.
So what’s in the paper?
The authors start by introducing what they believe to be the new emerging theory of deep learning: “learning mechanics” (its name is a deliberate nod to physics theories such as statistical mechanics or quantum mechanics). In the authors’ words, learning mechanics is a theory that concerns itself with “the dynamics of the training process”, studies them using “coarse aggregate statistics of learning”, and has the goal of generating “accurate average-case predictions”.
(In this sense, this is less a theory of deep learning as a whole than a theory that describes important aspects of deep learning. I’ll return to this later in this piece.)
The authors lay out why such a theory is important. First, there’s the scientific reason: understanding the dynamics may help us better understand the nature of intelligence and the natural world. Second, there’s the practical, engineering reason: a clear characterization of learning dynamics would provide guidance for LLM training. Third, there’s the AI safety reason: understanding the systems better may help with regulation and AI governance, and it’s possible that learning dynamics may contribute to mech interp.
The authors then present five lines of evidence for why learning mechanics both exists and is likely to become a “theory of deep learning”:
The authors then spend a small number of words outlining the relationship between learning mechanics and each of: classical learning theory, information theory, physics of deep learning, neuroscience, SLT/dev interp, and empirical science of deep learning. They then spend a much larger number of words outlining the connections between learning mechanics and mechanistic interpretability: learning mechanics may be able to help mech interp by formalizing core assumptions or explaining how mechanisms arise during training, while mechanistic interpretability may be able to inspire phenomena to study with learning mechanics (as it has done in the past).
Next, the authors respond to arguments that they anticipate from critics:
Finally, the authors lay out 10 directions of research in learning dynamics, and provide some tips for research in this area.
The paper is clearly valuable as an overview for anyone getting into interpretability. I think it’s especially useful for people who aren’t familiar with recent academic deep learning theory work. I’d suggest that people who are serious about doing mech interp skim the paper at the very least.
But does the main claim hold up? Does the paper convince me that there will be a scientific theory of deep learning?
I think the authors make a stronger case that there will be some theory than they do for the theory’s usefulness or breadth.
For all the confidence displayed by the paper’s title, I find it ironic that the applications they point to are so weak. The main use of learning mechanics research so far has been in producing new learning mechanics research to retrodict known empirical phenomena; learning dynamics as a field has yielded little practical fruit. The notable exception is hyperparameter scaling techniques such as mu-parameterization. But even then, it’s possible to derive these techniques either empirically or heuristically with simple toy models. From talking to deep learning engineers, I gather that these theories (at least the ones that belong to academic learning mechanics) have not been useful in practice for LLMs.
I also think it’s worth noting what is not included in learning mechanics. Learning mechanics is far less ambitious than even the moderate versions of rigorous model internals/ambitious mech interp agendas: there is no hope of understanding the algorithms learned by any particular network, let alone of serving as a rigorous tool for auditing.
Learning mechanics, as the authors note, is intended to be the physics to mech interp’s biology and behavior evaluations’ psychology. But I’d go further than this analogy suggests: learning mechanics is not even trying to be a theory of all of deep learning; while it may be a metaphorical physical theory, it does not endeavor to be a theory of everything. So even if learning dynamics lives up to the authors' hopes, I think it'd still fall short of being a scientific theory of deep learning.
Maybe there will be a scientific theory of deep learning. Maybe learning mechanics will become a theory covering some important aspects of deep learning. Maybe it will even prove useful. But I don’t think the paper has convinced me of these claims.
For all my criticism, I still really like the piece, and I’m glad the authors wrote it. Too often, believers in fields do not lay out their arguments to be challenged by others; the learning dynamics people have done so with clear language and concrete examples. Insofar as the authors failed to justify their ambitious claim in the title, it’s the result of the titular claim’s ambition as opposed to a lack of effort or evidence on their part.
At the end of the introduction, the authors lay out some hopes in their piece:
I doubt this piece will convince many practitioners that deep learning theory is on the path to finally delivering its long-promised utility. I think some AI safety/mech interp researchers may feel heartened by the theory, though I doubt it will change the minds of mech interp skeptics. But despite these quibbles, I think the authors have done a great service by clearly laying out their hopes and evidence in a way that will help more junior researchers understand the academic field of deep learning theory.