Approximation is expensive, but the lunch is cheap

by Jesse Hoogland, Zach Furman
19th Apr 2023
19 min read

Interpretability (ML & AI) · Machine Learning (ML) · MATS Program · Spectral Bias (ML) · AI · Frontpage

Previous: Empirical risk minimization is fundamentally confused
Next: Generalization, from thermodynamics to statistical physics

3 comments, sorted by top scoring

Noosphere89 · 2y

Crosspost from this post: https://www.lesswrong.com/posts/uG7oJkyLBHEw3MYpT/generalization-from-thermodynamics-to-statistical-physics#

On why neural networks generalize, it's known that part of the answer is: they don't generalize nearly as much as people think they do, and there are some fairly important limitations on their generalizability.

Faith and Fate is the paper I'd read, but there are other results along these lines, like Neural Networks and the Chomsky Hierarchy, or Transformers can't learn to solve problems recursively. The point is that neural networks are quite a bit overhyped in their ability to generalize from certain data, so part of the answer is that they don't generalize as much as people think:

https://arxiv.org/abs/2305.18654

M. Y. Zuo · 3y

Thermodynamics of learning. As we saw, the only way to obtain more efficient bounds was to introduce restrictions to the target function class. As we will see in the next post, to obtain stronger generalization bounds, we will need to break apart the model class in a similar way. In both cases, the classical approach attempts to study the relevant phenomenon in too much generality, which incurs no-free-lunch-y effects that prevent you from obtaining strong guarantees.

But by breaking these classes down into more manageable subclasses, analogous to how thermodynamics breaks down the phase space into macrostates, we approach much stronger guarantees. As we'll find out in the rest of this sequence, the future of learning theory is physics.

This is a very interesting point. 

Though can you elaborate on "incurs no-free-lunch-y effects that prevent you from obtaining strong guarantees"? I can't quite parse the meaning.

Jesse Hoogland · 3y

The No Free Lunch Theorem says that "any two optimization algorithms are equivalent when their performance is averaged across all possible problems."

So if the class of target functions (= the set of possible problems you would want to solve) is very large, then it's hard for any given model class (= set of candidate solutions) to do much better than any other model class. You can't obtain strong guarantees for why you should expect good approximation.

If the target function class is smaller and your model class is big enough, you might have better luck.
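
To make that averaging argument concrete, here is a minimal numerical sketch (my own toy example, not code from the post): if the target class is every Boolean function on 3 bits, any deterministic learner's off-training-set accuracy, averaged over targets, comes out to exactly chance; if the target class is restricted to the linear (parity-style) functions and the model class matches it, the learner generalizes perfectly. The learners `memorizer_predict` and `linear_fit_predict` are hypothetical illustrations.

```python
# Toy illustration of the No Free Lunch averaging argument (not code from the post).
import itertools
import numpy as np

# All 8 inputs on 3 bits, in a fixed order; first 5 are "training", last 3 are "test".
X = np.array(list(itertools.product([0, 1], repeat=3)))
train_idx, test_idx = np.arange(5), np.arange(5, 8)

def memorizer_predict(y_train, x_test):
    """A learner that memorizes the training labels and guesses 0 everywhere else."""
    return np.zeros(len(x_test), dtype=int)

def linear_fit_predict(y_train, x_test):
    """A learner whose model class is the 8 linear-over-GF(2) functions y = <w, x> mod 2."""
    for w in itertools.product([0, 1], repeat=3):
        w = np.array(w)
        if np.array_equal(X[train_idx] @ w % 2, y_train):
            return x_test @ w % 2             # a weight vector consistent with the training data
    return np.zeros(len(x_test), dtype=int)   # fall back if nothing in the model class fits

def avg_test_accuracy(label_tables, predict):
    """Average off-training-set accuracy of `predict` over a class of target functions,
    each target given as its full table of labels on X."""
    accs = []
    for y in label_tables:
        y = np.array(y)
        y_pred = predict(y[train_idx], X[test_idx])
        accs.append(np.mean(y_pred == y[test_idx]))
    return np.mean(accs)

# Target class 1: every Boolean function on 3 bits (2^8 = 256 label tables).
all_targets = list(itertools.product([0, 1], repeat=8))
# Target class 2: only the 8 linear (parity-style) functions.
linear_targets = [X @ np.array(w) % 2 for w in itertools.product([0, 1], repeat=3)]

print(avg_test_accuracy(all_targets, memorizer_predict))      # 0.5 -- chance: no free lunch
print(avg_test_accuracy(all_targets, linear_fit_predict))     # 0.5 -- still chance
print(avg_test_accuracy(linear_targets, linear_fit_predict))  # 1.0 -- restricted target class helps
```

Averaged over all 256 targets the two learners are indistinguishable (both score 0.5 off the training set), which is the "no-free-lunch-y" obstruction; shrinking the target class is what makes the 1.0 in the last line possible.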
