I want to share an idea about how we could try to force a neural network to come up with bigger and bigger generalizations, potentially reaching abstractions similar to human ones.

Disclaimer: I'm not an expert. 

In general

Here's the general idea of the required learning procedure:

  1. Find a feature that classifies an object.
  2. Generalize this feature so that it fits all the objects.
  3. Make different versions of this feature that classify different objects. (1st iteration)
  4. Generalize the lower-level features from which the previous features were constructed.
  5. Make different versions of those lower-level features for different higher-level features... (2nd iteration)
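The steps above can be made concrete with a toy sketch. Everything in it is hypothetical: objects are plain dicts of invented attributes, and a "feature" is just an interval test on one attribute, so this only illustrates the loop, it is not a real learning algorithm.

```python
# Toy illustration of the generalize-then-specialize loop (all names invented).

def interval_feature(attr, lo, hi):
    """A feature that fires when `attr` falls inside [lo, hi]."""
    return lambda obj: lo <= obj[attr] <= hi

def generalize(attr, objects):
    """Step 2: widen the interval until the feature fires on every object."""
    values = [obj[attr] for obj in objects]
    return interval_feature(attr, min(values), max(values))

def specialize(attr, objects, labels, label):
    """Steps 3-5: a per-class variant of a generalized feature, restricted
    to the values seen for one class."""
    values = [obj[attr] for obj, y in zip(objects, labels) if y == label]
    return interval_feature(attr, min(values), max(values))

# Invented data: cars have round, separate circles (wheels); poodles have
# round, connected circles (curly fur).
cars = [{"roundness": 0.90, "connectedness": 0.1},
        {"roundness": 0.95, "connectedness": 0.2}]
poodles = [{"roundness": 0.80, "connectedness": 0.9},
           {"roundness": 0.85, "connectedness": 0.8}]
objects = cars + poodles
labels = ["car", "car", "poodle", "poodle"]

# 1st iteration: "wheel" generalizes to "circle" (fires on everything)...
is_circle = generalize("roundness", objects)
assert all(is_circle(obj) for obj in objects)

# ...and is then re-specialized per class on a lower-level attribute.
car_circles = specialize("connectedness", objects, labels, "car")
poodle_circles = specialize("connectedness", objects, labels, "poodle")
assert all(car_circles(c) and not poodle_circles(c) for c in cars)
assert all(poodle_circles(p) and not car_circles(p) for p in poodles)
```

In this sketch a "poodle on wheels" would merely fail the connectedness test instead of invalidating the whole circle concept, which is the point of the iterations.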

Example

Let's say our goal is to learn the visual difference between cars and poodles (a dog breed). We're looking only at pictures of cars and poodles, nothing else.

As I understand it (judging by the definition and some illustrations I've seen), a typical deep learning model might approach the problem this way:

  1. It learns what a "wheel" is. (I know that the model is unlikely to learn a human concept.)
  2. It learns that cars have wheels and poodles don't have wheels.
  3. You show it a poodle on wheels and it needs to relearn everything from scratch. The new data completely destroys its previous idea of what's a car and what's a poodle.

Note: of course, it doesn't have to learn about the "wheel", it can learn any other low-level feature (or combination of features). It's just an example.

But I thought about another idea, a different model:

  1. It learns what is a "wheel".
  2. It learns that cars have wheels and poodles don't have wheels.
  3. It generalizes the definition of a "wheel" to "anything like a circle" so that it can describe both cars and poodles. Cars have round wheels and poodles have round fur.
  4. It learns to tell apart "car circles" from "poodle circles": a car's circles are separate and a poodle's circles are connected. (A poodle looks like many circles joined together.)
  5. You show it a poodle on wheels... and it doesn't fall for the trick, or quickly adapts. The important point is that it still has all the tools to solve the problem. The ideas it learned didn't become meaningless.
  6. And it can go deeper. In the same way, it can generalize the definition of "being connected" (learned in step 4) and then create 2 versions of this definition for cars and poodles. It can learn that even if a car's circles are connected, they're still distinct circles, while a poodle's circles are not distinct. After that it won't fall for the trick even if you show it a weird car that has more than 4 wheels.

Does this idea make sense? Does this idea exist in ML in any way, shape or form?

I thought this idea might be interesting because it's like an analogue of backpropagation, but on a conceptual level: instead of updating layers of neurons we're updating layers of concepts (features) themselves.

I think there should be some connection to the DeepDream idea, because if a network learned to recognize wheels, it could turn a poodle into many wheels melted together (after being run in reverse multiple times). I think DeepDream images can reveal some additional information about the features a network learns. Maybe we could use this information to help the network understand the features better. For example, if a network learns how to tell apart cars from clouds, learning how to tell apart cars from "clouds turned into cars" could teach the network that cars should normally be on the streets and not flying in the sky, and not be gigantic, not have a vague shape, etc. (Well, I already wrote a similar example.)

Using DeepDream

Has anybody tried to use DeepDream images to train a neural network to understand images better? Here's an example of how you could try to do this (it's only an outline of the method):

  1. You make a dataset (A) with pictures of cars and dogs. A neural network learns to distinguish cars from dogs.
  2. You create a copy of the dataset (B). You modify the pictures by running the network in reverse, creating DeepDream images. Now cars look more like dogs and dogs look more like cars.
  3. The neural network learns to distinguish cars from dogs in both datasets. (After that the 1st iteration is over. You can stop or continue.)
  4. You create a third dataset (C). You modify the pictures by running the network (which now solves datasets A and B) in reverse again.
  5. The neural network learns to distinguish cars from dogs in all three datasets. (After that the 2nd iteration is over. You can stop or continue.)

This is a form of bootstrapping, or something similar to a GAN.
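As a sanity check that the outline is at least implementable, here is a minimal numpy sketch of it. Everything is a stand-in: the "network" is logistic regression on feature vectors, and "running it in reverse" is one step of input-gradient ascent toward the opposite class, a crude substitute for real DeepDream. All names and numbers are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

def train(X, y, steps=500, lr=0.1):
    """Fit logistic-regression weights on (X, y) by gradient ascent."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-np.clip(X @ w, -30, 30)))  # P(class = 1)
        w += lr * X.T @ (y - p) / len(y)
    return w

def dream(X, y, w, lr=0.1):
    """'Run the network in reverse': nudge each input along the input
    gradient of the score, toward the opposite class."""
    direction = np.where(y[:, None] == 1, -1.0, 1.0)
    return X + lr * direction * w

# Step 1: dataset A and a network that separates the two classes.
X_A = np.vstack([rng.normal(-1.0, 0.3, (50, 8)),   # "cars"
                 rng.normal(+1.0, 0.3, (50, 8))])  # "dogs"
y = np.array([0] * 50 + [1] * 50)
w_a = train(X_A, y)

# Step 2: dataset B = dreamed copies, slightly more like the other class.
X_B = dream(X_A, y, w_a)

# Steps 3-5: retrain on A and B together; repeat for C, D, ... if desired.
w_ab = train(np.vstack([X_A, X_B]), np.concatenate([y, y]))
```

With real images you would replace the logistic regression with a convnet and the one-step nudge with multi-step gradient ascent on the pixels, but the loop structure would stay the same.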

A discussion on Reddit, No Participation link: "Has anybody tried to use DeepDream images to train a neural network?".

Thoughts

Maybe if you forced a network to look at a DeepDream image and figure out what the original image was about (which elements still make sense and which are just random artifacts), it could learn some deeper concepts?

In the perfect scenario, the network learns very abstract features in order to make sense of the "weird" images. For example, it may learn that a real dog should have only 1 normal head, not 10 heads melted together (a thing you can see in some DeepDream images).

Some DeepDream images are simply fascinating to me. The DeepDream process is effectively creating analogies (e.g. "clouds may look somewhat like dogs"). So why not try to learn something about the world through those analogies? I guess one could come up with various ways to estimate which "analogies" make more sense than others (and in which way) and use that for training. That's one of the reasons I'm curious about the idea I described in this post.

Paper

Wikipedia mentions that DeepDream images are indeed used in learning, at least in some way:

While dreaming is most often used for visualizing networks or producing computer art, it has recently been proposed that adding "dreamed" inputs to the training set can improve training times for abstractions in Computer Science.[18]

The paper: https://arxiv.org/abs/1511.05653.

Is it similar to what I described? (I can't understand the paper.)

CycleGANs

There are also CycleGANs (I didn't find much about them on Wikipedia, so here's a Numberphile video), and maybe they're more similar to what I'm describing: a CycleGAN can learn to transform a picture of a horse into a picture of a zebra... and then transform this generated picture of a zebra back into the original picture of a horse.
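A toy illustration of the cycle-consistency idea (as I understand it from the description above, not the real CycleGAN architecture): two generators G: X → Y and F: Y → X are trained so that F(G(x)) ≈ x and G(F(y)) ≈ y. Here G and F are just invented scalar maps, and only the cycle loss is shown.

```python
def cycle_loss(G, F, xs, ys):
    """Mean squared reconstruction error after a full cycle each way."""
    fwd = sum((F(G(x)) - x) ** 2 for x in xs) / len(xs)  # x -> y -> x
    bwd = sum((G(F(y)) - y) ** 2 for y in ys) / len(ys)  # y -> x -> y
    return fwd + bwd

# Hypothetical "horse -> zebra" maps that happen to be exact inverses:
G = lambda x: 2 * x + 1    # horse -> zebra
F = lambda y: (y - 1) / 2  # zebra -> horse

xs = [0.0, 1.0, 2.0]  # "horses"
ys = [1.0, 3.0, 5.0]  # "zebras"
assert cycle_loss(G, F, xs, ys) == 0.0  # a perfect cycle loses nothing
```

In the real architecture G and F are neural networks and this loss is added to the usual adversarial losses; minimizing it is what forces the generated zebra picture to stay transformable back into the original horse.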

Iterated Distillation and Amplification

There's also the Iterated Distillation and Amplification (IDA) method.

"AlphaGo Zero and capability amplification" (video by Robert Miles)

In the simplest form of iterated capability amplification, we train one function:

A “weak” policy A, which is trained to predict what the agent will eventually decide to do in a given situation.

Just like AlphaGo doesn’t use the prior p directly to pick moves, we don’t use the weak policy A directly to pick actions. Instead, we use a capability amplification scheme: we call A many times in order to produce more intelligent judgments. We train A to bypass this expensive amplification process and directly make intelligent decisions. As A improves, the amplified policy becomes more powerful, and A chases this moving target.

In IDA we try to squeeze the contents of a more complicated thing into a simpler thing.
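The amplify-then-distill loop can be sketched with a toy numeric example (not real IDA, just its shape). The hypothetical "weak policy" A estimates square roots; Amplify(A) calls A and combines the answers into a better estimate (a Newton step); distillation trains A to match the amplified policy directly. With a lookup table as the "model class", distillation is just recording the amplified answers.

```python
def amplify(A):
    """A stronger policy built by calling the weak policy A (a Newton step
    for sqrt: average A's guess with x divided by A's guess)."""
    return lambda x: (A(x) + x / A(x)) / 2

def distill(amplified, xs):
    """'Train' A to imitate the amplified policy directly."""
    table = {x: amplified(x) for x in xs}
    return lambda x: table[x]

xs = [2.0, 9.0, 16.0]
A = lambda x: 1.0            # a very weak initial policy
for _ in range(8):           # A chases the amplified moving target
    A = distill(amplify(A), xs)

assert abs(A(9.0) - 3.0) < 1e-6
assert abs(A(16.0) - 4.0) < 1e-6
```

As in the quote, the amplified policy (two calls to A plus some arithmetic) is always a bit smarter than A itself, and each distillation step bakes that improvement back into A.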

It's not the same idea as what I described, because it doesn't have to work on the level of learning features. However, both ideas may turn out to be identical in practice, depending on how you implement IDA. So I'm curious: has IDA been tried in image recognition?

Inspiration

The idea was inspired by my own experience, and by a philosophical idea: what if any property is like a spectrum, and different objects have different "colors" of properties?

People can often feel different flavors of meaning, emotions, and experiences... My idea above is an attempt to apply this to image recognition.


This suggestion doesn't really make sense as is. It might be fixable, but I would suggest implementing a few neural training algorithms from scratch, either in code with no libraries or with a simple library like numpy that doesn't provide built-in backprop. There are also a lot of great videos that introduce the basics of neural networks on YouTube. Here are some of my favorites from my intro-to-ML playlists; I've got a lot more channel recommendations in my shortform. (There's some redundancy between these videos, so maybe watch them on different days.)

Overviews of the intuition for NNs and of the math, starting from zero:

  • https://youtu.be/aircAruvnKk - "but what is a neural network?", 11M views, 20min - first video in a very beginner focused series by 3b1b
  • https://youtu.be/VMj-3S1tku0 - "The spelled out intro to neural networks and backpropagation: building micrograd", 130k views, 2 hours 25 minutes - fantastic intro by karpathy, a well-known deep learning researcher
  • https://youtu.be/PaCmpygFfXo - "The spelled out intro to language modeling: building makemore", 18k views, 2 hours - also by karpathy

I share these in the hope that they turn out to be relevant to your level, but if I guessed wrong about what level to suggest, or if you simply learn better from media other than videos, that feedback would be welcome and useful for my own learning. Many of the folks I'm recommending also have videos on other advanced technical topics, btw.

Please don't write off-topic comments. I'm not an expert, but I have my reasons for suggesting this idea.

Keep in mind:

  • I discussed this idea at least with some people before.
  • You could understand this idea outside of the context of neural networks.

This suggestion doesn't really make sense as is; It might be fixable

What things don't make sense to you/have unclear motivation?

I've seen the 3b1b series (among other things).

  1. It learns what is a "wheel". (I know that the model is unlikely to learn a human concept.)
  2. It learns that cars have wheels and poodles don't have wheels.
  3. You show it a poodle on wheels and it needs to relearn everything from scratch. The new data completely destroyed the previous idea of what's a car and what's a poodle.

This is wrong; I don't know how to fix it in English. It would learn all of those things smoothly-ish in parallel; this is why I suggested, in my previous post, watching several different training animations to get a sense of what a training process looks like. Another resource I should have mentioned is the comparison of weights in Nanda's recent work on grokking - the tweet summary is likely good enough, given that we're not focused here on the core of the paper: https://twitter.com/NeelNanda5/status/1559060507524403200 (full LessWrong post: https://www.lesswrong.com/posts/N6WM6hs7RQMKDhYjB/a-mechanistic-interpretability-analysis-of-grokking )

It sounds like you're trying to come up with domain-generalization methods, and you're thinking in terms of doing so via adversarial training. I would suggest popping open Semantic Scholar and searching for out-of-distribution, domain generalization, adversarial training, and any other keywords that pop up. Then skim some papers; even a very experienced researcher usually doesn't understand most papers on a quick skim, but it gives context as to which methods seem to work. Once you've gotten a sense of their results, read closer to find the math; spend some time staring at the math being confused, then make a shortform post to try to figure out what they're talking about. When that inevitably isn't enough, try to invent their method from memory, and look up the math you remember on a few platforms - especially Wikipedia and YouTube, but there are a few other search engines that are good for understanding math, especially Kagi and Teclis, which are better at finding small-time websites than Google and DDG - and try to figure it out. English typically underspecifies the math by a fairly large margin.

To be clear - your English-language ideas here aren't totally barking up the wrong tree. It's not that your idea is fundamentally bad or anything! My point in commenting with a bunch of resources isn't to say you cannot figure out a workable version of this insight; my point is that you are effectively on an early step of figuring out how to precisely specify your idea, and it won't be worth others' time to use your idea until it becomes at least a reasonable summary of the current state of the research or a clear specification of how to go in a different direction. E.g., are you thinking of learning to learn, perhaps?

Thank you for taking the time to answer. But I think this is an irrational attitude. Not yours specifically, but in general it's an irrational tradition of analyzing ideas and proposals. It's incompatible with rational analysis of evidence and possibilities. (I wrote a post about this.)

my point is that you are effectively on an early step of figuring out how to precisely specify your idea, and it won't be worth others' time to use your idea until it becomes at least a reasonable summary of the current state of the research or a clear specification of how to go a different direction.

Imagine that one day you suddenly got personal evidence about the way your brain learns. You "saw" the learning process of your brain. Not on the lowest level, but still, you saw a lot of things that you didn't know were true or didn't even imagine as a possibility.

  • Did you gain any information?
  • If you saw some missed possibilities, can you explain them to other people?
  • Are you more likely to find a way to "go in a different direction" in your research than before?

If "yes", then what stops us from discussing ideas not formulated in math? Neural nets aren't a thing disconnected from reality, they try to model learning and you should be able to discuss what "learning" means and how it can happen outside of math. (I mean, if you want you can analyze neural nets as pure math with 0 connection to reality.)

If even professional researchers can't easily understand the papers, it means they don't have high level ideas about "learning"[1]. So it's strange to encounter a rare high level idea and say that it's not worth anyone's time if it's not math. Maybe it's worth your time because it's not math. Maybe you just rejected thinking about a single high level idea you know about abstract learning.

My idea is applicable to human learning too. You could also imagine situations/objects for which this way of learning is the best one. (That's a thought experiment that could help you understand the idea. But you don't let me argue with you or explain anything.)

This is wrong; I don't know how to fix it in english. It would learn all of those things smoothly-ish in parallel; this is why I suggest watching several different training animations to get a sense of what a training process looks like in my previous post.

Likely it doesn't affect the point of my post. It's just a nitpick. (I watched Grant's series, I watched some animations you linked.)

  1. ^

    I don't mean any disrespect here. Just saying they have no other context to work with.

If even professional researchers can't easily understand the papers, it means they don't have high level ideas about "learning"[1]. So it's strange to encounter a rare high level idea and say that it's not worth anyone's time if it's not math. Maybe it's worth your time because it's not math. Maybe you just rejected thinking about a single high level idea you know about abstract learning.

This will be my last comment on this post, but for what it's worth, math vs not-math is primarily a question of vagueness. Your english description is too vague to turn into useful math. Precise math can describe reality incredibly well, if it's actually the correct model. Being able to understand the fuzzy version of precise math is in fact useful, you aren't wrong, and I don't think your sense that intuitive reasoning can be useful is wrong. Your idea here, however, seems to underspecify which math it describes, and to the degree I can see ways to convert it into math, it appears to describe math which is false. The difficulty of understanding papers isn't because they don't understand learning, it's simply because writing understandable scientific papers is really hard and most papers do a bad job explaining themselves. (it's fair to say they don't understand it as well as they ideally would, of course.)

I agree that good use of vague ideas is important, but someone else here recently made the point that a lot of what needs to be done to use vague ideas well is to be good at figuring out which vague ideas are not promising and skip focusing on them. Unfortunately, vagueness makes it hard to avoid accidentally paying too much attention to less-promising ideas, and it makes it hard to avoid accidentally paying too little attention to highly-promising ideas.

In machine learning, it is very often the case that someone tried an idea before you thought of it, but tried it poorly and their version can be improved. If you want to make an impact on the field, I'd strongly suggest finding ways to rephrase this idea so that it is more precise; again, my problem with it is that it underspecifies the math severely and in order to make use of your idea I would have to go myself read those papers I suggest you go look at.

I agree that good use of vague ideas is important, but someone else here recently made the point that a lot of what needs to be done to use vague ideas well is to be good at figuring out which vague ideas are not promising and skip focusing on them.

I don't think there's a lot of high level ideas about learning. So I don't see a problem of choosing between ideas. Note that "vague idea about neural nets' math" and "(vague) idea about learning" are two different things.

again, my problem with it is that it underspecifies the math severely and in order to make use of your idea I would have to go myself read those papers I suggest you go look at.

Maybe if you tried to discuss the idea I could change your opinion.

Your idea here, however, seems to underspecify which math it describes, and to the degree I can see ways to convert it into math, it appears to describe math which is false.

That would mean that my idea is wrong on a non-math level too, and you could explain why (or at least explain why you can't explain). I feel that you don't think in terms of the levels of the problem and the way they correspond.

Your english description is too vague to turn into useful math.

I don't think "vagueness" is even a meaningful concept. An idea may be identical to other ideas or unclear, but not "vague". If you see that an idea is different from some other idea and you understand what the idea says (about anything), then it's already specific enough. Maybe you jump into neural nets math too early.

I think you can turn my idea into precise enough statements not tied to math of neural nets. Then you can see what implications the idea has for neural nets.

Just a quick comment: don't use Wikipedia for machine learning topics. Unlike with, e.g., some math topics, it's very outdated and full of poorly written articles. Instead, the intro sections of ML papers, or review papers that you can find through Google Scholar, are usually quite readable.

It has been improved significantly in the past few years, but it does still tend to lag the papers themselves.

I think even that is overstating how useful it is. For example, I think we can all agree that regularization has been a huge and very important topic in ML for years. Here is the Wiki entry: https://en.wikipedia.org/wiki/Regularization_(mathematics)#Other_uses_of_regularization_in_statistics_and_machine_learning . Or interpretability: https://en.wikipedia.org/wiki/Explainable_artificial_intelligence . Things like layer normalization are not even mentioned anywhere. Pretty useless for learning about neural nets.

yeah, fair enough.