Generalising CNNs

[-]Raemon7y50

I think this could use an opening line or paragraph roughly indicating who the post is for (from the looks of it, people with some background in neural networks, although I couldn't specify it in more detail than that)

[-]James Crook7y30

Thanks. Paragraph added.

[-]interstice7y30

You might be interested in Transformer Networks, which use a learned pattern of attention to route data between layers. They're pretty popular and have been used in some impressive applications like this very convincing image-synthesis GAN.

re: whether this is a good research direction. The fact that neural networks are highly compressible is very interesting and I too suspect that exploiting this fact could lead to more powerful models. However, if your goal is to increase the chance that AI has a positive impact, then it seems like the relevant thing is how quickly our understanding of how to align AI systems progresses, relative to our understanding of how to build powerful AI systems. As described, this idea sounds like it would be more useful for the latter.

[-]James Crook7y10

The image synthesis is impressive. The Transformer network paper looks intriguing. I will need to read again much more slowly, and not skim to understand it. Thanks for both the links and feedback on aligning AI.

I agree the ideas really are about progressing AI, rather than progressing AI specifically in a positive way. As a post-hoc justification though, exploring attention mechanisms in machine learning indicates that what AI 'cares about' may be pretty deeply embedded in its technology. Your comment, and my need to justify post-hoc, set me to the task of making that link more concrete, so let me expand on that.

I think many animals have almost hard-wired attention mechanisms for alerting them to eyes. Things with eyes are animate, and may need a reaction more rapidly than rocks or trees do. Animals do have almost hard-wired attention mechanisms for sudden movement too.

What alerting or attention setting mechanisms will AIs for self-driving cars have? Probably they will prioritise sudden movement detection. Probably they won't have any specific mechanism for alerting to eyes. Perhaps that's a mistake.

I've noticed that the bounding boxes in some videos of 'what a car sees' are pretty good for following vehicles, but flick on and off for bounding boxes around people on the sidewalk. The stable bounding boxes are relatively square. The unstable bounding boxes are tall and thin.

Now just maybe, we want to make a visual architecture that is very good at distinguishing tall thin objects that could be people, from tall thin objects that could be lamp posts. That has implications all the way down to the visual pipeline. The car is not going to be good at solving trolley problems if it can tell trucks from cars, but can't tell people from lamp posts.

[-]james_t7y30

So if I understand you, for (1) you're proposing a "hard" attention over the image, rather than the "soft" differentiable attention which is typically meant by "attention" for NNs.

You might find interesting "Recurrent Models of Visual Attention" by DeepMind (https://arxiv.org/pdf/1406.6247.pdf). They use a hard attention over the image with RL to train where to attend. I found it interesting -- there's been subsequent work using hard attention (I thiiink this is a central paper for the topic, but I could be wrong, and I'm not at all sure what the most interesting recent one is) as well.

[-]James Crook7y20

That paper is new to me - and yes related and interesting. I like their use of a 'glimpse' = more resolution in centre, less resolution further away.

About 'hard' and 'soft' - if 'hard' and 'soft' mean what I think they do, then yes, the attention is 'hard'. It forces some weights to zero that in a fully connected network could end up non zero. That might require some attention in training, as a network that has attention 'way off' where it should be has no gradient to give it better solutions.

Thanks for the link to the paper and the idea of thinking about to what extent the attention is/is-not differentiable.

[-]thomascolthurst7y20

There is a large existing literature on pruning neural networks, starting with the 1990 paper "Optimal Brain Damage" by Le Cun, Denker and Solla. A recent paper with more references is https://arxiv.org/pdf/1803.03635.pdf

LESSWRONG
LW

LESSWRONG
LW

14

14

14

Sub Goal

Dialog