What this means for the Jacobian is that the determinant tells us how much space is being squished or expanded in the neighborhood around a point. If the output space is being expanded a lot at some input point, then the neural network is somewhat unstable in that region, since minor alterations in the input could cause huge distortions in the output. By contrast, if the determinant is small, then a small change to the input will hardly make a difference to the output.
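A small numerical sketch of the volume-scaling picture, using an arbitrary example map f(x, y) = (x² - y², 2xy) rather than an actual neural network: the Jacobian is estimated by central differences, and its determinant is compared with how much the area of a tiny square around the point grows under f. The function names here are illustrative, not from any library.

```python
import numpy as np

def f(p):
    # Arbitrary example map R^2 -> R^2 (squaring a complex number), for illustration only.
    x, y = p
    return np.array([x**2 - y**2, 2 * x * y])

def numerical_jacobian(func, p, h=1e-6):
    """Central-difference estimate of the Jacobian of func at point p."""
    p = np.asarray(p, dtype=float)
    cols = []
    for i in range(p.size):
        step = np.zeros_like(p)
        step[i] = h
        cols.append((func(p + step) - func(p - step)) / (2 * h))
    return np.stack(cols, axis=1)  # column i = partial derivatives w.r.t. input i

def polygon_area(pts):
    """Shoelace formula for the area of a simple polygon with vertices in order."""
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

p = np.array([1.0, 0.5])
J = numerical_jacobian(f, p)
det = np.linalg.det(J)  # analytically 4 * (x**2 + y**2) = 5.0 at this point

# Map the corners of a tiny square anchored at p and compare image area to original area.
eps = 1e-3
square = p + eps * np.array([[0, 0], [1, 0], [1, 1], [0, 1]])
image = np.array([f(c) for c in square])
ratio = polygon_area(image) / eps**2

print(det, ratio)  # both approximately 5.0
```

The area ratio only approaches the determinant as the square shrinks, which is the sense in which the determinant describes the *neighborhood* of a point.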

This isn't quite true; the determinant being small is consistent with small changes in input making arbitrarily large changes in output, just so long as small changes in input in a different direction make sufficiently small changes in output.
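To make the objection concrete, here is a minimal NumPy sketch with a made-up diagonal Jacobian (the numbers are arbitrary, chosen only for illustration): the determinant is small, yet a small input change along one direction is amplified a thousandfold. The worst-case stretch is captured by the largest singular value (the spectral norm), not by the determinant.

```python
import numpy as np

# Hypothetical Jacobian at one input point: stretches the first input direction
# by a factor of 1000 and shrinks the second by a factor of 10,000.
J = np.diag([1000.0, 1e-4])

det_J = np.linalg.det(J)
print(det_J)                        # ≈ 0.1, a "small" determinant

dx = np.array([1e-3, 0.0])          # small input change along the first direction
out_change = np.linalg.norm(J @ dx)
print(out_change)                   # ≈ 1.0, i.e. 1000x the size of the input change

# Largest singular value = maximum factor by which J can stretch any input vector.
spectral = np.linalg.norm(J, ord=2)
print(spectral)                     # ≈ 1000.0
```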

The Frobenius norm is nothing complicated; it just means that we square all of the elements in the matrix, take the sum, and then take the square root of this sum.
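A minimal NumPy sketch of that recipe, checked against NumPy's built-in `np.linalg.norm(..., ord="fro")`; the matrix is an arbitrary example and `frobenius_norm` is just an illustrative helper name.

```python
import numpy as np

def frobenius_norm(J):
    """Square every entry, sum them, and take the square root."""
    return np.sqrt(np.sum(np.square(J)))

J = np.array([[1.0, 2.0, 0.0],
              [0.0, -2.0, 4.0]])         # arbitrary 2x3 example

manual = frobenius_norm(J)
builtin = np.linalg.norm(J, ord="fro")
print(manual)    # 5.0 = sqrt(1 + 4 + 0 + 0 + 4 + 16)
print(builtin)   # same value from NumPy's built-in
```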

An alternative definition of the Frobenius norm better highlights its connection to the motivation for regularizing the Frobenius norm of the Jacobian, namely limiting the extent to which small changes in input can cause large changes in output: the Frobenius norm of a matrix J is √n times the root-mean-square of |Jx| over all unit vectors x, where n is the input dimension.
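A quick Monte Carlo check of the root-mean-square characterization, assuming unit vectors are drawn uniformly from the sphere (the matrix, dimensions, and sample size are arbitrary). For uniformly random unit vectors in R^n the mean of |Jx|² is ||J||_F²/n, so the RMS matches ||J||_F only after multiplying by √n. (Equivalently, if x is a standard Gaussian vector rather than a unit vector, E|Jx|² = ||J||_F² exactly.)

```python
import numpy as np

rng = np.random.default_rng(0)

m, n = 3, 5                        # output and input dimensions (arbitrary)
J = rng.standard_normal((m, n))    # a random example "Jacobian"

# Draw unit vectors uniformly from the sphere by normalizing Gaussian samples.
x = rng.standard_normal((200_000, n))
x /= np.linalg.norm(x, axis=1, keepdims=True)

lengths = np.linalg.norm(x @ J.T, axis=1)   # |J x| for each sampled unit vector
rms = np.sqrt(np.mean(lengths**2))

fro = np.linalg.norm(J, ord="fro")
print(rms, fro / np.sqrt(n))       # these agree up to Monte Carlo error
```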

Matthew Barnett, 6y

This isn't quite true; the determinant being small is consistent with small changes in input making arbitrarily large changes in output, just so long as small changes in input in a different direction make sufficiently small changes in output.

Hmm, good point. I suppose that's why we're minimizing not the determinant but the Frobenius norm. Hence:

An alternative definition of the Frobenius norm better highlights its connection to the motivation for regularizing the Frobenius norm of the Jacobian

Makes sense.

AlexMennen, 6y

I suppose that's why we're minimizing not the determinant but the Frobenius norm.

Yes, although another reason is that the determinant is only defined if the input and output spaces have the same dimension, which they typically don't.
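For example, the Jacobian of a map from R^4 to R^2 is a 2×4 matrix. In this sketch (the entries are arbitrary placeholders), NumPy refuses to compute a determinant for it, while the Frobenius norm is still well defined.

```python
import numpy as np

# Hypothetical Jacobian of a map from R^4 (input) to R^2 (output): shape (2, 4).
J = np.arange(8, dtype=float).reshape(2, 4)

det_defined = True
try:
    np.linalg.det(J)
except np.linalg.LinAlgError as err:
    det_defined = False
    print("determinant undefined:", err)   # NumPy requires a square matrix

fro = np.linalg.norm(J, ord="fro")         # defined for a matrix of any shape
print(fro)                                  # sqrt(0**2 + 1**2 + ... + 7**2) = sqrt(140) ≈ 11.83
```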

Raemon, 6y

TIL Jacobian's username is a pun. I guess... I should not be surprised?

Vector, 5y

3Blue1Brown touches on that way of thinking about functions/derivatives in this video

(very similar to the Khan Academy video linked in the article)

Pattern, 6y

The final set of images looks a bit like someone zooming in on a map*. (The blue part of the first image looks like the head of a cat.)

*ETA: specifically the yellow region. (Not because it's small.)

Matthew Barnett, 6y

I'm not sure whether you're referring to the fact that it is small. If so: apologies. At the time of posting there was (still is?) a bug preventing me from resizing images on posts. My understanding is that this is being fixed.

Also yeah, zooming in would be good, I think, because that means it's robust to changes (i.e., it's going to classify it correctly even if we add noise to the input). I think it isn't actually zooming in; it's just that the decision basin for the input is getting larger.

A Primer on Matrix Calculus, Part 2: Jacobians and other fun
