Neural networks generalize because of this one weird trick
Frank Seidl · 3y · 70

> The more important aim of this conversion is that now the minima of the term in the exponent, K(w), are equal to 0. If we manage to find a way to express K(w) as a polynomial, this lets us pull in the powerful machinery of algebraic geometry, which studies the zeros of polynomials. We've turned our problem of probability theory and statistics into a problem of algebra and geometry.
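
(For reference, the conversion being described is, I take it, the standard singular learning theory rewrite of the Bayesian marginal likelihood; the notation below is my own guess and may not match the post's exactly.)

$$Z_n = \int \prod_{i=1}^n p(X_i \mid w)\,\varphi(w)\,dw = \Big(\prod_{i=1}^n q(X_i)\Big)\int e^{-n K_n(w)}\,\varphi(w)\,dw, \qquad K_n(w) = \frac{1}{n}\sum_{i=1}^n \log\frac{q(X_i)}{p(X_i \mid w)},$$

where q is the true distribution, p(· | w) the model, and φ the prior. The population version K(w) is the KL divergence between q and p(· | w), so it is nonnegative and, in the realizable case, equals 0 exactly at the parameters that fit the truth; that is why its minima are 0. Note also that K_n(w) = L_n(w) − S_n, where L_n(w) is the averaged negative log-likelihood and S_n does not depend on w.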

Wait... but K(w) just isn't a polynomial most of the time. Right? From its definition above, K(w) differs by a constant from the log-likelihood L(w). So the log-likelihood has to be a polynomial too? If the network has, say, a ReLU layer, then I wouldn't even expect L(w) to be smooth. And I can't see any reason to think that tanh or swishes or whatever else we use would make L(w) happen to be a polynomial either.
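
Here is a quick numerical version of the ReLU point, using a toy one-parameter model I made up for illustration: y = relu(w·x) plus Gaussian noise, with squared-error loss standing in for the negative log-likelihood.

```python
# Toy illustration (made-up model and data): the empirical loss of a
# one-parameter ReLU model has a kink in w, so it is not smooth in w,
# let alone a polynomial.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = np.maximum(0.7 * x, 0.0) + 0.1 * rng.normal(size=1000)  # "true" parameter 0.7

def loss(w: float) -> float:
    """Mean squared error of the model y_hat = relu(w * x) at parameter w."""
    return float(np.mean((y - np.maximum(w * x, 0.0)) ** 2))

# One-sided difference quotients just below and just above w = 0.
eps = 1e-4
slope_left = (loss(0.0) - loss(-eps)) / eps
slope_right = (loss(eps) - loss(0.0)) / eps
print(f"slope approaching 0 from the left:  {slope_left:+.4f}")
print(f"slope approaching 0 from the right: {slope_right:+.4f}")
# The two slopes disagree, so the loss is not differentiable at w = 0.
```

With this setup the two one-sided slopes come out clearly different (one near 0, the other clearly negative), so the loss already fails to be differentiable at w = 0, exactly the kind of non-smoothness in w that a ReLU introduces.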
