Neural networks generalize because of this one weird trick
Frank Seidl · 3y · 70

> The more important aim of this conversion is that now the minima of the term in the exponent, K(w), are equal to 0. If we manage to find a way to express K(w) as a polynomial, this lets us pull in the powerful machinery of algebraic geometry, which studies the zeros of polynomials. We've turned our problem of probability theory and statistics into a problem of algebra and geometry.
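
(For reference, the conversion being described is, I take it, the standard singular learning theory rewrite of the Bayesian marginal likelihood; the notation below is my own guess and may not match the post's exactly.)

$$Z_n = \int \prod_{i=1}^n p(X_i \mid w)\,\varphi(w)\,dw = \Big(\prod_{i=1}^n q(X_i)\Big)\int e^{-n K_n(w)}\,\varphi(w)\,dw, \qquad K_n(w) = \frac{1}{n}\sum_{i=1}^n \log\frac{q(X_i)}{p(X_i \mid w)},$$

where q is the true distribution, p(· | w) the model, and φ the prior. The population version K(w) is the KL divergence between q and p(· | w), so it is nonnegative and, in the realizable case, equals 0 exactly at the parameters that fit the truth; that is why its minima are 0. Note also that K_n(w) = L_n(w) − S_n, where L_n(w) is the averaged negative log-likelihood and S_n does not depend on w.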

Wait... but K(w) just isn't a polynomial most of the time. Right? From its definition above, K(w) differs by a constant from the log-likelihood L(w). So the log-likelihood has to be a polynomial too? If the network has, say, a ReLU layer, then I wouldn't even expect L(w) to be smooth. And I can't see any reason to think that tanh or swishes or whatever else we use would make L(w) happen to be a polynomial either.
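
Here is a quick numerical version of the ReLU point, using a toy one-parameter model I made up for illustration: y = relu(w·x) plus Gaussian noise, with squared-error loss standing in for the negative log-likelihood.

```python
# Toy illustration (made-up model and data): the empirical loss of a
# one-parameter ReLU model has a kink in w, so it is not smooth in w,
# let alone a polynomial.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = np.maximum(0.7 * x, 0.0) + 0.1 * rng.normal(size=1000)  # "true" parameter 0.7

def loss(w: float) -> float:
    """Mean squared error of the model y_hat = relu(w * x) at parameter w."""
    return float(np.mean((y - np.maximum(w * x, 0.0)) ** 2))

# One-sided difference quotients just below and just above w = 0.
eps = 1e-4
slope_left = (loss(0.0) - loss(-eps)) / eps
slope_right = (loss(eps) - loss(0.0)) / eps
print(f"slope approaching 0 from the left:  {slope_left:+.4f}")
print(f"slope approaching 0 from the right: {slope_right:+.4f}")
# The two slopes disagree, so the loss is not differentiable at w = 0.
```

With this setup the two one-sided slopes come out clearly different (one near 0, the other clearly negative), so the loss already fails to be differentiable at w = 0, exactly the kind of non-smoothness in w that a ReLU introduces.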
