Rosco-Hunter

Karma: -8220

Comments
Towards Monosemanticity: Decomposing Language Models With Dictionary Learning
Rosco-Hunter · 1y · 10

This was a really interesting paper; however, I was left with one question: why exactly is the model motivated to learn a much more complex function than the identity map? An auto-encoder whose latent space is much smaller than its input is forced to learn an interesting map, but I can't see why a highly over-parameterised auto-encoder wouldn't simply learn something close to an identity map. Is it somehow the regularisation or the bias terms? I'd love to hear an argument for why the auto-encoder is likely to learn these monosemantic features as opposed to an identity map.
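
For concreteness, here is a rough sketch of the set-up I have in mind - a standard L1-penalised sparse auto-encoder in PyTorch (my own illustration with made-up dimensions and coefficient, not the paper's actual code):

```python
import torch
import torch.nn as nn

class OvercompleteAutoEncoder(nn.Module):
    """Auto-encoder whose latent space is much larger than its input."""

    def __init__(self, d_input: int = 512, d_latent: int = 4096):
        super().__init__()
        self.encoder = nn.Linear(d_input, d_latent)
        self.decoder = nn.Linear(d_latent, d_input)

    def forward(self, x: torch.Tensor):
        # ReLU keeps the latent activations non-negative, so the L1
        # penalty below genuinely shrinks them towards zero.
        z = torch.relu(self.encoder(x))
        x_hat = self.decoder(z)
        return x_hat, z

def sae_loss(x, x_hat, z, l1_coeff: float = 1e-3):
    # Reconstruction error alone would be minimised by an identity-like
    # map, but the L1 term charges the model for every active latent,
    # so dense identity-like codes score worse than sparse codes that
    # fire only a few dedicated features per input.
    reconstruction = ((x - x_hat) ** 2).mean()
    sparsity = z.abs().mean()
    return reconstruction + l1_coeff * sparsity
```

My question, restated in these terms: with l1_coeff set to zero, what would stop this network from converging to something identity-like?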

Reply
The Buckling World Hypothesis - Visualising Vulnerable Worlds
Rosco-Hunter · 1y · 10

Thank you for the reply. I am using the ruler as an informal way of introducing a pitchfork bifurcation - see [3]. Although the specific analogy to a ruler may appear tenuous, my article merely attempts to draw links with the underlying dynamics, in which a stable point (with a symmetry) splits into two stable branches protruding from a critical point. This set-up is used to study a wide variety of physical and biological phenomena - see [Catastrophe Theory and its Applications, Poston and Stewart, 1978].
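For reference, the dynamics I have in mind are those of the supercritical pitchfork normal form (standard textbook notation, my addition rather than a quote from the article):

```latex
% Supercritical pitchfork normal form:
\[
  \dot{x} = r\,x - x^{3}
\]
% Equilibria: x^* = 0 for all r, and x^* = \pm\sqrt{r} for r > 0.
% For r < 0 the symmetric state x^* = 0 is stable; at the critical
% point r = 0 it loses stability, and the two stable branches
% x^* = \pm\sqrt{r} emerge for r > 0.
```

The ruler corresponds to the parameter r being driven through zero: the straight configuration remains an equilibrium but becomes unstable, and the system must settle onto one of the two buckled branches.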
Reply
Wikitag Contributions

No wikitag contributions to display.
Posts

-5 The Buckling World Hypothesis - Visualising Vulnerable Worlds · 1y · 2
-6 Can AI Transform the Electorate into a Citizen’s Assembly? · 1y · 0