Rosco-Hunter

Karma: -8220

Comments
Towards Monosemanticity: Decomposing Language Models With Dictionary Learning
Rosco-Hunter · 1y · 10

This was a really interesting paper; however, I was left with one question: why exactly is the model motivated to learn a much more complex function than the identity map? An auto-encoder whose latent space is much smaller than its input is forced to learn an interesting map, but I can't see why a highly over-parameterised auto-encoder wouldn't simply learn something close to an identity map. Is it somehow the regularisation or the bias terms? I'd love to hear an argument for why the auto-encoder is likely to learn these monosemantic features as opposed to an identity map.
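
For concreteness, here is a rough sketch of the set-up I have in mind - a standard L1-penalised sparse auto-encoder in PyTorch (my own illustration with made-up dimensions and coefficient, not the paper's actual code):

```python
import torch
import torch.nn as nn

class OvercompleteAutoEncoder(nn.Module):
    """Auto-encoder whose latent space is much larger than its input."""

    def __init__(self, d_input: int = 512, d_latent: int = 4096):
        super().__init__()
        self.encoder = nn.Linear(d_input, d_latent)
        self.decoder = nn.Linear(d_latent, d_input)

    def forward(self, x: torch.Tensor):
        # ReLU keeps the latent activations non-negative, so the L1
        # penalty below genuinely shrinks them towards zero.
        z = torch.relu(self.encoder(x))
        x_hat = self.decoder(z)
        return x_hat, z

def sae_loss(x, x_hat, z, l1_coeff: float = 1e-3):
    # Reconstruction error alone would be minimised by an identity-like
    # map, but the L1 term charges the model for every active latent,
    # so dense identity-like codes score worse than sparse codes that
    # fire only a few dedicated features per input.
    reconstruction = ((x - x_hat) ** 2).mean()
    sparsity = z.abs().mean()
    return reconstruction + l1_coeff * sparsity
```

My question, restated in these terms: with l1_coeff set to zero, what would stop this network from converging to something identity-like?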

Reply
The Buckling World Hypothesis - Visualising Vulnerable Worlds
Rosco-Hunter · 1y · 10

Thank you for the reply. I am using the ruler as an informal way of introducing a pitchfork bifurcation - see [3]. Although the specific analogy to a ruler may appear tenuous, my article merely attempts to draw links with the underlying dynamics, in which a stable point (with a symmetry) splits into two stable branches protruding from a critical point. This set-up is used to study a wide variety of physical and biological phenomena - see [Catastrophe Theory and its Applications, Poston and Stewart, 1978].
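For reference, the dynamics I have in mind are those of the supercritical pitchfork normal form (standard textbook notation, my addition rather than a quote from the article):

```latex
% Supercritical pitchfork normal form:
\[
  \dot{x} = r\,x - x^{3}
\]
% Equilibria: x^* = 0 for all r, and x^* = \pm\sqrt{r} for r > 0.
% For r < 0 the symmetric state x^* = 0 is stable; at the critical
% point r = 0 it loses stability, and the two stable branches
% x^* = \pm\sqrt{r} emerge for r > 0.
```

The ruler corresponds to the parameter r being driven through zero: the straight configuration remains an equilibrium but becomes unstable, and the system must settle onto one of the two buckled branches.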
Reply
Wikitag Contributions

No wikitag contributions to display.
Posts

-5 The Buckling World Hypothesis - Visualising Vulnerable Worlds · 1y · 2
-6 Can AI Transform the Electorate into a Citizen’s Assembly? · 1y · 0