LESSWRONG
Jose Sepulveda
Comments

Learning Multi-Level Features with Matryoshka SAEs
Jose Sepulveda, 10mo

Oh, I see that, thanks! :) Super interesting work. I'm testing its application to recommender systems.

Learning Multi-Level Features with Matryoshka SAEs
Jose Sepulveda, 10mo

Looking at your code, I see you still add an L1 penalty to the loss. Is this still necessary? In my own experiments I've noticed that top-k is able to achieve sparsity on its own, without the need for L1.
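
To make the top-k point concrete, here is a minimal NumPy sketch. It is not the code from the post: the function name `topk_activation`, the shapes, and the random weights are all made up for illustration. It only shows that a top-k activation caps the number of active latents at k by construction, whether or not an L1 term is added to the loss.

```python
import numpy as np

def topk_activation(pre_acts: np.ndarray, k: int) -> np.ndarray:
    """Apply ReLU, then keep only the k largest values in each row."""
    relu = np.maximum(pre_acts, 0.0)
    # Indices of the k largest entries per row (order within the k is arbitrary).
    top_idx = np.argpartition(relu, -k, axis=-1)[..., -k:]
    out = np.zeros_like(relu)
    # Copy the selected values into an otherwise all-zero array.
    top_vals = np.take_along_axis(relu, top_idx, axis=-1)
    np.put_along_axis(out, top_idx, top_vals, axis=-1)
    return out

# Hypothetical setup: 4 inputs of dimension 16, an encoder with 64 latents.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))
W_enc = rng.normal(size=(16, 64))

k = 8
latents = topk_activation(x @ W_enc, k=k)

# L0 (number of non-zero latents) per input is at most k, with no L1 penalty involved.
l0_per_row = (latents != 0).sum(axis=-1)
print(l0_per_row)
```

This says nothing about whether an L1 term helps in other ways (for example, by shrinking activation magnitudes); it only shows that the L0 bound comes from the activation itself.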

Reply