
Jose Sepulveda — LessWrong

Learning Multi-Level Features with Matryoshka SAEs

Jose Sepulveda2y10

Oh, I see that, thanks! :) Super interesting work. I'm testing its application to recommender systems.

Learning Multi-Level Features with Matryoshka SAEs

Jose Sepulveda2y10

Looking at your code, I see you still add an L1 penalty to the loss. Is this still necessary? In my own experiments, I've noticed that top-k achieves sparsity on its own, without the need for L1.
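
To make the point concrete, here is a minimal sketch of why a top-k activation bounds sparsity by construction. It is a pure-Python illustration, not the code from the post: `topk_activation` and `l0` are hypothetical helper names, and applying a ReLU to the kept values is an assumption on my part.

```python
import random


def topk_activation(pre_acts, k):
    """Keep the k largest pre-activations (passed through ReLU); zero the rest."""
    # Indices of the k largest pre-activation values.
    order = sorted(range(len(pre_acts)), key=lambda i: pre_acts[i], reverse=True)
    keep = set(order[:k])
    # Assumption: kept values go through a ReLU, so negatives also become zero.
    return [max(v, 0.0) if i in keep else 0.0 for i, v in enumerate(pre_acts)]


def l0(acts):
    """Number of nonzero latents, i.e. the sparsity an L1 term would target."""
    return sum(1 for a in acts if a != 0.0)


random.seed(0)
# Stand-in for encoder outputs; a real SAE would compute these from activations.
pre_acts = [random.gauss(0.0, 1.0) for _ in range(64)]
acts = topk_activation(pre_acts, k=8)

# At most k latents are active regardless of any penalty in the loss.
assert l0(acts) <= 8
```

Since the number of active latents can never exceed `k`, an added L1 term cannot be what enforces the sparsity level here; whether it still helps in some other way (for example by shrinking the magnitudes of the kept activations) is a separate question that this sketch does not answer.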