Jake Ward
Comments

Sorted by
Newest
No wikitag contributions to display.
Effects of Non-Uniform Sparsity on Superposition in Toy Models
Jake Ward · 10mo

> we stumble on a weird observation where the few features with the least sparsity are not even learned and represented in the hidden layer

I'm not sure how you're modeling sparsity, but if these features are present in nearly 100% of inputs, you could think of it as the not-feature being extremely sparse. My guess is that these features are getting baked into the bias instead of the weights, so the model just always predicts them.
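For what it's worth, here's a rough sketch of how one might check this guess in a toy model. It assumes the standard ReLU-output setup from Anthropic's Toy Models of Superposition (x̂ = ReLU(WᵀWx + b)); the dimensions, activation probabilities, and training details below are made up for illustration, not taken from your experiments. If the guess is right, the near-always-on feature's weight column should shrink toward zero while its bias entry moves toward the feature's mean value.

```python
# Hypothetical sketch: does a near-always-on feature get absorbed into the bias
# rather than getting a weight direction? (Assumed setup: x_hat = ReLU(W^T W x + b).)
import torch

torch.manual_seed(0)
n_features, d_hidden = 20, 5

# Feature i is active with probability p[i]; feature 0 is nearly always on (dense),
# the rest are sparse. These numbers are illustrative only.
p = torch.full((n_features,), 0.05)
p[0] = 0.99

W = torch.nn.Parameter(torch.randn(d_hidden, n_features) * 0.1)
b = torch.nn.Parameter(torch.zeros(n_features))
opt = torch.optim.Adam([W, b], lr=1e-3)

for step in range(5_000):
    # Each feature is present with probability p[i], with uniform magnitude in [0, 1).
    batch = 1024
    active = (torch.rand(batch, n_features) < p).float()
    x = active * torch.rand(batch, n_features)
    x_hat = torch.relu(x @ W.T @ W + b)
    loss = ((x - x_hat) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# If the dense feature is baked into the bias, its weight column should have a much
# smaller norm than the sparse features', while b[0] sits near E[x_0] ≈ 0.99 * 0.5.
print("‖W[:, 0]‖ (dense feature):", W[:, 0].norm().item())
print("mean ‖W[:, i]‖ (sparse features):", W[:, 1:].norm(dim=0).mean().item())
print("b[0] (compare to E[x_0] ≈ 0.5):", b[0].item())
```

Whether this actually reproduces what you saw will depend on your sparsity model and importances, but it's a cheap way to tell the "in the bias" story apart from the feature simply not being reconstructed at all.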

Posts

Reasoning-Finetuning Repurposes Latent Representations in Base Models · 1mo
Antonym Heads Predict Semantic Opposites in Language Models · 10mo