Effects of Non-Uniform Sparsity on Superposition in Toy Models
Abstract

This post summarises my findings on the effects of non-uniform feature sparsity on superposition in the ReLU output model, introduced in the Toy Models of Superposition paper. The ReLU output model is a toy model shown to represent features in superposition rather than giving each feature a dedicated dimension ('individual...
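For reference, here is a minimal sketch of the ReLU output model as defined in the Toy Models of Superposition paper, x' = ReLU(WᵀWx + b), with tied weights; the dimension names n_features and n_hidden are illustrative.

```python
import torch
import torch.nn as nn

class ReLUOutputModel(nn.Module):
    """Toy model from Toy Models of Superposition: x' = ReLU(W^T W x + b)."""

    def __init__(self, n_features: int, n_hidden: int):
        super().__init__()
        self.W = nn.Parameter(torch.empty(n_hidden, n_features))
        nn.init.xavier_normal_(self.W)
        self.b = nn.Parameter(torch.zeros(n_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = x @ self.W.T                         # project features into the hidden space
        return torch.relu(h @ self.W + self.b)   # reconstruct with the tied (transposed) weights
```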
Hey, I'm controlling the sparsity when creating each batch of data: at that point, I sample each feature according to the presence probability I've assigned to it.
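Concretely, something like the following sketch (the names generate_batch and feature_probs are illustrative, and I'm assuming active feature values are uniform in [0, 1] as in the original paper):

```python
import torch

def generate_batch(batch_size: int, feature_probs: torch.Tensor) -> torch.Tensor:
    """Sample a batch where feature i is active with probability feature_probs[i].

    Active features take values uniform in [0, 1]; inactive features are zero,
    so per-feature sparsity is controlled directly at batch-creation time.
    """
    n_features = feature_probs.shape[0]
    values = torch.rand(batch_size, n_features)                # feature magnitudes
    mask = torch.rand(batch_size, n_features) < feature_probs  # per-feature Bernoulli draw
    return values * mask

# e.g. a presence probability that varies linearly across features
# probs = torch.linspace(0.05, 0.9, steps=20)
# batch = generate_batch(1024, probs)
```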
Re: features getting baked into the bias: yeah, that might be one of the intuitions we can develop, but to me the interesting part is that this behaviour didn't happen in any of the cases where importance was varying; it only showed up when the feature importance was equal across all features. I don't have a concrete intuition for why that might be, and I'm still thinking about it.