Hi, I'm Kunvar. I enjoy science and engineering. I'm currently studying the structures learned by trained neural networks and trying to understand what algorithms are implemented by these networks for various tasks.
You can find my various social profiles here, my personal website here, and reach out to me at kunvar@mechinterp.com
Good post, thanks for sharing! I found it somewhat relatable to my prior life experiences too.
Great essay!
I found it well written, and it articulates many of the arguments I make in casual conversations. I'll write up a longer comment later with some points I found interesting and concrete questions to accompany them.
For each of the resulting 1313 arguments, crowdworkers were first asked to rate their support of the corresponding claim on a Likert scale from 1 (“Strongly Oppose”) to 7 (“Strongly Oppose”).
You probably mean 1 (“Strongly Oppose”) to 7 (“Strongly Support”).
You'll enjoy reading What Causes Polysemanticity? An Alternative Origin Story of Mixed Selectivity from Incidental Causes (link to the paper)
Using a combination of theory and experiments, we show that incidental polysemanticity can arise due to multiple reasons including regularization and neural noise; this incidental polysemanticity occurs because random initialization can, by chance alone, initially assign multiple features to the same neuron, and the training dynamics then strengthen such overlap.
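As a toy illustration of just the initialization part of that story (this is not the paper's actual experiment, and all numbers are made up), you can check how often a random init aligns two different features with the same neuron:

```python
import numpy as np

# Toy sketch: count "collisions" at random initialization, i.e. cases where
# two different input features are both most strongly aligned with the same
# hidden neuron. This only illustrates the random-assignment step described
# above; it says nothing about the subsequent training dynamics.
rng = np.random.default_rng(0)
n_features, n_neurons, n_trials = 64, 64, 1000

collision_counts = []
for _ in range(n_trials):
    W = rng.standard_normal((n_neurons, n_features))  # random init
    # For each feature (column), find the neuron it is most aligned with.
    winners = np.abs(W).argmax(axis=0)
    # Neurons that "win" more than one feature are collisions.
    _, counts = np.unique(winners, return_counts=True)
    collision_counts.append((counts > 1).sum())

print("avg. neurons assigned 2+ features at init:", np.mean(collision_counts))
```

With as many neurons as features, a sizable fraction of neurons end up assigned two or more features purely by chance, which is the incidental starting point the paper builds on.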
If we train several SAEs from scratch on the same set of model activations, are they “equivalent”?
For SAEs of different sizes, at most layers many of the smaller SAE's features do have very similar counterparts among the larger SAE's features, but that's not always true. I'm working on an upcoming post on this.
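A rough sketch of the kind of comparison I mean (the function and variable names are hypothetical; it assumes you already have the two decoder matrices as tensors): for each feature in the smaller SAE, take its best cosine similarity against all features of the larger SAE.

```python
import torch
import torch.nn.functional as F

def max_decoder_similarity(dec_small: torch.Tensor, dec_large: torch.Tensor) -> torch.Tensor:
    """For each small-SAE feature, return its best cosine similarity
    against any large-SAE feature.

    dec_small: [n_small, d_model] decoder directions of the smaller SAE
    dec_large: [n_large, d_model] decoder directions of the larger SAE
    """
    small = F.normalize(dec_small, dim=-1)
    large = F.normalize(dec_large, dim=-1)
    sims = small @ large.T          # [n_small, n_large] cosine similarities
    return sims.max(dim=-1).values  # best match per small-SAE feature

# Example with random stand-ins for real decoder weights:
best = max_decoder_similarity(torch.randn(1024, 512), torch.randn(8192, 512))
print("fraction of small-SAE features with a close match:",
      (best > 0.9).float().mean().item())
```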
This is purely anecdotal, but supplementing sleep debt with cardio-intensive exercise works for me. For example, I usually need 7 hours of sleep. If I sleep for only 5 hours, I'm likely to feel a drop in mental sharpness around midday the next day. However, if I go for an hour-long run, I almost completely avoid that drop and feel just as good as I normally would after a full night's sleep.
It's also worth noting that LLMs don't learn directly from the raw input stream but from a compressed form of that data: the LLMs are fed tokenized text, and the tokenizer acts as a compressor. This benefits the models by letting them fit more information into a given context window.
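A quick way to see the compression (illustrative only; this assumes the tiktoken package, and the exact ratio depends on the tokenizer and the text):

```python
import tiktoken  # OpenAI's BPE tokenizer library

enc = tiktoken.get_encoding("cl100k_base")
text = "Language models are fed tokens, not raw characters or bytes."
tokens = enc.encode(text)

print(len(text.encode("utf-8")), "bytes ->", len(tokens), "tokens")
# Ordinary English text compresses to roughly 3-4 bytes per token, so a
# fixed-length context window holds several times more text than it would
# if the model consumed the raw byte stream directly.
```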
I think that the answer is no.
In this “VRAM-constrained regime,” MoE models (trained from scratch) are nowhere near competitive with dense LLMs.
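To make the VRAM point concrete, here's the kind of back-of-the-envelope arithmetic I have in mind (all numbers below are made up for illustration):

```python
# Hypothetical comparison, not measured numbers. When VRAM is the binding
# constraint, every parameter of an MoE must be resident even though only a
# fraction is used per token, while a dense model actively uses everything
# it pays memory for.

BYTES_PER_PARAM = 2  # fp16/bf16 weights

def vram_gb(n_params: float) -> float:
    return n_params * BYTES_PER_PARAM / 1e9

# A made-up MoE: large total parameter count, small active count per token.
moe_total_params = 45e9
moe_active_params = 12e9

# A dense model sized to the same memory footprint.
dense_params = 45e9

print(f"MoE:   {vram_gb(moe_total_params):.0f} GB resident, "
      f"~{moe_active_params / 1e9:.0f}B params active per token")
print(f"Dense: {vram_gb(dense_params):.0f} GB resident, "
      f"{dense_params / 1e9:.0f}B params active per token")
```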
Curious whether your high-level thoughts on these topics still hold or have changed.
On a more narrow distribution this head could easily exhibit just one behaviour and eg seem like a monosemantic inductin head
induction* head
That's interesting. On the recent episode of the Dwarkesh Podcast with David Reich, at 1:18:00, there's a discussion I'll quote here:
Now, I don't know which paper this is referring to, but it's interesting nonetheless.