LESSWRONG

1stuserhere

Hi, I'm Kunvar. I enjoy science and engineering. I'm currently studying the structures learned by trained neural networks and trying to understand what algorithms these networks implement for various tasks.

You can find my various social profiles here, my personal website here, and reach out to me at kunvar@mechinterp.com 

Comments (sorted by newest)
What GPT-oss Leaks About OpenAI's Training Data
1stuserhere · 8h · 10

Embedding norm is a proxy with many conflated factors; you'd want to run ablations rather than treating it as conclusive.

Also, the unused tokens -> weight decay argument assumes the embeddings actually had (decoupled) weight decay applied and weren't tied to the LM head. Does the model card specify these details? Otherwise we can't assume so.
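
For what it's worth, here's a minimal sketch of the kind of check I have in mind, assuming the standard Hugging Face checkpoint name and the usual transformers API (the k=20 cutoff is arbitrary):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Checkpoint name is an assumption; substitute the actual gpt-oss release.
    model_name = "openai/gpt-oss-20b"
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    emb_in = model.get_input_embeddings().weight    # [vocab, d_model]
    emb_out = model.get_output_embeddings().weight  # LM head

    # If these share storage, input-output tying is in effect and the
    # "unused tokens decay toward zero" story needs rethinking.
    print("tied:", emb_in.data_ptr() == emb_out.data_ptr())

    # Per-token embedding norms: the proxy under discussion.
    norms = emb_in.float().norm(dim=-1)
    lowest = torch.topk(norms, k=20, largest=False)
    for idx, val in zip(lowest.indices.tolist(), lowest.values.tolist()):
        print(f"{idx:>8}  {val:.4f}  {tokenizer.decode([idx])!r}")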

Intelligence Is Not Magic, But Your Threshold For "Magic" Is Pretty Low
1stuserhere · 3mo · 64

Also, Rainbolt himself admits that this skill is largely down to deliberate practice. There are dozens of GeoGuessr players better than him, though he is really good at NMPZ and at insta-sending the guess.

A Rocket–Interpretability Analogy
1stuserhere · 1y · 10

on the one hand, mechanistic understanding has historically underperformed as a research strategy,

Are you talking about ML or in general? What are you deriving this from?

"Slow" takeoff is a terrible term for "maybe even faster takeoff, actually"
1stuserhere · 1y* · 87

I think it’s even more actively confusing because “smooth/continuous” takeoff not only could be faster in calendar time

We're talking about two different things here: takeoff speed and timelines. All four combinations are on the table: slow takeoff/long timelines, fast takeoff/long timelines, slow takeoff/short timelines, fast takeoff/short timelines.

A smooth takeoff might actually take longer in calendar time if incremental progress doesn’t lead to exponential gains until later stages.

Honestly I'm surprised people are conflating timelines and takeoff speeds.

Are the majority of your ancestors farmers or non-farmers?
1stuserhere · 1y · 42

That's interesting. On the recent episode of the Dwarkesh Podcast with David Reich, at 1:18:00, there's a discussion I'll quote here:

There was a super interesting series of papers. They made many things clear, but one of them was that the proportion of non-Africans' ancestors who were Neanderthals is actually not 2%.

That’s the proportion of their DNA in our genomes today if you're a non-African person. It's more like 10-20% of your ancestors are Neanderthals. What actually happened was that when Neanderthals and modern humans met and mixed, the Neanderthal DNA was not as biologically fit.

The reason was that Neanderthals had lived in small populations for about half a million years since separating from modern humans, who had lived in larger populations, and had accumulated a large number (thousands) of slightly bad mutations. In the mixed populations, there was selection to remove the Neanderthal ancestry. That would have happened very, very rapidly after the mixture process.

There's now overwhelming evidence that that must have happened. If you actually count your ancestors, if you're of non-African descent, how many of them were Neanderthals, say, 70,000 years ago, it's not going to be 2%. It's going to be 10-20%, which is a lot.

Now I don't know which paper this is referring to but it's interesting nonetheless.

The ‘strong’ feature hypothesis could be wrong
1stuserhere · 1y · 20

Great essay!

I found it well written, and it articulates many of the arguments I make in casual conversations. I'll write up a longer comment sometime later with some points I found interesting and concrete questions to accompany them.

Daniel Tan's Shortform
1stuserhere · 1y · 10

You'll enjoy reading What Causes Polysemanticity? An Alternative Origin Story of Mixed Selectivity from Incidental Causes (link to the paper)

Using a combination of theory and experiments, we show that incidental polysemanticity can arise due to multiple reasons including regularization and neural noise; this incidental polysemanticity occurs because random initialization can, by chance alone, initially assign multiple features to the same neuron, and the training dynamics then strengthen such overlap.
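
To make the "by chance alone" part concrete, here's a toy sketch (my own illustration, not the paper's setup) of how often random initialization alone makes two features share their strongest neuron:

    import torch

    # Toy illustration: with random feature-to-neuron weights, count how many
    # features share their most-aligned ("winning") neuron with another feature
    # at initialization. Training can then reinforce these accidental overlaps.
    torch.manual_seed(0)
    n_features, n_neurons = 64, 64
    W = torch.randn(n_features, n_neurons)
    winners = W.argmax(dim=1)
    collisions = n_features - winners.unique().numel()
    print(f"{collisions} of {n_features} features collide at init")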

Daniel Tan's Shortform
1stuserhere · 1y · 30

If we train several SAEs from scratch on the same set of model activations, are they “equivalent”?

For SAEs of different sizes, at most layers many of the smaller SAE's features do have very close counterparts (very high similarity) among the larger SAE's features, but it's not always true. I'm working on an upcoming post on this.
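
Roughly, the comparison I have in mind looks like this (a sketch; the W_dec attribute with shape [n_features, d_model] is an assumption about how the SAE stores its decoder directions):

    import torch
    import torch.nn.functional as F

    def max_feature_overlap(W_dec_small: torch.Tensor, W_dec_large: torch.Tensor) -> torch.Tensor:
        """For each decoder direction of the smaller SAE, return its maximum
        cosine similarity with any decoder direction of the larger SAE."""
        a = F.normalize(W_dec_small, dim=-1)  # [n_small, d_model]
        b = F.normalize(W_dec_large, dim=-1)  # [n_large, d_model]
        return (a @ b.T).max(dim=-1).values

    # e.g. fraction of small-SAE features with a close counterpart in the large SAE:
    # overlap = max_feature_overlap(sae_small.W_dec, sae_large.W_dec)
    # print((overlap > 0.9).float().mean().item())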

Posts

15 · Mechanistic Interpretability Reading group · 2y · 0
37 · How to Read Papers Efficiently: Fast-then-Slow Three pass method · 3y · 4