LESSWRONG
Wei Shi

Comments

Open Source Replication of Anthropic’s Crosscoder paper for model-diffing
Wei Shi · 11mo

I got it, thank you very much!

Open Source Replication of Anthropic’s Crosscoder paper for model-diffing
Wei Shi · 11mo

We trained a crosscoder of width 16,384 on the residual stream activations from the middle layer of the Gemma-2 2B base and IT models.

I don't understand the training process here, nor the one described in Anthropic's mini-paper. How do you train a single crosscoder on residual-stream activations from two different models?
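For reference, here is a minimal sketch of what such a crosscoder computes for one token, assuming the acausal crosscoder formulation in Anthropic's crosscoder write-up. The key assumption is that both models are run on the same input text, so each token position yields a pair of residual-stream vectors (one from the base model, one from the IT model). Every latent has an encoder vector and a decoder vector per model: it reads from both residual streams and reconstructs both. The function names, toy dimensions, and random stand-in activations are hypothetical, and the actual training loop (minimizing this loss with gradient descent over many paired token positions) is not shown.

```python
import math
import random

def relu(x):
    return x if x > 0.0 else 0.0

def crosscoder_forward(acts, W_enc, b_enc, W_dec, b_dec):
    """Forward pass for ONE token position (hypothetical helper).

    acts[m]        : residual-stream vector of model m (e.g. 0 = base, 1 = IT)
                     on the same token of the same input text.
    W_enc[m][i][j] : weight from dim j of model m into latent i.
    W_dec[m][i][j] : weight from latent i out to dim j of model m.
    """
    n_models, n_latents = len(acts), len(b_enc)
    # Encoder: each latent sums contributions from BOTH models' activations.
    f = []
    for i in range(n_latents):
        pre = b_enc[i]
        for m in range(n_models):
            pre += sum(w * a for w, a in zip(W_enc[m][i], acts[m]))
        f.append(relu(pre))
    # Decoder: the shared latent code is decoded separately into each model.
    recon = []
    for m in range(n_models):
        out = list(b_dec[m])
        for i in range(n_latents):
            for j in range(len(out)):
                out[j] += f[i] * W_dec[m][i][j]
        recon.append(out)
    return f, recon

def crosscoder_loss(acts, f, recon, W_dec, l1_coeff):
    # Squared reconstruction error, summed over both models.
    mse = sum((a - r) ** 2
              for m in range(len(acts))
              for a, r in zip(acts[m], recon[m]))
    # Sparsity: each latent's activation weighted by the SUM of its
    # decoder-vector norms across models.
    sparsity = 0.0
    for i, fi in enumerate(f):
        norm_sum = sum(math.sqrt(sum(w * w for w in W_dec[m][i]))
                       for m in range(len(W_dec)))
        sparsity += fi * norm_sum
    return mse + l1_coeff * sparsity

# Toy usage: 2 models, d_model = 4, 8 latents (the replication used 16,384).
random.seed(0)
N_MODELS, D_MODEL, N_LATENTS = 2, 4, 8
W_enc = [[[random.gauss(0.0, 0.1) for _ in range(D_MODEL)]
          for _ in range(N_LATENTS)] for _ in range(N_MODELS)]
W_dec = [[[random.gauss(0.0, 0.1) for _ in range(D_MODEL)]
          for _ in range(N_LATENTS)] for _ in range(N_MODELS)]
b_enc = [0.0] * N_LATENTS
b_dec = [[0.0] * D_MODEL for _ in range(N_MODELS)]
# Random stand-ins for base/IT activations on the SAME token.
token_acts = [[random.gauss(0.0, 1.0) for _ in range(D_MODEL)]
              for _ in range(N_MODELS)]
f, recon = crosscoder_forward(token_acts, W_enc, b_enc, W_dec, b_dec)
print(crosscoder_loss(token_acts, f, recon, W_dec, l1_coeff=1e-3))
```

Because each latent keeps a separate decoder vector per model, comparing the two decoder norms after training is what lets the crosscoder flag features as shared or specific to one model.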
