LESSWRONG

Santiago Aranguri

Posts

Wikitag Contributions

Comments

SAE on activation differences
Santiago Aranguri, 2mo

This is definitely a promising next direction. One lesson from working on the diff between the chat and base models is that the difference is not 'localized' enough: the two models differ in too many ways. Comparing checkpoints that are closer together can improve on this.

No wikitag contributions to display.
SAE on activation differences (44 karma, 2mo, 3 comments)
Tied Crosscoders: Explaining Chat Behavior from Base Model (9 karma, 5mo, 0 comments)