LESSWRONG
Santiago Aranguri
Comments
SAE on activation differences
Santiago Aranguri · 3mo

This is definitely a promising next direction. One lesson from working on the diff between the chat and base models is that the difference is not 'localized' enough: the two models differ in too many ways. Taking checkpoints that are closer together could improve on this.

Posts

SAE on activation differences (44 karma, 4mo, 3 comments)
Tied Crosscoders: Explaining Chat Behavior from Base Model (9 karma, 7mo, 0 comments)