LESSWRONG

Jannik Brinkmann
138000

Posts

Sorted by New
Evaluating Sparse Autoencoders with Board Game Models (38 karma, 10mo, 1 comment)
Interpreting Preference Models w/ Sparse Autoencoders (75 karma, Ω, 1y, 12 comments)
Finding Backward Chaining Circuits in Transformers Trained on Tree Search (50 karma, 1y, 1 comment)
Improving SAE's by Sqrt()-ing L1 & Removing Lowest Activating Features (26 karma, Ω, 1y, 5 comments)

Comments

No Comments Found