LESSWRONG

Jannik Brinkmann
140000

Posts

Evaluating Sparse Autoencoders with Board Game Models (38 karma, 1y, 1 comment)
Interpreting Preference Models w/ Sparse Autoencoders (75 karma, Ω, 1y, 12 comments)
Finding Backward Chaining Circuits in Transformers Trained on Tree Search (52 karma, 1y, 1 comment)
Improving SAE's by Sqrt()-ing L1 & Removing Lowest Activating Features (26 karma, Ω, 1y, 5 comments)