LESSWRONG

Amirali Abdullah

Posts

Steering Language Models in Multiple Directions Simultaneously (18 karma, 5mo, 0 comments)
Backdoors have universal representations across large language models (16 karma, 10mo, 0 comments)
Early Experiments in Reward Model Interpretation Using Sparse Autoencoders (18 karma, 2y, 0 comments)