LESSWRONG
lisathiergart
Posts
Sorted by New
ActAdd: Steering Language Models without Optimization (Ω, 95 karma, 19d, 3 comments)
Open problems in activation engineering (Ω, 39 karma, 2mo, 2 comments)
Distillation of Neurotech and Alignment Workshop January 2023 (47 karma, 4mo, 7 comments)
Steering GPT-2-XL by adding an activation vector (Ω, 392 karma, 4mo, 94 comments)
Maze-solving agents: Add a top-right vector, make the agent go to the top-right (Ω, 101 karma, 6mo, 17 comments)