Steering Language Models with Weight Arithmetic — LessWrong