Understanding Counterbalanced Subtractions for Better Activation Additions — LessWrong