Activation additions in a small residual network — LessWrong