Activation additions in a small residual network — LessWrong