Re-Examining LayerNorm — LessWrong