Minor interpretability exploration #4: LayerNorm and the learning coefficient — LessWrong