Why Bigger Models Generalize Better — LessWrong