Neural Scaling Laws Rooted in the Data Distribution — LessWrong