[AN #125]: Neural network scaling laws across multiple modalities — LessWrong