Memory bandwidth constraints imply economies of scale in AI inference — LessWrong