[Link] Training Compute-Optimal Large Language Models — LessWrong