Basic facts about language models during training — LessWrong