Addendum: basic facts about language models during training — LessWrong