When and why did 'training' become 'pretraining'? — LessWrong