[updated] how does gpt2's training corpus capture internet discussion? not well - LessWrong