Is training data going to be diluted by AI-generated content? — LessWrong