[ Question ]
Are LLMs being trained using LessWrong text?

by Cedar · 2nd Jul 2025 · 1 min read

3 Answers, sorted by top scoring

avturchin · Jul 02, 2025

Yes. They can generate a list of comments on a post, giving the correct names of prominent LessWrongers and matching each commenter's typical style and topics.


Gordon Seidoh Worley · Jul 02, 2025

Experimentally, Claude knows details about things I specifically wrote on Less Wrong, as well as other Less Wrong content, without doing a web search. I'm fairly confident Less Wrong posts are in its training set and were not obtained from mirrors hosted elsewhere.


Cedar · Jul 02, 2025

LessWrong scrape dataset on Hugging Face, by NousResearch:

https://huggingface.co/datasets/LDJnr/LessWrong-Amplify-Instruct
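
For anyone who wants to inspect what that dataset actually contains, here is a minimal sketch using the Hugging Face `datasets` library. The dataset ID is taken from the URL above; the split and column names are whatever the dataset defines, so the sketch prints them rather than assuming a schema.

```python
# Minimal sketch: load the linked dataset and look at its structure.
from datasets import load_dataset

ds = load_dataset("LDJnr/LessWrong-Amplify-Instruct")
print(ds)                     # available splits and column names
first_split = next(iter(ds))  # take whichever split comes first
print(ds[first_split][0])     # one record, to see the actual fields
```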

1 comment, sorted by top scoring
Viliam

Potentially good news is that we might contribute to raising the LLM sanity waterline?

Makes me wonder, when LLMs are trained on texts not just from LW but also from Reddit, is the karma information included? That is, is upvoted content somehow considered more important than downvoted, or is it treated all the same way?

If it is all the same, maybe the datasets could be improved by removing negative-karma content?
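
A minimal sketch of that filtering idea, assuming each record carries a numeric `karma` field (a hypothetical name; real scrapes may or may not record scores):

```python
# Minimal sketch of the filtering idea: drop negative-karma records before training.
# `karma` is a hypothetical field name, not something the dataset above is known to include.
def drop_negative_karma(records):
    """Keep only records whose karma is zero or positive (or unrecorded)."""
    return [r for r in records if r.get("karma", 0) >= 0]

posts = [
    {"text": "upvoted post", "karma": 42},
    {"text": "downvoted post", "karma": -7},
    {"text": "post with no score recorded"},
]
print(drop_negative_karma(posts))  # the -7 record is removed
```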

Cedar (question body)

I wonder if there's clear evidence that LessWrong text has been included in LLM training.

Claude seems generally aware of LessWrong, but it's difficult to distinguish between "this model has been trained on text that mentions LessWrong" and "this model has been trained on text from LessWrong".

Related discussion here, about preventing inclusion: https://www.lesswrong.com/posts/SGDjWC9NWxXWmkL86/keeping-content-out-of-llm-training-datasets
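
One way to probe that distinction is to test for verbatim memorization: feed a model the opening of a known LessWrong passage and measure how closely its continuation matches the real text. A minimal, model-agnostic sketch follows; the `complete` callable and the passages are placeholders you would supply yourself.

```python
# Minimal sketch of a memorization probe. High similarity across many passages
# suggests the text itself was in training, not just mentions of it.
# `complete` is a placeholder: any function that takes a prompt string and
# returns the model's continuation as a string.
from difflib import SequenceMatcher
from typing import Callable

def memorization_score(prompt: str, true_continuation: str,
                       complete: Callable[[str], str]) -> float:
    """Return a 0..1 similarity between the model's continuation and the real one."""
    model_continuation = complete(prompt)
    return SequenceMatcher(None, model_continuation.strip(),
                           true_continuation.strip()).ratio()
```

High scores on obscure, low-traffic posts are more informative than on widely quoted ones, since well-known passages are mirrored in many other places.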