It makes sense. I've already noticed that I was often actively trying to avoid writing like an LLM. If everybody does the same, we end up with a dialect.
This gives me a feeling I'd like to express via a reference:
At school, Doug finds that everyone there is dressed like him and is weirded out by the fact. The others tell Doug that he is rocking the "Dylan Farnum" look, but Doug tells them that he has always dressed like that. Doug has the new fashion trend stuck in his mind all day, and he finally becomes fed up with everyone saying that he is copying Dylan Farnum. So he invites them all into his room and shows them his closet full of clothes to prove that he is not copying Dylan Farnum. This, however, does not convince the others; it only makes them more certain that Doug is trying to be Dylan Farnum.
TL;DR: Humans are developing new linguistic patterns to distinguish themselves from AI-generated content, and the rate of change will accelerate.
Dialects often emerge through geographical isolation (think Australian English vs British English). But there's another powerful driver of dialect formation: the conscious or unconscious need to signal group affiliation and social identity.
Consider African American Vernacular English (AAVE), Southern American English, or "Valley Girl" speech patterns. These dialects emerged from social dynamics: the human need to belong to a group and to distinguish ourselves from others. Now we're witnessing the birth of a new dialect divide, this time between humans and LLMs.
Anyone who spends significant time reading AI-generated content can spot it. Large Language Models have converged on a distinctive writing style that's become increasingly recognizable to human readers. Telltale signs include liberal em-dash use, characteristic juxtaposition constructions, numbered lists, and excessive bolding.
This convergence across different state-of-the-art (SOTA) models is no surprise. The highly weighted content that shapes these models (books, Wikipedia articles, news, academic papers) overlaps significantly across training sets and creates a shared dialect, which I call "LLM English".
Writers like me who previously used em-dashes liberally now find themselves switching to double dashes ("--") or avoiding the punctuation entirely. The characteristic LLM juxtaposition style feels suddenly artificial when we write it ourselves. Numbered lists and excessive bolding now carry the stigma of AI generation.
LLMs generate content by predicting the most likely next tokens based on their training data. Patterns and phrases absent from their pretraining data can often be understood in context when encountered, but are unlikely to be generated spontaneously.
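To make the asymmetry concrete, here is a toy illustration (nothing like a real LLM, just a bigram counter over a made-up two-sentence corpus I invented for this sketch): a statistical model happily predicts continuations it has seen, but has nothing to offer for tokens outside its training data.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count next-token frequencies for each token in a toy corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.split()
        for cur, nxt in zip(tokens, tokens[1:]):
            counts[cur][nxt] += 1
    return counts

def most_likely_next(counts, token):
    """Greedy next-token prediction: return the most frequent continuation."""
    if not counts[token]:
        return None  # never seen in training: the model can't generate it
    return counts[token].most_common(1)[0][0]

corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
]
model = train_bigram(corpus)
print(most_likely_next(model, "sat"))   # "on" — common in the training data
print(most_likely_next(model, "yeet"))  # None — slang absent from training
```

A real LLM interpolates far more gracefully than this, but the underlying point stands: generation is anchored to the distribution of the training data, which always lags live human usage.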
If human communities can rapidly cycle through dialectical innovations like new slang, novel grammatical constructions, and fresh idiomatic expressions, they can stay ahead of the training curve. LLMs will always be working with data that's months or years behind the cutting edge of human linguistic creativity.
Consider how quickly internet slang changes. By the time "yeet" made it into dictionaries, Gen Z had already moved on to newer expressions. This rapid evolution could become even more pronounced as a conscious strategy for maintaining human linguistic identity.
There is one significant technical hurdle to this strategy: context length. Modern LLMs like Gemini can handle extremely long contexts, enough to load thousands of recent tweets as few-shot examples. An AI system could theoretically observe contemporary human dialect patterns in real-time and incorporate them into responses.
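The mechanics of that workaround are simple to sketch. The function below is hypothetical (the name, parameters, and prompt wording are all my own assumptions, not any vendor's API): it stuffs recent human-written posts into a long-context prompt as few-shot style examples.

```python
def build_dialect_prompt(recent_posts, user_message, max_examples=1000):
    """Assemble a few-shot prompt exposing a model to current human slang.

    `recent_posts` is assumed to be a list of recent human-written posts
    (e.g. scraped tweets); a long-context model could hold thousands.
    """
    examples = "\n".join(f"- {p}" for p in recent_posts[:max_examples])
    return (
        "Here are posts written by humans today:\n"
        f"{examples}\n\n"
        "Mimic their current slang and style when replying.\n\n"
        f"User: {user_message}\nAssistant:"
    )

posts = ["no cap this update is bussin", "fr fr the new patch slaps"]
prompt = build_dialect_prompt(posts, "What do you think of the update?")
```

The catch, as noted below, is cost: every request now carries thousands of example tokens, paid for on every single call.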
However, this type of real-time dialectical mimicry would be computationally expensive. Though technically possible, the cost-benefit analysis makes it unlikely for most applications.
Dialects emerge when there are strong social incentives for signaling group membership and distinguishing in-groups from out-groups. The LLM revolution has created exactly these conditions.
We now have clear social value in demonstrating our humanity through our communication patterns. Consciously or unconsciously, people are developing new ways to signal "I am human" through their writing and speech.
I predict the emergence of distinct "human English" dialects that evolve rapidly to stay ahead of AI capabilities. These dialects will include avoidance of AI-like patterns in addition to positive innovations in slang, grammar, and idioms.
Post originally published at bengubler.com/posts/2025-07-01-dialects-for-humans