TL;DR: Humans are developing new linguistic patterns to distinguish themselves from AI-generated content, and the rate of change will accelerate.
How Dialects Form
Dialects often emerge through geographical isolation (think Australian English vs British English). But there's another powerful driver of dialect formation: the conscious or unconscious need to signal group affiliation and social identity.
Consider African American Vernacular English (AAVE), Southern American English, or "Valley Girl" speech patterns. These dialects emerged from social dynamics: the human need to belong to a group and to distinguish ourselves from others. Now we're witnessing the birth of a new dialect divide, between humans and LLMs.
The LLM Dialect is Real
Anyone who spends significant time reading AI-generated content can spot...
A simple solution to the problem is ensuring that the output of an LLM aligns with a specified schema.
It's already possible to do this. Want to give an LLM only three "valid" options to choose from? Then define an output type with exactly those three options, using a tool like dottxt-ai.github.io/outlines.
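To make the idea concrete, here is a minimal sketch in plain Python. It is not the Outlines API. It only illustrates what "an output type with three valid options" means, using the standard library. The `Verdict` labels, `parse_verdict`, and `choose_constrained` are hypothetical names made up for this example, and the `scores` dictionary stands in for whatever per-option scores a real model would expose.

```python
from enum import Enum


class Verdict(Enum):
    # Hypothetical three-option schema; the labels are placeholders.
    APPROVE = "approve"
    REJECT = "reject"
    ESCALATE = "escalate"


def parse_verdict(raw: str) -> Verdict:
    """Accept the model's text only if it is exactly one of the allowed values."""
    cleaned = raw.strip().lower()
    try:
        return Verdict(cleaned)
    except ValueError:
        allowed = ", ".join(v.value for v in Verdict)
        raise ValueError(f"{raw!r} is not one of: {allowed}") from None


def choose_constrained(scores: dict[str, float]) -> Verdict:
    """Pick the highest-scoring allowed option, ignoring everything else.

    `scores` is an assumed stand-in for per-candidate model scores.
    """
    allowed = {v.value for v in Verdict}
    candidates = {k: s for k, s in scores.items() if k in allowed}
    if not candidates:
        raise ValueError("no allowed option was scored")
    return Verdict(max(candidates, key=candidates.get))
```

The first function validates after the fact and rejects anything outside the schema. The second is closer in spirit to constrained generation: options outside the schema are never eligible, so the result is always one of the three, however highly the model scored something else.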
In many ways, I think this is analogous to how legal systems enumerate only a few valid ways of adjudicating a crime, out of a theoretically infinite decision space.