I then draft section by section with Claude
I am curious about this sentence, which could mean anything from "I have Claude come up with a few section headers and one line summaries of what I ought to write for the section" to "I tell Claude to give me ten paragraphs on [topic] and set it loose."
TL;DR: What is slop, and why? Is it fundamental? Is it in the room with us right now? And, most importantly, how do we exorcise it?
Previously in this series: This Week In Fashion and On Automatic Ideas
A potential post for this Substack starts when I pick up an idea by talking to a smart person or revisiting an evergreen topic. The idea then simmers for weeks before, with help from Claude, I run experiments or work out the argument behind the idea. If the idea survives some mild red-teaming[1]. I then draft section by section with Claude[2], and eventually the post lands here.
The fact that I am using AI to write should not come as a shock - it is essentially the premise of this blog. Hence, I will not apologise. Okay, no, changed my mind, I will apologise. Some recent posts contained some very sloppy language that some of you rightly noticed. Mea culpa.
In the grand scheme of things, I do not think this is a huge deal[3]. Every post is a choice between publishing something imperfect and not publishing at all, and I force myself to publish[4]. Still, you deserve prose that doesn’t physically hurt. So I tried to fix it, and here’s what I learned.
Inside you there are two slops
“Slop” has become the standard word for unwanted, low-effort AI-generated content. The central claim of this essay is that the word conflates two distinct phenomena. The first is bad thought, i.e. writing that is superficial, contradictory, or incoherent, a problem that long predates language models. AI accelerated the speed at which a half-formed idea can become a clean, confident-looking page, putting more polished-looking bad thought in circulation[5].
The second phenomenon is a specific register, the recognisable style that is adopted by AI. The early, lexical tells have graduated into memes, and the more recent tells come in the form of recognisable cadences.
Independent of the input, the slop machine turns everything (good thought and bad thought alike) into beautifully glazed… donuts? Not sure why nanobanana decided to make the output donuts. I’m rolling with it though. I’ve set out to fix sloppy text, not the whole extended multimodal universe.
These two phenomena, the bad thought and the specific register, thus co-occur[6] in AI writing and are collectively called slop. Anyone who wants AI to write well enough to regain the trust of their readers needs two separate interventions. 1) You have to keep the input you give the model from being bad thought to make it worth the reader’s time, and 2) you have to remove the statistical tells of the register to get the reader to pick up the text in the first place[7].
De-slopping
Here is the recipe for teaching a model any new skill in four steps.
The rest of this section runs a single paragraph through that recipe.
The worked example is the second paragraph of an earlier post that I received a (very thoughtful!) reader message about:
At least the two highlighted phrases should jump out at you as pretty sloppy. I believe the content is good[8], so let’s preserve that as a bullet brief.
The obvious first move from the bag of tricks is to sample the paragraph five times from the same brief and let a model grader pick the best of the five. I prompted the grader as follows:
I felt pretty good about this grader prompt! (Except for the part where the rationale comes after the verdict, but most of the thinking happens before the submission so it’s not a huge deal).
And here is the champion selected by the grader:
The selected version is worse than the paragraph I started with. The grader appears to reward the sloppier prose[9].
To quantify this, I blindly ranked human written text against model rewritten text. While I prefer the human-written text in 90% of cases, the LLM judge only preferred it in 5% of cases! (Unfortunately we cannot just invert the LLM judge to get 95% accuracy, since worst-of-N does not behave symmetrically to best-of-N if the underlying distribution isn’t symmetric, CDF of the maximum is Fⁿ while the CDF of the minimum is 1-(1-F)ⁿ.).
Where to go from here? Let’s use two graders[10]: The first is a deterministic slop detector that checks a fixed list of lexical and statistical register tells and scores a paragraph mechanically by counting how many of them trigger[11].
This is still hackable, of course, but we won’t optimize too hard against it.
The second grader is a panel of narrow critics rather than one judge giving a single overall verdict. Each critic hunts a single class of thought-defect against a strict rubric and reports only on that class.
Each of these needs to be run with a strong model on high effort, otherwise it’ll miss a bunch. You can $ee why people don’t u$ually do thi$.
When both run over the original text, they uncover every single literary sin committed in this short paragraph.
This might appear overly critical, but it’s actually just what a normal paragraph looks like after my former PhD advisor was done with it.
Now hillclimbing becomes possible. We impose three rules:
After six iterations the slop score is usually close to or equal zero. The critic almost always continues flagging issues, but the total number goes down a bit and the number of critical issues is almost zero.
Here is the output of the pipeline on the same paragraph as above.
I think that is pretty decent. The writing is a bit flat, but that seems preferable to the sloppy default. We can always add character later.
Is that just confirmation bias? To find out I ran a blind test on 40 held-out paragraphs. For each one I compared the loop’s output against a single-shot rewrite from a frontier model and the human original and recorded which I preferred.
An hour of my life spent labelling slop. I admit I’m not the ideal labeller, I knew some of the original human-written paragraphs (although I also got some of those wrong), and I get distracted by squirrels, and my taste isn’t great, per se. But better than nuffin!
The pipeline output clearly outperforms the unprocessed model output, and performs at chance level against the human-written baseline. That is good enough for me…
… is what I would say if there wasn’t the API bill. I burned like three hundred dollars of API credits this month on these experiments alone, and if my wife finds out I am toast[13].
A bet about fuzzy tasks
In “Lossy Self-Improvement,” Nathan Lambert argues that recursive improvement will not produce a fast takeoff. One strand of his argument is that the research that can be automated is too narrow to carry a takeoff. A model can drive a single metric up, but real research means improving many objectives in tension at once, and AI can’t do that. He illustrates that point by reference to AI Writing, which AI also can’t do.
This essay is a small piece of evidence the other way. De-slopping is a fuzzy and taste-laden task, and yet we can make quick progress with a plain eval-driven loop. AI is not fundamentally incapable of doing these tasks, they just require a bit of elbow-grease.
I will grant that fuzzy tasks are messier than crisp ones though. When evaluation is hard/expensive, then errors are hard to catch. These errors can be catastrophic in high stakes work like alignment. Figuring out a better toolkit for reliable supervision on these fuzzy tasks seems really high priority to me.
But for writing, it really isn’t that hard.
Once the argument seems solid, I ask what someone I respect would say if they thought I was wrong. About half of my ideas die right here, when I realise the argument is unsalvageable.
After that I make several more passes to tighten the flow, mostly by pushing side-arguments into footnotes or sometimes cutting them entirely. Just like this sentence, which I moved to the footnotes just now!
#worst-apology-ever
Except for the many long months where I don’t.
This is at bottom an alignment failure, since we trained models to be agreeable, and an agreeable model polishes the thought it is handed rather than telling you the thought is not worth polishing. I dream of a world where the model yells at you, ‘No, stupid hoooman, don’t you see your argument is circular!’
The co-occurrence makes it rational, in the Bayesian sense, to filter your reading list to avoid the specific register, and with it the probable thoughtlessness it signals.
If you only do the latter, then you’re just shifting the distribution and forcing the reader to learn to detect it. The reader will not be pleased.
Making good content (/avoiding bad thought) is arguably the hard part, but it’s less mysterious. You kind of just have to think really hard about stuff and make sure all your arguments are coherent and reasonably novel. AI can do a lot of that part too, you just have to tell it to do that.
One hypothesis for why: The grader was trained on the register it is supposed to be filtering, so it reads that register as good writing, the way a fish has no concept of water.
One to fix the AI slop register, one to combat bad thought.
This essay lands at a slop score of 11.21 instances of slop per 1k words
The remaining nit that the pipeline flags here is that traits can’t actually ‘see’ things, which is true.
The price will come down as the models get cheaper. But the unglamorous truth is that finding the frontier of what these models can do is, for now, just kind of expensive.