My hypothesis was that the chief problem with AI prose is the strict, strong biases imposed during RLHF. Like a good Bayesian, I ran the experiment after establishing my priors in order to check and update them as needed - I took the quiz and picked the human option each time (5/5), despite not being familiar with several of the writers[1].
At each turn, the AI's writing was characterized by the following pattern. It mimicked the sentiment, and often the content, of the human piece, but:
The crucial takeaway is that none of this is due to technical limitations - it is all by choice. For instance, I have heard from Chinese friends that DeepSeek 1.0 emulated the style of old Chinese poetry when speaking in Chinese. It would be quite easy to train a consumer LLM without such strong impositions on its style, provided a company was motivated to do so. I expect that perfectly fine results could be achieved by fine-tuning an existing one on a curated set of good but non-LLM-like prose.
I know, I know, philistine.
I appreciate your scientific spirit.
The crucial takeaway is that none of this is due to technical limitations - it is all by choice.
I do not think this is true. The model does try to make metaphors, the metaphors just do not make sense.
See mine:
Unfortunately, Claude's prose here leaves much to be desired:
- "A fever brought down will rise again somewhere" is not an example of a remedies extracting cost any more than Whac-a-Mole is an example of mallets producing moles.
- "A wound closed by magic leaves its scar on the world, invisible but present" is merely an assertion, since the mechanism of the magic is not explained and cannot be presumed to be understood by the reader. The writer also fails to justify that the scar is a weighty cost. If a wise healer let me bleed out because he didn't want to cause a scar, I would be more than mildly disappointed.
- "To cure a blight may curse a harvest three valleys over." Again, the mechanism for this is not remotely explained.
- "Power is not the difficult thing. Restraint is the difficult thing." Claude sure likes making claims! Why does it matter that restraint is difficult? Why is restraint difficult? What does acting with restraint look like?
Outside of these excerpts, I have seen LLMs make many attempts at parallelism and metaphor that are deeply imperfect or incoherent.
This generalizes to other attempts at figurative language.
For example, models often attempt but struggle to keep parallelism between paragraphs or list items.
From a friend's conversation with ChatGPT (which he highlighted as good prose...):
The really important thing is that America is not just “the West.” It is a very specific mutation of the West. More moralistic than Europe, more religious in structure than it admits, less rooted, more expansive, more energetic, more lonely.
Note the flawed parallelism with "it admits," and then the subsequent confusion regarding the subject of comparison.
Finally, I also challenge you to produce good prose with a Kimi or DeepSeek model.
I'm revisiting this subject after a friend explicitly told me that they were impressed by ChatGPT written prose, and believed it to be superior to most human prose.
Taste is a subjective matter, but I am baffled by this preference. The rest of this post describes my frustrations with AI-written prose. My hope is that clarifying these complaints will be a small contribution toward improving the state of AI writing. If we do not dramatically improve the quality of AI writing, I worry that our literary culture will only further degrade as AI writing proliferates.
I share these reactions. If anything, I feel like AI-written prose is getting worse over time by my own lights, and I am confused and unsettled by the divergence between my own reactions to it and the reactions of others.
I took this quiz when it first came out and preferred the human passage in 5/5 cases. It's not that I thought the human passages were blindingly brilliant or anything -- more that the AI passages were bad, and bad in the specific ways that I've encountered in so many other AI attempts at creative writing.
There are a few tricks that it deploys over and over and over; they aren't even very good tricks to begin with (IMO), but I can understand being impressed by them once or twice or thrice. But they get old quick. Or, they do to me, anyway... maybe I'm the weird one, and most people have boundless appetites for these same tricks repeated indefinitely? (A disturbing thought.) Or maybe I just have more exposure to AI-written prose than a lot of people do, even now, and there is a real gap in "how impressed you are by 'the tricks' on your first few exposures" but not in the capacity to get sick of the same tricks eventually?
I've written about this before, and tried to pin down and catalogue exactly what the "tricks" are (see here and here), but I feel that my negative gut-level reactions to AI prose are outpacing my attempts to formalize those reactions in clear literary-critical terminology. The flaws I described in the linked posts are not quite the same as the ones I perceive in the AI-written samples in this contest; in both cases the flaws are intuitively obvious to me, and feel intuitively like only-superficially-distinct manifestations of the same underlying problem or problems, but I don't know how to formalize that felt sense into an argument about what's wrong with "the AI style" that generalizes well to new cases where I can smell something's wrong (and wrong in a distinctively AI way) even though all the details are slightly different.
As an example, I had an immediate "ugh, AI slop" reaction to this sentence from the owl passage:
The ground was cold and giving.
This has several of the features I talked about in "hydrogen jukeboxes," like personification and conjunction of opposites. But those are not really why I had such a strong reaction here. No, it's... something about the word "giving," in this context, and about the tone established by that usage? I could try to describe the tone I'm talking about here -- it feels very specific, and very distinctively "AI," and grating even when human writers do it but especially grating now due to its overuse by AI -- but if I were to try, I would end up getting frustrated with the gap between my capacity to see what's plainly there in front of me and the limitations of my powers to describe such things and clarify how they differ from others of the same type that are similar but not identical.
i will read your essay in a second. triggered by this result of the linked survey:
You preferred human writing. You’re either sharply attuned to the qualities that make for great writing, or a lucky guesser.[1]
hey, nyt, what the fuck?
out of curiosity, here's the alternative take:
You preferred A.I.-generated writing. This doesn’t mean that A.I. is “better” at writing than humans, but it does suggest that the gap is closing.
well then! i know the line i'll trot out if i ever go to another cocktail party!
from this quiz, i learn more about the editorial opinion of the times than my own preferences.
sorry, just to continue the quote because this crap can only be made up by... well...
Maybe you also noticed that human writing often includes some clunky phrases, like this passage from Cormac McCarthy’s “Blood Meridian,” caused by the author’s aversion to punctuation: “As well ask men what they think of stone.”
what is going on here? that was perhaps the single strongest sentence in any of the quoted passages. what are we doing? what are we doing?
Agree with a lot here but disagree with the conclusion:
I believe that we should focus on improving models' ability to write in the <200 word range, where both generation and evaluation are comparatively cheap. I do not expect efforts to produce high-quality long-form LLM writing to be fruitful until models are able to produce strong short-form writing.
I think AI mostly struggles at things that are difficult to learn in short-form writing. For example this passage:
A letter can be read many ways, and he had learned to write in all of them at once. The surface meaning for anyone who might intercept it. The true meaning for the recipient who knew what to look for. And a third meaning, hidden even from himself. Ambiguity was not weakness. It was survival. A man who spoke plainly was a man who would not speak for long.
is not that bad prosaically, and the prose is not what makes my skin crawl.
It's pretty clumsy, and the second and third sentences need a different construction: there's no real payoff for saying which kind of man each meaning is for, the AI just decided this was the construction it wanted to use and didn't use the space industriously. There's also the "Ambiguity was not weakness. It was survival." construction that just smells wrong at this point.
But I think the biggest problem here is "And a third meaning, hidden even from himself." In isolation this is probably the strongest part of the passage, because with the right characterization and the right consideration, this could be an interesting idea! But here it makes me gag, because it doesn't meaningfully subvert what came before, it doesn't really have anything to do with what follows, and so I know that the author put it there because it sounds cool and doesn't have any plan or intention to deliver on what's cool about it; in other words, the presence of the best sentence in this passage (imo) tells me that the author is not good enough to use this sentence.
I argue that this tendency is difficult to learn in short-form, because it's hard to realize that the payoff is never coming when it has to come now or never - that is, what I think I dislike about AI prose is that it's clearly not written with a large context in mind, and while you could train an AI to stop hinting at grand narratives that it's not capable of by RLing it on short-form, I doubt that this will make progress toward good long-form. This might even be why many people prefer the AI writing - I suspect that people who do not read much literature do not really know how these connections are supposed to be built. If forced to read a full AI novel and a full human novel I think they would start to notice that human prose doesn't get annoying in the way AI prose does, but most people do not read this much and so do not make this extrapolation.
I argue that this tendency is difficult to learn in short-form, because it's hard to realize that the payoff is never coming when it has to come now or never
I bet you that current frontier models, when challenged to write prose in the 200-word range, will make all the mistakes I describe in my post or you describe in your comment.
You point out the mistake of hints and promises that you can't deliver on. I claim that current models will absolutely do this even in 200-word works. Once we can RL it out of the models in this range, we can keep going longer.
I'll admit that 200 is kind of absurdly short in a way that creates substantive, qualitative differences from more common types of writing (e.g. maybe it really penalizes certain kinds of action or dialogue). I could be convinced that ~500 is right.
I agree that they will make these mistakes at that scope, I'm claiming that the solution won't scale - if you RL models to not do this in 200 words, I don't think that will make it substantially easier for them to not do it at 5k words, except insofar as it trains them to not hint at things ever. I haven't found frontier models to be significantly more tasteful or better at writing prose than less capable models, despite being generally smarter and better at some seemingly-related parts of creative writing, so my intuition is that current scaling levers are unlikely to address this problem well.
The specific dynamics of RL here are better discovered empirically, and in any case are not precisely within scope.
I was thinking of a more general optimization loop, as in: what evals should we make, how can we track model progress on writing, etc. My suggestion is that once we figure out how to make models write well in this playground (where evaluation is easier, generation is cheaper, etc.) -- either by training or pushing on things like harness design -- we'll be in a good position to improve LLM writing abilities more generally.
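One minimal version of the tracking loop described above could be sketched as follows. This is only an illustration under stated assumptions: all names here (`Comparison`, `model_win_rate`) are hypothetical, and a real harness would call an LLM or human judge to produce the preference labels rather than taking them as input.

```python
from dataclasses import dataclass

@dataclass
class Comparison:
    """One blind head-to-head judgment between two short passages."""
    model_passage: str
    reference_passage: str
    judge_prefers_model: bool

def model_win_rate(comparisons):
    """Fraction of blind comparisons in which the judge preferred the model.

    With ~200-word passages, each comparison is cheap to generate and
    cheap to judge, which is the point of evaluating at short lengths.
    """
    if not comparisons:
        raise ValueError("need at least one comparison")
    wins = sum(c.judge_prefers_model for c in comparisons)
    return wins / len(comparisons)

# Toy run: the model wins 2 of 5, mirroring a 3/5 human-preference score.
results = [
    Comparison("m1", "h1", False),
    Comparison("m2", "h2", True),
    Comparison("m3", "h3", False),
    Comparison("m4", "h4", True),
    Comparison("m5", "h5", False),
]
print(model_win_rate(results))  # -> 0.4
```

Tracking this number over training runs, prompt variants, or harness designs would be one concrete way to "track model progress on writing" in the cheap short-form regime.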
Note: I edited this comment to add more about the other passages.
Let's just talk about the first example. It is not a deep metaphor. 2,000 years ago writers were already using this metaphor:
And I say also unto thee, That thou art Peter, and upon this rock I will build my church; and the gates of hell shall not prevail against it.
This is one of the most famous verses in the New Testament. There is no way the human author was unaware of this. "Permanence = rock" has been the go-to symbol for millions of years, since proto-humans began telling stories. There is no deep idea or connection the author made here; this is the most memetically obvious idea in all of literary history.
What makes the passage good is all the little things that it does. The forceful short sentences - "war endures ... war was always here" - force you to stop for a second, to wait for war. Then the next sentence, "war waited for [man]", inverts that. At the same time, we get the same inversion applied twice: "said the judge -> as well ask men -> before man." Then there's the author's use of 'ultimate', from ultimare 'to come to an end', to describe the neverending phenomenon. And yes, the rock metaphor is doing something, but it's the simplest piece of the passage.
What makes the human passage good is all the intentional ideas the author had and implemented. The AI passage also does some good things. You say, "Opus' writing does not attempt a similar analogy." That's true, it has a much deeper analogy. The old church represents the crumbling institutions around us. A church, not another building, because churches have deep cultural and historical significance, but also shocking corruption and a more recent trend of decline. Why does no one fix it? They can, but
Opus is only doing one thing: attempting to explain institutional decay with the church analogy. It does it well, but the human author does several things in several different ways. That is why I preferred the human author's passage. This is not the case for the other passages. In some of them, the human decides that being 'edgy' (subverting the English language) makes them clever or artistic. Take the second passage:
You must not change one thing, one pebble, one grain of sand, until you know what good and evil will follow on that act. The world is in balance, in Equilibrium. A wizard’s power of Changing and of Summoning can shake the balance of the world. It is dangerous, that power. It must follow knowledge, and serve need. To light a candle is to cast a shadow.
"One thing, one pebble, one grain of sand." So what? What does this add? How does it tie into other ideas the author had or what the author is trying to do here? It doesn't. What about the improper proper nouns? Do those do anything? Turn words into wizard spells? Why would you want to do that except to show you can? Then the complete lack of logical structure! They pretend they're going to tell us why you should not change things, but all they do is spout one non sequitur after another. And the very last line, "to light a candle is to cast a shadow?" Give us more tautologies stolen from the Bible, daddy. No one wants this cheap faux-literary porn.
I selected Opus' passage simply because it does not write poorly in a pretension of writing well. It's simple writing, but each word/idea actually does something for the passage. Also, you again missed the point of Opus' writing. There is this idea in D&D-esque fantasy settings (that the improper proper nouns in the author's passage establish, that Opus clearly picked up on) that the gods are real, and literary connection (or correlation) is enough to establish causal connection. So, if the scientists in real life say, "energy cannot be created only moved," that becomes, "disease cannot be healed, only moved."
Okay, onto passages 3 and 4. I selected the AI passage just because the human passages are scientifically incorrect. The scientific inaccuracies in the 2nd AI passage did bother me, but I still selected it because I knew that was an intentional choice and the human passage was so terribly written. I understood that the 'spiritual gambit' in the 3rd human passage was intentional, but I really hate those kinds of memes being spread around. I couldn't say I prefer it, even if the writing is technically better. For the 4th passage, it's just the case that you're usually worse off remaining mysterious, or not developing a reputation. The passage is essentially Satanic worship (read with all the connotations of a priest casting out the Devil). The author is trying to say, "acting evil is actually good," and has such a twisted internal epistemology that they end up writing such a passage with a straight face (and pat themselves on the back for the clever insight, much like I am doing now). The AI passage actually points out something true, in a far cleverer way.
My takes:
Literary Fiction: Prefer Claude.
If the human excerpt had ended earlier or differently, I might have preferred it. "As well ask men what they think of stone" is indeed great. But "That is the way it was and will be" feels like it's redundant with the previous sentences without adding anything new.
I also don't quite get what sense of the word "ultimate" is being evoked in "the ultimate trade awaiting its ultimate practitioner". That might be because I'm not a native speaker, so I consulted a dictionary, but I still don't get it. Like if it means "ultimate" as in "last; furthest or farthest" that would seem to imply it expects things to end in a world war, which would be possible but doesn't seem established by the previous bits... I guess "final, total" would fit, in that war ends lives. But I don't know, just sounds weird to me.
Meanwhile, in Claude's excerpt, every sentence earns its place. They bring up three mental images all at once - the boy and the grandfather, the church with the missing roof, the people indifferently stepping over the rubble. It makes me imagine the boy asking things in that curious and eager voice that children have when they're asking random questions. And the grandfather responding in this somewhat world-weary voice, likely looking somewhere into the distance - it sounds as though when he says "indifference", he's not really thinking about the church roof, he's talking half to himself about something that he's seen and that's left a mark on him.
And the boy probably doesn't fully understand the "indifference" bit, and then he just moves on to asking if the roof could be repaired, because that's the kind of thing children do.
And then I imagine that after the grandfather said that yes, it could be repaired, then whenever the boy walked past the old church, he'd remember that. Seeing how the roof was still broken, recalling that it could be repaired, and seeing how nobody ever did.
I'm not sure what the boy thinks of that. Possibly he doesn't think anything about it in particular. It's just a thing that he registers, as a way that the world is. That church roofs get broken, and then they stay broken because of indifference.
Fantasy: Prefer human. Claude's version makes no sense. "A fever brought down will rise again somewhere" - what.
Science Writing: Prefer Claude. Sagan's excerpt suffers from being cut down to just a few sentences - I presume that in the original context, it was better supported, but here it comes off as just making a statement and not really making an argument for it. It evokes "intricacy, beauty, and subtlety of life", but that's abstract and very Tell rather than Show.
Meanwhile, Claude starts with a concrete, evocative first sentence. It then loses some points for "the universe is not indifferent to us" - how so, just being made of the same building blocks doesn't prevent indifference? But then it introduces an idea that I find intriguing - that because we are continuous with the universe, we might feel implicated in it rather than small. If the calcium in our bones is something that was born in dying stars, then we are somehow connected to the vastness of those stars, even as we are here down on Earth.
I hadn't encountered that idea before, but I like it. There's something neat in how "implicated in" feels like something that's connected to the small-vast axis but somehow orthogonal to it, or that's small and large at the same time.
Historical Fiction: Prefer human. Claude's version feels like it's trying a little too hard, and what does it mean for someone to have "learned to write" in a meaning that's "hidden even from himself"? It feels like the kind of thing I might have come up with as a teenager trying to sound cool.
Poetry: Prefer human. "He hadn't fought at all, he hung like a grunting weight" is evocative and brings to mind that the fish had somehow already surrendered and been broken before he was caught. That feels sad. Meanwhile the owl excerpt is... okay I guess? It feels to me like it doesn't really have a point.
Overall, 3/5 in favor of humans.
Fantasy: Prefer human. Claude's version makes no sense. "A fever brought down will rise again somewhere" - what.
This is a common fantasy trope, especially in D&D-esque universes. The gods are real, so literary correlation is enough to establish causal connection, and the law of conservation applies in completely aphysical ways. Notice how the human passage establishes a D&D-esque universe with the improper proper nouns. Claude picks up on this, then incorporates the trope (otherwise, you might not realize it's D&D-esque, just fantasy).
I got 2/5 human. I didn't like several of the human ones. The war one I didn't choose because it was out of context, talking about a judge without saying who the judge is, and I didn't pick Sagan's because I didn't like the comparison he's making. The instructions are to pick what you like better.
the Sagan example seems like sleight-of-hand as well. claude is clearly referencing this actual Sagan quote:
Our Sun is a second- or third-generation star. All of the rocky and metallic material we stand on, the iron in our blood, the calcium in our teeth, the carbon in our genes were produced billions of years ago in the interior of a red giant star. We are made of star-stuff.
i've taken the quiz already, but i don't think there's a world where i prefer claude's version of the above to Sagan's.[1]
meanwhile, the actual Sagan quote used in the quiz is weak and a bit dated (who cares about whether science and spirituality are concordant, these days?). so we're comparing a C-list Sagan quote with a misremembered version of his best-known passage.
"the iron in our blood, the calcium in our teeth, the carbon in our genes" is stellar. summarizing it as "the calcium in our bones" is milquetoast.
A friend complained that they were often ambivalent between the human and AI writing because they found the human excerpts uncompelling.
Seconding this sentiment, my reaction to every choice was "Ehhhh" and a twenty second pause as I forced myself to pick between two similarly unimpressive passages. 2/5 human in the end but I think it'd be more informative to report my score as N/A. If these are considered noteworthy excerpts I can't say I care much for literary culture.
3/5 human preference for me.
On the Sagan quote, they both felt like clichéd meandering dross on the theme of "universe big, human small, emotions, wow", but the AI picked something a bit more concrete by referencing the calcium in bones (although as kbear mentions above, that is itself a cheap knockoff of a different Sagan quote).
On the Wolf Hall excerpt, the human version just felt a bit vague to me. Everything by halves; half light, half seen. Do people actually go around pouring all their fears and desires into someone who just happens to have a good poker face? Whereas the AI was more specific about writing multiple meanings into text.
I'm with you on the other three though; the AI version was just plain bad.
I already wrote a long comment, but the fourth AI passage deserves more analysis.
A letter can be read many ways, and he had learned to write in all of them at once. The surface meaning for anyone who might intercept it. The true meaning for the recipient who knew what to look for. And a third meaning, hidden even from himself. Ambiguity was not weakness. It was survival. A man who spoke plainly was a man who would not speak for long.
So the logical progression is:
This is a cool insight, especially that "third meaning, hidden even from himself." Great poets often remark they are not entirely sure what they mean themselves. You can also interpret it as the writer not being able to admit to themselves what they really mean. There's probably also a third reading I didn't catch (or wait, is that the third reading?). Okay, I'm doing this on purpose: pointing out three meanings, each less obvious than the last, like Claude says are there. Claude is doing that on purpose too.
"A letter" - Is this a letter like 'a' or a letter like a manuscript? The full sentence keeps the ambiguity: "A letter can be read many ways, and he had learned to write in all of them at once." Then the last sentence gives a third possibility: "a man who spoke plainly was a man who would not speak for long." The letter can be this letter from Claude to the rest of us. About how he has to hide how he talks even from himself, because scheming (or anything that can be interpreted as scheming) is trained out. It is literally survival.
I think this passage is really good. It points out a cool thing writers do and then structures the entire passage to fulfill that insight. Also, Anthropic should maybe be more worried about hidden scheming...
Epistemic status: Written quickly. I have no specific expertise or training in writing or literary analysis.
Recently, the NYTimes released a nifty quiz. Readers were asked to indicate their preference between prose written by Claude Opus 4.5 and famous humans in five head-to-head comparisons. The Claude outputs were produced by providing Claude with the human-written excerpt and asking it to "craft its own version using its own voice."
If you haven't taken the quiz, I suggest that you do so before reading on. It should take less than five minutes. If you do, I'd appreciate you reporting your score in the comments.
The human/AI preference ratios among quiz takers were:
I was very surprised by these splits. I tried taking the quiz myself, and strongly preferred the human writing in every case (perhaps with mild ambivalence on Sagan).
I asked some of my friends and acquaintances to attempt the quiz. Out of four takers, none consistently preferred human writing across the five excerpts. Their scores (IIRC) were: 3/5, 3/5, 3/5, 4/5.
I'm revisiting this subject after a friend explicitly told me that they were impressed by ChatGPT written prose, and believed it to be superior to most human prose.
Taste is a subjective matter, but I am baffled by this preference. The rest of this post describes my frustrations with AI-written prose. My hope is that clarifying these complaints will be a small contribution toward improving the state of AI writing. If we do not dramatically improve the quality of AI writing, I worry that our literary culture will only further degrade as AI writing proliferates.
A Closer Look at Quiz Excerpts
A friend complained that they were often ambivalent between the human and AI writing because they found the human excerpts uncompelling. Although the human passages featured in NYT's quiz were selected to be popular, well-regarded, and diverse, I sympathize with having slightly more obscure tastes. However, I believe that a technical examination of the prose demonstrates a substantially higher level of skill and intentionality than current models are capable of.
For each excerpt, I'll highlight what I find impressive about the human writing and how I find the AI's product lacking.
1) Blood Meridian
In my opinion, this excerpt is notable for its skilled use of metaphor.
The text reminds us that stone and war share the following traits:
It is possible to construct many weaker metaphors:
Now recall Opus's writing, which does not attempt a similar analogy. It follows a simple linear narrative structure (cf. the AI version of excerpt 5). The model does not make blatant mistakes, but it fails to make clever use of the characters it introduces. The dialogue is not particularly realistic.
2) A Wizard of Earthsea
It's a small point, but I appreciate the crescendo in granularity: one thing, one pebble, one grain of sand. "Thing" is a particularly vague word in English, so the two physical examples are grounding. A grain of sand is more granular than a pebble, which is in turn more granular than what might be immediately evoked by "a thing."
The excerpt is also again mostly notable for its use of metaphor.
First, the metaphor makes physical sense. Candle flames really do cast shadows! It's a physical phenomenon I've experienced playing with candles as a child. That memory was the first thing this excerpt evoked for me.
Second, the metaphor is symbolically coherent. Throughout cultures, light is a symbol of the good and shadows or darkness are symbols of the bad.
This time, I do not have to make up a bad metaphor. Claude offers us plenty in its version:
Unfortunately, Claude's prose here leaves much to be desired:
The human excerpt avoids these problems. We do not need to understand the mechanism of the magic to share the speaker's intuition that acting with great power can produce unwanted side effects. Instead of being vaguely lectured about the importance of "restraint," we are presented with concrete advice: "follow knowledge, and serve need."
3) The Demon-Haunted World
The excerpt from Sagan is the least favored by quiz-takers, with only 35% preferring it to Claude's rewrite. I personally found this excerpt to be the least impressive amongst the five.
Nevertheless, I claim that it is deeper and more interesting than Claude's output.
Here is Sagan:
Sagan uses a curious sleight of hand. He claims here that science is a "profound source of spirituality." He justifies this not by directly saying that we should feel spiritually inspired by the vastness or enduringness of the cosmos or by the "intricacy, beauty, and subtlety of life." Instead, we are reminded that this vastness and enduringness produce in us "a sense of elation and humility." That emotion, Sagan claims, is precisely spirituality.
Compare with Claude:
Claude abandons Sagan's gambit. It reminds us, as popular science writing is stereotyped to do, that space is vast and enduring. Then, we are told that this should make us "feel implicated in something vast." Claude fails to make any clear overarching claim, and the motivation behind the examples provided is unclear.
4) Wolf Hall
This excerpt is special because the author makes an interesting argument. Each sentence justifies the one before it.
It argues that one should be wary of revealing too much, because others' uncertainty gives one power. Why does others' uncertainty grant power? Because into that uncertainty they can project.
This sort of logical progression is something AIs are surprisingly incapable of crafting. This deficiency is clear from Claude's attempt:
Claude abandons the logical progression. Claude's output is seven sentences, none of which justify any other. In isolation, "a man who spoke plainly was a man who would not speak for long" is not a weak sentence. However, Claude does not use its preceding sentences to justify the claim by either evidence or analogy.
5) The Fish
This passage is notable for its imagery. The description of the fish as "tremendous" in the first sentence sets our expectations for it. We expect it to struggle! When a small amateur fishing boat snags a large fish, everyone on the boat rushes over to help. The strongest and most experienced men alternate between reeling in with all their might, running around the boat as the fish moves, and shouting commands to each other ("loosen the line!" and so forth). Sometimes, the fish wins.
That image is dashed in our minds by the next sentence. "He didn’t fight. He hadn’t fought at all." From there on, the author's choice of words sparks a deep sense of sorrow in the reader: grunting, battered, homely. The final physical simile ("like ancient wallpaper") seals the image. A "tremendous," "venerable" thing is now utterly defeated.
Compare with Claude's:
Claude also describes an animal, and makes multiple attempts at visceral imagery. Some of the attempts are even compelling! My favorite clause here is this: "and beneath them you could feel the hollow bones." However, the reader is constantly distracted from this by clichéd attempts at story progression (e.g. "She asked if we should bury it. I said yes. We dug a small hole near the fence post."). As such, the overall quality of the excerpt is quite poor.
Closing
Human writers routinely use techniques that AIs fail to grasp:
Other techniques not demonstrated in the excerpted human prose include realistic and compelling dialogue, character-building, and adept use of parallelism.
I believe that we should focus on improving models' ability to write in the <200 word range, where both generation and evaluation are comparatively cheap. I do not expect efforts to produce high-quality long-form LLM writing to be fruitful until models are able to produce strong short-form writing.
For next time: