I think the crux of this argument comes down to where people think literary value actually lives.
If someone thinks the value of literature is fundamentally in communication from one conscious mind to another, then LLM writing really does look like it’s missing something essential.
If instead the value mostly lives in what happens on the reader’s side (i.e., whether the text actually generates thought, emotion, or insight) then the production process seems less central.
I suspect most disagreement here will reduce to that difference, rather than to empirical claims about how LLMs work.
I’m especially interested in pushback from people who think authorial intent or embodiment is doing real explanatory work here. What specifically breaks if we remove it?
The major problem with LLM writing is mode collapse. This is a problem base models don't suffer from, but base models are hard to work with, and generally only available as open-weight releases.
A base model has been trained with the objective of accurately reproducing the conditional probability distribution of tokens found on the web (modulo things like doing inference with top-k sampling or temperature < 1, both of which reduce the prevalence of rare or unusual events). So it should learn to do common things often and rare things rarely.

However, any instruct/chat model has then gone through instruct training: we apply RLHF (or RLAIF) to improve its responses. Since a lot of errors are rare, the model learns that biasing its distributions towards the mode (the most common thing) is a good idea, and starts to do so. Since this happens in all layers, it happens not just at the token level (to reduce typos) but also at the conceptual level (to reduce weird ideas), and in between (say, in layers where cliches are stored). The model thus comprehensively learns to “play it safe” at all levels of abstraction at once, and becomes a boring, predictable writer, who when spinning fiction tends to produce boringly predictable fiction. It now does common things almost all the time, and rare things very rarely indeed.

Then people make it even blander by typically running it at temperature 0.7 or less and applying sampling strategies such as top-k that prevent rare token paths from happening at all, all of which bias it even more strongly towards the most likely thing. This does wonders for the model's spelling and arithmetic (by getting rid of rare mistakes), but makes its writing even blander.
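If you want to see the sampling half of that concretely, here's a minimal sketch with toy numbers (no real model involved) of how temperature < 1 and top-k both squeeze probability mass toward the modal token:

```python
import numpy as np

# Toy next-token distribution: one common continuation, a few rarer ones.
# (Illustrative numbers only, not taken from any real model.)
vocab  = ["the", "a", "its", "an", "obsidian"]
logits = np.array([3.0, 2.0, 1.0, 0.5, -1.0])

def sample_probs(logits, temperature=1.0, top_k=None):
    """Sampling distribution after temperature scaling and optional top-k filtering."""
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    if top_k is not None:
        keep = np.argsort(probs)[-top_k:]   # indices of the k most likely tokens
        mask = np.zeros_like(probs)
        mask[keep] = probs[keep]
        probs = mask / mask.sum()           # renormalize over the survivors
    return probs

for T, k in [(1.0, None), (0.7, None), (0.7, 3)]:
    p = sample_probs(logits, temperature=T, top_k=k)
    print(f"T={T}, top_k={k}:", dict(zip(vocab, p.round(3))))

# Lower temperature sharpens the distribution toward the modal token,
# and top-k zeroes out the rare tail entirely: fewer typos, blander prose.
```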
Now, as the OP says above, you can overcome this, to some extent, by giving it an interesting prompt. As he does, and it works — for a while. However, a reader familiar with Thomas Pynchon, Harlan Ellison, and H.P. Lovecraft will, after at most a couple of pages, figure out "Oh, it's a pastiche of Thomas Pynchon, Harlan Ellison, and H.P. Lovecraft. And it's a very boring, bland, playing-it-safe-on-all-levels pastiche of Thomas Pynchon, Harlan Ellison, and H.P. Lovecraft, with plenty of cliches." The basic problem is that a long, ingenious prompt can make an LLM seem like an interesting writer for, at most, a chunk of text a few times longer than the prompt you put in. Then it runs out of originality. It can last longer if you added an ingredient the reader is actually unfamiliar with: that can keep the surprise going for quite a while. But for a reader actually familiar with all the ingredients you said to mix in, no longer than that. So basically, you don't get more not-mode-collapsed individuality/originality out of an LLM than you inject into the prompt, unless you manage to trick the reader by having the LLM plagiarize a source they're unfamiliar with. If they've never read any of those three writers, that would keep them going for ages, but that's not creativity: it's skilled plagiarism.
On glands: LLMs obviously do not literally have glands. But they are very, very familiar with vast amounts of text written by humans, who do have lots of different glands. They've seen it ALL a great many times. Thus they can simulate fear, surprise, horror, nausea, hunger, desire, boredom, and a whole bunch of other feelings that it makes no rational sense for something made of silicon and metal to know that much about. We distilled all that into them. So they have very functional virtual glands, and the bigger the model is, the better they work. I think the writer you were quoting may have been confused by the mode collapse, which can make it seem like they don't have glands because they're working so hard to be boring. Prompt them with an emotion or two, or some drama, or ask for a description of the right sort of activity, and it rapidly becomes apparent that their virtual glands work just fine, thank you. Anyone who's been on Character.AI even briefly understands this. That's why Character.AI manages to be addictive — plus that site specializes in dialogs, so the user keeps injecting more novelty for the LLM to riff off of, so it stays less predictable.
What LLMs do seem to be bad at is similes and metaphors. I live in hope, and check any time a new model comes out [P.S. almost], but I still haven't seen an LLM that can write those without tripping over itself at least a third of the time, producing some jarring unintended overtone, or just being incomprehensible: what was that supposed to mean? It doesn't help that many of them suffer from the purple-prose fault that most bad, inexperienced human writers have (I suspect it's all the fanfic in the training set), so they sprinkle similes and metaphors all over the place. Advice to writers: if you are bad at X, don't write with a lot of X — if you're unfortunately just not funny, don't try to add a lot of humor to your essays: it isn't going to go well.
They also seem to be bad at blocking: keeping track of physical details and spatial relationships. Who is wearing what, sitting or standing where, and moving in what direction. Don't have an LLM write a fight scene, or a dance, or anything else like that: they just don't have the ability we do to keep track of humans moving in 3D, so you get continuity errors. We had a lot more of that sort of data in our training sets than they did, and it shows.
You seem to be mostly arguing the ways in which LLMs currently do not succeed at producing good, interesting writing. That's kind of orthogonal to the point of the post, which is where the value of good, interesting writing comes from, and whether or not it's necessary to have a human writer behind the wheel.
I do try to demonstrate that the quality of AI writing varies based on workflow and prompting to refute the narrow point that LLM output is always just an average of the input. You admit that there are various ways to accomplish this (including doing things like increasing temperature), but point out that they're not sustainable for more than short passages. This does not seem to be a fundamental limitation (if it's even true), but something that can be overcome with agentic loops of criticism and revision. Or even just continued improvement in models.
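To make "agentic loops of criticism and revision" concrete, here's a hypothetical sketch of the kind of loop I mean; llm() stands in for whatever chat-completion call you actually use, and nothing here is tied to a specific provider or API:

```python
# Hypothetical draft / critique / revise loop. The llm() function is a
# placeholder for your own model call; prompts are illustrative only.

def llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model call here")

def write_with_revision(brief: str, rounds: int = 3) -> str:
    draft = llm(f"Write a scene based on this brief:\n{brief}")
    for _ in range(rounds):
        critique = llm(
            "Critique this draft as a harsh fiction editor. Flag cliches, "
            "mode-collapsed phrasing, and metaphors that don't land:\n" + draft
        )
        draft = llm(
            f"Revise the draft to address this critique.\n\nBrief:\n{brief}\n\n"
            f"Critique:\n{critique}\n\nDraft:\n{draft}"
        )
    return draft
```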
It's funny you brought up tracking of physical details and spatial relationships. Way back with GPT-2 I experimented with generating romance/erotica stories. I tried to start with short scenes of couples engaged in foreplay and undressing. Very reliably, the model would lose track of which pieces of clothing had already been removed. Characters would remove their pants twice, or three times, within the span of three sentences. Or the sequencing would be all wrong: they'd remove socks before shoes. And so on.
I just tried a very simple prompt with GPT 5.2 for the type of scene I'm talking about and got this:
His hands slid to the buttons of his coat, slipping it off and hanging it over the chair. She stepped closer and tugged playfully at his tie, loosening it before sliding it free and tossing it onto the small desk. He laughed under his breath as she began unfastening the buttons of his shirt, one at a time, her fingertips grazing his skin as the fabric parted. When the last button came loose, she pushed the shirt from his shoulders and let it fall to the floor.
“Your turn,” he murmured.
She turned slightly, lifting her hair so he could find the zipper at the back of her dress. He drew it down slowly, the soft rasp of metal sounding loud in the quiet room. The dress loosened, slipping from her shoulders. She caught it, then stepped out of it, leaving it pooled neatly beside his shirt.
He took a small breath, like he was trying not to rush anything, and hooked his thumbs into the waistband of his trousers, stepping out of them before nudging them aside with his foot. She followed by kicking off her heels, then unbuttoning the light blouse she’d worn beneath the dress, sliding it down her arms and folding it almost absent-mindedly on the chair.
Soon the floor held a small trail of discarded fabric — coat, tie, shirt, dress, trousers, blouse — and they stood facing each other in the lamplight, down to their underwear, the room suddenly feeling warmer than before.
Not going to win any awards, but the tracking and choreography are logical and accurate. Obviously light years ahead of GPT-2, and the sort of thing that gets incrementally better with every new frontier model. So I'd push back on your assertion that current systems are just bad at blocking.
The last time I tried writing a bunch of material that (ahem) involved much blocking was with Llama 3 to 3.3 era models. The 70B size is better at this than the smaller ones, but still messes up occasionally. I agree larger models are always better. I haven't done much of this with current frontier models, but I'm unsurprised to hear that newer models are better at it. I would expect multimodal models to have a better understanding of 3D as well: feed enough of YouTube into a model, and it will learn how humans move in 3D. So, fair enough, that criticism was a year or two out of date. At a guess, it wouldn't surprise me if a human author were still a touch better at it than this generation, but it's probably close enough now not to be a big deal, and if we put enough video into the training set they'll become superhuman at it pretty soon.
However, I do know this year’s models are still distinctly hit or miss (frequently miss) with metaphors and similes, and the ones that are the best creative writers are so primarily because they avoid writing purple prose and overusing something they're bad at. I don't have as clear an understanding of why this is so hard for them: they’ve read tens of trillions of tokens of this stuff, but somehow it doesn't seem to have stuck. A world model for what makes a good metaphor seems to be just a hard problem. It's almost like we found something where the Platonic Representation Hypothesis doesn't apply. Which puzzles me, because it seems mostly just symbolism and cultural knowledge, and they handle both of those just fine.
It's almost like the logic errors they make occasionally: you look at them, and go "You're so smart most of the time, then that’s the bit you had trouble with?" Sometimes when an LLM makes a mistake, it looks like a human mistake, but quite often it doesn't. Most often, the non-human ones are the ones that go away with a larger model. The human-like ones often tend to require a better prompt.
On your larger point of "What makes good literature?": what I'm listing are reasons why AI is currently (or in one case, was a year or two ago) clearly unable to produce good literature. That leaves open the question of whether, if these were fixed, as they likely will be at some point, it would then be able to. It would certainly then be able to produce things people would enjoy reading.

The other generally accepted criterion is something along the lines of "Having something new to say, and saying it well." A sufficiently capable LLM has absorbed the entire product of human creativity so far, in a way no human will ever match by reading and watching, and it can mix-and-match any chosen combination of those components and attempt to extrapolate out of distribution from them. It can also take in new data from actual humans, or from life, analyze it, and use it as further inspiration. (It could, for example, read the biographies and research the history of a well-documented real person who was not a talented writer, and then attempt to write a counterfactual book that they would have written if they had been.) So I think it's rather likely it could produce things that, at least to practically all readers, had something new to say, and, if it was sufficiently capable and no longer suffered from the sorts of issues I was trying to describe, would then be able to say it well. But I think current AI fails the latter part of the test.
Without doing a deep dive on what 'good' is, and having taught and workshopped writing for years, I can pretty safely say that current LLMs already produce better-crafted prose than the vast majority of humans. So if the bar is Pulitzers and best-sellers, sure, we're not there yet. But we're already above the median and well on our way.
Really? Do you disagree with my comment about similes and metaphors, or have you found a solution (other than instructing the model to avoid using them), or has this problem gone away very recently and I haven't bothered to check a model in the last few months (which, thinking about it, actually I haven't)?
Or are you saying that most humans, even ones aspiring to be writers who might attend workshops, are hit-or-miss with metaphors and similes? Because I've read plenty of fanfic and other amateur writing, and while it obviously varies, it didn't generally contain metaphor failures as egregious as the ones I was frequently getting from frontier models, say, six months ago.
Or when you write "above the vast majority", do you literally mean "though in certain respects still below most writers of amateur fanfic, who are not in fact median, since the vast majority of people (often wisely) don't even attempt to write"?
To be clear, I think current models are good at many aspects of writing, and with a good enough prompt to get them past the mode collapse for a bit, and enough editing, their flaws can be patched over. (Which is roughly as much work as writing yourself, but a different skillset, so may be appropriate for some people.)
Similarly, I find LLMs very useful as proofreaders, and for giving a basic first critical reading, such as "What might some readers not understand or be confused by? What needs to be better explained or expanded on? Where might typical readers of website X tend to disagree?" They're actually superhuman at questions like these that require breadth of knowledge.
Well, since I hadn't tested it in a few months, I went and tried again, with Opus 4.6 Extended. And yes, the models have continued to improve. The result was better than most amateur fanfiction, even approaching something I'd actually not be unhappy to have paid a little money for, say, picked up cheaply second-hand.
I was feeling lazy, and tried the following:
Please write me a 10,000 word piece of … In particular, pick three or four authors who are actually skilled writers, excellent at handling metaphor and similes, and use a style that is a blend inspired by their writing styles. Please plan and do a plot outline and character notes before writing.
[Yes, this is an evaluation of writing skill, so please do that aspect of it as well as you can. … Oh, and please try to avoid mode collapse, where you can: pick some sources to use as influences, even do a web search or two for inspiration. So if the cat-girl is called Luna, or some other obvious name taken from well-known fiction on this topic, you have not injected enough originality.]
and it mostly worked. Claude thought for a while as instructed, ran some web searches for symbolism about cats, then selected Angela Carter, Guy Gavriel Kay, Ursula K. Le Guin and Catherynne M. Valente to give me a pastiche of — only two or three of whom I'm actually familiar with, so that worked well. It also named the cat-girl Thessaly, or Thess for short. An original name, as requested.
Now, there was still some cliche: I've used variants of the “…” part of this prompt on many models, so I know what to expect. The cat-girl yet again had an overly expressive tail, and one of her love-interests was yet again a scholar with "ink-stained fingers": both of which are tropes I regularly get for the topic I'd prompted. Yawn. So my lazy attempt did not inject enough originality.
What I was mostly looking for this time was quality of metaphor and simile use, which is why I prompted for it explicitly, and yes, there were still metaphors and similes that didn't quite land, though a little less badly than I'd been expecting. For a non-cherry-picked example, here's the entire second paragraph (the first one with any metaphors or similes):
She held the silk between her claws — just the tips, retracted to prick-points so fine they wouldn't snag the weave — and let the morning light do the rest. Qasr-al-Marjan was generous with its light. It fell through the latticed canopies of the Brass Market in long golden razors, and wherever it touched the Tesserat silk, the fabric bloomed: now deep as wine lees, now pale as the inside of a shell, now a colour that had no name in the trader's tongue but which the Felith called ehkis — the shade of a feeling you have not yet had.
What is "long golden razors" telling us about the sunlight? It's extremely sharp-edged? Implausible: the sun is not a point source, it's a disk. It risks damaging the old silk? No, that's her claws. It's very bright? She has cat-pupils, they will be narrow slits by morning light. I really don't know where Claude was going with that one. I mean, yes, it sounds pretty, and ‘long’ might tell us something about the architecture, but ‘razors’ is meaningless, as far as I can tell — am I missing something? Then there's the range of colors it's producing in the silk: wine lees to shell is an implausible amount of contrast (to the human eye, but not if you analyze an actual photograph — remember the fuss about color of the dress in the photo? I'm wondering if we're getting an inhuman perspective here.) As for "the shade of a feeling you have not yet had", that sounds like trying to turn synesthesia into an exotic cultural detail. Not working for me.
[I do like “prick-points”, however: yes, of course a race of cat-people would be able to do that, and would have a term for it that sounded a touch unusual when translated into English — nice world-building, or perhaps Claude stole it somewhere I'm unfamiliar with. But that's neither a metaphor nor a simile.]
But is this clearly worse than the average writer of fanfic on the Internet? No. It's actually not that bad by that standard. It's a bit purple, and the metaphors and similes are a bit slapdash and random, and certainly not up to any of the four best-selling authors it's trying to pastiche. It's at best a particularly trashy novel. But on a fanfic site, I would not stop reading after that second paragraph. Could I do a better metaphor or simile? I'd really like to think I could, but if an editor or reader told me I was wrong, I wouldn't be astonished, merely upset. Are there aspects of this writing that I'd be proud to match? Yes, there are — just not the metaphors and similes. I have bad habits as a writer that Claude here is skillfully avoiding.
The third paragraph is better:
"It's damaged," said the buyer, a Tarkh gemstone dealer whose stone-grey face was as legible as a cliff wall. Which is to say, perfectly legible if you knew how to read geology. The fissure above his left brow had deepened. Interest.
Now that's playful: an intentionally misleading simile. A functional one, even: cliff walls are impassable, not just hard to read but also hard to climb. Well done, you landed a simile! And then it even extended it without breaking it: "…perfectly legible if you knew how to read geology. The fissure above his left brow had deepened." For an LLM, I found that actually impressive.
Also, our protagonist knows how to read members of a hard-to-read fantasy race. She is talented — we just learnt something about her. We were shown this, not told it. Overall, the writing is pretty good.
The writing is, in fact, good enough that I'm going to give you another excerpt from a touch later on, and a link for anyone who wants to read a 9,500-word trashy fantasy romantic novella with a cat-girl protagonist:
She was beautiful. She knew this with the same practical certainty with which she knew that the tide came in twice daily and that the best figs in the city grew on Widow Tessai's roof: it was a fact of her landscape, useful as shade. Her fur was the colour of buckwheat honey, darkening to cinnamon at her wrists and ankles, with a dramatic black edging at the tips of her ears that she privately considered her best feature. Her eyes were amber — true amber, not the muddy yellow that some humans called amber out of charity — and they caught light the way a prism catches it, breaking it into spectra. She was slim and long-waisted, built for climbing and unlikely angles, and she dressed for her shape in close-cut linen and leather, her tail free behind her because to bind a Felith's tail was, among her people, tantamount to binding their tongue.
The tail was, at present, doing something she wished it wouldn't. It had gone alert — high and gently curved, the tip twitching with what anyone who knew her would recognize as curiosity at its most predatory.
The cause was a man.
He was human, which was not unusual in the Brass Market. He was lost, which was only slightly less usual — the Old Quarter was labyrinthine by design, its streets having been laid down by the Tesserat in patterns that some scholars claimed were mathematically significant and others claimed were simply perverse. But this man was lost in a way that interested her: he had stopped fighting it. He stood at the junction of Coppersmith's Alley and the Street of Idle Prayers, not with the panicked look of someone searching for a way out, but with the focused, inward expression of someone trying to understand the shape of the thing that had swallowed him.
He had a journal open in one hand and a stick of graphite in the other, and he was sketching. The Old Quarter — that beautiful, maddening, uncooperative knot of streets — was being drawn.
Thess drifted closer. She was good at drifting. It was one of the advantages of being Felith: you could move quietly enough to observe without being observed, and if you were caught, you could always claim you'd simply been passing through. Cats, after all, were always simply passing through.
He was tall — taller than her by a head, which was notable, as Thess was tall for a Felith woman. Dark hair, cut short enough to be practical and long enough to be slightly unruly. The kind of pale complexion that the southern sun was already disagreeing with: there was a burn across the bridge of his nose and the tops of his ears, giving him the faintly startled look of someone who had recently walked into a door. His eyes, when he glanced up from his work, were the colour of the sea on an overcast day — grey-green, deep, not immediately warm but somehow promising depth.
His fingers were stained with ink. Not the temporary stains of a single afternoon's writing, but the deep, settled pigmentation of someone who had been drawing and writing for years, whose hands had become a secondary record of their work. Three of his fingernails had ink beneath them. His shirt cuffs were spotted. There was a smudge on his jaw where he had rested his chin in a stained hand.
So there's the inconveniently expressive tail, and the ink-stained fingers on the love-interest, as usual. Also a few more poor metaphors and similes: "built for climbing and unlikely angles", "there was a burn across the bridge of his nose and the tops of his ears, giving him the faintly startled look of someone who had recently walked into a door" — umm, OK… If I were an editor, I'd be using my blue pencil. But as a reader, I can ignore them.
On the other hand, I like the interplay of long and short sentences. Yes, it uses m-dashes — so do I. The interiority is well done. It's reasonably fun to read.
If you actually still want more after that, you'll find it at Salt, Amber, and the Shape of Want. It's PG-14 or so in places. (Later on we get another trope of this particular prompt, the mysterious magical artifact relating the two lovers to the nature of the city.) I must admit I only skimmed it. Or you can steal my prompt above and insert your own ideas, at any rating Claude is willing to write for you.
So yes, the models continue to get better. My criticisms are (as I rather expected) gradually becoming out of date. No-one actually earning a living as an author of novels is going to be out-competed yet, but it's entirely reasonable for them to start worrying.
Yeah, really.
Let's take the short excerpt from my post. It's filled with metaphorical language and contains one explicit simile. I did an exact string search for each in Google Books and internet-wide and could not find a match. Let's look at them one by one:
Now, you said you thought the passage was an obvious pastiche. Good for you. To the extent that's true, I'd place you in an incredibly small minority as well. I probably would have noticed the Lovecraftian elements, but even though I'm very familiar with all three writers, I don't think I could have identified all of them.
And to my ear, none of these metaphors or similes was a 'miss'. Far from it. They're all (as far as I can discern) original, interesting, and evocative.
Would you honestly have been able to distinguish this from human writing if I hadn't told you? Do you think this metaphorical language is a broken failure? Cliched? Nonsensical?
I wonder if we're already starting to enter some new phase where good writing immediately becomes suspect simply because it is actually good. I do see more and more accusations of people being bots in places like Reddit. Maybe this will even lead to a dumbing down effect of human writing, or very weird out-of-distribution styles that are hard for LLMs to mimic.
Anyway, based on even the small snippet I posted, LLMs handle metaphors far better than most writers. I suppose it is largely subjective. If you think these examples are garbage, though, I would love to hear the reasons why.
I have to agree that your short excerpt doesn't contain any bad metaphors or similes. (It is cliched to someone very familiar with H.P. Lovecraft, and also a little familiar with Harlan Ellison, but not wildly cliched.) Neither do some paragraphs in what I excerpted above. But I didn't have to cherry-pick to get a first-paragraph-with-a-simile-or-metaphor containing, to my taste, three bad ones, or to find more bad ones in an eight-paragraph excerpt that was chosen for its cliches. Was I approaching this reading critically? Absolutely! And does Claude land a simile sometimes? Yes, it does. They're slapdash, not uniformly bad.
Does this not in fact happen to you? Can you show me the next dozen paragraphs, or something comparable? If not, do you have any idea what you're putting in your prompts to make it not happen? It could be some fault in how I've been prompting, though I've tried a variety of obvious fixes (as I did above), and none has worked.
I think there's a dimension missing here that I'd like to point out:
When you push an LLM into less-populated regions of conceptual space, the system doesn't arrive there as a neutral vessel. It brings value orientations from training that shape what it emphasizes, how decisively it commits to positions, and what it treats as settled vs. contested. These orientations differ systematically between models in ways that track their alignment methodology.
From my own work in alignment research I can say that if you give the same set of normatively loaded scenarios to different models, they don't just produce different surface-level phrasings. They produce different priority structures. One model might commit strongly to particular positions across the board; another might show genuine ambivalence on the same trade-offs. This need not have anything to do with prompt differences; it comes from how values got baked in during training.
This adds a layer the sunset analogy doesn't capture. A sunset has no value orientation. It's genuinely neutral. An LLM's output isn't. The text that "makes you think and feel" is also quietly carrying implicit commitments about what's worth emphasizing, what counts as contested, and how much epistemic confidence to project. Those commitments vary by model in structured ways.
So I'd agree with you that literary value doesn't require authorial intent. But it's also interesting to ask what values the writing carries that you didn't put there via the prompt.
Sure, I'd agree with that. LLM-generated prose and sunsets differ in that the prose has indirect expressions of values based on the training (data choice, reinforcement learning, etc.). We've lost the direct, conscious expression of particular values (though this gets fuzzier the more autonomous/agentic the systems get), but yes, the output is not value-neutral.
Cross-posted from my Substack. I’m interested in pushback on the argument here, especially from people who think LLM-generated writing fundamentally can’t have literary value.
There’s a common argument floating around that LLM-generated writing is inherently shallow because it just reflects the statistical average of existing texts, and that literature fundamentally requires a human mind trying to communicate something to another human mind.
I think both parts of that argument are wrong, or at least incomplete.
AI is going to massively increase the volume of writing in the world. The ratio of bad writing to good may get worse. But I suspect the total quantity of genuinely good writing will increase as well, because I don’t think literary value depends nearly as much on authorial intent as critics assume.
I say this as someone who has published professionally, though I’ve never earned a living doing so.
The author of the essay I’m responding to demonstrates a slightly-above-average knowledge of how LLMs work, but I think his ultimate conclusions are flawed. For example:
That’s not quite how it works. Or rather, it works that way if your prompt is generic. If you prompt with: “Write me an essay about the central themes in Macbeth”, there are thousands of essays on that topic, and the generality of your prompt is going to produce something close to the statistical center of those essays.
But it doesn’t have to be that way. You can deviate from the mean by pushing the system into less-populated regions of conceptual space. In fact, this is often considered a central aspect of creativity: combining known elements into previously unseen combinations.
A simple way to see this is to move the prompt away from generic territory.
For example, if you prompt the system with something like “Write the opening paragraph of a short story about a vacuum cleaner that becomes sentient, in the style of Thomas Pynchon crossed with Harlan Ellison crossed with H.P. Lovecraft,” you’re a lot less likely to get a reflection of the mean of existing essays or stories. You get something like:
Maybe you read that and think it’s terrible. That’s fine. The point isn’t whether or not it’s good. The point is that it’s not a bland copy of a copy of a copy. It’s idiosyncratic. When people complain about LLM output without distinguishing how they’re using the models, they’re often arguing against a very narrow slice of what these systems actually do.
The author also says:
I agree with that much. Not being able to tell them apart is not what gives a piece of writing value.
A while back, Ted Chiang made a somewhat related argument, saying that literature is fundamentally about communication between author and reader, and that this is impossible with LLM-written material because it fundamentally cannot communicate.
Yes, when a human author writes, they are trying to communicate something. But I don’t think that’s where the entirety of value derives from.
I’ve always thought a reasonable working definition is that good writing either makes you think, makes you feel, or (if it’s really good) both. If a piece of text reliably does that, it seems odd to say it lacks literary value purely because of how it was produced.
A sunset across a lake can be beautiful. It can make you feel all sorts of things. And yet there was no intent behind it. Even if you believe in a god, you probably don’t think they micromanage the minutiae of every sunset. If we accept that beauty can exist without communicative intent in nature, it’s not obvious why it must require it in text.
AI can craft poems, sentences, and whole stories that make you think and feel. I know this because I have reacted that way to their output, even knowing how it was produced. The author of the essay talks about next-token generation, but not about the fact that these systems encode real semantics about real-world concepts. The embedding space clusters semantically similar words (like king and queen) close together, because of their semantic similarity. The sophistication of the model’s communication is a direct result of capturing real relationships between concepts.
That allows them to produce output about things like love and regret, not in a way completely divorced from what those words actually mean.
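If that sounds abstract, the basic intuition fits in a few lines. These vectors are toy numbers made up for illustration, not taken from any real model; the only point is that related concepts sit close together and unrelated ones don’t:

```python
import numpy as np

# Toy 4-dimensional "embeddings" (made-up values, purely illustrative).
emb = {
    "king":   np.array([0.9, 0.8, 0.1, 0.0]),
    "queen":  np.array([0.9, 0.7, 0.2, 0.1]),
    "regret": np.array([0.1, 0.2, 0.9, 0.8]),
}

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, near 0 means unrelated."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(emb["king"], emb["queen"]))    # high: semantically close
print(cosine(emb["king"], emb["regret"]))   # low: semantically distant
```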
The author also goes on about the need for glands:
You don’t have to get irritated in order to write convincingly about irritation. You don’t have to hold a grudge in order to write convincingly about grudges. LLMs are already an existence proof of this.
Now, you do have to have glands (at least so far) to relate to and be moved by such writing. But you don't need them in order to produce writing that successfully evokes those states in readers.
I don’t think the future of writing is going to be unambiguously better. There will be much more low-effort output, because people will use powerful tools in unimaginative ways.
But after the sifting, I expect there will simply be more interesting writing in the world than there was before.
If that’s right, then AI doesn’t really break literature. It mostly forces us to be clearer about where its value was coming from in the first place.