The New LessWrong LLM Policy is Worse Than You Think

Oliver Kuperman

Recently, the moderators of LessWrong have decided to change the site's policies on LLM usage. The essence of the policy can be summarized by the following excerpt:

With all that in mind, our new policy is this:
"LLM output" includes all of:
text written entirely by an LLM
text that was written by a human and then substantially^[6] edited or revised by an LLM
text that was written by an LLM and then edited or revised by a human
"LLM output" does not include:
text that was written by a human and then lightly edited or revised by an LLM
text written by a human, which includes facts, arguments, examples, etc, which were researched/discovered/developed with LLM assistance. (If you "borrow language" from the LLM, that no longer counts as "text written by a human".)
code (either in code blocks or in the new widgets)
"LLM output" must go into the new LLM content blocks. You can put "LLM output" into a collapsible section without wrapping it in an LLM content block if all of the content is "LLM output". If it's mixed, you should use LLM content blocks within the collapsible section to demarcate those parts which are "LLM output".
We are going to be more strictly enforcing the "no LLM output" rule by normalizing our auto-moderation logic to treat posts by approved^[7] users similarly to posts by new users - that is, they'll be automatically rejected if they score above a certain threshold in our automated LLM content detection pipeline. Having spent a few months staring at what's been coming down the pipe, we are also going to be lowering that threshold.

While certainly well-intentioned, the policies are rather vague, difficult to enforce, and detrimental to the development of high-quality posts.

In this essay, I will demonstrate the benefits of using LLMs for writing, address the arguments cited in favor of the policy change, express the value that LLMs provide in the writing process, and advocate for a more nuanced "solution" to the increasing usage of LLMs on this forum.

The benefits of using LLMs for writing:

LLMs save a significant amount of time for the following reasons:

1.) Boilerplate:

For sections of posts that are more about the craft of writing itself instead of ideas (Introduction, Conclusion), having an LLM expand upon a template saves a bunch of time while not really changing the message much.

2.) Editing:

Sometimes I write a paragraph, and the wording is just a bit off. In my experience, LLMs are pretty good at taking something you wrote and making it sound smoother. Some people may enjoy the process of rewriting a paragraph until it sounds just right, but I personally care more about expressing my ideas in an engaging manner than engaging in the craft of writing.

3.) Translation:

For non-native English speakers, LLMs can help effectively translate their ideas into English. While its difficult to precisely measure the benefits of using LLMs for translation compared to traditional methods such as Google Translate, LLMs outperformed google translate in this study on the translation of ancient Indian texts to English, and most of the evidence I have seen on this question points to LLMs being better. It seems like the policy change ignored this, but even if it didn't, the moderators would have a dilemma of either creating a carve-out for only non-native English speakers, clearly demonstrating the arbitrary nature of the policy, or making writing more difficult for these users.

4.) Source Searching, Feedback, and Other Auxiliary Uses:

Beyond writing itself, LLMs are a good tool for finding relevant sources. While traditional search engines can also do the job, I find LLMs are often better for niche topics. If I am going to be using an LLM anyway, I might as well use it for other tasks. A similar thing could be said for LLM feedback (although I haven't used this that much) and image generation.

While LLMs certainly aren't perfect at writing, and are not a substitute for human thinking, they serve to substantially reduce the amount of time it takes to write posts while not really detracting from the user's authentic voice if the user is using the tools responsibly. The policy recognizes this somewhat by allowing for content "lightly edited or reviewed by an LLM", but this standard is somewhat unclear, likely varies from moderator to moderator, and risks creating a chilling effect on LLM usage. A better approach is to only police posts which are almost entirely devoid of human input.

A response to the critiques of using LLM while writing:

1.) "LLM Writing is Worse."

To start, I think there is definitely an element of truth to this claim, as in my experience LLMs, when asked to write on their own, tend to be less creative, engaging, and insightful than human writers. However, I think this problem is mitigated somewhat when you use LLMs less like a ghost writer and more like autocomplete by telling them to improve/flesh out sections of text which already have a clear direction.

As a commenter on the post explaining the update wrote, LLM writing is now functionally indistinguishable from human writing. Readers have difficulty differentiating between human-generated text and LLM-generated text.

While certain individuals may be able to detect AI writing better than others and be annoyed with certain stylistic elements commonly used in LLM writing, I think that there is no reason we cannot rely on upvotes to decide what type of content the broader Less Wrong community wants to see.

2.) "Using LLM writing obfuscates the human mind behind the screen":

In the update, the writer makes the argument that a substantial thing that we care about is the beliefs and perspectives of the writer, instead of just the arguments provided by them. I agree with this statement to an extent, which is why I think that people who use LLMs to assist their writing should review LLM outputs to ensure they represent their argument well (and also to ensure the outputs are factual). However, once again, I do not see why a policy change is necessary to address this.

Even before LLMs, the exact opinions and attitudes of the author are often clarified in the comment section. People often make careless mistakes, poor wording choices, or conspicuous omissions in their own writing, so I don't think much is lost in the case where an LLM writes something in a slightly different way than the author intended (I would like to actually see a significant example of this happening, though. In my experience, LLMs are pretty good at filling in an argument if you give it a decent amount to work with).

Some might fear that some people may just let LLMs take over their writing entirely, but I think very few people actually just let LLMs generate an entire post with minimal input. Even if they did, the post would likely be low effort in more ways than one and downvoted, solving the issue without a dedicated moderation policy. However, even if LLM writing advances to a point where this type of writing would not be filtered out, there are better ways to deal with it than a blanket ban on LLM-assisted content. Simply tracking the amount of time commenters on a draft, combined with checking for very high thresholds of LLM writing, could practically eliminate pure LLM writing.

3.) "This policy is necessary to combat bots":

In the post laying out the introduction of the new LessWrong LLM policy, this argument was not present, but you can certainly combat bots without blanket banning all substantial LLM usage in posts.

The Current Policy is either Very Difficult to Enforce, Subjective, or Overly Restrictive (or some combination of the three):

Maybe the LessWrong team has cracked the code on detecting LLM writing, but it is fairly difficult to actually determine whether a text was generated by an AI or a human. A 2025 study on GPTZero's ability to detect LLM generated text saw GPTZero gave a 14.75% chance on average that human-generated long essays (350-800 words) were LLM-generated. It is important to note that this test was only done for ChatGPT 3.5 and ChatGPT 4o, that AI models have advanced since 2025, and that as time goes on, LLM writing has begun to influence human writing.

All of these factors lead me to believe that either the LessWrong AI detection system will have difficulty flagging all but the most obvious cases of LLM writing, or result in an unacceptable level of false positives. These circumstances also invite a high degree of subjectivity into moderation decisions. Most moderation policies have an element of subjectivity involved with enforcement, but with something as difficult to detect as LLM writing, the capacity for mis-moderation is higher. To illustrate why, take the quote below from LessWrong Admin Oliver Habryka on Neel Nanda's use of LLM Transcription (bolded letters added for emphasis):

LLM transcription is IMO a completely different use-case (one I certainly didn't think of when thinking about the policy above), so in as much as the editing post-transcription is light, you would not need to put it into an LLM block. I also think structural edits by LLMs are basically totally fine, like having LLMs suggest moving a section earlier or later, which seems like the other thing that would be going on here.
We intentionally made the choice that light editing is fine, and heavy editing is not fine (where the line is somewhere between "is it doing line edits and suggesting changes to a relatively sparse number of individual sentences, or is it rewriting multiple sentences in a row and/or adding paragraphs").
Also just de-facto, none of the posts you link trigger my "I know it when I see it" slop-detector, so you are also fine on that dimension.

From Oliver Habryka's response, we can see that a great deal of subjectivity will be involved in moderation decisions used under this rule.

The Current Policy Promotes Rule Breakers:

As with any selectively enforced rule, the moderation policies will affect scrupulous posters more than unscrupulous posters. As someone who tries hard to respect the rules of others, I (and others like me) will abstain from LLM use while posting on this form, while others who are less scrupulous will not. Due to the efficiency gains in writing from LLMs, less scrupulous posters will increase their share of the posts on this forum. The effects of this on the forum are difficult to predict, but I think there is reason to believe it will not improve things.

A Better Alternative:

While I disagree about the benefits of the LessWrong LLM policy, I understand that certain users may dislike LLM-assisted posts for a wide variety of reasons. For the sake of these users, I recommend creating a new category of posts (LLM Free), which will be an optional filter for users. Doing so preserves the benefits of LLM writing while also allowing those bothered by it to avoid it.

Along with this, I would support a ban on "pure" LLM posts, which see users spend very little time reviewing the draft and post something with minimal human input. I think the simplest way to do this would be to track the number of edits on a post combined with LLM detection software, and only remove posts where it is extremely obvious that the post is unreviewed LLM content. Posts that use LLMs in a collaborative manner or with substantial human input and review should not be affected by this policy.

I would also endorse a ban on the use of LLMs for quick takes and comments, as these mediums naturally are more about human interaction than a post is, and the benefits of LLMs decline with the length of the writing being produced.

The above policy would solve the worst problems posed by LLM writing while still preserving the benefits it provides to LessWrong users.

Edit: It seems that people are more worried about new users using LLMs than high karma users, and so I would also support leaving the LLM rules in place for people below a certain karma threshold, as the arguments laid out in favor of the policy are strongest for unscrupulous, low-effort posters who are more likely to misuse LLMs. While karma isn't a perfect benchmark, it probably does correlate with effort somewhat, as high-effort, truthful content is what users and moderators alike have professed to prefer.

Edit II:

My comment section discussion with Seth Herd helped me to better understand why some people might support this new moderation policy. The usage of LLMs can make it harder to tell if a post is low-effort or high-effort, and also lowers the barriers to posting, so allowing LLMs can make it harder for users to find genuinely good or great content. While I am more sympathetic to this argument than the others considered by this post, I think the best way to address concerns about crowd-out is by creating moderation tools to measure the effort applied to posts and promote high-effort posts, rather than enact a policy which is only tangentially related to this goal. While I am uncertain about the best way to go about this, as I wrote in a comment:

tracking the amount of time spent editing/ number of edits on a given LessWrong post would be a good way to judge the amount of effort placed into a post (this should not be too difficult to track/implement, and for people who write their posts in google drive or word, I doubt it would be a huge inconvenience to move over to LessWrong).

Written with Grammarly spell check.

1.) Boilerplate:
For sections of posts that are more about the craft of writing itself instead of ideas (Introduction, Conclusion), having an LLM expand upon a template saves a bunch of time while not really changing the message much.

In my experience, LLMs are pretty good at filling in an argument if you give it a decent amount to work with

In discussions of such things, I would like to request examples of this from folks proposing that AIs are equally good or better at writing than humans. My life as a LW mod is filled with reading slop/bullshit, so flat assertions to the contrary are not persuasive to me.

As I alluded to in this post, my thoughts on LLM writing are multifaceted. I think that LLMs lack the creativity of human writers and are not a substitute for good ideas and a sense of direction. However, I think someone who has a good idea of what they want to write (and a compelling subject to write about) can use LLMs to save considerable amounts of time and also improve their writing on the margins.

Because you asked, I think this recent post of mine is a good example of how LLMs can help with writing. Compared to the baseline of this post (which I think is not particularly well written), the example post is considerably smoother, while also being faithful to the vision I had for the post.

you can certainly combat bots without blanket banning all substantial LLM usage in posts.

"LLM output" must go into the new LLM content blocks. You can put "LLM output" into a collapsible section without wrapping it in an LLM content block if all of the content is "LLM output". If it's mixed, you should use LLM content blocks within the collapsible section to demarcate those parts which are "LLM output".

I am confused about why you think this constitutes a ban.

We are going to be more strictly enforcing the "no LLM output" rule by normalizing our auto-moderation logic to treat posts by approved^[7] users similarly to posts by new users - that is, they'll be automatically rejected if they score above a certain threshold in our automated LLM content detection pipeline. Having spent a few months staring at what's been coming down the pipe, we are also going to be lowering that threshold.

The above quote lays out pretty clearly that substantial LLM usage will be banned. This is further reinforced by the quote of Oliver Habryka I included:

We intentionally made the choice that light editing is fine, and heavy editing is not fine (where the line is somewhere between "is it doing line edits and suggesting changes to a relatively sparse number of individual sentences, or is it rewriting multiple sentences in a row and/or adding paragraphs").

If you are a new user and submit a post that substantially consists of content inside of LLM content blocks, it is pretty unlikely that it will get approved^[8]. This does not suddenly become wise if you're an approved user. If you're confident that people will want to read it, then sure, go ahead, but please pay close attention to the kind of feedback you get (karma, comments, etc), and if this proves noisy we'll probably just tell people to cut it out.

I took this to indicate that the ban on LLM content applied specifically to posting it outside of LLM blocks. I don't know what this paragraph could mean if substantial LLM usage will be blanket banned ("go ahead" and post something the Pangram filter will throw right in the trash?), @habryka can you clarify?

Yes, obviously content in the new LLM Content Block elements is excluded from the automated LLM content detection; how would it even be possible to use them otherwise?

Thanks for clarifying. So posts that place 100% of their content within will be approved? What about 50%? 45%? Will this content be disadvantaged? I think a lot of the same concerns apply, even if this policy is somewhat less strict than I thought.

We apply higher standards to posts by new users (in-particular we say in the onboarding docs and the new post page that your first post to LessWrong is kind of like a job application). This is just saying that a post co-written by AI will likely cause you to not meet that bar (though it's not guaranteed!).

I think to have any chance of catching the mice these days you just have to give the ratcatchers flanethrowers, or else it is hopeless and you will be overrun

I think bot detection is getting more and more difficult, but I do not think we are at the point where less invasive mouse-capturing procedures are ineffective. This is reflected by the fact that a "bot-pocalypse" where LessWrong is overwhelmed by non-human posters was not referenced in the original post justifying the new LLM policy. Why risk burning down the house when mouse-traps can still work?

I don't think catching bots plays any real role in the policy. It's largely IMO about preventing pollution of the epistemic commons by LLM slop.

You do address that by noting that the author should be sure the final writing reflects their views. That's roughly the old policy. The new policy is that you put anything an LLM touched more than a little in a deicated block (it's not at all a ban, just that it's prone to be ignored if it's all LLM-blocked- but some stuff has already been posted all in LLM block and upvoted).

Boilerplate: Don't write it. If your intro and conclusion don't have content, don't write them. If they do, get them right. My intros are my highest-effort portion, because I assume many readers will just read them then say they've got the basic message and go read something else.

Editing: light editing is allowed. Heavy editing always changes the meaning. Whether it's changed a lot is specific to the writing and very much a judgment call. But saying "make sure you looked closely" is entirely unenforceable. You'd assume lots of people just aren't going to take the time.

Writing advice: not at all mentioned in the policy. By all means do use LLMs to critique your work (if you don't have human experts available), just make your own judgments about what critiques to take seriously. This is not a valid criticism of the LLM policy - unless you wanted to argue this should be against the rules, if they want to consistently ban LLM claims/ideas!

I think this is really worth digging into, but I think you're mostly not addressing the biggest reason they implemented the policy.

I feel torn and I expect further discussion to result in a refined policy before too long.

I too am bothered by Habryka's implication that the real rule is an AI detector (including the one in his head). Because that's not what they said the rule is.

OTOH what Neel directly described is clearly allowed within the rules - but I think what he implied (using his notes and samples) is that he doesn't actually dictate a full first draft, just the ideas. So the implication is that there's a different rule for Neel than for the rest of us. Which makes sense; Neel has proven his contributions to be high-quality, however he's produced them.

But that's not what the post says is the rule.

Ah well; no set of rules is perfect. I do think this is worth discussing and thinking about.

I don't think catching bots plays any real role in the policy. It's largely IMO about preventing pollution of the epistemic commons by LLM slop.

I think I responded to this line of thinking a bit in my post, but I think this "pollution" is greatly overblown. Compared to humans, LLMs have been found to be better at analyzing complex texts, less likely to believe myths, and third statement to make this sentence sound better (the last part of this sentence is a joke and demonstrates the importance of boilerplate in writing).

Editing: light editing is allowed. Heavy editing always changes the meaning. Whether it's changed a lot is specific to the writing and very much a judgment call. But saying "make sure you looked closely" is entirely unenforceable. You'd assume lots of people just aren't going to take the time.

The line between light and heavy editing is blurry, and if you assume people aren't even going to take the time to review LLM outputs, why would you expect them not to make false claims on their own accord? This is a problem with humans, not LLMs.

So the implication is that there's a different rule for Neel than for the rest of us. Which makes sense; Neel has proven his contributions to be high-quality, however he's produced them.

Maybe just ban LLMs for new users or create a karma threshold after which LLM usage is allowed? It seems like the majority of the rationale for the ban is "unscrupulous users will use LLMs irresponsibly and produce writing which is just good enough to not be downvoted, gradually crowding out higher effort posts", but if this is the case, then the policy should be more targeted towards such users. People with substantial post history have hopefully already shown themselves to be fairly scrupulous.

Without targeting, the justification for this policy becomes tantamount to "we should ban driving, because some people will drive drunk". How about we focus on those most likely to drive drunk instead of just banning cars for everyone?

I think you've got to do more on this front, because that's the core of why they're enacting this policy.

"compared to humans" won't cut it for these purposes; they actually don't want the average human on LW exactly because they're bad at stuff.

Boilerplate is super important for most writing outlets. Making things sound better without being better is does not bring upvotes here (usually). We are blessed that it's not required or appreciated on LW. Mod policy is an attempt to keep LW a special and better place than the rest of the internet.

The question of how much people make claims they don't really endorse is a huge one. I think they do it a lot - but a lot less than LLMs currently do it by default.

Banning LLMs for new users is exactly what they were doing, and changed away from.

I think I agree with you in principle that we should just ban driving drunk, but we have no breathalyzer for how much human thought was put into any given piece of writing.

The other thing to think of is this: if we make everyone an excellent writer without improving their thinkingk we'll lose the signal we currently have that helps us read good ideas by noting good writing.

It sucks to lean on that weak signal, because good thinking sometimes occurs without good writing, but we are leaning on that to filter the large quantify of writing posted on LW currently.

Making things sound better without being better is does not bring upvotes here (usually). We are blessed that it's not required or appreciated on LW. Mod policy is an attempt to keep LW a special and better place than the rest of the internet.

...

The other thing to think of is this: if we make everyone an excellent writer without improving their thinkingk we'll lose the signal we currently have that helps us read good ideas by noting good writing.

Okay. Which interpretation of the role of writing quality on LessWrong would you like to defend?

Jokes aside, I think everyone writing in a better manner would be better, as better writing is typically more pleasant to read and also conveys ideas more effectively than worse writing.

As to your point about writing quality being one of the best gauges we have for human thought being put into writing, I can kinda see that, but if the moderators want a better gauge for high effort writing, they should put more effort into finding new ways to measure effort, instead of just making a policy that tangentially affects this and either won't really be enforceable anyway or lead to a lot of false positives.

I think tracking the amount of time spent editing/ number of edits on a given LessWrong post would be a good way to judge the amount of effort placed into a post (this should not be too difficult to track/implement, and for people who write their posts in google drive or word, I doubt it would be a huge inconvenience to move over to LessWrong).

Because we restrict writing to humans, things that sound better usually also are better. There's no contradiction there. Good writing elsewhere means convincing outside of quality of ideas; here it does not.

I like that idea for tracking actual contribution. But what they've done in the meantime is a hell of a lot easier to implement. That would take some tricky algorithms.

This policy is almost strictly more permissive than the previous policy, so I think this is a pretty confused argument.