Models of moderation

by habryka, 7 min read, 2nd Feb 2018, 35 comments


Moderation (topic), Public Discourse, Site Meta
Personal Blog

[Author's note: I will move this into meta in a week, but this is a bit more important than the usual meta-announcements, so I will have it in community for a bit.]


This post is trying to achieve roughly four things:

1. Be a future reference for some hopefully useful models about moderation

2. Give people a model of me in particular and how I think about moderation (since I have a bunch of control over moderation on the site)

3. Ask the community for input on the moderation systems we should have on LessWrong

4. Provide some context for moderation changes we have planned that we will explain in a follow-up post

Thoughts on moderation

When I make decisions about moderation, I think there are at least five major models that drive them:

  1. People are competing for limited resources, and moderation helps to move the people participating into a better Nash equilibrium by enacting legislation that rewards cooperative behavior and punishes defective behavior. [microeconomic frame]
  2. There is a distinction between adversarial and non-adversarial states of mind, and the goal of a moderation policy is to cause participants to generally feel safe and deactivate their adversarial instincts. [safety frame]
  3. There is a limited amount of bandwidth available to communicate, and the goal of a moderation policy is to allocate that bandwidth to users that will use that bandwidth for the greatest common good. [bandwidth allocation frame]
  4. There is a shared methodology that underlies the rationalist community that establishes what forms of reasoning are effective, and what perspectives on the world are fruitful. The goal of a moderation policy is to nudge people to use the approaches to reasoning that actually work (and that the community agrees on work). [methodology frame]
  5. We don't really know what moderation policies and technologies work before we see them, but we can judge fairly easily what discussions have been productive. Different people might also need different moderation policies to be effective. So the most important dimension is to encourage experimentation with different moderation principles and technologies, and then set up incentives for the most effective ways of moderation to propagate. [libertarian frame]

I think all of these describe meaningful aspects of reality and are lenses I often apply when thinking about moderation and culture on LessWrong. I want to spend this post discussing what changes the above frames recommend, and how this influenced our plans for moderation on the page. Note that these lenses do not necessarily describe different aspects of the territory, but instead they are often just different abstractions over the same considerations and facts.

1. The microeconomic frame

In the microeconomic frame, most problems call for better karma allocation. Karma is the cash of LessWrong, and if we tax things appropriately, the market will naturally settle into an effective equilibrium and generally avoid bad Nash equilibria.

One of the easiest ways to improve moderation in this frame is to allow for better allocation of votes, by allowing users to give different amounts of karma to different users.

The other obvious improvement is to simply tax behaviors that we think have the potential to damage the quality of the discourse. In this world, we might make it so that comments that point out some meta-level feature of the discussion are taxed with a karma penalty. People can still create meta-discussions, but they will have to pay a certain tax to do so.

In general, here are the things I think this frame suggests about how to design our moderation and karma system:

  1. Make it so that people can allocate their votes in more granular ways, and make it so that people can transfer karma somehow (e.g. pay karma to give someone a larger upvote or something like that).
  2. Allow moderators to give out larger karma rewards and karma penalties, instead of just deleting things or threatening moderation.
  3. Ensure that lower karma does indeed correspond to lower visibility and a lower influence of writing (i.e. the money in the system needs to actually matter for this to work).
  4. Generally focus on technologies like Eigenkarma that promise to better allocate karma according to quality and desired visibility of posts.
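Eigenkarma is only named above, not specified. As a rough illustration, one plausible reading is a PageRank-style scheme in which a user's score is the sum of the scores of the people who upvoted them, with each voter's score split across all the votes they cast. The sketch below assumes exactly that reading; the function name `eigenkarma`, the `damping` and `iterations` parameters, and the sample users are all hypothetical and do not describe anything LessWrong actually runs.

```python
def eigenkarma(upvotes, damping=0.85, iterations=50):
    """Estimate scores by power iteration over the upvote graph.

    upvotes maps each voter to a dict of {author: number of upvotes given}.
    Scores always sum to 1.0.
    """
    users = sorted(set(upvotes) | {a for given in upvotes.values() for a in given})
    n = len(users)
    score = {u: 1.0 / n for u in users}

    for _ in range(iterations):
        # Every user gets a small baseline so that newcomers are not stuck at zero.
        nxt = {u: (1.0 - damping) / n for u in users}
        for voter in users:
            given = upvotes.get(voter, {})
            total = sum(given.values())
            if total == 0:
                # Assumption: a user who never voted spreads their weight evenly.
                for u in users:
                    nxt[u] += damping * score[voter] / n
                continue
            for author, count in given.items():
                # A voter's score is divided among all of their outgoing votes.
                nxt[author] += damping * score[voter] * count / total
        score = nxt
    return score


# Hypothetical sample data: who upvoted whom, and how often.
votes = {
    "alice": {"bob": 3, "carol": 1},
    "bob": {"carol": 2},
    "dave": {"alice": 1, "carol": 1},
}
scores = eigenkarma(votes)
for user in sorted(scores, key=scores.get, reverse=True):
    print(f"{user}: {scores[user]:.3f}")
```

Under this reading, an upvote from a highly rated user moves a post more than an upvote from a new account, which is the sense in which karma would track quality rather than raw vote counts. The split of a voter's score across their outgoing votes is a design choice of the sketch, and other normalizations are possible.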

2. The safety frame

The safety frame naturally suggests that we want to limit most conversations to a small group of people that has already built up trust with each other, and for the conversations between people who haven't built up trust, we should set up very strong incentives that minimize behavior that puts other people into an adversarial state of mind.

In this frame, there are two major error modes to avoid:

  1. Users are afraid of each other, and generally optimize for a mixture between defensiveness and trying to make the rhetorically strongest case against people they disagree with.
  2. Users are afraid of the moderators and are hesitant to post things because they expect to be punished or possibly even publicly humiliated by the moderators.

(1) is the outcome of having almost no moderation or incentives that avoid adversarial behavior. (2) is the outcome of having too strong moderation, or too unpredictable moderation.

To avoid either of these, you want to make sure that the moderation policy is transparent and predictable, while also being strong enough to actually cause people to feel like they can have reliably good experiences when reading and engaging with the comments.

Here are some things that come to mind when I am considering this frame:

  1. Allow users to restrict who can comment on posts on their personal blogs (e.g. "only users above N karma") to ensure a higher level of average trust between the participating users.
  2. Allow trusted users to moderate their own posts on their personal blogs.
  3. Do not allow everyone to moderate or restrict who can comment on their posts, since that makes people feel scared of being moderated / their effort being wasted by their comments being deleted.
  4. Generally encourage high levels of charitability, and positively incentivize contributions that make everyone feel safe, even if they do not contribute to the object-level discussion (e.g. incentivize mediation).

3. The bandwidth allocation frame

The bandwidth allocation frame generally wants to allocate bandwidth to users according to how likely they are to contribute in a high-value way to a conversation. This means, when talking about a technical subject, someone with expertise in that specific subject should get more bandwidth than someone who has no expertise. And in general, people with a track record of valuable contributions should be given more bandwidth.

In this frame, the big problem of moderation is identifying who is most likely to contribute positively to a conversation. You can use karma, but in most naive formulations it can't really account for expertise in a given topic or domain. Additionally, some users might interact particularly badly with other users, and so you generally want to avoid putting them into the same conversation.

A set of possible solutions consists of allowing users to self-select in various ways, either by creating private groups or by limiting visibility and commenting to friends lists and similar things. These tend to scale pretty well and be reasonably effective, but come at the added cost of often creating filter bubbles and furthering political divides, since members of one group avoid contact with perspectives they disagree with.

Another set of solutions is reputation-based, where you try to somehow estimate someone's likelihood of contributing positively based on their past behavior and achievements, such as participation in threads on similar topics in the past.

In general, here is my guess at the things this frame suggests about changes to our moderation and karma system:

  1. If you expect a post to get a lot of comments, it is probably better to limit participation to a group of users who will have a valuable conversation, as opposed to having a giant wall of comments that nobody can really read, and that prevents a real conversation from happening through constant interruption.
  2. Allow users to choose for themselves who they want to contribute to the problems they are trying to solve. This could involve allowing users to only allow users they trust to comment on their content, to selectively ban individuals from posting on their content they've had bad experiences with, or to request answers from specific individuals who they expect to have expertise in the topic they care about (similar to how on Quora you can request answers from specific people).
  3. Allow users to create posts where participation is limited to certain predefined subsets of people, such as people above a certain karma threshold, people who have read a set of posts that are relevant to fully understanding the content, or people who have achieved certain milestones on the page (such as commenting n times).
  4. Generally optimize the degree to which users can read and participate in a discussion at various levels of time-engagement, by creating easy ways to filter a discussion for only the best contributions, and similar interventions.
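To make the participation limits in points 2 and 3 more concrete, here is a minimal sketch of a per-post commenting policy. Everything in it is hypothetical: the `User` and `CommentPolicy` classes, their field names, and the idea that a post carries exactly one policy are assumptions made for illustration, not a description of how the site is built.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    karma: int = 0
    comment_count: int = 0
    posts_read: set = field(default_factory=set)


@dataclass
class CommentPolicy:
    """Hypothetical rules an author attaches to one of their posts."""
    min_karma: int = 0                           # "people above a certain karma threshold"
    min_comments: int = 0                        # milestone such as "commenting n times"
    required_reading: frozenset = frozenset()    # posts needed to follow the content
    banned: frozenset = frozenset()              # individuals the author has excluded

    def allows(self, user: User) -> bool:
        # An explicit ban overrides every other criterion.
        if user.name in self.banned:
            return False
        return (
            user.karma >= self.min_karma
            and user.comment_count >= self.min_comments
            and self.required_reading <= user.posts_read  # subset check
        )


policy = CommentPolicy(min_karma=100, required_reading=frozenset({"intro-post"}))
reader = User("reader", karma=150, posts_read={"intro-post", "other-post"})
newcomer = User("newcomer", karma=150)

print(policy.allows(reader))    # has the karma and has read the required post
print(policy.allows(newcomer))  # has the karma but has not read the required post
```

Keeping the criteria as plain data on the post means a reader can be shown, before writing anything, exactly which condition they do not yet meet. That fits the safety frame's emphasis on transparent and predictable moderation.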

4. The methodology frame

In the methodology frame, the key question of moderation is "what mental algorithms are best at making collective epistemic progress, and how can we encourage people to adopt these algorithms?".

This frame encourages a certain systematization of how discussion is structured, with content and comments following a structure and rules that we think will be productive. The two big constraints on these methods are (A) the complexity of the method and (B) the effectiveness of the method. We want to find simple methodologies that people can actually follow, that we can build common knowledge around, and that actually allow us to make collective epistemic progress.

Importantly, the LessWrong community already has a lot of strong shared models about correct methodologies, mostly shaped by the sequences and their connection to the scientific method. This frame encourages us to explicate these models, and to generally encourage content to follow the methodological recommendations of science and the sequences.

Here are my guesses at what this frame recommends:

  1. Comments could be categorized based on a few different purposes, such as "editorial recommendations", "counterarguments", "appreciations", "anecdotes" and "inspired thoughts".
  2. We should focus on creating content guidelines that are based in scientific methodology and in what we think good reasoning looks like, and we should focus on significantly rewarding what we think is good reasoning.
  3. We want to make it easy to discuss the collective methodology of the site, and allow authors and commenters to improve the methodological standards we are applying.
  4. Establishing a shared culture strikes me as highly related, and as such we want to encourage art and media that creates a strong shared culture that has the correct methodology at its core.

5. The libertarian frame

In the libertarian frame, we want to focus on allowing as much experimentation with moderation policies and methodologies and incentive structures as possible. We then want to pick the things that worked best and implement the lessons learned from those, across the site.


  1. Allow users to create subdomains to LessWrong, somewhat similar to StackExchanges, where they can have their own moderation policies, content guidelines and culture.
  2. Give users complete moderation power over their own blogs and over the comments on their own blogs.
  3. Allow users to personalize their feed and their subscriptions.
  4. Establish multiple forms of karma that track different things, and that track reputation in different domains of the page (e.g. meta-karma, AI-karma, rationality-karma, getting-shit-done-karma, etc.).
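As a small illustration of point 4, the sketch below tracks karma per domain while still allowing a single overall total. The `KarmaLedger` class, its method names, and the lowercase domain labels are hypothetical and chosen only for this example.

```python
from collections import defaultdict


class KarmaLedger:
    """Hypothetical store that keeps a separate karma tally per domain."""

    def __init__(self):
        # author -> domain -> accumulated karma
        self._karma = defaultdict(lambda: defaultdict(int))

    def record_vote(self, author, domain, weight):
        # Positive weights are upvotes, negative weights are downvotes.
        self._karma[author][domain] += weight

    def karma(self, author, domain=None):
        # .get avoids creating empty entries for authors we have never seen.
        by_domain = self._karma.get(author, {})
        if domain is None:
            return sum(by_domain.values())
        return by_domain.get(domain, 0)


ledger = KarmaLedger()
ledger.record_vote("alice", "ai", 4)
ledger.record_vote("alice", "meta", -1)

print(ledger.karma("alice", "ai"))  # karma earned in the AI domain only
print(ledger.karma("alice"))        # total across all domains
```

A per-domain tally like this could also feed the bandwidth allocation frame, since it gives a crude signal of expertise in a specific topic rather than a single site-wide number.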


I think most of the recommendations above are things we want to take seriously, and most of them aren't in that much conflict. But we have a limited development budget and limited time, and so the question becomes what things are most important to implement soon.

I am currently most worried about the safety aspect of the current LessWrong site, with multiple top authors telling me that they prefer not to publish on LessWrong because they expect the comments to be overly aggressive and needlessly critical. This also appears to be a problem on the internet in general, and I've heard from many people in a variety of fields that they have a lot of trouble having open and honest conversation in online public spaces, because they have a constant sense of being attacked.

I think the recommendations of the safety frame are pretty reasonable. However, I don't really know how a well-working moderation system that encourages positivity and safety would look in practice. Moderators coming in and tone-policing strikes me as dangerous and prone to sparking conflict. Karma hasn't historically worked super well at encouraging a feeling of safety, which, based on my own introspection, seems related to me feeling uncomfortable downvoting something if I think its content is correct, even if its subtext seems to be damaging the conversation.

I also have a sense that the bandwidth allocation frame recommends a few things that will naturally help with a feeling of safety, by increasing the degree to which the top users will talk to one another, as opposed to talking to random strangers on the internet, which I expect to lead to conversations that are both higher in content quality and safer.

Overall, I don't think I have a great synthesis of all the frames above, and generally feel like all of them guide my thinking about moderation at almost every step.

Discussion prompts:

  • What frames am I missing?
  • Am I wrong about the implications of any of the above frames?
  • Do you have any concrete suggestions for moderation systems we should build?
  • Thoughts on what big-picture trajectory we should aim for with our moderation?


33 comments

Most subreddits don't try to solve moderation problems with karma markets, they just publish rules and ban violators. These rules can be quite specific, e.g. /r/programming requires submissions to contain code, /r/gamedev limits self-promotion, /r/math disallows homework problems. We have a decade of experience with all kinds of unproductive posters on old LW, so it should be easy to come up with rules (e.g. no politics, no sockpuppets, no one-liners, don't write more than 1 in 10 recent comments...) Do you think we need more than that?

One frame you can take on this is asking the questions: What rules and guidelines do we want to have? Should we have the same rules and guidelines for all sections of the page? What should be the consequences of violating those rules and guidelines? Are the guidelines fuzzy and in need of interpretation, or are they maximally objective? If the latter, how do you deal with people gaming the guidelines, or with realizing that things are still going wrong even though people are following the guidelines?

I would be strongly interested in people’s suggestions for rules we want to have, and the ideal ways of enforcing them.

Actually, I'm not sure I want to propose any particular rules. LW2.0 is different from old LW after all. Just keep moderating by whim and let rules arise as a non-exhaustive summary of what you do. In my limited mod experience it has worked beautifully.

Yeah, I think this is basically correct, though for some reason other moderators and I have found this particularly emotionally taxing on the current LessWrong. This partially has to do with the kind of intuition that I think underlies a bunch of Christian's comments on this thread, which can often lead to a feeling that every moderation decision has a high likelihood of a ton of people demanding a detailed explanation for it, and that makes the cost of moderating prohibitive in many ways.

I don't think it's necessarily a bad thing when the cost of moderating is high enough to prevent micromanaging.

From my perspective, in most cases where you want to moderate, you want the person who wrote the post you moderate to understand why you made the moderation decision, so that they can act differently in the future. That works a lot better when the person gets an explanation of their mistake.

It works better on the individual level, and I certainly get why this feels more fair and valuable to an individual contributor.

But moderation is not just about individuals learning - it's about the conversation being an interesting, valuable place to discuss things and learn.

Providing a good explanation for each moderation case is a fair amount of cognitive work. In a lot of cases it can be emotionally draining - if you started moderating a site because it had interesting content, but then you keep having to patiently explain the same things over and over to people who don't get (or disagree with) the norms, it ends up being not fun, and then you risk your moderators burning out and conversational quality degrading.

It also means you have to scale moderation linearly with the number of people on the site, which can be hard to coordinate.

i.e. imagine a place with good conversation, and one person per week who posts something rude, or oblivious, or whatever. It's not that hard to give that person an explanation.

But then if there are 10 people (or 1 prolific person) making bad comments every day, and you have to spend 70x the time providing explanations... on one hand, yes, if you patiently explain things each time, those 10 people might grow and become good commenters. But it makes you slower to respond. And now the people you wanted to participate in the good conversations see a comment stream with 10 unanswered bad comments, and think "man, this is not the place where the productive discussion is happening."

It's not just about those 10 people's potential to learn, it's also about the people who are actually trying to have a productive conversation.

If you have 1 prolific person making comments every day that have to be moderated, the solution isn't to delete those comments every day but to start by attempting to teach the person and ban the person if that attempt at teaching doesn't work.

Currently, the moderation decisions aren't only about moderators not responding to unresponded bad comments but moderators going further and forbidding other people from commenting on the relevant posts and explaining why they shouldn't be there.

Karma votes and collapsing comments that get negative karma are a way to allow them to have less effect on good conversations. It's the way quality norms got enforced on the old LessWrong. I think that the cases where that didn't work are relatively few, and that those call for engagement where there's first an attempt to teach the person, and the person is banned when that doesn't work.

(I'm speaking here about contributions made in good faith. I don't think moderation decisions to delete spam by new users need explaining.)

It's important to have a clear moderation policy somewhere, even if the moderation policy is simply, "These are a non-exhaustive set of rules. We may remove content that tries to ignore the spirit of them". People react less negatively if they've been informed.

Yeah. We do have the Frontpage moderation guidelines but they aren't as visible, and we're planning to add a link to them on the post and comment forms.

I very much agree with this approach, and recommend this tumblr post as the best commentary I’ve read on the general principle that suggests it.

I feel like you haven't been particularly happy with that approach of ours in the past, though I might have misread you in this respect. My read was that you were pretty frustrated with a bunch of the moderation decisions related to past comments of yours. Though the crux for that might not be related to the meta-principle of moderation at all, and might simply have more to do with differing intuitions about which comments should be moderated.

Seems plausible that just rules are good enough. There are benefits to having more dynamic stuff, and I would guess that a lot of LessWrong's success partially comes from its ability to cover a large range of subjects, and to draw deep connections between things that initially seem unrelated. At the moment it seems unlikely that a system as restricted as a StackExchange is the best choice for us, but moving more in that direction might get us a lot of the benefit.

The karma thing seems to suffer from the fact that karma doesn't actually have any real use at the moment, hurting the currency metaphor. Something like a Stack Exchange-style system of privileges tied to karma levels might help with that, if indeed we want karma to work as something like a currency.

Having a higher voting weight does feel like a privilege to me.

Stack Overflow really is the model here. Earning new privileges is fun and they space them out nicely!

Yeah, we've actually been moving in that direction a bunch. You should soon get more privileges with more karma.

Idea: at the bottom of the post the author could add "[Moderation policy: please be gentle]" or "[Moderation policy: fight me]", etc. Then other users/moderators would downvote/delete any comments that violate the policy. This is in response to:

multiple top authors telling me that they prefer not to publish on LessWrong because they expect the comments to be overly aggressive and needlessly critical.

If some authors are more sensitive than others, they should be able to say so. On the other hand, I think that needlessly critical comments are important to my experience of LW and I much prefer them to no comments at all. Another benefit is that this costs nothing. If this would work, then maybe it could be implemented as a feature.

Although, there remains a question about what the default policy should be.

I think the moderation frames you listed would be sufficient for the majority of community members. Mild carrots and mild sticks serve as guardrails to control the behavior of the median rationalist who may be somewhat prickly and stubborn.

These frames don’t really seem to have much to say about the highly visible moderation disasters I’ve observed. I’ve seen posters create armies of abusive sockpuppets and outflank attempts to crack down on them. I’ve seen trolls systematically pollute every thread with negativity until they grew bored with the project. I’ve seen cranks post numerous threads in a short period of time and flood the feeds with a hundred comments per day about their idiosyncratic interest. None of these people were, nor could they have been, coerced by softer, reputation-based moderation policies. The reason for the disaster is that they don’t care about their reputation in the community or they actively despise the community.

I’m mentioning this because I feel like this brand of bad poster has done much more damage than the essentially well-meaning but slightly tone-deaf kind of rationalist who usually reacts to a slap on the wrist by actually adjusting their behavior. And the only set of policies I've ever really observed to successfully deal with bad faith posters was (1) to require a small registration fee and (2) to ban people after two or three strikes. And I don’t think there’s anything like the political will to make LW2 a paid service. I just don’t see another general solution.

Instead of building a subdomain system, it might make more sense to change the software in a way that makes it easy to deploy it as open-source.

The EA forum won't be hosted as a subdomain on LessWrong but it could still profit from the same discussion architecture and the same is likely true for a lot of projects.

That would also be good, but having a sub-domain would be much, much easier than having to run, update and support your own server. You would need a certain level of technical skills, which would be a significant barrier to entry.

When it comes to missing frames, you said nothing about explaining norms of behavior and having discussions about norms of how to comment.

The idea that you can talk to people to convince them to act in a better way is missing from a paragraph like:

Allow moderators to give out larger karma rewards and karma penalties, instead of just deleting things or threatening moderation.

Especially in rationalist circles, I would prefer attempts at convincing people over trying to solve the problem with carrots and sticks.

It's not easy to have shared conversations about what's too critical and what isn't but those conversations are likely necessary.

When it comes to missing frames, you said nothing about explaining norms of behavior and having discussions about norms of how to comment.

I did a bit, though I agree it wasn't as clear as I would like it to be. In the recommendations of the methodological frame I said:

We want to make it easy to discuss the collective methodology of the site, and allow authors and commenters to improve the methodological standards we are applying.

And at least in my internal model, this was basically saying exactly that: Focus on explaining the norms, culture and methodology of the community, and generally ensure that productive meta-discussion is happening.

We want to make it easy to discuss the collective methodology of the site, and allow authors and commenters to improve the methodological standards we are applying.

But you also wrote

we might make it so that comments that point out some meta-level feature of the discussion are taxed with a karma penalty. People can still create meta-discussions, but they will have to pay a certain tax to do so.

Do these contradict each other? More generally, I don't understand what's bad about meta. "What environment encourages productive debate" is surely a very important question and possibly something we don't talk about enough.

Yeah, they are both valid considerations, pointing in somewhat opposite directions. Though importantly the second paragraph was about someone bringing meta into an object-level discussion, whereas the first one was about generally making it easy for people to discuss the broad trajectory of the site. I am all in favor of people discussing more stuff in the meta section, but fairly against people derailing a conversation by talking about the moderation guidelines in the middle of an object-level discussion thread.

There are two ways I remember previous admin behavior not following what I have in mind:

Shutting down commenting to a particular comment without explanation and then moving to shut down a discussion about that by shutting down comments again.

Claiming that karma voting isn't made with "the whole website in mind" and nobody knows why people vote the way they did. I think it's part of LW's culture to often have people say "I downvoted because of X" in a comment and then that comment can be discussed.

It's also possible to ask "Why was this downvoted?". This usually results in either someone writing an explanation or readers thinking "I don't see any reason why this should be low karma, I'll upvote it." Those discussions are productive ways to form a shared understanding about what we downvote, and I consider them an important addition to threads on meta.

I agree. Soft power goes a long way. I expect that most of the time a mod messaging someone and asking them to be nicer would work, although you'd want to write the message very carefully because you don't want the person to feel embarrassed and leave LW.

I believe that sub-domains are the most important feature out of all those mentioned. We've seen how successful this has been on Reddit, while most of the other features mentioned are rather speculative.

If a trusted user can moderate their own blog, I would suggest making this highly visible so that people are aware that a different moderation policy applies.

"With multiple top authors telling me that they prefer not to publish on LessWrong because they expect the comments to be overly aggressive and needlessly critical" - I don't know if moderation would actually solve this problem, as I feel that most authors would be very reluctant to delete criticism because of the risk of backlash.

We don't have enough content on this site to separate it into several subdomains. Unless it would still be all in one place, in the spirit of r/all?

Yeah, my current read is that we don’t have enough content for subreddits, which makes me expect that this is quite a while away, until we do have enough critical mass for multiple spaces.

I believe Eigenkarma to be a great feature because it's as simple for the user as our existing system and allows better weighting of votes at the same time.

Eigenkarma would also make limits of participation by karma for commenting on specific threads a better filter.

It doesn't strike me as simple in terms of conceptual load. The user would no longer see a direct, reliable increase in the karma of a post as a result of them upvoting/downvoting it, which actually strikes me as pretty problematic. I.e. in eigenkarma the more outgoing votes you have, the less powerful each marginal vote becomes, and so you would see a decline in the weight of your votes every time you vote on something, which is much less predictable than just knowing that your vote-weight is "4" or something like that.

Like all posts, I think this could be greatly improved by concrete examples. What behavior do we want to discourage? Is LW having moderation problems right now? Do we expect there to be more problems in the future?

multiple top authors telling me that they prefer not to publish on LessWrong because they expect the comments to be overly aggressive and needlessly critical.

Oh, I guess that could be my fault? If that's the case, I'd repeat ChristianKl's point - I would prefer attempts at convincing people over trying to solve the problem with carrots and sticks.

Agree on more concrete examples. I will try to keep that in mind for future posts.