Models of moderation

by habryka, 2nd Feb 2018

[Author's note: I will move this into meta in a week, but this is a bit more important than the usual meta-announcements, so I will keep it in community for a bit.]


This post is trying to achieve roughly four things:

1. Be a future reference for some hopefully useful models about moderation

2. Give people a model of me in particular and how I think about moderation (since I have a fair amount of control over moderation on the site)

3. Ask the community for input on the moderation systems we should have on LessWrong

4. Provide some context for moderation changes we have planned that we will explain in a follow-up post

Thoughts on moderation

When I make decisions about moderation, there are at least five major models that drive them:

  1. People are competing for limited resources, and moderation helps to move the people participating into a better Nash equilibrium by enacting legislation that rewards cooperative behavior and punishes defective behavior. [microeconomic frame]
  2. There is a distinction between adversarial and non-adversarial states of mind, and the goal of a moderation policy is to cause participants to generally feel safe and deactivate their adversarial instincts. [safety frame]
  3. There is a limited amount of bandwidth available to communicate, and the goal of a moderation policy is to allocate that bandwidth to users that will use that bandwidth for the greatest common good. [bandwidth allocation frame]
  4. There is a shared methodology that underlies the rationalist community that establishes what forms of reasoning are effective, and what perspectives on the world are fruitful. The goal of a moderation policy is to nudge people to use the approaches to reasoning that actually work (and that the community agrees work). [methodology frame]
  5. We don't really know what moderation policies and technologies work before we see them, but we can judge fairly easily which discussions have been productive. Different people might also need different moderation policies to be effective. So the most important thing is to encourage experimentation with different moderation principles and technologies, and then set up incentives for the most effective ways of moderating to propagate. [libertarian frame]

I think all of these describe meaningful aspects of reality and are lenses I often apply when thinking about moderation and culture on LessWrong. I want to spend this post discussing what changes the above frames recommend, and how they have influenced our plans for moderation on the site. Note that these lenses do not necessarily describe different aspects of the territory; they are often just different abstractions over the same considerations and facts.

1. The microeconomic frame

In the microeconomic frame, most problems want to be solved by better karma allocation. Karma is the cash of LessWrong, and if we tax things appropriately, the market will find a naturally effective equilibrium that will generally avoid bad Nash equilibria.

One of the easiest ways to improve moderation in this frame is to allow better allocation of votes, by letting users give different amounts of karma to different users.

The other obvious improvement is to simply tax behaviors that we think could damage the quality of the discourse. In this world, we might make it so that creating a comment that points out some meta-level feature of the discussion is taxed with a karma penalty. People could still create meta-discussions, but they would have to pay a certain tax to do so.
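
To make the tax idea concrete, here is a minimal sketch. The tax amount (`META_COMMENT_TAX`), the `is_meta` flag, and the rule that users who can't afford the tax are blocked from posting are all hypothetical choices for illustration, not anything LessWrong has implemented.

```python
from dataclasses import dataclass

# Hypothetical tax amount; the post does not specify one.
META_COMMENT_TAX = 5


@dataclass
class User:
    name: str
    karma: int


@dataclass
class Comment:
    author: str
    text: str
    is_meta: bool  # assumed to be set by the author or a moderator


def post_comment(user: User, text: str, is_meta: bool = False) -> Comment:
    """Create a comment, charging a karma tax if it is meta-level."""
    if is_meta:
        # Assumption: users who cannot pay the tax cannot post meta comments.
        if user.karma < META_COMMENT_TAX:
            raise ValueError(f"{user.name} cannot afford the meta-comment tax")
        user.karma -= META_COMMENT_TAX
    return Comment(author=user.name, text=text, is_meta=is_meta)


alice = User("alice", karma=20)
post_comment(alice, "Object-level reply")  # no tax
post_comment(alice, "I think this thread has become adversarial.", is_meta=True)
print(alice.karma)  # 15
```

Meta-discussion stays possible; it just has a price that object-level replies don't.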

In general, here are the things I think this frame suggests about how to design our moderation and karma system:

  1. Make it so that people can allocate their votes in more granular ways, and make it so that people can transfer karma somehow (e.g. pay karma to give someone a larger upvote, or something like that).
  2. Allow moderators to give out larger karma rewards and karma penalties, instead of just deleting things or threatening moderation.
  3. Ensure that lower karma does indeed correspond to lower visibility and a lower influence of writing (i.e. the money in the system needs to actually matter for this to work).
  4. Generally focus on technologies like Eigenkarma that promise to better allocate karma according to quality and desired visibility of posts.
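
The post doesn't define Eigenkarma precisely. One plausible reading, sketched below under that assumption, is a PageRank/EigenTrust-style computation: a vote is weighted by the voter's own karma, and the random "teleport" step only lands on a hypothetical set of trusted `seeds`, so a ring of accounts that only upvote each other gains nothing. The function name, the `votes` format, and the damping value are illustrative.

```python
def eigenkarma(votes, seeds, damping=0.85, iterations=100):
    """PageRank-style karma: each user's karma flows to the users they upvote.

    votes[u][v] is the non-negative total upvote strength u has given v.
    seeds is a hypothetical set of trusted users that receive the teleport mass.
    Returns a dict of scores that sums to 1.
    """
    users = sorted(set(votes) | {v for t in votes.values() for v in t} | set(seeds))
    teleport = {u: (1.0 / len(seeds) if u in seeds else 0.0) for u in users}
    karma = dict(teleport)
    for _ in range(iterations):
        new = {u: (1.0 - damping) * teleport[u] for u in users}
        for voter in users:
            targets = votes.get(voter, {})
            total = sum(targets.values())
            if total == 0:
                # Users who never vote hand their weight back to the seed set.
                for u in users:
                    new[u] += damping * karma[voter] * teleport[u]
            else:
                # Split the voter's karma in proportion to vote strength.
                for target, strength in targets.items():
                    new[target] += damping * karma[voter] * strength / total
        karma = new
    return karma


votes = {
    "alice": {"bob": 1.0, "carol": 1.0},
    "bob": {"carol": 1.0},
    "carol": {"alice": 1.0},
    "mallory": {"sockpuppet": 5.0},
    "sockpuppet": {"mallory": 5.0},
}
scores = eigenkarma(votes, seeds={"alice"})
```

With these votes, `mallory` and `sockpuppet` end up with zero karma because no trusted user ever votes for them, while `carol`, who is upvoted by both `alice` and `bob`, ends up ahead of `bob`.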

2. The safety frame

The safety frame naturally suggests that we want to limit most conversations to a small group of people who have already built up trust with each other, and that for conversations between people who haven't built up trust, we should set up very strong incentives that minimize behavior that puts other people into an adversarial state of mind.

In this frame, there are two major error modes to avoid:

  1. Users are afraid of each other, and generally optimize for a mixture of defensiveness and trying to make the rhetorically strongest case against people they disagree with.
  2. Users are afraid of the moderators and are hesitant to post things because they expect to be punished or possibly even publicly humiliated by the moderators.

(1) is the outcome of having almost no moderation, or no incentives that discourage adversarial behavior. (2) is the outcome of moderation that is too strong or too unpredictable.

To avoid either of these, you want to make sure that the moderation policy is transparent and predictable, while also being strong enough to actually cause people to feel like they can have reliably good experiences when reading and engaging with the comments.

Here are some things that come to mind when I am considering this frame:

  1. Allow users to restrict who can comment on posts on their personal blogs (e.g. "only users above N karma") to ensure a higher level of average trust between the participating users.
  2. Allow trusted users to moderate their own posts on their personal blogs.
  3. Do not allow everyone to moderate or restrict who can comment on their posts, since that makes people afraid of being moderated, or of having their effort wasted when their comments are deleted.
  4. Generally encourage high levels of charity, and positively incentivize contributions that make everyone feel safe, even if they do not contribute to the object-level discussion (e.g. incentivize mediation).
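
The first and third points interact: authors can raise the bar for commenters, but only once they are trusted themselves. Here is a minimal sketch of that permission check, assuming karma is the measure of trust; `TRUSTED_AUTHOR_KARMA`, `set_comment_threshold`, and `can_comment` are hypothetical names and numbers.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold for being trusted to restrict comments.
TRUSTED_AUTHOR_KARMA = 1000


@dataclass
class User:
    name: str
    karma: int


@dataclass
class Post:
    author: User
    min_comment_karma: Optional[int] = None  # None means anyone may comment


def set_comment_threshold(post: Post, requester: User, threshold: int) -> None:
    """Let a trusted author restrict commenting on their own post."""
    if requester is not post.author:
        raise PermissionError("only the author can restrict comments")
    if requester.karma < TRUSTED_AUTHOR_KARMA:
        raise PermissionError("author is not yet trusted to restrict comments")
    post.min_comment_karma = threshold


def can_comment(post: Post, user: User) -> bool:
    """Check a commenter against the post's minimum-karma rule, if any."""
    return post.min_comment_karma is None or user.karma >= post.min_comment_karma


veteran = User("veteran", karma=5000)
newcomer = User("newcomer", karma=10)
post = Post(author=veteran)
set_comment_threshold(post, veteran, threshold=100)
print(can_comment(post, newcomer))  # False
print(can_comment(post, veteran))   # True
```

Because the check is explicit and tied to a published threshold, commenters can predict in advance whether they are allowed in, which is the kind of transparency the safety frame asks for.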

3. The bandwidth allocation frame

The bandwidth allocation frame generally wants to allocate bandwidth to users according to how likely they are to contribute in a high-value way to a conversation. This means that in a discussion of a technical subject, someone with expertise in that specific subject should get more bandwidth than someone with no expertise. And in general, people with a track record of valuable contributions should be given more bandwidth.

In this frame, the big problem of moderation is identifying who is most likely to contribute positively to a conversation. You can use karma, but most naive formulations of it can't really account for expertise in a given topic or domain. Additionally, some users might interact particularly badly with other users, and so you generally want to avoid putting them into the same conversation.

One set of possible solutions lets users self-select in various ways: creating private groups, limiting visibility or commenting to friends lists, and similar mechanisms. These tend to scale pretty well and be reasonably effective, but they often come at the cost of creating filter bubbles and deepening political divides, since members of a group avoid contact with perspectives they disagree with.

Another set of solutions is reputation-based: you try to estimate someone's likelihood of contributing positively based on their past behavior and achievements, such as participation in threads on similar topics in the past.

In general, here is my guess at the things this frame suggests about changes to our moderation and karma system:

  1. If you expect a post to get a lot of comments, it is probably better to limit participation to a group of users who will have a valuable conversation, as opposed to having a giant wall of comments that nobody can really read, and that prevents a real conversation from happening through constant interruption.
  2. Allow users to choose for themselves who they want to contribute to the problems they are trying to solve. This could involve letting users allow only people they trust to comment on their content, selectively ban individuals they've had bad experiences with from commenting on their content, or request answers from specific individuals who they expect to have expertise in the topic they care about (similar to how on Quora you can request answers from specific people).
  3. Allow users to create posts where participation is limited to certain predefined subsets of people, such as people above a certain karma threshold, people who have read a set of posts that are relevant to fully understanding the content, or people who have achieved certain milestones on the site (such as commenting n times).
  4. Generally optimize the degree to which users can read and participate in a discussion at various levels of time investment, by creating easy ways to filter a discussion for only the best contributions and similar interventions.
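
Participation limits like the ones in point 3 compose naturally if each criterion is a predicate over a user, so an author can mix and match them per post. This is a hypothetical sketch: the `User` fields, the rule helpers, and the post IDs are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class User:
    name: str
    karma: int
    comment_count: int
    read_posts: set = field(default_factory=set)


# A participation rule is any predicate over a user.
Rule = Callable[[User], bool]


def min_karma(threshold: int) -> Rule:
    return lambda user: user.karma >= threshold


def has_read(*post_ids: str) -> Rule:
    required = set(post_ids)
    # The user must have read every required post.
    return lambda user: required <= user.read_posts


def min_comments(n: int) -> Rule:
    return lambda user: user.comment_count >= n


def all_of(*rules: Rule) -> Rule:
    return lambda user: all(rule(user) for rule in rules)


rule = all_of(has_read("sequence-intro", "map-and-territory"), min_comments(10))
reader = User("reader", karma=50, comment_count=12,
              read_posts={"sequence-intro", "map-and-territory"})
skimmer = User("skimmer", karma=500, comment_count=40,
               read_posts={"sequence-intro"})
print(rule(reader))   # True
print(rule(skimmer))  # False
```

Note that `skimmer` has far more karma than `reader` but is excluded because they haven't read the background material, which is the kind of expertise signal that plain karma can't capture.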

4. The methodology frame

In the methodology frame, the key question of moderation is "what mental algorithms are best at making collective epistemic progress, and how can we encourage people to adopt these algorithms?".

This frame encourages a certain systematization of how discussion is structured, with content and comments following a structure and rules that we think will be productive. The two big constraints on these methods are (A) the complexity of the methods and (B) the effectiveness of the methods. We want to find simple methodologies that people can actually follow, that we can build common knowledge around, and that actually allow us to make collective epistemic progress.

Importantly, the LessWrong community already has a lot of strong shared models about correct methodologies, mostly shaped by the Sequences and their connection to the scientific method. This frame encourages us to explicate these models, and to generally encourage content to follow the methodological recommendations of science and the Sequences.

Here are my guesses at what this frame recommends:

  1. Comments could be categorized based on a few different purposes, such as "editorial recommendations", "counterarguments", "appreciations", "anecdotes" and "inspired thoughts".
  2. We should focus on creating content guidelines that are based on scientific methodology and on what we think good reasoning looks like, and we should offer significant rewards for what we think is good reasoning.
  3. We want to make it easy to discuss the collective methodology of the site, and allow authors and commenters to improve the methodological standards we are applying.
  4. Establishing a shared culture strikes me as highly related, and as such we want to encourage art and media that creates a strong shared culture that has the correct methodology at its core.

5. The libertarian frame

In the libertarian frame, we want to focus on allowing as much experimentation with moderation policies, methodologies and incentive structures as possible. We then want to pick the things that worked best and apply the lessons learned from them across the site.

Here are my guesses at what this frame recommends:

  1. Allow users to create subdomains of LessWrong, somewhat similar to the individual Stack Exchange sites, where they can have their own moderation policies, content guidelines and culture.
  2. Give users complete moderation power over their own blogs and over the comments on their own blogs.
  3. Allow users to personalize their feed and their subscriptions.
  4. Establish multiple forms of karma that track different things, and that track reputation in different domains of the site (e.g. meta-karma, AI-karma, rationality-karma, getting-shit-done-karma, etc).
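
Domain-specific karma could be as simple as keeping a separate tally per user per tag. The sketch below assumes each post carries topic tags and credits a vote to every tag on the post; the class, method names, and tag strings are hypothetical.

```python
from collections import Counter, defaultdict


class DomainKarma:
    """Track separate karma totals per user and per domain (tag)."""

    def __init__(self) -> None:
        # Maps user -> Counter of domain -> karma points.
        self._scores = defaultdict(Counter)

    def record_vote(self, author: str, tags: list, strength: int) -> None:
        # A vote on a post counts toward every domain the post is tagged with.
        for tag in tags:
            self._scores[author][tag] += strength

    def karma(self, user: str, domain: str) -> int:
        # Counter returns 0 for domains the user has no votes in.
        return self._scores[user][domain]


ledger = DomainKarma()
ledger.record_vote("alice", ["ai", "rationality"], 3)
ledger.record_vote("alice", ["meta"], -2)
print(ledger.karma("alice", "ai"))                 # 3
print(ledger.karma("alice", "meta"))               # -2
print(ledger.karma("alice", "getting-shit-done"))  # 0
```

Crediting every tag in full is one design choice; splitting a vote's strength across a post's tags would stop heavily tagged posts from inflating several domains at once.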


I think most of the recommendations above are things we want to take seriously, and most of them aren't in that much conflict. But we have a limited development budget and limited time, so the question becomes which things are most important to implement soon.

I am currently most worried about the safety aspect of LessWrong as it stands, with multiple top authors telling me that they prefer not to publish on LessWrong because they expect the comments to be overly aggressive and needlessly critical. This also appears to be a problem on the internet in general, and I've heard from many people in a variety of fields that they have a lot of trouble having open and honest conversations in online public spaces, because they have a constant sense of being attacked.

I think the recommendations of the safety frame are pretty reasonable. However, I don't really know what a well-functioning moderation system that encourages positivity and safety would look like in practice. Moderators coming in and tone-policing strikes me as dangerous and prone to sparking conflict. Karma hasn't historically worked super well at encouraging a feeling of safety, which, based on my own introspection, seems related to my discomfort with downvoting something whose content I think is correct, even when its subtext seems to be damaging the conversation.

I also have a sense that the bandwidth allocation frame recommends a few things that will naturally help with a feeling of safety, by increasing the degree to which the top users talk to one another, as opposed to talking to random strangers on the internet, which I expect to produce conversations that are both higher in quality and safer.

Overall, I don't think I have a great synthesis of all the frames above, and generally feel like all of them guide my thinking about moderation at almost every step.

Discussion prompts:

  • What frames am I missing?
  • Am I wrong about the implications of any of the above frames?
  • Do you have any concrete suggestions for moderation systems we should build?
  • Thoughts on what big-picture trajectory we should aim for with our moderation?