the subreddit size threshold

=culture =institutions

Nobody goes there anymore. It's too crowded.

— Yogi Berra

In the early days of the internet, people on Usenet complained that the influx of new users from AOL was making it worse. I've always thought that how online communities change as they grow is an interesting and important topic. Do they really get worse with size? According to whom? Why would that happen? What can be done about it?

Today, Reddit has over 1 billion monthly active users. It's divided into smaller communities called subreddits, all using the same software. This provides an unprecedented amount of data on the dynamics of online communities.

I haven't done a systematic study of every subreddit, but sometimes I read things on Reddit myself. I mainly do that by using a browser shortcut to see the weekly top posts of a particular subreddit, using the old site version. In doing that, I've gotten a decent idea of how particular subreddits differ, and I've noticed that very large subreddits tend to have lower quality than smaller ones. I'm not the only one; this has been widely noted.

Naively, one might expect that the week's best posts from a larger group of people would be better, and that does seem to be the case up to a point - and then the trend reverses. At 100k users, the derivative of quality vs size is clearly negative. That raises the obvious question: why? Why would large subreddits be worse? Here are the possible reasons I've thought of.

reasons for decline

selection bias

Maybe I'm selecting high-quality subreddits to read, and there are more small subreddits, so some of them will randomly be better. I certainly do select what subreddits I look at, but I don't think that's the reason here, because:

- I've seen changes in quality over time as subreddits grow.
- The variation seems mostly consistent across different ways of selecting subreddits to read.

memes

A common thing that relatively high-quality larger subreddits do is remove meme posts, which are mostly popular images with a few words added to them.

I think the problem with those meme posts is that time spent on posts varies but every upvote is worth the same. Most people who see posts don't even vote on them, and there's some fraction of people who will see a meme, look at it for 2 seconds, upvote, and move on. That upvote is worth the same as an upvote from someone who spent 10 minutes reading an insightful essay.

A similar problem happens with titles that confirm people's preconceptions. For example, if someone really hates Trump, and sees a title that implies "this shows Trump is bad", they might upvote without actually looking at the linked post.

There have been a few attempts at mitigating this by making vote strength variable. Some sites have "claps" instead of "likes", which can be clicked multiple times. There are sites like LessWrong where users can make stronger votes by holding the vote button for a couple of seconds. The problem I have with such systems is that, while individual votes more accurately represent the voter's opinion, the result is a worse average of overall user views. For example, there might be a thread of 2 people arguing, and then 1 person strong-downvotes every post of the other person to make their own argument look relatively better, and then the other person gets mad and does the same, and then those strong votes can outweigh the votes from everyone else.

new post visibility

When you make a new post on a smaller subreddit, it goes directly to the front page, where ordinary users see it and vote on it. On a larger subreddit, new posts are only visible on a special "new" page, which only a small fraction of users visit.

One uncommon thing TikTok did was to show new videos from creators with few followers to a hundred or so people. Videos that got some likes would then be shown to more people. The result was a million high school girls recording their dancing to popular songs and a very successful social media platform.
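Here's a minimal sketch of that kind of staged exposure, to make the mechanism concrete. Everything specific in it is an assumption for illustration: the function names, the 5% like-rate threshold, the 10x growth per round, and the cap. It is not a description of how TikTok or Reddit actually rank content.

```python
def next_audience_size(audience, likes, min_like_rate=0.05, growth=10, cap=100_000):
    """Return the audience size for the next round, or 0 to stop promoting.

    All thresholds here are made-up illustration values.
    """
    if audience <= 0:
        raise ValueError("audience must be positive")
    if likes / audience < min_like_rate:
        return 0  # not enough interest: stop showing the post to new people
    return min(audience * growth, cap)


def total_exposure(like_rate, initial_audience=100, **kwargs):
    """Simulate the rounds for a post whose viewers like it at a fixed rate."""
    cap = kwargs.get("cap", 100_000)
    audience, shown = initial_audience, 0
    while audience > 0:
        shown += audience
        if audience >= cap:
            break  # already at the largest audience we are willing to use
        likes = int(like_rate * audience)
        audience = next_audience_size(audience, likes, **kwargs)
    return shown
```

With these made-up numbers, a post that 1% of viewers like never gets past its first 100 viewers, while one that 20% of viewers like keeps getting promoted until it reaches the cap. The point is that every new post gets judged by ordinary users, not just by the few who browse "new".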

power users

When people upvote your posts on Reddit, a number called "karma" goes up. It doesn't have any actual use or benefit, but some people like making points go up, so you see "karma whores" who try to optimize their point gain. And the bigger a subreddit is, the more points you can get from posting in it.

Part of the reason small subreddits aren't dominated by memes is that there aren't enough people posting adequate memes to dominate the front page. But as a subreddit grows, the long-term result of posters optimizing for karma can be a decline in quality, at least in some ways.

shills

Some people just like making a number go up, but there are also people actually getting paid for their posts. Usually, they're trying to sell a product or push some political agenda. For example, a large percentage of the votes and top comments on /r/politics are from paid shills and the bots they run.

conversations & relationships

In a very small town, people meet the other residents periodically, often enough that their random encounters become connected, with conversations becoming worthwhile and relationships developing. In a very large city, when you meet someone, you'll probably never see them again, so unless you're going to establish a relationship in a few seconds somehow, why bother talking to them?

On streaming sites like Twitch, if there's a stream with 20 viewers, you can have a conversation with the streamer, but with 1000 viewers, you can't. Perhaps there's an analogous effect with online communities.

mitigation

Supposing the above is accurate, what could be done to mitigate those problems?

I already mentioned a few possible mitigations: mods deleting low-effort memes, alternate voting systems, and showing new content to a random sample of people instead of putting it in a separate section. But more generally, Reddit itself can be considered a single large community that mitigates the problems from its size by splitting people into sub-communities.

Taking that idea of splitting up communities to its logical conclusion, what if we just split a community when it hits 100k users? There are a few logical ways to do that. It could be secretly split, with people assigned to a cohort and only seeing the top posts from other cohorts, but not seeing any obvious change. It could be explicitly split, forked into 2 communities with different names that users are divided across; perhaps one of those communities would end up being higher-quality than the other and absorb most of the users, but that should still improve the quality of the culture relative to not splitting. The users could be distributed randomly, or something like "multidimensional scaling" could be used to group similar people together.
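The random version of the hidden split is simple enough to sketch. This is just one way it could be done, under my own assumptions: the function names are hypothetical, users are identified by a string ID, and assignment uses a hash so that a user always lands in the same cohort without the site storing anything extra.

```python
import hashlib


def cohorts_needed(member_count, threshold=100_000):
    """Number of cohorts so that each one stays under the threshold."""
    return member_count // threshold + 1


def cohort_for(user_id, community, n_cohorts):
    """Stable pseudo-random cohort index in range(n_cohorts) for a user.

    Hashing the community name together with the user ID means the same
    user can land in different cohorts in different communities.
    """
    digest = hashlib.sha256(f"{community}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_cohorts
```

One weakness of this simple approach is that changing the number of cohorts reshuffles most users, which would break up whatever relationships had formed inside a cohort. Grouping similar people together would need actual data on user behavior, which this sketch doesn't attempt.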

As I said, I don't think the attempts at variable-strength voting systems have worked well so far, but I think the basic concept could be done better. Perhaps votes could be weighted by something like 1 / (1 + ln(recent_vote_count / karma)) with exponential decay over time for vote count and very slow decay for karma. I'm not sure what the best approach is; I just think there's room for improvement.
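Here's a rough sketch of that weighting, with one deliberate change. Taken literally, 1 / (1 + ln(recent_vote_count / karma)) misbehaves when someone votes much less than their karma: the log goes negative, so the weight exceeds 1, blows up when the ratio is 1/e, and is negative below that. The sketch swaps in ln(1 + ratio), which keeps the weight between 0 and 1. That substitution, the half-life decay helper, and the karma floor of 1 are all my assumptions, not a tested design.

```python
import math


def decayed(value, elapsed_days, half_life_days):
    """Exponentially decay a running total with the given half-life."""
    return value * 0.5 ** (elapsed_days / half_life_days)


def vote_weight(recent_vote_count, karma):
    """Weight of one vote: shrinks as recent voting outpaces karma.

    Uses ln(1 + ratio) rather than ln(ratio) so the result stays in (0, 1].
    Karma is floored at 1 to avoid dividing by zero (an assumption).
    """
    ratio = max(recent_vote_count, 0.0) / max(karma, 1.0)
    return 1.0 / (1.0 + math.log1p(ratio))
```

In use, recent_vote_count would be decayed with a short half-life (say, days) and karma with a very long one (say, years), so a burst of strong-downvoting in one argument quickly cheapens that user's votes, while a long history of well-received posts makes them worth more.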