A Better Web is Coming

matto

(Cross-posted from my blog)

It bums me out when I see how much of the Web is trolling, ads, and endless re-shares of yesteryear's memes. The cynic in me thinks that this is it, this is as good as it gets–we're stuck and cursed to never realize the full potential of this amazing, planet-spanning network we've built. But then, I remind myself that what I see is a coordination problem. And we, humans, have been pretty decent at solving these for thousands of years. Like the time when we went from living in caves to cities (with lights and flush toilets!). Or when we built up from tiny semi-barbarian princedoms to vast organizations like the East India Company. So there's hope.

What would a better Web look like, though? I got a small hint when I read a paper about Wust. Wust is an experimental online discussion system that sort of reminds me of knowledge organization tools such as Obsidian.md or Roam Research–however, instead of being tailored to individuals or small groups like those two, it is designed for large groups of strangers. To accomplish that, it includes features that specifically aim to improve discourse quality. I found that fascinating because if it could do that, it could potentially raise the Web's sanity waterline by a few inches.

I've never seen anything like it before. The more I think about it, the more convinced I am that we will see more Wust-like systems pop up and stem the tide of low-quality content flooding the Web today. To explain why I think so, let me walk you through how Wust works.

Defining the Problem

Wust's creator traces low-quality discourse to two problems: ineffective moderation and the duplication of content.

Moderation is hard. Early on, most of it was done by small groups of volunteers. They enforced norms by bans, warnings, and locking or deleting offensive content. But this approach had problems. First, it was difficult to scale because finding trustworthy, dependable people in a sea of strangers that wanted to volunteer took time. Second, as in other hierarchical systems, some moderators became corrupt and abused their privileges. It all got a little better after Web 2.0 arrived and augmented online communities with simple voting systems. That alleviated both problems by giving users more say in how their group was organized. However, a new problem appeared: making popularity synonymous with quality, which resulted in dank memes often attracting most of the attention and basically burying quality content like SSC posts.

On the other hand, duplication is a permanent tax on a community's creative energy. Let me illustrate what I mean: Alice makes a post asking about the dangers of COVID-19. The post generates some useful insights, but as the discussion dies down, it gets bumped off the front page. People lose track of it. Later, Bob comes around and posts the same question. If he's lucky, someone will link to Alice's post. But if not, he and the others will likely cover the same ground. Repeat this a few times and all you have is a handful of similar posts and a lot of wasted motion.

Wust addresses these problems in two ways. First, it changes the structure of the discussion from a tree to a graph and makes it easy to navigate through a system of tags. Second, it gives users a voting system that encourages participation, effectively moving most moderation responsibility to them.

The Structure of Wust

Today, most online discussion is organized into tree-like structures. A site's entry-point commonly presents users with a list of topics. If you click on one, you find not only comments about the topic, but also comments replying to the comments. It's trees all the way down.

This is suboptimal because:

Because trees emphasize recency and popularity over quality, even high-quality posts slowly disappear out of sight.
Most of these trees are unbalanced–some topics or comments will generate the majority of discussion. So, when attempting to engage with a popular topic, writing a new top level comment will likely lead to it getting ignored, which encourages users to create more new posts even if they're duplicates.
This may sound counter-intuitive but trees are difficult to navigate. As they grow, it takes more and more time to explore all the branches and sift through irrelevant information to find what you're seeking. For moderators, it becomes harder and harder to find and remove content that breaks the rules. Thus, users are further encouraged to create new posts.

In other words, trees implicitly lead toward duplicating content. One can, of course, resist this, but then their content will get less attention and less engagement.

To overcome these issues, Wust structures discussion as a graph. Doing so divorces content from its creation time, allowing the system to promote quality. (I describe how it does this later). However, being less structured, graphs are even more difficult to navigate than trees. Wust's answer to this is a system composed of two types of tags.

First, there are context tags. One or more can be attached to each piece of content, describing what it is about, e.g., "mathematics", "music", "AI risk", etc. In other words, they categorize content by theme, making it easier to search. Second, there are classification tags, which describe how a piece relates to its context(s) or to other content. So, for example, a post titled "What would AGI look like 10 years before it manifests?" would include context tags like "AI" and "Existential Risk." Its classification tags would indicate that it's a "Question" related to another post (e.g., "Preparing for AGI").

Wust keeps tag pollution under control by using a handful of mechanisms:

Only system administrators can create context tags.
Similar classification tags are combined into one. For example, "Canine" and "Dog" point at the same content.
Classification tags support multiple inheritance, so someone searching for "Animal" will also find content tagged with "Dog."
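
To make the last two mechanisms concrete, here is a minimal Python sketch of how synonym merging and multiple inheritance could work together in a tag search. It is not taken from the Wust paper or codebase: the TagIndex class, its method names, and the "Pet" tag and post IDs are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict


class TagIndex:
    """Toy classification-tag index (hypothetical, not Wust's actual design)."""

    def __init__(self):
        self.canonical = {}              # synonym -> canonical tag name
        self.parents = defaultdict(set)  # tag -> set of parent tags
        self.posts = defaultdict(set)    # tag -> ids of posts tagged with it directly

    def resolve(self, tag):
        """Map a synonym such as "Canine" to its canonical tag."""
        return self.canonical.get(tag, tag)

    def add_synonym(self, synonym, tag):
        self.canonical[synonym] = self.resolve(tag)

    def add_parent(self, tag, parent):
        """A tag may have several parents (multiple inheritance)."""
        self.parents[self.resolve(tag)].add(self.resolve(parent))

    def tag_post(self, post_id, tag):
        self.posts[self.resolve(tag)].add(post_id)

    def _is_a(self, tag, ancestor):
        """Walk up the parent links to check whether tag descends from ancestor."""
        stack, seen = [tag], set()
        while stack:
            current = stack.pop()
            if current == ancestor:
                return True
            if current in seen:
                continue
            seen.add(current)
            stack.extend(self.parents.get(current, ()))
        return False

    def search(self, tag):
        """Return posts tagged with the tag, a synonym of it, or any descendant."""
        target = self.resolve(tag)
        found = set()
        for candidate in list(self.posts):
            if self._is_a(candidate, target):
                found |= self.posts[candidate]
        return found


index = TagIndex()
index.add_synonym("Canine", "Dog")   # "Canine" and "Dog" point at the same content
index.add_parent("Dog", "Animal")    # "Dog" inherits from "Animal"...
index.add_parent("Dog", "Pet")       # ...and also from "Pet"
index.tag_post("post-1", "Canine")
index.tag_post("post-2", "Dog")
index.tag_post("post-3", "Animal")

print(sorted(index.search("Animal")))  # all three posts
print(sorted(index.search("Canine")))  # post-1 and post-2
```

Resolving synonyms at write time keeps each post stored under a single canonical tag, so a search never has to merge near-duplicate tags after the fact.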

Consider how all of these features help limit duplication–if users can find existing topics and contribute to them, they will create fewer new posts; if a single post can exist in multiple contexts, users don't have to re- or cross-post it elsewhere.

Voting in Wust

So far, Wust resembles graph-based content organization software. But thanks to its voting system, it allows large groups to literally build knowledge. Imagine Roam Research, except scaled to thousands of users. Or Wikipedia, but less about being an encyclopedia and more about being a group-mind interested in specific problem domains (like rationality, body-building, writing, etc.). How does that even work? Let's start with the basics.

Any Wust user can edit any piece of content. To do that, they submit a change request, which can contain anything from grammar fixes and additional information to new context or classification tags. Others can vote on whether to approve or reject the proposed changes. This should channel users' effort toward improving existing posts rather than creating new ones. To further encourage this, Wust features a karma system.

Users gain karma when their change requests are accepted and lose it when they are rejected (or reverted). With enough karma, users can have their requests bypass the voting process and get applied instantly. When that happens, however, a change request is automatically created to revert the changes, giving the community a chance to weigh in nonetheless. What's more, a user's karma gives their votes more weight, but only in specific contexts. So if someone contributed a lot to "mathematics", they have more say in shaping that context.

Unlike in Reddit or Facebook, Wust posts do not have a score. Instead, users vote on how relevant a post is to its contexts. The resulting number determines a few things:

How many votes are required to change the post. More relevant and more valuable posts will need more votes to alter.
How much karma a user needs to have their change applied instantly. Again, more valuable posts will be harder to alter.
How much karma a user will gain or lose if their change request is accepted or rejected respectively.
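
As a rough illustration of how one relevance number could drive all three quantities, here is a small Python sketch. The paper's actual formulas are not reproduced here, so everything in it is an assumption: the base constants, the linear scale() function, the assumption that relevance lies between 0 and 1, and the vote_weight() rule are hypothetical stand-ins.

```python
from dataclasses import dataclass, field

# Hypothetical base values; the real system's numbers are not given in this post.
BASE_VOTES = 3.0
BASE_INSTANT_KARMA = 50.0
BASE_KARMA_DELTA = 5.0


@dataclass
class Post:
    context: str
    relevance: float  # community-voted relevance, assumed to lie in [0, 1]


@dataclass
class User:
    name: str
    karma: dict = field(default_factory=dict)  # context tag -> karma earned there

    def karma_in(self, context):
        return self.karma.get(context, 0.0)


def scale(post):
    """Assumed linear scaling: more relevant posts are harder to change."""
    return 1.0 + post.relevance


def votes_required(post):
    return BASE_VOTES * scale(post)


def can_apply_instantly(user, post):
    return user.karma_in(post.context) >= BASE_INSTANT_KARMA * scale(post)


def vote_weight(user, context):
    """Karma adds weight to a vote, but only karma earned in the same context."""
    return 1.0 + user.karma_in(context) / 100.0


def tally(post, approvers, rejecters):
    """Compare the weighted net vote against the relevance-scaled threshold."""
    net = sum(vote_weight(u, post.context) for u in approvers)
    net -= sum(vote_weight(u, post.context) for u in rejecters)
    if net >= votes_required(post):
        return "accepted"
    if net <= -votes_required(post):
        return "rejected"
    return "pending"


def settle(author, post, outcome):
    """Reward or penalize the author by a relevance-scaled amount of karma."""
    sign = {"accepted": 1, "rejected": -1}.get(outcome, 0)
    delta = sign * BASE_KARMA_DELTA * scale(post)
    author.karma[post.context] = author.karma_in(post.context) + delta


post = Post(context="mathematics", relevance=0.5)
expert = User("expert", {"mathematics": 100.0})
novices = [User(f"novice{i}") for i in range(3)]
alice = User("alice")

outcome = tally(post, [expert] + novices, [])
settle(alice, post, outcome)
print(outcome, alice.karma_in("mathematics"))
```

In this toy setup, three novices alone cannot push a change through on a moderately relevant post, but the same three plus one user with standing in "mathematics" can. That same user's vote carries no extra weight on a "music" post.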

Additionally, to alleviate the popularity-quality problem, Wust treats page views as downvotes. The reasoning here is that most users are far less likely to downvote content they dislike, usually opting to just close the tab. Stated another way, bad content doesn't get enough downvotes, so Wust remedies that by treating each page view as a partial downvote. (There are mechanisms in place to prevent abuse of this).
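
A tiny Python sketch shows the effect of counting views as partial downvotes. The VIEW_PENALTY weight and the simple subtraction are assumptions made for illustration; the post does not state the formula Wust uses, and the abuse-prevention mechanisms are left out.

```python
VIEW_PENALTY = 0.01  # hypothetical weight of one page view relative to one downvote


def relevance_score(upvotes, downvotes, views, view_penalty=VIEW_PENALTY):
    """Net votes, with every page view counted as a small downvote."""
    return upvotes - downvotes - view_penalty * views


# A widely viewed meme with few upvotes per view...
meme = relevance_score(upvotes=60, downvotes=5, views=10_000)
# ...versus a niche post that most of its readers bothered to upvote.
niche = relevance_score(upvotes=30, downvotes=1, views=200)

print(meme < niche)  # the niche post ranks higher despite fewer upvotes
```

Under plain vote counting the meme would win 55 to 29; once each view carries a small cost, a post has to keep earning upvotes in proportion to its audience to stay on top.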

Putting all this together, we get an adaptive discussion system with a more decentralized way of governing itself, which directs users' behavior toward benefiting the whole group. Let's unpack this.

Because of the karma/voting system, Wust users basically do not own the content they produce–everything they create can be edited by everyone else. This is kind of like a shared garden of sorts.

And by shifting moderation responsibilities from the few to the many, essentially giving users the power to reward and punish, Wust allows a larger portion of users to shape community norms. The price for this, however, is that there will be fewer new posts and edits will have to go through a voting process. Everything will be a little slower, much as democratic governments are a little slower than their centralized counterparts. If the comparison really holds true, we would expect Wust-based communities to also make fewer of the catastrophic mistakes that authoritarian governments make.

Additionally, by moving away from the popular "one user, one vote" paradigm, Wust allows those whose contributions are most valued by others to wield more power. In a way, this mimics real-life informal institutions, where people who have earned trust and standing within a community have more say in decisions about it. I'm thinking of respected shop keepers, teachers, and other "beacons of the community." And, as in real life, Wust's community could reduce a misbehaving "superuser's" karma until they lost their privileges.

Finally, it's worth noting how Wust discourages classic anti-social behavior patterns popular in today's systems. For one, most current systems expose users' vanity metrics like post or upvote counts, which are often taken as a proxy for their standing in the community. But users Goodhart this by creating as much content as possible in the hopes that at least some of it will win the karma lottery. In other words, these systems encourage users to increase their status by inflicting low-quality posts on everybody else.

The Near Future of the Web

The Wust paper was published in 2015. Back then, most people were only beginning to understand the direction the Web was heading. Search engine results were becoming stuffed with SEO-friendly crap. Social media users were discovering how easily "engaged and connected" turned into "distracted and addicted." And workers were learning that slick tools like Slack or Google Suite created a stream of constant interruptions. But it was all shiny and new and few cared.

Today, I feel there's growing discontent with this state of things, which translates into an unmet need, a need to collaborate efficiently on meaningful work. It doesn't matter if it's creating new bodybuilding routines or organizing a group of fanfic writers–the medium of the Web doesn't do much to limit the cost that antisocial users impose on everyone else.

All this means that there's an increasing number of people looking for better ways to communicate on the Web, and Wust marks the general direction things are heading–a more efficient, more bottom-up way of collaborating, with more emphasis on healthy community.

I can't wait to see what's coming.

So Wust is only an idea from a paper, it's not a website you can use now?

I did find this: https://github.com/woost/wust

It looks like an attempt at implementation, but I'm not sure how complete it is. Felix Dietze, the author of the paper, is one of the main contributors.

They also linked to related projects.

Arbital was an attempt at something similar. Not sure where that one's going.

So Wust is only an idea from a paper, it's not a website you can use now?

Yes, as far as I can tell, that github repo is the implementation of Wust created by the author of the paper I describe.

They also linked to related projects.

Wow, thanks for sharing these. I'll spend some time going down this rabbit hole as soon as I get a chance.

I find the context tags on LessWrong useful at times. Links and search bars have some ability to fulfill the function of classification. There is indeed a proliferation of tools to enhance the motivated web user's ability to search for relevant information generally, and I share your enjoyment with them!

That said, I think that the problems of the current version of the web largely stem not from the inaccessibility of these tools, but from the fact that an enormous number of users genuinely demand what is being supplied to them. They want free internet, and they get that through submitting to advertisements. They want memes, and the internet can pretty much inject them directly into their veins at this point. They want a feeling of righteous anger, and that's as available to them as oxygen.

Some people want other things. Finding out better ways to encourage people to exit the cave of shadows is where I think the bottleneck lies. I'm not sure if new technology to support those who've already exited will fix the bad web, but it will make the good web better.

I find the context tags on LessWrong useful at times.

I've found them useless in every iteration. They are extremely inconsistently applied, and those authors who do bother to make an effort often leave them at uselessly large levels of granularity like math or statistics or AI. (Gee, thanks.)

A decent tag or category system needs to be reasonably comprehensive - if not, why even bother, just go straight to Google search - and regularly refined to shrink member count. If there are 1000 members of a category, then it is long past time to break that down into a few sub-categories. When I look at websites whose tags or categories are useful, like Wikipedia or Danbooru or classic folksonomies like del.icio.us (RIP), the tagging itself is a major focus of community efforts and it doesn't require the cooperation of the author to update things.

Any WP editor can refine a category into subcategories or add a category to any article, and there are tools to assist by brute force to clean it all up. It's a huge time-sink of human effort, like everything on WP, but it works, dammit! You can meaningfully browse WP categories and have a reasonable expectation of comprehensiveness, and they do a good job of gradually encoding the structure of all the crosscutting domains. I use them fairly often.

I use tags on gwern.net for pages, and I try to systematically add new tags to all relevant pages and refactor them down into reasonably sharp tags. I think they wind up being reasonably useful, but there's also not enough pages on gwern.net for tags to shine. (When you can simply list all the good pages on the index in a few screens by topic, you've covered the Pareto value of tags.)

What I have been considering is extending tags to external links/documents. I have something like 20k external links + hosted documents, and the sheer volume means that tags are potentially highly useful for them. (A link like "Open-Ended Learning Leads to Generally Capable Agents" would benefit a lot from a set of tags like 'blessings-of-scale multi-agent DeepMind deep-reinforcement-learning' which offer an entrance point to the scores of prior art links to contextualize it.) The problem is how to be systematic? My thinking is that this is a case where I can employ the OA GPT-3 API's "classification" endpoint to do the work for me: I don't scale well, but it does. I can initialize the link tags from my existing directory hierarchy, finetune a GPT-3 model to infer "tag" from "annotation" (GPT-3 is smart enough that it'll understand this very well), use that to rank possible tags for all links, accept/reject by hand, and bootstrap. Then adding new tags can be done by re-classifying all links. A lot of details to get right, but if it works, it'll be almost as good as if I'd been building up a tag folksonomy on my links from the getgo.

The current use-case for which tags work is content discovery, not really comprehensive tagging. There are some nice things that comprehensive tagging gets you, but it's just a really big pile of work, even if you build lots of custom tools for it.

The flow that I think currently works pretty well is:

User is interested in a certain topic, and hasn't read 90% of what already exists on LessWrong
User searches in the search bar or goes to the concepts page
User clicks on a tag
The top-relevance rated posts on that tag are indeed pretty good, and the user finds some content that helps them get oriented about the topic. The important thing here is mostly that the best and most relevant content for any category gets tagged, not that all content in that category gets tagged.

We apply a number of core tags comprehensively to all posts (like the AI one you mentioned), because it allows people to do selective filtering for their frontpage feeds, but those are necessarily high-level, because for the granular ones there isn't really enough content to justify a filter adjustment.

You also still get decent folksonomy benefits of being able to show a user the rough ontology of the site, even without having comprehensive tagging.

Overall, I guess... I don't really get why for the use-case of LessWrong, it's necessary for tagging to be comprehensive, in order for it to be useful. From my perspective most value add is pretty incremental, and the key thing is that the best stuff gets tagged, and that each tag has some posts that can give people a good intro.

I'm sure you know that LW tags are broken down into sub-categories. We seem to lack the energy to apply those sub-categories. This post is tagged "world-optimization," but might be best if it was sub-tagged with "mechanism design" and "coordination/cooperation" at the least. It takes some time to look those tags up, consider which is a good fit, and apply it, and there's no reward for doing so. There's an equilibrium issue as well. If few people are applying specific tags, then the tags remain underused for navigating the site, as well as unknown, thus discouraging their further adoption.

That said, signs of any kind, including these tags, can give somebody the idea to embark on a reading expedition that they might not otherwise have conceived. You're one of our shining lights, so perhaps you are normally driven to engage in thoughtfully directed reading projects. I suspect that many just sorta consume whatever happens to be at the top of the posts list, or whatever strikes their fancy in a sub-link. The idea of reviewing the collected LW writings on blackmail may never occur to them, unless they navigate to it even with our sub-par system of tags. They function as a "suggested reading" feature, and that has utility even if it's not nearly comprehensive or specific enough to be of use to an expert reader.

Hope you do execute some of these optimizations on your site, and let us know about your experience putting them into place.

Yes, the community equilibrium is entirely different. On WP, editors have little compunction about editing categories; here, I know vaguely that tags can be added (although I didn't know that you could refactor them or remove them), but I wouldn't do so because there's no particular norm to do so. Who am I to go about editing matto's post's tags to break down world-optimization into something more specific?

Tags could be useful, but they aren't now, and so they stay being not useful, and it's unrealistic to expect anyone to single-handedly fix that when there's like 10 posts a day and approaching 12 years of backlog.

A GPT-3 proof-of-concept will certainly be interesting. If it works, it could bootstrap useful tags on larger corpuses like LW. (It might be expensive, but it's only money, and a lot cheaper than the expert LWer time it'd take; and of course, if GPT-3 works well, then perhaps a rival model like GPT-J or T5 or Jurassic would be worth finetuning to cut costs.)

That said, I think that the problems of the current version of the web largely stem not from the inaccessibility of these tools, but from the fact that an enormous number of users genuinely demand what is being supplied to them. They want free internet, and they get that through submitting to advertisements. They want memes, and the internet can pretty much inject them directly into their veins at this point. They want a feeling of righteous anger, and that's as available to them as oxygen.

But do they genuinely demand this? If one were to think along the lines of "revealed preference curves", then the answer would be yes - people spend time injecting themselves with memes because this is exactly what they want. However, this reminds me of a book review posted on ACX about addictions:

Underlying Schüll’s foil is a fairly common instinct that people have about the difference between substance addictions (to drugs, alcohol, or nicotine) and behavioral addictions (to gambling, eating, or exercising). Most people think that substance addictions are caused by things, but behavioral addictions are caused by people.

In other words, if most people hear a story about a kindly old grandmother who was prescribed opioids for a backache, and became an opioid addict, they blame the opioids for causing her addictions. Without the opioids, she would still be that kindly old grandmother. In contrast, if most people hear a story about a kindly old grandmother who started going to casinos to have fun on slot machines, and became a gambling addict, they blame the grandmother for having a defective character. Even if she hadn’t visited casinos, she would always have had that character defect.

So is the bad web like opioids or like casinos? Or, where is the line where instead of blaming users, we would blame the designers and builders of addicting web sites?

Didn't Stack Overflow (and in general the stackexchange network) already solve both these problems pretty well? If you're just looking for knowledge creation on a particular topic why wouldn't you use their system?

I think Wust-like systems are heading in a different direction.

The stackexchange network is about questions and answers. While a lot can be accomplished in that format, many things don't fit it well. For example, most questions/answers there are under 1000 words (if I were to give a rough guess, I'd say the median is around 100-150 words). This makes it great for accumulating short bits of knowledge, but I highly doubt that these sites can generate new, interesting knowledge. Additionally, the format of these sites is extremely rigid -- I doubt that StackOverflow would even accept questions about topics like AI risk, for example. And the community has little say in this.

Because of this, I imagine Wust-like systems to be more similar to LessWrong. Places where people can post longform pieces, where whole new domains can be cracked open and explored.

I think this design would be good.

I'm working on the same problem of improving discussion and curation systems with Tasteweb. I focus more on making it easier to extend or revoke invitations with transparency and stronger support for forking/subjectivity. I'm hoping that if you make it easy to form and maintain alternative communities, it'll become obvious enough that some of them are much more good faith/respectful/sincerely interested in what others are saying, and that would also pretty much solve deduplication.
I think in reality, it's too much labor, and it would only work for subjects that people really really care about, but those also happen to be the most important applications to build for so.

I like the focus on relevance. Relevance is all you need. If everyone just voted on the basis of relevance, reddit would be a lot better (but of course, the voters are totally unaccountable, so there's no way to get them to).

I don't think graph visualizations are really useful. The data should be graph-shaped, sure, but it's super rare that you want to see the entire graph or browse through the data that way. A tree is just a clean layout for the results of a query from a particular origin node in a graph. I'd recommend a UI for directed graphs, a tree where things can be mounted to the tree at multiple points, and where it's communicated to the user if they've seen a comment recently before with, e.g., red backlinks.

The tagging system reminds me of Archive of Our Own and the tag-wranglers.

It could take off with some blockchain for buzz, like maybe a distributed ledger system for karma?

It seems like a genuinely collaborative project, where articles are intended to be useful and somewhat more evergreen, would probably end up looking something like Wikipedia or perhaps an open source project.

There needs to be some concept of shared goals, a sense of organization and incompleteness, of at least a rough plan with obvious gaps to be filled in. Furthermore, attempts to fill the gaps need to be welcomed.

Wikipedia had the great advantage of previous examples to follow. People already knew what an encyclopedia was supposed to be.

I suspect that attempts at a “better discussion board” are too generic to inspire anyone. Someone needs to come up with a more specific and more appealing idea of what the thing will look like when it’s built out enough to actually be useful. How will you read it? what would you learn?

Wikipedia revolves around its concept of consensus. A system where changes are voted upon and there's no need to have discussions to come to a consensus will have substantially different dynamics and not simply be another Wikipedia.

I think people should get paid for making awesome content and it shouldn't be through ads.

I like initiatives like these. But they have a major problem, that at the beginning no users will use it because there's no content, and no content is created because there are no users.

To have a real shot at adoption, you need to either initially populate the new system with content from an existing system (here LLMs could help solve compatibility issues), or have some bridge that mirrors (some) activity between these systems.

(There are examples of systems that kicked off from zero, but you need to be lucky or put huge effort in sparking adoption.)

While improvements to moderation are welcome, I suspect it’s even more important to have a common, well-understood goal for the large group of strangers to organize around. For example, Wikipedia did well because the strangers who gathered there already knew what an encyclopedia was.

Tag curation seems a bit like a solution in search of a problem. If we knew what the tags were for, maybe we would be more likely to adopt a tag and try to make a complete collection of things associated with that tag?

Maybe tags (collections of useful articles with something in common) should be created by the researchers who need them? They can be bootstrapped with search. Compare with playlists on YouTube and Spotify.

I would just take issue with how you've defined the problem space: the web is an internet platform that competes with other platforms like iOS, Facebook, etc.

I don't think the problem of bad content is specific to the web. Actually I think that the web is where we are most likely to encounter stuff like LessWrong and Wust.

Yes, I'm being sensitive about this point - I love the web and am sad to see it slowly losing the tech and user battle to the tightly controlled proprietary platforms.

I really like how hashtags work, and I think a big improvement would be to use them in a lot more places. It offers an easy way to underline keywords of posts/comments and make them accessible.

Combine the hashtag concept with the namespace concept, and it gets even better. Could take a while until non-programmers adopt the idea, though.

After all, #ai::timeline and #movie::timeline are very different things.
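
For what it's worth, parsing such tags is straightforward. Below is a minimal Python sketch, assuming "::" as the namespace separator as in the examples above; the HASHTAG pattern and the parse_hashtags() helper are hypothetical, not part of any existing system.

```python
import re

# One or more word segments after '#', joined by '::' (assumed syntax).
HASHTAG = re.compile(r"#([A-Za-z0-9_]+(?:::[A-Za-z0-9_]+)*)")


def parse_hashtags(text):
    """Return each hashtag as a tuple of namespace segments."""
    return [tuple(match.split("::")) for match in HASHTAG.findall(text)]


tags = parse_hashtags("Compare #ai::timeline with #movie::timeline, or plain #rationality.")
print(tags)
```

Keeping the namespace as a separate tuple element means "timeline" under "ai" and "timeline" under "movie" never collide, while a plain tag is just a one-element tuple.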
