
This is a description of a project I’ve been thinking about in the last few weeks. I’m looking for feedback and collaborators. The name [BetterDiscourse] is provisional.

Updated on 9 Dec 2023: Completely overhauled the "Data and training" section, which no longer critically relies on LessWrong as a source of the initial training data and as a testing ground.

Problem

Social media promote tribalism and polarisation. Communities drift towards groupthink even if they were specifically founded to be strongholds for rational discourse and good epistemics.

Comment sections under media articles and YouTube videos are frequently dominated by support for the author (or the pundit guest of the show); finding rational critique there is time-consuming, like finding a needle in a haystack.

X’s Community Notes, pol.is, and its development variant viewpoints.xyz are universally lauded, but they all use the principle of “finding the most uncontroversial common ground”, which is by definition low-information for the discourse participants (because most of them should already be expected to be on board with this common ground).

Big Audacious Goal aka mission statement: improve the global online discourse. Among humans, anyway, but perhaps AI debate could benefit from the mechanics that I describe below as well.

Solution

Each user has a state-space model (SSMs are on fire right now) that represents their levels of knowledge in this or that field, beliefs, current interests, ethics, and aesthetics, in order to predict whether the user will find a particular comment insightful/interesting/surprising, clarifying, mind-changing, or reconciling.

To provide feedback to the model, a browser extension adds the corresponding reactions (along with negative counterparts to the above reactions: “Nothing new”, “Unclear/Muddled”, “Disagree”, “Combative/Inflaming”) to comments on popular platforms: Reddit, YouTube, X/Twitter. (During a later stage, [BetterDiscourse] may also host comments for media that don’t have comment sections themselves, such as NYT or YouTube videos with comments disabled, and display them through the same browser extension, like glasp.co is doing.)

Then, when the user opens a piece of media or a comment section on some site, the browser extension simply sorts the comments for them[1].
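The local sorting step can be sketched as follows, assuming the predictive model already returns a probability for each reaction. The reaction names follow the list above; the weights and the function names (`comment_score`, `sort_comments`) are hypothetical illustrations, not a settled design.

```python
from typing import Dict, List

# Hypothetical weights: positive reactions raise a comment's score, their negative
# counterparts lower it. "disagree" gets weight 0 so that disagreement alone is not
# penalised (otherwise the sort would drift towards confirming the user's views).
REACTION_WEIGHTS: Dict[str, float] = {
    "insightful": 1.0,
    "clarifying": 1.0,
    "mind-changing": 1.5,
    "reconciling": 1.0,
    "nothing new": -0.5,
    "unclear": -0.5,
    "disagree": 0.0,
    "combative": -1.0,
}


def comment_score(predicted: Dict[str, float]) -> float:
    """Expected value of a comment given the predicted reaction probabilities."""
    return sum(REACTION_WEIGHTS.get(r, 0.0) * p for r, p in predicted.items())


def sort_comments(predictions: Dict[str, Dict[str, float]]) -> List[str]:
    """Return comment IDs ordered from highest to lowest expected value."""
    return sorted(predictions, key=lambda cid: comment_score(predictions[cid]), reverse=True)


predictions = {
    "c1": {"insightful": 0.1, "nothing new": 0.8},
    "c2": {"insightful": 0.6, "combative": 0.1},
    "c3": {"clarifying": 0.3, "unclear": 0.2},
}
print(sort_comments(predictions))  # prints ['c2', 'c3', 'c1']
```

In practice the weights could be exposed to the user, e.g., to switch between "most popular" and "most informative for me" orderings.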

I think this should be very valuable already in this completely local regime. However, things may get even more interesting: to recapitulate the “collaborative filtration power” of Community Notes, Pol.is, and Viewpoints.xyz, (active) users’ feedback is aggregated to bubble the best comments up for new users, or for users who choose not to vote actively enough to tune their predictive models well. Furthermore, when users with similar states have already voted positively for comments that their models didn’t predict, such comments could be shown earlier to other users in the same state-space cluster, overriding the predictions of their models.
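The cluster-level override could look roughly like this. It is a sketch under several assumptions that are not part of the proposal: user states are plain vectors, "same cluster" is approximated by cosine similarity above a threshold, a positive vote counts as "unpredicted" when the voter's own model gave it a low probability, and a fixed number of such surprised peers moves a comment to the front. All thresholds and names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

SIMILARITY_THRESHOLD = 0.9  # hypothetical: how close two states must be to count as one cluster
SURPRISE_THRESHOLD = 0.2    # hypothetical: predicted probability below which a positive vote is "unpredicted"
MIN_PEERS = 2               # hypothetical: how many surprised peers trigger the override


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def surprised_peer_counts(
    me: Sequence[float],
    peers: Dict[str, Sequence[float]],
    positive_votes: List[Tuple[str, str, float]],  # (peer_id, comment_id, peer model's predicted probability)
) -> Dict[str, int]:
    """Count, per comment, similar peers who liked it although their own model did not predict it."""
    counts: Dict[str, int] = {}
    for peer_id, comment_id, predicted in positive_votes:
        if predicted < SURPRISE_THRESHOLD and cosine(me, peers[peer_id]) >= SIMILARITY_THRESHOLD:
            counts[comment_id] = counts.get(comment_id, 0) + 1
    return counts


def apply_override(my_order: List[str], counts: Dict[str, int]) -> List[str]:
    """Move comments with enough surprised peers to the front, keeping the rest in model order."""
    boosted = [c for c in my_order if counts.get(c, 0) >= MIN_PEERS]
    rest = [c for c in my_order if counts.get(c, 0) < MIN_PEERS]
    return boosted + rest


me = [1.0, 0.0]
peers = {"u1": [0.99, 0.1], "u2": [0.95, 0.2], "u3": [0.0, 1.0]}
votes = [("u1", "c3", 0.1), ("u2", "c3", 0.05), ("u3", "c2", 0.1), ("u1", "c1", 0.9)]
counts = surprised_peer_counts(me, peers, votes)
new_order = apply_override(["c1", "c2", "c3"], counts)
```

Here `c3` is promoted because two nearby users liked it against their models' expectations, while `u3`'s vote is ignored as coming from a different cluster.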

More concretely, [BetterDiscourse] creates two models:

  • The predictive model takes as input the user’s state, the language embedding of the comment (or of the original piece of content itself, such as an article), native rating/vote signals for the comment on the host platform, and existing reactions to the comment on [BetterDiscourse]; it outputs the user’s expected reactions to the content or comment.
  • The transition model takes as input the user’s state, the language embedding of the content, and the user’s actual reactions to the content (or lack thereof), and outputs the user’s next state.
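To make the two interfaces concrete, here is a toy sketch in which both models are single linear layers with random, untrained weights. This is only a stand-in: a real implementation would use a trained SSM, and the state size, the gated update rule, and the class name `ToyUserModel` are all assumptions made for illustration.

```python
import math
import random
from typing import Dict, List, Sequence, Set

# Reaction vocabulary from the "Solution" section (positive ones, then their negative counterparts).
REACTIONS = ["insightful", "clarifying", "mind-changing", "reconciling",
             "nothing new", "unclear", "disagree", "combative"]


def matvec(m: List[List[float]], v: Sequence[float]) -> List[float]:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class ToyUserModel:
    """Linear stand-ins for the predictive and transition models (random, untrained weights)."""

    def __init__(self, state_dim: int, emb_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        n_r = len(REACTIONS)
        # Predictive inputs: state + comment embedding + native platform score + existing reaction counts.
        n_pred = state_dim + emb_dim + 1 + n_r
        # Transition inputs: content embedding + the user's actual reactions (all zeros = no reaction).
        n_trans = emb_dim + n_r
        self.w_pred = [[rng.gauss(0, 0.1) for _ in range(n_pred)] for _ in range(n_r)]
        self.w_trans = [[rng.gauss(0, 0.1) for _ in range(n_trans)] for _ in range(state_dim)]
        self.decay = 0.9  # fraction of the old state kept on each update

    def predict(self, state: Sequence[float], embedding: Sequence[float],
                platform_score: float, reaction_counts: Sequence[float]) -> Dict[str, float]:
        x = [*state, *embedding, platform_score, *reaction_counts]
        return {r: sigmoid(z) for r, z in zip(REACTIONS, matvec(self.w_pred, x))}

    def transition(self, state: Sequence[float], embedding: Sequence[float],
                   reactions: Set[str]) -> List[float]:
        r = [1.0 if name in reactions else 0.0 for name in REACTIONS]
        update = [math.tanh(z) for z in matvec(self.w_trans, [*embedding, *r])]
        return [self.decay * s + (1 - self.decay) * u for s, u in zip(state, update)]


model = ToyUserModel(state_dim=4, emb_dim=3)
state = [0.0] * 4
comment_emb = [0.2, -0.1, 0.5]
probs = model.predict(state, comment_emb, platform_score=math.log1p(12), reaction_counts=[0] * 8)
state = model.transition(state, comment_emb, {"insightful"})
```

The gated update keeps most of the old state on every step, which loosely mirrors the idea that a single comment should shift, not overwrite, what the model knows about the user.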

Data and training

The user’s state and model inference should, of course, stay local to the user, and the models themselves open sourced.

Self-supervised pre-training

Update 19 Dec 2023: the idea of this pre-trained model has been developed into "SociaLLM: proposal for a language model design for personalised apps, social science, and AI safety research".

The training data are the dialogues or comment threads on forums and discussion boards where user IDs are stable across dialogues and the messages are timestamped, so all of a user’s messages within a particular forum or website can be globally ordered in time.
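The data preparation implied here can be sketched as below, assuming each scraped message is a record with the hypothetical fields `forum`, `thread_id`, `user_id`, `timestamp`, and `text`. Each user's history is ordered globally in time, and each message's thread context is the set of earlier messages in the same thread.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Message:
    forum: str
    thread_id: str
    user_id: str
    timestamp: float  # e.g. Unix seconds
    text: str


def order_messages(messages: List[Message]) -> Dict[Tuple[str, str], List[Message]]:
    """Group messages by (forum, user) and sort each user's messages globally in time.

    User IDs are only assumed to be stable within one forum, so the key includes the forum.
    """
    by_user: Dict[Tuple[str, str], List[Message]] = defaultdict(list)
    for m in messages:
        by_user[(m.forum, m.user_id)].append(m)
    for history in by_user.values():
        history.sort(key=lambda m: m.timestamp)
    return dict(by_user)


def thread_context(messages: List[Message], target: Message) -> List[Message]:
    """All earlier messages in the same thread: the context for predicting `target`."""
    return sorted(
        (m for m in messages
         if m.forum == target.forum and m.thread_id == target.thread_id
         and m.timestamp < target.timestamp),
        key=lambda m: m.timestamp,
    )


msgs = [
    Message("f", "t1", "alice", 3.0, "c"),
    Message("f", "t1", "bob", 1.0, "a"),
    Message("f", "t2", "alice", 2.0, "b"),
]
histories = order_messages(msgs)
```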

First, all messages are converted into embeddings with an off-the-shelf language model[2].

The SSM’s predictive model is trained to predict the next message’s embedding in a thread from the previous messages’ embeddings, with the state of the next message’s author as the latent variable (in the terminology of JEPA).

The probabilistic state of the comment’s author is updated with the transition model upon every message that the user posts.

Thus, this architecture should learn to represent the user’s beliefs and other personal features in the state[3].
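A heavily simplified training step is sketched below. Assumptions, all mine: the thread context is summarised by the mean of the previous embeddings, the predictor is linear in the context and the author's state, the loss is squared error, and plain SGD updates only the predictor. In the actual design, the transition model would be trained jointly (e.g., by backpropagating through the sequence of state updates), which this toy omits; `ToyPretrainer` is a hypothetical name.

```python
import random
from typing import List

Vec = List[float]
Mat = List[List[float]]


def matvec(m: Mat, v: Vec) -> Vec:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def mean(vectors: List[Vec], dim: int) -> Vec:
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


class ToyPretrainer:
    """Predict the next message embedding from the thread context and the author's latent state."""

    def __init__(self, emb_dim: int, state_dim: int, lr: float = 0.05, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.emb_dim, self.state_dim, self.lr = emb_dim, state_dim, lr
        self.a = [[rng.gauss(0, 0.1) for _ in range(emb_dim)] for _ in range(emb_dim)]    # context -> embedding
        self.b = [[rng.gauss(0, 0.1) for _ in range(state_dim)] for _ in range(emb_dim)]  # state -> embedding
        self.t = [[rng.gauss(0, 0.1) for _ in range(emb_dim)] for _ in range(state_dim)]  # transition (fixed here)

    def predict(self, context: List[Vec], state: Vec) -> Vec:
        ctx = mean(context, self.emb_dim)
        return [x + y for x, y in zip(matvec(self.a, ctx), matvec(self.b, state))]

    def step(self, context: List[Vec], state: Vec, target: Vec) -> float:
        """One SGD step on 0.5 * squared error; returns the loss before the update."""
        ctx = mean(context, self.emb_dim)
        err = [p - t for p, t in zip(self.predict(context, state), target)]
        for i, e in enumerate(err):  # gradient w.r.t. row i of A is e * ctx, of B is e * state
            for j in range(self.emb_dim):
                self.a[i][j] -= self.lr * e * ctx[j]
            for j in range(self.state_dim):
                self.b[i][j] -= self.lr * e * state[j]
        return sum(e * e for e in err) / 2

    def transition(self, state: Vec, own_message: Vec, decay: float = 0.9) -> Vec:
        """Update the author's state after they post a message (untrained in this toy)."""
        upd = matvec(self.t, own_message)
        return [decay * s + (1 - decay) * u for s, u in zip(state, upd)]


model = ToyPretrainer(emb_dim=3, state_dim=2)
context = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
author_state = [0.5, -0.5]
target = [0.3, 0.3, -0.2]
losses = [model.step(context, author_state, target) for _ in range(200)]
author_state = model.transition(author_state, target)  # the author has now posted `target`
```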

Fine-tuning on X’s Community Notes data

The pre-trained predictive and transition models could be straightforwardly fine-tuned to predict reactions and to take reactions data as input, respectively.

X’s Community Notes data is open and seems to be suitable for this.

Perhaps the main complication with this dataset is that community notes frequently contain links, and to train a good “text-based” predictive model, the pages behind these links should be crawled and summarised by a powerful LLM.
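The link-handling step might start like this. The URL pattern is deliberately simple, and `summarise_url` is a hypothetical placeholder for "crawl the page and summarise it with an LLM"; no particular crawler or LLM API is implied.

```python
import re
from typing import Callable, Dict, List

# A deliberately simple URL pattern; real notes may need a more robust parser.
URL_RE = re.compile(r"https?://[^\s)\]>\"']+")


def extract_links(note_text: str) -> List[str]:
    """Return unique URLs in order of first appearance, without trailing punctuation."""
    seen: Dict[str, None] = {}
    for match in URL_RE.findall(note_text):
        seen.setdefault(match.rstrip(".,;:!?"), None)
    return list(seen)


def enrich_note(note_text: str, summarise_url: Callable[[str], str]) -> str:
    """Append summaries of linked pages so that a text-only model sees what the links say."""
    summaries = [f"[{url}] {summarise_url(url)}" for url in extract_links(note_text)]
    return "\n".join([note_text, *summaries])


links = extract_links("See https://example.com/a. Also (https://example.org/b) and https://example.com/a")
```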

The reactions in the Community Notes dataset are similar to the reactions suggested in the “Solution” section above, so they will provide a decent training signal.
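One way to turn Community Notes ratings into [BetterDiscourse]-style reaction labels is a fixed mapping from rating tags to reactions. Both the column names and the mapping below are assumptions: the names are modelled on the "helpful*"/"notHelpful*" tag columns of the public ratings files and should be checked against the current Community Notes data documentation, and the mapping is a judgement call.

```python
from typing import Dict, Mapping, Set

# Assumed tag columns -> reaction labels (hypothetical mapping; verify column names first).
TAG_TO_REACTION: Dict[str, str] = {
    "helpfulUniqueContext": "insightful",
    "helpfulInformative": "insightful",
    "helpfulImportantContext": "insightful",
    "helpfulClear": "clarifying",
    "notHelpfulHardToUnderstand": "unclear",
    "notHelpfulArgumentativeOrBiased": "combative",
    "notHelpfulIncorrect": "disagree",
    "notHelpfulNoteNotNeeded": "nothing new",
}


def reactions_from_rating(row: Mapping[str, str]) -> Set[str]:
    """Turn one rating row (column -> "0"/"1" string, as read from a TSV) into a set of reactions."""
    return {reaction for tag, reaction in TAG_TO_REACTION.items() if row.get(tag) == "1"}


example_row = {"helpfulClear": "1", "helpfulUniqueContext": "1", "notHelpfulIncorrect": "0"}
labels = reactions_from_rating(example_row)
```

Rows could be read with `csv.DictReader(f, delimiter="\t")`, which yields exactly this kind of column-to-string mapping.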

Economics

Reading comments and diligent voting is time-consuming work. It should be compensated by a token, similar to Brave’s Basic Attention Token.

Reactions to comments on a particular content piece are aggregated and distributed by special types of nodes (let’s call them Story Nodes), which could perhaps themselves be open-sourced and spun up for arbitrary communities, including those where the content and comments are not public.

Story Nodes charge the token for accessing the best consensus sorting of comments. It is bought by people who don’t want to spend a lot of time reading through the comments and voting, but want to access comments with the best ordering, saving their time and maximising value from reading. Story Nodes share the tokens with the users who reacted to the comments “well”.

Credit assignment to the users is not trivial here: to prevent abuse of the system with junk voting, it should be based on free energy reduction (FER), i.e., formally, just the difference in the free energy of the Story Node before and after receiving a reaction from the user[4].
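A minimal sketch of FER-style credit, under strong simplifying assumptions that are mine: the Story Node's belief about one comment is a single probability that the comment is insightful; each voter's reliability is a pair of likelihoods, P(up | insightful) and P(up | not insightful), that the node has already estimated; and the Shannon entropy of the belief is used as a crude stand-in for the node's free energy. Function names are hypothetical.

```python
import math
from typing import Tuple


def binary_entropy(p: float) -> float:
    """Entropy (in bits) of a yes/no belief held with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def update_belief(prior: float, vote_up: bool, reliability: Tuple[float, float]) -> float:
    """Bayes update of P(comment is insightful) after one vote.

    reliability = (P(up | insightful), P(up | not insightful)) for this voter.
    """
    p_up_good, p_up_bad = reliability
    like_good = p_up_good if vote_up else 1 - p_up_good
    like_bad = p_up_bad if vote_up else 1 - p_up_bad
    evidence = prior * like_good + (1 - prior) * like_bad
    return prior * like_good / evidence if evidence > 0 else prior


def fer_credit(prior: float, vote_up: bool, reliability: Tuple[float, float]) -> float:
    """Credit = uncertainty of the node's belief before the vote minus uncertainty after it."""
    posterior = update_belief(prior, vote_up, reliability)
    return binary_entropy(prior) - binary_entropy(posterior)


discerning = fer_credit(0.5, True, (0.8, 0.2))    # votes track quality: credit about 0.278 bits
always_up = fer_credit(0.5, True, (1.0, 1.0))     # upvotes everything: the vote carries no information
```

A voter who upvotes everything has identical likelihoods under both hypotheses, so their vote leaves the belief unchanged and earns zero credit. Note that the realised credit of a single vote can be negative when it contradicts a confident belief; only the expected reduction is guaranteed to be non-negative.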

Users who intend to make [BetterDiscourse] their income source (perhaps there should be such users to reach a sufficient voting density on many resources) will try to react to comments that don’t have many reactions yet (to maximise the Story Node’s FER from their reaction) and that are expected to still receive a lot of readership in the future: Story Nodes will pay out to voters based on their expected token revenue projections as of the moment of receiving the reactions.
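The payout rule can be sketched as a proportional split of a Story Node's payout pool, where each reaction is weighted by its FER times the node's revenue projection at the moment the reaction arrived. The proportional rule, the clipping of negative credit, and the function name are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def payouts(
    reactions: List[Tuple[str, float, float]],  # (user_id, fer, projected revenue at reaction time)
    pool: float,
) -> Dict[str, float]:
    """Split `pool` tokens across users in proportion to FER x projected revenue."""
    weights: Dict[str, float] = {}
    for user, fer, projected in reactions:
        w = max(fer, 0.0) * max(projected, 0.0)  # negative credit is simply ignored here
        weights[user] = weights.get(user, 0.0) + w
    total = sum(weights.values())
    if total == 0:
        return {user: 0.0 for user in weights}
    return {user: pool * w / total for user, w in weights.items()}


split = payouts([("early", 0.3, 100.0), ("late", 0.3, 10.0), ("spammer", 0.0, 100.0)], pool=110.0)
```

An equally informative reaction earns more when it arrives while the story is still expected to attract many readers, which is the incentive described above.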

Story Nodes are spun up by Story Node operators when they predict the story is going to be popular (e.g., a new video on a popular YouTube channel). When attention to the story has died out, the Story Node is shut down, so there are no ongoing storage costs.

The real question is whether the cost of all inference happening on all nodes involved in [BetterDiscourse] has dropped below the market value that [BetterDiscourse] delivers, namely saving time and maximising value from reading comments. I’m not sure of that. But even if this is not currently (end of 2023) the case, we should expect it to become so in one or two years, as the cost of inference continues to drop exponentially.
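The break-even argument can be made explicit under one illustrative assumption (not a measurement): the inference cost per user-month, C(t), halves every τ months from today's value C_0, while the value delivered per user-month, V, stays fixed. The break-even time t* then solves C(t*) = V:

```latex
C(t) = C_0 \, 2^{-t/\tau}, \qquad
C(t^\ast) = V \;\Longrightarrow\; t^\ast = \tau \log_2 \frac{C_0}{V} \quad (\text{for } C_0 > V).
```

For instance, with purely hypothetical numbers, if inference currently costs four times the delivered value and halves every six months, break-even arrives after 6 · log2(4) = 12 months.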

Secondary data uses and cross-sector integrations

The predictive model could directly be used for prioritising content the user doesn’t have time to process in full, such as newsletters, posts on Telegram, Mastodon, and RSS feeds. This is not the initial use case, because it would probably require a very big and powerful SSM to work well, and such a big SSM couldn’t be trained initially. Also, the impact is limited because closed platforms such as Facebook, X/Twitter, YouTube, and Reddit won’t permit accessing their feeds programmatically.

Building the whole content recommendation and delivery platform (such as Substack) should be treated as a separate problem. If [BetterDiscourse] gains a lot of users, these platforms will themselves be interested in buying reactions from [BetterDiscourse] users to train their recommendation models, so that if the platform later onboards a person who is already a [BetterDiscourse] user, they will receive good recommendations from the very beginning, and then continuously respond to their changing state.

Apart from Story Nodes and content platforms, users could also sell their reactions to market researchers and political researchers, or donate them to social scientists and alignment researchers.

There are also many opportunities for leveraging the user’s state as input for other personal AI apps, such as a networking (dialogue-matching) app, a movie or event recommendation app, a psychotherapy app, or a general personal assistant.

Spam and manipulation

There are a few strategies to combat spam and opinion manipulation on [BetterDiscourse].

First, to rule out bots, some Story Nodes may accept reactions only from users with some kind of proof-of-humanness identity, such as Worldcoin.

However, this is not necessary in principle, and I’m curious what will happen with the discourse if AI agents can participate as voters, apart from depriving real human users of all opportunity to earn by using the system. We can speculate how this could lead to the emergence of “AI opinion leaders”, or be connected with debate, but this is outside of the scope of this post.

If voting is human-only, people could try to game [BetterDiscourse] to earn more tokens. If that happens, standard anti-fraud data analysis can be used, financed by Story Node operators.

Other risks

The main risk that I see in the whole project is the product risk: that is, people won’t want to prioritise informationally best comments, and that their main motivation for reading comments is confirming their pre-existing worldviews. This is sort of what is customary to expect, but leaning into my optimism bias, I should plan as if this is not the case. (Otherwise, aren’t we all doomed, anyway?)

If it turns out that letting AI react to comments leads to bad results or rampant spam and manipulation (battling which requires so much compute that it makes the whole system uneconomical), gatekeeping reactions aggregation on proof-of-humanness will limit the initial reach of the system a lot, i.e., only to the people who already have a proof-of-humanness ID, which is well under 1% of people today.


If you are interested in building [BetterDiscourse], please leave a comment, or send me an e-mail at leventov.ru@gmail.com.

Thanks to Rafael Kaufmann and Connor McCormick (folks from Digital Gaia and my fellows in Gaia Consortium) for many discussions that have led to the development of this project idea.

  1. ^

    YouTube makes this harder by not loading many comments at once.

  2. ^

    This is an optimisation to reduce our initial training costs and difficulty. Ultimately, the SSM could predict text tokens directly rather than embeddings from another language model.

  3. ^

    Note that simple fine-tuning, which is currently the dominant approach to representing personal beliefs in language modelling, is not applicable here, because we want to be able to learn something relevant about the user from just a few comments (or, ultimately, from a few reactions to comments in a [BetterDiscourse] deployment). Fine-tuning requires vastly more data to be at all effective.

  4. ^

    Even this is not the end of the credit assignment story. For example, the credit that initially went to a good-sounding but misleading comment should be forwarded to “rebuttal” replies. I don’t have a complete vision of credit assignment in [BetterDiscourse] yet.


I think this should be very valuable already in this completely local regime. However, things may get even more interesting: to recapitulate the “collaborative filtration power” of Community Notes, Pol.is, and Viewpoints.xyz, (active) users’ feedback is aggregated to bubble the best comments up for new users, or for users who choose not to vote actively enough to tune their predictive models well. Furthermore, when users with similar states have already voted positively for comments that their models didn't predict, such comments could be shown earlier to other users in the same state-space cluster, overriding the predictions of their models.

 

I used to think this wouldn't reach a critical mass of high-quality active users, but I've started warming up to this idea. Just yesterday I was talking to some friends who basically described how they pack-hunted to debunk right-wing political commentary on Instagram and news sites. And these are Brazilian diaspora normies in their 40s, highly educated but not the highly motivated teenage nerd persona that I would normally envision as an active contributor in this kind of thing. So I think if we find a way to help people like this, who already see collaborative moderation as an important public duty, by increasing and making more visible the material impact of their contributions, we can achieve critical mass and at least initially overcome the deluge of noise that characterizes online commentary.

Same. I think the internet discussion system situation is dire enough, since Reddit's recent API changes, and since twitter became quite hostile to free accounts, that right now a critical mass of people would actually be willing to install a browser extension to see a universal comment section.

I agree that there is a huge technical and execution/growth-hacking/product risk in making the flywheel of [BetterDiscourse] self-sustaining.

But I want to emphasize that I think the "good reactions feedback data from production use -> better-trained SSM -> more value to the user even in a completely "local" type of use -> more data" loop is more important initially than the network effect (the latter is supposed to attract users who are not active voters, and revenue from them). In particular, training the original version just on LessWrong data, although it's high-quality, may not produce a good SSM because it will likely overfit to the particular topics, language, and worldviews that appear on LessWrong, rather than, let's say, in political discourse in Portuguese.

I work on similar designs for personalized/subjective curation of annotations and comment sections. Mainly orienting around topic-specific Webs of Trust (networks of endorsements about specific personal qualities).

WoTs are transparent, collaborative, scalable (or they will be when I'm done with them) and fully human-controlled, which I think is potentially really important for buy-in. If an algorithm disappoints someone, they may just give up on it, while if the network of humans who they respect disappoints them, they're more likely to be patient with it, and more importantly, they're more likely to feel like there's something they can do about it and work proactively to improve it.

So that sense of complete transparency and controllability might actually be totally crucial, in which case an algorithmic approach might not get adoption. But if you can be transparent about what the algorithm is optimizing for and give users enough control over that or the algorithm, in theory this distrust or impatience shouldn't be so much of an issue, could go either way. Of course it would be auspicious if human self-assembling structures could outperform aggregate algorithmic predictions but even I wouldn't actually bet on it, I think the algorithmic side of things is important. Maybe a choice between both should be offered.

I'd urge you to design a UX where, rather than soliciting an absolute metric (ratings or likes), users submit relative comparisons between comments they have seen. Otherwise I'm fairly sure the system will optimize for the behavior of liking every comment that comes into view, which is a sort of unnatural state of stasis where the system is no longer receiving much information from ratings. It's probably not where we want to end up.

  • The UX that springs to mind for me is showing the titles of comments on the left (or if on mobile, circles representing the comments) and allowing the user to reorder them by dragging as they go, to communicate their ranking. Holding a circle would show a preview of it as a reminder. I'm not sure what you'd do about all the noise you'd get from users frequently being too ambivalent or forgetting to do this, but hey, the same problem exists with likes.

Regarding echo chambers, I'd suggest a feedback with a meaning like "this comment most Advanced my Perspective" (sometimes called "Changed my View"). As long as people sometimes earnestly seek out interesting information (which is a common and natural behavior), they will cross paths with their outgroup often and sometimes learn something from them.

Hope to keep in touch. Right now I'm investigating the prospect of just building a new kind of browser that centers cross-language ocap wasm APIs, treats the DOM as a second class representation format, and presents a new UI API that tries to be a lot more friendly towards third party app extensions, code signing, and annotation stability, but if someone else is going to do it on the traditional web, I'll still try to help.

I'd urge you to design a UX where, rather than soliciting an absolute metric (ratings or likes), users submit relative comparisons between comments they have seen. Otherwise I'm fairly sure the system will optimize for the behavior of liking every comment that comes into view, which is a sort of unnatural state of stasis where the system is no longer receiving much information from ratings. It's probably not where we want to end up.

I think comment comparison is too demanding and still most often just doesn't make sense. I don't feel I ever want to compare comments on LessWrong, for instance.

The problem of junk voting is addressed here in the post:

Credit assignment to the users is not trivial here: to prevent abuse of the system with junk voting, it should be based on free energy reduction (FER), i.e., formally, just the difference in the free energy of the Story Node before and after receiving a reaction from the user[4].

The user who always upvotes everything will contribute zero information signal to the Story Node, thus the user will receive zero FER for its contribution.

Right now I'm investigating the prospect of just building a new kind of browser that centers cross-language ocap wasm APIs, treats the DOM as a second class representation format, and presents a new UI API that tries to be a lot more friendly towards third party app extensions, code signing, and annotation stability

I would first focus on finding a good niche and building a real, big WoT. Only then, once the network effect kicks in, will you have real leverage and a real chance to pull people to a new browser, which is extremely hard and should provide a lot of value from the beginning.

Consider starting with Telegram, by creating a bot that doesn't permit posting in a chat unless the user has the minimum trust among the existing members of the chat. The problem of spam in Telegram is huge right now. It also provides organic starting points: selected communities could quickly solve the spam problem among themselves, so you don't need to have the global network effect before kicking in the local network effect.
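The admission rule for such a bot can be sketched independently of Telegram itself. The sketch assumes the current members of the chat are trusted seeds and that a newcomer may post once enough of them have vouched for the newcomer; the threshold, the data layout, and the function name are hypothetical, and no Telegram Bot API calls are shown.

```python
from typing import Dict, Set


def may_post(user: str, members: Set[str], endorsements: Dict[str, Set[str]],
             min_endorsements: int = 2) -> bool:
    """Allow posting for existing members, or for newcomers vouched for by enough members.

    endorsements[a] is the set of users that member `a` vouches for.
    """
    if user in members:
        return True
    vouchers = sum(1 for m in members if user in endorsements.get(m, set()))
    return vouchers >= min_endorsements


members = {"a", "b", "c"}
endorsements = {"a": {"x", "y"}, "b": {"x"}, "c": set()}
```

Because each chat only needs endorsements from its own members, this gating works locally before any global trust network exists.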

I don't feel I ever want to compare comments on LessWrong, for instance.

It is the way I vote (looking through as many comments as I can bear to and deciding how I think the ordering could be improved), and I think it's a better way to vote! The usual way has a pretty serious pathology where they'll tend to vote on comments that're already most upvoted, which actually decreases the usefulness of the vote scores (but I suppose that wouldn't apply to a predictor system.)

Likes could be reframed as a comparison over the comments that the user has looked at, sorting those comments into two buckets, with a dense network layer of comparison edges going from each unliked comment to each liked comment. If we consider strong and weak upvotes and downvotes as feedback instead of the binary like/dislike, that could be treated as a sorting of the comments that the user has seen into five buckets, though it's arguable that the unvoted bucket should be treated as an N/A, or "I didn't read or have feelings about this" answer, and not counted.

And I guess, now that I think about that, that's a pretty good UX for this. I think having two buckets is too crude, while four might actually be the maximum detail we can expect.
It's kind of funny that LessWrong could implement this system without presenting any visible indication of it. If they did so, I would probably continue complaining about its absence for at least a year.
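Reading bucketed votes as pairwise comparisons can be sketched like this, using four buckets and treating the unvoted bucket as N/A, as discussed above. The bucket names and the function are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

# Bucket order from worst to best; None means "not voted / no opinion" and is skipped.
BUCKETS = ["strong_down", "weak_down", "weak_up", "strong_up"]


def comparisons(votes: Dict[str, Optional[str]]) -> List[Tuple[str, str]]:
    """Return (worse, better) pairs implied by the buckets a user sorted comments into."""
    rank = {b: i for i, b in enumerate(BUCKETS)}
    voted = [(c, rank[b]) for c, b in votes.items() if b is not None]
    edges: List[Tuple[str, str]] = []
    for (c1, r1), (c2, r2) in combinations(voted, 2):
        if r1 < r2:
            edges.append((c1, c2))
        elif r2 < r1:
            edges.append((c2, c1))
        # Comments in the same bucket yield no comparison.
    return edges


edges = comparisons({"a": "weak_up", "b": "strong_down", "c": None, "d": "weak_up"})
```

The resulting (worse, better) pairs could feed any pairwise ranking model, and plain likes are just the two-bucket special case.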

thus the user will receive zero FER for its contribution.

I haven't been thinking in terms of paid review yet. It seems important! I guess I feel like a platform has to work for users who aren't financially invested in it before they'll be interested in paying for anything.

The problem of spam in Telegram is huge right now

That's true, but is Telegram important? If you wanted a more open system for group chats, why not just use Discord? I'd be a bit more interested in solving this for Element (which presumably doesn't have Discord's algorithmic moderation system and will be overrun with spam as soon as anyone depends on it; though federation also offers a solution, at least outside of the default instances, as it's essentially a two-layer web of trust, or a web of trust between instances), but I guess due to the project I'm currently considering, I don't feel like any of these platforms are going to be used in the future. They're all woefully inflexible and high-friction relative to what could be built on a better web.

So, when we have that better web, my current comfiest adoption path would be... I get a small community of creative hackers interested, they have a huge amount of fun with it, they develop loads of features to the point where it becomes seriously useful for organizing and managing an org's data, some organizations start to adopt it, and after it refines and streamlines in response to their insights, it becomes a necessity for operating in the modern world.
I should probably try to think of something better than this, but this is the trajectory I'm on.

That's true, but is Telegram important?

Telegram is the dominant social, communication, and media platform in the Russian-speaking part of the internet. I think it is more dominant than Facebook was in the US in its heyday (and you surely heard that for many people, "Facebook meant the internet"). So currently, for many Russian speakers, the internet is basically YouTube for videos + Telegram for everything else.

My understanding (though I'm not sure) is that Telegram is also dominant in Iran and Ethiopia (combined population > 200 million), but I have no idea what the situation with spam is in these sectors of Telegram.

I think Telegram is also huge in Brazil, but not dominant.

If you wanted a more open system for group chats, why not just use Discord?

This is a rhetorical question. I'm just telling you where a lot of people are right now, and where LLM-enabled spam is a huge problem right now. I think these are the conditions that you should be looking for if you want to test Web of Trust mechanisms at scale. But, of course, you might make a normative decision not to try to help Telegram grow even bigger because you are not satisfied with its level of openness and decentralisation. Though, I want to note that Telegram is more open than any other major messaging platform: its content API is open, and anyone can create alternative clients.

But, of course, you might make a normative decision not to try to help Telegram grow even bigger because you are not satisfied with its level of openness and decentralisation

It is likely. I don't want to extend the reign of systems that aren't deeply upgradeable/accountable/extensible.

And it's not even as simple as proprietary vs open source, an open source project can be hostile to contributions, or lack processes for facilitating mass transitions in standards of use.

The usual way has a pretty serious pathology where they'll tend to vote on comments that're already most upvoted, which actually decreases the usefulness of the vote scores (but I suppose that wouldn't apply to a predictor system.)

This is specifically one of the problems [BetterDiscourse] is conceived to address. Like, there are many "basically reasonable" positions/comments that I am happy to promote through an upvote (and most people vote this way, too), but that have low information content for me because they are already my position, or close to my position. With separate upvote/downvote and insightful/not reactions, I can switch between looking at the most popular positions among the crowd (and Pol.is, Viewpoints.xyz, and Community Notes further remove political bias from this signal, thus prioritising the "greatest common denominator" position), and the comments that are most likely to have the greatest informational value for me personally.

And to make it clear, the claim that such "informational value first" comment ordering model is realistically trainable on user's reactions to comments on different topics, and quickly, i.e., only on a few or a few dozen reactions from the user, is currently a hypothesis. I'm not sure there are good ways to test this hypothesis short of just trying to train such a model and see whether a large portion of people will find it useful.

At the beginning of the "Solution" section, I wrote that in principle, the information value of the comment should be in part predictable from "user's levels of knowledge in this or that field, beliefs, current interests, ethics, and aesthetics", but there is a big question mark whether this information could be easily inferred from user's reactions to other comments, or assessed for a comment in isolation when the prediction model is applied to it.

there is a big question mark whether this information could be easily inferred from user's reactions to other comments

Right... I think it can't, recognizing that is equivalent to being able to recognize surprising truth, it's kind of AGI-complete.
There are not so many top experts in any particular niche, and as soon as any are identified, there comes to be a huge bulk of users who will imitate them, so actual experts won't be an obviously important category to the recommender engine and it might not be able to tell them apart from their crowd.

For that we may depend on more explicit systems like webs of trust for expert recommendations. Users have to apply their own intelligence to identify the real (probable) experts, explicitly communicate that recognition, and they have to see that the experts have endorsed the comment being shown to them.
We follow experts because their taste differs from ours, because their recommendations are not intuitive to us.

I should ask, is free energy reduction something we actually know how to train? I can see a way of measuring it, but it's not economically feasible.

Thanks for sharing your work. Some comments on that:

Why Tasteweb hasn't been made already
I think it's mostly the fact that querying very large trust graphs is slow.

Oh, I highly doubt that the reason was the technical problem :) If anything, we should interpret it as a sign that there hasn't been strong enough demand for solving that technical problem as of yet. Perhaps now there will be, due to LLM-enabled spam. Telegram has been drowning in spam for the last half a year, and Twitter and YouTube struggle, too.

Are you in contact with https://subconscious.network/ developers? They may benefit from the algorithms that you develop.

Of course it would be auspicious if human self-assembling structures could outperform aggregate algorithmic predictions but even I wouldn't actually bet on it, I think the algorithmic side of things is important. Maybe a choice between both should be offered.

I see these systems as more complementary: Webs of Trust for moderation, filtering, and user gating (perhaps, as the key piece of decentralised content delivery platforms/networks), and algorithms for content ordering that has already passed the filter. In fact, WoT is one of the ways to do proof of personhood, and I recognise that it might be a critical foundation for [BetterDiscourse], while centralised proof-of-humanness systems such as WorldCoin may have too slow adoption.

Are you in contact with https://subconscious.network/ developers? They may benefit from the algorithms that you develop.

No but I'm aware of them, what they're doing sounds pretty cool, and yeah, it is the kind of moderation system that you need for bootstrapping big collaborative wikis.

[checks in on what they're doing] ... yeah this sounds like a good protocol, maybe the best. I should take a closer look at this. My project might be convergent with theirs. Maybe I should try to connect with them. Darn, I think what happened was I got them confused with Fission (who do a hosting system for wasm in IPFS, develop UCAN. Subconscious uses these things, and has the exact same colors in its logo), so I've been hanging out with Fission instead xD.

I see these systems as more complementary: Webs of Trust for moderation, filtering, and user gating (perhaps, as the key piece of decentralised content delivery platforms/networks), and algorithms for content ordering that has already passed the filter.

I was thinking the same thing. I ended up not mentioning it because it's not immediately clear to me how users would police the introduction of non-human participants in an algorithmic context, since users are interacting less directly; if someone starts misbehaving (IE, upvoting scam ads), it's hard for their endorsers to debug that.
Do you know how you'd approach this?

Additionally, the tasteweb work is about making WoTs usable for subjective moderation, it seems to me that you actually need WoTs just to answer an objective question of who's human or not (which you use to figure out which users to focus your training resources on), and then your algorithmic system does the subjective parts of moderation. Is that correct? In that case, it might make sense for you to use existing old fashioned O(n^2) energy propagation algorithms, you could talk to alignment ecosystem's "eigenkarma network" people about that. Algorithm discussed here.
Or, I note, you could instead use multi-origin Dijkstra (O(n)) (or the min Dijkstra from any of the known humans), to update metrics of who's close to the network of a few confirmed human-controlled accounts.
For some reason I seem to be the only one who's noticed that distance is an adequate metric of trust that's also much easier to compute than the prior approaches. I think maybe everyone else is looking for guidance from the prior art, even though there is very little of it and it obviously doesn't scale (I'm pretty sure you could get that stuff to run on a minute-long cycle for 1M users, but 10M might be too much, and it's never getting to a billion.)
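The multi-origin Dijkstra idea can be sketched as follows, assuming endorsements form a directed graph whose non-negative edge lengths encode how weak an endorsement is (the graph format and function name are hypothetical). Every confirmed human starts at distance 0, and each account's distance to the nearest confirmed human serves as its (inverse) trust; unreachable accounts get no entry.

```python
import heapq
from typing import Dict, Iterable, List, Tuple

Graph = Dict[str, List[Tuple[str, float]]]  # endorser -> [(endorsed account, non-negative edge length)]


def distances_from_humans(graph: Graph, confirmed_humans: Iterable[str]) -> Dict[str, float]:
    """Multi-source Dijkstra: shortest distance from any confirmed human to every reachable account."""
    dist: Dict[str, float] = {}
    heap = [(0.0, h) for h in confirmed_humans]
    heapq.heapify(heap)
    while heap:
        d, node = heapq.heappop(heap)
        if node in dist:  # already settled with a shorter or equal distance
            continue
        dist[node] = d
        for neighbour, length in graph.get(node, []):
            if neighbour not in dist:
                heapq.heappush(heap, (d + length, neighbour))
    return dist


graph = {"h1": [("a", 1.0)], "a": [("b", 1.0)], "h2": [("b", 0.5)], "c": [("d", 1.0)]}
trust_distance = distances_from_humans(graph, ["h1", "h2"])
```

With a binary heap this runs in O((V + E) log V) time; accounts `c` and `d` are absent from the result because no confirmed human reaches them.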

Update, checked out the subconscious protocol. It's just okay. Doesn't have finality. I'm fairly sure something better will come along.

I'm kind of planning on not committing to a distributed state protocol at first. Maybe centralizing it at first while keeping all the code abstracted so it'll be easy to switch later.

Edit: Might use it anyway though. It is okay, and it makes it especially easy to guarantee that it will be possible for users to switch to something more robust later. It has finality as long as you trust one of the relays (us).
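One way to keep the code abstracted, sketched under the assumption of a Python backend: application code depends only on a small storage interface, and the first implementation is a centralized store. `StateStore`, `CentralizedStore`, and `record_reaction` are hypothetical names; a distributed-protocol backend would be another subclass, and its details are not shown here.

```python
from abc import ABC, abstractmethod
from typing import Optional


class StateStore(ABC):
    """Hypothetical storage interface that the rest of the code depends on."""

    @abstractmethod
    def put(self, key: str, value: bytes) -> None:
        """Store (or overwrite) the value for a key."""

    @abstractmethod
    def get(self, key: str) -> Optional[bytes]:
        """Return the stored value, or None if the key is unknown."""


class CentralizedStore(StateStore):
    """Initial backend: a single server-side store (here just an in-memory dict)."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)


def record_reaction(store: StateStore, user: str, comment_id: str, reaction: str) -> None:
    """Application code only sees the StateStore interface, so a distributed
    backend can later be swapped in without touching this function."""
    store.put(f"reaction/{user}/{comment_id}", reaction.encode("utf-8"))
```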

(SSMs are on fire right now)

Maybe just point to the relevant paper? https://arxiv.org/abs/2312.00752

people won’t want to prioritise informationally best comments, and that their main motivation for reading comments is confirming their pre-existing worldviews. This is sort of what is customary to expect, but leaning into my optimism bias, I should plan as if this is not the case. (Otherwise, aren’t we all doomed, anyway?)

There are countermoves to this. Preferences and behaviors are malleable. There can be incentives for adopting BetterDiscourse (potentially through public good funding), peer pressure, etc.

Haven't read your entire post yet, but I agree broadly with the idea. I'm unsure of your methodology, but I think knowledge has to be built from the ground up. Lack of understanding leads to frustration. Upvote systems encourage difficult concepts to be not simply described but also taught/explained thoroughly, rather than just 'pointed at'.

For example, I can understand on some level if someone tries to explain to me why object-oriented design patterns in programming are inferior to procedural ones, but if I've never written programs with either methodology, I will only understand the broadest strokes; none of the examples or reasoning given will really resonate with me.

On average, when describing any concept, a certain number of people will have the necessary 'base understanding' to grok it based on the explanation, and an additional number of people will need significantly more explanation to understand.

I think at one extreme, you have an explanation from someone with an extremely autistic brain, going into far more detail than one might need and assuming the listener is lacking all relevant information.

On the other side, you have the schizophrenic or manic-brained explanation, which describes things completely intuitively, assuming that the listener understands all of the unspoken elements without needing them explained. Most people would think that it just sounds like complete gibberish.

I think the perfect middle ground is the 'highly esteemed teacher-brained explanation', someone who describes things both basically and intuitively in perfect amounts, so the widest audience is capable of understanding even some amount of the concept. Imagine the best teacher you've ever had in college, whoever was able to really convey difficult concepts in a way you immediately understood on a fundamental level, allowing you to then develop more complex understanding. I think upvote based systems, at their best, encourage this sort of information.

I think at their WORST, upvote systems discourage valuable discourse that requires an understanding of the subject matter so that you can intuitively grok a difficult, novel piece of information.

This then causes the content to trend towards being easily comprehensible but of lower overall quality, novelty, and complexity. When used derisively, this is often called speaking to the 'lowest common denominator'. This is the 'endless summer' of internet communities: the larger and less specific a demographic is, the less unique, interesting, and high-quality it becomes, as the content valued by the average user is different from the content valued by the informed, experienced, insular user.

If your system intends to solve these problems, I support it strongly. I think that a website/app can support a large community without also being lowered in quality. I think the endless summer effect is not an inevitability of all systems of this type, but a symptom of equating the 'most valuable information' with the 'most upvoted or engaged-with information', which is frequently not the case! I mean, that's clearly evident to anyone who's used Reddit.

The Story Node design that I suggested permits spinning up such nodes independently, with gated input from the users. The result could be effectively the same as what Lightcone attempted with the LessWrong and Alignment Forum separation, just with less gatekeeping. The Alignment Forum gates not only votes but also posts and comments; the latter seems particularly unreasonable and elitist to me, as if comments on LessWrong were not worthy of attention. I suspect this means that even if somebody still uses the Alignment Forum frontpage as their entry point to AI safety discussion (which I myself long abandoned in favour of LW), reading only the comments on the Alignment Forum doesn't make sense to anyone. The design that I propose, on the other hand, separates the concern of "who can post the content" from the usefulness/signal calculation for the content.

And this system is more general: apart from prioritising "high quality AI safety content", more than one person wrote here that there is too much AI safety and x-risk content for them. Filtering specific tags is brittle because people apply tags rather inconsistently to their writing. So, another Story Node could be founded by these people who want to read and upvote primarily "good old rationality content" and not AI safety content.
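A toy sketch of that separation of concerns, with hypothetical names (the actual Story Node design is described elsewhere and may differ): posting to a node is open to everyone and reactions are recorded from everyone, but the node's signal only counts reactions from its members. A simple count of "Insightful" reactions stands in here for the per-user model's signal calculation.

```python
from dataclasses import dataclass, field


@dataclass
class StoryNode:
    """Hypothetical sketch: open posting, member-gated signal."""

    name: str
    members: set[str]
    # content_id -> {user: reaction}
    reactions: dict[str, dict[str, str]] = field(default_factory=dict)

    def post(self, content_id: str) -> None:
        # No gate on who can post content to the node.
        self.reactions.setdefault(content_id, {})

    def react(self, user: str, content_id: str, reaction: str) -> None:
        # Reactions are recorded from everyone...
        self.reactions.setdefault(content_id, {})[user] = reaction

    def signal(self, content_id: str, positive: str = "Insightful") -> int:
        # ...but only members' reactions contribute to this node's signal.
        votes = self.reactions.get(content_id, {})
        return sum(1 for user, r in votes.items() if user in self.members and r == positive)
```

A "good old rationality content" node and an AI safety node would then be two `StoryNode` instances with different member sets over the same pool of posts.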

This is sort of what is customary to expect, but leaning into my optimism bias, I should plan as if this is not the case. (Otherwise, aren’t we all doomed, anyway?)

In your opinion, what are the odds that your tool would make polarization worse? (What’s wrong with continuing to look for better plans?)

0% that the tool itself will make the situation with the current comment ordering and discourse on platforms such as Twitter, Facebook, YouTube worse. It will be obvious and consistent across applications whether the tool prioritises thought-provoking, insightful, and reconciling, or bias-confirmatory, groupthink-ey, and combative comments.

For example, the tool could rank the comments by the decreasing value of Expectation[user reacts "Insightful"] * Expectation[user reacts "Changed my mind"]. Unless the model is trained on a dataset where users deliberately coordinated to abuse the "Insightful" reaction to completely reverse its semantics (i.e., they always voted "Insightful" as if it was "Combative", and vice versa), either the ranking will not be much better than the status quo, or it will be better. All the more so if the model is trained on the LW data, which is high quality (there are concerns about whether an SSM trained on the LW reactions data can generalise beyond LW, as I noted in this comment, but the worst-case risk here is again uselessness, not harm).
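A minimal sketch of that ranking rule, assuming the per-user model already outputs, for each comment, the probability that this user would react "Insightful" and the probability that they would react "Changed my mind" (for a yes/no reaction, the expectation is just that probability). The `Prediction` type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    comment_id: str
    p_insightful: float    # model's estimate of P(user reacts "Insightful")
    p_changed_mind: float  # model's estimate of P(user reacts "Changed my mind")


def rank_comments(predictions: list[Prediction]) -> list[str]:
    """Order comments by decreasing E[Insightful] * E[Changed my mind]."""
    ranked = sorted(predictions, key=lambda p: p.p_insightful * p.p_changed_mind, reverse=True)
    return [p.comment_id for p in ranked]


example = [Prediction("a", 0.9, 0.1), Prediction("b", 0.5, 0.5), Prediction("c", 0.2, 0.2)]
print(rank_comments(example))  # ['b', 'a', 'c']
```

The product favours comments that score reasonably on both reactions: here 0.5 × 0.5 = 0.25 outranks 0.9 × 0.1 = 0.09.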

Two caveats:

  • You can imagine there is a "dual-use technology risk" of sorts: if such an SSM proves to be trainable and gives good comment ordering, someone will give it a Waluigi spin and put out a version of the tool, an ultimate "filter bubble that works across all websites", that leverages the same SSM to prioritise the most bias-confirmatory and groupthink-ey comments. The cynical projection is that people will then actually flock to that tool in large numbers, thereby accelerating polarisation.
    • I think the risk of this is not completely negligible, but it's a small fraction of the risk that people just won't use [BetterDiscourse] because they are mostly interested in confirming their pre-existing beliefs. And again, if escapism proves to be so rampant, humanity is doomed through many other AI-enabled paths, such as AI romantic partners.
  • It's also plausible that even the best discourse management in the current social network and discourse topology (hyper-connected, with a lot of interactions with people whom you have never met in real life, often not contextualised by a particular physical location or issue but rather about high-level, abstract issues, such as country-level and global policy) will be worse for polarisation than some discourse management in a very different community topology, namely, one where communities are very localised. See this Kurzgesagt video where this is explained.
    • This doesn't seem relevant because there is just no path back to the "old internet". Also, national politics should be discussed somewhere apart from parliament, and global politics should be discussed somewhere apart from the UN and international political conferences.

0% that the tool itself will make the situation with the current comment ordering and discourse on platforms such as Twitter, Facebook, YouTube worse.

Thanks for the detailed answer, but I’m more interested in polarization per se than in the value of comment ordering. Indeed, we could imagine that your tool seems to behave as well as you wanted, but that it makes the memetic world less diverse and therefore more fragile (as monocultures tend to collapse now and then). What’d be your rough range for this larger question?

The system will indeed create a dynamic of converging on some most reasonable positions (such as that climate change is man-made and not a hoax, etc.), which you can read as a homogenisation of views. But it also naturally keeps itself from settling into complete homogeneity: when views are sufficiently homogeneous in a community or in society at large, most of the comments will be of low informational value to most readers, but in such a muted environment, any promising new theory or novel perspective will receive more attention than it would in a highly heterogeneous belief landscape. This creates an incentive to come up with such new theories or perspectives.

Thus, the discourse and the belief landscape as a whole should equilibrate themselves at some "not too homogeneous, not too heterogeneous" level.