I did an exploration into how Community Notes (formerly Birdwatch) from X (formerly Twitter) works, and how its algorithm decides which notes get displayed to the wider community. In this post, I’ll share and explain what I found, as well as offer some comments. 
 

Community Notes is a fact-checking tool available to US-based users of X/Twitter which allows readers to attach notes to posts to give them clarifying context. It uses an open-source bridging-based ranking algorithm intended to promote notes which receive cross-partisan support, and demote notes with a strong partisan lean. The tool seems to be pretty popular overall, and most of the criticism aimed toward it seems to be about how Community Notes fails to be a sufficient replacement for other, more top-down moderation systems.[1] 


This seems interesting to me as an experiment in social technology that aims to improve group epistemics, and understanding how it works seems like a good place to start before trying to design other group epistemics algorithms. 

How does the ranking algorithm work?

The full algorithm, while open-source, is quite complicated and I don’t fully understand every facet of it, but I’ve done a once-over read of the original Birdwatch paper, gone through the Community Notes documentation, and read this summary/commentary by Vitalik Buterin. Here’s a summary of the “core algorithm” as I understand it (to which much extra logic gets attached): 
 

Users are the people who have permission to rate community notes. To get permission, a person needs to have had an account on X for more than 6 months, have a verified phone number, and have committed no violations of X’s rules. The rollout of community notes is slow, however, and so eligible account holders are only added to the Community Notes user pool periodically, and at random. New users don’t immediately get permission to write their own notes, having to first get a “rating impact” by rating existing notes (will explain this later).
 

Notes are short comments written by permitted users on posts they feel need clarification. These are not immediately made publicly visible on X; they first need to be certified as “helpful” by aggregating ratings from other Community Notes users via the ranking algorithm. 
 

Users are invited to rate notes as either “not helpful,” “somewhat helpful,” or “helpful.” The results of all user-note pairs are recorded in a matrix $R$, where each element $r_{un}$ corresponds to how user $u$ rated note $n$. Users only rate a small fraction of notes, so most elements in the matrix are “null.” Non-null elements are called “observed” ratings, and values of 0, 0.5, and 1 correspond to the qualitative ratings of “not helpful,” “somewhat helpful,” and “helpful” respectively.
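As a concrete illustration (my own sketch, not the actual Community Notes code), the observed ratings can be thought of as a sparse list of (user, note, value) triples rather than a dense matrix:

```python
# Sketch only (not the actual Community Notes code): encode qualitative
# ratings as a sparse set of observed (user, note, value) triples.
RATING_VALUES = {"not helpful": 0.0, "somewhat helpful": 0.5, "helpful": 1.0}

# Hypothetical raw ratings: (user_id, note_id, qualitative label)
raw_ratings = [
    ("user_a", "note_1", "helpful"),
    ("user_b", "note_1", "not helpful"),
    ("user_a", "note_2", "somewhat helpful"),
]

# Only these entries are "observed"; every other (user, note) pair is null
# and simply does not appear in the loss described below.
observed = [(u, n, RATING_VALUES[label]) for u, n, label in raw_ratings]
print(observed)
```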

This rating matrix is then used by their algorithm to compute a helpfulness score for each note. It does this by learning a model of the ratings matrix which explains each observed rating as a sum of four terms:

$$\hat{r}_{un} = \mu + i_u + i_n + f_u \cdot f_n$$
Where:

  • $\mu$: Global intercept (shared across all ratings) 
  • $i_u$: User intercept (shared across all ratings by user $u$)
  • $i_n$: Note intercept (shared across all ratings of note $n$). This is the term which will eventually determine a note's "helpfulness."
  • $f_u$, $f_n$: Factor vectors for user $u$ and note $n$. The dot product of these vectors is intended to describe the “ideological agreement” between a user and a note. These vectors are currently one dimensional (each just a single number), though the algorithm is in principle agnostic to the number of dimensions. 

For $U$ users and $N$ notes (with one-dimensional factors) that gives us $1 + 2U + 2N$ free parameters making up this model. These parameters are estimated via gradient descent every hour, minimizing the following squared error loss function (for observed ratings only):

$$\sum_{r_{un}\ \text{observed}} \left(r_{un} - \hat{r}_{un}\right)^2 + \lambda_i\left(\mu^2 + i_u^2 + i_n^2\right) + \lambda_f\left(\lVert f_u\rVert^2 + \lVert f_n\rVert^2\right)$$
The first term is the squared difference between the model’s prediction and the actual rating, and the final two terms are regularization terms, where $\lambda_i$ and $\lambda_f$ are regularization weights. $\lambda_i$ is deliberately set significantly higher than $\lambda_f$ to push the algorithm to rely primarily on the factor vectors to explain the ratings a note receives, keeping the other terms as low as possible. The original Birdwatch paper presents this choice as risk aversion[2]:

…we particularly value precision (having a low number of false positives) over recall (having a low number of false negatives) due to risks to our community and reputation from increasing visibility of low quality notes.

This algorithm, in the process of fitting all the different factor vectors for notes and users, automatically identifies an ideological spectrum. Because of the asymmetric regularization above, it also explains the ratings as much as possible in terms of this ideological spectrum, such that the intercept terms $\mu$, $i_u$, and $i_n$ end up describing how much the rating outcomes differ from what was predicted by the ideological part of the model. 
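To make the fitting step concrete, here is a rough sketch of the model and its loss in PyTorch. This is my own reconstruction from the description above, not the production scoring code, and the regularization weights used here are purely illustrative; the key point is only that the intercept penalty is set much higher than the factor penalty:

```python
import torch

# My own reconstruction of the core model (not the production code).
n_users, n_notes, dim = 1000, 200, 1                       # 1-dimensional factors
mu  = torch.zeros(1, requires_grad=True)                   # global intercept
i_u = torch.zeros(n_users, requires_grad=True)             # user intercepts
i_n = torch.zeros(n_notes, requires_grad=True)             # note intercepts ("helpfulness")
f_u = (0.1 * torch.randn(n_users, dim)).requires_grad_()   # user factor vectors
f_n = (0.1 * torch.randn(n_notes, dim)).requires_grad_()   # note factor vectors

# Illustrative weights; what matters is that lambda_i >> lambda_f, so the
# factor terms soak up as much of the signal as possible.
lambda_i, lambda_f = 0.15, 0.03

def loss(users, notes, ratings):
    """Squared error over observed ratings plus asymmetric regularization."""
    pred = mu + i_u[users] + i_n[notes] + (f_u[users] * f_n[notes]).sum(dim=1)
    sq_err = ((ratings - pred) ** 2).sum()
    reg_i = lambda_i * (mu ** 2 + i_u[users] ** 2 + i_n[notes] ** 2).sum()
    reg_f = lambda_f * ((f_u[users] ** 2).sum() + (f_n[notes] ** 2).sum())
    return sq_err + reg_i + reg_f

# Hypothetical observed ratings: user/note indices plus values in {0, 0.5, 1}.
obs_users = torch.randint(0, n_users, (5000,))
obs_notes = torch.randint(0, n_notes, (5000,))
obs_vals  = torch.randint(0, 3, (5000,)).float() / 2.0

opt = torch.optim.Adam([mu, i_u, i_n, f_u, f_n], lr=0.05)
for _ in range(200):
    opt.zero_grad()
    loss(obs_users, obs_notes, obs_vals).backward()
    opt.step()

helpfulness_scores = i_n.detach()   # the fitted note intercepts
```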
 

Finally, a note’s helpfulness score is determined by the final value reached by its intercept term $i_n$. This helpfulness score is highest if the note is rated as “helpful” by Community Notes users more often than the rest of the model would predict. If this parameter reaches a threshold of $0.40$, then the note is certified as “helpful” and is shown to the wider X community.[3] Likewise, if $i_n$ falls below a negative threshold, then the note is certified as “not helpful.”[4] 
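As a toy sketch of the status decision (again my own simplification, not the real scoring rules): the 0.40 “helpful” threshold comes from the description above, while the “not helpful” cutoff and the factor-magnitude check from footnotes 3 and 4 are represented by illustrative placeholder values:

```python
# Toy sketch of the status decision; the 0.40 threshold is from the post,
# the other two cutoffs are illustrative placeholders (see footnotes 3-4).
HELPFUL_THRESHOLD = 0.40

def note_status(note_intercept: float, note_factor: float,
                not_helpful_cutoff: float = -0.05,    # placeholder value
                max_factor_magnitude: float = 0.5):   # placeholder value
    if note_intercept >= HELPFUL_THRESHOLD and abs(note_factor) < max_factor_magnitude:
        return "helpful"              # shown to the wider X community
    if note_intercept <= not_helpful_cutoff:
        return "not helpful"
    return "needs more ratings"       # everything in between

print(note_status(0.45, 0.1))   # -> "helpful"
print(note_status(-0.2, 0.0))   # -> "not helpful"
```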
 

The following figure illustrates the results from the original Birdwatch paper after applying this algorithm, where the y-axis is the note intercept $i_n$ and the x-axis is the note factor $f_n$:

Some further details and comments

Factor vectors: First thing to note is that, to avoid overfitting, the factor vectors are currently just one dimensional (though they plan to increase the dimensionality when they have more data). In practice, across all notes, this results in a spectrum where a negative factor roughly corresponds to the political left and a positive factor to the political right (note that this spectrum was not hardcoded, but rather found automatically by the algorithm). This leaves a lot to be desired, in particular because “consensus between the left and the right” is used as a proxy for high-quality information, which might be reasonable in some cases, but probably not in many others.[5] There are also plans to use multiple ranking models for different groups, though this seems mostly to be about dealing with geographic and linguistic diversity.[6] 

Modeling uncertainty: Another detail is that they actually run gradient descent multiple times, including extra extreme ratings from “pseudo-raters” in each run. This forms a distribution of helpfulness scores, and in the spirit of risk aversion, they use the lower-bound value of $i_n$ from this distribution to classify a note as “helpful,” and the upper-bound value to classify it as “not helpful.”[7] 
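Here is a rough sketch of the pseudo-rater idea as I understand it (not the actual implementation): the model is refit several times with extra synthetic extreme ratings injected, and the bounds of the resulting intercept distribution are what get compared against the thresholds. The refit_model function below is a hypothetical stand-in for rerunning the fitting procedure above:

```python
# Rough sketch of the pseudo-rater idea (not the actual implementation).
# `refit_model` is a hypothetical stand-in that reruns the gradient-descent
# fit with some extra ratings injected and returns note intercepts by id.
def intercept_bounds(note_id, refit_model, pseudo_rating_sets):
    """pseudo_rating_sets: a list of hypothetical extra (user, note, value)
    rating batches, e.g. all-0.0 or all-1.0 ratings on the note in question."""
    intercepts = [refit_model(extra)[note_id] for extra in pseudo_rating_sets]
    return min(intercepts), max(intercepts)

# A note is then only certified "helpful" if the *lower* bound clears the
# helpfulness threshold, and "not helpful" only if the *upper* bound falls
# below the negative cutoff (the risk-averse direction in both cases).
```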

User helpfulness: This is the weirdest part, in my opinion. They actually estimate the model parameters in two separate rounds. After the first round, the algorithm computes a “user helpfulness” score for each user based on how well their own ratings predicted the final rating assigned by the algorithm. Users who do a poor job of predicting the group decision are labeled as unhelpful, and are filtered out for the second round, which gives the final verdict on all the notes.[8] I don’t know how strict the filtering is in practice, but from the docs it seems that at least two thirds of a user’s ratings need to match the group consensus in order to be counted in the second round. This is also the key to “rating impact,” which unlocks the ability to write your own notes: you get permission only once you have correctly predicted at least 5 note outcomes.
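A minimal sketch of this two-round filtering, under my own reading of the docs (the real rater-helpfulness computation is more involved): compute, for each user, how often their ratings matched the statuses produced by the first round, then keep only sufficiently predictive users for the second round. The two-thirds cutoff is just my reading of the docs:

```python
# Minimal sketch of the two-round filtering (my own reconstruction; the real
# rater-helpfulness computation is more involved).
def agreement_ratio(user_ratings, round_one_statuses):
    """user_ratings: {note_id: "helpful" | "not helpful"} for one user.
    round_one_statuses: {note_id: status} from the first scoring round."""
    scored = [n for n in user_ratings
              if round_one_statuses.get(n) in ("helpful", "not helpful")]
    if not scored:
        return None  # no decided notes rated yet; nothing to judge the user on
    agree = sum(user_ratings[n] == round_one_statuses[n] for n in scored)
    return agree / len(scored)

def users_kept_for_round_two(all_user_ratings, round_one_statuses, cutoff=2 / 3):
    kept = []
    for user, ratings in all_user_ratings.items():
        ratio = agreement_ratio(ratings, round_one_statuses)
        if ratio is None or ratio >= cutoff:   # keep users we can't yet judge
            kept.append(user)
    return kept
```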

This seems to be asking users to do two contradictory things: 1) Rate notes honestly according to their own beliefs and 2) use their ratings to predict what other people believe. There is also a “writer impact” system, where writers need to maintain a positive ratio of “helpful” to “not helpful” notes, or else they are rate-limited. 

Tag-consensus harassment-abuse note score: In addition to rating a note as helpful or unhelpful, users are invited to tag a note with something like 20 different predefined descriptors. If there is a cross-partisan consensus (using the same core algorithm described above, but with different labels) that a note is harassment or abuse (a single tag), then the algorithm strongly punishes all users who rated this note as helpful by significantly lowering their “user helpfulness” score. The threshold for a note being deemed harassment or abuse is quite high, so I expect this is fairly rare. I do wonder how well a coordinated attack could use this mechanism to bully people away from certain topics, and whether there are any additional mechanisms to prevent this behavior. 
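Roughly, the penalty seems to work like the following (an illustrative sketch only; I have not verified the actual magnitude or mechanics of the penalty):

```python
# Illustrative sketch only: penalize users who rated a consensus
# harassment/abuse note as "helpful" by docking their user-helpfulness score.
def apply_harassment_penalties(user_helpfulness, harassment_notes, ratings,
                               penalty=1.0):   # penalty size is hypothetical
    """ratings: iterable of (user_id, note_id, label) triples."""
    for user_id, note_id, label in ratings:
        if note_id in harassment_notes and label == "helpful":
            user_helpfulness[user_id] = user_helpfulness.get(user_id, 0.0) - penalty
    return user_helpfulness
```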

Tag outlier filtering: There are roughly ten negative tags. If enough users[9] agree on the same negative tag, then the helpfulness threshold for the note rises from 0.4 to 0.5. I’m not sure how easy this is to game, but I could imagine a coordinated attack using this mechanism to raise the threshold on notes it wants to suppress. 
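A toy version of that adjustment might look like the following (the trigger level and the tag names are placeholders, and the real per-rater weighting is more complicated; see footnote 9):

```python
# Toy version of the tag-outlier adjustment; trigger level and tag names are
# placeholders, and the real rater weighting is more involved (footnote 9).
def helpfulness_threshold(negative_tag_weights, trigger=2.0):
    """negative_tag_weights: weighted count of raters per negative tag."""
    if negative_tag_weights and max(negative_tag_weights.values()) >= trigger:
        return 0.5   # strong agreement on one negative tag raises the bar
    return 0.4

print(helpfulness_threshold({"sources missing": 3.2, "unclear": 0.5}))  # -> 0.5
print(helpfulness_threshold({"unclear": 0.5}))                          # -> 0.4
```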

A note on strategic ratings: Because of the general risk-averse design, it seems generally hard for individual users to get any one note certified as helpful, but pretty easy for motivated users to prevent a note from getting above the helpfulness threshold. However, I have read an anecdote by one group of Ukrainian activists who coordinate to get specific notes labeled helpful, and who also claim that Russian opponents use similar coordination tactics to get Community Notes taken down. This might also be because most posts don’t have any notes, and so it could be pretty easy for a small group to form a consensus (notes need at least 5 ratings to be eligible for “helpful” status). The Community Notes algorithm and all of the Community Notes data are open source, which should make it fairly easy to notice these kinds of coordinated actions if they become widespread (though it’s unclear whether there is any system in place to respond to manipulation). 

A comment on jokes as misinformation: One concern I have is that a lot of X content isn’t making specific claims that can really be fact-checked. Take, for example, this note attached to a joke meme posted by Musk:

While it seems like a win against misinformation, Musk still gets to hide behind the shield of “joke meme,” further implying that while the actual empirical claims made by the meme are false, the underlying message is still true. Correcting jokes doesn’t seem to be in the scope of Community Notes, and furthermore, political humor often carries a deeper message that is practically impossible to fact-check (and it would be a bit much to require every political meme to be tied to a falsifiable claim).[10]

Academic commentary

I found two major peer-reviewed papers commenting on Community Notes/Birdwatch:

 

"Birds of a Feather Don’t Fact-Check Each Other"[11] by Jennifer Allen, Cameron Martel, and David Rand

This paper analyzes Birdwatch data from 2021 and seems to primarily find that most users of the platform are extremely partisan when giving ratings, and implies that they are likely more partisan than the average X/Twitter user (they are also more active, with an average post count of over 22,000).[12] They also find that, while all users were most likely to submit notes for content that aligned with their partisanship, right-wing users were much more likely to submit notes for left-wing posts/tweets than the reverse, raising concerns that attempts to reward users for agreeing with the consensus might favor left-wing users. Finally, they also raise concerns that “partisan dunking” might lead people on the platform to become more partisan rather than less (citing a study that empirically tests this).
 

"Community-Based Fact-Checking on Twitter’s Birdwatch Platform" by Nicolas Pröllochs

Similarly, the author analyzes a bunch of Birdwatch data from 2021. They find that the more socially influential a poster is (gauged by their number of followers), the less likely notes on their posts are to be certified “helpful,” as raters tend to become much more divided. They also found, unsurprisingly, that notes which cited sources were more likely to be rated as helpful. Users of Birdwatch were prompted with a checklist of reasons whenever they labeled a note as helpful or unhelpful, and the paper analyzes this data (though it doesn’t find anything particularly surprising). They also give a list of the top ten Twitter users ranked by the fraction of their tweets carrying a note tagging the tweet as “misleading,” and find that they are nearly all American politicians, confirming the idea that most Birdwatch users are using the platform to fact-check political content. 

Conclusion

I probably left a lot out, but hopefully that’s a useful overview (if I made any mistakes, please let me know!). Personally, I was most disappointed during this exploration to learn that Community Notes functions primarily to bridge a binary left-right divide, and I would really love to see a version of this algorithm which was less binary and more politics-agnostic. I was also a bit overwhelmed by the complexity of this algorithm, and I share the sentiment brought up in Vitalik Buterin’s commentary that it would be nice to see a version which was mathematically cleaner. I also feel like the mixing of rating and prediction into the same action seems murky, and it might be better for users to rate and predict separately.

  1. ^

     Particularly in the context of Elon Musk (Owner of X/Twitter) firing most of the existing content moderators.

  2. ^

     This philosophy of risk aversion appears frequently in many of their design decisions.

  3. ^

     To be considered helpful, a note also needs to have a factor vector $f_n$ that is not too large in magnitude (as a final check against polarization). 

  4. ^

     Full disclosure: sometimes they use a threshold of -0.04 and sometimes a different threshold, and I don’t totally understand when or why.

  5. ^

     Though I suppose plausibly the worst disinformation on X at the moment might be mostly political claims.

  6. ^

     I originally thought this incentivizes people to strategically rate comments in a way that makes them appear more neutral, but it seems a bit unclear. If a user has a strong partisan lean, the ratings of theirs that carry the most weight are actually the ones opposite to what their ideology would predict, which makes the incentive landscape a bit more complicated. 

  7. ^

    While the docs explicitly mention using the upper bound for certifying "not helpful" notes, I only saw mention of using the lower bound for certifying "helpful" in the Buterin summary. I think this is probably correct, but I'm not totally sure.

  8. ^

     They do add a safeguard to prevent users from directly copying the group decision by only counting ratings which happened before the group rating is published (48 hours after a note is submitted).

  9. ^

     Users are weighted by a complicated function which punishes strong ideological disagreement with the note. 

  10. ^

     While memes do convey important information not easily shared via specific and concrete claims, this does make discussing their “accuracy” really messy and hard to do (e.g. from the LW community: this commentary on a Shoggoth meme by @TurnTrout).

  11. ^

     Academics clearly can never resist a pun, even if it’s a pun on another pun. 

  12. ^

     They also speculate that partisanship might be a key motivator for becoming a Birdwatch contributor.

Comments

be verified

Correction: community notes users only need to be phone-verified, not blue-check-verified.

Thank you, it's been fixed.

I wonder what would happen if we ran the simple version of that algorithm on LW comments, so that votes would have "polarity" and each comment would have two vote counts, let's say an orange count and a blue count. (Of course, that would only be optionally enabled.)

Then we could sort the comments by the minimum of these counts, descending.

(I think it makes more sense to train it per post than globally. But then it would be useful only on very popular posts with lots of comments.)

That sounds cool! Though I think I'd be more interested in using this to first visualize and understand current LW dynamics, rather than immediately trying to intervene on them by changing how comments are ranked. 

I think a lot of the value that I’d get out of something like that being implemented would be getting an answer to “what is the biggest axis along which LW users vary” according to the algorithm. I am highly unsure about what the axis would even end up being.

Would that even be a meaningful question? Thinking of it as a kind of PCA, there will be some axis, with a lot of correlations, and how you interpret that is up to you.

I’d imagine that once we see the axis it will probably (~70%) have a reasonably clear meaning. Likely not as obvious as the left-right axis on Twitter but probably still interpretable.

$f_u$, $f_n$: Factor vectors for user $u$ and note $n$. The dot product of these vectors is intended to describe the “ideological agreement” between a user and a note. These vectors are currently one dimensional, though the algorithm is in principle agnostic to the number of dimensions. 

It took me a few minutes to figure out that "one dimensional" appears to mean "the vector contains one number".

Thanks for pointing that out. I've added some clarification.

I'm surprised that it's one-dimensional, as that should be relatively easy to game. If the attacker cares about promoting Israeli interests or Chinese interests, they can just cast a lot of votes in the other right/left direction on topics they don't care about. 

Did they write anywhere why they only consider one dimension?

"Note: for now, to avoid overfitting on our very small dataset, we only use 1-dimensional factors. We expect to increase this dimensionality as our dataset size grows significantly."


This was the reason given in the documentation.

That sounds like it made sense at the beginning but now the data set should be large enough that a higher dimensional approach would be better?

That sounds right intuitively. One thing worth noting though is that most notes get very few ratings, and most users rate very few notes, so it might be trickier than it sounds. Also if I were them I might worry about some drastic changes in note rankings as a result of switching models. Currently, just as notes can become helpful by reaching a threshold of 0.4, they can lose this status by dropping below 0.39. They may also have to manually pick new thresholds, as well as maybe redesign the algorithm slightly (since it seems that a lot of this algorithm was built via trial and error, rather than clear principles). 


Thank you, this is useful. Planning to use this for AI-Plans.