Improving on the Karma System

[-]jimrandomh4y550

We're thinking a lot about this! Probably what's going to happen next is that we're going to implement more than one variation on voting, and pick a few posts where comments will use alternate voting systems. This will be separated out by post, not by user, since having users using a mix of voting systems on the same comments introduces a bunch of problems.

Ideally we can both improve sorting/evaluation, and also use aspects of the voting system itself as a culture-shaping tool to remind people of what sort of comments we're hoping for. Here's a mockup I posted in the LW development slack last week (one idea among multiple, and definitely not the final form):

(In this mockup, which is currently just a fake screenshot and not real wired-up code, you have an overall vote that works like existing karma, but you can also pick any subset of the 12 things on the form that appears on hover-over. If you pick one of the positive or negative adjectives, it sets your overall vote to match the valence, but you can then override the overall vote if you want to say things like False/Upvote or True/Downvote.)
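A minimal sketch of the behaviour described in the parenthetical above, assuming a vote is a set of adjective tags plus an overall value of +1, 0, or -1. The class, method names, and the valence table are hypothetical; only the True and False tags are named in the comment, so the other entries on the 12-item form are left out.

```python
from dataclasses import dataclass, field

# Hypothetical valence table. Only "true" and "false" appear in the comment;
# the remaining tags on the mockup form are not known.
TAG_VALENCE = {"true": 1, "false": -1}


@dataclass
class Vote:
    tags: set = field(default_factory=set)
    overall: int = 0  # +1 upvote, -1 downvote, 0 no overall vote

    def add_tag(self, tag: str) -> None:
        """Picking an adjective sets the overall vote to match its valence."""
        valence = TAG_VALENCE[tag]  # raises KeyError for an unknown tag
        self.tags.add(tag)
        self.overall = valence

    def set_overall(self, value: int) -> None:
        """The overall vote can be overridden afterwards, keeping the tags."""
        if value not in (-1, 0, 1):
            raise ValueError("overall vote must be -1, 0, or +1")
        self.overall = value


# False/Upvote: tag the comment as false, then override the overall vote.
vote = Vote()
vote.add_tag("false")
vote.set_overall(1)
```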

I think we're unlikely to settle on anything involving ratings-out-of-five-stars, mainly because a significant subset of users have preexisting associations with five-star scales which would make them overuse the top rating (and misinterpret non-5 ratings from others).

We're also considering making users able to (optionally) make their votes public.

[-]Vladimir_Nesov4y120

A fixed set of tags turns this into multiple-choice questions where all answers are inaccurate, and most answers are irrelevant. Write-in tags could be similar to voting on replies to a comment that evaluate it in some respect. Different people pay attention to different aspects, so the flexibility to vote on multiple aspects at once or differently from overall vote is unnecessary.

[-]jimrandomh4y30

Different people pay attention to different aspects, so the flexibility to vote on multiple aspects at once or differently from overall vote is unnecessary.

There's a limited sense in which this is true - the adjective voting on Slashdot wouldn't benefit from allowing people to pick multiple adjectives, for example. But being able to express a mismatch between overall upvote/downvote and true/false or agree/disagree may be important; part of the goal is to nudge people's votes away from being based on agreement, and towards being based on argument quality.

[-]delton1374y60

For what it's worth - I see value in votes being public by default. It can be very useful to see who upvoted or downvoted your comment. Of course then people will use the upvote feature just to indicate they read a post, but that's OK (we are familiar with that system from Facebook, Twitter, etc).

I'm pretty apathetic about all the other proposals here. Reactions seem to me to be unnecessary distractions. [side note - emojis are very ambiguous so it's good you put words next to each one to explain what they are supposed to mean]. The way I would interpret reactions would be as a poll of people's system 1 snap judgements. That is arguably useful/interesting information in many contexts but also distracting in other contexts.

[-]ozziegooen4y50

Just want to say: I'm really excited to see this.

I might suggest starting with an "other" list that can be pretty long. With Slack, different subcommunities focus heavily on different emojis for different functional things. Users sometimes figure out neat innovations and those proliferate. So if it's all designed by the LW team, you might be missing out.

That said, I'd imagine 80% of the benefit is just having anything like this, so I'm happy to see that happen.

[-]Liron4y40

Whoa I like the labeled upvotes and downvotes. It's like the emoji reactions feature but for rational discourse.

[-]Raelifin4y30

Ah! This looks good! I'm excited to try it out.

[-]Dagon4y420

I'd like to support the "do nothing" proposal.

Karma is destined to be imperfect - there's no way to motivate any particular use of it when there's no mechanism to limit invalid use, and no actual utility of the points. Its current, very simple implementation provides a little bit of benefit in guiding attention to popular posts, and that's enough.

Anything more complex and it will distract people from the content of the site. Or at least, it'll distract and annoy me, and I'd prefer not to add unnecessary complexity to a site I enjoy.

[-]lsusr4y120

Nitpick. Accumulating karma is useful in one respect: high karma users get more automatic karma in our posts, which draws more attention to them.

I agree with the do nothing proposal, by the way. The current system, while imperfect, is simple and effective.

[-]Measure4y40

I am also perfectly fine with the status quo, but there's still value in experimenting and trying to iterate/improve.

[-]aphyer4y290

If I think that this post is interesting and well-written but disagree with it and prefer the current karma system, should I upvote or downvote it?

[-]Measure4y20

You have answered your own question.

[-]SarahNibs4y210

Very well thought out. I think the two biggest things missing from your analysis are:

5-star ratings have become corrupted in the wild. Small-time authors get legitimately angered when a fan rates their work as 4-stars on Amazon, because anything other than 5 stars is very damaging. We don't want to port this behavior/intuition to LW, but by default that's what we'd do. jimrandomh mentions this in their comment. I don't know how to overcome this problem while retaining 5-star ratings.
Users like to "correct" a post/comment's rating. Personally I hate this behavior, but after ranting about it several times over the years I've learned that I don't represent everyone. :D So if they see a comment with average 2-stars, which they think should be 3.5, they will not want to rate it 3.5. Instead they will want to rate it 5.0 to "make up for" the other "wrongheaded" views. Maybe one way to overcome this problem is to allow a lightweight way to say "I think this is 3.5. I want to use my QJR to fix the current rating so I'm voting 5.0. Please increase the chance a mod sees this, double down on my bet, and maybe even change my rating back to 3.5 if the post becomes 3.5?? And also if a mod rates a post maybe this should drastically reduce the effective QJR of people contradicting the mod... or something, if a mod says 3.5 and the weighted avg is still only 2.5 I won't be happy".

[-]Raelifin4y60

I suggested the 5-star interface because it's the most common way of giving things scores on a fixed scale. We could easily use a slider, or a number between 0 and 100 from my perspective. I think we want to err towards intuitive/easy interfaces even if it means porting over some bad intuitions from Amazon or whatever, but I'm not confident on this point.

I toyed with the idea of having a strong-bet option, which lets a user put down a stronger QJR bet than normal, and thus influence the community rating more than they would by default (albeit exposing them to higher risk). I mainly avoided it in the above post because it seemed like unnecessary complexity, although I appreciate the point about people overcompensating in order to have more influence.

One idea that I just had is that instead of having the community rating set by the weighted mean, perhaps it should be the weighted median. The effect of this would be such that voting 5-stars on a 2-star post would have exactly the same amount of sway as voting 3.5, right up until the 3.5 line is crossed. I really like this idea, and will edit the post body to mention it. Thanks!
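A minimal sketch of the mean-versus-median point above, assuming each rating comes with a non-negative voter weight. The function names and the example numbers are illustrative, and the proposal's smoothing and QJR weighting are not modelled.

```python
def weighted_mean(ratings, weights):
    """Average of the ratings, each counted in proportion to its weight."""
    return sum(r * w for r, w in zip(ratings, weights)) / sum(weights)


def weighted_median(ratings, weights):
    """Smallest rating at which the cumulative weight reaches half the total."""
    half = sum(weights) / 2
    running = 0.0
    for rating, weight in sorted(zip(ratings, weights)):
        running += weight
        if running >= half:
            return rating
    raise ValueError("no ratings given")


# Four existing 2-star ratings, all with weight 1, plus one new voter.
existing = [2, 2, 2, 2]
weights = [1, 1, 1, 1, 1]

honest = existing + [3.5]      # voter reports what they actually believe
exaggerated = existing + [5]   # voter overshoots to pull the rating up

mean_honest = weighted_mean(honest, weights)
mean_exaggerated = weighted_mean(exaggerated, weights)
median_honest = weighted_median(honest, weights)
median_exaggerated = weighted_median(exaggerated, weights)
```

With these numbers the exaggerated vote moves the weighted mean further than the honest one (2.6 versus 2.3), while the weighted median stays at 2 in both cases, so overshooting buys no extra influence.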

[-]aphyer4y40

Another issue I'd highlight is one of complexity. When I consider how much math is involved:

This post involves Gaussians, logarithms, weighted means, integration, and probably a few other things I missed.

The current karma system uses...addition? Sometimes subtraction?

One of these things is much more transparent to new users.

[-]SarahNibs4y100

I am a huge fan of tiered-complexity views on complex underlying systems. The description to new users would be:

Ratings are a magic median-like combination of how users rated a post. Click through for more details...
Displayed ratings are the median of how users have rated the post/comment. Smoothed. Weighted by how LessWrongy the rater has been. Your own rating will have more effect when your historical ratings are good predictions of how trusted moderators end up rating. Click through for more details...
Sometimes mods will rate posts/comments, after careful reflection of how they want LessWrong in general to rate. When they do, everyone who previously rated will be awarded additional weight to their future votes if their ratings were similar to what the mod decided, or penalized with less future vote weight if their ratings were pretty far off. That's how the weights are determined when aggregating people's votes on comments. Of course, it's more complicated than that. Folks were grandfathered in. New folks [behavior]. Mods who are regularly different than other mods and high-weight voters trigger investigation into whether they should be mods anymore, or whether everyone is getting something wrong. Multiple mod votes are a thing, as is voting similar to high-weight voters (?? maybe ?? is it ??), as is promoting high-weight voters to mods, as is etc etc. Click through for more details, including math...
[treatise]
[link to the documented code]

[-]Shmi4y150

My gut feeling is that attracting more attention to a metric, no matter how good, will inevitably Goodhart it. The current karma system lives happily in the background, and people have not attempted to game it much since the days of Eugine_Nier. I am not sure what problem you are trying to solve, and whether your cure will not be worse than the disease.

[-]alkexr4y100

My gut feeling is that attracting more attention to a metric, no matter how good, will inevitably Goodhart it.

That is a good gut feeling to have, and Goodhart certainly does need to be invoked in the discussion. But the proposal is about using a different metric with a (perhaps) higher level of attention directed towards it, not just directing more attention to the same metric. Different metrics create different incentive landscapes to optimizers (LessWrongers, in this case), and not all incentive landscapes are equal relative to the goal of a Good LessWrong Community (whatever that means).

I am not sure what problem you are trying to solve, and whether your cure will not be worse than the disease.

This last sentence comes across as particularly low-effort, given that the post lists 10 dimensions along which it claims karma has problems, and then evaluates the proposed system relative to karma along those same dimensions.

[-]Yoav Ravid4y20

I don't think the problem is that people try to game it, but that it's flawed (in the many ways the post describes) even when people try to be honest.

[-]Elizabeth4y150

Some people are better at voting thanks to knowledge/wisdom/etc; which an egalitarian system suppresses.

Double checking that you are aware that voting is weighted, with higher karma users having the option to give much stronger votes?

[-]Bezzi4y90

Just for convenience, I think the relevant piece of code is this.

Also, from what I read in that file, even normal votes are weighted. A regular upvote/downvote counts double if the user has at least 1000 karma (right?).
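As a rough sketch of the rule described in this comment (not the actual LessWrong source, which is not reproduced here), the weight of a regular vote could be written as below. The function and constant names are hypothetical, and the scaling of strong votes is omitted.

```python
# Threshold taken from the comment above; everything else is an assumption.
DOUBLE_WEIGHT_KARMA = 1000


def normal_vote_weight(voter_karma: int) -> int:
    """Weight of a regular upvote/downvote: doubled at 1000 karma or more."""
    return 2 if voter_karma >= DOUBLE_WEIGHT_KARMA else 1
```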

[-]Yoav Ravid4y40

Yes, that's right.

[-]Raelifin4y50

Yep. I'm aware of that. Our karma system is better in that regard, and I should have mentioned that.

[-]tivelen4y30

I appreciate the benefits of the karma system as a whole (sorting, hiding, and recommending comments based on perceived quality, as voted on by users and weighted by their own karma), but what are the benefits of specifically having the exact karma of comments be visible to anyone who reads them?

Some people in this thread have mentioned that they like that karma chugs along in the background: would it be even better if it were completely in the background, and stopped being an "Internet points" sort of thing like on all other social media? We are not immune to the effects of such things on rational thinking.

Sometimes in a discussion in comments, one party will be getting low karma on their posts, and the other high karma, and once you notice that you'll be subject to increased bias when reading the comments. Unless we're explicitly trying to bias ourselves towards posts others have upvoted, this seems to be operating against rationality.

Comments seem far more useful in helping writers make good posts. The "score" aspect of karma adds distracting social signaling, beyond what is necessary to keep posts prioritized properly. If I got X karma instead of Y karma for a post, it would tell me nothing about what I got right or wrong, and therefore wouldn't help me make better posts in the future. It would only make me compare myself to everyone else and let my biases construct reasoning for the different scores.

A sort of "Popular Comment" badge could still automatically be applied to high-karma comments, if indicating that is considered valuable, but I'm not sure that it would be.

TL;DR: Hiding the explicit karma totals of comments would keep all the benefits of karma for the health of the site, reduce cognitive load on readers and writers, and reduce the impact of groupthink, with no apparent downsides. Are there any benefits to seeing such totals that I've overlooked?

[-]Raelifin4y20

I agree that there are benefits to hiding karma, but it seems like there are two major costs. The first is in reducing transparency; I claim that people like knowing why something is selected for them, and if karma becomes invisible the information becomes hidden in a way that people won’t like. (One could argue it should be hidden despite people’s desires, but that seems less obvious.) The other major reason is one cited by Habryka: creating common knowledge. Visible Karma scores help people gain a shared understanding of what’s valued across the site. Rankings aren’t sufficient for this, because they can’t distinguish relative quality from absolute quality (eg I’m much more likely to read a post with 200 karma, even if it’s ranked lower due to staleness than one that has 50).

[-]ExtrArro2y20

how do I get 1 Karma as a new user?

[-]Filipe Marchesini4y20

I would like to see different voting systems on different posts, so we could try them out and report back on how each one allows us to express what we think about those posts.

I don't like a single 5-star rating system, but I would love multiple 5-stars in different categories. For example, we could choose to give 5 stars for each of these categories you pointed out: clarity, interestingness, validity/correctness, informativeness, friendliness.

If we had a single 5-star rating system, and suppose I read a post that was completely clear and interesting, but with a wrong conclusion, I just don't know how many stars I would give it. But if I could give 5 stars for clarity and interest, I might give 0~3 stars for correctness (depending on how wrong it was).

Suppose there's a post with 2-star rating on its clarity and I believe the fair rating should be 3-star; I think I would click on 4~5 stars so that I could steer the rating to 3-star. I am not sure how to fix this behaviour, but the rating could say something like "select the final rating you think this post should have", and then I would click on 3-star, if the calculations somehow took care of that.

I completely endorse experimenting with different voting systems; we can simply stop using them if we realize they don't work well. We should be open to experimentation. The current system is obviously not perfect and can be improved, and if you are willing to put time and effort into this, I will support it, participate, and help discover whether the alternatives work better than the current system.

[-]Raelifin4y10

If you have multiple quality metrics then you need a way to aggregate them (barring more radical proposals). Let’s say you sum them (the specifics of how they combine are irrelevant here). What has been created is essentially a 25-star system with a more explicit breakdown. This is essentially what I was suggesting. Rate each post on 5 dimensions from 0 to 2, add the values together, and divide by two (min 0.5), and you have my proposed system. Perhaps you think the interface should clarify the distinct dimensions of quality, but I think UI simplicity is pretty important, and am wary of suggesting having to click 5+ times to rate a post.
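The arithmetic in the paragraph above can be sketched as follows, assuming each dimension is scored as an integer 0, 1, or 2. The dimension names are borrowed from the parent comment, and the function name is hypothetical.

```python
# Dimension names taken from the parent comment; the integer 0-2 scale per
# dimension is an assumption about what "from 0 to 2" means.
DIMENSIONS = (
    "clarity",
    "interestingness",
    "validity",
    "informativeness",
    "friendliness",
)


def star_rating(scores: dict) -> float:
    """Sum five 0-2 scores, halve the total, and floor the result at 0.5 stars."""
    for name in DIMENSIONS:
        if scores[name] not in (0, 1, 2):
            raise ValueError(f"{name} must be 0, 1, or 2")
    return max(0.5, sum(scores[name] for name in DIMENSIONS) / 2)


# A clear and interesting post that is middling on the other three dimensions.
example = star_rating({
    "clarity": 2,
    "interestingness": 2,
    "validity": 1,
    "informativeness": 1,
    "friendliness": 1,
})
```

Here the five scores sum to 7, so the post lands at 3.5 stars; a post scoring 0 everywhere is held at the 0.5-star minimum.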

I addressed the issue of overcompensating in an edit: if the weighting is a median then users are incentivized to select their true rating. Good thought. ☺️

Thanks for your support and feedback!

[-]Patodesu1y10

If a 5-star voting system were implemented, the voting UI could stay the same, and the existing vote types could be reused as if they were spaced at 1-star increments: strong downvote, downvote, no vote, upvote, strong upvote.

And a middle (3 stars) vote could be added.

I know that people don't think of the two ways of voting as equivalent, and a regular "upvote" could reduce the score of a comment/post.

But they are similar enough, and the UI would be much simpler and not discourage people from voting.
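A small sketch of this mapping, assuming the displayed score is a plain unweighted average of star values (the comment suggests reusing the existing vote weights, which is not modelled here). The dictionary and function names are hypothetical.

```python
from statistics import mean

# Existing vote buttons read as star values, with the proposed middle vote.
STARS_BY_VOTE = {
    "strong_downvote": 1,
    "downvote": 2,
    "middle": 3,
    "upvote": 4,
    "strong_upvote": 5,
}


def star_average(votes):
    """Unweighted average star value of a list of vote names."""
    return mean(STARS_BY_VOTE[v] for v in votes)


before = star_average(["strong_upvote", "strong_upvote"])
after = star_average(["strong_upvote", "strong_upvote", "upvote"])
```

This also shows the caveat mentioned above: a regular upvote counts as 4 stars, so adding one to a comment that currently averages 5 stars lowers its score.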

[-]Jesse Kanner3y10

TL;DR - probably best to scrap rating people's posts and comments altogether. At the very least change the name.

I'm not fond of the label "Karma". It suggests universal and hermetic moral judgement when in the context of this blog it's just, you know, people's impulsive opinion in the moment. It also suggests persistence - as Karma supposedly spills over into the next iteration.

My very first comment on LW garnered a -36 Karma score. It was thoughtful and carefully argued - and, yes, a bit spicy. But regardless, the community decided to just shun me with a click without actually engaging in the ideas I proposed. I feel ganged up upon and not taken seriously.

I still trudge ahead though with other comments but regrettably I feel compelled to adopt a "me versus you all" stance. It's saddening and anti-intellectual (and a direct violation of the LW ethic of "A community blog devoted to refining the art of rationality.")

Viewpoint diversity is vitally important for deep learning. I suggest dropping Karma altogether, or at least use simple up-down voting to shift comments to the bottom. But even that doesn't really feel right. What you have now is a Milgram machine.

[-]ExtrArro2y10

As I am new here, I am not allowed to 'vote' --

hope your comment at least gets a reply from one of the mods, esp. regarding the possibility of dissenting without being 'disappeared'.

[-]hath4y10

What if we could score on a power-law level of quality, instead of just five stars for fulfilling each of five categories? There could be one order of magnitude for "well written/thought out", the next order of magnitude higher meaning "has become part of my world model" and one higher of "implemented the recommendations in this post, positively improved my life". The potential issue I see in the 5 star rating system is that it doesn't have enough variance; probably 95% of the posts I've read on here would be either 4 or 5 stars. Being able to rate posts with a decimal, so you can rate a post 4.5 instead of just 4 or 5, would also help, though it'd clutter the UI and make voting cost more spoons.

[-]lsusr4y50

In my personal experience, a single post's karma already operates as a logarithmic measure of quality. It takes more than twice as much effort to write a 100 karma post compared to a 50 karma post.

[-]Raelifin4y10

I agree with the expectation that many posts/comments would be nearly indistinguishable on a five-star scale. I'm not sure there's a way around this while keeping most of the desirable properties of having a range of options, though perhaps increasing it from 10 options (half-stars) to 14 or 18 options would help.

My basic thought is that if I can see a bunch of 4.5 star posts, I don't really need the signal as to whether one is 4.3 stars vs 4.7 stars, even if 4.7 is much harder to achieve. I, as a reader, mostly just want a filter for bad/mediocre posts, and the high-end of the scale is just "stuff I want to read". If I really want to measure difference, I can still see which are more uncontroversially good, and also which has more gratitude.

I'm not sure how a power-law system would work. It seems like if there's still a fixed scale, you're marking down a number of zeroes instead of a number of stars. ...Unless you're just suggesting linear voting (ie karma)?

[-]habryka4y50

One of my ideas for this (when thinking about voting systems in general) is to have a rating that is trivially inconvenient to access. Like, you have a ranking system from F to A, but then you can also hold the A button for 10 seconds, and then award an S rank, and then you can hold the S button for 30 seconds, and award a double S rank, and then hold it for a full minute, and then award a triple S rank.

The only instance I've seen of something like this implemented is Medium's clap system, which allows you to give up to 50 claps, but you do have to click 50 times to actually give those claps.

[-]Yoav Ravid4y30

If we were making just a small change to voting, then the one I would have liked to make is having something like the clap system instead of weakvotes and strongvotes, and have the cap decided by karma score (as it is now, if your strongvote is X, your cap would be X).

[+][comment deleted]4y20
