A pet peeve of mine has always been how people decide to give ratings on Yelp. If the restaurant is pretty good, it gets five stars. If it's ok, it gets four stars. If it's genuinely bad, it gets three. If the waiter was rude, it gets one.

This leads to a situation where most ratings range from 4 to 5. A 4.8 is a good rating. A 4.2 is underwhelming.

I think this is something that we've all internalized. In some sort of a priori sense, you'd think that an average of 4 out of 5 stars is pretty good. However, we all know that it isn't. In fact, this extends beyond Yelp. I remember hearing that Uber drivers get fired if their rating drops below 4.7 or something.

This works because deep down, we're all Bayesians. We don't take things literally. If we did take things literally, we'd see a place with an average of 4.2 stars and expect good things. After all, Yelp labels a 5 star review as "Great", 4 stars as "Good", 3 as "Ok", 2 as "Could've been better", and 1 as "Not good".

Instead of taking things literally, we look at the rating as Bayesian evidence. We realize that a 4.2 rating isn't something that we'd expect to observe if the restaurant actually is really good, and it is something we'd expect to observe if the restaurant is mediocre.
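
To make that concrete, here is a minimal sketch of the update in Python. Every number in it is made up for illustration: the two hypotheses, the prior, and the likelihoods are assumptions, not data from Yelp.

```python
def posterior(prior, likelihood):
    """Apply Bayes' rule over a discrete set of hypotheses.

    prior:      dict mapping hypothesis -> P(hypothesis)
    likelihood: dict mapping hypothesis -> P(observation | hypothesis)
    """
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


# Assumed prior: most restaurants are mediocre, some are great.
prior = {"great": 0.3, "mediocre": 0.7}

# Assumed likelihoods of seeing an average around 4.2 under each hypothesis.
# Because raters skew positive, a great place rarely averages as low as 4.2,
# while a mediocre place often lands there.
likelihood_of_4_2 = {"great": 0.1, "mediocre": 0.5}

post = posterior(prior, likelihood_of_4_2)
print(post)
```

With these assumed numbers, seeing a 4.2 moves the probability of "great" from 30% down to roughly 8%. The exact figures don't matter; the point is that the rating is read as evidence about the restaurant rather than as a literal "Good".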

So then, maybe there's no problem here. Maybe things all work out at the end of the day. The fact that raters lean so heavily towards giving ratings of 4 and 5 doesn't actually prevent users from using the average rating to tell how good a restaurant is. Users just need to re-calibrate.

It's the same thing as that friend who is overenthusiastic in their text messages. Everything has a bunch of exclamation points and emojis. And so, when you receive a text saying "good to hear from you", you're concerned. If they actually were happy to hear from you, you'd expect something more like "HEY!!! So good to hear from you!!!!!!!". If they were just normal-pleased to hear from you, you'd expect "Hey! Great to hear from you!!!".

I think in most contexts this sort of skew towards positivity is fine. People are able to read well enough between the lines. Recalibrating would take more effort, require too much coordination, and cause too much social friction for it to be worth it. However, I can imagine contexts where accurate feedback is essential, and it would be worth the effort to incorporate 1 and 2 star reviews, so to speak.

Such contexts are probably rare. The participants would have to really care about being effective. Off the top of my head, a few candidates that come to mind are alignment researchers, cofounders of a startup, and a mentee working with a mentor. In situations like office jobs where people only pretend to care about being effective, pursuing such a culture would probably just backfire and cause friction.

7 comments

... most ratings range from 4 to 5. A 4.8 is a good rating. A 4.2 is underwhelming.

Is this US-centric? I was confused by this prior, since I see 2 and 3 star ratings all the time; 4.2 would catch my attention as being pretty good. 

Your anecdote about the friend who's overenthusiastic in their texts reminds me of my culture shock when I first arrived in the US. People were very often 'fake-friendly', for want of a better term; it was disorienting. 

Oh, interesting. I live in the US and haven't had much of a chance throughout my life to experience other cultures, so both the ratings and the "fake friendliness" very well might be US-centric.

I remember seeing a post (with lots of likes) on Finnish-language social media about some (American) video call app asking you how satisfied you were with the call and then asking what was wrong if you only gave it four stars. The person who made the post was annoyed with this behavior because they didn't think there was anything wrong with the call; that was the reason they gave it four stars!

A popular sentiment was that four stars was the "everything went well, I have no complaints" grade, with five reserved for cases when something is really outstanding and exceeds your expectations.

I agree that attitudes have been internalized that make ratings skewed. I will add, however, that the rating for "mean performance" on a scale is context-dependent. Examples off the top of my head: 80% is an okay-ish grade in most US schools, but 50% is atrocious. Contrast this with attractiveness on a 0-10 scale: an 8 is a superior specimen, whereas a 5 is average.

With customer service in particular, I can attest to feeling a lot of pressure to give a high rating (if I must rate) because I don't want an employee punished as a result. Heck, this goes beyond ratings. I would be a dishonest juror if I thought the defendant were guilty of a minor crime but they were facing an extreme sentence.

Agreed: If I have in the back of my mind the knowledge that the human being I interacted with is being graded and measured on their rating, there's definitely a "don't screw over that person" motive.

They're working under conditions I would find nearly intolerable and they deserve some sympathy/solidarity.

Viliam, 10mo

People are able to read well enough between the lines.

Well, some of them are. (Those who are neurotypical and not coming from a different culture.)

Ben, 10mo

It varies between cultures a lot. When I check reviews of stories I have written on Amazon or Goodreads, I always calibrate by clicking on the user portrait and seeing what they normally give. Many of my 5-star ratings are not much to celebrate: it turns out they have given 5 to everything ever. But it makes me smile when I see that my 4-star was the highest rating that person gave in the last 10-20 things they reviewed.

I assume that Uber and similar services already do this automatically under the hood. They know a 4-star rating from a prolific 5-star giver is a bad sign. They know a 4-star is good from that person who aims to give 3 on average because "that's obviously what a well-calibrated person does". I think the search algorithms at least know this.
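
One simple way to do the per-rater calibration described above is to score each rating relative to that rater's own history. The sketch below is only an illustration: the function name `calibrated_score` and the sample histories are hypothetical, and nothing here is claimed to be what Uber, Amazon, or Goodreads actually does.

```python
from statistics import mean, pstdev


def calibrated_score(rating, rater_history):
    """Return how far a rating sits above or below the rater's usual level.

    The result is a z-score: (rating - rater's mean) / rater's std deviation.
    Positive means better than this rater usually gives, negative means worse.
    If the rater has given the same rating every time, the spread is zero,
    so we fall back to the raw difference from their mean (an assumption
    made here to avoid dividing by zero).
    """
    mu = mean(rater_history)
    sigma = pstdev(rater_history)
    if sigma == 0:
        return float(rating - mu)
    return (rating - mu) / sigma


# Hypothetical rater who gives 5 stars to almost everything.
generous_history = [5, 5, 5, 5, 5, 4]
# Hypothetical rater who aims for an average of 3.
strict_history = [3, 3, 2, 4, 3, 3]

# The same 4-star rating reads very differently for the two raters.
from_generous = calibrated_score(4, generous_history)
from_strict = calibrated_score(4, strict_history)
print(from_generous, from_strict)
```

Under these made-up histories, the 4-star from the generous rater comes out negative and the 4-star from the strict rater comes out positive, which matches the intuition in the comment above.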