Link: How Community Feedback Shapes User Behavior

[-]Shmi11y130

To understand what a person’s “utility function” for votes is, we conducted an Amazon Mechanical Turk experiment that asked users how they would perceive receiving a given number of up- and down-votes.

I don't even... Amazon Turk? What are the odds that an experiment like that, run on one self-selected group, yields any insight into a completely different self-selected group?

Many of the questions they asked are pretty interesting, though. I'm not sure how much one can trust their answers.

[-]VincentYu11y90

First, let me point out that the "behavioral changes" that the authors described were investigated over only three posts subsequent to each positive/negative evaluation, so it is unclear whether these effects remain over the long term.

Second, I find questionable the authors' conclusion that negative evaluations cause the subsequent decline in post quality and increase in post frequency, since they did not control the positive/negative evaluations. They model the positive/negative evaluations as random acts of chance (which is what we want for an RCT) and justify this by reporting that their bigram classifier assigns no difference in quality between the positively- and negatively-evaluated posts (across two posts by a pair of matched subjects). However, I find it likely that their classifier makes sufficiently many misclassifications to call into question their conclusion.

For instance, if bad posts have a tendency to occur in streaks of frequent posts (as is the case in flame wars), then we can explain their observations without assigning causal potency to negative evaluations: once in a while the classifier will erroneously assign a high quality to a bad post near the start of a flame war, but on average it will correctly assign low qualities to the subsequent three posts by the same poster in the flame war, and thus we see the effects that the authors described (without assigning any causal effect to the negative evaluation given by other users to the post near the start of the flame war). To test this explanation, the authors can ask the Crowdflower workers (p. 4) to label each b_0 (described on p. 5) to check whether their classifier is indeed misclassifying b_0 by assigning it too high a quality.
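A toy simulation can make this alternative explanation concrete. The sketch below is not the authors' method or data: the two-state "flame war" model, the vote probabilities, the noise level, and the matching band are all hypothetical numbers chosen for illustration. Votes depend only on the true quality of the current post and never influence later posts, yet posts that a noisy classifier scores alike still show lower follow-up quality after a downvote. It only illustrates the quality effect, not the change in posting frequency.

```python
import random
from statistics import mean


def simulate(n_users=2000, n_posts=30, stay=0.9, noise=1.0, seed=0):
    """Return the mean classifier score of the next three posts after
    (upvoted, downvoted) posts that the classifier scored alike.

    All parameters are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    after_up, after_down = [], []
    for _ in range(n_users):
        in_flame_war = rng.random() < 0.5
        scores, downvoted = [], []
        for _ in range(n_posts):
            # The hidden state persists with probability `stay`,
            # so bad posts arrive in streaks.
            if rng.random() > stay:
                in_flame_war = not in_flame_war
            true_quality = 0.0 if in_flame_war else 1.0
            # The classifier sees true quality plus noise.
            scores.append(true_quality + rng.gauss(0.0, noise))
            # Voters react to the current post only; the vote has
            # no effect on anything the user does afterwards.
            downvoted.append(rng.random() < (0.8 if in_flame_war else 0.2))
        for i in range(n_posts - 3):
            # "Matching": keep only posts the classifier places in one
            # narrow band, so they look equal in quality.
            if 0.4 <= scores[i] <= 0.6:
                follow_up = mean(scores[i + 1:i + 4])
                (after_down if downvoted[i] else after_up).append(follow_up)
    return mean(after_up), mean(after_down)


if __name__ == "__main__":
    up, down = simulate()
    print(f"mean score after upvoted posts:   {up:.2f}")
    print(f"mean score after downvoted posts: {down:.2f}")
```

Under these assumptions the gap comes entirely from the votes carrying information about the hidden streak that the classifier score misses, which is the kind of confounding an RCT would rule out.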

Since the authors did not conduct an RCT, we can come up with many alternative explanations, and I find them plausible. (Is it feasible to conduct an RCT on a site featuring upvotes and downvotes? Yes, it's been done before.)

Despite my criticisms, I think the paper is not bad. I just don't think the authors' methods provide sufficient evidence to warrant their seemingly strong confidence in their conclusions.

[-]ChristianKl11y40

Second, I find questionable the authors' conclusion that negative evaluations cause the subsequent decline in post quality and increase in post frequency, since they did not control the positive/negative evaluations. They model the positive/negative evaluations as random acts of chance

If a community really votes as random acts of chance, that would explain why the voting doesn't lead to good behavior ;)

[-]hyporational11y30

I suspect that in most communities votes are a measure of attention, and this makes even downvotes rewarding. Downvotes are easier to get, which could explain the disparity in the amount of contributions. This doesn't apply to LW due to the comment hiding system, I think.

[-]Tenoke11y100

This has already been posted on the Open Thread by NancyLebovitz.

[-]Tyrrell_McAllister11y10

Thanks. I missed that.

[-]NancyLebovitz11y60

On the other hand, it didn't get any comments on the open thread, and it's getting some discussion here.

[-]Gunnar_Zarncke11y-20

I think it is worth a Discussion post as it is really applicable. The rational choice would be to change the LW voting semantics: drop the downvote.

[-]Stefan_Schubert11y70

Yes. Clearly bad karma in itself is not enough for trolls and others who frequently get downvoted - there need to be some more tangible effects like comment hiding. This should have been discussed by the authors but I can't see that they did that (only skim-read the paper, though).

This interesting sentence from the abstract confirms what you say about downvotes being rewarding:

Interestingly, the authors that receive no feedback are most likely to leave a community.

Hence negative feedback is better than being ignored.

[-]CronoDAS11y30

Or worse, being hellbanned.

[-][anonymous]11y10

[This comment is no longer endorsed by its author]

[-]Protagoras11y10

I'm torn. There are definitely differences between the way Less Wrong operates and the situation the article describes, but that's always going to be the case. It would be nice to see more studies, of course, examining how the details of the system matter, but none seem to be available. Absent that, it kind of seems like special pleading to say "we do things slightly differently, so obviously it won't apply to us." On the other hand, a single study is rather weak evidence, and the differences do exist, even if we don't have any actual evidence that they matter. I really don't know if it makes sense to consider changing our system in light of this.

[-]slutbunwaller11y00

People might take Eliezer's proclaimed rationalism more seriously if he had even a rudimentary understanding of statistics and probability. And was actually a good writer.
