LW anchoring experiment: maybe

by gwern, 23rd Jan 2013

Priming, Anchoring
Personal Blog

I do an informal experiment testing whether LessWrong karma scores are susceptible to a form of anchoring based on the first comment posted; a medium-large effect size is found although the data does not fit the assumed normal distribution & the more sophisticated analysis is equivocal, so there may or may not be an anchoring effect.

Full writeup on gwern.net at http://www.gwern.net/Anchoring

I would really appreciate a very brief statement of your conclusions. My apologies, but I don't feel like shoveling through your analysis just to find out whether there is an effect, a weak effect, a backwards effect, or whatever.

Just skip the intro, R-code and graphs (too heavy on math).

Question 1:

Is there a difference in karma between posts that received a negative initial comment and those that received a positive initial comment? (Any difference suggests that one or both is having an effect.)

Conclusion 1:

The difference in means has shrunk but not gone away; it’s large enough that 10% of the possible effect sizes (of "a negative initial comment rather than positive") may be zero or actually be positive (increase karma) instead. This is a little concerning, but I don’t take this too seriously:

  • this is not a lot of data
  • as we’ve seen there are extreme outliers suggesting that the assumptions of normality may be badly wrong
  • even at face value, 10 karma points doesn’t seem like it’s large enough to have any important real-world consequences (like make people leave LW who should’ve stayed)

Question 2:

Is there a difference in karma between the two kinds of initial comments, as I began to suspect during the experiment?

Conclusion 2:

As one would hope, neither group of comments ends up with net positive mean score, but they’re clearly being treated very differently: the negative comments get downvoted far more than the positive comments. I take this as perhaps implying that LW’s reputation for being negative & hostile is a bit overblown: we’re negative and hostile to poorly thought out criticisms and arguments, not fluffy praise.

tl;dr: maybe

I was interested in the details of this, but yes, even I would have appreciated a tl;dr.

I'm not sure to what extent these comments can be modeled as expressing a "positive" or a "negative" reaction; the nonsensical one-line explanations made them mostly "insane" reactions (in my perception), which might overshadow the intended interpretation. It might have been a cleaner test if there were no explanations, or if you made an effort to carefully rationalize the random judgments (although that would be a more significant interference).

It's a "damned if you do, damned if you don't" sort of dilemma.

I know from watching them plummet into oblivion that comments which are just "Upvoted" or "Downvoted" are not a good idea for any anchoring question - they'll quickly be hidden, so any effect size will be a lot smaller than usual, and it's possible that hidden comments themselves anchor (my guess: negatively, by making people think "why is this attracting stupid comments?").

While if you go with more carefully rationalized comments, that's sort of like http://xkcd.com/810/ and starts to draw on the experimenter's own strengths & weaknesses (I'm sure I could make both quality criticisms and praises of psychology-related articles, but not so much technical decision theory articles).

I hoped my strategy would be a golden mean of not too trivial to be downvoted into oblivion, but not so high-quality and individualized that comparability was lost. I think I came close, since the positive comments ended up with only a small net negative score, indicating LWers may not have regarded them as good enough to upvote but also not so obviously bad as to merit a downvote.

(Of course, I didn't expect the positive and negative comments to be treated differently - they're pretty much the same thing, with a negation. I'm not sure how I would have designed it differently if I had known about the double-standard in advance.)

Of course, I didn't expect the positive and negative comments to be treated differently

(Positive and somewhat stupid comments tend to be upvoted back to 0 even after they get downvoted at some point, so it's not just absence of response. I consider it a dangerous vulnerability of LW to poorly thinking but socially conforming participants, whose active participation should be discouraged, but who are instead mildly rewarded.)

I consider it a dangerous vulnerability of LW to poorly thinking but socially conforming participants, whose active participation should be discouraged, but who are instead mildly rewarded.

It's a huge problem that I have observed eroding quality of thought and discussion over time. I'm relieved to see others acknowledge it.

A respected member saying "I know, right?" as you just did is valuable evidence, whereas the same from a no-name poster is noise. The naive reaction risks forming cliques with mutual back-scratching from big names.

Full disclosure: That kind of fluff is how I got most of my karma.

Haven't you critiqued people for doing just this kind of thing on LW?

Have I? If I have, I'm sure there was some germane difference: banned accounts, more than 1 sock, abuse of socks to gain multiple votes, unsystematic data collection, no analysis, no public claim, clear damage, etc.

Oh, damn - now I'm annoyed at myself for forgetting to make Rhwawn comments on my own posts after the beginning.

Don't feel too bad, you weren't the only one who lapsed. I didn't hector you guys because after all, it wasn't your experiment.

Possible model extensions:

Does BEST allow you to add prior information?

You might try adding a prior over the effect size, it would be surprising if it was huge. For example, -30 seems implausibly large to me.

You could also add priors for the group means. You have some pretty good prior information here since there are lots of other posts.
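To make the effect-size prior suggestion concrete, here is a minimal sketch of how a prior centered on zero would shrink an estimated effect. It is not BEST's interface (no BEST functions are used), it assumes both the prior and the estimate are normal (an assumption questioned elsewhere in this thread), and every number in it is hypothetical, chosen only for illustration.

```python
def posterior_normal(prior_mean, prior_sd, estimate, estimate_se):
    """Conjugate normal-normal update for a single effect size.

    Returns (posterior_mean, posterior_sd). Assumes a normal prior and a
    normal likelihood with known standard error; this is an illustration,
    not the model used in the writeup.
    """
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = 1.0 / estimate_se ** 2
    posterior_var = 1.0 / (prior_precision + data_precision)
    posterior_mean = posterior_var * (
        prior_precision * prior_mean + data_precision * estimate
    )
    return posterior_mean, posterior_var ** 0.5


# Hypothetical inputs: a skeptical prior centered on "no effect" with SD 10,
# and an observed difference of -10 karma with standard error 10.
mean, sd = posterior_normal(prior_mean=0.0, prior_sd=10.0,
                            estimate=-10.0, estimate_se=10.0)
print(mean, sd)
```

With equal prior and data precision, the posterior mean lands halfway between the prior mean and the estimate; a tighter prior would pull it further toward zero, which is the sense in which a value like -30 would be discounted.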

It would be interesting to look at the distribution of post karma. That might be kind of informative, perhaps it would be better to do the analysis on something like a log scale? Obviously it can't be exactly that since there are negative values...

Does BEST allow you to add prior information?

Supposedly you can add it but you'd have to edit the source, and that's beyond me right now.

You might try adding a prior over the effect size, it would be surprising if it was huge. For example, -30 seems implausibly large to me.

Sure, but the normal distribution is the wrong distribution to be using in the first place. I'm not really sure what... an exponential, maybe?

You could also add priors for the group means. You have some pretty good prior information here since there are lots of other posts. It would be interesting to look at the distribution of post karma.

You'd need the post karma in the first place. Offhand, I don't know any way to get it other than scraping thousands of pages...

perhaps it would be better to do the analysis on something like a log scale? Obviously it can't be exactly that since there are negative values...

Run the log on the absolute value and negate.
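A minimal sketch of that transform, with one change that is my assumption rather than part of the suggestion: it uses log(1 + |x|) instead of log(|x|), so that a karma of exactly 0 maps to 0 instead of being undefined. The function names are made up for illustration.

```python
import math


def signed_log(x):
    """Return sign(x) * log(1 + |x|).

    Takes the log of the absolute value and restores the sign, so negative
    karma stays negative. The "+ 1" (via log1p) is an added assumption to
    keep x == 0 well-defined.
    """
    return math.copysign(math.log1p(abs(x)), x)


def signed_log_inverse(y):
    """Undo signed_log: sign(y) * (exp(|y|) - 1)."""
    return math.copysign(math.expm1(abs(y)), y)


# Hypothetical karma scores, including a negative value and a zero.
scores = [-12, -1, 0, 3, 40, 250]
transformed = [signed_log(s) for s in scores]
print(transformed)
```

The transform is monotone and symmetric around zero, so the ordering of posts is preserved while large scores in either direction are compressed.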

You can look at the RSS feed for some post category, and extract the votes, they're near the beginning in the description section.

if Heads, I posted a comment as Rhwawn saying only "Upvoted" or if Tails, a comment saying "Downvoted".

Upon inspection, these comments seem to all contain explanatory remarks.

Yes, see the full writeup.

You're a bit late.

Never too late to upboat a good post! \o/ (…and dispense some bias at the occasion…)