The great folks over at 80,000 Hours made a quiz for that recent replication study (previously discussed on LW), where you can find out whether you can predict as well as the expert scientists or the prediction market. Description:

Can you guess which psychology experiments had correct findings and which were bogus, just from reading a brief description of their results?
Depending on how long you want to play, we'll describe 10, 15 or 21 psychology findings published in Nature and Science, and you'll have to guess whether a replication, with a much larger sample size, got the same result.
Before starting, the people who organised these 21 replications asked expert psychologists to predict which results would hold up. We'll show you how you compare to their performance at the end! (And give you links to all the papers.)

This test might have been more useful if the way those 21 papers were chosen was specified. If they weren't sampled randomly, it might be the case that, for example, the fraction of bogus papers in those 21 papers is a lot higher than the general fraction in psychology papers published in Nature and Science. In such a case, the evaluation will be biased in favor of predictors who tend to classify non-bogus papers as bogus.

From the replication project's web page:

We will replicate 21 experimental studies in the social sciences published in Nature and Science in 2010-2015. These papers were objectively chosen because they were published in these two high-profile journals in this time period, they share a common structure in testing a treatment effect within or between subjects, they test at least one clear hypothesis with a statistically significant finding, and they were performed using students or accessible convenience samples. We plan to conduct the replications between September 2016 and September 2017.

Interesting. I was surprised at how predictable the studies were. It felt like results that aligned with my intuition were likely to be replicated, and results that didn't (e.g., priming affecting a pretty unrelated task) were unlikely to be replicated. Makes me wonder - what's the value of this science if a layperson like me can score 18/18 (with 3 I don't knows) by gut feel after reading only a paragraph or two? Hmm.

(Then again, I guess my attitude of finding predictable results low-value is what has incentivized so much bad science in the hunt for counterintuitive results with their higher rewards.)

39/42 (19.5 studies out of 21). Didn't even notice that details & stats were available. Also this is great work from 80,000 Hours :)

Thanks for sharing this, great fun! 36/42 but I reckon if I'd done this a year or so ago (before reading the sequences/blue minimising robot etc.) I'd have got a lot less.

Incidentally, is it just me, or was the "Don't know" button a bit irrelevant? Even if I have no idea at all, I can average 1 point by choosing an answer at random. If I have any idea at all, my expected score for the question can only go up, as long as I'm over 50% confident. If I'm less than 50% confident, I should just choose the other answer!

I like the idea of having more than just yes or no, but it would need to be more like Professor McGonagall in HPMOR - marking down for incorrect answers. Even a 1-point loss for getting one wrong would mean I'd have to be 67% confident to put in a guess. To allow players to enter their actual confidence level, you could set it up like The £100k Drop.
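The break-even figures in the two comments above can be checked with a quick expected-value calculation. This is a minimal sketch: it assumes the quiz's scoring is 2 points for a correct guess, 1 for "Don't know", and 0 for a wrong guess (as the comments imply), and the `wrong_penalty` variant is the commenter's hypothetical marking-down scheme, not anything the quiz actually does.

```python
# Expected score from guessing when you believe you're right with probability p.
# Assumed scoring (from the comments): +2 correct, 0 wrong, with an optional
# hypothetical penalty subtracted for wrong answers.
def ev_guess(p, wrong_penalty=0.0):
    return 2 * p - wrong_penalty * (1 - p)

DONT_KNOW = 1.0  # points for abstaining

# Actual scoring: guessing beats "Don't know" whenever 2p > 1, i.e. p > 1/2,
# which is why the button looks irrelevant.
assert ev_guess(0.51) > DONT_KNOW

# With a 1-point penalty for wrong answers, break-even is 2p - (1 - p) = 1,
# i.e. p = 2/3 -- matching the ~67% confidence figure in the comment.
assert abs(ev_guess(2 / 3, wrong_penalty=1.0) - DONT_KNOW) < 1e-9
assert ev_guess(0.6, wrong_penalty=1.0) < DONT_KNOW
```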

31 points. Mostly skimmed and went with whether the p-value seemed low enough.

30 points. (I got a sizeable number wrong, but now the quiz only shows me my point total.) According to the quiz, that's about average.

Wish I'd been able to list things probabilistically.

I'd love to see something similar but with a higher level of difficulty, to see whether the prediction markets have a significant advantage over a casual guess. According to their data, the difference is small: casual guesses are about 75% right, and the prediction market is maybe 82% right?

34 points – 17 studies out of 21. Mostly looked at the effect size & the p-values, also thought about whether the proposed causal mechanism seemed plausible.