"Statistically significant results" mean that there's a 5% chance that results are wrong in addition to chance that the wrong thing was measures, chance that sample was biased, chance that measurement instruments were biased, chance that mistakes were made during analysis, chance that publication bias skewed results, chance that results were entirely made up and so on.

"Not statistically significant results" mean all those, except chance of randomly mistaken results even if everything was ran correct is not 5%, but something else, unknown, and dependent of strength of the effect measured (if the effect is weak, you can have study where chance of false negative is over 99%).

So whether results are statistically significant or not is really not that useful.

For example, here's a survey of civic knowledge. Plus or minus 3% measurement error? Not this time: they just completely made up the results.

Take-home exercise: what do you estimate the Bayesian chance of published results being wrong to be?


"Statistically significant results" mean that there's a 5% chance that results are wrong

Wrong. It means that the researcher defined a class of results such that the class had less than a 5% chance of occurring if the null hypothesis were true, and that the actual outcome fell into this class.

There are all sorts of things that can go wrong with that, but, even leaving all those aside, it doesn't mean there's a 5% chance the results are wrong. Suppose you're investigating psychic powers, and that the journals have (as is usually the case!) a heavy publication bias toward positive results. Then the journal will be full of statistically significant results and they will all be wrong.
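To make the psychic-powers point concrete, here is a hedged simulation of my own (not the commenter's): every study tests a nonexistent effect, only p < 0.05 results get published, and the journal still fills up with papers, all of them wrong.

```python
# Assumed setup: 10,000 attempted studies, no real effect anywhere, and a
# journal that only publishes "significant" results.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
published = 0
for _ in range(10_000):
    psychics = rng.normal(0.0, 1.0, 50)   # no real effect in either group
    controls = rng.normal(0.0, 1.0, 50)
    _, p = ttest_ind(psychics, controls)
    if p < 0.05:                          # only "significant" results get published
        published += 1

print(f"Published papers (all false positives): {published}")  # around 500
```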

In fairness, your last point isn't really about confidence levels. A journal that only accepted papers written in the Bayesian methodology, but had the same publication bias, would be just as wrong.

A journal that reported likelihood ratios would at least be doing better.

A journal that actually cared about science would accept papers before the experiment had been done, with a fixed statistical methodology submitted with the paper in advance rather than data-mining the statistical significance afterward.
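For what it's worth, here is a rough sketch of what reporting a likelihood ratio could look like (the numbers are assumed, not from the thread): compare how well two specific hypotheses about the effect size predict the observed sample mean, rather than only asking whether the null is rejected.

```python
# Assumed example: a study of 50 subjects observes a mean of 0.3 (unit variance).
# Compare the likelihood of that outcome under "no effect" vs "effect of 0.5".
import numpy as np
from scipy.stats import norm

observed_mean = 0.3        # hypothetical study outcome
n = 50
sem = 1.0 / np.sqrt(n)     # standard error of the mean, assuming unit variance

likelihood_h0 = norm.pdf(observed_mean, loc=0.0, scale=sem)
likelihood_h1 = norm.pdf(observed_mean, loc=0.5, scale=sem)

print(f"Likelihood ratio H1:H0 = {likelihood_h1 / likelihood_h0:.1f}")  # about 3.5
```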

A journal that actually cared about science

Is this meant to suggest that journal editors literally don't care about science that much, or simply that "people are crazy, the world is mad"?

Not an objection, but a lot of the articles in that journal would be "here's my reproduction of the results I got last year and published then".

...which is a really good thing, on reflection.

I'm confused by your remark. "5% chance of false positive" obviously means P(positive results|null hypothesis true) = 5%; P(null hypothesis true|positive results) is subjective and has no particular meaningful value, so I couldn't have been talking about that.
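A small worked example (with assumed numbers) shows why the two conditional probabilities come apart: P(null true | positive result) depends on the prior and on statistical power, neither of which the p value contains.

```python
# Assumed numbers for illustration only.
prior_effect_real = 0.1   # assumed: 1 in 10 tested hypotheses is true
power = 0.5               # assumed: chance a real effect reaches p < 0.05
alpha = 0.05              # false positive rate when the null is true

p_positive = prior_effect_real * power + (1 - prior_effect_real) * alpha
p_null_given_positive = (1 - prior_effect_real) * alpha / p_positive

print(f"P(null true | significant result) = {p_null_given_positive:.0%}")  # ~47%
```

With these numbers nearly half of the "significant" results are false positives, even though alpha is 5%.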

"Statistically significant results" mean that there's a 5% chance that results are wrong

Hmm. Assuming the experiment was run correctly, it means there's a less than 5% chance that data this extreme would have been generated if the null hypothesis - that nothing interesting was happening - were true. The actual chance can be specified as e.g. 1%, 0.01%, or whatever.

Also, assuming everything was done correctly, it's really the conclusions drawn from the results, rather than the results themselves, that might be wrong...

The point is that this chance, no matter how small, is in addition to the massive number of things that could have gone wrong.

And with negative results you don't even have that.

Yes, classical hypothesis testing is of questionable value - with a precise enough estimate we will almost always reject the null hypothesis, but who cares? I think that "the chance the results are wrong" is not the most helpful way to think about research in many areas.

Of course, it is important to remind ourselves of the many types of mistakes possible in our research.

Five minutes on Google didn't turn up the study I'm thinking of, but I remember reading a study claiming that around 2/3 of all published studies were false positives due to publication bias (the p value they reported was sufficiently small to believe it).

I did, however, find a metastudy that studied publication bias in papers about publication bias (link).

They found "statistically insignificant" (p = 0.13) evidence for false positives there too.

The way I tend to deal with published studies now is to treat them as weak evidence unless I'm interested enough to look further.

If the p value is not really low, I'll make guesses at how popular a topic of study it is (how many times can you try for a positive result?), how I heard about the study (more possibility for selection bias), how controversial the topic is (how strong is the urge to fudge something?), and what my prior probability would be.

For example, when someone tells me about a study that claims "X causes cancer" and 1) p = .04, 2) it would somehow benefit the person if the claim were true, 3) I see no prior reason for a link between X and cancer, and 4) I see possible other causes for the correlation that were not obviously corrected for, then I assign very little weight to this evidence.

If I find a study by googling the topic, p = 0.001, the topic isn't all that controversial, and no one would even think to test it if they did not assign it high prior probability, then I'll file it under "known".
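Putting rough numbers on the "X causes cancer, p = .04" case above (both the prior and the likelihood ratio are my own assumptions, chosen to be on the generous side):

```python
# Assumed numbers: a skeptical prior plus a generous reading of a p ~ .04 result.
prior_odds = 1 / 99          # assumed: 1% prior that X really causes cancer
likelihood_ratio = 3         # assumed: a fairly generous Bayes factor for the study
posterior_odds = prior_odds * likelihood_ratio
posterior_prob = posterior_odds / (1 + posterior_odds)

print(f"Posterior probability: {posterior_prob:.0%}")   # about 3%
```

Which matches the intuition: a low prior plus a modest p value still leaves the claim unlikely.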