We're generally familiar here with the appalling state of medical and dietary research, where most correlations turn out to be bogus. (And if we're not, I have collected a number of links on the topic in my DNB FAQ that one can read, see http://www.gwern.net/DNB%20FAQ#flaws-in-mainstream-science-and-psychology - probably the best first link to read would be Ioannidis's “Why Most Published Research Findings Are False”.)

I recently found a talk arguing that this problem was worse than one might assume, with false positives in the >80% range, and more interestingly, why the rate is so high and will remain high for the foreseeable future. Young asserts, pointing to papers and textbooks by epidemiologists, that they are perfectly aware of what the Bonferroni correction does (and why one would use it) and that they choose to not use it because they do not want to risk any false negatives. (Young also conducts some surveys showing less interest in public sharing of data and other good things like that, but that seems to me to be much less important than the statistical tradeoffs.)
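To make the tradeoff concrete, here is a minimal sketch of why uncorrected multiple comparisons produce so many false positives. It assumes independent tests in which every null hypothesis is true, which is a simplification and not something Young's talk specifies; the function names are my own.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """Chance of at least one false positive across m independent tests
    when every null hypothesis is true (independence is an assumption)."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_threshold(alpha: float, m: int) -> float:
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / m


if __name__ == "__main__":
    alpha = 0.05
    for m in (1, 5, 20, 100):
        uncorrected = familywise_error_rate(alpha, m)
        corrected = familywise_error_rate(bonferroni_threshold(alpha, m), m)
        print(f"m={m:3d}  uncorrected={uncorrected:.3f}  bonferroni={corrected:.3f}")
```

Under these assumptions, 20 uncorrected tests at the 0.05 level give roughly a 64% chance of at least one spurious "finding", while the Bonferroni threshold keeps that chance below 5%.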

There are three papers online that seem representative:

  1. Rothman (1990)
  2. Perneger (1998)
  3. Vandenbroucke, PLoS Med (2008)

Reading them is a little horrifying when one considers the costs of the false positives, all the people trying to stay healthy by following what is only random noise, and the general (and justified!) contempt for science by those aware of the false positive rate. (I enlarge on this vein of thought on Reddit. The recent kerfuffle about whether salt really is bad for you - medical advice that has stressed millions and will cost more millions due to New York City's war on salt - is a reminder of what is at stake.)

The take-away, I think, is to resolutely ignore anything to do with diet & exercise that is not a randomized trial. Correlations may be worth paying attention to in other areas but not in health.



From Reddit:

So this does really happen; a frustratingly large amount of my time is spent convincing other epidemiologists to do something/anything to control for multiple comparisons. I've got two additional comments to the article. First, it's worse than that. There is a genuine lack of multiple comparison control when looking at just published results, but this is just the tip of the iceberg. There are a ton of analyses that get run in the name of "understanding the data" that get tossed when you finally find something publishable. Second, this kind of stuff isn't limited to observational epi. There are plenty of non-FDA scrutinized randomized trials (just look at the social science or education literature) where this kind of thing happens. "Oh well the curriculum we implemented didn't reduce alcohol, violence, unprotected sex, or marijuana; but cigarette use went down in the intervention schools!!!" The only way this stuff stops is if journal editors start taking it more seriously. We should start requiring a priori hypothesis specification for even observational studies, null results should be published, and multiple comparisons should be adjusted for; and for the love of all that is good in this world keep publishing replications till no one has any reasonable doubts about the relationship under question.

This is too nihilistic and is not really what experts like Ioannidis are proposing. Better to evaluate the studies (or find sources that evaluate the studies) individually for their sample size and statistical measures, such as whether or not they control for relevant covariates and do multiple hypothesis testing corrections.

You can download a video of Ioannidis' Mar '11 lecture on nutrition from http://videocast.nih.gov/PastEvents.asp?c=144 (it's big though, 250 MB). Some notes:

  • Randomized trials have problems too.
  • For example, they'll often inflate the effects by contrasting the most extreme groups (upper vs lower 20%).
  • Or, just basic biases, like the winner's curse (large effects tend to come from studies with small sample sizes--you can see this by comparing the log of treatment effect vs the log of total sample size in the cochrane database) or publication bias (leading to missing data).
  • Odds ratios in randomized trials also decrease over time.
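The winner's curse in the notes above can be illustrated with a small simulation. This is only a sketch under assumptions of my own (a hypothetical true effect of 0.2, normally distributed estimates, and a rule that only significantly positive results get published); it is not taken from Ioannidis's lecture.

```python
import random
from statistics import mean

TRUE_EFFECT = 0.2  # hypothetical true effect, in standard-deviation units
Z_CRITICAL = 1.96  # significance cutoff; only positive significant results are "published"


def mean_published_estimate(n: int, trials: int = 10_000, seed: int = 0) -> float:
    """Average effect estimate among simulated studies of size n that reach significance."""
    rng = random.Random(seed)
    standard_error = 1.0 / n ** 0.5
    published = []
    for _ in range(trials):
        estimate = rng.gauss(TRUE_EFFECT, standard_error)
        if estimate / standard_error > Z_CRITICAL:
            published.append(estimate)
    # With the defaults used here, many studies pass the filter, so the list is never empty.
    return mean(published)


if __name__ == "__main__":
    for n in (25, 100, 2500):
        print(f"n={n:5d}  mean published estimate={mean_published_estimate(n):.3f}")
```

In this toy setup the small studies that clear the significance bar report effects well above the true 0.2, while the large studies land close to it, which is the pattern of large effects clustering in small samples.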

Generally, Ioannidis wants massive testing via biobanks (sample sizes in the millions), longitudinal measurements, and large-scale global collaborations. These do not necessarily mean only randomized trials, and in fact randomized trials are pretty much impossible for that kind of data set. Epi can work too, it just needs to be done well.

It would be nice to have what Ioannidis suggests, but what do we do in the decades (if ever) before those suggestions are implemented? Throwing out the correlations seems like the best idea to me - 20% of randomized trials having issues is a win in a way that 80% of results with serious issues is not.

Certainly not all correlations are useless. This feels like I am breaking some analogue of Godwin's law, but just consider the association between cigarette smoke and some types of cancer. Generally, discounting correlations and treating them with more skepticism seem like good ideas. But "throwing out" seems needlessly harsh to me, unless for some reason you are in a hurry, in which case you should think about deferring to more expert sources anyways.

For example, this useful source http://www.informationisbeautiful.net/play/snake-oil-supplements/ (see the spreadsheet at the link) uses mostly randomized trials but also includes some studies which discuss prospective associations. I don't think the organizers should be criticized for including the correlations.

This feels like I am breaking some analogue of Godwin's law, but just consider the association between cigarette smoke and some types of cancer.

It seems like everyone wants to bring up tobacco as the justification for such irresponsibility - it paid off once, so we should keep doing it... See my reply to http://news.ycombinator.com/item?id=2870962 (since they brought up tobacco before you did).

Recently it was announced that some organization (I thought it was the SIAI, but I can't find it on their blog) would work to form a panel in order to examine and disambiguate the state of knowledge about a number of different areas, the first being diet, nutrition and exercise. It seems imperative that they take this into consideration. What was this organization, and do we have any way of knowing whether they will or not?

My own opinion of that proposal (I'm not sure whether I said this elsewhere) is that the group's work is already being done, and better, by things like the Cochrane Collaboration. There is no comparative advantage there.

That was my thought as well, although if this group were formed I'd be extremely interested in how they worked and what their findings were. I'd imagine Bayesian methods would be the norm, which might give them a leg up.

It would be particularly interesting if they consistently disagreed with mainstream systematic reviews.

Thank you for your writings. This is exactly what this site needs more of: Applied rationality.


Recently I attended a talk by some genetic epidemiology students who applied Bonferroni corrections just based on their supervisor's advice. The whole lot of them had done it, independently. It's a conservative method, and not always the best approach. I reckon some subfields of epidemiology are more liable to methodological failings than others.

Inquiries into the relation between any method as a method and the practice of that method are relatively uncommon. There are, however, a few recent examples in epidemiology. Very little mismatch was observed between the methodological accounts of case-control studies of cancer screening and their practice, perhaps because those who write the methods papers in this highly specialised area are also directly involved in designing investigations of screening programmes using the case-control approach. For other methods, the situation is not so neat. Considerable mismatch between methodological standards and actual practice was recently identified in clinical epidemiological studies of molecular and genetic factors

[This comment is no longer endorsed by its author]

I don't follow. You mean, why does reducing false positives increase false negatives? Because Bonferroni doesn't pull any new data from anywhere, it just shifts along a tradeoff.
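A rough sketch of that tradeoff, assuming a one-sided z-test and a hypothetical real effect 2.5 standard errors from zero (both choices are mine, purely for illustration): tightening the threshold from 0.05 to 0.05/20 leaves the data untouched and only lowers the chance of detecting the real effect.

```python
from statistics import NormalDist


def power_one_sided_z(alpha: float, effect_in_se: float) -> float:
    """Probability that a one-sided z-test at level alpha detects a real
    effect whose size is effect_in_se standard errors."""
    z = NormalDist()  # standard normal distribution
    critical = z.inv_cdf(1.0 - alpha)
    return 1.0 - z.cdf(critical - effect_in_se)


if __name__ == "__main__":
    effect = 2.5  # hypothetical effect size, in standard errors
    m = 20        # hypothetical number of comparisons
    for label, alpha in (("uncorrected", 0.05), ("bonferroni", 0.05 / m)):
        power = power_one_sided_z(alpha, effect)
        print(f"{label:12s} alpha={alpha:.4f}  power={power:.3f}  false-negative rate={1 - power:.3f}")
```

Under these assumptions the power drops from roughly 80% to under 40%, so the false negative rate more than triples in exchange for the lower false positive rate.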