I'm a medical student, and I will often read articles that are critical of scientific literature (Scott Alexander on Pharmacogenomics; EMCrit on thrombolysis in ischemic stroke, etc.) with some awe at the authors' ability to evaluate evidence.

I'm sure that part of this is practice. If I spend more time critically reading scientific literature, and less time taking experts at face value, I will likely become better able to think independently.

However, part of it strikes me as a lack of technical skills. I'm often unsure how to critique study designs when I don't understand the statistical methods being used.

Any recommendations for how I might get the skills I need to think independently about scientific/medical literature?

[Edit: Changed formatting of links after a comment]


4 Answers



Skills: Learn both Bayesian and frequentist statistics. E. T. Jaynes's *Probability Theory: The Logic of Science*; Gelman et al.'s *Bayesian Data Analysis*; and any solid frequentist textbook, e.g. Goodman's *Teach Yourself Statistics*, 1972 edition. Also Judea Pearl's *Causality*. Read the papers critiquing current methods (Ioannidis's "Why Most Published Research Findings Are False", the recent papers criticising the use of p-values).

You will need calculus and linear algebra to get far but for reading the medical literature you can probably ignore measure theory.

Heuristics: Look at sponsorship, both of the study itself and of the researchers (speaking fees, sponsorship of other papers). This massively skews results.

Look for ideological or prior commitments by authors. This also massively skews results.

Look out for p-hacking / the garden of forking paths, i.e. researcher degrees of freedom that let 'significant' results be claimed when this is not valid.
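A quick simulation makes the point (a Python sketch with invented numbers, not drawn from any real study): if you test enough subgroups of pure noise, a 'significant' result turns up most of the time.

```python
import random

random.seed(0)

def crude_significance_test(a, b):
    # Crude two-sample comparison: True if the mean difference exceeds
    # roughly 1.96 standard errors (approximates a two-sided p < .05).
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    var_a = sum((x - mean_a) ** 2 for x in a) / (n - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (n - 1)
    se = (var_a / n + var_b / n) ** 0.5
    return abs(mean_a - mean_b) > 1.96 * se

trials = 2000      # simulated studies
subgroups = 20     # post-hoc subgroup analyses per study
false_positives = 0

for _ in range(trials):
    # Null world: 'treatment' and 'control' come from the same distribution,
    # so every significant result here is spurious.
    found = False
    for _ in range(subgroups):
        a = [random.gauss(0, 1) for _ in range(30)]
        b = [random.gauss(0, 1) for _ in range(30)]
        if crude_significance_test(a, b):
            found = True
            break
    false_positives += found

print(f"fraction of null studies with at least one 'significant' "
      f"subgroup out of {subgroups}: {false_positives / trials:.0%}")
```

With a roughly 5% false-positive rate per test, 20 unplanned looks at the data push the chance of at least one spurious 'finding' toward 1 - 0.95^20, i.e. around two-thirds.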

Understand the difference between statistical significance and practical significance. Understand how arbitrary the 5% threshold for statistical significance is. Understand that a result falling short of statistical significance may still be evidence *for* an effect: no significant effect ≠ no effect, and may even mean there probably is one.
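To see how arbitrary the 5% line is, here is a toy Python calculation (the effect sizes and standard errors are invented for illustration): two studies with the same estimated effect can land on opposite sides of the threshold purely because one has slightly more data.

```python
import math

def two_sided_p(effect, se):
    """Two-sided p-value for effect/se under a normal approximation."""
    z = abs(effect) / se
    # P(|Z| > z) for a standard normal, via the complementary error function
    return math.erfc(z / math.sqrt(2))

# Two hypothetical studies of the same treatment, same estimated effect;
# study B simply has a slightly smaller standard error (more subjects).
p_a = two_sided_p(effect=0.30, se=0.160)
p_b = two_sided_p(effect=0.30, se=0.145)

print(f"study A: effect 0.30, p = {p_a:.3f}")  # just above .05
print(f"study B: effect 0.30, p = {p_b:.3f}")  # just below .05
```

Study A (p ≈ 0.06) would be reported as "no effect" and study B (p ≈ 0.04) as "effect found", despite identical effect estimates; study A is in fact modest evidence *for* the effect, not against it.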

Understand how little most medical people, from GPs to professors, know about statistics, and how often basic statistical errors occur in the literature (e.g. lack of statistical significance taken as disproof, as in the Vioxx debacle).

Read the methods section first. Don't read the results part of the abstract or if you do, check that all the claims made are backed up by the body of the paper.

When reading meta-analyses, look hard at the papers they are based on - you cannot make a silk purse from a sow's ear. Be very wary of any study that has not been replicated by independent researchers.
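The garbage-in problem is visible directly in the arithmetic. A fixed-effect meta-analysis is just an inverse-variance weighted average, so a single large biased study can dominate several honest small ones (all numbers below are invented for illustration):

```python
def pooled_estimate(effects, ses):
    """Fixed-effect meta-analysis: inverse-variance weighted average."""
    weights = [1 / se ** 2 for se in ses]
    total = sum(weights)
    est = sum(w * e for w, e in zip(weights, effects)) / total
    return est, (1 / total) ** 0.5  # pooled effect and its standard error

# Four small studies finding essentially nothing, plus one large
# (hypothetically sponsor-biased) study reporting a big effect.
effects = [0.02, -0.05, 0.01, 0.03, 0.40]
ses     = [0.20,  0.20, 0.20, 0.20, 0.05]

est, se = pooled_estimate(effects, ses)
print(f"pooled effect = {est:.2f} (SE {se:.2f})")
```

The large study carries a weight of 1/0.05² = 400 against 25 for each small one, so the pooled estimate lands near 0.32 with an impressively tight standard error: the meta-analysis inherits, and launders, the bias of its biggest input.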

Be aware of the extreme weaknesses of epidemiological and observational studies, and be very sceptical of claims to have "controlled for" some variable. Such attempts are usually miserable failures; they are often invalid and can actually make things worse. See Pearl's book.

Practically speaking, how might I go about checking if a study has been replicated independently?

A replication will always cite the original study. Google Scholar can show you all studies that cite a given paper, and that list is often a good place to look.

I tend to search "<title of study> replication" in Google, as well as "<core claim of the study> replication"

how do you find the sponsorships of studies and researchers?

Usually conflicts of interest and funding are disclosed (these days) in the paper. I usually look there first, before the second step of reading the methods section. There are also registers of funding for medical researchers: for Australia, https://ses.library.usyd.edu.au/handle/2123/20224 and https://ses.library.usyd.edu.au/handle/2123/20223; for the US, https://openpaymentsdata.cms.gov/. But the system is imperfect (https://www.nytimes.com/2018/12/08/health/medical-journals-conflicts-of-interest.html), and of course disclosure is not a complete answer: disclosed funding still greatly affects the reported results.



A few months ago I read this paper for a class (paywalled). In it, the authors perform a similar set of knockdown experiments using both short hairpin RNA and CRISPR in order to repress a gene. Their results with the shRNAs are quite impressive, but the CRISPR part is less so. Why the disparity?

The key to this sort of thing is to picture it from the authors' perspective. Read between the lines, and picture what the authors actually do on a day-to-day basis. How did they decide exactly which experiments to do, which analyses to run, what to write up in the paper?

In the case of the paper I linked above, the authors had a great deal of experience and expertise with shRNAs, but seemed to be new to CRISPR. Most likely, they tried out the new technique either because someone in the lab wanted to try it or because a reviewer suggested it. But they didn't have much expertise with CRISPR, so they had some probably-spurious results in that part of the paper. All we see in the paper itself is a few results which don't quite line up with everything else, but it's not hard to guess what's going on if we think about what the authors actually did.

This principle generalizes. The main things to ask when evaluating a paper's reliability are things like:

  • Does it seem like the authors ran the numbers on every little subset of their data until they found p < .05?
  • Does it seem like the authors massaged the data until it gave the answer they wanted?
  • Does it seem like the authors actively looked for alternative hypotheses/interpretations of their data, and tried to rule them out?

... and so forth. In short, try to picture the authors' actual decision-making process, and then ask whether that decision-making process will yield reliable results.

There's all sorts of math you can run and red flags to watch out for - multiple tests, bad incentives, data not actually matching claims, etc - but at the end of the day, those are mostly just concrete techniques for operationalizing the question "what decision-making process did these authors actually use?" Start with that question, and the rest will follow naturally.



If you haven't done so, I do recommend reading the sequences since they do talk a lot about the basic epistemological foundations necessary for that kind of analysis.

After that, I would probably recommend reading "The Book of Why" by Judea Pearl, which is the best accessible analysis and critique of standard methodologies that I know of, maybe together with "How to Measure Anything". And then I would just try to read as many critiques of studies as you can find. Scott Alexander obviously has many; some of the most important ones are curated in this sequence.

As a concrete training tool, I also think it's a really good idea to write fact posts. Sarah Constantin has a great explanation and guide on how to write them.

Thanks for the links. I think one concern that keeps popping up is that by reading more analyses of other papers I'm just learning others' thoughts rather than learning to think for myself.

Constantin's fact post approach does seem like an effective way to cut through that.



Read as many such critiques as possible, take notes, and do iterated compression/summarization of the notes. This way you'll build your own toolbox of heuristics for evaluation that you deeply understand rather than aping the experts without really understanding.

2 comments

FYI, found your links fairly confusing – would not have thought that each part of the word "these" would link to a different thing, and would find it more useful to just briefly describe each link (with a couple of words, a la "this post by Scott" and "this paper")

Thanks for the feedback. Will change the format in the future!