Science is full of misleading findings—results that would not hold up if the study were attempted again under ideal conditions. In data-driven investigations, a big part of this could be the forking paths problem. Researchers make many decisions about how to analyze the data (leave out this subgroup, include that control variable, etc.). But there are many defensible paths to take, and they could yield vastly different results. When the study gets published, you don’t learn about results from the paths not taken.
I think a lot of well-meaning scientists fall prey to this and churn out shoddy results. I don’t think they have an evil mindset, however. (One could certainly imagine a Psychopath's Guide to Causal Inference: fake the data. Or, if that’s too hard, run a loop that tries every possible analysis until by random luck you get the result you want. Then write up that result and justify the analytical setup as the obvious choice.)
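To make that caricature concrete, here is a toy sketch of such a loop. Everything in it is made up (the data, the subgroups, the transformations, nothing from any real study): the outcome is generated independently of the exposure, yet trying enough forks will sometimes surface a “significant” p-value anyway.

```python
# Caricature only: made-up data with no true effect, and a loop that keeps
# forking the analysis until something crosses p < 0.05 by chance.
import itertools

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 500
age = rng.integers(18, 80, size=n)
exposure = rng.normal(size=n)
outcome = rng.normal(size=n)  # generated independently of exposure

subgroups = {
    "all": np.ones(n, dtype=bool),
    "young": age < 40,
    "old": age >= 60,
}
transforms = {"raw": lambda x: x, "squared": lambda x: x**2, "abs": np.abs}

for (sg, mask), (tf_name, tf) in itertools.product(subgroups.items(), transforms.items()):
    r, p = stats.pearsonr(tf(exposure[mask]), outcome[mask])
    if p < 0.05:  # report the first "lucky" specification and stop
        print(f"'Finding': subgroup={sg}, transform={tf_name}, r={r:+.2f}, p={p:.3f}")
        break
else:
    print("No luck with these forks -- a determined p-hacker would just add more.")
```

With only a handful of forks the loop may come up empty; the point is that adding forks only ever increases the odds of a lucky hit.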
A more appropriate analogy for understanding (most) p-hackers might be focusing a microscope: the effect is there, you’re merely revealing it. When you look at a microorganism under a microscope, you have some idea of what should appear once it's in focus. It's blurry at first, but you adjust the knobs until you see the expected shape.
A case in point is Brown et al. (2022), a paper on pollution and mortality in India. This is a highly credentialed team (here’s the last author). What’s fascinating is that they describe the forking paths that they abandoned:
We performed exploratory analyses on lag years of PM2.5…We found that using lag 2–4 y would cause PM2.5 exposures to be protective on respiratory disease and IHD [ischemic heart disease], whereas using lag 4–6 y the effect of PM2.5 exposures on stroke was smallest in comparison with lag 2–4 y and lag 3–5 y…Thus, to be consistent for all three disease outcomes and avoiding an implausible protective effect of PM2.5, we chose to use lag 3–5 y.
So: the authors were trying to study the impact of ambient pollution on different causes of death. They were unsure how far back to measure exposure: pollution 2–4 years ago, 3–5 years ago, or something else? There isn’t an obvious answer, so they tried a few windows. When they used the 2–4 year lag, their results appeared to suggest that pollution improves your health. But that’s impossible. When they tried the 4–6 year lag, the effect on stroke was the smallest of the three windows. They chose the 3–5 year lag because it gave harmful effects across all three causes of death.
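To show what that kind of search looks like mechanically, here is a stripped-down sketch of a lag-window sensitivity loop. Everything in it is invented: toy annual PM2.5 values, toy death counts, and a naive linear slope standing in for the “effect”; the paper’s actual model is far richer. The point is only that equally defensible windows can return noticeably different estimates.

```python
# Toy lag-window sensitivity check on fabricated data.
import numpy as np

rng = np.random.default_rng(1)
years = np.arange(1995, 2016)
pm25 = 40 + 10 * rng.standard_normal(len(years))   # fake annual mean PM2.5 (ug/m3)
deaths = rng.poisson(1000 + 2 * pm25)               # fake annual death counts

def mean_exposure(target_year, lag_lo, lag_hi):
    """Average PM2.5 over the window [target_year - lag_hi, target_year - lag_lo]."""
    window = (years >= target_year - lag_hi) & (years <= target_year - lag_lo)
    return pm25[window].mean()

outcome_years = years[6:]  # keep only years with at least 6 years of exposure history
for lag_lo, lag_hi in [(2, 4), (3, 5), (4, 6)]:
    x = np.array([mean_exposure(yr, lag_lo, lag_hi) for yr in outcome_years])
    y = deaths[np.isin(years, outcome_years)]
    slope = np.polyfit(x, y, 1)[0]                  # extra deaths per unit PM2.5
    print(f"lag {lag_lo}-{lag_hi} y: slope = {slope:+.1f}")
```

In this toy version the differences across windows are pure noise; in a real analysis, picking the window whose slope looks best is exactly the forking-paths problem.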
That procedure will make most methodologists want to throw up. You can’t choose your analytical approach based on the results it generates. And the apparent protective effect that they found under some lags was reason to think that the whole study is hopelessly confounded. Surely this practice, in general, will lead to unreliable findings.
The authors’ perspective might be that this is overly strict. Results that make sense are more likely to be true. (After all, this form of Bayesianism is how we reject the findings from junk science.) If certain analytical decisions yield a collection of findings that align with our priors, those decisions are probably correct. They knew the effects were there; they just had to focus the microscope.
My hunch, though, is that it would improve science if we were a lot stricter, requiring pre-registrations or specification curves. In observational causal inference, you are not focusing a microscope. Any priors you have should be highly uncertain.
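For readers unfamiliar with specification curves, the idea is simply to enumerate every defensible specification up front and report the whole set of estimates, rather than quietly picking one. A minimal sketch, again with fabricated data and a deliberately small grid of specifications:

```python
# Minimal specification-curve sketch on fabricated data: run every defensible
# specification and report every estimate, sorted, instead of choosing one.
import itertools

import numpy as np

rng = np.random.default_rng(2)
n = 1000
confounder = rng.normal(size=n)
exposure = confounder + rng.normal(size=n)
outcome = 0.2 * exposure + 0.8 * confounder + rng.normal(size=n)

def exposure_coef(mask, adjust_for_confounder):
    """OLS coefficient on exposure under one specification."""
    cols = [np.ones(mask.sum()), exposure[mask]]
    if adjust_for_confounder:
        cols.append(confounder[mask])
    beta, *_ = np.linalg.lstsq(np.column_stack(cols), outcome[mask], rcond=None)
    return beta[1]

samples = {"full sample": np.ones(n, dtype=bool), "trimmed sample": np.abs(exposure) < 2}
adjustments = {"no controls": False, "adjusted": True}

specs = [(f"{s_name}, {a_name}", exposure_coef(mask, adj))
         for (s_name, mask), (a_name, adj) in itertools.product(samples.items(),
                                                                adjustments.items())]
for name, b in sorted(specs, key=lambda t: t[1]):
    print(f"{name:30s} beta = {b:+.3f}")
```

Pre-registration plays the complementary role: you commit to one of those cells before seeing any results.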
Many researchers will be guided in their data analysis by overly confident priors. They’ll abandon many analyses that readers will never learn about, so published work will convey answers with far too much certainty. In light of this, it’s actually commendable that Brown et al. shared their process. I think we can push science to be even better.
References
Brown, Patrick E., et al. "Mortality associated with ambient PM2.5 exposure in India: Results from the Million Death Study." Environmental Health Perspectives 130.9 (2022): 097004.