Quinn wrote a while ago: "I heard a pretty haunting take about how long it took to discover steroids in bike races. Apparently, there was a while where a 'few bad apples' narrative remained popular even when an ostensibly 'one of the good ones' guy was outperforming guys discovered to be using steroids."

I have been thinking about that notion after researching BPC 157, where the literature seems to be completely fraudulent.

How do you think about the issue of how much of the literature is fraudulent?

Answers

johnswentworth

The main way I think about academic fraud is to lump it together with other things which are equivalent for most purposes and hard to distinguish from fraud.

For instance: fraud and p-hacking are pretty similar from an epistemic point of view. I expect the resulting publications to usually have a similar "smell" to them; they lack the gears which show up in real and useful work. And in both cases, the right response from me is usually just to ignore the paper(s) in question.
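As a toy illustration of why the two produce similar-looking output, consider a hypothetical lab that measures many unrelated outcomes and reports only the one that clears p < 0.05. The setup and numbers below are purely illustrative:

```python
# A toy model of p-hacking: two groups with NO true difference, measured on 20
# unrelated outcomes. Reporting only the best-looking outcome manufactures
# "significant" findings from pure noise, which is why the result is hard to
# distinguish from fabrication after the fact.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_subjects, n_outcomes, n_labs = 30, 20, 1000

lucky_labs = 0
for _ in range(n_labs):
    control = rng.normal(size=(n_subjects, n_outcomes))
    treatment = rng.normal(size=(n_subjects, n_outcomes))
    p_values = [stats.ttest_ind(control[:, i], treatment[:, i]).pvalue
                for i in range(n_outcomes)]
    if min(p_values) < 0.05:   # "publish" the single best outcome, ignore the rest
        lucky_labs += 1

print(f"Share of labs that can report a 'significant' effect from pure noise: "
      f"{lucky_labs / n_labs:.0%}")   # roughly 1 - 0.95**20, i.e. about 64%
```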

Comments

There’s been some survey data on this, e.g.: 

Fanelli, D. (2009). How many scientists fabricate and falsify research? A systematic review and meta-analysis of survey data. PLoS ONE, 4(5), e5738.

This study reports that roughly 2% of surveyed scientists admitted to data fabrication. However, it is of course difficult to get a good estimate by asking people, since this is a case where people have strong incentives to lie. Asking people about suspicions of colleagues may give an overestimate. So I think in general it's very hard to estimate actual fabrication rates.

One obvious issue is that many instances of fraud that are caught are likely to be cases where the data looked suspicious to others. This requires eyes to be on the data, someone to notice, that person to follow through, and, most importantly, the fraud to be sloppy enough to get noticed. This means identified cases of fraud are probably from people who are less careful. So we're seeing the most blatant, obvious, and sloppy fraud. People who are very good at committing fraud are much more likely to go undetected. And that's scary.

Asking people about suspicions of colleagues may give an overestimate. 

It might also be an underestimate. If you ask most people about how many of their colleagues have stolen in the past or ask men about how many of their friends engaged in sexual assault, you get underestimates.

Fanelli is a good, if dated, reference for this. Another important point is that there are levels of misconduct in research, ranging from bad authorship practices to outright fabrication of results, with the less severe practices being relatively more common: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4269469/

Aside from all that, there's irreproducibility, which doesn't arise from any kind of deliberate misconduct, but still pollutes the epistemic commons: https://www.cos.io/rpcb

There are still more issues. Even if the results of a study can be reproduced given the raw data, and even if the findings can be replicated in subsequent studies, that does not ensure that results have identified the effect researchers claim to have found. 

This is because studies can rely on invalid measures. If a study claims to measure P, but fails to do so, it may nevertheless pick up on some real pattern other than a successful measurement of P. In these cases, results can replicate and appear legitimate even if they don't show what they purport to show.
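A small simulation can make that failure mode concrete (the setup is invented purely for illustration): the instrument billed as a measure of P is really a proxy for some other trait Q, yet the headline correlation replicates in every fresh sample.

```python
# Invalid measurement that replicates: the "measure of P" actually tracks Q,
# and the outcome is also driven by Q, so the measure-outcome correlation is
# strong and stable even though the true construct P is irrelevant.
import numpy as np

rng = np.random.default_rng(1)

def run_study(n=500):
    Q = rng.normal(size=n)                        # the real driver
    P = rng.normal(size=n)                        # the construct the study claims to measure
    measure_of_P = Q + 0.1 * rng.normal(size=n)   # invalid measure: mostly picks up Q
    outcome = Q + rng.normal(size=n)              # outcome caused by Q, not P
    return (np.corrcoef(measure_of_P, outcome)[0, 1],
            np.corrcoef(P, outcome)[0, 1])

for i in range(3):  # three independent "replications"
    r_measure, r_true = run_study()
    print(f"replication {i + 1}: measure-outcome r = {r_measure:.2f}, "
          f"true-P-outcome r = {r_true:.2f}")
# The measure-outcome correlation comes out around 0.7 every time, so the
# result "replicates", yet it says nothing about P.
```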

In addition to measurement problems and definitional problems (is p-hacking "fraud" or just bad methodology?), I think "academia" is too broad a category for this question to have a meaningful answer.

Different disciplines, and even different topics within a discipline, will have very different distributions of research quality, spanning multiple components - specificity of topic, design of mechanism, data collection, and application of testing methodology. AND in clarity and transparency, which determine whether others can easily replicate the results and agree or disagree with the interpretation. AND in importance of result, which determines whether anyone seriously tries to replicate or contradict a finding.

Then there are selection effects.  To the degree that popular media and political/personal discussions of interesting topics are biased and untrustworthy, their choice of WHICH academic papers to use as evidence is likely to have the same biases.  Not necessarily massive fraud, but less reliable in terms of conclusions than a random sampling.

obMontyPython:

Sir John Cunningham : May I take this opportunity of emphasizing that there is no cannibalism in the British Navy, absolutely none. And when I say none, I mean there is a certain amount, more than I personally admit.

gjm

Not an answer but a remark: there are several different notions of "how much of the literature" -- you could ask "what fraction of all things published anyhow anywhere?" or "what fraction of all things published in reputable journals?" or "what fraction of things published in reputable journals, weighted by X?" where X is citation count or journal impact-factor or some other thing that accounts for the fact that someone looking (competently and honestly) in the academic literature will likely pay more attention to some parts of it than others.

I'd bet fairly heavily that the amount of fraud decreases quite a bit as you move along that scale.

The effect isn't large, but you'd lose that bet. See "Nonreplicable publications are cited more than replicable ones".

gjm

That's very interesting, but it's about replicability, not fraud, and those are (at least sometimes) very different things.

Possible counters:

1. "Not so different: results that don't replicate are probably fraudulent." I'm prepared to be persuaded but I'd be surprised if most reproducibility-failures were the result of fraud.

2. "Whatever differences between 'worse' and 'better' publications you have in mind that you'd hope would reduce fraud in the 'better' ones, you should also expect to reduce nonreplicability; apparently they don't do that, so why expect them to reduce fraud?" My impression is that a lot of nonreplicability is just "lucky": you happened to get a result with p<0.05 and so you published it, and if you'd got p>0.05 maybe you wouldn't have bothered, or maybe the fancy journal wouldn't have been interested and it would have been published somewhere worse. This mechanism will lead to nonreplicable papers being higher-profile and cited more, without there being anything much about those papers that would raise red flags if a journal is worried about fraud. So to whatever extent fraud is detectable in advance, and better journals try harder to spot it because their reputation is more valuable, better journals will tend to see less fraud but not less nonreplicability. (And: to whatever extent fraudsters expect better journals to try harder to catch fraud, they will tend to avoid publishing their fraudulent papers there.)

I like your second argument. But to be honest, there is a giant grey area between "non-replicable" and "fraudulent". It is hard to draw the line between "intellectually dishonest but didn't mean to deceive" and "fraudulent". And even if you could define the line, we lack the data to identify what falls on either side.

It is worth reading "Time to assume that health research is fraudulent until proven otherwise?" I believe that this case is an exception to Betteridge's law - I think that the answer is yes. Given the extraordinary effort that the editor of Anaesthesia needed to catch some fraud, I doubt that most journals make a comparable effort. And because I believe that, I'm inclined to a prior that says that non-replicability suggests at least even odds of fraud.
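As a quick Bayes sketch of what "at least even odds" turns on (the numbers below are placeholders, not estimates):

```python
# P(fraud | failed replication) via Bayes' rule, with made-up inputs.
base_rate_fraud       = 0.15  # hypothetical share of papers involving fraud
p_nonrep_given_fraud  = 0.90  # hypothetical: fraudulent results rarely replicate
p_nonrep_given_honest = 0.40  # hypothetical: honest results also often fail to replicate

posterior = (p_nonrep_given_fraud * base_rate_fraud) / (
    p_nonrep_given_fraud * base_rate_fraud
    + p_nonrep_given_honest * (1 - base_rate_fraud)
)
print(f"P(fraud | failed replication) = {posterior:.2f}")  # about 0.28 with these inputs
# "Even odds" (0.5) requires either a higher base rate of fraud or a larger gap
# between how often fraudulent vs. honest results fail to replicate.
```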

As a sanity check, high-profile examples like the former President of Stanford demonstrate that fraudulent research gets accepted in top journals and leads to prestigious positions. See also the case of Dr. Francesca Gino, formerly of Harvard.

And, finally, back to the line between intellectual dishonesty and fraud. I'm inclined to say that they amount to the same thing in practice, and we should treat them similarly. And the combined bucket is a pretty big problem.

Here is a good example. The Cargo Cult Science speech was given around 50 years ago. Psychologists have objected ever since to their field being called a pseudoscience by many physicists. But it took 40 years before they finally did what Feynman told them to do, and tried replicating their results. They generally have not acknowledged Feynman's point, nor have they started fixing the other problems that Feynman talked about.

Given that, how much faith should we put in psychology?