Case study: abuse of frequentist statistics
Recently, a colleague was reviewing an article whose key justification rested on some statistics that seemed dodgy to him, so he came to me for advice. (I guess my boss, the resident statistician, was out of his office.) Now, I'm no expert in frequentist statistics. My formal schooling in frequentist statistics comes from my undergraduate chemical engineering curriculum -- I wouldn't rely on it for consulting. But I've been working for someone who is essentially a frequentist for a year and a half, so I've had some hands-on experience. My boss hired me on the strength of my experience with Bayesian statistics, which I taught myself in grad school, and one thing reading the Bayesian literature voraciously will equip you for is critiquing frequentist statistics. So I felt competent enough to take a look.[1]

The article compared an old, trusted experimental method with the authors' new method; the authors sought to show that the new method gave the same results on average as the trusted method. They performed three replicates using the trusted method and three replicates using the new method; each replicate generated a real-valued data point. They did this in nine different conditions, and for each condition, they did a statistical hypothesis test. (I'm going to lean heavily on Wikipedia for explanations of the jargon terms I'm using, so this post is actually a lot longer than it appears on the page. If you don't feel like following along, the punch line is three paragraphs down, last sentence.)

The authors used what's called a Mann-Whitney U test, which, in simplified terms, aims to determine if two sets of data come from different distributions. The essential thing to know about this test is that it doesn't depend on the actual data except insofar as those data determine the ranks of the data points when the two data sets are combined. That is, it throws away most of the data, in the sense that data sets that generate the same ranking are equivalent under the test.
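To make the rank-only property concrete, here is a minimal sketch in Python. The function name and all the numbers are my own inventions for illustration, not the article's data; the U statistic is computed by counting, for every pair made of one point from each group, how often the first group's point is the larger one.

```python
def mann_whitney_u(xs, ys):
    """U statistic for xs: the number of (x, y) pairs with x > y.

    Ties count as half a pair. Only comparisons between values are used,
    so the result depends on the data through their ordering alone.
    """
    u = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


# Hypothetical data: three replicates per method (values invented here).
trusted = [1.2, 3.4, 2.2]
new = [2.9, 4.1, 5.0]

# Wildly different magnitudes, but the same ordering once the groups are pooled.
trusted_stretched = [0.001, 30.0, 2.0]
new_stretched = [29.0, 1000.0, 1e6]

u_original = mann_whitney_u(trusted, new)
u_stretched = mann_whitney_u(trusted_stretched, new_stretched)
print(u_original, u_stretched)  # both are 1.0
```

The two pairs of data sets look nothing alike numerically, yet they pool into the same ranking, so the test statistic, and therefore anything the test concludes, is identical for both.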
This is a field in which the discoverer of the theorem that rational agents cannot disagree was given the highest possible honours...