Jun 10, 2010

*This article is an attempt to summarize basic material, and thus probably won't have anything new for the hardcore posting crowd. If you're new and this article got you curious, we recommend the Sequences.*

People who know a little bit of statistics - enough to use statistical techniques, not enough to understand why or how they work - often end up horribly misusing them. Statistical tests are complicated mathematical techniques, and in order to work, they rely on numerous assumptions. The problem is that if those assumptions are not valid, most statistical tests do not cleanly fail and produce obviously false results. Nor do they require you to carry out impossible mathematical operations, like dividing by zero. Instead, they simply produce results that do not tell you what you think they tell you. As a formal system, pure math exists only inside our heads. We can try to apply it to the real world, but if we are misapplying it, nothing in the system itself will tell us that we're making a mistake.

Examples of misapplied statistics have been discussed here before. Cyan discussed a "test" that could only produce one outcome. PhilGoetz critiqued a statistical method which implicitly assumed that taking a healthy dose of vitamins had an effect comparable to taking a toxic dose.

Even a very simple statistical technique, like taking the correlation between two variables, can be misleading if you forget about the assumptions it's making. When someone says "correlation", they are most commonly talking about Pearson's correlation coefficient, which seeks to gauge whether there's a *linear* relationship between two variables. In other words: if X increases, does Y also tend to increase (or decrease)? However, as with vitamin dosages and their effects on health, two variables might have a non-linear relationship. Increasing X might increase Y up to a certain point, after which increasing X would decrease Y. Simply calculating Pearson's correlation on two such variables might yield a correlation close to zero, leading someone to conclude that there's no relationship, or only a weak one, between the two. (See also Anscombe's quartet.)
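This failure mode is easy to demonstrate. The sketch below (plain Python, with made-up illustrative data) computes Pearson's correlation for a relationship that is perfectly deterministic but non-linear - Y rises as X approaches 5 and falls afterwards - and gets a correlation of exactly zero:

```python
def pearson_r(xs, ys):
    """Pearson's correlation coefficient: covariance of xs and ys,
    normalized by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# X fully determines Y, yet the relationship is an inverted parabola:
# Y peaks at X = 5 and declines symmetrically on either side.
xs = list(range(11))              # 0, 1, ..., 10
ys = [-(x - 5) ** 2 for x in xs]  # -25, -16, ..., 0, ..., -16, -25

print(pearson_r(xs, ys))  # → 0.0: no *linear* trend at all
```

The deviations above and below the peak cancel out exactly, so the coefficient reports "no relationship" despite the relationship being as strong as it could possibly be. This is why plotting the data before summarizing it with a single number (the point of Anscombe's quartet) matters.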

The lesson here, then, is that not understanding how your analytical tools work will get you incorrect results when you try to analyze something. A person who doesn't stop to consider the assumptions of the techniques she's using is, in effect, thinking that her techniques are magical: no matter how she uses them, they will always produce the right results. Of course, that assumption makes about as much sense as assuming that your hammer is magical and can be used to repair anything - even a broken window could be fixed by hitting it with your magic hammer. But I'm not *only* talking about statistics here, for the same principle can be applied in a more general manner.

Every moment of our lives, we are trying to make estimates of the way the world works: of what causal relationships there are, of which ways of describing the world make sense and which don't, of which plans will work and which will fail. In order to make those estimates, we need to draw on a vast amount of information our brains have gathered throughout our lives. Our brains keep track of countless pieces of information that we will not usually even think about. Few people explicitly keep track of the number of different restaurants they've seen. Yet if people are asked about the relative number of restaurants belonging to various fast-food chains, their estimates generally bear a close relation to the truth.

But like explicit statistical techniques, the brain makes numerous assumptions when building its models of the world. Newspapers are selective in their reporting of disasters, focusing on rare shocking ones above common mundane ones. Yet our brains assume that we hear about all those disasters because we've personally witnessed them, and that the distribution of disasters in the newspapers therefore reflects the distribution of disasters in the real world. Thus, people asked to estimate the frequency of different causes of death underestimate the frequency of those that are underreported in the media, and overestimate the ones that are overreported.

On this site, we've also discussed a variety of other ways by which the brain's reasoning sometimes goes wrong: the absurdity heuristic, the affect heuristic, the affective death spiral, the availability heuristic, the conjunction fallacy... the list goes on and on.

So what happens when you've read too many newspaper articles and then naively wonder how frequent different disasters are? You are querying your unconscious processes about a certain kind of statistical relationship, and you get an answer back. But like the person naively misapplying her statistical tools, the process which generates the answers is a black box to you. You do not know how or why it works. If you did, you could tell when its results were reliable, when they needed to be explicitly corrected for, and when they were flat-out wrong.

Sometimes we rely on our intuitions even when they are directly contradicted by math and science. The science seems absurd and unintuitive; our intuitions seem firm and clear. And indeed, sometimes there's a flaw in the science, and we are right to trust our intuitions. But on other occasions, our intuitions are wrong. Yet we frequently persist in holding onto them, and what is ironic is that we do so exactly *because* we do not know how they work - because we cannot see their insides and all the things inside them that could go wrong. We only get a feeling of certainty, a knowledge of *this being right*, and that feeling cannot be broken into parts that could be subjected to criticism to see if they add up.

But like statistical techniques in general, our intuitions are not magic. Hitting a broken window with a hammer will not fix the window, no matter how reliable the hammer. It would certainly be *easy* and *convenient* if our intuitions always gave us the right results, just like it would be *easy* and *convenient* if our statistical techniques always gave us the right results. Yet carelessness can cost lives. Misapplying a statistical technique when evaluating the safety of a new drug might kill people or cause them to spend money on a useless treatment. Blindly following our intuitions can cause our careers, relationships or lives to crash and burn, because we did not think of the possibility that we might be wrong.

That is why we need to study the cognitive sciences, figure out the way our intuitions work and how we might correct for mistakes. Above all, we need to learn to always question the workings of our minds, for we need to understand that they are not magical.