Related to: The Conjunction Fallacy, Conjunction Controversy
The heuristics and biases research program in psychology has discovered many different ways that humans fail to reason correctly under uncertainty. In experiment after experiment, researchers in this program show that we use heuristics to approximate probabilities rather than performing the appropriate calculation, and that these heuristics are systematically biased. However, a tweak to the experimental protocols seems to remove the biases altogether and cast doubt on whether we are actually using heuristics. Instead, it appears that the errors are simply an artifact of how our brains internally store information about uncertainty. Theoretical considerations support this view.
EDIT: The view presented here is controversial in the heuristics and biases literature; see Unnamed's comment on this post below.
EDIT 2: The author no longer holds the views presented in this post. See this comment.
A common example of the failure of humans to reason correctly under uncertainty is the conjunction fallacy. Consider the following question:
Linda is 31 years old, single, outspoken and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in antinuclear demonstrations.
What is the probability that Linda is:
(a) a bank teller
(b) a bank teller and active in the feminist movement
In a replication by Gigerenzer (1993), 91% of subjects rank (b) as more probable than (a), saying that it is more likely that Linda is a bank teller AND active in the feminist movement than that Linda is simply a bank teller. The conjunction rule of probability states that the probability of two things both being true is less than or equal to the probability of either one of them being true. Formally, P(A & B) ≤ P(A). So this experiment shows that people violate the conjunction rule, and thus fail to reason correctly under uncertainty. The representativeness heuristic has been proposed as an explanation for this phenomenon. To use this heuristic, you evaluate the probability of a hypothesis by how closely it resembles the data. Someone using the representativeness heuristic looks at the Linda question and sees that Linda's characteristics resemble those of a feminist bank teller much more closely than those of a plain bank teller, and so concludes that Linda is more likely to be a feminist bank teller than a bank teller.
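The conjunction rule holds because the people who are both bank tellers and feminists are a subset of the people who are bank tellers. A minimal Python sketch makes this concrete by counting in randomly generated hypothetical populations; the helper names and the 5% and 60% rates are arbitrary assumptions chosen for illustration, not data from any study.

```python
import random

def count(population, predicate):
    """Number of people in the population satisfying the predicate."""
    return sum(1 for person in population if predicate(person))

def random_population(n, rng):
    """Hypothetical population; the 5% and 60% rates are arbitrary, not empirical."""
    return [
        {"bank_teller": rng.random() < 0.05, "feminist": rng.random() < 0.60}
        for _ in range(n)
    ]

def conjunction_rule_holds(population):
    tellers = count(population, lambda p: p["bank_teller"])
    feminist_tellers = count(population, lambda p: p["bank_teller"] and p["feminist"])
    # Every feminist bank teller is also a bank teller, so this can never fail.
    return feminist_tellers <= tellers

rng = random.Random(0)
print(all(conjunction_rule_holds(random_population(100, rng)) for _ in range(1000)))  # True
```

Whatever rates you plug in, the check cannot fail: the count for (b) is taken from a subset of the people counted for (a).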
This is the standard story, but are people really using the representative heuristic in the Linda problem? Consider the following rewording of the question:
Linda is 31 years old, single, outspoken and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in antinuclear demonstrations.
There are 100 people who fit the description above. How many of them are:
(a) bank tellers
(b) bank tellers and active in the feminist movement
Notice that the question is now strictly in terms of frequencies. Under this version, only 22% of subjects violate the conjunction rule by giving (b) a larger number than (a) (Gigerenzer, 1993). The only thing that changed is the question being asked; the description of Linda (and the 100 people) is unchanged, so its representativeness for the two groups should also be unchanged. Thus people are not using the representativeness heuristic, at least not in general.
Tversky and Kahneman, champions and founders of the heuristics and biases research program, acknowledged that the conjunction fallacy can be mitigated by changing the wording of the question (1983, pg 309), but this isn't the only anomaly. Consider another problem:
If a test to detect a disease whose prevalence is 1/1000 has a false positive rate of 5%, what is the chance that a person found to have a positive result actually has the disease, assuming you know nothing about the person's symptoms or signs?
Using Bayes' theorem, and assuming the test always detects the disease when it is present, the correct answer is about .02, or 2%. In one replication, only 12% of subjects correctly calculated this probability. In these experiments, the most common wrong answer is usually .95, or 95% (Gigerenzer, 1993). This is known as the base rate fallacy because the error comes from ignoring the "base rate" of the disease in the population. Intuitively, if absolutely no one has the disease, it doesn't matter what the test says; you still wouldn't think you had the disease.
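The .02 figure can be checked directly with Bayes' theorem. The question as stated gives only the prevalence and the false positive rate, so this minimal Python sketch assumes the test never misses a true case (a sensitivity of 1), which the frequency version of the problem states outright. The function and parameter names are illustrative.

```python
def posterior(prevalence, false_positive_rate, sensitivity=1.0):
    """P(disease | positive test) via Bayes' theorem."""
    true_pos = sensitivity * prevalence                 # P(positive and sick)
    false_pos = false_positive_rate * (1 - prevalence)  # P(positive and healthy)
    return true_pos / (true_pos + false_pos)

p = posterior(prevalence=1 / 1000, false_positive_rate=0.05)
print(round(p, 4))  # 0.0196, i.e. about 2%
```

The common wrong answer, .95, is just 1 minus the false positive rate; it drops the prevalence term entirely, which is exactly what ignoring the base rate means.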
Now consider the same question framed in terms of relative frequencies.
One out of 1000 Americans has disease X. A test has been developed to detect when a person has disease X. Every time the test is given to a person who has the disease, the test comes out positive. But sometimes the test also comes out positive when it is given to a person who is completely healthy. Specifically, out of every 1000 people who are perfectly healthy, 50 of them test positive for the disease.
Imagine that we have assembled a random sample of 1000 Americans. They were selected by a lottery. Those who conducted the lottery had no information about the health status of any of these people. How many people who test positive for the disease will actually have the disease?
_____ out of _____.
Using this version of the question, 76% of subjects answered correctly with 1 out of 50. Instructing subjects to visualize frequencies in graphs increases this percentage to 92% (Gigerenzer, 1993). Again, re-framing the question in terms of relative frequencies rather than (subjective) probabilities results in improved performance on the test.
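The frequency version replaces the formula with counting. Here is a short Python sketch of the expected counts in the sample of 1000, assuming (as the problem states) that every sick person tests positive; the function name is hypothetical. About 1 person is sick and tests positive, while 5% of the 999 healthy people, roughly 50, also test positive.

```python
def natural_frequencies(sample_size, prevalence, false_positive_rate):
    """Expected counts in a sample, assuming the test never misses a sick person."""
    sick = sample_size * prevalence
    healthy = sample_size - sick
    true_positives = sick                  # every sick person tests positive
    false_positives = healthy * false_positive_rate
    return true_positives, true_positives + false_positives

sick_and_positive, all_positive = natural_frequencies(1000, 1 / 1000, 50 / 1000)
print(f"{sick_and_positive:.0f} out of {all_positive:.2f}")  # 1 out of 50.95
```

Strictly, the expected number of positives is 50.95 (1 true positive plus 49.95 false positives), so "1 out of 50" is the rounded answer. Either way it agrees with the roughly 2% from Bayes' theorem, but here it is reached with nothing more than counting.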
Consider yet another typical question in these experiments:
Which city has more inhabitants?
(a) Hyderabad
(b) Islamabad
How confident are you that your answer is correct?
50%, 60%, 70%, 80%, 90%, 100%
According to Gigerenzer (1993),
The major finding of some two decades of research is the following: In all the cases where subjects said, "I am 100% confident that my answer is correct," the relative frequency of correct answers was only about 80%; in all the cases where subjects said, "I am 90% confident" the relative frequency of correct answers was only about 75%, when subjects said "I am 80% confident" the relative frequency of correct answers was only about 65%, and so on.
This is called overconfidence bias. A Bayesian might say that you aren't calibrated. In any case, it's generally frowned upon by both statistical camps: if you're only right 80% of the time when you say you're 90% confident, why not just say you're 80% confident? But consider a different experimental setup. Instead of asking subjects only one general knowledge question like the Hyderabad-Islamabad question above, ask them 50; and instead of asking them after every question how confident they are that their answer is correct, ask them at the end how many they think they answered correctly. If people are biased in the way that overconfidence bias says they are, there should be no difference between the two experiments.
First, Gigerenzer replicated the original experiments, finding an overconfidence bias of 13.8%. That is, subjects' stated confidence exceeded the true relative frequency of correct answers by 13.8 percentage points on average. For example, if they claimed a confidence of 90%, on average they would answer correctly 76.2% of the time. Under the 50-question treatment, overconfidence bias dropped to -2.4%! In a second replication, the control was 15.4% and the treatment was -4.2% (1993). Note that -2.4% and -4.2% are likely not significantly different from 0, so don't interpret them as underconfidence bias. Once the probability judgment was framed in terms of relative frequencies, the bias basically disappeared.
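The overconfidence score used here is the mean stated confidence minus the percentage of correct answers. The minimal Python sketch below implements both scoring schemes. The answers belong to a single made-up subject, not to Gigerenzer's data, and the helper names are hypothetical. The per-question version compares confidence levels with hit rates. The frequency version compares the subject's end-of-test estimate of how many answers were right with the actual count.

```python
from collections import defaultdict

def calibration_table(confidences, correct):
    """Percentage correct at each stated confidence level."""
    buckets = defaultdict(list)
    for conf, ok in zip(confidences, correct):
        buckets[conf].append(ok)
    return {conf: 100 * sum(oks) / len(oks) for conf, oks in sorted(buckets.items())}

def overconfidence(confidences, correct):
    """Per-question format: mean confidence minus percentage correct (percentage points)."""
    mean_conf = sum(confidences) / len(confidences)
    hit_rate = 100 * sum(correct) / len(correct)
    return mean_conf - hit_rate

def frequency_bias(estimated_correct, correct):
    """Frequency format: estimated count correct minus actual count, as percentage points."""
    return 100 * (estimated_correct - sum(correct)) / len(correct)

# Hypothetical subject: 20 answers, half at 100% confidence and half at 80%.
confidences = [100] * 10 + [80] * 10
correct = [True] * 8 + [False] * 2 + [True] * 6 + [False] * 4

print(calibration_table(confidences, correct))  # {80: 60.0, 100: 80.0}
print(overconfidence(confidences, correct))     # 20.0
print(frequency_bias(13, correct))              # -5.0
```

Note that the two scores come from different questions put to the subject, which is the point of the comparison: the same set of answers can look overconfident under one format and not under the other.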
So in all three experiments, the standard results of the heuristics and biases program fall apart once the problem is recast in terms of relative frequencies. Humans don't simply use heuristics; something more complicated is going on. But the important question is, of course, what? To answer that, we need to take a detour through information representation. Any computer (and the brain is just a very-difficult-to-understand computer) has to represent its information symbolically. The problem is that there are usually many ways to represent the same information. For example, 31 (decimal), 11111 (binary), and XXXI (Roman numerals) all represent the same number using different systems of representation. Aside from the obvious visual differences, systems of representation also differ in how easy they are to use for various operations. If this doesn't seem obvious, as Gigerenzer says, try long division using Roman numerals (1993). Crucially, this difficulty is relative to the computer attempting to perform the operations. Your calculator works great in binary, but your brain works better when things are represented visually.
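The point about representations can be made concrete with a small Python sketch (the helper names are illustrative). Converting 31 into binary and Roman numerals is easy. The simplest way to divide Roman numerals, though, is to leave that representation altogether, do the arithmetic in positional notation, and convert back.

```python
# Three representations of the same number: decimal, binary, and Roman numerals.
ROMAN = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]
VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

def to_roman(n):
    """Greedy conversion of a positive integer to Roman numerals."""
    parts = []
    for value, symbol in ROMAN:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)

def from_roman(s):
    """A symbol smaller than its right-hand neighbour is subtracted (e.g. IX = 9)."""
    total = 0
    for i, ch in enumerate(s):
        value = VALUES[ch]
        if i + 1 < len(s) and VALUES[s[i + 1]] > value:
            total -= value
        else:
            total += value
    return total

def roman_divide(a, b):
    """Integer division on Roman numerals: convert, divide, convert back."""
    return to_roman(from_roman(a) // from_roman(b))

print(31, format(31, "b"), to_roman(31))  # 31 11111 XXXI
print(roman_divide("XXXI", "V"))          # VI
```

The division function never works with Roman numerals directly. It converts them to integers, divides, and converts back, which is exactly the move a person would make when handed a long division in Roman numerals.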
What does the representation of information have to do with the experimental results above? Well, let's take another detour, this time through the philosophy of probability. As most of you already know, the two most common positions there are frequentism and Bayesianism. I won't get into the details of either position beyond what is relevant, so if you're unaware of the difference and are interested, click the links. According to the Bayesian position, all probabilities are subjective degrees of belief. Don't worry about the sense in which probabilities are subjective; just focus on the degrees of belief part. A Bayesian is comfortable assigning a probability to any proposition you can come up with. Some Bayesians don't even care if the proposition is coherent.
Frequentists are different beasts altogether. For a frequentist, the probability of an event happening is its relative frequency in some well defined reference class. A useful though not entirely accurate way to think about frequentist probability is that there must be a numerator and a denominator in order to get a probability. The reference class of events you are considering provides the denominator (the total number of events), and the particular event you are considering provides the numerator (the number of times that particular event occurs in the reference class). If you flip a coin 100 times and get 37 heads and are interested in heads, the reference class is coin flips. Then the probability of flipping a coin and getting heads is 37/100.¹ Key to all of this is that the frequentist thinks there is no such thing as the probability of a single event happening without referring to some reference class. So returning to the Linda problem, there is no such thing as a frequentist probability that Linda is a bank teller, or a bank teller and active in the feminist movement. But there is a probability that, out of 100 people who have the same description as Linda, a randomly selected person is a bank teller, or a bank teller and active in the feminist movement.
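The numerator-and-denominator picture can be written down directly. In this minimal Python sketch (the function name is illustrative), the reference class is an explicit collection and the probability is a count divided by its size. Without a reference class there is nothing to divide by, which is the frequentist's objection to single-event probabilities.

```python
from fractions import Fraction

def relative_frequency(reference_class, event):
    """Frequentist probability: occurrences of the event / size of the reference class."""
    members = list(reference_class)
    if not members:
        raise ValueError("a frequentist probability needs a non-empty reference class")
    hits = sum(1 for member in members if event(member))
    return Fraction(hits, len(members))

coin_flips = ["H"] * 37 + ["T"] * 63  # the 100 flips from the example
print(relative_frequency(coin_flips, lambda flip: flip == "H"))  # 37/100
```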
In addition to the various philosophical differences between the Bayesians and frequentists, the two different schools also naturally lead to two different ways of representing the information contained in probabilities. Since all the frequentist cares about is relative frequencies, the natural way to represent probabilities in her mind is through, well, frequencies. The actual number representing the probability (e.g. p=.23) can always be calculated later as an afterthought. The Bayesian approach, on the other hand, leads to thinking in terms of percentages. If probability is just a degree of belief, why not represent it as such with, say, a number between 0 and 1? A "natural frequentist" would store all probabilistic information as frequencies, carefully counting each time an event occurs, while a "natural Bayesian" would store it as a single number - a percentage - to be updated later using Bayes' theorem as information comes in. It wouldn't be surprising if the natural frequentist had trouble operating with Bayesian probabilities. She thinks in terms of frequencies, but a single number isn't a frequency - it has to be converted to a frequency in some way that allows her to keep counting events accurately if she wants to use this information.
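The difference between the two storage schemes shows up as soon as new evidence arrives. In this hypothetical Python sketch (the class and method names are illustrative, not a model from the literature), two memories that would both report a probability of 1/2 react very differently to the same new observations. A memory that had kept only the number 0.5 could not tell which case it was in without some extra assumption about how many observations stood behind it.

```python
from fractions import Fraction

class FrequencyMemory:
    """A 'natural frequentist' store: keeps the raw counts, not just their ratio."""

    def __init__(self, hits=0, total=0):
        self.hits = hits
        self.total = total

    def observe(self, occurred):
        self.total += 1
        self.hits += int(occurred)

    def probability(self):
        return Fraction(self.hits, self.total)

# Two memories that currently report the same single number, 1/2 ...
small = FrequencyMemory(hits=1, total=2)
large = FrequencyMemory(hits=50, total=100)
assert small.probability() == large.probability() == Fraction(1, 2)

# ... diverge after the same three new occurrences, because the counts carry
# information (sample size) that the ratio alone has thrown away.
for memory in (small, large):
    for _ in range(3):
        memory.observe(True)

print(small.probability(), large.probability())  # 4/5 53/103
```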
So if it isn't obvious by now, we're natural frequentists! How many of you thought you were Bayesians?² Gigerenzer's experiments show that changing the representation of uncertainty from probabilities to frequencies drastically alters the results, making humans appear much better at statistical reasoning than previously thought. It's not that we use heuristics that are systematically biased; rather, our native architecture for representing uncertainty is just better at working with frequencies. When uncertainty isn't represented using frequencies, our brains have trouble and fail in apparently predictable ways. To anyone who has had Bayes' theorem explained intuitively, it shouldn't be all that surprising that we're natural frequentists. How does Eliezer intuitively explain Bayes' theorem? By working through examples using relative frequencies. This is also a relatively common tactic in undergraduate statistics textbooks, though that may simply be because undergraduates are typically taught only the frequentist approach to probability.
So the heuristics and biases program doesn't catalog the various ways that we fail to reason correctly under uncertainty; rather, it catalogs the various ways we reason incorrectly about probabilities that aren't in our native representation. This could be because our native architecture simply doesn't handle alternate representations of probability effectively, or because, when our native architecture starts having trouble, our brains automatically resort to using the heuristics Tversky and Kahneman were talking about. The latter seems more plausible to me in light of the other ways the brain approximates when it is forced to, but I'm still fairly uncertain. Gigerenzer has his own explanation that unifies the two domains under a specific theory of natural frequentism, and he has performed further experiments to back it up. He calls his explanation a theory of probabilistic mental models.³ I don't completely understand Gigerenzer's theory, and his extra evidence seems to support equally well the hypothesis that our brains use heuristics when probabilities aren't represented as frequencies, but I will say that Gigerenzer's theory does have elegance going for it. Capturing both groups of phenomena with a unified theory makes Occam smile.
These experiments aren't the only reason to believe that we're actually pretty good at reasoning under uncertainty or that we're natural frequentists; there are theoretical reasons as well. First, consider evolutionary theory. If lower order animals are decent at statistical reasoning, we would probably expect humans to be good at it as well, since we all evolved from the same source. It is possible that a lower order species developed its statistical reasoning capabilities after its evolutionary path diverged from that of the ancestors of humans, or that statistical reasoning became less important for humans or their recent ancestors and thus evolution committed fewer resources to the process. But the ability to reason under uncertainty seems so useful that, if any species has the mental capacity to do it, we would expect it to be humans, with their large, adept brains. Gigerenzer summarizes the evidence across species (1993):
Bumblebees, birds, rats, and ants all seem to be good intuitive statisticians, highly sensitive to changes in frequency distributions in their environments, as recent research in foraging behavior indicates (Gallistel, 1990; Real & Caraco, 1986). From sea snails to humans, as John Staddon (1988) argued, the learning mechanisms responsible for habituation, sensitization, and classical and operant conditioning can be described in terms of statistical inference machines. Reading this literature, one wonders why humans seem to do so badly in experiments on statistical reasoning.
Indeed. Should we really expect that bumblebees, birds, rats, and ants are better intuitive statisticians than us? It's certainly possible, but it doesn't appear all that likely, a priori.
Theories of the brain from cognitive science provide another reason why we would be adept at reasoning under uncertainty, and a reason why we would be natural frequentists. The connectionist approach to the study of the human mind suggests that the brain encodes information by making literal physical connections between neurons, represented on the mental level by connections between concepts. So, for example, if you see a dog and notice that it's black, a connection between the concept "dog" and the concept "black" is made in a very literal sense. If connectionism is basically correct, then probabilistic reasoning shouldn't be all that difficult for us. For example, if the brain needs to calculate the probability that any given dog is black, it can just count the number of connections between "dog" and "black" and the number of connections between "dog" and colors other than black.⁴ Voila! Relative frequencies. As Nobel Prize-winning economist Vernon Smith puts it (2008, pg 208):
Hayek's theory⁵ - that mental categories are based on the experiential relative frequency of coincidence between current and past perceptions - seems to imply that our minds should be good at probability reasoning.
It also suggests that we would be natural frequentists since our brains are quite literally built on relative frequencies.
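Footnote 4's caveat applies doubly here, but the counting idea can be sketched in a few lines of Python. This toy model is hypothetical in all its names and is not a model of real neurons or of Hayek's theory. It treats each perceived co-occurrence as one more "connection" and reads a probability straight off the counts, so a relative frequency falls out without any explicit probability calculation.

```python
from collections import Counter
from fractions import Fraction

class AssociationNetwork:
    """Toy model: each co-occurrence strengthens a concept-concept link by one."""

    def __init__(self):
        self.links = Counter()

    def perceive(self, thing, feature):
        self.links[(thing, feature)] += 1

    def conditional_frequency(self, thing, feature, alternatives):
        """Share of `thing`'s links to `feature` among its links to all alternatives."""
        total = sum(self.links[(thing, alt)] for alt in alternatives)
        return Fraction(self.links[(thing, feature)], total)

colors = ["black", "brown", "white"]
net = AssociationNetwork()
for color in ["black"] * 3 + ["brown"] * 5 + ["white"] * 2:  # ten hypothetical dogs
    net.perceive("dog", color)

print(net.conditional_frequency("dog", "black", colors))  # 3/10
```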
So both evidence and theory point in the same direction. The research of Tversky and Kahneman, among others, originally showed that humans were fairly bad at reasoning under uncertainty. It turns out much of this is an artifact of how their subjects were asked to think about uncertainty. Having subjects think in terms of frequencies basically eliminates biases in experiments, suggesting that humans are just natural frequentists: their minds are structured to handle probabilities in terms of frequencies rather than proportions or percentages. Only when we are working with information represented in a form that our native architecture finds difficult do we appear to be using heuristics. Theoretical considerations from evolutionary biology and cognitive science buttress both claims: that humans are natural frequentists, and that they are not so bad at handling uncertainty, at least when thinking in terms of frequencies.
1: To any of you who raised an eyebrow, I did it on purpose ;).
2: Just to be clear, I am not arguing that since we are natural frequentists, the frequentist approach to probability is the correct approach.
3: What seems to be the key paper is the second link in the Google search I linked to. I haven't read it yet, so I won't really get into his theory here.
4: I acknowledge that this is a very simplified example and a gross simplification of the theory.
5: Friedrich Hayek, another Nobel Prize-winning economist, independently developed the connectionist paradigm of the mind, culminating in his 1952 book The Sensory Order. I do recommend reading Hayek's book, but not without a reading group of some sort. It's short but dense and very difficult to parse; let's just say Hayek is not known for his prose.
Gigerenzer, Gerd. 1993. "The Bounded Rationality of Probabilistic Mental Models." in Manktelow, K. I., & Over, D. E. eds. Rationality: Psychological and philosophical perspectives. (pp. 284-313). London: Routledge. Preprint available online.
Smith, Vernon L. 2008. Rationality in Economics. Cambridge: Cambridge UP.
Tversky, A., and D. Kahneman. 1983. "Extensional versus Intuitive Reasoning: The Conjunction Fallacy in Probability Judgment." Psychological Review 90(4):293-315. Available online.
This frequencies vs. probabilities issue is one of the controversies in heuristics & biases research, and Kahneman, Tversky, and others dispute Gigerenzer's take. For instance, here (pdf, see p. 8) is what Gilovich & Griffin have to say in their introduction to the book Heuristics and Biases (emphasis added):...
Thanks for the links!
Reading both of these papers sent me on a trawl through the literature. Kahneman's paper sparked it. He reported an experiment to test Gigerenzer's hypothesis that recasting the problem in terms of frequencies reduces or eliminates the prevalence of the conjunction fallacy (pg 586-7). The results, according to Kahneman, confirm his hypothesis that frequencies just cue people to think in terms of set relations and reject Gigerenzer's hypothesis that people are natural frequentists.
In an unpublished manuscript (that I did not read), Hertwig (1997) challenged Kahneman's interpretation of his experiment, arguing that the language Kahneman used led subjects to misinterpret "and" as disjunctive. That is, when Kahneman asked how many out of 1000 women fitting the Linda description are "feminists and active bank tellers," the subjects interpreted this as "how many are feminists and how many are bank tellers." Hertwig ran an experiment to test this, confirming his hypothesis.
Then Hertwig, Kahneman, and Mellers wrote an "adversarial collaboration" in which Mellers arbitrated the disagreement between Hertwig and Kahneman (pdf). They ...
Here's my take on 'Linda'. Don't know if anyone else has made the same or nearly the same point, but anyway I'll try to be brief:
Let E be the background information about Linda, and imagine two scenarios:
Now obviously P(A | E) is greater than or equal to P(B | E). However, I think it's quite reasonable for P(A | E + "someone told us A") to be less than P(B | E + "someone told us B"), because if someone merely tells us A, we don't have any particularly good reason to believe them, but if someone tells us B then it seems likely that they know this particular Linda, that they're thinking of the right person, and that they know she's a bank teller.
However, the 'frequentist version' of the Linda experiment cannot possibly be (mis?)-interpreted in this way, because we're fixing the statements A and B and considering a whole bunch of people who are obviously unrelated to the processes by which the statements were formed.
(Perhaps there's an analogo...
When considering the initial probability question regarding Linda, it strikes me that it isn't really a choice between a single possibility and two conjoined possibilities.
Giving a person an exclusive choice between "bank teller" OR "bank teller and feminist" will lead people to infer that "bank teller" means "bank teller and not feminist".
So both choices are conjoined items, it's just that one of them is hidden.
Given this, people may not be so incorrect after all.
Edit: People should probably stop giving this post points, given Sniffnoy's linking of a complete destruction of this objection :)
This has already been addressed in Conjunction Controversy.
Okay, this is silly, but I can't for the life of me figure out what that number and those systems of representation are.
That's a nice educational post.
I want to pick a nit, not with you, but with Gigerenzer and " ... the conjunction fallacy can be mitigated by changing the wording of the question ... " Unfortunately, in real life, the problems come at you the way they do, and you need to learn to deal with it.
I say that rational thinking looks like this: pencil applied to paper. Or a spreadsheet or other decision support program in use. We can't do this stuff in our heads. At least I can't. Evolution didn't deliver arithmetic, much less rationality. We teach arithmetic to kids, slowly and painstakingly. We had better start teaching them rationality. Slowly and painstakingly, not like a 1-hour also-mentioned.
And, since I have my spreadsheet program open, I will indeed convert probabilities into frequencies and look at the world both ways, so my automatic processors can participate. But, I only trust the answers on the screen. My brain lies to me too often.
Once again, thanks Matt. Well done!
This post is very much in accordance with my experience. I've never been able to develop any non-frequentist intuitions about probability, and even simple problems sometimes confuse me until I translate them into explicit frequentist terms. However, once I have the reference classes clearly defined and sketched, I have no difficulty following complex arguments and solving reasonably hard problems in probability. (This includes the numerous supposed paradoxes that disappear as soon as the problem is stated in clear frequentist terms.)
Moreover, I'm still at...
More evidence in favor of the hypothesis that we are natural frequentists: Even though I try to think like a Bayesian, I am mentally incapable of assigning probabilities without perfect information. The best I can do is offer a range of probabilities that seem reasonable, whereas a real Bayesian should be able to average the probabilities that it would assign given all possible sets of perfect information, weighted by likelihood.
One thing I do not like about research such as this is that they only report mean overconfidence values. How can you conclude from the mean that everyone is overconfident? Perhaps only half of the subjects are very overconfident while the other half are less overconfident.
Just give us all the data! It should be easy to visualize in a scatter plot, for example.
That's why when I try to explain Bayes' Theorem to people I draw it out as a venn diagram and put a whole bunch of little stick figures inside.
This is very interesting. Thanks!
When I heard about Bayesian and Frequentist, I thought Bayesianism made more intuitive sense because I was used to working with random variables. (It's the intuition of someone more used to chalkboards than lab coats.)
I wonder if people appear to be "natural frequentists" because we are better at thinking "how many" than "how likely." "How likely" is a prediction about the future; and it's easy to think that your wishes and hopes can influence the future.
Maybe what we're really bad at is internalizing the Ergodic Theorem -- understanding that if something happens a low percentage of the time, then it's unlikely to happen.
It's Bayes' theorem, not Baye's theorem :).
Nitpick: shouldn't the answer to the disease question be 1/50.95 (instead of 1/50)? One person has the disease, and 49.95 (5% of 999) are false positives. So there are 50.95 total positives.
One problem I have with Gigerenzer is that he often seems to be deliberately obtuse by taking philosophical positions that don't allow for the occurrence of errors.
For instance, time and time again he states that, because one-time events don't have probabilities (in a frequentist interpretation), it's incoherent to say someone's confidence judgment is "wrong". As Kahneman points out, this violates common sense - surely we should consider someone wrong if they say they are 99.99% confident that tomorrow the sun will turn into a chocolate cake.
In an...
Somewhat relevant Op-Ed from the New York Times today on the limits of behavioural economics. The author, George Loewenstein, is considered a leading authority in the field.
I'm not sure if I buy that the "frequentist" explanations (as in the disease testing example) are best characterized by being frequentist -- it seems to me that they are just stating the problem and the data in a more relevant way to the question that's being asked. Without those extra statements, you have to decode the information down from a more abstract level.
Upvoted; however, I thought the post could have been considerably shortened, with the last 13 or so paragraphs combined into just one or two.
I see how Gigerenzer's point is relevant to some of the biases such as the conjunction fallacy.
But what about other biases such as the anchoring bias?
Is there really a way to show that all fallacious reasoning in K&T's experiments is due to presentation of information in terms of probabilities as opposed to frequencies?
Some of the same questions were dealt with by Epsilon-Upsilon: Conjunction Controversy or How They Nail It Down