Judgment Under Uncertainty: Heuristics and Biases is one of the foundational works on the flaws of human reasoning, and as such gets cited a lot on Less Wrong — but it's also rather long and esoteric, which makes it inaccessible to most Less Wrong users. Over the next few months, I'm going to attempt to distill the essence of the studies that make up the collection, in an attempt to convey the many interesting bits without forcing you to slog through the 500 or so pages of the volume itself. This post summarizes sections I (Introduction) and II (Representativeness).
By way of background: Judgment Under Uncertainty is a collection of 35 scientific papers and articles on how people make decisions with limited information, edited by Daniel Kahneman, Amos Tversky, and Paul Slovic. Kahneman and Tversky are the most recognizable figures in the area and the names most associated with the book, but only 12 of the studies are their work. It was first published in 1982 (my version is from 1986), and most studies were performed in the '70s — so note that this is not up-to-date research, and I can't say for sure what the current scientific consensus on the topic is. Judgment Under Uncertainty focuses on the divergence of human intuition from optimal reasoning, so it uses a lot of statistics and probability to define what's optimal. The details are actually pretty fascinating if you have the time and inclination (and it's also something of an education in study design and statistics), and this series of posts by no means replaces the book, but I intend to provide something of a shorthand version.
That said, on to the summaries! Title of the chapter/paper in quotes, sections organized as in the book and in bold. (Incomplete preview here, if you want to follow along.)
"Judgment Under Uncertainty: Heuristics and Biases", Tversky and Kahneman (1974)
This is the most important paper in the book, and it's short and publicly available (PDF), so I'd encourage you to just go read it now. It reviews the representativeness and availability heuristics and the various errors in reasoning they produce, and introduces the idea of anchoring. Since it reviews some of the material contained in Judgment Under Uncertainty, there's overlap between the material it covers and the material I'm going to cover in this and the other posts. As it's already a boiled-down version of the heuristics literature, I won't attempt to summarize it here.
"Belief in the law of small numbers", Tversky and Kahneman, 1971 (PDF)
People expect that samples will have much less variability and be much more representative of the population than they actually are. This manifests in expecting that two random samples will be very similar to each other and that large observations in one direction will be canceled out by large observations in the other rather than just being diluted. Tversky and Kahneman call this the "law of small numbers" — the belief that the law of large numbers applies to small samples as well.
One consequence of this in science is that failing to account for variability means that studies will be way underpowered. Tversky and Kahneman surveyed psychologists on the probability that a significant result from an experiment on 20 subjects would be confirmed by a replication using 10 subjects — most estimated around 85%, when it was actually around 48%. (Incidentally, a study they cite reviewing published results in psychology estimates that the power was .18 for small effects and .48 for effects of medium size.) The gist of this is that one might very well find a real significant result, attempt to replicate it using a smaller sample on the belief that the small sample will be very representative of the population, and miss entirely due to lack of statistical power. Worse, when given a hypothetical case of a student who ran such a replication and got an insignificant result, many of the surveyed psychologists suggested he should try to find an explanation for the difference between the two groups — when it was due entirely to random variation.
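To make the power gap concrete, here's a quick Monte Carlo sketch. All the numbers are made up for illustration (a true effect of half a standard deviation, tested with a simple two-tailed z-test); this is not the exact setup Tversky and Kahneman analyzed, but it shows the same pattern.

```python
import numpy as np

# Sketch with made-up numbers: a true effect of d = 0.5 standard
# deviations, tested with a two-tailed z-test at p < .05.
rng = np.random.default_rng(0)
d, sims = 0.5, 20_000

def significant(n):
    # One simulated experiment per row: n subjects drawn from N(d, 1);
    # the result is significant if |sample mean| * sqrt(n) > 1.96.
    means = rng.normal(d, 1, size=(sims, n)).mean(axis=1)
    return np.abs(means) * np.sqrt(n) > 1.96

orig = significant(20)   # original studies, n = 20
rep = significant(10)    # independent replications, n = 10
rate = rep[orig].mean()  # replication rate, given a significant original
print(f"replication rate: {rate:.2f}")  # roughly a third, nowhere near 85%
```

The point isn't the exact number, which depends on the assumed effect size, but that a perfectly real effect detected at n=20 routinely fails to replicate at n=10.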
"Subjective probability: A judgment of representativeness", Kahneman and Tversky, 1972 (PDF)
People judge the likelihood of events based on representativeness rather than actual probability. Representativeness is a bit hard to pin down, but involves reflecting the characteristics of the population and the process that generated it — so the likelihood of six children having the gender order B G B B B B is judged lower than that of the order G B G B B G (because it doesn't reflect the proportion of boys in the population) and likewise for B B B G G G versus G B B G B G (because it doesn't reflect the randomness of gender determination), even though each specific order is equally probable.
People also completely ignore the effect of sample size on the probability of an outcome (e.g. the likelihood of the proportion of male babies being between .55 and .65 for N births), because it doesn't affect the representativeness of that outcome. Repeat: sample size has no effect at all on their judgments. People expect the probability of the example I gave to be around 15% whether it's N=10 or N=1000, when it's actually ~20% for N=10 and essentially zero for N=1000. (The graphs on pages 42-43 of the PDF can get this across better than I can — the black line is the predicted probability for all sample sizes, and the bars are the real probability for each.)
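The binomial arithmetic behind this example is easy to check directly (assuming each birth is a fair coin flip, as in the paper's setup):

```python
from math import comb

def p_share_in_range(n, lo=0.55, hi=0.65):
    """Exact P(lo <= boys/n <= hi), treating each birth as a fair coin flip."""
    return sum(comb(n, k) for k in range(n + 1) if lo * n <= k <= hi * n) / 2 ** n

print(p_share_in_range(10))    # ~0.205: only 6 boys out of 10 qualifies
print(p_share_in_range(1000))  # ~0.0009: needs 550-650 boys, far out in the tail
```

With N=10 the only qualifying outcome is 6 boys, which is right next to the expected 5; with N=1000 you'd need an excess of 50+ boys, more than three standard deviations out.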
"On the psychology of prediction", Kahneman and Tversky, 1973
Judging by representativeness makes people completely ignore base rates (i.e. prior probabilities). Subjects asked to judge (on the basis of a personality sketch) either how similar someone was to the typical student in a graduate program or how likely they were to be a student in that program produced identical results (correlation of .97), with no regard whatsoever for the judged prior probability of a graduate student being in a given area (correlation of -.65) — which would be permissible if they thought the sketches were such strong evidence that they overwhelmed existing information, but when asked, subjects expected predictions based on personality sketches to be accurate only 23% of the time. In a followup, Kahneman and Tversky manipulated beliefs about how predictive the evidence was (telling one group that such predictions were accurate 55% of the time and the other 27%) and found that while subjects were slightly less confident in the low-predictiveness group (though they were still 56% sure of being right), they ignored base rates just as completely in either condition. In this and in several other experiments in this chapter, people fail to be regressive in their predictions — that is, the weight that they assign to prior probability versus new evidence is unaffected by the expected accuracy of the new evidence.
An interesting specific point with regard to new information replacing rather than supplementing prior probabilities: while people can make judgments about base rates in the abstract, completely useless specific information can cause this ability to disappear. e.g.: If asked for the probability that an individual randomly selected from a group of 70 engineers and 30 lawyers is a lawyer, they'll say 30%, but if given utterly useless information about a specific person —
Dick is a 30-year-old man. He is married with no children. A man of high ability and high motivation, he promises to be quite successful in his field. He is well liked by his colleagues.
— they'll go back to 50-50.
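For contrast, the normative answer comes straight from Bayes' rule in odds form: worthless evidence has a likelihood ratio of 1 and should leave the 30% prior untouched. A minimal sketch (the likelihood ratio of 3 in the second call is a made-up illustration, not a value from the study):

```python
def posterior_lawyer(prior, likelihood_ratio):
    """Odds-form Bayes' rule. likelihood_ratio is
    P(description | lawyer) / P(description | engineer)."""
    odds = prior / (1 - prior) * likelihood_ratio
    return odds / (1 + odds)

print(posterior_lawyer(0.30, 1.0))  # worthless evidence: the 0.30 prior survives
print(posterior_lawyer(0.30, 3.0))  # genuinely diagnostic evidence: ~0.56
```

Subjects behave as if the mere presence of individuating information sets the likelihood ratio's importance to everything and the prior's to nothing.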
The rest of the chapter contains several other experiments in which people egregiously ignore base rates and assign far too much predictive validity to unreliable evidence.
People make predictions (e.g. future GPA) more confidently when input (e.g. test scores) is highly consistent, but highly consistent data tends to result from highly intercorrelated variables, and you can predict more accurately given independent variables than intercorrelated ones — so high consistency increases confidence while decreasing accuracy. What's more, people predict extreme outcomes (dazzling success, abject failure) much more confidently than they predict middling ones, but they're also more likely to be wrong when predicting extreme outcomes (because intuitive predictions aren't nearly regressive enough), so people are most confident when they're most likely to be wrong. Kahneman and Tversky call this "the illusion of validity".
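A small simulation can illustrate the consistency/accuracy tradeoff. All the numbers here are invented: four noisy cues of a true score, with equal total noise per cue, where the "correlated" set shares a common noise term.

```python
import numpy as np

# Made-up setup: four cues per case, each with total noise variance 1.0,
# so per-cue reliability is identical across the two conditions.
rng = np.random.default_rng(1)
n, k = 50_000, 4
truth = rng.normal(0, 1, n)

# Independent cues: each cue carries its own noise.
indep = truth[:, None] + rng.normal(0, 1, (n, k))
# Intercorrelated cues: one shared noise term plus a small private one
# (0.8**2 + 0.6**2 = 1.0, matching the independent cues' noise variance).
corr = truth[:, None] + rng.normal(0, 0.8, (n, 1)) + rng.normal(0, 0.6, (n, k))

def rmse(cues):
    # Predict the true score by averaging the four cues.
    return np.sqrt(((cues.mean(axis=1) - truth) ** 2).mean())

print("within-case spread:", indep.std(axis=1).mean(), corr.std(axis=1).mean())
print("prediction error:  ", rmse(indep), rmse(corr))
# The correlated cues look more consistent (smaller spread) yet predict worse,
# because the shared noise doesn't average away.
```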
There's a bit about regression to the mean, but I intend to cover that in a separate post.
"Studies of representativeness", Maya Bar-Hillel
This paper attempts to determine what specific features cause a sample to be judged more or less representative, rather than relying on the black-box approach of asking subjects to assess representativeness themselves. It's pretty esoteric and difficult to summarize, so I won't get into it. There's a flowchart summarizing the findings.
"Judgments of and by representativeness", Tversky and Kahneman
The first section of this chapter breaks down representativeness judgment into four cases:
1. "M is a class and X is a value of a variable defined in this class." e.g. A representative value for the age of first marriage.
2. "M is a class and X is an instance of that class." e.g. Robins are representative birds.
3. "M is a class and X is a subset of M." e.g. Psychology students are representative of all students.
4. "M is a (causal) system and X is a (possible) consequence." e.g. An act being representative of a person.
The second section is an examination of the effect of the representativeness heuristic on the evaluation of compound probabilities. This experiment has been written about on Less Wrong before, so I'll be brief: given two possible outcomes, one of which is highly representative (in sense 4) and one of which is highly non-representative, subjects rank their conjunction as being more probable than the non-representative outcome alone, even though a conjunction can never be more probable than either of its components. (For example, "Reagan will provide federal support for unwed mothers and cut support to local governments" was rated more probable than "Reagan will provide federal support for unwed mothers.") Statistical training doesn't help.
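The arithmetic of the conjunction rule, with made-up probabilities rather than values from the study:

```python
# Hypothetical numbers for illustration only.
p_support = 0.30             # P(federal support for unwed mothers)
p_cut_given_support = 0.90   # P(cut to local governments | support), even if near-certain
p_both = p_support * p_cut_given_support
print(f"{p_both:.2f}")       # 0.27: the conjunction can never exceed a single conjunct
```

Adding the representative detail makes the scenario feel more plausible while it can only shave probability off.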
This brings us up to page 100, and the end of the Representativeness section. Next post: "Causality and attribution".