(Content note: The experimental results on the availability bias, one of the biases described in Tversky and Kahneman's original work, have been overdetermined, which has led to at least two separate interpretations of the heuristic in the cognitive science literature. These interpretations also result in different experimental predictions. The audience probably wants to know about this. This post is also intended to measure audience interest in a tradition of cognitive scientific research that I've been considering describing here for a while. Finally, I steal from Scott Alexander the section numbering technique that he stole from someone else: I expect it to be helpful because there are several inferential steps to take in this particular article, and it makes it look less monolithic.)
Related to: Availability
The availability heuristic is judging the frequency or probability of an event, by the ease with which examples of the event come to mind.
This statement is actually slightly ambiguous. I notice at least two possible interpretations with regard to what the cognitive scientists infer is happening inside of the human mind:
- Humans think things like, “I found a lot of examples, thus the frequency or probability of the event is high,” or, “I didn’t find many examples, thus the frequency or probability of the event is low.”
- Humans think things like, “Looking for examples felt easy, thus the frequency or probability of the event is high,” or, “Looking for examples felt hard, thus the frequency or probability of the event is low.”
I think the second interpretation is the one more similar to Kahneman and Tversky’s original description, as quoted above.
And I don’t think I would be building a strawman by claiming that some people adhere to the first interpretation, intentionally or not. From Medin and Ross (1996, p. 522):
The availability heuristic refers to a tendency to form a judgment on the basis of what is readily brought to mind. For example, a person who is asked whether there are more English words that begin with the letter ‘t’ or the letter ‘k’ might try to think of words that begin with each of these letters. Since a person can probably think of more words beginning with ‘t’, he or she would (correctly) conclude that ‘t’ is more frequent than ‘k’ as the first letter of English words.
And even that sounds at least slightly ambiguous to me, although on the continuum between pure mental-content-ism and pure phenomenal-experience-ism, it falls on the opposite side from the original description.
You can’t really tease out this ambiguity with the older studies on availability, because these two interpretations generate the same prediction. There is a strong correlation between the number of examples recalled and the ease with which those examples come to mind.
For example, consider a piece of the setup in Experiment 3 from the original paper on the availability heuristic. The subjects in this experiment were asked to estimate the frequency of two types of words in the English language: words with ‘k’ as their first letter, and words with ‘k’ as their third letter. There are twice as many words with ‘k’ as their third letter, but subjects were biased towards estimating that there are more words with ‘k’ as their first letter.
How, in experiments like these, are you supposed to figure out whether the subjects are relying on mental content or phenomenal experience? Both mechanisms predict the outcome, "Humans will be biased towards estimating that there are more words with 'k' as their first letter." And a lot of the later studies just replicate this result in other domains, and thus suffer from the same ambiguity.
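To make the confound concrete, here is a toy sketch in Python. Everything in it is an illustrative assumption rather than anything taken from the original paper: the `RecallSession` type, the latencies (made-up numbers, not data), and the choice to model "ease" as examples retrieved per second.

```python
from dataclasses import dataclass


@dataclass
class RecallSession:
    """One hypothetical recall attempt for a class of words."""
    label: str
    latencies: list[float]  # made-up seconds spent retrieving each example


def content_score(session: RecallSession) -> float:
    """Interpretation 1: judge frequency by how many examples were found."""
    return float(len(session.latencies))


def experience_score(session: RecallSession) -> float:
    """Interpretation 2: judge frequency by how easy recall felt.

    Ease is modeled (an assumption) as examples retrieved per second.
    """
    return len(session.latencies) / sum(session.latencies)


def judged_more_frequent(score, a: RecallSession, b: RecallSession) -> str:
    """Return the label of the class that a given mechanism judges more frequent."""
    return a.label if score(a) > score(b) else b.label


# Hypothetical sessions: words starting with 'k' come quickly and in bulk,
# words with 'k' in third position come slowly and sparsely.
first_k = RecallSession("k first", [1.0, 1.5, 2.0, 2.5, 3.0])
third_k = RecallSession("k third", [4.0, 5.0])

by_content = judged_more_frequent(content_score, first_k, third_k)
by_experience = judged_more_frequent(experience_score, first_k, third_k)
print(by_content, "|", by_experience)
```

Both mechanisms pick the same class, because in this toy setup (as in the classic studies) the number of examples and the ease of retrieving them rise and fall together. Observing the final judgment alone cannot tell you which score the subject was using.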
If you wanted to design a better experiment, where would you begin?
Well, if we think of feelings as sources of information in the way that we regard thoughts as sources of information, then we should find that we have some (perhaps low, perhaps high) confidence in the informational value of those feelings, as we have some level of confidence in the informational value of our thoughts.
This is useful because it suggests a method for detecting the use of feelings as sources of information: if we are led to believe that a source of information has low value, then its relevance will be discounted; and if we are led to believe that it has high value, then its relevance will be augmented. Detecting this phenomenon in the first place is probably a good place to start before trying to determine whether the classic availability studies demonstrate a reliance on phenomenal experience, mental content, or both.
Fortunately, Wänke et al. (1995) conducted a modified replication of the experiment described above with exactly the properties that we’re looking for! Let’s start with the control condition.
In the control condition, subjects were given a blank sheet of paper and asked to write down 10 words that have ‘t’ as the third letter, and then to write down 10 words that begin with the letter ‘t’. After this listing task, they rated the extent to which words beginning with a ‘t’ are more or less frequent than words that have ‘t’ as the third letter. As in the original availability experiments, subjects estimated that words that begin with a ‘t’ are much more frequent than words with a ‘t’ in the third position.
Like before, this isn’t enough to answer the questions that we want to answer, but it can’t hurt to replicate the original result. It doesn’t really get interesting until you do things that affect the perceived value of the subjects’ feelings.
Wänke et al. got creative: instead of blank paper, they gave subjects in two experimental conditions sheets of paper imprinted with pale blue rows of ‘t’s, and told them to write 10 words beginning with a ‘t’. Subjects in one condition were told that the paper would make it easier for them to recall words beginning with a ‘t’, and subjects in the other were told that it would make it harder.
Subjects made to think that the magic paper made it easier to think of examples gave lower estimates of the frequency of words beginning with a ‘t’ in the English language than subjects in the control condition. It felt easy to think of examples, but the experimenter had led them to expect that because of the magic paper, so they discounted the value of the feeling of ease.
Subjects made to think that the magic paper made it harder to think of examples gave higher estimates of the frequency of words beginning with a ‘t’ in the English language than subjects in the control condition. It felt easy to recall examples, but the experimenter had led them to expect that it would feel hard, so they augmented the value of the feeling of ease.
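The discounting and augmenting logic can be sketched as a simple attribution model. This is my own illustration, not a model proposed by Wänke et al.: the subtractive form of `frequency_estimate`, the parameter names, and all of the numbers are assumptions chosen only to show the direction of the effect.

```python
def frequency_estimate(felt_ease: float, ease_blamed_on_paper: float) -> float:
    """Estimate frequency from whatever ease is left after attribution.

    Assumption: the subject subtracts the ease they believe the paper
    contributed. A negative value means they believe the paper made
    recall harder, so the leftover ease is larger than what they felt.
    """
    return felt_ease - ease_blamed_on_paper


FELT_EASE = 1.0  # hypothetical: listing 10 't' words felt equally easy everywhere

estimates = {
    "control": frequency_estimate(FELT_EASE, 0.0),
    "told paper helps": frequency_estimate(FELT_EASE, 0.5),
    "told paper hinders": frequency_estimate(FELT_EASE, -0.5),
}

for condition, estimate in estimates.items():
    print(f"{condition}: {estimate}")
```

The felt ease is held constant across conditions, and only the belief about the paper changes. A judgment based purely on the number of words listed (10 in every condition) would not move at all, which is what makes the observed shift informative.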
(Also, here's a second explanation by Nate Soares if you want one.)
So, at least in this sort of experiment, it looks like the subjects weren’t counting the number of examples they came up with; it looks like they really were using their phenomenal experiences of ease and difficulty to estimate the frequency of certain classes of words. This is some evidence for the validity of the second interpretation mentioned at the beginning.
So we know that there is at least one circumstance in which the second interpretation seems valid. This was a step towards figuring out whether the availability heuristic first described by Kahneman and Tversky is an inference from amount of mental content, or an inference from the phenomenal experience of ease of recall, or something else, or some combination thereof.
As I said before, the two interpretations make identical predictions in the earlier studies. The solution to this is to design an experiment where inferences from mental content and inferences from phenomenal experience lead to different judgments.
Schwarz et al. (1991, Experiment 1) asked subjects to list either 6 or 12 situations in which they behaved either assertively or unassertively. Pretests had shown that recalling 6 examples was experienced as easy, whereas recalling 12 examples was experienced as difficult. After listing examples, subjects had to evaluate their own assertiveness.
As one would expect, subjects rated themselves as more assertive when recalling 6 examples of assertive behavior than when recalling 6 examples of unassertive behavior.
But the difference in assertiveness ratings didn’t increase with the number of examples. Subjects who had to recall examples of assertive behavior rated themselves as less assertive after reporting 12 examples rather than 6 examples, and subjects who had to recall examples of unassertive behavior rated themselves as more assertive after reporting 12 examples rather than 6 examples.
If they were relying on the number of examples, then we should expect their ratings for the recalled quality to increase with the number of examples. Instead, they decreased.
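The diverging predictions can be laid out as a small table. The following is a hedged sketch: `BASELINE`, `WEIGHT_PER_EXAMPLE`, and the `FELT_EASE` values are made-up parameters (not data from Schwarz et al.), chosen only so that listing 6 examples feels easier than listing 12, as the pretests indicated.

```python
BASELINE = 5.0                 # hypothetical midpoint of the self-rating scale
WEIGHT_PER_EXAMPLE = 0.1       # hypothetical evidential weight of one example
FELT_EASE = {6: 0.8, 12: 0.2}  # hypothetical: 6 felt easy, 12 felt difficult
DIRECTION = {"assertive": +1, "unassertive": -1}


def content_rating(behavior: str, n_examples: int) -> float:
    """Interpretation 1: more recalled examples means stronger evidence."""
    return BASELINE + DIRECTION[behavior] * WEIGHT_PER_EXAMPLE * n_examples


def experience_rating(behavior: str, n_examples: int) -> float:
    """Interpretation 2: easier recall means stronger evidence."""
    return BASELINE + DIRECTION[behavior] * FELT_EASE[n_examples]


# Predicted self-rated assertiveness for each cell of the 2x2 design.
for behavior in ("assertive", "unassertive"):
    for n in (6, 12):
        print(
            f"{behavior:>11} x {n:>2}: "
            f"content={content_rating(behavior, n):.1f}  "
            f"experience={experience_rating(behavior, n):.1f}"
        )
```

Under the content model, going from 6 to 12 examples pushes the rating further in the direction of the recalled behavior. Under the experience model it pulls the rating back towards the baseline, which is the direction of the result described above.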
One alternative explanation is that it got harder to come up with good examples near the end of the task, so that later examples were of lower quality than earlier ones, and the greater availability of those later examples biased the ratings in the way that we see. Schwarz acknowledged this, checked the written reports manually, and claimed that no such quality difference was evident.
It would still be nice if we could do better than taking Schwarz’s word for that, though. One thing you could try is seeing what happens when you combine the methods used in the last two experiments: vary the number of examples generated and manipulate the perceived relevance of the experiences of ease and difficulty at the same time. (Last experiment, I promise.)
Schwarz et al. (1991, Experiment 3) manipulated the perceived value of the experienced ease or difficulty of recall by having subjects listen to ‘new-age music’ played at half-speed while they worked on the recall task. Some subjects were told that this music would make it easier to recall situations in which they behaved assertively and felt at ease, whereas others were told that it would make it easier to recall situations in which they behaved unassertively and felt insecure. These manipulations make subjects perceive recall experiences as uninformative whenever the experience matches the alleged impact of the music; after all, it may simply be easy or difficult because of the music. On the other hand, experiences that are opposite to the alleged impact of the music are considered very informative.
When the alleged effects of the music were the opposite of the phenomenal experience of generating examples, the previous experimental results were replicated.
When the alleged effects of the music matched the phenomenal experience of generating examples, the experience was called into question, since subjects couldn’t tell whether it was caused by the recall task or by the music.
In that case, the pattern that we expect from the first interpretation of the availability heuristic held. Thinking of 12 examples of assertive behavior made subjects rate themselves as more assertive than thinking of 6 examples of assertive behavior; mutatis mutandis for unassertive examples. When people can’t rely on their experience, they fall back to using mental content, and instead of relying on how hard or easy things feel, they count.
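This fallback can be sketched as a single function with a switch. It is an illustration of the verbal account above, not the authors' model: the function name, the `experience_discredited` flag, and every number are assumptions, and the flag stands in for "the music's alleged effect matches what recall felt like".

```python
BASELINE = 5.0                 # hypothetical midpoint of the self-rating scale
WEIGHT_PER_EXAMPLE = 0.1       # hypothetical evidential weight of one example
FELT_EASE = {6: 0.8, 12: 0.2}  # hypothetical: 6 felt easy, 12 felt difficult


def self_rating(behavior: str, n_examples: int, experience_discredited: bool) -> float:
    """Predicted self-rated assertiveness after recalling n_examples.

    If the feeling of ease or difficulty can be blamed on the music,
    fall back to counting examples; otherwise rely on the feeling.
    """
    direction = 1 if behavior == "assertive" else -1
    if experience_discredited:
        evidence = WEIGHT_PER_EXAMPLE * n_examples  # count mental content
    else:
        evidence = FELT_EASE[n_examples]            # trust phenomenal experience
    return BASELINE + direction * evidence


for discredited in (False, True):
    six = self_rating("assertive", 6, discredited)
    twelve = self_rating("assertive", 12, discredited)
    print(f"discredited={discredited}: 6 examples -> {six:.1f}, 12 examples -> {twelve:.1f}")
```

With the experience trusted, 12 assertive examples yield a lower rating than 6; with the experience discredited, the ordering flips and more examples yield a higher rating. The sketch treats the switch as all-or-nothing, which is a simplification; a graded weighting of the two sources would fit the same qualitative pattern.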
Under different circumstances, both interpretations are useful, but of course, it’s important to recognize that a distinction exists in the first place.