Attempts to Debias Hindsight Backfire!

by Gram_Stone, 13th Jun 2016

(Content note: A common suggestion for debiasing hindsight: try to think of many alternative historical outcomes. But thinking of too many examples can actually make hindsight bias worse.)

Followup to: Availability Heuristic Considered Ambiguous

Related to: Hindsight Bias

I.

Hindsight bias is when people who know the answer vastly overestimate its predictability or obviousness, compared to the estimates of subjects who must guess without advance knowledge.  Hindsight bias is sometimes called the I-knew-it-all-along effect.

The way that this bias is usually explained is via the availability of outcome-related knowledge. The outcome is very salient, but the possible alternatives are not, so the probability that people claim they would have assigned to an event that has already happened gets jacked up. It's also known that knowing about hindsight bias and trying to adjust for it consciously doesn't eliminate it.

This means that most attempts at debiasing focus on making alternative outcomes more salient. One is encouraged to recall other ways that things could have happened. Even this merely attenuates the hindsight bias, and does not eliminate it (Koriat, Lichtenstein, & Fischhoff, 1980; Slovic & Fischhoff, 1977).

II.

Remember what happened with the availability heuristic when we varied the number of examples that subjects had to recall? Crazy things happened because of the phenomenal experience of difficulty that recalling more examples caused within the subjects.

You might imagine that, if you recalled too many examples, you could actually make the hindsight bias worse, because if subjects experience alternative outcomes as difficult to generate, then they'll consider the alternatives less likely, and not more.

Relatedly, Sanna, Schwarz, and Stocker (2002, Experiment 2) presented participants with a description of the British–Gurkha War (taken from Fischhoff, 1975; you should remember this one). Depending on condition, subjects were told either that the British or the Gurkhas had won the war, or were given no outcome information. Afterwards, they were asked, “If we hadn’t already told you who had won, what would you have thought the probability of the British (Gurkhas, respectively) winning would be?”, and asked to give a probability in the form of a percentage.

Like in the original hindsight bias studies, subjects with outcome knowledge assigned a higher probability to the known outcome than subjects in the group with no outcome knowledge. (Median probability of 58.2% in the group with outcome knowledge, and 48.3% in the group without outcome knowledge.)

Some subjects, however, were asked to generate either 2 or 10 thoughts about how the outcome could have been different. Thinking of 2 alternative outcomes slightly attenuated hindsight bias (median down to 54.3%), but asking subjects to think of 10 alternative outcomes went horribly, horribly awry, increasing the subjects' median probability for the 'known' outcome all the way up to 68.0%!
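To make the size of the backfire easier to see, here is a minimal sketch that expresses each condition as a shift, in percentage points, from the no-outcome group's median. The numbers are the ones quoted above; the names (`BASELINE_NO_OUTCOME`, `shift_from_baseline`) are made up for illustration, and a simple difference of medians is just a rough yardstick, not the statistic the paper itself reports.

```python
# Median probabilities (in %) assigned to the known outcome, as quoted above.
# All names here are hypothetical; nothing below comes from the paper's own analysis.
BASELINE_NO_OUTCOME = 48.3  # group that received no outcome information

conditions = {
    "outcome known, no alternatives listed": 58.2,
    "outcome known, 2 alternatives listed": 54.3,
    "outcome known, 10 alternatives listed": 68.0,
}


def shift_from_baseline(median_pct, baseline_pct=BASELINE_NO_OUTCOME):
    """Return the difference from the no-outcome group, in percentage points."""
    return round(median_pct - baseline_pct, 1)


shifts = {name: shift_from_baseline(value) for name, value in conditions.items()}

for name, shift in shifts.items():
    print(f"{name}: {shift:+.1f} points")
```

On this crude measure, listing 2 alternatives shrinks the gap from 9.9 to 6.0 points, while listing 10 alternatives roughly doubles it to 19.7 points.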

It looks like we should be extremely careful when we try to retrieve counterexamples to claims that we believe. If we're too hard on ourselves and fail to take this effect into account, then we can make ourselves even more biased than we would have been if we had done nothing at all.

III.

But it doesn't end there.

Like in the availability experiments before this, we can discount the informational value of the experience of difficulty when generating examples of alternative historical outcomes. Then the subjects would make their judgment based on the number of thoughts instead of the experience of difficulty.

Just before the 2000 U.S. presidential election, Sanna et al. (2002, Experiment 4) asked subjects to predict the percentage of the popular vote the major candidates would receive. (They had to wait a little longer than they expected for the results.)

Later, they were asked to recall what their predictions were.

Control group subjects who listed no alternative thoughts replicated previous results on the hindsight bias.

Experimental group subjects who listed 12 alternative thoughts experienced difficulty and their hindsight bias wasn't made any better, but it didn't get worse either.

(It seems the reason it didn't get worse is that everyone thought Gore was going to win before the election, and for the hindsight bias to get worse, the subjects would have had to incorrectly recall that they predicted a Bush victory.)

Other experimental group subjects listed 12 alternative thoughts and were also made to attribute their phenomenal experience of difficulty to lack of domain knowledge, via the question: "We realize that this was an extremely difficult task that only people with a good knowledge of politics may be able to complete. As background information, may we therefore ask you how knowledgeable you are about politics?" They were then made to provide a rating of their political expertise and to recall their predictions.

Because they discounted the relevance of the difficulty of recalling 12 alternative thoughts, attributing it to their lack of political domain knowledge, thinking of 12 ways that Gore could have won introduced a bias in the opposite direction! They recalled their original predictions of a Gore victory as even more confident than they originally were.

We really are doomed.


Fischhoff, B. (1975). Hindsight is not equal to foresight: the effect of outcome knowledge on judgment under uncertainty. Journal of Experimental Psychology: Human Perception and Performance, 1, 288–299.

Koriat, A., Lichtenstein, S., & Fischhoff, B. (1980). Reasons for confidence. Journal of Experimental Psychology: Human Learning and Memory, 6, 107–118.

Sanna, L. J., Schwarz, N., & Stocker, S. L. (2002). When debiasing backfires: Accessible content and accessibility experiences in debiasing hindsight through mental simulations. Journal of Experimental Psychology: Learning, Memory, and Cognition, 28, 497–502.

Slovic, P., & Fischhoff, B. (1977). On the psychology of experimental surprises. Journal of Experimental Psychology: Human Perception and Performance, 3, 544–551.

9 comments

I wonder if hindsight bias is related to overfitting?

Why do you wonder that? If you care to elaborate.

Overfitting is a problem of "thinking" the data given is more strongly determined than it is. Hindsight bias seems similar - we feel that things couldn't have turned out other than the way they actually did.

Just as "Does induction work?" and "Why does induction work?" are two different questions, we can distinguish the questions "Do people fail to seek alternative explanations?" and "Why do people fail to seek alternative explanations?" The answer to the first is quite obviously "Yes," and the second is harder to answer, as questions of that form often are.

Before trying to answer it, it seems like a good idea to point out that overfitting is a simple name for a complex phenomenon, and that overfitting as a mistake is probably overdetermined. Statistical inference in general seems far more cognitively complex than the tasks issued to subjects in hindsight bias experiments. So there may very well be multiple answers to the question "Why do people overfit the data?"

But I agree with you that both phenomena seem like examples of a failure to seek alternative explanations; specifically, a failure based on the quiet inference that seeking an alternative explanation doesn't seem necessary in each case.

We see in the article that people infer from the difficulty of seeking alternative explanations that those alternatives are less plausible and that their focal explanation is more plausible. We also see that when you make them discount the relevance of this difficulty, thinking of alternatives has the effect that we initially and naively thought that it would: the more alternatives you imagine, the less determined the past seems.

I haven't gotten into it yet, but we use these phenomenal experiences of ease and difficulty to make many, many other judgments: judgments of truth, credibility, beauty, frequency, familiarity, etc. A particularly interesting result is that merely writing a misleading question in a difficult-to-read font is enough to increase the probability that the subject will answer the question correctly.

It seems the reason overfitting happens at all is that there is no clear reason at the time to seek an alternative explanation, besides the outside view. "But it fits so well!" one says. The experience is so very fluent. What is there to discourage the statistician? Nothing, until they use the model on data it wasn't trained on. They believe that the model is accurate right up until the moment that their perception of the model becomes disfluent.

And at this point it begins to look related to the hindsight bias experiments, at least to me. But I also don't think that they are especially related, because my answer to a question like "Is overfitting related to availability bias?" or "Is overfitting related to the planning fallacy?" would probably be quite similar. I would maintain that it's the deep cog-sci results about the effect of phenomenal experiences on judgment that are the important relation, and not the more superficial details like whether or not the task has to do with inventing alternative hypotheses.

Hopefully that makes sense.

Hm. I tend to think about overfitting as a mistake in picking the level of complexity and about the hindsight bias as a mistake about (prior) probabilities. However, you have a point: if you have a choice of models to forecast the future and you pick a particular model while suffering from the hindsight bias, this can be seen as overfitting: your "in-sample" error will be low, but your out-of-sample error will be high.

Another way to look at this is to treat it as confusion between a sample estimate and a true population value. The hindsight bias will tell you that the particular realization that you're observing (=sample) is how it should have been and always will be (=population).

We really are doomed.

Not so fast. You are only looking at results of attempts to debias people without telling them explicitly that that's what is being done to them.

None of the studies even started on teaching people to substitute a judgement of some other factor for guessing probability directly.

I follow a generally useful policy to never estimate probability directly. (I also call this the no-pulling-numbers-out-of-my-ass policy.)

I do believe this is teachable, and it forces the subject to follow an explicit strategy to come up with their estimate.

Depending on the choice of strategy, you'd of course get all sorts of mistaken answers - but at least, they would not be influenced by this particular bias :)

Meta:

the third section is written in a way that makes it hard to parse. E.g.:

Control group subjects who listed no alternative thoughts replicated previous results on the hindsight bias.

The previous setup was different, and it's unclear what it means to have "replicated results". Sure, it's possible to guess, but it obfuscates your writing a lot. Or if you want people to guess, just put a question in the article telling them to do so.

The previous setup was different, and it's unclear what it means to have "replicated results".

I meant that the control condition replicated (reproduced, demonstrated once more) the result found in previous publications on the hindsight bias, namely that subjects view known outcomes as far more inevitable than they would have before the outcome was known. Maybe "results from previous studies" would fix this...? I thought it was clear enough, but I would.

"results from previous studies" doesn't fix it;

neither does "subjects view known outcomes as far more inevitable";

Be concrete!

e.g. "Later, they were asked to recall what their predictions were. The subjects, on average, remembered a higher confidence in their past prediction that Bush would win, compared to what they had reported before the election."

Again - be concrete. By and large, avoid any dangling references to "previous results", or even previous sections of the same article.