Abstract: "Close-call counterfactuals", claims about what almost happened but didn't, can be used either to defend a belief or to attack it. People have a tendency to reject counterfactuals as improbable when those counterfactuals threaten a belief (the "I was not almost wrong" defense), but to embrace counterfactuals that support a belief (the "I was almost right" defense). This tendency is strongest in people who score high on a test for the need for closure and simplicity. Exploring counterfactual worlds can be used to reduce overconfidence, but it can also lead to logically incoherent answers, especially in people who score low on a test for the need for closure and simplicity.

”I was not almost wrong”

Dr. Zany, the Nefarious Scientist, has a theory which he intends to use to achieve his goal of world domination. ”As you know, I have long been a student of human nature”, he tells his assistant, AS-01. (Dr. Zany has always wanted to have an intelligent robot as his assistant. Unfortunately, for some reason all the robots he has built have only been interested in eradicating the color blue from the universe. And blue is his favorite color. So for now, he has resorted to just hiring a human assistant and referring to her with a robot-like name.)

”During my studies, I have discovered the following. Whenever my archnemesis, Captain Anvil, shows up at a scene, the media will very quickly show up to make a report about it, and they prefer to send the report live. While this is going on, the whole city – including the police forces! - will be captivated by the report about Captain Anvil, and neglect to pay attention to anything else. This happened once, and a bank was robbed on the other side of the city while nobody was paying any attention. Thus, I know how to commit the perfect crime – I simply need to create a diversion that attracts Captain Anvil, and then nobody will notice me. History tells us that this is the inevitable outcome of Captain Anvil showing up!”

But to Dr. Zany's annoyance, AS-01 is always doubting him. Dr. Zany has often considered turning her into a brain-in-a-vat as punishment, but she makes the best tuna sandwiches Dr. Zany has ever tasted. He's forced to tolerate her impudence, or he'll lose that culinary pleasure.

”But Dr. Zany”, AS-01 says. ”Suppose that some TV reporter had happened to be on her way to where Captain Anvil was, and on her route she saw the bank robbery. Then part of the media attention would have been diverted, and the police would have heard about the robbery. That might happen to you, too!”

Dr. Zany's favorite belief is now being threatened. It might not be inevitable that Captain Anvil showing up will actually let criminals elsewhere act unhindered! AS-01 has presented a plausible-sounding counterfactual: ”if a TV reporter had seen the robbery, then the city's attention would have been diverted to the other crime scene”. Although the historical record does not show that Dr. Zany's theory is wrong, the counterfactual suggests that he might be almost wrong.

There are now three tactics that Dr. Zany can use to defend his belief (warrantedly or not):

1. Challenge the mutability of the antecedent. Since AS-01's counterfactual is of the form ”if A, then B”, Dr. Zany could question the plausibility of A.

”Baloney!” exclaims Dr. Zany. ”No TV reporter could ever have wandered past, let alone seen the robbery!”

That seems a little hard to believe, however.

2. Challenge the causal principles linking the antecedent to the consequent. Dr. Zany is not logically required to accept the ”then” in ”if A, then B”. There are always unstated background assumptions that he can question.

”Humbug!” shouts Dr. Zany. ”Yes, a reporter could have seen the robbery and alerted the media, but given the choice of covering such a minor incident and continuing to report on Captain Anvil, they would not have cared about the bank robbery!”

3. Concede the counterfactual, but insist that it does not matter for the overall theory.

”Inconceivable!” yelps Dr. Zany. ”Even if the city's attention would have been diverted to the robbery, the robbers would have escaped by then! So Captain Anvil's presence would have allowed them to succeed regardless!”


Empirical work suggests that it's not only Dr. Zany who wants to stick to his beliefs. Let us for a moment turn our attention away from supervillains, and look at professional historians and analysts of world politics. In order to make sense of something as complicated as world history, experts resort to various simplifying strategies. For instance, one explanatory schema is called neorealist balancing, which claims that ”when one state threatens to become too powerful, other states coalesce against it, thereby preserving the balance of power”. Among other things, it implies that Hitler's failure was predetermined by a fundamental law of world politics.

Tetlock (1998, 1999, 2001) surveyed a number of experts on history and international affairs. He asked the experts about their commitment to such theories, and then presented them with counterfactuals that conflicted with some of those theories. For instance, counterfactuals that conflicted with neorealist balancing included "If Goering had continued to concentrate Luftwaffe attacks on British airbases and radar stations, Germany would have won the Battle of Britain" and "If the German military had played more effectively on the widespread resentment of local populations toward the Stalinist regime, the Soviet Union would have collapsed". The experts were then asked to indicate the extent to which they agreed with the antecedent, the causal link, and the claim that the counterfactual being true would have substantially changed world history.

As might have been expected, experts who subscribed to a certain theory were skeptical about counterfactuals threatening that theory, and employed all three defenses more than experts who were less committed. Denying the possibility of the antecedent was the least frequent defense, while conceding the counterfactual but questioning its overall impact was the most common one.

By itself, this might not be a sign of bias – the experts might have been skeptical of a counterfactual because they had an irrational commitment to the theory, but they might also have acquired a rational commitment to the theory because they were skeptical of counterfactuals challenging it. Maybe neorealist balancing is true, and the experts subscribing to it are right to defend it. What's more telling is that Tetlock also measured each expert's need for closure. It turned out that if an expert – like Dr. Zany – had a high need for closure, then they were also more likely to employ defenses questioning the validity of a counterfactual.

Theoretically, high need-for-closure individuals are characterized by two tendencies: urgency which inclines them to 'seize' quickly on readily available explanations and to dismiss alternatives and permanence which inclines them to 'freeze' on these explanations and persist with them even in the face of formidable counterevidence. In the current context, high need-for-closure individuals were hypothesized to prefer simple explanations that portray the past as inevitable, to defend these explanations tenaciously when confronted by dissonant close-call counterfactuals that imply events could have unfolded otherwise, to express confidence in conditional forecasts that extend these explanations into the future, and to defend disconfirmed forecasts from refutation by invoking second-order counterfactuals that imply that the predicted events almost happened. (Tetlock, 1998)

If two people draw different conclusions from the same information, then at least one of them is wrong. Tetlock is careful to note that the data doesn't reveal whether it's the people with a high or a low need for closure who are closer to the truth, but we can probably presume that at least some of them were being excessively defensive.

This gives us reason to be worried. If some past occurrence seems to fit perfectly into our pet theory, have we considered the possibility that we might be almost wrong? And if we have, are we exhibiting an excessive need for closure by rushing to the theory's defense, or are we being excessively flexible by unnecessarily admitting that something might have gone differently? We should only admit to being almost wrong if we really were almost wrong, after all. Is the cognitive style we happen to have the one that's most correlated with getting the right answers?

”I was almost right”

Having defended his theory against AS-01's criticism, Dr. Zany puts the theory into use by starting a fire in a tar factory, diverting Captain Anvil. While the media is preoccupied with reporting the story, Dr. Zany tries to steal the bridge connecting Example City to the continent. Unfortunately, a City Police patrol boat happens to see this, alerting the police forces (as well as Captain Anvil) to the site. Dr. Zany is forced to withdraw.

”Damn that unanticipated patrol boat!”, Dr. Zany swears. ”If only it had not appeared, my plan would have worked perfectly!” AS-01 wisely says nothing, and avoids being turned into a brain-in-a-vat.


Tetlock (1998, 1999) surveyed a number of experts and asked them to make predictions about world politics. Afterwards, when it was clear whether or not the predictions had turned out to be true, he surveyed them again. It turned out that like Dr. Zany, most of the mistaken experts had not seriously updated their beliefs:

Not surprisingly, experts who got it right credited their accuracy to their sound reading of the 'basic forces' at play in the situation. Across issue domains they assigned average ratings between 6.5 and 7.6 on a 9-point scale where 9 indicates maximum confidence. Perhaps more surprisingly, experts who got it wrong were almost as likely to believe that their reading of the political situation was fundamentally sound. They assigned average ratings from 6.3 to 7.1 across domains. (Tetlock, 1998)

Many of the experts defended their reading of the situation by saying that they were ”almost right”. For instance, experts who predicted in 1988 that the Communist Party of the Soviet Union would grow increasingly authoritarian during the next five years were prone to claiming that the hardliner coup of 1991 had almost succeeded, and that if it had, their prediction would have come true. Similarly, observers of South Africa who in 1988-1989 expected white minority rule to continue or to become increasingly oppressive were likely to believe that were it not for two exceptional individuals – de Klerk and Mandela – in key leadership roles, South Africa could easily have gone the other way.

In total, Tetlock (1999) identified five logically defensible strategies for defending one's forecasts, all of which were employed by at least some of the experts. Again, it was the experts who scored the highest on a need for closure who tended to employ such defenses the most:

  1. The antecedent (the A in the ”if A, then B”) was never adequately satisfied. Experts might insist ”if we had properly implemented deterrence or reassurance, we could have averted war” or ”if real shock therapy had been practiced, we could have averted the nasty bout of hyperinflation”.
  2. Although the specified antecedent was satisfied, something unexpected happened, severing the normal link of cause and effect. Experts might declare that rapid privatization in state industries would have led to the predicted surge in economic growth, but only if the government had pursued prudent monetary policies.
  3. Although the predicted outcome did not occur, it ”almost occurred” and would have, if not for some inherently unpredictable outside shock.
  4. Although the predicted outcome has not yet occurred, it eventually will and we just need to be more patient (hardline communists may yet prevail in Moscow, the EU might still fall apart).
  5. Although the relevant conditions were satisfied and the predicted outcome never came close to occurring and never will, this should not be held against the framework that inspired the forecast. Forecasts are inherently unreliable and politics is hard to predict: just because the framework failed once doesn't mean that it's wrong.

Again, Tetlock is careful to note that although it's tempting to dismiss all such maneuvering as ”transparently defensive post hocery”, it would be wrong to automatically interpret it as bias. Each defense is a potentially valid objection, and might have been the right one to make, in some cases.

But there are also signs of bias. Tetlock (1999) makes a number of observations from his data, noting – among other things – that the stronger the original confidence in a claim, the more likely an expert is to employ various defenses. That would suggest that big threats to an expert's claims of expertise activate many defenses. He also notes that the experts who'd made failed predictions and employed strong defenses tended not to update their confidence, while the experts who'd made failed predictions but didn't employ strong defenses did update.

Again, some of the experts were probably right to defend themselves, but some of them were probably biased and only trying to protect their reputations. We should be alert when we catch ourselves using one of these techniques to defend our own predictions.

Exploring counterfactual worlds: a possible debiasing technique

”Although my plan failed this time, I was almost right! The next time, I'll be prepared for any patrol boats!”, Dr. Zany mutters to himself, back in the safety of his laboratory.

”Yes, it was an unlikely coincidence indeed”, AS-01 agrees. ”Say, I know that such coincidences are terribly unlikely, but I started wondering – what other coincidence might have caused your plan to fail? Are there any others that we should take into account before the next try?”

”Hmm....”, Dr. Zany responds, thoughtfully.


Tetlock & Lebow (2001) found that experts became less convinced of the inevitability of a scenario when they were explicitly instructed to consider various events that might have led to a different outcome. In two studies, experts were told to consider the Cuban Missile Crisis and, for each day of the crisis, estimate the subjective probability that the crisis would end either peacefully or violently. When experts were told to consider various provided counterfactuals suggesting a different outcome, they thought that a violent outcome remained a possibility for longer than the experts who weren't given such counterfactuals to consider. The same happened when the experts weren't given ready-made counterfactuals, but were told to generate alternative scenarios of their own, at an increasingly fine resolution.

The other group (n = 34) was asked to consider (1) how the set of more violent endings of the Cuban missile crisis could be disaggregated into subsets in which violence remained localized or spread outside the Caribbean, (2) in turn differentiated into subsets in which violence claimed fewer or more than 100 casualties, and (3) for the higher casualty scenario, still more differentiated into a conflict either limited to conventional weaponry or extending to nuclear. (Tetlock & Lebow, 2001)

Again, the experts who generated counterfactual scenarios became less confident of their predictions. The experts with a low need for closure adjusted their opinions considerably more than the ones with a high need for closure.

However, this technique has its dangers as well. More fine-grained scenarios offer an opportunity to tell more detailed stories, and humans give disproportionate weight to detailed stories. Unpacking the various scenarios leads us to give too much weight to the individual subscenarios. You might remember the example of ”the USA and Soviet Union suspending relations” being considered less probable than ”the Soviet Union invades Poland, and the USA and Soviet Union suspend relations”, even though the second scenario is a subset of the first. People with a low need for closure seem to be especially susceptible to this, while people with a high need for closure tend to produce more logically coherent answers. This might be considered an advantage of the high need for closure – an unwillingness to go on extended wild goose chases and thereby assign minor scenarios a disproportionately high probability.
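(To spell out the probability logic behind that ”subset” claim: for any two events A and B,

$$P(A \wedge B) \le \min\bigl(P(A),\, P(B)\bigr),$$

so the more detailed scenario – an invasion of Poland *and* a suspension of relations – can be at most as probable as the broader scenario of a suspension of relations alone. Judging the conjunction to be more probable is the conjunction fallacy.)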

References

Tetlock, P.E. (1998) Close-Call Counterfactuals and Belief-System Defenses: I Was Not Almost Wrong But I Was Almost Right. Journal of Personality and Social Psychology, Vol. 75, No. 3, 639-652. http://faculty.haas.berkeley.edu/tetlock/Vita/Philip%20Tetlock/Phil%20Tetlock/1994-1998/1998%20Close-Call%20Counterfactuals%20and%20Belief-System%20Defenses.pdf

Tetlock, P.E. (1999) Theory-Driven Reasoning About Plausible Pasts and Probable Futures in World Politics: Are We Prisoners of Our Preconceptions? American Journal of Political Science, Vol. 43, No. 2, 335-366. http://www.uky.edu/AS/PoliSci/Peffley/pdf/Tetlock%201999%20AJPS%20Theory-driven%20World%20Politics.pdf

Tetlock, P.E. & Lebow, R.N. (2001) Poking Counterfactual Holes in Covering Laws: Cognitive Styles and Historical Reasoning. American Political Science Review, Vol. 95, No. 4. http://faculty.haas.berkeley.edu/tetlock/vita/philip%20tetlock/phil%20tetlock/1999-2000/2001%20poking%20counterfactual%20holes%20in%20covering%20laws....pdf

Comments

This reminds me of the notion of a premortem — an exercise for identifying weaknesses in a plan by asking you to imagine that you have implemented your plan and that it has failed. Why did it fail? By envisioning your future self conducting a postmortem on the failed plan, you might be able to identify weaknesses without going to all the expense of implementing it and failing.

So, feedback requested on the Dr. Zany thing. Made an otherwise dry post more interesting to read, or pointless and distracting?

I liked it, it's always good to have an example, it makes reading more pleasant, and it helps to update with the info (not to understand what you are saying, but to propagate the new knowledge into your belief network). Or at least it does to me.

I entirely agree with this, but am writing a comment in addition to an upvote of the above just to make my appreciation towards Kaj for the Dr. Zany thing more salient to him.

It worked. Thanks. :-)

Liked it, but thought there was a bit too much of it (e.g. the blue-minimizing robot reference). Might be better to leave out details that don't help you illustrate your point, lest the reader get a sense that your example isn't going anywhere.

Check. It was a good idea, but could've and should've been shortened. I skimmed it, and my guess is that it could've been set up in one or two paragraphs if only the minimum of required detail had been included.

I disagree that there were too many extraneous details about Dr. Zany in this post. They didn't detract from the value of the post and, at least, the blue-minimizing robot reference was funny.

Challenge the mutability of the antecedent. Since AS-01's counterfactual is of the form ”if A, then B”, Dr. Zany could question the plausibility of A.

brain balks at "mutability", stumbles over "antecedent", sprains ankle on "counterfactual"

”Baloney!” exclaims Dr. Zany. ”No TV reporter could ever have wandered past, let alone seen the robbery!”

Oh, I get it! Brain jumps up and down with glee.

I found it helpful and entertaining.

Well, it did make me more likely to accept your theory. After all...

"More fine-grained scenarios offer an opportunity to tell more detailed stories, and humans give disproportionate weight to detailed stories."

Would not have gotten through the post without it.

Made the post more interesting to read.

It was good, but mostly because it provided some nice examples.

I found it awkward and weird, especially the bits with the assistant. But it looks like you and some readers had fun, so I don't mind if you keep doing it.

Made it more interesting, to me at least. I probably wouldn't have had the focus to get through the article otherwise.

It was cute, particularly the conclusion.

Annoying, at least for me.

Somewhat more interesting. I'm not sure about the brain in a vat threat-- I'm pulled between "that's really creepy if it's read literally-- is she trapped there?" and the tone of "this is lightweight humor, you're supposed to read it with most of your empathy turned off".

"this is lightweight humor, you're supposed to read it with most of your empathy turned off"

Ever since reading The Sword of Good, I've lost the ability to do that. Not that I was ever great at it. I wonder if that's happened to anyone else. /irrelevant tangent

Spoiler alert: People who've seen the movie Silent Hill might enjoy this comment. Vg jnf jrveq ubj n fznyy er-nffrffzrag bs gur cerzvfrf bs gur svyz znqr zr tb sebz "lrnuuuuu tb Fngna, xvyy nyy gubfr ovtbgrq Puevfgvna fgnaq-vaf!" gb "bu zl Tbq V jnf whfg purrevat nf gur qrivy znffnperq n ohapu bs cngurgvp fpnerq puhepu crbcyr va fbzr tbqsbefnxra yvzob jbeyq, gung'f nobhg nf Rivy nf vg trgf, jul nz V fb sevttva' vzcerffvbanoyr".

the tone of "this is lightweight humor, you're supposed to read it with most of your empathy turned off".

This is hardly an original thought, but I wonder how much work this does in ethical thought experiments.

Worked. Good character. Do please use him again.

Slightly distracting, but worth it.

(On the other hand, the female assistant set off some gender-stereotypes-icky warning bells in me. Despite your obvious attempts at avoiding this. I'm probably just projecting some unfavourable impressions of the source material on your adaptation, but you may still want to be aware of this possibility.)

Oddly, I made the assistant female partially because having a mad scientist with a male assistant (Igor fetch brains, master...) felt too stereotypical.

I also considered making Dr. Zany himself female, but there the character felt so strongly male that my brain just wouldn't go along with it.

Strongly agreed. That aspect also seemed bad because the assistant being labeled like a robot while funny sounded almost like some form of symbolic objectification. And the fact that her main talent she's valued for is the ability to make sandwiches rather than say help tweak the ray guns or Tesla coils strongly didn't help matters.

But note that her being valued mostly for the sandwiches says more about Dr. Zany's attitude than about how things really are, and she's strongly implied to be the more competent of the two...

Stories are a huge way we make sense of the world. Adding a narrative sequence to the post did help me keep track of the ideas and how they fit together.

Worked great for me. I like to browse the articles during coffee breaks, and anything that helps me to easily grab on to an idea through example in "reality" rather than slow down and parse out the abstract concepts in my head makes the read go altogether easier :)


I like stories illustrating facts but I think that their usefulness is inversely proportional to the technical complexity (and maybe inferential distance) of the writing. So here it wasn't a problem but it probably wouldn't make much difference if you skipped it.

Definitely felt it made the article more attention grabbing and easier to follow.

Datapoint: I skim-read the article today. I am interested in the overall thesis [need for closure, counterfactual modelling etc]. I skipped the Dr. Zany story.

Tetlock (1998) also provided me with the two funniest-sounding sentences that I've read in a while (though that doesn't make them incorrect). Commenting on the "concede the counterfactual, but insist that it does not matter for the overall theory" defense:

This defense, which is the most popular of the three, is designated a second-order counterfactual inasmuch as it undoes the undoing of the original close-call counterfactual. Second-order counterfactuals allow for deviations from reality but minimize the significance of the deviations by invoking additional causal forces that soon bring events in the simulated counterfactual world back toward the observed historical path.

He also notes that the experts who'd made failed predictions and employed strong defenses tended to update their confidence, while the experts who'd made failed predictions but didn't employ strong defenses did update.

I assume there's a 'not' missing in one of those.

Fixed, thanks.

Good abstract. Feels obvious, but then there are some nice details that didn't come out in the summary, like "imagine different outcomes" being risky due to story-thinking. It's worth reading the whole thing for the excellent speculations on how and why.


Feels obvious until it gets to using counterfactuals for possible debiasing and the dangers in the technique — this was quite interesting for me.

Also interesting are the "five logically defensible strategies".

Quibble:

If two people draw different conclusions from the same information, then at least one of them is wrong.

But different conclusions can be compatible?

You used the terms "high-need-for-closure" and "low-need-for-closure" quite a lot in your essay. Would you mind explaining what they mean and/or linking to somewhere I can look up the definition, since I am not familiar with them?

Could you maybe also explain what those tests are and how they work (the ones to measure need for closure)?

I quoted this excerpt from Tetlock (1998) in the post, did you not find it helpful?

Theoretically, high need-for-closure individuals are characterized by two tendencies: urgency which inclines them to 'seize' quickly on readily available explanations and to dismiss alternatives and permanence which inclines them to 'freeze' on these explanations and persist with them even in the face of formidable counterevidence. In the current context, high need-for-closure individuals were hypothesized to prefer simple explanations that portray the past as inevitable, to defend these explanations tenaciously when confronted by dissonant close-call counterfactuals that imply events could have unfolded otherwise, to express confidence in conditional forecasts that extend these explanations into the future, and to defend disconfirmed forecasts from refutation by invoking second-order counterfactuals that imply that the predicted events almost happened.

The papers I referenced (see the end of the post for links) briefly discuss how this was measured. For instance, Tetlock 1998:

The Need for Closure Scale was adapted from a longer scale developed by Kruglanski and Webster (1996) and included the following eight items: "I think that having clear rules and order at work is essential for success"; "Even after I have made up my mind about something, I am always eager to consider a different opinion"; "I dislike questions that can be answered in many different ways"; "I usually make important decisions quickly and confidently"; "When considering most conflict situations, I can usually see how both sides could be right"; "It is annoying to listen to someone who cannot seem to make up his or her mind"; "I prefer interacting with people whose opinions are very different from my own"; and "When trying to solve a problem I often see so many possible options that it is confusing." Experts rated their agreement with each item on 9-point disagree-agree scales.

Incidentally, Tetlock 1998 also used another measure that's theoretically different from need-for-closure, namely integrative complexity.

Integrative complexity should be negatively correlated with need for closure. It implies not only a willingness to entertain contradictory ideas but also an interest in generating, testing, and revising integrative cognitions that specify flexible boundary conditions for contradictory hypotheses. The two constructs – need for closure and integrative complexity – are, however, measured in very different ways: a traditional self-report personality scale in the case of need for closure and an open-ended thought-sampling procedure requiring content analysis in the case of integrative complexity. Given the severe problems of method variance that have bedeviled cognitive-style research over the past 50 years (Streufert, 1997), a major advantage of the present study is the inclusion of methodologically dissimilar but conceptually overlapping procedures for assessing cognitive style. [...]

The integrative complexity measure was derived from open-ended responses to a request to reflect on 20th-century history. The following question was used: "Did the 20th century have to be as violent as it has been?" We assured respondents that we understood that many books had been written on this subject and that many more undoubtedly would be written. Our goal was just to get a quick sense for the factors that they deemed most decisive in shaping the general course of events (the sort of shorthand answer they might give a respected colleague in a different discipline at a social occasion). Integrative complexity was coded on a 7-point scale in which scores of 1 were given to statements that identified only causal forces that increased or decreased the likelihood of the specified outcomes (e.g., "Nationalism and mass production of weapons guaranteed disaster"), scores of 3 were assigned to statements that identified causal forces with contradictory effects (e.g., "Twentieth-century history will be remembered not only for the destructive forces unleashed – totalitarianism and weapons of mass destruction – but also for the initial steps toward global governance"), scores of 5 were assigned to statements that tried to integrate two contradictory causal forces (e.g., "Wars can be caused by being too tough or too soft and it is really hard to strike the right balance – that's the big lesson of 20th-century diplomacy"), and scores of 7 placed the problem of integrating causal forces into a broader systemic frame of reference (e.g., "You could argue that we got off lucky and escaped nuclear war or that we were incredibly unlucky and wound up with a holocaust that was the product of one man's obsession. How you look at it is a matter of personal temperament and philosophy. My guess is that we are running about par for the course"). Intercoder agreement was .85 between two raters who were blind to both the hypotheses being tested and to the sources of the material.

Tetlock 1998 found the results from the two measures of need-for-closure and integrative complexity to be highly similar, in that individuals with a high need for closure scored low on integrative complexity, and vice versa. He combined the two measures into a single variable when analyzing the results of that study. IIRC, in later studies only need-for-closure was used.

Fleshed out and increased the weight of heuristics to counter this kind of thing in response to this.

Upvoted; just enough math.