Note: This post is part of my series of posts on forecasting, but this particular post may be of fairly limited interest to many LessWrong readers. I'm posting it here mainly for completeness. As always, I appreciate feedback.
In the course of my work looking at forecasting for MIRI, I repeatedly encountered discussions of how to communicate forecasts. A recurring concern was the clear communication of the uncertainty in forecasts. Nate Silver's The Signal and the Noise, in particular, focused quite a bit on the virtue of clearly communicating uncertainty, in contexts as diverse as financial crises, epidemiology, weather forecasting, and climate change.
In this post, I pull together discussions from a variety of domains about the communication of uncertainty, and also include my overall impression of the findings.
Summary of overall findings
- In cases where forecasts are made and used frequently (the most salient example being temperature and precipitation forecasts) people tend to form their own models of the uncertainty surrounding forecasts, even if you present forecasts as point estimates. The models people develop are quite similar to the correct ones, but still different in important ways.
- In cases where forecasts are made more rarely, as with forecasting rare events, people are more likely to have simpler models that acknowledge some uncertainty but are less nuanced. In these cases, acknowledging uncertainty becomes quite important, because wrong forecasts of such events can lead to a loss of trust in the forecasting process, and can lead people to ignore correct forecasts later.
- In some cases, there are arguments for modestly exaggerating small probabilities to overcome specific biases that people have that cause them to ignore low-probability events.
- However, the balance of evidence suggests that forecasts should be reported as honestly as possible, and all uncertainty should be clearly acknowledged. If the forecast does not acknowledge uncertainty, people are likely to either use their own models of uncertainty, or lose faith in the forecasting process entirely if the forecast turns out to be far off from reality.
Probabilities of adverse events and the concept of the cost-loss ratio
A useful concept developed for understanding the utility of weather forecasting is the cost-loss model (Wikipedia). Suppose that if a particular adverse event occurs and we have not taken precautionary measures, the loss incurred is L, whereas if we do take precautionary measures, the cost is C, regardless of whether the event occurs. An example: you're planning an outdoor party, and the adverse event in question is rain. If it rains during the event, you experience a loss of L. If you knew in advance that it would rain, you'd move the venue indoors, at a cost of C. Obviously, C must be less than L for you to even consider the precautionary measure.
The ratio C/L is termed the cost-loss ratio and describes the probability threshold above which it makes sense to take the precautionary measure. If the probability of the event is p, the expected loss without precautions is pL, so the precaution is worthwhile when pL > C, that is, when p > C/L.
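To make the threshold concrete, here is a minimal Python sketch of the cost-loss decision rule. It assumes, as in the description above, that taking the precaution costs C whether or not the event occurs and fully averts the loss L. The function names and the dollar figures are hypothetical, chosen only for illustration.

```python
def expected_cost(p, cost, loss, protect):
    """Expected cost of one decision under the simple cost-loss model.

    p       -- forecast probability of the adverse event
    cost    -- C, paid whenever the precaution is taken
    loss    -- L, incurred only if the event occurs and no precaution was taken
    protect -- whether the precaution is taken
    """
    return cost if protect else p * loss


def should_protect(p, cost, loss):
    """Take the precaution when p exceeds the cost-loss ratio C/L."""
    return p > cost / loss


# Hypothetical party example: moving indoors costs 200, a rained-out party costs 1000.
C, L = 200.0, 1000.0  # cost-loss ratio C/L = 0.2
for p in (0.1, 0.3):
    print(p, should_protect(p, C, L),
          expected_cost(p, C, L, True), expected_cost(p, C, L, False))
```

With these made-up numbers, at p = 0.1 the expected loss from doing nothing (100) is below the cost of moving indoors (200), so the precaution is not worthwhile; at p = 0.3 the comparison flips.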
One way of thinking of the utility of weather forecasting, particularly in the context of forecasting adverse events (rain, snow, winds, and more extreme events) is in terms of whether people have adequate information to make correct decisions based on their cost-loss model. This would boil down to several questions:
- Is the probability of the adverse event communicated with sufficient clarity and precision that people who need to use it can plug it into their cost-loss model?
- Do people have a correct estimate of their cost-loss ratio (implicitly or explicitly)?
As I discussed in an earlier post, The Weather Channel has admitted to explicitly introducing wet bias into its probability-of-precipitation (PoP) forecasts. The rationale they offered could be interpreted as a claim that people overestimate their cost-loss ratio. For instance, a person may think his cost-loss ratio for precipitation is 0.2 (20%), but his actual cost-loss ratio may be 0.05 (5%). In this case, in order to make sure people still make the "correct" decision, PoP forecasts that fall between 0.05 and 0.2 would need to be inflated to 0.2 or higher. Note that TWC does not introduce wet bias at higher probabilities of precipitation, arguably because they believe that these are well above the cost-loss ratio for most situations.
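The wet-bias argument above can be sketched in a few lines of Python. This is only an illustration of the interpretation offered in this post, not a description of how The Weather Channel actually adjusts its forecasts. The function names are hypothetical, the thresholds 0.05 and 0.2 are the example values from the text, and treating a forecast exactly at the threshold as enough to act is a tie-breaking assumption.

```python
ACTUAL_RATIO = 0.05     # the person's true cost-loss ratio (example value from the text)
PERCEIVED_RATIO = 0.2   # the ratio the person believes they have


def published_pop(honest_pop):
    """Inflate forecasts that fall between the two thresholds up to the perceived one."""
    if ACTUAL_RATIO <= honest_pop < PERCEIVED_RATIO:
        return PERCEIVED_RATIO
    return honest_pop


def person_acts(reported_pop):
    """The person takes precautions when the reported PoP reaches their perceived ratio."""
    return reported_pop >= PERCEIVED_RATIO


def correct_action(honest_pop):
    """What the person should do, given the honest PoP and their true ratio."""
    return honest_pop >= ACTUAL_RATIO


# Columns: honest PoP, correct decision, decision on honest PoP, decision on published PoP.
for honest in (0.02, 0.05, 0.1, 0.15, 0.2, 0.5):
    print(honest, correct_action(honest),
          person_acts(honest), person_acts(published_pop(honest)))
```

For honest forecasts of 0.05, 0.1, and 0.15, the person fails to act on the honest number but acts on the inflated one, which matches the decision their true cost-loss ratio calls for. Outside that band, the published forecast equals the honest one.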
Words of estimative probability
In 1964, Sherman Kent (Wikipedia), the father of intelligence analysis, wrote an essay titled "Words of Estimative Probability" that discussed the use of words to describe probability estimates, and how different people may interpret the same word as referring to very different ranges of probability estimates. The concept of words of estimative probability (Wikipedia), along with its acronym, WEP, is now standard jargon in intelligence analysis.
Some related discussion of the use of words to convey uncertainty in estimates can be found in the part of this post where I excerpt from the paper discussing the communication of uncertainty in climate change.
Other general reading
- Nate Silver's The Signal and the Noise is worth reading in full if this topic interests you.
- The essay Communicating Uncertainty: Fulfilling the Duty to Inform by Baruch Fischhoff does a great job of reviewing the communication of uncertainty and how decision-makers can do a better job of eliciting uncertainty information from subject matter experts.
#1: The case of weather forecasting
Weather forecasting has some features that make it stand out among other forecasting domains:
- Forecasts are published explicitly and regularly: News channels and newspapers carry forecasts every day. Weather websites update their forecasts at least hourly, and sometimes more often, particularly if there are unusual weather developments. In the United States, The Weather Channel is dedicated to 24/7 weather news coverage.
- Forecasts are targeted at and consumed by the general public: This sets weather forecasting apart from other forms of forecasting and prediction. We can think of prices in financial markets and betting markets as implicit forecasts. But they are targeted at the niche audiences that pay attention to them, not at everybody. The mode of consumption varies. Some people just get their forecasts from the weather reports in their local TV and radio channel. Some people visit the main weather websites (such as the National Weather Service, The Weather Channel, AccuWeather, or equivalent sources in other countries). Some people have weather reports emailed to them daily. As smartphones grow in popularity, weather apps are an increasingly common way for people to keep tabs on the weather. The study on communicating weather uncertainty (discussed below) found that in the United States, people in its sample audience saw weather forecasts an average of 115 times a month. Even assuming heavy selection bias in the study, people in the developed world probably encounter a weather forecast at least once a day.
- Forecasts are used to drive decision-making: Particularly in places where weather fluctuations are significant, forecasts play an important role in event planning for individuals and organizations. At the individual level, this can include deciding whether to carry an umbrella, choosing what clothes to wear, deciding whether to wear snow boots, deciding whether conditions are suitable for driving, and many other small decisions. At the organizational level, events may be canceled or relocated based on forecasts of adverse weather. In locations with variable weather, it's considered irresponsible to plan an event without checking the weather forecast.
- People get quick feedback on whether the forecast was accurate: The next day, people know whether what was forecast transpired.
The upshot: people are exposed to weather forecasts, pay attention to them, base decisions on them, and then come to know whether the forecast was correct. This happens on a daily basis. Therefore, they have both the incentives and the information to form their own mental model of the reliability and uncertainty in forecasts. Note also that because the reliability of forecasts varies considerably by location, people who move from one location to another may take time adjusting to the new location. (For instance, when I moved to Chicago, I didn't pay much attention to weather forecasts in the beginning, but soon learned that the high variability of the weather combined with reasonable accuracy of forecasts made them worth paying attention to. Now that I'm in Berkeley, I probably pay too much attention to the forecast relative to its value, given the stability of weather in Berkeley.)
With these general thoughts in mind, let's look at the paper Communicating Uncertainty in Weather Forecasts: A Survey of the U.S. Public by Rebecca E. Morss, Julie L. Demuth, and Jeffrey K. Lazo. The paper is based on a survey of about 1500 people in the United States. The whole paper is worth a careful read if you find the issue fascinating. But for the benefit of those who find the issue somewhat interesting but not enough to read the paper, I include some key takeaways below.
Temperature forecasts: The authors find that even though temperature forecasts are generally made as point estimates, people interpret these point estimates as temperature ranges. The temperature ranges are not even necessarily centered at the point estimates. Further, the range of temperatures increases with the forecast horizon: people (correctly) realize that forecasts made for three days later have more uncertainty attached to them than forecasts made for one day later. In other words, people's understanding of the nature of forecast uncertainty in temperatures is correct, at least in the broad qualitative sense.
The authors believe that people arrive at these correct models through their own personal history of seeing weather forecasts and evaluating how they compare with the reality. Clearly, most people don't keep close track of how forecasts compare with the reality, but they are still able to get the general idea over several years of exposure to weather forecasts. The authors also believe that since the accuracy of weather forecasts varies by region, people's models of uncertainty may also differ by region. However, the data they collect does not allow for a test of this hypothesis. For more, read Sections 3a and 3b of the paper.
Probability-of-precipitation (PoP) forecasts: The authors also look at people's perception of probability-of-precipitation (PoP) forecasts. The correct meteorological interpretation of PoP is "the probability that precipitation occurs given these meteorological conditions." The frequentist operationalization of this would be "the fraction (situations with meteorological conditions like this where precipitation does occur)/(situations with meteorological conditions like this)." To what extent are people aware of this meaning? One of the questions in the survey elicits information on this front:
TABLE 2. Responses to Q14a, the meaning of the forecast “There is a 60% chance of rain for tomorrow” (N = 1330).

| Interpretation | Share of respondents |
| --- | --- |
| It will rain tomorrow in 60% of the region. | 16% |
| It will rain tomorrow for 60% of the time. | 10% |
| It will rain on 60% of the days like tomorrow.* | 19% |
| 60% of weather forecasters believe that it will rain tomorrow. | 22% |
| I don’t know. | 9% |
| Other (please explain). | 24% |

* Technically correct interpretation, according to how PoP forecasts are verified, as interpreted by Gigerenzer et al. (2005).
So about 19% of participants chose the correct meteorological interpretation. However, of the 24% who offered other explanations, many suggested that they are not so much interested in the meteorological interpretation as in how the forecast affects their decision-making. So it might be the case that even if people aren't aware of the frequentist definition, they are still using the information approximately correctly as it applies to their lives. One such application would be a comparison with the cost-loss ratio to determine whether to engage in precautionary measures. As noted earlier in the post, people may overestimate their own cost-loss ratio, but this is a distinct problem from incorrectly interpreting the probability.
I also found the following resources that I haven't had the time to read through, but that might help people interested in exploring the issue in more detail (I'll add more to this list if I find more):
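The frequentist reading of PoP described above (the one the table marks as technically correct) can be checked against outcomes by grouping past days by the forecast issued and counting how often it rained. Here is a minimal Python sketch; the forecast history is made up for illustration, and the function name is hypothetical.

```python
from collections import defaultdict


def observed_rain_frequency(history):
    """Map each forecast PoP to the fraction of days with that forecast on which it rained.

    history -- iterable of (forecast_pop, rained) pairs, with rained a bool
    """
    counts = defaultdict(lambda: [0, 0])  # forecast -> [rainy days, total days]
    for forecast_pop, rained in history:
        counts[forecast_pop][0] += int(rained)
        counts[forecast_pop][1] += 1
    return {pop: rainy / total for pop, (rainy, total) in counts.items()}


# Made-up record: ten days with a 60% forecast (six rainy), five with 20% (one rainy).
history = ([(0.6, True)] * 6 + [(0.6, False)] * 4
           + [(0.2, True)] * 1 + [(0.2, False)] * 4)

for pop, freq in sorted(observed_rain_frequency(history).items()):
    print(f"forecast {pop:.0%}: rained on {freq:.0%} of such days")
```

If a forecaster's 60% days turn out rainy about 60% of the time, the forecasts are well calibrated in this sense. A systematic gap between the two numbers would indicate a bias in the published probabilities.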
- Completing the Forecast: Characterizing and Communicating Uncertainty for Better Decisions Using Weather and Climate Forecasts (2006), open book by the National Academies Press.
#2: Extreme rare events (usually weather-related) that require significant response
For some rare events (such as earthquakes) we don't know how to make specific predictions of their imminent arrival. But for others, such as hurricanes, cyclones, blizzards, tornadoes, and thunderstorms, specific probabilistic predictions can be made. Based on these predictions, significant action can be undertaken, ranging from everybody deciding to stock up on supplies and stay at home, to a mass evacuation. Such responses are quite costly, but the loss they would avert if the event did occur is even bigger. In the cost-loss framework discussed above, we are dealing with both a high cost and a loss that could be much higher. However, unlike the binary case discussed above, the loss spans more of a continuum: the amount of loss that would occur without precautionary measures depends on the intensity of the event. Similarly, the costs span a continuum: the cost depends on the extent of precautionary measures taken.
Since both the cost and loss are huge, it's quite important to get a good handle on the probability. But should the correct probability be communicated, or should it be massaged or simply converted to a "yes/no" statement? We discussed earlier the (alleged) problem of people overestimating their cost-loss ratio, and therefore not taking adequate precautionary measures, and how The Weather Channel addresses this by deliberately introducing a wet bias. But the stakes are much higher when we are talking about shutting down a city for a day or ordering a mass evacuation.
Another complication is that the rarity of the event means that people's own mental models haven't had a lot of data to calibrate the accuracy and reliability of forecasts. When it comes to temperature and precipitation forecasts, people have years of experience to rely on. They will not lose faith in a forecast based on a single occurrence. When it comes to rare events, even a few memories of incorrect forecasts, and the concomitant huge costs or huge losses, can lead people to be skeptical of the forecasts in the future. In The Signal and the Noise, Nate Silver extensively discusses the case of Hurricane Katrina and the dilemmas facing the mayor of New Orleans that led him to delay the evacuation of the city, and led many people to ignore the evacuation order even after it was announced.
A direct strike of a major hurricane on New Orleans had long been every weather forecaster’s worst nightmare. The city presented a perfect set of circumstances that might contribute to the death and destruction there. [...]
The National Hurricane Center nailed its forecast of Katrina; it anticipated a potential hit on the city almost five days before the levees were breached, and concluded that some version of the nightmare scenario was probable more than forty-eight hours away. Twenty or thirty years ago, this much advance warning would almost certainly not have been possible, and fewer people would have been evacuated. The Hurricane Center’s forecast, and the steady advances made in weather forecasting over the past few decades, undoubtedly saved many lives.
Not everyone listened to the forecast, however. About 80,000 New Orleanians —almost a fifth of the city’s population at the time— failed to evacuate the city, and 1,600 of them died. Surveys of the survivors found that about two-thirds of them did not think the storm would be as bad as it was. Others had been confused by a bungled evacuation order; the city’s mayor, Ray Nagin, waited almost twenty-four hours to call for a mandatory evacuation, despite pleas from Mayfield and from other public officials. Still other residents— impoverished, elderly, or disconnected from the news— could not have fled even if they had wanted to.
Silver, Nate (2012-09-27). The Signal and the Noise: Why So Many Predictions Fail-but Some Don't (pp. 109-110). Penguin Group US. Kindle Edition.
So what went wrong? Silver returns to this later in the chapter:
As Max Mayfield told Congress, he had been prepared for a storm like Katrina to hit New Orleans for most of his sixty-year life. Mayfield grew up around severe weather— in Oklahoma, the heart of Tornado Alley— and began his forecasting career in the Air Force, where people took risk very seriously and drew up battle plans to prepare for it. What took him longer to learn was how difficult it would be for the National Hurricane Center to communicate its forecasts to the general public.
“After Hurricane Hugo in 1989,” Mayfield recalled in his Oklahoma drawl, “I was talking to a behavioral scientist from Florida State. He said people don’t respond to hurricane warnings. And I was insulted. Of course they do. But I have learned that he is absolutely right. People don’t respond just to the phrase ‘hurricane warning.’ People respond to what they hear from local officials. You don’t want the forecaster or the TV anchor making decisions on when to open shelters or when to reverse lanes.”
Under Mayfield’s guidance, the National Hurricane Center began to pay much more attention to how it presented its forecasts. In contrast to most government agencies, whose Web sites look as though they haven’t been updated since the days when you got those free AOL CDs in the mail, the Hurricane Center takes great care in the design of its products, producing a series of colorful and attractive charts that convey information intuitively and accurately on everything from wind speed to storm surge.
The Hurricane Center also takes care in how it presents the uncertainty in its forecasts. “Uncertainty is the fundamental component of weather prediction,” Mayfield said. “No forecast is complete without some description of that uncertainty.” Instead of just showing a single track line for a hurricane’s predicted path, for instance, their charts prominently feature a cone of uncertainty—“ some people call it a cone of chaos,” Mayfield said. This shows the range of places where the eye of the hurricane is most likely to make landfall. Mayfield worries that even this isn’t enough. Significant impacts like flash floods (which are often more deadly than the storm itself) can occur far from the center of the storm and long after peak wind speeds have died down. No people in New York City died from Hurricane Irene in 2011 despite massive media hype surrounding the storm, but three people did from flooding in landlocked Vermont once the TV cameras were turned off.
Mayfield told Nagin that he needed to issue a mandatory evacuation order, and to do so as soon as possible.
Nagin dallied, issuing a voluntary evacuation order instead. In the Big Easy, that was code for “take it easy”; only a mandatory evacuation order would convey the full force of the threat. Most New Orleanians had not been alive when the last catastrophic storm, Hurricane Betsy, had hit the city in 1965. And those who had been, by definition, had survived it. “If I survived Hurricane Betsy, I can survive that one, too. We all ride the hurricanes, you know,” an elderly resident who stayed in the city later told public officials. Responses like these were typical. Studies from Katrina and other storms have found that having survived a hurricane makes one less likely to evacuate the next time one comes.
The reasons for Nagin’s delay in issuing the evacuation order is a matter of some dispute— he may have been concerned that hotel owners might sue the city if their business was disrupted. Either way, he did not call for a mandatory evacuation until Sunday at 11 A.M. —and by that point the residents who had not gotten the message yet were thoroughly confused. One study found that about a third of residents who declined to evacuate the city had not heard the evacuation order at all. Another third heard it but said it did not give clear instructions. Surveys of disaster victims are not always reliable— it is difficult for people to articulate why they behaved the way they did under significant emotional strain, and a small percentage of the population will say they never heard an evacuation order even when it is issued early and often. But in this case, Nagin was responsible for much of the confusion.
There is, of course, plenty of blame to go around for Katrina— certainly to FEMA in addition to Nagin. There is also credit to apportion— most people did evacuate, in part because of the Hurricane Center’s accurate forecast. Had Betsy topped the levees in 1965, before reliable hurricane forecasts were possible, the death toll would probably have been even greater than it was in Katrina. One lesson from Katrina, however, is that accuracy is the best policy for a forecaster. It is forecasting’s original sin to put politics, personal glory, or economic benefit before the truth of the forecast. Sometimes it is done with good intentions, but it always makes the forecast worse. The Hurricane Center works as hard as it can to avoid letting these things compromise its forecasts. It may not be a coincidence that, in contrast to all the forecasting failures in this book, theirs have become 350 percent more accurate in the past twenty-five years alone.
“The role of a forecaster is to produce the best forecast possible,” Mayfield says. It’s so simple— and yet forecasters in so many fields routinely get it wrong.
Silver, Nate (2012-09-27). The Signal and the Noise: Why So Many Predictions Fail-but Some Don't (pp. 138-141). Penguin Group US. Kindle Edition.
Silver notes similar failures of communication of forecast uncertainty in other domains, including exaggeration of the 1976 swine flu outbreak.
I also found a few related papers that may be worth reading if you're interested in understanding the communication of weather-related rare event forecasts:
- Communicating forecast uncertainty in hydro-meteorological forecasts by Maria-Helena Ramos, Thibault Mathevet, Jutta Thielen, and Florian Pappenberger.
- Communicating Risk and Uncertainty: Science, Technology, and Disasters at the Crossroads by Havidán Rodríguez, Walter Díaz, Jenniffer M. Santos, and Benigno E. Aguirre.
#3: Long-run changes that might necessitate policy responses or long-term mitigation or adaptation strategies, such as climate change
In marked contrast to daily weather forecasting as well as extreme rare event forecasting is the forecasting of gradual long-term structural changes. Examples include climate change, economic growth, changes in the size and composition of the population, and technological progress. Here, the general recommendation is clear and detailed communication of uncertainty using multiple formats, with the format tailored to the types of decisions that will be based on the information.
On the subject of communicating uncertainty in climate change, I found the paper Communicating uncertainty: lessons learned and suggestions for climate change assessment by Anthony Patt and Suraje Dessai. The paper is quite interesting (and has been referenced by some of the other papers mentioned in this post).
The paper identifies three general sources of uncertainty:
- Epistemic uncertainty arises from incomplete knowledge of processes that influence events.
- Natural stochastic uncertainty refers to the chaotic nature of the underlying system (in this case, the climate system).
- Human reflexive uncertainty refers to uncertainty in human activity that could affect the system. Some of the activity may be undertaken specifically in response to the forecast.
This is somewhat similar to, but not directly mappable to, the classification of sources of uncertainty by Gavin Schmidt from NASA that I discussed in my post on weather and climate forecasting:
- Initial condition uncertainty: This form of uncertainty dominates short-term weather forecasts (though not necessarily the very short term weather forecasts; it seems to matter the most for intervals where numerical weather prediction gets too uncertain but long-run equilibrating factors haven't kicked in). Over timescales of several years, this form of uncertainty is not influential.
- Scenario uncertainty: This is uncertainty that arises from lack of knowledge of how some variable (such as carbon dioxide levels in the atmosphere, or levels of solar radiation, or aerosol levels in the atmosphere, or land use patterns) will change over time. Scenario uncertainty rises over time, i.e., scenario uncertainty plagues long-run climate forecasts far more than it plagues short-run climate forecasts.
- Structural uncertainty: This is uncertainty that is inherent to the climate models themselves. Structural uncertainty is problematic at all time scales to a roughly similar degree (some forms of structural uncertainty affect the short run more whereas some affect the long run more).
Section 2 of the paper has a general discussion of interpreting and communicating probabilities. One of the general points made is that the more extreme the event, the lower people's mental probability threshold for verbal descriptions of likelihood. For instance, for a serious disease, the probability threshold for "very likely" may be 30%, whereas for a minor ailment, it may be 90% (these numbers are my own, not from the paper). The authors also discuss the distinction between frequentist and Bayesian approaches and claim that the frequentist approach is better suited to assimilating multiple pieces of information, and therefore, frequentist framings should be preferred to Bayesian framings when communicating uncertainty:
As should already be evident, whether the task of estimating and responding to uncertainty is framed in stochastic (usually frequentist) or epistemic (often Bayesian) terms can strongly influence which heuristics people use, and likewise lead to different choice outcomes. Framing in frequentist terms on the one hand promotes the availability heuristic, and on the other hand promotes the simple acts of multiplying, dividing, and counting. Framing in Bayesian terms, by contrast, promotes the representativeness heuristic, which is not well adapted to combining multiple pieces of information. In one experiment, people were given the problem of estimating the chances that a person has a rare disease, given a positive result from a test that sometimes generates false positives. When people were given the problem framed in terms of a single patient receiving the diagnostic test, and the base probabilities of the disease (e.g., 0.001) and the reliability of the test (e.g., 0.95), they significantly over-estimate the chances that the person has the disease (e.g., saying there is a 95% chance). But when people were given the same problem framed in terms of one thousand patients being tested, and the same probabilities for the disease and the test reliability, they resorted to counting patients, and typically arrived at the correct answer (in this case, about 2%). It has, indeed, been speculated that the gross errors at probability estimation, and indeed errors of logic, observed in the literature take place primarily when people are operating within the Bayesian probability framework, and that these disappear when people evaluate problems in frequentist terms [23,58].
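The disease-test example in the excerpt can be worked through both ways in Python. This sketch assumes that a test "reliability" of 0.95 means a 95% true-positive rate and a 5% false-positive rate, which is my reading rather than something the excerpt states; the function names are hypothetical.

```python
def posterior_single_patient(prevalence, sensitivity, false_positive_rate):
    """P(disease | positive test) via Bayes' rule, framed for a single patient."""
    true_positive = prevalence * sensitivity
    false_positive = (1 - prevalence) * false_positive_rate
    return true_positive / (true_positive + false_positive)


def posterior_by_counting(n_patients, prevalence, sensitivity, false_positive_rate):
    """Same quantity, framed as counting patients in a tested population."""
    sick = n_patients * prevalence
    healthy = n_patients - sick
    sick_and_positive = sick * sensitivity
    healthy_and_positive = healthy * false_positive_rate
    return sick_and_positive / (sick_and_positive + healthy_and_positive)


# Values from the excerpt: base rate 0.001, test reliability 0.95.
# The 0.05 false-positive rate is the assumption described above.
p_single = posterior_single_patient(0.001, 0.95, 0.05)
p_counting = posterior_by_counting(1000, 0.001, 0.95, 0.05)
print(f"single-patient framing: {p_single:.3f}")
print(f"counting framing:       {p_counting:.3f}")
```

Both framings give a little under 2%: of 1,000 patients, about one has the disease, while roughly fifty healthy patients also test positive. The arithmetic is identical; what the excerpt claims is that the counting framing makes it easier for people to carry out.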
The authors offer the following suggestions in the discussion section (Section 4) of their paper:
The challenge of communicating probabilistic information so that it will be used, and used appropriately, by decision-makers has been long recognized. [...] In some cases, the heuristics that people use are not well suited to the particular problem that they are solving or decision that they are making; this is especially likely for types of problems outside their normal experience. In such cases, the onus is on the communicators of the probabilistic information to help people find better ways of using the information, in such a manner that respects the users’ autonomy, full set of concerns and goals, and cognitive perspective.
That these difficulties appear to be most pronounced when dealing with predictions of one-time events, where the probability estimates result from a lack of complete confidence in the predictive models. When people speak about such epistemic or structural uncertainty, they are far more likely to shun quantitative descriptions, and are far less likely to combine separate pieces of information in ways that are mathematically correct. Moreover, people perceive decisions that involve structural uncertainty as riskier, and will take decisions that are more risk averse. By contrast, when uncertainty results from well-understood stochastic processes, for which the probability estimate results from counting of relative frequencies, people are more likely to work effectively with multiple pieces of information, and to take decisions that are more risk neutral.
In many ways, the most recent approach of the IPCC WGI responds to these issues. Most of the uncertainties with respect to climate change science are in fact epistemic or structural, and the probability estimates of experts reflect degrees of confidence in the occurrence of one-time events, rather than measurement of relative frequencies in relevant data sets. Using probability language, rather than numerical ranges, matches people’s cognitive framework, and will likely make the information both easier to understand, and more likely to be used. Moreover, defining the words in terms of specific numerical ranges ensures consistency within the report, and does allow comparison of multiple events, for which the uncertainty may derive from different sources.
We have already mentioned the importance of target audiences in communicating uncertainties, but this cannot be emphasized enough. The IPCC reports have a wide readership so a pluralistic approach is necessary. For example, because of its degree of sophistication, the water chapter could communicate uncertainties using numbers, whereas the regional chapters might use words and the adaptive capacity chapter could use narratives. “Careful design of communication and reporting should be done in order to avoid information divide, misunderstandings, and misinterpretations. The communication of uncertainty should be understandable by the audience. There should be clear guidelines to facilitate clear and consistent use of terms provided. Values should be made explicit in the reporting process”.
However, by writing the assessment in terms of people’s intuitive framework, the IPCC authors need to understand that this intuitive framework carries with it several predictable biases. [...]
The literature suggests, and the two experiments discussed here further confirm, that the approach of the IPCC leaves room for improvement. Further, as the literature suggests, there is no single solution for these potential problems, but there are communication practices that could help. [...]
Finally, the use of probability language, instead of numbers, addresses only some of the challenges in uncertainty communication that have been identified in the modern decision support literature. Most importantly, it is important in the communication process to address how the information can and should be used, using heuristics that are appropriate for the particular decisions. [...] Obviously, there are limits to the length of the report, but within the balancing act of conciseness and clarity, greater attention to full dimensions of uncertainty could likely increase the chances that users will decide to take action on the basis of the new information.