Most models that attempt to estimate or predict some element of the world come with their own estimates of uncertainty. It could be the Standard Model of physics predicting the mass of the Z boson as 91.1874 ± 0.0021 GeV, or the rather wider uncertainty ranges of economic predictions.

In many cases, though, the uncertainty about the model itself dwarfs the uncertainty estimated within the model - especially for low probability events. This is a problem, because people working with models often try to take the in-model uncertainty and adjust it to get an estimate of the true uncertainty. They often realise the model is unreliable, but don't have a better one, and they have a measure of uncertainty already, so surely doubling or tripling this should do the trick? Surely...

The following three cases are going to be my go-to examples for showing what a mistake this can be; they cover three situations: extreme error, being in the domain of a hard science, and extreme negative impact.

## Black Monday

On October 19, 1987, the world's stock markets crashed, shedding a huge amount of value in a very short time. The Dow Jones Industrial Average dropped by 22.61% that day, losing between a fifth and a quarter of its value.

How likely was such an event, according to the prevailing models of the time? This was apparently a 20-sigma event, which means that the event was twenty standard deviations away from the expected behaviour.

Such events have a probability of around 10^{-50} of happening, which in technical mathematical terms is classified as "very small indeed". If every day were compressed into a second, and the stock markets had been running since the big bang... this gives us only about 10^{17} seconds. If every star in the observable universe ran its own stock market, and every day were compressed into a second, and we waited a billion times the lifetime of the universe... then we might expect to observe a twenty sigma event. Once.
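As a rough check on the scale of these numbers, here is a minimal Python sketch. It assumes daily moves are exactly Gaussian (the post does not name the model, so this is an assumption), and the star count of 10^24 and the function name are illustrative choices, not figures from the text.

```python
import math

def gaussian_upper_tail(k: float) -> float:
    """P(Z > k) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(k / math.sqrt(2.0))

# Tail probability of a 20-sigma daily move under the assumed Gaussian model.
p_20_sigma = gaussian_upper_tail(20.0)

# Number of "market days" in the thought experiment.
seconds_since_big_bang = 1e17  # figure quoted in the text
stars = 1e24                   # assumed order of magnitude, not from the text
lifetimes = 1e9                # "a billion times the lifetime of the universe"
trials = seconds_since_big_bang * stars * lifetimes

print(f"P(Z > 20) under a Gaussian: {p_20_sigma:.2e}")
print(f"Trials in the thought experiment: {trials:.0e}")
print(f"Expected 20-sigma days at p = 1e-50: {trials * 1e-50:.1f}")
```

Under the strict Gaussian assumption the tail comes out far smaller still than 10^{-50}, so the figure quoted here is, if anything, generous to the model.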

No amount of reasonable "adjusting" of the probability could bring 10^{-50} into something plausible. What is interesting, though, is that if we took the standard deviation as the measure of uncertainty, the adjustment is much smaller. One day in a hundred years is a roughly 3×10^{-5} event, which corresponds very roughly to three or four standard deviations (closer to four under a normal distribution). So "simply" multiplying the standard deviation by five to seven would have been enough. It seems that adjusting ranges is more effective than adjusting probabilities.
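The rescaling arithmetic can be sketched as follows, again assuming a Gaussian model and a one-tailed reading of "one day in a hundred years". The helper name `sigma_for_tail` is hypothetical.

```python
from statistics import NormalDist

def sigma_for_tail(p: float) -> float:
    """Return z such that P(Z > z) = p for a standard normal Z (one-tailed)."""
    return -NormalDist().inv_cdf(p)

p_once_per_century = 3e-5  # figure quoted in the text
observed_sigmas = 20.0     # Black Monday under the original model

# How many standard deviations a once-per-century day corresponds to,
# and how much the standard deviation would need to grow to make
# Black Monday such a day.
z_plausible = sigma_for_tail(p_once_per_century)
multiplier = observed_sigmas / z_plausible

print(f"z for a {p_once_per_century:.0e} event: {z_plausible:.2f}")
print(f"Required multiplier on the standard deviation: {multiplier:.1f}")
```

Under these assumptions the z-score lands close to four and the multiplier close to five. Either way, a single-digit factor on the range does the job that no sane factor on the probability could.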

## Castle Bravo

But economics is a soft science. Surely errors couldn't occur in harder sciences, like physics? In nuclear bomb physics, where the US had access to the best brains and the best simulations, and some of the best physical evidence (and certainly a very high motivation to get it right), such errors could not occur? OK, maybe at the very beginning, but by 1954, the predictions must have been very accurate?

The Castle Bravo hydrogen bomb was the highest yield nuclear bomb ever detonated by the United States (though not necessarily intended to be). The yield was predicted to be 6 megatons of TNT, within a maximum range of 4 to 8 megatons. It ended up being an explosion of 15 megatons of TNT, two and a half times the expectation, with fallout landing on inhabited islands (on the Rongelap and Utirik atolls) and spreading across the world, killing at least one person (a Japanese fisherman).

What went wrong? The bomb designers considered that the lithium-7 isotope used in the bomb was essentially inert, when in fact it... wasn't. And that was sufficient to push the yield to two and a half times the prediction, far beyond what the designers had considered possible.

Again, extending the range would have been more successful than adjusting the probabilities of extreme outcomes. And even the hard sciences are susceptible to great errors.

## Physician, wash thyself

The previous two cases are good examples of dramatic underestimates of uncertainty by the model's internal measure of uncertainty. They are especially good because we have numerical estimates of the internal uncertainty. But they lack one useful rhetorical component: evidence of a large scale disaster. Now, there are lots of models to choose from which dramatically underestimated the likelihood of disaster, but I'll go with one problem in particular: the practice that doctors used to have of not washing their hands.

Ignaz Semmelweis noted in 1847 that women giving birth in the presence of doctors died at about twice the rate of those attended only by midwives. He correctly deduced that doctors were importing something from their work on cadavers, and that this was causing the women to die. He instituted a policy of using a solution of chlorinated lime for washing hands between autopsy work and the examination of patients - with great success, even reducing the death rate to zero in some months.

This caused a problem. Semmelweis was unable to convince other doctors of this, owing to a variety of standard biases. But the greatest flaw was that he had no explanation for the effect. Yes, he could point at graphs and show improvements - but there was nothing in standard medical theory that could account for it. The other doctors could play around with their models or theories for days, but they could never explain this type of result.

His claims, in effect, lacked scientific basis. So the other doctors ignored them. Until Pasteur came up with a new model, there was just no way to understand these odd results.

The moral of this is that the true uncertainty can be not only much greater than the model's own estimate. Sometimes the uncertainty isn't even visible anywhere in the model.

And sometimes making these kinds of mistakes can lead to millions of unnecessary deaths, and the decisions made (using doctors for childbirths) have the exact opposite effect to what was intended.

The "model" you never name for the stock price distribution is a normal or Gaussian distribution. You point out that this model fails spectacularly to predict the highest-variability point, the 20 sigma event. What you don't point out is that this model fails to predict even the 5 sigma events.

Looking at the plot of "Daily Changes in the Dow," we see it is plotted over fewer than 95 years. Each trading year has a little less than 260 trading days in it. So at most about 24,000 daily changes should be plotted. For a normal distribution, a 1/24000 event, the biggest event we would expect to happen once on this whole graph, would be somewhere between 4 and 4.5 sigma.

But instead of seeing one, or close to one, event of 4.5 sigma or higher in 24,000 daily changes, we see about 28 by my rough count looking at the graph. The data starts in 1915. By 1922 we have seen 5 "1 in 100 years" events.
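The expected-count arithmetic in this comment can be checked with a short sketch. It assumes daily changes are independent Gaussian draws and counts moves in either direction. The variable names are illustrative, and the figure of 24,000 days is the comment's own upper bound.

```python
import math
from statistics import NormalDist

def two_sided_tail(k: float) -> float:
    """P(|Z| >= k) for a standard normal Z."""
    return math.erfc(k / math.sqrt(2.0))

n_days = 24_000  # upper bound on plotted daily changes, from the comment

# Expected number of moves of 4.5 sigma or more if daily changes were Gaussian.
expected_big_moves = n_days * two_sided_tail(4.5)

# Threshold a Gaussian would exceed (in either direction) about once in n_days.
z_once = -NormalDist().inv_cdf(0.5 / n_days)

print(f"Expected |move| >= 4.5 sigma in {n_days} days: {expected_big_moves:.2f}")
print(f"Once-per-sample threshold: {z_once:.2f} sigma")
```

With the Gaussian expectation well below one such move, the rough count of 28 is the striking part.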

My point being: by 1922, we know the model that daily changes in the stock market fit a normal or Gaussian distribution is just pure crap. We don't need to wait 70 years for a 20 sigma event to know the model isn't just wrong, it is stupid. We suspect this fact within the firs...

"Confidence levels inside and outside an argument" seems related.

That's just a complicated way of saying "the model was wrong".

Um... it's not that easy. If your model breaks down in a pretty spectacular fashion, you don't get to recover by inventing a multiple for your standard deviation. In the particular case of the stock markets, one way would be to posit a heavy-tailed underlying distribution, and if it's sufficiently heavy-tailed, the standard deviation isn't even defined. Anoth...

Isn't this more or less what mixture models were made for?

Re Black Monday, and possibly Castle Bravo, this site devoted to Managing the Unexpected seems to have some relevant research and recommendations.

And I don't think that your last example is in the same category of uncertain models with certain predictions.

No, you cannot infer a probability just from an SD. You also need to know what type of distribution it is. You're implicitly assuming a normal distribution, but everyone knows asset price returns have negative skew and excess kurtosis.

You could easily correct this by adding "If you use a normal distribution...".
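To illustrate how little a standard deviation pins down on its own, here is a small sketch. It compares Chebyshev's inequality, a standard result that holds for any distribution with finite variance and is not something the comment itself cites, with the tail you get once you additionally assume normality. The function names are hypothetical.

```python
import math

def chebyshev_bound(k: float) -> float:
    """Upper bound on P(|X - mean| >= k * sd) for any finite-variance distribution."""
    return min(1.0, 1.0 / k**2)

def gaussian_two_sided(k: float) -> float:
    """P(|Z| >= k) if the distribution is additionally assumed to be normal."""
    return math.erfc(k / math.sqrt(2.0))

for k in (3, 5, 20):
    print(
        f"k={k:>2}: Chebyshev bound {chebyshev_bound(k):.2e}, "
        f"Gaussian {gaussian_two_sided(k):.2e}"
    )
```

Knowing only the SD, a 20-sigma move can be bounded at no better than 1 in 400. The astronomically small figure comes entirely from the distributional assumption.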

It seems that (a generalization of) equation 1 in the paper by Toby Ord, http://arxiv.org/abs/0810.5515, linked in the comments to Yvain's post, is something like what you are looking for.

On October 18, 1987, what sort of model of the uncertainty of models would one have to have to say the uncertainty over the 20-sigma estimate was enough to allow it to be 3-sigma? 20-sigma, give 120 or take 17? Seems a bit extreme, and maybe not useful.