As a part of my work for MIRI on the "Can we know what to do about AI?" project, I read Nate Silver's book The Signal and the Noise: Why So Many Predictions Fail — but Some Don't. I compiled a list of the takeaway points that I found most relevant to the project. I think that they might be of independent interest to the Less Wrong community, and so am posting them here.

Because I've paraphrased Silver rather than quoting him, and because the summary is long, there may be places where I've inadvertently misrepresented Silver. A reader who's especially interested in a point should check the original text. 

Main Points

  • The deluge of data available in the modern world has exacerbated the problem of people perceiving patterns where none exist, and overfitting predictive models to past data.
  • Because of the risk of overfitting a model to past data, using a simple model can give more accurate results than a more refined model does.
  • A major reason that predictions fail is that predictors often don't take model uncertainty into account. Looking at a situation from multiple different angles can be a guard against failure to give adequate weight to model uncertainty. 
  • Averaging different perspectives often yields better predictive results than relying on a single perspective.
  • Humans have a very strong tendency toward being overconfident when making predictions.
  • People make better predictions in domains where they have tight feedback loops to use to test their hypotheses.
  • Sometimes people's failure to make good predictions is the result of perverse incentives. 

Chapter Summaries

Introduction

Increased access to information can do more harm than good. This is because the more information is available, the easier it is for people to cherry-pick information that supports their pre-existing positions, or to perceive patterns where there are none.
The invention of the printing press may have given rise to religious wars on account of facilitating the development of ideological agendas.

Chapter 1: The failure to predict the 2008 housing bubble and recession

  • People failed to take model uncertainty into account. In particular, they shouldn't have taken the forecasted 0.12% default rate of mortgage securities at face value. This rate corresponded to the rating agencies giving mortgage securities AAA ratings, which are usually reserved only for the world's most solvent governments and best-run businesses.
  • Some of the actors involved failed to look at the situation from many different angles. For example, the fact that the increase in housing prices wasn't driven by a change in fundamentals seems to have been overlooked by some people.
  • Each individual factor that contributed to the housing bubble and the recession seems like a common occurrence (e.g. perverse incentives, inadequate regulation, neglect of tail risk, and irrational consumer behavior). The severity of the situation seems to have come from all of the factors being present simultaneously (by chance). Any individual factor would ordinarily be offset by other safeguards built into our social institutions.

Chapter 2: Political Predictions

  • Political pundits and political experts usually don't do much better than chance when forecasting political events, and usually do worse than crude statistical models.
  • Averaging individual experts' forecasts gives better forecasts than the forecasts of the average individual, with the effect size being about 15-20% (a toy simulation illustrating why averaging helps appears after this list).
  • There are some experts who do make predictions that are substantially more accurate than chance.
  • The experts who do better tend to be multidisciplinary, pursue multiple approaches to forecasting at the same time, be willing to change their minds, offer probabilistic predictions, and rely more on observation than on theory.
  • Making definitive predictions that fall into a pre-existing narrative is associated with political partisanship. It's negatively correlated with making accurate predictions, but positively correlated with getting media attention. So the most visible people may make systematically worse predictions than less visible people.
  • The failure to predict the fall of the Soviet Union seems to have arisen from a failure to integrate multiple perspectives. There were some people who were aware of Gorbachev's progressiveness and other people who recognized the dysfunctionality of the Soviet Union's economy, but these groups were largely nonoverlapping.
  • Nate Silver integrates poll data, historical track record of poll data, information about the economy and information about the demographics of states, in order to make predictions about political elections.
  • There's an organization called the Cook Political Report that has a very impressive track record of making accurate predictions about how political elections will go.
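
To illustrate why averaging helps (a toy simulation of my own, not from the book; the panel size, noise level, and bias spread are made-up parameters), here is a sketch in which each simulated expert's forecast is the truth plus an individual bias and some noise. The averaged forecast's error is typically much smaller than the typical individual's error:

```python
import numpy as np

rng = np.random.default_rng(0)

n_trials = 10_000   # number of simulated forecasting questions
n_experts = 20      # hypothetical panel size (made-up)
truth = 0.0         # true value of the quantity being forecast

# Each expert has an idiosyncratic bias plus per-question noise (made-up scales).
biases = rng.normal(0.0, 1.0, size=n_experts)
noise = rng.normal(0.0, 1.0, size=(n_trials, n_experts))
forecasts = truth + biases + noise   # shape: (n_trials, n_experts)

individual_error = np.abs(forecasts - truth).mean()              # typical single-expert error
aggregate_error = np.abs(forecasts.mean(axis=1) - truth).mean()  # error of the averaged forecast

print(f"average individual error:   {individual_error:.3f}")
print(f"error of averaged forecast: {aggregate_error:.3f}")
```

In this toy setup the experts' errors are independent and so largely cancel; real experts' errors are correlated, which is one reason the improvement Silver reports is a modest 15-20% rather than the dramatic gain seen here.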

Chapter 3: Baseball predictions

  • Baseball statistics constitute a very rich collection of data, and people who aspire to predict how well players will play in the future have rapid feedback loops that allow them to repeatedly test the validity of their hypotheses.
  • A simple model of how a baseball player's performance varies with age outperformed a much more complicated model that tried to form a more nuanced picture by dividing players into different classes. This may have been because the latter model was over-fitted to the existing data.
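
As a sketch of the overfitting point (my own illustration; the "aging curve" data below is synthetic, not real baseball data), here is a comparison of a simple and a needlessly flexible model fitted to noisy performance-vs-age data. On held-out players, the more complex model typically does worse:

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate_split():
    """Synthetic stand-in for an aging curve (NOT real baseball data):
    performance peaks around age 28, plus noise. Returns a train/test split."""
    ages = rng.uniform(20, 40, size=60)
    perf = -0.02 * (ages - 28) ** 2 + rng.normal(0, 0.15, size=ages.size)
    x = (ages - 30) / 10  # rescale ages so the polynomial fits are well-conditioned
    return x[:40], perf[:40], x[40:], perf[40:]

def held_out_error(degree, n_trials=200):
    """Average squared error on held-out players for a polynomial of this degree."""
    errors = []
    for _ in range(n_trials):
        x_tr, y_tr, x_te, y_te = simulate_split()
        coeffs = np.polyfit(x_tr, y_tr, degree)
        errors.append(np.mean((np.polyval(coeffs, x_te) - y_te) ** 2))
    return np.mean(errors)

print(f"simple model (degree 2) held-out error:  {held_out_error(2):.4f}")
print(f"complex model (degree 9) held-out error: {held_out_error(9):.4f}")
```

The flexible model fits the training players a little better but chases noise, so its predictions for unseen players are worse — the same failure mode Silver attributes to the more elaborate aging model.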

Chapter 4: Weather Predictions

  • Weather forecasters have access to a large amount of data, which offers them rapid feedback loops that allow them to repeatedly test their hypotheses.
  • The method of simulating what would happen under many slightly different initial conditions and then averaging over the results (ensemble forecasting) is tantalizing. It suggests the possibility of reducing uncertainty in situations that seem hopelessly complicated to analyze, by averaging over the predictions made under different assumptions (see the sketch after this list).
  • The weather experts are impressively well calibrated: when they say there's a 70% chance of rain, it rains roughly 70% of the time.
  • Local news networks sacrifice accuracy and honesty to optimize for viewer satisfaction.
  • The integrated use of computer models and human judgment calls does notably better than computer models alone.
  • The human input is getting better over time.
  • Hurricane Katrina wasn't appropriately addressed because the local government didn't listen to the weather forecasters early enough, and the local people didn't take the hurricane warning sufficiently seriously.
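
A minimal sketch of the ensemble idea (my own illustration, not taken from the book; the logistic map is a toy stand-in for a weather model, and all parameters are invented): run the same model from many slightly perturbed initial conditions, then look at the spread of outcomes instead of trusting a single run.

```python
import numpy as np

rng = np.random.default_rng(1)

def step(x, r=3.9):
    """One step of the logistic map -- a toy stand-in for a chaotic forecast model."""
    return r * x * (1 - x)

def forecast(x0, steps=30):
    """Run the toy model forward from an initial condition."""
    x = x0
    for _ in range(steps):
        x = step(x)
    return x

observed = 0.42                                     # hypothetical measured initial state
members = observed + rng.normal(0, 1e-3, size=500)  # slightly perturbed initial conditions
members = np.clip(members, 0.0, 1.0)

outcomes = np.array([forecast(x0) for x0 in members])

print(f"single deterministic forecast: {forecast(observed):.3f}")
print(f"ensemble mean: {outcomes.mean():.3f}, ensemble spread (std): {outcomes.std():.3f}")
```

The single run produces a misleadingly precise number; the ensemble spread makes the sensitivity to initial conditions, and hence the real uncertainty, visible.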

Chapter 5: Earthquake predictions

  • The Gutenberg-Richter law predicts the frequency of earthquakes of a given magnitude in a given location: roughly, each one-unit increase in magnitude corresponds to a tenfold drop in frequency. One can therefore use the observed frequency of smaller earthquakes to predict the frequency of larger ones, even without many data points (a small extrapolation sketch follows this list).
  • Efforts to build models that offer more precise predictions than the Gutenberg-Richter law does have been unsuccessful, apparently owing to overfitting existing data, and have generally done worse than the Gutenberg-Richter law.
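
A minimal sketch of that extrapolation (my own illustration; the earthquake counts below are invented placeholders, not figures from the book): fit the log-linear Gutenberg-Richter relationship log10 N = a − bM to counts of smaller quakes and read off the implied frequency of larger ones.

```python
import numpy as np

# Hypothetical annual counts of earthquakes at or above each magnitude in some
# region (placeholder numbers for illustration only, not real data).
magnitudes = np.array([4.0, 4.5, 5.0, 5.5, 6.0])
counts_per_year = np.array([120.0, 40.0, 13.0, 4.0, 1.3])

# Gutenberg-Richter: log10(N) = a - b * M, so fit a straight line to (M, log10 N).
slope, a = np.polyfit(magnitudes, np.log10(counts_per_year), 1)
b = -slope

def expected_annual_count(magnitude):
    """Extrapolated annual frequency of quakes at or above the given magnitude."""
    return 10 ** (a - b * magnitude)

m7_rate = expected_annual_count(7.0)
print(f"fitted b-value: {b:.2f}")
print(f"expected magnitude-7+ quakes per year: {m7_rate:.4f}")
print(f"i.e. roughly one every {1 / m7_rate:.0f} years")
```

Silver's point is that this crude log-linear fit has held up better than more elaborate models tuned to local historical data.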

Chapter 6: Economic predictions

  • Communicating a prediction of the median case without giving a confidence interval can be very pernicious, because outcomes can be highly sensitive to error.
  • Economists have a poor track record of predicting GDP growth. There's so much data pertaining to factors that might drive GDP growth that it's easy to perceive patterns that aren't real.
  • The economy is always changing, and often past patterns don't predict the future patterns.
  • Prediction markets for GDP growth might yield better predictions than economists' forecasts do. But existing prediction markets aren't very good.

Chapter 7: Disease Outbreaks

  • Predictions can be self-fulfilling (e.g. in election primaries) or self-canceling (e.g. when disease outbreaks are predicted, measures can be taken to prevent them, which can nullify the prediction).

Chapter 8: Bayes' Theorem

  • When gauging the strength of a prediction, it's important to view the inside view in the context of the outside view. For example, most medical studies that claim 95% confidence aren't replicable, so one shouldn't take the 95% confidence figures at face value.
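
A worked Bayesian sketch of why the outside view matters here (my own numbers; the prior, significance threshold, and statistical power are assumed values, not figures from the book): if only a small fraction of tested hypotheses are true, then even a finding that clears the 95% bar is quite likely to be a false positive.

```python
# Probability that a "statistically significant" finding is actually true,
# under assumed values for the prior, false-positive rate, and statistical power.
prior = 0.10    # assumption: only 10% of tested hypotheses are true
alpha = 0.05    # false-positive rate implied by "95% confidence"
power = 0.80    # assumption: 80% chance of detecting a true effect

p_significant = prior * power + (1 - prior) * alpha
p_true_given_significant = (prior * power) / p_significant

print(f"P(significant result)                   = {p_significant:.3f}")
print(f"P(hypothesis true | significant result) = {p_true_given_significant:.3f}")
# With these assumptions only about 64% of "95% confident" positive findings are
# real; drop the prior to 5% and the figure falls to roughly 46%.
```

The 95% figure describes the test, not the probability that the finding is true; that is the inside-view/outside-view distinction in this chapter.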

Chapter 9: Chess computers

  • Our use of prediction heuristics makes us vulnerable to opponents who are aware of the heuristics that we're using and who can therefore act in unexpected ways that we're not prepared for.

Chapter 10: Poker

  • Elite poker players use Bayesian reasoning to estimate the probability that an opponent holds a given hand, based on the cards on the table and on the opponent's behavior.
  • Elite poker players also use additional information, such as the fact that women tend to play more conservatively than men do, to refine their predictions about what cards an opponent holds.
  • The 80/20 rule often applies to getting good at prediction relative to what's possible in principle: a relatively small amount of effort can yield large improvements. In competitive contexts such as poker, serious players have all already put that effort in, so beating them requires much more. But in arenas such as election forecasting, where not many people are trying hard, it's possible to do a lot better than most people with relatively little effort.

Chapter 11: The stock market

  • It's difficult to distinguish signal from noise when attempting to make predictions about the stock market.
  • There are some systematic patterns in the stock market. For example, between 1965 and 1975, rises in stock prices one day were correlated with rises in stock prices the next day. But such patterns are rapidly exploited once people recognize them, and disappear.
  • It's not so hard to identify a stock market bubble: one can look at the average price-to-earnings ratio across all stocks, and when it's sufficiently high, that's a signal that there's a bubble.
  • It's hard to predict when a bubble is about to pop.
  • Most investors are relatively shortsighted. This is especially the case because most investors are investing other people's money rather than their own.
  • There are incentives not to short-sell stocks too much, both cultural and legal. This may give rise to a market inefficiency.
  • A 1970 investment of $10k in the S&P 500 would have yielded $63k in profit by 2009, but under the strategy of pulling money out whenever the market dropped by 25% and putting it back in once it had recovered to 90% of its earlier price, the profit would have been only $18k. Many investors behave in the latter fashion.

Chapter 12: Climate change

  • There's a lot of uncertainty around climate change predictions: there's uncertainty about the climate models, uncertainty about the initial conditions, and uncertainty about society's ability to adapt.
  • There may be some global cooling effect from sulfur emissions.
  • The amount of uncertainty can easily justify a focus on mitigating climate change, because the possibility that the problem is worse than expected carries more potential negative consequences than the median case does.
  • A simple regression analysis looking at the correlation between CO2 levels and temperature may give a better predictive model than more sophisticated climate models.
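
A minimal sketch of what such a regression looks like (my own illustration; the data below is synthetic and labeled as such, used only to show the shape of the analysis, not to make any claim about actual CO2 levels or temperatures):

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic stand-in data (NOT real measurements): a made-up linear CO2-temperature
# relationship plus noise, used only to show the shape of the analysis.
co2_ppm = np.linspace(320, 400, 50)
assumed_slope = 0.01  # made-up degrees C per ppm, for the synthetic data only
temp_anomaly = assumed_slope * (co2_ppm - 320) + rng.normal(0, 0.1, size=co2_ppm.size)

# The "simple regression" approach: fit a straight line and project it forward.
slope, intercept = np.polyfit(co2_ppm, temp_anomaly, 1)
projected_anomaly_450 = slope * 450 + intercept

print(f"fitted slope: {slope:.4f} degrees C per ppm")
print(f"implied anomaly at 450 ppm (relative to the synthetic baseline): {projected_anomaly_450:.2f} C")
```

The appeal of this approach, on Silver's account, is its resistance to overfitting: a two-parameter line has far less room to chase noise than a sophisticated climate model does.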

Chapter 13: Terrorism

  • Governments often prepare for terrorist attacks, but often prepare for the wrong kinds of terrorist attacks, unaware of bigger threats.
  • The September 11th scenario hadn't been considered and rejected, but rather, hadn't been considered at all.
  • If one looks at number of terrorist attacks as a function of their magnitude, they seem to obey a power law.
  • There are some reasons to be concerned about the possibility of a terrorist attack with a nuclear weapon, or a bioterrorist attack, in the United States. Such an attack could kill over a million people.

Comments

Political pundits and political experts usually don't do much better than chance when forecasting political events, and usually do worse than crude statistical models.

An important caveat to this, though I can't recall whether Silver mentions it: Tetlock (2010) reminds people that he never said experts are "as good as dart-throwing chimps." Experts do far better than chance (or a linear prediction rule) when the comparison is over the space of all physically possible hypotheses, rather than over a pre-selected list of already-plausible answers. For example, the layperson doesn't even know who is up for election in Myanmar; the expert knows who the plausible candidates are. But once the list is narrowed to just the plausible candidates, it's hard for experts to out-perform a coin flip or an LPR by very much.

One interesting fact from Chapter 4 (on weather predictions) that seems worth mentioning: Weather forecasters are also very good at manually and intuitively (i.e. without some rigorous mathematical method) fixing the predictions of their models. E.g. they might know that model A always predicts rain a hundred miles or so too far west of the Rocky Mountains. So to fix this, they take the computer output and manually redraw the lines (demarcating level sets of precipitation) about a hundred miles east, and this significantly improves their forecasts.

Also: the national weather service gives the most accurate weather predictions. Everyone else exaggerates to a greater or lesser degree in order to avoid getting flak from consumers about, e.g., rain on their wedding day (a forecast of rain that never materializes is far less of a problem).

A major reason that predictions fail is model uncertainty into account.

This does not seem to be a proper sentence. Some words missing, yes?

Thanks, fixed.

The excerpts I posted from the book may be of interest:

When gauging the strength of a prediction, it's important to view the inside view in the context of the outside view. For example, most medical studies that claim 95% confidence aren't replicable, so one shouldn't take the 95% confidence figures at face value.

This implies that the average prior for a medical study is below 5%. Does he make that point in the book? Obviously you shouldn't use a 95% test when your prior is that low, but I don't think most experimenters actually know why a 95% confidence level is used.

Does the bit on Gorbachev contain any references to Timur Kuran's work on preference falsification & cascades?

Not that I recall, and searching the text for Kuran nets nothing.

The invention of the printing press may have given rise to religious wars on account of facilitating the development of ideological agendas.

Uh, what? Is this meant to suggest that there were no religious wars before the printing press?


Nope, just ambiguity in English. You read "may have given rise to [all/most] religious wars" when he meant "may have given rise to [some] religious wars."

In general, though, it's exhausting to constantly attempt to write things in a way that minimizes uncharitable interpretations, so readers have an obligation not to jump to conclusions.

Ah, I see the ambiguity now. My mistake; thank you for the explanation!

EDIT: You know, I keep getting tripped up by this sort of thing. I don't know if it's because English isn't my first language (although I've known it for over two decades), or if it's just a general failing. Anyway, correction accepted.