This post was written in connection with work I'm doing for the Machine Intelligence Research Institute (MIRI), but it hasn't been formally vetted by MIRI. I'm posting it to LessWrong because it may be of interest to a segment of the LessWrong readership.
In order to assess the quality of current and future forecasts, it's important to consider the historical record of forecasting. Doing such a historical evaluation typically requires a systematic record of past forecasts. The set of forecasts may be:
Forecasts may be evaluated for any of these (for a longer discussion, see here):
The forecasts could also take at least two different perspectives:
A quick list of forecast evaluations so far
See also the detailed discussion of each of the evaluations later in the post.
In this post, I describe existing evaluations and what we can learn from them.
The Makridakis Competitions (Wikipedia), known in the forecasting literature as the M-Competitions, are three competitions organized by teams led by forecasting researcher Spyros Makridakis. Here's a quick listing and summary of the competitions (table from Wikipedia):
According to the authors, the following main conclusions held in all three competitions:
Although the organizers of the M3-Competition contacted researchers in the area of artificial neural networks (ANNs) to seek their participation, only one researcher participated, and that researcher's forecasts fared poorly. Most ANN researchers were reluctant to participate at the time because of the computationally intensive nature of ANN-based forecasting and the huge time series used for the competition. In 2005, Crone, Nikolopoulos and Hibon organized the NN-3 Competition, using 111 of the time series from the M3-Competition (not the same data, because it was shifted in time, but from the same sources). The NN-3 Competition found that the best ANN-based forecasts performed comparably to the best known forecasting methods, but were far more computationally intensive. It was also noted that many ANN-based techniques fared considerably worse than simple forecasting methods, despite their greater theoretical potential for good performance. More on the NN-3 Competition here. It's quite possible that if the competition were rerun a few years from now, the neural network methods would outperform the best simple methods. We'll talk more about simple versus complicated methods in a later post.
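To make the "simple versus complicated methods" comparison concrete, here is a minimal sketch of how a competition-style evaluation can be set up: hold out the last few observations of a series, forecast them with each method, and score the forecasts. Everything in it is an assumption for illustration. The series is made up, the two methods (a naive last-value forecast and simple exponential smoothing with a hypothetical `alpha`) are stand-ins for the "simple" end of the spectrum, and symmetric MAPE is just one of several accuracy measures used in this literature. It is not the actual M-Competition protocol.

```python
def smape(actual, forecast):
    """Symmetric MAPE in percent: mean of 2*|F - A| / (|A| + |F|).

    Assumes no pair (A, F) is zero in both positions, which holds for
    the strictly positive illustrative series below.
    """
    terms = [2 * abs(f - a) / (abs(a) + abs(f)) for a, f in zip(actual, forecast)]
    return 100 * sum(terms) / len(terms)


def naive_forecast(history, horizon):
    """Repeat the last observed value for every future step."""
    return [history[-1]] * horizon


def ses_forecast(history, horizon, alpha=0.3):
    """Simple exponential smoothing with a flat forecast at the final level.

    alpha is a hypothetical smoothing weight, not a tuned value.
    """
    level = history[0]
    for y in history[1:]:
        level = alpha * y + (1 - alpha) * level
    return [level] * horizon


# Made-up series, for illustration only.
series = [112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118]
history, holdout = series[:-4], series[-4:]

for name, method in [("naive", naive_forecast), ("SES", ses_forecast)]:
    forecast = method(history, len(holdout))
    print(f"{name}: sMAPE = {smape(holdout, forecast):.1f}%")
```

A real competition repeats this over thousands of series and several forecast horizons, which is what makes the comparison between methods meaningful; a single series like this one tells you very little.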
Survey-based macroeconomic forecasts
More details are available in my post reviewing the track record of survey-based macroeconomic forecasting. The following overall conclusions seem to emerge:
Tetlock study of expert political judgment
For his book Expert Political Judgment, Tetlock surveyed 284 experts and collected a total of 28,000 predictions. His findings, as described in the book and in an article for Cato Unbound co-authored with Dan Gardner, are as follows (note that the language is copy-pasted from the Cato Unbound article but restructured somewhat for sentence flow):
Tetlock followed up this research by co-creating The Good Judgment Project (Wikipedia), which aggregated information from large numbers of participants who had access to Google search and the Internet but didn't necessarily have prior subject matter expertise. The Good Judgment Project produced better forecasts than other contestants in the IARPA Aggregative Contingent Estimation contest. This finding combines two ideas: that foxes have advantages over hedgehogs (Google searches by people without much prior knowledge resemble fox-like thinking), and the miracle of aggregation.
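One piece of the "miracle of aggregation" is purely arithmetic, and a small sketch shows it. If probability forecasts are scored with the Brier score (squared error between the stated probability and the 0/1 outcome), then the simple average of several forecasters' probabilities never scores worse than the average of their individual scores, because squared error is convex. The forecasters, probabilities, and outcomes below are all made up, and unweighted averaging is an assumption chosen for simplicity; it is not a description of the aggregation method the Good Judgment Project actually used.

```python
def brier(p, outcome):
    """Brier score for one binary event: squared error of the probability."""
    return (p - outcome) ** 2


def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)


# Hypothetical probabilities from five forecasters (rows) for three events (columns).
forecasts = [
    [0.9, 0.2, 0.6],
    [0.6, 0.4, 0.3],
    [0.8, 0.1, 0.7],
    [0.4, 0.5, 0.5],
    [0.7, 0.3, 0.8],
]
outcomes = [1, 0, 1]  # made-up resolutions: 1 = happened, 0 = did not

# Each forecaster's mean Brier score across the events (lower is better).
individual_scores = [
    mean(brier(p, o) for p, o in zip(row, outcomes)) for row in forecasts
]
avg_individual_score = mean(individual_scores)

# The "crowd" forecast is the unweighted average probability for each event.
crowd = [mean(column) for column in zip(*forecasts)]
crowd_score = mean(brier(p, o) for p, o in zip(crowd, outcomes))

print("average individual Brier score:", round(avg_individual_score, 4))
print("crowd (averaged) Brier score:  ", round(crowd_score, 4))
```

Note what this does and doesn't show: the averaged forecast is guaranteed to beat the average forecaster, but not necessarily the best individual forecaster. How much is gained in practice depends on how diverse the individual errors are.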
Tauri Group Retrospective
The report titled Retrospective Analysis of Technology Forecasting: InScope Extension by Carie Mullins for the Tauri Group often goes by the name of the Tauri Group Retrospective. The report was published on August 13, 2012 and includes 2,092 forecasts that were found to be timely, specific, complete, and relevant enough to be further verified and assessed for accuracy. The following were the main findings (from Table ES-2 of the paper, Page 3):
Proprietary trading in financial markets is often (though not always) about forecasting.
People who can actually forecast economic/financial variables stand to make a LOT of money very quickly.