Tentative tips for people engaged in an exercise that involves some form of prediction or forecasting

Note: This is the concluding post of my LessWrong posts related to my forecasting work for MIRI. There are a few items related to forecasting that I didn't get time to look into and might return to later. I might edit this post to include references to those posts if I get to them later.

I've been looking at forecasting in different domains as part of work for the Machine Intelligence Research Institute (MIRI). I thought I'd draw on whatever I've learned to write up advice for people engaged in any activity that involves making forecasts. This covers a wide range of activities, from those that rely on improving the accuracy of predictions in highly circumscribed contexts (such as price forecasting or energy use forecasting) to those that try to determine the broad qualitative contours of possible scenarios.

The particular application of interest to MIRI is forecasting AI progress, leading up to (but not exclusively focused on) the arrival of AGI. I will therefore try to link my general tips with thoughts on how they apply to forecasting AI progress. That being said, I hope that what I say here will have wider interest and appeal.

If you're interested in understanding the state of the art with respect to forecasting AI progress specifically, consider reading Luke Muehlhauser's summary of the state of knowledge on when AI will be created. The post was written in May 2013, and there have been a couple of developments since then, including:

#1: Appreciate that forecasting is hard

It's hard to make predictions, especially about the future (see also more quotes here). Forecasting is difficult along many dimensions. It is also a job where feedback is far from immediate, and this becomes more true as the forecasting horizon lengthens (for lists of failed predictions made in the past, see here and here). Fortunately, a fair amount has been discovered about forecasting in general, and you can learn from the experience of people trying to make forecasts in many different domains.

Philip Tetlock's work on expert political judgment, whose conclusions he described here, and which I discussed in my post on the historical evaluations of forecasting, shows that at least in the domain of political forecasting, experts often don't do much better than random guesses, and even the experts who do well rarely do better than simple trend extrapolation. Not only do experts fail to do well, they are also poorly calibrated about the quality of their forecasts.

Even in cases where experts are right about the median or modal scenario, they often fail to both estimate and communicate forecast uncertainty.
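One way to check calibration on a track record of your own is to bin resolved probability forecasts and compare the average stated probability in each bin with how often the event actually happened. Here is a minimal sketch in Python; the function name, the equal-width binning, and the example numbers are illustrative assumptions, not anything taken from Tetlock's methodology.

```python
from collections import defaultdict

def calibration_table(forecasts, outcomes, n_bins=5):
    """Bin probability forecasts into equal-width intervals and, for each
    non-empty bin, compare the mean stated probability with the observed
    frequency of the event. Returns (low, high, mean_prob, observed_freq, count)."""
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    bins = defaultdict(list)
    for p, happened in zip(forecasts, outcomes):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins[idx].append((p, happened))
    table = []
    for idx in sorted(bins):
        pairs = bins[idx]
        mean_prob = sum(p for p, _ in pairs) / len(pairs)
        observed = sum(1 for _, h in pairs if h) / len(pairs)
        table.append((idx / n_bins, (idx + 1) / n_bins, mean_prob, observed, len(pairs)))
    return table

if __name__ == "__main__":
    # Made-up track record: eight resolved yes/no questions
    stated = [0.9, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.1]
    happened = [True, False, False, True, False, True, False, False]
    for low, high, mean_prob, observed, n in calibration_table(stated, happened):
        print(f"{low:.1f}-{high:.1f}: said {mean_prob:.2f}, happened {observed:.2f} (n={n})")
```

In this made-up record, events given 90% happened half the time and events given 10% happened a quarter of the time, which is the overconfidence pattern described above. With only a handful of forecasts per bin, the observed frequencies are themselves noisy, so a table like this is a rough diagnostic rather than a verdict.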

The point that forecasting is hard, and should be approached with humility, will be repeated throughout this post, in different contexts.

#2: Avoid the "not invented here" fallacy, and learn more about forecasting across a wide range of different domains

The not invented here fallacy refers to people's reluctance to use tools developed outside of their domain or organization. In the context of forecasting, it's quite common. For instance, climate scientists have been accused of not following forecasting principles. The reaction of some of them has been along the lines of "why should we listen to forecasters, when they don't understand any climate science?" (more discussion of that response here, see also a similar answer on Quora). Moreover, it's not enough to only listen to outsiders who treat you with respect. The point of listening to and learning from other domains isn't to be generous to people in those domains, but to understand and improve one's own work (in this case, forecasting work).

There are some examples of successful importation of forecasting approaches from one domain to another. One example is the set of ideas developed for forecasting rare events, as I discussed in this post. Power laws for some rare phenomena, such as earthquakes, have been around for a while. Aaron Clauset and his co-authors have recently applied the same mathematical framework of power laws to other types of rare events, including terrorist attacks.
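For readers who want to see what fitting a power law involves, here is a minimal Python sketch of the standard continuous maximum-likelihood estimate of the exponent, alpha_hat = 1 + n / sum(ln(x_i / x_min)), the estimator used in the power-law work Clauset has co-authored. It assumes the threshold x_min is already known (choosing it well is a substantial part of the real method and is skipped here), and the function names are hypothetical. The fitted exponent then gives a tail probability for events larger than anything in a short record, which is what makes the framework useful for forecasting rare events.

```python
import math
import random

def fit_power_law_exponent(data, x_min):
    """Continuous maximum-likelihood estimate of alpha for a density
    p(x) proportional to x**(-alpha), using only observations >= x_min.
    x_min is treated as known; selecting it is not attempted here."""
    tail = [x for x in data if x >= x_min]
    if not tail:
        raise ValueError("no observations at or above x_min")
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0:
        raise ValueError("all tail observations equal x_min; alpha is undefined")
    return 1 + len(tail) / log_sum

def exceedance_probability(x, alpha, x_min):
    """P(X >= x | X >= x_min) under a continuous power law with exponent alpha."""
    return (x / x_min) ** (1 - alpha)

def sample_power_law(n, alpha, x_min, rng):
    """Draw n synthetic values by inverse-transform sampling (for testing only)."""
    return [x_min * (1 - rng.random()) ** (-1 / (alpha - 1)) for _ in range(n)]

if __name__ == "__main__":
    rng = random.Random(0)
    synthetic = sample_power_law(20000, alpha=2.5, x_min=1.0, rng=rng)
    alpha_hat = fit_power_law_exponent(synthetic, x_min=1.0)
    print(f"estimated alpha: {alpha_hat:.3f}")
    # Chance that a new event is at least 100 times the threshold size
    print(f"P(X >= 100): {exceedance_probability(100.0, alpha_hat, 1.0):.2e}")
```

Testing the estimator on synthetic data with a known exponent, as in the usage block, is a cheap sanity check before applying it to real event records.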

Evaluating AI progress forecasting on this dimension: My rough impression is that AI progress forecasting tends to be insular, learning little from other domains. While I haven't seen a clear justification from AI progress forecasters, the typical arguments I've seen are the historical robustness of Moore's law and the idea that the world of technology is fundamentally different from the world of physical stuff.

I think that future work on AI progress forecasting should explicitly consider forecasting problems in domains other than computing, and explicitly explain what lessons cross-apply and what don't, and why. I don't mean that all future work should consider all other domains. I just mean that at least some future work should consider at least some other domains.

#3: Start by reading a few really good general-purpose overviews

Personally, I would highlight Nate Silver's The Signal and the Noise. Silver's book is quite exceptional in the breadth of topics it covers, the clarity of its presentation, and the easy toggling between general principles and specific instances. Silver's book comfortably combines ideas from statistics, data mining, machine learning, predictive analytics, and forecasting. Not only would I recommend reading it quickly when you're starting out, I would also recommend returning to specific chapters of the book later if they cover topics that interest you. I personally found the book a handy reference (and quoted extensively from it) when writing LessWrong posts about forecasting domains that the book has covered.

Other books commonly cited are Tetlock's Expert Political Judgment and the volume Principles of Forecasting edited by J. Scott Armstrong, and contributed to by several forecasters. I believe both these books are good, but I'll be honest: I haven't read them, although I have read summaries of the books and shorter works by the authors describing the main points. I believe that you can similarly get the bulk of the value of Tetlock's work by reading his article for Cato Unbound co-authored with Dan Gardner, that I discussed here. For the principles of forecasting, see #4 below.

Evaluating AI progress forecasting on this dimension: There seems to be a lot of focus on a few AI-related and computing-related futurists, such as Ray Kurzweil. I do think the focus should be widened, and getting an understanding of general challenges related to forecasting is a better starting point than reading The Singularity is Near. That said, the level of awareness among MIRI and LessWrong people about the work of Silver, Armstrong, and Tetlock definitely seems higher than among the general public or even among the intelligentsia. I should also note that Luke Muehlhauser was the person who first pointed me to J. Scott Armstrong, and he's referenced Tetlock's work frequently.

#4: Understand key concepts and distinctions in forecasting, and review the literature and guidelines developed by the general-purpose forecasting community

In this post, I provided an overview of different kinds of forecasting, and also included names of key people, key organizations, key journals, and important websites. I would recommend reading that to get a general sense, and then proceeding to the Forecasting Principles website (though, fair warning: the website's content management system is a mess, and in particular, you might find a lot of broken links). Here's their full list of 140 principles, along with discussion of the evidence base for each principle. However, see also point #5 below.

#5: Understand some alternatives to forecasting, specifically scenario analysis and futures studies

If you read the literature commonly classified as "forecasting" in academia, you will find very little mention of scenario analysis and futures studies. Conversely, the literature on scenario analysis and futures studies rarely cites the general-purpose forecasting literature. But the actual "forecasting" exercise you intend to engage in may be better suited to scenario analysis than to forecasting. Or you might find that the methods of futures studies are a closer fit for what you are trying to achieve. Or you might try to use a mix of techniques.

Broadly, scenario analysis becomes more important when there is more uncertainty, and when it's important to be prepared for a wider range of eventualities. This matters more as we move to longer time horizons for forecasting. I discussed scenario analysis in this post, where I also speculate on possible reasons for the lack of overlap with the forecasting community.

Futures studies is closely related to scenario analysis (in fact, scenario analysis can be considered a method of futures studies) but the futures studies field has a slightly different flavor. I looked at the field of futures studies in this post.

It could very well be the case that you find the ideas of scenario analysis and futures studies inappropriate for the task at hand. But such a decision should be made only after acquiring a reasonable understanding of the methods.

Some other domains that might be better suited to the problem at hand include predictive analytics, predictive modeling, data mining, machine learning, and risk analysis. I haven't looked into any of these in depth in connection with my MIRI project. (I've been reading up on machine learning for other work, and have been posting and will continue to post about it on LessWrong, but that's independent of my MIRI work.)

Evaluating AI progress forecasting on this dimension: I think a reasonable case can be made that the main goals of AI progress forecasting are better met through scenario analysis. I discussed this in detail in this post.

#6: Examine forecasting in other domains, including domains that do not seem to be related to your domain at the object level

This can be thought of as a corollary to #2. Chances are, if you have read Nate Silver and some of the other sources, your curiosity about forecasting in other domains has already been piqued. General lessons about human failure and error may cross-apply between domains, even if the object-level considerations are quite different.

In addition to Silver's book, I recommend taking a look at some of my own posts on forecasting in various domains. These posts are based on rather superficial research, so please treat them only as starting points.

General:

Some domain-specific posts:

Track record of survey-based macroeconomic forecasting
Lessons from weather forecasting and its history for forecasting as a domain
Weather and climate forecasting: how the challenges differ by time horizon
An overview of forecasting for politics, conflict, and political violence
I've written about technology forecasting here, here (a look at Megamistakes), and here.

I also did some additional posts on climate science as a case study in forecasting. I have paused the exercise due to time and ability limitations, but I think the posts so far might be useful:

#7: Consider setting up data collection using best practices early on

Forecasting works best when we have a long time series of data to learn from. So it's best to set up data collection as quickly as possible, and use good practices in setting it up. Data about the present or recent past may be cheap to collect now, but could be hard to collect a few decades from now. We don't want to be spending our time two decades later figuring out how to collect data (and adjudicating disputes about the accuracy of data) if we could collect and archive the data in a stable repository right now.

If your organization is too small to do primary data collection, find another organization that engages in the data collection activities, and make sure you archive the data they collect, so that the data is available to you even if that organization stops operating.

Evaluating AI progress forecasting on this dimension: I think that there are some benefits from creating standardized records and measurements of the current state of AI and the quality of the current hardware and software. That said, there do exist plenty of reasonably standardized measurements already in these domains. There is little danger of this information completely disappearing, so that the project of combining and integrating them into a big picture is important but not time-sensitive. Hardware progress and specs are already well-documented, and we can get time series at places such as the Performance Curve Database. Software progress and algorithmic progress have also been reasonably well-recorded, as described by Katja Grace in her review for MIRI of algorithmic progress in six domains.

#8: Consider recording forecasts and scenarios, and the full reasoning or supporting materials

It's not just useful to have data from the past, it's also useful to have forecasts made based on past data and see how they compared to what actually transpired. The problem with forecasts is even worse than with data: if two decades later we want to know what one would have predicted using the data that is available right now, we simply cannot do that unless we make and record the predictions now. (We could do it in principle by imagining that we don't have access to the intermediate data. But in practice, people can find it hard to avoid being influenced by their knowledge of what has transpired in the interim when they build and tune their models). Retrodictions and hindcasts are useful for analysis and diagnosis, but they ultimately do not provide a convincing independent test of the model being used to make forecasts.
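A forecast record only supports that later comparison if it pins down what was predicted, when, and with what information. Here is a minimal sketch of such a record in Python, together with the Brier score (the mean squared difference between stated probabilities and 0/1 outcomes) for evaluating forecasts once they resolve. The field names and the frozen-record design are illustrative assumptions, not a standard schema.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class ForecastRecord:
    question: str
    made_on: date           # date the forecast was recorded
    data_cutoff: date       # latest data the forecaster had access to
    probability: float      # stated probability that the event occurs
    reasoning: str          # full reasoning, or a pointer to archived materials
    outcome: Optional[bool] = None  # left empty until the question resolves

def resolve(record, outcome):
    """Return a copy with the outcome filled in; the forecast fields stay untouched."""
    return replace(record, outcome=outcome)

def brier_score(records):
    """Mean of (probability - outcome)**2 over resolved records; 0 is perfect.
    Unresolved records are skipped."""
    resolved = [r for r in records if r.outcome is not None]
    if not resolved:
        raise ValueError("no resolved forecasts to score")
    return sum((r.probability - float(r.outcome)) ** 2 for r in resolved) / len(resolved)
```

Because the records are frozen, resolving a question produces a new record rather than editing the old one, which makes it harder to quietly revise a forecast after the fact. Recording the data cutoff separately from the date the forecast was made is what lets a later reader tell a genuine forecast from a hindcast.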

Evaluating AI progress forecasting on this dimension: See the link suggestions for recent work on AI progress forecasting at the beginning of the post.

The remaining points are less important and more tentative. I've included them for completeness' sake.

#9: Evaluate how much expertise the domain experts have in forecasting

In some cases, domain experts also have expertise in making forecasts. In other cases, the relationship between domain expertise and the ability to make forecasts, or even to calibrate one's own forecast accuracy, is tenuous. I discussed the issue of how much deference to give to domain experts in this post.

#10: Use best practices from statistical analysis, computer programming, software engineering, and economics

Wherever you use these disciplines, use them well. Statistical analysis arises in quantitative forecasting and prediction. Computer programming is necessary for setting up prediction markets or carrying out time series forecasting or machine learning with large data sets or computationally intensive algorithms. Software engineering is necessary once the computer programs exceed a basic level of complexity, or if they need to survive over the long term. Insights from economics and finance may be necessary for designing effective prediction markets or other tools to incentivize people to make accurate predictions and minimize their chances of gaming the system in ways detrimental to prediction accuracy.

The insularity critique of climate science basically accused the discipline of not doing this.
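As an illustration of why incentive design matters for eliciting accurate predictions (this example is not drawn from the sources discussed above), consider how forecasts are scored. Under a proper scoring rule such as the logarithmic score, a forecaster maximizes expected score by reporting their honest probability; under a naive linear score, it pays to exaggerate toward 0 or 1. The minimal Python sketch below checks this numerically on a grid of possible reports; the function names and the grid are hypothetical choices for the demonstration.

```python
import math

def expected_log_score(belief, report):
    """Expected logarithmic score (higher is better) of announcing `report`
    when the forecaster actually believes the event has probability `belief`."""
    return belief * math.log(report) + (1 - belief) * math.log(1 - report)

def expected_linear_score(belief, report):
    """Expected score under a naive linear rule: `report` if the event happens,
    1 - `report` if it does not. This rule is not proper."""
    return belief * report + (1 - belief) * (1 - report)

def best_report(belief, expected_score, steps=100):
    """Report on the grid 1/steps, ..., (steps-1)/steps that maximizes expected score."""
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=lambda q: expected_score(belief, q))

if __name__ == "__main__":
    print(best_report(0.7, expected_log_score))     # honest report is optimal: 0.7
    print(best_report(0.7, expected_linear_score))  # exaggerating is optimal: 0.99
```

The same reasoning carries over to prediction markets and forecasting tournaments: if the reward scheme is not proper, participants can raise their expected reward by reporting something other than what they believe.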

What if your project is too small and you don't have access to expertise in these domains? Often, a very cursory, crude analysis can be helpful in ballparking the situation. As I described in my historical evaluations of forecasting, the Makridakis Competitions provide evidence in favor of the hypothesis that simple models tend to perform quite well, although the correctly chosen complex models can outperform simple ones under special circumstances (see also here). So keeping it simple to begin with is fine. However, the following caveats should be noted:

Even "simple" models and setups can benefit from review by somebody with subject matter expertise. The reviews can be fairly quick, but they still help. For instance, after talking to a few social scientists, I realized the perils of using simple linear regression for time series data. This isn't a deep point, but it can elude even a smart and otherwise knowledgeable person who hasn't thought much about the specific tools.
The limitations of the model, and the uncertainty in the associated forecast, should be clearly noted (see my post on communicating forecast uncertainty).
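To make "keeping it simple" concrete, here is a minimal Python sketch of two very simple one-step-ahead forecasting methods, the naive forecast (the next value equals the last observed value) and simple exponential smoothing, scored by mean absolute error. These particular methods are chosen for illustration and are not a reconstruction of the Makridakis Competitions' methodology; the function names are hypothetical.

```python
def naive_forecasts(series):
    """One-step-ahead forecasts for series[1:]: each value is predicted
    to equal the previous observation."""
    return series[:-1]

def ses_forecasts(series, alpha):
    """One-step-ahead simple exponential smoothing forecasts for series[1:],
    with the level initialized at the first observation."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    forecasts = []
    for y in series[1:]:
        forecasts.append(level)                   # forecast made before seeing y
        level = alpha * y + (1 - alpha) * level   # update the level with y
    return forecasts

def mean_absolute_error(actual, predicted):
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need equal-length, non-empty sequences")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

if __name__ == "__main__":
    history = [10, 20, 10]
    print(mean_absolute_error(history[1:], naive_forecasts(history)))
    print(mean_absolute_error(history[1:], ses_forecasts(history, alpha=0.5)))
```

A reasonable workflow is to compute baselines like these first and require any more elaborate model to beat them on data it was not tuned on. The smoothing parameter alpha should likewise be chosen using earlier data only, for the same reason that hindcasts are a weak test of a model.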

Evaluating AI progress forecasting on this dimension: I think that AI progress forecasting is at too early a stage to get into using detailed statistical analysis or software, so using simple models and getting feedback from experts, while noting potential weaknesses, seems like a good strategy.

#11: Consider carefully the questions of openness of data, practices, supporting code, and internal debate

While confidentiality and anonymity are valuable in some contexts, openness and transparency are good antidotes to errors that arise due to insufficient knowledge and groupthink (such as the types of problems I noted in my post on the insularity critique of climate science).

#12: Consider ethical issues related to forecasting, such as the ways your forecasting exercise can influence real-world decisions and outcomes

This is a topic I intended to look into more but didn't get time to. I've collected a few links for interested parties:

Political and ethical issues in forecasting
The Role of Ethics in Statistical Forecasting
Information for Practitioners, and Legal Aspects of Forecasting on the Forecasting Principles website
