Introduction

It’s possible to make quantitative forecasting (forecasting by software) significantly easier and more fun by adding some missing pieces to the forecasting ecosystem. Let’s do it.

The models I’m interested in are “gears” and not “behavior” models: They should be motivated by theory, be built to inform our theoretical understanding, and have relatively few numeric parameters. The point is to do science, not just fit parameters to data.

In this post I focus on forecasting things that are regularly measured and published by institutions. Such data includes economic and population statistics, opinion and value surveys, statistics about violence, public health, etc. I propose that we can build tools that would make it easier to test theories through forecasting models. 

For example, a simple model for national meat consumption would use data about GDP growth per capita, population growth, population value judgements (as measured by the World Value Survey) and the past year's meat consumption to predict next year's consumption. Similar inputs would be relevant for predicting carbon emissions. Predicting GDP would presumably rely on corruption indicators, past growth, educational attainment etc.
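To make the shape of such a "gears" model concrete, here is a minimal sketch in Python. The function, its inputs, and especially the two coefficients are all hypothetical placeholders I made up for illustration, not fitted or proposed values:

```python
# Illustrative "gears" model: next year's national meat consumption
# (kg per capita) from a few interpretable inputs.
# All coefficients are hypothetical placeholders, not fitted values.
def predict_meat_consumption(
    last_year_kg_per_capita: float,
    gdp_growth_per_capita: float,    # fractional, e.g. 0.03 for 3%
    population_growth: float,        # fractional
    values_shift: float,             # e.g. change in a World Values Survey index
    income_elasticity: float = 0.5,  # hypothetical parameter
    values_weight: float = -2.0,     # hypothetical parameter
) -> float:
    # richer populations eat more meat, up to a value-driven correction
    growth_effect = 1 + income_elasticity * gdp_growth_per_capita
    return last_year_kg_per_capita * growth_effect + values_weight * values_shift
```

The point is that every parameter has a theoretical interpretation (an income elasticity, a values effect), so testing the model's forecasts also tests the theory behind it.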

Motivation

make it easier to build, share and use models

There is no “Wikipedia for predictive models” that I know of. No big repository to easily share and find predictive scientific models other than the relevant domain’s scientific literature, which is not optimized for these tasks: it is not organized by the variables being predicted, it is not generally available as reusable and modular software components, it is usually not focused on predictive work, some of it is paywalled, etc.

low-effort empiricism for non-experts

Evaluating the empirical performance of predictive models is a noisy, but low-bias, way to access the truth. People who aren't domain experts instead try to access the truth by finding people they hope are knowledgeable and trusting their opinions, because doing the research yourself is hard. Think about the people you disagree with regarding the safety of vaccines. Consider how bad those people are at identifying reliable domain experts, and how dangerous that weakness is. We can create tools that even low-information users can utilise to get in touch with reality.

The tools we miss

There are several tools we can build to make quantitative forecasting easier and more useful. These tools are:

  • a machine-readable catalog of publicly accessible datasets, complete with metadata that software can use to access that data
  • a catalog of forecasting models
  • an internet service that automatically re-evaluates quantitative models as new data arrives
  • a game of betting on models
  • a market for subsidizing model development

Let us briefly consider the benefit of each of these in turn. 

Catalog of datasets

First, there's the catalog of datasets. Once this catalog exists, model developers would be able to find and fetch data more easily: search the catalog for the variables that interest them, then use a software library to download the data from wherever it lives, without having to think about each publisher's access method or file format.
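A catalog entry might look something like the sketch below. The field names are purely illustrative, not a proposed standard; the World Bank URL is just an example of a real public API such an entry could point at:

```python
# One hypothetical catalog entry: enough machine-readable metadata
# for a client library to locate and parse the dataset automatically.
# Field names are illustrative, not a proposed standard.
dataset_entry = {
    "id": "world-bank/gdp-per-capita",
    "title": "GDP per capita (current US$)",
    "publisher": "World Bank",
    "url": "https://api.worldbank.org/v2/country/all/indicator/NY.GDP.PCAP.CD",
    "format": "json",
    "update_frequency": "yearly",
    "variables": ["country", "year", "gdp_per_capita_usd"],
}

def fetch(entry: dict):
    """Sketch of a generic fetcher: a real library would download
    entry["url"] and parse the response according to entry["format"]."""
    ...
```

With enough such entries, "find data on X and load it as a table" becomes one function call instead of a per-publisher scraping project.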

Catalog of models

The catalog of models is meant to be a central place to share your models with others, and to find what others have made for you to build upon. One may come here looking for pieces of models to stitch together, to combine with one's own ideas so as to make a model that relates more variables and is fit for a particular purpose. Experts and hobbyists alike would be able to use this catalog to put forecasts on the public record, in the expectation that these forecasts would be tested and that good models would win some recognition.[1]
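Stitching cataloged models together could be as simple as function composition. The sketch below combines a hypothetical GDP-growth model with a hypothetical consumption model; every function, interface, and coefficient here is invented for illustration:

```python
# Hedged sketch of composing two cataloged models: a toy GDP-growth
# model feeding a toy consumption model, so the combination relates
# more variables than either model alone. All coefficients are invented.
def gdp_growth_model(past_growth: float, corruption_index: float) -> float:
    # toy assumption: growth partly persists, corruption drags it down
    return 0.8 * past_growth - 0.01 * corruption_index

def consumption_model(last_consumption: float, gdp_growth: float) -> float:
    # toy assumption: consumption scales with half of income growth
    return last_consumption * (1 + 0.5 * gdp_growth)

def combined_model(last_consumption: float,
                   past_growth: float,
                   corruption_index: float) -> float:
    # composition: the first model's output is the second model's input
    return consumption_model(last_consumption,
                             gdp_growth_model(past_growth, corruption_index))
```

If the catalog standardizes inputs and outputs by variable, this kind of composition could be done across models written by different people.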

Model evaluation service

Next is the model-evaluation service. It would monitor data publishers for updates and use new data as it arrives to track the accuracy of predictive models. There would be a scoreboard, showing the best models for predicting mortality, the best models for predicting economic growth, unemployment, carbon emissions, etc. 

This is where ordinary people would go to see which factors meaningfully affect something that concerns them. Alas, only factors for which some statistics are published can be related this way, though we may be able to detect indirect links between variables as well.
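The core re-evaluation loop could be quite simple. Here is a minimal sketch, using absolute error as a stand-in scoring rule (a real service would likely use a proper scoring rule); the data structures are assumptions of mine:

```python
# Minimal sketch of the re-evaluation step: when a new observation is
# published, score every registered model's stored forecast for it and
# keep a running leaderboard. Absolute error is an illustrative choice.
def update_scores(scores: dict, forecasts: dict,
                  variable: str, year: int, observed: float) -> dict:
    """scores: model name -> list of past errors.
    forecasts: model name -> {(variable, year): predicted value}."""
    for model, preds in forecasts.items():
        if (variable, year) in preds:
            error = abs(preds[(variable, year)] - observed)
            scores.setdefault(model, []).append(error)
    return scores

def leaderboard(scores: dict) -> list:
    # rank models by mean absolute error, best first
    return sorted(scores, key=lambda m: sum(scores[m]) / len(scores[m]))
```

The scoreboard mentioned above is then just `leaderboard(scores)`, recomputed whenever a data publisher releases a new value.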

Model performance prediction market

We can bet (real and/or play money) on the performance of models for fun, education and profit. Whereas ordinary prediction markets give you information about discrete events without revealing the participants’ thinking process, model-performance markets teach you how to think about the world. The usual motivations for prediction markets apply: they aggregate information, they provide a trustworthy, difficult-to-corrupt signal, they create a channel for results-based research subsidies, they have educational value, etc.

Plausible questions

  • Is that basically Kaggle for social science?
    • Kaggle as it currently exists is a place for competitive machine learning. What I want to do here is science, not machine learning. Science is different from ML in its heavy emphasis on theory, parsimonious models, interpretability and causality. It’s not about making predictions per se - it’s about gaining an understanding.
    • Maybe we can still use Kaggle as a platform for doing some of the things I propose here - and maybe Kaggle can be changed to support the sort of efforts I describe. I’m not opposed to leveraging existing tools, Kaggle included, to implement this vision.
  • Fitting models to data is easy, why do we need any of the above?
    • Machine learning is fairly automated (though domain knowledge still plays a big role), but this is not about machine learning (see the previous question). This is about easily sharing theoretically motivated modeling work, evaluating models in an a-political[2] way, and incentivizing model development for fun, impact and money.
  • We’ve got scientific journals for sharing research, why do we need the tools described in this post?
    • Nothing proposed here is likely to replace scientific journals. But journals are not optimized for ease of sharing, objectivity or automation:
      • Getting work published in a scientific journal is very hard. But publishing a model in the catalog of models should be very easy.
      • Scientific journals seem to have some severe editorial and structural biases[3][4]
      • With the tools described we can continuously and automatically update our posteriors about scientific theories as new evidence accumulates - a service not provided by journals.
      • Models presented in scientific articles are not published in a machine-readable form that is easy to locate, reuse, combine with other models, or independently evaluate. All of those things can be done with journal-published models, but only with substantial manual work and expertise.
      • Most journals are written for an expert readership.
  • Are current prediction markets suitable for betting on model performance?
    • Some existing markets can be used as-is. We need to build the predictive models and the scoring procedure first, and then we can bet on relative model scores on existing platforms.

Summary

The overall vision is to build better tools, and a community of users, that make it easier to find out how the world works, make us curious about predicting things, train us to put numbers on effect sizes, and could possibly even focus research funding on work that makes obvious epistemic progress.

Does this interest you? Right now this is just a bunch of ideas I'm discussing with people. I need partners to make this happen. Please talk to me.

Questions to the audience

  • Do you know whether any of the pieces described already exist?
  • Do you think you would be a likely user of any of the systems described?
  • Do you know an organization that might want to support this vision?
  • Are these good ideas?
  • Where do you see the most value, if any, in the above?
  • Do you see a way for this project to fund itself as a business without sacrificing its public purpose?
  • Would you help me build it? Might you have time, advice, or money to support this?

I've never started anything like this before. I'm gonna need all the help I can get.

I'll be waiting for your thoughts, either here or on https://discord.gg/rBqyQcrT5q. Bring it on!

Acknowledgements

I thank Niki Kotsenko, Yonatan Cale, Lior Oppenheim and Edo Arad for their helpful remarks.

  1. ^

    A reviewer of this post did not think that the Data Catalog solves a pain point. I suspend judgement on this question until I get more experience building models.

  2. ^

    Perhaps you think that the scientific establishment needs no improvement in being a-political. A good many people seem to think otherwise, as did Robin Hanson. Even if the scientific establishment is flawless, there is value in creating an evaluation framework that can be easily seen to be unbiased, rather than merely believed to be unbiased.

  3. ^

    In “Could Gambling Save Science”, Robin Hanson writes: “Peer review is just another popularity contest, inducing familiar political games; savvy players criticize outsiders, praise insiders, follow the fashions insiders indicate, and avoid subjects between or outside the familiar subjects. It can take surprisingly long for outright lying by insiders to be exposed [Re]. There are too few incentives to correct for cognitive [Kah] and social [My] biases, such as wishful thinking, overconfidence, anchoring [He], and preferring people with a background similar to your own.”

  4. ^
