2021 Note: This was written and posted in December 2016. The date it shows on LessWrong is 2019; I believe this refers to a time the post was (very slightly) updated, as part of moving it to the Prediction-Driven Collaborative Reasoning Systems sequence.
Prediction markets are powerful, but also still quite niche. I believe part of this lack of popularity could be addressed with significantly better tools. During my work on Guesstimate I’ve thought a lot about this issue and have some ideas for what I would like to see in future prediction technologies.
1. Machine learning for forecast aggregation
In financial prediction markets, the aggregation method is the market price. In non-market prediction systems, simple algorithms are often used. For instance, in the Good Judgment Project, the consensus trend displays “the median of the most recent 40% of the current forecasts from each forecaster.” Non-financial prediction aggregation is a contested field with several competing methods.
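As a rough illustration, the consensus rule quoted above can be sketched in a few lines of Python. This is not the Good Judgment Project’s actual code: the `Forecast` record, the use of timestamps to define “most recent”, and rounding 40% up so at least one forecast always remains are all assumptions.

```python
import math
from dataclasses import dataclass
from statistics import median


@dataclass
class Forecast:
    forecaster: str
    timestamp: float  # larger means more recent (e.g. Unix time)
    probability: float


def consensus(forecasts, fraction=0.4):
    """Median of the most recent `fraction` of each forecaster's current forecast."""
    # Keep only each forecaster's latest ("current") forecast.
    current = {}
    for f in forecasts:
        prev = current.get(f.forecaster)
        if prev is None or f.timestamp > prev.timestamp:
            current[f.forecaster] = f
    if not current:
        raise ValueError("no forecasts to aggregate")
    # Order current forecasts newest first and keep the most recent fraction,
    # rounding up so at least one forecast always remains.
    newest_first = sorted(current.values(), key=lambda f: f.timestamp, reverse=True)
    k = max(1, math.ceil(fraction * len(newest_first)))
    return median(f.probability for f in newest_first[:k])


forecasts = [
    Forecast("alice", 1, 0.2), Forecast("alice", 5, 0.6),  # alice updated her forecast
    Forecast("bob", 2, 0.3), Forecast("carol", 3, 0.5),
    Forecast("dan", 4, 0.7), Forecast("eve", 0, 0.1),
]
print(consensus(forecasts))  # median of alice's 0.6 and dan's 0.7, i.e. about 0.65
```

Only each forecaster’s latest forecast counts, so alice’s earlier 0.2 is ignored; of the five current forecasts, the two newest are kept.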
I haven’t heard much about machine learning being used for forecast aggregation. It seems to me that many factors could be useful in aggregating forecasts. For instance, some elements of a person’s social media profile may be indicative of their forecasting ability. Perhaps information about the educational differences between individuals could provide insight into how correlated their knowledge is.
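One very simple form of learned aggregation fits a weight per forecaster (plus a bias) by logistic regression on their log-odds across past resolved questions, so that historically informative forecasters count for more. This is an illustrative assumption rather than a specific published method; richer signals like the profile or education features mentioned above would enter as extra input columns. Plain batch gradient descent keeps the sketch dependency-free, and the track record is made up.

```python
import math


def logit(p, eps=1e-6):
    p = min(max(p, eps), 1 - eps)  # keep away from 0 and 1 so the log-odds stay finite
    return math.log(p / (1 - p))


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1 / (1 + math.exp(-x))
    z = math.exp(x)
    return z / (1 + z)


def fit_pool_weights(history, n_forecasters, lr=0.1, epochs=2000):
    """Logistic regression on forecasters' log-odds, fitted by batch gradient descent.

    history: list of (probabilities, outcome) pairs, one probability per forecaster,
    outcome 1 if the event happened and 0 otherwise.
    """
    w = [1.0 / n_forecasters] * n_forecasters
    b = 0.0
    n = len(history)
    for _ in range(epochs):
        grad_w = [0.0] * n_forecasters
        grad_b = 0.0
        for probs, outcome in history:
            x = [logit(p) for p in probs]
            # err is the gradient of the log loss with respect to the pre-sigmoid score.
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - outcome
            for i in range(n_forecasters):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def pooled_forecast(probs, w, b):
    return sigmoid(b + sum(wi * logit(p) for wi, p in zip(w, probs)))


# Made-up track record: forecaster 0 is informative, forecaster 1 is not.
history = [([0.9, 0.7], 1), ([0.1, 0.7], 0), ([0.8, 0.3], 1), ([0.2, 0.3], 0)]
w, b = fit_pool_weights(history, n_forecasters=2)
print(w, pooled_forecast([0.75, 0.3], w, b))
```

With only four resolved questions the learned weights are noisy; the point is just that the uninformative forecaster ends up with a much smaller weight.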
Perhaps aggregation methods, especially with training data, could partially detect and offset predictable human biases. If it is well known that people estimating project timelines are overconfident, this could be taken into account. For instance, someone enters “I think I will finish this project in 8 weeks”, and the system infers something like, “Well, given the reference class I have of similar people making similar calls, I’d expect it to take 12.”
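A crude version of this reference-class adjustment scales the stated estimate by the median ratio of actual to estimated duration among similar past projects. The history below is made-up data chosen to reproduce the 8-week to 12-week example, and `adjust_estimate` is a hypothetical helper; a real system would presumably learn the correction rather than hard-code a single reference class.

```python
from statistics import median


def adjust_estimate(estimate_weeks, reference_class):
    """Scale a duration estimate by the median actual/estimated ratio of a reference class.

    reference_class: list of (estimated, actual) durations from similar past projects.
    """
    ratios = [actual / estimated for estimated, actual in reference_class]
    return estimate_weeks * median(ratios)


# Hypothetical past projects, most of which took about 1.5x as long as estimated.
history = [(4, 6), (10, 15), (6, 8), (2, 4), (8, 12)]
print(adjust_estimate(8, history))  # 12.0
```

The median is used instead of the mean so that one badly overrun project does not dominate the correction.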
A strong machine learning system would of course require a lot of sample data, but small strides may be possible even with limited data. If more data is needed, I imagine many people on platforms like Mechanical Turk could be sampled.
2. Prediction interval input
The prediction tools I am familiar with focus on estimating the probabilities of binary events. This can be extremely limiting. For instance, instead of allowing users to estimate what Trump’s favorable rating would be, they instead have to bet on whether it will be over a specific amount, like “Will Trump’s favorable rate be at least 45.0% on December 31st?”
It’s probably no secret that I love probability densities. I propose that users should be able to enter probability densities directly. User-entered probability densities would require more advanced aggregation techniques, but this is doable.
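One simple way to combine user-entered densities is quantile averaging (sometimes called Vincentization): average every forecaster’s quantiles at a fixed set of probability levels. The sketch assumes each user’s density is a normal distribution, represented with Python’s `statistics.NormalDist`, and the favorability forecasts are invented. A linear opinion pool, i.e. a mixture of the densities, is a common alternative that keeps disagreement between forecasters visible as extra spread.

```python
from statistics import NormalDist


def average_quantiles(dists, levels):
    """Combine forecast distributions by averaging their quantiles at each level."""
    return {q: sum(d.inv_cdf(q) for d in dists) / len(dists) for q in levels}


# Hypothetical user forecasts of a favorability rating, in percentage points.
forecasts = [NormalDist(mu=42, sigma=2), NormalDist(mu=45, sigma=3), NormalDist(mu=44, sigma=1)]
combined = average_quantiles(forecasts, levels=[0.05, 0.5, 0.95])
for q, value in combined.items():
    print(f"{q:.0%} quantile: {value:.1f}%")
```

For normal inputs the averaged quantiles describe another normal whose mean and standard deviation are the averages of the inputs’, which makes the result easy to sanity-check.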
Probability density inputs would also require additional understanding from users. While this could definitely be a challenge, many prediction markets are already quite complicated, and existing users of these tools are quite sophisticated.
I suspect that using probability densities could simplify questions about continuous variables and also give much more useful information about forecasters’ predictions. If there are tail risks, these would be obvious; and perhaps more interestingly, probability intervals from prediction tools could be used directly in further calculations. For instance, if there were separate predictions about the population of the US and the average income, these could be multiplied to produce an estimate of the total GDP (correlations complicate this, but for some problems they may not be much of an issue, and in others perhaps they could be estimated as well).
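Guesstimate-style calculations like this can be done by Monte Carlo sampling: draw from each forecast distribution, multiply the draws, and summarize the resulting samples. The sketch assumes the two forecasts are independent normal distributions with invented parameters; correlated inputs would need to be sampled jointly instead.

```python
import random
from statistics import mean, quantiles


def sample_population(rng):
    # Hypothetical forecast of US population.
    return rng.gauss(330e6, 3e6)


def sample_average_income(rng):
    # Hypothetical forecast of average income, in dollars.
    return rng.gauss(50_000, 5_000)


def monte_carlo_product(sample_a, sample_b, n=100_000, seed=0):
    """Sample the product of two independent uncertain quantities."""
    rng = random.Random(seed)
    return [sample_a(rng) * sample_b(rng) for _ in range(n)]


samples = monte_carlo_product(sample_population, sample_average_income)
cuts = quantiles(samples, n=20)  # 19 cut points: 5%, 10%, ..., 95%
print(f"mean ≈ ${mean(samples):.3e}, 90% interval ≈ [${cuts[0]:.3e}, ${cuts[-1]:.3e}]")
```

The output is itself a distribution, so it could feed into yet another calculation or be compared against a separate forecast of the same total.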
Probability densities make less sense for questions with a discrete set of options, like predicting who will win an election. There are a few ways of dealing with these. One is to simply leave these questions to other platforms, or to fall back on the common technique of having users estimate specific percentage likelihoods. Another is to reframe some of these as continuous variables that determine discrete outcomes, like the number of electoral college votes a U.S. presidential candidate will receive. A third option is to estimate the ‘true’ probability of something as a distribution, where the ‘true’ probability is defined very specifically. For instance, a group could make probability density forecasts of the probability that the blog 538 will give to a specific outcome on a specific date. Early in an election cycle, people could forecast the probability 538 would assign to one candidate winning as of one month before the election.
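Once a continuous forecast exists, recovering the discrete outcome is a one-line calculation. The sketch assumes a normal forecast of electoral votes with invented parameters; a real forecast would likely be lumpier than a normal distribution, since states award their votes in blocks.

```python
from statistics import NormalDist

# Hypothetical forecast of one candidate's electoral college votes (out of 538).
electoral_votes = NormalDist(mu=280, sigma=30)

# Winning requires at least 270 votes; 269.5 is a continuity correction because
# the normal distribution is continuous while vote counts are whole numbers.
p_win = 1 - electoral_votes.cdf(269.5)
print(f"P(win) ≈ {p_win:.2f}")  # ≈ 0.64
```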
3. Intelligent Prize Systems
I think the main reason why so many academics and rationalists are excited about prediction markets is because of their positive externalities. Prediction markets like InTrade seem to do quite well at predicting many political and future outcomes, and this information is very valuable to outside third parties.
I’m not sure how comfortable I feel about the incentives here. The fact that the main benefits come from externalities indicates that the main players in the markets aren’t exactly optimizing for these benefits. While users are incentivized to be correct and calibrated, they are not typically incentivized to predict things that happen to be useful for observing third parties.
I would imagine that the externalities created by prediction tools would be strongly correlated with the value of information to these third parties, which depends on those parties facing uncertain decisions they can act on. So if the value of information from prediction markets were to be optimized, it would make sense for these third parties to have some way of ranking what gets attention based on the decisions they face.
For instance, a whole lot of prediction markets and related tools focus heavily on sports forecasts. I highly doubt that this is why most prediction market enthusiasts get excited about these markets.
In many ways, promoting prediction markets for their positive externalities is a very strange endeavor. It encourages the creation of a marketplace because of some expected extra benefit that no one directly involved in that marketplace really cares about. Perhaps instead there should be otherwise-similar ways for those who want information from prediction groups to pay for that information directly.
One possibility that has been discussed is for prediction markets to be subsidized in specific ways. This obviously would have to be done carefully in order not to distort incentives. I don’t recall seeing this implemented successfully yet; I have only heard it proposed.
For prediction tools that aren’t markets, prizes can be given out by sponsoring parties. A naive system is for one large sponsor to sponsor a ‘category’, then the best few people in that category get the prizes. I believe something like this is done by Hypermind.
I imagine a much more sophisticated system could pay people as they make predictions. One could imagine a system that numerically estimates how much information a new prediction adds to the aggregate. Users with established track records would influence the aggregate forecast significantly more than newer ones, and thus would be rewarded proportionally. A more advanced system would also take supply and demand into account; if there are some conditions where users particularly enjoy adding forecasts, they may not need to be compensated as much for these, regardless of the amount or value of information contributed.
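A minimal sketch of “paying for information added”, under loudly stated assumptions: the aggregate is a reputation-weighted mean of probabilities for a binary question, and the information a new forecast adds is measured as the KL divergence (in bits) between the aggregate after and before it arrives. The weights, rate, and helper names are hypothetical. Paying for movement alone could reward confidently wrong forecasts, so a real system would presumably also settle accounts once questions resolve, for example with a logarithmic scoring rule.

```python
import math


def weighted_aggregate(forecasts):
    """Weighted mean of (probability, weight) pairs; weight stands in for reputation."""
    total = sum(w for _, w in forecasts)
    return sum(p * w for p, w in forecasts) / total


def kl_bits(p, q):
    """KL divergence D(p || q), in bits, between Bernoulli(p) and Bernoulli(q), 0 < q < 1."""
    return sum(a * math.log2(a / b) for a, b in ((p, q), (1 - p, 1 - q)) if a > 0)


def payout(existing, new_forecast, rate_per_bit):
    """Pay in proportion to how far the new forecast moves the aggregate."""
    before = weighted_aggregate(existing)
    after = weighted_aggregate(existing + [new_forecast])
    return rate_per_bit * kl_bits(after, before)


existing = [(0.3, 2.0), (0.4, 2.0)]  # aggregate is 0.35
veteran = payout(existing, (0.8, 5.0), rate_per_bit=10.0)  # moves aggregate to 0.6
newcomer = payout(existing, (0.8, 1.0), rate_per_bit=10.0)  # moves aggregate to 0.44
print(veteran, newcomer)
```

Because the established user carries more weight, the same 0.8 forecast moves the aggregate further and earns more, matching the proportional reward described above.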
On the prize side, a sophisticated system could allow various participants to pool money for different important questions and time periods. For instance, several parties put down a total of $10k on the question ‘what will the US GDP be in 2020’, to be rewarded over the period of 2016 to 2017. Participants who put money down could be rewarded by accessing that information earlier than others or having improved API access.
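A minimal sketch of the prize side, assuming sponsors’ money is simply pooled per question, split evenly across reward periods, and then divided in proportion to some per-forecaster contribution score (how those scores are computed is left open). All names and figures are hypothetical, and sponsor perks like early access or improved API access are outside the sketch.

```python
def allocate_pool(contributions, period_scores):
    """Split pooled sponsor money evenly across periods, then by score within each period.

    contributions: dict of sponsor -> dollars.
    period_scores: list with one dict per period, mapping forecaster -> non-negative score.
    Returns (payouts per forecaster, dollars left unallocated).
    """
    pool = sum(contributions.values())
    per_period = pool / len(period_scores)
    payouts, unallocated = {}, 0.0
    for scores in period_scores:
        total = sum(scores.values())
        if total <= 0:
            unallocated += per_period  # nobody contributed anything useful this period
            continue
        for forecaster, score in scores.items():
            payouts[forecaster] = payouts.get(forecaster, 0.0) + per_period * score / total
    return payouts, unallocated


payouts, leftover = allocate_pool(
    {"sponsor_a": 6_000, "sponsor_b": 4_000},            # $10k pooled on one question
    [{"alice": 3, "bob": 1}, {"alice": 1, "carol": 1}],  # e.g. 2016 and 2017
)
print(payouts, leftover)
```

Splitting the pool by period keeps some money available for forecasters who update late in the window, instead of rewarding only the first movers.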
Using the system mentioned above, an actor could hypothetically build up a good reputation and then use it to make a biased prediction in the expectation that it would influence third parties. While this would be very possible, I would expect it to require the user to generate more value than their eventual biased prediction would cost. So while some metrics may become somewhat biased, for this to happen many others would have to be improved. If this were still a problem, perhaps forecasters could make bets in order to demonstrate confidence (even if the bet were made in a separate application).
4. Non-falsifiable questions
Prediction tools are really a subset of estimation tools, where the requirement is that they estimate things that are eventually falsifiable. This is obviously a very important restriction, especially when bets are made. However, it’s not an essential restriction, and hypothetically prediction technologies could be used for much more general estimates.
To begin, we could imagine how very long-term questions could be forecasted. A simple model would be to have one set of forecasts for what the GDP will be in 2020, and another for what the system’s aggregate will estimate the 2020 GDP to be as of 2018. Then in 2018 everyone could be ranked, even though the actual event has not yet occurred.
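As a hypothetical sketch of ranking before resolution: each forecaster submits a distribution for the aggregate’s 2018 estimate of 2020 GDP, and in 2018 each is scored by the log density their distribution assigned to the value the aggregate actually reported. Normal forecasts, the logarithmic score, and the figures (in trillions of dollars) are all assumptions for illustration.

```python
import math
from statistics import NormalDist


def rank_against_future_aggregate(forecasts, aggregate_value):
    """Rank forecasters by the log density they assigned to the later aggregate value."""
    scores = {name: math.log(dist.pdf(aggregate_value)) for name, dist in forecasts.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical forecasts of what the aggregate will say about 2020 GDP as of 2018.
forecasts = {
    "alice": NormalDist(21.0, 0.5),
    "bob": NormalDist(20.0, 0.5),
    "carol": NormalDist(21.5, 2.0),
}
for name, score in rank_against_future_aggregate(forecasts, aggregate_value=21.2):
    print(f"{name}: {score:.2f}")
```

The log score rewards both being close and being appropriately confident: carol’s wide forecast beats bob’s confident miss but loses to alice’s confident hit.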
For the result in 2018 to be predictive, participants would have to expect future forecasts to be accurate. If participants thought everyone else would be extremely optimistic, they would be encouraged to make optimistic predictions as well. This creates a feedback loop: the more accurate the system is thought to be, the more accurate it will be (approaching the accuracy of an immediately falsifiable prediction). If there is sufficient trust in a community and aggregation system, I imagine this system could work decently, but if there isn’t, it won’t.
In practice, I would imagine that forecasters would be continually judged as future forecasts are contributed that agree or disagree with them, rather than only when definitive events happen that prove or disprove their forecasts. This means that forecasters could forecast things that happen in very long time horizons, and still be ranked based on their ability in the short term.
Going further, there could be more abstract, poll-like questions such as “How many soldiers died in WW2?” or “How many DALYs would donating $10,000 to the AMF create in 2017?” For these, individuals could propose their estimates, and the aggregation system would combine them roughly as it normally would. Even though the answers may never be known definitively, if there is built-in trust in the system, I could imagine it producing reasonable results.
One open question is how to evaluate the results of aggregation systems for non-falsifiable questions. I can’t think of a direct way, but I could imagine approximating it by asking experts how reasonable the results seem to them. While methods to aggregate results for non-falsifiable questions are themselves non-falsifiable, the alternatives are also very lacking. Given how many of these questions exist, it seems to me that perhaps they should be dealt with, and perhaps they can draw on the communities and statistical infrastructure optimized on questions that do have answers.
Each one of the above features could be described in much more detail, but I think the basic ideas are quite simple. I’m very enthusiastic about these, and would be interested in talking with anyone interested in collaborating on or just talking about similar tools. I’ve been considering attempting a system myself, but first want to get more feedback.
The Good Judgment Project FAQ, https://www.gjopen.com/faq
Sharpening Your Forecasting Skills
IARPA Aggregative Contingent Estimation (ACE) research program, https://www.iarpa.gov/index.php/research-programs/ace
The Good Judgment Project: A Large Scale Test of Different Methods of Combining Expert Predictions
“Will Trump’s favorable rate be at least 45.0% on December 31st?”, PredictIt
I believe Quantile Regression Averaging is one way of aggregating prediction intervals: https://en.wikipedia.org/wiki/Quantile_regression_averaging