[ Question ]

Wisdom of the crowds?

by Maiwaaro23, 1 min read, 29th Sep 2021, 2 comments



According to an aggregate of forecasts on Metaculus, the probability of Bitcoin going above 100k within the next 5 years is 60%.

Under which conditions is it rational for my probability for some event to match (i) the median of a large group of estimates (e.g. from Metaculus) or (ii) the average of their estimates? Or (iii) is it merely rational to set (i) or (ii) as my prior?

If yes to any of these possibilities, why? What is the justification? Could someone point me towards the literature on this question?


2 Answers

  1. When you think the estimates were made using the same or better information than you have, and that they are representative (unbiased in selection and reporting) of the estimators' true beliefs. If these conditions do not hold, the aggregate estimate MAY still be better than yours, or your independent estimate may be better.
  2. If the median is significantly different from the mean of a group of estimates, beware. Depending on the reasons you see for the variance, you may prefer to throw out outliers and then take the median or mean (which will then be closer together).
  3. Generally, use it as evidence in calculating a posterior from your prior, rather than as an adjustment to your prior. The trick is to avoid double-counting evidence that you are using directly and that the public estimates also depend on.
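
To make point 2 concrete, here is a minimal Python sketch. The forecast values and the `trimmed` helper are made up for illustration (they are not Metaculus data), and symmetric trimming is only one of several ways to discard outliers.

```python
from statistics import mean, median

def trimmed(estimates, trim_fraction=0.1):
    """Drop the lowest and highest trim_fraction of the estimates."""
    ordered = sorted(estimates)
    k = int(len(ordered) * trim_fraction)
    return ordered[k:len(ordered) - k] if k else ordered

# Hypothetical probability forecasts with two very low outliers.
forecasts = [0.01, 0.02, 0.55, 0.58, 0.60, 0.60, 0.62, 0.65, 0.66, 0.70]

raw_mean, raw_median = mean(forecasts), median(forecasts)

# Trimming is symmetric: the two highest forecasts are dropped as well.
kept = trimmed(forecasts, trim_fraction=0.2)
trim_mean, trim_median = mean(kept), median(kept)

print(f"raw:     mean={raw_mean:.3f}  median={raw_median:.3f}")
print(f"trimmed: mean={trim_mean:.3f}  median={trim_median:.3f}")
```

In this toy data the raw mean sits well below the raw median because of the two outliers; after trimming, the two statistics nearly coincide. Whether the outliers deserve to be thrown out still depends on why you think they are there.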

As for Metaculus as evidence, there's not much of a mechanism for correction: nobody is making money by moving the prediction toward the truth. This implies that it's not very good for anything more than a trigger to look deeper if you're surprised by a result. You'll have to figure out the reasons for the surprising prediction, and use those reasons as evidence (if you agree with them), not just the resulting prediction.

The third part of the question is easier to answer than the first two: whatever distribution you get from the estimates, you can use it as a prior for further Bayesian updates.

Strictly speaking, you should distinguish between a probability of some event and a distribution over models. Bayesian updates don't really work on probabilities directly; they work on model distributions. For any real-world prediction like this, there are lots of relevant models, and a single probability doesn't specify enough information to do Bayesian updates.
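
A toy Python sketch of why a single probability underdetermines the update. Everything here is hypothetical: the two "model distributions", the per-model event probabilities, and the likelihoods of the new evidence are invented so that both start at the same 60%.

```python
def predictive(weights, p_event):
    """Probability of the event, averaged over the model distribution."""
    return sum(w * p for w, p in zip(weights, p_event))

def update(weights, likelihoods):
    """Bayes' rule over models: reweight by likelihood, then normalize."""
    unnorm = [w * l for w, l in zip(weights, likelihoods)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Distribution A: one model that says the event has probability 0.6.
a_weights, a_p_event, a_likelihood = [1.0], [0.6], [0.5]

# Distribution B: an even mix of an optimistic and a pessimistic model.
b_weights, b_p_event = [0.5, 0.5], [0.9, 0.3]
# Assumed: the new evidence is 4x as likely under the optimistic model.
b_likelihood = [0.8, 0.2]

before_a = predictive(a_weights, a_p_event)
before_b = predictive(b_weights, b_p_event)
after_a = predictive(update(a_weights, a_likelihood), a_p_event)
after_b = predictive(update(b_weights, b_likelihood), b_p_event)

print(f"A: {before_a:.2f} -> {after_a:.2f}")
print(f"B: {before_b:.2f} -> {after_b:.2f}")
```

Both distributions report the same probability before the evidence arrives, yet A does not move at all while B shifts noticeably. The headline number alone cannot tell you which of these situations you are in.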

For the first two parts, it's a lot harder.

If the estimates were part of a prediction market that you have reason to expect is "efficient enough", then you should expect that you can't do any better than using the average of the probabilities. Otherwise you could make a spread of bets that has positive expected value.
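
The positive-expected-value argument can be sketched for the simplest case: a binary contract that pays 1 if the event happens and costs `price`. This is an illustration under strong assumptions (no fees, no risk aversion, no time value of money, and your probability is actually correct); the function name `best_bet` is made up.

```python
def best_bet(my_prob, price, tol=1e-12):
    """Return (side, expected profit per contract) given my probability.

    Buying wins (1 - price) with probability my_prob and loses price
    otherwise, so its expected profit is my_prob - price.
    Selling is the mirror image: price - my_prob.
    """
    edge = my_prob - price
    if edge > tol:
        return "buy", edge
    if edge < -tol:
        return "sell", -edge
    return None, 0.0

for belief in (0.70, 0.60, 0.45):
    side, ev = best_bet(belief, price=0.60)
    print(f"belief={belief:.2f}: side={side}, expected profit={ev:.2f}")
```

Any disagreement with the price implies a bet that looks profitable to you. If the market is efficient enough, such bets should not really be available, which is the reason to adopt the market's number instead of your own.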

Metaculus is not such a market, and neither the median nor the average has this property. There may still be some more complicated function of the predictions in the pool that does have useful properties, but I suspect finding one would be a major research project.

In the absence of such a justification, you will have to settle for the boring but still useful answer "it depends upon what you're doing with it". If you would benefit from having roughly equal numbers of forecasters who are more and less optimistic than yourself (no matter how much more or less), then the median makes sense. If the degree to which they differ from you does matter, then some sort of average makes more sense. If some other property is desirable to you, then use some other statistic that reflects that.
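
A small Python sketch of the difference, using two invented sets of forecasts that differ only in how pessimistic the lowest forecaster is:

```python
from statistics import mean, median

def summarize(forecasts):
    """Median, mean, and how many forecasts fall on each side of the median."""
    m = median(forecasts)
    return {
        "median": m,
        "mean": mean(forecasts),
        "below_median": sum(f < m for f in forecasts),
        "above_median": sum(f > m for f in forecasts),
    }

# Hypothetical forecasts; only the lowest value differs between the two sets.
mild = [0.45, 0.50, 0.60, 0.65, 0.70]
extreme = [0.05, 0.50, 0.60, 0.65, 0.70]

mild_summary = summarize(mild)
extreme_summary = summarize(extreme)
print("mild:   ", mild_summary)
print("extreme:", extreme_summary)
```

The median stays put in both cases, with two forecasters on each side of it, because it only cares about which side each forecaster is on. The mean drops when the lowest forecaster becomes more extreme, because it does care about the size of the disagreement.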