[EDIT: SimonM pointed out a possibly-fatal flaw with this plan: it would probably discourage more pundits from joining the prediction-making club at all, and adding to that club is a higher priority than comparing the members more accurately.]
Stop me if you've heard this one. (Seriously, I may not be the first to have written up this kind of idea here. Let me know if someone beat me to it.)
We've got several pundits making yearly predictions now, which is fantastic progress for the field. However, if they're not answering the same questions, you can't effectively judge their performance against one another.
I suggest that this winter we run two rounds: one for proposing questions and one for making predictions.
December 1: deadline for pundits to propose prediction questions.
December: Metaculus formalizes questions (where possible) and opens markets.
January 1: deadline for pundits to register their predictions (they don't have to bet) on any markets they choose.
At the end of the next year, we can judge pundits against each other on the intersection of their answered questions. (We can also check whether the pundit beat the Metaculus prices at the time they entered their predictions.)
This won't guarantee a total or even partial ordering on pundits, if they choose to answer different sets of questions; but the victor of any pair will be clear (after choosing a scoring rule). We can treat the result as a round-robin tournament among the pundits, or better yet, do data analysis on subdomains (who beat whom in predicting US politics, etc) where clearer winners may emerge.
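To make the pairwise comparison concrete, here is a minimal sketch. It assumes the Brier score as the scoring rule (the proposal leaves that choice open), binary questions, and made-up data structures; `head_to_head` and the pundit names are hypothetical.

```python
from itertools import combinations


def brier(probability, outcome):
    """Squared error of one forecast; lower is better. Assumed scoring rule."""
    return (probability - outcome) ** 2


def head_to_head(forecasts, outcomes):
    """Compare every pair of pundits on the questions both answered.

    forecasts: {pundit: {question_id: probability of "yes"}}
    outcomes:  {question_id: 1 if it resolved "yes", else 0}
    Returns {(a, b): (winner, number_of_shared_questions)}.
    """
    results = {}
    for a, b in combinations(sorted(forecasts), 2):
        # Only resolved questions that both pundits answered count.
        shared = sorted(
            q for q in forecasts[a] if q in forecasts[b] and q in outcomes
        )
        if not shared:
            results[(a, b)] = (None, 0)  # no overlap, so no verdict
            continue
        mean_a = sum(brier(forecasts[a][q], outcomes[q]) for q in shared) / len(shared)
        mean_b = sum(brier(forecasts[b][q], outcomes[q]) for q in shared) / len(shared)
        if mean_a < mean_b:
            winner = a
        elif mean_b < mean_a:
            winner = b
        else:
            winner = "tie"
        results[(a, b)] = (winner, len(shared))
    return results
```

Restricting `forecasts` to one subdomain's questions before calling this would give the per-topic comparison. The count of shared questions is returned alongside the winner because a verdict based on one or two questions is mostly noise.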
Additional possible features:
Thanks to ciphergoth for developing this idea with me!
I don't think this is an especially good idea for a bunch of reasons:
Ideally Metaculus (or other prediction platforms) should be asking sufficiently many interesting questions about future years that the questions which the pundits choose to forecast on are already predicted on, and we can make comparisons from there.
I would recommend this article from the EA Forum, which also lays out a bunch of additional issues around prediction contests.
That's a great point. [Getting more pundits to make predictions at all] is much more valuable than [more accurately comparing pundits who do make predictions] right now, to such an extent that I now doubt whether my idea was worthwhile.
This feels solvable with a sufficiently large monetary prize.
I expect it will be easier to get Metaculus users to make forecasts on pundits' questions than to get pundits to make forecasts on each other's questions.
Suggested variant (with dates for concreteness):
Dec 1: deadline for pundits to submit their questions.
Dec 10: Metaculus announces the final version of all the questions they're using, but does not open markets.
Dec 20: deadline for pundits & anyone else to privately submit their forecasts (maybe hashed), and Metaculus markets open.
Dec 31: current Metaculus consensus becomes the official Metaculus forecast for the questions, and pundits (& anyone else) can publicize the forecasts that they made by Dec 20.
Contestants (anyone who submitted forecasts by Dec 20) mainly get judged based on how they did relative to the Dec 31 Metaculus forecast. I expect that they will mostly be pundits making forecasts on their own questions, plus forecasting aficionados.
(We want contestants & Metaculus to make their forecasts simultaneously, with neither having access to the other's forecasts, which is tricky since Metaculus is a public platform. That's why I have the separate deadlines on Dec 20 & Dec 31, with contestants' forecasts initially private - hopefully that's a short enough time period that not much new information will arise, and long enough for people to have time to make forecasts.)
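One way the "maybe hashed" private submission could work is a standard commit-and-reveal scheme: publish only a hash by Dec 20, then reveal the forecasts and a random nonce on Dec 31 so anyone can check them. This is a rough sketch under my own assumptions (SHA-256, JSON serialization, hypothetical function names), not something any platform is known to implement.

```python
import hashlib
import json
import secrets


def commit(forecasts, nonce=None):
    """Return (digest, nonce). Publish the digest now; keep the rest private.

    forecasts: {question_id: probability}
    The random nonce stops others from guessing the forecasts by hashing
    every plausible set of probabilities.
    """
    if nonce is None:
        nonce = secrets.token_hex(16)
    # sort_keys makes the serialization independent of insertion order.
    payload = json.dumps(forecasts, sort_keys=True) + ":" + nonce
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return digest, nonce


def verify(digest, forecasts, nonce):
    """Check revealed forecasts and nonce against the earlier digest."""
    recomputed, _ = commit(forecasts, nonce)
    return recomputed == digest
```

A contestant would post `digest` publicly before the deadline and later reveal `forecasts` and `nonce`; anyone can then run `verify` to confirm nothing was changed after the markets opened.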
With only a small sample size of questions, it may be more meaningful to evaluate contestants based on how close they came to the official Metaculus forecast rather than on how accurate they were (there's a bias-variance tradeoff). As a contestant answers more questions (this year or over multiple years), the comparison with what actually happened becomes more meaningful.
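The two kinds of evaluation can be computed with the same function, which may make the tradeoff easier to see. This is only an illustration: the mean squared difference is my assumed distance measure, and `mean_sq_diff` and the numbers are hypothetical.

```python
def mean_sq_diff(forecast, reference):
    """Mean squared difference over the questions present in both dicts.

    With reference = consensus probabilities, this measures closeness to the
    official forecast. With reference = resolved outcomes (0 or 1), it is the
    Brier score. Returns None when there are no shared questions.
    """
    shared = [q for q in forecast if q in reference]
    if not shared:
        return None
    return sum((forecast[q] - reference[q]) ** 2 for q in shared) / len(shared)


contestant = {"q1": 0.8, "q2": 0.3}
consensus = {"q1": 0.7, "q2": 0.4}   # official Dec 31 forecast (made up)
outcomes = {"q1": 1, "q2": 1}        # how the questions resolved (made up)

distance_to_consensus = mean_sq_diff(contestant, consensus)
brier_score = mean_sq_diff(contestant, outcomes)
```

With only two questions, `brier_score` swings heavily on how each one happens to resolve, while `distance_to_consensus` does not depend on the outcomes at all; that is the lower-variance, possibly biased measure described above.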