Yes - I suspect a large amount of the variance is explained by features we can measure; the residual may currently be unexplained, but filtering on the features you can measure probably gets most of what is needed.
However, I don't think the conclusion necessarily follows.
The problem is a causal reasoning / incentive issue (essentially Goodhart's law: rewarding a proxy changes its relationship to the outcome) - just because people who update frequently do well doesn't mean that telling people you'll pay those who update frequently will cause them to do better once they update more often. For instance, if you took MMORPG players and gave them money on condition that they spend it on the game, you'd screw up the relationship between spending and success.
That makes sense as an approach - but as mentioned initially, I think the issue with calling people superforecasters is deeper, since it's unclear how much of the performance is even about their skill, rather than other factors.
Instead of basketball and the NBA, I'd compare superforecasting to performance at a modern (i.e. pay-to-win) mobile MMORPG: you need to be good to perform near the top, but the other factor that separates winners and losers is being willing to invest much more than others in loot boxes and items (i.e. time spent forecasting) because you really want to win.
Superforecasters used only public information, or information they happened to have access to - but the original project was run in parallel with a (then secret) prediction platform for inside the intelligence community. It turned out that the intelligence people were significantly outperformed by superforecasters, despite having access to classified information and commercial information sources, so it seems clear that the information access wasn't particularly critical for the specific class of geopolitical predictions they looked at. This is probably very domain dependent, however.
It reads to me like "2% of people are superheroes" - they have performance that is way better than the rest of the population on these tasks.
As you concluded in other comments, this is wrong. But there doesn't need to be a sharp cutoff for there to be "way better" performance. If the top 1% consistently have Brier scores on a class of questions of 0.01, the next 1% have Brier scores of 0.02, and so on, you'd see "way better" performance without a sharp cutoff - and we'd see that the median Brier score of 0.5 (on the original two-outcome Brier scale, exactly as good as flipping a coin) is WAY worse than the people at the top. (Let's assume everyone else is at least as good as flipping a coin, so the bottom half are all equally useless.)
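To make those numbers concrete, here is a minimal sketch of the original multi-outcome Brier score (the 0-to-2 variant GJP used, where a 50/50 forecast on a binary question scores exactly 0.5). The helper name is my own, not GJP's actual scoring code:

```python
def brier(forecast_probs, outcome_index):
    """Original multi-outcome Brier score: sum over outcomes of
    (p_i - o_i)^2, where o_i is 1 for the realized outcome and 0
    otherwise. Ranges from 0 (perfect) to 2 (maximally wrong)."""
    return sum((p - (1.0 if i == outcome_index else 0.0)) ** 2
               for i, p in enumerate(forecast_probs))

# A coin-flip forecast on a binary question always scores 0.5:
print(brier([0.5, 0.5], 0))            # 0.5
# A confident, correct forecast scores near 0:
print(round(brier([0.95, 0.05], 0), 3))  # 0.005
```

Under this scale, the gap between a 0.01 scorer and the coin-flipping median is enormous, even with no discontinuity anywhere in the distribution.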
Agreed. As I said, "it is unlikely that there is a sharp cutoff at 2%, there isn't a discontinuity, and power law is probably the wrong term."
See my response below - and the dataset of forecasts is now public if you wanted to check the numbers.
The clear answer to the question posed, "do the performances of GJP participants follow a power-law distribution, such that the best 2% are significantly better than the rest," is yes - with a minor quibble, and a huge caveat. (Epistemic status: I'm very familiar with the literature, have personal experience as a superforecaster since the beginning, have had discussions with Dan Gardner and the people running the project, have had conversations with the heads of Good Judgement Inc, etc.)

The minor quibble is identified in other comments: it is unlikely that there is a sharp cutoff at 2%, there isn't a discontinuity, and power law is probably the wrong term. Aside from those "minor" issues, yes, there is a clear group of people who outperformed multiple years in a row, and this group was fairly consistent from year to year. Not only that, but the ordering within that group is far more stable than chance. That clearly validates the claim that "superforecasters are a real thing."

But the data showing that those people are better is based on a number of things, many of which aren't what you would think. First, the biggest difference between top forecasters and the rest is frequency of updates and a corresponding willingness to change their minds as evidence comes in. People who invest time in trying to forecast well do better than those who don't - to that extent, it's a skill like most others. Second, success at forecasting is predicted by most of the things that predict success at almost everything else: intelligence, time spent, and looking for ways to improve. Some of the techniques that Good Judgement advocates for superforecasters came from people who read Kahneman and Tversky, Tetlock, and related research, and tried to apply the ideas. The things that worked were adopted - but not everything helped.
Other techniques were original to the participants - for instance, explicitly comparing your estimate for a question across different timeframes, to ensure it is a coherent and reasonable probability. (Will X happen in the next 4 months? If we changed that to one month, would my estimate be about a quarter as high? What about if it were a year? If my intuition for the answer is about the same regardless of timeframe, I need to fix that.) Ideas like this are not natural ability; they are just applying intelligence to a problem the forecasters care about.

Also, many of the poorer performers were people who didn't continue forecasting, and their initial numbers got stale - they presumably would have updated. The best performers, on the other hand, checked the news frequently and updated. At times, we would change a forecast once the event had / had not happened, a couple of days before the question was closed, yielding a reasonably large "improvement" in our time-weighted score. This isn't a function of being naturally better - it's just the investment of time that helps. (This also explains a decent part of why weighting recency in aggregate scores is helpful: it removes stale forecasts.)

So in short, I'm unconvinced that superforecasters are a "real" thing, except in the sense that most people don't try, and people who do will do better and improve over time. Given that, however, we absolutely should rely on superforecasters to make better predictions than the rest of the population - as long as they continue doing the things that make them good forecasters.
See my other reply about pseudo-pareto improvements - but I think the "understood + endorsed" idea is really important, and worth further thought.