Interesting read, thanks for writing it up. FYI the link "The report on the 2022 results is now available" leads to a private Google Drive file.
Disclaimer: This post was written as part of my job at Open Phil, but it hasn’t been reviewed closely by anyone else at Open Phil, and the opinions and recommendations I put forward are mine only and don’t reflect Open Phil’s views.
In early 2020, Open Philanthropy commissioned forecasts from Hypermind on five socioeconomic indicators. These forecasts were conditional on the result of the 2020 election, i.e. questions were of the form "What will be the value of X on <date> if Trump <wins / doesn't win>?". The five indicators (all about the US) were: total COVID deaths, GDP, S&P 500, prime-age employment-population ratio, and rank in the World Happiness Report. The resolution dates were EOY 2022 and EOY 2024. Forecasts were submitted between August and November 2020.
Now that the 2022 data is in, I summarize what I see as the key results in the tables and figures below.[1] I will only evaluate the forecasts produced by the mainline aggregation algorithm as of the last day the forecasting window was open. This means I won’t talk about (i) individual differences in forecasting skill, (ii) how forecasts changed over time, or (iii) the merits of different aggregation methods.
| Question | Log score[2] | CDF[3] | Interpretation |
| --- | --- | --- | --- |
| COVID Deaths | -inf[4] | 1.000 | Much worse than chance, wildly optimistic |
| World Happiness Report Rank | -1.054 | 0.052 | Worse than chance, somewhat pessimistic |
| Employment to Population Ratio | -2.380 | 0.995 | Worse than chance, pessimistic |
| S&P 500 | 2.881 | 0.418 | Beats chance, no evidence of bias |
| GDP | 1.440 | 0.933 | Beats chance, somewhat pessimistic |
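As a rough illustration of how the log score and CDF columns could be computed from a binned forecast, here is a minimal Python sketch. It assumes the forecast is a list of per-bin probabilities summing to 1, and that "pdf(true value)" means the probability mass of the bin containing the true value. The function names and example numbers are hypothetical and are not taken from the accompanying notebook.

```python
import math

def log_score(bin_probs, true_bin):
    """log2(p(true bin) * number of bins).

    A uniform forecast scores 0; a forecast with zero mass
    on the true bin scores -inf.
    """
    p = bin_probs[true_bin]
    n_bins = len(bin_probs)
    return math.log2(p * n_bins) if p > 0 else float("-inf")

def cdf_at(bin_probs, true_bin):
    """Probability mass in all bins up to and including the true bin.

    Assumption: the true bin is counted in full; the notebook may
    interpolate within the bin instead.
    """
    return sum(bin_probs[: true_bin + 1])

# Hypothetical 4-bin forecasts, for illustration only.
uniform = [0.25, 0.25, 0.25, 0.25]
sharp = [0.1, 0.7, 0.2, 0.0]

print(log_score(uniform, 2))  # uniform baseline
print(log_score(sharp, 1))    # positive: beats the uniform baseline
print(log_score(sharp, 3))    # -inf: no mass on the true bin
print(cdf_at(sharp, 3))       # close to 1: all mass lies at or below the truth
```

The last two calls mirror the COVID row: a forecast that puts no mass on the outcome gets a log score of -inf, and a CDF near 1 indicates that the forecast undershot.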
| Question | Year | Entropy ratio[7] | Mean difference / range[8] |
| --- | --- | --- | --- |
| COVID Deaths | 2022 | 1.015 | 0.088 |
| COVID Deaths | 2024 | 1.027 | 0.144 |
| World Happiness Report Rank | 2022 | 1.047 | 0.016 |
| World Happiness Report Rank | 2024 | 1.145 | 0.001 |
| Employment to Population Ratio | 2022 | 1.084 | -0.008 |
| Employment to Population Ratio | 2024 | 1.009 | 0.004 |
| S&P 500 | 2022 | 1.130 | 0.075 |
| S&P 500 | 2024 | 1.001 | 0.054 |
| GDP | 2022 | 0.978 | -0.003 |
| GDP | 2024 | 1.110 | 0.002 |
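The two comparison columns follow the formulas given in the notes, `trump_entropy / other_entropy` and `(trump_average - other_average) / question_range`. The sketch below shows one way they could be computed for a pair of binned conditional forecasts. It assumes discrete Shannon entropy over bins and means taken over bin midpoints; the notebook may define either differently, and all inputs here are made up for illustration.

```python
import math

def entropy(bin_probs):
    """Shannon entropy in bits, skipping empty bins."""
    return -sum(p * math.log2(p) for p in bin_probs if p > 0)

def mean(bin_probs, bin_centers):
    """Expected value, treating each bin's mass as sitting at its midpoint."""
    return sum(p * x for p, x in zip(bin_probs, bin_centers))

def compare_conditionals(trump_probs, other_probs, bin_centers, question_range):
    """Return (entropy ratio, mean difference / range) for two forecasts."""
    entropy_ratio = entropy(trump_probs) / entropy(other_probs)
    mean_diff = mean(trump_probs, bin_centers) - mean(other_probs, bin_centers)
    return entropy_ratio, mean_diff / question_range

# Hypothetical example: 4 bins on a question whose allowed range is 40.
centers = [5.0, 15.0, 25.0, 35.0]
trump = [0.1, 0.4, 0.4, 0.1]
other = [0.2, 0.3, 0.3, 0.2]
print(compare_conditionals(trump, other, centers, question_range=40.0))
```

An entropy ratio near 1 and a mean difference near 0 mean that the two conditional forecasts are about equally spread out and centered in about the same place.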
See accompanying colab notebook.
This is calculated as log2(pdf(true value) * number of bins). The normalization is such that a uniform distribution over all bins would get a score of zero.
This is the cumulative distribution function at the true value. Lower values indicate forecasters overshot. Higher values indicate forecasters undershot.
The pdf was zero at the true value.
The relevant question range is the difference between the maximum and minimum values allowed in forecasts.
A similar picture emerges if we normalize by the (pooled) standard deviation instead of the range.
This is `trump_entropy / other_entropy`.
This is `(trump_average - other_average) / question_range`.
I tried to find the mid-2020 price of S&P 500 futures expiring in December 2022, but couldn’t. This chart displays the relevant historical prices, but the price for that particular contract (ESZ22) is only available from mid-2021 onwards. As of May 2023, the forward-looking futures chain on Google Finance has contracts up until December 2027, but those expiring after June 2024 have no liquidity. My hypothesis is that, even though the contracts are technically available over longer time horizons, there’s no real trading until ~18 months before expiration. I find this odd, so there’s a good chance I’m missing something.
E.g. at the time of writing, P(US restricts compute capacity before 2050 | HLMI by 2040) = 37%, but this goes down to 22% if there’s no HLMI by 2040.
In fact, the absolute error in both questions was larger than the difference between the means of the two conditional distributions. E.g. in the question about the World Happiness Report, the mean was 19.9 under Trump and 18.4 under not-Trump. The actual number was 15, so the forecast missed the mark by ~2x the difference between conditions. The same calculation for the COVID question yields a factor of ~13x.
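The World Happiness Report arithmetic can be checked with a few lines of Python. This sketch assumes the error is measured against the not-Trump mean, since that is the condition that was realized; the function name is hypothetical.

```python
def error_to_gap_ratio(trump_mean, other_mean, realized_mean, actual):
    """Absolute forecast error divided by the gap between conditional means."""
    gap = abs(trump_mean - other_mean)
    error = abs(realized_mean - actual)
    return error / gap

# World Happiness Report rank: 19.9 under Trump, 18.4 under not-Trump, actual 15.
ratio = error_to_gap_ratio(19.9, 18.4, realized_mean=18.4, actual=15)
print(round(ratio, 1))  # 2.3, i.e. roughly 2x the gap between conditions
```

Measured against the Trump-conditional mean instead, the ratio would be larger, so the conclusion that the error exceeds the gap holds either way.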