Some Thoughts on Conditional Forecasts – Lessons from the 2020 Election


Disclaimer: This post was written as part of my job at Open Phil, but it hasn’t been reviewed closely by anyone else at Open Phil, and the opinions and recommendations I put forward are mine only and don’t reflect Open Phil’s views.

In early 2020, Open Philanthropy commissioned forecasts from Hypermind on five socioeconomic indicators. These forecasts were conditional on the result of the 2020 US presidential election, i.e. questions were of the form "What will the value of X be on <date> if Trump <wins / doesn't win>?". The five indicators (all about the US) were: total COVID deaths, GDP, the S&P 500, the prime-age employment-population ratio, and rank in the World Happiness Report. The resolution dates were EOY 2022 and EOY 2024. Forecasts were submitted between August and November 2020.

Now that the 2022 data is in, I summarize what I see as the key results in the tables and figures below.

I will only evaluate the forecasts using the mainline aggregation algorithm, evaluated on the last day the forecasting window was open.^{[1]} This means I won’t talk about (i) individual differences in forecasting skill, (ii) how forecasts changed over time, or (iii) the merits of different aggregation methods.

## Results

## Accuracy

[Table: for each question, the log score^{[2]}, the CDF at the true value^{[3]}, and an interpretation^{[4]}.]

## Differences between conditions

[Figures: differences between the two conditional forecasts, normalized by the question range^{[5]}; in 6/10 questions this difference was <5%.^{[6]} Companion figures show the entropy ratio^{[7]} and the normalized difference in means^{[8]} between conditions.]

## Narrative takeaways

- Unlike the S&P 500 forecasts, which could in principle anchor on futures prices,^{[9]} labor force participation forecasts can’t benefit from stubborn secular trends or deep, liquid prediction markets. This hypothesis is just post hoc speculation, so take it with a grain of salt.
- The happiness-rank forecast can be compared to Metaculus’s, but it gets a negative log score whereas Metaculus got a positive one. I wonder if asking directly about average happiness would have resulted in a more accurate forecast. Ranks can be brittle and messy because they depend on the ranks of every other country in the report (although most are so far apart that they have limited practical impact).
- The COVID deaths forecasts resemble other contemporaneous forecasts (e.g. this one and this one from Metaculus) that largely missed the fall and winter waves in 2020 and 2021 driven by new variants.
- The differences between conditions were small, unlike some conditional question pairs on Metaculus.^{[10]} This makes me believe that the questions in this tournament weren’t very well suited to revealing the usefulness of conditional forecasting.
- For reference, forecasters estimated a ~0.5% probability of a second civil war in the US before July 2021, peaking at ~3% around J6. Note however that this is a relatively high bar compared to “autocracy” or “insurrection”.

My rough intuition is that, if a forecasting method produces a difference of X units between conditionals, but we know based on that method’s track record that the expected absolute error is >X, we should be less inclined to think the difference in forecasts is coming from signal rather than noise. Put another way, the lack of precision explains away the difference in forecasts.^{[11]}

^{^}See accompanying colab notebook.

^{^}This is calculated as log2(pdf(true value) * number of bins). The normalization is such that a uniform distribution over all bins would get a score of zero.
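To make the scoring rule concrete, here is a minimal sketch in Python, assuming "pdf(true value)" means the probability mass of the bin containing the realized value (the function name and inputs are illustrative, not Hypermind's actual implementation):

```python
import math

def normalized_log_score(bin_probs, true_bin):
    """Log score normalized so a uniform forecast scores zero.

    bin_probs: probabilities assigned to each bin (must sum to 1).
    true_bin: index of the bin containing the realized value.
    """
    n = len(bin_probs)
    # pdf(true value) * number of bins: a uniform forecast assigns
    # 1/n to every bin, so its score is log2((1/n) * n) = log2(1) = 0.
    return math.log2(bin_probs[true_bin] * n)

# Concentrating mass on the right bin scores positively...
print(normalized_log_score([0.1, 0.6, 0.2, 0.1], 1))  # log2(0.6 * 4) ≈ 1.26
# ...while a uniform forecast scores exactly zero.
print(normalized_log_score([0.25] * 4, 0))  # 0.0
```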

^{^}This is the cumulative distribution function at the true value. Lower values indicate forecasters overshot. Higher values indicate forecasters undershot.
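A quick illustration of this reading (the thresholds here are arbitrary, chosen only to make the direction explicit):

```python
def calibration_direction(cdf_at_true, low=0.25, high=0.75):
    """Interpret the forecast CDF evaluated at the realized value.

    A small CDF value means the true value landed in the left tail:
    forecasters put most of their mass above it, i.e. they overshot.
    A large value means the opposite: they undershot.
    """
    if cdf_at_true < low:
        return "overshot"
    if cdf_at_true > high:
        return "undershot"
    return "roughly on target"

print(calibration_direction(0.08))  # overshot
print(calibration_direction(0.93))  # undershot
```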

^{^}The pdf was zero at the true value.

^{^}The relevant question range is the difference between the maximum and minimum values allowed in forecasts.

^{^}A similar picture emerges if we normalize by the (pooled) standard deviation instead of the range.

^{^}This is `trump_entropy / other_entropy`.

^{^}This is `(trump_average - other_average) / question_range`.
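The two statistics above (the entropy ratio and the range-normalized difference in means) can be computed from binned forecasts as in this sketch; the distributions and bin values are made-up toy numbers, and the real aggregation lives in the accompanying notebook:

```python
import math

def entropy(bin_probs):
    """Shannon entropy (in bits) of a binned forecast distribution."""
    return -sum(p * math.log2(p) for p in bin_probs if p > 0)

def condition_comparison(trump_probs, other_probs, bin_midpoints, q_min, q_max):
    """Summary statistics comparing the two conditional forecasts."""
    trump_avg = sum(p * x for p, x in zip(trump_probs, bin_midpoints))
    other_avg = sum(p * x for p, x in zip(other_probs, bin_midpoints))
    question_range = q_max - q_min  # max minus min allowed value
    return {
        "entropy_ratio": entropy(trump_probs) / entropy(other_probs),
        "normalized_diff": (trump_avg - other_avg) / question_range,
    }

# Toy example: a peaked Trump-conditional vs a uniform other-conditional.
stats = condition_comparison(
    trump_probs=[0.1, 0.2, 0.4, 0.3],
    other_probs=[0.25, 0.25, 0.25, 0.25],
    bin_midpoints=[10, 20, 30, 40],
    q_min=0, q_max=50,
)
print(stats)  # entropy_ratio ≈ 0.92, normalized_diff = 0.08
```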

^{^}I tried to find the mid-2020 price of S&P 500 futures expiring in December 2022, but couldn’t.

This chart displays the relevant historical prices, but the price for that particular contract (ESZ22) is only available from mid-2021 onwards. As of May 2023, the forward-looking futures chain on Google Finance has contracts up until December 2027, but those expiring after June 2024 have no liquidity. My hypothesis is that, even though the contracts are technically available over longer time horizons, there’s no real trading up until ~18 months before expiration. I find this odd, so there’s a good chance I’m missing something.

^{^}E.g. at the time of writing, P(US restricts compute capacity before 2050 | HLMI by 2040) = 37%, but this goes down to 22% if there’s no HLMI by 2040.

^{^}In fact, the absolute error in both questions was larger than the difference between the means of the two conditional distributions. E.g. in the question about the World Happiness Report, the mean was 19.9 under Trump and 18.4 under not-Trump. The actual number was 15, so the forecast missed the mark by ~2x the difference between conditions. The same calculation for the COVID question yields a factor of ~13x.
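The arithmetic behind that factor can be reproduced in a few lines (a sketch; the function name is mine):

```python
def error_vs_condition_gap(mean_trump, mean_not_trump, actual):
    """How many times larger the realized forecast's absolute error was
    than the gap between the two conditional means. Since Trump lost,
    the not-Trump forecast is the one that resolved.
    """
    abs_error = abs(mean_not_trump - actual)
    gap = abs(mean_trump - mean_not_trump)
    return abs_error / gap

# World Happiness Report rank: conditional means of 19.9 (Trump) and
# 18.4 (not-Trump); the actual rank was 15.
print(error_vs_condition_gap(19.9, 18.4, 15))  # ≈ 2.3 — the "~2x" in the footnote
```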