10

Politics, Probability & Statistics
Personal Blog

There's a lot of debate about how good the polls and 538 have been in this election in comparison to the betting markets. While it's hard to compare them by looking only at the probability each gave for a Biden win, it would be possible to calculate a Brier score across all US states. Did anybody do the math?
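For reference, here is a rough sketch of what a state-by-state Brier score comparison could look like. The state names and probabilities below are made-up placeholders, not real 538 forecasts or market prices, and `brier_score` is just an illustrative helper name. Lower is better.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# HYPOTHETICAL numbers for illustration only (not real forecasts or prices).
# Each value is a probability that Biden wins the placeholder state.
model_a = {"State A": 0.90, "State B": 0.70, "State C": 0.30}
model_b = {"State A": 0.80, "State B": 0.55, "State C": 0.45}
biden_won = {"State A": 1, "State B": 1, "State C": 0}

states = sorted(biden_won)
results = [biden_won[s] for s in states]
for name, model in [("model_a", model_a), ("model_b", model_b)]:
    score = brier_score([model[s] for s in states], results)
    print(f"{name}: Brier score = {score:.4f}")
```

Averaging over states like this treats each state as a separate observation, so it says nothing about how correlated the state outcomes were.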


2 Oscar_Cunningham 19d: Does it make sense to calculate the score like this for events that aren't independent? You no longer have the cool property that it doesn't matter how you chop up your observations. I think the correct thing to do would be to score the single probability that each model gave to this exact outcome. Equivalently you could add the scores for each state, but for each use the probabilities conditional on the states you've already scored. For 538 these probabilities are available via their interactive forecast [https://projects.fivethirtyeight.com/trump-biden-election-map/]. Otherwise you're counting the correlated part of the outcomes multiple times. So it's not surprising that The Economist does best overall, because they had the highest probability for a Biden win and that did in fact occur. EDIT: My suggested method has the nice property that if you score two perfectly correlated events then the second one always gives exactly 0 points.
1 JohnSteidley 19d: I think this comment would be better placed as a reply to the post that I'm linking. Perhaps you should put it there?
2 Oscar_Cunningham 19d: Done. [https://www.lesswrong.com/posts/muEjyyYbSMx23e2ga/scoring-2020-u-s-presidential-election-predictions?commentId=iibA9SicAbufWE6q5]
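Oscar_Cunningham's suggestion above can be illustrated with a small sketch. It assumes the logarithmic scoring rule (for which scoring the exact joint outcome and summing conditional scores are equivalent) and uses two toy joint distributions over two binary events. The distributions and the helper names `marginal` and `chained_log_scores` are invented for illustration, not taken from any real forecast.

```python
import math


def marginal(joint, index, value):
    """Probability that the event at position `index` equals `value`."""
    return sum(p for outcome, p in joint.items() if outcome[index] == value)


def chained_log_scores(joint, outcome):
    """Log-score events one at a time, conditioning on those already scored.

    `joint` maps outcome tuples to probabilities; the observed `outcome`
    must have nonzero probability.
    """
    scores = []
    prob_so_far = 1.0
    for i in range(len(outcome)):
        prefix = outcome[: i + 1]
        prob_prefix = sum(p for o, p in joint.items() if o[: i + 1] == prefix)
        scores.append(math.log(prob_prefix / prob_so_far))
        prob_so_far = prob_prefix
    return scores


# TOY joint distributions over two binary events (illustrative numbers only).
correlated = {(1, 1): 0.5, (1, 0): 0.1, (0, 1): 0.1, (0, 0): 0.3}
perfectly_correlated = {(1, 1): 0.7, (0, 0): 0.3}
observed = (1, 1)

chained = sum(chained_log_scores(correlated, observed))
joint_score = math.log(correlated[observed])
naive = sum(math.log(marginal(correlated, i, observed[i])) for i in range(2))

print(f"chained conditional scores: {chained:.4f}")
print(f"score of exact joint outcome: {joint_score:.4f}")
print(f"naive sum of marginal scores: {naive:.4f}")
print("second event, perfect correlation:",
      chained_log_scores(perfectly_correlated, observed)[1])
```

In the first toy distribution the chained sum matches the score of the exact outcome, while the naive sum of marginal scores differs from both. In the second, the perfectly correlated event contributes nothing once the first has been scored.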

steven0461

Nov 10, 2020

4

Looking at states still throws away information. Trump lost by slightly over a 0.6% margin in the states that he'd have needed to win. The polls were off by slightly under a 6% margin. If those numbers are correct, I don't see how your conclusion about the relative predictive power of 538 and betting markets can be very different from what your conclusion would be if Trump had narrowly won. Obviously if something almost happens, that's normally going to favor a model that assigned 35% to it happening over a model that assigned 10% to it happening. Both Nate Silver and Metaculus users seem to me to be in denial about this.

2 Rafael Harth 19d: I think this is a strawman. Nate Silver says that his model has good calibration across its lifetime, and is in fact slightly too conservative. I agree that, if the only two things you consider are (a) the probabilities for a Biden win in 2020, 65% and 89%, and (b) the margin of the win in 2020, then betting markets are a clear winner. But how much does that matter? (And the article you linked doesn't mention markets at all.)
4 steven0461 19d: My impression from Silver's internet writings is he hasn't admitted this, but maybe I'm wrong. I haven't seen him admit it and his claim that "we did a good job" suggests he's unwilling to. Betting markets are the clear winner if you look at Silver's predictions about how wrong polls would be, too. That was always the main point of contention. The line he's taking is "we said the polls might be this wrong and that Biden could still win", but obviously it's worse to say that the polls might be that wrong than to say that the polls probably would be that wrong (in that direction), as the markets implicitly did.
6 steven0461 19d: Data points come in one by one, so it's only natural to ask how each data point affects our estimates of how well different models are doing, separately from how much we trust different models in advance. A lot of the arguments that were made by people who disagreed with Silver were Trump-specific, anyway, making the long-term record less relevant. If we were observing the results of Scott's bets one by one, and Scott said something was 90% likely and a lot of other people said it was 60% likely, and then it didn't happen, I would totally be happy to say that Scott's model took a hit.
2 Rafael Harth 19d: I think that's the root of our disagreement. In this situation, I would not concede that Scott's model took a hit. Instead, I would claim that 90% was a better estimate than 60%, despite the prediction coming out false. (This is assuming that I already know Scott's overall calibration, which is the case for 538.) I think this point bottoms out at a pretty deep philosophical problem that we can't resolve in this comment thread. (I super want to write a post about it though.)
8 steven0461 18d: Yes, that looks like a crux. I guess I don't see the need to reason about calibration instead of directly about expected log score.
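To make the two positions in this exchange concrete, here is a small sketch using the 90% vs 60% hypothetical from the comments above. The function names are illustrative, and the "true probability" passed to `expected_log_score` is an assumption made for the sake of the example, since in practice it is unknown.

```python
import math


def log_score(p_event, happened):
    """Log of the probability the forecaster assigned to what actually occurred."""
    return math.log(p_event if happened else 1.0 - p_event)


def expected_log_score(p_forecast, p_true):
    """Expected log score of a forecast if the event truly has probability p_true."""
    return (p_true * math.log(p_forecast)
            + (1.0 - p_true) * math.log(1.0 - p_forecast))


# One observed data point: the event did NOT happen.
scott = log_score(0.90, happened=False)
others = log_score(0.60, happened=False)
likelihood_ratio = math.exp(others - scott)  # how strongly this one result favors 60%
print(f"Scott: {scott:.3f}, others: {others:.3f}, ratio: {likelihood_ratio:.2f}")

# ASSUMPTION for illustration: suppose the event really had a 90% chance.
print(f"expected score of 90% forecast: {expected_log_score(0.90, 0.90):.3f}")
print(f"expected score of 60% forecast: {expected_log_score(0.60, 0.90):.3f}")
```

The single miss favors the 60% forecast by a likelihood ratio of roughly 4 to 1, which is the sense in which the 90% forecaster "took a hit". If 90% were the true probability, the 90% forecast would still have the higher expected log score, which is closer to the calibration-based view.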