Last year, I looked at Scott's forecasts for 2021 and compared them to the market forecasts. Today I went through those forecasts (and Zvi's*, a buy/hold/sell exercise done on Scott's estimates), added the resolutions, and calculated a Brier score and a log-score.
Results were as follows:
So in summary, the "market" did about as well as Zvi, and both did better than Scott (albeit on a pretty small sample of 19 questions). Lower is better for both the Brier score and the log-score.
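For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of the two scores. It assumes each question is coded as an outcome `y` (1 if it happened, 0 if not) and a forecast `p` for it happening, and it writes the log-score as a mean negative log-likelihood so that, as with the Brier score, lower is better. The function names and toy numbers are illustrative only; they are not taken from the spreadsheet or the 19 questions.

```python
import math


def brier_score(forecasts, outcomes):
    """Mean squared error between probabilities and 0/1 outcomes (lower is better)."""
    return sum((p - y) ** 2 for p, y in zip(forecasts, outcomes)) / len(forecasts)


def log_score(forecasts, outcomes):
    """Mean negative log-likelihood of the outcomes (lower is better; 0 is perfect)."""
    total = 0.0
    for p, y in zip(forecasts, outcomes):
        # Probability the forecaster assigned to what actually happened.
        p_realised = p if y == 1 else 1 - p
        # math.log(0.0) raises ValueError, so a 0%/100% forecast that turns out
        # wrong is rejected here (in theory its log-score is infinite).
        total -= math.log(p_realised)
    return total / len(forecasts)


# Toy example with made-up numbers.
forecasts = [0.9, 0.2, 0.6]
outcomes = [1, 0, 0]
print(brier_score(forecasts, outcomes))
print(log_score(forecasts, outcomes))
```

Both functions average over questions, so scores from samples of different sizes stay comparable; a confidently wrong 0% or 100% forecast would need clipping before it could be scored this way.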
Full details can be found here
* I made a couple of assumptions when calculating Zvi's probabilities for things where he wasn't super explicit about his numbers. I will of course update these if asked.
OK, so I am obviously biased, but I'll look at whether I think this is fair.
First of all, I didn't look at market prices for a lot of the things (where I did, I mentioned it). If I had done this more, I would have done considerably better. Instead, I was saying whether I would trade on Scott's markets based on my current knowledge level. Does that count as predicting that number when comparing to the market? That's up to you to decide.
You could of course just say 'should have done the research.' You could also say 'I'm comparing your ability to predict to what a market would do, on arbitrary questions, so tough that you only had Scott's prediction' or something. Again, not my call?
Second of all, the procedure for deciding what I meant doesn't seem to match the way I was making predictions. In general, it would be fair to say that 'buy to X%' is actually saying 'it's at least X%', so my 'fair' must be higher than that, and the reverse for selling.
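The 'buy to X%' reading can be made concrete with a small sketch. `implied_bound` is a hypothetical helper, and the rule it encodes is only the interpretation described above: a buy to X% says the fair probability is at least X%, a sell to X% says it is at most X%, and (an added assumption) a hold is read as agreeing with the quoted number. It returns an interval rather than a single probability, which is why a single number for Brier scoring has to be chosen, and ideally stated, at prediction time.

```python
from typing import Tuple


def implied_bound(action: str, target: float, quoted: float) -> Tuple[float, float]:
    """Return the (low, high) range of 'fair' probabilities consistent with an action.

    Hypothetical encoding of the buy/hold/sell exercise:
    action: "buy", "sell" or "hold"
    target: the price traded to, e.g. 0.75 for "buy to 75%"
    quoted: the original forecast being traded against (only used for "hold")
    """
    if action == "buy":
        # Buying up to X% only makes sense if the fair price is at least X%.
        return (target, 1.0)
    if action == "sell":
        # Selling down to X% only makes sense if the fair price is at most X%.
        return (0.0, target)
    if action == "hold":
        # Assumption: holding means the quoted price looks about right.
        return (quoted, quoted)
    raise ValueError(f"unknown action: {action!r}")


# "Buy to 75%" against a quoted 70% only pins the fair value to [0.75, 1.0].
print(implied_bound("buy", 0.75, 0.70))
```

Any point inside the returned interval is consistent with the trade, so scoring the lower bound of a buy (or the upper bound of a sell) is the least favourable reading for the forecaster.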
But it's pretty bad to be doing this now, in hindsight; if we want to do Brier scores, we need to specify those numbers at the time.
So, e.g., for Biden's approval, 80% was a dumb prediction and I should have sold it down somewhat. But on Starlink I would strongly push back. Basic summary:
EDITED VERSION 4/27: I updated a lot on Scott being at 30% for this (i.e. 70% for this being recognized) in the original, and moved it to 50%. With Scott at 70% instead, we’re much closer, but I think I still want to nudge a little higher and buy this to 75%, instead of moving 30% to 50%. This is a sign of how much I’m reluctant to move a reasonable person’s odds in this type of exercise; if you’d asked me before seeing Scott’s number, I’d have said recognition is very unlikely, and put it at something like 85%-90%, and my true probability is still likely 80% or so.
I think when I say my 'true probability is 80% for not happening', you need to give me 20% for happening.
17. AstraZeneca: Probably would actually have been slightly lower having only seen Scott, but seeing the market would have undone that. 20% seems fine.
The big adjustment is that I took a big knock for the 50% on Q16, and that's just a misread; it should be 20%.
I'll let Simon decide what to do with the rest. I also find it super weird to be punished vs. the market for saying "this is the wrong price, do an arbitrage" in the correct direction, and making money even vs. market prices doing the trade, but hey.
So I'm a little worried we've used different sources for your forecasts, but to explain where we differ:
I think the major disagreement is that I've used your LW post when I should have used a different blog post. Would you mind linking me to the right one?
https://thezvi.wordpress.com/2021/04/27/scott-alexander-2021-predictions-buy-sell-hold/ is the canonical version. Surprised the differences were this big. The struggle of knowing when to update all versions is real, especially now that there are three of them.
Then beyond that your decisions seem fine.
And no need to apologize for doing the exercise, it's good to check things, long as it's clear what's being done.
When/if I do predictions for 2022 I'll see what I can do about also including explicit fairs (and ideally, where I'd call BS on a market, and where I wouldn't).
Yeah, this is definitely my bad. I didn't ask you (or Scott) whether or not you were happy with me comparing your comments to market forecasts. I apologise. I also didn't intend to make this as normative as it sounds. (FWIW, in the past I have gone to bat for your forecasting skills: given your forecast and a market forecast, most of the time I would expect to update away from the market and towards you.)
I do disagree that you should get "better than market" for some of the things where you would arbitrage Scott. If Scott were putting up money on his forecasts, then I would agree, but afaik his forecasts aren't there to be traded against.
Thanks for doing this!
I think the post should mention the fact that Zvi's forecasts were made after reading Scott's.
I'll add that note
It looks to me as though your computation of log scores in the Google Sheet is wrong, and it’s not just a sign error:
The correct log-score (Y log p + (1-Y) log (1-p), where Y is the outcome and p is the prediction) should be 0 for a perfect prediction (e.g. p=0 and the event didn’t happen) and should approach -infinity as the prediction becomes more and more confidently wrong. However, in the formula you used (-Y log (1-p) - (1-Y) log p), as we approach a perfect prediction, our score becomes infinitely large, whereas at the other extreme, the score is just 0. This can’t be a proper scoring rule, because the guesser would be incentivized to always predict p=0 or p=1.
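The incentive argument can be checked numerically. This sketch assumes a forecaster whose true belief is q = 0.7 (an illustrative value) and computes the expected score of reporting each p on a grid, under both formulas from the comment, using natural logs and treating higher as better for both. The correct rule is maximised at p = q, while the formula originally in the sheet keeps increasing toward the edge of the grid.

```python
import math


def correct_log_score(y: int, p: float) -> float:
    """Y log p + (1 - Y) log(1 - p): 0 for a perfect forecast, -> -inf when confidently wrong."""
    return y * math.log(p) + (1 - y) * math.log(1 - p)


def sheet_formula(y: int, p: float) -> float:
    """-Y log(1 - p) - (1 - Y) log p: the formula originally used in the sheet."""
    return -y * math.log(1 - p) - (1 - y) * math.log(p)


def expected(score, q: float, p: float) -> float:
    """Expected score of reporting p when the event truly happens with probability q."""
    return q * score(1, p) + (1 - q) * score(0, p)


q = 0.7  # assumed true belief, for illustration only
grid = [i / 100 for i in range(1, 100)]  # candidate reports 0.01 .. 0.99

# Report that maximises the expected score under each rule.
best_correct = max(grid, key=lambda p: expected(correct_log_score, q, p))
best_sheet = max(grid, key=lambda p: expected(sheet_formula, q, p))
print("correct rule is maximised at", best_correct)
print("sheet formula is maximised at", best_sheet)
```

Because the sheet formula's expected value is convex in p and unbounded at both ends, no interior report is ever optimal, which is exactly the failure of propriety described above.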
Thanks for flagging, fixed
Woop, great post and great thread!
If anyone can figure out how to format that table, I would appreciate it, thanks!
I tried to format tables on LW for a while, then gave up and started using images.