I'm kicking myself on #16 - I don't know enough about epidemiology to make such a strong guess.
Yeah, I did a similar thing on #38, where I was overconfident on an economy question that I don't know nearly enough about.
On #16 itself I was lower than I should have been because I was using "virus" as a reference class rather than "respiratory virus", which was an obvious mistake looking back at it.
It looks like you're using the correct formula, but maybe with a mistake about what the "p" in the formula means, so your scores on questions where the result was "false" are incorrect.
I think you maybe used ln(probability put on "true")-ln(.5) and then multiplied the result by -1 if the actual answer was false?
The formulation Scott used was ln(probability put on the correct answer) - ln(.5).
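To make the difference concrete, here is a minimal Python sketch of the two calculations. The function names and the 80% example are my own, purely for illustration, and aren't taken from anyone's sheet; `sign_flipped_score` is only my guess at what the spreadsheet might be doing.

```python
import math

def log_score(p_true: float, outcome: bool) -> float:
    """ln(probability put on the correct answer) - ln(0.5)."""
    # If the question resolved false, the probability on the
    # correct answer is 1 - p_true, not p_true.
    p_correct = p_true if outcome else 1.0 - p_true
    return math.log(p_correct) - math.log(0.5)

def sign_flipped_score(p_true: float, outcome: bool) -> float:
    """Suspected mistake: score the 'true' probability, negate if false."""
    score = math.log(p_true) - math.log(0.5)
    return score if outcome else -score

# Hypothetical example: 80% on "true", question resolves false.
print(round(log_score(0.8, False), 3))           # -0.916
print(round(sign_flipped_score(0.8, False), 3))  # -0.47
```

The two versions agree whenever the answer was true, and both give 0 for a 50% guess, so the discrepancy only shows up on the questions that resolved false.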
So for q3 for example the calculation shouldn't be
but should be
One for older / more interested kids - the Monty Hall problem.
I remember my uncle spending a long time going through this with me and having to actually run the scenario a few times for me to believe he was right!
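If running it by hand gets tedious, a quick simulation does the same job. This is just a sketch assuming the standard rules (three doors, the host knows where the car is and always opens a goat door you didn't pick); the function names are made up for the example.

```python
import random

def play(switch: bool, rng: random.Random) -> bool:
    """Play one round; return True if the contestant wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # Host opens a door that is neither the contestant's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Switch to the one remaining closed door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

def win_rate(switch: bool, trials: int = 100_000, seed: int = 0) -> float:
    """Fraction of rounds won over many simulated games."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials

print("stay:  ", win_rate(switch=False))
print("switch:", win_rate(switch=True))
```

Staying should come out near 1/3 and switching near 2/3, which is usually what it takes to convince someone.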
Welcome to the predictions fun!
I'm impressed with how little you put on 14 & 15; those were particularly good predictions IMO.
I think there might be an error on your calculation sheet - for instance your score for 3 should be the same as your score for 5?
Looking at the study, it doesn’t look like the participants in the trial were randomised - rather, if you wanted to use Taffix you could.
If I’m right, I’m not sure what to make of it - you could have selection bias either way. Either more conscientious/concerned people took it, or people whose jobs gave them higher exposure levels took it. I would guess the former effect would be larger, but I’m not sure.
I was going to write up my thoughts on this but it would be easier to just comment here.
I agree with your assessments for almost all of these. I was most impressed by your understanding of the politics in Q9 & 11 (China and Hydroxychloroquine) and by your predicting the lack of consensus for Q14 & 15.
A couple where I have a question:
1. On 6/7 (US highest toll official & unofficial) I had a bit more probability on Brazil (similar to India, more than China) – given large population (2/3rds US) and approach of the government.
Regarding official vs unofficial, you only mention deliberate lying but I had more expectation of insufficient / bad testing hiding true amounts than lying. According to WSJ Russia’s excess deaths are 4.8x higher than their official deaths (compared to 1.7x for US). This isn’t enough to overtake the US but I think this gives an idea of the scale of the potential problem. Mexico’s excess deaths are higher than Brazil’s despite having 35% fewer official cases. (India isn’t included in those numbers - excess deaths stats aren’t available I think).
Does that change your mind as to what a good prediction would have been?
2. On q17 (second wave) your prediction for p(17|16) is ~29%. Given that we are in a world where there is a general consensus that summer made things less bad, 29% seems low for a second wave even given the difficult operationalisation? My corresponding number was 50% which still seems better to me (although I messed up q16 so we actually predicted the same for 17 itself). In terms of which way it resolves, I think just numbers of deaths resolves this as clearly true (assuming by Autumn we mean 22 Sep – 21 Dec), both in terms of official result and intent:
Was there a second wave in Autumn? Yes, in late Autumn running into early Winter.
The problem is the notice given, which results in the low correlation you mention. (By audit I don't really mean financial audits, as I don't have experience of those - I'm more thinking of quality audits.)
I find it interesting that company audits (that I’ve experienced anyway) suffer from the same problem as Ofsted inspections.
It is perhaps worth noting that Ofsted inspections are nowadays done with a day's advance warning and can be done with no warning.
Yeah, I didn't actually answer q18 either (possibly knite used my list as a basis?) for exactly that reason. Scott just put me in as the same as him for that question for the purposes of making an apples-to-apples comparison, which seemed fine - no idea what I would have put if I had answered!