Lessons from covid predictions: Always use multiple models

17th Sep 2021

1JBlack

1pku


When in January did you make the first prediction? Near the start of January, 30% would have been too high given publicly available information. Even scattered non-public information only collated in retrospect wasn't enough to assign 30% chance to a pandemic at that stage.

Near the end of January, I agree that 30% would have been too low.

I think I was at about the same place for most of it, but unfortunately I didn't write that one down and can't go back and check :/.

## Epistemic status: This point isn't novel (it's one of the ten commandments of superforecasting, more or less), but it's often underappreciated. Crossposted here.

I went out of my way to make two explicit predictions about covid over the course of the pandemic. The first, back around January 2020, was a vague claim that internet people's worrying about the pandemic was probably overblown and it'd probably level off before it got global. The second one (dated March 12 2021) was more explicit:

My reasoning behind the first prediction was that we'd had pandemic scares every few years for a while (swine flu, SARS 1, Ebola, Zika) and they'd all fizzled out. So there's an argument that your prior should be that most pandemic scares fizzle out.

The first prediction, obviously, was wrong. The second was technically correct (the actual date of the median state making vaccines available was 4/19), but just barely, and thanks to an unprincipled extension of my error bounds (I'd run a spreadsheet with a few different scenarios for the rate of vaccine acceleration, then added a few days to each side to get my interval. The US was nowhere at only 0.63 SPP at the time.) I'd give myself a 0.5/2 here, which isn't a great track record.

Where did I go wrong?

My biggest mistake was only having one model for each prediction. With the first one, I didn't even have a confidence interval. With the second one, I accounted for uncertainty within the model, but not for the uncertainty from the model being off. We should always, always consider at least two models when making a prediction. Think of it as running a Monte Carlo simulation on the meta level - we run through the meta-level process of coming up with different models a few times, and that lets us improve accuracy (by averaging the models), and estimate our model uncertainty (by seeing how much they vary).
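A minimal sketch of that meta-level step in Python, assuming each model boils down to a single point estimate. The function name `combine_models` and the three probabilities are made up for illustration; they are not the estimates from this post.

```python
from statistics import mean, stdev


def combine_models(estimates):
    """Pool point estimates from several different models.

    Returns (pooled_estimate, model_spread). The spread is the sample
    standard deviation across models, a crude proxy for model uncertainty.
    """
    if len(estimates) < 2:
        raise ValueError("need at least two models to estimate model uncertainty")
    return mean(estimates), stdev(estimates)


# Hypothetical probabilities from three different models of the same question.
pooled, spread = combine_models([0.10, 0.30, 0.20])
print(f"pooled estimate: {pooled:.2f}, spread across models: {spread:.2f}")
```

With only one model the spread is undefined, which is the point: a single model gives you no handle on how wrong the model itself might be.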

What could I have done here? For the first prediction (whether or not covid would go global), I could have done an inside-view estimate. Look at infection rates, look at the error bars on how good preventative measures might be, try to guess how likely they were to fail. I probably (given what I knew at the time) would have ended up at somewhere around 30% chance of it going global - still low, but no longer crazy off.

For the second one, I could have come up with an inside-view model - try to estimate how much pent-up demand there was in special categories and see when we'd run through it, or compare US states to each other instead of just to a different country. It would have given a result closer to the truth, and would have let me estimate model uncertainty without resorting to an ad-hoc "eh let's just add a few days to fudge it".

(Can the multi-model approach fail? If my extra model for the first question was "go out to the street and ask someone if he had covid", it would have made me worse off. Doing the math, adding models fails if the within-model error is significantly larger than our model uncertainty. So our models do have to be at least reasonably good).
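One simplified way to see how an extra model can hurt, which is not necessarily the exact math meant above: assume two unbiased estimates with independent errors and average them with equal weights. The variance of the average is then (var_a + var_b) / 4, so the second model only helps if its error variance is less than three times that of the first. The function names below are hypothetical.

```python
def averaged_variance(var_a, var_b):
    """Error variance of the equal-weight average of two estimates.

    Assumes both estimates are unbiased and their errors are independent.
    """
    return (var_a + var_b) / 4.0


def second_model_helps(var_a, var_b):
    """True if averaging in model B gives lower error variance than model A alone."""
    return averaged_variance(var_a, var_b) < var_a


print(averaged_variance(1.0, 1.0))  # 0.5: an equally good second model halves the variance
print(averaged_variance(1.0, 4.0))  # 1.25: a much noisier second model makes things worse
```

Under these assumptions the break-even point is var_b = 3 * var_a. A smarter weighting (for example, inverse-variance weights) would never do worse than the better model alone, but it requires knowing how good each model is.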

Finally, on a more optimistic note - we did manage to learn from this going forward. When we ran through trying to estimate the infection risk at NYC solstice, we made the inside-view calculations for likely results based on microcovid's infection estimates and all that - but we also sanity-checked them by looking at results from similar events like Lollapalooza, and it helped to see that those results were similar to our estimates.