Brain-dumping some miscellaneous Covid-19 thoughts.
[if you’re skimming, this is the least interesting part of the post IMO]
In late March I wrote this big, long dramatic proclamation about information cascades and stuff.
Back then, it felt like the situation in the US was at an inflection point – at various kinds of inflection point – and I was feeling this particular combination of anxiety and passion about it. A do-or-die emotion: something was happening quickly, we had a limited window in which to think and act, and I wanted to do whatever I could to help. (“Whatever I could do” might be little or nothing, but no harm in trying, right?)
I felt like the intellectual resources around me were being under-applied – the quality of the discussion simply felt worse than the quality of many discussions I’d seen in the past, on less important and time-sensitive topics. I did my best to write a post urging “us” to do better, and I’m not sure I did very well.
In any event, those issues feel less pressing to me now. I don’t think I was wrong to worry about epistemically suspect consensus-forming, but right now the false appearance of a consensus no longer feels like such a salient obstacle to good decision-making. We’ve seen a lot of decisions made in the past month, and some of them have been bad, but the bad ones don’t reflect too much trust in a shaky “consensus,” they reflect some other failure mode.
Carl Bergstrom’s twitter continues to be my best source for Covid-19 news and analysis.
Bergstrom follows the academic work on Covid-19 pretty closely, generally discussing it before the press gets to it, and with a much higher level of intellectual sophistication while still being accessible to non-specialists.
He’s statistically and epistemically careful to an extent I’ve found uncommon even among scientists: he’s comfortable saying “I’m confused” when he’s confused, happily acknowledges his own past errors while leaving the evidence up for posterity, eloquently critiques flawed methodologies without acting like these critiques prove that his own preferred conclusions are 100% correct, etc.
I wish he’d start writing this great stuff down somewhere that’s easier to follow than twitter, but when I asked him about starting a blog he expressed a preference to stay with twitter.
I was actually thinking about doing a regular “Bergstrom digest” where I blog about what I’ve learned from his twitter, but I figured it’d be too much work to keep up. I imagine I’ll contribute more if I write up the same stuff in a freeform way when I feel like it, as I’m doing now.
So, if you’re following Covid-19 news, be sure to read his twitter regularly, if you aren’t already.
The Covid-19 projections by the IHME, AKA “the Chris Murray model,” are a hot topic right now.
- On the one hand, they have acquired a de facto “official” status.
CNN called it “the model that is often used by the White House.” In other news stories it’s regularly called “influential” or “prominent.” I see it discussed at work as though it’s simply “the” expert projection, full stop. StatNews wrote this about it:
The IHME projections were used by the Trump administration in developing national guidelines to mitigate the outbreak. Now, they are reportedly influencing White House thinking on how and when to “re-open” the country, as President Trump announced a blueprint for on Thursday.
I don’t know how much the IHME work is actually driving decision-making, but if anyone’s academic work is doing so, it’s the IHME’s.
- On the other hand, the IHME’s methodology is really flawed in a bunch of ways.
As far as I can tell, this isn’t a controversial opinion. The flaws weren’t subtle or carefully hidden: Bergstrom noticed some of them so fast he was able to write up his concerns the same day the paper was released. The same concerns, and others, have been echoed elsewhere, e.g.
Annals of Internal Medicine: “Caution Warranted: Using the Institute for Health Metrics and Evaluation Model for Predicting the Course of the COVID-19 Pandemic”
StatNews, Influential Covid-19 model uses flawed methods and shouldn’t guide U.S. policies, critics say
Marchant et al., an academic preprint doing retrospective model validation – found that >50% of actual outcomes have been outside the IHME’s 95% confidence bands (!!!)
Woody et al., an academic preprint that uses a similar overall methodology but corrects for one of the IHME’s mistakes (IHME did OLS fitting to cumulative data, then used uncertainty estimators that assume uncorrelated errors – this is stuff they tell you not to do in stats 101!!!)
More recent Bergstrom thread on a different flaw in the model
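To make the "OLS on cumulative data" complaint concrete, here is a toy simulation. It is not the IHME's actual fit (they fit a Gaussian CDF, not a straight line), and every name and number in it is my own assumption: a constant true daily rate, independent Gaussian noise, 30 days of data. The point is only that noise in a cumulative series is a running sum, so the errors are strongly correlated, and the textbook confidence interval (which assumes uncorrelated errors) comes out far too narrow.

```python
import numpy as np

def naive_slope_ci(t, y):
    """OLS straight-line fit, with the textbook 95% CI for the slope.

    The CI formula assumes the errors around the line are uncorrelated.
    """
    n = len(t)
    t_centered = t - t.mean()
    sxx = np.sum(t_centered ** 2)
    slope = np.sum(t_centered * (y - y.mean())) / sxx
    intercept = y.mean() - slope * t.mean()
    resid = y - (intercept + slope * t)
    se = np.sqrt(np.sum(resid ** 2) / (n - 2) / sxx)
    return slope - 1.96 * se, slope + 1.96 * se

def coverage(n_days=30, true_rate=100.0, noise_sd=10.0, n_sims=2000, seed=0):
    """Fraction of simulations in which the nominal 95% CI contains the truth."""
    rng = np.random.default_rng(seed)
    t = np.arange(1, n_days + 1, dtype=float)
    hits_cumulative = 0
    hits_daily = 0
    for _ in range(n_sims):
        daily = true_rate + rng.normal(0.0, noise_sd, size=n_days)

        # (a) Fit a line to the CUMULATIVE series; its slope estimates the daily rate.
        lo, hi = naive_slope_ci(t, np.cumsum(daily))
        hits_cumulative += int(lo <= true_rate <= hi)

        # (b) Estimate the same rate from the DAILY series, where errors really are independent.
        se = daily.std(ddof=1) / np.sqrt(n_days)
        hits_daily += int(abs(daily.mean() - true_rate) <= 1.96 * se)
    return hits_cumulative / n_sims, hits_daily / n_sims

if __name__ == "__main__":
    cov_cumulative, cov_daily = coverage()
    print(f"nominal 95% CI coverage, cumulative fit: {cov_cumulative:.2f}")
    print(f"nominal 95% CI coverage, daily data:     {cov_daily:.2f}")
```

In this toy setup the interval from the daily data covers the true rate close to the advertised 95% of the time, while the interval from the cumulative fit covers it much less often. That is the same flavor of failure as the Marchant et al. finding, though this sketch says nothing about its actual magnitude in the IHME model.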
I find this situation frustrating in a specific way I don’t know the right word for. The IHME model isn’t interestingly bad. It’s not intellectually contrarian, it’s just poorly executed. The government isn’t trusting a weird but coherent idea, they’re just trusting shoddy work.
And this makes me pessimistic about improving the situation. It’s easy to turn people against a particular model if you can articulate a specific way that the model is likely to misdirect our actions. “It’s biased in favor of zigging, but everything else says we should zag. Will we blindly follow this model off a cliff?” That’s the kind of argument you can imagine making its way into the news.
But the real objection to the IHME’s model isn’t like this. Because it’s shoddy work, it sometimes makes specific errors identifiable as such, and you can point to these. But this understates the case: the real concern is that trusting shoddy work will produce bad consequences in general, past and future, and the errors that have already been identified are just a subset.
I feel like there’s a more general point here. I care a lot about the IHME’s errors for the same reason I cared so much about Joscha Bach’s bad constant-area assumption. The issue isn’t whether or not these things render their specific conclusions invalid – it’s what it says about the quality of their thinking and methodology.
When someone makes a 101-level mistake and doesn’t seem to realize it, it breaks my trust in their overall competence – the sort of trust required in most nontrivial intellectual work, where methodology usually isn’t spelled out in utterly exact detail, and one is either willing to assume “they handled all the unmentioned stuff sensibly,” or one isn’t.
Quick notes on some of the IHME problems (IHME’s paper is here, n.b. the Supplemental Material is worth reading too):
They don’t use a dynamic model, they use curve-fitting to a Gaussian functional form. They fit these curves to death counts. (Technically, they fit a Gaussian CDF -- which looks sigmoid-like -- to cumulative deaths, and then recover a bell curve projection for daily deaths by taking the derivative of the fitted curve.)
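For readers who want to see the functional form, here is a minimal sketch of "Gaussian CDF for cumulative deaths, bell curve for daily deaths." The parameter names (`total`, `peak_day`, `width`) are my own labels, not the IHME's parametrization, and the numbers are made up.

```python
import math

def cumulative_deaths(t, total, peak_day, width):
    """Gaussian CDF scaled to `total`: a sigmoid-like curve in t."""
    z = (t - peak_day) / (width * math.sqrt(2.0))
    return total * 0.5 * (1.0 + math.erf(z))

def daily_deaths(t, total, peak_day, width):
    """Derivative of cumulative_deaths with respect to t: a symmetric bell curve."""
    z = (t - peak_day) / width
    return total * math.exp(-0.5 * z * z) / (width * math.sqrt(2.0 * math.pi))

if __name__ == "__main__":
    params = dict(total=1000.0, peak_day=30.0, width=8.0)  # illustrative values only
    h = 1e-4
    t = 20.0
    numeric = (cumulative_deaths(t + h, **params) - cumulative_deaths(t - h, **params)) / (2 * h)
    print(numeric, daily_deaths(t, **params))  # the two should agree closely
```

Note that three numbers pin down the entire curve, past and future. That is what makes the objections below bite.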
Objection 1. Curve fitting to a time series is a weird choice if you want to model something whose dynamics change over time as social distancing policies are imposed and lifted. IHME has a state-by-state model input that captures differences in when states implemented restrictions (collapsed down to 1 number), but it isn’t time-dependent, just state-dependent. So their model can learn that states with different policies will tend to have differently shaped or shifted curves overall – but it can’t modify the shape of the curves to reflect the impacts of restrictions when they happened.
Objection 2. Curve fitting produces misleading confidence bands.
Many people quickly noticed something weird about the IHME’s confidence bands: the model got more confident the further out in the future you looked.
How can that be possible? Well, uncertainty estimates from a curve fit aren’t about what will happen. They’re about what the curve looks like.
With a bell-shaped curve, it’s “harder” to move the tails of the curve around than to move the peak around – that is, you have to change the curve parameters more to make it happen. (Example: the distribution of human heights says very confidently that 100-foot people are extremely rare; you have to really shift or squash the curve to change that.)
To interpret these bands as uncertainty about the future, you’d need to model the world like this: reality will follow some Gaussian curve, plus noise. Our task is to figure out which curve we’re on, given some prior distribution over the curves. If the curves were a law of nature, and their parameters the unknown constants of this law, this would be exactly the right thing to do. But no one has this model of reality. The future is the accumulation of past effects; it does not simply trace out a pre-determined arc, except in science fiction or perhaps Thomism.
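Here is a rough illustration of why curve-fit bands shrink in the tail. I assume a made-up spread of plausible curve parameters (standing in for whatever uncertainty a fit produces; this is not the IHME's estimator), draw many bell curves from it, and measure the width of the middle 95% of projected daily deaths on a few days.

```python
import numpy as np

def daily_deaths(t, total, peak_day, width):
    """Bell curve for daily deaths (derivative of a Gaussian CDF scaled to `total`)."""
    z = (t - peak_day) / width
    return total * np.exp(-0.5 * z ** 2) / (width * np.sqrt(2.0 * np.pi))

def band_width(day, n_samples=5000, seed=0):
    """Width of the central 95% band of projected daily deaths on `day`.

    The parameter spreads below are hypothetical, chosen only for illustration.
    """
    rng = np.random.default_rng(seed)
    total = rng.normal(1000.0, 100.0, n_samples)
    peak_day = rng.normal(30.0, 3.0, n_samples)
    width = rng.normal(8.0, 0.8, n_samples)
    curves = daily_deaths(day, total, peak_day, width)
    lo, hi = np.percentile(curves, [2.5, 97.5])
    return hi - lo

if __name__ == "__main__":
    for day in (30, 50, 70):
        print(day, band_width(day))
```

Under these assumptions the band is widest near the peak and collapses toward zero far out in the tail, because every curve in the family goes to roughly zero there. The "confidence" is about the family of curves, not about the world.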
Objection 3. Curve symmetry causes perverse predictions.
Bergstrom brought this up recently. The use of a symmetric bell curve means the model will always predict a decline that exactly mirrors the ascent.
This creates problems when a curve has been successfully flattened and is being held at a roughly flat position. The functional form can’t accommodate that – it can’t make the peak wider without changing everything else – so it always notices what looks like a peak, and predicts an immediate decline. If you stay flat 1 more day, the model extends its estimated decline by 1 day. If you stay flat 7 more days, you get 7 more days on the other side. If you’re approximately flat, the model will always tell you tomorrow will look like yesterday, and 3 months from now will look like 3 months ago.
(Put another way, the model under-predicts its own future estimates, again and again and again.)
This can have the weird effect of pushing future estimates down in light of unexpectedly high current data: the latter makes the model update to an overall-steeper curve, which means a steeper descent on the other side of it.
(EDIT 4/20: wanted to clarify this point.
There are two different mechanisms that can cause the curve to decline back to zero: either R0 goes below 1 [i.e. a “suppressed” epidemic], or the % of the population still susceptible trends toward 0 [i.e. an “uncontrolled” epidemic reaching herd immunity and burning itself out].
If you see more cases than expected, that should lower your estimate of future % susceptible, and raise your estimate of future R0. That is, the epidemic is being less well controlled than you expected, so you should update towards more future spread and more future immunity.
In an uncontrolled epidemic, immunity is what makes the curve eventually decline, so in this case the model’s update would make sense. But the model isn’t modeling an uncontrolled epidemic -- if its projections actually happen, we’ll be way below herd immunity at the end.
So the decline seen in the model’s curves must be interpreted as a projection of successful “suppression,” with R0 below 1. But if it’s the lowered R0 that causes the decline, then the update doesn’t make sense: more cases than expected means higher R0 than expected, which means a less sharp decline than expected, not more.)
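The symmetry problem from Objection 3 can be reproduced with a toy fit. This is a sketch under my own assumptions, not the IHME's fitting procedure: daily deaths ramp up and then sit on a plateau, and I fit a bell curve by least squares on the log of the daily counts (the log of a Gaussian bell is a quadratic in time, so `np.polyfit` with degree 2 does the job). The series shape, the threshold, and all function names are hypothetical.

```python
import numpy as np

def plateau_series(flat_days, peak_level=100.0, ramp_days=20, ramp_width=6.0):
    """Daily deaths that rise along half a bell curve, then stay flat for `flat_days`."""
    t_ramp = np.arange(0, ramp_days + 1)
    ramp = peak_level * np.exp(-0.5 * ((t_ramp - ramp_days) / ramp_width) ** 2)
    flat = np.full(flat_days, peak_level)
    return np.concatenate([ramp, flat])

def fit_bell(daily):
    """Fit log(daily) = a*t^2 + b*t + c, i.e. a symmetric Gaussian bell in t."""
    t = np.arange(len(daily))
    a, b, c = np.polyfit(t, np.log(daily), 2)
    return a, b, c

def projected_peak_and_end(daily, threshold=1.0):
    """Fitted peak day, and the day the fitted curve falls back below `threshold`."""
    a, b, c = fit_bell(daily)
    peak_day = -b / (2.0 * a)
    log_peak = c - b * b / (4.0 * a)
    # Distance from the peak to the threshold is the same on both sides (symmetry).
    half_span = np.sqrt((np.log(threshold) - log_peak) / a)
    return peak_day, peak_day + half_span

if __name__ == "__main__":
    for flat_days in (0, 7, 14):
        peak_day, end_day = projected_peak_and_end(plateau_series(flat_days))
        print(flat_days, round(peak_day, 1), round(end_day, 1))
```

With no plateau the fit recovers the ramp's own bell exactly and projects a mirror-image descent. As the plateau gets longer, the fitted peak and the projected end date get pushed later, but at every stage the model is still projecting a decline shaped like the ascent it has already seen.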
This stuff has perverse implications for forecasts about when things end, which unfortunately IHME is playing up a lot – they’re reporting estimates of when each state will be able to lift restrictions (!) based on the curve dipping below some threshold. (Example)
EDIT 4/20: forgot to link this last night, but there’s a great website http://www.covid-projections.com/ that lets you see successive forecasts from the IHME on one axis. So you can evaluate for yourself how well the model updates over time.
I remain frustrated with the amount of arguing over whether we should do X or Y, where X and Y are ambiguous words which different parties define in conflicting ways.
FlattenTheCurve is still causing the same confusions. Sure, whatever, I’ve accepted that one. But in reading over some of the stuff I argued about in March, I’ve come to realize that a lot of other terms aren’t defined consistently even across academic work.
Mitigation and suppression
To Joscha Bach, “mitigation” meant the herd immunity strategy. Bergstrom took him to task for this, saying it wasn’t what it meant in the field.
But the Imperial College London papers (1, 2) also appear to mean “herd immunity” by “mitigation.” They form their “mitigation scenarios” by assuming a single peak with herd immunity at the end, and then computing the least-bad scenario consistent with those constraints.
When they come out in favor of “suppression” instead of “mitigation,” they are really saying that we must lower R0 far enough that we don’t have a plan to get herd immunity and are basically waiting for a vaccine, either under permanent restrictions or trigger-based on/off restrictions.
But the “mitigation” strategy imagined here seems like either a straw man, or possibly an accurate assessment of the bizarre bad idea they were trying to combat in the UK at that exact moment.
Even in the “mitigation scenarios,” some NPI is done. Indeed, the authors consider the same range of interventions as in the “suppression scenarios.” The difference is that, in “mitigation,” the policies are kept light enough that the virus still infects most of the population. Here are some stats from their second paper:
If mitigation including enhanced social distancing is pursued, for an R0 of 3.0, we estimate a maximum reduction in infections in the range […] These optimal reductions in transmission and burden were achieved with a range of reductions in the overall rate of social contact between 40.0%- 44.9% (median 43.9%) […]
We also explored the impact of more rigorous social distancing approaches aimed at immediate suppression of transmission. We looked at 6 suppression scenarios […] the effects of widespread transmission suppression were modelled as a uniform reduction in contact rates by 75%, applied across all age-groups
In other words, if you still want herd immunity at the end, you can ask people to reduce their social contact ~43% (which is a lot!), but not more. The “mitigation” strategy as imagined here is bizarre: you have to be open to non-trivial NPI, open to asking your population to nearly halve their social interaction, but not willing to go further – specifically because you want the whole population to get infected.
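Some back-of-envelope arithmetic helps show why ~43% and 75% land on opposite sides of the line. This uses two textbook simplifications that are my assumptions, not the ICL model (which is age-structured and much more detailed): the reproduction number scales in proportion to the contact rate, and the herd immunity threshold is 1 - 1/R0.

```python
def herd_immunity_threshold(r0):
    """Fraction of the population that must be immune for spread to decline (simple SIR result)."""
    return 1.0 - 1.0 / r0

def effective_r(r0, contact_reduction):
    """Reproduction number after cutting contacts, ASSUMING it scales linearly with contact rate."""
    return r0 * (1.0 - contact_reduction)

if __name__ == "__main__":
    r0 = 3.0  # the R0 used in the quoted passage
    print(round(herd_immunity_threshold(r0), 3))   # 0.667
    print(round(effective_r(r0, 0.439), 3))        # 1.683 ("mitigation" median)
    print(round(effective_r(r0, 0.75), 3))         # 0.75  ("suppression")
```

Under these assumptions, a ~44% contact reduction leaves the reproduction number well above 1, so the epidemic keeps growing until immunity builds up, while a 75% reduction pushes it below 1. That matches the qualitative split between the two kinds of scenario, though the exact numbers should not be read as the ICL's.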
Meanwhile, I’ve seen other academic sources use “mitigation” in closer to Bergstrom’s sense, as a general term for NPI and any other measures that slow the spread. (That paper also uses “flatten” in this same generic way.)
When Bach writes “containment,” he seems to mean the thing called “suppression” by ICL. (I.e. the good thing everyone wants, where you impose measures and don’t mysteriously stop them short of what would curtail herd immunity.)
When ICL write “containment,” they appear to mean something different. Among their suppression scenarios, they compare one confusingly labelled “suppression” to another labelled “containment” – see their Fig. 3 in the 3/16 paper. The difference is that, among interventions, “containment” lacks school closure but adds household quarantine. This agrees with the intuitive meaning of “containment,” but differs both from Bach’s usage and from Bergstrom’s, which is different again.
I have no idea what “lockdown” means. Apparently I’m in one right now? To Bach, it appears to mean (at least) city-level travel restrictions, a key component of Bach!Containment but not considered by ICL or other academics I’ve read.
While trying to Google this, I found this Vox article, which, well:
“The term ‘lock-down’ isn’t a technical term used by public health officials or lawyers,” Lindsay Wiley, a health law professor at the Washington College of Law, said in an email. “It could be used to refer to anything from mandatory geographic quarantine (which would probably be unconstitutional under most scenarios in the US), to non-mandatory recommendations to shelter in place (which are totally legal and can be issued by health officials at the federal, state, or local level), to anything in between (e.g. ordering certain events or types of businesses to close, which is generally constitutional if deemed necessary to stop the spread of disease based on available evidence).”
“The hammer” and “the dance” probably mean something, but I cite all of the above as justification for my preemptive wariness about getting into any kind of argument about “whether we should do the hammer,” or who’s currently “doing the dance.”