What do people think of this preprint from March 13th?

It suggests:

  • R0=~5 in Wuhan in January (pre-containment measures)
  • Infection fatality rate=~0.1% (several orders of magnitude smaller than the crude CFR estimated at 4.19%)
  • ~2 million infections in Wuhan on Jan 23rd / ~20% of people infected

The authors are very reputable (Google Scholar profiles for the first author and the senior author; also quoted in the NYT).

If this is true, might there be many more (asymptomatic) cases everywhere now than people think?

[Reddit thread]

From paper:

"Recently more evidence suggests that a substantial fraction of the infected individuals with the novel coronavirus show little if any symptoms, which suggest the need to reassess the transmission potential of this emerging disease"



Like others, I doubt the infection and fatality rates because of South Korea and the Diamond Princess (if the authors knew how much this result conflicts with those datasets, then it's up to them to argue why the new paper is better).

R0=5 isn't completely unbelievable. If the doubling time without containment measures is 2 days and the infective period is 12 days (i.e. 5 days incubation period and a week afterwards) then R0=5. Unfortunately, based on the rather unbelievable infection and fatality rates, I don't think this paper really adds any evidence for this - it suggests the model is fatally flawed.
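One way to reproduce the R0=5 figure is the textbook SIR relation between early exponential growth and R0. This is only a sketch: the relation R0 = 1 + r·D and the choice to treat the full 12 days as the infectious period are assumptions made here for illustration, and the function names are hypothetical. The comment does not say which formula was actually used.

```python
import math


def growth_rate(doubling_time_days: float) -> float:
    """Exponential growth rate r (per day) implied by a doubling time."""
    return math.log(2) / doubling_time_days


def r0_sir(doubling_time_days: float, infectious_period_days: float) -> float:
    """R0 under the assumed SIR approximation R0 = 1 + r * D."""
    return 1.0 + growth_rate(doubling_time_days) * infectious_period_days


# Inputs taken from the comment: 2-day doubling time, 12-day infective period.
r0_estimate = r0_sir(doubling_time_days=2.0, infectious_period_days=12.0)
print(f"R0 is roughly {r0_estimate:.1f}")
```

Under these assumptions the result lands close to 5. A shorter infectious period or a slower doubling time pulls it down quickly, so the estimate is only as good as those two inputs.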

True, but the Diamond Princess is full of oldies, and, despite South Korea's massive testing, there might be selection bias - I guess people would only get tested if they had some symptom or contact with other infected persons (perhaps you're referring to a more specific study?). Notice that, if the Science study claiming 86% of the cases in Wuhan were undocumented were right, this would already imply a fatality rate of about 0.6%, below South Korea estimates.

Yet, I agree the fatality rate is surprisingly low, and it's just a statistical model.
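The "about 0.6%" figure above can be checked with a quick sketch. It assumes the starting point is the crude 4.19% CFR from the question and that every death was documented, so only the denominator grows. Both are assumptions made here, and the function names are hypothetical.

```python
def ifr_from_crude_cfr(crude_cfr: float, undocumented_share: float) -> float:
    """Scale a crude CFR by the share of infections that were documented.

    Assumes all deaths are documented, so only the denominator changes.
    """
    documented_share = 1.0 - undocumented_share
    return crude_cfr * documented_share


def undocumented_per_documented(undocumented_share: float) -> float:
    """Number of undocumented infections per documented one."""
    return undocumented_share / (1.0 - undocumented_share)


crude_cfr = 0.0419         # crude CFR quoted in the question (4.19%)
undocumented_share = 0.86  # undocumented share attributed to the Science study

implied_ifr = ifr_from_crude_cfr(crude_cfr, undocumented_share)
ratio = undocumented_per_documented(undocumented_share)
print(f"implied IFR: {implied_ifr:.2%}")
print(f"undocumented:documented is about {ratio:.1f}:1")
```

The same 86% share corresponds to roughly six undocumented infections per documented one.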

The Diamond Princess is important because they did 100% testing, so it gives us an idea of the asymptomatic : symptomatic ratio. The result was roughly 1:1, nothing like 50:1 or whatever this paper suggests. The Science study with 6:1 is at least plausible if you account for symptomatics who weren't identified.

If South Korea hadn't managed to test the majority of their cases then it is unlikely that they would have managed to reduce their infection rate so dramatically - their quarantine measures aren't massively strict although I think the population are self-enforcing good practice pretty well. I doubt that Wuhan death rates could be below South Korean rates due to the acknowledged overcrowding in Wuhan. Again, 0.6% is kind of plausible, the model here (0.1%) isn't.

3 Ramiro P. 3y
I'm sorry, I'm not sure if I understood the relevance of asymptomatic : symptomatic ratio here. I think what's at stake in this article is the ratio undocumented : documented cases; it'll include not only asymptomatic, pre-symptomatic or mildly symptomatic people, but people who got really sick but couldn't be tested until Hubei had largely improved their testing capabilities. I do think a 50:1 rate is surprising, though not impossible. If 50% of the cases in South Korea are asymptomatic and so don't get tested, their true death rate would be ~0.4-0.5%; if you add people who got sick before their testing capability was improved, etc., it may be lower. But again, I really prefer to be pessimistic in my death rates.
If there is a 1:1 symptomatic:asymptomatic ratio and 2,000,000 odd infections then there are 1,000,000 symptomatic people out there and only 40,000 identified. Of that 1,000,000 we expect 200,000 to require hospitalisation and 50,000 to require ICU. If this were true I would expect someone to have noticed. There might be another explanation for the figures that I’m missing but, as I said, I think it’s up to them to explain what they think is going on.
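The arithmetic in that comment can be laid out as a short sketch. The 20% hospitalisation and 5% ICU rates are not stated directly. They are inferred here from the 200,000 and 50,000 figures, so treat them as assumptions.

```python
total_infections = 2_000_000  # preprint's estimate, rounded as in the comment
symptomatic_share = 0.5       # 1:1 symptomatic:asymptomatic ratio
identified_cases = 40_000     # identified cases figure used in the comment
hospitalisation_rate = 0.20   # assumption: implied by 200,000 of 1,000,000
icu_rate = 0.05               # assumption: implied by 50,000 of 1,000,000

symptomatic = round(total_infections * symptomatic_share)
hospitalised = round(symptomatic * hospitalisation_rate)
icu = round(symptomatic * icu_rate)
unidentified_symptomatic = symptomatic - identified_cases

print(f"symptomatic: {symptomatic:,}")
print(f"symptomatic but never identified: {unidentified_symptomatic:,}")
print(f"expected hospitalisations: {hospitalised:,}")
print(f"expected ICU admissions: {icu:,}")
```

The point of the comment is the gap between the expected hospital load and the identified case count: under these inputs, expected hospitalisations alone are several times larger than all identified cases.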
More points in favor of a higher IFR:

  • The percentage of asymptomatic cases on the Diamond Princess was even lower than 50%. It was only about 18% [https://www.eurosurveillance.org/content/10.2807/1560-7917.ES.2020.25.10.2000180]. (I trust this figure because the paper has author overlap with the paper that gave a higher figure initially, and it's written by the same author who made the 0.1% estimate, and we'd expect this person, if anything, to have a bias toward expecting a larger number of asymptomatic cases.)
  • About the age distribution on the Diamond Princess: I tried doing age adjustment for it here [https://www.lesswrong.com/posts/ACyGvQchWzGjGkKgS/coronavirus-open-thread?commentId=vrAnc3EghxprLD2py].

(Edited because I revised some estimates.)
2 Ramiro P. 3y
Maybe the CDC screwed up their data, but they say 46.5% of the Diamond Princess cases were asymptomatic when tested: https://www.cdc.gov/mmwr/volumes/69/wr/mm6912e3.htm?s_cid=mm6912e3_w I believe this might be a confusion between asymptomatic and pre/mildly symptomatic - but whatever: the claim at stake is that there's a ton of undocumented cases out there, not that they're asymptomatic.
They write "at the time of testing." The study I cite followed up with what happened to patients. Also relevant: In the last 5 days, 3 more people who had tested positive on the Diamond Princess died. And one person died two weeks ago but somehow it wasn't reported for a while. So while my own estimates were based on the assumption that 7 / 700 people died, it's now 11 / 700.
1 Ramiro P. 3y
I noticed the CDC claims 9 deaths from the Diamond Princess, but I didn't find support in their source. WHO is still counting 8 deaths. I guess you're right, but I'd appreciate it if you could provide the source.

I know that. If you follow this discussion up to the beginning, you'll see that all I'm claiming is that the number of documented cases has been affected by selection bias, because asymptomatic / pre-symptomatic etc. cases are unlikely to be diagnosed.

Finally, I believe we both agree the current IFR is underestimating the true death rate, because many patients are still fighting for their lives. Actually, the authors of the preprint are not complete morons and estimate the "time-delayed IFR" at 0.12% (which I agree is too low), and make the following remark to explain the higher mortality in Wuhan:

I'm not saying this study is right. I'm just saying that, unless someone points out a methodological flaw, "their conclusion is too different" is not a reason to discard it.
I read about the new deaths on the Wikipedia article [https://en.wikipedia.org/wiki/2020_coronavirus_pandemic_on_cruise_ships].

Okay. I feel like the discussion is sometimes a bit weird because the claim that there are a lot of undocumented cases is something that both sides (high IFR or low IFR) agree on. The question is how large that portion is. You're right to point to some sampling biases and so on, but the article under discussion estimates an IFR that is at least a factor of 5 below that of other studies, and a factor of 4 (or 3.5 respectively) below what I think are defensible lower bounds based on analysis of South Korea or the cruise ship. I don't think selection bias can explain this (at least not on the cruise ship; I agree that the hypothesis works for China's numbers but my point is that it conflicts with other things we know). (And I already tried to adjust for selection bias with my personal lower bounds.)

It depends on the reasoning. We have three data sets (there are more, but those three are the ones I'm most familiar with):

  • South Korea
  • The Diamond Princess
  • China

How much to count evidence from each data set depends on how much model uncertainty we have about the processes that generated the data, how fine-grained the reporting has been, and how large the sample sizes are. China is good on sample size but poor in every other respect. The cruise ship is poor on sample size but great in every other respect. South Korea is good in every respect. If I get lower bounds of 0.4% and 0.35% from the first two examples, and someone writes a new paper on China (where model uncertainty is by far highest) and gets a conclusion that is 16x lower than some other reputable previous estimates [https://www.medrxiv.org/content/10.1101/2020.03.04.20031104v1] (where BTW no one has pointed out a methodological flaw either so far), it doesn't matter whether I can find a flaw in the study design or not. The conclusion is too implausible compared to the pau

New editorial about the asymptomatic rate in Nature - the authors of the preprint above are featured in this as well. They say the asymptomatic and mild case rate might be up to 50% of all infections and that these people are infectious.

And another preprint saying there were more than 700k cases in China on 13th of March:

"Since severe cases, which more likely lead to fatal outcomes, are detected at a higher percentage than mild cases, the reported death rates are likely inflated in most countries. Such under-estimation can be attributed to under-sampling of infection cases and results in systematic death rate estimation biases. The method proposed here utilizes a benchmark country (South Korea) and its reported death rates in combination with population demographics to correct the reported CO... (read more)

From the paper: Note that South Korea's reported (naive) CFR is at >1% by now. It's possible that the authors adjusted for the fact that most of South Korea's cases were still active at the time of writing (about 55-60% of cases are still active now, I think), but I don't see this in this paper. It probably doesn't make a huge difference, but still relevant that this could cause the estimates to be a bit too low.
From the paper: Am I right that they're not factoring in that patients had worse prospects in Wuhan than in South Korea? I feel like whatever the outcome of their adjustment process, that value would need to be multiplied by a factor >1 which represents hospital overstrain in Hubei, where at least 60% of China's numbers stem from (probably more but I haven't looked it up). I don't know how large that adjustment should be exactly, but I find it weird that there's no discussion of this. Am I missing something about the methodology (maybe it factors in such differences automatically somehow)? Ah, OK: They list this as an assumption: This is important to keep in mind when we try to derive implications from their estimate. Especially if we look at the hospitalization rates estimated here [https://www.imperial.ac.uk/media/imperial-college/medicine/sph/ide/gida-fellowships/Imperial-College-COVID19-NPI-modelling-16-03-2020.pdf] on page 5. For this disease in particular, where people sometimes have to stay in hospitals for several weeks, it's hard to imagine that treatment only makes a small difference.

And yet another preprint estimating the R0 to be 26.5:

Quotes from paper:

"The size of the COVID-19 reproduction number documented in the literature is relatively small. Our estimates indicate that R0= 26.5, in the case that the asymptomatic sub-population is accounted for. In this scenario, the peek of symptomatic infections is reached in 36 days with approximately 9.5% of the entire population showing symptoms, as shown in Figure 3."

I think they estimate about 1 million severe cases in the US alone if left unchecked at the peak.

"It is unlike... (read more)

1 Hauke Hillebrandt 3y
From supplementary materials: "DISCLAIMER: The following estimates were computed using 2010 US Census data with 2016 population projections and the percentages of clinical cases and mortality events reported in Mainland China by the Chinese Center for Disease Control as of February 11th, 2020. CCDC Weekly / Vol. 2 / No. 8, page 115, Table 1. The following estimates represent a worst-case scenario, which is unlikely to materialize.

  • Maximum number of symptomatic cases = 34,653,921
  • Maximum number of mild cases = 28,035,022
  • Maximum number of severe cases = 4,782,241
  • Maximum number of critical cases = 1,628,734
  • Maximum number of deaths = 3,439,516"

https://drive.google.com/drive/folders/18qaRKnQG1GoXamnzJwkHu2GG9xCe4w8_

tl;dr: Someone wrote buggy R code and rushed a preprint out the door without proofreading or sanity checking the numbers.

The main claim of the paper is this:

The total number of estimated laboratory–confirmed cases (i.e. cumulative cases) is 18913 (95% CrI: 16444–19705) while the actual numbers of reported laboratory–confirmed cases during our study period is 19559 as of February 11th, 2020. Moreover, we inferred the total number of COVID-19 infections (Figure S1). Our results indicate that the total number of infections (i.e. cumulative infections) is 1905526 (95%CrI: 1350283– 2655936)

So, they conclude that less than 1% of cases were detected. They claim 95% confidence that no more than 1.5% of cases were detected. They combine this with the (unstated) assumption that 100% of deaths were detected and reported, and that therefore the IFR is two orders of magnitude lower than is commonly believed. This is an extraordinary claim, which the paper doesn't even really acknowledge; they just sort of throw numbers out and fail to mention that their numbers are wildly different from everyone else's. Their input data is

the daily series of laboratory–confirmed COVID-19 cases and deaths  in Wuhan City and epidemiological data of Japanese evacuees from Wuhan City on board government–chartered flights

This is not a dataset which is capable of supporting such a conclusion. On top of that, the paper has other major signals of low quality. The paper is riddled with typos. And there's this bit:

Serial interval estimates of COVID-19 were derived from previous studies of nCov, indicating that it follows a gamma distribution with the mean and SD at 7.5 and 3.4 days, respectively, based on [14]

In this post I collected estimates of COVID-19's serial interval. 7.5 days was the chronologically first published estimate, was the highest estimate, and was an outlier with small sample size. Strangely, reference [14] does not point to the paper which estimated 7.5 days; that's reference 21, whereas reference 14 points to this paper which makes no mention of the serial interval at all.

I was particularly bemused by quoting cumulative infections to 7 significant figures where the 95% confidence interval spanned a factor of 2. This did not fill me with confidence...
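The detection fractions and the width of the interval discussed above follow directly from the figures quoted from the paper. In this rough sketch, pairing the upper end of one credible interval with the lower end of the other is a crude bound chosen here for illustration. It is not how the paper's authors would compute a proper interval on the ratio.

```python
# Figures as quoted from the preprint in the comment above.
estimated_confirmed = 18_913
confirmed_cri = (16_444, 19_705)
total_infections = 1_905_526
infections_cri = (1_350_283, 2_655_936)

# Point estimate of the share of infections that were laboratory-confirmed.
detected_share = estimated_confirmed / total_infections

# Crude upper bound: most confirmed cases over fewest total infections.
detected_share_upper = confirmed_cri[1] / infections_cri[0]

# How many times larger the upper end of the infection interval is than the lower.
cri_span = infections_cri[1] / infections_cri[0]

print(f"detected share (point): {detected_share:.2%}")
print(f"detected share (crude upper bound): {detected_share_upper:.2%}")
print(f"infection interval spans a factor of {cri_span:.2f}")
```

With an interval this wide, only the first one or two digits of the point estimate carry information, which is the complaint about quoting seven significant figures.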

This suggests that South Korea missed about 90% of infections despite their extensive testing, which many have argued is responsible for their success at containment. This is so implausible that I'm hesitant to even look at the paper. But I bookmarked it and will report back if I find something interesting!

It's interesting though that with the swine flu pandemic, experts were initially alarmed about a somewhat high IFR, and later on it turned out that the vast majority of cases were extremely mild. The WHO got accused of "crying wolf" over swine flu even though it ended up infecting more than 11% of the planet, and killing more than a hundred thousand people according to this Wikipedia article. So it was really bad, but initially some experts feared it would be a lot worse.

Might something similar be going on with Covid-19? I'm pretty sure that the answer is no, but I thought it was interesting that there's a recent precedent for missing large numbers of milder cases.

I don't think it's the same with Covid-19 because:

  • Sneaky features of the disease, such as transmissibility prior to showing symptoms and the long incubation period, can already account for the R0 being high. The vast iceberg of completely asymptomatic cases is not needed to explain why this virus is so hard to contain.
  • Everything about South Korea's data points toward a high IFR even in conditions where hospitals still have capacity.
  • Only 18% of people on the cruise ship were asymptomatic (this is strong evidence against the vast iceberg of asymptomatic cases), and the ship had an IFR of 1%. After adjusting for age, it doesn't drop enough to go below 0.5%. In fact I'm not sure it drops substantially at all (I've seen different takes on this).
  • This study estimated an IFR of 1.6% for China's numbers (up to a certain point in February). It has been praised as "looks solid" by some knowledgeable EAs and I've yet to see someone criticize it in a direct way.
  • This is more of a system-1 argument than something I can put numbers to, but from reading all the reports from hospitals in Italy and the Seattle area, I find it really hard to square IFR estimates lower than 0.5% with those reports. Doctors are constantly and desperately trying to communicate that it's so much worse than everyone else seems to think. This virus really turns a lot of people's lung tissue into something called "ground glass." That sounds like it should be a lot more deadly than swine flu.

Counterarguments to my view:

  • There was this tweet two days ago by an Italian doctor who reported that it seems as though 50%+ of the people tested in Veneto (and they went from household to household to test almost everyone, apparently) seemed to be asymptomatic.
    • I don't think this counts for too much though, because it could also be that those people are still in the incubation period, or that the test has a somewhat high rate of false positives (I suspect that even 4% false positives would generate this sort of picture, but I'm not sure).
  • Maybe asymptomatic cases happen mostly in young people and children (there's some evidence of this in South Korea, though it's not so clear whether they mean "100% asymptomatic" or "mild symptoms"). There weren't that many young people or children on the Diamond Princess, so even though only 18% of people there were truly asymptomatic, this evidence might be consistent with the total rate of asymptomatic infections being at 50% for typical demographics.
    • That said, 50% of asymptomatic cases is nowhere near enough to get us down to a 0.1% IFR. The main update I could see happening is reducing other estimates by 25% or (maybe, if we stretch it) 50%, rather than by 70-90%.
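To see why a 50% asymptomatic share falls short of explaining a 0.1% IFR, here is a minimal sketch. It assumes, purely for illustration, that a baseline IFR estimate counted no asymptomatic infections at all, so that the missed share scales the estimate down directly. The function names are hypothetical. Since other estimates already include some mild and asymptomatic cases, this overstates how much they could drop.

```python
def adjusted_ifr(baseline_ifr: float, missed_share: float) -> float:
    """IFR after adding infections the baseline estimate never counted."""
    return baseline_ifr * (1.0 - missed_share)


def required_missed_share(baseline_ifr: float, target_ifr: float) -> float:
    """Share of all infections that must be missed to reach target_ifr."""
    return 1.0 - target_ifr / baseline_ifr


target = 0.001  # the preprint's roughly 0.1% IFR

for baseline in (0.016, 0.005):  # 1.6% China study, 0.5% lower bound above
    halved = adjusted_ifr(baseline, missed_share=0.5)
    needed = required_missed_share(baseline, target)
    print(
        f"baseline {baseline:.1%}: a 50% missed share gives {halved:.2%}; "
        f"reaching {target:.1%} needs {needed:.0%} missed"
    )
```

Even under this generous assumption, halving the estimates leaves them several times above 0.1%.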

IFR for the rest of 2020:

  • Unfortunately, I think we cannot directly rely on IFR estimates based on data from South Korea or the cruise ship to estimate the IFR for the rest of 2020. I expect the majority of 2020 cases to happen in places where hospitals will be overstrained. I think this is likely to roughly double the IFR compared to more favorable conditions. (Note that this consideration wouldn't necessarily double the estimated 1.6% by the study on China's numbers, because those already factor in hospital crowding to a substantial degree. Most diagnosed cases happened in Hubei.)

Ground glass opacity is named after its visual appearance on a CT scan. Information I can find (and I'm not a doctor, don't trust me at face value!) suggests that it's generally reversible and doesn't indicate any more severity than the pneumonia it's detecting.

The obvious question to reconcile the Diamond Princess and Veneto is: do the tests have subclinical thresholds, and if so are they different? I don't know where to begin researching that, though. (And as a more general concern, I worry the entire line of questioning might be overfitting; maybe there's some random reason that has nothing to do with the general pandemic.)

You're right re the "ground glass", it's describing what the lung looks like on imaging and is very non-specific. (Many etiologies and a long list of differential diagnoses).

A good article re ground-glass opacification and what might have caused it.

As mentioned in a comment above, one of the (pretty highly credentialed) authors of this preprint has written two papers on the Diamond Princess, and so, excuse the appeal to authority, but any argument against this paper based on the Diamond Princess doesn't seem likely to invalidate the conclusions of this preprint.

Also, this seemingly squares more with John Ioannidis's take on Corona:

"no countries have reliable data on the prevalence of the virus in a representative random sample of the general population."

And that airborne-ish transmission is highly likely.

Also, this seemingly squares more with John Ioannidis's take on Corona:

Ioannidis makes this claim:

Projecting the Diamond Princess mortality rate onto the age structure of the U.S. population, the death rate among people infected with Covid-19 would be 0.125%.

I can't find a source for this. The adjustments I saw looked different. If he's right about that 0.125%, that would be an important update!

But it feels more plausible to me that the 0.125% thing went wrong somewhere because it just seems ruled out by South Korea, which unlike European countries has their outbreak contained. I can't see how South Korea could somehow have missed 700% of their reported cases even though they are conducting 10,000 tests daily, and have fewer than 10,000 confirmed cases.
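The "700%" figure above can be reproduced with a short sketch. It assumes a naive South Korean CFR of roughly 1% (the comment does not state the exact value it used) and that all deaths were counted. The function name is hypothetical.

```python
def missed_cases_percent(naive_cfr: float, assumed_ifr: float) -> float:
    """Missed infections as a percentage of reported cases.

    Assumes deaths are fully counted, so the true number of infections
    is the reported count scaled by naive_cfr / assumed_ifr.
    """
    infections_per_reported_case = naive_cfr / assumed_ifr
    return (infections_per_reported_case - 1.0) * 100.0


naive_cfr = 0.01       # assumption: South Korea's naive CFR of roughly 1%
assumed_ifr = 0.00125  # the 0.125% figure attributed to Ioannidis

missed = missed_cases_percent(naive_cfr, assumed_ifr)
print(f"missed infections: about {missed:.0f}% of reported cases")
```

Put differently, a 0.125% IFR would require about eight true infections for every reported case in a country testing far more people than it has confirmed cases.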

UPDATE: I took a shot at doing the age adjustment myself here. The summary: I don't see how one can get anything below 0.3% and, adjusting for selection effects where the least healthy people probably avoid going on cruises, even going below 0.5% seems implausible to me.

UPDATE2: I adjusted my estimates after finding more precise data. I still think 0.125% is too low, but I think something like 0.2% is perhaps already defensible. This suggests that the estimate was closer than I thought, and I now consider the Diamond Princess not to be evidence in favor of an IFR of 0.5% or higher (assuming no hospital overstrain).

As mentioned in a comment above, one of the (pretty highly credentialed) authors of this preprint has written two papers on the Diamond Princess, and so, excuse the appeal to authority, but any argument against this paper based on the Diamond Princess doesn't seem likely to invalidate the conclusions of this preprint.

Interesting, I wasn't aware of that! Makes me upshift that I was wrong, but also upshift that one author is responsible for several studies that I found dubious.

I looked through his list of publications and it seems he finished 2 papers on the prevalence of asymptomatic cases on the Diamond Princess already (but not on fatality rates from there!). And the second one reports a point estimate that is outside the 95% confidence interval of the first paper, yet I don't see any addendum to the first paper. This seems kind of odd?

And that airborne-ish transmission is highly likely.

I don't have strong views on that. The only thing I feel confident about is that an IFR of below 0.5% seems extremely implausible.

The ~1% infection fatality rate on the Diamond Princess (where everyone was tested) is pretty solid evidence against this.

Not sure: the Diamond Princess is mentioned in this preprint and in fact one of the authors of this preprint wrote two papers on the Diamond Princess:


So I think they thought about this.

They don't mention Diamond Princess IFR estimates in their paper, though. In fact, the study doesn't cite other studies on IFR estimates for SARS-CoV-2 at all. I don't get what's going on when someone writes a paper with a conclusion that's 5-10x lower than all the other estimates before, but instead of including a discussion on why this might be the case or how it might fit with apparently contradictory data points (e.g., the cruise ship IFR or South Korea's IFR), they just move on to the next paper. Credentials or not, I find that process pretty dubious. I realize that there's an implicit hypothesis in the paper that "because transmission is stronger than we thought, others might have underestimated the number of mild or asymptomatic cases." Okay, but that hypothesis is contradicted by data points he must be aware of (as you say, he wrote papers on the cruise ship). Why is there no discussion on this?
