Making Sense of Coronavirus Stats

by jmh · 1 min read · 20th Feb 2020 · 28 comments



[Update: Some have pointed out that the mortality rate should be defined as deaths relative to some defined population, typically a mid-period estimate over an interval of time (week, month, year...), and not limited only to those infected. (New update: the ratios I have been considering are called the Case Fatality Rate.)

It was also correctly pointed out that my second calculation was simply wrong. The ratio should not be deaths/recovered but deaths/(deaths+recovered), that is, deaths relative to the total population considered.]

From what I've seen, WHO and other health organizations are saying the mortality rate for the new coronavirus outbreak is between 2 and 3 percent. That seems to be based on the ratio of reported deaths to reported cases of infection.

That doesn't seem right to me.

The last statistics I looked at (from a news report) were:

Total - 75,768

Recovered - 16,329 (21.6%)

Deaths - 2129 (2.8%)

However, that leaves us with a bit over 75% of cases with an unknown end state.

If I try to infer the outcome for all the reported infections, I can think of two ways to estimate the end result. One is to assume the remaining cases will produce outcomes similar to those observed so far. Under that assumption I can iterate through the unknown cases: in each round 21.6% recover, 2.8% die, and the remaining ~75% move on to the next round.

The other way would be to assume the ratio of the current deaths to current recoveries is a good measure of the mortality rate.

Using the first approach, total deaths approach 8,740 out of the current population of 75,768. The second approach gives a higher number, 9,879. Both numbers suggest this coronavirus is pretty bad in terms of mortality. In the context of the other two coronavirus outbreaks, it seems closer to SARS, though a bit worse, than to MERS.
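The arithmetic behind both estimates can be reproduced with a short sketch using the news-report figures above. The function names are illustrative only. It also includes the corrected second approach from the update, deaths/(deaths+recovered), to show that it lands on the same number as the first approach.

```python
# Figures from the news report quoted above.
TOTAL = 75_768
RECOVERED = 16_329
DEATHS = 2_129


def approach_one(total, recovered, deaths, rounds=200):
    """Resolve the open cases round by round using the observed split:
    each round ~21.6% of the still-open cases recover, ~2.8% die and the
    rest move on. The first round reproduces the deaths already observed."""
    recover_share = recovered / total
    death_share = deaths / total
    still_open = float(total)
    projected_deaths = 0.0
    for _ in range(rounds):
        projected_deaths += still_open * death_share
        still_open *= 1 - recover_share - death_share
    return projected_deaths


def approach_two_original(total, recovered, deaths):
    """Second approach as first written: scale by deaths/recovered."""
    return total * deaths / recovered


def approach_two_corrected(total, recovered, deaths):
    """Corrected second approach: scale by deaths/(deaths + recovered)."""
    return total * deaths / (deaths + recovered)


print(f"approach one:             {approach_one(TOTAL, RECOVERED, DEATHS):,.0f}")
print(f"approach two (original):  {approach_two_original(TOTAL, RECOVERED, DEATHS):,.0f}")
print(f"approach two (corrected): {approach_two_corrected(TOTAL, RECOVERED, DEATHS):,.0f}")
```

The loop is a geometric series, so it converges to total × deaths/(deaths + recovered), roughly 11.5% of cases. That is why the corrected second approach gives the same 8,739 as the first, while the original deaths/recovered version gives 9,879.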

I wonder whether this approach to estimating mortality rates for new diseases, which lack the long history we have for something like the flu, might be more accurate than the standard approach, which seems to be deaths/total infections. The standard approach would seem to systematically underestimate mortality, since early on one would expect infections to rise more rapidly than deaths.
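A toy simulation can illustrate why deaths/total infections runs low while cases are still growing. This is a sketch under made-up assumptions, not a model of COVID-19: daily new cases double every 5 days for 40 days and then stop, and a fixed 10% of each day's cases die exactly 14 days later. All parameter values are hypothetical.

```python
import math


def naive_cfr_by_day(true_cfr=0.10, doubling_days=5.0, lag_days=14,
                     growth_days=40, total_days=80):
    """Return cumulative deaths / cumulative cases for each simulated day.

    Hypothetical toy model: new cases double every `doubling_days` until
    day `growth_days`, then drop to zero; a `true_cfr` share of each day's
    new cases die exactly `lag_days` later.
    """
    growth_rate = math.log(2) / doubling_days
    new_cases = [math.exp(growth_rate * day) if day < growth_days else 0.0
                 for day in range(total_days)]
    cum_cases = 0.0
    cum_deaths = 0.0
    ratios = []
    for day in range(total_days):
        cum_cases += new_cases[day]
        if day >= lag_days:
            # Deaths reported today come from cases reported lag_days ago.
            cum_deaths += true_cfr * new_cases[day - lag_days]
        ratios.append(cum_deaths / cum_cases)
    return ratios


ratios = naive_cfr_by_day()
print(f"naive ratio on the last day of growth: {ratios[39]:.1%}")
print(f"naive ratio once every case has resolved: {ratios[-1]:.1%}")
```

With these settings the naive ratio is only about 1.4% on the last day of growth, against a true 10%, and it reaches 10% only once the last cases have resolved, which is the convergence described at the end of the post.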

The implication is that health organizations and bureaucracies might tend to be slow to react. The early numbers look like nothing to worry about, and might even show a decreasing mortality rate at first. Then, as the infected either succumb to the infection or recover, mortality rates start rising, producing greater concern and calls for action.

That seems to reflect the general response both from China and from WHO in many ways.

So as I've been writing this, I've come to wonder whether the way mortality rates are calculated might lead to poor bureaucratic responses, and whether using either of the above measures, rather than the standard one, might be better. At the end of the day, both measures will converge to the same number over time, as new daily, weekly, monthly or even annual data points become too small to really move the dial.




The whole situation surrounding the corona virus strikes me as a spectacular clusterfuck of global proportions.

I wouldn't put much confidence in any of those numbers. There's a whole bunch of factors that could skew an estimate of the mortality rate in either direction.

Off the top of my head, here are a few:

-The hospitals in Wuhan are completely overloaded, so:

    -the true mortality rate is going to be higher due to lack of intensive care for people who would otherwise pull through, creating a gap between the mortality rate in Wuhan and elsewhere.

    -only the worst cases are going to be dealt with in hospitals, skewing the reported mortality rate towards the higher end.

    -Wuhan is taking extreme containment measures. For instance, people are probably being welded into their rooms. God knows what happens to them and how their infections turn out.

-The spread of the virus is rapid and exponential, so total dead/total cases only gives a lower bound on mortality.

-It's faster to die than to recover, so you get the opposite effect from looking at deaths/(deaths + recovered).

-China seems to have a different method for reporting causes of deaths, leading to underestimation of the mortality rate.

-China has really bad air pollution, and there's weak evidence that smokers might be more susceptible to this disease. Men also smoke far more than women in China, and it's being reported that men are disproportionately affected by the disease.

-The CCP is probably not being very transparent, or is outright lying about the situation.

 -for instance, as I understand it, WHO officials have still not been allowed into Wuhan.

 -this is probably also happening internally, such that even CCP officials couldn't get a good grasp of the situation if they wanted to.

 -there are rumors of crematoriums operating 24/7 indicating that the real death rate is far higher than the reported death rate.

-Some paper also indicated (very) weak evidence of differences in susceptibility between Asians and other races, and at the moment I'm unaware of any confirmed deaths that aren't East Asian. Apparently 2 (now 4) Iranians have died from the virus.

-I've also heard rumours of widespread online censorship, evidenced by misspellings of #coronavirus trending on Twitter while the proper spelling does not, and by moderator and administrator actions on Reddit, further obscuring the underlying situation.

-There are rumours of possible reinfection, and even that the second time round might be worse, along with heavy censorship and firing of the people starting these rumours in China.

-Many countries are only screening people who have had direct contact with people coming in from China, and there's evidence of many asymptomatic or weakly symptomatic carriers. Thus, external death rates will also be difficult to estimate, as many unusual pneumonia deaths will not be attributed to the virus and the total number infected will not be known either.

-There's also evidence that the incubation period might in some cases be longer than 14 days. This skews estimates based on total infected vs total dead.

-The tests we are using are new and have false positive and negative rates, further skewing the numbers.

All these factors either add uncertainty or skew the numbers in one direction or the other, in a way that is both region and context dependent. When you put all of it together, you get mortality rate estimates ranging anywhere from 0.1% to over 15%, and who knows about the long-term effects of the disease.

Some paper also indicated (very) weak evidence of differences in susceptibility between Asians and other races, and at the moment I'm unaware of any confirmed deaths that aren't East Asian.

There are at least two Iranians now, and the thick of the spread has so far been in Asia, aside from the cruise ship, where we are only just starting to see deaths. I'm also pretty sure that data is confounded by the smoking effect, because I have seen a follow-up that drew on more tissue bank data and did not see ethnicity differences.

Was that the analysis that provided the information on smoking in China by gender? That would be consistent with the pattern in China of most deaths being older males with existing health weaknesses, particularly respiratory ones.

Iran is accelerating quickly, it seems: 4 deaths now and 13 new cases, in less than a day from the first report, I think.

I'm combining that analysis with another preprint that went into more extensive higher N tissue bank data and found no correlation of ACE2 expression with ethnicity or gender.

To top it off with Iran, we now have local authorities saying it's in many cities, and TWO confirmed international travelers who caught it in Iran over the last few weeks (in Canada and Lebanon). That is the smoking gun; I'm calling thousands of cases there as of now.

I'm starting to suspect I won't be getting to that conference this June...

Mortality rate estimation, at this stage, is very hard. The relevant problems:

  • Severe cases are more likely to be detected than minor cases (but we don't know how much more likely)
  • Nearly all of the quantitative data comes from China, which is underreporting both cases and deaths in an unknown ratio
  • Cases which end in death resolve faster than cases which end in recovery

On top of which:

  • No one knows the morbidity rate at all
  • There are unconfirmed rumors that recovery may not confer lasting immunity (this is a priori unlikely but would make the situation much worse)
  • The prior is a sample from a distribution which contained SARS, MERS, and a large but unknown number of variations of the common cold
  • The timeline for a cure could be anywhere from a month (chloroquine or remdesivir works) to never

The practical upshot of all of which is, the confidence intervals are all wide enough to drive a truck through.

There are unconfirmed rumors that recovery may not confer lasting immunity (this is a priori unlikely but would make the situation much worse)

There is more reason to think this could be true in this case than there would be for most viruses. A subset of common colds are caused by very distantly related coronaviruses. I can dig up the paper if I need to, but I found an analysis in which healthy volunteers were challenged with cold-causing coronaviruses they had been exposed to before; if the earlier exposure was more than one or two years ago, the virus was able to cause disease in half of those exposed.

There is also a phenomenon in some flaviviruses in which antibodies developed against a strain that is closely related to, but just different enough from, the one you are newly exposed to actually help the new strain gain access to your immune cells: the antibodies tag the virus without completely disabling it, and cells that consume antibody-tagged particles then engulf it. A related phenomenon occurs in the same flaviviruses when low titres of exactly matching antibodies cause the same effect. I have no idea whether the unconfirmed rumors of reinfection are to be believed, but if they are, I might wonder whether one of these two effects is in play, rather than acquired immunity crashing within a month. A potentially more likely explanation, to me, is that someone with very severe disease was recovering, then got a cold or flu on top of it and keeled over.

As for a 'cure', there is still no SARS vaccine after more than a decade of people trying (albeit not very hard, given its containment), in part because vaccine candidates frequently fail in animal studies: they have often actually made the disease *worse* upon infection by triggering immune overreactions. (Maybe via one of the above mechanisms?) Antiviral drugs are where I would put my money for genuinely new treatments, both the nucleoside analogs (remdesivir) that block the viral RNA polymerase and protease inhibitors (like those used against HIV) tweaked to fit better into this particular protease. Chloroquine is fascinating: its antiviral mechanisms are very broadly applicable in a way that is hard for a virus to evolve around, so it is likely to have some effect, though calling it a cure is probably pushing it.

I just read a news story suggesting that some recovered patients may still be contagious. That perhaps raises the question of recurring symptoms. Have you seen that, or do you have thoughts?

Believable, considering that people with the flu are often contagious for up to three or four days after they recover, and kids can be contagious for even longer.

I think this is one situation where we should value the LessWrong maxim of being aware when your level of knowledge is very limited. Hold onto your uncertainty.

As a long term investor I am very aware that statistics out of China are particularly unreliable. Even the hierarchy in China seems to have trouble getting a true picture. I saw an interview with one of the doctors in Wuhan where he related the extreme difficulties of getting people to report the bad news about what was happening up the management tree. Perhaps reflecting this, President Xi sent his man, Shanghai Mayor Ying Yong, to Wuhan, to get a first hand view.

Even on the official statistics, it appears COVID-19 is an order of magnitude more contagious than influenza and has a fatality rate also an order of magnitude worse. The fact that over 10% of the passengers on the Diamond Princess contracted the virus within a few weeks testifies to how contagious the disease is. There are troubling hints from China that deaths are far higher than admitted. The admitted COVID-19 deaths in Wuhan are no more than 10% of normal deaths in the city, yet cremation houses are said to be having trouble keeping up and are bringing in supplies and staff from outside to manage the workload.

I've also been following COVID-19 for investment reasons. Every study I've read indicates it is extremely contagious relative to the flu. This recent retrospective study indicates that prior to Feb 5th, R0 in China was between 4.7 and 6.6, with a doubling time of 2.4 days.
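To put the quoted 2.4-day doubling time in perspective, the arithmetic below converts it into a daily growth rate and a fold increase over a chosen period. It is only a sketch: it assumes growth stays exponential at that pre-February-5 rate, which China's later containment efforts have likely changed.

```python
import math

DOUBLING_DAYS = 2.4  # doubling time from the retrospective study cited above


def daily_growth_rate(doubling_days=DOUBLING_DAYS):
    """Continuous exponential growth rate per day: ln(2) / doubling time."""
    return math.log(2) / doubling_days


def fold_increase(days, doubling_days=DOUBLING_DAYS):
    """How many times larger case counts get after `days` of steady growth."""
    return 2 ** (days / doubling_days)


print(f"growth rate: {daily_growth_rate():.3f} per day")
print(f"cases multiply by {fold_increase(1):.2f} each day")
print(f"and by about {fold_increase(14):.0f} over two weeks")
```

At that pace, case counts grow by roughly a third each day and by a factor of about 57 over two weeks, which is why even a short reporting lag matters so much for any ratio built on case counts.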

However, since then China has made herculean efforts to stop the spread of the disease, and R0 has certainly plummeted. So I'm not sure what to think. I would imagine officially reported numbers from any country are going to be limited by testing. How many people going to the doctor with flu-like symptoms get a COVID-19 test? It sounds like no one except Korea, and maybe China, is testing for community-acquired CoV.

What I believe is that if other countries do not take similar measures to China, this thing is going to rapidly spread. From an investment perspective this has created the perfect setup for a short: if a sizable portion of the world's population gets infected, the global economy will greatly suffer. If countries take China-esque measures, the global economy will also suffer.

The rosiest outcome I can imagine is warm weather halts the spread of the disease, and then we get a vaccine ready by the time fall rolls around. It's worth noting I've been the "chicken little" among my investing peers.

One point of optimism is that everyone on the Diamond Princess was tested, and "only" 5% of passengers required serious medical care. This number is quite a bit lower than the 10-15% figure I often see thrown around.

Can you link to your source on cremation houses not being able to keep up? That's something I hadn't heard before. Thanks.

Two things to keep in mind here though.

First, it is not clear that the current testing is even catching all the cases. I think one of the researchers at China's CDC equivalent has said the test they have available identifies only 30 to 50 percent of cases, but is essentially 100% accurate when it does identify an infection. So I'm not sure we gain much by just testing everyone with potential symptoms (community-acquired cases). I suspect that depends on how plentiful the supply of test kits is.
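If the test really catches only 30 to 50 percent of infections with essentially no false positives, a rough correction is to divide the confirmed count by the detection rate. This sketch assumes every infected person was tested once, which is certainly optimistic, so it shows only the direction and rough size of the adjustment. The 75,768 figure is the confirmed total quoted in the original post.

```python
CONFIRMED = 75_768  # confirmed total quoted in the post


def implied_infections(confirmed, sensitivity):
    """Infections implied by `confirmed` positives if the test detects only a
    `sensitivity` share of true cases and gives no false positives.
    Assumes every infected person was tested, which is optimistic."""
    if not 0 < sensitivity <= 1:
        raise ValueError("sensitivity must be in (0, 1]")
    return confirmed / sensitivity


for sensitivity in (0.5, 0.3):
    estimate = implied_infections(CONFIRMED, sensitivity)
    print(f"sensitivity {sensitivity:.0%}: ~{estimate:,.0f} infections")
```

Any deaths-based ratio that uses confirmed cases as its denominator would shrink by the same factor of 2 to about 3.3, provided the death count itself is complete.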

The other thing: it sounds like this might be a virus for which no vaccine will ever be available. There is none for SARS or for HIV, and COVID-19 seems to be similar to both (it is clearly in the same family as SARS).

What I believe is that if other countries do not take similar measures to China, this thing is going to rapidly spread.

Would you say the measures taken in Italy and South Korea (particularly the lockdown of towns in Northern Italy) are sufficiently similar to China's?

The rosiest outcome I can imagine is warm weather halts the spread of the disease, and then we get a vaccine ready by the time fall rolls around.

I find that rather unlikely considering the virus' spread in warm regions like Singapore.

Death rates are not the only thing we should be worried about. SARS led to long-term problems for survivors:

Forty percent [of studied SARS survivors] reported some degree of chronic fatigue and 27 percent met diagnostic criteria for chronic fatigue syndrome; people with fatigue symptoms were also more likely than those without them to have psychiatric disorders. For comparison, far less than one percent of Americans met chronic fatigue syndrome criteria, according to the U.S. Centers for Disease Control and Prevention, although many more than that have symptoms.

It's important to know to what extent similar problems might appear with this coronavirus.

I certainly agree, but that information will only be known after a much longer delay than either the case fatality rate (which will initially be overestimated) or the infection rate (which will be underestimated). So it doesn't really help with how we should initially react to a new outbreak. It seems like we want to understand the data available early in order to assess the risks and therefore the policy actions. How we present the data (and I don't get to see what any of the big bureaucracies use) seems to matter. This may be because subject experts are the ones who actually generate the data, but non-experts have to understand the implications.

I would really like to see COVID-19 used as a case study for the Information Hazards theory.

It will be hard to be certain about the long-term effects and the likelihood of CFS, but at the same time it's quite possible to estimate the value even when the uncertainty is larger than we'd like.

From the CDC (the definition of "mortality rate"):

A mortality rate is a measure of the frequency of occurrence of death in a defined population during a specified interval.

i.e., it's not based on cases vs. deaths; it's population vs. deaths.

Your link says "defined population", which can be a lot of different things; for instance, look at the examples for calculating the number. But this just gets to my point: using that definition and taking the defined population to be those infected with the virus doesn't provide good information to anyone. If we then change that to a mid-period measurement of the total population over some interval, it is even worse.

[This comment is no longer endorsed by its author]

I think the numbers from China are basically completely meaningless other than signaling 'higher than they want you to know'.

As much as I hate to say it, the cruise ship that currently accounts for ~50% of the PCR-confirmed cases outside China is going to be a gold mine of information, as you KNOW where and when they were infected and can track their outcomes. It will be biased a bit towards the elderly.

Let's provisionally take the out-of-China data, as reported at the moment I am writing this, as true, given the contact-tracing efforts that are robust in large chunks of the world. I suspect a lot of low-symptom clusters are being missed, as evidenced by South Korea suddenly finding 50 people and Iran suddenly reporting 2 deaths out of nowhere. But it's a baseline for now: 1,095 confirmed and 11 dead, for a naive ratio of ~1%. In the case studies I have seen, though, most deaths seem to happen after ~3 weeks of illness; ~80% of cases in well-documented clusters resemble the flu and nothing worse for a week or so, while 20% go on to worse disease and risk mortality. So if we go back 2 weeks we get 227 confirmations and a naive ratio of 5%, and going back 3 weeks we get 110 and a naive ratio of 10%. A lot of the people who are seriously ill and dying now were not caught in that early period, though, so combined with the likely missing of low-symptom clusters, I think mortality estimates of 1% to 3% seem quite reasonable. There will be higher morbidity, and I have seen case studies of SARS survivors showing they often have chronic issues for the rest of their lives.
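The naive ratios in the comment above can be reproduced by dividing today's deaths by the confirmed count from some days earlier, on the idea that today's deaths come from cases confirmed weeks ago. The counts are the ones quoted in the comment; the dictionary layout and function name are just for illustration.

```python
DEATHS_NOW = 11
# Confirmed cases outside China, keyed by how many days ago they were counted.
CONFIRMED_DAYS_AGO = {0: 1095, 14: 227, 21: 110}


def lagged_ratio(deaths, confirmed_then):
    """Deaths today divided by the confirmed count from an earlier date."""
    return deaths / confirmed_then


for lag, confirmed in CONFIRMED_DAYS_AGO.items():
    print(f"{lag:>2} days back: {lagged_ratio(DEATHS_NOW, confirmed):.1%}")
```

The same calculation covers the look-back method suggested further down the thread (average time from diagnosis to death); only the choice of lag changes.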

The age structure of each population will also make different nations differently susceptible to death: from what I am seeing, death rates may not rise above 1% until people are in their forties or fifties, and they reach 15% for those in their eighties.

Anyway, I think the outside-of-China numbers are still low. Many nations are not applying molecular tests to people without travel history to hot spots, and will thus necessarily miss local transmission if the initial traveler was not caught. We just saw Iran suddenly report two deaths with no foreign contacts testing positive for the virus, and shut down a city of 1.5 million people. If we assume a death rate of 1% and a disease course of 2 weeks, then with the sheer infectivity of this disease I suspect there are several hundred infected in Iran alone.
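The Iran estimate is a back-calculation: if about 1% of infections are fatal and deaths come roughly two weeks after infection, each death points to about 100 infections two weeks earlier, and any spread since then only raises the count. A minimal sketch, where the function name is hypothetical and the doubling time is a free parameter rather than a measured value:

```python
def infections_behind_deaths(deaths, fatality_rate, course_days=14,
                             doubling_days=None):
    """Rough count of current infections implied by observed deaths.

    deaths / fatality_rate estimates how many people were infected
    `course_days` ago. If a (hypothetical) `doubling_days` is given, that
    figure is grown exponentially up to today.
    """
    infected_then = deaths / fatality_rate
    if doubling_days is None:
        return infected_then
    return infected_then * 2 ** (course_days / doubling_days)


print(infections_behind_deaths(2, 0.01))                   # no growth assumed
print(infections_behind_deaths(2, 0.01, doubling_days=7))  # hypothetical 7-day doubling
```

Without any growth the two deaths imply about 200 infections, and any ongoing spread pushes the figure higher, toward the several hundred suggested above.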

I think for approach 2 you want the ratio of deaths:(deaths+recoveries) rather than just deaths:recoveries. This means that the 2 approaches actually give the same result.

This method may lead to an overestimate as deaths occur earlier in the illness than declarations of recovery. If deaths occur at a constant rate between diagnosis and recovery then it would be more like half this value (depending on how the infection rate is changing). Looking at it like this gives 5.8 - 11.5% mortality.
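Using the figures from the original post, the two ends of that range can be computed as below. The halving is the commenter's rough adjustment for deaths arriving earlier than recoveries, not a formal correction.

```python
DEATHS = 2_129
RECOVERED = 16_329

resolved_ratio = DEATHS / (DEATHS + RECOVERED)  # share of closed cases that died
half_adjusted = resolved_ratio / 2              # rough adjustment for early deaths

print(f"{half_adjusted:.1%} - {resolved_ratio:.1%}")
```

This prints the 5.8% to 11.5% range quoted in the comment.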

An alternative method would look at the average time from diagnosis to death and look back at how many diagnoses there were that many days ago. For SARS/MERS this looks like 10-18 days, which would give death rates for COVID-19 of 4.4% - 12.4%, which is pleasingly similar to the above method but might not actually mean anything!

Yes, bad mistake there! Thanks.

Bad data equals incorrect answers.

China is lying about the statistics and everyone knows it. Their bad data makes any meaningful conclusions impossible.

Data always says something unless it's randomly generated. At the very least, Chinese data provides lower bounds on some things. You can get somewhat better estimates if you model their incentives (though the lying will greatly increase the uncertainty and complexity of any model).

A bad model that can be refined is better than no model, so point taken.

That being said, if China is on the one hand lying about this and on the other effectively implementing pest houses, martial law, and quarantining millions then I'd be inclined to ignore their words and look to their deeds to make a conclusion here.

There's some, but not a lot of, interest in this topic on LW. I have a mailing list on the topic with primarily rationalist types; PM me your email address to be added.

This report was the first report from official sources that made sense to me. Experts are estimating that a large portion of people infected with the virus only show mild symptoms, or are even asymptomatic. Those people are unlikely to get tested in hospitals. If we factor them in, this dilutes the CFR and, according to current estimates, it will likely more than cancel out the delay for confirmed deaths compared to confirmed infections.

Good find. Check Figure 2: it shows the expected numbers of deaths over time among patients identified by February 8, for different case fatality rates. Helpfully, the right edge of the graph is exactly today, February 20. There are currently 11 deaths, but several of those were not identified by February 8. I know for a fact that the 2 Iranian cases, the Korean case, and the Japanese case were not, but I am unable to determine how many of the others were not. Seven deaths would put the central tendency at ~5%, but that is definitely an overestimate, since some of the deaths are likely among newly identified cases and mild cases are being missed.

Now that you have a definition for mortality rate - time for an update in your post?

I presume more people will be reading so clarification would be valuable.

I could but don't think it matters.

First, most here (and I don't disagree) are saying the numbers are all incorrect anyhow so using a different calculation accomplishes nothing.

One is still left with the question of what the defined population should be. Moreover, I don't see why one cannot define the population to be those who are infected, so it is not clear to me that this is inconsistent with the definition. We should also ask whether there should be multiple defined populations, which would make quoting a single mortality rate largely vacuous. (Something I clearly did not address initially either.)

Even if it is not correct to call the numbers I generated a mortality rate, it seems sensible to have some sense of how dangerous the situation is, and the generic rate definition you linked to doesn't give much insight into that.