Quantifying Household Transmission of COVID-19

Owain_Evans

Overview

If someone in your household gets COVID-19, how likely are you to get infected? Is it possible to reduce this risk with interventions? How much of all transmission is between members of the same household? Is household transmission less bad because infections in the household don’t spread to the outside?

We (Mihaela Curmei, Andrew Ilyas, Jacob Steinhardt and Owain Evans) wrote an academic paper on these questions. Owain made an informal slide show with the same material. The full version (34 slides) is here, and this LW post contains some highlights.

Key Results

We show how to adjust previous estimates of household transmission to correct for inaccurate testing and selection bias. We pool existing data using a Bayesian meta-analysis and estimate the chance of being infected by an infected household member as 30% (95% CI 18%-43%). This probability is heterogeneous across studies, with a standard deviation of 15% (9%-27%). Household transmission was likely a small fraction of transmission before social distancing (5%-35%) but a large fraction (30%-55%) after. Our results and observational studies suggest household transmission can be reduced with behavioral interventions. It is uncertain how much infections in households spread to the outside, but we show this is related to the effectiveness of contact tracing.

Highlights from Slide Show

We consider two main ways of quantifying household transmission. The first is the intra-household effective reproductive number $R_{h}$. It is defined by decomposing the familiar effective reproductive number $R$ into the sum of a community reproductive number and an intra-household reproductive number. These reproductive numbers change over time due to behavioral interventions and reduced susceptibility. The second is the household secondary attack rate (SAR): the probability that an infected person $i$ infects a given member $j$ of their household. The SAR varies depending on the ages of $i$ and $j$ and on their relationship (spouses vs housemates). For our purposes, the SAR is an average over the rates for different groups.
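Written out, the decomposition and one simple way to connect $R_{h}$ to the SAR look like this. The symbols $R_{c}$ (community reproductive number) and $\bar{n}$ (average number of household members per primary case) are our own labels for illustration, not notation from the slides:

$$R(t) = R_{c}(t) + R_{h}(t), \qquad R_{h}(t) = \mathrm{SAR}(t) \times \bar{n}(t).$$

The second identity holds exactly if SAR is taken as the pooled fraction of household members who get infected (infected household members / all household members, while $\bar{n}$ = all household members / primary cases). When SAR is an average over groups with different rates, it is only an approximation.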

This diagram illustrates $R$, $R_{h}$, and SAR. At time $t$, there is a set of infected primary cases. Each has a set of contacts, some of whom become infected at time $t+1$. Infected contacts are shown in red. Household members of primary cases have a blue box around them. The topmost primary case has three household members and infects 1/3 of them. The middle primary case has one household member and doesn't infect them, and the bottom primary case has no household members. To compute $R_{h}$, we count the red nodes in blue boxes (infected household members) and divide by the number of primary cases; uninfected household members do not enter. Here $R_{h} = 1/3$. To compute the SAR, we take the fraction of all nodes in blue boxes that are red. Here SAR = 1/4.
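The two calculations can be reproduced on a toy version of the diagram. This is a minimal sketch under one assumption: the topmost primary case has three household members (one infected), which is the configuration that gives both $R_{h} = 1/3$ and SAR = 1/4. The function names are hypothetical.

```python
from fractions import Fraction

# Each inner list is one primary case's household; True marks an infected member.
# Assumed layout: top case infects 1 of 3, middle case 0 of 1, bottom case has none.
households = [
    [True, False, False],  # topmost primary case
    [False],               # middle primary case
    [],                    # bottom primary case (no household members)
]


def intra_household_r(households):
    """R_h: infected household members per primary case."""
    infected = sum(sum(members) for members in households)
    return Fraction(infected, len(households))


def household_sar(households):
    """SAR: fraction of all household members who become infected."""
    infected = sum(sum(members) for members in households)
    exposed = sum(len(members) for members in households)
    return Fraction(infected, exposed)


print(intra_household_r(households))  # 1/3
print(household_sar(households))      # 1/4
```

The two quantities share a numerator and differ only in the denominator: primary cases for $R_{h}$, exposed household members for SAR.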

The empirical studies of SAR are based on government contact tracing data. They found primary cases based on symptoms or travel history and PCR testing and then investigated whether their household members were infected.

The studies aren't as rigorous as we would hope. Some studies didn't test asymptomatic household members, and all studies used RT-PCR tests, which have a high false-negative rate. However, some sources of bias can be adjusted for statistically.

PCR testing has a high false-negative rate (equivalently, low sensitivity). These graphs come from Kucirka et al. [7]. In the first few days after infection, an infected person is unlikely to test positive. Even during the 10 days after typical symptom onset (days 5-15), the mean false-negative rate is more than 17% (different papers give different estimates [8]).
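To see why a high false-negative rate biases SAR downward: if tests never give false positives and detect an infection with probability $1 - \mathrm{FNR}$, the observed attack rate is the true SAR times $1 - \mathrm{FNR}$. A minimal point-estimate sketch under that assumption (the function name and the example inputs are hypothetical; a real analysis should treat FNR as uncertain rather than fixed):

```python
def correct_for_false_negatives(observed_sar: float, fnr: float) -> float:
    """Invert observed = true * (1 - fnr), assuming no false positives."""
    if not 0.0 <= fnr < 1.0:
        raise ValueError("fnr must be in [0, 1)")
    return observed_sar / (1.0 - fnr)


# Hypothetical example: an observed SAR of 20% with a 17% false-negative rate.
print(round(correct_for_false_negatives(0.20, 0.17), 3))  # 0.241
```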

PSA: The false-negative rate for PCR tests may be lower (or higher) at your local test center. However, these graphs are based mainly on results from China in the spring, which is where most of our SAR data come from.

We did a Bayesian meta-analysis of the nine SAR studies [1], [3], [4], [9]–[14]. The model corrects the original estimates of SAR for false negatives (for all studies) and for the failure to test asymptomatics (in some studies). In the model, the household SAR for study $i$ is generated from a Beta distribution with a flat (improper) prior on its parameters. The precise false-negative rate $\mathrm{FNR}_{i}$ and asymptomatic rate $\mathrm{AR}$ are unknown and so we sample them from priors based on existing estimates. This model allows us to estimate heterogeneity in SAR across studies and to pool data.
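The core of the correction for a single study can be sketched as follows. This is a simplified illustration, not the paper's model: it handles one study only (no Beta distribution pooling across studies), puts a uniform prior on that study's SAR, assumes no false positives, and integrates out the false-negative rate by averaging the likelihood over draws from a hypothetical Beta(2, 8) prior. The `untested_asymptomatic` parameter (our name) stands in for the asymptomatic rate in studies that did not test asymptomatic contacts. The posterior is evaluated on a grid rather than by MCMC.

```python
import math
import random


def log_binomial_kernel(k, n, p):
    """Binomial log-likelihood of k positives among n contacts, up to a constant."""
    return k * math.log(p) + (n - k) * math.log1p(-p)


def posterior_mean_sar(k, n, fnr_samples, untested_asymptomatic=0.0, grid_size=200):
    """Posterior mean of one study's true household SAR.

    Illustrative assumptions (ours, not the paper's): a uniform prior on SAR,
    no false positives, and each infected contact detected with probability
    (1 - FNR) * (1 - untested_asymptomatic). FNR is integrated out by
    averaging the likelihood over the prior draws in fnr_samples.
    """
    grid = [(i + 0.5) / grid_size for i in range(grid_size)]  # SAR values in (0, 1)
    log_marginal = []
    for sar in grid:
        logs = [
            log_binomial_kernel(k, n, sar * (1 - f) * (1 - untested_asymptomatic))
            for f in fnr_samples
        ]
        top = max(logs)
        # log of the mean likelihood over FNR draws, via log-sum-exp for stability
        log_marginal.append(top + math.log(sum(math.exp(x - top) for x in logs) / len(logs)))
    peak = max(log_marginal)
    weights = [math.exp(x - peak) for x in log_marginal]
    return sum(s * w for s, w in zip(grid, weights)) / sum(weights)


random.seed(0)
# Hypothetical FNR prior Beta(2, 8) (mean 0.2); not the prior used in the paper.
fnr_draws = [random.betavariate(2, 8) for _ in range(500)]
# Hypothetical study: 20 of 100 household contacts tested positive.
print(posterior_mean_sar(20, 100, [0.0]))      # no correction: about 21/102
print(posterior_mean_sar(20, 100, fnr_draws))  # corrected for false negatives
```

Averaging the likelihood over prior draws, rather than plugging in a single FNR, is what lets uncertainty about test accuracy widen the posterior for SAR.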

The results show that correcting for false negatives and asymptomatics has a substantial effect: the mean SAR estimate increased from 20% to 30% (second to last row). It’s also clear that SAR is heterogeneous across studies, with some 95% credible intervals not overlapping. Part of this heterogeneity is likely due to false negatives and asymptomatics (which we model but do not observe for each study). Another source of heterogeneity is the actions taken by households in different locations. There is evidence that early isolation of symptomatic family members and PPE used at home can reduce SAR.
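To get a feel for how wide this heterogeneity is, the pooled mean of 30% and standard deviation of 15% reported above can be matched to a single Beta distribution, using mean $= a/(a+b)$ and variance $= \text{mean}(1-\text{mean})/(a+b+1)$. This moment-matching step is our illustration only; the paper's posterior also carries uncertainty about both parameters. The function name is hypothetical.

```python
def beta_from_mean_sd(mean, sd):
    """Return (a, b) of the Beta distribution with the given mean and SD."""
    var = sd ** 2
    if not 0 < mean < 1 or var >= mean * (1 - mean):
        raise ValueError("no Beta distribution has this mean and SD")
    concentration = mean * (1 - mean) / var - 1  # this is a + b
    return mean * concentration, (1 - mean) * concentration


a, b = beta_from_mean_sd(0.30, 0.15)
print(round(a, 2), round(b, 2))  # 2.5 5.83
```

A Beta distribution with such a small total concentration ($a + b \approx 8.3$) is broad, so individual studies can plausibly land far from the 30% mean.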

Our results are quite uncertain. The 95% credible interval around the mean of the SAR distribution is 18%-43%. Better priors on the false-negative rate and the asymptomatic rate would lead to more accurate estimates of SAR. We do not adjust for the lack of asymptomatics among primary cases. My guess is that asymptomatics are under-sampled as primary cases and that they are less infectious. (At the same time, their lack of symptoms means that household members will not take any precautions.) Adjusting for the lack of asymptomatics would revise the SAR estimate down, but probably not by a large amount. Future work drawing on better studies of false negatives, asymptomatics, and the NPIs that reduce SAR could combine all of these and model the SAR more accurately.

You might be concerned that the studies from China, South Korea and Taiwan are not representative of the rest of the world. Maybe the SAR in these countries is lower than in Europe or the US. Another issue (raised above) is the lack of asymptomatics among primary or secondary cases. We address both of these issues using data from European studies (in Germany and Italy) that did random population testing. We find that the results are broadly consistent with the SAR estimates derived from the East Asian studies. See the full slide show for details.

We can compare our estimate for the household SAR of SARS-CoV-2 with estimates for related viruses. The SAR is correlated with the basic reproductive number $R_{0}$. The $R_{0}$ numbers are taken from Wikipedia, and the SAR estimates are taken from these papers. I didn't do a detailed survey of other diseases, and the issues of heterogeneity, selection bias and imperfect testing probably distort the SAR estimates for other diseases too. (I only found one study that used deliberate infection to measure SAR.)

We don’t have data on $R_{h}$ for US states, but we approximate it using the value $R_{h} = 0.3$ pre-lockdown based on our earlier results. The main result here is that $R_{h}$ is a small fraction of R before lockdown but 25-60% of R during lockdown.
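The share of transmission that happens within households is $R_{h}/R$, so the share rises when $R$ falls faster than $R_{h}$. A minimal arithmetic sketch: $R_{h} = 0.3$ is the pre-lockdown value above, but the two values of $R$, and the assumption that $R_{h}$ stays near 0.3 during lockdown, are hypothetical inputs chosen for illustration, not data from the post.

```python
def household_share(r_h, r_total):
    """Fraction of all transmission that happens within households: R_h / R."""
    if not 0 <= r_h <= r_total:
        raise ValueError("need 0 <= R_h <= R")
    return r_h / r_total


# Hypothetical inputs: R_h = 0.3 throughout, with an assumed R of 2.5
# before lockdown and an assumed R of 0.9 during lockdown.
print(round(household_share(0.3, 2.5), 2))  # 0.12
print(round(household_share(0.3, 0.9), 2))  # 0.33
```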

Our results show that SAR varies a lot between studies. Some of this variation is probably explained by NPIs (non-pharmaceutical interventions) taken by households to reduce transmission. However, for most studies we don't have information about NPIs. There are two exceptions. Both are observational studies with fairly small n, so this is not watertight evidence. Each study suggests that avoiding contact with the primary case and using standard NPIs (masks and disinfectant cleaning of surfaces) reduce the SAR. We think it's likely that other standard NPIs also reduce SAR: e.g. having close contact outdoors rather than indoors, hand hygiene, and so on.

For more, read the paper or the full version of the slide show.