For group houses, one of the most important factors when deciding how to relate to COVID-19 is the question: If one of my housemates gets infected, how likely am I to also get infected? This is known as the household secondary attack rate, and it determines how much you need to worry about your housemates' level of precaution (as compared to your own), and how much within-house social distancing is necessary.

Household secondary attack rate is context dependent; it could mean one of several things, which I will illustrate with two scenarios:

  • Scenario 1: A member of your household is returning from a trip to Wuhan, which you have heard is high risk. You react… however you react to that. You might find a way to be out of the house for a while, or prepare an isolated room they can lock themselves in. If they're your spouse, you might decide that isolation isn't worth it. Some time not long after they become symptomatic, they check into a hospital, where they remain until after their infectious period is over.
  • Scenario 2: You live in a group house. You all have separate rooms, but share a kitchen and living room. One of your housemates is infected during a trip to the grocery store. Some time later they become symptomatic, and you react… however you react to that. You might or might not have somewhere to move to, or somewhere to move them to, but they can't check into a hospital because they're all full. You might or might not have been exposed to presymptomatic transmission. If neither of you has the ability to move, you'll be in the same house until after they've recovered.

Right now there are two studies which purport to measure the household secondary attack rate:

I'll refer to these as the CDC report and the Shenzhen study, respectively. These are the only studies I have been able to find which make any quantitative claim about COVID-19's secondary attack rate, and all other claims I've found trace back to one of these two. The CDC report finds a household secondary attack rate of 10%; the Shenzhen study finds a household secondary attack rate of 15%.

When I started writing this post, I thought I'd be focusing on the difference between scenarios 1 and 2. Unfortunately, as I dug into the studies in detail, I found evidence of severe problems which make me think that these two studies provide almost no evidence whatsoever about COVID-19's household secondary attack rate, even in scenario 1.

CDC Report

On March 3, CDC published a report on the results of a contact-tracing program started on January 20. The report gives statistics on contacts of the first 10 patients with travel-related confirmed COVID-19 reported in the US; presumably, all of these travellers came from Hubei on or after January 20. They traced 445 contacts in total, of which 54 developed concerning symptoms, became "persons under investigation", and were tested. It doesn't sound like anyone besides those 54 was tested.

The 445 contacts break down as follows:

  • 222 were health care personnel
  • 100 were "community members who were exposed to a patient in a health care setting"
  • 104 were "community members who spent at least 10 minutes within 6 feet of a patient with confirmed disease"
  • 19 were members of a patient's household

Out of the 54 people who were tested, two were positive; of those two, both were members of a patient's household. The CDC report does not provide any further information about those two positive cases, but they can be pretty easily matched to public news coverage. The first case was in Illinois and is described in detail in this Lancet paper; the second was in San Benito and is described in this announcement from a local public health agency. Both transmissions were to the spouses of travellers who returned from Wuhan.

The Lancet paper describes the first instance of person-to-person spread in detail, with a complete timeline of travel, symptoms, and tests. A woman who returned from Wuhan to Illinois on January 13 tested positive on January 20; her husband tested positive on January 24. In the Lancet paper, a few things are striking.

The first striking thing is that the husband was not tested until he developed a fever, at which point his wife had been hospitalized with a positive test for 4 days. So, testing was very much not proactive.

The second striking thing is that they ran many different tests in parallel, and appear to have been grappling with false negatives.

The third striking thing is that the Lancet paper involved monitoring of 372 contacts, of which 44 became PUIs (persons under investigation) and were tested. Of these 44, one was her husband and was positive, and this was the only household contact. The CDC report had 445 contacts and 54 people tested. So after subtracting out the Lancet study and the San Benito case, we're left with 17 household members, 56 miscellaneous contacts, and... only 9 people tested. There is no information on how those 9 tests were allocated, except that they were negative.

Of the 19 household members in the CDC study, five stayed in the house with an infected person after they were diagnosed. 

So to summarize: Two family members of index cases were tested and were positive. Nine more tests were allocated between 17 household members and 56 miscellaneous other contacts; none of those nine tests were positive. From this, the CDC report concludes that the household secondary attack rate is 2/19 (~10%).
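To make concrete how little the 2/19 figure constrains the true rate, here is a minimal sketch that bounds the household attack rate under different guesses about where the nine negative tests went. The function name, the `tests_on_household` parameter, and the worst-case assumption that any untested household member could have been infected are mine, not the CDC report's, and the sketch ignores false negatives entirely.

```python
from fractions import Fraction

HOUSEHOLD_CONTACTS = 19  # household members in the CDC report
CONFIRMED_POSITIVE = 2   # both positives were spouses of travellers
UNALLOCATED_TESTS = 9    # negative tests whose recipients are unknown


def attack_rate_bounds(tests_on_household):
    """Return (low, high) bounds on the household attack rate.

    tests_on_household is a hypothetical guess at how many of the nine
    negative tests went to household members.
    Low bound:  every untested household member was uninfected.
    High bound: every untested household member was infected.
    """
    if not 0 <= tests_on_household <= UNALLOCATED_TESTS:
        raise ValueError("tests_on_household must be between 0 and 9")
    untested = HOUSEHOLD_CONTACTS - CONFIRMED_POSITIVE - tests_on_household
    low = Fraction(CONFIRMED_POSITIVE, HOUSEHOLD_CONTACTS)
    high = Fraction(CONFIRMED_POSITIVE + untested, HOUSEHOLD_CONTACTS)
    return low, high


if __name__ == "__main__":
    for k in (0, UNALLOCATED_TESTS):
        low, high = attack_rate_bounds(k)
        print(f"{k} tests on household: {float(low):.0%} to {float(high):.0%}")
```

Even in the most favorable case, where all nine negative tests went to household members, eight household members were never tested, so under these assumptions the data are compatible with anything from 2/19 to 10/19.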

I would say that this is laughable, but unfortunately it isn't funny. The practical upshot of all this is that the CDC report provides almost no information whatsoever about the household secondary attack rate.

Shenzhen Study

Shenzhen is a Chinese city in Guangdong province. The Shenzhen study looks at 391 cases and 1286 close contacts between Jan 14 and Feb 12, and estimates a household secondary attack rate of 15%.

298 (76%) of the index cases were travelers. Sick people were isolated an average of 2.57 days after symptom onset (if they were being monitored for symptoms because they had been labelled as at-risk by contact tracing) or 4.64 days after symptom onset (if they weren't). The study estimates R during the observation period to have been 0.4, implying successful containment.

I have a few concerns with this study.

My first concern is that the household secondary attack rate is an important factor in people's decision whether to stay put when a household member is sick, which might create political pressure to find a low number. If people tried to move out when their housemates got sick, they wouldn't lower their own risk much, but they would spread it wherever they moved to.

My second concern is that 9 days before the Shenzhen study was published as a preprint, the Report of the WHO-China Joint Mission on COVID-19 stated that

Household transmission studies are currently underway, but preliminary studies ongoing in Guangdong estimate the secondary attack rate in households ranges from 3-10%.

I believe the Shenzhen study is the preliminary study referred to (the geographic location matches, and I can find no other studies in that geographic region which attempt to measure the rate). This seems like evidence of political pressure to report a low attack rate. (10% was the household secondary attack rate for SARS, and was used in some preliminary modeling of COVID-19 transmission dynamics before data was available.)

My third concern is that the paper contains three different household secondary attack rates: 15% in the Findings section, 14.9% in the Transmission Characteristics section, and 12.9% in Table 3. I cannot reconcile these numbers, and my attempts to cross-check numbers between different sections and tables within the paper all ended in mismatches and muddle.

My fourth concern is that in Table 3, adding up the numbers within the category labels implies a substantial amount of data is missing, in ways that make no sense. 19% of contacts are missing a gender, 17% are missing an age, 10% are missing the annotation of whether they're a household member or not, and 14% are missing the annotation for whether they interacted with the contact rarely, moderately often, or often. I am having a hard time imagining what sort of data collection process could do this without being such a mess that serious errors are likely.

My fifth concern is that during the period studied, China was having significant issues with false negatives. Feb 12, the last day covered in the Shenzhen study, is the day before China changed its diagnostic criteria and reported a 34% one-day increase in cases. The study itself states that its definition of a confirmed case changed on Feb 7, to require symptoms, "but sensitivity analyses show that truncating the data at this point does not qualitatively impact results". The paper reports results for many variables, and does not state which variables had sensitivity analysis performed.

These issues add up to extremely low confidence in the paper. I might change my mind if the authors release data that someone else can analyze, or someone manages to make sense of the seeming inconsistencies within it. Either of these things would surprise me.


The unfortunate practical upshot is that there's no good quantitative estimate of the household secondary attack rate (or attack rates in general). My belief, based on priors and on the observed large values for R0, is that it's probably quite high, and I will be acting accordingly; but even a small amount of non-terrible evidence could shift this belief greatly.


11 Comments

Has there been research from other similarish diseases breaking down the household secondary attack rate by relevant variables? It seems like there could be large differences between:

romantic partners who sleep in the same bed vs. housemates who sleep in different rooms

circumstances where the household has heightened concerns and is taking precautions vs. unsuspecting households

situations where people are removed from the household shortly after they're infected vs. households where people continue to live after infection

Group houses are mostly in the safer of the two possibilities for the first 2 of these 3.

Contact tracing research from Korea. Seems more solid than the "CDC" and "Shenzhen" papers. Estimates a household SAR of 7.56% (95% CI 3.7% - 14.26%), given the (explicitly called out) caveat that these are Korean households of unspecified nature.

I would add that, given Korea's famously aggressive and successful management program, I would guess that symptomatic household members were quickly isolated, reducing the effective household SAR compared to situations where symptomatic household members continue to interact with the rest of the household.

This is definitely an improvement over the US CDC and Shenzhen papers, but I still have reservations about it. The first issue is that it's based on calling people and asking about symptoms, not based on testing. So it doesn't count asymptomatic people, nor people with mild symptoms who don't disclose them. The second issue is that their numbers imply an average household size of 6.4, which implies a definition of "household" which is somehow not as expected.

They track contacts of the first 30 identified cases of COVID-19 in South Korea, and find 119 household contacts, of which 9 are infected. Table 2 describes every transmission they found, and whether it was a household transmission. Of the first 30 cases, 8 of them got it by household transmission from someone else who was also one of the first 30 cases, so that's 22 distinct households.

(30 people + (119 contacts - 8 already counted)) / 22 households = 141/22 = 6.4 people per household.
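The arithmetic above can be reproduced in a few lines. This is only a sketch of the calculation as described in this comment; the variable and function names are mine, and it assumes, as the comment does, that each of the 22 remaining index cases corresponds to exactly one distinct household.

```python
INDEX_CASES = 30          # first identified cases in South Korea
HOUSEHOLD_CONTACTS = 119  # household contacts reported by the paper
INFECTED_CONTACTS = 9     # household contacts who were infected
DOUBLE_COUNTED = 8        # index cases infected at home by another index case


def implied_household_stats():
    """Return (people, households, average household size, household SAR)."""
    households = INDEX_CASES - DOUBLE_COUNTED
    # Avoid counting the 8 people who are both index cases and contacts twice.
    people = INDEX_CASES + (HOUSEHOLD_CONTACTS - DOUBLE_COUNTED)
    avg_size = people / households
    sar = INFECTED_CONTACTS / HOUSEHOLD_CONTACTS
    return people, households, avg_size, sar


if __name__ == "__main__":
    people, households, avg_size, sar = implied_household_stats()
    print(f"{people} people / {households} households = {avg_size:.1f}")
    print(f"household SAR = {sar:.2%}")
```

The same inputs also recover the paper's 7.56% point estimate (9 of 119), so the large implied household size is not an artifact of mismatched numbers.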

My belief, based on priors and on the observed large values for R0, is that it's probably quite high

What does this mean? Can you give an over-under?

R0 is the number of people that each person will go on to infect, on average. R0 for COVID-19 is high compared to other common diseases, indicating high transmissibility.

I avoided stating a quantitative estimate of the attack rate because my confidence intervals are too wide to be useful. If I had to bet, I'd say 90% CI 15-85%, 50% CI 30-65%. I'm hoping people can gather weak evidence of various forms (secondary attack rates and R0s of other diseases, anecdotes in which household members do or don't get it, or in the best case a dataset with household memberships labelled).

This paper (from June 27) collects studies published after this post, does meta-analysis, and corrects for some methodological problems like false negative rates, and gives a central estimate of the household secondary attack rate of 30%.

New study from South Korea of spread in a crowded call center. There were 94 infections on one floor (43% of workers on the floor). As most people had symptom onset during a three-day period, this suggests 1-2 people were superspreaders. They have a seating chart, which suggests the secondary attack rate was significantly higher for people sitting in the same room (eyeballing maybe 60%). It's notable that some people didn't get infected, despite spending 4-5 full workdays being exposed to a superspreader and possibly other infectious people. Only 4% were asymptomatic for the whole period of the study.

They tested households of the infected office workers and get a household secondary attack rate of 16%. How much were people trying to avoid infecting their families? It's hard to say from the study, but we know the following:

1. This was around the peak of cases in South Korea. People would be primed to take Covid-like symptoms seriously.

2. After a few days where many workers developed symptoms, the office was closed. At this point, it seems very likely that most workers took efforts to isolate from their families.

3. 72% of subjects are women, with mean age 38. It seems that having roommates is relatively rare among Koreans. I'd guess these are nearly all people living with older parents or in nuclear families. (It's easier for someone to isolate from their parents or older children than from a spouse or young kids.)

4. From other studies, under-18s are less likely than adults to get secondary infections, and the number is very low for under-10s. It's not clear whether children were tested, but they list 2.3 household contacts per person, which suggests they were. If 1/5 of contacts were younger children, and we removed them, you'd get a secondary attack rate of ~20%.
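The ~20% figure in point 4 follows from a simple reweighting, sketched below. The function name and parameters are hypothetical, and the default `child_rate=0.0` encodes the simplifying assumption that none of the younger children were infected; the 1/5 share of children is the guess from the text, not a number from the study.

```python
def adult_attack_rate(overall_rate, child_fraction, child_rate=0.0):
    """Back out the attack rate among non-child contacts.

    overall_rate:   attack rate over all household contacts (0.16 here)
    child_fraction: assumed share of contacts who are younger children
    child_rate:     assumed attack rate among those children
    """
    if not 0.0 <= child_fraction < 1.0:
        raise ValueError("child_fraction must be in [0, 1)")
    # overall = child_fraction * child_rate + (1 - child_fraction) * adult
    return (overall_rate - child_fraction * child_rate) / (1.0 - child_fraction)


if __name__ == "__main__":
    print(f"{adult_attack_rate(0.16, 0.2):.1%}")
    print(f"{adult_attack_rate(0.16, 0.2, child_rate=0.05):.1%}")
```

If the children's attack rate were small but nonzero, the adjusted adult rate would land somewhat below 20%, so the adjustment is best read as an upper-end estimate under these assumptions.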

So what about roommates living together? I'd guess:

1. If people are fairly sensitive to Covid symptoms and make some efforts to isolate, 15-25% secondary attack rate.

2. If people don't make any effort to isolate after onset of symptoms, 20-40%.

The spread in the call center and other studies of choirs/restaurants suggest that direct physical contact is not necessary for very effective spread. So roommates spending time together in common spaces would be at high risk.

This paper analyzes specific incidents in which a group of one infected person plus some uninfected people sat down together, and some uninfected people got it. They find a secondary attack rate (from mostly non-household interactions) of 35%.

There are two big issues that prevent this paper from being used to draw good inferences about the household secondary attack rate. First, the incidents were found by specifically looking for superspreading events, so the sample does not include any events where transmission didn't happen. Second, the events are single gatherings, whereas living with someone may involve many opportunities to get infected.
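Both issues can be illustrated with a toy model. The sketch below assumes each exposed person is infected independently with the same probability `p` per gathering, which is an illustrative simplification of mine (real superspreading events violate it), and the function names are hypothetical. The first function shows how discarding gatherings with zero transmissions inflates the observed attack rate; the second shows how repeated exposures compound.

```python
def conditional_attack_rate(p, n):
    """Expected attack rate among n exposed people, counting only
    gatherings where at least one of them was infected.

    Under independence the expected number infected is n * p, and
    P(at least one infection) = 1 - (1 - p) ** n, so conditioning on
    at least one infection gives p / (1 - (1 - p) ** n).
    """
    if not (0.0 < p <= 1.0) or n < 1:
        raise ValueError("need 0 < p <= 1 and n >= 1")
    return p / (1.0 - (1.0 - p) ** n)


def cumulative_risk(p, exposures):
    """Probability of infection after repeated independent exposures."""
    if not (0.0 <= p <= 1.0) or exposures < 0:
        raise ValueError("need 0 <= p <= 1 and exposures >= 0")
    return 1.0 - (1.0 - p) ** exposures


if __name__ == "__main__":
    p = 0.10  # hypothetical per-gathering infection probability
    print(f"true rate {p:.0%}, observed among events with spread: "
          f"{conditional_attack_rate(p, 5):.0%}")
    print(f"risk after 10 exposures: {cumulative_risk(p, 10):.0%}")
```

The two effects push in opposite directions when extrapolating to households: selection on events with transmission overstates the per-gathering rate, while a single gathering understates the risk of living with someone over many days.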

Hey group house Berkeley people, what procedures are you guys setting up?

We've set up a spreadsheet where people share their house isolation levels, and any symptoms they've been having.

The actual doc is private, but you can see a template to get the idea across here: