At the Nucleic Acid Observatory (NAO) we're evaluating pathogen-agnostic surveillance. A key question is whether metagenomic sequencing of wastewater can be a cost-effective method to detect and mitigate future pandemics. In this report we investigate one piece of this question: at a given stage of a viral pandemic, what fraction of wastewater metagenomic sequencing reads would that virus represent?

To make this concrete, we define RA(1%). If 1% of people are infected with some virus (prevalence) or have become infected with it during a given week (incidence), RA(1%) is the fraction of sequencing reads (relative abundance) generated by a given method that would match that virus. To estimate RA(1%) we collected public health data on sixteen human-infecting viruses, re-analyzed sequencing data from four municipal wastewater metagenomic studies, and linked them with a hierarchical Bayesian model.
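
As a rough illustration of the definition (and not the hierarchical Bayesian model used in the report), a naive point estimate could scale an observed relative abundance to 1% prevalence. This sketch assumes relative abundance is proportional to the fraction of people infected, which is a simplification. The function name `ra_at_one_percent` and the example numbers are hypothetical, not taken from the report.

```python
def ra_at_one_percent(viral_reads: int, total_reads: int, prevalence: float) -> float:
    """Naive point estimate of RA(1%).

    Assumption (not from the report): relative abundance scales
    linearly with prevalence, so we can rescale to 1% directly.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 < prevalence <= 1:
        raise ValueError("prevalence must be in (0, 1]")

    # Fraction of all sequencing reads that match the virus.
    relative_abundance = viral_reads / total_reads

    # Rescale from the observed prevalence to 1% prevalence.
    return relative_abundance * (0.01 / prevalence)


# Hypothetical example: 50 matching reads out of 1 billion,
# sampled when 0.1% of people were infected.
estimate = ra_at_one_percent(viral_reads=50, total_reads=1_000_000_000, prevalence=0.001)
print(f"Naive RA(1%) estimate: {estimate:.1e}")
```

With these made-up inputs the observed relative abundance is 5 in 100 million, and scaling up tenfold to 1% prevalence gives about 5 in 10 million. A point estimate like this ignores differences between studies and the uncertainty from low read counts, which is what the report's model is meant to handle.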

Three of the viruses were not present in the sequencing data, and we could only generate an upper bound on RA(1%). Four viruses had a handful of reads, for which we were able to generate rough estimates. For the remaining nine viruses we were able to narrow down RA(1%) for a specific virus-method combination to approximately an order of magnitude. We found RA(1%) for these nine viruses varied dramatically, over approximately six orders of magnitude. It also varied by study, with some viruses seeing an RA(1%) three orders of magnitude higher in one study than another.

The NAO plans to use the estimates from this study as inputs into a modeling framework to assess the cost effectiveness of wastewater MGS detection under different pandemic scenarios, and we include an outline of such a framework with some rough estimates of the costs of different monitoring approaches.

Read the full report: Predicting Virus Relative Abundance in Wastewater.

2 comments

>If you're paying $8k per billion reads

>This will likely go down: Illumina has recently released the more cost effective NovaSeq X, and as Illumina's patents expire there are various cheaper competitors.

Indeed it did go down. I recently paid $13,000 for 10 billion reads (NovaSeq X, Broad Institute; this was for my meiosis project), which works out to about $1,300 per billion reads. So sequencing costs can be much lower than $8k per billion.

Illumina is planning to start offering a 25 billion read flowcell for the NovaSeq X in October; I don't know how much this will cost but I'd guess around $20,000.

ALSO: if you're trying to detect truly novel viruses, using a Kraken database made from existing viral sequences is not going to work! However, many important threats are variants of existing viruses, so those could be detected (although possibly with lower efficiency).

Thanks! Responded there.