LESSWRONG
Ben Champion
2020
SARS-CoV-2 pool-testing algorithm puzzle
Ben Champion · 5y · 10

Would you be able to post this under answers? It seems to describe algorithms and their performance bounds in quite some detail and definitely deserves to be the top answer.

SARS-CoV-2 pool-testing algorithm puzzle
Answer by Ben Champion · Mar 22, 2020* · 30

[EDIT:

I see the monograph in gwern's post below makes the observation that group testing is a Boolean version of compressed sensing, and references Gilbert et al. 2008, which I'm just reading (I found a freely available version on Google Scholar). More importantly, the monograph contains algorithms with bounds similar to CS.

READ GWERN'S POST BELOW FIRST

]

Some ideas...

This can be seen as a nonlinear version of a compressed sensing problem (see here for a good "25 minute" intro to compressed sensing, or the Wikipedia article).

The nonlinearity comes from the fact that the pooled test seems to behave like an approximate OR gate: if any sample in the pool is positive, the pooled sample will test positive; otherwise it will test negative. I'm not sure how to deal with that.

The vector to be "reconstructed" is

$$x_j = \begin{cases} 1 & \text{if person } j \text{ has SARS-CoV-2} \\ 0 & \text{otherwise} \end{cases}$$

(We can replace "has" with "is deemed by this test to have"; obviously there will be an error rate.)

Let our M×N "sampling matrix" be

$$A_{ij} = \begin{cases} 1 & \text{if person } j \text{ is in sample } i \\ 0 & \text{otherwise} \end{cases}$$

Then our sample results are

$$y_i = \bigvee_j A_{ij} x_j$$

where $\bigvee$ denotes the pooled-test response; I'm using the OR symbol on the assumption that that's approximately how the test behaves.
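A minimal sketch of this OR model on toy data (the pool layout and infection vector here are made up purely for illustration):

```python
# Sketch of the assumed OR model: a pool tests positive iff it
# contains at least one positive sample.
N = 6                      # people
M = 3                      # pools
A = [
    [1, 1, 0, 0, 0, 0],    # pool 0 contains people 0, 1
    [0, 0, 1, 1, 0, 0],    # pool 1 contains people 2, 3
    [0, 1, 0, 1, 1, 1],    # pool 2 contains people 1, 3, 4, 5
]
x = [0, 1, 0, 0, 0, 0]     # person 1 is positive (hypothetical)

# y_i = OR_j (A_ij AND x_j)
y = [int(any(A[i][j] and x[j] for j in range(N))) for i in range(M)]
print(y)                   # → [1, 0, 1]: pools 0 and 2 contain person 1
```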

If instead the sample response is linear and we rescale so that the sample responses count the number of positives in each sample we have

$$y_i = \sum_j A_{ij} x_j$$
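A quick sketch of the two response models side by side (same toy pools as above, assumed behaviour): the Boolean response is just the linear count thresholded at one.

```python
N, M = 6, 3
A = [
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 1, 0, 1, 1, 1],
]
x = [0, 1, 0, 1, 0, 0]     # two positives (hypothetical)

# Linear response: number of positives in each pool.
y_sum = [sum(A[i][j] * x[j] for j in range(N)) for i in range(M)]
# Boolean (OR) response: the count thresholded at 1.
y_or = [min(c, 1) for c in y_sum]
print(y_sum, y_or)         # → [1, 1, 2] [1, 1, 1]
```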

The compressed sensing idea is: if you choose the rows (sampling patterns) $A_{i*}$ to be random and incoherent (meaning, very roughly, not too similar), you only need roughly $M \approx D \log(N/D)$ samples to reconstruct $x_j$ with "high" probability, where $D$ is the number of nonzeros in $x_j$. (All this hand-waving is tightened up in the literature.) In other words, if the prevalence is $P$, so that $D = NP$, we only need to do roughly $-NP \log P$ pooled samples. The method is to solve:

$$\min_x \|x\|_{\ell_1} \quad \text{subject to} \quad Ax = y$$

There are known algorithms for solving this (see the 25-minute intro), e.g. any linear programming (LP) algorithm. I'm wondering whether there is a way to encode the OR-gate behaviour as LP (strictly speaking, mixed-integer programming, MIP) constraints. The results will surely differ in terms of required $M$ and computational efficiency... but worth trying, I think.
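Not from the post, but one standard way from the group-testing literature to handle the OR behaviour directly, without any LP/MIP machinery, is the COMP decoder: anyone who appears in a negative pool is definitely negative, and everyone left over is declared positive. A sketch on toy data:

```python
def comp_decode(A, y):
    """COMP decoder: mark everyone in a negative pool as negative;
    declare all remaining people positive (may over-report)."""
    N = len(A[0])
    definitely_negative = set()
    for Ai, yi in zip(A, y):
        if yi == 0:
            definitely_negative.update(j for j in range(N) if Ai[j])
    return [0 if j in definitely_negative else 1 for j in range(N)]

# Hypothetical pooling design and OR responses for x = [0, 1, 0, 0, 0, 1].
A = [
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [1, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 1],
]
y = [1, 0, 0, 1]
print(comp_decode(A, y))   # → [0, 1, 0, 0, 0, 1]
```

Here the negative pools rule out people 0, 2, 3, and 4, leaving exactly the true positives; in general COMP can produce false positives when the design does not separate items well.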

Some thought is still required on how the test's false-positive, false-negative, true-positive, and true-negative rates propagate through the model.

Well-known researchers in this area are Terence Tao, Emmanuel Candès, and David Donoho ... there are undoubtedly many others but (clearly!) I don't have extensive knowledge of the literature.

The connection with gwern's post below is that another way of looking at compressed sensing is "sub-Nyquist/Shannon sampling for sparse signals" - the information-theory connection again. [EDIT: and the algorithms there seem to achieve similar performance bounds]
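To get a feel for the $-NP \log P$ bound above, plugging in hypothetical numbers (natural log, constant factors dropped):

```python
import math

N = 10_000        # population size (hypothetical)
P = 0.01          # prevalence (hypothetical)
D = int(N * P)    # expected number of positives: 100

M_cs = D * math.log(N / D)      # D log(N/D)
M_alt = -N * P * math.log(P)    # -NP log P; the same quantity since D = NP

print(round(M_cs))              # → 461 pooled tests vs 10,000 individual tests
```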

If the maximum pool size $S < N$, then I think you can just process the population in batches of size $S$?
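The batching idea, sketched with hypothetical sizes: split the $N$ people into $\lceil N/S \rceil$ batches and run the pooled scheme within each batch.

```python
N, S = 1000, 64   # hypothetical population and maximum pool size

# Split people 0..N-1 into consecutive batches of at most S.
batches = [list(range(start, min(start + S, N))) for start in range(0, N, S)]
print(len(batches), len(batches[-1]))   # → 16 batches; the last has 40 people
```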
