[ Question ]

Ideas on estimating personal risk of infection

by jmh, 1 min read, 23rd Mar 2020, 1 comment



I am currently tracking the local data for where I live to get an idea of risk level for me personally.

The model is simple: total population, reported infections, and some estimate of how many people I might "interact" with on a daily basis. I'm not sure whether I want to adjust for under-reporting; mechanically it should be a simple step, even if putting a number to it isn't.

The formula is the one posted on MR for calculating the probability that someone at a conference of size X is infected. That is not quite the same situation, but I don't mind having an overstated risk.
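For concreteness, here is a minimal sketch of that kind of calculation. I am assuming the formula is the usual "at least one infected person among n" expression, 1 - (1 - p)^n, with p taken as reported infections divided by population; the function name and the example numbers are made up for illustration and are not from the MR post.

```python
def prob_at_least_one_infected(infected: float, population: float, group_size: int) -> float:
    """Chance that at least one of `group_size` randomly drawn people is infected.

    Assumes each person is infected independently with probability
    infected / population (an assumption, not a claim about the MR formula).
    """
    if population <= 0:
        raise ValueError("population must be positive")
    prevalence = infected / population
    return 1.0 - (1.0 - prevalence) ** group_size


# Hypothetical inputs: 500 reported cases in a county of 1,000,000,
# and 50 people crossed paths with in a day.
print(prob_at_least_one_infected(500, 1_000_000, 50))
```

Adjusting for under-reporting would just mean multiplying `infected` by whatever factor seems believable before calling the function.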

One thing I'm wondering about is how to estimate the number of people in "the conference". This has to be the number of people I might randomly cross paths with who could transfer the virus to me. Since some of the risk comes from being in public places, such as a grocery store, I'm wondering how best to think about that setting.

One way would be to use the average daily number of shoppers and workers at the store. Another might be the number of people I wait in line with and pass in the aisles. Clearly the two will be significantly different.

I'm also not thinking about any cumulative impact here: not the probability of infection over the next 3 months (or even of interacting with someone infected over that period, which is the calculation I am actually doing), but what today looks like.
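If a cumulative figure were wanted later, one rough way to get it from a single-day number is to compound it. This is only a sketch and assumes every day is independent with the same daily probability, which will not hold while case counts are changing; `cumulative_risk` is a name I made up.

```python
def cumulative_risk(daily_risk: float, days: int) -> float:
    """Chance of at least one 'bad' day over `days` days.

    Assumes independent days with a constant daily probability,
    which is a simplification rather than a realistic epidemic model.
    """
    if not 0.0 <= daily_risk <= 1.0:
        raise ValueError("daily_risk must be between 0 and 1")
    return 1.0 - (1.0 - daily_risk) ** days


# Hypothetical example: a 1% daily chance compounded over 90 days.
print(cumulative_risk(0.01, 90))
```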

I would be interested in any thoughts people here have.


1 Answer


1. Any estimate is likely to be very inaccurate. Also there are fat tails. There might be one very infectious person there such that you actually have a very high risk of infection in this case. A single point estimate of risk throws away a lot of information. What would be better would be a Bayesian type probability density.

2. This may be the wrong question. Given that in Iceland, for example, half of the infected showed no symptoms, it is important to consider the risk that you may infect other people.

3. Total risk across the crowd goes up approximately with the square of the number of people in the crowd.

4. This is somewhat analogous to the situation with STDs. It is not just the number of people you interact with, but the number of people they have interacted with, and the number they have interacted with. The rather disconcerting image with STDs is that you are actually in bed with maybe 1,000 people; similarly here you are actually in a room with who knows how many people.

5. Also take into account that in many situations at present, the people you are going to interact with are a biased sample. In any given bar or conference, the risk-averse, careful people will be staying at home. The risk-blind, careless, reckless people will be over-represented, e.g. the people who attended Chinese New Year gatherings in NYC.

6. You need to update the calculation on a daily basis.

7. I think that with viral diseases the initial viral load is important because it greatly affects how long your body has to mount a defense against the disease. No idea how to model this. There are also differences in individual vulnerability etc.

8. You might also take into account the flow-on effects. If you get infected, how many people will ultimately get infected as a result? Similarly, if you infect someone, how many people will ultimately be infected?

Here is my attempt

IR: Reported infections in NY are 20k. I assume that the true number is 2X, 10X or 20X that. So the rate is about 20k/9M = 0.002 (reported), or 0.004 at 2X, 0.02 at 10X and 0.04 at 20X.

B: Bias for people attending a gathering, maybe 2X, 5X or 10X more risk-loving than the average.

PT: Chance of transmission if you come into contact with an infected person: 1%, 2% or 5%.

If a gathering has X people that you come into contact with then the risk of infection is

1 - (1 - IR*B*PT)^X

and the risk of infecting someone else would be similar.

I evaluate all combinations of the estimates. Note: apply the formula above to each combination and then average the results. Do not average the inputs and then apply the formula; that is wrong because of the nonlinearity.
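As a rough sketch of that procedure in code (not the spreadsheet itself): the equal weighting of every combination, the variable names, and the cap of the per-contact probability at 1 are my assumptions, so the numbers it prints may not match the spreadsheet's.

```python
from itertools import product

IR_REPORTED = 20_000 / 9_000_000   # reported infection rate from the post
UNDERCOUNT = (2, 10, 20)           # true cases as a multiple of reported
BIAS = (2, 5, 10)                  # B: risk-loving bias of attendees
P_TRANSMIT = (0.01, 0.02, 0.05)    # PT: chance of transmission per contact

# Every (undercount, bias, transmission) combination, weighted equally (assumption).
COMBOS = list(product(UNDERCOUNT, BIAS, P_TRANSMIT))


def risk(ir: float, b: float, pt: float, x: int) -> float:
    """1 - (1 - IR*B*PT)^X, with the per-contact probability capped at 1."""
    p = min(1.0, ir * b * pt)
    return 1.0 - (1.0 - p) ** x


def average_risk(x: int) -> float:
    """Apply the formula to each combination, then average the results."""
    return sum(risk(IR_REPORTED * u, b, pt, x) for u, b, pt in COMBOS) / len(COMBOS)


def risk_of_average(x: int) -> float:
    """The wrong order: average the inputs first, then apply the formula."""
    mean_p = sum(IR_REPORTED * u * b * pt for u, b, pt in COMBOS) / len(COMBOS)
    return 1.0 - (1.0 - mean_p) ** x


for x in (10, 100, 1000, 10000):
    print(x, round(average_risk(x), 3), round(risk_of_average(x), 3))
```

Because the formula is concave in the per-contact probability, averaging the inputs first can only give the same or a higher number than averaging the results, and the gap grows with the number of people.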

Plugging this into a simple spreadsheet (Guaranteed to be wrong - use at your own risk https://drive.google.com/file/d/15Qdwqcjg-4g3Kn4r7hFH-BlRGrq80Wpt/view?usp=sharing) I get

#People => Risk to me

10 => 5%

100 => 27%

1000 => 66%

10000 => 95%

This is most sensitive to the higher factors above and to the number of people. But even with low factors, the risk is bad once the number of people is large.