In the early months of the pandemic, San Francisco posted case and death counts on a daily basis. It was sometimes messy with significant corrections, but the city did the best they could (or at least better than most) to keep the public informed.

Around half a year ago, the reporting got worse. Data is now published with a 5-day lag. Presumably the politicians and decision makers still get timely data. Everyone else has to wait.

This is not okay. This is very not okay. In the post-vaccination but pre-Omicron days, I was annoyed philosophically. But now. Now?

https://sf.gov/data/covid-19-cases-and-deaths

[Chart: cases as of 12/21, as reported on 12/26]

At this point, we're fucked and it's definitely over. But this is still not okay, and now I am furious. Anyone who wants to know the true state of the world cannot, unless they are On The Inside.

So, where is the pebble that starts an avalanche? Who do I need to Tweet or email or call to pressure the city into changing this?


I can't speak to San Francisco specifically. But if it's anything like many other locations in the US, the problem isn't malice or indifference: it's that generating this data is vastly harder than you realize. The politicians get the data at the same time you do: as soon as it's ready.

Here's one tiny true example, from one part of the pipeline in one particular location. A substantial amount of data enters the system as faxes. The faxes go to a room full of National Guard, who manually enter the data into computers, from whence it begins a complicated process of validation and de-duplication before it enters the main pipeline. You can imagine that this system doesn't scale particularly well as case counts rise.

At a broad scale, what's happening is that an immense amount of data is trying to enter a legacy system that was designed for less than one percent of its current load. Some of the data comes from sleek modern hospitals with state-of-the-art medical informatics systems. And some comes from computer-illiterate rural doctors, and some comes from nursing homes that had never reported lab results before Covid, and some comes from employers who test their employees, and some comes from private labs, and some comes from sovereign tribes that have complicated data-sharing agreements with the state, and...

If I can find the time, I might write a post explaining in more detail how surveillance data is generated and processed. But for now, I assure you this problem is incredibly hard. Update: here's the post

Important disclaimer: my opinions are mine alone and I don't speak for any government agency.

I understand that data collection is difficult and empathize with the people responsible for doing the work.

The thing is, SF used to publish everything as soon as they could! We accepted that numbers could be revised up or down as data was fully coded. This 5-day lag is, IMO, far on the wrong side of the timeliness-vs-correctness tradeoff.

It seems like the main alternative would be something like Alameda County's reporting, which has a couple of days' less lag at the expense of less quality control: https://covid-19.acgov.org/data.page?#cases.

It's really unclear to me that Alameda's data is more informative than SF's. (In fact I'd say it's the opposite--I tend to look at SF over Alameda even though I live in Alameda County.)

I think there is some information lost in SF's presentation, but it's generally less information lost than most alternatives on the market. SF is also backdating the data to when the tests were actually performed, thus being transparent about the fact that most test data is about what happened several days ago. Websites that claim to give you more up-to-date information are not actually doing so, they're just hiding this fact.

If you looked at the next 4 days in the time series it would probably look something like: 500, 200, 100, 100. Not because Omicron is abating but because most tests taken in the last 4 days haven't had time to be processed and recorded. I think if I was careful I could squeeze a small amount of information out of those numbers (e.g. based on whether the 500 was actually 400 or 600) but it would require a lot of work. I tried this in the past when working with some public health researchers and it's surprisingly hard to not fool yourself into thinking that cases are going down again when it's actually reporting lag.
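That dynamic is easy to see in a toy simulation. The numbers below are made up for illustration (they are not SF's actual reporting-delay distribution): even if true daily cases are perfectly flat, the most recent days always look like a steep decline, simply because most of those tests haven't been reported yet.

```python
# Toy sketch of how reporting lag distorts recent case counts.
# Assumptions (illustrative only, not real SF data): true daily cases are
# flat at 1000, and the fraction of tests reported within d days of the
# test date follows the hypothetical distribution below.

# Hypothetical fraction of tests reported within d days of the test date.
reported_by_day = {0: 0.05, 1: 0.25, 2: 0.55, 3: 0.80, 4: 0.95}

TRUE_DAILY_CASES = 1000

def observed_count(days_ago: int) -> int:
    """Cases visible *today* for tests actually taken `days_ago` days ago."""
    frac = reported_by_day.get(days_ago, 1.0)  # assume fully reported after ~5 days
    return round(TRUE_DAILY_CASES * frac)

# The most recent 5 days of a perfectly flat epidemic, oldest to newest:
recent = [observed_count(d) for d in range(4, -1, -1)]
print(recent)  # looks like cases are collapsing, but nothing changed
```

This is the "500, 200, 100, 100" pattern from the comment above: the apparent decline is an artifact of the delay distribution, which is why backdating to test date and truncating the last few days (as SF does) is arguably the honest presentation.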

Also, my personal take is that SF, on a pure scientific/data basis, has had one of the best responses in the nation, probably benefiting from having UCSF for in-house expertise. (I'm less enthusiastic about the political response--I think we erred way too far on the "take no risks" side, and like everyone else prioritized restaurants over schools, which seems like a clear mistake. But on the data front, I feel like you're attacking one of the most reasonable counties in the U.S.)

Yeah this sucks, and I have no idea how to fix it. 

But if you want top-notch information about what's happening on the front lines in SF, follow this Twitter account:

https://twitter.com/Bob_Wachter/status/1475188579684913152