[Petition] We Call for Open Anonymized Medical Data on COVID-19 and Aging-Related Risk Factors

by avturchin, 23rd Mar 2020 (4 min read)

Coronavirus
Personal Blog

We, on behalf of Open Longevity, together with the International Longevity Alliance, wrote a letter to WHO about the need for open anonymized medical data on patients with COVID-19 and the risk factors associated with aging. If WHO listens to us, this will accelerate the development of therapies against coronaviruses and against risk factors, and help fight future epidemics. The letter was signed by scientists from the USA, Europe, Israel and Russia, as well as longevity activists.

We are confident that WHO is now receiving a lot of requests, and our letter will be lost in the information noise unless we make additional efforts to promote it. Therefore, we have prepared a petition that anyone who agrees with us can sign: http://chng.it/cLwkxSsP

If the arguments presented in the petition seem reasonable to you, please sign it, repost it, and send it directly to your friends. This will help fight the coronavirus.

What kind of data are we requesting from WHO? Medical data: medical histories, blood tests, X-rays, etc. (for example, [1]). The problem is that WHO does not want to share it! Here is what they state [2]:

“In accordance with Article 11(4) of the IHR (2005), WHO will not make the Anonymized-COVID-19 Data generally available to other State Parties until such time as any of the conditions set forth in paragraph 2 of such Article 11 are first met and following consultation with affected countries.

Pursuant to that same Article 11, WHO will not make Anonymized-COVID-19 data available to the public, unless and until Anonymized-COVID-19 data has already been made available to State Parties, and provided that other information about the COVID-19 epidemic has already become publicly available and there is a need for the dissemination of authoritative and independent information.”

Why do we need open data?

However, open medical data, that is, anonymized medical histories without names and surnames, is needed to:

  • Predict the severity of the disease course. If the medical history (anamnesis), blood tests, age, questionnaire responses, etc. are provided, this will help with predictions;
  • Develop therapies taking into account risk factors or directly aimed at eliminating risk factors;
  • Improve machine learning. The predictive power of models depends heavily on the number of samples they are trained on. This is especially true for omics data, where the minimum number of samples required is much higher because of the large number of parameters in the models.

These are the reasons why medical data is useful today. There are also reasons connected with future research and preventive measures, though they matter now as well, since the solution to the problem of high mortality at older ages may lie in the biology of aging. Medical data is therefore also needed for:

  • Dealing with aging risk factors during future epidemics;
  • Creating open medical datasets annotated with patients' age-related parameters.

Existing national health systems cannot cope with the current situation. The core issue is collecting, storing and analyzing the medical data needed for successful research.

The main problems here are:

a) Local storage. Each national system (and sometimes even each medical facility) stores patient data in its own format with its own access rules. Data transfer from hospital to hospital or from country to country is difficult. Testing protocols are also local.

b) Partial disclosure. Only part of the data is made available to other scientists, at the discretion of the particular researcher and in accordance with the recommendations of regulatory authorities.

The prerequisites for solving these problems have long been known: cloud storage, anonymization and de-identification technologies, and blockchain for secure and controlled access. Now is also exactly the moment when the difficulties of standardizing formats can be solved effectively, because many people are ready to get involved in activities that contribute to a quick exit from a critical situation.
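To make the de-identification step mentioned above more concrete, here is a minimal sketch in Python. All field names (`patient_id`, `name`, `age`, `il6_pg_ml` and so on) and the key are hypothetical and chosen only for illustration; real hospital records will look different. Note also that replacing an ID with a keyed hash is pseudonymization rather than full anonymization, so this is a starting point under stated assumptions, not a complete privacy solution.

```python
import hashlib
import hmac

# Hypothetical field names; real hospital records will differ.
DIRECT_IDENTIFIERS = {"name", "surname", "address", "phone"}


def pseudonymize_id(patient_id, secret_key):
    """Replace a patient ID with a truncated keyed hash (HMAC-SHA256)."""
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def age_band(age, width=10):
    """Coarsen an exact age into a band such as '60-69'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def deidentify(record, secret_key):
    """Return a copy of the record without direct identifiers."""
    clean = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    clean["patient_id"] = pseudonymize_id(str(record["patient_id"]), secret_key)
    clean["age_band"] = age_band(clean.pop("age"))
    return clean


# Made-up example record, not real patient data.
record = {
    "patient_id": "H1-000123",
    "name": "Jane",
    "surname": "Doe",
    "age": 67,
    "il6_pg_ml": 41.5,
    "outcome": "recovered",
}
public = deidentify(record, secret_key=b"demo-key-keep-secret")
print(public)
```

The key stays with the hospital, so the same patient always maps to the same pseudonym across uploads, while outsiders cannot recompute the mapping from the published data alone.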

Many countries are currently recruiting volunteers to help doctors treat patients with COVID-19. However, a huge number of bioinformatics and IT specialists can be no less useful in this situation. Creating a prototype of a global patient database and involving one or two IT specialists locally in each hospital can help to collect data in a standardized format quickly, efficiently and relatively inexpensively (with the help of volunteers), for subsequent analysis by the best scientists and AI algorithms around the world.
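As an illustration of what "a standardized format" could mean in practice, the sketch below maps records from two differently structured sources into one shared schema. The schema, the two hospital formats, and every field name are assumptions invented for this example; a real project would more likely build on an existing health data standard than on an ad hoc class like this.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class PatientRecord:
    """Hypothetical common schema; the fields are illustrative only."""
    site: str
    age_years: int
    sex: str                    # "F", "M" or "U" (unknown)
    crp_mg_l: Optional[float]   # C-reactive protein in mg/L
    outcome: str


def from_hospital_a(row):
    # Hypothetical hospital A: lowercase keys, CRP already in mg/L.
    return PatientRecord(
        site="A",
        age_years=int(row["age"]),
        sex=row["sex"].upper()[:1],
        crp_mg_l=float(row["crp"]),
        outcome=row["outcome"].lower(),
    )


def from_hospital_b(row):
    # Hypothetical hospital B: different keys, CRP reported in mg/dL.
    crp = row.get("CRP_mg_dl")
    return PatientRecord(
        site="B",
        age_years=int(row["AgeYears"]),
        sex={"female": "F", "male": "M"}.get(row["Gender"].lower(), "U"),
        crp_mg_l=None if crp is None else float(crp) * 10.0,  # mg/dL -> mg/L
        outcome=row["Status"].lower(),
    )


# Made-up rows, not real patient data.
rows_a = [{"age": "71", "sex": "f", "crp": "48.0", "outcome": "Recovered"}]
rows_b = [{"AgeYears": 64, "Gender": "Male", "CRP_mg_dl": 1.5, "Status": "Hospitalized"}]

unified = [from_hospital_a(r) for r in rows_a] + [from_hospital_b(r) for r in rows_b]
for rec in unified:
    print(asdict(rec))
```

The point of the design is that unit conversions and vocabulary mapping happen once, at the hospital boundary, so downstream analysis only ever sees one format.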

By opening access to all types of anonymized or unidentifiable data now, starting with data on COVID-19 patients, WHO can significantly accelerate the development of vaccines and treatment protocols. In addition, the current situation can serve as a tremendous impetus for optimizing the entire system of working with medical data, allowing us to develop a procedure for exchanging and standardizing data on an international scale.

What other types of data are important for dealing with coronavirus?

  • Genomic data [4], primarily genomes and phylogenetic trees of the virus (see the list of examples below). Openness is much better here, by the way. This data is needed to track differences between strains of the virus in different populations and countries, and to understand how broadly therapies and tests will apply. It can also be used to select the most evolutionarily conserved regions of the viral RNA as targets; therapies aimed at them may potentially be the most effective.
  • Transcriptome data [5] (primarily scRNA-seq of immune cells). Hypermutation and V(D)J recombination of the genes that encode antibodies and T-cell receptors occur in immune cells, so the genomes of individual immune cells differ. The set of known sequences (clonotypes) of antibodies and T-cell receptors is called a repertoire. Since these are coding regions, the repertoire is most often determined by single-cell RNA sequencing of immune cells. This data is needed to compare people who have recovered from the illness with people who were never infected. The repertoires of different infected people (with different severity and different courses of the disease) can also be compared. Ultimately, all this will help to diagnose the disease and develop a vaccine.
  • There is also a clear shortage of this repertoire data; the Antibody Society has even called for action [6]: “...the AIRR-C hereby calls upon its members, and the wider research community, to share experiences, resources, samples, and data as openly and freely as possible, and to work within their respective systems to break down barriers to achieve this goal, subject to the overarching directives of respect, privacy, and protection for patients and all people. We are in this together.”
  • Information about test kits [7] and diagnostics [8]. Many tests have not had enough time to be certified, so clinics and some countries are afraid to use them. This applies not only to test kits but also to PCR machines and other equipment.
  • Data on the spread of the virus [9] and prognosis [10]. The situation is getting better every day, but diagnostics are still imperfect and distort the epidemiological data.
  • General educational information. WHO is doing well in this department [11].
  • Data on publications and clinical trials [12].
  • Newsfeeds with new articles on the topic [13].
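Two of the analyses mentioned in the list above can be sketched in a few lines of Python: scoring how conserved each position is across aligned viral sequences, and measuring how much two immune repertoires overlap. Both functions are toy illustrations under strong assumptions (sequences are already aligned, clonotypes are compared as exact strings), and all sequences below are made up rather than real SARS-CoV-2 or receptor data.

```python
from collections import Counter


def conservation(aligned):
    """Fraction of sequences sharing the most common base at each column."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    scores = []
    for column in zip(*aligned):
        most_common_count = Counter(column).most_common(1)[0][1]
        scores.append(most_common_count / len(aligned))
    return scores


def conserved_windows(scores, window, threshold=1.0):
    """Start indices of windows where every position meets the threshold."""
    return [
        i
        for i in range(len(scores) - window + 1)
        if all(s >= threshold for s in scores[i:i + window])
    ]


def jaccard(a, b):
    """Share of clonotypes common to two repertoires (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Made-up aligned RNA fragments standing in for different viral isolates.
isolates = ["AUGGCUAC", "AUGGAUAC", "AUGGCUAC", "AUGCCUAC"]
scores = conservation(isolates)
print(scores)
print(conserved_windows(scores, window=3))

# Made-up clonotype strings standing in for two repertoires.
recovered = {"CASSLGQAYEQYF", "CASSPDRGNTEAFF", "CASSIRSSYEQYF"}
uninfected = {"CASSLGQAYEQYF", "CASSQETQYF"}
print(jaccard(recovered, uninfected))
```

Real pipelines would start from a proper multiple sequence alignment and a dedicated repertoire analysis toolkit, but the underlying questions are the same: which positions never change, and which clonotypes are shared.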

Existing initiatives, including Kaggle Challenges [14], do not solve the problem of collecting and forming medical COVID-19 datasets and are focused on other tasks (training NLP systems on the texts about coronavirus, analysis of genomes, predicting the spread of the virus, etc.).

The idea is to find ways to significantly reduce the mortality rate from COVID-19 by influencing risk factors. IL-6 is one example of a promising target among these risk factors.

Sign the petition! The World Health Organization is obliged both to share its existing medical data and to organize the work of obtaining new high-quality data. Contribute to our common cause: the fight against death.

References and examples

1. https://github.com/…/covid-chestxray…/blob/master/README.md…
2. https://www.who.int/…/technical-guidan…/early-investigations
3. https://www.who.int/csr/ihr/WHA58-en.pdf
4. https://qbrc.swmed.edu/projects/2019ncov_immuneviewer/, https://www.ncbi.nlm.nih.gov/genbank/sars-cov-2-seqs/, https://www.kaggle.com/pa…/repository-of-coronavirus-genomes
5. https://www.medrxiv.org/conte…/10.1101/2020.02.23.20026690v1
6. https://www.antibodysociety.org/covid-19-demands-increased…/
7. https://www.360dx.com/coronavirus-test-tracker-launched-cov…
8. https://sph.nus.edu.sg/…/COVID-19-Science-Report-Diagnostic…
9. https://coronavirus.jhu.edu/map.html, https://ncov2019.live/, https://www.worldometers.info/coronavirus/
10. https://www.kaggle.com/c/covid19-global-forecasting-week-1
11. https://www.who.int/emergen…/diseases/novel-coronavirus-2019
12. https://figshare.com/…/Dimensions_COVID-19_publi…/11961063/6
13. https://connect.biorxiv.org/relate/content/181
14. https://www.kaggle.com/tags/covid19
