A friend of mine requested that I write up some of my comments on the lab leak hypothesis, since I had done quite a bit of research into this in 2020. This was originally asked in the context of the longform Facebook post that Eliezer wrote regarding the origins of the Covid-19 pandemic and the implications for the future: https://www.facebook.com/yudkowsky/posts/10159653334879228 My comment is less organized than I would prefer, but I figured it was better to post in a rough form than not post at all.
In early January 2021, I wrote on Facebook and Lesswrong:
I've done a lot of thinking about the origins of SARS-CoV-2 and I still find the lab escape hypothesis quite credible. We still haven't found an intermediate host, which is surprising if the virus emerged naturally. This issue has become politicized, but we really need a neutral investigation in order to figure out what actually happened. This is quite important for understanding how to prevent future pandemics.
The New York Magazine article, "The Lab-leak Hypothesis" is long, but I can personally verify the author is reporting on the source material pretty accurately. I've done over 200 hours of research on this topic and have read basically all the sources the article cites. That said, I don't agree with all of the claims. I do not think the SARS-CoV-2 virus is very likely to have been created using the RATG13 virus, because of the genetic differences spread out throughout the genomes. However, there are many other paths that could have led to a lab escape, and I'm somewhat agnostic between several of them.
I believe we need more transparency into the whole investigation, especially the efforts on the ground in China. What is the process by which scientists in Hubei, Yunnan, and other provinces are looking for intermediate hosts? How have the investigation(s) into the WIV been organized and what processes did they use? What agencies, methods, and results?
COVID-19 has had a huge effect on the whole world. It's politically appropriate to call for accountability in these investigations, because there is a clear public interest for literally the whole global population.
I think that the Covid-19 pandemic was the result of a lab escape, with 85% probability. I'm up for bets with people who think a lab escape is unlikely. The clearest resolution criteria I can imagine involve the question of whether an intermediate host will be found, or evidence of direct transfer from bats.
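For readers who want to see what an 85% credence implies in betting terms, here is a minimal sketch of the arithmetic. It converts the stated probability into odds and applies Bayes' rule in odds form. The likelihood ratio used is purely hypothetical (an assumption for illustration, not an estimate from any source); it only shows how a finding such as an identified intermediate host would move the number.

```python
from fractions import Fraction


def prob_to_odds(p: Fraction) -> Fraction:
    """Convert a probability into odds in favor, p / (1 - p)."""
    return p / (1 - p)


def odds_to_prob(odds: Fraction) -> Fraction:
    """Convert odds in favor back into a probability."""
    return odds / (1 + odds)


def update(prior: Fraction, likelihood_ratio: Fraction) -> Fraction:
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    return odds_to_prob(prob_to_odds(prior) * likelihood_ratio)


# The 85% credence in a lab escape stated above.
credence = Fraction(85, 100)
fair_odds = prob_to_odds(credence)

# HYPOTHETICAL: suppose some new finding were 10x more likely under a
# natural origin than under a lab escape. This ratio is made up.
hypothetical_lr = Fraction(1, 10)
posterior = update(credence, hypothetical_lr)

print(f"fair odds for lab escape: {fair_odds} to 1 (about {float(fair_odds):.2f}:1)")
print(f"posterior after hypothetical evidence: {float(posterior):.3f}")
```

At 85%, fair odds are 17 to 3, so someone who thinks a lab escape is unlikely should be willing to take the other side at much better terms than that.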
Alina Chan (https://twitter.com/Ayjchan) has a lot of good discussion of the lab escape hypothesis. She comments on basically all the articles and makes good conceptual clarifications. She also wrote https://www.rationaloptimist.com/8691 which is a decent overview.
I think the ideal piece explaining the state of the evidence on the lab leak hypothesis isn't very long. Eliezer is right that most of the supporting evidence comes from broad circumstantial evidence: The kind of viruses the WIV was sampling. The kind of research they were conducting (documented in Ecohealth Alliance's grant proposal & in their published papers). The closest relative of SARS-CoV-2 being very far away. The virus already seeming well adapted to humans in Wuhan when it was first sequenced. The lack of intermediate host being found.
Lots of pieces include these things but also go into great detail about, say, the RATG13 renaming or the WIV database being taken offline. I agree with Eliezer that details like this should not be taken as much evidence for a lab escape.
One question that I think hasn't received enough attention, though Alina Chan has a paper on it, is how well adapted to humans SARS-CoV-2 is. https://www.biorxiv.org/content/10.1101/2020.05.01.073262v1
If Alina is right, and my own epistemic spot checking suggests she is (though I'd like to hear arguments against, such as the reviewer feedback on her paper submission), then this is a lot of evidence for the gain-of-function (GoF) lab leak hypothesis, especially in conjunction with the emergence in Wuhan.
It would be surprising for the virus to show up in Wuhan first, but if it showed up there and clearly went through a process of human adaptation, that would be much more consistent with the natural origin hypothesis. If we accept that it was well adapted to humans, and that this couldn't have happened without evolution within a human host, now we have to posit a whole period of cryptic transmission while the virus became adapted to humans.
Some questions we can't really answer without more data that we're unlikely to get. We're unlikely to get the WIV's records, for example. But someone who understands viral evolution could do more work to analyze Alina's claim that the virus was well adapted to humans right away. (the main evidence for this is the comparatively low rate of amino acid changes early in the Wuhan outbreak compared with the original SARS-CoV outbreak and other spillovers from animal hosts.)
https://thebulletin.org/2021/05/the-origin-of-covid-did-people-or-nature-open-pandoras-box-at-wuhan/amp/ - This is an article I would recommend, though with some caveats. I think most people won't be able to evaluate the arguments about the furin cleavage sites or codons -- if I were writing this article I would leave them out.
And I'm not sure those arguments are actually that good. I had a hard time evaluating them. Protease cleavage sites can certainly evolve naturally or during passaging. And they're also a reasonable thing to insert to study potential for transmission in humans. But these arguments seem pretty easy to get wrong / I don't trust myself to evaluate them and probably wouldn't trust someone without more experience working with viruses.
There was a recent letter to Science arguing that the lab escape hypothesis should be investigated in depth in addition to investigating the other hypotheses (https://science.sciencemag.org/content/372/6543/694.1). The main reason this is important is that Ralph Baric signed this letter. He leads one of the top two coronavirus labs in the world studying SARS-like viruses. (The other being the WIV.) He is a collaborator with the WIV as well. If his signature here means that he thinks it could be a lab leak and should be investigated (caveat because he might have other reasons for signing), then that is decent evidence that a GoF lab origin is technically possible.
I’ve found a shortage of articles that make a good argument for a natural origin. I do not think this is because there is no case for a natural origin, but rather that there are more incentives to write compelling cases for a lab origin. In articles about the virus’ origin, I’ve seen a lot more words on the history of lab escapes than I have about the history of natural spillovers. This is dumb -- we have lots of data about how natural spillovers occur, and these data should be incorporated into any investigation into SARS-CoV-2 origins. Edit: Someone shared an interesting preprint that Eliezer re-shared, "Early appearance of two distinct genomic lineages of SARS-CoV-2 in different Wuhan wildlife markets suggests SARS-CoV-2 has a natural origin". I haven't had time to look closely at it, but as I said above, I think evolutionary analysis of the early pandemic origin is important for revealing its source. To tell if this paper's results are significant, I'd want to know exactly how many changes were present in the two lineages, and estimate how many human infections ago the common ancestor was. I don't find the results immediately compelling because I'd expect the changes between these lineages to be large if they represented two distinct animal-to-human spillover events.
The recent article by Steven Quay & Richard Muller in the Wall Street Journal attempts to bring the issue to a head by simplifying it down to two main points:
(1) The double CGG codons in the SARS2 furin cleavage site were deliberately designed by the 11 or 12 researchers who have created chimeric viruses as an unmistakable 'marker' for lab-made viruses so that you could always tell which future mutations evolved from a lab virus and which were naturally evolved. SARS2 has these tell-tale double CGG codons in its furin cleavage site, ergo it's lab-made.
(2) Natural evolution, of the type displayed by SARS1 & MERS, involves a long series of "run-up" mutations (tries and fails) both in the bat & in the intermediary animals (palm civets, dromedary camels). They also had a similar series of immediate "follow-on" mutations in a race for "optimization" of infectivity once the virus broke out in humans. No evidence has been found for SARS2 displaying either the run-up or the follow-on, ergo it's unlikely to be naturally evolved.
I would welcome hearing of competent commentary that directly refutes these two arguments. Maybe an actual gene-splicing researcher might say "Nah, we did it that way because it was easier, or because it was cheaper. We didn't do it to create a marker." Or maybe "We kept using the same codons as previous researchers had used merely in order to eliminate one factor of variability and make it easier to analyze our results." Something like that.
Or, "The way SARS1 and MERS developed is only one of the possible ways for viruses to evolve. There are many other ways. That they didn't have a run-up and follow-on of mutations is in no way indicative."
If anyone finds articles addressing these arguments head-on, I would appreciate hearing about it.
Re 1) the codons, according to Christian Drosten, have precedent for evolving naturally in viruses. That could be because viruses evolve much faster than e.g. animals. Source: search for 'codon' and use translate here: https://www.ndr.de/nachrichten/info/92-Coronavirus-Update-Woher-stammt-das-Virus,podcastcoronavirus322.html
The link also has a bunch of content about the evolution of furin cleavage sites, from a leading expert.
Do you have a cite for previous work reporting or using this sequence (something like cct cgg cgg gca) for a cleavage site in viruses? I only ended up finding and looking through one bit of prior gain of function research that's the sort of genetic engineering you're hypothesizing ( https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3168280/ ) but it used a totally different sequence. Better yet, someone from pre-covid-19 times talking about how they made their code include "cggcgg" as a marker.
Richard Muller, co-author with Steven Quay of the WSJ article, states in his interview with Sky News Australia (Scientific report suggests Wuhan lab leak as origin of COVID-19, YouTube, 10 June 2021, at 5:40 mark) that CGG was the spelling of arginine "most used in the laboratory" in lab-inserted furin cleavage sites and was in fact used by Shi Zhengli at the WIV, as she reported in one of her published papers.
But Steven Quay's mammoth 193-page Bayesian Analysis of SARS-Cov-2 Origin (https://zenodo.org/record/4477081#.YMU0-S0ZNE4) puts the number at only half of lab experiments and suggests other reasons for the choice besides tracking, which seems to be mentioned as merely another "additional advantage". See the section entitled "Evidence. Laboratory codon optimization uses CGG for laboratory insertions of arginine residues 50% of the time." (p. 90)
The interpretation of "marker" as a deliberate research strategy for distinguishing lab-made from naturally occurring viruses is my own, and may be overstating the explicit intentions of researchers. It is derived from Steven Quay's WSJ article, specifically this passage:
"Although the double CGG is suppressed naturally, the opposite is true in laboratory work. The insertion sequence of choice is the double CGG. That’s because it is readily available and convenient, and scientists have a great deal of experience inserting it. An additional advantage of the double CGG sequence compared with the other 35 possible choices: It creates a useful beacon that permits the scientists to track the insertion in the laboratory."
Still, deliberate or incidental, the presence of a double-CGG in the furin cleavage site of COVID-19 weighs heavily on the lab origin side of the probability argument, since its 50% use in lab insertions contrasts strongly with a 0% probability (so far) of finding it anywhere in the entire genome of all other viruses in the sarbecovirus sub-class of betacoronaviruses that SARS1, MERS & SARS2 belong to -- none of which, apart from SARS2, even has a furin cleavage site.
Whatever the intentions of researchers, is there another interpretation of the empirical data that would alter Steven Quay's "beyond a reasonable doubt" conclusion that the virus came from a lab?
What other factors could be at play here to qualify further the results of his Bayesian analysis?
If you wanted strong tracking why would you only do it once and not a few times so it's more stable?
It's because there is only one single place in the genome that you really want to track: the furin cleavage site (FCS). I assumed the wrong reason for using double CGG.
It's not to distinguish natural from lab-made viruses (although it does do that).
It's so that you can have a test in the lab for whether the FCS you have inserted is working or not. It's so that you can "check your work".
The unique spelling with double CGG is the only one out of the 36 possible configurations of arginine (the "R" in the "PRRA" FCS insertion) that allows you to track whether the cleavage you are trying to engineer has happened.
Steven Quay explains this at the 59:00 mark of his interview with Julius Killerby, which is well worth listening to in its entirety, as it explains the odds of a lab leak vs. natural evolution, based on undisputed facts.
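As a check on the counting only: the "36 possible configurations" (and the "other 35 possible choices" in the WSJ passage) follow from the standard genetic code, in which six codons encode arginine. A minimal sketch that enumerates the pairs is below. It uses DNA letters (T rather than U) and says nothing about how often each pair occurs in nature or in lab work.

```python
from itertools import product

# The six arginine (R) codons in the standard genetic code, in DNA letters.
ARGININE_CODONS = ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"]

# Every way to spell two consecutive arginines ("RR") at the nucleotide level.
pairs = ["".join(pair) for pair in product(ARGININE_CODONS, repeat=2)]

# All spellings other than the double CGG discussed above.
others = [pair for pair in pairs if pair != "CGGCGG"]

print(len(pairs))   # 36
print(len(others))  # 35
```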
I'm not remotely qualified to comment on this, but fwiw in the Mojiang Mine Theory (which says it was a lab leak, but did not involve GOF), six miners caught the virus from bats (and/or each other), and then the virus spent four months replicating within the body of one of these poor guys as he lay sick in a hospital (and then of course samples were sent to WIV and put in storage).
This would explain (2) because four months in this guy's body (especially lungs) allows tons of opportunity for the virus to evolve and mutate and recombine in order to adapt to the human body, and maybe it also explains (1) either randomly or via recombination between viral and human DNA (if that makes sense?), again during those four months in this poor guy's body.
It seems like an interesting hypothesis but I don't think it's particularly likely. I've never heard of other viruses becoming well adapted to humans within a single host. Though, I do think that's the explanation for how several variants evolved (since some of them emerged with a bunch of functional mutations rather than just one or two). I'd be interested to see more research into the evolution of viruses within human hosts, and what degree of change is possible & how this relates to spillover events.
The clearest resolution criteria I can imagine involve the question of whether an intermediate host will be found, or evidence of direct transfer from bats.
Those are not clear resolution criteria, and even with decent resolution criteria, it took 15 years for SARS.
I don't think it's super clear, but I do think it's the clearest that we are likely to get that's more than 10% likely. I disagree that SARS took 15 years, or at least I think that one could have been called within a year or two. My previous attempt to operationalize a bet had the bet resolve if, within two years, a mutually agreed upon third party updated to believe that there is >90% probability that an identified intermediate host or bat species was the origin point of the pandemic, and that this was not a lab escape. Now that I'm writing this out, I think within two years of SARS I wouldn't have been >90% civet-->human origin. I'd guess I would have been 70-80% on civet-->human. But I'm currently <5% on any specific intermediate host for SARS-CoV-2, so something like the civet finding would greatly increase my odds that SARS-CoV-2 is a natural spillover.
Having looked more into it, it's quite plausible that we will have confirmation that it's a lab leak in a few months or years. The US intelligence community is currently tasked with looking for evidence, and it's quite plausible that someone in China actually knows that it's a lab leak and that the US intelligence community manages to intercept clear-cut information. That would go beyond the reduced cell phone traffic and possible road closures around the WIV in October 2019, and the 3 researchers from the WIV who went to the hospital in November 2019 with symptoms matching flu and COVID-19.
Given that full transparency from Chinese authorities is unlikely, assessing the probabilities is the best we can do. Fortunately, that has been done with impressive scientific rigour by DRASTIC member Dr. Steven Quay MD, PhD in his technically detailed 193-page Bayesian analysis of 26 known facts about the outbreak:
which he explains in layman's terms in his interview with Julius Killerby (cited in my comment above).
The advantage of this approach is that it follows the scientific method: laying out clearly its premises and calculations so that they can be challenged and tested by experts in the field.
The evidence is so convincing that, along with his influential piece in the Wall Street Journal (co-authored by astrophysicist Richard Muller)
his Bayesian analysis -- made available to both the WHO and the Biden administration -- likely represents 'the writing on the wall' for public decision-makers. It was the 'nudge' indicating that keeping the story low-key was no longer an option, given the amount of technical expertise weighing in on the subject in public discussion.
In my view, given the dramatic quality of the statistical evidence, the Biden administration now finds itself the dog that caught the car. The three-month time period for a report from the intelligence community is likely only a breather to assess how to handle the truth of the matter politically with China, and no longer an attempt to establish what is actually true.
If his signature here means that he thinks it could be a lab leak and should be investigated (caveat because he might have other reasons for signing), then that is decent evidence that a GoF lab origin is technically possible.
If he signed because he found the public pressure to sign high enough, that also tells us something about his assessment of the current situation.