A friend of mine requested that I write up some of my comments on the lab leak hypothesis, since I had done quite a bit of research into this in 2020. This was originally asked in the context of the longform Facebook post that Eliezer wrote regarding the origins of the Covid-19 pandemic and the implications for the future: https://www.facebook.com/yudkowsky/posts/10159653334879228
My comment is less organized than I would prefer, but I figured it was better to post in a rough form than not post at all.
In early January 2021, I wrote on Facebook and LessWrong:
I've done a lot of thinking about the origins of SARS-CoV-2 and I still find the lab escape hypothesis quite credible. We still haven't found an intermediate host, which is surprising if the virus emerged naturally. This issue has become politicized, but we really need a neutral investigation in order to figure out what actually happened. This is quite important for understanding how to prevent future pandemics.
The New York Magazine article, "The Lab-leak Hypothesis," is long, but I can personally verify the author is reporting on the source material pretty accurately. I've done over 200 hours of research on this topic and have read basically all the sources the article cites. That said, I don't agree with all of the claims. I do not think the SARS-CoV-2 virus is very likely to have been created using the RaTG13 virus, because of the genetic differences spread throughout the genomes. However, there are many other paths that could have led to a lab escape, and I'm somewhat agnostic between several of them.
I believe we need more transparency into the whole investigation, especially the efforts on the ground in China. What is the process by which scientists in Hubei, Yunnan, and other provinces are looking for intermediate hosts? How have the investigation(s) into the WIV been organized and what processes did they use? What agencies, methods, and results?
COVID-19 has had a huge effect on the whole world. It's politically appropriate to call for accountability in these investigations, because there is a clear public interest for literally the whole global population.
I think that the Covid-19 pandemic was the result of a lab escape, with 85% probability. I'm up for bets with people who think a lab escape is unlikely. The clearest resolution criteria I can imagine involve the question of whether an intermediate host will be found, or evidence of direct transfer from bats.
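For anyone weighing that bet, here is a minimal sketch of what an 85% credence implies as betting odds. The function names and the stake sizes are illustrative assumptions, not proposed terms.

```python
from fractions import Fraction


def probability_to_odds(p: Fraction) -> Fraction:
    """Convert a probability strictly between 0 and 1 into odds in favor, p / (1 - p)."""
    if not 0 < p < 1:
        raise ValueError("probability must be strictly between 0 and 1")
    return p / (1 - p)


def expected_profit(p: Fraction, stake: Fraction, payout: Fraction) -> Fraction:
    """Expected profit for a bettor who wins `payout` with probability p
    and loses `stake` otherwise."""
    return p * payout - (1 - p) * stake


p_lab_escape = Fraction(85, 100)          # the 85% credence stated above
odds = probability_to_odds(p_lab_escape)  # 17/3, i.e. 17 to 3 in favor

# Hypothetical stakes: risking 17 units to win 3 is the break-even bet at 85%.
break_even = expected_profit(p_lab_escape, Fraction(17), Fraction(3))
print(odds, break_even)
```

At 85%, any bet where I risk less than 17 units per 3 units won has positive expected value from my side, so someone who thinks a lab escape is unlikely should be willing to offer much better terms than that.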
Alina Chan (https://twitter.com/Ayjchan) has a lot of good discussion of the lab escape hypothesis. She comments on basically all the articles and makes good conceptual clarifications. She also wrote https://www.rationaloptimist.com/8691, which is a decent overview.
I think the ideal piece explaining the state of the evidence on the lab leak hypothesis isn't very long. Eliezer is right that most of the supporting evidence comes from broad circumstantial evidence: The kind of viruses the WIV was sampling. The kind of research they were conducting (documented in EcoHealth Alliance's grant proposal & in their published papers). The closest relative of SARS-CoV-2 being very far away. The virus already seeming well adapted to humans in Wuhan when it was first sequenced. The lack of intermediate host being found.
Lots of pieces include these things but also go into great detail about, say, the RaTG13 renaming or the WIV database being taken offline. I agree with Eliezer that details like this should not be taken as much evidence for a lab escape.
One question that I think hasn't received enough attention, though Alina Chan has a paper on it, is how well adapted to humans SARS-CoV-2 is. https://www.biorxiv.org/content/10.1101/2020.05.01.073262v1
If Alina is right, and my own epistemic spot checking suggests she is (though I'd like to hear arguments against, such as the reviewer feedback on her paper submission), then this is a lot of evidence for the gain-of-function (GoF) lab leak hypothesis, especially in conjunction with the emergence in Wuhan.
It would be surprising for the virus to show up in Wuhan first, but if it showed up there and clearly went through a process of human adaptation, that would be much more consistent with the natural origin hypothesis. If we accept that it was well adapted to humans, and that this couldn't have happened without evolution within a human host, now we have to posit a whole period of cryptic transmission while the virus became adapted to humans.
Some questions we can't really answer without more data that we're unlikely to get. We're unlikely to get the WIV's records, for example. But someone who understands viral evolution could do more work to analyze Alina's claim that the virus was well adapted to humans right away. (The main evidence for this is the comparatively low rate of amino acid changes early in the Wuhan outbreak compared with the original SARS-CoV outbreak and other spillovers from animal hosts.)
https://thebulletin.org/2021/05/the-origin-of-covid-did-people-or-nature-open-pandoras-box-at-wuhan/amp/ - This is an article I would recommend, though with some caveats. I think most people won't be able to evaluate the arguments about the furin cleavage sites or codons -- if I were writing this article I would leave them out.
And I'm not sure those arguments are actually that good. I had a hard time evaluating them. Protease cleavage sites can certainly evolve naturally or during passaging. And they're also a reasonable thing to insert to study potential for transmission in humans. But these arguments seem pretty easy to get wrong / I don't trust myself to evaluate them and probably wouldn't trust someone without more experience working with viruses.
There was a recent letter to Science arguing that the lab escape hypothesis should be investigated in depth in addition to investigating the other hypotheses (https://science.sciencemag.org/content/372/6543/694.1). The main reason this is important is that Ralph Baric signed this letter. He leads one of the top two coronavirus labs in the world studying SARS-like viruses. (The other being the WIV.) He is a collaborator with the WIV as well. If his signature here means that he thinks it could be a lab leak and should be investigated (caveat because he might have other reasons for signing), then that is decent evidence that a GoF lab origin is technically possible.
I’ve found a shortage of articles that make a good argument for a natural origin. I do not think this is because there is no case for a natural origin, but rather that there are more incentives to write compelling cases for a lab origin. In articles about the virus’ origin, I’ve seen a lot more words on the history of lab escapes than I have about the history of natural spillovers. This is dumb -- we have lots of data about how natural spillovers occur, and these data should be incorporated into any investigation into SARS-CoV-2 origins.
Edit: Someone shared an interesting preprint that Eliezer re-shared, "Early appearance of two distinct genomic lineages of SARS-CoV-2 in different Wuhan wildlife markets suggests SARS-CoV-2 has a natural origin." I haven't had time to look closely at it, but as I said above, I think evolutionary analysis of the early pandemic is important for revealing its source. To tell if this paper's results are significant, I'd want to know exactly how many changes were present in the two lineages, and estimate how many human infections ago the common ancestor was. I don't find the results immediately compelling because I'd expect the changes between these lineages to be large if they represented two distinct animal-to-human spillover events.
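As a rough illustration of the first check I'd want, counting changes between two lineages, here is a minimal sketch. It assumes the two sequences are already aligned and of equal length; the sequences below are made-up toy strings, not real SARS-CoV-2 data, and the function name is my own.

```python
def count_differences(seq_a: str, seq_b: str) -> int:
    """Count positions at which two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


# Toy, hypothetical sequences purely for illustration.
lineage_x = "ATGCTTACGGA"
lineage_y = "ATGCTCACGGT"

n_changes = count_differences(lineage_x, lineage_y)
print(n_changes)
```

Turning a count like this into "how many infections ago" would additionally need an assumed mutation rate per transmission, which I'm not supplying here.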