Saar Wilf is an Israeli entrepreneur. Since 2016, he’s been developing a new form of reasoning, meant to transcend normal human bias.

His method - called Rootclaim - uses Bayesian reasoning, a branch of math that explains the right way to weigh evidence. This isn’t exactly new. Everyone supports Bayesian reasoning. The statisticians support it, I support it, Nate Silver wrote a whole book supporting it.

But the joke goes that you do Bayesian reasoning by doing normal reasoning while muttering “Bayes, Bayes, Bayes” under your breath. Nobody - not the statisticians, not Nate Silver, certainly not me - tries to do full Bayesian reasoning on fuzzy real-world problems. They’d be too hard to model. You’d make some philosophical mistake converting the situation into numbers, then end up much worse off than if you’d tried normal human intuition.

Saar spent years working on this problem, until he was satisfied his method could avoid these kinds of pitfalls. Then the Rootclaim team started posting analyses of open problems to their site.

For example, does Putin have cancer? We start with the prior for Russian men ages 60-69 having cancer (14.32%, according to health data). We adjust for Putin’s healthy lifestyle (-30% cancer risk) and lack of family history (-5%). Putin hasn’t vanished from the world stage for long periods of time, which seems about 4x more likely to be true if he didn’t have cancer than if he did. About half of cancer patients lose their hair, and Putin hasn’t, so we’ll divide by two. On the other hand, Putin’s face has gotten more swollen recently, which happens about six times more often to cancer patients than to others, so we’ll multiply by six. And so on and so forth, until we end up with the final calculation: 86% chance Putin doesn’t have cancer, too bad.
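The arithmetic here is just a prior followed by sequential odds updates. A minimal sketch, using only the likelihood ratios quoted above (all Rootclaim's illustrative figures, not mine); note that these listed factors alone work out to roughly a 7% cancer probability rather than the analysis's final 14% - the "and so on and so forth" covers the factors omitted from the summary:

```python
def update_odds(p, likelihood_ratios):
    """Convert a probability to odds, apply likelihood ratios, convert back."""
    odds = p / (1 - p)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1 + odds)

# Prior: 14.32% cancer rate, scaled for healthy lifestyle (-30%)
# and lack of family history (-5%)
prior = 0.1432 * 0.70 * 0.95

# Likelihood ratios P(evidence | cancer) / P(evidence | no cancer):
lrs = [
    1 / 4,  # no long absences from the world stage: ~4x likelier if healthy
    1 / 2,  # hair not lost: ~half of cancer patients lose their hair
    6,      # facial swelling: ~6x more common in cancer patients
]

p_cancer = update_odds(prior, lrs)
print(f"P(cancer) = {p_cancer:.1%}")
```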

This is an unusual way to do things, but Saar claimed some early victories. For example, in a celebrity Israeli murder case, Saar used Rootclaim to determine that the main suspect was likely innocent, and a local mental patient had committed the crime; later, new DNA evidence seemed to back him up.

One other important fact about Saar: he is very rich. In 2008, he sold his fraud detection startup to PayPal for $169 million. Since then he’s founded more companies, made more good investments, and won hundreds of thousands of dollars in professional poker.

So, in the grand tradition of very rich people who think they have invented new forms of reasoning, Saar issued a monetary challenge. If you disagree with any of his Rootclaim analyses - you think Putin does have cancer, or whatever - he and the Rootclaim team will bet you $100,000 that they’re right. If the answer will eventually come out (eg whether Putin dies of cancer), you can wait and see. Otherwise, he’ll accept all comers in video debates in front of a mutually-agreeable panel of judges.

Since then, Saar and his $100,000 offer have been a fixture of Internet debates everywhere. When I argued that Vitamin D didn’t help fight COVID (Saar thinks it does), people urged me to bet against Saar, and we had a good discussion before finally failing to agree on terms. When anti-vaccine multimillionaire Steve Kirsch made a similar offer, Saar took him up on it, although they’ve been bogged down in judge selection for the past year.

Rootclaim also found in favor of the lab leak hypothesis of COVID. When Saar talked about this on an old ACX comment thread, fellow commenter tgof137 (Peter Miller) agreed to take him up on his $100K bet.

At the time, I had no idea who Peter was. I kind of still don’t. He’s not Internet famous. He describes himself as a “physics student, programmer, and mountaineer” who “obsessively researches random topics”. After a family member got into lab leak a few years ago, he started investigating. Although he started somewhere between neutral and positive towards the hypothesis, he ended up “90%+” convinced it was false. He also ended up annoyed: contrarian bloggers were raking in Substack cash by promoting lab leak, but there seemed to be no incentive to defend zoonosis.

[Rest of the article here]


Way back in 2020 there was an article A Proposed Origin For SARS-COV-2 and the COVID-19 Pandemic, which I read after George Church tweeted it (!) (without comment or explanation). Their proposal (they call it the "Mojiang Miner Passage" theory) in brief was that it WAS a lab leak but NOT gain-of-function. Rather, in April 2012, six workers in a Mojiang mine "fell ill from a mystery illness while removing bat faeces. Three of the six subsequently died." Their symptoms were a perfect match to COVID, and two were very sick for more than four months.

The proposal is that the virus spent those four months adapting to life in human lungs, including (presumably) evolving the furin cleavage site. And then (this is also well-documented) samples from these miners were sent to WIV. The proposed theory is that those samples sat in a freezer at WIV for a few years while WIV was constructing some new lab facilities, and then in 2019 researchers pulled out those samples for study and infected themselves.

I like that theory! I’ve liked it ever since 2020! It seems to explain many of the contradictions brought up by both sides of this debate—it’s compatible with Saar’s claim that the furin cleavage site is very different from what’s in nature and seems specifically adapted to humans, but it’s also compatible with Peter’s claim that the furin cleavage site looks weird and evolved. It’s compatible with Saar’s claim that WIV is suspiciously close to the source of the outbreak, but it’s also compatible with Peter’s claim that WIV might not have been set up to do serious GoF experiments. It’s compatible with the data comparing COVID to other previously-known viruses (supposedly). Etc.

Old as this theory is, the authors are still pushing it and they claim that it’s consistent with all the evidence that’s come out since then (see author’s blog). But I’m sure not remotely an expert, and would be interested if anyone has opinions about this. I’m still confused why it’s never been much discussed.

I agree, I think the most likely version of the lab leak scenario does not involve an engineered virus. Personally I would say 60% chance zoonotic, 40% chance lab leak.

Given that they had engineered viruses in the lab at biosafety level II, why do you think the most likely version of the lab leak scenario does not involve an engineered virus?

I’m interested in Metacelsus’s answer.

My take is: I really haven’t been following the lab leak stuff. The point of my comment was to bring this hypothesis to the attention of people who have, and hopefully get some takes from them. As I understand it:

  • We know for sure that miners went into a cave, the same cave where btw one of the closest known wild relatives of COVID was later sampled
  • We know for sure that the miners got sick with COVID-like symptoms, some for 4+ months
  • We know for sure that samples (including posthumous samples) from those sick miners were sent to WIV, and that the researchers still had access to those samples into 2020

I think that’s more than enough to at least raise the Mojiang Miner Passage theory to consideration. Figuring out whether the theory is actually true or not would require a lot more beyond that, e.g. arguments about the exact genetic code of the furin cleavage site and all this other stuff which is way outside my area of expertise.  :)


The frustrating thing about the discussion about the origins is that people seldom show recognition of the priorities here, and all get lost in the weeds.

You can get n layers deep into the details, and if the bottom is at n+1 you're fucked. To give an example I've seen people discussing in this debate: "The lab was working on doing gain of function to coronaviruses just like this!" sounds pretty damning, but "actually the grant was denied, do you think they'd be working on it in secret after they were denied funding?" completely reverses it. Then after the debate, "Actually, labs frequently write grant proposals for work they've already done, and frequently are years behind in publishing" reverses it again. Even if there's an odd number of remaining counters, the debate doesn't demonstrate it. If you're not really really careful about this stuff, it's very easy to get lost and not realize where you've overextended on shaky ground.

Scott talks about how Saar is much more careful about these "out of model" possibilities and feels ripped off because his opponent wasn't, but at least judging from Scott's summary it doesn't appear he really hammered on what the issue is here and how to address it.

Elsewhere in the comments here Saar is criticized for failing to fact check the dead cat thing, and I think that's a good example of the issue here. It's not that any individual thing is too difficult to fact check, it's that when all the evidence is pointing in one direction (so far as you can tell) then you don't really have a reason to fact check every little thing that makes total sense so of course you're likely to not do it. If someone argues that clay bricks weigh less than an ounce, you're going to weigh the first brick you see to prove them wrong, and you're not going to break it open to confirm that it's not secretly filled with something other than clay. And if it turns out it is, that doesn't actually matter because your belief didn't hinge on this particular brick being clay in the first place.

If it turns out that a lot of your predictions turn out to be based on false presuppositions, this might be an issue. If it turns out the trend you based your perspective on just isn't there, then yeah that's a problem. But if that's not actually the evidence that formed your beliefs, and they're just tentative predictions that aren't required by your belief under question, then it means much less. Doubly so if we're at "there exists a seemingly compelling counterargument" and not "we've gotten to the bottom of this, and there are no more seemingly compelling counter-counterarguments".

So Saar didn't check if the grant was actually approved. And Peter didn't check if labs sometimes do the work before writing grant proposals. Or they did, and it didn't come through in the debate. And Saar missed the cat thing. Peter did better on this game of "whack-a-mole" of arguments than Saar did, and more than I expected, but what is it worth? Truth certainly makes this easier, but so does preparation and debate skill, so I'm not really sure how much to update here.

What I want to see more than "who can paint an excessively detailed story that doesn't really matter and have it stand up to surface level scrutiny better", is people focusing on the actual cruxes underlying their views. Forget the myriad of implications n steps down the road which we don't have the ability to fully map out and verify, what are the first few things we can actually know, and what can we learn from this by itself? If we're talking about a controversial "relationship guru", postpone discussions of whether clips were "taken out of context" and what context might be necessary until we settle whether this person is on their first marriage or fifth. If we're wondering if a suspect is guilty of murder, don't even bother looking into the credibility of the witness until you've settled the question of does the DNA match.

If there appears to be a novel coronavirus outbreak right outside a lab studying novel coronaviruses, is that actually the case? Do we even need to look at anything else, and can looking at anything else even change the answer?

To exaggerate the point to highlight the issue, if there were unambiguously a million wet markets that are all equivalent, and one lab, and the outbreak were to happen right between the lab and the nearest wet market, you're done. It doesn't matter how much you think the virus "doesn't look engineered" because you can't get to a million to one that way. Even if you somehow manage to make what you think is a 1000:1 case, a) even if your analysis is sound it still came from the lab, b) either your analysis there or the million to one starting premise is flawed. And if we're looking for a flaw in our analyses, it's going to be a lot easier to find flaws in something relatively concrete like "there are a million wet markets just like this one" than whatever is going into arguing that it "looks natural".

So I really wish they'd sit down and hammer out the most significant and easiest to verify bits first. How many equally risky wet markets are there? How many labs? What is the quantitative strength of the 30,000 foot view "It looks like an outbreak of chocolatey goodness in Hershey Pennsylvania"? What does it actually take to have arguments that contain leaks to this degree, and can we realistically demonstrate that here?

I think Michael Weissman's v5.7 research/analysis might be exactly what you are looking for. I've been searching for a long time for analysis that makes a compelling case in either direction, especially for the absolutely most important core components of the debate. In a sea of high-effort research and analysis, Michael's post is the first one that has convinced me. He dives into very similar points to what you're searching for.

Even if you don't read it in full (it's long), I still see value in searching for specific elements to see his analysis on those points, such as his discussion about the wet market. For example, if you search for "animals/year" and "HSM" (Huanan Seafood Market), you'll see he goes into the animal trade numbers specifically at the HSM when compared to numbers for other wet markets in China. There are many other topics he analyzes that you might find similarly interesting.

Like you, I am wary of getting distracted too much with lines of evidence that may ultimately carry little weight. I appreciate that Gwern likely was motivated by the cat evidence to demonstrate to everyone how Peter may misrepresent evidence/arguments; I also think this evidence is so insignificant to the overall debate that it's not important enough to get bogged down in. 

This is an oversimplification, but for brevity, I think the case really rests on two components: the wet market as the origin, and the DEFUSE proposal. The wet market is so foundational to a Zoonosis argument that if it were disproved, it really seems like the closest thing we've got right now to a "does the DNA match?" question.

Here's a brief list of some recent information (some as recent as March 2024) that updated me towards lab leak and added crucial evidence for what we actually "know". This is for the sake of explaining my thoughts to others, but is in no way all-encompassing. Michael does a far superior job of explaining these in great depth.

  • Study published March 5th, 2024 finding intermediate sequences between Lineage A and B. This research shows that Lineage B very likely came from Lineage A. All cases in the market were Lineage B, but none were Lineage A. In short, the research shows that a single spillover is much more likely than a double-spillover Zoonotic event. The double-spillover theory is a foundational argument of the ZW theory that Peter Miller and others use. This is a massive blow to the probability that the wet market was the origin of the virus, to the point where it now seems extremely unlikely that the wet market was the origin.
  • Wildlife trade in Wuhan is significantly lower than Wuhan's share of the population would predict, which significantly shifts the probability of a ZW origin downwards in the Bayesian calculations that Peter Miller and others use.
  • Although the DEFUSE proposal leaked in 2021, more recent drafts were discovered in 2024 which contain what appears to be damning evidence. New information included their approach using restriction enzymes (BsaI/BsmBI) that ultimately matched precisely with what Bruttel et al. (2022) found as the assembly process that would create exactly this virus, years before this DEFUSE draft leak was even public. Michael describes the degree of how unlikely this would be if the origin was Zoonotic. The DEFUSE budget leak confirms that they were purchasing these enzymes. To your point about focusing on things that we "know", the BsaI/BsmBI restriction enzyme information is new and now falls in the category of actual high-weight evidence for a high-weight core component of the overall debate. Additionally, the new documents contained draft comments that were not available in the original leaked proposal. Among many other things, the comments show that the research work was actually planned to be done at the WIV at BSL-2 levels for cost reduction, but they edited the final document to "BSL-3" because they thought "US researchers will likely freak out" if they knew this research was being done in lower safety BSL-2 labs. The researchers seemed to think the distinction didn't matter for their research and that it was bureaucratic tape slowing them down, so they fudged the proposal to hide this. Considering BSL-2 labs are not sufficiently designed to contain airborne disease (whereas BSL-3 labs are), this does not seem to be an insignificant point in this whole debate.

The DEFUSE proposal is especially difficult because it's uncertain and very much in the realm of "how much can we really know", but it seems so incredibly relevant and high-weight to the debate that I really think it still should be considered at the core and should be hammered out as much as possible. When looking at how SARS-CoV-2 ended up, they are unbelievably spot-on with describing specifically what they were working on, how precisely they would do it, the restriction enzymes they would use, the Furin cleavage site, the locations they would do it, the unsafe biosecurity levels the research would be done at, their motivations for the research, and much more. My understanding is that there were only 3 institutions in the world that were doing this exact research, and two of them (WIV and UNC) were involved with this proposal. The proposal describes a research plan that uncannily resembles the precise sequence of events and conditions one would anticipate if a pandemic were to emerge from a laboratory incident at or near the WIV. It really is almost as close a match as you could possibly expect.


I hope this helps. I'm curious what you and others think.


My current impression is that this debate format was not fit for purpose.

A debate sequel, with someone other than Peter Miller (but retaining and reevaluating all the evidence he got from various sources) would be nice. I can easily imagine Miller doing better work on other research topics that don't involve any possibility of cover ups or adversarial epistemics related to falsifiability, which seem to be personal issues for him in the case of lab leak at least.

Maybe with 200k on the line to incentivize Saar to return, or to set up a team this time around? With the next round of challengers bearing in mind that Saar might be willing to stomach a net loss of many thousands of dollars in order to promote his show and methodology?


If $100k was not enough to incentivize Saar & his team to factcheck Peter's simplest claims like "Connor said his cat died of COVID-19", where it takes me literally 15 seconds* to find it in Google and verify that Connor said the exact opposite of that (where an elementary school child could have factchecked this as well as I did), I don't think $200k is going to help Saar either. And I don't know how one would expect the debate format to work for any genuinely hard question if it takes approaching a million dollars to get anyone to do sub-newspaper-level factchecking of Peter's claims. (If you can't even check quotes, like 'did this dude say in the Daily Mail what Peter said he said?' how on earth are you going to do well at all of these other things like mahjong parlors in wet markets that no longer exist or novel viral evolution or CCP censorship & propaganda operations or subtle software bugs in genomics software written by non-programmers...?) The problem is not the dollar amount.

* and I do mean "literally" literally. It should take anyone less than half a minute to check the cat claim, and if it takes more, you should analyze what's wrong with you or your setup. If you doubt me, look at my directions, which are the first query anyone should make - and if that's not an obvious query, read my search case-studies until it is - then get a stopwatch, open up a search engine in a tab if you have neglected to set up a keyboard shortcut, and see how long it takes you to factcheck it as I describe.

Curated. (In particular recommending people click through and read the full Scott Alexander post)

I've been tracking the Rootclaim debate from the sidelines and finding it quite an interesting example of high-profile rationality. 

I have a friend who's been following the debate quite closely and finding that each debater, while flawed, had interesting points that were worth careful thought. My impression is a few people I know shifted from basically assuming Covid was probably a lab-leak, to being much less certain.

In general, I quite like people explicitly making public bets, and following them up with in-depth debate.

[Mod note: I edited out some of the meta commentary from the beginning for this curation. In-general for link posts I have a relatively low bar for editing things unilaterally, though I of course would never want to misportray what an author said] 

I've been tracking the Rootclaim debate from the sidelines and finding it quite an interesting example of high-profile rationality.

Would you prefer the term "high-performance rationality" over "high-profile rationality"?


One thing that occurs to me is that each analysis, such as the Putin one, can be thought of as a function hypothesis.

It takes as inputs the variables:

  • Russian demographics
  • healthy lifestyle
  • family history
  • facial swelling
  • hair present

And is outputting the probability 86%, where the function is

P = F(demographics, lifestyle, history, swelling, hair) and then each term is being looked up in some source, which has a data quality, and the actual equation seems to be a mix of Bayes and simple probability calculations.

There are other variables not considered, and other valid reasoning tracks.  You could take into account the presence of oncologists on Putin's personal staff.  Intercepted communications possibly discussing it.  Etc.  I'm not here to discuss the true odds of Putin developing cancer, but note that if the above is "function A", and another function that takes into account different information is "function B", you should be aggregating all valid functions, forming a "probability forest".

Perhaps you weight each one by the likelihood of the underlying evidence being true.  For example, each of the above facts is effectively 100% certain except hair presence (Putin could have received a hair transplant) and family history (some relatives' causes of death could be unknown, or suspected but unconfirmed cancer).

This implies a function "A'n", where we assume and weight in the probability that each combination of the underlying variables has the opposite value.  For example, if pHair_Present = 0.9, A' has one permutation where the hair is not present due to a transplant.
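One way to sketch this idea is to marginalize over the possible truth values of the uncertain inputs, weighting each combination by how likely it is. The model and all the numbers below are made up purely for illustration, not taken from any Rootclaim analysis:

```python
from itertools import product

# Hypothetical toy model mapping evidence booleans to P(cancer).
def model(hair_present, family_history_clean):
    p = 0.10
    if not hair_present:          # hair loss is evidence for cancer
        p *= 2.0
    if not family_history_clean:  # hidden family history raises the estimate
        p *= 1.5
    return min(p, 1.0)

# Confidence that each reported input is actually true (e.g. pHair_Present = 0.9)
reliability = {"hair_present": 0.9, "family_history_clean": 0.8}

# Marginalize: weight each permutation of input truth values by its probability
expected = 0.0
for hair, history in product([True, False], repeat=2):
    w = (reliability["hair_present"] if hair else 1 - reliability["hair_present"]) \
        * (reliability["family_history_clean"] if history
           else 1 - reliability["family_history_clean"])
    expected += w * model(hair, history)

print(f"evidence-weighted P(cancer) = {expected:.3f}")
```

Each term in the sum is one of the "A-prime" permutations the comment describes; aggregating several distinct functions A, B, ... would add an outer weighted sum over models.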

This hints at why a panel of superforecasters is presently the best we can do.  Many of them do simple reasoning like this, and we see it in the comment section on Manifold.  But each individual human doesn't have the time to think of 100 valid hypotheses and calculate the resulting probability; many Manifold bettors seem to consider just one and bet their mana.

An AI system (LLM based with plugin access) able to do the legwork here would be very useful...

Given this kind of pearl in the description of the method ("There is only one straight line that contains two different points"), one can't help but wonder if the claimed method is as sound as its supposed implications are far-reaching...

A problem with the debate format is that mistakes that might be picked up if submissions were filed in advance can get missed. For example, the claim that serial passage would show N501Y mutations not seen in SARS-CoV-2 was incorrect. It would in BALB/c mice, but not in the hACE2 mice WIV had.

In terms of getting to the truth of the matter since the debate several new papers have undermined the core arguments relied on from Worobey et al and Pekar et al. for Huanan Seafood Market origin:

  1. Spatial statistics experts Stoyan and Chiu (2024) find the statistical argument by Worobey et al. that Huanan Seafood Market was the early epicenter is flawed.

  2. Lv et al. (2024) found new intermediate genomes, so the multiple spillover theory is unlikely (it was unlikely anyway, given lineages A and B are only two mutations apart). A single point of emergence is more likely, with lineage A coming first. The market cases were all lineage B, so not the primary cases. Their findings are consistent with Caraballo-Ortiz (2022) and Bloom (2021).

  3. Jesse Bloom (2023) published a new analysis showing that genetic material from some animal CoVs is fairly abundant in samples collected during the wildlife-stall sampling of the Huanan Market on Jan-12-2020. However, SARS-CoV-2 is not one of these CoVs.

  4. Michael Weissman (2024) shows a model with ascertainment collider stratification bias fits early Covid case location data much better than the model that all cases ultimately stemmed from the market. George Gao, Chinese CDC head at the time, acknowledged this to the BBC last year - they focused too much on and around the market and may have missed cases on the other side of the city.

  5. The anonymous expert who identified coding errors in Pekar et al. leading to an erratum last year has found another significant error. Single spillover looks more likely.

  6. Ultimately was performing in vivo experiments in transgenic (human ACE2 expressing) mice and civets in 2018 and 2019 with SARS-like CoVs. The results are unknown and they won't share their records.

Ultimately was performing

missing subject, who was performing? I guess WIV?