This is a linkpost, created for the 2021 Review.

I know I’m two months late here. Everyone’s already made up their mind and moved on to other things.

But here’s my pitch: this is one of the most carefully-pored-over scientific issues of our time. Dozens of teams published studies saying ivermectin definitely worked. Then most scientists concluded it didn’t. What a great opportunity to exercise our study-analyzing muscles! To learn stuff about how science works which we can then apply to less well-traveled terrain! Sure, you read the articles saying that experts had concluded the studies were wrong. But did you really develop a gears-level understanding of what was going on? That’s what we have a chance to get here!


Alexandros Marinos (LW profile) has a long series where he reviewed Scott's post:

The Potemkin argument is my public peer review of Scott Alexander’s essay on ivermectin. In this series of posts, I go through that essay in detail, working through the various claims made and examining their validity. My essays follow the structure of Scott’s essay, organized into four primary units, with additional material to follow.

This is his summary of the series, and this is the index. Here's the main part:


Part 1: Introduction (TBC)

Part 2: The Devil's Advocate

The Studies

The first substantial part of the essay is Scott’s analysis of a subset of the ivermectin studies, taking up over half of the word-count of the entire essay. I go through his commentary in detail:

Part 3 - The deeply flawed portrayal of Dr. Flavio Cadegiani.

Part 4 - The equally false portrayal of the Biber et al. study, built on flawed claims by Gideon Meyerowitz-Katz.

Part 5 - Another near-identical pattern of false portrayal took place with Babalola et al.

Part 6 - The statistical methods of Dr. John Carlisle (studies: Cadegiani et al. (again), Elafy et al., and Ghauri et al.).

Part 7 - Statistical power (studies: Mohan et al., Ahmed et al., Chaccour et al., Buonfrate et al.).

Part 8 - Synthetic control groups (studies: Cadegiani et al. (again!), Borody et al.).

Part 9 - Observational studies (studies: Merino et al., Mayer et al., and Chahla et al.).

Part 10 - Discussing the Lopez-Medina trial.

Part 11 - Discussing the TOGETHER trial in comparison with the infamous Carvallo trial.

Part 12 - The Krolewiecki research program.

Part 13 - Wrapping up the study review and extracting the implied criteria.

The Analysis

Part 14 - We get into the meta-analysis part of the essay, and possibly the deepest flaw of them all.

I have previously written two essays that are summarized in part 14. I don’t really recommend spending time on these, but if somehow you still want bonus material, feel free to read “Scott Alexander's Correction on Ivermectin, and its Meaning for the Rationalist Project” and “Scott Alexandriad III: Driving up the Cost of Critique.”

The Synthesis

Interlude I - Scott Alexander and the Worm of Inconsistency

Part 15 - Do Strongyloides worms explain the positive ivermectin trials?

Part 16 - How the perfect meme was delivered to the masses

Part 17 - Funnel Plots and all that can go wrong with them

The Takeaways

Part 18 - Scott’s Scientific Takeaway

Part 19 - My Scientific Takeaway

Part 20 - Scott’s Sociological Takeaway

Part 21 - The Political Takeaway

He also wrote a response right after Scott published (and before writing this series).

Here are some excerpts from the summary that show his position (without arguing for it):

Scott Alexander’s argument on ivermectin, in terms of logical structure, went something like this:

  1. Of the ~80 studies presented on ivmmeta in Nov ‘21, zoom in on the 29 “early treatment” studies.
  2. After reviewing them, he rejected 13 for various reasons, leaving him with 16.
  3. He also rejected an additional five studies based on the opinion of epidemiologist Gideon Meyerowitz-Katz, leaving him with 11 studies.
  4. He ran a t-test on the event counts of the remaining studies, finding a “debatable” benefit, though he did later admit results were stronger than that based on a correction I provided.
  5. He then explained that the prevalence of Strongyloides worms is correlated with how well ivermectin did in those studies, based on Dr. Avi Bitterman’s analysis.
  6. This doesn’t explain non-mortality associated results, especially viral load reductions and PCR negativity results, but a funnel plot analysis—also by Dr. Avi Bitterman—indicated there was substantial indication of asymmetry in the funnel.
  7. Scott interpreted that asymmetry as publication bias, and in effect attributed any improvement seen to an artifact of the file-drawer effect.
  8. Scott’s conclusion was that there is little if any signal left to explain once we take this principled approach through the evidence—considering all the reasons for concern—and as a result he considers it most likely that if ivermectin works at all, it would be only weakly so, and only in countries with high Strongyloides prevalence.

Here is my incredibly implausible thesis, that I never would have believed in a million years, had I not done the work to verify it myself:

Not just one, but each of the steps from 2 to 7 was made in error.

What’s more, when we correct the steps, the picture of ivermectin’s efficacy emerges much stronger than Scott represented.

Towards the end:

As we saw by exploring the logical backbone of Scott’s argument, it’s not that the chain of inference has a weak link, but more that each link in the chain is weak. Even if you’re not convinced by each of the arguments I make (and I do think they’re all valid), being convinced by one or two of these arguments makes the whole structure collapse. In brief:

  1. Scott’s starting point of the early treatment studies from ivmmeta is somewhat constrained, but given the number of studies, it should be sufficient to make the case.
  2. If we accept the starting point, we must note that Scott’s filtering of the studies is over-eagerly using methods such as the one by John Carlisle that are simply not able to support his definitive conclusions. Worse, some of his sources modify Carlisle’s methods in ways that compromise any usefulness they might have originally had.
  3. Even if we accept Scott’s filtering of studies, though, throwing out even more studies based on trust in Gideon Meyerowitz-Katz, without any opposing argument, is all but certain to shift the results in the direction of finding no effect.
  4. Even if we accept the final study selection, the analysis methodology is invalid.
  5. Even if it wasn’t, the Strongyloides co-infection correlation is not the best explanation for the effect we see.
  6. Even if it was, it can’t explain the viral load and PCR positivity results, but Scott offers us a funnel plot that he claims demonstrates publication bias. However, it should have been computed as a random-effects, not a fixed-effect, model. Also, if we use the only available test appropriate for such high-heterogeneity studies, there is no asymmetry to speak of.
  7. Even if there was, funnel plot asymmetry doesn’t necessarily imply publication bias, especially in the presence of heterogeneity, so Scott’s interpretation is unjustified.
  8. When we look at the evidence, sliced and diced in different ways, as Scott did, we consistently see a signal of substantial efficacy. And even though the original meta-analysis Scott started from can be criticized for pooling different endpoints, the viral load results do not suffer from such concerns, and still show a similar degree of efficacy.

Each of bullet points 2-7 is detailed in the section with the same number.

As I mentioned in my original response, Scott’s argument requires each of these logical steps to be correct. All of them have to work to explain away the signal. It’s not enough for a couple of them to be right, because there’s just too much signal to explain away.
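The fixed-effect vs. random-effects distinction at the heart of point 6 can be made concrete with a minimal sketch. This uses made-up (log-odds-ratio, variance) pairs purely for illustration, not data from any actual ivermectin study; it contrasts inverse-variance (fixed-effect) pooling with a DerSimonian-Laird random-effects pool, which is one standard way to account for between-study heterogeneity:

```python
import math

# Hypothetical (log odds ratio, variance) pairs, for illustration only --
# NOT data from any actual ivermectin study.
studies = [(-0.9, 0.20), (-0.1, 0.05), (-1.2, 0.30), (0.1, 0.04), (-0.6, 0.10)]

def fixed_effect(studies):
    """Inverse-variance (fixed-effect) pooled estimate."""
    w = [1.0 / v for _, v in studies]
    return sum(wi * y for wi, (y, _) in zip(w, studies)) / sum(w)

def dersimonian_laird(studies):
    """Random-effects pooled estimate using the DerSimonian-Laird tau^2."""
    w = [1.0 / v for _, v in studies]
    sw = sum(w)
    fixed = sum(wi * y for wi, (y, _) in zip(w, studies)) / sw
    # Cochran's Q measures between-study heterogeneity.
    q = sum(wi * (y - fixed) ** 2 for wi, (y, _) in zip(w, studies))
    df = len(studies) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)  # estimated between-study variance
    # Random-effects weights add tau^2 to each study's within-study variance,
    # so small, imprecise studies get relatively more weight.
    w_re = [1.0 / (v + tau2) for _, v in studies]
    est = sum(wi * y for wi, (y, _) in zip(w_re, studies)) / sum(w_re)
    return est, tau2

print("fixed-effect pooled estimate:", round(fixed_effect(studies), 3))
est, tau2 = dersimonian_laird(studies)
print("random-effects pooled estimate:", round(est, 3), "tau^2:", round(tau2, 3))
```

With heterogeneous inputs like these, the two models give visibly different pooled estimates, which is why the choice between them matters for any downstream funnel-plot or bias analysis.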


While the above covers the logical flaws in Scott’s argument, before closing, I need to highlight what I see as moral flaws. In particular, I found his flippant dismissal of various researchers to be in contradiction to the values he claims to hold. I will only highlight some of the most egregious examples, because it is deeply meaningful to set the record straight on these. (Click on the study name for a more in-depth analysis of my exact issue with each accusation):

  • Biber et al.—Accused of performing statistical manipulations of their results when, in fact, the change in question was extremely reasonable, had been done with the blessing of their IRB, and the randomization key of the trial had not been unsealed until after recruitment had ended.
  • Cadegiani et al.—The most substantial accusation Scott has on Cadegiani is that in a different paper than the one examined, there are signs of randomization failure. For this, he is dragged throughout the essay as a fraudster. While Scott has partially corrected the essay, it still tars him as a fraudster accused of “crimes against humanity.” If some terms should not be thrown around for effect, I put to you that this is one of them.
  • Babalola et al.—Lambasted for presenting impossible numbers despite the fact that Kyle Sheldrick had already reviewed and cleared the raw data of the study. A commenter on Scott’s post demonstrated very clearly how the numbers were not impossible at all, but instead a result of practices that pretty much all clinical trials follow.
  • Carvallo et al.—Accused of running a trial that the hosting hospital didn’t know anything about. As it turns out, Carvallo presented paperwork demonstrating the approval by the ethics board of that hospital, which Buzzfeed confirmed. The accusation is that a different hospital—from which healthcare professionals had been recruited—did not have any record of ethics approval for that trial, though the person who spoke to Buzzfeed admitted that it may not have been needed. After all, the exact same pattern is visible in the Okumus trial, where four hospitals participated but the IRB/ethics approval is from the main hospital only. The issue with Carvallo—that most recognize—is that he didn’t record full patient-level data, but summaries. That could have been OK if he had been upfront about it, but he instead made a number of questionable statements that he was called out on. Given this history, it is sensible to disregard the trial. But this is very different from the accusations of fraud that Scott makes.
  • Elafy et al.—Accused, multiple times in Scott’s piece, of incompetence for failing to randomize their groups. The paper writes in six separate places that it is not reporting on a randomized trial, amongst them on a diagram that Scott included in his own essay. Hard to imagine how else they could have made it clear.
  • Ghauri et al.—Scott accuses the authors of having suspicious baseline differences but without actually running the Carlisle tests to substantiate his claims. Reviewing the same data, I am entirely unconvinced.
  • Borody et al.—“this is not how you control group, %$#^ you.” This is what Scott has to say to the man who invented the standard therapy for H. pylori, saving a million lives—by conservative estimates. To top things off, what was done in that paper was an entirely valid way to control group. Maybe they should have recruited more patients to make their data even more compelling—and they would have, had the Australian regulator not banned the use of ivermectin for COVID-19, even in the context of a trial.

To be extremely clear, I’m not saying that Scott should have necessarily kept one or more of these trials in his analysis, only that he failed to treat others as he would like them to treat him.

I only read his initial response in full, and thought it was very good. I haven't read his series, and haven't fully read his summary of it yet, but Alexandros seems to me like a serious and competent guy, and Scott also took him seriously and made some corrections due to his review, so it seems to me that he did great work here. And if reviews made before the review phase can be considered for prizes, then I think his deserves one.

Good and important, but long. I'd like to see a short summary in the book.
