French long COVID study: Belief vs Infection

by Bucky, 7 min read, 23rd Nov 2021, 7 comments


Personal Blog

Thanks to JustisMills and Ruby for reviewing this post. Any errors are my own.

TLDR: The French long COVID study, which suggests that belief in having had COVID is correlated with long COVID symptoms but that actually having had COVID is not, used the wrong statistical tool to obtain this result.

In reality, the study data show that long COVID symptoms are correlated with having had COVID and agree with Scott’s conclusions in Long COVID: Much more than you wanted to know.


The authors suggest that nearly all long COVID symptoms might not be caused by SARS-CoV-2 (except for those associated with anosmia). I believe that this is true in some cases but not remotely close to the extent suggested by the paper.

Study Design 

Roughly speaking, the experimental setup was:

  • Send out a whole load of serology tests to ~36,000 people in May-Nov 2020 (Serology tests are antibody tests which are intended to show if you have ever had COVID).
  • Perform the tests (~27,000 received back), then give participants their results.
  • Send out a questionnaire about whether participants think they’ve had COVID and what persistent symptoms they’ve had (Dec 2020-Jan 2021).
  • Exclude some people for reasons.
    • e.g. Participants who thought that they had had COVID after they did the serology test were excluded.
  • Run some logistic regressions on different symptoms vs belief in having had COVID and/or serology results.

You may have spotted the first problem. We’re trying to test whether people’s belief that they’ve had COVID, or their actually having had COVID, is the better predictor of long COVID symptoms, but we’ve given participants their serology results before asking them whether they think they’ve had COVID.

You’d think that this would ruin the results – belief in having had COVID should be extremely well correlated with having a positive serology result.

Fortunately (?!) this doesn’t seem to be the case. Of everyone who had a positive serology result, only 41.5% replied that they thought they’d had COVID. Of everyone who thought they’d had COVID, 50.4% had had a negative serology result.

I’m super confused by this, but I’ll take it at face value for the moment and move on to the analysis.

Combined effects logistic regression with correlated predictors

The main reported result comes from model 3 of the study's analysis. This is the combined effects logistic regression model, which uses 2 predictors:

  • Belief in having had COVID.
  • Serology results.

To predict:

  • Presence of persistent symptoms (18 different symptoms).

The result of this model was that a lot of symptoms (16/18) were predicted well by belief in having had COVID but that only anosmia was predicted by serology results.

This seems pretty damning for the idea that long COVID symptoms are caused by SARS-CoV-2, at least until we consider the correlation between the 2 predictors.

Consider the following example with 100 participants:

  • 89 are negative for belief and serology.
    • None have symptom A.
  • 10 are positive for belief and serology.
    • 9 have symptom A.
    • 1 does not have symptom A.
  • 1 is positive for belief but negative for serology.
    • They have symptom A.

Running the equivalent of model 3 from the study on these data will show that belief in having had COVID is a positive predictor of symptom A but that a positive serology result is a negative predictor of symptom A.

At the same time, 90% of people who had COVID have symptom A compared to 1.1% of people who didn't have COVID!
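To make the counts concrete, here is a small Python sketch that tabulates the made-up example and reproduces the 90% vs 1.1% comparison for serology, plus the same comparison for belief (about 90.9% vs 0%). The data layout and the `symptom_rate` helper are hypothetical, written just for this illustration.

```python
# Made-up example data: (belief, serology, has_symptom_a) -> number of participants.
TOY_DATA = {
    (0, 0, 0): 89,  # negative for belief and serology, no symptom A
    (1, 1, 1): 9,   # positive for both, symptom A
    (1, 1, 0): 1,   # positive for both, no symptom A
    (1, 0, 1): 1,   # positive belief, negative serology, symptom A
}


def symptom_rate(data, predictor_index, predictor_value):
    """Share of participants with symptom A among those whose predictor
    (0 = belief, 1 = serology) equals predictor_value."""
    total = 0
    with_symptom = 0
    for (belief, serology, symptom), count in data.items():
        if (belief, serology)[predictor_index] == predictor_value:
            total += count
            with_symptom += count * symptom
    return with_symptom / total


for name, index in (("belief", 0), ("serology", 1)):
    positive = symptom_rate(TOY_DATA, index, 1)
    negative = symptom_rate(TOY_DATA, index, 0)
    print(f"{name}: {positive:.1%} if positive, {negative:.1%} if negative")
```

Taken one at a time, both predictors separate the groups sharply; belief does so slightly more cleanly, which is what the combined model picks up on.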

This is kinda tricky to explain but bear with me.

  • Taking each predictor separately, belief is a stronger predictor of symptom A than serology.
    • This is due to the last participant mentioned (the only participant whose belief and serology don't match). For them, positive belief predicts having symptom A but negative serology predicts not having it.
  • The model notices this difference in predictive power and makes belief a strong positive predictor.
  • It then looks at any variation which isn't explained by belief but that can be explained by serology.
    • Consider the last 11 people on the list (all the belief-positive participants).
      • 100% of people who were negative for serology had symptom A.
      • 90% of people who were positive for serology had symptom A.
    • So, given that someone is positive for belief, being positive for serology actually decreases the probability of having symptom A.
  • In reality, the 2 coefficients are fitted jointly by maximum likelihood (usually with an iterative method such as Newton-Raphson or gradient descent), but the result is the same.

Probably people who are familiar with statistics are cringing slightly at that explanation, but I hope it gives an intuitive idea of what is happening. Essentially:

  • All of the examples where positive serology makes you more likely to have symptom A are better explained (according to the model) by being positive for belief.
  • After adjusting for belief, being serology positive makes you slightly less likely to experience symptom A.
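A minimal Python sketch of the model 3 analogue on the made-up data. One wrinkle: nobody in the negative/negative group has symptom A and the single mismatched participant does, so the data are quasi-separated and an unpenalised maximum-likelihood fit has no finite solution (the belief coefficient drifts toward positive infinity and the serology coefficient toward negative infinity). To get finite numbers, the sketch adds a small L2 penalty on the two slopes (`lam=1e-3`, my own choice, not something from the study) and fits by damped Newton's method rather than gradient descent. `CELLS`, `fit_combined_model` and the other helpers are hypothetical names for this illustration.

```python
import math

# Made-up example data: (belief, serology) -> (participants, participants with symptom A).
CELLS = {
    (0, 0): (89, 0),
    (1, 1): (10, 9),
    (1, 0): (1, 1),
}


def sigmoid(z):
    # Branching on the sign keeps math.exp from overflowing.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    return math.exp(z) / (1.0 + math.exp(z))


def penalized_loss(theta, lam):
    """Negative log-likelihood plus an L2 penalty on the two slopes (not the intercept)."""
    total = 0.5 * lam * (theta[1] ** 2 + theta[2] ** 2)
    for (belief, serology), (n, k) in CELLS.items():
        eta = theta[0] + theta[1] * belief + theta[2] * serology
        softplus = max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))  # log(1 + e^eta)
        total += n * softplus - k * eta
    return total


def solve3(a, b):
    """Solve the 3x3 system a x = b by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def fit_combined_model(lam=1e-3, max_iter=200):
    """Damped Newton's method for symptom ~ intercept + belief + serology."""
    theta = [0.0, 0.0, 0.0]
    for _ in range(max_iter):
        grad = [0.0, lam * theta[1], lam * theta[2]]
        hess = [[0.0, 0.0, 0.0], [0.0, lam, 0.0], [0.0, 0.0, lam]]
        for (belief, serology), (n, k) in CELLS.items():
            x = (1.0, belief, serology)
            p = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
            for j in range(3):
                grad[j] += (n * p - k) * x[j]
                for q in range(3):
                    hess[j][q] += n * p * (1.0 - p) * x[j] * x[q]
        step = solve3(hess, grad)
        # Halve the Newton step until the penalised loss does not increase.
        t, current = 1.0, penalized_loss(theta, lam)
        while True:
            candidate = [th - t * s for th, s in zip(theta, step)]
            if penalized_loss(candidate, lam) <= current or t < 1e-12:
                break
            t /= 2
        moved = max(abs(c - th) for c, th in zip(candidate, theta))
        theta = candidate
        if moved < 1e-10:
            break
    return theta


intercept, belief_coef, serology_coef = fit_combined_model()
print(f"belief coefficient: {belief_coef:+.2f}")
print(f"serology coefficient: {serology_coef:+.2f}")
```

Run as-is, it should print a positive belief coefficient and a negative serology coefficient. With a much heavier penalty (e.g. `fit_combined_model(lam=100.0)`) both slopes are shrunk toward zero and both come out positive, so the sign flip only appears when the fit is allowed to track the data closely.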

Of course, this example is just me making up numbers to show how counter-intuitive the results of this kind of model can be.

However, hopefully it illustrates the problems you can have when running a combined effects logistic regression with correlated predictors. This might not be a problem (or might even be a feature) in some cases, but when one of your predictors (having COVID) often causes the other (believing that you had COVID), you have to think more carefully about your model.


Is there a simple way to assess whether COVID causes the symptoms in the study? Yes, just run the logistic regression with serology results as the only predictor. Fortunately for us the study includes this model – model 2.
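On the made-up example, the serology-only model is easy to check by hand: with a single binary predictor, the logistic regression maximum-likelihood estimates have a closed form (intercept = log-odds of the symptom in the serology-negative group, slope = log odds ratio between the groups). A minimal Python sketch, where `single_binary_predictor_fit` is a hypothetical helper; it gives an odds ratio of 801 (9/1 divided by 1/89), i.e. a strongly positive serology effect once belief is out of the model.

```python
import math


def single_binary_predictor_fit(pos_cases, pos_total, neg_cases, neg_total):
    """Closed-form logistic regression MLE for one binary predictor.

    Requires 0 < cases < total in both groups; otherwise the estimate is infinite.
    Returns (intercept, slope) on the log-odds scale.
    """
    def log_odds(cases, total):
        return math.log(cases / (total - cases))

    intercept = log_odds(neg_cases, neg_total)          # serology-negative group
    slope = log_odds(pos_cases, pos_total) - intercept  # log odds ratio
    return intercept, slope


# Made-up example: 9 of 10 serology-positive and 1 of 90 serology-negative have symptom A.
intercept, slope = single_binary_predictor_fit(9, 10, 1, 90)
print(f"serology odds ratio: {math.exp(slope):.0f}")
```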

Model 2 results show that the likelihoods of experiencing the following persistent symptoms are increased by having had COVID (odds ratio / percentage point increase vs serology negative):

  • Fatigue (2.59 / 5.0%)
  • Anosmia (15.69 / 4.3%)
  • Poor attention/concentration (2.10 / 2.8%)
  • Breathing difficulties (3.60 / 2.3%)
  • Chest pain (3.70 / 1.4%)
  • Palpitations (2.61 / 1.2%)
  • Headache (1.69 / 0.9%)
  • Dizziness (2.37 / 0.6%)
  • Cough (2.22 / 0.6%)
  • Other symptoms (1.91 / 1.3%)
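The two numbers in each entry measure different things. An odds ratio is relative, so the percentage-point increase it implies depends on how common the symptom is among serology-negative participants; that is how anosmia can have by far the largest odds ratio yet a smaller percentage-point increase than fatigue. A minimal sketch of the conversion; the baselines are hypothetical, and if the study's odds ratios are adjusted for other covariates, this simple formula won't reproduce its table exactly.

```python
def risk_difference(baseline, odds_ratio):
    """Absolute increase in risk implied by an odds ratio at a given baseline risk."""
    exposed_odds = baseline / (1.0 - baseline) * odds_ratio
    return exposed_odds / (1.0 + exposed_odds) - baseline


# Hypothetical baseline rates among serology-negative participants (not study data).
for baseline in (0.003, 0.03, 0.3):
    increase = 100 * risk_difference(baseline, 15.69)
    print(f"OR 15.69, baseline {baseline:.1%}: +{increase:.1f} percentage points")
```

The same odds ratio translates into a few percentage points for a rare symptom and tens of points for a common one.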

If we add up all the percentage point increases (i.e. how many more percentage points of serology positive participants than serology negative participants experienced each persistent symptom; data from table 2), then we get 20.3 percentage points. So having COVID gives you on average ~0.2 extra persistent symptoms compared with not having COVID, presumably with some people having more than one symptom.
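The step from summed percentage points to symptoms per person is linearity of expectation: the expected number of persistent symptoms is the sum of the per-symptom probabilities, whether or not symptoms cluster in the same people, so the summed increases divided by 100 give the expected number of extra symptoms. A sketch using the rounded values listed above; these add up to 20.4 rather than 20.3, presumably because the table's unrounded figures differ slightly.

```python
# Percentage-point increases, serology positive vs negative (model 2, as listed above).
PP_INCREASE = {
    "Fatigue": 5.0, "Anosmia": 4.3, "Poor attention/concentration": 2.8,
    "Breathing difficulties": 2.3, "Chest pain": 1.4, "Palpitations": 1.2,
    "Headache": 0.9, "Dizziness": 0.6, "Cough": 0.6, "Other symptoms": 1.3,
}

total_pp = sum(PP_INCREASE.values())
# E[number of symptoms] = sum of P(symptom), so the increases add up directly.
extra_symptoms_per_person = total_pp / 100
print(f"{total_pp:.1f} percentage points -> {extra_symptoms_per_person:.2f} extra symptoms per person")
```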

This is roughly in line with Scott’s conclusions in Long COVID: Much more than you wanted to know. The specific symptoms experienced are also in line with that post, so if that post reflects your current understanding of long COVID, then I wouldn’t update much based on this study, except to add some more confidence to a couple of the points Scott makes:

2. The prevalence of Long COVID after a mild non-hospital-level case is probably somewhere around 20%, but some of this is pretty mild.

3. The most common symptoms are breathing problems, issues with taste/smell, and fatigue + other cognitive problems.

Serology vs Belief

Can we say anything about how much effect belief in having had COVID has on Long COVID compared to actually having had COVID?

I think it’s difficult based on this study, because participants knew their serology results before stating their belief and I really have no idea how this affected the results. I’ll keep pretending that this isn’t an issue for the moment.

We can compare model 2 (serology) results to model 1 (belief in having had COVID), along with values from table 2. For the symptoms which are significant for serology, the percentage-point increases from belief are on average 2.17 times as large (range 1.55-2.92) as the equivalents for serology. So if the belief-based increase represents everyone who reports excess symptoms, then actually having had COVID accounts for about 46% (1/2.17) of them. If we include the other symptoms, which aren't significant for serology, then this number will get lower.
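The 46% figure is the reciprocal of the average ratio: if the belief-based increase counts everyone reporting excess symptoms and infection explains the serology-based increase, infection's share is serology/belief = 1/2.17. Applying the same reciprocal to the ends of the reported range gives a rough per-symptom spread (about 34% to 65%). A sketch under the same assumptions as the paragraph above; `infection_share` is a hypothetical helper.

```python
def infection_share(belief_to_serology_ratio):
    """Share of belief-reported excess symptoms attributable to infection,
    assuming the belief-based increase covers everyone reporting symptoms."""
    return 1.0 / belief_to_serology_ratio


# Ratios of belief-based to serology-based percentage-point increases (from the text).
MEAN_RATIO, LOW_RATIO, HIGH_RATIO = 2.17, 1.55, 2.92

print(f"average: {infection_share(MEAN_RATIO):.0%}")
print(f"range: {infection_share(HIGH_RATIO):.0%} to {infection_share(LOW_RATIO):.0%}")
```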

At face value this suggests that just over half of the people with long COVID symptoms who think that they had COVID are wrong. This is important but not the same as "A serology test result positive for SARS-COV-2 was positively associated only with persistent anosmia" as is reported in the study.

If we factor in the obvious problems with the experimental setup, then it's hard to know how much credence to give the study's data on this topic.


6 comments

It seems (on the basis of what you say here; I haven't looked at the actual study) as if everything is consistent with the following situation:

  • "Long COVID" symptoms other than anosmia/parosmia are caused by believing you have had COVID-19.
  • Actually having COVID-19 makes you more likely to believe you have had COVID-19.
  • This is how it comes about that "having COVID on average gives you ~0.2 persistent symptoms vs not having COVID".

Does the study give detailed enough numbers to distinguish this scenario from one where the disease causes the symptoms by "non-psychological" mechanisms?

That’s a fair point. I don’t think the data does distinguish between the two, so maybe I’ve overstated the case here.

I think it’s important to distinguish between “is consistent with” and “implies that”. I think the belief hypothesis should be given a much lower prior than just Covid causing long Covid symptoms plus some additional cases for belief on top of that.

I would expect that the low probability of reporting COVID given that you have a positive serology test is due to the fact that many COVID cases are asymptomatic. If I had no symptoms of COVID, but someone told me I tested positive for COVID one time, would I consider myself to have had COVID? I probably would, but I expect most people wouldn't, since "had COVID" is an experience centered on the experience of disease for most people (e.g. coughing and feeling unwell), not centered on the presence or absence of a virus in your body. The fact that half of the people who have a positive test result don't think they have had COVID approximately matches my expectation about the rate of asymptomatic infection.

It actually lines up with the official terminology: The "D" in "COVID-19" stands for disease. Not all infections cause disease.

"Is there a simple way to assess whether COVID causes the symptoms in the study? Yes, just run the logistic regression with serology results as the only predictor. Fortunately for us the study includes this model – model 2."

This makes the assumption that people are equally likely to get infected with COVID regardless of health. What evidence is there for this assumption?

Yes, this is a good point. I suspect most long COVID studies have the same flaw.