Here are two cases involving absence of evidence:

  • You wait at the bus stop for half an hour without seeing a bus, even though this bus is supposed to come every 10 minutes. This is strong evidence against the hypothesis that the buses are running today (perhaps it's a holiday you forgot about?).

  • Someone is arrested and charged with a crime, and they do not confess. This is weak evidence against the hypothesis that they are the perpetrator.

In both cases, presence of the evidence (a bus, a confession) would have been strong evidence in favor of the hypothesis. They differ in the meaning of the absence of that evidence.

The difference between these cases was explained in Absence of Evidence Is Evidence of Absence:

Under the vast majority of real-life circumstances, a cause may not reliably produce signs of itself, but the absence of the cause is even less likely to produce the signs. The absence of an observation may be strong evidence of absence or very weak evidence of absence, depending on how likely the cause is to produce the observation. The absence of an observation that is only weakly permitted (even if the alternative hypothesis does not allow it at all) is very weak evidence of absence (though it is evidence nonetheless).

I posited that the bus comes frequently so that the bus line running as normal (the cause) would be very likely to result in a bus within thirty minutes (the observation). In contrast, the absence of a confession is unsurprising even if the suspect committed the crime. (Instead of the criminal example, I could have just modified the first example so that you only wait one minute.)

The point of this post is to show a formula for the magnitude of the update on absence of evidence. We'll start with the "update factor" from the odds form of Bayes' theorem (discussed in the Arbital intro guide or advanced explanation, and by 3Blue1Brown). When you observe evidence E, the magnitude of the update in favor of H is

r = P(E|H) / P(E|¬H)

High r (for "ratio") means strong evidence. r=1 means no evidence. In both the examples above, r is high: a bus is much more likely given the bus line is running, and a confession is much more likely from a perpetrator.
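As a minimal sketch of how the update factor is used, assuming the standard odds form of Bayes' theorem (posterior odds = prior odds × r): the function names and the numbers below (even prior odds, r = 19) are illustrative assumptions, not values taken from the examples above.

```python
def update_odds(prior_odds: float, r: float) -> float:
    """Odds form of Bayes' theorem: multiply the prior odds for H by the update factor r."""
    return prior_odds * r


def odds_to_probability(odds: float) -> float:
    """Convert odds in favor of H (expressed as a single ratio) into a probability."""
    return odds / (1.0 + odds)


# Hypothetical numbers: even prior odds (1:1) and an update factor of 19.
posterior_odds = update_odds(1.0, 19.0)
print(posterior_odds, odds_to_probability(posterior_odds))
```

With r = 1 the posterior odds equal the prior odds, which is the sense in which r = 1 means no evidence.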

The degree to which you expect the evidence, if the hypothesis is true, is:

p = P(E|H)

This should be high when you've waited long enough for the bus (conditional on the bus line running), but low for a confession from a suspect (even conditional on them being the perpetrator).

And since we're going to use it later, let's note that

P(E|¬H) = p / r

Now, if we use this "update factor" to quantify strength of evidence, how much evidence is ¬E, for ¬H? How much evidence is a lack of confession, for innocence? The update factor in favor of ¬H, on observing ¬E, is:

P(¬E|¬H) / P(¬E|H)

= (1 - P(E|¬H)) / (1 - P(E|H))

= (1-p/r) / (1-p)
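To get a feel for this formula, here is a small sketch that evaluates it for the two opening examples. The function name `absence_update_factor` and all of the numbers are made up for illustration; the post does not assign specific values to p or r.

```python
def absence_update_factor(p: float, r: float) -> float:
    """Update factor toward not-H on observing not-E: (1 - p/r) / (1 - p).

    p = P(E|H) and r = P(E|H) / P(E|not-H), so P(E|not-H) = p / r.
    """
    if not (0.0 <= p < 1.0):
        raise ValueError("p must satisfy 0 <= p < 1")
    if r <= 0.0 or p / r > 1.0:
        raise ValueError("r must be positive and p / r must be a valid probability")
    return (1.0 - p / r) / (1.0 - p)


# Assumed, illustrative inputs (not from the post):
# a bus is very likely within 30 minutes if the line is running,
# while confessions are rare even from perpetrators.
bus = absence_update_factor(p=0.95, r=19.0)
confession = absence_update_factor(p=0.10, r=100.0)

print(f"no bus after 30 minutes: factor {bus:.2f}")
print(f"no confession: factor {confession:.2f}")
```

Under these assumed inputs, the missing bus gives a factor of about 19 (strong evidence that the line is not running), while the missing confession gives a factor of about 1.11 (weak evidence of innocence), even though the confession has the larger r.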

The observation I want you to make is: when p=P(E|H) is small, both numerator and denominator are close to 1, and so is their ratio.

The reason I like introducing the symbols "p" and "r", instead of just sticking with P(E|H) and P(E|¬H), is that it makes it clear to the eye that you can take a limit. Holding r constant, and taking the limit as p approaches 0, the limit is 1. Interpreting that in English gets a little wordy, but here it is:

Holding the strength of evidence constant, in the limit as the probability of the evidence given the hypothesis goes to zero, the update on absence of evidence approaches no update at all.
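The limit can also be checked numerically. This is a rough sketch, assuming an arbitrary fixed r = 10 and a handful of shrinking values of p; the helper name is hypothetical and the numbers are chosen only for illustration.

```python
def absence_update_factor(p: float, r: float) -> float:
    """Update factor toward not-H on observing not-E: (1 - p/r) / (1 - p)."""
    return (1.0 - p / r) / (1.0 - p)


r = 10.0  # strength of the evidence E, held constant (arbitrary choice)
ps = [0.5, 0.1, 0.01, 0.001]  # P(E|H), shrinking toward 0
factors = [absence_update_factor(p, r) for p in ps]

for p, factor in zip(ps, factors):
    print(f"p = {p:<6} update factor on absence = {factor:.4f}")
```

The printed factors fall from about 1.9 at p = 0.5 toward 1 as p shrinks, matching the statement that the update on absence vanishes in the limit.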

So while "absence of evidence is not evidence of absence" is not true in general, and from a Bayesian perspective is never precisely true, it is a limit that individual cases may approach.


I really like this perspective! Great first post!