Bayesian analysis under threat in British courts

by whpearson · 1 min read · 3rd Oct 2011 · 24 comments


This is an interesting article about the use of Bayes in British courts and efforts to improve how statistics are used in court cases. Probably worth keeping an eye on. It might expose more people to Bayes if it becomes common and is thus portrayed in TV dramas.


I have a friend who works in legal administration. A couple of months ago he asked me to explain Bayesian statistics. I made a little internal "woop!" noise, since I'd been waiting for years for someone to ask that question.

His reason for asking was because he'd been working on disclosure correspondence between prosecution and defence, and some of it basically said "here's a Bayesian analysis of this particular bit of evidence, which I'm including for completeness, but it's going to be removed from the documentation before being handed to the court, c.f. precedent case".

His take on the practice was that it amounted to a Mexican stand-off with statistics, whereby if one side brought in an authority on statistics, the other side would follow suit, and it would become incomprehensible to the jury and make the whole trial intractable.

Given the number of intelligent, numerate people I know who nonetheless fail to get statistics, I have at least some sympathy with the notion of not trying to sell a case to a jury on a Bayesian ticket.

It is not uncommon for one side in a legal dispute to have no objective other than to stall for time. It would seem in that case they should just go ahead and "go nuclear".

Don't read just the article, go see also the actual judgment (HT ciphergoth; pdf). I won't say "read it" because it's the kind of thing that may not be worth reading entire, but at least skim it to get a feel for what's actually being argued.

My sense of it is that the judge is saying "stats should not be allowed when the numbers on which they're based are 'merely' quantifying someone's uncertainty, rather than being anointed by scientists". Which is still silly, as it ignores that "scientific" stats do nothing other than quantify uncertainty; but it doesn't say "Ban Bayes".

Thanks for the link.

I think paragraphs 80 to 86 are the key paragraphs.

They're declaring that using a formula isn't allowed in cases where the numbers plugged into the formula are themselves uncertain.

But in this case, where there was uncertainty in the underlying data the expert tried to take a conservative figure. The judges don't seem to think that helps, but they don't say why. In particular, para 108 iv) seems rather wrongheaded for this reason.

(It looks like one of the main reasons they overturned the original judgement was that the arguments in court ended up leaving the jury hearing less conservative estimates of the underlying figures than the ones the expert used (paras 103 and 108). That seems like a poor advertisement for the practice of keeping explicit calculations away from the jury.)

[anonymous] · 9y · 9

I am reminded of this paper written by philosopher Neven Sesardic regarding the Sally Clark case: www.ln.edu.hk/philoso/staff/sesardic/getfile.php?file=SIDS.pdf

I quote from the Guardian article:

When Sally Clark was convicted in 1999 of smothering her two children, jurors and judges bought into the claim that the odds of siblings dying by cot death was too unlikely for her to be innocent. In fact, it was statistically more rare for a mother to kill both her children. Clark was finally freed in 2003.

In the original trial, the prosecution used probabilistic reasoning as a minor element of their case. This prompted several prominent statisticians to intervene, criticising the prosecution's statistical methods using Bayes's Theorem. I have also seen comments on lesswrong referring to this intervention as a positive example of Bayesian reasoning.
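The statistical error those statisticians attacked is usually described as the prosecutor's fallacy: the single-SIDS rate was squared, treating the two deaths as independent, and the resulting tiny number was presented as if it were the probability of innocence. A minimal sketch in Python, using the widely reported trial figures purely for illustration:

```python
# Widely reported figures from the Sally Clark trial, for illustration only.
p_one_sids = 1 / 8543            # Meadow's quoted rate for a family like the Clarks
p_double_sids = p_one_sids ** 2  # the trial's assumption: the two deaths are independent

print(round(1 / p_double_sids))  # ~73 million: the "1 in 73 million" figure quoted in court
```

Even granting the independence assumption, this number is P(evidence | innocence), not P(innocence | evidence); the comparison that matters is against the rival hypothesis of double murder, which is also very rare.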

The kicker is, Sesardic conducts his own analysis using Bayes's Theorem and finds that the professional statisticians' analyses were seriously flawed and in fact according to his calculations, Clark was probably guilty after all. I find the paper very convincing and I expect that most here would find it worth reading.

Clearly, the use of Bayes's Theorem by trained statisticians is insufficient guarantee of rational verdicts. After all, it takes only a small statistical error to render the posterior probability estimate wildly inaccurate (Sesardic estimates probability of guilt >0.9, whereas one statistician estimated 0.04). Clearly Bayes's Theorem is not immune to the GIGO principle, and its use may in fact decrease the quality of an individual's judgements, since his handling of statistics may lack the garbage-rejection properties of his brain's native pattern-recognition faculties. In light of this I think it is perfectly reasonable to question whether a general increase in the use of statistical methods in court would be likely to improve the accuracy of verdicts.
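The GIGO point can be made concrete with a minimal sketch (the numbers are purely illustrative, not Sesardic's or the statisticians' actual figures): the same evidence, fed through Bayes's Theorem with priors differing by a factor of 100, lands on opposite sides of "probably guilty".

```python
def posterior(prior_odds, likelihood_ratio):
    """Bayes in odds form: posterior odds = prior odds * likelihood ratio,
    converted back to a probability."""
    odds = prior_odds * likelihood_ratio
    return odds / (1 + odds)

# Identical evidence (likelihood ratio 50); only the prior differs:
print(posterior(1 / 1000, 50))  # ~0.048: "probably innocent"
print(posterior(1 / 10, 50))    # ~0.83: "probably guilty"
```

The formula is trivially correct; the verdict it supports is entirely hostage to the inputs.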

Having said that, if an individual is perspicacious (as for example Neven Sesardic and Eliezer Yudkowsky are) then the use of Bayes's Theorem to evaluate criminal cases is likely to improve accuracy. The problem is that most people are not perspicacious, nor is a lecture on the use of Bayes's Theorem likely to make them so (as the Sally Clark example demonstrates quite clearly).

I have some sympathy for the judge here, even as I wince. If in real life juries don't understand Bayes, and the actual effect of its use in court is gross misuse or making wild guesses sound formal and authoritative, then in the end you can't have Bayes's Theorem used formally in courts.

Is it bad that that sounds to me more like an argument against undereducated juries than against the use of Bayes in court?

That isn't what was going on in this case. The expert wasn't presenting statistics to the jury (apparently that's already forbidden).

The good news from this case (well, it's news to me) is that the UK forensic science service both understands the statistics and has sensible written procedures for using them, which some of the examiners follow. But they then have to turn the likelihood ratio into a rather unhelpful form of words like 'moderately strong scientific support' (not to be confused with 'moderate scientific support', which is weaker), because bringing the likelihood ratios into court is forbidden.

(Bayes' Theorem itself doesn't really come into this case.)
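The translation described above, from a likelihood ratio into a fixed form of words, can be sketched as a simple lookup. The bands below are assumptions for illustration, not the forensic science service's actual published scale:

```python
# Illustrative verbal-scale bands; the thresholds are assumptions,
# not the forensic science service's actual published figures.
VERBAL_SCALE = [
    (1, "no support"),
    (10, "limited support"),
    (100, "moderate support"),
    (1000, "moderately strong support"),
    (10000, "strong support"),
]

def verbal_equivalent(likelihood_ratio):
    """Return the highest band the likelihood ratio reaches."""
    label = "inconclusive"
    for threshold, name in VERBAL_SCALE:
        if likelihood_ratio >= threshold:
            label = name
    return label

print(verbal_equivalent(250))   # "moderate support"
print(verbal_equivalent(2500))  # "moderately strong support"
```

Note that "moderately strong" outranks "moderate" by an order of magnitude on this kind of scale, which illustrates how easily the form of words alone could mislead a jury.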

What are the options? Frequentist statistics, Bayesian statistics, both, or neither?

How many jurors understand that statistical significance is surprise on the assumption that the hypothesis is wrong?

How many scientists understand the grant renewal case, or differences in differences?

No statistics at all.

Or to be a bit more precise: if you have good enough data to do anything useful with frequentist methods, then you may use Bayesian reasoning as well. What the judge forbade is using Bayes to sound scientific when you can't back up your priors.

Priors don't come into it. The expert was presenting likelihood ratios directly (though in an obscure form of words).

+1 for biting the bullet. But...

What the judge forbade is using Bayes to sound scientific when you can't back up your priors.

The advantage of Bayesianism is that it is open about the relationship between prior beliefs, evidence, and updated beliefs.

Where there is enough data to use frequentist methods, that doesn't imply one can produce relevant evidence for a case using those methods. I interpret you as agreeing with this based on your response, but feel free to clarify.

Jurors are not going to be able to tell to what extent frequentist methods produce valid evidence or not. It seems to me that if it is a good idea for judges to forbid using Bayesian reasoning because they can see where priors are arbitrary and are worried the jurors can't, it is an even better idea for judges to forbid frequentist reasoning that doesn't have a parallel permitted Bayesian process.

The two methods have similar relevance but differing opacity, and the clearer method is being punished because judges can understand its shortcomings. This leaves juries to deal with only evidence that the judge wasn't able to understand.

Where there is enough data to use frequentist methods, that doesn't imply one can produce relevant evidence for a case using those methods. I interpret you as agreeing with this based on your response, but feel free to clarify.

Jurors are not going to be able to tell to what extent frequentist methods produce valid evidence or not. It seems to me that if it is a good idea for judges to forbid using Bayesian reasoning because they can see where priors are arbitrary and are worried the jurors can't, it is an even better idea for judges to forbid frequentist reasoning that doesn't have a parallel permitted Bayesian process.

I agree with all of this. What I was trying to say is precisely that this isn't about Bayes vs. Fisher or whoever. Perhaps what I should have said to make that clearer is that the judge in this case did not (just) throw out Bayes, he threw out statistical inference.

Statistical methods are used by courts all the time. Whether frequentist or Bayesian or some hybrid, they aren't portrayed in TV dramas about courts. The audiences don't want to hear about statistics, and the writers don't understand statistics. These are the sorts of shows where you get scenes like two people typing on the same keyboard at once.

Also, the court ruling in question seems to be against Bayesian methods as far as I can tell from that article. But it may just mean that people are going to need to be much more careful about stats in the British court system.

This isn't quite "a judge has ruled that [Bayes' theorem] can no longer be used", but I don't think it's good.

The judges decided that using a formula to calculate likelihood isn't allowed in cases where the numbers plugged into the formula are themselves uncertain (paragraph 86), and using conservative figures apparently doesn't help.

Paragraph 90 says that it's already established law that Bayes' theorem and likelihood ratios "should not be used", but I think it means "shouldn't be talked about in front of the jury".

Paragraph 91 says explicitly that the court wasn't deciding how (or whether) Bayes' Theorem and likelihood ratios can be used in cases where the numbers plugged into the formula aren't themselves very uncertain.

In paragraph 95, the judges decide that (when matching footprints) it's OK for an expert to stare at the data, come up with a feeling about the strength of the evidence, and express that in words, while it's not OK for the same expert to do a pencil-and-paper calculation and present the result in similar words.

I think part of the point is that when the expert is cross-examined, the jury will react differently if she says "this evidence is strong because I've got lots of experience and it feels strong to me", rather than "this evidence is strong because I looked up all the frequencies and did the appropriate calculation".

I do get the impression that the approach of multiplying likelihood ratios is being treated as a controversial scientific process (as if it were, say, a chemical process that purported to detect blood), and one which is already frowned upon. Eg paras 46, 108 iii).

[anonymous] · 9y · 0

Re-reading that Sesardic paper set me to thinking about further issues relating to use of statistical evidence in criminal cases. In this post Yudkowsky points out that whilst all legal evidence should ideally be rational evidence, not all rational evidence is suitable as legal evidence. This is because certain rational evidence sources would become systematically corrupted and cease to function as such, if they were liable to be used as legal evidence (he uses as an example the police commissioner's confidential disclosure to a friend of the identity of the city's crime boss).

In Sesardic's paper, he calculates (using the same statistical sources chosen by the statisticians that he is criticising) that the prior probability of a mother such as Sally Clark, who has had two infants die in succession for no apparent medical reason, being guilty of double murder is 25 times greater than the prior probability of her children having both died innocently through "SIDS" (ignoring the probability of one infant having died of SIDS and the other having been murdered, which is very tiny and superfluous to the analysis). This is before he gets to the Bayesian effect of the evidence from the specific case, which turns out to increase the likelihood of the double murder hypothesis at the expense of the double SIDS hypothesis.

Clearly (if Sesardic is convincing) Sally Clark should have been found guilty. But what if the evidence from the alleged crime scenes had been indecisive, i.e. the likelihood ratio were ~1? In this case, Bayes’s Theorem tells us that Clark is very probably guilty, but this is essentially a judgement based on statistics alone. Would it be proper for courts to convict based on this kind of result from an application of Bayes’s Theorem, assuming that said analysis had been subjected to rigorous scrutiny and appeared highly convincing to the jury? My gut feeling is no - that this is of a similar class to Yudkowsky’s rational, but not suitable legal evidence. But I’d have to think some more before defending that statement.
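In odds form, the hypothetical is easy to sketch: the 25:1 prior odds are Sesardic's figure quoted above, and the likelihood ratio of 1 is the indecisive-evidence assumption.

```python
def guilt_probability(prior_odds, likelihood_ratio):
    """Posterior odds = prior odds * likelihood ratio, converted to a probability."""
    odds = prior_odds * likelihood_ratio
    return odds / (1 + odds)

# Sesardic's prior (from the analysis above): double murder 25x more likely
# a priori than double SIDS.  Indecisive crime-scene evidence: LR ~ 1.
print(guilt_probability(25, 1))  # ~0.96: "very probably guilty" on the prior alone
```

With a likelihood ratio of 1 the evidence does no work at all; the ~0.96 posterior is entirely the prior, which is exactly what makes the hypothetical conviction feel like conviction by statistics.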

Would it be proper for courts to convict based on this kind of result from an application of Bayes’s Theorem, assuming that said analysis had been subjected to rigorous scrutiny and appeared highly convincing to the jury?

It would be no different from other cases of conviction or lack thereof at similar odds. Whether people who are probably guilty but have an X% chance of being innocent go free or not should not depend on how jurors concluded X.

[anonymous] · 9y · 0

But it seems quite bizarre that somebody might be convicted purely on a statistical basis, i.e. based on a favoured Bayesian prior.

And what about politically controversial statistics and priors? Is there really any particular reason why a Bayesian shouldn't have a significantly higher prior probability that members of certain ethnic or religious groups commit certain crimes (whatever the reasons for that may be), based on government statistics? And then convict them at a relatively high rate based on this (assuming convictions using Bayesian priors don't contribute to future statistics used in Bayesian priors - to prevent double-counting of evidence)? Oops!

The judge in the Sally Clark case also stated his belief (in other words) that conviction should not be based on priors alone, but that there should be compelling evidence specific to the case as well.

Here is a paper discussing the problem. I don't know if you can access that.

It doesn't seem to me to be a problem that can be resolved easily and simply. For example in a case of terrorism, we may prefer a likelihood of guilt (derived in whatever manner) to be sufficient cause to convict. And if a woman had, say, 5 children die ostensibly of "SIDS", then even if there was no specific evidence to suggest that it was murder rather than SIDS, the Bayesian likelihood of guilt would be so very high that it would seem to override concerns about convicting based on a prior.

It doesn't seem to me that criminal cases are merely a matter of convicting based on likelihood of guilt, natural as that may sound. There are other human values to consider.

But it seems quite bizarre that somebody might be convicted...on a favoured Bayesian prior.

How would you describe how an ideal jury should perform its task? Not how real ones work, but an ideal one.

Is there really any particular reason why a Bayesian shouldn't have a significantly higher prior probability that members of certain ethnic or religious groups commit certain crimes (whatever the reasons for that may be), based on government statistics?

  1. Conviction should not be based merely on probability of guilt; consider for example where society has for good reason excluded rational evidence from being legal evidence, such as with the 5th Amendment in the United States.

  2. The relevant comparison will be between the likelihood of arrest and trial for an innocent person of group X and a guilty person of group X, not between the rates of likelihood of conviction for random individuals of group X and Y, respectively.

If the defendant is a woman, it is not relevant that "women commit (or are convicted of) crimes less often than men". What is relevant is how likely a female defendant is to be guilty. This may be less or more than a male defendant, but I don't consciously have a different prior for defendants based on gender.

For example in a case of terrorism, we may prefer a likelihood of guilt (derived in whatever manner) to be sufficient cause to convict.

The likelihood of guilt we convict at should differ among crimes, just as punishment differs among crimes. This is good discrimination, discriminating between importantly different cases on the basis of important differences among them.

Convicting people differently based on the type of evidence that gives the same probability of their having committed the crime is generally baseless discrimination. If two people, Al and Bob, each may have committed the crime of public urination, and Al did it with probability X% considering the legal evidence, and Bob did it with probability Y% considering the legal evidence, and X > Y, I don't know if one, both, or neither should be convicted. But I do know that if Al is not convicted, then Bob shouldn't be either.

There are a few exceptions for which good public policy depends on type of evidence, generally involving excluding it entirely for public policy reasons. For example, societies restrict how police may gather evidence and then restrict what they may do with inappropriately gathered evidence to make police comply with the rules.

[anonymous] · 9y · 0

The relevant comparison will be between the likelihood of arrest and trial for an innocent person of group X and a guilty person of group X, not between the rates of likelihood of conviction for random individuals of group X and Y, respectively.

In the Sally Clark case, prior probabilities are derived from statistics relating to the incidence of SIDS in the general population, in comparison to infant murder. Let us imagine that there were different statistics for group X mothers/babies and group Y mothers/babies. It might then be the case that the incidence of SIDS was lower, and murder higher, in group X than group Y. Therefore, a group Y mother with two dead infants and indecisive evidence from the alleged crime scenes might perhaps be acquitted (e.g. probability of guilt 40%) whereas the group X mother in exactly the same situation is convicted (e.g. probability of guilt 90%).

What we are questioning in this case is whether mothers are more likely to murder their babies twice, or have their babies die innocently twice for no apparent medical reason. But we are using statistics recording the incidence of one infant dying from “SIDS” or one infant being murdered. So we are legitimately using statistics about the likelihood of a random mother in either group either murdering her infant or having it die from SIDS, in a case of double infant murder or double SIDS.

This is one example in which what you implied is untrue: we have good reason to use statistics in our prior relating to the general population rather than to people in court accused of this particular offence (since mothers with one SIDS infant death are not necessarily suspected of murder and arrested). And one example is enough to prove the point that the use of Bayesian priors in court may have the unfortunate consequence of allowing differential conviction rates based purely on sex, ethnicity or social-group membership.

But in any case, even if the relevant comparison is “the likelihood of arrest and trial for an innocent person of group X and a guilty person of group X”, there is no particular reason why this should be the same across all possible groups. Therefore this (social and political) problem is likely to emerge in all sorts of criminal cases, if Bayesian priors were to become more widely used in court.

How would you describe how an ideal jury should perform its task?

That's far too broad a question to expect someone to answer in a comment thread. All I'm saying is that, even setting aside the issue of ethnicity/sex discrimination, the idea of convicting someone primarily on a statistical basis makes me uncomfortable and does not resonate with my values of fairness. I also believe that most other people would feel the same way (for example the judge in the Sally Clark case evidently agrees).

Since we try and punish criminals purely for our own reasons - not because the God of criminal cases wants us to - there's no reason why we have to convict based on totally unbiased Bayesian calculations of guilt. If we feel that this conflicts with our other values (apart from the desire to see guilty people punished, and prevent future crimes by them), then perhaps there should be other considerations informing verdicts in criminal trials. And there might well be pragmatic reasons not to do so in any case, since the way in which the criminal justice system works in a country is not causally isolated from the behaviour of its citizens. If people think that the courts are evil, this probably isn't going to improve any man's quality of life even if he likes the idea of convicting based primarily on Bayesian priors.

I don't think there's an easy answer to this conundrum - but I'm arguing that it is a conundrum that cannot be dismissed with a wave of the hand.

Invented by an 18th-century English mathematician, Thomas Bayes,

"Invented"?

[anonymous] · 9y · 9

It's common to say mathematicians "invented" certain techniques, especially when the result is constructive (in the sense that an explicit thing is built, not in the technical sense).

[anonymous] · 9y · -2

Nobody seems to be able to decide if mathematics was discovered or invented...

[This comment is no longer endorsed by its author]