Bayesian justice

by gwern, 26th Jul 2011

"The mathematical mistakes that could be undermining justice"

They failed, though, to convince the jury of the value of the Bayesian approach, and Adams was convicted. He appealed twice unsuccessfully, with an appeal judge eventually ruling that the jury's job was "to evaluate evidence not by means of a formula... but by the joint application of their individual common sense."

But what if common sense runs counter to justice? For David Lucy, a mathematician at Lancaster University in the UK, the Adams judgment indicates a cultural tradition that needs changing. "In some cases, statistical analysis is the only way to evaluate evidence, because intuition can lead to outcomes based upon fallacies," he says.

Norman Fenton, a computer scientist at Queen Mary, University of London, who has worked for defence teams in criminal trials, has just come up with a possible solution. With his colleague Martin Neil, he has developed a system of step-by-step pictures and decision trees to help jurors grasp Bayesian reasoning (bit.ly/1c3tgj). Once a jury has been convinced that the method works, the duo argue, experts should be allowed to apply Bayes's theorem to the facts of the case as a kind of "black box" that calculates how the probability of innocence or guilt changes as each piece of evidence is presented. "You wouldn't question the steps of an electronic calculator, so why here?" Fenton asks.

It is a controversial suggestion. Taken to its logical conclusion, it might see the outcome of a trial balance on a single calculation. Working out Bayesian probabilities with DNA and blood matches is all very well, but quantifying incriminating factors such as appearance and behaviour is more difficult. "Different jurors will interpret different bits of evidence differently. It's not the job of a mathematician to do it for them," says Donnelly.

The linked paper is "Avoiding Probabilistic Reasoning Fallacies in Legal Practice using Bayesian Networks" by Norman Fenton and Martin Neil. The interesting parts, IMO, begin on page 9, where they argue for using the likelihood ratio as the key piece of information for evidence, and not simply raw probabilities; page 17, where a DNA example is worked out; and pages 21-25, on the key piece of evidence in the Bellfield trial: no one claiming a lost possession (nearly worthless evidence).
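To make the likelihood-ratio framing concrete, here is a minimal sketch of Bayes' theorem in odds form: posterior odds are prior odds multiplied by the likelihood ratio of each piece of evidence. The function name and all numbers (a pool of 200,000 possible sources, a one-in-a-million random match probability) are hypothetical and not taken from the paper, and multiplying the ratios assumes the pieces of evidence are conditionally independent.

```python
def posterior_probability(prior_prob, likelihood_ratios):
    """Update a prior probability with a sequence of likelihood ratios.

    Each ratio is P(evidence | prosecution hypothesis) divided by
    P(evidence | defence hypothesis). Multiplying them assumes the pieces
    of evidence are conditionally independent given each hypothesis.
    """
    odds = prior_prob / (1.0 - prior_prob)  # probability -> odds
    for lr in likelihood_ratios:
        odds *= lr                           # one Bayesian update per item
    return odds / (1.0 + odds)               # odds -> probability


# Hypothetical numbers: the defendant is one of 200,000 people who could
# have left the sample, and the DNA profile matches a random person with
# probability one in a million.
prior = 1 / 200_000
dna_lr = 1.0 / 1e-6  # P(match | source) = 1, P(match | not source) = 1e-6

posterior = posterior_probability(prior, [dna_lr])
print(f"posterior probability of being the source: {posterior:.3f}")
```

With these made-up inputs the posterior comes out at roughly 0.83, well short of the near-certainty that the prosecutor's fallacy suggests when the match probability is read as the probability of innocence.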

Related reading: Inherited Improbabilities: Transferring the Burden of Proof, on Amanda Knox.

24 comments

Using Bayes' theorem straight will result in overconfidence. Biases tend to correlate: if you guessed too high on one probability, it's likely that you did on another. In addition, the bias will multiply with each piece of evidence. I'd certainly use Bayes' theorem, but I'd try to correct for overconfidence at the end.

I would strongly encourage folks to adopt the view that we are always "using Bayes' theorem" when reasoning.

That is, instead of saying "Use Bayes' theorem, and then [after you're done using Bayes' theorem] correct for overconfidence", say "Update on the evidence of studies showing that overconfidence is common".

The distinction is important not for the particular result of the calculation, but for stamping out the notion that Bayes' theorem is a "special trick" that is "sometimes useful", rather than a mathematical model of inference itself.

I would strongly encourage folks to adopt the view that we are always "using Bayes' theorem" when reasoning.

This is simply false. As I'm fond of pointing out, often the best judgment you can come up with is produced by entirely opaque processes in your head, whose internals are inaccessible to you no matter how hard you try to introspect on them. Pretending that you can somehow get around this problem and reduce all your reasoning to clear-cut Bayesianism is sheer wishful thinking.

Moreover, even when you are applying exact probabilistic reasoning in evaluating evidence, the numbers you work with often have a common-sense justification that you cannot reduce to Bayesian reasoning in any practically useful way. Knowledge of probability theory will let you avoid errors such as the prosecutor's fallacy, but this leaves more fundamental underlying questions open. Are the experts who vouch for these forensic methods reliable or just quacks and pseudoscientists? Are the cops and forensic experts presenting real or doctored evidence, and are they telling the truth or perjuring themselves in cooperation with the prosecution? You can be all happy and proud that you've applied the Bayes theorem correctly and avoided the common fallacies, and still your conclusion can be completely remote from reality because the numbers you've fed into the formula are a product of quackery, forgery, or perjury -- and if you think you know a way to apply Bayesianism to detect these reliably, I would really like to hear it.

Given the context, I interpreted Komponisto's comment as saying that to the extent that we reason correctly we are using Bayes' theorem, not that we always reason correctly.

Even if the claim is worded like that, it implies (incorrectly) that correct reasoning should not involve steps based on opaque processes that we are unable to formulate explicitly in Bayesian terms. To take an example that's especially relevant in this context, assessing people's honesty, competence, and status is often largely a matter of intuitive judgment, whose internals are as opaque to your conscious introspection as the physics calculations that your brain performs when you're throwing a ball. If you examine rigorously the justification for the numbers you feed into the Bayes theorem, it will inevitably involve some such intuitive judgment that you can't justify in Bayesian terms. (You could do that if you had a way of reverse-engineering the relevant algorithms implemented by your brain, of course, but this is still impossible.)

Of course, you can define "reasoning" to refer only to those steps in reaching the conclusion that are performed by rigorous Bayesian inference, and use some other word for the rest. But then to avoid confusion, we should emphasize that reaching any reliable conclusion about the facts in a trial (or almost any other context) requires a whole lot of things other than just "reasoning."

Even if the claim is worded like that, it implies (incorrectly) that correct reasoning should not involve steps based on opaque processes that we are unable to formulate explicitly in Bayesian terms.

You misunderstand. There was no normative implication intended about explicit formulation. My claim is much weaker than you think (but also abstract enough that it may be difficult to understand how weak it is). I simply assert that Bayesian updating is a mathematical definition of what "inference" means, in the abstract. This does not say anything about the details of how humans process information, nor does it say anything about how mathematically explicit we "should" be about our reasoning in order for it to be valid. You concede everything you need to in order to agree with me when you write:

You could [justify intuitive judgements in Bayesian terms] if you had a way of reverse-engineering the relevant algorithms implemented by your brain,

In fact, this actually concedes more than necessary -- because it could turn out that these algorithms are only approximately Bayesian, and my claim about Bayesianism as the ideal abstract standard would still hold (as indeed implied by the phrase "approximately Bayesian").

Of course, this does in my view have the implication that it is appropriate for people who understand Bayesian language to use it when discussing their beliefs, especially in the context of a disagreement or other situation where one person doesn't understand the other's thought process. I suspect this is the real point of controversy here (cf. our previous arguments about using numerical probabilities).

Of course, this does in my view have the implication that it is appropriate for people who understand Bayesian language to use it when discussing their beliefs, especially in the context of a disagreement or other situation where one person doesn't understand the other's thought process. I suspect this is the real point of controversy here (cf. our previous arguments about using numerical probabilities).

Yes, the reason why I often bring up this point is the danger of spurious exactitude in situations like these. Clearly, if you are able to discuss the situation in Bayesian language while being well aware of the non-Bayesian loose ends involved, that's great. The problem is that I often observe the tendency to pretend that these loose ends don't exist. Moreover, the parts of reasoning that are opaque to introspection are typically the most problematic ones, and in most cases, their problems can't be ameliorated by any formalism, but only on a messy case-by-case heuristic basis. The emphasis on Bayesian formalism detracts from these crucial problems.

If we actually knew how to reason correctly, we could program computers to do it. We reason correctly, better than computers, without understanding how we do it.

The specific example I gave is more due to treating random variables as if they're independent. For example, you're as likely to be off either way on A, and you're as likely to be off either way on B, so for each of those, you in fact gave the correct probability. But you're more likely to be off the same way on both than the opposite ways, so you have to correct more when you use them together.

But yes. Bayes' theorem is always the answer.
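One way to read the point about correlated errors is as a statement about the spread of the combined estimate. The sketch below assumes a toy model that is not from the thread: each of two likelihood-ratio estimates is off by a fixed factor in either direction with equal probability, so each is unbiased on its own in log terms, and `correlation` controls how often the two errors share a sign. The function name and the factor of 3 are hypothetical.

```python
import math


def combined_error_sd(error_size, correlation):
    """Standard deviation of the summed log-error of two estimates.

    Each estimate's log likelihood ratio is off by +error_size or
    -error_size with equal probability, so each is unbiased on its own.
    `correlation` is the correlation between the two errors.
    """
    p_same_sign = (1.0 + correlation) / 2.0
    # Same sign: errors add to +/- 2 * error_size. Opposite sign: they cancel.
    variance = p_same_sign * (2.0 * error_size) ** 2
    return math.sqrt(variance)


error = math.log(3.0)  # each likelihood ratio is off by a factor of 3

for rho in (0.0, 0.5, 1.0):
    sd = combined_error_sd(error, rho)
    print(f"correlation {rho:.1f}: combined error factor (one s.d.) {math.exp(sd):.2f}")
```

Under this toy model the combined error grows with the correlation, so treating the two estimates as independent understates how far off their product can be.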

I'm confused. If all that is true, how do you know which direction to correct in?

[This comment is no longer endorsed by its author]

I have only just come across this discussion (the original article referred to my work). The article

Fenton, N.E. and Neil, M. (2011), 'Avoiding Legal Fallacies in Practice Using Bayesian Networks'

was published in the Australian Journal of Legal Philosophy 36, 114-151, 2011 (Journal ISSN 1440-4982) A pre-publication pdf can be found here:

https://www.eecs.qmul.ac.uk/~norman/papers/fenton_neil_prob_fallacies_June2011web.pdf

The point about the use of the likelihood ratio (to enable us to evaluate the probative value of evidence without having to propose subjective prior probabilities) is something that I am increasingly having grave doubts about. This idea has been oversold by the forensic statistics community. I am currently writing a paper which will show that, in practice, the likelihood ratio as a measure of evidence value can be fundamentally wrong. The example I focus on is the Barry George case. Here is a summary of what the article says:

One way to determine the probative value of any piece of evidence E (such as some forensic match of an item found at the crime scene to an item belonging to the defendant) is to use the likelihood ratio (LR). This is the ratio of two probabilities, namely the probability of E given the prosecution hypothesis (which might be ‘item at crime scene belongs to defendant’) divided by the probability of E given the alternative defence hypothesis (which might be ‘item at crime scene does not belong to defendant’). By Bayes’ theorem, if the LR is greater than 1 then the evidence supports the prosecution hypothesis and if it is less than 1 it supports the defence hypothesis. If the LR is 1, i.e. the probabilities are equal, then the evidence is considered to be ‘neutral’ – it favours neither hypothesis over the other and so offers no probative value. The simple relationship between the LR and the notion of ‘probative value of evidence’ actually only works when the two alternative hypotheses are mutually exclusive and exhaustive (i.e. each is the negation of the other). This is often not clearly stated by proponents of the LR leading to widespread confusion about the notion of value of evidence. In many realistic situations it is extremely difficult to determine suitable hypotheses that are mutually exclusive. Often an LR analysis is performed against hypotheses that are assumed to be mutually exclusive but which are not. In such cases the LR has a much more complex impact on the probative value of evidence than assumed. We show (using Bayes’ theorem and Bayesian networks applied to simple, non-contentious examples) that for sensible alternative hypotheses – which are not exactly mutually exclusive – it is possible to have evidence with an LR of 1 that still has significant probative value. It is also possible to have evidence whose LR strongly favours one hypothesis, but whose probative value strongly favours the alternative hypothesis. 
We consider the ramifications for the case of Barry George. The successful appeal against his conviction for the murder of Jill Dando was based primarily on the argument that the firearm discharge residue (FDR) evidence that was assumed to support the prosecution hypothesis at the original trial actually had an LR equal to 1 and hence was ‘neutral’. However, our review of the appeal transcript shows numerous inconsistencies and poorly defined hypotheses and evidence such that it is not clear that the relevant elicited probabilities could have been based on mutually exclusive hypotheses. Hence, contrary to the Appeal conclusion, the probative value of the FDR evidence may not have been neutral.
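The claim that an LR of 1 can still carry probative value can be illustrated with a toy calculation. This is a sketch under stated assumptions, not the example from the article: it covers only the case where the two compared hypotheses fail to be exhaustive, and the three state names and every probability are hypothetical.

```python
def posterior(priors, likelihoods):
    """Posterior over mutually exclusive, exhaustive states via Bayes' theorem."""
    joint = {state: priors[state] * likelihoods[state] for state in priors}
    total = sum(joint.values())
    return {state: value / total for state, value in joint.items()}


# Hypothetical priors over three states that together cover every possibility.
priors = {
    "fired_gun": 0.30,           # prosecution hypothesis
    "no_firearm_contact": 0.60,  # the defence hypothesis used for the LR
    "innocent_contact": 0.10,    # innocent, but exposed to residue another way
}
# Hypothetical P(residue found | state).
likelihoods = {
    "fired_gun": 0.01,
    "no_firearm_contact": 0.01,
    "innocent_contact": 0.50,
}

# The LR compares only two of the three states, so they are not exhaustive.
lr = likelihoods["fired_gun"] / likelihoods["no_firearm_contact"]
post = posterior(priors, likelihoods)

print(f"LR (fired_gun vs no_firearm_contact): {lr:.1f}")
print(f"P(fired_gun) before: {priors['fired_gun']:.3f}, after: {post['fired_gun']:.3f}")
```

Here the LR between the two named hypotheses is exactly 1, yet the probability of the prosecution hypothesis falls from 0.30 to about 0.05, because the evidence strongly favours the third state that the LR comparison left out.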

I'd be curious to know whether advocacy of professional juries was common on LW.

If so (or even if not), what training do people think should be required for a professional juror?

"Professional juries" is essentially an oxymoron. The basic idea of the jury system is that people who judge your guilt are your fellow citizens, not government functionaries. (Whether this is good or bad by whatever metric is beside the point.)

Besides, in common law jurisdictions, you can typically waive your right to be tried by a jury and have a bench trial, where the judge is responsible for findings of fact as well as law. So basically, you already have the option to be tried by a "professional juror."

Not prepared to advocate professional juries, but off the top of my head, I'd have a professional juror train in law, statistics, demographics, forensic science, and cognitive biases.

Even better would be subsidized prediction markets. That way people train themselves as necessary to get results.

There'd be some kinks to work out, since you don't always get the answer handed to you after all the bets are in, but I think this would be a solvable problem. You could find ways of having occasional payouts based on cases where there is slam dunk evidence which is withheld and the professional bettors don't know if their decision will affect the defendant's sentence or just their pay. You could also try rewarding consistency between separate prediction markets in the hope that "our actual best guess" is the most salient Schelling point.

I'd be curious to know whether advocacy of professional juries was common on LW.

A great idea, except for the corruption magnet. (But probably no worse than judges as they stand.)

There's more to justice than empiricism. You have to use decision theory.

Decision theory is for policymakers, not jurors. The latter should be concerned exclusively with epistemic calculation.

(At least that's how the system is supposed to work.)

Jurors sometimes have to rule based on how the law ought to be, rather than how it is.

"Have to"? It's not even universally agreed that jury nullification is permissible, let alone obligatory.

What about "jury requests"?

Applying Bayesian methods to trial evidence in criminal trials would make explicit a conflict that currently goes unstated. Unlike the standard "a preponderance of the evidence," for which there is both folk and professional consensus that a probability of 0.51 is required, "beyond a reasonable doubt" does not, to my knowledge, have an associated mathematical probability. At a folk level people in the US claim to believe "It is better to let ten guilty men go free than to send one innocent man to prison" but there is ample circumstantial evidence that this is not always a true preference, and I highly doubt there would be anything remotely approaching consensus for a standard of 0.9.

Trying to establish a numerical standard would devolve into mindkilling politics fairly quickly, I suspect. It might break along party lines, or it might break along lines of "people more likely to know someone who was the victim of a previously-acquitted criminal" vs. "people more likely to know someone who was wrongfully prosecuted", but either way, it would just be something new to argue about.
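For what it's worth, the "ten guilty men" maxim does map onto a number close to 0.9 if it is read as a cost ratio. The sketch below assumes that reading, which is only one possible interpretation: a wrongful conviction is taken to be `cost_ratio` times as bad as a wrongful acquittal, and the verdict minimises expected cost. The function name is hypothetical.

```python
def conviction_threshold(cost_ratio):
    """Probability of guilt above which convicting has lower expected cost.

    cost_ratio is how many times worse a wrongful conviction is assumed
    to be than a wrongful acquittal. Convicting costs (1 - p) * cost_ratio
    in expectation and acquitting costs p, so convict when
    p > cost_ratio / (1 + cost_ratio).
    """
    return cost_ratio / (1.0 + cost_ratio)


for ratio in (1, 10, 100):
    print(f"cost ratio {ratio}:1 -> threshold {conviction_threshold(ratio):.3f}")
```

A ratio of 1 gives a threshold of 0.5, in line with the preponderance standard, and a ratio of 10 gives 10/11, or about 0.909.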

I like how Fenton and Neil put a bar chart in a decision tree.

See also Jaynes's section on "Bayesian Jurisprudence" (which at least initially bears a remarkable resemblance to my http://www.gwern.net/Death%20Note%20Anonymity essay, although I had not even read up to that part of PT:tLoS yet).