Application of: How Much Evidence Does It Take?

(trigger warning: some description of domestic violence)

Summary: I discuss the strengths and weaknesses of one way that the American legal system tries to assess and cope with the unreliability of certain kinds of evidence. After explaining the relevant rules with references to a few recent famous cases and a non-notable case that I'm working on now, I briefly consider whether this part of the evidence code is above or below the sanity waterline, and suggest an incremental improvement.

Recently, I got to the point in my legal career where people are trusting me to write evidentiary briefs, i.e., to argue in front of a judge about what kinds of evidence are reliable enough to be safely presented to a jury. There is an odd division of epistemological labor in the American court system: judges are thought to be better than juries at resisting passionate or manipulative oratory, and juries are thought to be better than judges at resisting bribery and (pre-existing) personal hatred. As a result, potentially inflammatory or unreliable evidence is presented first to a judge, who (much like one of Eliezer's Confessors) is supposed to sift the exhibit to see if normal people can handle it without losing their tenuous grip on sanity. If and only if the evidence seems safe for ordinary human consumption, the judge will allow the lawyers to argue about that evidence in front of the jury. Otherwise, the evidence sits in a cardboard box in an unheated warehouse, safely away from the eyes of the jury, until it's time for an appeal.

The Hearsay Rule

By way of a concrete example, one famous recent case featured a recorded 911 call made by a domestic violence victim to the emergency phone operator. The operator asked questions about the location and identity of the person who was accused of beating the caller. The caller answered the questions on tape, explicitly identifying her abuser as Mr. Adrian Martell Davis, and the answers were used first to find and arrest the suspect, and ultimately to convict him. The victim was apparently too intimidated to testify in open court, and so her recorded statement as to the name of her abuser was absolutely necessary to support a conviction -- no recording, no conviction. Under the 400-year-old hearsay rule, recorded testimony typically is not allowed to be presented to a jury -- courts are concerned that the person giving the recorded statement might be pressured by the police in ways that wouldn't show up on tape, and that allowing a witness to testify without showing up in court unfairly deprives the defendant of a chance to (a) cross-examine the witness, and (b) have the jury see any facial tics, body language, etc. that undercut the witness's credibility. In the 911 case, though, the Court faced a straight choice between finding an exception to the hearsay rule and letting an apparent abuser go free.

In making this choice, the US Supreme Court managed to ignore a variety of emotionally salient but epistemologically irrelevant distractions, such as the seriousness of the crime, the relative helplessness of the victim, and the respectability of the 911 operator. Instead, the Court focused on the purpose for which the 911 statements were obtained. If the statements were obtained to help gather information needed to safely resolve an ongoing emergency, they could be used at trial. If the statements, however, were obtained to gather information about a past event, they could *not* be used at trial.

The theory supporting this distinction seems to have been that the right to cross-examine and the right to have the jury see body language are fungible elements of a more general reliability test. A stranger's assertion, without more, could be true or could be false. It doesn't count as very much evidence. To turn an assertion into enough evidence to convict someone beyond a reasonable doubt, you need to show that the assertion comes with "indicia of reliability." Two of these indicia are cross-examination and body language -- if a story checks out despite a vigorous unfriendly interview and the peer pressure of having to tell the story while physically in the room with other people from your community, then that's pretty good evidence. But you might have reasons to believe a story even if you don't get cross-examination or body language. In the case of the 911 call, one might think that the caller had a strong motive to tell the truth, because if she didn't, then the police would go looking for the wrong guy, and her abuser would come find her and continue hurting her. Similarly, one might think that the operators had a strong motive to ask fair, non-leading questions, because if they didn't get the right answer, then the police might show up in the wrong neighborhood or with the wrong expectations, and there could be an unnecessary firefight. Finally, one could argue that a recorded statement made as events were unfolding is inherently more reliable (in some ways) than a narrative given months or years after the event; human memory gets corrupted faster than 8-track tapes.

Some combination of these factors convinced the Court to admit the evidence. Other, very similar cases have been decided differently. Whether they got that particular decision right or wrong, though, the framework of "indicia of reliability" is hard-coded into American evidence law, especially for civil cases. If you want to present evidence to a jury based on a statement that was made outside of court, you have to give at least one reason why the statement is nevertheless reliable.

Double and Triple Hearsay

Here's where things really get interesting: if your out-of-court statement quotes another out-of-court statement, the evidence is called "double hearsay," and you need to independently verify each statement. If any link in the chain breaks, the whole document gets excluded. For example, in the case I'm working on now, the defendants want to show the jury a report filled out by California's Occupational Health and Safety Administration ("OSHA"). The OSHA report is based almost entirely on an accident report form filled out by a private corporation. That report form, in turn, is based almost entirely on an informal interview of the only eyewitness to an accident. So the defendants can use the OSHA report if and only if the OSHA report, the accident report, and the informal interview are all reliable -- in logical terms, Usable ↔ (A ∧ B ∧ C), where A, B, and C stand for the reliability of each link in the chain.
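The conjunction requirement can be sketched as a toy model. The link names and qualification flags below are hypothetical illustrations, not a statement of how the court will actually rule:

```python
# Toy model of double/triple hearsay: a chained document is admissible
# only if every link in the chain qualifies under some hearsay exception.
def admissible(chain):
    return all(link["qualifies"] for link in chain)

# Hypothetical flags, for illustration only:
chain = [
    {"name": "OSHA report", "qualifies": True},           # public record exception
    {"name": "accident report form", "qualifies": True},  # business record exception
    {"name": "informal interview", "qualifies": False},   # no exception identified
]

print(admissible(chain))  # a single disqualified link excludes the whole document
```

The point of the model is just that admissibility is a conjunction: the weakest link controls.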

To try to qualify the OSHA report, the defendants are arguing that the OSHA report is reliable under the public record exception to the hearsay rule, meaning that the public officials who prepared it had a stronger interest in accurately reporting public information than they did in the outcome of the accident victim's private case. To get the accident report form in, the defendants are arguing that it is reliable under the business record exception to the hearsay rule, meaning that the corporate officials who prepared it had a stronger interest in making sure their company had access to accurate information about safety risks than they did in the outcome of any one customer's lawsuit. As for the informal interview...well, I honestly have no idea how they plan to justify its reliability. But, then again, I'm biased. My professional interest lies in making sure that the whole string of unhelpful quotations stays in a cardboard box in a dank garage, far away from any juries.

Do the Rules Work?

So far, I've been pleasantly surprised at how well the American legal system handles some of these challenges. The fact that we have a two-tiered system of evaluating evidence at all is a cut above average -- imagine, e.g., the doctor who examines you taking notes on your condition, filtering out any subjective comments you make about how you're sure it's just a cold, and reporting only your objective symptoms to a second doctor, who then renders a diagnosis. Or imagine a team of business consultants who interview a Fortune 500 company's leadership team, and then pass their written notes back to a team at HQ (who has never met the executives) so that HQ can catch any obvious mistakes in reasoning before sending out recommendations. We know, intellectually, that meeting people tends to make us friendlier toward them and more likely to adopt their point of view even if we encounter no Bayesian evidence that increases the plausibility of their opinions, but our institutions rarely take steps to guard against that bias.

I think my biggest criticism of the American evidence code is that it doesn't account for uncertainty in the model. For instance, if I read the headline on a piece of science journalism saying that (e.g.) coffee consumption reduces the risk of prostate cancer, or that receiving spankings in childhood is negatively correlated with conscientiousness as an adult, there are at least six layers of 'hearsay' -- I might have misunderstood the headline, the headline might have mis-summarized the article, the article might have misquoted the scientist, the scientist might have misinterpreted the recorded data, the recorded data might not faithfully reflect what actually happened during the experiment, and the experiment might not faithfully replicate the real-world conditions that interest us.

Even if I can articulate plausible reasons why each step in the transmission of information was "reliable," I should be very skeptical that my *model* of the transmission is accurate. I only have to be wrong about one of the six steps for my estimate of the information's plausibility to be untrustworthy. If the information would only provide a few decibels of evidence even if it were perfectly reliable, then trying to calculate how many points a semi-reliable piece of evidence is worth can fail because of a low signal-to-noise ratio. E.g., suppose I learn that neither the suspect nor the actual criminal were redheads -- I might be absolutely certain of this new piece of information, but that's still nowhere near enough evidence to support a conviction. If instead I learn that there is probably something like a 60% chance that neither the suspect nor the criminal had red hair, that datum really doesn't tell me anything at all -- it shouldn't shift my posterior noticeably away from my prior.
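The signal-to-noise point can be made concrete with Jaynes-style decibels of evidence. The population figure below is an assumed illustration, not data from the post:

```python
import math

def decibels(likelihood_ratio):
    # Evidence strength in decibels, following Jaynes: 10 * log10(LR).
    return 10 * math.log10(likelihood_ratio)

# Assume ~98% of the population is not redheaded. Learning with certainty
# that neither the suspect nor the criminal is a redhead gives a tiny LR:
lr_certain = 1 / 0.98

# If we are only 60% sure the report is accurate, mix the LR with 1
# (the "no evidence" ratio), shrinking it further toward zero decibels:
lr_uncertain = 0.6 * lr_certain + 0.4 * 1.0

print(decibels(lr_certain), decibels(lr_uncertain))
```

Even the certain version is worth well under a decibel, so the noise introduced by a semi-reliable transmission chain can easily swamp the signal.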

Although courts are allowed to consider the extent to which an unduly long chain of inferences makes evidence less "trustworthy," I think that on balance decisions would be more accurate if there were a firm limit -- say, three layers -- beyond which evidence was simply inadmissible as a matter of law. If A says that B says that C says that D shot someone, then no matter how reliable we think A, B, and C are, we should probably keep that evidence away from the jury unless we can haul at least one of B, C, or D into court to answer cross-examination.

Comments

I am sorry I did not manage to comment on this earlier; I did not suspect it would get promoted.

In short, your treatment of hearsay, and how the legal system addresses it, is simply wrong. Most of what you talk about is actually about the Confrontation Clause. I don't know if this is due to an intentional simplification of your examples, but the cases you use just don't work that way.

The main case you talk about, Davis v. Washington, is not a case about hearsay; just look at the Wikipedia summary. It is a case about the Confrontation Clause. This is a clause that says that those accused of crimes have the right to confront the witnesses against them; if someone talks to the police under certain circumstances, that testimony may not be entered. It does not matter how reliable it is. See Crawford v. Washington. The "indicia of reliability" test was abandoned in Crawford, because it was completely circular -- it was compared to doing away with a jury trial because the defendant was obviously guilty.

More generally, there is almost never a balancing test in hearsay. Hearsay is a series of rules that are applied systematically. Out of court statements are considered unreliable ... (read more)

Yes, it is. Lawyers and judges have a tendency to invent dozens of fuzzily overlapping concepts without even considering whether one or two concepts could do just as much useful intellectual work. I could tease out the difference between testimonial and nontestimonial evidence, assertions and non-assertions, matters offered for the truth of the matter asserted, matters offered for other purposes, matters pretextually offered for other purposes, matters honestly offered for other purposes but with an unacceptable tendency to prejudice the jury...but I'm not writing a law review article; I'm writing a Less Wrong post. I tried to focus on what I thought the audience would find relevant.

What interests me here is the distinction between the truth of evidence (does the content of this document describe reality?) and the reliability of evidence (would we ordinarily expect documents like this one to describe reality?). Anything further would be an explanation of the law for its own sake.

Give me a little credit, here; don't you think I looked at the Wikipedia summary before publishing the post? I also linked to Michigan v. Bryant, a newer Supreme Court case which extensively discusses Crawford. I think the cases I linked to provide a discussion of evidentiary reliability that illustrates some important Bayesian concerns. Whether every doctrine in every case I cite is still good law is not really the point.

I may not have been clear on this point -- I'm not claiming that judges weigh evidence to see if it should be considered hearsay. Rather, the very process of determining whether evidence is hearsay appears to be designed so as to indirectly prompt judges to weigh whether evidence is reliable. By systematically applying the rules about what counts as hearsay, judges consciously or unconsciously wind up admitting only evidence that the system views as reliable. If you like, we could say that the people who write the laws of evidence in the first place are the ones who p

The issue is that Confrontation clause != hearsay. Confrontation rights belong to criminal defendants only, while hearsay is an issue in any trial. As you note, hearsay is conceptually a reliability indicator, while Confrontation clause analysis is trying to determine when the government must go through the time and effort to produce a witness at the actual trial.

In general, criminal defendant rights are not well correlated with reliability. For example, suppression of illegally obtained evidence is anti-correlated with accuracy.

This piece makes a good point about chaining evidence. As a lawyer, I thought the piece did a great job of highlighting when the legal system does a better job of truth discovery than society as a whole, and the more frequent occurrence when the legal system is just as misguided as ordinary Joe Citizen.

In short, please accept the word of an expert that the discussion under the heading The Hearsay Rule is not about the hearsay rule and is unrelated to the remainder of the excellent piece.

More accurate would be to say that they wind up excluding only evidence that the system views as unreliable. Whether the evidence is reliable is always the jury's call -- a point I don't think is a quibble, because hearsay rules are designed to exclude certain unreliable evidence: that which has particular potential to confuse the jury. From a rationalist perspective, then, you need to consider whether multiple levels of hearsay, admissible at each step, tend to confuse the jury or whether, on the other hand, the jury can competently evaluate the noisiness of the evidence's transmission. I don't think multiple levels of admissible hearsay have much credibility with jurors; I think the transmission chain is readily subject to effective attack by the defense. (Every child has played the telephone game.)

But here is where considering biases would have been fruitful (and necessary to your thesis). It isn't enough to prove chains of hearsay are unreliable. Many kinds of evidence are admitted despite their unreliability: say, the testimony of a witness who's a known habitual liar. The problem for any rule of admissibility is to weigh the risk of the jury being misled. Unless you can show the jury is unfit to discount multiple levels of hearsay -- with the help of a competent adversary -- the proposal is tantamount to having juries base their conclusions on less information than they would otherwise use. Since both parties are subject to the same hearsay rules, it could mean being unable to exonerate a defendant with sound evidence based on multiple levels of hearsay, merely because in general multiple levels of hearsay tend to suffer reduced reliability.
I agree that the Supreme Court cases are not on point, but the discussion of chains of evidence is worth thinking about. If we accept hearsay exceptions based on reliability (and I think exceptions for things like business records have little other justification), then hearsay within hearsay is perhaps not treated correctly. Once evidence is admitted, it is admitted - the judge doesn't instruct the jury to give less weight to hearsay within hearsay. But probability says that if business records are 80% reliable, and present sense impressions are 70% reliable, a business record that is based on a present sense impression is 56% reliable (if I did the math right). That's unintuitive to the jury, and the legal system makes no effort to correct for this misunderstanding of statistics. Edit: And the description of the judge sifting out evidence is a good explanation for non-lawyers.
Well, .7*.8=.56, certainly, but personally I would not be so casual about assuming that the two failure rates are independent.
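Both the arithmetic and the independence caveat can be sketched in a few lines, using the hypothetical per-link rates from the comment above:

```python
business_record = 0.8  # hypothetical reliability of the business-record link
present_sense = 0.7    # hypothetical reliability of the present-sense link

# If the two failure modes are independent, reliabilities multiply:
independent = business_record * present_sense  # 0.7 * 0.8 = 0.56

# If failures are perfectly correlated (the same error propagates through
# both links), the chain is only as weak as its weakest link:
correlated = min(business_record, present_sense)  # 0.7

print(independent, correlated)
```

The true chained reliability lies somewhere between the two bounds, depending on how correlated the failure modes actually are.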

Is there any evidence that the American or any other legal system is significantly better than chance at what it does? Or even not significantly worse than chance? (by being biased instead of just random)

That's the first question we should be asking, before concerning ourselves with minor issues about admissibility of evidence.

That's an excellent question. The answer depends on exactly what you mean by "better than chance." If you mean "more than half of those convicted of a crime are guilty of that crime," then I'd say yes, there is excellent reason to think that they are. Prosecutors usually have access to several times more reports of crime than they can afford to go out and prosecute. Prosecutors are often explicitly or implicitly evaluated on their win ratio -- they have strong incentives to pick the 'easy' cases where there is abundant evidence that the suspect is guilty. Most defense lawyers will cheerfully concede that the vast majority of their clients are guilty -- either the clients admit as much to their lawyers, or the clients insist on implausible stories that don't pass muster, which the lawyers have to disguise in order to get their clients to go free. Although as a matter of law and rhetoric people are presumed innocent until proven guilty, as a matter of cold statistics, someone who has been lawfully indicted in America is probably more likely to be guilty than innocent. In fact, there are probably so many guilty suspects in Court that the legal system does strictly ... (read more)

Careful, there: the economic damage of not locking up a thief is much lower than the economic damage of incorrectly locking up a non-thief. "It's better that X guilty people go free than that one innocent person goes to prison" is a good principle. (Note that X is likely to have different values for weed users, thieves and serial killers.)
Worse epistemically, not instrumentally.
Could you expand on that?
I'll have a go. A thief at large but constrained by trying not to get caught doesn't do as much net economic damage as destroying the economic, family and social life of an otherwise productive member of society. For example, it would be almost impossible for a single shoplifter or casual stealer of office supplies to do anything like the economic damage done merely by removing them from the economy and putting them in prison. The only reason we still do it is to discourage behaviour that would seriously hurt the economy if it became the norm. If you start locking people up with careless disregard for whether or not they are guilty, you get all the economic cost of the prison system but lose the clarity of the signal. The message becomes "hey, you're probably going to prison at some point anyway. Might as well grab what you can while you are still free to enjoy it".
I wouldn't quite expect that, but certainly I agree that if there's no expectation that punishment correlates better with committing a crime than with refraining from doing so, then punishment loses its deterrent function, which I gather is your real point here.
Part of my point. In addition to that, perception of unfairness in the justice system devalues the social contract between government and individuals. It places the individual in a competing rather than cooperating relationship with social institutions (such as law enforcement), so it's not a mere lack of deterrence, it's active encouragement to defect for individual gain at the expense of "the man". If you doubt it, ask anyone who lives in a neighbourhood where the cops have a record of incompetence or abuse of authority.
I've heard (anecdotally, not with good evidence) that this is precisely the attitude that many poor urban young black men already have (actually with ‘go to prison’ changed to ‘go to prison or get killed’). Thus one has every reason to join a gang for the immediate services that it provides, future be damned.
If a random 80% of suspects are guilty, the appropriate naive predictor is one that always votes "guilty", not one that tries to match probabilities by choosing a random 80% of suspects to call guilty. Then you get an accurate result 80% of the time, which is a lot better than 68%. That seems to me a more appropriate benchmark. (Alternatively, you might consider a predictor that matches its probabilities not to the proportion of defendants who are guilty but to the proportion who are convicted. There might be something to be said for that.)
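A quick check of the two benchmarks, using the 80% base rate assumed in the thread:

```python
p_guilty = 0.8  # assumed base rate of guilt among defendants

# Benchmark 1: always vote "guilty" -- correct whenever the defendant is guilty.
always_guilty_accuracy = p_guilty

# Benchmark 2: probability matching -- vote "guilty" on a random 80% of cases.
# Correct on guilty defendants you flag plus innocent defendants you clear:
# 0.8 * 0.8 + 0.2 * 0.2 = 0.64 + 0.04 = 0.68.
matching_accuracy = p_guilty ** 2 + (1 - p_guilty) ** 2

print(always_guilty_accuracy, matching_accuracy)
```

This is the familiar result that probability matching is dominated by always betting on the majority outcome.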

I think the intended question is whether the legal system adds anything beyond a pure chance element. Somehow we'd need a gold standard of actually guilty and innocent suspects, then we'd need to measure whether p(guilty|convicted) > 80%. You could also ask if p(innocent|acquitted) > 20%, but that's the same question.

Thank you! Intended or not, it's a fantastic question, and I don't know where to look up the answer. I'm not even sure that anyone has seriously tried to answer that question. If they haven't, then I want to. I'll look into it.
The closest thing I know of is the "actually innocent, but convicted" sample that gradually came to light under DNA testing of inmates. Unreported crime rates get estimated somehow, so I'd be surprised if nobody had combined those numbers to do a study in this vein. Haven't found one with a cursory googling, though.
I don't see how those are "the same question". Suppose that out of 8 accused, 4 are guilty, two of those are convicted, and the rest acquitted. Then p(guilty|convicted) = 1 and p(innocent|acquitted) = 2/3.
The assumption was that 80% of defendants are guilty, which is more than 4 of 8. Under this assumption, asking whether p(guilty|convicted) > 80% is just asking whether conviction positively correlates with guilt. Asking if p(innocent|acquitted) > 20% is just asking if acquittal positively correlates with innocence. These are really the same question, because P correlates with Q iff ¬P correlates with ¬Q.
Perfect. Thanks.
I'm sorry, but I can't produce any response but bewilderment. You think DNA exonerations don't give evidence about the accuracy of the system? Really?

It proves that mistakes have been made, but in the end, no, I don't think it's terribly useful evidence for evaluating the rate of wrongful convictions. Why not? There have been 289 post-conviction DNA exonerations in US history, mostly in the last 15 years. That gives a rate of under 20 per year. Suppose 10,000 people a year are incarcerated for the types of crime that DNA exoneration is most likely to be possible for, namely murder and rape (I couldn't find exact figures, but I suspect the real number is at least this big). Then considering DNA exonerations gives us a lower bound of something like .2% on the error rate of US courts.
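The back-of-envelope bound in that comment works out as follows (the 10,000-per-year figure is the commenter's own guess, not a verified statistic):

```python
exonerations = 289                # post-conviction DNA exonerations to date
years = 15                        # mostly within the last 15 years
incarcerations_per_year = 10_000  # assumed murder/rape incarcerations per year

per_year = exonerations / years   # just under 20 exonerations per year
lower_bound = per_year / incarcerations_per_year
print(f"{lower_bound:.2%}")       # roughly 0.2%, an extreme lower bound
```

As the later reply notes, this is a floor on the error rate, not an estimate of it: it only counts errors that happened to be correctable by DNA retesting.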

That is only useful evidence about the error rate if your prior estimate of the inaccuracy was less than that, and I mean, come on, really? Only one conviction in 500 is a mistake?


DNA exoneration happens when one is innocent and a combination of extremely lucky circumstances makes retesting of the evidence possible. I would be shocked if the latter happened at higher than a 1:100 chance.

The appellate system itself - of which cases involving new DNA evidence are a tiny fraction - is a much more useful measure. There are a whole lot more exonerations via the appeals process than those driven by DNA evidence alone. This ought to be obvious, and the 0.2% provided by DNA is an extreme lower bound, not the actual rate of error correction. Case in point: I found an article describing a study on overturning death penalty convictions, and it found that 7% of convictions were overturned on re-trial, and 75% of sentences were reduced from the death penalty upon re-trial. One in fourteen sounds a lot more reasonable to me, and again that's just death penalty cases, for which you'd expect a higher than normal standard for conviction and sentencing. The standard estimate is about 10% for the system as a whole.
A few observations:

* Literally 100% of people who ever lived have done multiple things which an unfriendly legal system might treat as crimes, starting from simple ones like watching youtube videos uploaded without consent of their copyright owners, making mistakes on tax forms, reckless driving, defamation, hate speech, and going as far as the legal system wants to go.
* The vast majority of suspects in the US do not get any trial whatsoever; they're forced to accept punishment or risk vastly higher punishment if they want to take their chance at trial.
* There are good reasons to believe the few trials that happen are extremely far from any kind of fairness, and they're stacked to give the prosecution an advantage. Just compare the massive funding of police and prosecutors with the puny funding of defense attorneys.
* The US has an extraordinarily high number of prisoners per capita. Looking at crime rates alone, it does not have extraordinarily high levels of serious crime per capita. There's no way most people in prisons can be anything but innocent (or "guilty" of minor and irrelevant "crimes" pretty much everybody is "guilty" of and prosecuted on the legal system's whims).
* Unless you believe that young black men in the US are the most criminal group in the history of the world, most of them who are in prisons must be innocent by pure statistics.

Literally 100% of people who ever lived have done multiple things which unfriendly legal system might treat as crimes

Entirely for the sake of being pedantic, I'll point out that many people have avoided this, if only by dying very shortly after being born.

taw was only reporting to one significant digit :P
I see three.
100.% (with a decimal before the percent sign) would be three. 100% with no decimal is one.
I'd consider numbers such as “100” to be ambiguous between one, two or three significant digits. As for “literally”...
Still, I'd bet that at least 99.5% of people alive now have done something which is technically illegal in their jurisdiction. (Not sure about “who ever lived”, though.)

Unless you believe that young black men in US are the most criminal group in history of the world, most of them who are in prisons must be innocent by pure statistics.

This is by far your weakest point.

Men commit more crime than women. Young people commit more crime than the elderly. Black people commit more crime than white people.

Ergo yes, young black males are probably one of the most criminal groups in the developed world. I bet Japanese American grandmothers are among the least; I can't imagine why, but somehow it just seems overwhelmingly likely.

If you want to quibble that government is more likely to make the things that men, the young, and black people do illegal, feel free to; but all three opening statements are also true for violent crimes, and victim reports basically match arrest ratios on all of them. I think all three are blatantly obviously true, but somewhat impolite to state.

Violent crime is something that for understandable reasons catches attention more than say white collar crime. It causes greater psychological distress to notice it or suspect you are vulnerable to it. This translates into greater pressure on politicians to make laws against it and provide law enforcement more resources to target it.


Unless you believe that young black men in US are the most criminal group in history of the world, most of them who are in prisons must be innocent by pure statistics.

Or there is a significant selection bias in the quasi-random selection process known as "arresting people."

By the way, have you read Bernard Harcourt's research, which suggests that the "imprisonment" rate in the United States has been constant over time, provided that you count commitment for mental illness as imprisonment? Thus, the recent growth in prison population in the United States reflects the shrinkage in long-term involuntary commitment of the mentally ill. In other words, a lot of the restrictions on the extremely mentally ill that used to be "provided" by dedicated institutions (i.e. mental hospitals) are now "provided" by jails and prisons.

You confuse different parts of the justice system, and your criticism is internally self-contradictory. If everyone who ever lived is guilty of something, then high or racially disparate incarceration rates need not catch innocent people. The fact that this occurs is more an indictment of the laws in place and the people who prosecute them, not the courts that adjudicate them. Put another way, if breathing were a crime, then everyone convicted of it would in fact be guilty. If there were a lot of black men convicted of it, more so than other races, it would likely be due to different rates of prosecution, given how easy the charge would be to prove. This is bad, but it is a criticism of the wrong part of the government. It would be like blaming the mayor for the ineffectiveness of the postal service (the former is the city government, the latter is federal).

Edited to clarify: I am referring to the value of our adversarial method and our rules of evidence / constitutional protections - the mechanics of how a trial works. There is an entirely separate issue of prosecutorial discretion and unequal police enforcement and overly draconian laws, which certainly lead to the problems being discussed here. But the entire purpose of evidence law and courtroom proceedings generally is to determine if the person charged is in fact guilty. It is not to determine if the prosecution is charging the right people or if the laws are just or justly enforced. So this criticism seems misplaced.
Selective prosecution is a problem that can be laid at the prosecutor's doorstep, and if proven, it's a basis for acquittal. The problem is that it's hard to prove, not that the prosecutor and cops aren't responsible. (See my essay "Threat to advocacy from overdeterrence.")
If a bad law is applied in a racist way, surely that's a problem with both the law itself and the justice system's enforcement of it?
There's nothing self-contradictory about my criticism; there are multiple interrelated problems. The American legal system is totally broken; it's not just one easy fix away from sanity. Fetishization of the concept of being "guilty" is one of these problems. For example, everybody on Less Wrong is obviously either "guilty" of conspiracy to commit treason and replace various governments with Friendly AI, or at least "guilty" of helping people directly involved in such a conspiracy. That's a problem. If someone rounded up whoever on Less Wrong happens to be black and prosecuted them, that would be a second problem.
It is self-contradictory on its face. Compare these statements: the first provides a complete alternative explanation for the second two. It is entirely possible to believe that (A) there are far too many crimes and (B) police and prosecutors are biased against black people, and that this fully explains why there are so many black people in prison without a single one of them needing to be innocent. Similarly, you say that, given crime rates and incarceration rates, some people must be innocent. Again, this is undermined by the fact that you say everyone is guilty of something. You just can't argue that everyone is a criminal, and then argue that high incarceration rates must necessarily be attributable to a high rate of convicting innocent persons. They may both be true, but you have no basis to infer the latter from the former. To the extent that you disclaim this by saying that people are "guilty" of minor "crimes," your argument becomes largely circular, and is still not supported by evidence. What percentage of thieves/murderers/rapists/etc. are actually caught? How long are sentences? Combine a higher crime rate with a higher catch rate and longer sentences and you easily get a huge prison population without innocent people being convicted. I don't claim to know whether this is the case, but you do, so you need to back it up. There are good reasons to believe the few trials that happen are extremely far from any kind of fairness, and they're stacked to give the prosecution an advantage. Just compare the massive funding of police and prosecutors with the puny funding of defense attorneys.
You completely confuse "did something for which an unfriendly legal system can find them guilty" with "did something seriously dangerous to others that could sensibly be a reason to lock someone up". These are not even remotely close. Essentially everyone is the former; very few people are the latter.
If your point is that there are a lot of people locked up for violating laws that are basically stupid, you're absolutely right. But that issue is largely irrelevant to the subject of the primary post, which is the accuracy of courts. If the government bans pot, the purpose of evidence law is to determine whether people are guilty of that crime with accuracy. In other words, your criticism of the normative value of the American legal system is spot-on; we imprison far more people than we should and we have a lot of stupid statutes. But since this context is a discussion of the accuracy of evidentiary rules and court procedure, your criticism is off-topic.
Is it really off-topic to suggest that looking at the accuracy of the courts may amount to rearranging the deck chairs on the Titanic in a context where we've basically all agreed that:

1. The courts are not terrible at making accurate determinations of whether a defendant broke a law.
2. The set of laws whose penalties can land you in prison is massively inefficient socially and, in most people's minds, unjust (when we actually grapple with what the laws are, as opposed to how they are usually applied to people like us, for those of us who are white and not poor).
3. The system of who is tried versus who makes plea bargains versus who never gets tried is systematically discriminatory against those with little money or middle/upper-class social connections, and provides few effective protections against known widespread racial bias on the part of police, prosecutors, and judges.

How different is this in principle from TimS's suggestion about lower-hanging fruit within evidentiary procedure, just at a meta level? Or did you consider that off-topic as well?
Can you cite evidence for this? Most of the evidence for this is based on arguing that P(conviction|African descent) > P(conviction|Eurasian descent) and dismissing anyone who points out that P(guilty|African descent) > P(guilty|Eurasian descent) as a racist.
(Or even do the homophone and 'cite'?) That is Bayesian evidence that there is a racial bias in the conviction process. (But that isn't evidence!) Mind you, those people are probably being racist. In particular, they are 'pointing out' rather than, say, hypothesizing.
Thanks, fixed. Only if one doesn't know anything about the base rate. Why not? According to your profile you're from Australia, so I'm going to give you the benefit of the doubt and assume the above statement is due to all your information about the US being filtered through a politically correct filter.
No; unless by 'base rate' you mean something entirely different from the actual priors I would use when making the update, the reverse is true. You need to be able to calculate some prior for the relevant base rate in order to make the update either way. Hey, I was agreeing with you! The reason I did so was that the quoted behavior is a social attack that contains more or less no information that is not orthogonal to the issue. I'm going to suggest, instead, that you are so (justifiably) fed up with abusive usage of political correctness that you pattern-matched my statement onto a far more general position which I would not dream of supporting. The actual semantics of my claim are rather mild. The limitation to a probability would itself be sufficient to make it correct, but more important is the emphasis on how the claim is presented: the likely neglect of corrections for socioeconomic status, and how that affects both what kinds of crimes are committed and how much those crimes are the sort that actually get convictions irrespective of guilt. A relevant datapoint regarding drug use, the most ridiculous element of America's absurd legal system: the last relevant study I read (the abstract of) found that, contrary to common belief, white Americans actually use illegal drugs more than black Americans do. Of course the manner and type of drug use is also, on average, drastically different. (I also corrected the word 'decent' to 'descent'. My original reading of the claim you actually described, regarding decency, made the claim even more racist and also confusing.) Now, responding to the suggestion of political-correctness-based naivete: I counter that I am actually far less politically correct than average, except when the politically correct beliefs are coincidentally correct. With respect to the topic of race in particular I, like many Australians, am far less inclined to hit the berserk button and start throwing around social signalling...
You might be interested in this thread where Alicorn challenged me on that very claim. (Since I had to update on it, I'm obligated by the terms of my Bayesian novitiate to loudly point it out when someone reiterates my previous prior.) In gist, white Americans are overrepresented in lifetime illegal drugs use, but black Americans are overrepresented in recent (e.g. last-month) illegal drugs use. This is relevant because you don't get arrested today for having tried cocaine ten years ago. (In other words, if you're white you're more likely to have tried dope, but if you're black, you're more likely to have used it in the past month.) However, proportional to recent illegal drugs use, black Americans are overrepresented in illegal drugs arrests. That said, the statistics I was able to find did not distinguish arrests for drug possession vs. drug dealing; did not distinguish occasional from heavy users (so long as they'd used within the past month); and did not distinguish among different illegal drugs. They also didn't control for economic class, which is probably significant in where people choose to obtain and use their illegal drugs, which in turn would have some effect on arrest rates.
They also wouldn't have controlled for technological and strategic capability, i.e., knowing (or being able to find out) how to use an anonymous service like Silk Road, using a conservative policy when receiving such goods, and making use of substances in a way that minimises arrest potential. These days, working out the basics of things like this is easy: spend several hours with Google. Yet it remains the case that which subculture someone is in will drastically influence how much they end up knowing and caring about minimising risk. ("Rah geeky rationalist skills!")
Or willingness to use said techniques. I am routinely contacted, due to my Silk Road page, by people wishing me to explain how to use Tor/Bitcoin/Silk Road (despite the entire page being just that sort of guide!) or to buy stuff on Silk Road for them (seriously? you think I might buy some LSD for a random stranger?). I've begun to understand why ESR could feel compelled to write how to ask questions the smart way.
Some (most) people really are just mindbogglingly bad at knowing when and how to just google something, consolidate the knowledge, and practically implement it! Or perhaps we just overestimate our capabilities in this regard relative to the norm.
P(guilty|African descent) > P(guilty|Eurasian descent), alright, but that kind of evidence is easily screened off. For any non-trivial amount of forensic evidence E, P(guilty|E, African descent) ought to be approximately the same as P(guilty|E, Eurasian descent). (For example, if E points toward the defendant being guilty, even though you'd assign a higher prior probability of guilt for a black person than for a white person, you'd assign a higher prior probability of E for a black person than for a white person too, so that in the posterior probabilities those cancel out.)
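The screening-off point can be sketched numerically. Here is a minimal Bayes calculation (all numbers invented for illustration): even when two defendants start with priors differing by a factor of two, strong forensic evidence largely swamps the difference.

```python
def posterior_guilt(prior, likelihood_ratio):
    """Posterior P(guilty | E) from a prior P(guilty) and a likelihood
    ratio P(E | guilty) / P(E | innocent)."""
    odds = prior / (1 - prior) * likelihood_ratio
    return odds / (1 + odds)

# Invented priors differing by a factor of two, plus forensic evidence
# with a likelihood ratio of 1000:
print(posterior_guilt(0.02, 1000))  # ~0.95
print(posterior_guilt(0.01, 1000))  # ~0.91
```

The residual gap in the posteriors shrinks as the likelihood ratio grows, which is the sense in which strong evidence screens off the prior difference.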
Agreed, however, I fail to see what this has to do with my point.
Never mind. I had misunderstood your point.
The original question was: which I would interpret as ‘Is P(Conviction|Guilt) substantially larger than P(Conviction|Innocence)?’ Now, for some crimes, such as copyright infringement, P(G) is very close to 1, so P(C|G) cannot be close to 1, simply because there wouldn't be enough room in prisons to hold NP(C|G)P(G) people (N being the population, times the mean sentence length over the mean lifespan, and possibly some other factor of order unity I'm forgetting); and since P(I) is small, in order for P(C|I) to be much less than P(C|G), P(C and I) = P(C|I)P(I) must be very small. (Also, we want the system to be unbiased, i.e., P(C|G, brown skin) close to P(C|G, pink skin), P(C|G, penis) close to P(C|G, vagina), and so on and so forth. The best way of achieving this would IMO be for all of these numbers to be close to 1, but that's impossible with the current definition of G and the finite capacity of prisons.)
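The capacity argument can be turned into quick arithmetic. A sketch of the steady-state relation prisoners ≈ N × P(C|G) × P(G) × (mean sentence / mean lifespan), with every input invented for illustration:

```python
# All figures invented for illustration.
N = 300e6         # population
p_guilty = 0.9    # P(G) for a near-universal offence
sentence = 2.0    # mean sentence, years
lifespan = 78.0   # mean lifespan, years

def prisoners(p_conviction_given_guilt):
    """Steady-state prison head count implied by the formula above."""
    return N * p_conviction_given_guilt * p_guilty * sentence / lifespan

print(prisoners(1.0))    # ~6.9 million: far beyond any real capacity
print(prisoners(0.001))  # ~7,000: so P(C|G) must be tiny for such crimes
```

Under these toy numbers, even P(C|G) = 0.01 would imply tens of thousands imprisoned for a single near-universal offence, which is the point about finite prison capacity.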
It should be noted that the reasonable meaning of 'substantially' in such a case is relative, not just the unadorned absolute difference between ridiculously low probabilities. In comparisons of this nature it is overwhelmingly obvious that "0.0001 vs. 0.000001" is much more significant than "0.6001 vs. 0.600001". The above consideration is even more important given that the substitution of "is significantly better than chance" with "Is P(Conviction|Guilt) substantially larger than P(Conviction|Innocence)?" is yours and not the original question's.
I'm not sure of this: I'd very much prefer a world where 0.01% of guilty people and 0.0001% of innocent people are convicted to one where 60.01% and 60.001% are. (If convicting a guilty person has utility A and convicting an innocent person has utility -B, what you want to maximise is AP(C|G) - BP(C|I), which also depends on the magnitudes of the probabilities, not only on their ratio.) Huh? How else could the original question be interpreted? [ETA: Well, if "better" is interpreted instrumentally rather than epistemically, “the legal system is significantly better than chance” means “AP(C|G) - BP(C|I) is significantly greater than AP(C) - BP(C)”; and since A is orders of magnitude less than B (unless we're talking about serial killers or similarly serious stuff), that boils down to “P(C|I) is significantly less than P(C)”, where it's the magnitude of the difference that matters, rather than the ratio.]
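To make the comparison concrete, here is a sketch with invented utilities A and B (with B ≫ A, as assumed above):

```python
def value(p_cg, p_ci, A=1.0, B=10.0):
    """Expected utility per case: +A for convicting the guilty,
    -B for convicting the innocent (A and B are invented)."""
    return A * p_cg - B * p_ci

# Nearly identical differences P(C|G) - P(C|I), very different magnitudes:
print(value(0.0001, 0.000001))  # small but positive
print(value(0.6001, 0.600001))  # hugely negative
```

The first regime has a positive expected value and the second a disastrously negative one, even though the raw differences between the two probabilities are almost the same, which is why magnitudes (and not only ratios or differences) matter.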
So do I. This answers the question "do I consider false convictions worse than guilty parties who are not punished?" but does not tell us much about the original question. The point of the original question was not "with trivial consideration of how much the conviction process differs from chance, how high is the base rate of convictions for this crime?" That is a reasonable interpretation; the point is that we must also interpret 'significance' in light of the original meaning, and that original meaning does not contain an overwhelming emphasis on the base rate of convictions!
Are we restricting to cases that are prosecuted or doing this over all people?
The immediately obvious answer would be “over the prosecuted if you're only ‘testing’ the courtroom itself, over all the people if you're ‘testing’ the whole system”, but I'm not sure what the ‘ideal’ thing for a courtroom to do is in terms of P(C|Prosecution, G) and P(C|P, I) if the police are ‘non-ideal’, so that P(P|G) is not close to or greater than P(P|I) to start with. Or even whether this question makes sense... I'll have to think more clearly about this when I'm not this tired.
Doesn't it? I remember gun control arguments from years back where people would note the US's high murder rate compared to Canada and other similar countries with more restricted gun ownership. Then someone would bring up Switzerland which has wide gun ownership and low crime.
Homicide rates by country: the US has a higher homicide rate than Europe, but nothing unusual. Switzerland only has high gun "ownership" when you include military weapons with sealed ammo that nobody is allowed to use in their free time outside regulated places; recently they don't even give people the sealed ammo. And rifles are in any case pretty useless in crimes, as are hunting guns (see Canada). Handgun ownership is the key statistic, and that's why the US has a higher violent crime rate, which is pretty immaterial here anyway since the vast majority of prisoners are there for non-violent crimes.
Thanks for the link to homicide rates, upvoted. I would consider the US rate high, it's around 4 times that of Australia or Western and Central Europe, 3 times that of Canada, and over 2.5 times that of New Zealand, but agreed it's not that relevant.
IAWYC, but seriously? How many crimes has someone dying at the age of 1 committed on average?

How many crimes has someone dying at the age of 1 committed on average?

Being a public nuisance.
Destruction of their parents' property.
Kicking their mom can probably be classified as domestic violence.

Edited to add: Also biting.

A bit of hyperbole, perhaps. Nonetheless, it is all but certain that the overwhelming majority of Americans (100% - epsilon would not surprise me) over the age of responsibility (~7 years old) have committed a felony.
An unfriendly legal system might treat being born as a crime. In fact, I'd be surprised if some politician in Arizona hasn't tried to make being born to illegal immigrant parents a crime.
The downvotes surprise me, because the rhetoric is precisely along these lines, e.g. Sonny Bono wanting to remove state benefits from the US-born children of illegal immigrants on the justification "they're illegal". Illegal people.
Yeah, I was wondering about the downvotes. The welcome thread says that it's perfectly acceptable to ask for an explanation... So, for anyone who downvoted me, why?
Didn't downvote, but I think your comment visually matches the 'strawman argument' pattern. Except that it is not.
[citation please]
It seems to me that even children that young have generally done various sorts of property damage and assault, and that most excuses for that behavior will rest on not considering them "people".

An adult with the behavior of a very young child would be declared irresponsible by psychiatrists, and thus couldn't be tried (but could be arrested and committed). It seems reasonable to apply this to children, and they are automatically committed to restrictive institutions already.

A 1-year-old, not many yet; but in the US, police have increasingly been used even against elementary school children, so 1-year-olds are not safe forever.
This book seems relevant.
I agree with most of what you say, but I'm not so sure about the last two. As others have pointed out, there are many, many cases where the primary suspect of a crime is never prosecuted. Given a choice, prosecutors will usually choose "easy" cases. So an alternate explanation for America's high prison population and incredibly high black prison population is that:

* more criminals are prosecuted and convicted in America, and
* jurors are biased and black criminals are therefore easier to convict; and/or prosecutors are biased and therefore prosecute more black criminals.

Now, since I don't think it's actually optimal for everyone who ever breaks a law to be punished, I have no problem saying, for example, "More criminals are prosecuted and convicted here, and that's too bad."
This doesn't follow. You could believe that other groups have higher rates of criminality, but are underimprisoned relative to their rate of guilt.
On the other hand, the prosecution needs to convince twelve jurors, the defense only needs to convince one.
Not necessarily. If the 11 and the 1 hold fast, then this results in a mistrial, not an acquittal; so really the defence needs to convince only 1 but every time, while the prosecution needs to convince all 12 but only once. (And in fact the 1 will be under enormous pressure from the 11 to convert before that point is reached.) ETA: Of course ‘every time’ is not forever; eventually the prosecution will give up.
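The asymmetry is easy to simulate. Here is a toy model, with jurors voting independently at some invented probability p of being convinced; this ignores deliberation and the peer pressure on holdouts mentioned above, so treat it only as a sketch.

```python
import random

def trial_outcome(p, jurors=12, max_retrials=3):
    """Run a case through up to max_retrials juries: unanimous guilty
    votes convict, unanimous not-guilty votes acquit, anything else is
    a mistrial and the case is retried."""
    for _ in range(max_retrials):
        votes = sum(random.random() < p for _ in range(jurors))
        if votes == jurors:
            return "convict"
        if votes == 0:
            return "acquit"
    return "hung"  # the prosecution gives up

random.seed(0)
results = [trial_outcome(0.9) for _ in range(10_000)]
print(results.count("convict") / len(results))  # roughly 0.63
```

Even with each juror 90% convinced, a single jury convicts only 0.9^12 ≈ 28% of the time; it is the retrial rule, not any one jury, that does most of the work.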

By "better than chance" do you mean whether when investigating e.g. a murder, the American police and legal system have more than P(1/population of America) of locating and punishing the actual guilty party?

How does the legal system normally deal with cases where someone has a chain of logic where each link seems strong but there are a dangerously large number of links? This seems like a special case of a more general issue that the court must face regularly.

Would an argument to the judge like "even if each of these reports comes from a person trying to do a good job in passing along the truth, there are too many places where any of these people could have made a simple error" stand a chance?
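For what it's worth, the worry is easy to quantify: even individually strong links compound badly. (The 95% per-link reliability below is an invented figure.)

```python
# Probability that an entire chain of reports is error-free, if each
# link is independently 95% reliable:
per_link = 0.95
for links in (1, 2, 5, 10, 20):
    print(links, per_link ** links)
```

Ten "strong" links already leave the chain right only about 60% of the time, and twenty links drop it to roughly 36%.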

No. The lawyer would argue the point to the jury. Unless the judge is the finder of fact because it's a bench trial, the judge doesn't "weigh" the evidence (assess its plausibility). Hearsay is a special case because juries are believed prone to overestimating its reliability and because it implicates the right to confront witnesses. Long chains of reasoning, on the other hand, aren't likely to mislead the jury. Research on cognitive fluency, in fact, shows people usually discount evidence excessively when it's complicated.

A few comments:

  1. It is somewhat confusing (at least to legal readers) that you use legal terms in non-standard ways. Conflating confrontation with hearsay issues is confusing because making people available for cross-examination solves the confrontation problem but not always the hearsay one.

  2. I like your emphasis on the filtering function of evidentiary rules. Keep in mind, however, that these rules have little effect in bench trials (which are more common than jury trials in state courts of general jurisdiction), and that relatively few cases reach trial.

I'm skeptical. After all, the anchoring effect isn't weakened by being reminded that it exists. It seems that anything the jury sees will influence their decision, and they will likely be unable to discount its influence appropriately to account for its unreliability (especially if it's emotionally charged). I've always been uneasy when the judge on some court TV drama sustains an objection or asks that something be stricken from the record, as if that means it's stricken from the minds of the jury so it won't influence their decision. We have good reason to believe that that's impossible: the jury's brains have been primed with a piece of argumentation that the judge has recognized is inadmissible. It's too late. At least, it has always seemed that way to me. What does the legal literature say about this?

I like the idea of capping the length of an admissible chain of hearsay, but whenever I hear about a rule like that, I always think of the risk that you'll miss an obviously true conclusion just because the evidence wasn't admissible. Of course, that's a silly argument, since we have lots of such limits and they're not something I disagree with.

The obvious solution to this entire debate is to teach people a basic understanding of practical probability, but I guess you work with what you've got...

Incidentally, is the title a deliberate play on "Lies, damn lies, and statistics"? I couldn't work it out.

I think it's just the standard "a thing, another thing, and yet one more additional thing". A common species, of which "lies, damned lies, and statistics" is another example.
But it's a more specific pattern than that: "X, adjective X, and Scientific Term".

Fascinating article! I have to confess that I don't know a lot about the legal system or how it works. It strikes me as the kind of field that would be both useful to know in some detail, and interesting to learn about. So "study the modern legal system" is somewhere on my list of "random personal research projects."

The explanation of the current system, and how to view it in a rationalist manner was really interesting.

The problem as you state it seems to be that the court (and people in general) have a tendency to evaluate each link in a chain separately. For instance, if there were one link with an 80% chance of being valid, both a court and a Bayesian would say "OK, let's accept it provisionally for now", but if there are three or four links, a court might say "each individual link seems OK, so the whole chain is OK" while a Bayesian would say "t...

I suspect the low hanging fruit in improving the legal system is related to evaluating witness credibility and eyewitness accuracy. Judges talk all the time about how the factfinder was present when a witness testified and was in a unique position to evaluate the credibility of a witness' testimony. But the evidence is pretty strong that people are terrible at those kinds of judgments and don't realize how bad they are at them.

Thanks for this article. I now finally start to understand the sense behind the judge/jury system, which I always found a little strange (compared to just a qualified judge making the whole decision).


The "legal system" is concerned, above all else, that citizens regard its workings as legitimate, The appearance of inevitability promotes the sense of legitimacy, and any procedures that appear arbitrary interfere with it. Thus, the law would exclude all "hearsay within hearsay" before it would impose a three-level limit. Statistical evidence might show that three levels is optimal (or that some other cutoff is), but the provision's artificiality is patent. "I was treated unjustly because my evidence consisted of four levels of hearsay" sounds unjust because "arbitrary" limitations denude the law of evidence of the sense that it's natural.

That critique might sound good in theory, but I think it falls flat in practice. Hearsay is a rule with more than 30 exceptions, many of which seem quite technical and arbitrary. But I have seen no evidence that the public views legal systems that employ this sort of convoluted hearsay regime as less legitimate than legal systems that take a more naturalistic, Benthamite approach. In practice, even laypeople who are participating in trials don't really see the doctrine that lies beneath the surface of evidentiary rulings, so I doubt they form their judgments of the system's legitimacy based on such details.

Use A <-> (A ^ B ^ C) are reliable.

This threw me a little. Those ^ characters look a lot like the logical conjunction ("and") operator ∧, but they also look like the exclusive-or operator in C-like programming languages. For clarity, maybe spell this out in plain English: "Use A if and only if A and B and C are reliable."

Or actually use the proper symbols. ASCII is not a requirement for compatibility these days.
Thanks; fixed.
I can also nitpick that the parentheses are in the wrong place. It's not that the conjunction of A, B, and C must be reliable, but that each of them individually must be reliable (as is indicated in English by the use of ‘are’ instead of ‘is’). So first you should make it all symbols by introducing modal operators for the usage of a fact and for the reliability of a fact, then group properly to get U[A] ↔ (R[A] ∧ R[B] ∧ R[C]), not U[A] ↔ R[A ∧ B ∧ C].
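The operator clash is real, for what it's worth: in Python (as in C), `^` is exclusive-or, not conjunction.

```python
a, b = True, True
print(a ^ b)    # XOR: True ^ True is False
print(a and b)  # conjunction: True and True is True
```

So a reader parsing the original `^` symbols as a programmer would get exactly the wrong truth table, which is a good argument for the proper ∧ symbol or plain English.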

Why can't they just say that each additional layer makes it weaker evidence? For example, hearsay is 50% as strong as seeing it, double hearsay is 25% as strong, triple hearsay is 12.5% as strong, etc.


"Extraordinary claims require extraordinary evidence. But on uninteresting topics, surprising claims usually are surprising evidence; we rarely make claims without sufficient evidence. On interesting topics, however, we can have interests in exaggerating or downplaying our evidence, and our actions often deviate from our interests. In a simple model of noisy humans reporting on extraordinary evidence, we find that extraordinary claims from low noise people are extraordinary evidence, but such claims from high noise people are not; their claims are more likely unusual noise than unusual truth. When people are organized into a reporting chain, noise levels grow exponentially with chain length; long chains seem incapable of communicating extraordinary evidence."

Lawyers don't calculate probabilities and juries don't understand them, so exact numerical values are irrelevant.

Also, I would say people don't like probabilistic arguments used in justice. Punishing someone for a high probability that they did something feels very unfair. But in this universe, this is all we can have.

Would it feel fair to imprison someone because there is a 50% probability they did something wrong? How about 80%? 90%? 99%? People like to pretend that there are some magical values called "reasonable doubt" and "beyond a shadow of doubt" where probabilities stop being probabilities and become a separate magisterium.

We are not good at dealing with probabilities and what we intuitively seek is probably a social consensus -- if everyone important thinks the guy is guilty, then it is safe to punish him. We are trying to be more fair than this, and partially we are succeeding, and partially we are trying to do the impossible, because we can never get beyond probabilities. But there are huge inferential gaps that prevent explaining this in the court.

There is a very important sense in which a 99% chance and certainty are in separate magisteria.
Add, they don't are specialists like mathematicians or cognitive scientists(hard), and the ability to judge the culpability of someone is pretty limited, like non-lawyers.
This comment is very difficult to read. Suggestion: write it in proper Portuguese, and I will translate it into English. (I am guessing Portuguese from your name. Any other common language should work.)
That's not a great system. The numbers aren't necessarily going to be the same from case to case, and having actual numbers in real life that don't come from very careful evaluations can easily lead to the illusion of precision where it doesn't exist.

It seems it is ensuring that at each link no one has a motivation to report wrongly, rather than that no one would mess up.


to see if normal people can handle it

"Evidence? You can't handle the evidence!"

[This comment is no longer endorsed by its author]