Why Do Math?

The Interrogator’s Fallacy

MATHEMATICAL RECREATIONS by Ian Stewart

Scientific American, September 1996

Mathematics is invading the courtroom. Juries are routinely instructed to convict the accused of a crime provided they are sure “beyond a reasonable doubt” of guilt. This instruction is qualitative—it depends on what a juror considers to be reasonable. A future civilization might attempt to quantify guilt by replacing the jury with a court computer that weighs the evidence and calculates a probability of guilt. But today we do not have court computers, so juries are forced to grapple with probability theory.

One reason is the increasing use of DNA evidence. The science of DNA profiling is relatively new, so the interpretation of DNA evidence relies on assessing probabilities. Similar problems could have arisen when conventional fingerprinting was first introduced, but lawyers were presumably less sophisticated in those days; at any rate, fingerprint evidence is no longer contested on probabilistic grounds.

Robert A. J. Matthews has pointed out that a far more traditional source of evidence in court cases ought to be analyzed using probability theory—namely, confessions. To Tomás de Torquemada, the first Spanish grand inquisitor, a confession was complete proof of guilt—even if the confession was extracted under duress, as it generally was. One of Matthews’s most surprising conclusions, which he calls the “interrogator’s fallacy,” is that there are circumstances under which a confession adds weight to the view that the accused is innocent rather than guilty.

Matthews’s ideas offer a reason for distrusting confessions in trials of terrorists—who are fortified against interrogation—unless corroborated by other evidence. Modern legal practice is quite skeptical about confessions known to have been obtained under duress. In the U.K. a series of high-profile terrorism convictions, hinging on confessional evidence, have been overturned because of doubts that the confessions were genuine.

The main mathematical idea required to explain Matthews’s conclusion is that of conditional probability. Suppose Mr. and Mrs. Smith tell you they have two children, one of whom is a girl. What is the probability that the other is a girl?

The reflex response is that the other child is either a boy or a girl, with a probability of 1/2 for either. There are, however, four possible gender distributions: BB, BG, GB and GG, where B and G denote “boy” and “girl,” respectively, and the letters are arranged in order of birth. Each combination is equally likely and so has a probability of 1/4. In exactly three cases, BG, GB and GG, the family includes a girl; in just one of this group, GG, the other child is also a girl. So the probability of two girls, given that there is at least one girl, is actually 1/3.

Suppose that instead the Smiths tell you that their elder child is a girl. What is the probability that the younger is a girl, too? This time the possible gender distributions are GB and GG, and the younger is a girl only for GG. So the probability becomes 1/2.

Probabilities of this type are said to be conditional: each is the probability of some event occurring given that some other event has definitely occurred. As the Smiths’ children show, the use of conditional probabilities involves specifying a context—which can have a strong effect on the computed probability.
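The two answers for the Smiths’ children can be checked by brute-force counting. The following is a minimal sketch, assuming (as the text does) that the four birth orders are equally likely; the helper name `conditional` and the predicate names are illustrative choices, not anything taken from the article.

```python
from fractions import Fraction
from itertools import product

# The four equally likely birth orders, first letter = first-born: BB, BG, GB, GG.
families = ["".join(pair) for pair in product("BG", repeat=2)]

def conditional(event, given):
    """P(event | given), found by counting equally likely outcomes."""
    context = [f for f in families if given(f)]   # outcomes consistent with what we know
    hits = [f for f in context if event(f)]       # those in which the event also holds
    return Fraction(len(hits), len(context))

def two_girls(f):
    return f == "GG"

def at_least_one_girl(f):
    return "G" in f

def first_born_is_girl(f):
    return f[0] == "G"

p_first = conditional(two_girls, at_least_one_girl)    # first scenario
p_second = conditional(two_girls, first_born_is_girl)  # second scenario
print(p_first, p_second)  # prints: 1/3 1/2
```

Changing only the `given` predicate changes the answer, which is the sense in which the context determines the probability.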

To see how subtle such issues are, suppose that one day you see the Smiths in their garden. One child is clearly a girl; the other is partially hidden by the family dog, so its gender is uncertain. What is the probability that the Smiths have two girls?

You could argue that the question is just like the first scenario above, giving a probability of 1/3. Or you could argue that the information presented to you is “the child not playing with the dog is a girl.” Like the second scenario, this statement distinguishes one child from the other, so the answer is 1/2. Mr. and Mrs. Smith, who know that the child playing with the dog is William, would say that the probability of two girls is 0. So who is right?

The answer depends on a choice of context. Have you sampled randomly from many different families in which either child might be the one playing with the dog? Or from families in which only one child ever plays with the dog? Or are you looking only at a specific family, in which case probabilities are the wrong model altogether?

The interpretation of statistical data requires an understanding of the mathematics of probability and the context in which it is being applied. Throughout the ages lawyers have shamelessly abused jurors’ lack of mathematical sophistication. One example in DNA profiling—now well understood by the courts—is the “prosecutor’s fallacy.” DNA profiling was invented in 1985 by Alec J. Jeffreys of the University of Leicester and draws on so-called variable number of tandem repeat (VNTR) regions in the human genome. In each such region a particular DNA sequence is repeated many times. VNTR sequences are widely believed to identify individuals uniquely.

For use in courts, scientists use standard techniques from molecular biology to look for matches between several different VNTR regions in two samples of DNA—one related to the crime, the other taken from the suspect. Sufficiently many matches should provide overwhelming statistical evidence that both samples came from the same person.

The prosecutor’s fallacy refers to a confusion of two different probabilities. The “match probability” answers the question “What is the probability that an individual’s DNA will match the crime sample, given that he or she is innocent?” But the question that should concern the court is “What is the probability that the suspect is innocent, given a DNA match?” The two queries can have wildly different answers.

The source of the difference is, again, context. In the first case, the individual is conceptually being placed in a large population chosen for scientific convenience. In the second case, he or she is being placed in a less well defined but more relevant population—those people who might reasonably have committed the crime.
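A small calculation shows how far apart the two probabilities can be. This is only a sketch under loud assumptions that are not in the article: exactly one culprit among the plausible suspects, every plausible suspect equally likely to be guilty before the DNA test, a culprit who always matches, and a purely hypothetical match probability of one in a million. The function name `p_innocent_given_match` is likewise invented for illustration.

```python
from fractions import Fraction

def p_innocent_given_match(match_probability, plausible_suspects):
    """P(innocent | match) under the illustrative assumptions:
    one culprit, equal priors over the plausible suspects,
    and a guilty person always matching the crime sample."""
    prior_guilty = Fraction(1, plausible_suspects)
    prior_innocent = 1 - prior_guilty
    # Total probability of a match: guilty (always matches) or innocent (chance match).
    p_match = prior_guilty + prior_innocent * match_probability
    return prior_innocent * match_probability / p_match

match_probability = Fraction(1, 1_000_000)  # hypothetical P(match | innocent)
for suspects in (10, 10_000, 1_000_000):    # hypothetical sizes of the relevant population
    posterior = p_innocent_given_match(match_probability, suspects)
    print(suspects, float(posterior))
```

Under these assumptions, with a million plausible suspects the probability of innocence given a match is close to 1/2, even though the match probability itself is one in a million. The answer to the court’s question depends heavily on the size of the relevant population, which the match probability alone says nothing about.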

The use of conditional probabilities in such circumstances is governed by a theorem credited to the Englishman Thomas Bayes. Let A and C be events, with probabilities P(A) and P(C), respectively. Write P(A|C) for the probability that A happens, given that C has definitely occurred. Let A&C denote the event “both A and C have happened.” Then Bayes’s theorem tells us that P(A|C) = P(A&C) / P(C).

For example, in the case of the Smith children (first scenario), we have

C = at least one child is a girl
A = the other child is a girl
P(C) = 3/4
P(A&C) = 1/4

because A&C is also the event “both children are girls,” or GG. Then Bayes’s theorem says the probability that the other child is a girl, given that one of them is a girl, is (1/4)/(3/4) = 1/3, the value we arrived at earlier. Similarly, with the second scenario, Bayes’s theorem gives the answer 1/2, also as before.

For the application to confessional evidence, Matthews designates

A = the accused is guilty
C = he or she has confessed

Derivation of Matthews’s Formula

By Bayes’s theorem we have
      P(A|C) = P(A&C)/P(C)
and similarly
      P(C|A) = P(C&A)/P(A).
But C&A = A&C, so we can combine the two equations to get
      P(A|C) = P(C|A)P(A)/P(C).
Moreover,
      P(C) = P(C|A)P(A) + P(C|A')P(A')
because either A or A' must happen, but not both. Finally, P(A') = 1 – P(A). Putting all this together, we get
      P(A|C) = P(A)/[P(A) + P(C|A')P(A')/P(C|A)].
If we replace P(A) by p and P(C|A')/P(C|A) by r, we get
      P(A|C) = p/[p + r(1 – p)].

As is normal in Bayesian reasoning, he takes P(A) to be the “prior probability” that the accused is guilty—that is, the probability of guilt as assessed from evidence obtained before the confession. Let A' denote the negation of event A, namely, “the accused is innocent.”

Then Matthews derives the formula P(A|C) = p/[p + r(1 – p)], where to keep the algebra simple we write p = P(A) and r = P(C|A')/P(C|A), which we call the confession ratio. Here P(C|A') is the probability of an innocent person confessing, and P(C|A) is that of a guilty person confessing. Therefore, the confession ratio is less than 1 if an innocent person is less likely to confess than a guilty person.

If the confession is to increase the probability of guilt, then we want P(A|C) to be larger than P(A), which equals p. Thus, we need p/[p + r(1 – p)] > p, which some simple algebra boils down to r < 1. That is, the existence of a confession increases the probability of guilt if and only if an innocent person is less likely to confess than a guilty one.
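The algebra can be spelled out for a prior strictly between 0 and 1: dividing p/[p + r(1 – p)] > p by p gives 1 > p + r(1 – p), that is, 1 – p > r(1 – p), and dividing by the positive quantity 1 – p leaves r < 1. The sketch below evaluates the formula for a few confession ratios; the prior of 1/2 and the three values of r are hypothetical numbers chosen only to show the three possible outcomes, and the function name `posterior_guilt` is an illustrative choice.

```python
from fractions import Fraction

def posterior_guilt(p, r):
    """Matthews's formula P(A|C) = p / [p + r(1 - p)],
    where p is the prior probability of guilt and
    r = P(C|A') / P(C|A) is the confession ratio."""
    return p / (p + r * (1 - p))

p = Fraction(1, 2)  # hypothetical prior probability of guilt

# r < 1: innocent less likely to confess; r = 1: equally likely; r > 1: more likely.
for r in (Fraction(1, 10), Fraction(1), Fraction(3)):
    post = posterior_guilt(p, r)
    if post > p:
        effect = "raises"
    elif post < p:
        effect = "lowers"
    else:
        effect = "leaves unchanged"
    print(f"r = {r}: P(A|C) = {post}, confession {effect} the probability of guilt")
```

With these inputs the posterior is 10/11 for r = 1/10, stays at 1/2 for r = 1, and drops to 1/4 for r = 3.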

The implication is that sometimes the existence of a confession may reduce the probability of guilt. In fact, this will occur whenever an innocent person is more likely to confess than a guilty one. Such a situation might arise in terrorist cases. Psychological profiles indicate that individuals who are more suggestible, or more compliant, are more likely to confess under interrogation. These descriptions seldom apply to a hardened terrorist, who will be trained to resist interrogation techniques. It is plausible that this is what happened when the convictions that have now been reversed in U.K. courts were secured.

Bayesian analysis also demonstrates some other counterintuitive features of evidence. For example, suppose that initial evidence of guilt (X) is followed by supplementary evidence of guilt (Y). A jury will almost always assume that the probability of guilt has now gone up. But probabilities of guilt do not just accumulate in this manner. In fact, the new evidence increases the probability of guilt only if the probability of the new evidence given the old evidence and the accused being guilty exceeds the probability of the new evidence given the old evidence and the accused being innocent.
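The condition on supplementary evidence can be illustrated with a single Bayesian update. This is a sketch with made-up numbers: the probability of guilt after X (7/10) and the likelihoods of Y are hypothetical, and the function name `update` is an illustrative choice. Both likelihoods are understood to be conditional on the earlier evidence X.

```python
from fractions import Fraction

def update(prior, p_y_if_guilty, p_y_if_innocent):
    """Probability of guilt after new evidence Y.
    prior           : P(guilty | X), the probability of guilt after evidence X
    p_y_if_guilty   : P(Y | X and guilty)
    p_y_if_innocent : P(Y | X and innocent)"""
    numerator = p_y_if_guilty * prior
    return numerator / (numerator + p_y_if_innocent * (1 - prior))

after_x = Fraction(7, 10)  # hypothetical probability of guilt once X is known

# Y more likely if the accused is guilty: the probability of guilt goes up.
raised = update(after_x, Fraction(4, 5), Fraction(1, 5))
# Y more likely if the accused is innocent: the probability of guilt goes down.
lowered = update(after_x, Fraction(1, 5), Fraction(4, 5))
# Y equally likely either way: the probability of guilt does not move.
unchanged = update(after_x, Fraction(1, 2), Fraction(1, 2))

print(raised, lowered, unchanged)  # prints: 28/31 7/19 7/10
```

Evidence that is labelled “evidence of guilt” therefore moves the probability upward only when it is more probable under guilt than under innocence, given what is already known.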

When the prosecution case depends on a confession, two quite different things may happen. In the first, take X to be the confession and Y the evidence found as a result of the confession—for example, discovery of the body where the accused said it would be. Because an innocent person is unlikely to provide such information, Bayesian considerations show that the probability of guilt is increased. So corroborative evidence that depends on the confession being genuine increases the likelihood of guilt.

On the other hand, X might be the discovery of the body and Y a subsequent confession. In this case, the evidence provided by the body does not depend on the confession and so cannot corroborate it. Nevertheless, there is no “body-finder’s fallacy” like the interrogator’s fallacy, because it is hard to argue that an innocent person is more likely to confess than a guilty one just because they know that a body has been discovered.

Of course, it would be silly to suggest that every potential juror should take (and pass) a course in Bayesian inference, but it seems entirely feasible that a judge could direct them on some simple principles. Moreover, the same ideas apply to DNA profiling but in circumstances that are much more intuitive for jurors. A quick review of the interrogator’s fallacy could be an excellent way to discourage lawyers from making fallacious claims about DNA evidence.