Here is a simple puzzle:
A man takes a diagnostic test for a certain disease and the result is positive. The false positive rate for the test in this case is the same as the false negative rate, 0.001. The background prevalence of the disease is 1 in 10,000. What is the probability that he has the disease?
This problem is one of the simplest possible examples of a broad class of problems, known as hypothesis testing, concerned with defining a set of mutually contradictory statements about the world (hypotheses) and figuring out some kind of measure of the faith we can have in each of them.
It might be tempting to think that the desired probability is just 1 – (false positive rate), which would be 0.999. Be warned, however, that this is quite an infamous problem. In 1982, a study was published[1] in which 100 physicians had been asked to solve an equivalent question. All but 5 got the answer wrong by a factor of about 10. Maybe it’s a good idea, then, to go through the logic carefully.
Think about the following:
- What values should the correct answer depend on?
- Other than reducing the false-positive rate, what would increase the probability that a person receiving a positive test result would have the disease?
The correct calculation needs to find some kind of balance between the likelihood that the person has the disease (the frequency with which the disease is contracted by similar people) and the likelihood that the positive test result was a mistake (the false positive rate). We should see intuitively that if the prevalence of the disease is high, the probability that any particular positive test result is a true positive is higher than if the disease is extremely rare.
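A quick numerical check of this intuition follows. It is a minimal Python sketch: the function name `positive_predictive_value` is my own, and it anticipates the counting rule worked out below. The test is held fixed (false positive and false negative rates both 0.001) while the prevalence varies.

```python
def positive_predictive_value(prevalence, sensitivity, false_positive_rate):
    """Probability of disease given a positive result, for a single test."""
    true_pos = prevalence * sensitivity                  # sick and flagged
    false_pos = (1.0 - prevalence) * false_positive_rate  # clear but flagged
    return true_pos / (true_pos + false_pos)

# Same test as in the puzzle: sensitivity = 1 - false negative rate = 0.999.
for prevalence in (1e-4, 1e-3, 1e-2, 1e-1):
    ppv = positive_predictive_value(prevalence, sensitivity=0.999,
                                    false_positive_rate=0.001)
    print(f"prevalence {prevalence:g}: P(D|R+) = {ppv:.4f}")
```

With an unchanged test, the printed values climb from about 0.0908 at the puzzle’s prevalence, through 0.5, to about 0.9911 when one person in ten has the disease.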
The disease is contracted by 1 in 10,000 people, so, to keep things simple, imagine that we have tested 10,000 people. We then expect 1 true case of the disease and, with a false positive rate of 0.001, about 10 false positives, so our estimate drops from 0.999 to 1 in 11, or 0.09091. This answer is very close, but not precisely correct.
The count of true positives must be reduced to allow for the possibility of a false negative, and the false positives can only come from the 9,999 people who are clear. How do we encode these corrections in our calculation?
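One way to carry these corrections through, before any formal algebra, is to keep the 10,000-person picture and track expected (fractional) counts rather than whole people. The sketch below is a minimal Python illustration with variable names of my own choosing. It prints the rough 1-in-11 estimate next to the corrected figure.

```python
population = 10_000
prevalence = 1 / 10_000
false_positive_rate = 0.001
false_negative_rate = 0.001

sick = population * prevalence   # expected number with the disease: 1
clear = population - sick        # expected number who are clear: 9,999

# Rough estimate: 1 true case against about 10 false alarms.
rough = 1 / (1 + 10)

# Corrected counts: a sick person may test negative, and only the
# clear people can produce false positives.
true_positives = sick * (1 - false_negative_rate)   # 0.999
false_positives = clear * false_positive_rate       # 9.999
exact = true_positives / (true_positives + false_positives)

print(f"rough estimate: {rough:.5f}")   # rough estimate: 0.09091
print(f"corrected:      {exact:.5f}")   # corrected:      0.09083
```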
We require the conditional probability that the man has the disease, given that his test result was positive, P(D|R+). This is the number of ways of getting a positive result and having the disease, divided by the total number of ways of getting a positive test result:

$$P(D|R^{+}) = \frac{P(R^{+}D)}{P(R^{+}D) + P(R^{+}C)}$$
where D is the proposition that he has the disease, C means he is clear, R+ denotes the positive test result, and a product such as R+D means that both R+ and D are true.
If we ask for the probability of drawing the ace of hearts on the first draw from a deck of cards and the ace of spades on the second, without replacing the first card before the second draw, we have P(AH AS) = P(AH)P(AS|AH). The probability for the second draw is modified by what we know to have taken place on the first.
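The product rule in the card example can be checked by brute force. The minimal Python sketch below uses exact fractions. The deck encoding (`"AH"` for the ace of hearts, `"AS"` for the ace of spades, and so on) is an assumption made for illustration. It enumerates every ordered pair of distinct cards and compares the result with (1/52)(1/51).

```python
from fractions import Fraction
from itertools import permutations

ranks = "A23456789TJQK"
suits = "HSDC"  # hearts, spades, diamonds, clubs
deck = [r + s for r in ranks for s in suits]

# Every ordered pair of distinct cards: drawing twice without replacement.
draws = list(permutations(deck, 2))
matching = sum(1 for first, second in draws if first == "AH" and second == "AS")
brute_force = Fraction(matching, len(draws))

# Product rule: P(AH AS) = P(AH) P(AS|AH) = (1/52)(1/51)
product_rule = Fraction(1, 52) * Fraction(1, 51)

print(brute_force, product_rule)  # 1/2652 1/2652
```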
Similarly, P(R+D) = P(D)P(R+|D) and P(R+C) = P(C)P(R+|C), so

$$P(D|R^{+}) = \frac{P(D)\,P(R^{+}|D)}{P(D)\,P(R^{+}|D) + P(C)\,P(R^{+}|C)},$$

where:
- P(D) is the background rate for the disease.
- P(R+|D) is the true positive rate, equal to 1 – (false negative rate).
- P(C) = 1 – P(D).
- P(R+|C) is the false positive rate.
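Substituting the values from the puzzle gives

$$P(D|R^{+}) = \frac{0.0001 \times 0.999}{0.0001 \times 0.999 + 0.9999 \times 0.001} = \frac{999}{10998} \approx 0.0908.$$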
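The same substitution can be done in Python with exact rational arithmetic, so no rounding creeps in. This is a minimal sketch: the function name `posterior_disease` and its parameter names are my own. The body follows the formula term by term: P(D), P(R+|D) = 1 – (false negative rate), P(C) = 1 – P(D), and P(R+|C) = false positive rate.

```python
from fractions import Fraction

def posterior_disease(p_d, p_pos_given_d, p_pos_given_c):
    """P(D|R+) = P(D)P(R+|D) / [P(D)P(R+|D) + P(C)P(R+|C)]."""
    p_c = 1 - p_d                    # P(C) = 1 - P(D)
    joint_d = p_d * p_pos_given_d    # P(R+D)
    joint_c = p_c * p_pos_given_c    # P(R+C)
    return joint_d / (joint_d + joint_c)

p_d = Fraction(1, 10_000)              # background prevalence
p_pos_given_d = 1 - Fraction(1, 1000)  # 1 - false negative rate
p_pos_given_c = Fraction(1, 1000)      # false positive rate

answer = posterior_disease(p_d, p_pos_given_d, p_pos_given_c)
print(answer, f"{float(answer):.4f}")  # 111/1222 0.0908
```

The fraction 999/10998 reduces to 111/1222, which confirms the value of about 0.0908.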
The formula we have arrived at above by simple application of common sense is known as Bayes’ theorem. Many people assume the answer to be more like 0.999, but the correct answer is an order of magnitude smaller. As mentioned, most medical doctors also get questions like this wrong by about an order of magnitude. The correct answer to the question, about 0.0908, is called in medical science the positive predictive value of the test. More generally, it is known as the posterior probability.
Bayes’ theorem has been controversial throughout the development of statistical reasoning, with many authorities dismissing it as an absurdity. As a consequence, orthodox statistics still does not employ this vitally important technique today. Here, we have developed a special case of Bayes’ theorem by simple reasoning. In general, it follows from a straightforward rearrangement of the laws of probability (the product and sum rules). These laws are so simple that most authors treat them as axioms, but they can in fact be rigorously derived (with a little effort) from extremely simple and perfectly reasonable principles. It is one of my central beliefs about science that a logical calculus of probability can be achieved, and the highest-quality inferences extracted from data, only when Bayes’ theorem is accepted and applied wherever appropriate.
The general statement of Bayes’ theorem, for propositions A and B, is

$$P(A|BI) = \frac{P(A|I)\,P(B|AI)}{P(B|I)}$$
Here 'I' represents the background information: a set of statements concerning the scope of the problem that are considered true for the purposes of the calculation. In working through the medical testing problem above, I have omitted the 'I'. Whenever I write down a probability without including the 'I', this is to be recognized as shorthand: the 'I' is always really there, and the calculation makes no sense without it.
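In practice, the denominator P(B|I) is usually found with the sum rule. It is the sum of P(H|I)P(B|HI) over a set of mutually exclusive, exhaustive hypotheses H. The minimal Python sketch below applies this to any finite set of hypotheses. The function name `bayes_update` and its dictionary interface are my own illustrative choices, not a standard API. The background information I appears only implicitly, as the choice of hypotheses, priors and likelihoods. Run on the medical test with two hypotheses, D and C, it reproduces the earlier answer.

```python
def bayes_update(priors, likelihoods):
    """Posterior P(H | B I) for mutually exclusive, exhaustive hypotheses H.

    priors      -- dict mapping each hypothesis H to P(H | I)
    likelihoods -- dict mapping each hypothesis H to P(B | H I)
    """
    joint = {h: priors[h] * likelihoods[h] for h in priors}  # product rule
    evidence = sum(joint.values())                           # P(B | I), sum rule
    return {h: j / evidence for h, j in joint.items()}

# The medical test as two hypotheses, with B = R+ (a positive result).
posterior = bayes_update(
    priors={"D": 0.0001, "C": 0.9999},
    likelihoods={"D": 0.999, "C": 0.001},
)
print({h: round(p, 4) for h, p in posterior.items()})  # {'D': 0.0908, 'C': 0.9092}
```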
The error that leads many people to overestimate, by an order of magnitude, probabilities such as the one required in this question is known as the base-rate fallacy. In this case, the base rate, or expected incidence, of the disease has been ignored, leading to a calamitous miscalculation. The base-rate fallacy amounts to believing that P(A|B) = P(B|A). In the above calculation, this corresponds to saying that P(D|R+), which was desired, is the same as P(R+|D), the latter being equal to 1 – (false negative rate).
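Bayes’ theorem shows exactly what the fallacy throws away. Since P(D|R+) = P(R+|D) P(D)/P(R+), the two conditional probabilities differ by the factor P(D)/P(R+), and they are equal only when P(D) = P(R+). The short Python sketch below, with variable names of my own, computes both conditionals for the puzzle and checks that their ratio equals that factor.

```python
p_d = 0.0001            # P(D), background prevalence
p_pos_given_d = 0.999   # P(R+|D) = 1 - false negative rate
p_pos_given_c = 0.001   # P(R+|C) = false positive rate

p_pos = p_d * p_pos_given_d + (1 - p_d) * p_pos_given_c  # P(R+), sum rule
p_d_given_pos = p_d * p_pos_given_d / p_pos              # P(D|R+), Bayes

print(f"P(R+|D) = {p_pos_given_d:.4f}")
print(f"P(D|R+) = {p_d_given_pos:.4f}")
# The factor the base-rate fallacy ignores:
print(f"P(D|R+)/P(R+|D) = {p_d_given_pos / p_pos_given_d:.4f}, "
      f"P(D)/P(R+) = {p_d / p_pos:.4f}")
```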
In frequentist statistics, a probability is identified with a frequency. In this framework, therefore, it makes no sense to ask for the probability that a hypothesis H is true, since there is no sense in which a relative frequency for the truth of H can be obtained. As a measure of faith in the proposition H in light of data D (here D denotes the data, not the disease), the frequentist therefore habitually uses not P(H|D) but P(D|H), and so commits the base-rate fallacy as a matter of method.
In case it is still not completely clear that the base-rate fallacy is indeed a fallacy, let’s employ a thought experiment with an extreme case. (Extreme cases like this, while not necessarily realistic, let us obtain directly the outcome a theory ought to give and compare it with what the theory actually gives, something computer scientists call a 'sanity check'.) Imagine that the base rate is higher than the sensitivity of the test. For example, let the sensitivity be 98% (i.e. a 2% false negative rate) and let the background prevalence of the disease be 99%. Let A be the proposition that the subject has the disease and B the proposition that the test result is positive. Then P(B|A) is 0.98, and substituting this for P(A|B) gives an answer lower than P(A) = 0.99. A positive result from a high-quality test (98% sensitivity) would then make it less probable that the subject has the disease than it was before the result was known, which is absurd.
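The sanity check can be run numerically. The text does not give a false positive rate for this extreme case, so the values below (0.02, 0.10 and 0.50) are hypothetical, chosen only for illustration. In this minimal Python sketch, the fallacious answer (0.98) falls below the prior of 0.99. The Bayesian posterior rises above the prior for every one of these rates. In general, it does so whenever the false positive rate is below the sensitivity.

```python
prior = 0.99         # P(A): background prevalence of the disease
sensitivity = 0.98   # P(B|A): probability of a positive result given disease

# The base-rate fallacy: report P(B|A) as if it were P(A|B).
fallacy = sensitivity

def posterior(p_a, p_b_given_a, false_positive_rate):
    """P(A|B) from Bayes' theorem for a single positive result."""
    joint_a = p_a * p_b_given_a
    joint_not_a = (1 - p_a) * false_positive_rate
    return joint_a / (joint_a + joint_not_a)

# Hypothetical false positive rates; none is given in the text.
for fpr in (0.02, 0.10, 0.50):
    post = posterior(prior, sensitivity, fpr)
    print(f"FPR {fpr:.2f}: fallacy {fallacy:.4f}, Bayes {post:.4f}, prior {prior:.2f}")
```

If the false positive rate equalled the sensitivity, the test would carry no information, and the posterior would simply equal the prior.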
[1] Eddy, D. M. (1982). Probabilistic reasoning in clinical medicine: Problems and opportunities. In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under uncertainty: Heuristics and biases (pp. 249–267). Cambridge, England: Cambridge University Press. (In this study, 95 out of 100 physicians answered between 0.7 and 0.8 to a similar question, to which the correct answer was 0.078.)