# On the Base Rate Fallacy, Redux

Tagged: `COVID` / `MathInTheNews` / `PharmaAndBiotech` / `Sadness` / `SomebodyAskedMe` / `Statistics`

Will we ever *not* be trapped by the base rate fallacy?

## The base rate fallacy

There’s this fallacy, a common bug in the way people think, called the base rate
fallacy. ^{[1]}

Consider Wikipedia’s illustrative example:

- If you just look in the black circle on the left, you would conclude more of the hospitalized people were vaccinated than unvaccinated. Naïve observers conclude, incorrectly, that vaccines don’t work.
- But if you consider the *base rates*, i.e., the fraction of people vaccinated versus unvaccinated as shown on the right, you see that many more people are vaccinated in the first place. In fact, a far *smaller* fraction of the vaccinated end up hospitalized than the unvaccinated. A less naïve observer concludes, correctly, that vaccines do, indeed, work.

The general case is that when you measure some differential property (hospitalization),
it’s important to consider the *base rate*, or how often the classes of examples occur in
the population (more vaccinated than unvaccinated).

Alas, to me that’s mostly word salad. (As with many things.) Fortunately, there’s a Bayesian way to look at this:

- Let $H$ = hospitalization, $V$ = vaccination, and $\sim V$ = no vaccination.
- Naïve observers look *only in the hospital, not in the population,* and see the probability of being vaccinated or not, given that someone is in the hospital.
  - That is, they see $\Pr(V \vert H) \gt \Pr(\sim V \vert H)$ and conclude vaccination does not work.
- Less naïve observers note that they want to see the Bayesian reverse. That’s the probability of being hospitalized, given vaccination status:

  \(\left\{ \begin{align*} \Pr(H \vert V) &= \frac{\Pr(V \vert H) \Pr(H) }{ \Pr(V) } \\ \Pr(H \vert \sim V) &= \frac{\Pr(\sim V \vert H) \Pr(H) }{ \Pr(\sim V) } \end{align*} \right.\)
  - In these 2 equations, $\Pr(H)$ is the same in both cases: it is the overall probability of being hospitalized, not conditioned on vaccination status.
  - The denominators, $\Pr(V)$ and $\Pr(\sim V)$, are the *base rates:* the probability that an individual chosen at random from the population is vaccinated or not.
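
One way to see why the base rates do all the work is to divide the first equation by the second. This is just algebra on the two equations above, nothing new: $\Pr(H)$ cancels, and what remains is

\(\frac{\Pr(H \vert V)}{\Pr(H \vert \sim V)} = \frac{\Pr(V \vert H)}{\Pr(\sim V \vert H)} \times \frac{\Pr(\sim V)}{\Pr(V)}\)

The first factor on the right is what the naïve observer sees in the hospital. The second is the base rate correction. When the vaccinated greatly outnumber the unvaccinated, that second factor can be small enough to turn a hospital ratio above 1 into a risk ratio below 1.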

It’s that normalization by the base rate that makes all the difference! It transforms the thing you can observe (what percent of hospitalized people are vaccinated) into the thing you want to know (what percent of vaccinated people are hospitalized). The former is observable, but nonsensical input to policy. The latter is the only thing that matters.

(See the Addendum below for the particulars of the Wikipedia example pictured above.)

## Previously, on some crummy little blog that nobody reads…

Now, most NTs are gonna see those equations and say “oh, just another nerd thing I can skip”… again. I mean, it was just some Wikipedia example, right?

Well… previously on this CLBTNR, we documented
real-life examples of this, in terms of Simpson’s paradox ^{[3]},
the base rate fallacy, and Bayesian thinking in COVID-19 hospitalization data in
mid-2021.

There we worked through the real-life example presented by the Israeli hospitalization data. In a
population that’s about 20% unvaccinated versus about 80% vaccinated, it would be
*astounding* if most of the hospitalized people weren’t vaccinated. That’s because there
are 4x as many vaccinated as unvaccinated. It turned out that the vaccinated were a bit
*more than 3x less likely to be hospitalized* than the unvaccinated, which is what
mattered!
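
For the skeptical, here is a minimal Python sketch of that claim. It takes the round figures from the paragraph above (80% vaccinated, vaccinated 3x less likely to be hospitalized) as assumptions; they are not the actual Israeli data, and the function name is made up for illustration.

```python
def vaccinated_share_of_hospitalized(p_vaccinated: float, risk_ratio: float) -> float:
    """Pr(V | H), given the base rate Pr(V) and risk_ratio = Pr(H | ~V) / Pr(H | V)."""
    if not 0.0 < p_vaccinated < 1.0:
        raise ValueError("p_vaccinated must be strictly between 0 and 1")
    if risk_ratio <= 0.0:
        raise ValueError("risk_ratio must be positive")
    p_unvaccinated = 1.0 - p_vaccinated
    # Bayes' rule with Pr(H | V) factored out of numerator and denominator:
    # Pr(V | H) = Pr(V) / (Pr(V) + risk_ratio * Pr(~V))
    return p_vaccinated / (p_vaccinated + risk_ratio * p_unvaccinated)


# Round figures from the text (assumed for illustration, not real data).
share = vaccinated_share_of_hospitalized(p_vaccinated=0.80, risk_ratio=3.0)
print(f"Vaccinated share of hospitalizations: {share:.1%}")
```

With these inputs, roughly 57% of the hospitalized are vaccinated even though the vaccine cuts risk threefold. The share only drops to half when the risk ratio reaches 4, matching the 4-to-1 ratio of vaccinated to unvaccinated.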

So if you want to see actual combat usage of these ideas, the blog post linked above will walk you through the process using Israel hospitalization data as of mid-2021.

If you want to conclude something about vaccine efficacy, you *have* to do the Bayesian
calculation above. (And
stratify by age groups,
with
confidence intervals,
as we also showed in that same blog post.)

## SMBC really gets it

The always-excellent web cartoon *Saturday Morning Breakfast Cereal* (SMBC) by Zach
Weinersmith is on the case. He illustrates the base rate fallacy via the example of *base
jumping* ^{[2]}, where it allows you to assert base jumping is
perfectly safe because more people die of old age than base jumping!

Of course this is nonsense: almost all the people dying of old age *were not base
jumpers*, and some large-ish fraction of those who were base jumpers *died of base
jumping, not old age!*

If you ignore the base rates, you conclude incorrectly that base jumping is perfectly safe, in fact safer than living to a ripe old age.

There is a technical term for this sort of thing: “fatal nonsense”.

You probably want to avoid it.

## The Weekend Conclusion

Look, even the *cartoonists* get it nowadays. (Albeit an excellent cartoonist.)

Isn’t it time we all “got it”, too?

(If you find all this confusing, Gary Cornell wrote an explainer for general audiences,
about a year and a half ago, published in *Slate.* ^{[4]}
Recommended.)

## Addendum 2023-Feb-16: Wikipedia example, math details

Somebody asked me to work through the details of the Wikipedia example pictured above. Ok, sure, let’s do that.

First step is just to count the dots:

- The number of unvaccinated people (green dots) is $N_{\sim V} = 10$.
- The number of unvaccinated people who are also hospitalized (green dots inside black circle) is $N_{\sim VH} = 5$.
- The number of vaccinated people (red dots) is $N_{V} = 100$.
- The number of vaccinated people who are also hospitalized (red dots inside black circle) is $N_{VH} = 10$.

The conditional probabilities about which our putative naïve observer is making such
a fuss are:

\(\left\{
\begin{align*}
\Pr(V \vert H) &= \frac{10}{5 + 10} = \frac{2}{3} \approx 66.7\% \\
\Pr(\sim V \vert H) &= \frac{5}{5 + 10} = \frac{1}{3} \approx 33.3\%
\end{align*}
\right.\)

It looks (to the naïve) as though the vaccinated are twice as likely to be hospitalized?! Let’s do better than that blunder!

The overall probabilities of being hospitalized, of being vaccinated, and of being
unvaccinated are:

\(\left\{
\begin{align*}
\Pr(H) &= \frac{5 + 10}{10 + 100} = \frac{15}{110} = \frac{3}{22} \approx 13.6\% \\
\Pr(V) &= \frac{100}{10 + 100} = \frac{100}{110} = \frac{10}{11} \approx 90.9\% \\
\Pr(\sim V) &= \frac{10}{10 + 100} = \frac{10}{110} = \frac{1}{11} \approx 9.1\%
\end{align*}
\right.\)

Now let’s work out the conditional probability of being hospitalized given vaccinated, and
hospitalized given unvaccinated:

\(\left\{
\begin{align*}
\Pr(H \vert V) &= \frac{\Pr(V \vert H) \Pr(H)}{\Pr(V)} = \frac{(2/3) (3/22)}{(10/11)} = \frac{1}{10} = 10\% \\
\Pr(H \vert \sim V) &= \frac{\Pr(\sim V \vert H ) \Pr(H)}{\Pr(\sim V)} = \frac{(1/3) (3/22)}{(1/11)} = \frac{1}{2} = 50\%
\end{align*}
\right.\)

We can verify this directly, using our counts of dots:

\(\left\{
\begin{align*}
\Pr(H \vert V) &= \frac{N_{VH}}{N_V} = \frac{10}{100} = 10\% \\
\Pr(H \vert \sim V) &= \frac{N_{\sim VH}}{N_{\sim V}} = \frac{5}{10} = 50\%
\end{align*}
\right.\)

The right of it, finally: *vaccinated are 5x less likely to be hospitalized (10% vs 50%)!*
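
If you would rather let a machine do the arithmetic, the whole addendum can be replayed in a few lines of Python. This is only a sketch: the variable names are mine, the counts are the dot counts listed above, and exact fractions are used so the Bayes route and the direct count can be compared without rounding.

```python
from fractions import Fraction

# Dot counts from the Wikipedia figure, as listed above.
n_unvax, n_unvax_hosp = 10, 5
n_vax, n_vax_hosp = 100, 10

n_total = n_unvax + n_vax
n_hosp = n_unvax_hosp + n_vax_hosp

# What the naive observer sees inside the hospital.
p_v_given_h = Fraction(n_vax_hosp, n_hosp)
p_nv_given_h = Fraction(n_unvax_hosp, n_hosp)

# Overall hospitalization probability and the two base rates.
p_h = Fraction(n_hosp, n_total)
p_v = Fraction(n_vax, n_total)
p_nv = 1 - p_v

# Bayes' rule: flip the conditioning around.
p_h_given_v = p_v_given_h * p_h / p_v
p_h_given_nv = p_nv_given_h * p_h / p_nv

# The Bayes route must agree with counting dots directly.
assert p_h_given_v == Fraction(n_vax_hosp, n_vax)
assert p_h_given_nv == Fraction(n_unvax_hosp, n_unvax)

relative_risk = p_h_given_nv / p_h_given_v
print(f"Pr(H|V) = {p_h_given_v}, Pr(H|~V) = {p_h_given_nv}, ratio = {relative_risk}")
```

The two internal `assert` lines are the point: they confirm that normalizing by the base rates gives the same answer as counting hospitalized dots within each group.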

**Moral:** Leaping to public policy choices from ignorantly measuring the wrong thing is a
very bad idea. Here, it would have led to *exactly the opposite* policy for saving
lives.

## Notes & References

1: Wikipedia contributors, “Base rate fallacy”, *Wikipedia*, retrieved 2023-Feb-14.

2: Z Weinersmith, “Weekend activity: Murdering people with statistics”, *Saturday Morning Breakfast Cereal*, 2022-Dec-24.

3: Wikipedia contributors, “Simpson’s paradox”, *Wikipedia*, retrieved 2023-Feb-14.

4: G Cornell, “What Does It Really Mean When a Headline Says ‘75 Percent of Cases Occurred in Vaccinated People’?”, *Slate*, 2021-Aug-04.

*Published* Tue 2023-Feb-14
