Probability · June 10, 2026

The Pen That Cried Wolf

Intermediate Probability
A counterfeit-detector pen is 99% accurate at catching fakes and almost never misfires. It just flagged a customer's $100 bill. So why is the bill still almost certainly real?

It is a slow Tuesday at Bridgeway Hardware, and Maya is working the register. A customer hands her a crisp $100 bill for a cordless drill. Following store policy, she swipes it with the counterfeit-detection pen — the kind that leaves a pale amber mark on genuine currency and a dark stain on fakes.

The mark comes up dark.

The customer goes pale. Maya hesitates. The pen, according to its manufacturer, is genuinely good at its job:

  • When a bill really is counterfeit, the pen flags it 99% of the time.
  • When a bill is perfectly legitimate, the pen mistakenly stains it only 2% of the time.

Those sound like the numbers of a trustworthy device. But here is the piece everyone forgets: counterfeits are rare. In the kind of cash a suburban hardware store sees, only about 1 bill in every 10,000 is actually fake.

Maya looks at the dark stain and the nervous customer and asks herself the only question that matters:

Given that the pen reacted, what is the probability that this $100 bill is actually counterfeit?

Take a guess before you reach for a calculator. Most people — including most cashiers, and a fair number of statisticians caught off guard — guess somewhere near 99%. The truth is startlingly far from that.

💡 Hint

The pen's 99% accuracy answers the question "if the bill is fake, will the pen react?"

But that is the reverse of what Maya needs.

Try imagining a concrete batch of 1,000,000 bills and count two separate groups: the fakes the pen correctly flags, and the genuine bills it flags by mistake. Because genuine bills outnumber fakes about 10,000 to 1, even a tiny 2% error rate produces a surprising number of false alarms. Compare the two counts.


Solution

The intuition that the bill is "99% likely fake" confuses two different questions:

  • P(pen reacts | bill is fake) = 99% — this is what the manufacturer advertises.
  • P(bill is fake | pen reacts) = ? — this is what Maya actually needs.

These are not the same number, and conflating them is the famous base-rate fallacy. To find the second, the cleanest path is to imagine a large, concrete population.

Step 1 — Build a population of 1,000,000 bills

Since fakes occur at a rate of 1 in 10,000:

  • Counterfeit bills: 1,000,000 ÷ 10,000 = 100
  • Genuine bills: 1,000,000 − 100 = 999,900

Step 2 — Count who makes the pen react

  • Of the 100 true fakes, the pen catches 99%: 100 × 0.99 = 99 reactions (true positives).
  • Of the 999,900 genuine bills, the pen wrongly stains 2%: 999,900 × 0.02 = 19,998 reactions (false positives).

Total bills that make the pen react: 99 + 19,998 = 20,097.

Step 3 — Keep only the bills that reacted

The pen reacted, so Maya's bill is one of those 20,097. The fraction of those that are truly fake is:

P(fake | reacted) = 99 ⁄ 20,097 ≈ 0.0049 = about 0.5%
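
For readers who like to check the arithmetic, the three steps above can be sketched in a few lines of Python. This is only an illustration using the puzzle's own figures; the variable names are made up for this sketch, and whole-number arithmetic is used so every count comes out exact.

```python
# Natural-frequency count using the puzzle's figures.
# All names here are illustrative, not part of the original puzzle.
POPULATION = 1_000_000
FAKES_ONE_IN = 10_000        # 1 bill in 10,000 is counterfeit
HIT_PERCENT = 99             # pen flags 99% of fakes
FALSE_ALARM_PERCENT = 2      # pen flags 2% of genuine bills

# Step 1: split the population into fakes and genuine bills.
fakes = POPULATION // FAKES_ONE_IN
genuine = POPULATION - fakes

# Step 2: count the bills that make the pen react.
true_positives = fakes * HIT_PERCENT // 100
false_positives = genuine * FALSE_ALARM_PERCENT // 100
flagged = true_positives + false_positives

# Step 3: keep only the flagged bills and ask how many are fake.
p_fake_given_flag = true_positives / flagged
print(f"{true_positives} of {flagged} flagged bills are fake "
      f"({p_fake_given_flag:.2%})")
# prints: 99 of 20097 flagged bills are fake (0.49%)
```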

The same result drops straight out of Bayes' theorem:

P(F | R) = [ P(R | F) · P(F) ] ⁄ [ P(R | F) · P(F) + P(R | not F) · P(not F) ]
= (0.99 × 0.0001) ⁄ (0.99 × 0.0001 + 0.02 × 0.9999)
= 0.000099 ⁄ 0.020097 ≈ 0.49%
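
The formula translates almost word for word into code. The function below is a minimal sketch; the name posterior and its parameter names are assumptions chosen for this example, and it handles only the two-outcome case (fake or not fake) used in this puzzle.

```python
def posterior(prior: float, hit_rate: float, false_alarm_rate: float) -> float:
    """Return P(F | R) for a two-outcome Bayes update.

    prior            -- P(F), the share of bills that are fake
    hit_rate         -- P(R | F), chance the pen reacts to a fake
    false_alarm_rate -- P(R | not F), chance it reacts to a genuine bill
    (Illustrative names; not taken from the puzzle itself.)
    """
    true_alarm = hit_rate * prior                    # P(R | F) * P(F)
    false_alarm = false_alarm_rate * (1.0 - prior)   # P(R | not F) * P(not F)
    return true_alarm / (true_alarm + false_alarm)


p = posterior(prior=1 / 10_000, hit_rate=0.99, false_alarm_rate=0.02)
print(f"P(fake | reacted) = {p:.4f}")   # prints: P(fake | reacted) = 0.0049
```

Keeping the numerator and the false-alarm term as separate variables mirrors the two groups counted in Step 2, which makes it easy to see that the denominator is simply "all the ways the pen can react."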

What this means

Despite the alarming dark stain, the bill is roughly 99.5% likely to be genuine. Out of every 203 bills that trigger this pen, only about one is a real counterfeit. The flood of false positives from the enormous pool of legitimate bills completely drowns out the handful of true catches.
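
If the counting argument still feels too tidy, a rough simulation can serve as a sanity check. The sketch below assumes each bill is independently fake with probability 1 in 10,000 and that the pen behaves exactly as the manufacturer's figures say; the function name simulate and the choice of two million bills are arbitrary choices for illustration. Because fakes are so rare, the simulated share wobbles from run to run, but it should land near 0.5%.

```python
import random


def simulate(n_bills: int, seed: int = 0) -> tuple[int, int]:
    """Swipe n_bills random bills; return (true positives, false positives).

    Hypothetical helper for this illustration only.
    """
    rng = random.Random(seed)   # seeded so repeated runs give the same counts
    true_pos = false_pos = 0
    for _ in range(n_bills):
        is_fake = rng.random() < 1 / 10_000
        # The pen reacts to 99% of fakes and 2% of genuine bills.
        reacts = rng.random() < (0.99 if is_fake else 0.02)
        if reacts and is_fake:
            true_pos += 1
        elif reacts:
            false_pos += 1
    return true_pos, false_pos


tp, fp = simulate(2_000_000)
share_fake = tp / (tp + fp)
print(f"flagged: {tp + fp:,}  of which fake: {tp:,}  ({share_fake:.2%})")
```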

This is not a flaw in the math — it is exactly why a single pen test is treated as a prompt to look closer, never a verdict. It is the very same logic that governs medical screening (a rare disease plus an imperfect test yields mostly false alarms), credit-card fraud flags, and email spam filters. The rarer the thing you are hunting for, the more your base rate dominates, no matter how sharp your detector.
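
To see how strongly the base rate drives the answer, the sketch below holds the pen's two error rates fixed and varies only how common fakes are. Only the 1-in-10,000 figure comes from the puzzle; the other base rates are hypothetical values picked for comparison, and the function name posterior is an assumption of this example.

```python
def posterior(prior, hit_rate=0.99, false_alarm_rate=0.02):
    """P(fake | pen reacts), with the puzzle's pen as the default."""
    true_alarm = hit_rate * prior
    false_alarm = false_alarm_rate * (1.0 - prior)
    return true_alarm / (true_alarm + false_alarm)


# Only 1 in 10,000 is from the puzzle; the rest are hypothetical.
for one_in in (10, 100, 1_000, 10_000):
    p = posterior(1 / one_in)
    print(f"1 fake in {one_in:>6,}: P(fake | reacted) = {p:.1%}")
```

With the same pen, the chance that a flagged bill is fake falls from roughly 85% when one bill in ten is fake to about 0.5% at the puzzle's rate of one in 10,000. The detector never changed; only the rarity of the thing it hunts for did.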

So Maya is right to take a second, careful look (checking the watermark, the security thread, the feel of the paper) rather than accusing her customer on the strength of one dark pen mark.

The one-line takeaway: a test's accuracy tells you how it behaves given the truth; it does not tell you the truth given the test. To flip the conditional, you must know how rare the thing you are testing for really is.

Further Reading
- Tversky, A. & Kahneman, D. (1982). "Evidential impact of base rates." In Judgment Under Uncertainty: Heuristics and Biases (Kahneman, Slovic & Tversky, eds.), Cambridge University Press. The foundational treatment of why people ignore base rates.
- Kahneman, D. (2011). Thinking, Fast and Slow. Farrar, Straus and Giroux. Chapters on representativeness and base-rate neglect give the most accessible book-length account of exactly this fallacy.
- Gigerenzer, G. (2002). Calculated Risks: How to Know When Numbers Deceive You (published in the UK as Reckoning with Risk). Simon & Schuster. Argues that natural-frequency framing ("100 out of 1,000,000") makes problems like this one far easier to reason about than probabilities.
- Bar-Hillel, M. (1980). "The base-rate fallacy in probability judgments." Acta Psychologica, 44(3), 211-233. The classic experimental paper documenting the effect.
- Eddy, D. M. (1982). "Probabilistic reasoning in clinical medicine." In Judgment Under Uncertainty (op. cit.). The famous mammography example that mirrors this puzzle's structure almost exactly.
- "Bayes' theorem" and "Base rate fallacy" entries on Wikipedia (https://en.wikipedia.org/wiki/Base_rate_fallacy) - solid starting points with worked numerical examples.
- 3Blue1Brown, "Bayes theorem, the geometry of changing beliefs" (https://www.youtube.com/watch?v=HZGCoVF3YvM) - an excellent visual derivation for early-college learners.
- Dri-Mark Products (https://www.drimark.com) - a long-standing manufacturer of the iodine-based counterfeit-detector pens described in this puzzle; their product pages and FAQs candidly discuss the pens' limitations (they detect starch-based paper, not all forgeries), which is itself a real-world illustration of imperfect sensitivity and specificity.
- U.S. Currency Education Program (https://www.uscurrency.gov) - the U.S. Treasury/Federal Reserve resource explaining the genuine security features (watermark, security thread, color-shifting ink) that a careful second check relies on.