The Pen That Cried Wolf

It is a slow Tuesday at Bridgeway Hardware, and Maya is working the register. A customer hands her a crisp $100 bill for a cordless drill. Following store policy, she swipes it with the counterfeit-detection pen — the kind that leaves a pale amber mark on genuine currency and a dark stain on fakes.

The mark comes up dark.

The customer goes pale. Maya hesitates. The pen, according to its manufacturer, is genuinely good at its job:

When a bill really is counterfeit, the pen flags it 99% of the time.
When a bill is perfectly legitimate, the pen mistakenly stains it only 2% of the time.

Those sound like the numbers of a trustworthy device. But here is the piece everyone forgets: genuine counterfeits are rare. In the kind of cash a suburban hardware store sees, only about 1 bill in every 10,000 is actually fake.

Maya looks at the dark stain and the nervous customer and asks the only question that matters:

Given that the pen reacted, what is the probability that this $100 bill is actually counterfeit?

Take a guess before you reach for a calculator. Most people — including most cashiers, and a fair number of statisticians caught off guard — guess somewhere near 99%. The truth is startlingly far from that.

💡 Hint

The pen's 99% detection rate answers the question "if the bill is fake, will the pen react?"

But that is the reverse of what Maya needs.

Try imagining a concrete batch of 1,000,000 bills and count two separate groups: the fakes the pen correctly flags, and the genuine bills it flags by mistake. Because genuine bills outnumber fakes about 10,000 to 1, even a tiny 2% error rate produces a surprising number of false alarms. Compare the two counts.

Solution

The intuition that the bill is "99% likely fake" confuses two different questions:

P(pen reacts | bill is fake) = 99% — this is what the manufacturer advertises.
P(bill is fake | pen reacts) = ? — this is what Maya actually needs.

These are not the same number, and conflating them is the famous base-rate fallacy. To find the second, the cleanest path is to imagine a large, concrete population.

Step 1 — Build a population of 1,000,000 bills

Since fakes occur at a rate of 1 in 10,000:

Counterfeit bills: 1,000,000 ÷ 10,000 = 100
Genuine bills: 1,000,000 − 100 = 999,900

Step 2 — Count who makes the pen react

Of the 100 true fakes, the pen catches 99%: 100 × 0.99 = 99 reactions (true positives).
Of the 999,900 genuine bills, the pen wrongly stains 2%: 999,900 × 0.02 = 19,998 reactions (false positives).

Total bills that make the pen react: 99 + 19,998 = 20,097.

Step 3 — Keep only the bills that reacted

The pen reacted, so Maya's bill is one of those 20,097. The fraction of those that are truly fake is:

P(fake | reacted) = 99 ⁄ 20,097 ≈ 0.0049 = about 0.5%
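
The three steps above can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and the figures are the ones given in the puzzle. Exact fractions are used so that no rounding creeps in before the final line.

```python
from fractions import Fraction

# Figures taken from the puzzle statement.
population = 1_000_000
base_rate = Fraction(1, 10_000)        # 1 bill in 10,000 is fake
hit_rate = Fraction(99, 100)           # pen reacts to 99% of fakes
false_alarm_rate = Fraction(2, 100)    # pen reacts to 2% of genuine bills

# Step 1: split the population into fakes and genuine bills.
fakes = population * base_rate                  # 100
genuine = population - fakes                    # 999,900

# Step 2: count the reactions in each group.
true_positives = fakes * hit_rate               # 99
false_positives = genuine * false_alarm_rate    # 19,998

# Step 3: keep only the bills that reacted.
reacted = true_positives + false_positives      # 20,097
p_fake_given_reacted = true_positives / reacted

print(p_fake_given_reacted)                     # exact fraction
print(f"{float(p_fake_given_reacted):.2%}")     # as a percentage
```

The exact fraction reduces to 1/203, which is where "about 0.5%" comes from.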

The same result drops straight out of Bayes' theorem:

P(F | R) = [ P(R | F) · P(F) ] ⁄ [ P(R | F) · P(F) + P(R | not F) · P(not F) ]
= (0.99 × 0.0001) ⁄ (0.99 × 0.0001 + 0.02 × 0.9999)
= 0.000099 ⁄ 0.020097 ≈ 0.49%
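
For readers who prefer code to algebra, the formula can be wrapped in a small function. This is a minimal sketch; the function name `posterior` and its parameter names are made up for illustration and do not come from any library.

```python
def posterior(prior: float, hit_rate: float, false_alarm_rate: float) -> float:
    """Return P(fake | pen reacts) using Bayes' theorem.

    prior            -- P(F), the share of bills that are fake
    hit_rate         -- P(R | F), how often the pen reacts to a fake
    false_alarm_rate -- P(R | not F), how often it reacts to a genuine bill
    """
    true_pos = hit_rate * prior                    # P(R | F) * P(F)
    false_pos = false_alarm_rate * (1.0 - prior)   # P(R | not F) * P(not F)
    return true_pos / (true_pos + false_pos)


# The puzzle's numbers: 1 in 10,000 fake, 99% hit rate, 2% false alarms.
p = posterior(prior=0.0001, hit_rate=0.99, false_alarm_rate=0.02)
print(f"{p:.2%}")
```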

What this means

Despite the alarming dark stain, the bill is roughly 99.5% likely to be genuine. Out of every 203 bills that trigger this pen, only about one is a real counterfeit. The flood of false positives from the enormous pool of legitimate bills completely drowns out the handful of true catches.

This is not a flaw in the math — it is exactly why a single pen test is treated as a prompt to look closer, never a verdict. It is the very same logic that governs medical screening (a rare disease plus an imperfect test yields mostly false alarms), credit-card fraud flags, and email spam filters. The rarer the thing you are hunting for, the more your base rate dominates, no matter how sharp your detector.
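
One way to watch the base rate dominate is to hold the pen's two error rates fixed and vary only how common fakes are. The sketch below does that. Only the 1 in 10,000 figure comes from the puzzle; the other base rates are hypothetical values chosen for comparison, and the helper name `posterior` is illustrative.

```python
def posterior(prior, hit_rate=0.99, false_alarm_rate=0.02):
    """P(fake | pen reacts) for a given base rate of fakes."""
    true_pos = hit_rate * prior
    false_pos = false_alarm_rate * (1 - prior)
    return true_pos / (true_pos + false_pos)


# Hypothetical base rates, from common fakes to very rare ones.
# Only 1 in 10,000 is the puzzle's own figure.
results = {}
for one_in in (10, 100, 1_000, 10_000, 100_000):
    results[one_in] = posterior(1 / one_in)
    print(f"fakes are 1 in {one_in:>7,}: P(fake | reacted) = {results[one_in]:.2%}")
```

With the same pen, a base rate of 1 in 100 would make a flagged bill about 33% likely to be fake, while the puzzle's 1 in 10,000 gives about 0.49%. The detector has not changed, only the rarity of what it is hunting.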

So Maya is right to take a second, careful look (checking the watermark, the security thread, the feel of the paper) rather than accusing her customer on the strength of one dark pen mark.

The one-line takeaway: a test's accuracy tells you how it behaves given the truth; it does not tell you the truth given the test. To flip the conditional, you must know how rare the thing you are testing for really is.
