Simpson's Paradox

When aggregated data tells the opposite story

A treatment can help every group individually, yet appear to hurt when you combine the groups. This isn't a statistical trick; it's a fundamental feature of how aggregation can reverse conclusions.

Adjust the parameters and watch the paradox emerge: the treatment helps within each group, yet appears to hurt once the groups are combined.


Scenario: Medical Treatment Trial

Two hospitals test a new treatment. Hospital A treats mostly mild cases. Hospital B treats mostly severe cases. The treatment helps patients at both hospitals. But when you combine the data...

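Hospital A (Mild Cases)

Treated patients: 20
Control patients: 100
Treated recovery rate: 85%
Control recovery rate: 80%

Hospital B (Severe Cases)

Treated patients: 100
Control patients: 20
Treated recovery rate: 50%
Control recovery rate: 40%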

Hospital A (Mild Cases)

Treated: 85% (17/20)
Control: 80% (80/100)
✓ Treatment helps (+5 percentage points)

Hospital B (Severe Cases)

Treated: 50% (50/100)
Control: 40% (8/20)
✓ Treatment helps (+10 percentage points)

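🤯 The Paradox

Combined (Both Hospitals)

Treated: 55.8% (67/120)
Control: 73.3% (88/120)
✗ Treatment appears to hurt (-17.5 percentage points)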
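The combined rates follow directly from the per-hospital counts. The short Python sketch below pools the recovered and total counts for each arm and compares the results. The names `counts`, `rate`, and `pooled` are illustrative choices, not part of the original page.

```python
# (recovered, total) counts taken from the per-hospital results above.
counts = {
    "Hospital A": {"treated": (17, 20), "control": (80, 100)},
    "Hospital B": {"treated": (50, 100), "control": (8, 20)},
}


def rate(recovered, total):
    """Recovery rate as a fraction between 0 and 1."""
    return recovered / total


def pooled(arm):
    """Sum recovered and total counts for one arm across all hospitals."""
    recovered = sum(hospital[arm][0] for hospital in counts.values())
    total = sum(hospital[arm][1] for hospital in counts.values())
    return recovered, total


# Within each hospital, the treated arm recovers more often.
for name, arms in counts.items():
    treated = rate(*arms["treated"])
    control = rate(*arms["control"])
    print(f"{name}: treated {treated:.1%}, control {control:.1%}")

# Pooled over both hospitals, the comparison flips.
treated_pooled = pooled("treated")
control_pooled = pooled("control")
print(f"Combined: treated {rate(*treated_pooled):.1%}, "
      f"control {rate(*control_pooled):.1%}")
```

Pooling adds the raw counts rather than averaging the two percentages, which is why the larger group in each arm dominates the combined figure.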


Why this happens

The key is that the treatment is split unevenly across hospitals whose baseline recovery rates differ.

Hospital A (mild cases, high recovery) puts 100 of its 120 patients in the control group. Hospital B (severe cases, low recovery) puts 100 of its 120 patients in the treatment group. When you combine them, the treatment group is dominated by severe cases and the control group by mild ones, making the treatment look worse overall.

The treatment genuinely helps at both hospitals! But the aggregated data hides this because it confounds treatment effect with case severity.
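One way to see the confounding is to write each pooled recovery rate as a weighted average of the per-hospital rates, weighted by each hospital's share of that arm's patients. The symbols here (r for a recovery rate, n for a patient count within one arm, subscripts A and B for the hospitals) are notation introduced for this sketch, not taken from the original.

```latex
\[
r_{\text{pooled}} = \frac{n_A}{n_A + n_B}\, r_A + \frac{n_B}{n_A + n_B}\, r_B
\]
\[
\text{Treated: } \tfrac{20}{120}(0.85) + \tfrac{100}{120}(0.50) \approx 0.558
\qquad
\text{Control: } \tfrac{100}{120}(0.80) + \tfrac{20}{120}(0.40) \approx 0.733
\]
```

The treated average puts most of its weight on Hospital B's low rates, while the control average puts most of its weight on Hospital A's high rates. The weights differ between the arms, so the pooled comparison is not a like-for-like comparison.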


Real-world examples

🎓

UC Berkeley Admissions (1973)

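Overall, men were admitted at higher rates than women. But department by department, the gap largely disappeared, and in many departments women had equal or higher admission rates. The reversal arose because women applied more often to competitive departments with low admission rates.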

⚾

Baseball Batting Averages

Player A can have a higher batting average than Player B in both halves of the season, yet a lower average for the full season. It depends on how many at-bats they had in each half.
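The batting-average reversal can be checked with a small example. The hit and at-bat counts below are hypothetical, chosen only to produce the effect, and the function names are illustrative. `Fraction` keeps the comparisons exact.

```python
from fractions import Fraction

# Hypothetical (hits, at_bats) for each half of the season.
player_a = [(4, 10), (25, 100)]   # .400, then .250
player_b = [(35, 100), (2, 10)]   # .350, then .200


def season_average(halves):
    """Full-season average: total hits divided by total at-bats."""
    hits = sum(h for h, _ in halves)
    at_bats = sum(ab for _, ab in halves)
    return Fraction(hits, at_bats)


def wins_every_half(x, y):
    """True if x has the strictly higher average in every half."""
    return all(Fraction(*hx) > Fraction(*hy) for hx, hy in zip(x, y))


a_wins_halves = wins_every_half(player_a, player_b)
a_wins_season = season_average(player_a) > season_average(player_b)

print("A better in both halves:", a_wins_halves)
print("A better over the season:", a_wins_season)
print("Season averages:",
      float(season_average(player_a)), float(season_average(player_b)))
```

In this made-up case Player A takes most of their at-bats in the half where both players hit worse, so A's season average is pulled down even though A leads in each half.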

💊

Kidney Stone Treatment

Treatment A had better success for small stones AND large stones, but Treatment B looked better overall, because A was used more on large (harder) stones.