A guide to the mathematics that governs wealth, cities, earthquakes, and everything else that doesn't follow the bell curve
Imagine measuring the height of 1,000 random people. Most cluster around 170 cm, with the tallest perhaps 20% above average. Now measure their wealth. The richest person isn't 20% wealthier — they might be 10,000× wealthier.
Same number of people. Completely different mathematics. The problem: your brain was built for the first kind of world but increasingly lives in the second.
The difference comes down to two probability distributions:
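In their standard forms (constants aside, it's the tails that matter):

$$p_{\text{normal}}(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/2\sigma^2} \qquad \text{vs.} \qquad p_{\text{power law}}(x) \propto x^{-\alpha}$$

Throughout, α is the exponent of the density, so larger α means thinner tails.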
That negative exponent in the power law changes everything. The exponential vs polynomial distinction in tail decay is the source of every counterintuitive result that follows.
**Normal:** The average is meaningful. Sample enough people and their average height converges to the true population mean.
**Power law (α ≤ 2):** The theoretical mean is infinite. Your sample average drifts upward forever. There is no "typical" value to converge to.
Plot the running average of each sample and you see it: the height average settles, because there's a typical height. The wealth average wanders, because each new billionaire jerks it upward. In power law distributions, the mean is either infinite or dominated by outliers.
Regression to the mean is built into the normal distribution. The child of very tall parents is probably tall, but likely shorter than them. A stock that crashed 50% will probably bounce back somewhat.
Power laws don't do this. If a city is huge, it tends to get huger (agglomeration). If a website is popular, it becomes more popular (network effects). If you're rich, you get richer (capital compounds).
**Normal:** Extreme values are pulled back toward the center by pure statistics. Regression to the mean is guaranteed.
**Power law:** Rich get richer. Big cities grow bigger. Popular things become more popular. Extremes amplify.
For normal distributions, extreme values are exponentially unlikely. For power laws, they're only polynomially unlikely. That difference is everything.
| Event | Normal odds | Power law odds | Difference |
|---|---|---|---|
| 3× typical | 1 in 741 | 1 in 15 | ~50× more likely |
| 5× typical | 1 in 3.5 million | 1 in 56 | ~63,000× more likely |
| 7× typical | 1 in 780 billion | 1 in 130 | ~6 billion× more likely |
In Gaussian world, a "seven sigma" event is essentially impossible — it should happen once in the age of the universe. In power law world, you'll see one next month. This is why financial models that assume normality catastrophically underestimate crash risk.
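You can reproduce the flavor of this table in a few lines. The Gaussian column is exact; the power-law column assumes a survival exponent of 2.5, which is one plausible reading of the numbers above, so it matches only roughly:

```python
from scipy.stats import norm

# Odds of an observation exceeding k "typical" units under each regime.
for k in [3, 5, 7]:
    p_gauss = norm.sf(k)    # one-sided Gaussian tail, P(Z > k)
    p_pareto = k ** -2.5    # Pareto survival P(X > k), assumed tail exponent 2.5
    print(f"{k}x typical: Gaussian 1 in {1 / p_gauss:,.0f}, "
          f"power law 1 in {1 / p_pareto:,.0f}")
```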
For a power law with exponent α, the mathematical moments only exist under certain conditions:
| Moment | Exists when... | Implication |
|---|---|---|
| Mean | α > 2 | Below this, averages are meaningless |
| Variance | α > 3 | Below this, spread is undefined |
| Skewness | α > 4 | Below this, asymmetry is undefined |
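These thresholds fall straight out of the defining integral. For a density $p(x) \propto x^{-\alpha}$ on $[x_{\min}, \infty)$, the $k$-th moment is

$$\mathbb{E}[X^k] \;\propto\; \int_{x_{\min}}^{\infty} x^{k-\alpha}\, dx,$$

which is finite only when $k - \alpha < -1$, i.e. when $\alpha > k + 1$.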
Most measured power laws have α ≈ 1.5–2.5, so in practice the variance usually doesn't exist, and often the mean doesn't either.
When variance is infinite, your estimate of variability from any finite sample is meaningless. Sample more, and your variance estimate grows without bound.
In Gaussian world, more data means better predictions. A century of observations gives high confidence about what's possible.
In power law world, the probability of seeing an event 10× larger than anything observed depends only on α:
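Using the Pareto survival function $P(X > x) \propto x^{-(\alpha-1)}$, the chance that a value exceeds ten times a level you've already seen is

$$P(X > 10x \mid X > x) = \left(\frac{10x}{x}\right)^{-(\alpha-1)} = 10^{-(\alpha-1)}.$$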
You could have 100 years of earthquake data. The probability of something 10× larger than any recorded event is still 10% if α ≈ 2.
In a room of 50 people: The tallest person contributes ~2% of total height — everyone matters roughly equally. The richest person might contribute 50–90% of total wealth. One observation dominates everything.
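A minimal simulation makes the contrast concrete. The parameters are illustrative: heights as a Gaussian with mean 170 cm, wealth as a Pareto with tail exponent 1.16 (the classic "80/20" value, α ≈ 2.16 in the density convention):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

heights = rng.normal(170, 10, n)   # Gaussian: mean 170 cm, sd 10 cm
wealth = rng.pareto(1.16, n) + 1   # Pareto: numpy's parameter is the tail exponent

print(f"Tallest person's share of total height: {heights.max() / heights.sum():.1%}")
print(f"Richest person's share of total wealth: {wealth.max() / wealth.sum():.1%}")
```

Rerun it with different seeds and the height share barely moves, while the wealth share swings wildly: exactly the point.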
Normal vs Pareto is the most dramatic contrast, but those are just two points on a spectrum. The full taxonomy matters because different tail behaviors require different tools.
The lognormal is particularly insidious. It emerges when you multiply many positive random variables (compare to normal, which emerges from sums):
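The mechanism is the Central Limit Theorem applied in log space: taking logs turns the product into a sum,

$$X = \prod_{i=1}^{n} Y_i \quad\Longrightarrow\quad \ln X = \sum_{i=1}^{n} \ln Y_i \;\approx\; \mathcal{N}(\mu, \sigma^2),$$

so $X$ itself is approximately lognormal.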
The lognormal looks like a power law over much of its range. Many claimed "power laws" in the literature are actually lognormals. The distinction matters:
| Property | Lognormal | Power Law |
|---|---|---|
| All moments | Finite (but can be huge) | Often infinite |
| Log-log plot | Curves down eventually | Straight line |
| Convergence | Happens... eventually | May never happen |
| Examples | Income (middle range), file sizes | Wealth extremes, earthquakes |
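The log-log row is the one you can check visually. A sketch, with parameters chosen arbitrarily for illustration, plotting the empirical survival function (CCDF) of each sample on log-log axes:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
n = 100_000

samples = {
    "lognormal": rng.lognormal(mean=0, sigma=2, size=n),
    "power law": rng.pareto(1.5, size=n) + 1,   # tail exponent 1.5
}

for label, sample in samples.items():
    x = np.sort(sample)
    ccdf = np.arange(n, 0, -1) / n   # empirical P(X >= x), avoids a zero at the top
    plt.loglog(x, ccdf, label=label)

plt.xlabel("x"); plt.ylabel("P(X > x)"); plt.legend(); plt.show()
```

The power law traces a straight line all the way out; the lognormal tracks it for a while, then bends downward.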
The deepest insight is to ask: what process generates this data?
| Process | Result | Example |
|---|---|---|
| Sum of many small effects | Normal | Height, measurement error |
| Product of many effects | Lognormal | Income growth, stock returns |
| Multiplicative growth + feedback | Power law | Wealth, city sizes, web traffic |
| Waiting in memoryless process | Exponential | Radioactive decay, service times |
| Bounded process | Beta / truncated | Percentages, test scores |
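The first two rows are easy to check by simulation: take the same random factors and either add them or multiply them (the uniform range here is an arbitrary choice):

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(2)
factors = rng.uniform(0.5, 1.5, size=(100_000, 50))

sums = factors.sum(axis=1)       # sum of 50 small effects per row   -> ~normal
products = factors.prod(axis=1)  # product of the same 50 effects    -> ~lognormal

print(f"skewness of sums:     {skew(sums):.2f}")      # near 0: a symmetric bell
print(f"skewness of products: {skew(products):.2f}")  # large, positive: heavy right tail
```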
Here's something textbooks won't tell you: most real-world phenomena are mixtures. The body follows one process. The tail follows another. Somewhere in between, the rules change.
Income is the canonical example. Most incomes are lognormally distributed — multiplicative career growth, raises, investments. But above some threshold (~$500k), a different mechanism kicks in: returns to capital, winner-take-all dynamics, inheritance. The tail becomes Pareto.
The danger: If you fit a lognormal and extrapolate, you underestimate tail risk. If you fit a Pareto and interpolate, you misunderstand typical behavior. The distribution that governs the middle 99% is not the distribution that governs the extreme 1%.
| Domain | Body Behavior | Tail Behavior |
|---|---|---|
| Income | Lognormal (career growth) | Pareto (capital returns) |
| Stock returns | ~Normal (typical days) | Fat-tailed (crashes) |
| Waiting times | Exponential (normal service) | Heavy-tailed (system failures) |
| Insurance claims | Lognormal (routine claims) | Pareto (catastrophes) |
The practical upshot: Don't ask "is this normal or Pareto?" Ask "where does the crossover happen?" Model the body with one tool, the tail with another, and never extrapolate across the boundary.
The Central Limit Theorem is perhaps the most powerful result in statistics — and the most misunderstood. It says sums of random variables tend toward normal distributions. But there's a hidden assumption that changes everything.
When variance is infinite — as in power laws with α < 3 — the CLT doesn't apply. Sums don't converge to normal distributions no matter how many terms you add.
In simulation, the finite-variance sum becomes more Gaussian-shaped as n grows: the CLT is working. The infinite-variance sum stays stubbornly skewed: no matter how many samples you add, it never "calms down" into a bell curve.
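A sketch of that experiment, using exponential draws for the finite-variance case and Pareto draws with tail exponent 1.5 (α = 2.5 in the density convention) for the infinite-variance case. Sample skewness shrinking toward zero is a crude stand-in for "becoming Gaussian":

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(3)

for n in [10, 100, 1000]:
    # 10,000 replications of a sum of n terms, for each distribution.
    finite_var = rng.exponential(1.0, (10_000, n)).sum(axis=1)
    infinite_var = (rng.pareto(1.5, (10_000, n)) + 1).sum(axis=1)

    print(f"n={n:4d}  skew(exponential sums)={skew(finite_var):5.2f}  "
          f"skew(Pareto sums)={skew(infinite_var):6.2f}")
```

The exponential sums' skewness falls like $2/\sqrt{n}$; the Pareto sums' skewness stays large and erratic, because the theoretical value is infinite.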
When variance is infinite but the mean exists (2 < α < 3 in the density convention used throughout, i.e. a tail exponent between 1 and 2), sums converge to a Lévy stable distribution instead of a Gaussian. The Gaussian is actually a special case: the Lévy stable with stability index 2.
Below stability index 2 you get distributions with progressively heavier tails, culminating in the Cauchy (stability index 1), which has neither a finite mean nor a finite variance.
Correlations become meaningless. Pearson correlation assumes finite variance. For power laws with α < 3, sample correlations are unstable and may not converge. Use rank correlations (Spearman, Kendall) instead.
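A quick illustration of the instability; the additive construction is arbitrary, just a way to correlate two heavy-tailed series:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(4)

for trial in range(5):
    x = rng.pareto(1.5, 10_000) + 1        # infinite variance (density α = 2.5)
    y = x + rng.pareto(1.5, 10_000) + 1    # correlated, also heavy-tailed
    print(f"trial {trial}: Pearson={pearsonr(x, y)[0]:.2f}  "
          f"Spearman={spearmanr(x, y)[0]:.2f}")
```

Across trials the Pearson estimate jumps around, dominated by whichever series produced the biggest outlier; the Spearman estimate barely moves.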
Independence doesn't save you. In Gaussian world, independent risks diversify away. In power law world, the sum of n independent Pareto variables is still Pareto. Diversification reduces body risk but not tail risk.
Sample size requirements explode. For a normal distribution, ~30 samples gives decent estimates. For power laws:
| α | Samples for stable mean |
|---|---|
| 3.0 | ~100 |
| 2.5 | ~1,000 |
| 2.1 | ~10,000, and climbing fast |
| 2.0 or below | Never converges (the mean is infinite) |
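To feel this, watch the running mean of a single long sample (α here is the density exponent, so numpy's parameter is α − 1):

```python
import numpy as np

rng = np.random.default_rng(5)

for alpha in [3.0, 2.5, 1.5]:                  # density exponent
    x = rng.pareto(alpha - 1, 1_000_000) + 1   # numpy takes the tail exponent
    running = np.cumsum(x) / np.arange(1, len(x) + 1)
    print(f"α={alpha}: mean after 10k draws={running[9_999]:7.2f}  "
          f"after 1M draws={running[-1]:7.2f}")
```

For α = 3.0 the two numbers roughly agree; for α = 1.5 the second is typically far larger, and would keep growing forever.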
2008 Financial Crisis: Risk models using Gaussian assumptions called correlated housing declines a "25-sigma event." Under fat-tailed distributions? Maybe 2-3 sigma — painful but not inconceivable.
Fukushima 2011: The plant was built to withstand historical maximum earthquake levels. Then the Tōhoku earthquake exceeded anything in Japan's recorded history. In power law world, this wasn't surprising.
COVID-19: Seemed like a "black swan." But pandemic severity follows a power law. Given enough time, a pandemic of that magnitude was statistically inevitable.
Normal distributions emerge from sums. When you add many small, independent, finite-variance effects, the Central Limit Theorem produces a bell curve. Height is the sum of thousands of genetic and environmental factors.
Power laws emerge from products and feedback. When success breeds success, when rich get richer, when effects multiply rather than add — you get power laws. Variances explode, and Gaussian scaffolding collapses.
The question to ask about any phenomenon: Is this thing made from sums or products?