A guide to the mathematics that governs wealth, cities, earthquakes, and everything else that doesn't follow the bell curve
Imagine measuring the height of 1,000 random people. Most cluster around 170 cm, with the tallest perhaps 20% above average. Now measure their wealth. The richest person isn't 20% wealthier — they might be 10,000× wealthier.
Same number of people. Completely different mathematics. The problem: your brain was built for the first kind of world but increasingly lives in the second.
The difference comes down to two probability distributions:
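In their standard forms (constants aside, it's the tails that matter):

$$p_{\text{normal}}(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/2\sigma^2} \qquad \text{vs.} \qquad p_{\text{power law}}(x) \propto x^{-\alpha}$$

Throughout, α is the exponent of the density, so larger α means thinner tails.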
That negative exponent in the power law changes everything. The exponential vs polynomial distinction in tail decay is the source of every counterintuitive result that follows.
**Normal:** The average is meaningful. Sample enough people and their average height converges to the true population mean.
**Power law (α ≤ 2):** The theoretical mean is infinite. Your sample average drifts upward forever. There is no "typical" value to converge to.
Plot the running average of each sample and you see it: the height average settles, because there's a typical height. The wealth average wanders, because each new billionaire jerks it upward. In power law distributions, the mean is either infinite or dominated by outliers.
Regression to the mean is built into the normal distribution. The child of very tall parents is probably tall, but likely shorter than them. A stock that crashed 50% will probably bounce back somewhat.
Power laws don't do this. If a city is huge, it tends to get huger (agglomeration). If a website is popular, it becomes more popular (network effects). If you're rich, you get richer (capital compounds).
**Normal:** Extreme values are pulled back toward the center by pure statistics. Regression to the mean is guaranteed.
**Power law:** Rich get richer. Big cities grow bigger. Popular things become more popular. Extremes amplify.
For normal distributions, extreme values are exponentially unlikely. For power laws, they're only polynomially unlikely. That difference is everything.
| Event | Normal odds | Power law odds | Difference |
|---|---|---|---|
| 3× typical | 1 in 741 | 1 in 15 | ~50× more likely |
| 5× typical | 1 in 3.5 million | 1 in 56 | ~63,000× more likely |
| 7× typical | 1 in 780 billion | 1 in 130 | ~6 billion× more likely |
In Gaussian world, a "seven sigma" event is essentially impossible — it should happen once in the age of the universe. In power law world, you'll see one next month. This is why financial models that assume normality catastrophically underestimate crash risk.
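You can reproduce the flavor of this table in a few lines. The Gaussian column is exact; the power-law column assumes a survival exponent of 2.5, which is one plausible reading of the numbers above, so it matches only roughly:

```python
from scipy.stats import norm

# Odds of an observation exceeding k "typical" units under each regime.
for k in [3, 5, 7]:
    p_gauss = norm.sf(k)    # one-sided Gaussian tail, P(Z > k)
    p_pareto = k ** -2.5    # Pareto survival P(X > k), assumed tail exponent 2.5
    print(f"{k}x typical: Gaussian 1 in {1 / p_gauss:,.0f}, "
          f"power law 1 in {1 / p_pareto:,.0f}")
```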
For a power law with exponent α, the mathematical moments only exist under certain conditions:
| Moment | Exists when... | Implication |
|---|---|---|
| Mean | α > 2 | Below this, averages are meaningless |
| Variance | α > 3 | Below this, spread is undefined |
| Skewness | α > 4 | Below this, asymmetry is undefined |
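These thresholds fall straight out of the defining integral. For a density $p(x) \propto x^{-\alpha}$ on $[x_{\min}, \infty)$, the $k$-th moment is

$$\mathbb{E}[X^k] \;\propto\; \int_{x_{\min}}^{\infty} x^{k-\alpha}\, dx,$$

which is finite only when $k - \alpha < -1$, i.e. when $\alpha > k + 1$.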
Most measured power laws have α ≈ 1.5–2.5, so in practice the variance usually doesn't exist, and often the mean doesn't either.
When variance is infinite, your estimate of variability from any finite sample is meaningless. Sample more, and your variance estimate grows without bound.
In Gaussian world, more data means better predictions. A century of observations gives high confidence about what's possible.
In power law world, the probability of seeing an event 10× larger than anything observed depends only on α:
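Using the Pareto survival function $P(X > x) \propto x^{-(\alpha-1)}$, the chance that a value exceeds ten times a level you've already seen is

$$P(X > 10x \mid X > x) = \left(\frac{10x}{x}\right)^{-(\alpha-1)} = 10^{-(\alpha-1)}.$$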
You could have 100 years of earthquake data. The probability of something 10× larger than any recorded event is still 10% if α ≈ 2.
In a room of 50 people: The tallest person contributes ~2% of total height — everyone matters roughly equally. The richest person might contribute 50–90% of total wealth. One observation dominates everything.
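A minimal simulation makes the contrast concrete. The parameters are illustrative: heights as a Gaussian with mean 170 cm, wealth as a Pareto with tail exponent 1.16 (the classic "80/20" value, α ≈ 2.16 in the density convention):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

heights = rng.normal(170, 10, n)   # Gaussian: mean 170 cm, sd 10 cm
wealth = rng.pareto(1.16, n) + 1   # Pareto: numpy's parameter is the tail exponent

print(f"Tallest person's share of total height: {heights.max() / heights.sum():.1%}")
print(f"Richest person's share of total wealth: {wealth.max() / wealth.sum():.1%}")
```

Rerun it with different seeds and the height share barely moves, while the wealth share swings wildly: exactly the point.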
Normal vs Pareto is the most dramatic contrast, but those are just two points on a spectrum. The full taxonomy matters because different tail behaviors require different tools.
The lognormal is particularly insidious. It emerges when you multiply many positive random variables (compare to normal, which emerges from sums):
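The mechanism is the Central Limit Theorem applied in log space: taking logs turns the product into a sum,

$$X = \prod_{i=1}^{n} Y_i \quad\Longrightarrow\quad \ln X = \sum_{i=1}^{n} \ln Y_i \;\approx\; \mathcal{N}(\mu, \sigma^2),$$

so $X$ itself is approximately lognormal.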
The lognormal looks like a power law over much of its range. Many claimed "power laws" in the literature are actually lognormals. The distinction matters:
| Property | Lognormal | Power Law |
|---|---|---|
| All moments | Finite (but can be huge) | Often infinite |
| Log-log plot | Curves down eventually | Straight line |
| Convergence | Happens... eventually | May never happen |
| Examples | Income (middle range), file sizes | Wealth extremes, earthquakes |
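The log-log row is the one you can check visually. A sketch, with parameters chosen arbitrarily for illustration, plotting the empirical survival function (CCDF) of each sample on log-log axes:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
n = 100_000

samples = {
    "lognormal": rng.lognormal(mean=0, sigma=2, size=n),
    "power law": rng.pareto(1.5, size=n) + 1,   # tail exponent 1.5
}

for label, sample in samples.items():
    x = np.sort(sample)
    ccdf = np.arange(n, 0, -1) / n   # empirical P(X >= x), avoids a zero at the top
    plt.loglog(x, ccdf, label=label)

plt.xlabel("x"); plt.ylabel("P(X > x)"); plt.legend(); plt.show()
```

The power law traces a straight line all the way out; the lognormal tracks it for a while, then bends downward.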
The deepest insight is to ask: what process generates this data?
| Process | Result | Example |
|---|---|---|
| Sum of many small effects | Normal | Height, measurement error |
| Product of many effects | Lognormal | Income growth, stock returns |
| Multiplicative growth + feedback | Power law | Wealth, city sizes, web traffic |
| Waiting in memoryless process | Exponential | Radioactive decay, service times |
| Bounded process | Beta / truncated | Percentages, test scores |
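The first two rows are easy to check by simulation: take the same random factors and either add them or multiply them (the uniform range here is an arbitrary choice):

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(2)
factors = rng.uniform(0.5, 1.5, size=(100_000, 50))

sums = factors.sum(axis=1)       # sum of 50 small effects per row   -> ~normal
products = factors.prod(axis=1)  # product of the same 50 effects    -> ~lognormal

print(f"skewness of sums:     {skew(sums):.2f}")      # near 0: a symmetric bell
print(f"skewness of products: {skew(products):.2f}")  # large, positive: heavy right tail
```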
Here's something textbooks won't tell you: most real-world phenomena are mixtures. The body follows one process. The tail follows another. Somewhere in between, the rules change.
Income is the canonical example. Most incomes are lognormally distributed — multiplicative career growth, raises, investments. But above some threshold (~$500k), a different mechanism kicks in: returns to capital, winner-take-all dynamics, inheritance. The tail becomes Pareto.
The danger: If you fit a lognormal and extrapolate, you underestimate tail risk. If you fit a Pareto and interpolate, you misunderstand typical behavior. The distribution that governs the middle 99% is not the distribution that governs the extreme 1%.
| Domain | Body Behavior | Tail Behavior |
|---|---|---|
| Income | Lognormal (career growth) | Pareto (capital returns) |
| Stock returns | ~Normal (typical days) | Fat-tailed (crashes) |
| Waiting times | Exponential (normal service) | Heavy-tailed (system failures) |
| Insurance claims | Lognormal (routine claims) | Pareto (catastrophes) |
The practical upshot: Don't ask "is this normal or Pareto?" Ask "where does the crossover happen?" Model the body with one tool, the tail with another, and never extrapolate across the boundary.
The Central Limit Theorem is perhaps the most powerful result in statistics — and the most misunderstood. It says sums of random variables tend toward normal distributions. But there's a hidden assumption that changes everything.
When variance is infinite — as in power laws with α < 3 — the CLT doesn't apply. Sums don't converge to normal distributions no matter how many terms you add.
In simulation, the finite-variance sum becomes more Gaussian-shaped as n grows: the CLT is working. The infinite-variance sum stays stubbornly skewed: no matter how many samples you add, it never "calms down" into a bell curve.
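A sketch of that experiment, using exponential draws for the finite-variance case and Pareto draws with tail exponent 1.5 (α = 2.5 in the density convention) for the infinite-variance case. Sample skewness shrinking toward zero is a crude stand-in for "becoming Gaussian":

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(3)

for n in [10, 100, 1000]:
    # 10,000 replications of a sum of n terms, for each distribution.
    finite_var = rng.exponential(1.0, (10_000, n)).sum(axis=1)
    infinite_var = (rng.pareto(1.5, (10_000, n)) + 1).sum(axis=1)

    print(f"n={n:4d}  skew(exponential sums)={skew(finite_var):5.2f}  "
          f"skew(Pareto sums)={skew(infinite_var):6.2f}")
```

The exponential sums' skewness falls like $2/\sqrt{n}$; the Pareto sums' skewness stays large and erratic, because the theoretical value is infinite.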
When variance is infinite but the mean exists (2 < α < 3 in the density convention used throughout, i.e. a tail exponent between 1 and 2), sums converge to a Lévy stable distribution instead of a Gaussian. The Gaussian is actually a special case: the Lévy stable with stability index 2.
Below stability index 2 you get distributions with progressively heavier tails, culminating in the Cauchy (stability index 1), which has neither a finite mean nor a finite variance.
Correlations become meaningless. Pearson correlation assumes finite variance. For power laws with α < 3, sample correlations are unstable and may not converge. Use rank correlations (Spearman, Kendall) instead.
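A quick illustration of the instability; the additive construction is arbitrary, just a way to correlate two heavy-tailed series:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(4)

for trial in range(5):
    x = rng.pareto(1.5, 10_000) + 1        # infinite variance (density α = 2.5)
    y = x + rng.pareto(1.5, 10_000) + 1    # correlated, also heavy-tailed
    print(f"trial {trial}: Pearson={pearsonr(x, y)[0]:.2f}  "
          f"Spearman={spearmanr(x, y)[0]:.2f}")
```

Across trials the Pearson estimate jumps around, dominated by whichever series produced the biggest outlier; the Spearman estimate barely moves.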
Independence doesn't save you. In Gaussian world, independent risks diversify away. In power law world, the sum of n independent Pareto variables is still Pareto. Diversification reduces body risk but not tail risk.
Sample size requirements explode. For a normal distribution, ~30 samples gives decent estimates. For power laws:
| α | Samples for stable mean |
|---|---|
| 3.0 | ~100 |
| 2.5 | ~1,000 |
| 2.1 | ~10,000, and climbing fast |
| 2.0 or below | Never converges (the mean is infinite) |
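To feel this, watch the running mean of a single long sample (α here is the density exponent, so numpy's parameter is α − 1):

```python
import numpy as np

rng = np.random.default_rng(5)

for alpha in [3.0, 2.5, 1.5]:                  # density exponent
    x = rng.pareto(alpha - 1, 1_000_000) + 1   # numpy takes the tail exponent
    running = np.cumsum(x) / np.arange(1, len(x) + 1)
    print(f"α={alpha}: mean after 10k draws={running[9_999]:7.2f}  "
          f"after 1M draws={running[-1]:7.2f}")
```

For α = 3.0 the two numbers roughly agree; for α = 1.5 the second is typically far larger, and would keep growing forever.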
2008 Financial Crisis: Risk models using Gaussian assumptions called correlated housing declines a "25-sigma event." Under fat-tailed distributions? Maybe 2-3 sigma — painful but not inconceivable.
Fukushima 2011: The plant was built to withstand historical maximum earthquake levels. Then the Tōhoku earthquake exceeded anything in Japan's recorded history. In power law world, this wasn't surprising.
COVID-19: Seemed like a "black swan." But pandemic severity follows a power law. Given enough time, a pandemic of that magnitude was statistically inevitable.
Normal distributions emerge from sums. When you add many small, independent, finite-variance effects, the Central Limit Theorem produces a bell curve. Height is the sum of thousands of genetic and environmental factors.
Power laws emerge from products and feedback. When success breeds success, when rich get richer, when effects multiply rather than add — you get power laws. Variances explode, and Gaussian scaffolding collapses.
The question to ask about any phenomenon: Is this thing made from sums or products?