Bias-Variance Analysis

Definition

A diagnostic framework for understanding whether a model’s error is primarily due to underfitting (bias) or overfitting (variance), using human-level performance as a proxy for the irreducible (Bayes) error.

Intuition

The gap between training error and Bayes error tells you how much the model is underfitting; the gap between training error and dev error tells you how much it is overfitting; targeting the larger gap gives the highest expected improvement.

Formal Description

Bayes error: the theoretical minimum achievable error for a given task; unknowable in general.

Human-level performance (HLP): a practical proxy for Bayes error on tasks where humans perform near-optimally (perception, language). Note: HLP ≥ Bayes error always holds, since no system can beat the Bayes optimum; HLP is therefore an upper bound on Bayes error, and for practical purposes it is used as the Bayes proxy.

Avoidable bias: training error − HLP; represents the gap between current training performance and the theoretical ceiling.

Variance: dev error − training error; represents the generalization gap.

Diagnostic examples (reading the two gaps):

| HLP  | Train error | Dev error | Avoidable bias | Variance | Focus              |
|------|-------------|-----------|----------------|----------|--------------------|
| 1%   | 8%          | 10%       | 7%             | 2%       | Bias reduction     |
| 7.5% | 8%          | 10%       | 0.5%           | 2%       | Variance reduction |
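The table's logic can be sketched as a small helper; `diagnose` is a hypothetical name, not from any library:

```python
def diagnose(hlp, train_err, dev_err):
    """Return (avoidable_bias, variance, focus) from the three error rates.

    Uses human-level performance (HLP) as the Bayes-error proxy, per the
    definitions above. Inputs can be fractions or percentages, as long as
    the units are consistent.
    """
    avoidable_bias = train_err - hlp     # train error - HLP
    variance = dev_err - train_err       # dev error - train error
    focus = "bias reduction" if avoidable_bias > variance else "variance reduction"
    return avoidable_bias, variance, focus

# Reproducing the two table rows (percent units):
print(diagnose(1.0, 8.0, 10.0))   # bias gap dominates -> bias reduction
print(diagnose(7.5, 8.0, 10.0))   # variance gap dominates -> variance reduction
```

Targeting whichever gap is larger is exactly the "focus the larger gap" rule from the intuition above.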

Bias reduction tactics: larger model, more features, better architecture, longer training.

Variance reduction tactics: more data, regularization, dropout, data augmentation.
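As one concrete variance-reduction tactic, here is a minimal NumPy sketch of inverted dropout (the standard formulation; the function name and shapes are illustrative):

```python
import numpy as np

def inverted_dropout(a, keep_prob, rng):
    """Inverted dropout: zero each activation with probability 1 - keep_prob,
    then rescale survivors by 1/keep_prob so the expected activation is
    unchanged and no rescaling is needed at test time."""
    mask = rng.random(a.shape) < keep_prob
    return a * mask / keep_prob

rng = np.random.default_rng(0)
a = np.ones((4, 5))
out = inverted_dropout(a, keep_prob=0.8, rng=rng)
# each entry is either 0 (dropped) or 1/0.8 = 1.25 (kept and rescaled)
```

Dropout reduces variance by preventing units from co-adapting, at the cost of noisier training.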

Applications

Any supervised learning diagnostic; especially useful when progress has stalled to identify the bottleneck.

Trade-offs

  • HLP is not always available (e.g., predicting stock prices)
  • HLP overestimates Bayes error for superhuman tasks (once a model beats HLP, the proxy breaks down)
  • The framework assumes a clean train/dev split from the same distribution (use training-dev set to disentangle mismatch)
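The last point can be made concrete: scoring the model on a training-dev set (data held out from the training distribution but not trained on) splits the dev − train gap into a variance component and a data-mismatch component. A sketch with hypothetical numbers and an illustrative function name:

```python
def diagnose_with_mismatch(hlp, train_err, train_dev_err, dev_err):
    """Split the error gaps using a training-dev set.

    train_dev_err is measured on held-out data from the *training*
    distribution, so the train -> train-dev gap is pure variance and the
    train-dev -> dev gap is distribution shift.
    """
    return {
        "avoidable bias": train_err - hlp,         # underfitting
        "variance": train_dev_err - train_err,     # generalization gap
        "data mismatch": dev_err - train_dev_err,  # train/dev distribution shift
    }

# Hypothetical example: HLP 1%, train 8%, training-dev 9%, dev 14%
gaps = diagnose_with_mismatch(1.0, 8.0, 9.0, 14.0)
# -> bias 7%, variance 1%, mismatch 5%: fix bias first, then the mismatch
```

With only train and dev errors, the 6% dev − train gap here would be misread as pure variance; the training-dev set reveals that most of it is mismatch, which more data from the training distribution would not fix.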