VC Dimension

Definition

The Vapnik-Chervonenkis (VC) dimension of a hypothesis class $\mathcal{H}$ over an input space $\mathcal{X}$ is the size of the largest set $C \subseteq \mathcal{X}$ that $\mathcal{H}$ shatters:

$\mathrm{VCdim}(\mathcal{H}) = \max\{\, |C| : \mathcal{H} \text{ shatters } C \,\}$

A set $C$ is shattered by $\mathcal{H}$ if every possible binary labelling of $C$ is realised by some $h \in \mathcal{H}$.

Intuition

VC dimension measures the “expressive capacity” of a hypothesis class: how complex a binary pattern can it learn? If you can find a set of $d$ points that $\mathcal{H}$ labels in all $2^d$ possible ways, then $\mathrm{VCdim}(\mathcal{H}) \ge d$. But if for every set of $d+1$ points there’s some labelling $\mathcal{H}$ can’t produce, then $\mathrm{VCdim}(\mathcal{H}) \le d$.

Higher VC dimension = more expressive = needs more data to learn without overfitting. The VC dimension is the right notion of “degrees of freedom” for a hypothesis class.
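As a concrete check of the shattering definition, here is a minimal sketch (helper names are mine) that brute-forces the achievable labellings of 1-D threshold classifiers $h_t(x) = \mathbb{1}[x \ge t]$: a single point is shattered, but no pair is, so this class has VC dimension 1.

```python
def threshold_labelling(t, points):
    # Threshold classifier h_t(x) = 1 iff x >= t
    return tuple(int(x >= t) for x in points)

def shatters(points):
    # Enough candidate thresholds to realise every achievable labelling:
    # one at each point, plus one below and one above all points.
    thresholds = sorted(points) + [min(points) - 1, max(points) + 1]
    achieved = {threshold_labelling(t, points) for t in thresholds}
    return len(achieved) == 2 ** len(points)

print(shatters([0.0]))       # True: a single point is shattered
print(shatters([0.0, 1.0]))  # False: the labelling (1, 0) is unreachable
```

The asymmetry is the point: to show $\mathrm{VCdim} \ge d$ you exhibit one shattered set of size $d$; to show $\mathrm{VCdim} < d+1$ you must argue no set of size $d+1$ is shattered (here, the leftmost of two points can never be labelled 1 while the rightmost is labelled 0).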

Formal Description

Growth function $\tau_{\mathcal{H}}(m)$: the maximum number of distinct labellings that $\mathcal{H}$ can produce on any $m$ points:

$\tau_{\mathcal{H}}(m) = \max_{C \subseteq \mathcal{X},\ |C| = m} \big|\{ (h(c_1), \ldots, h(c_m)) : h \in \mathcal{H} \}\big|$

Sauer-Shelah lemma: if $\mathrm{VCdim}(\mathcal{H}) = d$, then:

$\tau_{\mathcal{H}}(m) \le \sum_{i=0}^{d} \binom{m}{i} \le \left(\frac{em}{d}\right)^{d} \quad \text{for } m \ge d$

This transitions from exponential growth ($\tau_{\mathcal{H}}(m) = 2^m$ when $m \le d$) to polynomial growth ($O(m^d)$ when $m > d$). The key insight: once you have more data points than the VC dimension, the class is “effectively finite”, enabling generalisation.
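The exponential-to-polynomial transition is easy to see numerically. A quick sketch (`sauer_bound` is a name I’ve introduced) for $d = 3$: the bound coincides with $2^m$ up to $m = 3$, then falls polynomially behind it.

```python
from math import comb

def sauer_bound(m, d):
    """Sauer-Shelah bound on the growth function: sum_{i<=d} C(m, i)."""
    return sum(comb(m, i) for i in range(d + 1))

d = 3
for m in [2, 3, 5, 10, 20]:
    print(m, sauer_bound(m, d), 2 ** m)
# For m <= d the bound equals 2^m; beyond that it grows only like m^d.
```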

Fundamental theorem of statistical learning: $\mathcal{H}$ is agnostic PAC learnable if and only if $\mathrm{VCdim}(\mathcal{H}) < \infty$. The sample complexity is:

$m_{\mathcal{H}}(\epsilon, \delta) = \Theta\!\left(\frac{d + \log(1/\delta)}{\epsilon^2}\right)$

where $d = \mathrm{VCdim}(\mathcal{H})$.
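Since the theorem fixes the sample complexity only up to constants, any calculator based on it gives order-of-magnitude estimates, not exact sample sizes. A hedged sketch (the function name is mine):

```python
from math import ceil, log

def sample_complexity_estimate(d, eps, delta):
    """Evaluate (d + log(1/delta)) / eps^2 -- the Theta hides constants,
    so treat this as an order-of-magnitude estimate only."""
    return ceil((d + log(1 / delta)) / eps ** 2)

# e.g. VC dimension 10, accuracy 0.05, confidence 99%
print(sample_complexity_estimate(d=10, eps=0.05, delta=0.01))
```

Note the $1/\epsilon^2$ dependence: halving the target error roughly quadruples the data requirement.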

Standard VC dimensions:

| Hypothesis class | VC dimension |
| --- | --- |
| Halfspaces in $\mathbb{R}^d$ | $d + 1$ ($d$ if homogeneous) |
| Axis-aligned rectangles in $\mathbb{R}^d$ | $2d$ |
| Polynomials of degree $p$ in $\mathbb{R}$ | $p + 1$ |
| Linear threshold network (one hidden layer, $W$ parameters) | $O(W \log W)$ |
| Finite set $\mathcal{H}$ | $\le \log_2 \lvert \mathcal{H} \rvert$ |
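The rectangle entry can be sanity-checked by brute force. The sketch below (helper names are mine) uses the fact that an axis-aligned rectangle realises a labelling iff the bounding box of the positively-labelled points contains no negative point: the four “diamond” points in $\mathbb{R}^2$ are shattered, matching $2d = 4$, while adding a fifth, interior point breaks shattering.

```python
from itertools import product

def rect_shatters(points):
    """Check shattering by axis-aligned rectangles in the plane: a
    labelling is realisable iff the bounding box of the positive
    points contains no negatively-labelled point."""
    for labels in product([0, 1], repeat=len(points)):
        pos = [p for p, lab in zip(points, labels) if lab]
        neg = [p for p, lab in zip(points, labels) if not lab]
        if not pos:
            continue  # all-negative labelling: use an empty rectangle
        x0, x1 = min(p[0] for p in pos), max(p[0] for p in pos)
        y0, y1 = min(p[1] for p in pos), max(p[1] for p in pos)
        if any(x0 <= x <= x1 and y0 <= y <= y1 for x, y in neg):
            return False
    return True

diamond = [(0, 1), (0, -1), (1, 0), (-1, 0)]
print(rect_shatters(diamond))             # True: 4 points shattered
print(rect_shatters(diamond + [(0, 0)]))  # False: 5th point sits inside
```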

Infinite VC dimension: if no finite $d$ exists ($\mathrm{VCdim}(\mathcal{H}) = \infty$), $\mathcal{H}$ is not PAC learnable (e.g., the class of all functions $\mathcal{X} \to \{0, 1\}$ over an infinite $\mathcal{X}$).

VC bound: with $m$ samples, $d = \mathrm{VCdim}(\mathcal{H})$, and $m > d$, with probability at least $1 - \delta$ over the sample, simultaneously for all $h \in \mathcal{H}$:

$L_{\mathcal{D}}(h) \le L_{S}(h) + \sqrt{\frac{8\left(d \ln \frac{2m}{d} + \ln \frac{4}{\delta}\right)}{m}}$

(one standard form of the bound; constants differ between texts).
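Evaluating a gap term of the form $\sqrt{8(d \ln(2m/d) + \ln(4/\delta))/m}$ (one standard variant; the helper name is mine) shows the roughly $\sqrt{d \log m / m}$ decay as the sample grows:

```python
from math import log, sqrt

def vc_gap(d, m, delta):
    """Gap term sqrt(8 * (d*ln(2m/d) + ln(4/delta)) / m), assuming m > d."""
    return sqrt(8 * (d * log(2 * m / d) + log(4 / delta)) / m)

for m in [1_000, 10_000, 100_000]:
    print(m, round(vc_gap(d=10, m=m, delta=0.05), 3))
# The gap shrinks roughly like sqrt(d * log(m) / m) as m grows.
```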

Applications

| Application | Role of VC dimension |
| --- | --- |
| Model selection | Higher-capacity models (large $d$) require more data |
| SVM generalisation | Support vector margin theory bounds generalisation via a normalised VC measure |
| Neural network theory | Networks with $W$ parameters have VC dimension growing with $W$, but in practice generalise much better than this bound implies |
| Regularization theory | Constraining model complexity ↔ reducing effective VC dimension |

Trade-offs

  • VC dimension is a worst-case measure; it ignores the actual data distribution. Rademacher complexity and PAC-Bayes give tighter distribution-dependent bounds.
  • Modern deep networks have enormous VC dimension yet generalise well — classical VC theory doesn’t fully explain this; double-descent and implicit regularisation are active research areas.
  • Computing VC dimension exactly is often NP-hard; it is generally used as a theoretical tool rather than a practical quantity.