PAC Learning

Definition

A concept class $\mathcal{C}$ is probably approximately correct (PAC) learnable if there exists an algorithm $A$ and a polynomial $p(\cdot,\cdot)$ such that for all $\epsilon, \delta \in (0,1)$, every concept $c \in \mathcal{C}$, and every distribution $D$ over the input space, if $A$ receives at least $p(1/\epsilon, 1/\delta)$ i.i.d. samples from $D$ labelled by $c$, then with probability at least $1 - \delta$, $A$ outputs a hypothesis $h$ with true error at most $\epsilon$.

  • $\epsilon$: accuracy parameter (maximum tolerated error)
  • $\delta$: confidence parameter (allowed failure probability)
  • Probably: succeeds with probability $\ge 1 - \delta$
  • Approximately correct: error $\le \epsilon$

Intuition

PAC learning formalises “what does it mean to learn from data?” Instead of demanding a perfect classifier, it asks: can you, with enough data, produce a classifier that is almost certainly good enough? The framework cleanly separates the statistical question (how many samples?) from the computational question (how fast can you find the hypothesis?).

Think of it as a contract: the learner doesn’t know the true concept $c$ or the data distribution $D$, but given enough examples, it commits to outputting something close to $c$ with high confidence.

Formal Description

Setup:

  • Input space $\mathcal{X}$, label space $\mathcal{Y} = \{0, 1\}$
  • Concept class $\mathcal{C}$ (all learnable target functions)
  • Hypothesis class $\mathcal{H}$ (all functions the learner can output; may differ from $\mathcal{C}$)
  • True risk: $L_D(h) = \Pr_{x \sim D}[h(x) \neq c(x)]$
  • Empirical risk: $\hat{L}_S(h) = \frac{1}{m} \sum_{i=1}^{m} \mathbf{1}[h(x_i) \neq y_i]$ for a sample $S = ((x_1, y_1), \ldots, (x_m, y_m))$
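The two risks can be illustrated with a toy 1-D threshold concept; a minimal sketch, where the target threshold 0.5, the hypothesis threshold 0.6, and the uniform distribution on $[0, 1]$ are arbitrary choices for the demo:

```python
import random

def concept(x, t=0.5):
    """Hypothetical target c: label 1 iff x >= t."""
    return 1 if x >= t else 0

def hypothesis(x, t=0.6):
    """Hypothetical learner output h: a slightly mis-placed threshold."""
    return 1 if x >= t else 0

def empirical_risk(h, sample):
    """Fraction of labelled examples that h gets wrong."""
    return sum(1 for x, y in sample if h(x) != y) / len(sample)

random.seed(0)
S = [(x, concept(x)) for x in (random.random() for _ in range(1000))]

# Under the uniform distribution, the true risk of `hypothesis` is exactly
# the probability mass where it disagrees with `concept`: |0.6 - 0.5| = 0.1.
print(empirical_risk(hypothesis, S))  # close to 0.1, up to sampling noise
```

The empirical risk concentrates around the true risk as the sample grows, which is exactly the mechanism the finite-class PAC bounds exploit.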

Realizable PAC learning: assumes $c \in \mathcal{H}$ (target in the class), so some hypothesis has zero true risk. Standard result: for finite $\mathcal{H}$, empirical risk minimization (ERM) satisfies the PAC bound with $m \ge \frac{1}{\epsilon}\left(\ln|\mathcal{H}| + \ln\frac{1}{\delta}\right)$.
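The standard finite-class realizable bound $m \ge \frac{1}{\epsilon}\left(\ln|\mathcal{H}| + \ln\frac{1}{\delta}\right)$ is easy to evaluate; a minimal sketch, where the class size $2^{20}$ and the $\epsilon$, $\delta$ values are arbitrary example inputs:

```python
import math

def realizable_sample_complexity(h_size, eps, delta):
    """Samples sufficient for ERM in the realizable case with a finite
    hypothesis class: m >= (1/eps) * (ln|H| + ln(1/delta))."""
    return math.ceil((math.log(h_size) + math.log(1 / delta)) / eps)

# A class of 2**20 hypotheses, 1% error, 95% confidence:
print(realizable_sample_complexity(2**20, 0.01, 0.05))  # → 1686
```

Note the dependence is only logarithmic in $|\mathcal{H}|$ and $1/\delta$, but linear in $1/\epsilon$.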

Agnostic PAC learning: no assumption that $\mathcal{H}$ contains a perfect hypothesis. Goal: find $h$ satisfying $L_D(h) \le \min_{h' \in \mathcal{H}} L_D(h') + \epsilon$.
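For a finite class, the agnostic case can be handled by Hoeffding's inequality plus a union bound: $m \ge \frac{2}{\epsilon^2}\left(\ln|\mathcal{H}| + \ln\frac{2}{\delta}\right)$ samples make every empirical risk $\epsilon/2$-accurate simultaneously, so ERM is $\epsilon$-competitive. A sketch using the same arbitrary example inputs as the realizable case:

```python
import math

def agnostic_sample_complexity(h_size, eps, delta):
    """Hoeffding + union bound for a finite class in the agnostic case:
    m >= (2/eps**2) * (ln|H| + ln(2/delta))."""
    return math.ceil((math.log(h_size) + math.log(2 / delta)) * 2 / eps**2)

# Same class of 2**20 hypotheses, 1% excess error, 95% confidence:
print(agnostic_sample_complexity(2**20, 0.01, 0.05))  # → 351037
```

Comparing with the realizable bound for the same inputs shows the quadratic blow-up from $1/\epsilon$ to $1/\epsilon^2$.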

Agnostic PAC is strictly harder; sample complexity depends on VC dimension (see vc_dimension).

Consistent learner: an algorithm that always returns an $h \in \mathcal{H}$ with zero empirical error on the training set. For realizable PAC, any consistent learner is a valid PAC learner.
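A consistent learner is easy to exhibit for 1-D thresholds: return any threshold with zero training error. A minimal sketch, where the target threshold 0.3, the uniform distribution, and the sample size are arbitrary demo choices:

```python
import random

random.seed(1)
T_STAR = 0.3  # the unknown target threshold (an assumption for this demo)

def label(x):
    """Labels produced by the hidden target concept."""
    return 1 if x >= T_STAR else 0

def consistent_threshold(sample):
    """Return a threshold with zero empirical error: the smallest
    positively-labelled point (or 1.0 if no positives were seen)."""
    positives = [x for x, y in sample if y == 1]
    return min(positives) if positives else 1.0

S = [(x, label(x)) for x in (random.random() for _ in range(500))]
t_hat = consistent_threshold(S)

# Under the uniform distribution, the true error is the mass on which the
# learned threshold disagrees with the target: t_hat - T_STAR.
print(t_hat - T_STAR)  # small, and shrinks as the sample grows
```

The gap $\hat{t} - t^*$ is the probability that no sample landed just above $t^*$, which vanishes with $m$, illustrating why consistency suffices in the realizable case.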

Occam’s razor / compression: simpler hypotheses (shorter description length $|h|$) require fewer samples. This is the formal justification for preferring simple models.
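The compression argument can be made concrete by substituting $|\mathcal{H}| \le 2^{b}$ for hypotheses describable in $b$ bits into the finite-class realizable bound; a sketch, where the bit counts and the $\epsilon$, $\delta$ values are arbitrary:

```python
import math

def occam_sample_complexity(bits, eps, delta):
    """Occam-style bound: a hypothesis describable in `bits` bits lives in a
    class of size at most 2**bits, so the realizable bound gives
    m >= (1/eps) * (bits * ln 2 + ln(1/delta))."""
    return math.ceil((bits * math.log(2) + math.log(1 / delta)) / eps)

# Halving the description length roughly halves the sample requirement:
print(occam_sample_complexity(200, 0.05, 0.01),
      occam_sample_complexity(100, 0.05, 0.01))  # → 2865 1479
```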

Computational vs statistical learnability: PAC is a statistical framework; even if a class is statistically PAC learnable, finding the optimal $h \in \mathcal{H}$ may be NP-hard (e.g., learning intersections of halfspaces).

Applications

| Concept | Connection to PAC |
| --- | --- |
| Sample complexity bounds | Lower bound on $m$ to achieve $(\epsilon, \delta)$-PAC |
| Regularization | Constraining $\mathcal{H}$ (smaller size) reduces required $m$ |
| Model selection | Simpler models ↔ smaller $|\mathcal{H}|$ |
| Cross-validation | Empirical substitute for estimating true risk |

Trade-offs

  • PAC bounds are often loose (sometimes vacuous for realistic hypothesis classes and sample sizes); data-dependent bounds (e.g., Rademacher complexity) are tighter in practice.
  • The framework assumes i.i.d. data; distribution shift invalidates PAC guarantees.
  • PAC learning ignores computational cost; a class can be PAC learnable but computationally hard to learn.