Gradient Checking

Definition

Numerically approximates partial derivatives using the centered difference formula and compares them against analytical gradients produced by backpropagation. Used exclusively as a debugging tool to verify a backprop implementation is correct.

Intuition

Backpropagation is easy to implement incorrectly — sign errors, missing terms, wrong tensor shapes, and off-by-one errors all produce gradients that look plausible but are subtly wrong. Gradient checking acts as a unit test: if the numerical and analytical gradients agree to high precision, the implementation is almost certainly correct.

Formal Description

Centered difference approximation for parameter $\theta_i$:

$$\frac{\partial J}{\partial \theta_i} \approx \frac{J(\theta + \varepsilon e_i) - J(\theta - \varepsilon e_i)}{2\varepsilon}$$

where $e_i$ is the $i$-th standard basis vector and $\varepsilon > 0$ is a small constant.

The centered difference has error $O(\varepsilon^2)$ vs. $O(\varepsilon)$ for the one-sided difference $\frac{J(\theta + \varepsilon e_i) - J(\theta)}{\varepsilon}$, making it significantly more accurate.
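The accuracy gap is easy to verify numerically. This sketch (plain Python, using $\sin$ as a toy objective whose exact derivative is $\cos$) compares both approximations against the true derivative:

```python
import math

def forward_diff(f, x, eps):
    # One-sided difference: truncation error O(eps)
    return (f(x + eps) - f(x)) / eps

def centered_diff(f, x, eps):
    # Centered difference: truncation error O(eps^2)
    return (f(x + eps) - f(x - eps)) / (2 * eps)

x, eps = 1.0, 1e-4
true_grad = math.cos(x)  # d/dx sin(x) = cos(x)
err_forward = abs(forward_diff(math.sin, x, eps) - true_grad)
err_centered = abs(centered_diff(math.sin, x, eps) - true_grad)
print(err_forward, err_centered)  # centered is orders of magnitude closer
```

With $\varepsilon = 10^{-4}$, the one-sided error is on the order of $10^{-5}$ while the centered error is on the order of $10^{-9}$.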

Relative error metric:

$$\text{relative error} = \frac{\lVert g_{\text{approx}} - g_{\text{backprop}} \rVert_2}{\lVert g_{\text{approx}} \rVert_2 + \lVert g_{\text{backprop}} \rVert_2}$$

Interpretation:

  Relative error        Assessment
  $< 10^{-7}$           Great — implementation very likely correct
  $10^{-7}$ to $10^{-3}$  Acceptable, borderline
  $> 10^{-3}$           Likely a bug in backprop
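A minimal check can be sketched as follows. The `grad_check` helper and its norm-based relative error are illustrative, not a fixed API; it perturbs one parameter at a time and compares against a supplied analytical gradient:

```python
import numpy as np

def grad_check(J, grad_J, theta, eps=1e-7):
    """Compare grad_J(theta) against a centered-difference approximation
    of J around theta; return the norm-based relative error."""
    analytic = grad_J(theta)
    numeric = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = eps  # perturb only the i-th parameter
        numeric[i] = (J(theta + e) - J(theta - e)) / (2 * eps)
    return np.linalg.norm(numeric - analytic) / (
        np.linalg.norm(numeric) + np.linalg.norm(analytic))

# Toy objective J(theta) = sum(theta^2), whose gradient is 2*theta
theta = np.array([0.5, -1.2, 3.0])
err = grad_check(lambda t: np.sum(t**2), lambda t: 2 * t, theta)
print(err)  # a correct gradient should land well below 1e-7
```

Swapping the analytical gradient for a buggy one (e.g. `lambda t: t` instead of `2 * t`) pushes the relative error above $10^{-3}$, flagging the bug.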

Applications

  • Debugging new backpropagation implementations before training
  • Catching sign errors, missing regularization terms, wrong matrix transpositions
  • Verifying custom layers or loss functions in deep learning frameworks

Trade-offs

  • Computational cost: requires $2n$ forward passes, where $n$ is the number of parameters — completely impractical for training; use only on small models or a parameter subset
  • Incompatible with dropout: stochastic layers produce different values on each forward pass, making the numerical approximation meaningless; disable dropout during checks
  • Must include regularization: $J(\theta)$ in the numerical approximation must match the objective used in backprop exactly, including any $L_2$ penalty
  • Numerical precision: very small $\varepsilon$ causes floating-point cancellation; very large $\varepsilon$ increases approximation error; $\varepsilon \approx 10^{-7}$ is a standard choice
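The precision trade-off is visible in a short sweep. This sketch (again using $\sin$ as a toy objective) evaluates the centered difference at three values of $\varepsilon$: a large one dominated by truncation error, a tiny one dominated by floating-point cancellation, and a moderate one that balances both:

```python
import math

def centered_diff(f, x, eps):
    return (f(x + eps) - f(x - eps)) / (2 * eps)

true_grad = math.cos(1.0)
errors = {eps: abs(centered_diff(math.sin, 1.0, eps) - true_grad)
          for eps in (1e-1, 1e-5, 1e-13)}
# 1e-1 suffers O(eps^2) truncation error; 1e-13 suffers cancellation
# (the two function values agree in nearly all their digits);
# the moderate 1e-5 lands closest to the true derivative.
print(errors)
```

The same U-shaped error curve applies to gradient checking: both extremes of $\varepsilon$ inflate the relative error even when backprop is correct.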