Gradient Checking
Definition
Numerically approximates partial derivatives using the centered difference formula and compares them against analytical gradients produced by backpropagation. Used exclusively as a debugging tool to verify a backprop implementation is correct.
Intuition
Backpropagation is easy to implement incorrectly — sign errors, missing terms, wrong tensor shapes, and off-by-one errors all produce gradients that look plausible but are subtly wrong. Gradient checking acts as a unit test: if the numerical and analytical gradients agree to high precision, the implementation is almost certainly correct.
Formal Description
Centered difference approximation for parameter $\theta_i$:

$$\frac{\partial J}{\partial \theta_i} \approx \frac{J(\theta + \varepsilon e_i) - J(\theta - \varepsilon e_i)}{2\varepsilon}$$

where $e_i$ is the $i$-th standard basis vector and $\varepsilon$ is a small perturbation.
The centered difference has error $O(\varepsilon^2)$ vs. $O(\varepsilon)$ for the one-sided difference, making it significantly more accurate.
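As a sketch, the centered-difference approximation can be computed one parameter at a time. A minimal NumPy version, with illustrative function and variable names:

```python
import numpy as np

def numerical_gradient(J, theta, eps=1e-7):
    """Approximate the gradient of a scalar loss J at theta
    using the centered difference formula."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e_i = np.zeros_like(theta)
        e_i[i] = eps                    # eps times the i-th standard basis vector
        grad[i] = (J(theta + e_i) - J(theta - e_i)) / (2 * eps)
    return grad

# Toy check: J(theta) = sum(theta_i^2) has analytical gradient 2*theta.
theta = np.array([1.0, -2.0, 3.0])
num_grad = numerical_gradient(lambda t: np.sum(t ** 2), theta)
```

Note the loop makes one pair of forward passes per parameter, which is why the technique only scales to small models or parameter subsets.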
Relative error metric:

$$\delta = \frac{\lVert g_{\text{num}} - g_{\text{backprop}} \rVert_2}{\lVert g_{\text{num}} \rVert_2 + \lVert g_{\text{backprop}} \rVert_2}$$
Interpretation:
| Relative error | Assessment |
|---|---|
| $< 10^{-7}$ | Great: implementation very likely correct |
| $10^{-7}$ to $10^{-4}$ | Acceptable, borderline |
| $> 10^{-4}$ | Likely a bug in backprop |
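A sketch of the relative-error metric and the rule-of-thumb verdicts from the table (all names are illustrative; the thresholds follow the table above):

```python
import numpy as np

def relative_error(g_num, g_backprop):
    """Scale-invariant distance between numerical and backprop gradients."""
    diff = np.linalg.norm(g_num - g_backprop)
    scale = np.linalg.norm(g_num) + np.linalg.norm(g_backprop)
    return diff / scale if scale > 0 else 0.0   # both zero: perfect agreement

def assess(err):
    """Map a relative error onto the rule-of-thumb verdicts."""
    if err < 1e-7:
        return "great"
    if err < 1e-4:
        return "borderline"
    return "likely bug"

g_backprop = np.array([2.0, -4.0, 6.0])   # pretend backprop output
g_num = g_backprop + 1e-9                 # tiny numerical noise
verdict = assess(relative_error(g_num, g_backprop))
```

Dividing by the sum of the two norms (rather than just one) keeps the metric well-behaved when either gradient is near zero.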
Applications
- Debugging new backpropagation implementations before training
- Catching sign errors, missing regularization terms, wrong matrix transpositions
- Verifying custom layers or loss functions in deep learning frameworks
Trade-offs
- Computational cost: requires $2n$ forward passes, where $n$ is the number of parameters; completely impractical for training; use only on small models or a parameter subset
- Incompatible with dropout: stochastic layers produce different values on each forward pass, making the numerical approximation meaningless; disable dropout during checks
- Must include regularization: the objective $J$ evaluated in the numerical approximation must match the $J$ used in backprop exactly, including any regularization penalty
- Numerical precision: very small $\varepsilon$ causes floating-point cancellation; very large $\varepsilon$ increases truncation error; $\varepsilon \approx 10^{-7}$ is a standard choice
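The precision trade-off is easy to see on a function with a known derivative. A sketch using f(x) = exp(x), whose derivative is exp(x) itself:

```python
import numpy as np

def centered_diff(f, x, eps):
    """Centered difference approximation of f'(x)."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

x = 1.0
true_deriv = np.exp(x)          # d/dx exp(x) = exp(x)

# The error is not monotonic in eps: large eps suffers O(eps^2)
# truncation error, while very small eps suffers float64 cancellation
# in the subtraction f(x + eps) - f(x - eps).
for eps in (1e-1, 1e-4, 1e-7, 1e-12):
    err = abs(centered_diff(np.exp, x, eps) - true_deriv)
    print(f"eps={eps:8.0e}  |error|={err:.2e}")
```

Running this shows the error shrinking as eps decreases, then growing again once cancellation dominates, which is why moderate values such as the one quoted above are used in practice.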