Singular Value Decomposition (SVD)

Definition

Every real matrix $A \in \mathbb{R}^{m \times n}$ can be factorised as

$$A = U \Sigma V^\top$$

where $U \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$ are orthogonal matrices and $\Sigma \in \mathbb{R}^{m \times n}$ is a diagonal matrix with non-negative entries (the singular values), $\sigma_1 \ge \sigma_2 \ge \dots \ge 0$.
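A minimal NumPy sketch of the definition (the matrix values are an arbitrary example): compute the full SVD, rebuild the rectangular $\Sigma$, and verify the factorisation and the orthogonality of $U$ and $V$.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])          # 3x2, so U is 3x3 and Vt is 2x2

U, S, Vt = np.linalg.svd(A)         # S holds the singular values, descending

# Rebuild the 3x2 "diagonal" Sigma and check the reconstruction.
Sigma = np.zeros(A.shape)
np.fill_diagonal(Sigma, S)
assert np.allclose(A, U @ Sigma @ Vt)

# U and V are orthogonal: U^T U = I, V V^T = I.
assert np.allclose(U.T @ U, np.eye(3))
assert np.allclose(Vt @ Vt.T, np.eye(2))
```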

Intuition

Every linear map decomposes into three geometric steps: (1) rotate/reflect input space ($V^\top$), (2) scale along each axis ($\Sigma$), and (3) rotate/reflect output space ($U$). The singular values measure how strongly the map stretches in each direction; the first, $\sigma_1$, is the largest and captures the most “action”.

This is the most general matrix factorisation: it always exists, requires no special structure (unlike eigendecomposition), and applies to rectangular matrices.

Formal Description

Thin (economy) SVD: keep only the $r = \operatorname{rank}(A)$ non-zero singular values:

$$A = U_r \Sigma_r V_r^\top, \qquad U_r \in \mathbb{R}^{m \times r},\ \Sigma_r \in \mathbb{R}^{r \times r},\ V_r \in \mathbb{R}^{n \times r}$$
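In NumPy the economy version is `full_matrices=False`, which keeps only $\min(m, n)$ columns; a small sketch on a random tall matrix:

```python
import numpy as np

# Economy SVD: full_matrices=False drops the columns of U beyond min(m, n),
# which are never needed to reconstruct A.
A = np.random.default_rng(0).standard_normal((100, 5))

U, S, Vt = np.linalg.svd(A, full_matrices=False)
print(U.shape, S.shape, Vt.shape)    # (100, 5) (5,) (5, 5)

assert np.allclose(A, (U * S) @ Vt)  # broadcasting stands in for diag(S)
```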

Relationship to eigendecomposition:

$$A^\top A = V (\Sigma^\top \Sigma) V^\top, \qquad A A^\top = U (\Sigma \Sigma^\top) U^\top$$

The singular values are $\sigma_i = \sqrt{\lambda_i(A^\top A)}$; the columns of $V$ are eigenvectors of $A^\top A$ (right singular vectors); the columns of $U$ are eigenvectors of $A A^\top$ (left singular vectors).
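The relation $\sigma_i = \sqrt{\lambda_i(A^\top A)}$ is easy to verify numerically; a quick check on a random matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))

# Singular values of A vs. square roots of eigenvalues of A^T A.
S = np.linalg.svd(A, compute_uv=False)
eigvals = np.linalg.eigvalsh(A.T @ A)[::-1]   # eigvalsh sorts ascending; reverse

assert np.allclose(S, np.sqrt(eigvals))
```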

Rank-$k$ approximation (Eckart-Young theorem):

$$A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^\top$$

minimises $\|A - B\|_F$ (Frobenius) and $\|A - B\|_2$ (spectral) over all rank-$k$ matrices $B$. The approximation error is $\|A - A_k\|_2 = \sigma_{k+1}$.
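Truncation in practice, with a numerical check that the spectral error is exactly $\sigma_{k+1}$ (matrix and $k$ are arbitrary examples):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 6))
U, S, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
A_k = (U[:, :k] * S[:k]) @ Vt[:k, :]          # best rank-k approximation

# Largest singular value of the residual equals sigma_{k+1}.
spectral_err = np.linalg.svd(A - A_k, compute_uv=False)[0]
assert np.isclose(spectral_err, S[k])

# Frobenius error is the root-sum-square of the discarded singular values.
assert np.isclose(np.linalg.norm(A - A_k), np.sqrt(np.sum(S[k:] ** 2)))
```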

Moore-Penrose pseudoinverse:

$$A^+ = V \Sigma^+ U^\top$$

where $\Sigma^+$ replaces each non-zero $\sigma_i$ by $1/\sigma_i$ (and is transposed to $n \times m$). Gives the minimum-norm least-squares solution to $Ax = b$: $\hat{x} = A^+ b$.
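Building $A^+$ from the thin SVD and cross-checking against NumPy's built-ins (random full-rank example, so every $\sigma_i > 0$):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((10, 3))              # overdetermined, full column rank
b = rng.standard_normal(10)

U, S, Vt = np.linalg.svd(A, full_matrices=False)
A_pinv = Vt.T @ np.diag(1.0 / S) @ U.T        # V Sigma^+ U^T

assert np.allclose(A_pinv, np.linalg.pinv(A))

# Minimum-norm least-squares solution of Ax = b.
x = A_pinv @ b
assert np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0])
```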

Condition number:

$$\kappa(A) = \frac{\sigma_{\max}}{\sigma_{\min}}$$

A large condition number indicates a near-singular (ill-conditioned) matrix; small perturbations in $A$ or $b$ can cause large changes in the solution of $Ax = b$.
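A quick sketch: the singular-value ratio on a nearly singular matrix matches `np.linalg.cond` (which uses the 2-norm by default):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])                 # nearly singular

S = np.linalg.svd(A, compute_uv=False)
kappa = S[0] / S[-1]                          # sigma_max / sigma_min

assert np.isclose(kappa, np.linalg.cond(A))
print(kappa)                                  # very large: ill-conditioned
```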

Numerical computation: the Golub-Reinsch algorithm computes the SVD of an $m \times n$ matrix ($m \ge n$) in $O(mn^2)$ time via bidiagonalisation followed by QR iteration.

Applications

| Application | How SVD is used |
| --- | --- |
| PCA | Principal components = right singular vectors of the centred data matrix $X$; $\sigma_i^2 / (n-1)$ = variance explained by component $i$ |
| Low-rank approximation | Compress images, text co-occurrence matrices, recommendation systems |
| Pseudoinverse / least squares | Solve overdetermined or rank-deficient systems |
| Latent Semantic Analysis (LSA) | Compress the term-document matrix to $k$ latent semantic dimensions |
| Noise reduction | Truncate small singular values to remove noise |
| Numerical linear algebra | Condition-number estimation, regularization |

Trade-offs

  • Full SVD costs $O(\min(m^2 n, m n^2))$ for an $m \times n$ matrix; use randomized SVD (e.g., sklearn.utils.extmath.randomized_svd) for large matrices when only the top-$k$ factors are needed.
  • Numerically more stable than eigendecomposition of $A^\top A$ (which squares the condition number).
  • Rank-$k$ truncation is the globally optimal low-rank approximation: no other rank-$k$ factorisation is better in Frobenius or spectral norm.
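A toy version of the randomized-SVD idea in plain NumPy (a sketch of the approach behind `sklearn.utils.extmath.randomized_svd`, not its actual implementation): project onto a random low-dimensional subspace, orthonormalise, then take an exact SVD of the small projected matrix.

```python
import numpy as np

def randomized_svd(A, k, oversample=10, seed=0):
    """Approximate top-k SVD via a random range finder (toy sketch)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Range finder: Y approximately spans the top-k column space of A.
    Y = A @ rng.standard_normal((n, k + oversample))
    Q, _ = np.linalg.qr(Y)                    # orthonormal basis, m x (k+p)
    # Exact SVD of the small matrix B = Q^T A, then lift U back up.
    B = Q.T @ A
    U_small, S, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ U_small)[:, :k], S[:k], Vt[:k, :]

# On an exactly rank-3 matrix, the top-3 factors are recovered (almost) exactly.
rng = np.random.default_rng(4)
A = rng.standard_normal((200, 3)) @ rng.standard_normal((3, 100))
U, S, Vt = randomized_svd(A, k=3)
assert np.allclose(A, (U * S) @ Vt)
```

The oversampling parameter trades a little extra work for much better accuracy when the singular values decay slowly.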