# Singular Value Decomposition (SVD)

## Definition

Every real matrix $A \in \mathbb{R}^{m \times n}$ can be factorised as

$$A = U \Sigma V^\top$$

where $U \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$ are orthogonal matrices and $\Sigma \in \mathbb{R}^{m \times n}$ is a diagonal matrix with non-negative entries (the singular values), $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0$.
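A minimal NumPy sketch of the factorisation, using a small made-up matrix (the values are hypothetical, chosen only for illustration):

```python
import numpy as np

# A small rectangular example matrix (hypothetical data).
A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])  # shape (3, 2)

# Full SVD: U is (3, 3) orthogonal, Vt is (2, 2) orthogonal,
# s holds the singular values in descending order.
U, s, Vt = np.linalg.svd(A)

# Rebuild the rectangular diagonal Sigma and verify A = U @ Sigma @ Vt.
Sigma = np.zeros_like(A)
np.fill_diagonal(Sigma, s)
assert np.allclose(A, U @ Sigma @ Vt)
assert s[0] >= s[1] >= 0  # singular values are sorted and non-negative
```

Note that `np.linalg.svd` returns $V^\top$ (as `Vt`), not $V$.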
## Intuition

Every linear map decomposes into three geometric steps: (1) rotate/reflect the input space ($V^\top$), (2) scale along each axis ($\Sigma$), and (3) rotate/reflect the output space ($U$). The singular values measure how strongly the map stretches in each direction: the first is the largest and captures the most “action”.
This is the most general matrix factorisation: it always exists, requires no special structure (unlike eigendecomposition), and applies to rectangular matrices.
## Formal Description

Thin (economy) SVD: keep only the $r = \operatorname{rank}(A)$ non-zero singular values:

$$A = U_r \Sigma_r V_r^\top, \qquad U_r \in \mathbb{R}^{m \times r},\; \Sigma_r \in \mathbb{R}^{r \times r},\; V_r \in \mathbb{R}^{n \times r}$$
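In NumPy the thin SVD is `full_matrices=False`; a quick sketch on a random tall matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 5))  # tall matrix, rank at most 5

# Economy SVD: only r = 5 columns of U and rows of Vt are kept.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
assert U.shape == (100, 5) and s.shape == (5,) and Vt.shape == (5, 5)

# (U * s) broadcasts s as diag(s), so this is U_r @ Sigma_r @ Vt_r.
assert np.allclose(A, (U * s) @ Vt)
```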
Relationship to eigendecomposition: the singular values are $\sigma_i = \sqrt{\lambda_i(A^\top A)}$; the columns of $V$ are eigenvectors of $A^\top A$ (right singular vectors); the columns of $U$ are eigenvectors of $AA^\top$ (left singular vectors).
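This relationship can be checked numerically on a random matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))

# Singular values of A vs. eigenvalues of A^T A.
s = np.linalg.svd(A, compute_uv=False)           # descending order
eigvals = np.linalg.eigvalsh(A.T @ A)[::-1]      # eigvalsh is ascending; reverse

# sigma_i == sqrt(lambda_i(A^T A)); clip guards tiny negative round-off.
assert np.allclose(s, np.sqrt(np.clip(eigvals, 0, None)))
```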
Rank-$k$ approximation (Eckart-Young theorem):

$$A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^\top$$

minimises $\|A - B\|_F$ (Frobenius) and $\|A - B\|_2$ (spectral) over all rank-$k$ matrices $B$. The approximation errors are $\|A - A_k\|_2 = \sigma_{k+1}$ and $\|A - A_k\|_F = \sqrt{\sum_{i > k} \sigma_i^2}$.
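A short sketch verifying both error formulas on random data:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 6))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]  # truncated (best rank-k) approximation

# Spectral-norm error equals the first discarded singular value.
assert np.isclose(np.linalg.norm(A - A_k, ord=2), s[k])
# Frobenius error equals sqrt of the sum of squared discarded values.
assert np.isclose(np.linalg.norm(A - A_k, ord='fro'),
                  np.sqrt(np.sum(s[k:] ** 2)))
```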
Moore-Penrose pseudoinverse:

$$A^+ = V \Sigma^+ U^\top$$

where $\Sigma^+$ replaces each non-zero $\sigma_i$ by $1/\sigma_i$ (and transposes the shape). It gives the minimum-norm least-squares solution to $Ax = b$: $x^* = A^+ b$.
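`np.linalg.pinv` computes the pseudoinverse via the SVD; a sketch on a hypothetical overdetermined system, cross-checked against `lstsq`:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((10, 3))  # overdetermined: 10 equations, 3 unknowns
b = rng.standard_normal(10)

# Least-squares solution via the pseudoinverse A+ = V Sigma+ U^T.
x = np.linalg.pinv(A) @ b

# Same minimum-norm least-squares solution from the dedicated solver.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x, x_lstsq)
```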
Condition number:

$$\kappa_2(A) = \frac{\sigma_{\max}}{\sigma_{\min}}$$

A large condition number indicates a near-singular (ill-conditioned) matrix; small perturbations in $A$ or $b$ can cause large changes in the solution of $Ax = b$.
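A sketch with a deliberately near-singular matrix (the second row is almost a multiple of the first, hypothetical values):

```python
import numpy as np

# Nearly singular: the rows are almost linearly dependent.
A = np.array([[1.0, 2.0],
              [1.0, 2.0001]])

s = np.linalg.svd(A, compute_uv=False)
kappa = s[0] / s[-1]                       # sigma_max / sigma_min

# Matches NumPy's 2-norm condition number, and is huge for this matrix.
assert np.isclose(kappa, np.linalg.cond(A, p=2))
assert kappa > 1e4
```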
Numerical computation: the Golub-Reinsch algorithm computes the SVD in $O(mn^2)$ time (for $m \ge n$) via bidiagonalization followed by QR iteration.
## Applications
| Application | How SVD is used |
|---|---|
| PCA | Principal components = right singular vectors of the centred data matrix $X$; $\sigma_i^2/(n-1)$ = variance explained |
| Low-rank approximation | Compress images, text co-occurrence matrices, recommendation systems |
| Pseudoinverse / least squares | Solve overdetermined or rank-deficient systems |
| Latent Semantic Analysis (LSA) | Compress term-document matrix to latent semantic dimensions |
| Noise reduction | Truncate small singular values to remove noise |
| Numerical linear algebra | Matrix condition estimation, regularization |
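The PCA row of the table can be sketched directly: centre the data, take its SVD, and read off the explained variances (data here is random, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical anisotropic data: large variance along the first axis.
X = rng.standard_normal((200, 3)) @ np.diag([3.0, 1.0, 0.1])

Xc = X - X.mean(axis=0)                  # centre the data
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Principal components are the rows of Vt; explained variance is s^2 / (n-1).
explained_var = s ** 2 / (len(X) - 1)

# Identical to the eigenvalues of the sample covariance matrix.
cov_eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]
assert np.allclose(explained_var, cov_eigvals)
```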
## Trade-offs
- Full SVD costs $O(\min(mn^2, m^2n))$ for $A \in \mathbb{R}^{m \times n}$; use randomized SVD (e.g., `sklearn.utils.extmath.randomized_svd`) for large matrices when only the top-$k$ factors are needed.
- Numerically more stable than eigendecomposition of $A^\top A$ (which squares the condition number).
- Rank-$k$ truncation is the globally optimal low-rank approximation: no other rank-$k$ factorisation is better in Frobenius or spectral norm.
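A minimal NumPy sketch of the randomized-SVD idea (the Halko-Martinsson-Tropp range-finder scheme, not sklearn's actual implementation; function name and parameters are illustrative):

```python
import numpy as np

def randomized_svd(A, k, n_oversample=10, seed=0):
    """Sketch: top-k SVD via a random projection (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # 1. A @ Omega captures the dominant column space of A.
    Omega = rng.standard_normal((A.shape[1], k + n_oversample))
    Q, _ = np.linalg.qr(A @ Omega)       # orthonormal basis for that space
    # 2. Exact SVD of the much smaller projected matrix B = Q^T A.
    U_b, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    return (Q @ U_b)[:, :k], s[:k], Vt[:k, :]

# On an exactly rank-3 matrix, the top-3 factors are recovered accurately.
rng = np.random.default_rng(5)
A = rng.standard_normal((300, 3)) @ rng.standard_normal((3, 100))
U, s, Vt = randomized_svd(A, k=3)
assert np.allclose(s, np.linalg.svd(A, compute_uv=False)[:3])
assert np.allclose(A, (U * s) @ Vt)
```

The cost is dominated by the two thin matrix products, which is why this wins when $k \ll \min(m, n)$.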
## Links
- Eigenvalues and Eigenvectors — the singular values of $A$ are the square roots of the eigenvalues of $A^\top A$
- Matrix Diagonalization — SVD generalises eigendecomposition to rectangular matrices
- Orthogonal Projections — the columns of $U$, $V$ are orthonormal bases for the column/row spaces of $A$
- PCA and Unsupervised Learning