# Singular Value Decomposition (SVD)

## Definition

Every real matrix $A \in \mathbb{R}^{m \times n}$ can be factorised as

$$A = U \Sigma V^\top$$

where $U \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$ are orthogonal matrices and $\Sigma \in \mathbb{R}^{m \times n}$ is a diagonal matrix with non-negative entries (the singular values), $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0$.
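A minimal NumPy sketch of the factorisation, using a small made-up matrix (the values are hypothetical, chosen only for illustration):

```python
import numpy as np

# A small rectangular example matrix (hypothetical data).
A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])  # shape (3, 2)

# Full SVD: U is (3, 3) orthogonal, Vt is (2, 2) orthogonal,
# s holds the singular values in descending order.
U, s, Vt = np.linalg.svd(A)

# Rebuild the rectangular diagonal Sigma and verify A = U @ Sigma @ Vt.
Sigma = np.zeros_like(A)
np.fill_diagonal(Sigma, s)
assert np.allclose(A, U @ Sigma @ Vt)
assert s[0] >= s[1] >= 0  # singular values are sorted and non-negative
```

Note that `np.linalg.svd` returns $V^\top$ (as `Vt`), not $V$.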
## Intuition

Every linear map decomposes into three geometric steps: (1) rotate/reflect the input space ($V^\top$), (2) scale along each axis ($\Sigma$), and (3) rotate/reflect the output space ($U$). The singular values measure how strongly the map stretches in each direction: the first is the largest and captures the most “action”.
This is the most general matrix factorisation: it always exists, requires no special structure (unlike eigendecomposition), and applies to rectangular matrices.
## Formal Description

Thin (economy) SVD: keep only the $r = \operatorname{rank}(A)$ non-zero singular values:

$$A = U_r \Sigma_r V_r^\top, \qquad U_r \in \mathbb{R}^{m \times r},\; \Sigma_r \in \mathbb{R}^{r \times r},\; V_r \in \mathbb{R}^{n \times r}$$
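In NumPy the thin SVD is `full_matrices=False`; a quick sketch on a random tall matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 5))  # tall matrix, rank at most 5

# Economy SVD: only r = 5 columns of U and rows of Vt are kept.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
assert U.shape == (100, 5) and s.shape == (5,) and Vt.shape == (5, 5)

# (U * s) broadcasts s as diag(s), so this is U_r @ Sigma_r @ Vt_r.
assert np.allclose(A, (U * s) @ Vt)
```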
Relationship to eigendecomposition: the singular values are $\sigma_i = \sqrt{\lambda_i(A^\top A)}$; the columns of $V$ are eigenvectors of $A^\top A$ (right singular vectors); the columns of $U$ are eigenvectors of $AA^\top$ (left singular vectors).
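This relationship can be checked numerically on a random matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))

# Singular values of A vs. eigenvalues of A^T A.
s = np.linalg.svd(A, compute_uv=False)           # descending order
eigvals = np.linalg.eigvalsh(A.T @ A)[::-1]      # eigvalsh is ascending; reverse

# sigma_i == sqrt(lambda_i(A^T A)); clip guards tiny negative round-off.
assert np.allclose(s, np.sqrt(np.clip(eigvals, 0, None)))
```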
Rank-$k$ approximation (Eckart-Young theorem):

$$A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^\top$$

minimises $\|A - B\|_F$ (Frobenius) and $\|A - B\|_2$ (spectral) over all rank-$k$ matrices $B$. The approximation errors are $\|A - A_k\|_2 = \sigma_{k+1}$ and $\|A - A_k\|_F = \sqrt{\sum_{i > k} \sigma_i^2}$.
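A short sketch verifying both error formulas on random data:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 6))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]  # truncated (best rank-k) approximation

# Spectral-norm error equals the first discarded singular value.
assert np.isclose(np.linalg.norm(A - A_k, ord=2), s[k])
# Frobenius error equals sqrt of the sum of squared discarded values.
assert np.isclose(np.linalg.norm(A - A_k, ord='fro'),
                  np.sqrt(np.sum(s[k:] ** 2)))
```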
Moore-Penrose pseudoinverse:

$$A^+ = V \Sigma^+ U^\top$$

where $\Sigma^+$ replaces each non-zero $\sigma_i$ by $1/\sigma_i$ (and transposes the shape). It gives the minimum-norm least-squares solution to $Ax = b$: $x^* = A^+ b$.
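`np.linalg.pinv` computes the pseudoinverse via the SVD; a sketch on a hypothetical overdetermined system, cross-checked against `lstsq`:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((10, 3))  # overdetermined: 10 equations, 3 unknowns
b = rng.standard_normal(10)

# Least-squares solution via the pseudoinverse A+ = V Sigma+ U^T.
x = np.linalg.pinv(A) @ b

# Same minimum-norm least-squares solution from the dedicated solver.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x, x_lstsq)
```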
Condition number:

$$\kappa_2(A) = \frac{\sigma_{\max}}{\sigma_{\min}}$$

A large condition number indicates a near-singular (ill-conditioned) matrix; small perturbations in $A$ or $b$ can cause large changes in the solution of $Ax = b$.
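A sketch with a deliberately near-singular matrix (the second row is almost a multiple of the first, hypothetical values):

```python
import numpy as np

# Nearly singular: the rows are almost linearly dependent.
A = np.array([[1.0, 2.0],
              [1.0, 2.0001]])

s = np.linalg.svd(A, compute_uv=False)
kappa = s[0] / s[-1]                       # sigma_max / sigma_min

# Matches NumPy's 2-norm condition number, and is huge for this matrix.
assert np.isclose(kappa, np.linalg.cond(A, p=2))
assert kappa > 1e4
```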
Numerical computation: the Golub-Reinsch algorithm computes the SVD in $O(mn^2)$ time (for $m \ge n$) via bidiagonalization followed by QR iteration.
## Applications
| Application | How SVD is used |
|---|---|
| PCA | Principal components = right singular vectors of the centred data matrix $X$; $\sigma_i^2/(n-1)$ = variance explained |
| Low-rank approximation | Compress images, text co-occurrence matrices, recommendation systems |
| Pseudoinverse / least squares | Solve overdetermined or rank-deficient systems |
| Latent Semantic Analysis (LSA) | Compress term-document matrix to latent semantic dimensions |
| Noise reduction | Truncate small singular values to remove noise |
| Numerical linear algebra | Matrix condition estimation, regularization |
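The PCA row of the table can be sketched directly: centre the data, take its SVD, and read off the explained variances (data here is random, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical anisotropic data: large variance along the first axis.
X = rng.standard_normal((200, 3)) @ np.diag([3.0, 1.0, 0.1])

Xc = X - X.mean(axis=0)                  # centre the data
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Principal components are the rows of Vt; explained variance is s^2 / (n-1).
explained_var = s ** 2 / (len(X) - 1)

# Identical to the eigenvalues of the sample covariance matrix.
cov_eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]
assert np.allclose(explained_var, cov_eigvals)
```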
## Trade-offs
- Full SVD costs $O(\min(mn^2, m^2n))$ for $A \in \mathbb{R}^{m \times n}$; use randomized SVD (e.g., `sklearn.utils.extmath.randomized_svd`) for large matrices when only the top-$k$ factors are needed.
- Numerically more stable than eigendecomposition of $A^\top A$ (which squares the condition number).
- Rank-$k$ truncation is the globally optimal low-rank approximation: no other rank-$k$ factorisation is better in Frobenius or spectral norm.
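A minimal NumPy sketch of the randomized-SVD idea (the Halko-Martinsson-Tropp range-finder scheme, not sklearn's actual implementation; function name and parameters are illustrative):

```python
import numpy as np

def randomized_svd(A, k, n_oversample=10, seed=0):
    """Sketch: top-k SVD via a random projection (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # 1. A @ Omega captures the dominant column space of A.
    Omega = rng.standard_normal((A.shape[1], k + n_oversample))
    Q, _ = np.linalg.qr(A @ Omega)       # orthonormal basis for that space
    # 2. Exact SVD of the much smaller projected matrix B = Q^T A.
    U_b, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    return (Q @ U_b)[:, :k], s[:k], Vt[:k, :]

# On an exactly rank-3 matrix, the top-3 factors are recovered accurately.
rng = np.random.default_rng(5)
A = rng.standard_normal((300, 3)) @ rng.standard_normal((3, 100))
U, s, Vt = randomized_svd(A, k=3)
assert np.allclose(s, np.linalg.svd(A, compute_uv=False)[:3])
assert np.allclose(A, (U * s) @ Vt)
```

The cost is dominated by the two thin matrix products, which is why this wins when $k \ll \min(m, n)$.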
## Links
- Eigenvalues and Eigenvectors — the singular values of $A$ are the square roots of the eigenvalues of $A^\top A$
- Matrix Diagonalization — SVD generalises eigendecomposition to rectangular matrices
- Orthogonal Projections — the columns of $U$, $V$ are orthonormal bases for the column/row spaces of $A$
- PCA and Unsupervised Learning