Spectral Theorem and Symmetric Matrices
Definition
A real symmetric matrix $A \in \mathbb{R}^{n \times n}$, $A = A^\top$, is always orthogonally diagonalisable:
$$A = Q \Lambda Q^\top$$
where $Q$ is orthogonal ($Q^\top Q = I$, columns are orthonormal eigenvectors) and $\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$ contains real eigenvalues. This is the spectral theorem for real symmetric matrices.
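The decomposition is easy to verify numerically. A minimal sketch with numpy, using a small symmetric matrix whose values are purely illustrative:

```python
import numpy as np

# Small symmetric matrix (values chosen purely for illustration).
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# eigh is specialised for symmetric input: it returns real eigenvalues
# in ascending order and orthonormal eigenvectors as the columns of Q.
eigvals, Q = np.linalg.eigh(A)
Lam = np.diag(eigvals)

assert np.allclose(Q.T @ Q, np.eye(2))   # Q is orthogonal
assert np.allclose(Q @ Lam @ Q.T, A)     # A = Q Λ Qᵀ
```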
Intuition
Symmetric matrices arise whenever a quantity is “self-adjoint” — e.g., covariance matrices, Hessians, graph Laplacians. The spectral theorem says these matrices are always nicely diagonalisable: their eigenvectors are mutually perpendicular and span the whole space, and all eigenvalues are real. Viewed in its eigenbasis, the matrix performs no rotation; it simply stretches along orthogonal axes.
Formal Description
Spectral decomposition:
$$A = \sum_{i=1}^{n} \lambda_i\, q_i q_i^\top$$
The outer products $q_i q_i^\top$ are rank-1 orthogonal projection matrices. $A$ is a weighted sum of projections onto its eigendirections, with eigenvalues as weights.
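The rank-1 expansion can be checked directly, again with assumed illustrative values:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
eigvals, Q = np.linalg.eigh(A)

# Rebuild A as a weighted sum of rank-1 projections q_i q_iᵀ.
A_rebuilt = sum(lam * np.outer(q, q) for lam, q in zip(eigvals, Q.T))
assert np.allclose(A_rebuilt, A)

# Each outer product is an orthogonal projection: idempotent and symmetric.
P = np.outer(Q[:, 0], Q[:, 0])
assert np.allclose(P @ P, P) and np.allclose(P, P.T)
```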
Positive semidefinite (PSD) matrices:
$A$ is PSD if $x^\top A x \ge 0$ for all $x \in \mathbb{R}^n$. Equivalent conditions:
- All eigenvalues $\lambda_i \ge 0$
- $A = B^\top B$ for some matrix $B$ (Cholesky-like factorisation)
- All principal minors are non-negative
Positive definite (PD): $x^\top A x > 0$ for all $x \ne 0$; all eigenvalues $\lambda_i > 0$; $A$ is invertible.
Covariance matrices $\Sigma = \frac{1}{n} X^\top X$ (for centred data $X$) are always PSD: for any $v$, $v^\top \Sigma v = \frac{1}{n} \|X v\|^2 \ge 0$.
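A quick numerical check of the covariance argument, on synthetic (assumed) data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X = X - X.mean(axis=0)            # centre the columns
Sigma = X.T @ X / len(X)          # sample covariance

# Quadratic forms vᵀΣv = ‖Xv‖²/n are never negative.
v = rng.normal(size=3)
assert v @ Sigma @ v >= 0
assert np.isclose(v @ Sigma @ v, np.linalg.norm(X @ v) ** 2 / len(X))
```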
Cholesky decomposition: every PD matrix factors as $A = L L^\top$ where $L$ is lower-triangular with positive diagonal. More numerically stable than eigendecomposition for solving linear systems.
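A sketch of solving a linear system via Cholesky, with an illustrative PD matrix:

```python
import numpy as np

A = np.array([[4.0, 2.0],
              [2.0, 3.0]])        # symmetric positive definite

L = np.linalg.cholesky(A)         # lower-triangular, positive diagonal
assert np.allclose(L @ L.T, A)

# Solve A x = b with two triangular solves: L y = b, then Lᵀ x = y.
b = np.array([1.0, 2.0])
y = np.linalg.solve(L, b)
x = np.linalg.solve(L.T, y)
assert np.allclose(A @ x, b)
```

In practice `scipy.linalg.cho_factor`/`cho_solve` exploit the triangular structure; `np.linalg.solve` is used here only to keep the sketch numpy-only.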
Functions of symmetric matrices: via spectral decomposition, matrix functions are well-defined:
$$f(A) = Q\, f(\Lambda)\, Q^\top$$
where $f$ is applied elementwise to the diagonal of $\Lambda$. Examples: $A^{1/2} = Q \Lambda^{1/2} Q^\top$ (matrix square root, for PSD $A$), $\exp(A) = Q \exp(\Lambda) Q^\top$.
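Both examples reduce to applying the scalar function to the eigenvalues; a minimal sketch with assumed values:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])        # symmetric, eigenvalues > 0
eigvals, Q = np.linalg.eigh(A)

# Apply the scalar function on the spectrum, then reassemble.
sqrtA = Q @ np.diag(np.sqrt(eigvals)) @ Q.T
expA = Q @ np.diag(np.exp(eigvals)) @ Q.T

assert np.allclose(sqrtA @ sqrtA, A)     # (A^{1/2})² = A
assert np.allclose(expA, expA.T)         # exp(A) is symmetric too
```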
Rayleigh quotient:
$$R(x) = \frac{x^\top A x}{x^\top x}, \qquad \max_{x \ne 0} R(x) = \lambda_{\max}$$
The maximum is attained at the leading eigenvector (used in power iteration and PCA).
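A sketch of power iteration recovering the leading eigenvalue via the Rayleigh quotient (matrix values and iteration count are illustrative):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# Power iteration: repeated multiplication pulls any starting vector
# toward the leading eigenvector; normalise to avoid overflow.
x = np.ones(2)
for _ in range(100):
    x = A @ x
    x /= np.linalg.norm(x)

rayleigh = x @ A @ x / (x @ x)           # Rayleigh quotient at the iterate
assert np.isclose(rayleigh, np.linalg.eigvalsh(A).max())
```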
Applications
| Concept | Role of spectral theorem |
|---|---|
| PCA | Eigendecomposition of covariance matrix; eigenvectors = principal directions |
| Kernel methods | Kernel matrix is PSD; Mercer’s theorem guarantees spectral expansion |
| Quadratic forms in optimization | Hessian is symmetric; PD Hessian ↔ strictly convex, unique minimum |
| Graph Laplacian | $L = D - W$ is symmetric PSD; eigenvalues encode connectivity |
| Gaussian distributions | Covariance $\Sigma$ must be PSD for $\mathcal{N}(\mu, \Sigma)$ to be valid |
Trade-offs
- The spectral theorem only applies to symmetric (or normal) matrices; general square matrices require Jordan normal form, not a clean eigendecomposition.
- Numerical eigendecomposition of symmetric matrices is well-conditioned and reliable; use `numpy.linalg.eigh` (not `eig`) for symmetric matrices: it is faster and guarantees real eigenvalues.
- PSD testing in practice: add a small jitter $\epsilon I$ (Tikhonov regularisation) to handle numerical near-singularity.
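One way to sketch the jitter trick, using a deliberately rank-deficient covariance (fewer samples than dimensions; all values assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 10))      # n < d, so Σ is rank-deficient
Sigma = X.T @ X / len(X)          # exact zero eigenvalues may round negative

# Tolerance-based PSD check via the smallest eigenvalue.
def is_psd(A, eps=1e-10):
    return np.linalg.eigvalsh(A).min() >= -eps

assert is_psd(Sigma)

# A small jitter εI shifts every eigenvalue up by ε, so Cholesky
# succeeds on the numerically near-singular matrix.
eps = 1e-8
L = np.linalg.cholesky(Sigma + eps * np.eye(10))
assert np.allclose(L @ L.T, Sigma + eps * np.eye(10))
```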
Links
- Eigenvalues and Eigenvectors
- Singular Value Decomposition — SVD of symmetric PSD coincides with eigendecomposition
- Orthogonal Projections
- Convex Optimization — PD Hessian ↔ strictly convex function
- Multivariate Gaussian (covariance must be PSD)