Orthogonal Projections
Definition
Let $V$ be an $n$-dimensional inner product space and $U \subseteq V$ a $k$-dimensional subspace with orthonormal basis $\{u_1, \dots, u_k\}$. The orthogonal projection of $x \in V$ onto $U$ is the unique vector $\pi_U(x) \in U$ closest to $x$.
Intuition
Orthogonal projection drops a perpendicular from $x$ onto the subspace $U$: the foot of that perpendicular is the projection $\pi_U(x)$. The residual $x - \pi_U(x)$ is orthogonal to every vector in $U$, which is exactly what "closest point" means. When $U = \operatorname{col}(A)$ is the column space of a matrix $A$, projecting $b$ onto $U$ finds the nearest achievable output $A\hat{x}$ to $b$, which is the least-squares solution.
Formal Description
Any $x \in V$ can be written using a full orthonormal basis $\{u_1, \dots, u_n\}$ of $V$ as

$$x = \sum_{i=1}^{n} \lambda_i u_i.$$

The orthogonal projection onto $U$ retains only the $U$-components, i.e. the first $k$ terms:

$$\pi_U(x) = \sum_{i=1}^{k} \lambda_i u_i,$$

where the coefficients are computed via inner products: $\lambda_i = \langle u_i, x \rangle$. Therefore:

$$\pi_U(x) = \sum_{i=1}^{k} \langle u_i, x \rangle\, u_i.$$
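The coefficient formula can be checked numerically. A minimal sketch, with an illustrative orthonormal pair $u_1, u_2$ in $\mathbb{R}^3$ and a test vector $x$ chosen for this example (none of these values come from the text):

```python
import numpy as np

# Illustrative orthonormal basis of a 2-dimensional subspace U of R^3.
u1 = np.array([1.0, 0.0, 0.0])
u2 = np.array([0.0, 1.0, 0.0])
x = np.array([3.0, 4.0, 5.0])

# pi_U(x) = sum_i <u_i, x> u_i
proj = sum(np.dot(u, x) * u for u in (u1, u2))
print(proj)  # [3. 4. 0.]

# The residual x - pi_U(x) is orthogonal to every basis vector of U.
residual = x - proj
print(np.dot(residual, u1), np.dot(residual, u2))  # 0.0 0.0
```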
Minimality. $\pi_U(x)$ is the closest point in $U$ to $x$. For any $y \in U$ with expansion $y = \sum_{i=1}^{k} \mu_i u_i$:

$$\|x - y\|^2 = \sum_{i=1}^{k} (\lambda_i - \mu_i)^2 + \sum_{i=k+1}^{n} \lambda_i^2 \;\ge\; \sum_{i=k+1}^{n} \lambda_i^2 = \|x - \pi_U(x)\|^2,$$

with equality exactly when $\mu_i = \lambda_i$ for all $i \le k$.
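The minimality claim can be spot-checked by sampling other points of the subspace and comparing distances. A sketch with hypothetical data (the basis, test vector, and sampling scheme are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
u1 = np.array([1.0, 0.0, 0.0])
u2 = np.array([0.0, 1.0, 0.0])
x = np.array([3.0, 4.0, 5.0])
proj = np.dot(u1, x) * u1 + np.dot(u2, x) * u2

# Every sampled y in U is at least as far from x as the projection is.
for _ in range(1000):
    mu = rng.normal(size=2)
    y = mu[0] * u1 + mu[1] * u2
    assert np.linalg.norm(x - y) >= np.linalg.norm(x - proj)
```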
Projection matrix. When $V = \mathbb{R}^n$ and the columns of $A \in \mathbb{R}^{n \times k}$ are linearly independent, the projection of $x$ onto $U = \operatorname{col}(A)$ is given by

$$\pi_U(x) = A(A^\top A)^{-1} A^\top x.$$

The matrix $P = A(A^\top A)^{-1} A^\top$ is idempotent: $P^2 = P$.
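The matrix formula and idempotence can be verified directly. A minimal sketch with a hypothetical $3 \times 2$ matrix $A$ whose columns are independent but not orthonormal:

```python
import numpy as np

# Illustrative basis matrix with linearly independent columns.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])
x = np.array([1.0, 2.0, 3.0])

# P = A (A^T A)^{-1} A^T projects onto col(A).
P = A @ np.linalg.inv(A.T @ A) @ A.T
proj = P @ x

# Idempotence: projecting a second time changes nothing.
print(np.allclose(P @ P, P))        # True
print(np.allclose(P @ proj, proj))  # True
```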
Applications
- Least-squares regression projects the target onto the column space of the design matrix.
- Gram-Schmidt uses successive projections to build an orthonormal basis.
- Signal processing: projecting a signal onto a subspace removes noise components outside the subspace.
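The Gram-Schmidt application above can be sketched directly from the projection formula: each new vector has its projections onto the basis built so far subtracted out, and the remainder is normalised. A minimal classical Gram-Schmidt, assuming linearly independent inputs (the function name and example vectors are illustrative):

```python
import numpy as np

def gram_schmidt(vectors):
    """Build an orthonormal basis by subtracting successive projections."""
    basis = []
    for v in vectors:
        # Remove the component of v lying in the span of the basis so far.
        w = v - sum(np.dot(u, v) * u for u in basis)
        basis.append(w / np.linalg.norm(w))
    return basis

vecs = [np.array([1.0, 1.0, 0.0]), np.array([1.0, 0.0, 1.0])]
q1, q2 = gram_schmidt(vecs)
print(np.dot(q1, q2))  # ~0 (orthogonal up to floating-point error)
```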
Trade-offs
The formula requires $A^\top A$ to be invertible, i.e. $A$ must have full column rank. If $A$ is rank-deficient, use the Moore-Penrose pseudoinverse $A^{+}$ instead. Explicitly forming $P = A(A^\top A)^{-1} A^\top$ costs $O(n^2)$ memory for an $n \times n$ matrix and is wasteful when only $\pi_U(x)$ is needed; in practice, solve the normal equations $A^\top A \lambda = A^\top x$ directly rather than materialising $P$.
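The normal-equations route can be sketched as follows; `np.linalg.lstsq` (SVD-based) is shown alongside as the robust alternative that also covers the rank-deficient case. The matrix and vector are hypothetical example data:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])
x = np.array([1.0, 2.0, 3.0])

# Solve A^T A lam = A^T x instead of forming P = A (A^T A)^{-1} A^T.
lam = np.linalg.solve(A.T @ A, A.T @ x)
proj = A @ lam

# lstsq handles rank-deficient A via the pseudoinverse route.
lam_lstsq, *_ = np.linalg.lstsq(A, x, rcond=None)
print(np.allclose(proj, A @ lam_lstsq))  # True
```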