Least Squares and Normal Equations
Statement
Given an over-determined system $Ax = b$ (where $m > n$), the vector $\hat{x}$ that minimises $\|Ax - b\|_2$ satisfies the normal equations:

$$A^T A \hat{x} = A^T b.$$

When the columns of $A$ are linearly independent, the solution is unique: $\hat{x} = (A^T A)^{-1} A^T b$.
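As a minimal numerical sketch (with made-up data; the matrix `A` and vector `b` below are illustrative, not from the text), the normal equations can be solved directly and checked against NumPy's built-in least-squares routine:

```python
import numpy as np

# Illustrative over-determined system: m = 5 equations, n = 2 unknowns.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 2))
b = rng.standard_normal(5)

# Solve the normal equations A^T A x = A^T b.
x_hat = np.linalg.solve(A.T @ A, A.T @ b)

# Cross-check against NumPy's least-squares solver.
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x_hat, x_ref)
```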
Assumptions
- $A$ is $m \times n$ with $m > n$ (over-determined).
- The columns of $A$ are linearly independent (full column rank), so $A^T A$ is invertible.
- $b \in \mathbb{R}^m$.
Proof Sketch
The minimum of $\|Ax - b\|$ is achieved when $Ax$ is as close to $b$ as possible inside $\operatorname{Col}(A)$ — i.e., when $A\hat{x} = p$, the orthogonal projection of $b$ onto $\operatorname{Col}(A)$. The residual $r = b - A\hat{x}$ must be orthogonal to every column of $A$, which gives $A^T(b - A\hat{x}) = 0$, i.e., the normal equations.
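The orthogonality condition can be verified numerically; this sketch uses made-up data points, and checks that the residual of the least-squares solution is orthogonal to every column of $A$:

```python
import numpy as np

# Made-up over-determined system.
A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

# Least-squares solution via the normal equations.
x_hat = np.linalg.solve(A.T @ A, A.T @ b)

# Residual r = b - A x̂ is orthogonal to Col(A): A^T r = 0
# (up to floating-point round-off).
r = b - A @ x_hat
assert np.allclose(A.T @ r, 0.0)
```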
Full Proof
Step 1: Decompose $b$. Write $b = p + e$, where $p \in \operatorname{Col}(A)$ and $e$ is orthogonal to $\operatorname{Col}(A)$. Since $p \in \operatorname{Col}(A)$, we have $p = A\hat{x}$ for some $\hat{x}$, so $e = b - A\hat{x}$.
Step 2: Optimality condition. For any $x$:

$$\|Ax - b\|^2 = \|Ax - p\|^2 + \|e\|^2,$$

using the Pythagorean theorem (since $Ax - p \in \operatorname{Col}(A)$ and $e \perp \operatorname{Col}(A)$). This is minimised when $Ax = p$, i.e., $x = \hat{x}$.
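The Pythagorean split in Step 2 can be checked numerically for an arbitrary candidate $x$; the data here is made up for illustration:

```python
import numpy as np

# Made-up system; p is the projection of b onto Col(A), e the orthogonal part.
rng = np.random.default_rng(1)
A = rng.standard_normal((6, 3))
b = rng.standard_normal(6)

x_hat = np.linalg.solve(A.T @ A, A.T @ b)
p = A @ x_hat          # projection of b onto Col(A)
e = b - p              # component orthogonal to Col(A)

# For any x: ||Ax - b||^2 = ||Ax - p||^2 + ||e||^2.
x = rng.standard_normal(3)
lhs = np.linalg.norm(A @ x - b) ** 2
rhs = np.linalg.norm(A @ x - p) ** 2 + np.linalg.norm(e) ** 2
assert np.isclose(lhs, rhs)
```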
Step 3: Derive normal equations. Multiply $A\hat{x} = p = b - e$ on the left by $A^T$:

$$A^T A \hat{x} = A^T b - A^T e = A^T b,$$

since $e \perp \operatorname{Col}(A)$ implies $A^T e = 0$.
Step 4: Unique solution. By full column rank, $A^T A$ is symmetric positive definite and hence invertible:

$$\hat{x} = (A^T A)^{-1} A^T b.$$
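A quick check of the closed form on made-up data (the explicit inverse is fine for illustration, though in practice a solver is preferred over forming an inverse):

```python
import numpy as np

# Made-up full-column-rank system.
A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
b = np.array([1.0, 2.0, 2.0, 4.0])

# Closed form x̂ = (A^T A)^{-1} A^T b.
x_closed = np.linalg.inv(A.T @ A) @ A.T @ b

# Agrees with NumPy's least-squares solution.
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x_closed, x_ref)
```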
Projection matrix. The projection $p = A\hat{x} = A(A^T A)^{-1} A^T b = Pb$, where $P = A(A^T A)^{-1} A^T$ satisfies $P^2 = P$ (idempotent). If $A$ is square and invertible, $P = I$.
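Both projection properties can be verified numerically; the matrices below are made-up examples:

```python
import numpy as np

rng = np.random.default_rng(2)

# Tall full-column-rank A: P = A (A^T A)^{-1} A^T is an orthogonal projector.
A = rng.standard_normal((5, 2))
P = A @ np.linalg.inv(A.T @ A) @ A.T
assert np.allclose(P @ P, P)   # idempotent: P^2 = P
assert np.allclose(P, P.T)     # symmetric

# Square invertible A: the projector collapses to the identity.
A_sq = rng.standard_normal((3, 3))
P_sq = A_sq @ np.linalg.inv(A_sq.T @ A_sq) @ A_sq.T
assert np.allclose(P_sq, np.eye(3))
```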
Example (linear regression). Fitting a line $y = c_0 + c_1 t$ through data points $(t_1, y_1), \dots, (t_m, y_m)$ gives

$$A = \begin{pmatrix} 1 & t_1 \\ \vdots & \vdots \\ 1 & t_m \end{pmatrix}, \qquad b = \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix},$$

yielding the intercept $c_0$ and slope $c_1$ from $A^T A \hat{x} = A^T b$, so the least-squares line is $y = c_0 + c_1 t$.
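A concrete instance of the regression setup, using made-up points $(0,6), (1,0), (2,0)$ (illustrative data, not necessarily the text's original example):

```python
import numpy as np

# Fit y ≈ c0 + c1 t through three made-up points.
t = np.array([0.0, 1.0, 2.0])
y = np.array([6.0, 0.0, 0.0])

A = np.column_stack([np.ones_like(t), t])   # design matrix: columns [1, t]
c0, c1 = np.linalg.solve(A.T @ A, A.T @ y)

# For these points the least-squares line is y = 5 - 3t.
assert np.isclose(c0, 5.0) and np.isclose(c1, -3.0)
```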
Notes / Intuition
The normal equations say: the residual must be orthogonal to the column space — which is exactly what “nearest point in a subspace” means. Numerically, solving the normal equations by forming $A^T A$ squares the condition number ($\kappa_2(A^T A) = \kappa_2(A)^2$); using QR factorisation ($A = QR$, then solving $R\hat{x} = Q^T b$) is preferred for numerical stability.
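The QR route can be sketched as follows (made-up, well-conditioned data, so both routes agree here; the point is that QR never forms $A^T A$, whose condition number is the square of $A$'s):

```python
import numpy as np

# Made-up over-determined system.
rng = np.random.default_rng(3)
A = rng.standard_normal((8, 3))
b = rng.standard_normal(8)

# QR route: A = QR, then solve the triangular system R x̂ = Q^T b.
Q, R = np.linalg.qr(A)                    # reduced QR factorisation
x_qr = np.linalg.solve(R, Q.T @ b)

# Normal-equations route, for comparison.
x_ne = np.linalg.solve(A.T @ A, A.T @ b)
assert np.allclose(x_qr, x_ne)

# Forming A^T A squares the 2-norm condition number.
assert np.isclose(np.linalg.cond(A.T @ A), np.linalg.cond(A) ** 2, rtol=1e-6)
```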