Least Squares and Normal Equations

Statement

Given an over-determined system $Ax = b$, where $A \in \mathbb{R}^{m \times n}$ with $m > n$, the vector $\hat{x}$ that minimises $\|Ax - b\|_2$ satisfies the normal equations:

$$A^\top A\,\hat{x} = A^\top b.$$

When the columns of $A$ are linearly independent, the solution is unique: $\hat{x} = (A^\top A)^{-1} A^\top b$.
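
The statement can be checked numerically with NumPy; the matrix and right-hand side below are arbitrary illustrative choices, not data from the text:

```python
import numpy as np

# Arbitrary over-determined system: 5 equations, 2 unknowns
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([1.0, 2.0, 2.0, 3.0, 5.0])

# Solve the normal equations A^T A x = A^T b directly
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Compare with NumPy's built-in least-squares routine
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(x_normal, x_lstsq))  # True
```

Both routes give the same minimiser because the columns of this $A$ are linearly independent.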

Assumptions

  • $A$ is $m \times n$ with $m > n$ (over-determined).
  • The columns of $A$ are linearly independent (full column rank), so $A^\top A$ is invertible.
  • $b \in \mathbb{R}^m$.

Proof Sketch

The minimum of $\|Ax - b\|_2$ is achieved when $Ax$ is as close to $b$ as possible inside the column space $\operatorname{col}(A)$, i.e., when $Ax = p$, the orthogonal projection of $b$ onto $\operatorname{col}(A)$. The residual $b - A\hat{x}$ must be orthogonal to every column of $A$, which gives $A^\top(b - A\hat{x}) = 0$, i.e., the normal equations.

Full Proof

Step 1: Decompose $b$. Write $b = p + e$, where $p \in \operatorname{col}(A)$ and $e = b - p$ is orthogonal to $\operatorname{col}(A)$. Since $p \in \operatorname{col}(A)$, we have $p = A\hat{x}$ for some $\hat{x}$, so $e = b - A\hat{x}$.

Step 2: Optimality condition. For any $x$:

$$\|Ax - b\|^2 = \|Ax - p\|^2 + \|e\|^2,$$

using the Pythagorean theorem (since $Ax - p \in \operatorname{col}(A)$ and $e \perp \operatorname{col}(A)$). This is minimised when $Ax = p$, i.e., $x = \hat{x}$.
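
The decomposition in Step 1 and the Pythagorean identity in Step 2 can be verified numerically; the matrix, right-hand side, and trial point $x$ below are arbitrary random choices:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))  # random A has full column rank (almost surely)
b = rng.standard_normal(6)

# hat_x solves the normal equations; p = A hat_x is the projection of b
hat_x = np.linalg.solve(A.T @ A, A.T @ b)
p = A @ hat_x
e = b - p

# e is orthogonal to col(A): A^T e = 0
print(np.allclose(A.T @ e, 0))  # True

# Pythagoras: ||Ax - b||^2 = ||Ax - p||^2 + ||e||^2 for any x
x = rng.standard_normal(3)
lhs = np.linalg.norm(A @ x - b) ** 2
rhs = np.linalg.norm(A @ x - p) ** 2 + np.linalg.norm(e) ** 2
print(np.isclose(lhs, rhs))  # True
```

Since the $\|e\|^2$ term does not depend on $x$, the objective is minimised exactly when $Ax = p$.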

Step 3: Derive normal equations. The orthogonality $e = b - A\hat{x} \perp \operatorname{col}(A)$ means every column of $A$ has zero inner product with $e$. Multiply on the left by $A^\top$:

$$A^\top(b - A\hat{x}) = 0 \quad\Longrightarrow\quad A^\top A\,\hat{x} = A^\top b.$$
Step 4: Unique solution. By full column rank, $A^\top A$ is symmetric positive definite and hence invertible:

$$\hat{x} = (A^\top A)^{-1} A^\top b.$$
Projection matrix. The projection $p = Pb$ with $P = A(A^\top A)^{-1}A^\top$ satisfies $P^2 = P$ (idempotent). If $A$ is square and invertible, $P = I$, since then $\operatorname{col}(A)$ is the whole space.
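
A quick numerical check of the projection matrix's properties, on arbitrary random matrices:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 2))

# Projection onto col(A): P = A (A^T A)^{-1} A^T
# (solve (A^T A) X = A^T rather than forming the inverse explicitly)
P = A @ np.linalg.solve(A.T @ A, A.T)

print(np.allclose(P @ P, P))  # idempotent: P^2 = P
print(np.allclose(P.T, P))    # also symmetric

# For a square invertible matrix, the projection is the identity
B = rng.standard_normal((3, 3))
PB = B @ np.linalg.solve(B.T @ B, B.T)
print(np.allclose(PB, np.eye(3)))  # True
```

Idempotence reflects the geometry: projecting a vector that is already in $\operatorname{col}(A)$ changes nothing.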

Example (linear regression). Fitting a line $y = C + Dt$ through the points $(1,1)$, $(2,2)$, $(3,2)$ gives

$$A = \begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix}, \quad b = \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix}, \quad A^\top A = \begin{bmatrix} 3 & 6 \\ 6 & 14 \end{bmatrix}, \quad A^\top b = \begin{bmatrix} 5 \\ 11 \end{bmatrix},$$

yielding $C = \tfrac{2}{3}$, $D = \tfrac{1}{2}$, so the least-squares line is $y = \tfrac{2}{3} + \tfrac{1}{2}t$.
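
A worked regression of this kind can be reproduced in a few lines; the data points $(1,1)$, $(2,2)$, $(3,2)$ below are a standard textbook example, assumed here for illustration:

```python
import numpy as np

# Data points (t_i, y_i); fit the line y = C + D t
t = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 2.0])

# Design matrix: a column of ones for the intercept, then t
A = np.column_stack([np.ones_like(t), t])

# Solve the normal equations A^T A [C, D] = A^T y
C, D = np.linalg.solve(A.T @ A, A.T @ y)

print(C, D)  # C = 2/3, D = 1/2
```

The same `column_stack` pattern extends to polynomial fits: add columns $t^2, t^3, \dots$ and the normal equations are unchanged in form.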

Notes / Intuition

The normal equations say: the residual $b - A\hat{x}$ must be orthogonal to the column space $\operatorname{col}(A)$, which is exactly what "nearest point in a subspace" means. Numerically, solving the normal equations by forming $A^\top A$ squares the condition number, $\kappa(A^\top A) = \kappa(A)^2$; using QR factorisation ($A = QR$, then solving $R\hat{x} = Q^\top b$) is preferred for numerical stability.
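
Both numerical claims can be demonstrated directly; the matrix and right-hand side below are arbitrary random choices:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 3))
b = rng.standard_normal(8)

# Forming A^T A squares the condition number
print(np.isclose(np.linalg.cond(A.T @ A), np.linalg.cond(A) ** 2))  # True

# QR route: A = QR, then solve the triangular system R x = Q^T b
Q, R = np.linalg.qr(A)
x_qr = np.linalg.solve(R, Q.T @ b)

# Agrees with the reference least-squares solution
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_qr, x_lstsq))  # True
```

For well-conditioned problems both routes agree; the QR route's advantage appears when $\kappa(A)$ is large, since the normal equations then lose roughly twice as many digits.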