Least Squares and Normal Equations

Statement

Given an over-determined system $Ax = b$, where $A \in \mathbb{R}^{m \times n}$ with $m > n$, the vector $\hat{x}$ that minimises $\|Ax - b\|_2$ satisfies the normal equations:

$$A^\top A\,\hat{x} = A^\top b.$$

When the columns of $A$ are linearly independent, the solution is unique: $\hat{x} = (A^\top A)^{-1} A^\top b$.
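
The statement can be checked numerically with NumPy; the matrix and right-hand side below are arbitrary illustrative choices, not data from the text:

```python
import numpy as np

# Arbitrary over-determined system: 5 equations, 2 unknowns
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([1.0, 2.0, 2.0, 3.0, 5.0])

# Solve the normal equations A^T A x = A^T b directly
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Compare with NumPy's built-in least-squares routine
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(x_normal, x_lstsq))  # True
```

Both routes give the same minimiser because the columns of this $A$ are linearly independent.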

Assumptions

  • $A$ is $m \times n$ with $m > n$ (over-determined).
  • The columns of $A$ are linearly independent (full column rank), so $A^\top A$ is invertible.
  • $b \in \mathbb{R}^m$.

Proof Sketch

The minimum of $\|Ax - b\|_2$ is achieved when $Ax$ is as close to $b$ as possible inside the column space $\operatorname{col}(A)$, i.e., when $Ax = p$, the orthogonal projection of $b$ onto $\operatorname{col}(A)$. The residual $b - A\hat{x}$ must be orthogonal to every column of $A$, which gives $A^\top(b - A\hat{x}) = 0$, i.e., the normal equations.

Full Proof

Step 1: Decompose $b$. Write $b = p + e$, where $p \in \operatorname{col}(A)$ and $e = b - p$ is orthogonal to $\operatorname{col}(A)$. Since $p \in \operatorname{col}(A)$, we have $p = A\hat{x}$ for some $\hat{x}$, so $e = b - A\hat{x}$.

Step 2: Optimality condition. For any $x$:

$$\|Ax - b\|^2 = \|Ax - p\|^2 + \|e\|^2,$$

using the Pythagorean theorem (since $Ax - p \in \operatorname{col}(A)$ and $e \perp \operatorname{col}(A)$). This is minimised when $Ax = p$, i.e., $x = \hat{x}$.
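
The decomposition in Step 1 and the Pythagorean identity in Step 2 can be verified numerically; the matrix, right-hand side, and trial point $x$ below are arbitrary random choices:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))  # random A has full column rank (almost surely)
b = rng.standard_normal(6)

# hat_x solves the normal equations; p = A hat_x is the projection of b
hat_x = np.linalg.solve(A.T @ A, A.T @ b)
p = A @ hat_x
e = b - p

# e is orthogonal to col(A): A^T e = 0
print(np.allclose(A.T @ e, 0))  # True

# Pythagoras: ||Ax - b||^2 = ||Ax - p||^2 + ||e||^2 for any x
x = rng.standard_normal(3)
lhs = np.linalg.norm(A @ x - b) ** 2
rhs = np.linalg.norm(A @ x - p) ** 2 + np.linalg.norm(e) ** 2
print(np.isclose(lhs, rhs))  # True
```

Since the $\|e\|^2$ term does not depend on $x$, the objective is minimised exactly when $Ax = p$.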

Step 3: Derive normal equations. The orthogonality $e = b - A\hat{x} \perp \operatorname{col}(A)$ means every column of $A$ has zero inner product with $e$. Multiply on the left by $A^\top$:

$$A^\top(b - A\hat{x}) = 0 \quad\Longrightarrow\quad A^\top A\,\hat{x} = A^\top b.$$
Step 4: Unique solution. By full column rank, $A^\top A$ is symmetric positive definite and hence invertible:

$$\hat{x} = (A^\top A)^{-1} A^\top b.$$
Projection matrix. The projection $p = Pb$ with $P = A(A^\top A)^{-1}A^\top$ satisfies $P^2 = P$ (idempotent). If $A$ is square and invertible, $P = I$, since then $\operatorname{col}(A)$ is the whole space.
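
A quick numerical check of the projection matrix's properties, on arbitrary random matrices:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 2))

# Projection onto col(A): P = A (A^T A)^{-1} A^T
# (solve (A^T A) X = A^T rather than forming the inverse explicitly)
P = A @ np.linalg.solve(A.T @ A, A.T)

print(np.allclose(P @ P, P))  # idempotent: P^2 = P
print(np.allclose(P.T, P))    # also symmetric

# For a square invertible matrix, the projection is the identity
B = rng.standard_normal((3, 3))
PB = B @ np.linalg.solve(B.T @ B, B.T)
print(np.allclose(PB, np.eye(3)))  # True
```

Idempotence reflects the geometry: projecting a vector that is already in $\operatorname{col}(A)$ changes nothing.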

Example (linear regression). Fitting a line $y = C + Dt$ through the points $(1,1)$, $(2,2)$, $(3,2)$ gives

$$A = \begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix}, \quad b = \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix}, \quad A^\top A = \begin{bmatrix} 3 & 6 \\ 6 & 14 \end{bmatrix}, \quad A^\top b = \begin{bmatrix} 5 \\ 11 \end{bmatrix},$$

yielding $C = \tfrac{2}{3}$, $D = \tfrac{1}{2}$, so the least-squares line is $y = \tfrac{2}{3} + \tfrac{1}{2}t$.
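
A worked regression of this kind can be reproduced in a few lines; the data points $(1,1)$, $(2,2)$, $(3,2)$ below are a standard textbook example, assumed here for illustration:

```python
import numpy as np

# Data points (t_i, y_i); fit the line y = C + D t
t = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 2.0])

# Design matrix: a column of ones for the intercept, then t
A = np.column_stack([np.ones_like(t), t])

# Solve the normal equations A^T A [C, D] = A^T y
C, D = np.linalg.solve(A.T @ A, A.T @ y)

print(C, D)  # C = 2/3, D = 1/2
```

The same `column_stack` pattern extends to polynomial fits: add columns $t^2, t^3, \dots$ and the normal equations are unchanged in form.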

Notes / Intuition

The normal equations say: the residual $b - A\hat{x}$ must be orthogonal to the column space $\operatorname{col}(A)$, which is exactly what "nearest point in a subspace" means. Numerically, solving the normal equations by forming $A^\top A$ squares the condition number, $\kappa(A^\top A) = \kappa(A)^2$; using QR factorisation ($A = QR$, then solving $R\hat{x} = Q^\top b$) is preferred for numerical stability.
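
Both numerical claims can be demonstrated directly; the matrix and right-hand side below are arbitrary random choices:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 3))
b = rng.standard_normal(8)

# Forming A^T A squares the condition number
print(np.isclose(np.linalg.cond(A.T @ A), np.linalg.cond(A) ** 2))  # True

# QR route: A = QR, then solve the triangular system R x = Q^T b
Q, R = np.linalg.qr(A)
x_qr = np.linalg.solve(R, Q.T @ b)

# Agrees with the reference least-squares solution
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_qr, x_lstsq))  # True
```

For well-conditioned problems both routes agree; the QR route's advantage appears when $\kappa(A)$ is large, since the normal equations then lose roughly twice as many digits.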