Norms
Definition
A norm is a function mapping vectors to non-negative values, measuring the "size" of a vector or its distance from the origin. A function $f$ is a norm if it satisfies:
- $f(x) = 0 \Rightarrow x = 0$ (definiteness)
- $f(x + y) \le f(x) + f(y)$ (triangle inequality)
- $f(\alpha x) = |\alpha|\, f(x)$ for all $\alpha \in \mathbb{R}$ (absolute homogeneity)
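As a quick sanity check, the axioms can be verified numerically for the Euclidean norm on arbitrary vectors (a sketch using NumPy; `np.linalg.norm` computes the $L^2$ norm by default):

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.normal(size=3), rng.normal(size=3)
alpha = -2.5

norm = np.linalg.norm  # Euclidean (L2) norm by default

# Non-negativity and definiteness
assert norm(x) >= 0 and norm(np.zeros(3)) == 0

# Triangle inequality: ||x + y|| <= ||x|| + ||y||
assert norm(x + y) <= norm(x) + norm(y)

# Absolute homogeneity: ||alpha * x|| = |alpha| * ||x||
assert np.isclose(norm(alpha * x), abs(alpha) * norm(x))
```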
Intuition
Different norms measure "size" with different geometries. The $L^2$ norm measures straight-line distance; the $L^1$ norm measures Manhattan (city-block) distance and is robust to outliers; the $L^\infty$ norm is controlled by the single largest component. Choosing the right norm shapes the geometry of optimisation: $L^1$ encourages sparsity, $L^2$ encourages smoothness.
Formal Description
The $L^p$ norm is defined as
$$\|x\|_p = \left(\sum_i |x_i|^p\right)^{1/p}, \quad p \ge 1.$$
$L^2$ norm (Euclidean norm). $\|x\|_2 = \sqrt{\sum_i x_i^2}$, often written simply $\|x\|$. The squared $L^2$ norm, $\|x\|_2^2 = x^\top x$, is often preferred computationally since its derivatives depend only on individual elements ($\partial \|x\|_2^2 / \partial x_i = 2x_i$), whereas the derivative of $\|x\|_2$ itself depends on the entire vector.
$L^1$ norm. $\|x\|_1 = \sum_i |x_i|$. Grows at the same rate everywhere; useful when distinguishing exact zeros from small nonzero values.
$L^\infty$ norm (max norm). Equals the absolute value of the largest-magnitude element: $\|x\|_\infty = \max_i |x_i|$.
Frobenius norm (for matrices): $\|A\|_F = \sqrt{\sum_{i,j} A_{i,j}^2}$, the matrix analogue of the $L^2$ norm.
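The formulas above can be computed directly and checked against NumPy's built-in, where the `ord` argument of `np.linalg.norm` selects the norm (a sketch):

```python
import numpy as np

x = np.array([3.0, -4.0, 0.0])

l2 = np.sqrt(np.sum(x**2))   # Euclidean norm -> 5.0
l1 = np.sum(np.abs(x))       # Manhattan norm -> 7.0
linf = np.max(np.abs(x))     # max norm       -> 4.0

assert np.isclose(l2, np.linalg.norm(x, ord=2))
assert np.isclose(l1, np.linalg.norm(x, ord=1))
assert np.isclose(linf, np.linalg.norm(x, ord=np.inf))

# Frobenius norm: sum of squares over all matrix entries
A = np.array([[1.0, 2.0], [3.0, 4.0]])
frob = np.sqrt(np.sum(A**2))
assert np.isclose(frob, np.linalg.norm(A, ord="fro"))
```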
The dot product can be expressed in terms of norms and the angle $\theta$ between the vectors: $x^\top y = \|x\|_2\, \|y\|_2 \cos\theta$.
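Rearranging that identity recovers the angle between two vectors, as in this small sketch:

```python
import numpy as np

x = np.array([1.0, 0.0])
y = np.array([1.0, 1.0])

# cos(theta) = x.y / (||x|| ||y||)
cos_theta = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
theta = np.arccos(cos_theta)  # angle in radians

assert np.isclose(theta, np.pi / 4)  # x and y are 45 degrees apart
```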
The "$L^0$ norm" (count of nonzero entries) is not a true norm: scaling a vector by a nonzero constant does not change its nonzero count, violating absolute homogeneity. The $L^1$ norm is the standard convex surrogate.
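The failure of homogeneity is easy to demonstrate (a sketch; `np.count_nonzero` plays the role of the "$L^0$ norm"):

```python
import numpy as np

x = np.array([0.0, 2.0, 0.0, -1.5])

l0 = np.count_nonzero(x)  # "L0 norm": number of nonzero entries -> 2

# Homogeneity fails: scaling by 10 leaves the count unchanged,
# so ||10x||_0 != |10| * ||x||_0
assert np.count_nonzero(10 * x) == l0
```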
Applications
- $L^2$ regularisation (ridge / weight decay) penalises large weights and yields smooth solutions.
- $L^1$ regularisation (LASSO) induces sparsity and performs automatic feature selection.
- The $L^\infty$ norm arises in minimax problems and robustness guarantees.
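A minimal sketch of how these penalties enter a loss, with hypothetical weights `w` and strength `lam` (the $L^2$ gradient is the familiar weight-decay term; the $L^1$ subgradient drives sparsity):

```python
import numpy as np

w = np.array([0.5, -1.0, 2.0])  # hypothetical model weights
lam = 0.1                       # regularisation strength

# L2 (ridge) penalty and its gradient: smooth everywhere
l2_penalty = lam * np.sum(w**2)
l2_grad = 2 * lam * w           # "weight decay" term added to the loss gradient

# L1 (LASSO) penalty and a subgradient: not differentiable at 0
l1_penalty = lam * np.sum(np.abs(w))
l1_subgrad = lam * np.sign(w)   # any value in [-lam, lam] is valid where w_i == 0
```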
Trade-offs
$L^2$ is differentiable everywhere and computationally convenient; $L^1$ is non-differentiable at zero, requiring subgradient or proximal methods. $L^p$ "norms" with $0 < p < 1$ promote stronger sparsity but are non-convex, making optimisation much harder.
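The proximal operator of the $L^1$ penalty is soft thresholding, which sets small components exactly to zero; this is how proximal methods sidestep the non-differentiability at the origin. A sketch:

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||.||_1: shrink each entry toward zero by t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

x = np.array([0.3, -2.0, 0.05, 1.0])
shrunk = soft_threshold(x, 0.5)

# Entries with |x_i| <= 0.5 become exactly zero; the rest shrink by 0.5
assert np.allclose(shrunk, [0.0, -1.5, 0.0, 0.5])
```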