Convex Optimization

Definition

An optimization problem is convex if the objective function is convex and the feasible set is a convex set. Any local minimum of a convex problem is a global minimum.

Intuition

Convexity is the “nice” case of optimization: the loss landscape has no local minima traps. This is why methods like logistic regression, SVMs, and lasso are tractable at scale — their objectives are convex. Neural network training is non-convex, but understanding convex problems illuminates why certain algorithms work and fail.

Formal Description

Convex Sets and Functions

Convex set $C \subseteq \mathbb{R}^n$: for all $x, y \in C$ and $\theta \in [0, 1]$:

$$\theta x + (1 - \theta) y \in C$$

Examples: halfspaces, polyhedra, norm balls, the positive semidefinite cone.

Convex function $f: \mathbb{R}^n \to \mathbb{R}$: for all $x, y$ in the domain and $\theta \in [0, 1]$:

$$f(\theta x + (1 - \theta) y) \le \theta f(x) + (1 - \theta) f(y)$$

Equivalently (for differentiable $f$): $f(y) \ge f(x) + \nabla f(x)^\top (y - x)$ for all $x, y$ (first-order condition).

For twice-differentiable $f$: $f$ is convex iff $\nabla^2 f(x) \succeq 0$ (PSD Hessian) everywhere.
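The PSD-Hessian test can be checked numerically. A minimal sketch, assuming NumPy (the finite-difference helper and the test point are illustrative, not from the text), verifies it for log-sum-exp, a standard smooth convex function:

```python
import numpy as np

def hessian(f, x, eps=1e-4):
    """Finite-difference Hessian of a scalar function f at point x."""
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            e_i = np.zeros(n); e_i[i] = eps
            e_j = np.zeros(n); e_j[j] = eps
            H[i, j] = (f(x + e_i + e_j) - f(x + e_i)
                       - f(x + e_j) + f(x)) / eps**2
    return H

def lse(x):
    """log-sum-exp, a classic smooth convex function."""
    return np.log(np.sum(np.exp(x)))

x0 = np.array([0.3, -1.2, 0.7])
# All Hessian eigenvalues should be >= 0 (up to finite-difference error)
min_eig = np.linalg.eigvalsh(hessian(lse, x0)).min()
print(min_eig)
```

This only samples one point, of course; the condition must hold everywhere for convexity.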

Strictly convex: strict inequality above for $x \ne y$; any global minimiser, if one exists, is unique.

Strongly convex with parameter $\mu > 0$: $f(y) \ge f(x) + \nabla f(x)^\top (y - x) + \frac{\mu}{2} \|y - x\|^2$. Together with smoothness (Lipschitz gradient), this guarantees linear convergence of gradient descent.
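The linear (geometric) convergence rate can be seen directly on a strongly convex quadratic. A minimal sketch, assuming NumPy (the matrix, starting point, and iteration count are illustrative):

```python
import numpy as np

# Strongly convex quadratic f(x) = 0.5 x^T A x with A positive definite
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
mu = np.linalg.eigvalsh(A).min()   # strong convexity parameter
L = np.linalg.eigvalsh(A).max()    # smoothness (Lipschitz gradient) constant

x = np.array([5.0, -3.0])
step = 1.0 / L
errors = []
for _ in range(50):
    errors.append(np.linalg.norm(x))   # distance to the minimiser x* = 0
    x = x - step * (A @ x)             # gradient step: x <- x - (1/L) ∇f(x)

# Linear convergence: each step contracts the error by at least (1 - mu/L)
ratios = [errors[k + 1] / errors[k] for k in range(len(errors) - 1)]
print(max(ratios), 1 - mu / L)
```

With step size $1/L$ on a quadratic, the per-step contraction factor is bounded by $1 - \mu/L$, which is exactly the linear rate claimed above.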

Preservation rules:

  • Non-negative linear combinations of convex functions are convex.
  • The composition $h \circ g$ is convex if $h$ is convex and non-decreasing and $g$ is convex.
  • Maximum of convex functions is convex.
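These preservation rules can be spot-checked numerically. A small sketch, assuming NumPy (the two convex functions and the sampling range are arbitrary examples), tests the defining inequality for a pointwise maximum:

```python
import numpy as np

rng = np.random.default_rng(0)

f = lambda x: (x - 1.0) ** 2        # convex
g = lambda x: np.abs(x) + 0.5 * x   # convex (abs + linear)
h = lambda x: np.maximum(f(x), g(x))  # pointwise maximum

# Spot-check h(θx + (1-θ)y) <= θ h(x) + (1-θ) h(y) at random points
violations = 0
for _ in range(1000):
    x, y = rng.uniform(-5, 5, size=2)
    t = rng.uniform(0, 1)
    if h(t * x + (1 - t) * y) > t * h(x) + (1 - t) * h(y) + 1e-9:
        violations += 1
print(violations)
```

A random check like this cannot prove convexity, but a single violation would disprove it.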

Convex Optimization Problem

$$\min_x \; f_0(x) \quad \text{s.t.} \quad f_i(x) \le 0, \; i = 1, \dots, m, \qquad h_j(x) = 0, \; j = 1, \dots, p$$

where $f_0, f_1, \dots, f_m$ are convex and $h_1, \dots, h_p$ are affine.

Key result: every local minimum is a global minimum.

First-order optimality (unconstrained): $x^\star$ is a global minimiser iff $\nabla f(x^\star) = 0$.
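For least squares this condition is directly solvable: setting the gradient of $\frac{1}{2}\|Xw - y\|^2$, namely $X^\top(Xw - y)$, to zero gives the normal equations. A sketch assuming NumPy (the data is synthetic):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = rng.normal(size=50)

# Normal equations: X^T X w = X^T y  <=>  ∇f(w*) = 0
w_star = np.linalg.solve(X.T @ X, X.T @ y)

grad = X.T @ (X @ w_star - y)
print(np.linalg.norm(grad))  # ≈ 0: certifies w_star is a global minimiser
```

Because the objective is convex, the vanishing gradient alone certifies global (not merely local) optimality.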

KKT Conditions (Constrained Problems)

For the problem $\min_x f(x)$ s.t. $g_i(x) \le 0$, $h_j(x) = 0$, the Lagrangian is:

$$L(x, \lambda, \nu) = f(x) + \sum_i \lambda_i g_i(x) + \sum_j \nu_j h_j(x)$$

KKT necessary conditions for optimality at $(x^\star, \lambda^\star, \nu^\star)$:

  1. Stationarity: $\nabla f(x^\star) + \sum_i \lambda_i^\star \nabla g_i(x^\star) + \sum_j \nu_j^\star \nabla h_j(x^\star) = 0$
  2. Primal feasibility: $g_i(x^\star) \le 0$, $h_j(x^\star) = 0$
  3. Dual feasibility: $\lambda_i^\star \ge 0$
  4. Complementary slackness: $\lambda_i^\star \, g_i(x^\star) = 0$

For convex problems satisfying Slater's condition (which gives strong duality), the KKT conditions are sufficient as well as necessary for optimality.
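The four conditions are easy to verify by hand on a toy problem. The sketch below checks them for minimising $x^2$ subject to $x \ge 1$ (i.e. $g(x) = 1 - x \le 0$); the candidate point and multiplier are worked out analytically from stationarity $2x - \lambda = 0$:

```python
# Toy problem: minimize x^2  subject to  1 - x <= 0
x_star = 1.0    # the constraint is active at the optimum
lam_star = 2.0  # from stationarity: ∇f + λ∇g = 2x - λ = 0

stationarity = abs(2 * x_star - lam_star)    # condition 1
primal_feas = (1 - x_star) <= 0              # condition 2
dual_feas = lam_star >= 0                    # condition 3
comp_slack = abs(lam_star * (1 - x_star))    # condition 4: λ* g(x*) = 0

print(stationarity, primal_feas, dual_feas, comp_slack)
```

Note how complementary slackness is satisfied here because the constraint is active ($g(x^\star) = 0$), which is what allows $\lambda^\star > 0$.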

Common Convex Problems in ML

| Problem | Convex? | Notes |
| --- | --- | --- |
| Linear regression (squared loss) | ✅ Convex (strongly convex when $X^\top X \succ 0$) | Closed-form solution |
| Logistic regression | ✅ Convex | No closed form; gradient descent works |
| Lasso ($\ell_1$ regularisation) | ✅ Convex, non-smooth | Requires proximal methods or coordinate descent |
| Ridge regression ($\ell_2$ regularisation) | ✅ Strongly convex | Closed-form solution |
| Hard-margin SVM | ✅ Convex (QP) | Kernelizable via dual |
| Neural network training | ❌ Non-convex | Multiple local minima, saddle points |
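The lasso row's "requires proximal methods" can be made concrete with ISTA (proximal gradient), which alternates a gradient step on the smooth term with soft-thresholding for the $\ell_1$ term. A sketch assuming NumPy; the data, penalty, and iteration count are illustrative:

```python
import numpy as np

# Lasso: min_w 0.5 ||Xw - y||^2 + alpha * ||w||_1, solved by ISTA
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))
w_true = np.array([2.0, 0.0, 0.0, -1.5, 0.0])  # sparse ground truth
y = X @ w_true + 0.01 * rng.normal(size=100)
alpha = 5.0

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (shrinks toward zero)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

L = np.linalg.eigvalsh(X.T @ X).max()  # Lipschitz constant of the smooth part
w = np.zeros(5)
for _ in range(500):
    grad = X.T @ (X @ w - y)
    w = soft_threshold(w - grad / L, alpha / L)  # gradient step, then prox

print(np.round(w, 2))  # sparse estimate close to w_true
```

The soft-thresholding step is what plain gradient descent lacks: it handles the non-smooth $\ell_1$ term exactly and produces exact zeros.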

Duality

The dual problem: $\max_{\lambda \ge 0, \nu} \; g(\lambda, \nu)$, where $g(\lambda, \nu) = \inf_x L(x, \lambda, \nu)$ is the dual function.

Weak duality: $d^\star \le p^\star$ always (the dual optimum lower-bounds the primal optimum).

Strong duality (Slater's condition): if the primal is convex and strictly feasible, then $d^\star = p^\star$. SVMs are solved via their dual, which is often more convenient.
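A worked example, assuming NumPy (the grid search over $\lambda$ is purely for illustration): for $\min x^2$ s.t. $x \ge 1$, minimising the Lagrangian $x^2 + \lambda(1 - x)$ over $x$ gives $x = \lambda/2$, so the dual function is $g(\lambda) = \lambda - \lambda^2/4$, and the duality gap closes:

```python
import numpy as np

# Dual function of: minimize x^2 subject to x >= 1
lams = np.linspace(0, 5, 100001)
g = lams - lams**2 / 4

d_star = g.max()   # dual optimum, attained at λ* = 2
p_star = 1.0       # primal optimum: x* = 1, f(x*) = 1

print(d_star, p_star)  # equal: strong duality (Slater holds, e.g. x = 2 > 1)
```

Here $x = 2$ strictly satisfies the constraint, so Slater's condition applies and the gap $p^\star - d^\star$ is zero.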

Applications

  • Logistic regression, lasso, ridge, elastic net: convex, efficient global optimisation
  • SVM training: quadratic program (QP), solved in dual
  • Convex relaxations of combinatorial problems (e.g., LP relaxation)
  • Neural tangent kernel theory applies convex-like analysis to neural network training in the overparameterised regime

Trade-offs

  • Convex problems are tractable but often insufficient to capture the representational power needed for complex tasks (hence neural networks).
  • Non-convex objectives like neural networks often still converge to good solutions in practice due to overparameterisation and benign loss landscape structure.