Convex Optimization

Definition

An optimization problem is convex if the objective function is convex and the feasible set is a convex set. Any local minimum of a convex problem is a global minimum.

Intuition

Convexity is the “nice” case of optimization: the loss landscape has no local minima traps. This is why methods like logistic regression, SVMs, and lasso are tractable at scale — their objectives are convex. Neural network training is non-convex, but understanding convex problems illuminates why certain algorithms work and fail.

Formal Description

Convex Sets and Functions

Convex set $C \subseteq \mathbb{R}^n$: for all $x, y \in C$ and $\theta \in [0, 1]$:

$$\theta x + (1 - \theta) y \in C$$

Examples: halfspaces, polyhedra, norm balls, the positive semidefinite cone.

Convex function $f: \mathbb{R}^n \to \mathbb{R}$: for all $x, y$ in the domain and $\theta \in [0, 1]$:

$$f(\theta x + (1 - \theta) y) \le \theta f(x) + (1 - \theta) f(y)$$

Equivalently (for differentiable $f$): $f(y) \ge f(x) + \nabla f(x)^\top (y - x)$ for all $x, y$ (first-order condition).

For twice-differentiable $f$: $f$ is convex iff $\nabla^2 f(x) \succeq 0$ (PSD Hessian) everywhere.
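The PSD-Hessian test can be checked numerically. A minimal sketch, assuming NumPy (the finite-difference helper and the test point are illustrative, not from the text), verifies it for log-sum-exp, a standard smooth convex function:

```python
import numpy as np

def hessian(f, x, eps=1e-4):
    """Finite-difference Hessian of a scalar function f at point x."""
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            e_i = np.zeros(n); e_i[i] = eps
            e_j = np.zeros(n); e_j[j] = eps
            H[i, j] = (f(x + e_i + e_j) - f(x + e_i)
                       - f(x + e_j) + f(x)) / eps**2
    return H

def lse(x):
    """log-sum-exp, a classic smooth convex function."""
    return np.log(np.sum(np.exp(x)))

x0 = np.array([0.3, -1.2, 0.7])
# All Hessian eigenvalues should be >= 0 (up to finite-difference error)
min_eig = np.linalg.eigvalsh(hessian(lse, x0)).min()
print(min_eig)
```

This only samples one point, of course; the condition must hold everywhere for convexity.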

Strictly convex: strict inequality above for $x \ne y$; any global minimiser, if one exists, is unique.

Strongly convex with parameter $\mu > 0$: $f(y) \ge f(x) + \nabla f(x)^\top (y - x) + \frac{\mu}{2} \|y - x\|^2$. Together with smoothness (Lipschitz gradient), this guarantees linear convergence of gradient descent.
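The linear (geometric) convergence rate can be seen directly on a strongly convex quadratic. A minimal sketch, assuming NumPy (the matrix, starting point, and iteration count are illustrative):

```python
import numpy as np

# Strongly convex quadratic f(x) = 0.5 x^T A x with A positive definite
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
mu = np.linalg.eigvalsh(A).min()   # strong convexity parameter
L = np.linalg.eigvalsh(A).max()    # smoothness (Lipschitz gradient) constant

x = np.array([5.0, -3.0])
step = 1.0 / L
errors = []
for _ in range(50):
    errors.append(np.linalg.norm(x))   # distance to the minimiser x* = 0
    x = x - step * (A @ x)             # gradient step: x <- x - (1/L) ∇f(x)

# Linear convergence: each step contracts the error by at least (1 - mu/L)
ratios = [errors[k + 1] / errors[k] for k in range(len(errors) - 1)]
print(max(ratios), 1 - mu / L)
```

With step size $1/L$ on a quadratic, the per-step contraction factor is bounded by $1 - \mu/L$, which is exactly the linear rate claimed above.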

Preservation rules:

  • Non-negative linear combinations of convex functions are convex.
  • The composition $h \circ g$ is convex if $h$ is convex and non-decreasing and $g$ is convex.
  • Maximum of convex functions is convex.
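These preservation rules can be spot-checked numerically. A small sketch, assuming NumPy (the two convex functions and the sampling range are arbitrary examples), tests the defining inequality for a pointwise maximum:

```python
import numpy as np

rng = np.random.default_rng(0)

f = lambda x: (x - 1.0) ** 2        # convex
g = lambda x: np.abs(x) + 0.5 * x   # convex (abs + linear)
h = lambda x: np.maximum(f(x), g(x))  # pointwise maximum

# Spot-check h(θx + (1-θ)y) <= θ h(x) + (1-θ) h(y) at random points
violations = 0
for _ in range(1000):
    x, y = rng.uniform(-5, 5, size=2)
    t = rng.uniform(0, 1)
    if h(t * x + (1 - t) * y) > t * h(x) + (1 - t) * h(y) + 1e-9:
        violations += 1
print(violations)
```

A random check like this cannot prove convexity, but a single violation would disprove it.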

Convex Optimization Problem

$$\min_x \; f_0(x) \quad \text{s.t.} \quad f_i(x) \le 0, \; i = 1, \dots, m, \qquad h_j(x) = 0, \; j = 1, \dots, p$$

where $f_0, f_1, \dots, f_m$ are convex and $h_1, \dots, h_p$ are affine.

Key result: every local minimum is a global minimum.

First-order optimality (unconstrained): $x^\star$ is a global minimiser iff $\nabla f(x^\star) = 0$.
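For least squares this condition is directly solvable: setting the gradient of $\frac{1}{2}\|Xw - y\|^2$, namely $X^\top(Xw - y)$, to zero gives the normal equations. A sketch assuming NumPy (the data is synthetic):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = rng.normal(size=50)

# Normal equations: X^T X w = X^T y  <=>  ∇f(w*) = 0
w_star = np.linalg.solve(X.T @ X, X.T @ y)

grad = X.T @ (X @ w_star - y)
print(np.linalg.norm(grad))  # ≈ 0: certifies w_star is a global minimiser
```

Because the objective is convex, the vanishing gradient alone certifies global (not merely local) optimality.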

KKT Conditions (Constrained Problems)

For the problem $\min_x f(x)$ s.t. $g_i(x) \le 0$, $h_j(x) = 0$, the Lagrangian is:

$$L(x, \lambda, \nu) = f(x) + \sum_i \lambda_i g_i(x) + \sum_j \nu_j h_j(x)$$

KKT necessary conditions for optimality at $(x^\star, \lambda^\star, \nu^\star)$:

  1. Stationarity: $\nabla f(x^\star) + \sum_i \lambda_i^\star \nabla g_i(x^\star) + \sum_j \nu_j^\star \nabla h_j(x^\star) = 0$
  2. Primal feasibility: $g_i(x^\star) \le 0$, $h_j(x^\star) = 0$
  3. Dual feasibility: $\lambda_i^\star \ge 0$
  4. Complementary slackness: $\lambda_i^\star \, g_i(x^\star) = 0$

For convex problems satisfying Slater's condition (which gives strong duality), the KKT conditions are sufficient as well as necessary for optimality.
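The four conditions are easy to verify by hand on a toy problem. The sketch below checks them for minimising $x^2$ subject to $x \ge 1$ (i.e. $g(x) = 1 - x \le 0$); the candidate point and multiplier are worked out analytically from stationarity $2x - \lambda = 0$:

```python
# Toy problem: minimize x^2  subject to  1 - x <= 0
x_star = 1.0    # the constraint is active at the optimum
lam_star = 2.0  # from stationarity: ∇f + λ∇g = 2x - λ = 0

stationarity = abs(2 * x_star - lam_star)    # condition 1
primal_feas = (1 - x_star) <= 0              # condition 2
dual_feas = lam_star >= 0                    # condition 3
comp_slack = abs(lam_star * (1 - x_star))    # condition 4: λ* g(x*) = 0

print(stationarity, primal_feas, dual_feas, comp_slack)
```

Note how complementary slackness is satisfied here because the constraint is active ($g(x^\star) = 0$), which is what allows $\lambda^\star > 0$.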

Common Convex Problems in ML

| Problem | Convex? | Notes |
| --- | --- | --- |
| Linear regression (squared loss) | ✅ Convex (strongly convex when $X^\top X \succ 0$) | Closed-form solution |
| Logistic regression | ✅ Convex | No closed form; gradient descent works |
| Lasso ($\ell_1$ regularisation) | ✅ Convex, non-smooth | Requires proximal methods or coordinate descent |
| Ridge regression ($\ell_2$ regularisation) | ✅ Strongly convex | Closed-form solution |
| Hard-margin SVM | ✅ Convex (QP) | Kernelizable via dual |
| Neural network training | ❌ Non-convex | Multiple local minima, saddle points |
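The lasso row's "requires proximal methods" can be made concrete with ISTA (proximal gradient), which alternates a gradient step on the smooth term with soft-thresholding for the $\ell_1$ term. A sketch assuming NumPy; the data, penalty, and iteration count are illustrative:

```python
import numpy as np

# Lasso: min_w 0.5 ||Xw - y||^2 + alpha * ||w||_1, solved by ISTA
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))
w_true = np.array([2.0, 0.0, 0.0, -1.5, 0.0])  # sparse ground truth
y = X @ w_true + 0.01 * rng.normal(size=100)
alpha = 5.0

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (shrinks toward zero)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

L = np.linalg.eigvalsh(X.T @ X).max()  # Lipschitz constant of the smooth part
w = np.zeros(5)
for _ in range(500):
    grad = X.T @ (X @ w - y)
    w = soft_threshold(w - grad / L, alpha / L)  # gradient step, then prox

print(np.round(w, 2))  # sparse estimate close to w_true
```

The soft-thresholding step is what plain gradient descent lacks: it handles the non-smooth $\ell_1$ term exactly and produces exact zeros.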

Duality

The dual problem: $\max_{\lambda \ge 0, \nu} \; g(\lambda, \nu)$, where $g(\lambda, \nu) = \inf_x L(x, \lambda, \nu)$ is the dual function.

Weak duality: $d^\star \le p^\star$ always (the dual optimum lower-bounds the primal optimum).

Strong duality (Slater's condition): if the primal is convex and strictly feasible, then $d^\star = p^\star$. SVMs are solved via their dual, which is often more convenient.
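A worked example, assuming NumPy (the grid search over $\lambda$ is purely for illustration): for $\min x^2$ s.t. $x \ge 1$, minimising the Lagrangian $x^2 + \lambda(1 - x)$ over $x$ gives $x = \lambda/2$, so the dual function is $g(\lambda) = \lambda - \lambda^2/4$, and the duality gap closes:

```python
import numpy as np

# Dual function of: minimize x^2 subject to x >= 1
lams = np.linspace(0, 5, 100001)
g = lams - lams**2 / 4

d_star = g.max()   # dual optimum, attained at λ* = 2
p_star = 1.0       # primal optimum: x* = 1, f(x*) = 1

print(d_star, p_star)  # equal: strong duality (Slater holds, e.g. x = 2 > 1)
```

Here $x = 2$ strictly satisfies the constraint, so Slater's condition applies and the gap $p^\star - d^\star$ is zero.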

Applications

  • Logistic regression, lasso, ridge, elastic net: convex, efficient global optimisation
  • SVM training: quadratic program (QP), solved in dual
  • Convex relaxations of combinatorial problems (e.g., LP relaxation)
  • Neural tangent kernel theory applies convex-like analysis to neural network training in the overparameterised regime

Trade-offs

  • Convex problems are tractable but often insufficient to capture the representational power needed for complex tasks (hence neural networks).
  • Non-convex objectives like neural networks often still converge to good solutions in practice due to overparameterisation and benign loss landscape structure.