Convex Optimization
Definition
An optimization problem is convex if the objective function is convex and the feasible set is a convex set. Any local minimum of a convex problem is a global minimum.
Intuition
Convexity is the “nice” case of optimization: the loss landscape has no local minima traps. This is why methods like logistic regression, SVMs, and lasso are tractable at scale — their objectives are convex. Neural network training is non-convex, but understanding convex problems illuminates why certain optimization algorithms work, and when they fail.
Formal Description
Convex Sets and Functions
Convex set $C \subseteq \mathbb{R}^n$: for all $x, y \in C$ and $\theta \in [0, 1]$: $\theta x + (1 - \theta) y \in C$.
Examples: halfspaces, polyhedra, norm balls, the positive semidefinite cone.
Convex function $f$: for all $x, y \in \operatorname{dom} f$ and $\theta \in [0, 1]$: $f(\theta x + (1 - \theta) y) \le \theta f(x) + (1 - \theta) f(y)$.
Equivalently (for differentiable $f$): $f(y) \ge f(x) + \nabla f(x)^\top (y - x)$ for all $x, y$ (first-order condition).
For twice-differentiable $f$: $f$ is convex iff $\nabla^2 f(x) \succeq 0$ (PSD Hessian) everywhere.
Strictly convex: the inequality above holds strictly for $x \ne y$ and $\theta \in (0, 1)$; the global minimum, if attained, is unique.
Strongly convex with parameter $\mu > 0$: $f(y) \ge f(x) + \nabla f(x)^\top (y - x) + \frac{\mu}{2} \|y - x\|^2$. Guarantees linear convergence of gradient descent.
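The linear-convergence guarantee can be spot-checked numerically. A minimal sketch (assuming NumPy; the quadratic objective and step size are made up for illustration) running gradient descent on a strongly convex quadratic:

```python
import numpy as np

# Toy strongly convex objective f(x) = 0.5 * x^T A x with A positive definite.
# Smallest eigenvalue mu = 1, largest L = 10; step size 1/L is a standard choice.
A = np.diag([1.0, 10.0])
x = np.array([5.0, 5.0])
step = 1.0 / 10.0

errors = []
for _ in range(200):
    x = x - step * (A @ x)          # gradient of f is A x
    errors.append(np.linalg.norm(x))

# Linear (geometric) convergence: the error shrinks by a constant factor < 1
# per iteration once transients die out.
ratios = [errors[i + 1] / errors[i] for i in range(50, 60)]
print(max(ratios))                  # strictly below 1
```

Here the contraction factor per step is $1 - \mu/L$; tighter step-size schedules give faster rates, but any fixed step in $(0, 2/L)$ converges.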
Preservation rules:
- Non-negative linear combinations of convex functions are convex.
- The composition $h \circ g$ is convex if $h$ is convex and non-decreasing and $g$ is convex.
- Maximum of convex functions is convex.
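The preservation rules can be spot-checked numerically (a sample-based check, not a proof). A minimal sketch, with made-up convex functions, verifying the convexity inequality for a pointwise maximum:

```python
import numpy as np

# Two convex functions and their pointwise maximum, which should also be convex.
f = lambda x: np.abs(x)
g = lambda x: (x - 1.0) ** 2
h = lambda x: np.maximum(f(x), g(x))

rng = np.random.default_rng(0)
violations = 0
for _ in range(10_000):
    x, y = rng.uniform(-5, 5, size=2)
    t = rng.uniform()
    # Convexity inequality: h(t x + (1-t) y) <= t h(x) + (1-t) h(y).
    if h(t * x + (1 - t) * y) > t * h(x) + (1 - t) * h(y) + 1e-12:
        violations += 1
print(violations)  # 0
```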
Convex Optimization Problem
$$\min_x f_0(x) \quad \text{s.t.} \quad f_i(x) \le 0,\ i = 1, \dots, m; \qquad h_j(x) = 0,\ j = 1, \dots, p$$
where $f_0, f_1, \dots, f_m$ are convex and $h_1, \dots, h_p$ are affine.
Key result: every local minimum is a global minimum.
First-order optimality (unconstrained): $\nabla f(x^\star) = 0$.
KKT Conditions (Constrained Problems)
For the problem $\min_x f_0(x)$ s.t. $f_i(x) \le 0$, $h_j(x) = 0$, the Lagrangian is:
$$L(x, \lambda, \nu) = f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{j=1}^p \nu_j h_j(x)$$
KKT necessary conditions for optimality at $(x^\star, \lambda^\star, \nu^\star)$:
- Stationarity: $\nabla f_0(x^\star) + \sum_i \lambda_i^\star \nabla f_i(x^\star) + \sum_j \nu_j^\star \nabla h_j(x^\star) = 0$
- Primal feasibility: $f_i(x^\star) \le 0$, $h_j(x^\star) = 0$
- Dual feasibility: $\lambda_i^\star \ge 0$
- Complementary slackness: $\lambda_i^\star f_i(x^\star) = 0$ for all $i$
For convex problems, KKT conditions are sufficient as well as necessary (under Slater’s condition for strong duality).
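As a worked illustration, the KKT conditions can be verified by hand on a hypothetical one-dimensional QP: minimize $x^2$ subject to $x \ge 1$ (i.e., $f_1(x) = 1 - x \le 0$), whose solution sits on the constraint boundary at $x^\star = 1$. A minimal Python check:

```python
# Toy problem: minimize x^2 subject to 1 - x <= 0.
# Lagrangian: L(x, lam) = x^2 + lam * (1 - x).
x_star = 1.0
lam_star = 2.0                            # from stationarity: 2 x* - lam* = 0

stationarity = 2 * x_star - lam_star      # dL/dx at (x*, lam*)
primal_feasible = (1 - x_star) <= 0
dual_feasible = lam_star >= 0
comp_slack = lam_star * (1 - x_star)      # active constraint, so this is 0

print(stationarity, primal_feasible, dual_feasible, comp_slack)
# → 0.0 True True 0.0  (all four KKT conditions hold)
```

Since the problem is convex, these conditions certify that $x^\star = 1$ is the global minimum.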
Common Convex Problems in ML
| Problem | Convex? | Notes |
|---|---|---|
| Linear regression (squared loss) | ✅ Strongly convex | Closed-form solution |
| Logistic regression | ✅ Convex | No closed form; gradient descent works |
| Lasso ($\ell_1$ regularisation) | ✅ Convex, non-smooth | Requires proximal methods or coordinate descent |
| Ridge regression ($\ell_2$ regularisation) | ✅ Strongly convex | Closed-form solution |
| Hard-margin SVM | ✅ QP | Kernelizable via dual |
| Neural network training | ❌ Non-convex | Multiple local minima, saddle points |
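The closed-form solutions noted in the table can be illustrated for ridge regression. A minimal NumPy sketch with synthetic, made-up data, solving $w = (X^\top X + \alpha I)^{-1} X^\top y$:

```python
import numpy as np

# Synthetic regression data (made up for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=50)

# Ridge closed form: the strongly convex objective
#   0.5 * ||X w - y||^2 + 0.5 * alpha * ||w||^2
# has a unique minimiser satisfying (X^T X + alpha I) w = X^T y.
alpha = 0.1
w = np.linalg.solve(X.T @ X + alpha * np.eye(3), X.T @ y)

# The gradient vanishes at w, certifying the global optimum.
grad = X.T @ (X @ w - y) + alpha * w
print(np.linalg.norm(grad))  # near machine precision
```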
Duality
The dual problem: $\max_{\lambda \succeq 0,\, \nu} g(\lambda, \nu)$ where $g(\lambda, \nu) = \inf_x L(x, \lambda, \nu)$ is the dual function.
Weak duality: $d^\star \le p^\star$ always, where $d^\star$ and $p^\star$ are the dual and primal optimal values.
Strong duality (Slater’s condition): if the primal is convex and strictly feasible, $d^\star = p^\star$. SVMs are solved via their dual, which is often more convenient.
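Weak and strong duality can be checked numerically on a toy problem. A sketch (assuming NumPy) using minimize $x^2$ s.t. $x \ge 1$, which is convex and strictly feasible (e.g. $x = 2$), so Slater’s condition holds:

```python
import numpy as np

# Primal: minimize x^2 s.t. 1 - x <= 0; optimum p* = 1 at x* = 1.
# Dual function: g(lam) = inf_x [x^2 + lam * (1 - x)] = lam - lam^2 / 4,
# obtained by minimising the Lagrangian at x = lam / 2.
lams = np.linspace(0, 10, 100_001)
g = lams - lams ** 2 / 4
d_star = g.max()                       # dual optimum (at lam = 2)
p_star = 1.0                           # primal optimum

print(d_star <= p_star + 1e-9)         # weak duality: always True
print(abs(d_star - p_star) < 1e-6)     # strong duality: True (Slater holds)
```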
Applications
- Logistic regression, lasso, ridge, elastic net: convex, efficient global optimisation
- SVM training: quadratic program (QP), solved in dual
- Convex relaxations of combinatorial problems (e.g., LP relaxation)
- Neural tangent kernel (NTK) theory: in the overparameterised regime, training dynamics of wide networks can be analysed with convex (kernel regression) tools
Trade-offs
- Convex problems are tractable but often insufficient to capture the representational power needed for complex tasks (hence neural networks).
- Non-convex objectives like neural networks often still converge to good solutions in practice due to overparameterisation and benign loss landscape structure.