Gradient and Directional Derivative
Definition
The gradient of a scalar field $f : \mathbb{R}^n \to \mathbb{R}$ at a point $\mathbf{x}$ is the vector of partial derivatives:

$$\nabla f(\mathbf{x}) = \left( \frac{\partial f}{\partial x_1}(\mathbf{x}), \dots, \frac{\partial f}{\partial x_n}(\mathbf{x}) \right)$$

It points in the direction of steepest ascent of $f$ at $\mathbf{x}$, with magnitude $\lVert \nabla f(\mathbf{x}) \rVert$ equal to the rate of steepest ascent.
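As a quick numerical check (my own example, not from the source), the gradient of the hypothetical function $f(x, y) = x^2 + 3y^2$ is $(2x, 6y)$, and a central finite difference recovers it:

```python
import numpy as np

def f(p):
    # Hypothetical example function: f(x, y) = x^2 + 3y^2
    x, y = p
    return x**2 + 3 * y**2

def numerical_gradient(f, p, h=1e-6):
    # Approximate each partial derivative with a central difference
    p = np.asarray(p, dtype=float)
    grad = np.zeros_like(p)
    for i in range(len(p)):
        e = np.zeros_like(p)
        e[i] = h
        grad[i] = (f(p + e) - f(p - e)) / (2 * h)
    return grad

p = np.array([1.0, 2.0])
print(numerical_gradient(f, p))   # close to the analytic (2x, 6y) = [2, 12]
```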
Intuition
The gradient generalises the derivative to multiple dimensions. For a 2D landscape where $f(x, y)$ = altitude, $\nabla f$ is an arrow on the horizontal plane pointing uphill as steeply as possible. Its magnitude tells you how steep that slope is. Moving in the direction $-\nabla f$ descends as steeply as possible, which is the basis of gradient descent.
The directional derivative answers: “how fast does $f$ change if I walk in direction $\mathbf{v}$?” The gradient is the tool that computes this for any direction at once.
Formal Description
Partial derivative: fix all variables except $x_i$ and differentiate:

$$\frac{\partial f}{\partial x_i}(\mathbf{x}) = \lim_{h \to 0} \frac{f(\mathbf{x} + h\mathbf{e}_i) - f(\mathbf{x})}{h}$$
Directional derivative in direction $\mathbf{v}$ ($\lVert \mathbf{v} \rVert = 1$):

$$D_{\mathbf{v}} f(\mathbf{x}) = \lim_{h \to 0} \frac{f(\mathbf{x} + h\mathbf{v}) - f(\mathbf{x})}{h} = \nabla f(\mathbf{x}) \cdot \mathbf{v}$$

The directional derivative equals the dot product of the gradient with the unit direction. It is maximised when $\mathbf{v} = \nabla f(\mathbf{x}) / \lVert \nabla f(\mathbf{x}) \rVert$, giving steepest ascent, and zero when $\mathbf{v} \perp \nabla f(\mathbf{x})$ (moving along a level set).
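To illustrate (a sketch using the hypothetical example $f(x, y) = x^2 + 3y^2$, whose analytic gradient is $(2x, 6y)$), a finite difference along a unit direction $\mathbf{v}$ agrees with $\nabla f \cdot \mathbf{v}$:

```python
import numpy as np

def f(p):
    x, y = p
    return x**2 + 3 * y**2

def grad_f(p):
    # Analytic gradient of the example function
    x, y = p
    return np.array([2 * x, 6 * y])

p = np.array([1.0, 2.0])
v = np.array([3.0, 4.0])
v = v / np.linalg.norm(v)            # unit direction

h = 1e-6
finite_diff = (f(p + h * v) - f(p - h * v)) / (2 * h)
dot_product = grad_f(p) @ v          # D_v f = ∇f · v
print(finite_diff, dot_product)      # both ≈ 10.8
```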
Level sets and gradient orthogonality: the level set $\{\mathbf{x} : f(\mathbf{x}) = c\}$ is an $(n-1)$-dimensional surface. The gradient $\nabla f(\mathbf{x})$ is orthogonal to the level set at every point $\mathbf{x}$ on it.
First-order Taylor approximation:

$$f(\mathbf{x} + \boldsymbol{\delta}) \approx f(\mathbf{x}) + \nabla f(\mathbf{x})^\top \boldsymbol{\delta}$$

This is the linear approximation of $f$ near $\mathbf{x}$; the gradient is the coefficient vector.
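A small sketch (again using the hypothetical $f(x, y) = x^2 + 3y^2$) showing the linear model's error shrinking quadratically as the step $\boldsymbol{\delta}$ shrinks, as a first-order approximation should:

```python
import numpy as np

def f(p):
    x, y = p
    return x**2 + 3 * y**2

def grad_f(p):
    x, y = p
    return np.array([2 * x, 6 * y])

p = np.array([1.0, 2.0])
for eps in [1e-1, 1e-2, 1e-3]:
    delta = eps * np.array([1.0, -1.0])
    linear = f(p) + grad_f(p) @ delta    # first-order Taylor model
    error = abs(f(p + delta) - linear)
    print(eps, error)                    # error shrinks like eps**2
```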
Chain rule for scalar composition: if $f : \mathbb{R}^m \to \mathbb{R}$ and $\mathbf{g} : \mathbb{R}^n \to \mathbb{R}^m$:

$$\nabla (f \circ \mathbf{g})(\mathbf{x}) = J_{\mathbf{g}}(\mathbf{x})^\top \, \nabla f(\mathbf{g}(\mathbf{x}))$$

where $J_{\mathbf{g}}$ is the Jacobian of $\mathbf{g}$ (see jacobian_and_hessian).
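A concrete check of the Jacobian-transpose form (my own hypothetical maps: $\mathbf{g}(x_1, x_2) = (x_1 + x_2,\ x_1 x_2)$ and $f(u_1, u_2) = u_1^2 + u_2$, so $(f \circ \mathbf{g})(\mathbf{x}) = (x_1 + x_2)^2 + x_1 x_2$):

```python
import numpy as np

def g(x):
    # Inner map g: R^2 -> R^2
    return np.array([x[0] + x[1], x[0] * x[1]])

def jacobian_g(x):
    # J_g: rows = outputs of g, columns = inputs
    return np.array([[1.0, 1.0],
                     [x[1], x[0]]])

def grad_f(u):
    # f(u) = u1^2 + u2, so ∇f = (2*u1, 1)
    return np.array([2 * u[0], 1.0])

x = np.array([2.0, 3.0])
chain = jacobian_g(x).T @ grad_f(g(x))          # J_g(x)^T ∇f(g(x))
direct = np.array([2 * (x[0] + x[1]) + x[1],    # gradient of the composed
                   2 * (x[0] + x[1]) + x[0]])   # function, by hand
print(chain, direct)                            # both [13. 12.]
```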
Gradient in Cartesian coordinates ($n = 3$):

$$\nabla f = \frac{\partial f}{\partial x}\,\mathbf{i} + \frac{\partial f}{\partial y}\,\mathbf{j} + \frac{\partial f}{\partial z}\,\mathbf{k}$$
Applications
| Application | Role of gradient |
|---|---|
| Gradient descent | Update $\mathbf{x} \leftarrow \mathbf{x} - \eta \, \nabla f(\mathbf{x})$ |
| Backpropagation | Accumulate gradients through the computation graph |
| Lagrange multipliers | Condition: $\nabla f = \lambda \nabla g$ at a constrained optimum |
| Physics | Force $\mathbf{F} = -\nabla V$ ($V$ = potential energy) |
| Image processing | Image gradient detects edges: $\nabla I = (\partial I / \partial x,\ \partial I / \partial y)$ |
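The gradient-descent update from the table can be sketched in a few lines (a minimal illustration, assuming the same hypothetical objective $f(x, y) = x^2 + 3y^2$ with minimum at the origin):

```python
import numpy as np

def grad_f(p):
    # ∇f for the example f(x, y) = x^2 + 3y^2
    return np.array([2 * p[0], 6 * p[1]])

x = np.array([5.0, -4.0])
eta = 0.1                        # learning rate (step size)
for _ in range(100):
    x = x - eta * grad_f(x)      # x ← x − η ∇f(x)
print(x)                         # converges toward the minimum at (0, 0)
```

Each step moves against the gradient, the direction of steepest descent; for this quadratic, a fixed step size of 0.1 is small enough to converge.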
Trade-offs
- The gradient exists only where $f$ is differentiable. Non-smooth functions (e.g., ReLU) require subgradients at non-differentiable points.
- In high dimensions, computing the full gradient requires evaluating all partial derivatives; automatic differentiation handles this efficiently.
- Gradient direction is locally optimal but can lead to saddle points or local minima; second-order information (Hessian) is needed to distinguish these.
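For the ReLU caveat above, a common practical convention (an illustrative sketch, not the only valid choice) is to pick the subgradient 0 at the kink, where any value in $[0, 1]$ would be valid:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def relu_subgradient(x):
    # ReLU is non-differentiable at x = 0; any value in [0, 1] is a
    # valid subgradient there. This uses the common convention of 0.
    return (x > 0).astype(float)

xs = np.array([-2.0, 0.0, 3.0])
print(relu(xs), relu_subgradient(xs))   # [0. 0. 3.] [0. 0. 1.]
```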
Links
- Jacobian and Hessian — multivariable analogue for vector-valued functions
- Chain Rule — single-variable chain rule generalised here
- Gradient Descent — gradient used in iterative optimization
- Convex Optimization — first-order optimality condition: $\nabla f(\mathbf{x}^*) = 0$
- Backpropagation — gradient of loss w.r.t. parameters