Neural Network Notation
Definition
The standard indexing and shape conventions used throughout deep learning formulations, covering layer indices, example indices, time steps, and matrix dimensions for both parameters and activations.
Intuition
Consistent notation prevents shape bugs and makes it straightforward to verify implementations — if you know the convention, you can derive the shape of any tensor at any point in a network from first principles.
Formal Description
Index superscripts:
- $(i)$ = the $i$-th training example, as in $x^{(i)}$
- $[l]$ = the $l$-th layer (1-indexed from the input layer), as in $a^{[l]}$
- $\langle t \rangle$ = the $t$-th time step (sequence models), as in $x^{\langle t \rangle}$
Activations and pre-activations:
$z^{[l]} = W^{[l]} a^{[l-1]} + b^{[l]}, \qquad a^{[l]} = g^{[l]}(z^{[l]})$, with $a^{[0]} = x$ (the input).
Layer sizes: $n^{[l]}$ = number of units in layer $l$; $n^{[0]} = n_x$ = input dimension; $L$ = total number of layers.
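The recurrence above can be sketched as a NumPy forward pass; the layer sizes, batch size, and the choice of ReLU as $g^{[l]}$ are illustrative assumptions, not part of the notation itself:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes for the sketch: n_x = 3 features, m = 5 examples,
# one hidden layer with 4 units, output layer with 2 units.
n = [3, 4, 2]  # n[0] = n_x, then n[1], ..., n[L]
m = 5

X = rng.standard_normal((n[0], m))  # (n_x, m): features x examples

def relu(z):
    return np.maximum(z, 0.0)  # stand-in for g^[l]

A = X  # a^[0] = x (the input)
for l in range(1, len(n)):
    W = rng.standard_normal((n[l], n[l - 1]))  # (n^[l], n^[l-1])
    b = np.zeros((n[l], 1))                    # (n^[l], 1), broadcast over examples
    Z = W @ A + b                              # (n^[l], m)
    A = relu(Z)                                # same shape as Z

print(A.shape)  # (2, 5): (n^[L], m)
```

Each iteration reproduces the shape rule from the table below: every layer's output has one row per unit and one column per example.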
Matrix shapes (vectorized over examples):
| Tensor | Shape | Description |
|---|---|---|
| $X$ | $(n_x, m)$ | features × examples |
| $W^{[l]}$ | $(n^{[l]}, n^{[l-1]})$ | output units × input units |
| $b^{[l]}$ | $(n^{[l]}, 1)$ | broadcast over examples |
| $Z^{[l]}, A^{[l]}$ | $(n^{[l]}, m)$ | activations over batch |
Gradient shapes: $dW^{[l]}$ has the same shape as $W^{[l]}$; $db^{[l]}$ has the same shape as $b^{[l]}$.
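The gradient shape rule can be verified mechanically from the standard backprop expressions for a linear layer, $dW^{[l]} = \frac{1}{m}\, dZ^{[l]} A^{[l-1]\top}$ and $db^{[l]} = \frac{1}{m}\sum_i dZ^{[l](i)}$. A sketch with assumed sizes:

```python
import numpy as np

rng = np.random.default_rng(1)
n_prev, n_l, m = 3, 4, 5  # assumed layer sizes and batch size

A_prev = rng.standard_normal((n_prev, m))  # A^[l-1]: (n^[l-1], m)
W = rng.standard_normal((n_l, n_prev))     # W^[l]:   (n^[l], n^[l-1])
b = np.zeros((n_l, 1))                     # b^[l]:   (n^[l], 1)
dZ = rng.standard_normal((n_l, m))         # dZ^[l]:  same shape as Z^[l]

# (n^[l], m) @ (m, n^[l-1]) -> (n^[l], n^[l-1]), matching W^[l]
dW = (dZ @ A_prev.T) / m
# sum over the example axis, keepdims to stay (n^[l], 1), matching b^[l]
db = dZ.sum(axis=1, keepdims=True) / m

print(dW.shape == W.shape, db.shape == b.shape)  # True True
```

The transpose in $A^{[l-1]\top}$ is exactly what makes the example dimension $m$ contract away, leaving a parameter-shaped gradient.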
Applications
Use as a reference when implementing forward and backward passes. Verify correctness by checking that shapes are consistent at each layer: $Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}$ requires the inner dimensions to match: $(n^{[l]}, n^{[l-1]}) \times (n^{[l-1]}, m) \to (n^{[l]}, m)$.
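One way to make that verification explicit is a small shape-checking wrapper around the layer computation. The helper name and messages below are hypothetical, for illustration only:

```python
import numpy as np

def linear_checked(W, A_prev, b):
    """Compute Z^[l] = W^[l] A^[l-1] + b^[l], asserting the shape rules first.

    Expected shapes (this document's convention):
      W: (n_l, n_prev), A_prev: (n_prev, m), b: (n_l, 1)
    """
    n_l, n_prev = W.shape
    assert A_prev.shape[0] == n_prev, "inner dimensions must match (n^[l-1])"
    assert b.shape == (n_l, 1), "bias must be (n^[l], 1) to broadcast over examples"
    return W @ A_prev + b

# n^[l] = 4, n^[l-1] = 3, m = 5
Z = linear_checked(np.ones((4, 3)), np.ones((3, 5)), np.zeros((4, 1)))
print(Z.shape)  # (4, 5)
```

Asserting shapes at layer boundaries turns silent broadcasting bugs into immediate, localized failures.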
Trade-offs
Conventions differ between frameworks. PyTorch's nn.Linear stores its weight as (out_features, in_features), which matches the $(n^{[l]}, n^{[l-1]})$ shape above, but PyTorch activations are batch-first — shape $(m, n^{[l]})$, examples × features, transposed relative to this convention — so the layer computes $x W^\top + b$ rather than $W a + b$. Be consistent within a codebase and document which convention is in use.
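The two layouts are numerically equivalent up to a transpose, which a NumPy sketch can confirm (PyTorch itself is not imported here; the batch-first line mirrors nn.Linear's $x W^\top + b$, with all sizes assumed):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n_in, n_out = 5, 3, 4  # assumed batch size and layer widths

# Both conventions store the weight as (out, in).
W = rng.standard_normal((n_out, n_in))
b = rng.standard_normal(n_out)

# This document's convention: columns are examples.
A_prev = rng.standard_normal((n_in, m))  # (n^[l-1], m)
Z_cols = W @ A_prev + b[:, None]         # (n^[l], m)

# Batch-first layout (PyTorch-style): rows are examples.
X_rows = A_prev.T                        # (m, n_in)
Z_rows = X_rows @ W.T + b                # (m, n_out)

print(np.allclose(Z_cols, Z_rows.T))  # True
```

Either layout is fine; mixing them in one codebase is what produces the classic "shapes happen to broadcast but the math is wrong" bug.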