Neural Network Notation
Definition
The standard indexing and shape conventions used throughout deep learning formulations, covering layer indices, example indices, time steps, and matrix dimensions for both parameters and activations.
Intuition
Consistent notation prevents shape bugs and makes it straightforward to verify implementations — if you know the convention, you can derive the shape of any tensor at any point in a network from first principles.
Formal Description
Index superscripts:
- $(i)$ = the $i$-th training example, as in $x^{(i)}$
- $[l]$ = the $l$-th layer (1-indexed from the input layer), as in $a^{[l]}$
- $\langle t \rangle$ = the $t$-th time step (sequence models), as in $x^{\langle t \rangle}$
Activations and pre-activations:
$z^{[l]} = W^{[l]} a^{[l-1]} + b^{[l]}, \qquad a^{[l]} = g^{[l]}(z^{[l]})$, with $a^{[0]} = x$ (the input).
Layer sizes: $n^{[l]}$ = number of units in layer $l$; $n^{[0]} = n_x$ = input dimension; $L$ = total number of layers.
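The recurrence above can be sketched as a NumPy forward pass; the layer sizes, batch size, and the choice of ReLU as $g^{[l]}$ are illustrative assumptions, not part of the notation itself:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes for the sketch: n_x = 3 features, m = 5 examples,
# one hidden layer with 4 units, output layer with 2 units.
n = [3, 4, 2]  # n[0] = n_x, then n[1], ..., n[L]
m = 5

X = rng.standard_normal((n[0], m))  # (n_x, m): features x examples

def relu(z):
    return np.maximum(z, 0.0)  # stand-in for g^[l]

A = X  # a^[0] = x (the input)
for l in range(1, len(n)):
    W = rng.standard_normal((n[l], n[l - 1]))  # (n^[l], n^[l-1])
    b = np.zeros((n[l], 1))                    # (n^[l], 1), broadcast over examples
    Z = W @ A + b                              # (n^[l], m)
    A = relu(Z)                                # same shape as Z

print(A.shape)  # (2, 5): (n^[L], m)
```

Each iteration reproduces the shape rule from the table below: every layer's output has one row per unit and one column per example.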
Matrix shapes (vectorized over examples):
| Tensor | Shape | Description |
|---|---|---|
| $X$ | $(n_x, m)$ | features × examples |
| $W^{[l]}$ | $(n^{[l]}, n^{[l-1]})$ | output units × input units |
| $b^{[l]}$ | $(n^{[l]}, 1)$ | broadcast over examples |
| $Z^{[l]}, A^{[l]}$ | $(n^{[l]}, m)$ | activations over batch |
Gradient shapes: $dW^{[l]}$ has the same shape as $W^{[l]}$; $db^{[l]}$ has the same shape as $b^{[l]}$.
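The gradient shape rule can be verified mechanically from the standard backprop expressions for a linear layer, $dW^{[l]} = \frac{1}{m}\, dZ^{[l]} A^{[l-1]\top}$ and $db^{[l]} = \frac{1}{m}\sum_i dZ^{[l](i)}$. A sketch with assumed sizes:

```python
import numpy as np

rng = np.random.default_rng(1)
n_prev, n_l, m = 3, 4, 5  # assumed layer sizes and batch size

A_prev = rng.standard_normal((n_prev, m))  # A^[l-1]: (n^[l-1], m)
W = rng.standard_normal((n_l, n_prev))     # W^[l]:   (n^[l], n^[l-1])
b = np.zeros((n_l, 1))                     # b^[l]:   (n^[l], 1)
dZ = rng.standard_normal((n_l, m))         # dZ^[l]:  same shape as Z^[l]

# (n^[l], m) @ (m, n^[l-1]) -> (n^[l], n^[l-1]), matching W^[l]
dW = (dZ @ A_prev.T) / m
# sum over the example axis, keepdims to stay (n^[l], 1), matching b^[l]
db = dZ.sum(axis=1, keepdims=True) / m

print(dW.shape == W.shape, db.shape == b.shape)  # True True
```

The transpose in $A^{[l-1]\top}$ is exactly what makes the example dimension $m$ contract away, leaving a parameter-shaped gradient.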
Applications
Use as a reference when implementing forward and backward passes. Verify correctness by checking that shapes are consistent at each layer: $Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}$ requires the inner dimensions to match: $(n^{[l]}, n^{[l-1]}) \times (n^{[l-1]}, m) \to (n^{[l]}, m)$.
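One way to make that verification explicit is a small shape-checking wrapper around the layer computation. The helper name and messages below are hypothetical, for illustration only:

```python
import numpy as np

def linear_checked(W, A_prev, b):
    """Compute Z^[l] = W^[l] A^[l-1] + b^[l], asserting the shape rules first.

    Expected shapes (this document's convention):
      W: (n_l, n_prev), A_prev: (n_prev, m), b: (n_l, 1)
    """
    n_l, n_prev = W.shape
    assert A_prev.shape[0] == n_prev, "inner dimensions must match (n^[l-1])"
    assert b.shape == (n_l, 1), "bias must be (n^[l], 1) to broadcast over examples"
    return W @ A_prev + b

# n^[l] = 4, n^[l-1] = 3, m = 5
Z = linear_checked(np.ones((4, 3)), np.ones((3, 5)), np.zeros((4, 1)))
print(Z.shape)  # (4, 5)
```

Asserting shapes at layer boundaries turns silent broadcasting bugs into immediate, localized failures.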
Trade-offs
Conventions differ between frameworks. PyTorch's nn.Linear stores its weight as (out_features, in_features), which matches the $(n^{[l]}, n^{[l-1]})$ shape above, but PyTorch activations are batch-first — shape $(m, n^{[l]})$, examples × features, transposed relative to this convention — so the layer computes $x W^\top + b$ rather than $W a + b$. Be consistent within a codebase and document which convention is in use.
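The two layouts are numerically equivalent up to a transpose, which a NumPy sketch can confirm (PyTorch itself is not imported here; the batch-first line mirrors nn.Linear's $x W^\top + b$, with all sizes assumed):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n_in, n_out = 5, 3, 4  # assumed batch size and layer widths

# Both conventions store the weight as (out, in).
W = rng.standard_normal((n_out, n_in))
b = rng.standard_normal(n_out)

# This document's convention: columns are examples.
A_prev = rng.standard_normal((n_in, m))  # (n^[l-1], m)
Z_cols = W @ A_prev + b[:, None]         # (n^[l], m)

# Batch-first layout (PyTorch-style): rows are examples.
X_rows = A_prev.T                        # (m, n_in)
Z_rows = X_rows @ W.T + b                # (m, n_out)

print(np.allclose(Z_cols, Z_rows.T))  # True
```

Either layout is fine; mixing them in one codebase is what produces the classic "shapes happen to broadcast but the math is wrong" bug.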