Vectorization and Broadcasting
Definition
Vectorization: rewriting computations over datasets as matrix/tensor operations to exploit BLAS/GPU parallelism instead of explicit Python loops. Broadcasting: extending elementwise operations to arrays of different shapes by implicitly expanding singleton dimensions, according to a fixed set of compatibility rules.
Intuition
Python loops are 10–1000× slower than equivalent NumPy/BLAS operations due to interpreter overhead. Vectorization is not premature optimization in ML — it is necessary for feasible training times even on small datasets. Broadcasting lets you write these vectorized operations concisely without manually tiling arrays.
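The speed gap described above can be seen with a minimal (illustrative, not benchmarked here) comparison of a Python loop against the equivalent single BLAS call:

```python
import numpy as np

# Compute the same dot product two ways: an explicit Python loop
# (one interpreter round-trip per element) vs. a single np.dot call.
rng = np.random.default_rng(0)
a = rng.standard_normal(100_000)
b = rng.standard_normal(100_000)

# Explicit loop: interpreter overhead on every iteration.
loop_result = 0.0
for x, y in zip(a, b):
    loop_result += x * y

# Vectorized: one call into optimized compiled code.
vec_result = np.dot(a, b)

assert np.isclose(loop_result, vec_result)
```

Wrapping both versions in a timer (e.g. `timeit`) shows the loop lagging by orders of magnitude, consistent with the range quoted above.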
Formal Description
Vectorized logistic regression (over $m$ examples, $X \in \mathbb{R}^{n \times m}$, $w \in \mathbb{R}^{n \times 1}$):

$$Z = w^\top X + b, \qquad A = \sigma(Z)$$

Here the scalar $b$ broadcasts across all $m$ examples automatically.
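The vectorized forward pass can be sketched in NumPy as follows; the shapes and variable names ($n$ features, $m$ examples stored as columns of $X$) are illustrative:

```python
import numpy as np

# Vectorized logistic regression forward pass over a whole batch.
n, m = 3, 5
rng = np.random.default_rng(1)
X = rng.standard_normal((n, m))   # (n, m): one example per column
w = rng.standard_normal((n, 1))   # (n, 1): weight column vector
b = 0.5                           # scalar bias

Z = w.T @ X + b                   # (1, m); b broadcasts over all m examples
A = 1.0 / (1.0 + np.exp(-Z))      # elementwise sigmoid

assert Z.shape == (1, m)
assert A.shape == (1, m)
```

No loop over examples appears anywhere: the matrix product handles all $m$ columns at once, and the scalar bias is broadcast implicitly.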
Broadcasting rules: two shapes are compatible dimension-by-dimension (from the trailing dimension) if they are equal or one of them is 1; missing leading dimensions are treated as 1. Examples:
- `(3, 4) + (3, 1)` → broadcasts to `(3, 4)` ✓
- `(3, 4) + (4,)` → trailing dim matches, `(4,)` treated as `(1, 4)` → `(3, 4)` ✓
- `(3, 4) + (3,)` → incompatible (trailing dims 4 and 3 differ, neither is 1) ✗
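The three cases above can be checked directly in NumPy:

```python
import numpy as np

# Broadcasting rule examples, verified against NumPy's behavior.
A = np.ones((3, 4))

assert (A + np.ones((3, 1))).shape == (3, 4)   # (3, 1) expands along axis 1
assert (A + np.ones((4,))).shape == (3, 4)     # (4,) is treated as (1, 4)

try:
    A + np.ones((3,))                          # trailing dims 4 vs 3: incompatible
except ValueError as err:
    print("incompatible:", err)
```

The failing case raises a `ValueError` rather than silently producing a wrong result, which is the rule violation you want.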
Common patterns:
- Adding a bias vector $b \in \mathbb{R}^{n \times 1}$ to an activation matrix $Z \in \mathbb{R}^{n \times m}$: $b$ broadcasts over the $m$ columns
- Scaling each sample (row) of a matrix $A \in \mathbb{R}^{m \times n}$ by a per-sample scalar vector $s$ of shape $(m,)$: requires reshaping $s$ to $(m, 1)$
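Both patterns can be sketched as follows; the shapes are illustrative, matching the conventions above:

```python
import numpy as np

# Pattern 1: column-wise bias broadcast.
n, m = 3, 4
Z = np.zeros((n, m))
b = np.arange(n).reshape(n, 1)     # (n, 1) bias column
out = Z + b                        # b broadcasts across the m columns
assert out.shape == (n, m)

# Pattern 2: per-sample (per-row) scaling needs an explicit reshape.
A = np.ones((m, n))                # samples as rows here
s = np.arange(m)                   # (m,) per-sample scalars
scaled = A * s.reshape(m, 1)       # without the reshape, (m,) would try to
assert scaled.shape == (m, n)      # align with the trailing axis n and fail
```

The reshape in the second pattern is what tells broadcasting to align `s` with the sample axis rather than the feature axis.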
Applications
Forward and backward passes over mini-batches; any batch computation. NumPy, PyTorch, and JAX all follow NumPy broadcasting semantics, so the same intuitions apply across frameworks.
Trade-offs
Unexpected broadcasting can introduce silent bugs: shape `(n,)` and shape `(n, 1)` behave differently when combined with a matrix. Always check shapes explicitly (e.g., `assert W.shape == (n, 1)`). Large intermediate tensors materialized by broadcasting can also cause memory pressure on GPU.
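The silent-bug case above can be demonstrated concretely: a rank-1 array of shape `(n,)` and a column vector of shape `(n, 1)` produce results of the same shape but different values against the same matrix, so no error is raised.

```python
import numpy as np

# (n,) vs (n, 1): same data, different broadcasting behavior.
n = 3
M = np.ones((n, n))
v_flat = np.arange(n)              # shape (n,)
v_col = v_flat.reshape(n, 1)       # shape (n, 1)

r1 = M * v_flat                    # (n,) aligns with columns -> scales columns
r2 = M * v_col                     # (n, 1) aligns with rows  -> scales rows
assert r1.shape == r2.shape == (n, n)
assert not np.array_equal(r1, r2)  # same shape, different values: silent bug

# Defensive shape check, as the text recommends:
assert v_col.shape == (n, 1)
```

Because both results have shape `(n, n)`, downstream code runs without complaint, which is exactly why explicit shape assertions are worth the keystrokes.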