Vectorization and Broadcasting

Definition

Vectorization: rewriting computations over datasets as matrix/tensor operations to exploit BLAS/GPU parallelism instead of explicit Python loops. Broadcasting: extending elementwise operations to arrays of different shapes by implicitly expanding singleton dimensions, according to a fixed set of compatibility rules.

Intuition

Python loops are 10–1000× slower than equivalent NumPy/BLAS operations due to interpreter overhead. Vectorization is not premature optimization in ML — it is necessary for feasible training times even on small datasets. Broadcasting lets you write these vectorized operations concisely without manually tiling arrays.
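As a minimal illustration of the gap (exact speedups vary by machine and array size), compare an explicit Python-loop dot product with the single vectorized call:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)
y = rng.standard_normal(100_000)

# Explicit Python loop: one interpreter round-trip per element.
def dot_loop(a, b):
    total = 0.0
    for ai, bi in zip(a, b):
        total += ai * bi
    return total

# Vectorized: a single call that dispatches to optimized C/BLAS code.
loop_result = dot_loop(x, y)
vec_result = np.dot(x, y)

assert np.isclose(loop_result, vec_result)
```

Timing these two (e.g., with `timeit`) typically shows the loop version orders of magnitude slower, despite computing the same value.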

Formal Description

Vectorized logistic regression (over $m$ examples, $X \in \mathbb{R}^{n \times m}$, $w \in \mathbb{R}^{n \times 1}$, $b \in \mathbb{R}$):

$$Z = w^{\top} X + b, \qquad A = \sigma(Z)$$

Here the scalar $b$ broadcasts across all $m$ examples automatically.
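A sketch of this forward pass in NumPy, assuming the conventional shapes: `X` is `(n, m)` with one example per column, `w` is `(n, 1)`, and `b` is a scalar:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n, m = 3, 5                      # n features, m examples
rng = np.random.default_rng(0)
X = rng.standard_normal((n, m))  # one example per column
w = rng.standard_normal((n, 1))  # weight column vector
b = 0.5                          # scalar bias

Z = w.T @ X + b                  # (1, m); scalar b broadcasts over all m examples
A = sigmoid(Z)                   # elementwise sigmoid, shape (1, m)

assert Z.shape == (1, m) and A.shape == (1, m)
```

No loop over examples appears anywhere; the matrix product and the broadcasted `+ b` handle all `m` columns at once.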

Broadcasting rules: two shapes are compatible dimension-by-dimension (from the trailing dimension) if they are equal or one of them is 1; missing leading dimensions are treated as 1. Examples:

  • (3, 4) + (1, 4) → broadcasts to (3, 4)
  • (3, 4) + (4,) → trailing dim matches, treated as (1, 4)
  • (3, 4) + (3,) → incompatible ✗
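These rules can be checked directly in NumPy; the incompatible case raises a `ValueError` rather than silently succeeding:

```python
import numpy as np

a = np.ones((3, 4))

# Equal-or-1 per trailing dimension: (3, 4) with (1, 4) -> (3, 4)
assert (a + np.ones((1, 4))).shape == (3, 4)

# Missing leading dimensions are treated as 1: (4,) acts like (1, 4)
assert (a + np.ones(4)).shape == (3, 4)

# Trailing dims 4 vs 3 are neither equal nor 1 -> broadcast error
try:
    a + np.ones(3)
except ValueError as e:
    print("broadcast error:", e)
```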

Common patterns:

  • Adding a bias vector $b$ of shape $(n, 1)$ to an activation matrix $Z$ of shape $(n, m)$: $b$ broadcasts over the $m$ columns
  • Scaling each sample by a per-sample scalar: a vector $s$ of shape $(m,)$ against a matrix $X$ of shape $(m, n)$ requires reshaping $s$ to $(m, 1)$
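Both patterns above can be sketched in a few lines (shapes here are illustrative):

```python
import numpy as np

n, m = 4, 6
rng = np.random.default_rng(1)

# Pattern 1: bias b of shape (n, 1) added to activations Z of shape (n, m);
# b is implicitly repeated across the m columns.
Z = rng.standard_normal((n, m))
b = rng.standard_normal((n, 1))
assert (Z + b).shape == (n, m)

# Pattern 2: per-sample scalars s of shape (m,) against X of shape (m, n);
# s must become (m, 1) so it broadcasts across the n feature columns.
X = rng.standard_normal((m, n))
s = rng.standard_normal(m)
scaled = s.reshape(-1, 1) * X    # equivalently: s[:, None] * X
assert scaled.shape == (m, n)
```

Without the reshape in pattern 2, `s * X` would attempt to align `(m,)` against the trailing dimension `n` and fail (or, worse, silently broadcast if `m == n`).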

Applications

Forward and backward passes over mini-batches; any batch computation. NumPy, PyTorch, and JAX all follow NumPy broadcasting semantics, so the same intuitions apply across frameworks.

Trade-offs

Unexpected broadcasting can introduce silent bugs: shape (n,) and shape (n, 1) behave differently when combined with a matrix. Always check shapes explicitly (e.g., assert W.shape == (n, 1)). Large intermediate tensors from broadcasting can also cause memory pressure on GPU.
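A minimal sketch of the (n,) vs (n, 1) pitfall: both additions succeed with the same output shape, so nothing raises, yet the results differ.

```python
import numpy as np

n = 3
M = np.zeros((n, n))
v_1d = np.arange(n)               # shape (n,): acts like a row (1, n)
v_col = np.arange(n).reshape(-1, 1)  # shape (n, 1): a column vector

# (n, n) + (n,): v_1d is added to every row        -> rows are [0, 1, 2]
# (n, n) + (n, 1): v_col is added to every column  -> columns are [0, 1, 2]
assert (M + v_1d).shape == (n, n)
assert (M + v_col).shape == (n, n)
assert not np.array_equal(M + v_1d, M + v_col)  # same shape, different values

# Defensive check, as recommended above:
W = v_col
assert W.shape == (n, 1)
```

Because no error is raised, the only reliable defense is the explicit shape assertion.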