Dimensionality Reduction

Core Idea

Dimensionality reduction maps data from a high-dimensional space $\mathbb{R}^D$ to a lower-dimensional representation $\mathbb{R}^d$ (with $d \ll D$) while preserving as much task-relevant structure as possible. It is used to remove noise, reduce computational cost, enable visualisation, and mitigate the curse of dimensionality.

Methods fall into two families: linear methods (PCA, SVD, random projections), which find global linear subspaces, and non-linear methods (t-SNE, UMAP), which preserve local neighbourhood structure.

Mathematical Formulation

Principal Component Analysis (PCA)

PCA finds the orthogonal directions of maximum variance in the data.

Covariance matrix: $\Sigma = \frac{1}{n} X^\top X$ (for centred data $X \in \mathbb{R}^{n \times D}$).

Eigendecomposition: $\Sigma = W \Lambda W^\top$, where the columns of $W$ are the principal components (eigenvectors) and $\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_D)$ with $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_D$.

Projection to $k$ dimensions: $Z = X W_k$, where $W_k \in \mathbb{R}^{D \times k}$ contains the top-$k$ eigenvectors.

Equivalently via SVD: $X = U S V^\top$; the top-$k$ right singular vectors (columns of $V$) form $W_k$.

Explained variance ratio: $\frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{D} \lambda_i}$. Choose $k$ so that this exceeds 0.95 (or use a scree plot elbow).
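The eigendecomposition route maps directly to a few lines of NumPy; a minimal sketch, assuming X is an (n, D) array (the helper name pca_project is illustrative):

import numpy as np

def pca_project(X, k):
    Xc = X - X.mean(axis=0)                  # centre the data
    cov = Xc.T @ Xc / len(Xc)                # covariance matrix Sigma
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # re-sort by descending variance
    W_k = eigvecs[:, order[:k]]              # top-k principal components
    explained = eigvals[order[:k]].sum() / eigvals.sum()
    return Xc @ W_k, explained               # projection Z = X W_k

The same $W_k$ falls out of np.linalg.svd(Xc) as the first $k$ rows of the returned $V^\top$, which is the route scikit-learn takes: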

from sklearn.decomposition import PCA
 
pca = PCA(n_components=0.95, svd_solver='full')  # retain 95% variance
X_reduced = pca.fit_transform(X_train)
 
print(f"Reduced to {pca.n_components_} dimensions")
print(f"Explained variance: {pca.explained_variance_ratio_.cumsum()[-1]:.3f}")

t-SNE (t-Distributed Stochastic Neighbour Embedding)

Non-linear method designed for 2D/3D visualisation of high-dimensional data.

  1. Computes pairwise similarities in original space using a Gaussian kernel centred on each point.
  2. Computes pairwise similarities in low-dimensional space using a Student-t kernel.
  3. Minimises KL divergence between the two similarity distributions via gradient descent.

Key hyperparameter: perplexity (default 30) — roughly the number of neighbours considered; controls the balance between local and global structure.
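To make the three steps concrete, here is an illustrative sketch of the objective only (not the optimiser). Note the simplification: real t-SNE uses per-point Gaussian bandwidths tuned to match the perplexity and symmetrised conditional probabilities, whereas this assumes a single fixed sigma:

import numpy as np
from scipy.spatial.distance import pdist, squareform

def tsne_objective(X, Y, sigma=1.0):
    # Step 1: Gaussian similarities in the original space
    P = np.exp(-squareform(pdist(X, 'sqeuclidean')) / (2 * sigma ** 2))
    np.fill_diagonal(P, 0.0)
    P /= P.sum()
    # Step 2: Student-t similarities in the low-dimensional embedding Y
    Q = 1.0 / (1.0 + squareform(pdist(Y, 'sqeuclidean')))
    np.fill_diagonal(Q, 0.0)
    Q /= Q.sum()
    # Step 3: KL divergence that gradient descent minimises
    mask = P > 0
    return np.sum(P[mask] * np.log(P[mask] / Q[mask]))

In practice you call the library: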

from sklearn.manifold import TSNE
 
tsne = TSNE(n_components=2, perplexity=30, random_state=42)
X_2d = tsne.fit_transform(X)

Caution: distances in a t-SNE plot do not reflect distances in the original space. Cluster sizes and inter-cluster distances are not meaningful. Use it only for visualisation.

UMAP (Uniform Manifold Approximation and Projection)

UMAP is faster than t-SNE, preserves global structure better, and supports transform() on new data.

import umap
 
reducer = umap.UMAP(n_components=2, n_neighbors=15, min_dist=0.1, random_state=42)
X_2d = reducer.fit_transform(X)
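Because the fitted reducer is inductive, it can embed unseen points; X_new below stands for any array with the same number of features as X:

X_new_2d = reducer.transform(X_new)  # embed new samples without refitting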

Inductive Bias

  • PCA: assumes the data lies on a linear subspace; maximises variance.
  • t-SNE/UMAP: assumes the data lies on a non-linear low-dimensional manifold; preserves local topology.

Training Objective

  • PCA: maximise the variance of the projections (equivalently, minimise reconstruction error).
  • t-SNE: minimise the KL divergence between the high-dimensional Gaussian and low-dimensional Student-t similarity distributions.
  • UMAP: minimise the cross-entropy between the two fuzzy topological representations.

Strengths

  • Removes noise dimensions, improving downstream model robustness.
  • Enables visualisation of structure in 2 or 3 dimensions.
  • Reduces overfitting risk for high-dimensional inputs.
  • PCA is fast and invertible up to the discarded components, making it useful for lossy compression (see the sketch below).
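A sketch of the compression round trip using the fitted pca object from the snippet above (reconstruction is exact only if no components were discarded):

X_restored = pca.inverse_transform(X_reduced)   # back to the original space
mse = ((X_train - X_restored) ** 2).mean()      # lossy-compression error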

Weaknesses

  • PCA only captures linear structure; fails on curved manifolds.
  • t-SNE is slow ($O(n^2)$ naively), non-deterministic, and cannot be applied to new data.
  • All methods can obscure structure if the target dimensionality $k$ is chosen poorly.

Variants

  • Kernel PCA: applies PCA in a kernel-induced feature space for non-linear structure.
  • Sparse PCA: enforces sparsity in loadings for interpretability.
  • Incremental PCA: handles large datasets via minibatch updates.
  • Truncated SVD: like PCA but without mean centring, so it works directly on sparse matrices.
  • Random Projections (Johnson-Lindenstrauss): extremely fast, approximate dimensionality reduction with theoretical guarantees (see the sketch below).
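As a concrete instance of the last item, a minimal random-projection sketch with scikit-learn; the data shape and eps value are assumptions for illustration:

import numpy as np
from sklearn.random_projection import GaussianRandomProjection, johnson_lindenstrauss_min_dim

X = np.random.rand(1000, 10000)  # hypothetical high-dimensional data
k = johnson_lindenstrauss_min_dim(n_samples=X.shape[0], eps=0.1)  # JL bound
rp = GaussianRandomProjection(n_components=k, random_state=42)
X_reduced = rp.fit_transform(X)  # pairwise distances preserved within ~10%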

References

  • Jolliffe, I. (2002). Principal Component Analysis (2nd ed.). Springer.
  • van der Maaten, L. & Hinton, G. (2008). “Visualizing data using t-SNE.” JMLR.
  • McInnes, L. et al. (2018). “UMAP: Uniform Manifold Approximation and Projection.” arXiv:1802.03426.