# Dimensionality Reduction
## Core Idea
Dimensionality reduction maps data from a high-dimensional space to a lower-dimensional representation ($\mathbb{R}^D \to \mathbb{R}^k$ with $k \ll D$) while preserving as much task-relevant structure as possible. It is used to remove noise, reduce computational cost, enable visualisation, and mitigate the curse of dimensionality.
Two families: linear methods (PCA, SVD, random projections) that find global linear subspaces, and non-linear methods (t-SNE, UMAP) that preserve local neighbourhood structure.
## Mathematical Formulation
### Principal Component Analysis (PCA)
PCA finds the orthogonal directions of maximum variance in the data.
Covariance matrix: $\Sigma = \frac{1}{n-1} X^\top X$ (centred data $X \in \mathbb{R}^{n \times D}$).
Eigendecomposition: $\Sigma = W \Lambda W^\top$, where the columns of $W$ are principal components (eigenvectors) and $\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_D)$ with $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_D$.
Projection to $k$ dimensions: $Z = X W_k$, where $W_k$ contains the top-$k$ eigenvectors.
Equivalently via SVD: $X = U S V^\top$; the top-$k$ right singular vectors (columns of $V$) form $W_k$.
Explained variance ratio: $\mathrm{EVR}(k) = \sum_{i=1}^{k} \lambda_i \,/\, \sum_{i=1}^{D} \lambda_i$; choose $k$ such that this exceeds 0.95 (or use a scree-plot elbow).
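As a sketch, the eigendecomposition route above can be reproduced with plain NumPy (the synthetic data and variable names are illustrative, not from any particular dataset):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: most variance lies along the first three axes
X = rng.normal(size=(500, 10)) @ np.diag([5, 3, 1] + [0.1] * 7)
Xc = X - X.mean(axis=0)                    # centre the data

cov = Xc.T @ Xc / (len(Xc) - 1)            # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending order
order = np.argsort(eigvals)[::-1]          # re-sort descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 3
Z = Xc @ eigvecs[:, :k]                    # project onto top-k components
evr = eigvals[:k].sum() / eigvals.sum()    # explained variance ratio
print(f"Explained variance with k={k}: {evr:.3f}")
```

Here `evr` is exactly the ratio defined above; with this synthetic spectrum it comfortably exceeds 0.95.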
```python
from sklearn.decomposition import PCA

pca = PCA(n_components=0.95, svd_solver='full')  # retain 95% variance
X_reduced = pca.fit_transform(X_train)
print(f"Reduced to {pca.n_components_} dimensions")
print(f"Explained variance: {pca.explained_variance_ratio_.cumsum()[-1]:.3f}")
```

### t-SNE (t-Distributed Stochastic Neighbour Embedding)
Non-linear method designed for 2D/3D visualisation of high-dimensional data.
- Computes pairwise similarities in original space using a Gaussian kernel centred on each point.
- Computes pairwise similarities in low-dimensional space using a Student-t kernel.
- Minimises KL divergence between the two similarity distributions via gradient descent.
Key hyperparameter: perplexity (default 30) — roughly the number of neighbours considered; controls the balance between local and global structure.
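The three steps above can be sketched directly in NumPy. This is a minimal illustration of the two kernels and the KL objective only: it uses a fixed Gaussian bandwidth and a random embedding, whereas real t-SNE calibrates a per-point bandwidth from the perplexity and optimises the embedding by gradient descent:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(50, 10))   # high-dimensional points
Y = rng.normal(size=(50, 2))    # a candidate 2D embedding

def pairwise_sq_dists(A):
    diff = A[:, None, :] - A[None, :, :]
    return (diff ** 2).sum(-1)

# High-d similarities: Gaussian kernel (fixed bandwidth as a simplification)
P = np.exp(-pairwise_sq_dists(X) / 2.0)
np.fill_diagonal(P, 0.0)
P /= P.sum()

# Low-d similarities: Student-t kernel with one degree of freedom
Q = 1.0 / (1.0 + pairwise_sq_dists(Y))
np.fill_diagonal(Q, 0.0)
Q /= Q.sum()

# KL(P || Q): the quantity t-SNE minimises via gradient descent
kl = np.sum(P * np.log(np.maximum(P, 1e-12) / np.maximum(Q, 1e-12)))
print(f"KL(P || Q) = {kl:.3f}")
```

The heavy tail of the Student-t kernel is what lets dissimilar points sit far apart in the embedding without incurring a large penalty.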
```python
from sklearn.manifold import TSNE

tsne = TSNE(n_components=2, perplexity=30, random_state=42)
X_2d = tsne.fit_transform(X)
```

Caution: t-SNE distances in the 2D plot do not reflect true distances. Cluster sizes and inter-cluster distances are not meaningful. Use only for visualisation.
### UMAP (Uniform Manifold Approximation and Projection)
Faster than t-SNE, better preserves global structure, and supports transform() on new data.
```python
import umap

reducer = umap.UMAP(n_components=2, n_neighbors=15, min_dist=0.1, random_state=42)
X_2d = reducer.fit_transform(X)
```

## Inductive Bias
- PCA: assumes the data lies on a linear subspace; maximises variance.
- t-SNE/UMAP: assumes the data lies on a non-linear low-dimensional manifold; preserves local topology.
## Training Objective
| Method | Objective |
|---|---|
| PCA | Maximise variance of projections (equivalently, minimise reconstruction error) |
| t-SNE | Minimise KL divergence between high-d Gaussian and low-d t-distribution similarities |
| UMAP | Minimise cross-entropy between fuzzy topological representations |
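The equivalence in the PCA row follows from a Pythagorean identity: the variance captured by the projection and the squared reconstruction error sum to the fixed total variance, so maximising one minimises the other. A quick NumPy check (synthetic data; names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5)) @ np.diag([4, 2, 1, 0.5, 0.25])
Xc = X - X.mean(axis=0)

_, S, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2
W = Vt[:k].T                          # top-k principal directions
X_hat = Xc @ W @ W.T                  # project, then reconstruct

captured_var = ((Xc @ W) ** 2).sum()  # variance captured by the projection
recon_err = ((Xc - X_hat) ** 2).sum() # squared reconstruction error
total_var = (Xc ** 2).sum()           # fixed total

# captured_var + recon_err == total_var (up to floating point)
print(captured_var + recon_err, total_var)
```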
## Strengths
- Removes noise dimensions, improving downstream model robustness.
- Enables visualisation of structure in 2 or 3 dimensions.
- Reduces overfitting risk for high-dimensional inputs.
- PCA is fast and supports approximate reconstruction from the reduced representation; useful for compression.
## Weaknesses
- PCA only captures linear structure; fails on curved manifolds.
- t-SNE is slow ($O(n^2)$ naively), non-deterministic, and has no out-of-sample transform for new data.
- All methods can obscure structure if the target dimensionality $k$ is chosen poorly.
## Variants
- Kernel PCA: applies PCA in a kernel-induced feature space for non-linear structure.
- Sparse PCA: enforces sparsity in loadings for interpretability.
- Incremental PCA: handles large datasets via minibatch updates.
- Truncated SVD: equivalent to PCA but works with sparse matrices (no mean centering).
- Random Projections (Johnson-Lindenstrauss): extremely fast, approximate dimensionality reduction with theoretical guarantees.
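The random-projection idea is simple enough to sketch in a few lines of NumPy: a Gaussian matrix with entry variance $1/k$ approximately preserves pairwise distances (the Johnson-Lindenstrauss guarantee). The dimensions below are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(7)
n, D, k = 100, 5000, 800
X = rng.normal(size=(n, D))

# Gaussian random projection: entries ~ N(0, 1/k), so squared distances
# are preserved in expectation (Johnson-Lindenstrauss)
R = rng.normal(scale=1.0 / np.sqrt(k), size=(D, k))
Z = X @ R

# Compare one pairwise distance before and after projection
orig = np.linalg.norm(X[0] - X[1])
proj = np.linalg.norm(Z[0] - Z[1])
print(f"distance ratio: {proj / orig:.3f}")  # close to 1
```

Note that, unlike PCA, the projection is data-independent: no fitting is required, which is what makes the method so fast.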
## References
- Jolliffe, I. (2002). Principal Component Analysis (2nd ed.). Springer.
- van der Maaten, L. & Hinton, G. (2008). “Visualizing data using t-SNE.” JMLR.
- McInnes, L. et al. (2018). “UMAP: Uniform Manifold Approximation and Projection.” arXiv:1802.03426.