Instance-Based Methods
Core Idea
Instance-based (or memory-based) methods make predictions by finding the training examples most similar to a query point, rather than fitting a global parametric model. The hypothesis is “local”: nearby training examples are most relevant for predicting any given test point.
Mathematical Formulation
K-Nearest Neighbours (KNN)
Given a query point $x$ and training set $\{(x_i, y_i)\}_{i=1}^{n}$, identify the $k$ training points closest to $x$ under a distance metric $d$:

$$N_k(x) = \{\, i : d(x, x_i) \text{ is among the } k \text{ smallest} \,\}$$

Classification: majority vote over the $k$ neighbours:

$$\hat{y} = \arg\max_{c} \sum_{i \in N_k(x)} \mathbb{1}[y_i = c]$$

Regression: mean (or distance-weighted mean) over the $k$ neighbours:

$$\hat{y} = \frac{1}{k} \sum_{i \in N_k(x)} y_i \qquad \text{or} \qquad \hat{y} = \frac{\sum_{i \in N_k(x)} w_i\, y_i}{\sum_{i \in N_k(x)} w_i}, \quad w_i = \frac{1}{d(x, x_i)}$$
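The classification and regression rules above can be sketched as a brute-force implementation in NumPy (`knn_predict` is an illustrative name, not a library function):

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=3, task="classification"):
    """Brute-force KNN: rank all training points by Euclidean distance to the query."""
    dists = np.linalg.norm(X_train - x_query, axis=1)  # d(x, x_i) for every i
    nearest = np.argsort(dists)[:k]                    # indices of the k closest
    if task == "classification":
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        return labels[np.argmax(counts)]               # majority vote
    return y_train[nearest].mean()                     # mean of neighbour targets

# Tiny toy data: two clusters of two points each.
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([0.05, 0.1]), k=3))  # → 0 (query sits in the first cluster)
```

Note there are no fitted parameters anywhere: the training arrays are passed to every prediction call, which is exactly the "memory-based" character described above.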
Distance Metrics
| Metric | Formula | Use case |
|---|---|---|
| Euclidean ($\ell_2$) | $\sqrt{\sum_j (x_j - x'_j)^2}$ | Continuous, homogeneous features |
| Manhattan ($\ell_1$) | $\sum_j \lvert x_j - x'_j \rvert$ | Robust to outliers; grid-like features |
| Minkowski | $\left(\sum_j \lvert x_j - x'_j \rvert^p\right)^{1/p}$ | Generalises $\ell_1$ ($p=1$) and $\ell_2$ ($p=2$) |
| Cosine | $1 - \frac{x \cdot x'}{\lVert x \rVert\, \lVert x' \rVert}$ | High-dimensional text/embeddings |
| Hamming | $\frac{1}{d} \sum_j \mathbb{1}[x_j \neq x'_j]$ (proportion of differing components) | Binary or categorical features |
Standardise features before computing Euclidean distance — otherwise high-variance features dominate.
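A small toy example of why standardisation matters (the income/age values are illustrative, chosen so the two features differ in scale by three orders of magnitude):

```python
import numpy as np

# Income (scale ~1e4) and age (scale ~1e1): income dominates raw distances.
X = np.array([[50_000.0, 30.0],   # query point
              [52_000.0, 75.0],   # close income, very different age
              [55_000.0, 31.0],   # further income, very similar age
              [49_000.0, 29.0]])  # similar in both features

q = X[0]
raw_order = np.argsort(np.linalg.norm(X[1:] - q, axis=1))

# Standardise each feature to zero mean and unit variance, then re-rank.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
std_order = np.argsort(np.linalg.norm(Xs[1:] - Xs[0], axis=1))

print(raw_order.tolist())  # → [2, 0, 1]: close-income row ranks ahead of close-age row
print(std_order.tolist())  # → [2, 1, 0]: ranking flips once both features contribute
```

The 45-year age gap in the second row is invisible to raw Euclidean distance next to income differences of thousands; after standardisation both features contribute on the same scale and the neighbour ranking changes.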
Inductive Bias
KNN assumes that nearby points in feature space have similar labels. This is the smoothness (local constancy) assumption; a related but stronger idea is the manifold hypothesis, which posits that data concentrate near a low-dimensional manifold where local distances are meaningful. The assumption holds well for locally smooth target functions and fails when the decision boundary depends on global structure or when the feature space is very high-dimensional.
Training Objective
KNN has no training phase — the training set is the model. Prediction requires searching over all training points, making each brute-force query $O(nd)$ for $n$ points in $d$ dimensions. Practical implementations use:
- KD-trees: $O(\log n)$ average query time for low-dimensional data (roughly $d \lesssim 20$).
- Ball trees: better than KD-trees for moderate-to-high $d$.
- Approximate nearest neighbours (FAISS, HNSW): sub-linear search for large-scale retrieval.
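To make the tree-based speed-up concrete, here is a minimal KD-tree nearest-neighbour search in plain NumPy — an illustrative sketch (dict-based nodes, exact search with plane pruning), not a production index:

```python
import numpy as np

def build_kdtree(points, indices=None, depth=0):
    """Recursively split on the median of one coordinate per level."""
    if indices is None:
        indices = np.arange(len(points))
    if len(indices) == 0:
        return None
    axis = depth % points.shape[1]
    order = indices[np.argsort(points[indices, axis])]
    mid = len(order) // 2
    return {"index": order[mid], "axis": axis,
            "left":  build_kdtree(points, order[:mid], depth + 1),
            "right": build_kdtree(points, order[mid + 1:], depth + 1)}

def nearest(node, points, q, best=None):
    """Descend toward q, then backtrack into the far subtree only when the
    splitting plane is closer than the best distance found so far (pruning)."""
    if node is None:
        return best
    p = points[node["index"]]
    d = np.linalg.norm(q - p)
    if best is None or d < best[1]:
        best = (node["index"], d)
    gap = q[node["axis"]] - p[node["axis"]]
    near, far = ("left", "right") if gap < 0 else ("right", "left")
    best = nearest(node[near], points, q, best)
    if abs(gap) < best[1]:  # the far side could still hold a closer point
        best = nearest(node[far], points, q, best)
    return best

rng = np.random.default_rng(0)
pts = rng.random((200, 3))
tree = build_kdtree(pts)
q = np.array([0.5, 0.5, 0.5])
idx, dist = nearest(tree, pts, q)
assert idx == np.argmin(np.linalg.norm(pts - q, axis=1))  # agrees with brute force
```

The pruning step is why average query time drops to $O(\log n)$ in low dimensions: in high dimensions the splitting-plane test almost never prunes, and the search degenerates back toward a full scan.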
Strengths
- No training time: simply stores the data.
- Non-parametric: adapts to arbitrary decision boundary shapes.
- Naturally handles multi-class: majority vote scales to any number of classes.
- Interpretable locally: the prediction is explained by the neighbours.
Weaknesses
- Slow prediction: $O(nd)$ per query (brute force); mitigated by ANN indices.
- High memory: must store entire training set.
- Curse of dimensionality: in high dimensions, distances concentrate and neighbourhoods become meaningless. Performance degrades rapidly once $d$ reaches tens of dimensions.
- No feature learning: cannot learn task-relevant representations — sensitive to irrelevant features.
- Sensitive to scale: requires feature standardisation.
Variants
- Radius-based neighbours: use all points within a radius $r$ instead of a fixed $k$.
- Locally weighted regression (LOESS): fit a local polynomial to neighbours weighted by distance; used for smooth curve estimation.
- Condensed KNN: removes redundant training points to reduce storage.
- Weighted KNN: distance-weighted voting typically outperforms uniform KNN.
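Distance-weighted voting can be sketched as follows (a minimal illustration using inverse-distance weights; `weighted_knn_classify` is a hypothetical helper name):

```python
import numpy as np

def weighted_knn_classify(X_train, y_train, x_query, k=3, eps=1e-12):
    """Each of the k neighbours votes with weight 1/d instead of uniformly."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = 1.0 / (dists[nearest] + eps)  # eps guards against d = 0
    classes = np.unique(y_train)
    scores = [weights[y_train[nearest] == c].sum() for c in classes]
    return classes[np.argmax(scores)]

# One very close class-1 point outvotes two distant class-0 points at k=3;
# a uniform majority vote over the same three neighbours would return 0.
X = np.array([[0.0, 0.0], [3.0, 0.0], [3.1, 0.0], [0.1, 0.0]])
y = np.array([0, 0, 0, 1])
print(weighted_knn_classify(X, y, np.array([0.2, 0.0]), k=3))  # → 1
```

This is the typical failure mode weighting fixes: with uniform votes, two far-away neighbours can overrule one neighbour that is an order of magnitude closer.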
References
- Cover, T. & Hart, P. (1967). “Nearest neighbor pattern classification.” IEEE Transactions on Information Theory.
- Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning. §13.3.