Supervised Learning
Definition
Learning a parameterized mapping from labeled examples by minimizing empirical risk.
Intuition
Find parameters that best explain the training labels; generalization is the key challenge. The model must learn structure that transfers beyond the training set — not just memorize labels.
Formal Description
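Setup: dataset $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{N}$, hypothesis class $\mathcal{F} = \{f_\theta : \theta \in \Theta\}$.
Empirical risk minimization:
$$\theta^* = \arg\min_{\theta \in \Theta} \; \frac{1}{N} \sum_{i=1}^{N} \mathcal{L}\big(f_\theta(x_i), y_i\big) + \lambda R(\theta)$$
where $\mathcal{L}$ is the task loss and $R$ is an optional regularizer, weighted by $\lambda \ge 0$.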
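As a concrete instance of the objective above, here is a minimal sketch that minimizes a regularized empirical risk by plain gradient descent. The choices are assumptions made for illustration only: a 1-D linear model, mean squared error as the task loss, an L2 penalty as the regularizer (weight `lam`), and a tiny hand-written dataset.

```python
def empirical_risk(theta, data, lam):
    """Mean squared-error loss over the dataset plus an L2 penalty lam * ||theta||^2."""
    w, b = theta
    n = len(data)
    task_loss = sum((w * x + b - y) ** 2 for x, y in data) / n
    return task_loss + lam * (w * w + b * b)


def gradient(theta, data, lam):
    """Analytic gradient of empirical_risk with respect to (w, b)."""
    w, b = theta
    n = len(data)
    dw = sum(2 * (w * x + b - y) * x for x, y in data) / n + 2 * lam * w
    db = sum(2 * (w * x + b - y) for x, y in data) / n + 2 * lam * b
    return dw, db


def fit(data, lam=0.0, lr=0.1, steps=2000):
    """Plain gradient descent on the regularized empirical risk."""
    theta = (0.0, 0.0)
    for _ in range(steps):
        dw, db = gradient(theta, data, lam)
        theta = (theta[0] - lr * dw, theta[1] - lr * db)
    return theta


# Toy dataset lying exactly on y = 2x + 1 (illustrative values only).
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

theta_plain = fit(data, lam=0.0)   # no regularizer
theta_ridge = fit(data, lam=0.1)   # L2 regularizer shrinks the parameters
print("unregularized:", theta_plain, empirical_risk(theta_plain, data, 0.0))
print("regularized:  ", theta_ridge, empirical_risk(theta_ridge, data, 0.1))
```

Without the penalty the fit recovers the generating line; with it, the parameter norm shrinks at the cost of a slightly worse fit to the training labels.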
Task types:
| Task | Output (activation or form) | Loss |
|---|---|---|
| Binary classification | sigmoid | binary cross-entropy (BCE) |
| Multi-class classification | softmax | cross-entropy (CE) |
| Regression | linear | mean squared error (MSE) / mean absolute error (MAE) |
| Structured output | sequence, bounding box | task-specific |
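The first three rows of the table can be sketched in a few lines of standard-library Python. The function names are illustrative, not taken from any library, and the code is not numerically hardened (for example, `sigmoid` overflows for very negative logits and `bce` fails for probabilities of exactly 0 or 1).

```python
import math


def sigmoid(z):
    """Map a real-valued logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    """Map a list of logits to a probability vector (max-shifted for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bce(p, y):
    """Binary cross-entropy for predicted probability p and label y in {0, 1}."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def cross_entropy(probs, label):
    """Multi-class cross-entropy: negative log-probability of the true class index."""
    return -math.log(probs[label])


def mse(preds, targets):
    """Mean squared error between two equal-length lists."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


def mae(preds, targets):
    """Mean absolute error between two equal-length lists."""
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(targets)


# Binary: logit 0 gives p = 0.5, so the loss for label 1 is ln 2.
print(bce(sigmoid(0.0), 1))
# Multi-class: equal logits give a uniform distribution, so the loss is ln 3.
print(cross_entropy(softmax([0.0, 0.0, 0.0]), 2))
# Regression: linear outputs are compared to targets directly.
print(mse([1.0, 2.0], [0.0, 4.0]), mae([1.0, 2.0], [0.0, 4.0]))
```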
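Bias-variance decomposition (squared-error loss, with $y = f^*(x) + \varepsilon$, $\mathbb{E}[\varepsilon] = 0$, $\mathrm{Var}(\varepsilon) = \sigma^2$, and $\hat{f}$ the model fitted on a random training set):
$$\mathbb{E}\big[(y - \hat{f}(x))^2\big] = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f^*(x)\big)^2}_{\text{bias}^2} + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}} + \underbrace{\sigma^2}_{\text{irreducible noise}}$$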
High bias → underfitting (model too simple); high variance → overfitting (model too complex). Regularization, more data, and architectural choices all shift this tradeoff.
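The tradeoff can be estimated empirically by refitting a model on many resampled training sets and measuring how its predictions move. The sketch below is a simulation under assumed conditions, none of which come from the text above: a sine target, Gaussian label noise, a constant predictor as the "too simple" model, and a 1-nearest-neighbor predictor as the "too complex" one.

```python
import math
import random


def true_f(x):
    """Hypothetical ground-truth function used only for this simulation."""
    return math.sin(2 * math.pi * x)


def sample_dataset(rng, n, noise_std):
    """Draw n inputs uniformly from [0, 1) with noisy labels."""
    xs = [rng.random() for _ in range(n)]
    return [(x, true_f(x) + rng.gauss(0.0, noise_std)) for x in xs]


def fit_constant(data):
    """Low-capacity model: predict the training mean everywhere."""
    mean_y = sum(y for _, y in data) / len(data)
    return lambda x: mean_y


def fit_1nn(data):
    """High-capacity model: predict the label of the nearest training point."""
    return lambda x: min(data, key=lambda xy: abs(xy[0] - x))[1]


def bias_variance(fit, rng, n_trials=300, n=20, noise_std=0.3):
    """Estimate squared bias and variance, averaged over a grid of test inputs."""
    test_xs = [i / 20 + 0.025 for i in range(20)]
    preds = [[] for _ in test_xs]  # preds[j]: predictions at test_xs[j] across trials
    for _ in range(n_trials):
        model = fit(sample_dataset(rng, n, noise_std))
        for j, x in enumerate(test_xs):
            preds[j].append(model(x))
    bias_sq = var = 0.0
    for x, p in zip(test_xs, preds):
        mean_p = sum(p) / len(p)
        bias_sq += (mean_p - true_f(x)) ** 2
        var += sum((v - mean_p) ** 2 for v in p) / len(p)
    return bias_sq / len(test_xs), var / len(test_xs)


rng = random.Random(0)  # fixed seed so the run is repeatable
const_bias_sq, const_var = bias_variance(fit_constant, rng)
nn_bias_sq, nn_var = bias_variance(fit_1nn, rng)
print(f"constant: bias^2={const_bias_sq:.3f}, variance={const_var:.3f}")
print(f"1-NN:     bias^2={nn_bias_sq:.3f}, variance={nn_var:.3f}")
```

In this setup the constant model should show the larger squared bias and the 1-nearest-neighbor model the larger variance; the exact numbers depend on the seed, sample size, and noise level.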
Applications
- Image classification (ImageNet, medical imaging)
- Speech recognition
- Machine translation
- Fraud detection
- Medical diagnosis
Trade-offs
- Requires labeled data — expensive to collect and annotate
- Assumes train and test distributions match; distribution shift degrades performance
- Capacity vs. generalization tradeoff: larger models can overfit with little data
- More data generally helps but with diminishing returns; data quality often matters more than quantity
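The distribution-shift point in the list above can be illustrated with a deliberately small experiment. This is a sketch under assumed conditions: a hypothetical noise-free target `y = x^2`, a straight-line model fitted on inputs from [0, 1], and evaluation on inputs from the same range versus a shifted range [2, 3].

```python
def fit_line(data):
    """Closed-form least squares for y ~ w*x + b."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in data)
    var = sum((x - mean_x) ** 2 for x, _ in data)
    w = cov / var
    return w, mean_y - w * mean_x


def mse(model, data):
    """Mean squared error of the line (w, b) on a dataset."""
    w, b = model
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def make_data(lo, hi, n):
    """Noise-free samples of the hypothetical target y = x^2 on an even grid over [lo, hi]."""
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return [(x, x * x) for x in xs]


train = make_data(0.0, 1.0, n=50)
test_same = make_data(0.0, 1.0, n=37)     # held-out points from the training range
test_shifted = make_data(2.0, 3.0, n=37)  # covariate shift: inputs from a different range

model = fit_line(train)
print("in-distribution MSE:", mse(model, test_same))
print("shifted MSE:        ", mse(model, test_shifted))
```

The line is a reasonable local approximation on the training range, so the held-out error there stays small, while the same parameters extrapolate poorly once the inputs move outside that range.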
Links
- logistic_regression
- cross_entropy_loss
- data_splits_and_distribution (in 01_foundations/statistical_learning_theory)