Supervised Learning

Definition

Learning a parameterized mapping $f_{θ} : X \to Y$ from labeled examples by minimizing empirical risk.

Intuition

Find parameters that best explain the training labels; generalization is the key challenge. The model must learn structure that transfers beyond the training set — not just memorize labels.

Formal Description

Setup: dataset $\{(x^{(i)}, y^{(i)})\}_{i = 1}^{m}$, hypothesis class $\{f_{θ}\}$.

Empirical risk minimization:

$\hat{θ} = \arg\min_{θ} \frac{1}{m} \sum_{i = 1}^{m} ℓ(f_{θ}(x^{(i)}), y^{(i)}) + R(θ)$

where $ℓ$ is the task loss and $R (θ)$ is an optional regularizer.
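
A minimal sketch of this objective, assuming a linear hypothesis $f_{θ}(x) = wx + b$, squared error for $ℓ$, and an L2 penalty $R(θ) = λ w^{2}$. These choices, the synthetic data, the learning rate, and the function names are illustrative assumptions, not something the definition prescribes.

```python
import random

def empirical_risk(w, b, data, lam):
    """Mean squared-error loss over the dataset plus an L2 penalty on w."""
    m = len(data)
    loss = sum((w * x + b - y) ** 2 for x, y in data) / m
    return loss + lam * w ** 2

def fit(data, lam=0.0, lr=0.1, steps=2000):
    """Minimize the regularized empirical risk by batch gradient descent."""
    w, b = 0.0, 0.0
    m = len(data)
    for _ in range(steps):
        # Gradients of (1/m) * sum (w*x + b - y)^2 + lam * w^2
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / m + 2 * lam * w
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / m
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Synthetic data (an assumption for illustration): y = 3x + 1 plus small noise.
rng = random.Random(0)
data = []
for _ in range(200):
    x = rng.uniform(-1, 1)
    data.append((x, 3 * x + 1 + rng.gauss(0, 0.1)))

w_hat, b_hat = fit(data)            # unregularized fit
w_reg, b_reg = fit(data, lam=1.0)   # the L2 penalty shrinks the slope toward 0
```

Without the penalty the fit should land near the generating slope and intercept. With `lam=1.0` the slope is pulled toward zero, which is the regularizer trading training fit for a simpler model.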

Task types:

Task	Output	Loss
Binary classification	sigmoid	binary cross-entropy (BCE)
Multi-class classification	softmax	cross-entropy (CE)
Regression	linear	MSE / MAE
Structured output	sequence, bounding box	task-specific
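
The first three rows of the table can be sketched in plain Python. The function names and the example inputs are hypothetical; in practice a framework's numerically stable loss functions would normally be used instead.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def binary_cross_entropy(p, y):
    """BCE for a predicted probability p and a label y in {0, 1}."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def softmax(logits):
    # Subtract the max logit for numerical stability.
    mx = max(logits)
    exps = [math.exp(z - mx) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    """CE for a probability vector and an integer class label."""
    return -math.log(probs[label])

def mse(preds, targets):
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)

def mae(preds, targets):
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(preds)

# Binary: a logit of 0 maps to probability 0.5.
p = sigmoid(0.0)
bce = binary_cross_entropy(p, 1)          # equals log(2)

# Multi-class: equal logits give a uniform distribution over 3 classes.
probs = softmax([1.0, 1.0, 1.0])
ce = cross_entropy(probs, 0)              # equals log(3)

# Regression: the errors are 1, 0, and -2.
reg_mse = mse([2.0, 0.0, 1.0], [1.0, 0.0, 3.0])   # (1 + 0 + 4) / 3
reg_mae = mae([2.0, 0.0, 1.0], [1.0, 0.0, 3.0])   # (1 + 0 + 2) / 3
```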

Bias-variance decomposition (for squared-error loss, with the expectation taken over training sets and label noise):

$\mathbb{E}[\text{error}] = \text{bias}^{2} + \text{variance} + \text{irreducible noise}$

High bias → underfitting (model too simple); high variance → overfitting (model too complex). Regularization, more data, and architectural choices all shift this tradeoff.
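
The two regimes can be estimated empirically by refitting a model on many resampled training sets and looking at how its predictions spread at fixed test inputs. The sketch below is one way to do this, assuming a hypothetical ground truth $\sin(2πx)$ with Gaussian label noise. It compares a deliberately simple model (always predict the mean label) with a deliberately flexible one (1-nearest-neighbor). All names and settings are illustrative.

```python
import math
import random

def true_f(x):
    return math.sin(2 * math.pi * x)

def sample_training_set(rng, m=30, noise=0.3):
    xs = [rng.random() for _ in range(m)]
    return [(x, true_f(x) + rng.gauss(0, noise)) for x in xs]

def fit_constant(train):
    """Very simple model: always predict the mean training label."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y

def fit_nearest_neighbor(train):
    """Very flexible model: predict the label of the closest training point."""
    return lambda x: min(train, key=lambda pair: abs(pair[0] - x))[1]

def bias_variance(fit, trials=200, seed=0):
    """Estimate average bias^2 and variance over a grid of test inputs."""
    rng = random.Random(seed)
    test_xs = [0.05 + 0.1 * k for k in range(10)]
    preds = [[] for _ in test_xs]   # preds[j] collects predictions at test_xs[j]
    for _ in range(trials):
        model = fit(sample_training_set(rng))
        for j, x in enumerate(test_xs):
            preds[j].append(model(x))
    bias_sq, variance = 0.0, 0.0
    for x, p in zip(test_xs, preds):
        mean_p = sum(p) / len(p)
        bias_sq += (mean_p - true_f(x)) ** 2
        variance += sum((v - mean_p) ** 2 for v in p) / len(p)
    return bias_sq / len(test_xs), variance / len(test_xs)

const_bias_sq, const_var = bias_variance(fit_constant)
nn_bias_sq, nn_var = bias_variance(fit_nearest_neighbor)
```

Under these assumptions the constant model should show the larger squared bias (it cannot follow the sine curve), while the nearest-neighbor model should show the larger variance (each prediction copies one noisy label).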

Applications

Image classification (ImageNet, medical imaging)
Speech recognition
Machine translation
Fraud detection
Medical diagnosis

Trade-offs

Requires labeled data — expensive to collect and annotate
Assumes train and test distributions match; distribution shift degrades performance
Capacity vs. generalization tradeoff: larger models can overfit with little data
More data generally helps but with diminishing returns; data quality often matters more than quantity
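
The distribution-shift point above can be illustrated with a small noise-free sketch: a line fitted where inputs lie in [0, 1] is evaluated on held-out inputs from the same range and on inputs from [2, 3]. The quadratic target, the input ranges, and the helper names are assumptions chosen only to make the effect visible.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y ≈ w*x + b in closed form."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    return w, mean_y - w * mean_x

def line_mse(w, b, xs, ys):
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def target(x):
    return x ** 2   # hypothetical ground truth, chosen only for illustration

train_xs = [i / 100 for i in range(101)]            # training inputs in [0, 1]
test_xs = [(i + 0.5) / 100 for i in range(100)]     # held-out inputs, same range
shifted_xs = [2 + i / 100 for i in range(101)]      # shifted inputs in [2, 3]

w, b = fit_line(train_xs, [target(x) for x in train_xs])
in_dist_mse = line_mse(w, b, test_xs, [target(x) for x in test_xs])
shifted_mse = line_mse(w, b, shifted_xs, [target(x) for x in shifted_xs])
```

The line approximates the curve reasonably on the range it was fitted to, but its error grows by orders of magnitude once the inputs move outside that range.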
