Transfer Learning
Definition
Leveraging representations or parameters learned on a source task/domain to improve learning on a related target task/domain, typically by fine-tuning a pretrained model.
Intuition
Lower-level features (edges, textures, syntax) are shared across many tasks, so relearning them from scratch is wasteful. The more similar the source and target, the more transferable the learned representations.
Formal Description
Full fine-tuning: initialize with the pretrained weights, then train all layers on the target task with a small learning rate.
Head-only fine-tuning (feature extraction): freeze pretrained layers, train only a new output head; useful when target data is small.
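The two strategies can be sketched in PyTorch. This is a minimal illustration using a toy backbone in place of a real pretrained model (e.g., a torchvision ResNet); the layer sizes and names are purely illustrative.

```python
import torch.nn as nn

# Toy stand-in for a pretrained backbone (illustrative sizes only).
backbone = nn.Sequential(
    nn.Linear(32, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
)
head = nn.Linear(64, 10)  # new output head for the target task

# Head-only fine-tuning (feature extraction): freeze every backbone
# parameter so gradients flow only into the new head.
for p in backbone.parameters():
    p.requires_grad = False

model = nn.Sequential(backbone, head)

# Only the head's weight and bias remain trainable.
trainable = [p for p in model.parameters() if p.requires_grad]
```

For full fine-tuning, simply skip the freezing loop and train all parameters with a small learning rate.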
When to use transfer:
- Source and target have overlapping low-level features
- Source dataset is much larger than target
- Target data is scarce
- Compute budget is limited
Pre-training → fine-tuning workflow:
- Train on large source dataset
- Replace/reinitialize output layer for target task
- Optionally freeze early layers
- Fine-tune with lower learning rate than initial training
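The workflow above can be sketched with per-parameter-group learning rates, a common way to fine-tune pretrained layers more gently than a freshly initialized head. The 10x gap between the two rates is a rough heuristic, not a fixed rule, and the model here is a toy stand-in for a real pretrained network.

```python
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Linear(32, 64), nn.ReLU())  # "pretrained" layers
head = nn.Linear(64, 5)  # output layer reinitialized for the target task
model = nn.Sequential(backbone, head)

# Discriminative learning rates: a small LR for the pretrained backbone,
# a larger LR for the new head.
optimizer = torch.optim.SGD([
    {"params": backbone.parameters(), "lr": 1e-4},
    {"params": head.parameters(), "lr": 1e-3},
])
```

Freezing early layers (as in head-only fine-tuning) is the limiting case of this scheme, where the backbone learning rate is set to zero.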
Negative transfer: if source and target distributions are too different, pretrained features may hurt rather than help.
Applications
- Vision: ImageNet → medical imaging, autonomous driving
- NLP: BERT/GPT → downstream classification, QA, summarization
- Speech: large ASR model → specific accents or domains
Trade-offs
- Requires compatible architectures between source and target
- Fine-tuning all layers needs non-trivial target data to avoid catastrophic forgetting
- Hyperparameter sensitivity (learning rate, number of layers to freeze)
- Pretrained models can encode biases from source data