Multi-Task Learning
Definition
Training a single model to perform multiple related tasks simultaneously, sharing representations across tasks to improve generalization and data efficiency.
Intuition
Related tasks share useful low-level structure; learning them jointly pushes the shared layers toward features that are useful across tasks, which acts as an implicit regularizer.
Formal Description
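Architecture: a shared encoder/backbone plus separate task-specific heads. The training loss is a weighted sum of the per-task losses:
$\mathcal{L} = \sum_{t=1}^{T} w_t \, \mathcal{L}_t$
where $T$ is the number of tasks, $\mathcal{L}_t$ is the loss of task $t$, and $w_t$ is its weight.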
Hard parameter sharing: shared hidden layers with separate output heads (standard approach).
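A minimal sketch of hard parameter sharing with the weighted-sum loss, in plain Python so it runs without a deep-learning framework. The model is a hypothetical toy: a single shared scalar weight `a` stands in for the encoder, and `b1`, `b2` are the two task heads. The tasks, data, learning rate, and epoch count are illustrative assumptions, not a recommended setup.

```python
def weighted_total(losses, weights):
    """Total MTL loss: sum over tasks of w_t * L_t."""
    return sum(w * l for w, l in zip(weights, losses))


def train_hard_sharing(data, weights=(1.0, 1.0), lr=0.05, epochs=500):
    """Toy hard parameter sharing: one shared parameter, two task heads.

    Hypothetical model:  h = a * x                        (shared "encoder")
                         y1_hat = b1 * h, y2_hat = b2 * h  (task heads)
    Both tasks are trained at the same time by full-batch gradient descent
    on weighted_total([MSE_1, MSE_2], weights).
    """
    a, b1, b2 = 0.5, 0.5, 0.5  # fixed init keeps the sketch deterministic
    w1, w2 = weights
    n = len(data)
    for _ in range(epochs):
        ga = gb1 = gb2 = 0.0
        for x, y1, y2 in data:
            h = a * x
            r1 = b1 * h - y1  # task-1 residual
            r2 = b2 * h - y2  # task-2 residual
            # Each head only receives the gradient of its own task...
            gb1 += 2 * w1 * r1 * h / n
            gb2 += 2 * w2 * r2 * h / n
            # ...while the shared parameter accumulates gradients from both tasks.
            ga += 2 * (w1 * r1 * b1 + w2 * r2 * b2) * x / n
        a, b1, b2 = a - lr * ga, b1 - lr * gb1, b2 - lr * gb2
    return a, b1, b2


# Two related tasks over the same inputs: y1 = 2x and y2 = -3x.
data = [(x, 2 * x, -3 * x) for x in (-1.0, -0.5, 0.5, 1.0)]
a, b1, b2 = train_hard_sharing(data)
print(round(a * b1, 3), round(a * b2, 3))
```

Both tasks update `a` in every step, which is the sharing mechanism. The individual parameters settle at a scale that depends on initialization, so only the products `a * b1` and `a * b2` (approximately 2 and -3 here) are meaningful.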
Soft parameter sharing: separate models with regularization encouraging similar parameters (less common).
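Soft parameter sharing can be sketched as a penalty that pulls the task-specific parameters toward each other. This sketch assumes an L2 coupling term, `lam * ||theta_a - theta_b||^2` (one common choice, not the only one), and hypothetical toy task losses of the form `||theta - target||^2`.

```python
def l2_coupling(theta_a, theta_b, lam):
    """Soft-sharing penalty lam * ||theta_a - theta_b||^2 and its gradient w.r.t. theta_a."""
    diff = [pa - pb for pa, pb in zip(theta_a, theta_b)]
    penalty = lam * sum(d * d for d in diff)
    grad_a = [2 * lam * d for d in diff]  # the gradient w.r.t. theta_b is the negation
    return penalty, grad_a


def fit_soft_sharing(target_a, target_b, lam, lr=0.1, steps=500):
    """Toy example: each task's own loss is ||theta - target||^2 (hypothetical).

    Minimizes L_A + L_B + lam * ||theta_a - theta_b||^2 by gradient descent,
    with a separate parameter vector per task.
    """
    theta_a = [0.0] * len(target_a)
    theta_b = [0.0] * len(target_b)
    for _ in range(steps):
        _, g_couple = l2_coupling(theta_a, theta_b, lam)
        grad_a = [2 * (t - y) + g for t, y, g in zip(theta_a, target_a, g_couple)]
        grad_b = [2 * (t - y) - g for t, y, g in zip(theta_b, target_b, g_couple)]
        theta_a = [t - lr * g for t, g in zip(theta_a, grad_a)]
        theta_b = [t - lr * g for t, g in zip(theta_b, grad_b)]
    return theta_a, theta_b


print(fit_soft_sharing([1.0], [3.0], lam=0.0))  # independent models
print(fit_soft_sharing([1.0], [3.0], lam=0.5))  # parameters pulled together
```

With targets 1 and 3, `lam = 0` gives two independent models (theta_a = 1, theta_b = 3), while `lam = 0.5` moves the optimum to theta_a = 1.5, theta_b = 2.5; increasing `lam` shrinks the gap further.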
Unlike sequential transfer learning (pre-train, then fine-tune), all tasks are trained simultaneously; there is no separate pre-training phase.
Works well when:
- Tasks share low- or mid-level features
- Each task has insufficient data alone
- Tasks have similar levels of difficulty
Requirement for beneficial MTL: tasks must share useful features; if they do not, separately trained models often perform better.
Applications
- Autonomous driving: simultaneously predict lane lines, pedestrians, traffic signs
- NLP: POS tagging + NER + parsing in a single model
- Medical imaging: multi-label classification over pathologies
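Read as MTL, the medical-imaging case treats each pathology as its own binary task on top of a shared image encoder, so the total loss is the weighted sum of per-pathology binary cross-entropy terms. A minimal sketch of that loss, assuming logits from hypothetical per-pathology heads (the encoder itself is omitted) and unit task weights by default:

```python
import math


def bce_with_logits(z, y):
    """Numerically stable binary cross-entropy for one logit z and label y in {0, 1}."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))


def multilabel_loss(logits, labels, weights=None):
    """Weighted sum of per-pathology BCE terms; each label is treated as one task.

    logits/labels hold one entry per pathology head sitting on a shared encoder.
    weights defaults to 1.0 for every task.
    """
    if weights is None:
        weights = [1.0] * len(logits)
    return sum(w * bce_with_logits(z, y) for w, z, y in zip(weights, logits, labels))


# Hypothetical scan with three pathology heads: present, absent, present.
print(multilabel_loss([2.0, -1.0, 0.0], [1, 0, 1]))
```

The `max(z, 0) - z * y + log1p(exp(-|z|))` form avoids overflow for large-magnitude logits; it equals `-log(sigmoid(z))` when `y = 1` and `-log(1 - sigmoid(z))` when `y = 0`.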
Trade-offs
- Negative transfer if tasks are too dissimilar
- Loss weighting requires tuning; a task with larger loss or gradient magnitudes can dominate the shared updates and crowd out the others
- Harder to debug than single-task models
- Task difficulty imbalance can cause uneven convergence
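One rough way to spot negative transfer is to compare per-task gradients on the shared parameters: a persistently negative cosine similarity suggests the tasks are pulling the shared layers in opposite directions. This is a common heuristic rather than something prescribed above; the quadratic task losses, centers, and helper names below are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two gradient vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)


def quad_task_grad(theta, center):
    """Gradient of a toy task loss ||theta - center||^2 w.r.t. the shared parameters."""
    return [2 * (t - c) for t, c in zip(theta, center)]


theta = [0.0, 0.0]  # current shared parameters (hypothetical)
g_agree_1 = quad_task_grad(theta, [1.0, 1.0])
g_agree_2 = quad_task_grad(theta, [2.0, 1.5])
g_conflict = quad_task_grad(theta, [-1.0, -0.5])

print(cosine(g_agree_1, g_agree_2))   # positive: the tasks push theta the same way
print(cosine(g_agree_1, g_conflict))  # negative: the tasks pull theta apart
```

In a real model the same comparison would use each task's gradient with respect to the shared backbone weights; a large ratio between the two gradient norms is a similar hint that one task dominates the shared updates.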