05 — ML Engineering
Engineering discipline for designing, building, deploying, and operating production-grade ML systems — from raw data ingestion through to monitored, retraining-ready production deployments.
Guiding question: “How do we productionize and operate ML systems?”
This layer does NOT cover: foundation-model-based systems (→ 06_ai_engineering), model family selection and statistical trade-offs (→ 03_modeling), or mathematical derivations (→ 01_foundations).
Sublayers
01 — Principles & Lifecycle
End-to-end ML system thinking: reliability contracts, scalability, and the iterative development philosophy.
02 — Data Engineering
Data pipelines, ETL/ELT, batch vs stream processing, feature stores.
03 — Training Data
Data labeling, class imbalance handling, augmentation, dataset versioning.
04 — Feature Engineering
Numeric/categorical/temporal feature transforms, embeddings, data leakage prevention.
05 — Model Development
Experiment tracking, distributed training, offline evaluation, reproducibility.
06 — Deployment & Serving
Batch/online/edge serving patterns, model compression, rollout strategies.
07 — Monitoring & Observability
Drift detection, operational metrics, alerting, incident response.
08 — Continual Learning
Retraining triggers, stateless vs stateful updates, test-in-production.
09 — Infrastructure & Platform
Orchestration, ML platform architecture, environment and dependency management.
Relationship to Other Layers
- 03_modeling — model families and training dynamics; this layer handles productionizing them
- 04_software_engineering — general software patterns applied to ML system concerns
- 06_ai_engineering — foundation-model-specific workflows that build on top of this layer’s infrastructure