02 — Data Science
Practical analytical reasoning from data: framing problems, understanding and transforming data, validating experiments, and communicating findings and their business implications.
Guiding question: What does the data tell us, and how do we prepare it for sound modelling?
This layer does NOT cover: mathematical derivations (→ 01_foundations), specific model families and training objectives (→ 03_modeling), production data pipelines (→ 05_ml_engineering), or business deployment decisions (→ 06_applications).
Sublayers
01 — Problem Framing
Translating a real-world task into a well-specified ML problem: output types, loss choice, baselines, and success criteria.
02 — Data Representation
How data is structured, encoded, and typed — tabular, text, image, and graph representations; schema design and data contracts.
03 — Exploratory Data Analysis
Univariate and bivariate analysis, missing value diagnosis, class imbalance, outlier detection, and distribution shape.
04 — Feature Engineering
Constructing new input features from raw data: numeric transforms, temporal encoding, categorical aggregates, target encoding, and leakage prevention.
05 — Experimentation and Validation
Data preprocessing pipelines; data validation, including schema checks, drift tests, and the population stability index (PSI); and experiment integrity.
06 — Interpretability and Communication
Model interpretability taxonomy, fairness metrics, and communicating analytical results to stakeholders.
07 — Decision Analysis and Business Metrics
Translating model outputs into business decisions: expected value, cost-benefit analysis, decision thresholds, and metric alignment.
Relationship to Other Layers
- ← 01 Foundations: probability, statistics, and linear algebra underpin every technique in this layer.
- → 03 Modeling: data prepared here feeds into model training and evaluation there.
- → 05 ML Engineering: production versions of data pipelines, feature stores, and validation gating live there.