05 — Experimentation and Validation
Data preprocessing pipelines and validation gates that ensure data quality, distributional integrity, and experiment reproducibility before model training.
Notes
- Data Preprocessing — missing value imputation, scaling, categorical encoding, sklearn Pipelines, distribution transforms, outlier handling
- Data Validation — schema checks, KS test, PSI, Great Expectations, referential integrity, serving-time validation
- A/B Testing and Experiment Design — hypothesis testing, power analysis, sample size, sequential testing, Type I/II errors, SUTVA, common pitfalls
- Causal Inference — potential outcomes, DAGs, identification strategies, HTE estimation
Links
← 04 — Feature Engineering → 06 — Interpretability and Communication