05 — Experimentation and Validation

Data preprocessing pipelines and validation gates that ensure data quality, distributional integrity, and experiment reproducibility before model training, together with notes on experiment design and causal inference.

Notes

  • Data Preprocessing — missing value imputation, scaling, categorical encoding, sklearn Pipelines, distribution transforms, outlier handling
  • Data Validation — schema checks, KS test, PSI, Great Expectations, referential integrity, serving-time validation
  • Causal Inference — potential outcomes, DAGs, identification strategies, HTE estimation
  • A/B Testing and Experiment Design — hypothesis testing, power analysis, sample size, sequential testing, Type I/II errors, SUTVA, common pitfalls
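
As a quick illustration of the kind of validation gate the Data Validation note covers, here is a minimal sketch of the Population Stability Index (PSI) using only the Python standard library. The function name `psi`, the equal-width binning over the reference range, the `eps` floor for empty bins, and the `PSI_ALERT` threshold are all assumptions made for this example, not details taken from the notes. The 0.25 cutoff is a commonly quoted rule of thumb rather than a fixed standard.

```python
import math
import random
from typing import Sequence


def psi(expected: Sequence[float], actual: Sequence[float],
        n_bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index of `actual` relative to `expected`.

    Bins are equal-width over the range of `expected` (an assumption of
    this sketch; quantile bins are another common choice).
    """
    if len(expected) == 0 or len(actual) == 0:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(expected), max(expected)
    if hi == lo:
        raise ValueError("reference sample is constant; cannot build bins")
    width = (hi - lo) / n_bins

    def bin_fractions(values: Sequence[float]) -> list:
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the reference range fall into the edge bins.
            idx = min(max(idx, 0), n_bins - 1)
            counts[idx] += 1
        # Floor each fraction at eps so the log term is always defined.
        return [max(c / len(values), eps) for c in counts]

    e = bin_fractions(expected)
    a = bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical gate threshold (rule of thumb, not a standard).
PSI_ALERT = 0.25

rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(5000)]  # training-time sample
same = [rng.gauss(0.0, 1.0) for _ in range(5000)]       # same distribution
shifted = [rng.gauss(1.0, 1.0) for _ in range(5000)]    # mean shifted by 1

for name, sample in [("same", same), ("shifted", shifted)]:
    score = psi(reference, sample)
    status = "ALERT" if score > PSI_ALERT else "ok"
    print(f"{name}: PSI={score:.4f} [{status}]")
```

In a real pipeline the bin edges would normally be fixed once from the training data and reused at serving time, so that scores stay comparable across runs.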
