Demand Forecasting

Problem

Predict future product demand at a specified granularity (SKU × location × week) to drive replenishment, production, and logistics decisions. Errors have asymmetric costs: overstocking ties up capital and generates waste; stockouts lose revenue and customer trust. The forecasting problem has two regimes — base demand (slow-moving trend + seasonality) and uplift demand (driven by promotions, pricing changes, external events).

Users / Stakeholders

| Role | Decision driven by forecast |
|---|---|
| Supply chain planner | When and how much to order / produce |
| Merchandiser | Which products to promote; how deep to discount |
| Logistics manager | Warehouse space allocation, distribution routing |
| Finance | Working capital and inventory provision |
| Product manager | New product launch planning (cold start) |

Forecasts are primarily consumed via a planning system (SAP, o9, Blue Yonder) — direct human consumption is at aggregate level (category, region). Individual SKU forecasts are machine-consumed.

Domain Context

  • Hierarchy: Demand is hierarchical (total → category → brand → SKU → store). Forecasts must be consistent across levels (hierarchical reconciliation: bottom-up, top-down, MinT).
  • Intermittent demand: Low-volume SKUs have many zeros. Standard ARIMA assumptions fail. Use Croston’s method or Tweedie regression.
  • Causal drivers: Promotions (type, depth, display), pricing elasticity, competitor actions, weather, holidays, product lifecycle stage.
  • External data: Macroeconomic indicators, weather APIs, Google Trends — high value but adds pipeline complexity.
  • Cold start: New SKUs have no history. Transfer learning from category-level model or feature-based regression.
  • Regulatory / privacy: Minimal — demand data is internal operational data. GDPR applies only if linked to identifiable customer purchases.
  • Seasonality patterns: May be multiple (weekly, annual, promotional calendar). Fourier terms or STL decomposition needed.
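For the intermittent-demand point above, a minimal Croston's method sketch (illustrative only, not a production implementation; the smoothing constant `alpha` is an assumed default):

```python
import numpy as np

def croston(demand, alpha=0.1):
    """Croston's method for intermittent demand.

    Smooths non-zero demand sizes and inter-demand intervals
    separately with simple exponential smoothing; the per-period
    forecast is their ratio.
    """
    demand = np.asarray(demand, dtype=float)
    nz = np.flatnonzero(demand)
    if len(nz) == 0:
        return 0.0
    size = demand[nz[0]]      # smoothed non-zero demand size
    interval = nz[0] + 1      # smoothed inter-demand interval
    periods_since = 0
    for d in demand[nz[0] + 1:]:
        periods_since += 1
        if d > 0:
            size += alpha * (d - size)
            interval += alpha * (periods_since - interval)
            periods_since = 0
    return size / interval

# mostly-zero weekly sales for a slow-moving SKU
rate = croston([0, 0, 3, 0, 0, 0, 2, 0, 4, 0])
```

Standard ARIMA fitted to such a series would forecast near the unconditional mean and ignore the zero/non-zero structure that Croston separates out.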

Inputs and Outputs

Input features:

Historical sales time series (unit sales, revenue)
Price (own + competitor)
Promotion flags (on_promotion, promo_type, discount_depth)
Calendar features (day_of_week, week_of_year, holiday_flag)
External: weather (temperature, precipitation), events
Lag features (lag_1w, lag_4w, lag_52w, rolling_mean_4w)
Product attributes (category, brand, weight, shelf_life)
Store attributes (size, location_type, demographic_index)

Output:

Point forecast:  ŷ_{t+h}  for horizons h = 1..12 weeks
Prediction intervals: P10, P50, P90 quantiles
Accuracy metrics: MASE, wMAPE per SKU
Uncertainty flag: high-variance SKUs for human review

Data volumes: A mid-size retailer has 50K–500K SKU×store combinations. Full refresh weekly; rolling 2-year history. Batch inference on order of 10M rows per cycle.
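The lag features above must be built point-in-time safe (see the data-leakage risk below). A sketch in pandas, assuming a hypothetical long-format frame with columns `sku`, `store`, `week`, `units`:

```python
import pandas as pd

def add_lag_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add lag and rolling-mean features per SKU x store series.

    Assumes long format with columns [sku, store, week, units].
    shift() before rolling() keeps features point-in-time safe:
    the row for week t only sees actuals up to week t-1.
    """
    df = df.sort_values(["sku", "store", "week"]).copy()
    g = df.groupby(["sku", "store"])["units"]
    df["lag_1w"] = g.shift(1)
    df["lag_4w"] = g.shift(4)
    df["lag_52w"] = g.shift(52)
    # rolling mean of the 4 most recent *past* weeks, per series
    df["rolling_mean_4w"] = g.transform(lambda s: s.shift(1).rolling(4).mean())
    return df
```

The `groupby` keeps lags from bleeding across series boundaries when many SKU×store series are stacked in one frame.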

Decision or Workflow Role

Forecast generation (weekly batch)
  ↓
Planners review exceptions (auto-approve if within tolerance)
  ↓
Replenishment orders generated in ERP/WMS
  ↓
Delivery + actual sales → feedback into training data
  ↓
Model performance monitoring → retraining trigger

The forecast is an input to a constrained optimisation (replenishment / allocation). The optimisation translates forecast + cost parameters into an order quantity. ML model quality directly impacts order quality and working capital.
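One common form of that forecast-to-order translation is the single-period newsvendor critical ratio. A rough sketch, assuming unit underage/overage costs and reading the order quantity off the P10/P50/P90 output by linear interpolation (a real system would use the full predictive distribution and capacity constraints):

```python
import numpy as np

def order_quantity(q10, q50, q90, underage_cost, overage_cost):
    """Newsvendor-style order quantity from forecast quantiles.

    Critical ratio = Cu / (Cu + Co); order at the demand quantile
    equal to that ratio, interpolated from P10/P50/P90.
    """
    cr = underage_cost / (underage_cost + overage_cost)
    return float(np.interp(cr, [0.10, 0.50, 0.90], [q10, q50, q90]))

# stockout (underage) three times as costly as overstock (overage)
qty = order_quantity(q10=80, q50=100, q90=140, underage_cost=3, overage_cost=1)
```

This is where the asymmetric costs noted in the Problem section enter: costly stockouts push the critical ratio above 0.5, so the order sits above the median forecast.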

Modeling / System Options

| Approach | Strength | Weakness | When to use |
|---|---|---|---|
| Statistical (ARIMA, ETS) | Interpretable, fast, well-understood | No cross-series learning, limited causal features | Mature SKUs, stable demand, small portfolio |
| LightGBM + lag features | Handles many causal features, cross-series learning, fast | No explicit temporal structure | Large portfolio, rich causal data |
| Prophet | Automatic seasonality + holidays, interpretable | Slow at scale, weak on promotions | Medium portfolio, business stakeholder reports |
| DeepAR (LSTM) | Captures complex patterns, probabilistic | Data-hungry, slow training, hard to debug | Very large portfolio, sufficient history |
| N-BEATS / N-HiTS | State-of-the-art point accuracy, no feature engineering | Black box, complex to tune | Accuracy-critical portfolios |
| Hierarchical ensemble | Best aggregate accuracy | Complex reconciliation logic | When consistency across levels is critical |

Recommended starting point: LightGBM with lag features + calendar + promotion dummies. Hierarchically reconcile using MinT (shrinkage). Add DeepAR for high-velocity SKUs if compute budget allows.

See Time Series Implementation and Demand Forecasting Pipeline for code.

Deployment Constraints

  • Latency: Batch — weekly refresh. Not latency-sensitive. Full inference cycle budget: 2–4 hours for 10M rows.
  • Throughput: Parallel inference by product category. LightGBM handles millions of rows in minutes.
  • Interpretability: Planners need to understand why the forecast changed. SHAP waterfall charts per SKU at exception review. Feature importance must be stable across runs.
  • Stale forecast handling: If pipeline fails, fallback to last valid forecast. Staleness SLA: ≤14 days before mandatory escalation.
  • Retraining cadence: Weekly retrain on new actuals. Full retrain monthly with all history. Trigger alert if MASE > 1.5 on any category.
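The MASE alert in the last bullet can be computed against the in-sample seasonal naïve; a sketch (the category names in the usage are hypothetical):

```python
import numpy as np

def mase(actual, forecast, train, season=52):
    """Mean Absolute Scaled Error against the seasonal naive.

    Scales the forecast MAE by the in-sample MAE of the
    seasonal-naive forecast (y_t vs y_{t-season}); MASE < 1
    means the model beats seasonal naive.
    """
    actual, forecast, train = map(np.asarray, (actual, forecast, train))
    scale = np.mean(np.abs(train[season:] - train[:-season]))
    return np.mean(np.abs(actual - forecast)) / scale

def retrain_needed(category_mase, threshold=1.5):
    """Flag categories whose MASE breaches the alert threshold."""
    return [c for c, m in category_mase.items() if m > threshold]

flagged = retrain_needed({"beverages": 1.6, "snacks": 0.9})
```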

Risks and Failure Modes

| Risk | Description | Mitigation |
|---|---|---|
| Data leakage | Using future promotion flags at train time as if known | Strict point-in-time feature construction; cutoff validation |
| Distribution shift | Demand patterns change at new stores, post-COVID, price changes | Segment-specific models; rolling window training; drift monitor |
| Intermittent demand | MAPE is undefined when actual = 0; mean over-weights outliers | Use MASE; evaluate separately for intermittent SKUs |
| Promotion blindness | Model not exposed to future promotions at inference | Promotional calendar must be loaded into feature pipeline |
| Concept drift | Category trends change (e.g., energy drinks growing 30% YoY) | Trend features; short training windows for fast-moving categories |
| Cold start | New SKUs have zero history | Attribute-based regression; seed from similar existing SKU |
| Outlier actuals | One-off events (store fire, supply disruption) inflate error | Outlier detection + annotation pipeline; conditional training |

Success Metrics

| Metric | Target | Notes |
|---|---|---|
| wMAPE | < 25% at SKU×week | Weighted by volume; less sensitive to low-volume SKUs |
| MASE | < 1.0 (better than seasonal naïve) | Model-agnostic, scale-invariant |
| Bias | MPE | |
| Fill rate | > 97.5% | Downstream business metric; driven by order quantity, not just forecast |
| Inventory turnover | Increase YoY | True business outcome; harder to attribute to model alone |
| Planner override rate | < 15% | High override rate signals model distrust |
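The wMAPE target above uses the standard volume-weighted definition, sum of absolute errors over sum of actuals:

```python
import numpy as np

def wmape(actual, forecast):
    """Volume-weighted MAPE: sum |y - yhat| / sum |y|.

    Defined even when individual actuals are zero (unlike per-row
    MAPE), and naturally down-weights low-volume SKUs.
    """
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return np.abs(actual - forecast).sum() / np.abs(actual).sum()
```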

References

  • Makridakis, S. et al. (2020). M5 Accuracy competition: Results, findings and conclusions.
  • Hyndman, R. & Athanasopoulos, G. (2021). Forecasting: Principles and Practice (3rd ed.)
  • Salinas, D. et al. (2019). DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks.
