Time Series Models
Definition
Models for sequential data where the order of observations matters and temporal dependencies carry predictive information. Covers classical statistical methods (ARIMA, state-space models) and modern neural approaches.
Intuition
A time series is a sequence of observations indexed by time. The key challenge is that observations are not i.i.d.: knowing yesterday’s value tells you something about today’s. The goal is to model this autocorrelation structure to make forecasts.
Formal Description
Stationarity
A time series $\{y_t\}$ is weakly stationary if:
- $\mathbb{E}[y_t] = \mu$ (constant mean)
- $\operatorname{Cov}(y_t, y_{t-k})$ depends only on the lag $k$, not on $t$
Most classical methods assume stationarity. The Augmented Dickey-Fuller (ADF) test checks for a unit root (non-stationarity).
Differencing to achieve stationarity: $y'_t = y_t - y_{t-1}$. The order of differencing $d$ is chosen such that $\Delta^d y_t$ is stationary.
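A minimal numpy sketch of why differencing works, using a simulated random walk (toy data, assumed for illustration): the walk has a unit root, but its first difference recovers the underlying white noise.

```python
# A random walk y_t = y_{t-1} + eps_t is non-stationary (its variance grows
# over time); first differencing recovers the stationary noise sequence.
import numpy as np

rng = np.random.default_rng(0)
noise = rng.normal(size=500)
walk = np.cumsum(noise)      # y_t = y_{t-1} + eps_t  (unit root)

diff = np.diff(walk)         # y'_t = y_t - y_{t-1}

# The raw walk's values spread far more widely than the differenced series.
print(walk.var(), diff.var())
```

In practice the ADF test (e.g. `statsmodels.tsa.stattools.adfuller`) would be applied before and after differencing rather than eyeballing variances.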
ARIMA
AR($p$) — Autoregressive: $y_t = c + \sum_{i=1}^{p} \phi_i y_{t-i} + \varepsilon_t$
MA($q$) — Moving Average: $y_t = \mu + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}$
ARMA($p,q$): $y_t = c + \sum_{i=1}^{p} \phi_i y_{t-i} + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j} + \varepsilon_t$
ARIMA($p,d,q$): Apply $d$-th differencing to $y_t$, then fit ARMA($p,q$).
SARIMA($p,d,q$)($P,D,Q$)$_s$: adds seasonal AR, I, MA terms at lag $s$ (e.g., $s = 12$ for monthly data).
Order selection:
- ACF (autocorrelation function): cuts off after lag $q$ → MA($q$)
- PACF (partial autocorrelation): cuts off after lag $p$ → AR($p$)
- Use AIC/BIC for model selection; `auto_arima` from pmdarima automates this.
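A sketch of the core estimation step, assuming simulated AR(2) data with made-up coefficients: fitting the AR part reduces to ordinary least squares on lagged values, which is essentially what ARIMA packages do under the hood for the autoregressive terms.

```python
# Fit an AR(2) model by OLS on lagged values and recover the coefficients.
import numpy as np

rng = np.random.default_rng(1)
phi1, phi2 = 0.6, 0.3            # true coefficients (stationary: phi1+phi2 < 1)
y = np.zeros(2000)
for t in range(2, len(y)):
    y[t] = phi1 * y[t - 1] + phi2 * y[t - 2] + rng.normal(scale=0.5)

# Regress y_t on (y_{t-1}, y_{t-2}); lstsq gives the least-squares phi estimates.
X = np.column_stack([y[1:-1], y[:-2]])
coeffs, *_ = np.linalg.lstsq(X, y[2:], rcond=None)
print(coeffs)                     # estimates should be near (0.6, 0.3)
```

A full ARIMA fit additionally estimates the MA terms by maximum likelihood; `auto_arima` searches over $(p, d, q)$ and picks the order minimising AIC/BIC.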
Exponential Smoothing (ETS)
Weighted average of past observations, with exponentially decaying weights.
Simple ES: $\ell_t = \alpha y_t + (1 - \alpha)\ell_{t-1}$, with forecast $\hat{y}_{t+1} = \ell_t$.
Holt-Winters (Triple ES): models level, trend, and seasonality. An ETS model is specified by:
- Error type (Additive/Multiplicative)
- Trend type (None/Additive/Additive-damped)
- Seasonal type (None/Additive/Multiplicative)
Multiplicative seasonality is appropriate when seasonal fluctuations are proportional to the level.
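The simple exponential smoothing recursion above is a few lines of code; a direct transcription (the value of $\alpha$ here is illustrative):

```python
# Simple exponential smoothing: l_t = alpha * y_t + (1 - alpha) * l_{t-1}.
# l_t serves as the one-step-ahead forecast for time t+1.
import numpy as np

def simple_es(y, alpha=0.3):
    level = np.empty(len(y))
    level[0] = y[0]                                   # initialise at first obs
    for t in range(1, len(y)):
        level[t] = alpha * y[t] + (1 - alpha) * level[t - 1]
    return level

levels = simple_es(np.array([10.0, 12.0, 11.0, 13.0]))
print(levels)
```

Holt-Winters adds analogous recursions for the trend and seasonal components; in practice `statsmodels`' ETS implementations also estimate the smoothing parameters by maximum likelihood rather than fixing them.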
State-Space Models (Local Linear Trend)
Local linear trend model: $y_t = \mu_t + \varepsilon_t$, with $\mu_{t+1} = \mu_t + \beta_t + \xi_t$ and $\beta_{t+1} = \beta_t + \zeta_t$ (level $\mu_t$ and slope $\beta_t$ evolve stochastically). The Kalman filter computes the filtering distribution $p(\mu_t, \beta_t \mid y_{1:t})$ analytically for linear-Gaussian systems.
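A minimal Kalman filter sketch for the simplest special case, the local level model (the local linear trend adds a slope state; omitted here for brevity, and the noise variances are illustrative):

```python
# Kalman filter for the local level model:
#   y_t = mu_t + eps_t,   mu_{t+1} = mu_t + xi_t
import numpy as np

def local_level_filter(y, var_eps=1.0, var_xi=0.1):
    mu, P = 0.0, 1e6                 # diffuse initial state mean / variance
    filtered = []
    for obs in y:
        K = P / (P + var_eps)        # Kalman gain: trust obs vs prior
        mu = mu + K * (obs - mu)     # update state mean with the new obs
        P = (1 - K) * P              # update state variance
        filtered.append(mu)
        P = P + var_xi               # predict: random-walk state adds noise
    return np.array(filtered)

y = np.array([1.0, 1.2, 0.9, 1.1, 5.0])
est = local_level_filter(y)
print(est)                           # final estimate is pulled toward, not onto, 5.0
```

Note how the final outlying observation shifts the filtered level only partially — the gain balances observation noise against state noise.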
VAR (Vector Autoregression)
Multivariate extension of AR for $k$ time series: $\mathbf{y}_t = \mathbf{c} + A_1 \mathbf{y}_{t-1} + \dots + A_p \mathbf{y}_{t-p} + \boldsymbol{\varepsilon}_t$, where each $A_i$ is a $k \times k$ coefficient matrix.
Models cross-series dependencies. Used in macroeconomics; Granger causality tests check whether one series helps predict another.
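A toy VAR(1) sketch with assumed coefficients: simulate two series where series 2 influences series 1 but not vice versa (a zero entry in $A_1$), then recover the coefficient matrix by least squares.

```python
# VAR(1): y_t = A @ y_{t-1} + eps_t for k=2 series; fit A by OLS.
import numpy as np

rng = np.random.default_rng(2)
A = np.array([[0.5, 0.2],
              [0.0, 0.4]])          # A[1,0]=0: series 1 doesn't help predict series 2
y = np.zeros((3000, 2))
for t in range(1, len(y)):
    y[t] = A @ y[t - 1] + rng.normal(scale=0.5, size=2)

# Regress y_t on y_{t-1}: lstsq solves y[:-1] @ X = y[1:], i.e. X = A^T.
A_hat, *_ = np.linalg.lstsq(y[:-1], y[1:], rcond=None)
A_hat = A_hat.T
print(A_hat)                         # estimate should be close to A
```

The zero entry that OLS recovers here is the kind of restriction a Granger causality test checks formally (e.g. via `statsmodels`' VAR results).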
Neural Time Series Models
LSTM/GRU: sequence-to-sequence models that learn long-range dependencies from data (see recurrent_networks).
Temporal Convolutional Networks (TCN): dilated causal convolutions; can outperform LSTMs with faster training.
Transformer-based: Informer, Autoformer, PatchTST for long-horizon forecasting; attention captures long-range dependencies without sequential processing.
N-BEATS, N-HiTS: pure neural, interpretable forecasting; state of the art on M4 benchmark.
Evaluation Metrics
| Metric | Formula | Properties |
|---|---|---|
| MAE | $\frac{1}{T}\sum_t \lvert y_t - \hat{y}_t \rvert$ | Scale-dependent; robust to outliers |
| RMSE | $\sqrt{\frac{1}{T}\sum_t (y_t - \hat{y}_t)^2}$ | Scale-dependent; penalises large errors |
| MAPE | $\frac{100}{T}\sum_t \frac{\lvert y_t - \hat{y}_t \rvert}{\lvert y_t \rvert}$ | Scale-free; undefined for $y_t = 0$ |
| MASE | MAE normalised by naive in-sample MAE | Scale-free; values $< 1$ beat the naive forecast |
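The four metrics in the table translate directly into numpy (for MASE, the naive forecast is taken as the one-step lag, the standard choice for non-seasonal series):

```python
import numpy as np

def mae(y, yhat):
    return np.mean(np.abs(y - yhat))

def rmse(y, yhat):
    return np.sqrt(np.mean((y - yhat) ** 2))

def mape(y, yhat):
    return 100 * np.mean(np.abs(y - yhat) / np.abs(y))   # undefined if any y == 0

def mase(y, yhat, y_train):
    naive_mae = np.mean(np.abs(np.diff(y_train)))        # in-sample naive errors
    return mae(y, yhat) / naive_mae

y_train = np.array([10.0, 12.0, 11.0, 13.0])
y_true = np.array([12.0, 14.0])
y_pred = np.array([11.0, 13.0])
print(mae(y_true, y_pred), rmse(y_true, y_pred),
      mape(y_true, y_pred), mase(y_true, y_pred, y_train))
```

A MASE below 1 here means the forecast beats simply carrying the last training observation forward.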
Applications
- Insurance: claims reserving (chain-ladder, stochastic development methods)
- Finance: volatility forecasting (GARCH), algorithmic trading signals
- Demand forecasting: inventory management, supply chain
- Anomaly detection: detecting unusual spikes in time series metrics
Trade-offs
- ARIMA: interpretable, fast, principled; limited to linear dependencies and stationary processes.
- LSTM/Transformers: capture non-linear temporal patterns; require more data and tuning.
- Choose classical methods when data is limited or explainability is required; neural methods for large-scale, complex patterns.