Partial Dependence Plots, ICE, and ALE

Definition

Methods for visualizing the marginal effect of one or two features on a model’s predicted outcome: averaged over the data (PDP), shown per instance (ICE), or estimated via local contrasts to avoid extrapolation artifacts (ALE).

Intuition

A partial dependence plot asks: “If I vary only feature x_S, how does the prediction change on average?” It marginalizes over the distribution of all other features. ICE shows the same relationship for each individual training point, revealing heterogeneity that PDP hides by averaging. ALE avoids the extrapolation problem of PDP by conditioning on small intervals rather than marginalizing globally.

Formal Description

Partial Dependence Plot (PDP)

For feature(s) x_S and model f:

    f_S(x_S) = E_{x_C}[ f(x_S, x_C) ] ≈ (1/n) Σ_{i=1}^{n} f(x_S, x_C^(i))

where x_C are the complement features. Implementation: choose a grid of x_S values; for each grid value, substitute it for x_S in all n training instances, compute predictions, and average. Two-way PDPs show interaction effects between two features.

Limitations:

  • Assumes feature independence; when features are correlated, marginalizing over x_C creates unrealistic input combinations (e.g., forcing height=200cm with weight=40kg simultaneously)
  • Shows the average effect; heterogeneous subgroup effects are hidden
  • Computationally O(n · g) model evaluations, where g is the grid size
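The grid-substitution procedure above can be sketched in a few lines of NumPy (a minimal illustration; `partial_dependence` and `ToyModel` are hypothetical names for this sketch, not a library API):

```python
import numpy as np

def partial_dependence(model, X, feature, grid):
    """Manual PDP: for each grid value, overwrite the feature column
    for every row, predict, and average the predictions."""
    pdp = []
    for v in grid:
        X_mod = X.copy()
        X_mod[:, feature] = v          # force the feature to the grid value
        pdp.append(model.predict(X_mod).mean())
    return np.array(pdp)

class ToyModel:
    """Contrived additive model: prediction = 2*x0 + x1."""
    def predict(self, X):
        return 2 * X[:, 0] + X[:, 1]

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
grid = np.linspace(-2, 2, 5)
pd_vals = partial_dependence(ToyModel(), X, feature=0, grid=grid)
# For this additive model, the PDP of x0 is the line 2*x0 + mean(x1).
```

The loop makes the O(n · g) cost visible: every grid value triggers a full pass of n predictions.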

Individual Conditional Expectation (ICE)

Plot one line per training instance: how does the prediction for instance i change as x_S varies, with x_C^(i) held fixed? The PDP is the mean of all ICE curves.

Centered ICE (c-ICE): subtract each curve’s value at a reference point x* to highlight interaction effects: f_cICE^(i)(x_S) = f(x_S, x_C^(i)) − f(x*, x_C^(i)).

ICE reveals when the PDP average masks heterogeneous effects (some instances have positive, others negative marginal effects).
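A minimal sketch of ICE (with optional centering) for a NumPy feature matrix; `ice_curves` and `InteractionModel` are hypothetical names, and the model is contrived so that averaging hides the per-instance slopes:

```python
import numpy as np

def ice_curves(model, X, feature, grid, center=False):
    """One curve per instance: vary `feature` over `grid`, hold the
    other features at their observed values. If center=True, subtract
    each curve's value at the first grid point (c-ICE)."""
    curves = np.empty((X.shape[0], len(grid)))
    for j, v in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = v
        curves[:, j] = model.predict(X_mod)
    if center:
        curves -= curves[:, [0]]   # anchor every curve at zero
    return curves

class InteractionModel:
    """Contrived interaction: prediction = x0 * sign(x1), so roughly half
    the instances have slope +1 in x0 and half have slope -1."""
    def predict(self, X):
        return X[:, 0] * np.sign(X[:, 1])

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
grid = np.linspace(-2, 2, 9)
curves = ice_curves(InteractionModel(), X, feature=0, grid=grid)
# The PDP (mean of the ICE curves) is nearly flat, yet every individual
# curve has slope +1 or -1 -- exactly the heterogeneity PDP hides.
```

Plotting `curves.T` against `grid` (one line per instance) makes the two opposing subgroups visible at a glance.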


Accumulated Local Effects (ALE)

ALE avoids the unrealistic-combinations problem by estimating feature effects from local differences within narrow, data-conditional intervals:

    f_ALE(x_S) = Σ_{k=1}^{k(x_S)} (1/|N(k)|) Σ_{i ∈ N(k)} [ f(z_k, x_C^(i)) − f(z_{k−1}, x_C^(i)) ] − c

where z_{k−1} and z_k are the interval boundaries and N(k) is the set of training instances whose x_S falls in interval k. ALE accumulates these local differences (hence the name), and the constant c centers the result to have zero mean.

ALE advantages over PDP:

  • Unbiased with correlated features
  • Faster to compute (no full grid substitution)
  • Interpretable as the local effect of changing a feature within its observed range
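The interval-based estimator can be sketched as follows. This is a simplified first-order ALE with quantile bins; `ale_first_order` is a hypothetical name, and the final centering uses an unweighted mean over bin edges, a simplification of the count-weighted centering used by full implementations:

```python
import numpy as np

def ale_first_order(model, X, feature, n_bins=10):
    """Simplified first-order ALE: split the feature's observed range into
    quantile bins, difference predictions across each instance's own bin
    edges, average within bins, accumulate, then center."""
    x = X[:, feature]
    edges = np.unique(np.quantile(x, np.linspace(0, 1, n_bins + 1)))
    # assign each instance to the bin its observed value falls in
    k = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, len(edges) - 2)
    local_effects = np.zeros(len(edges) - 1)
    for b in range(len(edges) - 1):
        idx = np.where(k == b)[0]
        if idx.size == 0:
            continue
        X_lo, X_hi = X[idx].copy(), X[idx].copy()
        X_lo[:, feature] = edges[b]       # move only to the instance's own
        X_hi[:, feature] = edges[b + 1]   # bin edges: small, realistic shifts
        local_effects[b] = (model.predict(X_hi) - model.predict(X_lo)).mean()
    ale = np.concatenate([[0.0], np.cumsum(local_effects)])  # accumulate
    return edges, ale - ale.mean()        # center to zero mean (simplified)

class ToyModel:
    """Contrived linear model: prediction = 3*x0 + x1."""
    def predict(self, X):
        return 3 * X[:, 0] + X[:, 1]

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
edges, ale = ale_first_order(ToyModel(), X, feature=0)
# For a linear model, the ALE of x0 is a straight line with slope 3.
```

Because each instance is only shifted to the edges of the bin it already occupies, no far-from-data combinations are ever evaluated, and each bin needs just two prediction passes over its members.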

Applications

  • Understanding non-linear feature effects in tree models and neural networks
  • Regulatory explanations (“how does income affect predicted probability of default across its range?”)
  • Model debugging (detecting unexpected non-monotonicities or discontinuities)
  • Two-way PDPs for identifying feature interactions

Trade-offs

Method | Handles correlation | Shows heterogeneity | Speed  | Extrapolation risk
-------|---------------------|---------------------|--------|-------------------
PDP    | No                  | No (averaged)       | Medium | Yes
ICE    | No                  | Yes                 | Medium | Yes
ALE    | Yes                 | No                  | Fast   | No
  • PDPs are widely understood and easy to explain; ALE is preferred when features are correlated
  • ICE plots can be cluttered for large datasets; subsample for visualization
  • All methods show marginal effects, not causal effects; correlation among features further complicates interpretation