Decision Analysis and Business Metrics

Problem Context

A model with high AUC-ROC can still add zero business value if it is evaluated, thresholded, or deployed without considering the costs and benefits of each decision outcome. Decision analysis bridges the gap between statistical performance and operational utility.

Analytical Goal

Choose a decision policy — typically a classification threshold — and success metrics that maximise (or satisfy) business value, accounting for the asymmetric costs of different error types.

Data Considerations

The inputs to decision analysis are:

  • A calibrated model that outputs probabilities p̂(x) = P(y = 1 | x).
  • Estimates of the cost matrix: cost of false positives (C_FP) and false negatives (C_FN), and the benefit of true positives (B_TP).
  • The base rate π = P(y = 1) in the deployment population.

If the model is not calibrated, the probabilities cannot be interpreted as true probabilities and the cost-benefit calculation will be wrong. Calibrate with Platt scaling or isotonic regression before decision analysis (see Evaluation and Validation).
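A minimal calibration sketch using scikit-learn's CalibratedClassifierCV, which supports both Platt scaling (method="sigmoid") and isotonic regression. The model and data here are synthetic stand-ins, not the document's own pipeline:

```python
# Calibrate an over-confident classifier before decision analysis.
# Synthetic data; swap in your own model and validation split.
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.9], random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Random forests often produce poorly calibrated scores; fit the
# calibrator on held-out folds (cv=3) to avoid overfitting it.
clf = CalibratedClassifierCV(RandomForestClassifier(random_state=0),
                             method="isotonic", cv=3)
clf.fit(X_train, y_train)
p = clf.predict_proba(X_val)[:, 1]  # now interpretable as probabilities
```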

Method / Approach

Expected Value Framework

At threshold t, predict positive iff p̂(x) ≥ t. The expected value per prediction is:

EV(t) = B_TP · P(TP) − C_FP · P(FP) − C_FN · P(FN)

Expanded using the confusion matrix probabilities:

EV(t) = B_TP · π · TPR(t) − C_FP · (1 − π) · FPR(t) − C_FN · π · (1 − TPR(t))

The optimal threshold t* = argmax_t EV(t) generally differs from 0.5.

Special case — equal cost of FP and FN (and no separate TP benefit): the optimal threshold for a calibrated model is 0.5. In most real problems costs are asymmetric.

Example (fraud detection):

  • B_TP: fraudulent transaction amount saved ≈ £250
  • C_FP: customer friction from blocked legitimate transaction ≈ £5
  • C_FN: undetected fraud ≈ £250 (full loss)
  • Optimal threshold much lower than 0.5 (false negatives are expensive).
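For a calibrated model, standard decision theory gives a closed-form per-instance threshold: predict positive iff p · B_TP − (1 − p) · C_FP ≥ −p · C_FN, which rearranges to p ≥ C_FP / (B_TP + C_FP + C_FN). A quick check with the fraud costs above:

```python
# Closed-form threshold for a calibrated model:
#   p * B_TP - (1 - p) * C_FP >= -p * C_FN
# rearranges to p >= C_FP / (B_TP + C_FP + C_FN).
def optimal_threshold(B_TP, C_FP, C_FN):
    return C_FP / (B_TP + C_FP + C_FN)

t_star = optimal_threshold(B_TP=250, C_FP=5, C_FN=250)
print(round(t_star, 4))  # 0.0099 — far below 0.5, as the example claims
```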

Threshold Selection in Practice

  1. Compute the precision–recall curve and F1 over the validation set.
  2. Plot EV(t) as a function of t using the estimated cost matrix.
  3. Select t* = argmax_t EV(t).
  4. Validate the stability of t* with sensitivity analysis (vary cost estimates ±50%).
import numpy as np
from sklearn.metrics import precision_recall_curve

# Scores on the validation set.
y_prob = model.predict_proba(X_val)[:, 1]
precisions, recalls, thresholds = precision_recall_curve(y_val, y_prob)

B_TP, C_FP, C_FN = 250, 5, 250
pi = y_val.mean()  # base rate

# EV(t) per prediction, written in precision/recall terms:
#   P(TP) = recall * pi
#   P(FP) = P(TP) * (1 - precision) / precision
#   P(FN) = (1 - recall) * pi
# precisions/recalls carry one extra trailing element; drop it to
# align with thresholds.
ev = (B_TP * recalls[:-1] * pi
      - C_FP * (1 - precisions[:-1]) / precisions[:-1] * recalls[:-1] * pi
      - C_FN * (1 - recalls[:-1]) * pi)

optimal_threshold = thresholds[np.argmax(ev)]
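Step 4 of the procedure can be sketched by re-running the same EV maximisation over perturbed cost estimates and checking how far t* moves. Synthetic scores stand in for a real validation set here:

```python
# Sensitivity analysis: vary a cost estimate +/-50% and observe the
# effect on the selected threshold. Synthetic validation data.
import numpy as np
from sklearn.metrics import precision_recall_curve

rng = np.random.default_rng(0)
y_val = rng.binomial(1, 0.05, size=5000)
y_prob = np.clip(y_val * 0.3 + rng.beta(1, 8, size=5000), 0, 1)

def best_threshold(y_val, y_prob, B_TP, C_FP, C_FN):
    pi = y_val.mean()
    prec, rec, thr = precision_recall_curve(y_val, y_prob)
    prec, rec = prec[:-1], rec[:-1]          # align with thr
    ev = (B_TP * rec * pi
          - C_FP * (1 - prec) / np.maximum(prec, 1e-12) * rec * pi
          - C_FN * (1 - rec) * pi)
    return thr[np.argmax(ev)]

base = best_threshold(y_val, y_prob, 250, 5, 250)
for scale in (0.5, 1.0, 1.5):                # +/-50% on the FP cost
    t = best_threshold(y_val, y_prob, 250, 5 * scale, 250)
    print(f"C_FP x{scale}: t* = {t:.3f} (base {base:.3f})")
```

A stable t* across the perturbations suggests the decision is robust to cost-matrix error; a threshold that swings widely means the cost estimates themselves need firming up first.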

Metric Alignment

Choose the primary evaluation metric to match the deployment decision:

  Deployment context                          Appropriate metric
  Ranking / prioritisation (top-k alerts)     Precision @ K, Average Precision
  Binary accept/reject at fixed threshold     F1, G-mean, expected value
  Probabilistic risk score                    Brier score, log-loss, calibration curve
  Imbalanced class, recall critical           PR-AUC, F-beta (β > 1)
  Regulatory requirement on FPR               TPR @ fixed FPR
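For the ranking row, Precision @ K has no built-in scikit-learn helper but is a few lines: of the K highest-scoring cases, what fraction are true positives? A minimal sketch with toy scores:

```python
# Precision @ K for a top-k alerting workflow.
import numpy as np

def precision_at_k(y_true, y_score, k):
    top_k = np.argsort(y_score)[::-1][:k]    # indices of K highest scores
    return np.asarray(y_true)[top_k].mean()

y_true  = np.array([1, 0, 1, 0, 0, 1])
y_score = np.array([0.9, 0.8, 0.7, 0.4, 0.3, 0.2])
print(precision_at_k(y_true, y_score, k=3))  # 2 of the top 3 are positive
```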

Goodhart’s Law: once a metric becomes a target, it ceases to be a good measure. Define a primary metric for optimisation and secondary metrics as guardrails; monitor both.

Cost-Benefit Analysis for Model Comparison

When comparing Model A vs Model B, compute the difference in expected value:

ΔEV = EV_B(t*_B) − EV_A(t*_A)

Both models should be evaluated at their own optimal thresholds, not a shared one.
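The comparison protocol can be sketched as follows: tune each model's threshold independently, then compare expected value per prediction. The two score vectors below are synthetic stand-ins for hypothetical models A and B:

```python
# Compare two models on expected value, each at its own optimal threshold.
import numpy as np

B_TP, C_FP, C_FN = 250, 5, 250

def expected_value(y, p, t):
    pred = p >= t
    tp = np.mean(pred & (y == 1))
    fp = np.mean(pred & (y == 0))
    fn = np.mean(~pred & (y == 1))
    return B_TP * tp - C_FP * fp - C_FN * fn   # EV per prediction

def tune(y, p):
    grid = np.linspace(0.01, 0.99, 99)         # coarse threshold grid
    return max(grid, key=lambda t: expected_value(y, p, t))

rng = np.random.default_rng(1)
y = rng.binomial(1, 0.05, 4000)
p_A = np.clip(y * 0.2 + rng.beta(1, 9, 4000), 0, 1)  # hypothetical model A
p_B = np.clip(y * 0.4 + rng.beta(1, 9, 4000), 0, 1)  # hypothetical model B

t_A, t_B = tune(y, p_A), tune(y, p_B)
delta_ev = expected_value(y, p_B, t_B) - expected_value(y, p_A, t_A)
```

In practice, tune the thresholds on a validation split and compute ΔEV on a separate test split, so the comparison is not flattered by threshold overfitting.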

Validation / Risks

  • Cost matrix uncertainty: rarely known precisely; perform sensitivity analysis.
  • Base rate shift: the deployment population may have a different base rate π from the training data; decision thresholds may need recalibration.
  • Proxy metrics: offline metrics (AUC, F1) are proxies for business value; validate with A/B test before declaring model improvement.
  • Feedback loops: decisions affect future data (e.g., denied applicants don’t appear in future training data) → account for counterfactual outcomes.
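For the base rate shift risk above, calibrated probabilities can be re-weighted with the standard prior-correction formula before re-deriving the threshold. A sketch, assuming the model was calibrated at base rate π and will be deployed at π′:

```python
# Prior-shift correction: adjust probabilities p (calibrated at base
# rate pi) to a new deployment base rate pi_new.
import numpy as np

def adjust_for_base_rate(p, pi, pi_new):
    num = p * pi_new / pi
    den = num + (1 - p) * (1 - pi_new) / (1 - pi)
    return num / den

p = np.array([0.1, 0.5, 0.9])
print(adjust_for_base_rate(p, pi=0.05, pi_new=0.10))  # shifts upward
```

When π′ = π the adjustment is the identity; when the positive class becomes more common in deployment, every score shifts upward and the EV-optimal threshold should be recomputed on the adjusted probabilities.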

Communication Notes

  • Frame results in business terms: “the model prevents £1.2M in annual fraud losses at a 3% increase in blocked legitimate transactions.”
  • Present sensitivity analysis to show robustness to cost estimates.
  • For regulatory audiences: show the fairness implications of threshold choices (see Fairness Metrics).

References

  • Provost, F. & Fawcett, T. (2013). Data Science for Business. O’Reilly. Chapters 7–8.
  • Drummond, C. & Holte, R. (2006). “Cost curves: An improved method for visualizing classifier performance.” Machine Learning.