Fraud Detection
Problem
Identify fraudulent transactions — payments, account takeovers, application fraud, insurance claims — in real time to decline, flag, or challenge them before loss is incurred. Fraud detection operates under extreme class imbalance (0.1–2% fraud rate), adversarial conditions (fraudsters adapt to detection logic), and strict latency constraints (credit card authorisation: <100ms).
Three distinct fraud types with different modelling approaches:
- Transaction fraud: Stolen card use or account compromise — real-time pattern detection
- Application fraud: Synthetic identity or document forgery at account opening — multi-signal verification
- Account takeover (ATO): Login credential theft — behavioural biometrics + device intelligence
Users / Stakeholders
| Role | Decision |
|---|---|
| Fraud operations analyst | Review flagged transactions; rule tuning |
| Risk officer | Set risk appetite; fraud loss vs customer friction tradeoff |
| Cardholder | Challenged or declined transaction experience |
| Compliance / BSA officer | SAR filing; AML programme |
| Product team | False positive rate vs fraud rate tradeoff; feature design |
Domain Context
- Extreme class imbalance: 0.1–1% fraud rate. SMOTE/oversampling, class weighting, and calibrated scoring are essential.
- Ground truth delay: Chargebacks (confirmed fraud) may arrive 30–90 days after transaction. Training on recent data requires label propagation.
- Adversarial dynamics: Fraudsters observe decline reasons and adapt. Model performance degrades faster than typical ML — retraining cadence critical (weekly or daily).
- Cost asymmetry: False negative (missed fraud) costs 100% of transaction value. False positive (blocked legitimate transaction) costs ~$20 in customer service + cart abandonment rate of 30–40%.
- Regulatory: PCI-DSS requires data minimisation and audit trails. GDPR Article 22 restricts fully automated adverse decisions without human review pathway. US: ECOA, FCRA requirements for dispute handling.
- Network effects: Fraud rings share cards/accounts. Graph features (shared device, IP, address) are extremely high-value signals.
Inputs and Outputs
Features:
Transaction: amount, merchant_category, country, channel (online/POS), currency
Card: card_age, card_type, issuer_country, previous_decline_count
Velocity: txn_count_1h, txn_count_24h, amount_sum_1h, unique_merchants_7d
Behavioural: time_since_last_txn, typical_merchant_category_match, amount_deviation_from_mean
Device: device_fingerprint, ip_address, is_new_device, vpn_flag, geolocation
Graph: is_known_fraud_network, shared_ip_flag, shared_device_other_accounts
History: chargeback_history, dispute_count, address_change_recency
Output:
fraud_score: P(fraud) ∈ [0, 1]
decision: APPROVE / STEP_UP (3DS challenge) / DECLINE
challenge_type: OTP / BIOMETRIC / MANUAL_REVIEW
reason_codes: Top fraud signals for analyst review (SHAP-based)
Decision or Workflow Role
Transaction arrives → real-time feature computation (<10ms)
↓
Rules engine: hard blocks (known bad IP, stolen card list)
↓
ML scoring: fraud probability (<30ms)
↓
Decision matrix:
score < 0.1 → APPROVE
0.1–0.5 → STEP_UP (3DS challenge)
score > 0.5 → DECLINE or MANUAL_REVIEW
↓
Outcome logged → chargeback matching → label pipeline → retraining
The ML model operates alongside a rules engine, not instead of it. Rules provide hard constraints (block known bad actors); ML handles pattern generalisation.
Modeling / System Options
| Approach | Strength | Weakness | When to use |
|---|---|---|---|
| Logistic regression + engineered features | Interpretable; fast; solid baseline | Misses non-linear interactions | Baseline; interpretability-first |
| XGBoost / LightGBM | High accuracy; handles class imbalance well; SHAP interpretable | Batch features only; static model | Standard production choice |
| Graph Neural Network | Captures fraud ring structure; high-value network signals | Requires graph infrastructure; complex | Large-scale with graph data |
| Isolation Forest (unsupervised) | Catches novel fraud patterns without labels | High false positive rate; needs tuning | Cold start; novel fraud type detection |
| Real-time streaming model | Low-latency; captures velocity patterns | Complex infrastructure (Kafka + Flink) | Card-present transactions at bank scale |
Recommended: XGBoost/LightGBM with engineered velocity and graph features. Real-time feature computation via Redis or a feature store. Weekly retraining. Rules engine for hard blocks.
Deployment Constraints
- Latency: <100ms total decision latency for card authorisation. ML scoring budget ~30ms.
- Throughput: Large card networks: 10K–65K transactions/second peak.
- Interpretability: Regulatory audit trail for declined transactions. Reason codes required for customer dispute process.
- Model versioning: Shadow mode deployment for new models. Champion-challenger A/B with holdout fraud measurement.
- Feature freshness: Velocity features (txn_count_1h) must be computed in real time. Stale features = missed fraud.
Risks and Failure Modes
| Risk | Description | Mitigation |
|---|---|---|
| Model staleness | Fraud patterns shift; old model misses new schemes | Weekly retraining; concept drift monitoring |
| Label noise | Chargebacks don’t map 1:1 to fraud events | Tiered label confidence; semi-supervised approach |
| False positive harm | Declining legitimate transactions causes churn | Customer friction metrics; segment-specific thresholds |
| Adversarial adaptation | Fraudsters probe system to find low-score paths | Regular pattern audits; ensemble models; dark web intelligence |
| Cascade failure | ML service down → default to rule-only → higher fraud | Fallback strategy; circuit breaker; default-to-challenge policy |
| Demographic disparity | Model declines higher rates for certain groups | Fairness audit; equalised FPR across demographics |
Success Metrics
| Metric | Target | Notes |
|---|---|---|
| Fraud detection rate (recall) | > 90% at 1% FPR | Business-critical; missed fraud = direct loss |
| False positive rate | < 1% of legitimate transactions | Customer friction KPI |
| Fraud loss rate (bps) | Industry benchmark: 5–15bps of GMV | Ultimate financial metric |
| Model latency p99 | < 30ms | Operational SLA |
| Analyst review queue clearance | > 95% within 4h | Operational efficiency |
| Chargeback rate | < 1% of transactions | Card network threshold |
References
- Dal Pozzolo, A. et al. (2015). Calibrating Probability with Undersampling for Unbalanced Classification. CIDM.
- van Belle, V. et al. (2021). GOTCHA! Network-based Fraud Detection for Social Security Fraud. Management Science.
Links
Modeling
- Gradient Boosting — XGBoost/LightGBM for fraud classification
- Unsupervised Learning — anomaly detection baselines
ML Engineering
- Deployment and Serving — real-time low-latency inference
- Monitoring and Observability — fraud rate and model drift
Reference Implementations
Adjacent Applications