Fraud Detection

Problem

Identify fraudulent transactions — payments, account takeovers, application fraud, insurance claims — in real time to decline, flag, or challenge them before loss is incurred. Fraud detection operates under extreme class imbalance (0.1–2% fraud rate), adversarial conditions (fraudsters adapt to detection logic), and strict latency constraints (credit card authorisation: <100ms).

Three distinct fraud types with different modelling approaches:

  1. Transaction fraud: Stolen card use or account compromise — real-time pattern detection
  2. Application fraud: Synthetic identity or document forgery at account opening — multi-signal verification
  3. Account takeover (ATO): Login credential theft — behavioural biometrics + device intelligence

Users / Stakeholders

RoleDecision
Fraud operations analystReview flagged transactions; rule tuning
Risk officerSet risk appetite; fraud loss vs customer friction tradeoff
CardholderChallenged or declined transaction experience
Compliance / BSA officerSAR filing; AML programme
Product teamFalse positive rate vs fraud rate tradeoff; feature design

Domain Context

  • Extreme class imbalance: 0.1–1% fraud rate. SMOTE/oversampling, class weighting, and calibrated scoring are essential.
  • Ground truth delay: Chargebacks (confirmed fraud) may arrive 30–90 days after transaction. Training on recent data requires label propagation.
  • Adversarial dynamics: Fraudsters observe decline reasons and adapt. Model performance degrades faster than typical ML — retraining cadence critical (weekly or daily).
  • Cost asymmetry: False negative (missed fraud) costs 100% of transaction value. False positive (blocked legitimate transaction) costs ~$20 in customer service + cart abandonment rate of 30–40%.
  • Regulatory: PCI-DSS requires data minimisation and audit trails. GDPR Article 22 restricts fully automated adverse decisions without human review pathway. US: ECOA, FCRA requirements for dispute handling.
  • Network effects: Fraud rings share cards/accounts. Graph features (shared device, IP, address) are extremely high-value signals.

Inputs and Outputs

Features:

Transaction: amount, merchant_category, country, channel (online/POS), currency
Card: card_age, card_type, issuer_country, previous_decline_count
Velocity: txn_count_1h, txn_count_24h, amount_sum_1h, unique_merchants_7d
Behavioural: time_since_last_txn, typical_merchant_category_match, amount_deviation_from_mean
Device: device_fingerprint, ip_address, is_new_device, vpn_flag, geolocation
Graph: is_known_fraud_network, shared_ip_flag, shared_device_other_accounts
History: chargeback_history, dispute_count, address_change_recency

Output:

fraud_score:       P(fraud) ∈ [0, 1]
decision:          APPROVE / STEP_UP (3DS challenge) / DECLINE
challenge_type:    OTP / BIOMETRIC / MANUAL_REVIEW
reason_codes:      Top fraud signals for analyst review (SHAP-based)

Decision or Workflow Role

Transaction arrives → real-time feature computation (<10ms)
  ↓
Rules engine: hard blocks (known bad IP, stolen card list)
  ↓
ML scoring: fraud probability (<30ms)
  ↓
Decision matrix:
  score < 0.1   → APPROVE
  0.1–0.5       → STEP_UP (3DS challenge)  
  score > 0.5   → DECLINE or MANUAL_REVIEW
  ↓
Outcome logged → chargeback matching → label pipeline → retraining

The ML model operates alongside a rules engine, not instead of it. Rules provide hard constraints (block known bad actors); ML handles pattern generalisation.

Modeling / System Options

ApproachStrengthWeaknessWhen to use
Logistic regression + engineered featuresInterpretable; fast; solid baselineMisses non-linear interactionsBaseline; interpretability-first
XGBoost / LightGBMHigh accuracy; handles class imbalance well; SHAP interpretableBatch features only; static modelStandard production choice
Graph Neural NetworkCaptures fraud ring structure; high-value network signalsRequires graph infrastructure; complexLarge-scale with graph data
Isolation Forest (unsupervised)Catches novel fraud patterns without labelsHigh false positive rate; needs tuningCold start; novel fraud type detection
Real-time streaming modelLow-latency; captures velocity patternsComplex infrastructure (Kafka + Flink)Card-present transactions at bank scale

Recommended: XGBoost/LightGBM with engineered velocity and graph features. Real-time feature computation via Redis or a feature store. Weekly retraining. Rules engine for hard blocks.

Deployment Constraints

  • Latency: <100ms total decision latency for card authorisation. ML scoring budget ~30ms.
  • Throughput: Large card networks: 10K–65K transactions/second peak.
  • Interpretability: Regulatory audit trail for declined transactions. Reason codes required for customer dispute process.
  • Model versioning: Shadow mode deployment for new models. Champion-challenger A/B with holdout fraud measurement.
  • Feature freshness: Velocity features (txn_count_1h) must be computed in real time. Stale features = missed fraud.

Risks and Failure Modes

RiskDescriptionMitigation
Model stalenessFraud patterns shift; old model misses new schemesWeekly retraining; concept drift monitoring
Label noiseChargebacks don’t map 1:1 to fraud eventsTiered label confidence; semi-supervised approach
False positive harmDeclining legitimate transactions causes churnCustomer friction metrics; segment-specific thresholds
Adversarial adaptationFraudsters probe system to find low-score pathsRegular pattern audits; ensemble models; dark web intelligence
Cascade failureML service down → default to rule-only → higher fraudFallback strategy; circuit breaker; default-to-challenge policy
Demographic disparityModel declines higher rates for certain groupsFairness audit; equalised FPR across demographics

Success Metrics

MetricTargetNotes
Fraud detection rate (recall)> 90% at 1% FPRBusiness-critical; missed fraud = direct loss
False positive rate< 1% of legitimate transactionsCustomer friction KPI
Fraud loss rate (bps)Industry benchmark: 5–15bps of GMVUltimate financial metric
Model latency p99< 30msOperational SLA
Analyst review queue clearance> 95% within 4hOperational efficiency
Chargeback rate< 1% of transactionsCard network threshold

References

  • Dal Pozzolo, A. et al. (2015). Calibrating Probability with Undersampling for Unbalanced Classification. CIDM.
  • van Belle, V. et al. (2021). GOTCHA! Network-based Fraud Detection for Social Security Fraud. Management Science.

Modeling

ML Engineering

Reference Implementations

Adjacent Applications