Credit Scoring

Problem

Estimate the probability that a loan applicant or existing borrower will default on a credit obligation within a specified time horizon (typically 12 or 24 months). The score is used to approve or decline applications, set interest rates, determine credit limits, and manage portfolio risk. Credit scoring is one of the oldest and most heavily regulated uses of statistical modelling: in the US, the Equal Credit Opportunity Act (ECOA) and the Fair Credit Reporting Act (FCRA) impose strict constraints on model design, data use, and explanation obligations.

Users / Stakeholders

| Role | Decision |
|---|---|
| Underwriter / credit analyst | Approve, decline, or counter-offer loan applications |
| Risk officer | Set scorecards, approval thresholds, risk appetite |
| Collections team | Prioritise collection efforts on delinquent accounts |
| Capital markets | Price securitised loan portfolios |
| Compliance officer | Ensure fair lending; meet adverse action notice obligations |
| Borrower | Understand why an application was declined |

Domain Context

  • Regulatory framework:
    • US: ECOA (no discrimination on protected classes), FCRA (adverse action notice requirements), model risk management guidance (SR 11-7)
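    • EU: GDPR Article 22 (restrictions on solely automated decisions, including a right to human intervention; read with Articles 13–15, a right to meaningful information about the logic involved), CRR/Basel III (regulatory capital implications)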
    • UK: FCA Consumer Duty
  • Scorecard tradition: Classic credit scoring uses logistic regression with Weight-of-Evidence binning — not because it’s optimal, but because it’s interpretable, auditable, and understood by regulators.
  • Adverse action notices: In the US, when an application is declined, the applicant has a legal right to know the principal reasons (typically the top 4 reason codes). This applies whether or not an automated model made the decision. The reasons must be specific and human-readable.
  • Protected class proxies: Zip code can be a proxy for race. Geographic variables require careful fairness analysis (disparate impact testing).
  • Through-the-cycle vs point-in-time: Regulatory capital models use through-the-cycle PDs, which are averaged over the economic cycle and therefore stable. Risk-based pricing uses point-in-time PDs, which reflect current economic conditions. The two uses therefore have different calibration objectives.
  • Data access: In traditional credit, data comes from bureau (Experian, Equifax, TransUnion) — structured, reliable, regulated. Alternative credit scoring (thin-file borrowers) uses non-traditional signals (rent, utility payments, telco, banking behaviour) — less regulated, higher fairness risk.
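The Weight-of-Evidence binning mentioned under the scorecard tradition can be illustrated with a minimal sketch. It assumes the common convention WoE = ln(share of goods / share of bads) per bin, so positive values indicate lower risk (some texts use the opposite sign), and it sums the Information Value (IV) across bins. The bin labels and counts are hypothetical and only show the arithmetic.

```python
import math

def woe_iv(bins):
    """Compute Weight of Evidence per bin and the feature's Information Value.

    bins: list of (label, n_good, n_bad) tuples; every count must be > 0.
    """
    total_good = sum(good for _, good, _ in bins)
    total_bad = sum(bad for _, _, bad in bins)
    woe = {}
    iv = 0.0
    for label, good, bad in bins:
        if good <= 0 or bad <= 0:
            # An empty class would give log(0); such bins are usually merged.
            raise ValueError(f"bin {label!r} has an empty class; merge it with a neighbour")
        dist_good = good / total_good   # share of all goods falling in this bin
        dist_bad = bad / total_bad      # share of all bads falling in this bin
        woe[label] = math.log(dist_good / dist_bad)
        iv += (dist_good - dist_bad) * woe[label]
    return woe, iv

# Hypothetical counts for a binned utilisation feature (illustrative only).
utilisation_bins = [("0-30%", 800, 20), ("30-70%", 150, 30), ("70%+", 50, 50)]
woe, iv = woe_iv(utilisation_bins)
for label, value in woe.items():
    print(f"{label:>7}: WoE = {value:+.3f}")
print(f"IV = {iv:.3f}")
```

In a scorecard, each raw feature value is replaced by the WoE of its bin before fitting the logistic regression, which is what makes the resulting coefficients easy to audit bin by bin.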

Inputs and Outputs

Traditional bureau-based features:

Payment history: delinquency_count_12m, worst_delinquency_ever, times_90d_late
Utilisation: revolving_balance / revolving_limit (rule of thumb: below 30% is favourable)
Length of credit history: oldest_account_age_months, avg_account_age
New credit: hard_inquiries_12m, new_accounts_6m
Credit mix: n_revolving_accounts, n_installment_accounts, n_mortgage_accounts
Negative marks: bankruptcies, tax_liens, civil_judgements
Income / DTI: income_reported, debt_to_income_ratio (if application data)

Output:

pd_score:             Probability of Default ∈ [0, 1]
scorecard_score:      Scaled score (e.g., 300–850 FICO-style) for interpretability
decision:             APPROVE / DECLINE / MANUAL_REVIEW / COUNTER_OFFER
adverse_action_codes: Top 4 reasons for decline (human-readable)
risk_tier:            PRIME / NEAR_PRIME / SUBPRIME
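One way to turn pd_score into scorecard_score is the "points to double the odds" (PDO) scaling that scorecards commonly use. The sketch below is illustrative, not the FICO formula: the anchor (600 points at good:bad odds of 50:1), the PDO of 20, and the clipping to 300–850 are all assumed parameters.

```python
import math

def pd_to_score(p_default, base_score=600.0, base_odds=50.0, pdo=20.0,
                min_score=300, max_score=850):
    """Map a probability of default to a scaled score.

    score = offset + factor * ln(odds of being good), where
    factor = pdo / ln(2) and offset anchors base_score at base_odds.
    All default parameter values are illustrative assumptions.
    """
    if not 0.0 < p_default < 1.0:
        raise ValueError("p_default must be strictly between 0 and 1")
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    odds_good = (1.0 - p_default) / p_default
    raw = offset + factor * math.log(odds_good)
    # Clip to the published score range and round to whole points.
    return int(round(min(max(raw, min_score), max_score)))

print(pd_to_score(1 / 51))    # odds 50:1, the anchor point
print(pd_to_score(1 / 101))   # odds 100:1, one doubling above the anchor
print(pd_to_score(0.15))
```

With this scaling, every doubling of the good:bad odds adds a fixed number of points, so the score stays interpretable while remaining a monotone transform of the PD.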

Decision or Workflow Role

Application submitted → bureau data pulled (hard inquiry)
  ↓
Identity verification → KYC / AML check
  ↓
Score model → PD estimate
  ↓
Decision rules:
  pd_score < 0.02          → AUTO-APPROVE at standard rate
  0.02 ≤ pd_score < 0.08   → APPROVE at risk-adjusted rate
  0.08 ≤ pd_score ≤ 0.15   → MANUAL_REVIEW (underwriter judgement)
  pd_score > 0.15          → AUTO-DECLINE + adverse action notice
  ↓
Loan originated → performance monitoring → vintage analysis → model refresh
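The decision rules in the workflow can be sketched as a small function. The thresholds come from the rules above; how the band edges are assigned (0.02 and 0.08 to the higher-risk band, 0.15 to manual review) is an assumption, as are the function name and the returned labels.

```python
def decide(pd_score):
    """Map a PD estimate to a (decision, detail) pair using the banded rules.

    Edge handling is an assumption: each boundary value is assigned to one
    band so that every pd_score in [0, 1] gets exactly one outcome.
    """
    if not 0.0 <= pd_score <= 1.0:
        raise ValueError("pd_score must be in [0, 1]")
    if pd_score < 0.02:
        return ("APPROVE", "STANDARD_RATE")
    if pd_score < 0.08:
        return ("APPROVE", "RISK_ADJUSTED_RATE")
    if pd_score <= 0.15:
        return ("MANUAL_REVIEW", "UNDERWRITER_JUDGEMENT")
    # Declines must be accompanied by an adverse action notice.
    return ("DECLINE", "ADVERSE_ACTION_NOTICE")

for pd_value in (0.01, 0.05, 0.10, 0.30):
    print(pd_value, decide(pd_value))
```

COUNTER_OFFER is not produced here because the banded rules do not define when it applies; in practice it would come from the manual review step or from additional policy rules.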

Modeling / System Options

| Approach | Strength | Weakness | When to use |
|---|---|---|---|
| Logistic regression (WoE scorecard) | Fully interpretable; adverse action codes; regulator-familiar | Misses non-linearity; labour-intensive feature engineering | Regulated consumer credit; SR 11-7 compliance |
| XGBoost / LightGBM + SHAP | Higher accuracy; handles non-linearity; SHAP reason codes | Requires a model risk validation framework | Challenger model; internal risk management |
| Survival analysis (Cox PH) | Models time-to-default; handles right-censoring properly | Less familiar to regulators; harder to explain | Portfolio-level default timing; stress testing |
| Neural network | Highest raw accuracy on large datasets | Black box by default; heavy model risk burden | Innovation programmes; challenger models |
| Rule-based expert scorecard | Auditable; no ML risk; regulator-approved | Manual, slow to update; misses complex patterns | Thin-file borrowers; regulatory fallback |

Recommended for production: WoE logistic regression as champion (regulatory compliance). LightGBM + SHAP as challenger (higher accuracy; SHAP attributions can be mapped to adverse action reason codes, subject to validation and compliance review).
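For the champion scorecard, adverse action codes can be derived directly from the points table. The sketch below uses one common convention: rank characteristics by how far the applicant's points fall below the maximum attainable points, and report the top four. The feature names reuse the inputs listed earlier; the points, the reason wording, and the function name are hypothetical.

```python
def reason_codes(applicant_points, max_points, descriptions, top_n=4):
    """Return up to top_n human-readable reasons, largest points shortfall first.

    applicant_points / max_points: dicts keyed by scorecard characteristic.
    Characteristics where the applicant already has maximum points are skipped.
    """
    shortfall = {name: max_points[name] - pts for name, pts in applicant_points.items()}
    ranked = sorted(
        (name for name, gap in shortfall.items() if gap > 0),
        key=lambda name: shortfall[name],
        reverse=True,
    )
    return [descriptions[name] for name in ranked[:top_n]]

# Hypothetical scorecard points for one declined applicant.
applicant_points = {"times_90d_late": 10, "utilisation": 25, "hard_inquiries_12m": 30,
                    "oldest_account_age_months": 42, "debt_to_income_ratio": 35}
max_points = {"times_90d_late": 60, "utilisation": 55, "hard_inquiries_12m": 40,
              "oldest_account_age_months": 50, "debt_to_income_ratio": 35}
descriptions = {
    "times_90d_late": "Serious delinquency on one or more accounts",
    "utilisation": "Proportion of revolving balances to limits is too high",
    "hard_inquiries_12m": "Too many recent credit inquiries",
    "oldest_account_age_months": "Length of credit history is too short",
    "debt_to_income_ratio": "Debt-to-income ratio is too high",
}
codes = reason_codes(applicant_points, max_points, descriptions)
for code in codes:
    print(code)
```

For a gradient-boosted challenger, the same ranking step could be applied to per-feature attributions instead of points shortfalls, but whether that meets notice requirements is a compliance question rather than a modelling one.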

Deployment Constraints

  • Latency: Instant online credit decisions require <500ms total pipeline (bureau pull + scoring + decisioning).
  • Interpretability: Non-negotiable. Adverse action reason codes are a legal requirement in the US.
  • Model risk governance: SR 11-7 (US) is supervisory guidance that calls for model documentation, independent validation, ongoing monitoring, and board-level oversight.
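  • Fairness: Disparate impact analysis across protected classes (race, gender, age, national origin) is required. The four-fifths (80%) rule is a common screening heuristic for disparate impact; it is a rule of thumb rather than a formal statistical test.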
  • Model stability: Scorecard PSI (Population Stability Index) > 0.25 triggers mandatory review.
  • Vintage analysis: Loans take time to season. Model validation requires sufficient performance window (minimum 12 months).
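The PSI check named under model stability can be sketched as follows. It uses the standard formula PSI = Σ (actual share − expected share) × ln(actual share / expected share) over score bands. The 0.25 review trigger is the one stated above; the 0.10 lower band and the "MONITOR" label for the range in between are assumptions, as are the example counts.

```python
import math

def psi(expected_counts, actual_counts, eps=1e-6):
    """Population Stability Index between a baseline and a current sample.

    Both arguments are counts per score band, in the same band order.
    Shares are floored at eps so an empty band does not produce log(0).
    """
    if len(expected_counts) != len(actual_counts):
        raise ValueError("both samples must use the same score bands")
    expected_total = sum(expected_counts)
    actual_total = sum(actual_counts)
    total = 0.0
    for expected, actual in zip(expected_counts, actual_counts):
        e_share = max(expected / expected_total, eps)
        a_share = max(actual / actual_total, eps)
        total += (a_share - e_share) * math.log(a_share / e_share)
    return total

def psi_status(value):
    """Classify a PSI value; the middle band and its label are assumptions."""
    if value < 0.10:
        return "STABLE"
    if value <= 0.25:
        return "MONITOR"
    return "REVIEW"

# Hypothetical applicant counts per score band: development sample vs this month.
baseline = [100, 200, 400, 200, 100]
current = [300, 300, 200, 100, 100]
shift = psi(baseline, current)
print(f"PSI = {shift:.3f} -> {psi_status(shift)}")
```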

Risks and Failure Modes

| Risk | Description | Mitigation |
|---|---|---|
| Disparate impact | Model discriminates against protected classes indirectly | Regular disparate impact testing; proxy variable removal |
| Model decay | Economic cycle changes default behaviour | Vintage monitoring; through-the-cycle recalibration |
| Population shift | Origination strategy changes who applies | PSI monitoring; covariate shift testing |
| Label contamination | Outcomes are observed only for loans the model approved, so training data is biased (performance of rejects is unknown) | Reject inference; credit bureau industry data pooling |
| Champion model over-rejection | Overly conservative threshold → revenue loss | Portfolio-level test with challenger; threshold optimisation |
| Gaming | Consumers manipulate credit signals for approval | Model hardening; lag features; bureau-verified data only |
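The disparate impact testing listed as a mitigation can start with a simple approval-rate ratio, sketched below. It divides the approval rate of a protected group by that of a reference group and flags ratios under 0.80. The group counts are hypothetical, and a flag from this screen is a prompt for further analysis, not a legal finding.

```python
def adverse_impact_ratio(approved_protected, total_protected,
                         approved_reference, total_reference):
    """Approval rate of the protected group divided by that of the reference group."""
    if total_protected <= 0 or total_reference <= 0:
        raise ValueError("group totals must be positive")
    reference_rate = approved_reference / total_reference
    if reference_rate == 0:
        raise ValueError("reference group has no approvals; ratio is undefined")
    return (approved_protected / total_protected) / reference_rate

def flags_disparate_impact(ratio, threshold=0.80):
    """Apply the four-fifths screening heuristic."""
    return ratio < threshold

# Hypothetical monthly approval counts for two applicant groups.
ratio = adverse_impact_ratio(approved_protected=300, total_protected=500,
                             approved_reference=800, total_reference=1000)
print(f"ratio = {ratio:.2f}, flagged = {flags_disparate_impact(ratio)}")
```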

Success Metrics

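| Metric | Target | Notes |
|---|---|---|
| AUC (Gini) | AUC > 0.70 (consumer credit) | Industry benchmark; higher is better. Gini = 2 × AUC − 1, so this corresponds to Gini > 0.40 |
| KS statistic | > 0.40 | Maximum separation between the cumulative score distributions of goods and bads |
| PSI (stability) | < 0.10 | Population Stability Index over time |
| Default rate at approval | Within ±15% of forecast | Model calibration |
| Approval rate | Business KPI vs risk appetite | Tradeoff metric |
| Disparate impact ratio | > 0.80 (80% rule) | Fairness metric; four-fifths rule of thumb rather than a statutory threshold |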
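The discrimination metrics in the table can be computed from scored outcomes as in the sketch below. It treats AUC as the probability that a randomly chosen bad has a higher PD than a randomly chosen good (ties count half), derives Gini as 2 × AUC − 1, and takes KS as the largest gap between the cumulative PD distributions of goods and bads. The pairwise loops are quadratic, so this is for illustration on small samples; the data is hypothetical.

```python
def split_by_outcome(pd_scores, defaulted):
    """Separate PD estimates into bads (defaulted == 1) and goods (defaulted == 0)."""
    bads = [s for s, y in zip(pd_scores, defaulted) if y == 1]
    goods = [s for s, y in zip(pd_scores, defaulted) if y == 0]
    if not bads or not goods:
        raise ValueError("need at least one good and one bad")
    return bads, goods

def auc(pd_scores, defaulted):
    """Share of (bad, good) pairs where the bad has the higher PD; ties count 0.5."""
    bads, goods = split_by_outcome(pd_scores, defaulted)
    wins = sum((b > g) + 0.5 * (b == g) for b in bads for g in goods)
    return wins / (len(bads) * len(goods))

def gini(pd_scores, defaulted):
    return 2.0 * auc(pd_scores, defaulted) - 1.0

def ks_statistic(pd_scores, defaulted):
    """Maximum absolute gap between the empirical CDFs of goods and bads."""
    bads, goods = split_by_outcome(pd_scores, defaulted)
    best = 0.0
    for threshold in sorted(set(pd_scores)):
        cdf_bad = sum(s <= threshold for s in bads) / len(bads)
        cdf_good = sum(s <= threshold for s in goods) / len(goods)
        best = max(best, abs(cdf_good - cdf_bad))
    return best

# Hypothetical PD estimates and observed outcomes (1 = defaulted).
pd_scores = [0.01, 0.02, 0.03, 0.05, 0.08, 0.12, 0.20, 0.35]
defaulted = [0, 0, 0, 1, 0, 0, 1, 1]
print(f"AUC  = {auc(pd_scores, defaulted):.3f}")
print(f"Gini = {gini(pd_scores, defaulted):.3f}")
print(f"KS   = {ks_statistic(pd_scores, defaulted):.3f}")
```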

References

  • Anderson, R. (2007). The Credit Scoring Toolkit. Oxford University Press.
  • Siddiqi, N. (2017). Intelligent Credit Scoring. Wiley.
  • Board of Governors of the Federal Reserve (2011). SR 11-7: Guidance on Model Risk Management.
