Linear and Generalised Linear Models
Definition
Linear models predict a target as a linear function of input features (possibly after a link function transformation). GLMs extend this to non-Gaussian response distributions via a link function connecting the linear predictor to the response mean.
Intuition
Linear models are the baseline of statistical modelling: interpretable, fast, well-understood theoretically, and often surprisingly competitive. GLMs generalise them to count data, binary outcomes, and positive continuous outcomes — making them the workhorse of insurance pricing, clinical research, and any domain requiring interpretable predictions.
Formal Description
Linear Regression
Model: $y = X\beta + \varepsilon$, with $\varepsilon \sim \mathcal{N}(0, \sigma^2 I)$.
OLS solution: minimises $\lVert y - X\beta \rVert_2^2$; closed form: $\hat{\beta} = (X^\top X)^{-1} X^\top y$ (requires $X^\top X$ invertible).
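The closed-form solution can be sketched in a few lines of numpy; the data here is synthetic (randomly generated coefficients and Gaussian noise, purely for illustration), and the normal equations are solved with `np.linalg.solve` rather than an explicit matrix inverse, which is the numerically preferred route.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3
# Design matrix with an intercept column of ones
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
beta_true = np.array([2.0, 1.0, -0.5, 0.25])   # illustrative coefficients
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Closed-form OLS: solve (X'X) beta = X'y instead of inverting X'X
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # should be close to beta_true given the low noise
```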
Assumptions (Gauss-Markov):
- Linearity: $\mathbb{E}[y \mid X] = X\beta$
- No perfect multicollinearity: $X$ has full column rank
- Homoscedasticity: $\mathrm{Var}(\varepsilon_i) = \sigma^2$ constant
- No autocorrelation: $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$
Regularised variants:
| Model | Penalty | Effect |
|---|---|---|
| Ridge (L2) | $\lambda \lVert\beta\rVert_2^2$ | Shrinks coefficients, handles multicollinearity |
| Lasso (L1) | $\lambda \lVert\beta\rVert_1$ | Sparse solutions (feature selection) |
| Elastic Net | $\lambda_1 \lVert\beta\rVert_1 + \lambda_2 \lVert\beta\rVert_2^2$ | Combines sparsity and stability |
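Ridge also has a closed form, $\hat{\beta} = (X^\top X + \lambda I)^{-1} X^\top y$, which makes its multicollinearity handling easy to demonstrate. A minimal numpy sketch on synthetic data with two nearly identical features (the collinearity and $\lambda = 1$ are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)   # nearly collinear with x1
X = np.column_stack([x1, x2])
y = x1 + rng.normal(scale=0.1, size=n)      # true signal comes from x1 alone

lam = 1.0
# Ridge closed form: the added lam*I keeps the system well-conditioned
# even though X'X is nearly singular
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)
print(beta_ridge)
```

Plain OLS would produce wildly unstable coefficients here; ridge splits the weight roughly evenly across the correlated pair, each coefficient landing near 0.5.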
Logistic Regression
Binary classification via the logistic function:
$$P(y = 1 \mid x) = \sigma(\beta^\top x) = \frac{1}{1 + e^{-\beta^\top x}}$$
Training: minimise negative log-likelihood (log-loss):
$$\mathcal{L}(\beta) = -\sum_{i=1}^{n} \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right], \quad p_i = \sigma(\beta^\top x_i)$$
No closed form; solved via gradient descent or iteratively reweighted least squares (IRLS). The decision boundary is a hyperplane.
Coefficient interpretation: $e^{\beta_j}$ is the multiplicative change in odds for a one-unit increase in $x_j$.
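The gradient-descent route mentioned above can be sketched directly: the gradient of the log-loss is $X^\top(\hat{p} - y)$, and exponentiating the fitted slope recovers the odds ratio. The data, learning rate, and iteration count below are illustrative assumptions, not tuned values.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one feature
beta_true = np.array([-1.0, 2.0])                      # illustrative coefficients
p = 1.0 / (1.0 + np.exp(-X @ beta_true))
y = rng.binomial(1, p)

# Gradient descent on the (averaged) log-loss: grad = X'(sigmoid(Xb) - y) / n
beta = np.zeros(2)
for _ in range(5000):
    p_hat = 1.0 / (1.0 + np.exp(-X @ beta))
    beta = beta - 0.5 * X.T @ (p_hat - y) / n

odds_ratio = np.exp(beta[1])  # multiplicative change in odds per unit of x
print(beta, odds_ratio)
```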
Generalised Linear Models (GLMs)
Three components:
- Random component: exponential family (Gaussian, Poisson, Gamma, Binomial, Tweedie, …)
- Linear predictor: $\eta = X\beta$
- Link function: $g(\mu) = \eta$, where $\mu = \mathbb{E}[y \mid X]$
Common GLM families:
| Family | Link | Use case |
|---|---|---|
| Gaussian | Identity | Continuous symmetric targets |
| Binomial | Logit | Binary classification |
| Poisson | Log | Count data (e.g., claims frequency) |
| Gamma | Log/Inverse | Positive continuous (e.g., claim severity) |
| Tweedie | Log | Zero-inflated positive continuous (pure premium) |
Tweedie with variance power $1 < p < 2$ is the standard model for insurance pure premium (compound Poisson-Gamma).
Estimation: maximum likelihood via IRLS. The estimated parameter covariance matrix, $\hat{\phi}\,(X^\top \hat{W} X)^{-1}$ with $\hat{W}$ the final IRLS weights, enables Wald tests and confidence intervals for each coefficient.
Deviance: GLM analogue of RSS; measures goodness-of-fit relative to a saturated model. Null deviance minus model deviance measures explained variation, and $1 - D_{\text{model}}/D_{\text{null}}$ serves as a pseudo-$R^2$ analogous to $R^2$.
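IRLS for a Poisson GLM with log link fits in a few lines: at each step the working weights are $W = \mathrm{diag}(\mu)$ and the working response is $z = \eta + (y - \mu)/\mu$, and a weighted least-squares solve updates $\beta$. A minimal numpy sketch on synthetic count data (the coefficients and sample size are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, 0.3])                 # illustrative coefficients
y = rng.poisson(np.exp(X @ beta_true))           # counts with log-link mean

beta = np.zeros(2)
for _ in range(25):
    eta = X @ beta
    mu = np.exp(eta)
    z = eta + (y - mu) / mu                      # working response
    W = mu                                       # diagonal IRLS weights
    # Weighted least squares: (X'WX) beta = X'Wz
    beta = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))

# Poisson deviance: 2 * [sum_{y>0} y*log(y/mu) - sum (y - mu)]
mu = np.exp(X @ beta)
pos = y > 0
dev = 2 * (np.sum(y[pos] * np.log(y[pos] / mu[pos])) - np.sum(y - mu))
print(beta, dev)
```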
Multiplicative GLM (Insurance Standard)
With log link on Poisson or Gamma:
$$\hat{\mu} = \exp\!\Big(\beta_0 + \sum_j \beta_j x_j\Big) = e^{\beta_0} \prod_j e^{\beta_j x_j}$$
Each exponentiated coefficient $e^{\beta_j}$ is a rating factor multiplying the base rate $e^{\beta_0}$, making the model directly interpretable and auditable for regulatory purposes.
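The multiplicative structure can be made concrete with a toy rating calculation. The coefficient values and factor names below are hypothetical, chosen only to show how a premium decomposes into base rate times factors:

```python
import math

# Hypothetical fitted log-link coefficients (illustrative values only)
base_rate = math.exp(-2.0)                       # exp(beta_0)
factors = {
    "young_driver": math.exp(0.4),               # exp(beta_1), applies if x_1 = 1
    "urban":        math.exp(0.25),              # exp(beta_2), applies if x_2 = 1
}

# Premium = base rate times the product of the applicable rating factors,
# which equals exp(beta_0 + beta_1 + beta_2) for this risk profile
premium = base_rate * factors["young_driver"] * factors["urban"]
print(round(premium, 4))
```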
Applications
- Ridge/Lasso: high-dimensional feature matrices where $p \gg n$ (more features than observations)
- Logistic regression: credit scoring, fraud scoring baselines
- Poisson GLM: claim frequency modelling (claims per year of exposure)
- Gamma GLM: claim severity modelling (average cost per claim)
- Tweedie GLM: direct pure premium modelling
Trade-offs
- Linear models assume the true relationship is (approximately) linear or linearisable via the link function — violated by complex non-linear patterns.
- Interpretability advantage over trees disappears when many interaction terms are added.
- Lasso is brittle with highly correlated features; Elastic Net is more stable.