Linear and Generalised Linear Models

Definition

Linear models predict a target as a linear function of input features (possibly after a link function transformation). GLMs extend this to non-Gaussian response distributions via a link function connecting the linear predictor to the response mean.

Intuition

Linear models are the baseline of statistical modelling: interpretable, fast, well-understood theoretically, and often surprisingly competitive. GLMs generalise them to count data, binary outcomes, and positive continuous outcomes — making them the workhorse of insurance pricing, clinical research, and any domain requiring interpretable predictions.

Formal Description

Linear Regression

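Model: $y = X\beta + \varepsilon$, with response vector $y$, design matrix $X$, coefficient vector $\beta$, and zero-mean noise $\varepsilon$.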

OLS solution: $\hat{\beta}$ minimises $\lVert y - X\beta \rVert_2^2$; closed form: $\hat{\beta} = (X^\top X)^{-1} X^\top y$ (requires $X^\top X$ invertible).
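As a minimal sketch of the closed-form OLS solution, the snippet below solves the normal equations with NumPy and compares the result with NumPy's SVD-based least-squares routine. The data, seed, and coefficient values are made up for illustration and are not from the original text.

```python
import numpy as np

# Synthetic data (assumption: invented purely for illustration).
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # intercept + 2 features
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Closed form via the normal equations: solve (X^T X) beta = X^T y.
# Solving the linear system avoids forming the explicit inverse.
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# SVD-based least squares; usually preferable when X^T X is ill-conditioned.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_normal)
print(np.allclose(beta_normal, beta_lstsq))
```

In practice the normal equations are fine for well-conditioned problems; when features are nearly collinear, a factorisation-based solver is generally the safer choice.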

Assumptions (Gauss-Markov):

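  1. Linearity: $\mathbb{E}[y \mid X] = X\beta$
  2. No multicollinearity: $X$ has full column rank
  3. Homoscedasticity: $\mathrm{Var}(\varepsilon_i \mid X) = \sigma^2$ constant
  4. No autocorrelation: $\mathrm{Cov}(\varepsilon_i, \varepsilon_j \mid X) = 0$ for $i \neq j$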

Regularised variants:

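| Model | Penalty | Effect |
|---|---|---|
| Ridge (L2) | $\lambda \lVert\beta\rVert_2^2$ | Shrinks coefficients, handles multicollinearity |
| Lasso (L1) | $\lambda \lVert\beta\rVert_1$ | Sparse solutions (feature selection) |
| Elastic Net | $\lambda_1 \lVert\beta\rVert_1 + \lambda_2 \lVert\beta\rVert_2^2$ | Combines sparsity and stability |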
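The shrinkage effect of the L2 penalty can be sketched with the ridge closed form, $(X^\top X + \lambda I)^{-1} X^\top y$. The example covers ridge only (Lasso and Elastic Net have no closed form and need an iterative solver). The helper name `ridge`, the synthetic near-collinear data, and the value $\lambda = 1$ are assumptions chosen for illustration.

```python
import numpy as np

# Two nearly collinear features (assumption: synthetic data for illustration).
rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(scale=0.1, size=n)

def ridge(X, y, lam):
    """Ridge closed form (X^T X + lam I)^{-1} X^T y; no intercept, for brevity."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

beta_ols = ridge(X, y, 0.0)     # lam = 0 recovers plain OLS
beta_ridge = ridge(X, y, 1.0)   # hypothetical penalty strength

print("OLS:  ", beta_ols)
print("Ridge:", beta_ridge)
```

With collinear inputs the unpenalised coefficients can swing widely between the two features, while the penalised fit spreads the effect across both and has a smaller coefficient norm.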

Logistic Regression

Binary classification via the logistic function: $P(y = 1 \mid x) = \sigma(x^\top \beta) = \dfrac{1}{1 + e^{-x^\top \beta}}$

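Training: minimise negative log-likelihood (log-loss): $\mathcal{L}(\beta) = -\sum_{i=1}^{n} \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right]$, where $p_i = \sigma(x_i^\top \beta)$.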

No closed form; solved via gradient descent or iteratively reweighted least squares (IRLS). The decision boundary is a hyperplane.
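One way to picture the IRLS fit is as Newton's method on the log-loss, where each step solves a weighted least-squares system. The sketch below is a bare-bones version under simplifying assumptions (no regularisation, no step-size control, non-separable synthetic data); the function name `fit_logistic_irls` is hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_irls(X, y, n_iter=25, tol=1e-8):
    """Fit logistic regression by IRLS (Newton's method on the log-loss)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        w = p * (1.0 - p)                  # IRLS working weights
        grad = X.T @ (y - p)               # gradient of the log-likelihood
        hess = X.T @ (X * w[:, None])      # X^T W X
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Synthetic, non-separable data (assumption: illustrative only).
rng = np.random.default_rng(2)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-0.5, 1.5])
y = (rng.uniform(size=n) < sigmoid(X @ beta_true)).astype(float)

beta_hat = fit_logistic_irls(X, y)
print(beta_hat)
```

On perfectly separable data the unpenalised maximum-likelihood estimate does not exist and this loop would not converge; production solvers typically add regularisation or safeguards for that case.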

Coefficient interpretation: $e^{\beta_j}$ is the multiplicative change in odds for a one-unit increase in $x_j$, holding the other features fixed.
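The odds interpretation can be checked numerically. In this small sketch the intercept and coefficient are hypothetical values, and the model has a single feature for simplicity.

```python
import math

beta_j = 0.7        # hypothetical fitted coefficient
intercept = -1.0    # hypothetical intercept

def odds(x):
    """Odds p / (1 - p) under a single-feature logistic model."""
    p = 1.0 / (1.0 + math.exp(-(intercept + beta_j * x)))
    return p / (1.0 - p)

# Ratio of odds after a one-unit increase in x (from 2 to 3).
odds_ratio = odds(3.0) / odds(2.0)
print(odds_ratio, math.exp(beta_j))  # the two values agree up to rounding
```

The ratio does not depend on the starting value of x or on the intercept, which is why the exponentiated coefficient can be reported as a single odds ratio.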

Generalised Linear Models (GLMs)

Three components:

  1. Random component: exponential family (Gaussian, Poisson, Gamma, Binomial, Tweedie, …)
  2. Linear predictor: $\eta = X\beta$
  3. Link function: $g(\mu) = \eta$, where $\mu = \mathbb{E}[y \mid X]$

Common GLM families:

| Family | Link | Use case |
|---|---|---|
| Gaussian | Identity | Continuous symmetric targets |
| Binomial | Logit | Binary classification |
| Poisson | Log | Count data (e.g., claims frequency) |
| Gamma | Log/Inverse | Positive continuous (e.g., claim severity) |
| Tweedie | Log | Zero-inflated positive continuous (pure premium) |

Tweedie with variance power $1 < p < 2$ (variance proportional to $\mu^p$) is the standard model for insurance pure premium (compound Poisson-Gamma).
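To make the compound Poisson-Gamma reading concrete, the sketch below maps a Tweedie mean $\mu$, dispersion $\phi$, and variance power between 1 and 2 to the implied Poisson claim rate and Gamma claim-size parameters, then checks that the mean and variance come out as $\mu$ and $\phi \mu^p$. The function name `tweedie_to_compound` and the numeric values are hypothetical.

```python
import math

def tweedie_to_compound(mu, phi, p):
    """Map Tweedie (mu, phi, p), 1 < p < 2, to compound Poisson-Gamma parameters."""
    if not 1.0 < p < 2.0:
        raise ValueError("compound Poisson-Gamma requires 1 < p < 2")
    lam = mu ** (2.0 - p) / (phi * (2.0 - p))    # Poisson rate (expected claim count)
    shape = (2.0 - p) / (p - 1.0)                # Gamma shape of a single claim
    scale = phi * (p - 1.0) * mu ** (p - 1.0)    # Gamma scale of a single claim
    return lam, shape, scale

mu, phi, p = 300.0, 50.0, 1.5                    # hypothetical values
lam, shape, scale = tweedie_to_compound(mu, phi, p)

mean = lam * shape * scale                       # E[Y] = lam * E[claim size]
var = lam * shape * (1.0 + shape) * scale ** 2   # Var[Y] = lam * E[claim size^2]
prob_zero = math.exp(-lam)                       # P(Y = 0): no claims at all

print(mean, var, phi * mu ** p, prob_zero)
```

The positive probability of an exact zero is what lets a single model cover policies with no claims alongside policies with a continuous positive loss.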

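Estimation: maximum likelihood via IRLS. The estimated parameter covariance matrix, $\widehat{\mathrm{Cov}}(\hat{\beta}) = \phi\,(X^\top W X)^{-1}$ with IRLS weight matrix $W$ and dispersion $\phi$, enables Wald tests and confidence intervals for each coefficient.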
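A minimal IRLS sketch for a Poisson GLM with log link is shown below, including Wald statistics derived from the estimated covariance $(X^\top W X)^{-1}$ (dispersion is fixed at 1 for Poisson). The function name `fit_poisson_irls`, the synthetic data, and the assumption that the first column of the design matrix is an intercept are all illustrative choices, not part of the original text.

```python
import numpy as np

def fit_poisson_irls(X, y, n_iter=50, tol=1e-10):
    """Poisson GLM with log link fitted by IRLS.

    Returns the coefficients and their estimated covariance (X^T W X)^{-1},
    where W = diag(mu) for this family/link combination.
    """
    # Start from the intercept-only fit; assumes column 0 of X is all ones.
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu            # working response
        XtW = X.T * mu                     # X^T W via broadcasting
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    mu = np.exp(X @ beta)
    cov = np.linalg.inv((X.T * mu) @ X)
    return beta, cov

# Synthetic count data (assumption: illustrative only).
rng = np.random.default_rng(3)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.2, 0.5])
y = rng.poisson(np.exp(X @ beta_true)).astype(float)

beta_hat, cov = fit_poisson_irls(X, y)
se = np.sqrt(np.diag(cov))                 # standard errors
wald_z = beta_hat / se                     # Wald statistics for H0: beta_j = 0
ci_low, ci_high = beta_hat - 1.96 * se, beta_hat + 1.96 * se  # approx. 95% intervals

print(beta_hat, se, wald_z)
```

For families with a free dispersion parameter, such as Gamma or Tweedie, the covariance would additionally be scaled by an estimate of the dispersion.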

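Deviance: GLM analogue of RSS; measures goodness-of-fit relative to a saturated model. The drop from null deviance to model deviance is analogous to the explained sum of squares, so $1 - D_{\text{model}} / D_{\text{null}}$ serves as a deviance-based analogue of $R^2$.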
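As a worked illustration for the Poisson family, the deviance is $2 \sum_i \left[ y_i \log(y_i / \mu_i) - (y_i - \mu_i) \right]$, with the log term taken as zero when $y_i = 0$. The counts and fitted means below are hypothetical numbers, not the output of an actual fit.

```python
import numpy as np

def poisson_deviance(y, mu):
    """Poisson deviance: 2 * sum(y * log(y / mu) - (y - mu)), with 0 * log(0) := 0."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    # Where y == 0 the log term is defined as 0; substitute 1.0 to avoid log(0).
    ratio = np.where(y > 0, y / mu, 1.0)
    return 2.0 * np.sum(y * np.log(ratio) - (y - mu))

y = np.array([0, 1, 3, 2, 0, 4], dtype=float)          # hypothetical counts
mu_model = np.array([0.5, 1.2, 2.5, 2.0, 0.4, 3.4])    # hypothetical fitted means
mu_null = np.full_like(y, y.mean())                    # intercept-only (null) model

d_model = poisson_deviance(y, mu_model)
d_null = poisson_deviance(y, mu_null)
pseudo_r2 = 1.0 - d_model / d_null                     # deviance-based pseudo R^2

print(d_null, d_model, pseudo_r2)
```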

Multiplicative GLM (Insurance Standard)

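With log link on Poisson or Gamma: $\mu = \exp(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k) = e^{\beta_0} \cdot e^{\beta_1 x_1} \cdots e^{\beta_k x_k}$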

Each exponentiated coefficient $e^{\beta_j}$ is a rating factor multiplying the base rate $e^{\beta_0}$, which makes the model directly interpretable and auditable for regulatory purposes.
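The multiplicative structure can be sketched as a small rating calculation. The factor names and coefficient values here are hypothetical, chosen only to show that the log-link prediction equals the base rate times the applicable relativities.

```python
import math

# Hypothetical fitted coefficients from a log-link frequency model.
coefs = {
    "intercept": math.log(0.08),      # base rate: 0.08 claims per exposure year
    "young_driver": math.log(1.50),   # relativity 1.50
    "urban": math.log(1.20),          # relativity 1.20
}

# Rating factors are the exponentiated coefficients.
base_rate = math.exp(coefs["intercept"])
relativities = {k: math.exp(v) for k, v in coefs.items() if k != "intercept"}

def predict(risk):
    """Expected frequency for a risk described by 0/1 indicators per factor."""
    eta = coefs["intercept"] + sum(coefs[k] * risk.get(k, 0) for k in relativities)
    return math.exp(eta)

risk = {"young_driver": 1, "urban": 1}
mu = predict(risk)
product = base_rate * relativities["young_driver"] * relativities["urban"]

print(mu, product)  # both equal 0.08 * 1.50 * 1.20 up to rounding
```

Because each factor enters as a separate multiplier, a rating table can be read and audited one factor at a time.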

Applications

  • Ridge/Lasso: high-dimensional feature matrices where $p \gg n$ (many more features than observations)
  • Logistic regression: credit scoring, fraud scoring baselines
  • Poisson GLM: claim frequency modelling (claims per year of exposure)
  • Gamma GLM: claim severity modelling (average cost per claim)
  • Tweedie GLM: direct pure premium modelling

Trade-offs

  • Linear models assume the true relationship is (approximately) linear or linearisable via the link function — violated by complex non-linear patterns.
  • Interpretability advantage over trees disappears when many interaction terms are added.
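  • Lasso is unstable with highly correlated features: it tends to keep one feature from a correlated group somewhat arbitrarily, and the choice can change under small perturbations of the data; Elastic Net is more stable.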