Linear and Generalised Linear Models
Definition
Linear models predict a target as a linear function of input features (possibly after a link function transformation). GLMs extend this to non-Gaussian response distributions via a link function connecting the linear predictor to the response mean.
Intuition
Linear models are the baseline of statistical modelling: interpretable, fast, well-understood theoretically, and often surprisingly competitive. GLMs generalise them to count data, binary outcomes, and positive continuous outcomes — making them the workhorse of insurance pricing, clinical research, and any domain requiring interpretable predictions.
Formal Description
Linear Regression
Model: $y = X\beta + \varepsilon$, with $\varepsilon \sim \mathcal{N}(0, \sigma^2 I)$.
OLS solution: minimises $\lVert y - X\beta \rVert_2^2$; closed form: $\hat{\beta} = (X^\top X)^{-1} X^\top y$ (requires $X^\top X$ invertible).
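The closed-form solution can be sketched in a few lines of numpy; the data here is synthetic (randomly generated coefficients and Gaussian noise, purely for illustration), and the normal equations are solved with `np.linalg.solve` rather than an explicit matrix inverse, which is the numerically preferred route.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3
# Design matrix with an intercept column of ones
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
beta_true = np.array([2.0, 1.0, -0.5, 0.25])   # illustrative coefficients
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Closed-form OLS: solve (X'X) beta = X'y instead of inverting X'X
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # should be close to beta_true given the low noise
```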
Assumptions (Gauss-Markov):
- Linearity: $\mathbb{E}[y \mid X] = X\beta$
- No perfect multicollinearity: $X$ has full column rank
- Homoscedasticity: $\mathrm{Var}(\varepsilon_i) = \sigma^2$ constant
- No autocorrelation: $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$
Regularised variants:
| Model | Penalty | Effect |
|---|---|---|
| Ridge (L2) | $\lambda \lVert\beta\rVert_2^2$ | Shrinks coefficients, handles multicollinearity |
| Lasso (L1) | $\lambda \lVert\beta\rVert_1$ | Sparse solutions (feature selection) |
| Elastic Net | $\lambda_1 \lVert\beta\rVert_1 + \lambda_2 \lVert\beta\rVert_2^2$ | Combines sparsity and stability |
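Ridge also has a closed form, $\hat{\beta} = (X^\top X + \lambda I)^{-1} X^\top y$, which makes its multicollinearity handling easy to demonstrate. A minimal numpy sketch on synthetic data with two nearly identical features (the collinearity and $\lambda = 1$ are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)   # nearly collinear with x1
X = np.column_stack([x1, x2])
y = x1 + rng.normal(scale=0.1, size=n)      # true signal comes from x1 alone

lam = 1.0
# Ridge closed form: the added lam*I keeps the system well-conditioned
# even though X'X is nearly singular
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)
print(beta_ridge)
```

Plain OLS would produce wildly unstable coefficients here; ridge splits the weight roughly evenly across the correlated pair, each coefficient landing near 0.5.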
Logistic Regression
Binary classification via the logistic function:
$$P(y = 1 \mid x) = \sigma(\beta^\top x) = \frac{1}{1 + e^{-\beta^\top x}}$$
Training: minimise negative log-likelihood (log-loss):
$$\mathcal{L}(\beta) = -\sum_{i=1}^{n} \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right], \quad p_i = \sigma(\beta^\top x_i)$$
No closed form; solved via gradient descent or iteratively reweighted least squares (IRLS). The decision boundary is a hyperplane.
Coefficient interpretation: $e^{\beta_j}$ is the multiplicative change in odds for a one-unit increase in $x_j$.
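The gradient-descent route mentioned above can be sketched directly: the gradient of the log-loss is $X^\top(\hat{p} - y)$, and exponentiating the fitted slope recovers the odds ratio. The data, learning rate, and iteration count below are illustrative assumptions, not tuned values.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one feature
beta_true = np.array([-1.0, 2.0])                      # illustrative coefficients
p = 1.0 / (1.0 + np.exp(-X @ beta_true))
y = rng.binomial(1, p)

# Gradient descent on the (averaged) log-loss: grad = X'(sigmoid(Xb) - y) / n
beta = np.zeros(2)
for _ in range(5000):
    p_hat = 1.0 / (1.0 + np.exp(-X @ beta))
    beta = beta - 0.5 * X.T @ (p_hat - y) / n

odds_ratio = np.exp(beta[1])  # multiplicative change in odds per unit of x
print(beta, odds_ratio)
```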
Generalised Linear Models (GLMs)
Three components:
- Random component: exponential family (Gaussian, Poisson, Gamma, Binomial, Tweedie, …)
- Linear predictor: $\eta = X\beta$
- Link function: $g(\mu) = \eta$, where $\mu = \mathbb{E}[y \mid X]$
Common GLM families:
| Family | Link | Use case |
|---|---|---|
| Gaussian | Identity | Continuous symmetric targets |
| Binomial | Logit | Binary classification |
| Poisson | Log | Count data (e.g., claims frequency) |
| Gamma | Log/Inverse | Positive continuous (e.g., claim severity) |
| Tweedie | Log | Zero-inflated positive continuous (pure premium) |
Tweedie with variance power $1 < p < 2$ is the standard model for insurance pure premium (compound Poisson-Gamma).
Estimation: maximum likelihood via IRLS. The estimated parameter covariance matrix, $\hat{\phi}\,(X^\top \hat{W} X)^{-1}$ with $\hat{W}$ the final IRLS weights, enables Wald tests and confidence intervals for each coefficient.
Deviance: GLM analogue of RSS; measures goodness-of-fit relative to a saturated model. Null deviance minus model deviance measures explained variation, and $1 - D_{\text{model}}/D_{\text{null}}$ serves as a pseudo-$R^2$ analogous to $R^2$.
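IRLS for a Poisson GLM with log link fits in a few lines: at each step the working weights are $W = \mathrm{diag}(\mu)$ and the working response is $z = \eta + (y - \mu)/\mu$, and a weighted least-squares solve updates $\beta$. A minimal numpy sketch on synthetic count data (the coefficients and sample size are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, 0.3])                 # illustrative coefficients
y = rng.poisson(np.exp(X @ beta_true))           # counts with log-link mean

beta = np.zeros(2)
for _ in range(25):
    eta = X @ beta
    mu = np.exp(eta)
    z = eta + (y - mu) / mu                      # working response
    W = mu                                       # diagonal IRLS weights
    # Weighted least squares: (X'WX) beta = X'Wz
    beta = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))

# Poisson deviance: 2 * [sum_{y>0} y*log(y/mu) - sum (y - mu)]
mu = np.exp(X @ beta)
pos = y > 0
dev = 2 * (np.sum(y[pos] * np.log(y[pos] / mu[pos])) - np.sum(y - mu))
print(beta, dev)
```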
Multiplicative GLM (Insurance Standard)
With log link on Poisson or Gamma:
$$\hat{\mu} = \exp\!\Big(\beta_0 + \sum_j \beta_j x_j\Big) = e^{\beta_0} \prod_j e^{\beta_j x_j}$$
Each exponentiated coefficient $e^{\beta_j}$ is a rating factor multiplying the base rate $e^{\beta_0}$, making the model directly interpretable and auditable for regulatory purposes.
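The multiplicative structure can be made concrete with a toy rating calculation. The coefficient values and factor names below are hypothetical, chosen only to show how a premium decomposes into base rate times factors:

```python
import math

# Hypothetical fitted log-link coefficients (illustrative values only)
base_rate = math.exp(-2.0)                       # exp(beta_0)
factors = {
    "young_driver": math.exp(0.4),               # exp(beta_1), applies if x_1 = 1
    "urban":        math.exp(0.25),              # exp(beta_2), applies if x_2 = 1
}

# Premium = base rate times the product of the applicable rating factors,
# which equals exp(beta_0 + beta_1 + beta_2) for this risk profile
premium = base_rate * factors["young_driver"] * factors["urban"]
print(round(premium, 4))
```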
Applications
- Ridge/Lasso: high-dimensional feature matrices where $p \gg n$ (more features than observations)
- Logistic regression: credit scoring, fraud scoring baselines
- Poisson GLM: claim frequency modelling (claims per year of exposure)
- Gamma GLM: claim severity modelling (average cost per claim)
- Tweedie GLM: direct pure premium modelling
Trade-offs
- Linear models assume the true relationship is (approximately) linear or linearisable via the link function — violated by complex non-linear patterns.
- Interpretability advantage over trees disappears when many interaction terms are added.
- Lasso is brittle with highly correlated features; Elastic Net is more stable.