Latent Variable Models

Core Idea

Latent variable models introduce unobserved (hidden) variables $z$ to explain structure in the observed data $x$. The joint model $p(x, z)$ is richer than any direct model of $x$ alone, and marginalising over $z$ gives a flexible marginal $p(x) = \int p(x, z)\,dz$.

The core algorithmic challenge is inference: computing the posterior $p(z \mid x)$, which is generally intractable and requires approximation via EM, variational inference, or MCMC.

Mathematical Formulation

The EM Algorithm (General Form)

Goal: maximise the marginal log-likelihood $\log p(x \mid \theta) = \log \sum_z p(x, z \mid \theta)$.

EM iterates:

  • E-step: compute the expected complete-data log-likelihood: $Q(\theta \mid \theta^{(t)}) = \mathbb{E}_{z \sim p(z \mid x, \theta^{(t)})}\left[\log p(x, z \mid \theta)\right]$
  • M-step: update parameters: $\theta^{(t+1)} = \arg\max_\theta Q(\theta \mid \theta^{(t)})$

EM is guaranteed to monotonically increase $\log p(x \mid \theta)$ at each iteration and converges to a local maximum.
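
The two steps above can be sketched for a two-component 1-D Gaussian mixture (a minimal illustration; the data, initialisation, and component count are assumptions, not from the text):

```python
import math

def normal_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm(data, n_iter=50):
    # Crude initialisation: put the two means at the data extremes.
    mu = [min(data), max(data)]
    var = [1.0, 1.0]
    pi = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibilities r[i][k] = p(z_i = k | x_i, theta).
        r = []
        for x in data:
            w = [pi[k] * normal_pdf(x, mu[k], var[k]) for k in range(2)]
            s = sum(w)
            r.append([wk / s for wk in w])
        # M-step: re-estimate weights, means, variances from responsibilities.
        for k in range(2):
            nk = sum(ri[k] for ri in r)
            pi[k] = nk / len(data)
            mu[k] = sum(ri[k] * x for ri, x in zip(r, data)) / nk
            var[k] = sum(ri[k] * (x - mu[k]) ** 2 for ri, x in zip(r, data)) / nk
            var[k] = max(var[k], 1e-6)  # guard against variance collapse
    return pi, mu, var

data = [-2.1, -1.9, -2.0, -1.8, 2.0, 2.2, 1.9, 2.1]
pi, mu, var = em_gmm(data)
```

After a few iterations the means settle near the two cluster centres, illustrating the monotone-improvement behaviour described above.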

Hidden Markov Model (HMM)

An HMM models a sequence $x_{1:T}$ by positing a hidden Markov chain $z_{1:T}$:

$$p(x_{1:T}, z_{1:T}) = p(z_1) \prod_{t=2}^{T} p(z_t \mid z_{t-1}) \prod_{t=1}^{T} p(x_t \mid z_t)$$

Parameters:

  • Initial distribution $\pi_k = p(z_1 = k)$
  • Transition matrix $A_{jk} = p(z_t = k \mid z_{t-1} = j)$
  • Emission distribution $p(x_t \mid z_t = k)$ (e.g. Gaussian for continuous observations)

Inference algorithms:

  • Forward-backward: computes $p(z_t \mid x_{1:T})$ — the marginal posterior at each step; it provides the E-step of the Baum-Welch (EM) algorithm.
  • Viterbi: finds the MAP sequence $\arg\max_{z_{1:T}} p(z_{1:T} \mid x_{1:T})$ — the most probable hidden state path.
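
Viterbi decoding can be sketched in a few lines of dynamic programming (the two-state toy parameters below are illustrative assumptions, not from the text):

```python
def viterbi(obs, pi, A, B):
    """Most probable hidden state path for observation list `obs`.

    pi[k]   : initial probability of state k
    A[j][k] : transition probability j -> k
    B[k][o] : emission probability of symbol o from state k
    """
    K = len(pi)
    # delta[k] = probability of the best path ending in state k;
    # psi stores backpointers for each later time step.
    delta = [pi[k] * B[k][obs[0]] for k in range(K)]
    psi = []
    for o in obs[1:]:
        prev = delta
        delta, back = [], []
        for k in range(K):
            best_j = max(range(K), key=lambda j: prev[j] * A[j][k])
            delta.append(prev[best_j] * A[best_j][k] * B[k][o])
            back.append(best_j)
        psi.append(back)
    # Backtrack from the best final state.
    path = [max(range(K), key=lambda k: delta[k])]
    for back in reversed(psi):
        path.append(back[path[-1]])
    return list(reversed(path))

# Two hidden states, two observation symbols.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
print(viterbi([0, 0, 1], pi, A, B))  # -> [0, 0, 1]
```

In practice probabilities are kept in log space to avoid underflow on long sequences.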

Applications: speech recognition (phoneme states), gene finding (exon/intron states), regime detection in time series.

Connection to VAE

The Variational Autoencoder (VAE) is a latent variable model with:

  • Prior $p(z) = \mathcal{N}(0, I)$
  • Likelihood $p_\theta(x \mid z)$ = decoder neural network
  • Approximate posterior $q_\phi(z \mid x)$ = encoder neural network

EM is intractable here because the posterior $p_\theta(z \mid x)$ is complex; instead, variational inference maximises the ELBO (see Representation Learning).
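
One piece of the ELBO is available in closed form: with a Gaussian encoder $q = \mathcal{N}(\mu, \sigma^2)$ and standard normal prior, the KL term per latent dimension is $\tfrac{1}{2}(\mu^2 + \sigma^2 - \log \sigma^2 - 1)$. A toy sketch (function name and values are illustrative assumptions):

```python
import math

def gaussian_kl(mu, sigma):
    """KL(N(mu, sigma^2) || N(0, 1)) for one latent dimension."""
    return 0.5 * (mu ** 2 + sigma ** 2 - math.log(sigma ** 2) - 1.0)

# KL vanishes when the approximate posterior matches the prior...
print(gaussian_kl(0.0, 1.0))  # -> 0.0
# ...and grows as the encoder moves away from it.
print(gaussian_kl(1.0, 0.5))
```

This closed form is why VAE training needs Monte Carlo estimation only for the reconstruction term, not for the KL regulariser.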

Inductive Bias

  • Latent variables encode structured uncertainty: the model commits to a prior over what is unobserved and infers its value from data.
  • HMMs assume Markov property: the future is conditionally independent of the past given the current hidden state.

Training Objective

Maximise the marginal log-likelihood $\log p(x \mid \theta)$ via:

  • Exact EM for finite discrete $z$ (GMMs, HMMs)
  • Variational EM / ELBO maximisation for continuous or complex $z$ (VAEs)

Strengths

  • Principled handling of uncertainty and missing observations.
  • HMMs model temporal dynamics with interpretable state structure.
  • Can separate generative factors into latent dimensions.

Weaknesses

  • EM converges to local optima; sensitive to initialisation.
  • HMM inference is $O(K^2 T)$ for $K$ states and sequence length $T$ — expensive for many states or long sequences.
  • Choosing the number of hidden states / latent dimensions requires model selection.

Variants

  • Factorial HMM: multiple independent Markov chains; exponential state space.
  • Input-Output HMM (IO-HMM): transitions conditioned on inputs.
  • Deep latent Gaussian models: multi-layer latent variable models.
  • Normalising flows: exact likelihood and inference via invertible transformations (change of variables).

References

  • Rabiner, L.R. (1989). “A tutorial on hidden Markov models and selected applications in speech recognition.” Proc. IEEE.
  • Dempster, A.P., Laird, N.M. & Rubin, D.B. (1977). “Maximum likelihood from incomplete data via the EM algorithm.” JRSS-B.
  • Kingma, D.P. & Welling, M. (2014). “Auto-Encoding Variational Bayes.” ICLR.