Latent Variable Models
Core Idea
Latent variable models introduce unobserved (hidden) variables $z$ to explain structure in the observed data $x$. The joint model $p(x, z \mid \theta)$ is richer than any direct model of $x$ alone, and marginalising over $z$ gives a flexible marginal $p(x \mid \theta) = \int p(x, z \mid \theta)\, dz$.
The core algorithmic challenge is inference: computing the posterior $p(z \mid x, \theta)$, which is generally intractable and requires approximation via EM, variational inference, or MCMC.
Mathematical Formulation
The EM Algorithm (General Form)
Goal: maximise the marginal log-likelihood $\log p(x \mid \theta) = \log \sum_z p(x, z \mid \theta)$ (an integral for continuous $z$).
EM iterates:
- E-step: compute the expected complete-data log-likelihood under the current posterior:
  $Q(\theta \mid \theta^{(t)}) = \mathbb{E}_{z \sim p(z \mid x, \theta^{(t)})}\left[\log p(x, z \mid \theta)\right]$
- M-step: update parameters:
  $\theta^{(t+1)} = \arg\max_\theta Q(\theta \mid \theta^{(t)})$
EM is guaranteed to monotonically increase $\log p(x \mid \theta)$ at each iteration and converges to a stationary point, typically a local maximum.
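The two steps above can be made concrete with a minimal EM sketch for a 1-D two-component Gaussian mixture; the synthetic data, initial values, and iteration count are illustrative assumptions, not prescriptions:

```python
# Minimal EM sketch for a 1-D mixture of two Gaussians.
# Data and initialisation are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 200)])

# Initialise parameters: mixture weights, means, variances
w = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
var = np.array([1.0, 1.0])

def log_gauss(x, mu, var):
    return -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

for _ in range(50):
    # E-step: responsibilities r[n, k] = p(z_n = k | x_n, theta)
    log_r = np.log(w) + log_gauss(x[:, None], mu, var)
    log_r -= log_r.max(axis=1, keepdims=True)   # stabilise before exp
    r = np.exp(log_r)
    r /= r.sum(axis=1, keepdims=True)

    # M-step: closed-form updates of weights, means, variances
    Nk = r.sum(axis=0)
    w = Nk / len(x)
    mu = (r * x[:, None]).sum(axis=0) / Nk
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / Nk

print(sorted(mu.round(1)))  # means recovered near -2 and 3
```

The E-step here computes the posterior exactly because $z$ is finite and discrete; the M-step maximises $Q$ in closed form, which is what makes exact EM attractive for GMMs.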
Hidden Markov Model (HMM)
An HMM models a sequence $x_{1:T}$ by positing a hidden Markov chain $z_{1:T}$:
$p(x_{1:T}, z_{1:T}) = p(z_1) \prod_{t=2}^{T} p(z_t \mid z_{t-1}) \prod_{t=1}^{T} p(x_t \mid z_t)$
Parameters:
- Initial distribution $\pi_i = p(z_1 = i)$
- Transition matrix $A_{ij} = p(z_t = j \mid z_{t-1} = i)$
- Emission distribution $p(x_t \mid z_t = i)$ (e.g. Gaussian for continuous observations)
Inference algorithms:
- Forward-backward: computes the marginal posteriors $\gamma_t(i) = p(z_t = i \mid x_{1:T})$ at each step; the inference subroutine behind Baum-Welch (EM for HMMs).
- Viterbi: finds the MAP sequence $\arg\max_{z_{1:T}} p(z_{1:T} \mid x_{1:T})$, the most probable hidden state path.
Applications: speech recognition (phoneme states), gene finding (exon/intron states), regime detection in time series.
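The Viterbi recursion above can be sketched in log space for a small discrete-emission HMM; the toy parameters below are illustrative assumptions, not from the text:

```python
# Log-space Viterbi sketch for a 2-state, 2-symbol HMM.
# Parameters are illustrative toy values.
import numpy as np

log_pi = np.log([0.6, 0.4])               # initial distribution pi
log_A = np.log([[0.7, 0.3], [0.4, 0.6]])  # transitions A[i, j]
log_B = np.log([[0.9, 0.1], [0.2, 0.8]])  # emissions B[state, symbol]

def viterbi(obs):
    T, K = len(obs), len(log_pi)
    delta = log_pi + log_B[:, obs[0]]     # best log-prob ending in each state
    psi = np.zeros((T, K), dtype=int)     # back-pointers
    for t in range(1, T):
        scores = delta[:, None] + log_A   # scores[i, j]: from state i to j
        psi[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_B[:, obs[t]]
    # Backtrack the most probable hidden state path
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]

print(viterbi([0, 0, 1, 1, 1]))  # → [0, 0, 1, 1, 1]
```

Working in log space avoids the numerical underflow that plagues naive probability products over long sequences; each step is a max over $K^2$ transition scores, matching the $O(K^2 T)$ cost noted under Weaknesses.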
Connection to VAE
The Variational Autoencoder (VAE) is a latent variable model with:
- Prior $p(z) = \mathcal{N}(0, I)$
- Likelihood $p_\theta(x \mid z)$: a decoder neural network
- Approximate posterior $q_\phi(z \mid x)$: an encoder neural network
Exact EM is intractable here because the posterior $p_\theta(z \mid x)$ has no closed form; instead, variational inference maximises the ELBO (see Representation Learning).
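A minimal numerical sketch of the ELBO for such a model, assuming a diagonal-Gaussian encoder output, the standard-normal prior above, and a toy linear decoder (all numbers are illustrative assumptions):

```python
# ELBO sketch: analytic KL to N(0, I) plus a Monte Carlo
# reconstruction term via the reparameterisation trick.
# Encoder outputs, decoder weights, and x are toy values.
import numpy as np

rng = np.random.default_rng(0)

mu_q = np.array([0.5, -0.3])       # encoder mean for q(z|x)
log_var_q = np.array([-1.0, -0.5])  # encoder log-variance

# KL[q(z|x) || N(0, I)] in closed form (Kingma & Welling, 2014)
kl = 0.5 * np.sum(np.exp(log_var_q) + mu_q**2 - 1.0 - log_var_q)

def log_lik(x, z):
    # toy decoder: p(x|z) = N(x; W z, I) with a fixed illustrative W
    W = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
    mean = W @ z
    return -0.5 * np.sum((x - mean) ** 2 + np.log(2 * np.pi))

# Monte Carlo estimate of E_q[log p(x|z)] with z = mu + sigma * eps
x = np.array([0.4, -0.2, 0.1])
eps = rng.standard_normal((100, 2))
z_samples = mu_q + np.exp(0.5 * log_var_q) * eps
recon = np.mean([log_lik(x, z) for z in z_samples])

elbo = recon - kl  # ELBO = E_q[log p(x|z)] - KL[q || p]
print(round(elbo, 2))
```

The reparameterisation $z = \mu + \sigma \epsilon$ is what lets gradients flow through the sampling step to the encoder parameters; in a real VAE the decoder mean would come from a neural network rather than a fixed matrix.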
Inductive Bias
- Latent variables encode structured uncertainty: the model commits to a prior over what is unobserved and infers its value from data.
- HMMs assume Markov property: the future is conditionally independent of the past given the current hidden state.
Training Objective
Maximise the marginal log-likelihood $\log p(x \mid \theta)$ via:
- Exact EM for finite discrete $z$ (GMMs, HMMs)
- Variational EM / ELBO maximisation for continuous or complex $z$ (VAEs)
Strengths
- Principled handling of uncertainty and missing observations.
- HMMs model temporal dynamics with interpretable state structure.
- Can separate generative factors into latent dimensions.
Weaknesses
- EM converges to local optima; sensitive to initialisation.
- HMM inference is $O(K^2 T)$ for $K$ states over $T$ steps; expensive for many states or long sequences.
- Choosing the number of hidden states / latent dimensions requires model selection.
Variants
- Factorial HMM: multiple independent Markov chains; exponential state space.
- Input-Output HMM (IO-HMM): transitions conditioned on inputs.
- Deep latent Gaussian models: multi-layer latent variable models.
- Normalising flows: exact likelihood and inference via invertible transformations, with no approximate posterior needed.
References
- Rabiner, L.R. (1989). “A tutorial on hidden Markov models and selected applications in speech recognition.” Proc. IEEE.
- Dempster, A.P., Laird, N.M. & Rubin, D.B. (1977). “Maximum likelihood from incomplete data via the EM algorithm.” JRSS-B.
- Kingma, D.P. & Welling, M. (2014). “Auto-Encoding Variational Bayes.” ICLR.