Latent Variable Models

Core Idea

Latent variable models introduce unobserved (hidden) variables $z$ to explain structure in the observed data $x$. The joint model $p(x, z)$ is richer than any direct model of $x$ alone, and marginalising over $z$ gives a flexible marginal $p(x) = \int p(x, z)\,dz$.

The core algorithmic challenge is inference: computing the posterior $p(z \mid x)$, which is generally intractable and requires approximation via EM, variational inference, or MCMC.

Mathematical Formulation

The EM Algorithm (General Form)

Goal: maximise the marginal log-likelihood $\log p(x \mid \theta) = \log \sum_z p(x, z \mid \theta)$.

EM iterates:

  • E-step: compute the expected complete-data log-likelihood: $Q(\theta \mid \theta^{(t)}) = \mathbb{E}_{z \sim p(z \mid x, \theta^{(t)})}\left[\log p(x, z \mid \theta)\right]$
  • M-step: update parameters: $\theta^{(t+1)} = \arg\max_\theta Q(\theta \mid \theta^{(t)})$

EM is guaranteed to monotonically increase $\log p(x \mid \theta)$ at each iteration and converges to a local maximum.
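
The two steps above can be sketched for a two-component 1-D Gaussian mixture (a minimal illustration; the data, initialisation, and component count are assumptions, not from the text):

```python
import math

def normal_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm(data, n_iter=50):
    # Crude initialisation: put the two means at the data extremes.
    mu = [min(data), max(data)]
    var = [1.0, 1.0]
    pi = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibilities r[i][k] = p(z_i = k | x_i, theta).
        r = []
        for x in data:
            w = [pi[k] * normal_pdf(x, mu[k], var[k]) for k in range(2)]
            s = sum(w)
            r.append([wk / s for wk in w])
        # M-step: re-estimate weights, means, variances from responsibilities.
        for k in range(2):
            nk = sum(ri[k] for ri in r)
            pi[k] = nk / len(data)
            mu[k] = sum(ri[k] * x for ri, x in zip(r, data)) / nk
            var[k] = sum(ri[k] * (x - mu[k]) ** 2 for ri, x in zip(r, data)) / nk
            var[k] = max(var[k], 1e-6)  # guard against variance collapse
    return pi, mu, var

data = [-2.1, -1.9, -2.0, -1.8, 2.0, 2.2, 1.9, 2.1]
pi, mu, var = em_gmm(data)
```

After a few iterations the means settle near the two cluster centres, illustrating the monotone-improvement behaviour described above.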

Hidden Markov Model (HMM)

An HMM models a sequence $x_{1:T}$ by positing a hidden Markov chain $z_{1:T}$:

$$p(x_{1:T}, z_{1:T}) = p(z_1) \prod_{t=2}^{T} p(z_t \mid z_{t-1}) \prod_{t=1}^{T} p(x_t \mid z_t)$$

Parameters:

  • Initial distribution $\pi_k = p(z_1 = k)$
  • Transition matrix $A_{jk} = p(z_t = k \mid z_{t-1} = j)$
  • Emission distribution $p(x_t \mid z_t = k)$ (e.g. Gaussian for continuous observations)

Inference algorithms:

  • Forward-backward: computes $p(z_t \mid x_{1:T})$ — the marginal posterior at each step; it provides the E-step of the Baum-Welch (EM) algorithm.
  • Viterbi: finds the MAP sequence $\arg\max_{z_{1:T}} p(z_{1:T} \mid x_{1:T})$ — the most probable hidden state path.
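
Viterbi decoding can be sketched in a few lines of dynamic programming (the two-state toy parameters below are illustrative assumptions, not from the text):

```python
def viterbi(obs, pi, A, B):
    """Most probable hidden state path for observation list `obs`.

    pi[k]   : initial probability of state k
    A[j][k] : transition probability j -> k
    B[k][o] : emission probability of symbol o from state k
    """
    K = len(pi)
    # delta[k] = probability of the best path ending in state k;
    # psi stores backpointers for each later time step.
    delta = [pi[k] * B[k][obs[0]] for k in range(K)]
    psi = []
    for o in obs[1:]:
        prev = delta
        delta, back = [], []
        for k in range(K):
            best_j = max(range(K), key=lambda j: prev[j] * A[j][k])
            delta.append(prev[best_j] * A[best_j][k] * B[k][o])
            back.append(best_j)
        psi.append(back)
    # Backtrack from the best final state.
    path = [max(range(K), key=lambda k: delta[k])]
    for back in reversed(psi):
        path.append(back[path[-1]])
    return list(reversed(path))

# Two hidden states, two observation symbols.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
print(viterbi([0, 0, 1], pi, A, B))  # -> [0, 0, 1]
```

In practice probabilities are kept in log space to avoid underflow on long sequences.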

Applications: speech recognition (phoneme states), gene finding (exon/intron states), regime detection in time series.

Connection to VAE

The Variational Autoencoder (VAE) is a latent variable model with:

  • Prior $p(z) = \mathcal{N}(0, I)$
  • Likelihood $p_\theta(x \mid z)$ = decoder neural network
  • Approximate posterior $q_\phi(z \mid x)$ = encoder neural network

EM is intractable here because the posterior $p_\theta(z \mid x)$ is complex; instead, variational inference maximises the ELBO (see Representation Learning).
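
One piece of the ELBO is available in closed form: with a Gaussian encoder $q = \mathcal{N}(\mu, \sigma^2)$ and standard normal prior, the KL term per latent dimension is $\tfrac{1}{2}(\mu^2 + \sigma^2 - \log \sigma^2 - 1)$. A toy sketch (function name and values are illustrative assumptions):

```python
import math

def gaussian_kl(mu, sigma):
    """KL(N(mu, sigma^2) || N(0, 1)) for one latent dimension."""
    return 0.5 * (mu ** 2 + sigma ** 2 - math.log(sigma ** 2) - 1.0)

# KL vanishes when the approximate posterior matches the prior...
print(gaussian_kl(0.0, 1.0))  # -> 0.0
# ...and grows as the encoder moves away from it.
print(gaussian_kl(1.0, 0.5))
```

This closed form is why VAE training needs Monte Carlo estimation only for the reconstruction term, not for the KL regulariser.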

Inductive Bias

  • Latent variables encode structured uncertainty: the model commits to a prior over what is unobserved and infers its value from data.
  • HMMs assume Markov property: the future is conditionally independent of the past given the current hidden state.

Training Objective

Maximise the marginal log-likelihood $\log p(x \mid \theta)$ via:

  • Exact EM for finite discrete $z$ (GMMs, HMMs)
  • Variational EM / ELBO maximisation for continuous or complex $z$ (VAEs)

Strengths

  • Principled handling of uncertainty and missing observations.
  • HMMs model temporal dynamics with interpretable state structure.
  • Can separate generative factors into latent dimensions.

Weaknesses

  • EM converges to local optima; sensitive to initialisation.
  • HMM inference is $O(K^2 T)$ for $K$ states and sequence length $T$ — expensive for many states or long sequences.
  • Choosing the number of hidden states / latent dimensions requires model selection.

Variants

  • Factorial HMM: multiple independent Markov chains; exponential state space.
  • Input-Output HMM (IO-HMM): transitions conditioned on inputs.
  • Deep latent Gaussian models: multi-layer latent variable models.
  • Normalising flows: exact likelihood and inference via invertible transformations (change of variables).

References

  • Rabiner, L.R. (1989). “A tutorial on hidden Markov models and selected applications in speech recognition.” Proc. IEEE.
  • Dempster, A.P., Laird, N.M. & Rubin, D.B. (1977). “Maximum likelihood from incomplete data via the EM algorithm.” JRSS-B.
  • Kingma, D.P. & Welling, M. (2014). “Auto-Encoding Variational Bayes.” ICLR.