Graphical Models

Definition

Probabilistic graphical models (PGMs) represent joint distributions over many variables using a graph, where nodes are variables and edges encode conditional (in)dependencies. They provide a compact, interpretable representation of complex multivariate distributions.

Intuition

A joint distribution over $n$ binary variables requires $2^n - 1$ parameters, which is intractable for large $n$. Graphical models exploit conditional independence structure to represent the same distribution with far fewer parameters, and to perform efficient inference.

Formal Description

Directed Graphical Models (Bayesian Networks)

A Bayesian network (belief network) is a DAG where each node $X_i$ has a conditional probability table (CPT) specifying $P(X_i \mid \mathrm{Pa}(X_i))$, its distribution given its parents.

Joint distribution factorises as:

$$P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{Pa}(X_i))$$
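This factorisation can be sketched on the classic sprinkler network (Rain → Sprinkler, Rain → WetGrass, Sprinkler → WetGrass); all CPT values here are illustrative:

```python
from itertools import product

# Hypothetical CPTs, each entry giving P(node = True | parent assignment).
p_rain = 0.2
p_sprinkler = {True: 0.01, False: 0.4}            # keyed by Rain
p_wet = {(True, True): 0.99, (True, False): 0.8,  # keyed by (Sprinkler, Rain)
         (False, True): 0.9, (False, False): 0.0}

def joint(rain, sprinkler, wet):
    """P(R, S, W) = P(R) * P(S | R) * P(W | S, R)."""
    p = p_rain if rain else 1 - p_rain
    p *= p_sprinkler[rain] if sprinkler else 1 - p_sprinkler[rain]
    pw = p_wet[(sprinkler, rain)]
    return p * (pw if wet else 1 - pw)

# Sanity check: the factorised joint sums to 1 over all 8 assignments.
total = sum(joint(r, s, w) for r, s, w in product([True, False], repeat=3))
```

Three CPTs (1 + 2 + 4 = 7 parameters) replace the 7 free parameters of the full joint here; the savings grow exponentially with sparser graphs on more variables.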

D-separation: a criterion for reading off conditional independence from the graph structure. Key patterns:

  • Chain $A \to B \to C$: $A \perp C \mid B$ (B blocks the path)
  • Fork $A \leftarrow B \to C$: $A \perp C \mid B$ (common cause blocked by B)
  • Collider $A \to B \leftarrow C$: $A \perp C$, but $A \not\perp C \mid B$ (conditioning on B opens the path)

Parameter learning (MLE): for complete data, each CPT is estimated independently by counting.
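Counting-based MLE for a single CPT can be sketched as follows (the toy dataset of (Rain, Sprinkler) pairs is invented for illustration):

```python
from collections import Counter

# Complete data over (Rain, Sprinkler); estimate P(Sprinkler = True | Rain).
data = [(True, False), (True, False), (False, True),
        (False, False), (False, True), (True, True)]

joint_counts = Counter(data)                 # count(rain, sprinkler)
parent_counts = Counter(r for r, _ in data)  # count(rain)

# MLE for each parent value: count(parent, child=True) / count(parent).
cpt = {r: joint_counts[(r, True)] / parent_counts[r] for r in (True, False)}
```

Because the log-likelihood decomposes over nodes, every CPT in the network is estimated by the same independent counting procedure.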

Structure learning: score-based (maximise BIC/BDe over DAG structures) or constraint-based (PC algorithm: test conditional independence).

Undirected Graphical Models (Markov Random Fields)

Nodes connected by edges encoding potential functions:

$$P(X) = \frac{1}{Z} \prod_{c \in \mathcal{C}} \psi_c(X_c)$$

where $\mathcal{C}$ is the set of cliques and $Z = \sum_X \prod_{c} \psi_c(X_c)$ is the partition function.

Markov property: $X_i \perp X_j \mid X_{\text{rest}}$ for non-adjacent $X_i, X_j$ (pairwise Markov property).

Applications: image segmentation (each pixel is a node), social networks.

Gibbs random fields / Ising model: pairwise potentials on a lattice; extensively studied in statistical physics and computer vision.
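A minimal Ising sketch, with spins $x_i \in \{-1, +1\}$, pairwise potentials $\exp(\beta x_i x_j)$ on lattice edges, and the partition function computed by brute force (only feasible on a tiny grid; the coupling value is illustrative):

```python
import itertools
import math

BETA = 0.5                              # coupling strength (illustrative)
EDGES = [(0, 1), (2, 3), (0, 2), (1, 3)]  # edges of a 2x2 lattice, nodes 0..3

def unnormalised(x):
    """Product of pairwise potentials exp(beta * x_i * x_j) over edges."""
    return math.exp(BETA * sum(x[i] * x[j] for i, j in EDGES))

# Partition function Z by enumerating all 2^4 spin configurations.
states = list(itertools.product([-1, 1], repeat=4))
Z = sum(unnormalised(x) for x in states)

def prob(x):
    return unnormalised(x) / Z
```

Note that $Z$ already costs $2^n$ here; this exponential blow-up is exactly why approximate inference matters for MRFs.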

Inference

Exact inference:

  • Variable elimination: sum out variables in an ordering. Complexity depends on the treewidth.
  • Belief propagation (sum-product): message-passing on trees (exact) or graphs (loopy BP, approximate).
  • Junction tree algorithm: exact inference via converting the graph to a junction tree.
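Variable elimination can be sketched on a chain $A \to B \to C$: summing out $A$ produces an intermediate factor over $B$, which is then summed out to give $P(C)$ (all probabilities below are illustrative):

```python
# Binary chain A -> B -> C; eliminate A, then B, to obtain P(C).
p_a = {0: 0.6, 1: 0.4}
p_b_given_a = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}    # (b, a)
p_c_given_b = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.25, (1, 1): 0.75}  # (c, b)

# Eliminate A: tau_B(b) = sum_a P(a) * P(b | a)
tau_b = {b: sum(p_a[a] * p_b_given_a[(b, a)] for a in (0, 1)) for b in (0, 1)}

# Eliminate B: P(c) = sum_b tau_B(b) * P(c | b)
p_c = {c: sum(tau_b[b] * p_c_given_b[(c, b)] for b in (0, 1)) for c in (0, 1)}
```

On a chain each intermediate factor involves only one variable, so the cost is linear in the number of nodes; a bad elimination ordering on a denser graph can create factors over many variables, which is where treewidth enters.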

Approximate inference:

  • MCMC (Gibbs sampling): iteratively sample each variable from its conditional given current values.
  • Variational inference (mean field): minimise $\mathrm{KL}(q \,\|\, p)$ over a factorised family $q(X) = \prod_i q_i(X_i)$.
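Gibbs sampling can be sketched on a two-variable toy model: each step resamples one variable from its conditional given the other, and long-run sample frequencies approximate the marginals (the joint table is invented for illustration):

```python
import random

random.seed(0)
# Joint over two binary variables; true marginal P(x0 = 1) is 0.1 + 0.4 = 0.5.
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

def conditional(i, other):
    """P(x_i = 1 | x_other = other), obtained from the joint by renormalising."""
    if i == 0:
        num, den = joint[(1, other)], joint[(0, other)] + joint[(1, other)]
    else:
        num, den = joint[(other, 1)], joint[(other, 0)] + joint[(other, 1)]
    return num / den

x = [0, 0]
hits, n_iters, burn_in = 0, 20000, 1000
for t in range(n_iters):
    for i in (0, 1):  # one Gibbs sweep: resample each variable in turn
        x[i] = 1 if random.random() < conditional(i, x[1 - i]) else 0
    if t >= burn_in:
        hits += x[0]

p_x0 = hits / (n_iters - burn_in)  # Monte Carlo estimate of P(x0 = 1)
```

In a real graphical model the conditional of a variable depends only on its Markov blanket, which is what makes each Gibbs update cheap even in large networks.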

Hidden Markov Model (HMM)

A sequential Bayesian network with latent states $z_{1:T}$ and observations $x_{1:T}$:

$$P(x_{1:T}, z_{1:T}) = P(z_1) \prod_{t=2}^{T} P(z_t \mid z_{t-1}) \prod_{t=1}^{T} P(x_t \mid z_t)$$

Key algorithms:

  • Forward-backward: computes the smoothed posteriors $P(z_t \mid x_{1:T})$; supplies the expected counts for the E-step of EM.
  • Viterbi: finds the most likely state sequence via dynamic programming.
  • EM (Baum-Welch): learns transition and emission parameters from unlabelled sequences.
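Viterbi can be sketched for a two-state HMM in log-space; all parameters below are illustrative:

```python
import math

STATES = (0, 1)
log_pi = [math.log(0.6), math.log(0.4)]        # initial state log-probs
log_A = [[math.log(0.7), math.log(0.3)],       # transition log-probs A[r][s]
         [math.log(0.4), math.log(0.6)]]
log_B = [[math.log(0.9), math.log(0.1)],       # emission log-probs B[s][o]
         [math.log(0.2), math.log(0.8)]]

def viterbi(obs):
    """Most likely hidden state sequence via dynamic programming."""
    V = [[log_pi[s] + log_B[s][obs[0]] for s in STATES]]
    back = []
    for o in obs[1:]:
        prev, row, ptr = V[-1], [], []
        for s in STATES:
            # Best predecessor state for s at this step.
            best = max(STATES, key=lambda r: prev[r] + log_A[r][s])
            row.append(prev[best] + log_A[best][s] + log_B[s][o])
            ptr.append(best)
        V.append(row)
        back.append(ptr)
    # Backtrack from the best final state.
    path = [max(STATES, key=lambda s: V[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]
```

Working in log-space avoids the numerical underflow that multiplying many small probabilities would cause on long sequences; the forward-backward recursions are structured the same way, with max replaced by sum.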

Applications

  • Bayesian networks: clinical decision support, root cause analysis, causal reasoning
  • MRF: image segmentation, medical image analysis
  • HMM: speech recognition, CRF-based NER, gene finding, financial regime detection

Trade-offs

  • Exact inference is intractable for general graphs (NP-hard); tree-structured graphs are tractable.
  • Bayesian networks require careful structure specification or expensive structure learning.
  • Modern deep learning has largely replaced graphical models for perception tasks but PGMs remain valuable for causal modelling and interpretability.