Graphical Models
Definition
Probabilistic graphical models (PGMs) represent joint distributions over many variables using a graph, where nodes are variables and edges encode conditional (in)dependencies. They provide a compact, interpretable representation of complex multivariate distributions.
Intuition
A joint distribution over $n$ binary variables requires $2^n - 1$ parameters, which is intractable for large $n$. Graphical models exploit conditional independence structure to represent the same distribution with far fewer parameters, and to perform efficient inference.
Formal Description
Directed Graphical Models (Bayesian Networks)
A Bayesian network (belief network) is a DAG where each node $X_i$ has a conditional probability table (CPT): $P(X_i \mid \mathrm{Pa}(X_i))$, where $\mathrm{Pa}(X_i)$ denotes the parents of $X_i$ in the graph.
Joint distribution factorises as:

$$P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{Pa}(X_i))$$
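This factorisation can be checked numerically. A minimal sketch, assuming a three-node chain network A → B → C with made-up CPT values:

```python
# Sketch: joint of a 3-node chain network A -> B -> C from its CPTs.
# The network shape and all probabilities are invented for illustration.
from itertools import product

p_a = {0: 0.6, 1: 0.4}                       # P(A)
p_b_given_a = {0: {0: 0.7, 1: 0.3},          # P(B | A=a)
               1: {0: 0.2, 1: 0.8}}
p_c_given_b = {0: {0: 0.9, 1: 0.1},          # P(C | B=b)
               1: {0: 0.5, 1: 0.5}}

def joint(a, b, c):
    """P(A=a, B=b, C=c) = P(a) * P(b|a) * P(c|b)."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]

# A valid factorisation sums to 1 over all assignments.
total = sum(joint(a, b, c) for a, b, c in product([0, 1], repeat=3))
print(total)
```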
D-separation: a criterion for reading off conditional independence from the graph structure. Key patterns:
| Structure | Relationship |
|---|---|
| Chain: $A \to B \to C$ | $A \perp C \mid B$ (B blocks the path) |
| Fork: $A \leftarrow B \to C$ | $A \perp C \mid B$ (common cause blocked by B) |
| Collider: $A \to B \leftarrow C$ | $A \perp C$, but $A \not\perp C \mid B$ (conditioning on B opens the path) |
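The collider row is the counter-intuitive one. A small numeric check of "explaining away", assuming A and C are independent fair coins and B = A OR C (all choices here are illustrative):

```python
# Sketch: "explaining away" at a collider A -> B <- C, with A, C
# independent fair coins and B = A OR C (deterministic, illustrative).
from itertools import product

def p(a=None, b=None, c=None):
    """Sum the joint over all assignments consistent with the evidence."""
    total = 0.0
    for av, cv in product([0, 1], repeat=2):
        bv = av or cv
        if (a is None or av == a) and (b is None or bv == b) and (c is None or cv == c):
            total += 0.25  # P(A=av) * P(C=cv) = 0.5 * 0.5
    return total

print(p(a=1))                           # marginal: 0.5
print(p(a=1, b=1) / p(b=1))             # B=1 raises belief in A to 2/3
print(p(a=1, b=1, c=1) / p(b=1, c=1))   # C=1 explains B away: back to 0.5
```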
Parameter learning (MLE): for complete data, each CPT is estimated independently by counting.
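A minimal sketch of counting-based MLE for a single CPT $\hat{P}(B \mid A)$, using invented (a, b) samples:

```python
# Sketch: MLE of a CPT P(B | A) by counting in complete data.
# The samples below are invented toy data.
from collections import Counter

data = [(0, 0), (0, 1), (0, 0), (1, 1), (1, 1), (1, 0)]  # (a, b) pairs
pair_counts = Counter(data)
a_counts = Counter(a for a, _ in data)

# P_hat(B=b | A=a) = N(a, b) / N(a)
cpt = {(a, b): pair_counts[(a, b)] / a_counts[a]
       for (a, b) in pair_counts}
print(cpt[(0, 0)])  # 2 of the 3 rows with A=0 have B=0
```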
Structure learning: score-based (maximise BIC/BDe over DAG structures) or constraint-based (PC algorithm: test conditional independence).
Undirected Graphical Models (Markov Random Fields)
An undirected graph whose cliques carry non-negative potential functions; the joint factorises as:

$$P(x) = \frac{1}{Z} \prod_{c} \psi_c(x_c)$$

where $c$ ranges over the cliques of the graph and $Z = \sum_x \prod_c \psi_c(x_c)$ is the partition function.

Markov property: $X_i \perp X_j \mid X_{V \setminus \{i,j\}}$ for non-adjacent $X_i, X_j$.
Applications: image segmentation (each pixel is a node), social networks.
Gibbs random fields / Ising model: pairwise potentials on a grid of spins; extensively studied in statistical physics and computer vision.
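A sketch of an Ising-style pairwise MRF on a 2×2 spin grid, with an assumed coupling strength J = 0.5. It brute-forces the partition function, which is only feasible for tiny grids:

```python
# Sketch: Ising-style MRF on a 2x2 grid of spins in {-1, +1}, with
# pairwise potentials exp(J * s_i * s_j) on adjacent sites.
# The grid size and coupling J are illustrative choices.
from itertools import product
import math

J = 0.5  # coupling strength (assumed)
sites = [(0, 0), (0, 1), (1, 0), (1, 1)]
edges = [((0, 0), (0, 1)), ((1, 0), (1, 1)),  # horizontal neighbours
         ((0, 0), (1, 0)), ((0, 1), (1, 1))]  # vertical neighbours

def unnorm(spins):
    """Unnormalised probability: product of pairwise potentials."""
    return math.exp(sum(J * spins[i] * spins[j] for i, j in edges))

configs = [dict(zip(sites, s)) for s in product([-1, 1], repeat=4)]
Z = sum(unnorm(s) for s in configs)  # partition function, by brute force
print(unnorm(configs[0]) / Z)        # probability of the all-down grid
```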
Inference
Exact inference:
- Variable elimination: sum out variables in an ordering. Complexity depends on the treewidth.
- Belief propagation (sum-product): message-passing on trees (exact) or graphs (loopy BP, approximate).
- Junction tree algorithm: exact inference via converting the graph to a junction tree.
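Variable elimination can be shown on a three-node chain A → B → C with invented CPTs: summing out A produces an intermediate factor over B, and summing out B yields the marginal P(C):

```python
# Sketch: variable elimination on the chain A -> B -> C, computing
# P(C) by summing out A, then B. All CPT values are invented.
p_a = {0: 0.6, 1: 0.4}
p_b_given_a = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}
p_c_given_b = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}

# Eliminate A: tau1(b) = sum_a P(a) P(b | a)
tau1 = {b: sum(p_a[a] * p_b_given_a[a][b] for a in (0, 1)) for b in (0, 1)}
# Eliminate B: P(c) = sum_b tau1(b) P(c | b)
p_c = {c: sum(tau1[b] * p_c_given_b[b][c] for b in (0, 1)) for c in (0, 1)}
print(p_c)
```

Each elimination step only ever touches factors mentioning the variable being summed out, which is why the cost is governed by the largest intermediate factor (the treewidth), not by the full joint.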
Approximate inference:
- MCMC (Gibbs sampling): iteratively sample each variable from its conditional given current values.
- Variational inference (mean field): minimise $\mathrm{KL}(q \,\|\, p)$ over a factorised family $q(x) = \prod_i q_i(x_i)$.
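A sketch of Gibbs sampling for a posterior query P(A | C = 1) on a toy chain A → B → C (CPT values invented): each variable is resampled from its conditional given the current values of the others.

```python
# Sketch: Gibbs sampling for P(A = 1 | C = 1) in a chain A -> B -> C.
# All CPT values are invented; exact answer here is 0.56.
import random

random.seed(0)
p_a = {0: 0.6, 1: 0.4}
p_b_given_a = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}
p_c_given_b = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}

def sample(weights):
    """Draw 0 or 1 with probability proportional to weights[0], weights[1]."""
    return int(random.random() * sum(weights) >= weights[0])

a, b = 0, 0
hits, burn, n = 0, 1000, 50000
for t in range(burn + n):
    # A | B=b       ∝  P(a) P(b | a)        (C is blocked by B)
    a = sample([p_a[0] * p_b_given_a[0][b], p_a[1] * p_b_given_a[1][b]])
    # B | A=a, C=1  ∝  P(b | a) P(C=1 | b)
    b = sample([p_b_given_a[a][0] * p_c_given_b[0][1],
                p_b_given_a[a][1] * p_c_given_b[1][1]])
    if t >= burn:
        hits += a
print(hits / n)  # Monte Carlo estimate of P(A=1 | C=1), ~0.56
```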
Hidden Markov Model (HMM)
A sequential Bayesian network with latent states $z_t$ and observations $x_t$:

$$P(x_{1:T}, z_{1:T}) = P(z_1) \prod_{t=2}^{T} P(z_t \mid z_{t-1}) \prod_{t=1}^{T} P(x_t \mid z_t)$$
Key algorithms:
- Forward-backward: computes the smoothed posteriors $P(z_t \mid x_{1:T})$; provides the E-step for Baum-Welch EM.
- Viterbi: finds the most likely state sequence via dynamic programming.
- EM (Baum-Welch): learns transition and emission parameters from unlabelled sequences.
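A sketch of Viterbi decoding for a two-state HMM with invented initial, transition, and emission probabilities, worked in log-space to avoid underflow:

```python
# Sketch: Viterbi decoding for a 2-state HMM over binary observations.
# All parameter values are invented for illustration.
import math

states = [0, 1]
log_pi = [math.log(0.6), math.log(0.4)]    # initial P(z_1)
log_A = [[math.log(0.7), math.log(0.3)],   # transition P(z_t | z_{t-1})
         [math.log(0.4), math.log(0.6)]]
log_B = [[math.log(0.9), math.log(0.1)],   # emission P(x_t | z_t)
         [math.log(0.2), math.log(0.8)]]

def viterbi(obs):
    """Most likely state sequence via dynamic programming (log-space)."""
    delta = [log_pi[s] + log_B[s][obs[0]] for s in states]
    back = []
    for x in obs[1:]:
        # Best predecessor for each state, then the new max log-probabilities.
        prev = [max(states, key=lambda s2: delta[s2] + log_A[s2][s])
                for s in states]
        delta = [delta[prev[s]] + log_A[prev[s]][s] + log_B[s][x]
                 for s in states]
        back.append(prev)
    # Trace the best path backwards from the most probable final state.
    path = [max(states, key=lambda s: delta[s])]
    for prev in reversed(back):
        path.append(prev[path[-1]])
    return path[::-1]

print(viterbi([0, 0, 1, 1, 1]))  # strong emissions track the observations
```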
Applications
- Bayesian networks: clinical decision support, root cause analysis, causal reasoning
- MRF: image segmentation, medical image analysis
- HMM: speech recognition, CRF-based NER, gene finding, financial regime detection
Trade-offs
- Exact inference is intractable for general graphs (NP-hard); tree-structured graphs are tractable.
- Bayesian networks require careful structure specification or expensive structure learning.
- Modern deep learning has largely replaced graphical models for perception tasks, but PGMs remain valuable for causal modelling and interpretability.