Graphical Models

Definition

Probabilistic graphical models (PGMs) represent joint distributions over many variables using a graph, where nodes are variables and edges encode conditional (in)dependencies. They provide a compact, interpretable representation of complex multivariate distributions.

Intuition

A joint distribution over $n$ binary variables requires $2^n - 1$ parameters, which is intractable for large $n$. Graphical models exploit conditional independence structure to represent the same distribution with far fewer parameters, and to perform efficient inference.

Formal Description

Directed Graphical Models (Bayesian Networks)

A Bayesian network (belief network) is a DAG where each node $X_i$ has a conditional probability table (CPT) specifying $P(X_i \mid \mathrm{Pa}(X_i))$, its distribution given its parents.

Joint distribution factorises as:

$$P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{Pa}(X_i))$$
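This factorisation can be sketched on the classic sprinkler network (Rain → Sprinkler, Rain → WetGrass, Sprinkler → WetGrass); all CPT values here are illustrative:

```python
from itertools import product

# Hypothetical CPTs, each entry giving P(node = True | parent assignment).
p_rain = 0.2
p_sprinkler = {True: 0.01, False: 0.4}            # keyed by Rain
p_wet = {(True, True): 0.99, (True, False): 0.8,  # keyed by (Sprinkler, Rain)
         (False, True): 0.9, (False, False): 0.0}

def joint(rain, sprinkler, wet):
    """P(R, S, W) = P(R) * P(S | R) * P(W | S, R)."""
    p = p_rain if rain else 1 - p_rain
    p *= p_sprinkler[rain] if sprinkler else 1 - p_sprinkler[rain]
    pw = p_wet[(sprinkler, rain)]
    return p * (pw if wet else 1 - pw)

# Sanity check: the factorised joint sums to 1 over all 8 assignments.
total = sum(joint(r, s, w) for r, s, w in product([True, False], repeat=3))
```

Three CPTs (1 + 2 + 4 = 7 parameters) replace the 7 free parameters of the full joint here; the savings grow exponentially with sparser graphs on more variables.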

D-separation: a criterion for reading off conditional independence from the graph structure. Key patterns:

  • Chain $A \to B \to C$: $A \perp C \mid B$ (B blocks the path)
  • Fork $A \leftarrow B \to C$: $A \perp C \mid B$ (common cause blocked by B)
  • Collider $A \to B \leftarrow C$: $A \perp C$, but $A \not\perp C \mid B$ (conditioning on B opens the path)

Parameter learning (MLE): for complete data, each CPT is estimated independently by counting.
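Counting-based MLE for a single CPT can be sketched as follows (the toy dataset of (Rain, Sprinkler) pairs is invented for illustration):

```python
from collections import Counter

# Complete data over (Rain, Sprinkler); estimate P(Sprinkler = True | Rain).
data = [(True, False), (True, False), (False, True),
        (False, False), (False, True), (True, True)]

joint_counts = Counter(data)                 # count(rain, sprinkler)
parent_counts = Counter(r for r, _ in data)  # count(rain)

# MLE for each parent value: count(parent, child=True) / count(parent).
cpt = {r: joint_counts[(r, True)] / parent_counts[r] for r in (True, False)}
```

Because the log-likelihood decomposes over nodes, every CPT in the network is estimated by the same independent counting procedure.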

Structure learning: score-based (maximise BIC/BDe over DAG structures) or constraint-based (PC algorithm: test conditional independence).

Undirected Graphical Models (Markov Random Fields)

Nodes connected by edges encoding potential functions:

$$P(X) = \frac{1}{Z} \prod_{c \in \mathcal{C}} \psi_c(X_c)$$

where $\mathcal{C}$ is the set of cliques and $Z = \sum_X \prod_{c} \psi_c(X_c)$ is the partition function.

Markov property: $X_i \perp X_j \mid X_{\text{rest}}$ for non-adjacent $X_i, X_j$ (pairwise Markov property).

Applications: image segmentation (each pixel is a node), social networks.

Gibbs random fields / Ising model: pairwise potentials on a lattice; extensively studied in statistical physics and computer vision.
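A minimal Ising sketch, with spins $x_i \in \{-1, +1\}$, pairwise potentials $\exp(\beta x_i x_j)$ on lattice edges, and the partition function computed by brute force (only feasible on a tiny grid; the coupling value is illustrative):

```python
import itertools
import math

BETA = 0.5                              # coupling strength (illustrative)
EDGES = [(0, 1), (2, 3), (0, 2), (1, 3)]  # edges of a 2x2 lattice, nodes 0..3

def unnormalised(x):
    """Product of pairwise potentials exp(beta * x_i * x_j) over edges."""
    return math.exp(BETA * sum(x[i] * x[j] for i, j in EDGES))

# Partition function Z by enumerating all 2^4 spin configurations.
states = list(itertools.product([-1, 1], repeat=4))
Z = sum(unnormalised(x) for x in states)

def prob(x):
    return unnormalised(x) / Z
```

Note that $Z$ already costs $2^n$ here; this exponential blow-up is exactly why approximate inference matters for MRFs.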

Inference

Exact inference:

  • Variable elimination: sum out variables in an ordering. Complexity depends on the treewidth.
  • Belief propagation (sum-product): message-passing on trees (exact) or graphs (loopy BP, approximate).
  • Junction tree algorithm: exact inference via converting the graph to a junction tree.
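Variable elimination can be sketched on a chain $A \to B \to C$: summing out $A$ produces an intermediate factor over $B$, which is then summed out to give $P(C)$ (all probabilities below are illustrative):

```python
# Binary chain A -> B -> C; eliminate A, then B, to obtain P(C).
p_a = {0: 0.6, 1: 0.4}
p_b_given_a = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}    # (b, a)
p_c_given_b = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.25, (1, 1): 0.75}  # (c, b)

# Eliminate A: tau_B(b) = sum_a P(a) * P(b | a)
tau_b = {b: sum(p_a[a] * p_b_given_a[(b, a)] for a in (0, 1)) for b in (0, 1)}

# Eliminate B: P(c) = sum_b tau_B(b) * P(c | b)
p_c = {c: sum(tau_b[b] * p_c_given_b[(c, b)] for b in (0, 1)) for c in (0, 1)}
```

On a chain each intermediate factor involves only one variable, so the cost is linear in the number of nodes; a bad elimination ordering on a denser graph can create factors over many variables, which is where treewidth enters.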

Approximate inference:

  • MCMC (Gibbs sampling): iteratively sample each variable from its conditional given current values.
  • Variational inference (mean field): minimise $\mathrm{KL}(q \,\|\, p)$ over a factorised family $q(X) = \prod_i q_i(X_i)$.
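Gibbs sampling can be sketched on a two-variable toy model: each step resamples one variable from its conditional given the other, and long-run sample frequencies approximate the marginals (the joint table is invented for illustration):

```python
import random

random.seed(0)
# Joint over two binary variables; true marginal P(x0 = 1) is 0.1 + 0.4 = 0.5.
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

def conditional(i, other):
    """P(x_i = 1 | x_other = other), obtained from the joint by renormalising."""
    if i == 0:
        num, den = joint[(1, other)], joint[(0, other)] + joint[(1, other)]
    else:
        num, den = joint[(other, 1)], joint[(other, 0)] + joint[(other, 1)]
    return num / den

x = [0, 0]
hits, n_iters, burn_in = 0, 20000, 1000
for t in range(n_iters):
    for i in (0, 1):  # one Gibbs sweep: resample each variable in turn
        x[i] = 1 if random.random() < conditional(i, x[1 - i]) else 0
    if t >= burn_in:
        hits += x[0]

p_x0 = hits / (n_iters - burn_in)  # Monte Carlo estimate of P(x0 = 1)
```

In a real graphical model the conditional of a variable depends only on its Markov blanket, which is what makes each Gibbs update cheap even in large networks.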

Hidden Markov Model (HMM)

A sequential Bayesian network with latent states $z_{1:T}$ and observations $x_{1:T}$:

$$P(x_{1:T}, z_{1:T}) = P(z_1) \prod_{t=2}^{T} P(z_t \mid z_{t-1}) \prod_{t=1}^{T} P(x_t \mid z_t)$$

Key algorithms:

  • Forward-backward: computes the smoothed posteriors $P(z_t \mid x_{1:T})$; supplies the expected counts for the E-step of EM.
  • Viterbi: finds the most likely state sequence via dynamic programming.
  • EM (Baum-Welch): learns transition and emission parameters from unlabelled sequences.
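Viterbi can be sketched for a two-state HMM in log-space; all parameters below are illustrative:

```python
import math

STATES = (0, 1)
log_pi = [math.log(0.6), math.log(0.4)]        # initial state log-probs
log_A = [[math.log(0.7), math.log(0.3)],       # transition log-probs A[r][s]
         [math.log(0.4), math.log(0.6)]]
log_B = [[math.log(0.9), math.log(0.1)],       # emission log-probs B[s][o]
         [math.log(0.2), math.log(0.8)]]

def viterbi(obs):
    """Most likely hidden state sequence via dynamic programming."""
    V = [[log_pi[s] + log_B[s][obs[0]] for s in STATES]]
    back = []
    for o in obs[1:]:
        prev, row, ptr = V[-1], [], []
        for s in STATES:
            # Best predecessor state for s at this step.
            best = max(STATES, key=lambda r: prev[r] + log_A[r][s])
            row.append(prev[best] + log_A[best][s] + log_B[s][o])
            ptr.append(best)
        V.append(row)
        back.append(ptr)
    # Backtrack from the best final state.
    path = [max(STATES, key=lambda s: V[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]
```

Working in log-space avoids the numerical underflow that multiplying many small probabilities would cause on long sequences; the forward-backward recursions are structured the same way, with max replaced by sum.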

Applications

  • Bayesian networks: clinical decision support, root cause analysis, causal reasoning
  • MRF: image segmentation, medical image analysis
  • HMM: speech recognition, CRF-based NER, gene finding, financial regime detection

Trade-offs

  • Exact inference is intractable for general graphs (NP-hard); tree-structured graphs are tractable.
  • Bayesian networks require careful structure specification or expensive structure learning.
  • Modern deep learning has largely replaced graphical models for perception tasks but PGMs remain valuable for causal modelling and interpretability.