Probability Theory
Definition
A mathematical framework for quantifying uncertainty. Assigns a number in $[0, 1]$ to events drawn from a sample space, obeying Kolmogorov’s axioms.
Intuition
Probability formalises the intuition that some outcomes are more likely than others. The axioms ensure probabilities are consistent: they don’t go negative, they add up to 1 across all possibilities, and disjoint events combine additively.
Formal Description
Sample space and events
- Sample space $\Omega$: the set of all possible outcomes.
- Event $A \subseteq \Omega$: a subset of outcomes.
- Probability measure $P$ on a $\sigma$-algebra $\mathcal{F}$ of events, satisfying:
  - $P(A) \ge 0$ for all $A \in \mathcal{F}$
  - $P(\Omega) = 1$
  - Countable additivity: $P\bigl(\bigcup_i A_i\bigr) = \sum_i P(A_i)$ for pairwise disjoint $A_i$
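The axioms can be checked concretely on a finite sample space. A minimal sketch in Python, using a fair die as a hypothetical example (the `prob` helper is illustrative, not a library function):

```python
from fractions import Fraction

# Finite sample space: one roll of a fair die (hypothetical example).
omega = {1, 2, 3, 4, 5, 6}
p = {outcome: Fraction(1, 6) for outcome in omega}

def prob(event):
    """P(A) for an event A ⊆ Ω: sum the point masses."""
    return sum(p[o] for o in event)

# The axioms hold: non-negativity, P(Ω) = 1, additivity on disjoint events.
evens, odds = {2, 4, 6}, {1, 3, 5}
assert prob(omega) == 1
assert prob(evens | odds) == prob(evens) + prob(odds)
```

Using exact `Fraction` arithmetic avoids floating-point noise when verifying identities like additivity.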
Derived rules
Conditional probability
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \qquad P(B) > 0$$
Chain rule (product rule): $P(A \cap B) = P(A \mid B)\,P(B)$
For $n$ events: $P(A_1 \cap \cdots \cap A_n) = \prod_{i=1}^{n} P(A_i \mid A_1, \ldots, A_{i-1})$
Total probability
Given a partition $\{B_i\}$ of $\Omega$:
$$P(A) = \sum_i P(A \mid B_i)\,P(B_i)$$
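A quick numerical sketch of the law of total probability, with hypothetical urn numbers chosen purely for illustration:

```python
from fractions import Fraction

# Hypothetical setup: urn choice partitions the sample space.
# Urn 1 (picked with prob 1/3) holds 3 red / 1 blue; urn 2 holds 1 red / 3 blue.
p_b = {"urn1": Fraction(1, 3), "urn2": Fraction(2, 3)}
p_red_given_b = {"urn1": Fraction(3, 4), "urn2": Fraction(1, 4)}

# P(red) = Σ_i P(red | B_i) P(B_i)
p_red = sum(p_red_given_b[b] * p_b[b] for b in p_b)
print(p_red)  # 1/3 · 3/4 + 2/3 · 1/4 = 1/4 + 1/6 = 5/12
```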
Bayes’ theorem
$$P(H \mid D) = \frac{P(D \mid H)\,P(H)}{P(D)}$$
This is the engine of Bayesian inference: $P(H \mid D)$ is the posterior, $P(H)$ is the prior, and $P(D \mid H)$ is the likelihood.
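A sketch of Bayes’ theorem in action. The diagnostic-test numbers (base rate, sensitivity, false-positive rate) are made up for illustration:

```python
# Hypothetical diagnostic-test numbers, chosen only for illustration.
p_h = 0.01              # prior P(H): base rate of the condition
p_d_given_h = 0.99      # likelihood P(D | H): test sensitivity
p_d_given_not_h = 0.05  # false-positive rate P(D | ¬H)

# Evidence P(D) via total probability, then the posterior via Bayes' theorem.
p_d = p_d_given_h * p_h + p_d_given_not_h * (1 - p_h)
posterior = p_d_given_h * p_h / p_d
print(round(posterior, 4))  # ≈ 0.1667: even a positive test leaves H unlikely
```

The low base rate dominates: despite a 99%-sensitive test, the posterior is only about 1/6, a standard illustration of why priors matter.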
Independence
$A$ and $B$ are independent iff $P(A \cap B) = P(A)\,P(B)$, equivalently $P(A \mid B) = P(A)$ when $P(B) > 0$.
$A$ and $B$ are conditionally independent given $C$ iff $P(A \cap B \mid C) = P(A \mid C)\,P(B \mid C)$.
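The product definition gives a direct computational test for independence. A sketch with two fair dice; the two events are chosen for illustration:

```python
from itertools import product
from fractions import Fraction

# Two fair dice: 36 equally likely outcomes.
outcomes = list(product(range(1, 7), repeat=2))

def prob(pred):
    """P of the event {o : pred(o)} under the uniform measure."""
    return Fraction(sum(1 for o in outcomes if pred(o)), len(outcomes))

def A(o):  # first die is even
    return o[0] % 2 == 0

def B(o):  # the sum is 7
    return o[0] + o[1] == 7

# P(A ∩ B) = P(A) P(B): these two events turn out to be independent.
assert prob(lambda o: A(o) and B(o)) == prob(A) * prob(B)
```

Here $P(A) = 1/2$, $P(B) = 1/6$, and $P(A \cap B) = 3/36 = 1/12$, so the product identity holds exactly.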
Random variables
A random variable $X: \Omega \to \mathbb{R}$ maps outcomes to real numbers.
- CDF: $F_X(x) = P(X \le x)$
- PMF (discrete): $p_X(x) = P(X = x)$
- PDF (continuous): $f_X$ such that $P(a \le X \le b) = \int_a^b f_X(x)\,dx$
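A minimal sketch of a PMF and the CDF built from it, for the discrete case (a fair die again, as a hypothetical example):

```python
from fractions import Fraction

# X = value of one fair die roll: its PMF as a dict.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def cdf(x):
    """F_X(x) = P(X <= x): accumulate the point masses up to x."""
    return sum(p for v, p in pmf.items() if v <= x)

assert cdf(3) == Fraction(1, 2)
assert cdf(6) == 1
```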
Expected value and variance
$$E[X] = \sum_x x\,p_X(x) \ \text{(discrete)}, \qquad E[X] = \int x\,f_X(x)\,dx \ \text{(continuous)}$$
$$\mathrm{Var}(X) = E\bigl[(X - E[X])^2\bigr] = E[X^2] - E[X]^2$$
Linearity of expectation: $E[aX + bY] = a\,E[X] + b\,E[Y]$ (no independence required).
Covariance: $\mathrm{Cov}(X, Y) = E\bigl[(X - E[X])(Y - E[Y])\bigr] = E[XY] - E[X]\,E[Y]$.
Correlation: $\rho_{X,Y} = \mathrm{Cov}(X, Y) / (\sigma_X \sigma_Y) \in [-1, 1]$.
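These definitions can be computed exactly for a discrete distribution. A sketch for a fair die, where $E[X] = 7/2$ and $\mathrm{Var}(X) = 35/12$ (the `E` helper is illustrative):

```python
from fractions import Fraction

# X = one fair die roll: compute E[X] and Var(X) straight from the PMF.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def E(f):
    """Expectation of f(X): sum f(x) weighted by the PMF."""
    return sum(f(x) * p for x, p in pmf.items())

mean = E(lambda x: x)               # 7/2
var = E(lambda x: (x - mean) ** 2)  # E[(X - E[X])^2] = 35/12

# The shortcut form agrees: Var(X) = E[X^2] - E[X]^2.
assert var == E(lambda x: x * x) - mean ** 2
```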
Applications
- Bayesian inference: updating beliefs given new data
- Machine learning: every probabilistic model (naive Bayes, GMMs, neural nets with cross-entropy) is built on these foundations
- Statistical hypothesis testing (p-values rely on conditional probability)
- Markov chains, hidden Markov models, graphical models
Trade-offs
- The frequentist interpretation ($P$ = long-run frequency) and the Bayesian interpretation ($P$ = degree of belief) are philosophically distinct but mathematically identical at the level of Kolmogorov’s axioms.
- Conditional probability is undefined when $P(B) = 0$; care is needed at measure-zero events.