Probability Theory

Definition

A mathematical framework for quantifying uncertainty. Assigns a number in [0, 1] to events drawn from a sample space, obeying Kolmogorov’s axioms.

Intuition

Probability formalises the intuition that some outcomes are more likely than others. The axioms ensure probabilities are consistent: they don’t go negative, they add up to 1 across all possibilities, and disjoint events combine additively.

Formal Description

Sample space and events

  • Sample space Ω: the set of all possible outcomes.
  • Event A ⊆ Ω: a subset of outcomes.
  • Probability measure P: a function on a σ-algebra F of events satisfying:
    1. P(A) ≥ 0 for all A ∈ F
    2. P(Ω) = 1
    3. Countable additivity: P(⋃ᵢ Aᵢ) = Σᵢ P(Aᵢ) for pairwise disjoint A₁, A₂, …
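The axioms can be checked mechanically on a small finite sample space. The sketch below uses a hypothetical fair six-sided die, where the σ-algebra is the full power set and each outcome has probability 1/6:

```python
from itertools import chain, combinations

# Hypothetical finite sample space: a fair six-sided die.
omega = {1, 2, 3, 4, 5, 6}
p = {outcome: 1 / 6 for outcome in omega}  # probability of each outcome

def prob(event):
    """P(A) for an event A ⊆ Ω, by additivity over individual outcomes."""
    return sum(p[o] for o in event)

# Axiom 1: non-negativity for every event (here: every subset of Ω).
subsets = chain.from_iterable(combinations(omega, r) for r in range(7))
assert all(prob(s) >= 0 for s in subsets)

# Axiom 2: normalisation, P(Ω) = 1.
assert abs(prob(omega) - 1.0) < 1e-12

# Axiom 3: additivity for disjoint events.
a, b = {1, 2}, {5, 6}  # disjoint events
assert abs(prob(a | b) - (prob(a) + prob(b))) < 1e-12
```

On a finite space, countable additivity reduces to finite additivity, which is what the last assertion exercises.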

Derived rules

Conditional probability

For P(B) > 0:

  P(A | B) = P(A ∩ B) / P(B)

Chain rule (product rule):

  P(A ∩ B) = P(A | B) P(B)

For n events:

  P(A₁ ∩ … ∩ Aₙ) = P(A₁) P(A₂ | A₁) P(A₃ | A₁ ∩ A₂) ⋯ P(Aₙ | A₁ ∩ … ∩ Aₙ₋₁)
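A classic application of the chain rule is sequential sampling without replacement. The sketch below (a hypothetical card-drawing example, not from the text) computes the probability of drawing two aces from a standard 52-card deck:

```python
from fractions import Fraction

# Chain rule: P(both aces) = P(first ace) * P(second ace | first ace).
p_first_ace = Fraction(4, 52)             # 4 aces among 52 cards
p_second_ace_given_first = Fraction(3, 51)  # 3 aces among the 51 remaining
p_both_aces = p_first_ace * p_second_ace_given_first
print(p_both_aces)  # 1/221
```

Exact rational arithmetic via `Fraction` avoids floating-point noise in small examples like this.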

Total probability

Given a partition B₁, B₂, … of Ω:

  P(A) = Σᵢ P(A | Bᵢ) P(Bᵢ)
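The total-probability sum is just a weighted average over the partition. A minimal sketch with hypothetical numbers (two factories producing a part, each with its own defect rate):

```python
# Partition of Ω: the part came from factory f1 or factory f2.
p_factory = {"f1": 0.6, "f2": 0.4}         # P(B_i); must sum to 1
p_defect_given = {"f1": 0.01, "f2": 0.05}  # P(A | B_i)

# Total probability: P(A) = sum_i P(A | B_i) P(B_i)
p_defect = sum(p_defect_given[b] * p_factory[b] for b in p_factory)
print(p_defect)  # ≈ 0.026
```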

Bayes’ theorem

  P(H | D) = P(D | H) P(H) / P(D)

This is the engine of Bayesian inference: P(H | D) is the posterior, P(H) is the prior, P(D | H) is the likelihood, and P(D) is the evidence.
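Bayes’ theorem combines with total probability (to compute the evidence term) in the standard diagnostic-test example. The numbers below are hypothetical, chosen to show how a rare condition makes most positive tests false positives:

```python
# Hypothetical diagnostic test, illustrating Bayes' theorem.
p_disease = 0.01            # prior P(H)
p_pos_given_disease = 0.95  # likelihood P(D | H), the test's sensitivity
p_pos_given_healthy = 0.05  # false-positive rate P(D | ¬H)

# Evidence P(D) via total probability over the partition {H, ¬H}.
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Posterior P(H | D) = P(D | H) P(H) / P(D)
posterior = p_pos_given_disease * p_disease / p_pos
print(round(posterior, 3))  # ≈ 0.161: most positives are false positives
```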

Independence

A and B are independent iff P(A ∩ B) = P(A) P(B), equivalently P(A | B) = P(A) when P(B) > 0.

A and B are conditionally independent given C iff P(A ∩ B | C) = P(A | C) P(B | C).
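Independence can be verified by enumeration on a finite sample space. A sketch with two fair dice (events chosen for illustration):

```python
from itertools import product

# Sample space: two fair dice; each of the 36 outcomes is equally likely.
omega = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(A) for an event given as a predicate on outcomes."""
    return sum(1 for o in omega if event(o)) / len(omega)

first_even = lambda o: o[0] % 2 == 0  # A: first die is even,  P(A) = 1/2
second_high = lambda o: o[1] >= 5     # B: second die is 5 or 6, P(B) = 1/3
both = lambda o: first_even(o) and second_high(o)

# Independence: P(A ∩ B) = P(A) P(B)
assert abs(prob(both) - prob(first_even) * prob(second_high)) < 1e-12
```

Events on different dice are independent because the joint distribution factorises; an event such as “the sum is 7” would not be independent of either.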

Random variables

A random variable X: Ω → ℝ maps outcomes to real numbers.

  • CDF: F_X(x) = P(X ≤ x)
  • PMF (discrete): p_X(x) = P(X = x)
  • PDF (continuous): f_X such that P(a ≤ X ≤ b) = ∫ₐᵇ f_X(x) dx
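For a discrete random variable on a finite space, the PMF is a frequency table and the CDF its running sum. A sketch for the hypothetical variable X = sum of two fair dice:

```python
from itertools import product
from collections import Counter

# Random variable X = sum of two fair dice: a map from outcomes to reals.
outcomes = list(product(range(1, 7), repeat=2))
counts = Counter(a + b for a, b in outcomes)

# PMF: p_X(x) = P(X = x)
pmf = {x: c / len(outcomes) for x, c in counts.items()}

# CDF: F_X(x) = P(X <= x), a running sum of the PMF
cdf = {}
total = 0.0
for x in sorted(pmf):
    total += pmf[x]
    cdf[x] = total

print(pmf[7])   # 6/36 ≈ 0.1667, the most likely sum
print(cdf[12])  # ≈ 1.0, since 12 is the largest possible value
```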

Expected value and variance

E[X] = Σₓ x p_X(x) (discrete) or ∫ x f_X(x) dx (continuous); Var(X) = E[(X − E[X])²] = E[X²] − (E[X])².

Linearity of expectation: E[aX + bY] = a E[X] + b E[Y] (no independence required).

Covariance: Cov(X, Y) = E[(X − E[X])(Y − E[Y])] = E[XY] − E[X] E[Y].

Correlation: ρ(X, Y) = Cov(X, Y) / (σ_X σ_Y), always in [−1, 1].
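These moments are direct to compute for a finite joint distribution. The sketch below uses a hypothetical uniform distribution over four (x, y) pairs and checks linearity of expectation, which holds even though X and Y here are strongly dependent:

```python
from math import sqrt

# Hypothetical joint distribution: each (x, y) pair equally likely.
pairs = [(1, 2), (2, 4), (3, 5), (4, 9)]

def mean(values):
    """E[·] under the uniform distribution over the listed values."""
    return sum(values) / len(values)

xs = [x for x, _ in pairs]
ys = [y for _, y in pairs]
ex, ey = mean(xs), mean(ys)

var_x = mean([(x - ex) ** 2 for x in xs])               # Var(X) = E[(X - E[X])^2]
var_y = mean([(y - ey) ** 2 for y in ys])
cov_xy = mean([(x - ex) * (y - ey) for x, y in pairs])  # Cov(X, Y)
corr = cov_xy / (sqrt(var_x) * sqrt(var_y))             # ρ(X, Y)

# Linearity of expectation holds regardless of dependence:
assert abs(mean([x + y for x, y in pairs]) - (ex + ey)) < 1e-12
print(round(corr, 3))  # ≈ 0.965: strongly positively correlated
```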

Applications

  • Bayesian inference: updating beliefs given new data
  • Machine learning: every probabilistic model (naive Bayes, GMMs, neural nets with cross-entropy) is built on these foundations
  • Statistical hypothesis testing (p-values rely on conditional probability)
  • Markov chains, hidden Markov models, graphical models

Trade-offs

  • The frequentist interpretation (probability = long-run frequency) and the Bayesian interpretation (probability = degree of belief) are philosophically distinct but mathematically identical at the level of Kolmogorov’s axioms.
  • Conditional probability P(A | B) is undefined when P(B) = 0; care is needed at measure-zero events.