Probability Distributions
Definition
A probability distribution specifies how probability mass (discrete) or density (continuous) is assigned over the possible values of a random variable.
Intuition
Distributions are the vocabulary of probabilistic modelling. Different processes generate data with characteristic shapes: coin flips are Bernoulli, counts of rare events are Poisson, natural measurements often cluster near a mean (Gaussian), and proportions live on the unit interval (Beta).
Formal Description
Discrete Distributions
Bernoulli$(p)$: a single binary trial with success probability $p$.
$$P(X = x) = p^x (1 - p)^{1 - x}, \quad x \in \{0, 1\}$$
Binomial$(n, p)$: the number of successes in $n$ independent Bernoulli$(p)$ trials.
$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}, \quad k = 0, 1, \dots, n$$
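A quick numerical check of the Binomial pmf is sketched below, assuming only Python's standard library. `binomial_pmf` is a hypothetical helper name. Setting $n = 1$ recovers the Bernoulli pmf, and the probabilities over $k = 0, \dots, n$ should sum to 1 with mean $np$.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p); n = 1 gives the Bernoulli(p) pmf."""
    if not 0 <= k <= n:
        return 0.0
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
total = sum(pmf)                                # equals 1 up to floating-point rounding
mean = sum(k * q for k, q in enumerate(pmf))    # equals n * p = 3.0 up to rounding
bernoulli_success = binomial_pmf(1, 1, p)       # Bernoulli case: P(X = 1) = p
```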
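Poisson$(\lambda)$: the number of events in a fixed interval when events occur at rate $\lambda$ (the expected count per interval).
$$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad k = 0, 1, 2, \dots$$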
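The sketch below evaluates the Poisson pmf in log space. Using `math.lgamma(k + 1)` for $\log k!$ avoids overflow for large $k$. It then checks the known property that a Poisson variable's mean and variance both equal $\lambda$. `poisson_pmf` is a hypothetical helper, and truncating the support at $k = 99$ is an assumption that is harmless for $\lambda = 4$.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam), lam > 0, computed in log space."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

lam = 4.0
pmf = [poisson_pmf(k, lam) for k in range(100)]     # mass beyond k = 99 is negligible here
mean = sum(k * q for k, q in enumerate(pmf))
var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))
# For a Poisson distribution, mean and variance should both be close to lam.
```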
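Categorical$(\boldsymbol{\pi})$: the generalisation of Bernoulli to $K$ categories with probabilities $\pi_1, \dots, \pi_K$, where $\pi_k \ge 0$ and $\sum_{k=1}^{K} \pi_k = 1$, so that $P(X = k) = \pi_k$.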
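A Categorical draw can be sketched by inverting the cumulative sum of the probabilities: draw $u \sim$ Uniform$(0, 1)$ and return the first category whose cumulative probability exceeds $u$. `sample_categorical` is a hypothetical helper. In practice, the standard library's `random.choices(population, weights=...)` does the same job.

```python
import bisect
import itertools
import random

def sample_categorical(probs, rng):
    """Return index k with probability probs[k] via inverse-CDF lookup."""
    cdf = list(itertools.accumulate(probs))
    u = rng.random() * cdf[-1]          # scale by the total to tolerate rounding in probs
    return bisect.bisect_right(cdf, u)  # first index whose cumulative sum exceeds u

rng = random.Random(0)
probs = [0.2, 0.5, 0.3]
counts = [0, 0, 0]
for _ in range(100_000):
    counts[sample_categorical(probs, rng)] += 1
freqs = [c / 100_000 for c in counts]   # empirical frequencies, close to probs
```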
Continuous Distributions
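Uniform $\mathcal{U}(a, b)$:
$$p(x) = \frac{1}{b - a} \text{ for } a \le x \le b, \text{ and } 0 \text{ otherwise}$$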
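Gaussian (Normal) $\mathcal{N}(\mu, \sigma^2)$: arguably the most important distribution in statistics.
$$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$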
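Standard normal: $\mathcal{N}(0, 1)$. Any $X \sim \mathcal{N}(\mu, \sigma^2)$ can be standardised as $Z = (X - \mu)/\sigma \sim \mathcal{N}(0, 1)$.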
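As a sanity check, the sketch below writes out the univariate Gaussian density by hand and compares it with Python's `statistics.NormalDist`, which takes the mean and the standard deviation, not the variance. It also confirms that standardising $x$ to $z$ leaves the CDF unchanged. `gaussian_pdf` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def gaussian_pdf(x: float, mu: float, sigma2: float) -> float:
    """Density of N(mu, sigma2), written out from the closed-form expression."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

mu, sigma = 2.0, 1.5
x = 3.1
manual = gaussian_pdf(x, mu, sigma**2)
library = NormalDist(mu, sigma).pdf(x)       # stdlib reference (takes the std dev)
z = (x - mu) / sigma                          # standardise to N(0, 1)
# Standardisation preserves probabilities: P(X <= x) = P(Z <= z).
same_cdf = math.isclose(NormalDist(mu, sigma).cdf(x), NormalDist().cdf(z))
```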
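Multivariate Gaussian $\mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ on $\mathbb{R}^d$:
$$p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\left(-\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})\right)$$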
$\boldsymbol{\Sigma}$ is the (symmetric, positive-definite) covariance matrix; marginals and conditionals of a jointly Gaussian vector are themselves Gaussian.
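For $d = 2$ the density can be computed by hand with a closed-form $2 \times 2$ inverse and determinant, as sketched below. This uses only the standard library; `mvn_pdf_2d` and `normal_pdf` are hypothetical helpers. The check relies on a standard property of jointly Gaussian vectors: when $\boldsymbol{\Sigma}$ is diagonal the components are independent, so the joint density factorises into univariate densities.

```python
import math

def mvn_pdf_2d(x, mu, cov):
    """Density of N(mu, cov) in 2-D; cov = [[a, b], [b, d]] must be positive definite."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    inv = [[d / det, -b / det], [-b / det, a / det]]   # closed-form 2x2 inverse
    dx = [x[0] - mu[0], x[1] - mu[1]]
    quad = sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))  # (2*pi)^(d/2) = 2*pi for d = 2

def normal_pdf(x, mu, var):
    """Univariate N(mu, var) density."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

point, mean = (0.5, -1.0), (0.0, 0.0)
joint = mvn_pdf_2d(point, mean, [[2.0, 0.0], [0.0, 0.5]])
product = normal_pdf(0.5, 0.0, 2.0) * normal_pdf(-1.0, 0.0, 0.5)  # independent components
correlated = mvn_pdf_2d(point, mean, [[2.0, 0.6], [0.6, 0.5]])     # off-diagonal covariance
```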
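Exponential$(\lambda)$: the waiting time until the first event at rate $\lambda$; memoryless, i.e. $P(X > s + t \mid X > s) = P(X > t)$ for all $s, t \ge 0$.
$$p(x) = \lambda e^{-\lambda x}, \quad x \ge 0$$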
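The sketch below checks memorylessness using the survival function $P(X > t) = e^{-\lambda t}$. It then draws Exponential samples by inverse-CDF sampling: if $U \sim$ Uniform$(0, 1)$, then $-\ln(1 - U)/\lambda$ is Exponential$(\lambda)$. The helper names are hypothetical. The standard library's `random.expovariate(lambd)` would serve equally well for sampling.

```python
import math
import random

def sample_exponential(rate: float, rng: random.Random) -> float:
    """Inverse-CDF sampling: -ln(1 - U) / rate with U ~ Uniform(0, 1)."""
    return -math.log(1.0 - rng.random()) / rate   # 1 - U lies in (0, 1], so log is safe

def survival(t: float, rate: float) -> float:
    """P(X > t) for X ~ Exponential(rate)."""
    return math.exp(-rate * t)

rate, s, t = 0.5, 3.0, 2.0
# Memorylessness: P(X > s + t | X > s) = P(X > s + t) / P(X > s) = P(X > t).
conditional = survival(s + t, rate) / survival(s, rate)
assert math.isclose(conditional, survival(t, rate))

rng = random.Random(42)
draws = [sample_exponential(rate, rng) for _ in range(200_000)]
sample_mean = sum(draws) / len(draws)   # close to 1 / rate = 2.0
```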
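Gamma$(\alpha, \beta)$ (shape $\alpha$, rate $\beta$): for integer $\alpha$, the distribution of a sum of $\alpha$ independent Exponential$(\beta)$ variables; conjugate prior for the Poisson rate $\lambda$.
$$p(x) = \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha - 1} e^{-\beta x}, \quad x > 0$$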
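The Gamma-Poisson conjugacy can be sketched as a parameter update, assuming the shape-rate parameterisation. Multiplying the Poisson likelihood $\prod_i \lambda^{x_i} e^{-\lambda}$ by the prior $\lambda^{\alpha-1} e^{-\beta\lambda}$ gives a kernel $\lambda^{\alpha + \sum x_i - 1} e^{-(\beta + n)\lambda}$. So the posterior is Gamma$(\alpha + \sum_i x_i,\ \beta + n)$. The helper name and the counts are illustrative only.

```python
def gamma_poisson_update(alpha: float, beta: float, counts):
    """Posterior (shape, rate) after observing Poisson counts under a Gamma(alpha, beta) prior."""
    return alpha + sum(counts), beta + len(counts)

alpha0, beta0 = 2.0, 1.0          # prior mean alpha / beta = 2 events per interval
counts = [3, 5, 4, 6, 2]          # hypothetical observed counts (sum 20, n = 5)
alpha_n, beta_n = gamma_poisson_update(alpha0, beta0, counts)
posterior_mean = alpha_n / beta_n  # (2 + 20) / (1 + 5)
```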
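Beta$(\alpha, \beta)$: a distribution over $\theta \in [0, 1]$; conjugate prior for the Bernoulli/Binomial success probability $p$.
$$p(\theta) = \frac{\theta^{\alpha - 1} (1 - \theta)^{\beta - 1}}{B(\alpha, \beta)}, \quad B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha + \beta)}$$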
Symmetric when $\alpha = \beta$; Beta$(1, 1)$ is Uniform$(0, 1)$; U-shaped, with mass concentrated near 0 and 1, when $\alpha, \beta < 1$.
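The Beta-Binomial update works the same way: $s$ successes and $f$ failures turn a Beta$(\alpha, \beta)$ prior into a Beta$(\alpha + s, \beta + f)$ posterior, whose mean is $\alpha/(\alpha + \beta)$ with the updated parameters. The sketch below starts from the uniform prior Beta$(1, 1)$. The function names and the 7-out-of-10 data are hypothetical.

```python
def beta_binomial_update(alpha: float, beta: float, successes: int, failures: int):
    """Posterior Beta parameters after Bernoulli/Binomial outcomes under a Beta prior."""
    return alpha + successes, beta + failures

def beta_mean(alpha: float, beta: float) -> float:
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

# Uniform prior Beta(1, 1), then 7 successes in 10 trials (hypothetical data).
a, b = beta_binomial_update(1.0, 1.0, successes=7, failures=3)
post_mean = beta_mean(a, b)   # Beta(8, 4) has mean 8 / 12
```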
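Dirichlet$(\boldsymbol{\alpha})$: the multivariate generalisation of Beta to probability vectors $\boldsymbol{\pi} = (\pi_1, \dots, \pi_K)$ on the simplex; conjugate prior for Categorical/Multinomial.
$$p(\boldsymbol{\pi}) = \frac{\Gamma\left(\sum_{k=1}^{K} \alpha_k\right)}{\prod_{k=1}^{K} \Gamma(\alpha_k)} \prod_{k=1}^{K} \pi_k^{\alpha_k - 1}$$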
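The sketch below combines two standard facts about the Dirichlet. First, a Dirichlet sample can be drawn by normalising independent Gamma$(\alpha_k, 1)$ draws; `random.gammavariate(alpha, beta)` takes a scale `beta`, which equals the rate when it is 1. Second, the conjugate update for Categorical data adds each category's observed count to $\alpha_k$. Helper names and counts are hypothetical.

```python
import random

def sample_dirichlet(alphas, rng):
    """Draw from Dirichlet(alphas) by normalising independent Gamma(alpha_k, 1) draws."""
    g = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(g)
    return [x / total for x in g]

def dirichlet_categorical_update(alphas, counts):
    """Posterior Dirichlet parameters: add each category's observed count."""
    return [a + c for a, c in zip(alphas, counts)]

rng = random.Random(1)
prior = [1.0, 1.0, 1.0]                                        # uniform over the simplex
posterior = dirichlet_categorical_update(prior, [10, 3, 7])    # hypothetical counts
theta = sample_dirichlet(posterior, rng)                       # one plausible probability vector
```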
Central Limit Theorem
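For i.i.d. $X_1, \dots, X_n$ with mean $\mu$ and finite variance $\sigma^2$, the sample mean $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$ satisfies
$$\sqrt{n}\,(\bar{X}_n - \mu) \xrightarrow{d} \mathcal{N}(0, \sigma^2) \quad \text{as } n \to \infty$$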
This justifies Gaussian approximations in large-sample inference, even when the individual observations are not Gaussian.
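A small simulation, sketched below with the standard library, illustrates the theorem. Means of $n = 50$ Uniform$(0, 1)$ draws are standardised using $\mu = 1/2$ and $\sigma^2 = 1/12$. The resulting values should look approximately $\mathcal{N}(0, 1)$: mean near 0, standard deviation near 1, and about 95% of them within $\pm 1.96$. The sample sizes and seed are arbitrary choices.

```python
import math
import random
import statistics

rng = random.Random(7)
n, reps = 50, 20_000
mu, sigma = 0.5, math.sqrt(1 / 12)   # mean and std dev of Uniform(0, 1)

# Standardised sample means sqrt(n) * (mean - mu) / sigma should be roughly N(0, 1).
z = [
    math.sqrt(n) * (statistics.fmean(rng.random() for _ in range(n)) - mu) / sigma
    for _ in range(reps)
]
frac_within_1_96 = sum(abs(v) < 1.96 for v in z) / reps   # close to 0.95
```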
Exponential Family
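Many distributions share the form
$$p(x \mid \boldsymbol{\eta}) = h(x) \exp\left(\boldsymbol{\eta}^\top T(x) - A(\boldsymbol{\eta})\right),$$
where $\boldsymbol{\eta}$ are the natural parameters, $T(x)$ are the sufficient statistics, $h(x)$ is the base measure, and $A(\boldsymbol{\eta})$ is the log-partition function, which normalises the distribution. Conjugate priors exist for exponential family likelihoods. Examples: Gaussian, Bernoulli, Poisson, Gamma, Beta, Dirichlet.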
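As a concrete case, the Bernoulli pmf fits this form with $h(x) = 1$, $T(x) = x$, natural parameter $\eta = \log\frac{p}{1-p}$ (the log-odds), and $A(\eta) = \log(1 + e^\eta)$. The sketch below checks that identity numerically. It also checks a standard exponential-family property, $A'(\eta) = \mathbb{E}[T(X)] = p$, with a central finite difference. The helper names are hypothetical.

```python
import math

def log_partition(eta: float) -> float:
    """Bernoulli log-partition function A(eta) = log(1 + e^eta)."""
    return math.log1p(math.exp(eta))

def bernoulli_expfam(x: int, eta: float) -> float:
    """Bernoulli pmf in exponential-family form with h(x) = 1 and T(x) = x."""
    return math.exp(eta * x - log_partition(eta))

p = 0.3
eta = math.log(p / (1 - p))    # natural parameter: the log-odds
for x in (0, 1):
    assert math.isclose(bernoulli_expfam(x, eta), p**x * (1 - p) ** (1 - x))

# The derivative of A at eta equals E[T(X)] = p (central finite difference).
h = 1e-6
dA = (log_partition(eta + h) - log_partition(eta - h)) / (2 * h)
```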
Applications
- Gaussian: linear regression errors, natural measurements, approximate posteriors
- Bernoulli/Binomial: classification, A/B testing
- Poisson: count data in insurance (claims frequency), NLP (word counts)
- Beta/Dirichlet: Bayesian priors for probabilities and topic models (LDA)
- Exponential/Gamma: survival analysis, insurance claim severity
Trade-offs
- The Gaussian assumption is often violated in practice (heavy tails, skewness); check empirically, e.g. with Q-Q plots or residual diagnostics, before assuming normality.
- Conjugate priors give analytical posteriors but may not match domain knowledge; non-conjugate priors require MCMC or variational inference.
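One quick empirical check, sketched below, compares moment-based sample skewness and excess kurtosis against the Gaussian values (both 0). It is only a rough screen, not a formal normality test. For an Exponential distribution these quantities are 2 and 6, so simulated Exponential data should stand out clearly. The helper name and simulation settings are assumptions for illustration.

```python
import random
import statistics

def skew_and_excess_kurtosis(xs):
    """Moment-based sample skewness and excess kurtosis (both near 0 for Gaussian data)."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs, m)
    skew = statistics.fmean(((x - m) / s) ** 3 for x in xs)
    kurt = statistics.fmean(((x - m) / s) ** 4 for x in xs) - 3.0
    return skew, kurt

rng = random.Random(3)
gaussian = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
exponential = [rng.expovariate(1.0) for _ in range(100_000)]   # skewed, heavy right tail
g_skew, g_kurt = skew_and_excess_kurtosis(gaussian)       # both near 0
e_skew, e_kurt = skew_and_excess_kurtosis(exponential)    # near 2 and 6 respectively
```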