Probability Distributions

Definition

A probability distribution specifies how probability mass (discrete) or density (continuous) is assigned over the possible values of a random variable.

Intuition

Distributions are the vocabulary of probabilistic modelling. Different processes generate data with characteristic shapes: coin flips are Bernoulli, counts of rare events are Poisson, natural measurements often cluster near a mean (Gaussian), and proportions live on the unit interval (Beta).

Formal Description

Discrete Distributions

Bernoulli($p$): a single binary trial with success probability $p$.

$$P(X = x) = p^{x} (1 - p)^{1 - x}, \quad x \in \{0, 1\}$$

Binomial($n, p$): the number of successes in $n$ independent Bernoulli($p$) trials.

$$P(X = k) = \binom{n}{k} p^{k} (1 - p)^{n - k}, \quad k = 0, 1, \dots, n$$
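As a minimal sketch (assuming Python 3.8+ for `math.comb`), the Binomial PMF can be computed directly. The function name `binomial_pmf` is hypothetical, and setting `n = 1` recovers the Bernoulli PMF.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p); n = 1 gives the Bernoulli(p) PMF."""
    if not 0 <= k <= n:
        return 0.0  # outside the support {0, ..., n}
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Probability of exactly 3 heads in 10 fair coin flips: 120 / 1024.
print(binomial_pmf(3, 10, 0.5))  # 0.1171875
```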

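Poisson($\lambda$): the number of events in a fixed interval when events occur independently at average rate $\lambda$. Both the mean and the variance equal $\lambda$.

$$P(X = k) = \frac{\lambda^{k} e^{-\lambda}}{k!}, \quad k = 0, 1, 2, \dots$$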
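A similar sketch for the Poisson PMF, with a hypothetical helper `poisson_pmf`. Truncating the infinite support at 50 terms is an assumption that is only safe for small rates such as the illustrative `lam = 3.0`; the truncated sums should then recover total probability 1 and mean $\lambda$.

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam), k = 0, 1, 2, ..."""
    return lam**k * exp(-lam) / factorial(k)


lam = 3.0
# Truncate the infinite support at k = 49; the remaining tail mass is negligible for lam = 3.
probs = [poisson_pmf(k, lam) for k in range(50)]
mean = sum(k * q for k, q in enumerate(probs))
print(sum(probs), mean)  # close to 1.0 and to 3.0 (the mean equals lam)
```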

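Categorical($\boldsymbol{\pi}$): the generalisation of Bernoulli to $K$ categories with probabilities $\pi_1, \dots, \pi_K$, where $\sum_{k=1}^{K} \pi_k = 1$ and $P(X = k) = \pi_k$.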
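Sampling from a Categorical distribution is commonly done by inverting its cumulative distribution. The sketch below assumes the probabilities are non-negative and sum to (approximately) 1; `sample_categorical` is a hypothetical name and the fixed seed only makes the run reproducible. In practice, Python's `random.choices(range(K), weights=probs)` does the same job.

```python
import random
from bisect import bisect_right
from itertools import accumulate
from typing import Sequence


def sample_categorical(probs: Sequence[float], rng: random.Random) -> int:
    """Draw index k with probability probs[k] by inverting the cumulative distribution."""
    cdf = list(accumulate(probs))
    u = rng.random() * cdf[-1]  # scale by the total in case probs sum to 1 only up to rounding
    # First index whose cumulative probability exceeds u; the clamp guards a float edge case.
    return min(bisect_right(cdf, u), len(probs) - 1)


rng = random.Random(0)
probs = [0.2, 0.5, 0.3]
draws = [sample_categorical(probs, rng) for _ in range(100_000)]
freqs = [draws.count(k) / len(draws) for k in range(len(probs))]
print(freqs)  # close to [0.2, 0.5, 0.3]
```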

Continuous Distributions

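Uniform($a, b$): every value in $[a, b]$ is equally likely.

$$p(x) = \frac{1}{b - a}, \quad a \le x \le b \quad (\text{and } 0 \text{ otherwise})$$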

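Gaussian (Normal) $\mathcal{N}(\mu, \sigma^2)$: the most widely used distribution in statistics, with mean $\mu$ and variance $\sigma^2$.

$$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$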

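Standard normal: $\mathcal{N}(0, 1)$. Any $X \sim \mathcal{N}(\mu, \sigma^2)$ can be standardised as $Z = (X - \mu)/\sigma \sim \mathcal{N}(0, 1)$.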
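To make the density and the standardisation concrete, here is a small sketch comparing a hand-written `normal_pdf` (a hypothetical helper) with the standard library's `statistics.NormalDist` (Python 3.8+). The values of `mu`, `sigma` and `x` are arbitrary illustrations.

```python
from math import exp, isclose, pi, sqrt
from statistics import NormalDist


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of N(mu, sigma^2) at x, written out from the formula."""
    z = (x - mu) / sigma
    return exp(-0.5 * z * z) / (sqrt(2 * pi) * sigma)


mu, sigma, x = 10.0, 2.0, 13.0
z = (x - mu) / sigma  # standardised value: 1.5

# The hand-written density agrees with the standard library's implementation.
print(isclose(normal_pdf(x, mu, sigma), NormalDist(mu, sigma).pdf(x)))  # True
# Standardising: P(X <= x) for X ~ N(mu, sigma^2) equals Phi(z) for N(0, 1).
print(NormalDist(mu, sigma).cdf(x), NormalDist().cdf(z))
```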

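Multivariate Gaussian $\mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ over $\mathbf{x} \in \mathbb{R}^{d}$:

$$p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\!\left(-\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^{\top} \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})\right)$$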

$\boldsymbol{\Sigma}$ is the (symmetric, positive-definite) covariance matrix. Marginals and conditionals of a jointly Gaussian vector are themselves Gaussian.
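For the two-dimensional case the conditioning rule has a closed form: $X_1 \mid X_2 = x_2 \sim \mathcal{N}\big(\mu_1 + \tfrac{\Sigma_{12}}{\Sigma_{22}}(x_2 - \mu_2),\ \Sigma_{11} - \tfrac{\Sigma_{12}^2}{\Sigma_{22}}\big)$. A minimal sketch of that formula, with `condition_bivariate` as a hypothetical helper name and illustrative parameter values:

```python
def condition_bivariate(mu1, mu2, s11, s12, s22, x2):
    """Return (mean, variance) of X1 given X2 = x2 for a jointly Gaussian pair.

    mu1, mu2 are the means; s11, s22 the variances and s12 the covariance,
    i.e. the entries of the 2x2 covariance matrix Sigma.
    """
    gain = s12 / s22  # regression coefficient of X1 on X2
    cond_mean = mu1 + gain * (x2 - mu2)
    cond_var = s11 - gain * s12  # = s11 - s12**2 / s22, never above s11
    return cond_mean, cond_var


# Unit variances with correlation 0.8; observing X2 = 1 shifts and narrows X1.
cond_mean, cond_var = condition_bivariate(0.0, 0.0, 1.0, 0.8, 1.0, 1.0)
print(cond_mean, cond_var)  # approximately 0.8 and 0.36
```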

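Exponential($\lambda$): the waiting time until the first event of a process with rate $\lambda$.

$$p(x) = \lambda e^{-\lambda x}, \quad x \ge 0$$

It is memoryless: $P(X > s + t \mid X > s) = P(X > t)$.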
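The memoryless property follows from the survival function $P(X > t) = e^{-\lambda t}$. The sketch below checks it exactly and by simulation, drawing samples by inverse-CDF sampling (`-log(1 - U) / lam` for `U` uniform on $[0, 1)$). The helper names, rate and seed are illustrative assumptions; in practice `random.expovariate(lam)` draws from the same distribution.

```python
import random
from math import exp, log


def exp_survival(t: float, lam: float) -> float:
    """P(X > t) for X ~ Exponential(lam)."""
    return exp(-lam * t)


def sample_exponential(lam: float, rng: random.Random) -> float:
    """Inverse-CDF draw: if U ~ Uniform[0, 1), then -log(1 - U) / lam ~ Exponential(lam)."""
    return -log(1.0 - rng.random()) / lam


lam, s, t = 0.5, 2.0, 3.0
# Exact check: P(X > s + t) / P(X > s) equals P(X > t).
print(exp_survival(s + t, lam) / exp_survival(s, lam), exp_survival(t, lam))

rng = random.Random(1)
draws = [sample_exponential(lam, rng) for _ in range(200_000)]
survivors = [x - s for x in draws if x > s]  # extra waiting time beyond s
print(sum(draws) / len(draws))  # close to 1 / lam = 2.0
print(sum(x > t for x in survivors) / len(survivors))  # close to exp(-lam * t), about 0.223
```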

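Gamma($\alpha, \beta$) (shape $\alpha$, rate $\beta$): for integer $\alpha$, the distribution of a sum of $\alpha$ independent Exponential($\beta$) variables; it is the conjugate prior for a Poisson rate.

$$p(x) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} x^{\alpha - 1} e^{-\beta x}, \quad x > 0$$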
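Two sketches under stated assumptions follow. The first is the Gamma-Poisson conjugate update: a Gamma($\alpha, \beta$) prior on the rate plus counts $x_1, \dots, x_n$ gives the posterior Gamma($\alpha + \sum_i x_i, \beta + n$). The second simulates the sum-of-exponentials view. The counts and the helper name are hypothetical; note that Python's `random.expovariate` takes a rate, matching the shape-rate convention used here.

```python
import random


def gamma_poisson_update(alpha, beta, counts):
    """Posterior Gamma(shape, rate) for a Poisson rate given observed counts."""
    return alpha + sum(counts), beta + len(counts)


# Hypothetical counts over 5 periods with a Gamma(2, 1) prior (prior mean rate 2).
alpha_post, beta_post = gamma_poisson_update(2.0, 1.0, [3, 1, 4, 2, 5])
print(alpha_post, beta_post, alpha_post / beta_post)  # 17.0 6.0 and posterior mean 17/6

# Sum-of-exponentials view: integer shape 3, rate 2.
rng = random.Random(2)
shape, rate = 3, 2.0
sums = [sum(rng.expovariate(rate) for _ in range(shape)) for _ in range(100_000)]
print(sum(sums) / len(sums))  # close to shape / rate = 1.5
```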

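Beta($\alpha, \beta$): a distribution over $[0, 1]$; the conjugate prior for the success probability $p$ of a Bernoulli or Binomial likelihood.

$$p(x) = \frac{x^{\alpha - 1} (1 - x)^{\beta - 1}}{B(\alpha, \beta)}, \quad B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha + \beta)}$$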

It is symmetric when $\alpha = \beta$; Beta($1, 1$) is Uniform($0, 1$); and it concentrates near the extremes 0 and 1 when $\alpha, \beta < 1$.
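For the Beta-Binomial pair, observing $k$ successes in $n$ trials turns a Beta($\alpha, \beta$) prior into a Beta($\alpha + k, \beta + n - k$) posterior. A minimal sketch with hypothetical helper names and made-up data; the density is evaluated via `math.lgamma` to avoid overflow in the normalising constant.

```python
from math import exp, lgamma, log


def beta_binomial_update(alpha, beta, successes, trials):
    """Posterior Beta parameters after `successes` in `trials` Bernoulli trials."""
    return alpha + successes, beta + trials - successes


def beta_mean(alpha, beta):
    return alpha / (alpha + beta)


def beta_pdf(x, alpha, beta):
    """Beta density for 0 < x < 1, computed in log space for numerical stability."""
    log_norm = lgamma(alpha + beta) - lgamma(alpha) - lgamma(beta)  # equals -log B(alpha, beta)
    return exp(log_norm + (alpha - 1) * log(x) + (beta - 1) * log(1 - x))


# Uniform prior Beta(1, 1); hypothetical data: 7 successes in 10 trials.
a, b = beta_binomial_update(1, 1, 7, 10)
print(a, b, beta_mean(a, b))  # 8 4 0.6666666666666666

# With alpha = beta = 0.5 < 1 the density is U-shaped: higher near 0 than at 0.5.
print(beta_pdf(0.05, 0.5, 0.5) > beta_pdf(0.5, 0.5, 0.5))  # True
```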

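Dirichlet($\boldsymbol{\alpha}$): the multivariate generalisation of Beta, defined on the probability simplex $\{\boldsymbol{\pi} : \pi_k \ge 0, \sum_{k} \pi_k = 1\}$; the conjugate prior for Categorical/Multinomial likelihoods.

$$p(\boldsymbol{\pi}) = \frac{\Gamma\!\left(\sum_{k=1}^{K} \alpha_k\right)}{\prod_{k=1}^{K} \Gamma(\alpha_k)} \prod_{k=1}^{K} \pi_k^{\alpha_k - 1}$$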
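A common way to sample a Dirichlet is to draw independent Gamma($\alpha_k$, 1) variables and normalise them by their sum; conditioning on observed category counts simply adds those counts to $\boldsymbol{\alpha}$. The sketch relies on Python's `random.gammavariate`, whose second argument is a scale (not a rate); helper names and parameter values are illustrative.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw pi ~ Dirichlet(alphas) by normalising independent Gamma(alpha_k, 1) draws."""
    g = [rng.gammavariate(a, 1.0) for a in alphas]  # (shape, scale) arguments
    total = sum(g)
    return [x / total for x in g]


def dirichlet_categorical_update(alphas, counts):
    """Posterior Dirichlet parameters after observing per-category counts."""
    return [a + c for a, c in zip(alphas, counts)]


rng = random.Random(3)
alphas = [2.0, 3.0, 5.0]
draws = [sample_dirichlet(alphas, rng) for _ in range(50_000)]
# The mean of Dirichlet(alpha) is alpha_k / sum(alpha) = [0.2, 0.3, 0.5].
means = [sum(d[k] for d in draws) / len(draws) for k in range(len(alphas))]
print(means)
print(dirichlet_categorical_update(alphas, [10, 0, 4]))  # [12.0, 3.0, 9.0]
```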

Central Limit Theorem

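For i.i.d. $X_1, \dots, X_n$ with mean $\mu$ and finite, non-zero variance $\sigma^2$, the standardised sample mean converges in distribution to a standard normal:

$$\sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma} \xrightarrow{d} \mathcal{N}(0, 1), \qquad \bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$$

This justifies Gaussian approximations (for example, normal-based confidence intervals for a mean) in large-sample inference.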
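A quick simulation illustrates the theorem. The sketch assumes Uniform(0, 1) data (mean $1/2$, variance $1/12$) and $n = 30$, both arbitrary choices: the standardised sample means should have mean near 0, standard deviation near 1, and about 95% of their mass within $\pm 1.96$.

```python
import random
from statistics import fmean, pstdev

rng = random.Random(4)
n, reps = 30, 20_000
mu, sigma = 0.5, (1 / 12) ** 0.5  # mean and standard deviation of Uniform(0, 1)

# Standardise each sample mean: sqrt(n) * (sample_mean - mu) / sigma.
z = [
    (fmean(rng.random() for _ in range(n)) - mu) * n**0.5 / sigma
    for _ in range(reps)
]

print(fmean(z), pstdev(z))  # close to 0 and 1
print(sum(abs(v) < 1.96 for v in z) / reps)  # close to 0.95, as for N(0, 1)
```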

Exponential Family

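Many distributions share the form:

$$p(x \mid \boldsymbol{\eta}) = h(x) \exp\!\left(\boldsymbol{\eta}^{\top} T(x) - A(\boldsymbol{\eta})\right)$$

where $\boldsymbol{\eta}$ are the natural parameters, $T(x)$ the sufficient statistics, $h(x)$ the base measure, and $A(\boldsymbol{\eta})$ the log-partition function. Conjugate priors exist for exponential family likelihoods. Examples: Gaussian, Bernoulli, Poisson, Gamma, Beta, Dirichlet.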
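As a worked instance, the Bernoulli distribution fits this form with $\eta = \log\frac{p}{1-p}$, $T(x) = x$, $h(x) = 1$ and $A(\eta) = \log(1 + e^{\eta})$. The sketch below (hypothetical helper names) checks that the exponential-family form reproduces the PMF and that $A'(\eta)$ equals the mean $p$, a general property of the log-partition function ($A'(\eta) = \mathbb{E}[T(X)]$).

```python
from math import exp, log


def natural_param(p: float) -> float:
    """eta = logit(p), the Bernoulli natural parameter."""
    return log(p / (1 - p))


def log_partition(eta: float) -> float:
    """A(eta) = log(1 + e^eta) for the Bernoulli family."""
    return log(1 + exp(eta))


def bernoulli_expfam_pmf(x: int, eta: float) -> float:
    """h(x) * exp(eta * T(x) - A(eta)) with h(x) = 1 and T(x) = x."""
    return exp(eta * x - log_partition(eta))


p = 0.3
eta = natural_param(p)
print(bernoulli_expfam_pmf(1, eta), bernoulli_expfam_pmf(0, eta))  # close to 0.3 and 0.7

# The derivative of A at eta should equal E[T(X)] = p; check with a central difference.
step = 1e-6
grad = (log_partition(eta + step) - log_partition(eta - step)) / (2 * step)
print(grad)  # close to 0.3
```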

Applications

  • Gaussian: linear regression errors, natural measurements, approximate posteriors
  • Bernoulli/Binomial: classification, A/B testing
  • Poisson: count data in insurance (claims frequency), NLP (word counts)
  • Beta/Dirichlet: Bayesian priors for probabilities and topic models (LDA)
  • Exponential/Gamma: survival analysis, insurance claim severity

Trade-offs

  • The Gaussian assumption is often violated in practice (heavy tails, skewness); always check empirically before assuming normality.
  • Conjugate priors give analytical posteriors but may not match domain knowledge; non-conjugate priors require MCMC or variational inference.
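One rough way to check for skewness or heavy tails is to compare the sample skewness and excess kurtosis with the Gaussian values of 0 and 0. The sketch below is an illustrative heuristic, not a formal normality test; the helper name, sample sizes and seed are assumptions. For reference, Exponential data have skewness 2 and excess kurtosis 6.

```python
import random
from statistics import fmean


def skew_and_excess_kurtosis(xs):
    """Moment-based sample skewness and excess kurtosis (both near 0 for Gaussian data)."""
    m = fmean(xs)
    m2 = fmean((x - m) ** 2 for x in xs)
    m3 = fmean((x - m) ** 3 for x in xs)
    m4 = fmean((x - m) ** 4 for x in xs)
    return m3 / m2**1.5, m4 / m2**2 - 3.0


rng = random.Random(5)
gaussian = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
skewed = [rng.expovariate(1.0) for _ in range(100_000)]
print(skew_and_excess_kurtosis(gaussian))  # both near 0
print(skew_and_excess_kurtosis(skewed))  # roughly 2 and 6 for Exponential data
```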