Foundations
Timeless theoretical foundations. Notes here answer: why does this work mathematically?
Sublayers
01 — Linear Algebra
Vectors, matrices, linear systems, eigendecomposition (worked NumPy sketch after the topic list)
- Vector Spaces — vector space, span, basis, null/column/row space, norms, projections
- Matrices — operations, inner/outer product, inverse, special matrices
- Eigenvalues — determinants, eigenvalues/eigenvectors, diagonalization
- Linear Systems — Gaussian elimination, LU decomposition, inverses
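A minimal NumPy sketch spanning these topics: solve a linear system, then verify that an eigendecomposition reconstructs the matrix. The matrix `A` and vector `b` are arbitrary illustrative values, not taken from any note.

```python
import numpy as np

# A small symmetric matrix and right-hand side (illustrative values).
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])

# Linear system: solve Ax = b (LAPACK LU factorization under the hood).
x = np.linalg.solve(A, b)
assert np.allclose(A @ x, b)

# Eigendecomposition of a symmetric matrix: A = V diag(w) V^T.
w, V = np.linalg.eigh(A)
assert np.allclose(V @ np.diag(w) @ V.T, A)

print("solution x:", x)
print("eigenvalues:", w)
```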
02 — Calculus & Analysis
Differentiation, integration, series, ODEs, vector calculus (Newton's method sketch after the topic list)
- Differentiation — limits, derivatives, chain rule, partial derivatives, Newton’s method
- Integration — definite/indefinite integrals, fundamental theorem of calculus, techniques, numerical methods
- Series — sequences, convergence, power series, Taylor series
- Differential Equations — first/second order ODEs, Laplace transform, numerical methods
- Vector Calculus — vectors, multivariable calculus
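A short sketch of Newton's method from the Differentiation topics: iterate x ← x − f(x)/f′(x) until the step is negligible. The function f(x) = x² − 2 is a stand-in example chosen so the root is √2.

```python
# Newton's method: repeat x <- x - f(x)/f'(x) until the step is tiny.
def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Root of f(x) = x^2 - 2 is sqrt(2) ~ 1.41421356...
root = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)
print(root)
```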
03 — Probability & Statistics
Probability theory, distributions, Bayesian inference, hypothesis testing (conjugate-update sketch after the topic list)
- Probability Theory — axioms, conditional probability, Bayes’ theorem, expectation
- Probability Distributions — Bernoulli, Poisson, Gaussian, Beta, Dirichlet, exponential family
- Bayesian Inference — prior, posterior, MLE vs MAP, conjugate priors
- Hypothesis Testing — p-values, t-test, chi-squared, confidence intervals, multiple testing
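A sketch of a conjugate Beta-Bernoulli update contrasting MLE with MAP and the posterior mean. The prior pseudo-counts and the observed counts are illustrative values.

```python
# Conjugate update: prior Beta(a, b) plus k successes in n trials
# gives posterior Beta(a + k, b + n - k).
a, b = 2.0, 2.0        # prior pseudo-counts (illustrative)
k, n = 7, 10           # observed successes / trials (illustrative)

a_post, b_post = a + k, b + (n - k)

mle = k / n                                      # maximum likelihood
map_est = (a_post - 1) / (a_post + b_post - 2)   # posterior mode (MAP)
post_mean = a_post / (a_post + b_post)           # posterior mean

print(f"MLE={mle:.3f}  MAP={map_est:.3f}  mean={post_mean:.3f}")
```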
04 — Optimization
Convex optimization, gradient methods, constrained optimization (momentum sketch after the topic list)
- Convex Optimization — convex sets, functions, KKT conditions, duality
- Gradient Descent and Variants — SGD, momentum, Adam, learning rate schedules
- Lagrangian and Constrained Optimization — Lagrange multipliers, dual problem, SVM
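A sketch of gradient descent with heavy-ball momentum on the convex quadratic f(x) = ½xᵀQx − bᵀx, whose exact minimizer solves Qx = b, so the answer can be checked. Q, b, the learning rate, and the momentum coefficient are all illustrative choices.

```python
import numpy as np

Q = np.array([[3.0, 0.2],
              [0.2, 1.0]])      # positive definite (illustrative)
b = np.array([1.0, -1.0])

def grad(x):
    return Q @ x - b            # gradient of 0.5 x^T Q x - b^T x

x = np.zeros(2)
v = np.zeros(2)
lr, beta = 0.1, 0.9             # step size and momentum coefficient

for _ in range(200):
    v = beta * v + grad(x)      # momentum accumulator
    x = x - lr * v

print("found:", x, " exact:", np.linalg.solve(Q, b))
```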
05 — Statistical Learning Theory
Generalization, bias–variance, evaluation strategy, transfer learning (metrics sketch after the topic list)
- Supervised Learning — task setup, loss functions, generalization
- PAC Learning — PAC framework, sample complexity, agnostic PAC
- VC Dimension — shattering, growth function, fundamental theorem
- Generalization Bounds — ERM bound, uniform convergence, Rademacher complexity
- Bias–Variance Analysis — decomposition, underfitting vs overfitting
- Data Splits and Distribution — train/dev/test strategy, distribution mismatch
- Error Analysis — ceiling analysis, error categorization
- Evaluation Metrics — precision, recall, F1, AUC-ROC
- Multi-Task Learning — shared representations, hard/soft parameter sharing
- Orthogonalization — separating tuning concerns
- Transfer Learning — pretrain/fine-tune, domain adaptation
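A sketch of precision, recall, and F1 computed by hand from binary predictions; the label arrays are toy values made up for illustration.

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])   # toy labels
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0, 1, 0])   # toy predictions

tp = np.sum((y_pred == 1) & (y_true == 1))   # true positives
fp = np.sum((y_pred == 1) & (y_true == 0))   # false positives
fn = np.sum((y_pred == 0) & (y_true == 1))   # false negatives

precision = tp / (tp + fp)   # of predicted positives, how many are right
recall = tp / (tp + fn)      # of actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean

print(f"precision={precision:.2f}  recall={recall:.2f}  F1={f1:.2f}")
```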
06 — Deep Learning Theory
Neural network theory: backpropagation, optimization, regularization, architectures (backprop sketch after the topic list)
- Neural Network Notation — layers, activations, forward pass notation
- Activation Functions — sigmoid, ReLU, GELU, Swish, universal approximation
- Weight Initialization — Xavier, Kaiming, symmetry breaking
- Backpropagation — computational graph, chain rule, Jacobian chain
- Backpropagation Through Time — unrolled RNNs, gradient truncation
- Gradient Descent — SGD, mini-batch, learning rate schedules
- Adaptive Optimizers — Momentum, RMSProp, Adam, AdamW
- Gradient Checking — numerical gradient verification
- Dropout — inverted dropout, MC dropout, ensemble interpretation
- Batch Normalization — normalize over the batch, scale/shift, covariate shift
- Layer Normalization — normalize over features, Pre-LN, RMSNorm
- Residual Connections — skip connections, gradient highway, ensemble theory
- Cross-Entropy Loss — categorical and binary cross-entropy
- Triplet Loss — metric learning, anchor/positive/negative
- Vectorization and Broadcasting — batch operations, NumPy/PyTorch broadcasting
- Style Cost Function — neural style transfer loss
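A compact sketch of a forward pass, backpropagation, and a numerical gradient check for a two-layer ReLU network with mean squared error. The shapes, the Kaiming-style initialization, and all data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))                    # batch of 8, 4 features
y = rng.normal(size=(8, 1))                    # regression targets

W1 = rng.normal(size=(4, 5)) * np.sqrt(2 / 4)  # Kaiming-style init
W2 = rng.normal(size=(5, 1)) * np.sqrt(2 / 5)

def forward(W1, W2):
    h = np.maximum(X @ W1, 0.0)                # ReLU hidden layer
    out = h @ W2                               # linear output layer
    loss = np.mean((out - y) ** 2)             # mean squared error
    return loss, h, out

loss, h, out = forward(W1, W2)

# Backpropagation: chain rule through MSE -> linear -> ReLU -> linear.
d_out = 2 * (out - y) / y.size                 # dL/d(out)
dW2 = h.T @ d_out
dh = d_out @ W2.T
dh[h <= 0] = 0.0                               # ReLU gradient mask
dW1 = X.T @ dh

# Gradient check: central difference on one entry of W1.
eps = 1e-6
W1p, W1m = W1.copy(), W1.copy()
W1p[0, 0] += eps
W1m[0, 0] -= eps
num_grad = (forward(W1p, W2)[0] - forward(W1m, W2)[0]) / (2 * eps)
print("analytic:", dW1[0, 0], " numerical:", num_grad)
```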
Links
Navigation: Start with Linear Algebra →