Deep Learning Theory Index
Navigation hub for these deep learning theory notes.
Notes
Foundations
- Neural Network Notation — layers, activations, forward pass notation
- Activation Functions — sigmoid, ReLU, GELU, Swish, universal approximation
- Weight Initialization — Xavier, Kaiming, symmetry breaking
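As a quick illustration of the initialization schemes listed above, a minimal NumPy sketch of Kaiming (He) and Xavier (Glorot) initialization; the helper names are illustrative, not taken from the linked notes:

```python
import numpy as np

def kaiming_init(fan_in, fan_out, rng=None):
    """He/Kaiming normal init for ReLU layers: std = sqrt(2 / fan_in)."""
    rng = np.random.default_rng(rng)
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

def xavier_init(fan_in, fan_out, rng=None):
    """Xavier/Glorot uniform init: limit = sqrt(6 / (fan_in + fan_out))."""
    rng = np.random.default_rng(rng)
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

# Random (non-constant) draws also break the symmetry that would make
# all units in a layer compute identical gradients.
W = kaiming_init(512, 256, rng=0)
print(W.std())  # empirical std should be close to sqrt(2/512) ≈ 0.0625
```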
Training
- Backpropagation — computational graph, chain rule, Jacobian chain
- Backpropagation Through Time — unrolled RNNs, gradient truncation
- Gradient Descent — SGD, mini-batch, learning rate schedules
- Adaptive Optimizers — Momentum, RMSProp, Adam, AdamW
- Gradient Checking — numerical gradient verification
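The gradient-checking note above can be sketched in a few lines: compare an analytic gradient against a central-difference estimate. This is a generic sketch, not the code from the linked note:

```python
import numpy as np

def numerical_grad(f, x, eps=1e-5):
    """Central-difference estimate: (f(x+eps) - f(x-eps)) / (2*eps) per coordinate."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        old = x.flat[i]
        x.flat[i] = old + eps
        fp = f(x)
        x.flat[i] = old - eps
        fm = f(x)
        x.flat[i] = old  # restore
        grad.flat[i] = (fp - fm) / (2 * eps)
    return grad

# Check f(x) = sum(x**2), whose analytic gradient is 2*x.
x = np.array([1.0, -2.0, 3.0])
num = numerical_grad(lambda v: np.sum(v ** 2), x)
assert np.allclose(num, 2 * x, atol=1e-6)
```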
Regularization and normalization
- Dropout — inverted dropout, MC dropout, ensemble interpretation
- Batch Normalization — normalize over the batch dimension, learned scale/shift, internal covariate shift
- Layer Normalization — normalize over the feature dimension, Pre-LN vs Post-LN, RMSNorm
- Residual Connections — skip connections, gradient highway, ensemble theory
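For the dropout entry above, a minimal sketch of inverted dropout (the variant named in the note, where scaling happens at train time so inference is a no-op); the function name is illustrative:

```python
import numpy as np

def inverted_dropout(x, p_drop, train=True, rng=None):
    """Inverted dropout: zero units with prob p_drop, scale survivors by 1/(1-p_drop)."""
    if not train:
        return x  # inference path: identity, no rescaling needed
    rng = np.random.default_rng(rng)
    mask = (rng.random(x.shape) >= p_drop) / (1.0 - p_drop)
    return x * mask

x = np.ones(10000)
y = inverted_dropout(x, p_drop=0.5, rng=0)
print(y.mean())  # expectation is preserved: close to 1.0
```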
Loss functions
- Cross-Entropy Loss — categorical and binary cross-entropy
- Triplet Loss — metric learning, anchor/positive/negative
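A compact sketch of the categorical cross-entropy listed above, computed from raw logits with a log-sum-exp shift for numerical stability (helper name and example values are illustrative):

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean categorical cross-entropy over a batch of integer labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # stability shift
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
labels = np.array([0, 1])
loss = softmax_cross_entropy(logits, labels)
print(loss)
```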
Efficiency
- Vectorization and Broadcasting — batch operations, NumPy/PyTorch broadcasting
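Two small examples of the NumPy broadcasting the entry above refers to: adding a per-feature bias to a batch without a loop, and computing pairwise distances by inserting length-1 axes. These are generic illustrations, not code from the linked note:

```python
import numpy as np

# (batch, features) + (features,): the bias row is broadcast across the batch.
X = np.arange(6.0).reshape(2, 3)
b = np.array([10.0, 20.0, 30.0])
out = X + b  # shape (2, 3), no explicit tile or loop

# Pairwise squared distances via (n, 1, d) - (1, m, d) -> (n, m, d).
A = np.random.default_rng(0).normal(size=(4, 2))
D = ((A[:, None, :] - A[None, :, :]) ** 2).sum(axis=-1)
print(D.shape)  # (4, 4), with zeros on the diagonal
```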
Links
- Foundations
- Supervised Learning (→ 05_statistical_learning_theory)
- Logistic Regression (→ 03_probability_and_statistics)
Navigation: ← Statistical Learning Theory | Foundations Index