Training Data Index

Notes on constructing high-quality training datasets: labeling, class imbalance, and versioning.

Notes

Data Labeling — annotation workflows, weak supervision, and quality control for constructing ground-truth datasets
Class Imbalance and Augmentation — oversampling, undersampling, SMOTE, and augmentation strategies for skewed label distributions
Dataset Versioning — DVC-based lineage, data contracts, and reproducible data snapshots