Tag Vocabulary

The complete controlled vocabulary for Vault_2026. See Conventions §12 for rules and application guidelines.

Key rules: kebab-case only · ≤4 tags per note · no structural redundancy · no tool names · index files always `tags: []`
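To make the rules concrete, a compliant frontmatter block might look like this (the note and its tags are hypothetical, chosen to match the "Use when" guidance below, not taken from the vault):

```yaml
---
# Hypothetical note on walk-forward validation for forecasting models
tags: [evaluation, time-series]  # kebab-case, at most 4 tags, no tool names
---
```

An index file would instead carry `tags: []`, per the rules above.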


Dimension 1 — Concept Type

Tags that classify what kind of knowledge a note contains.

| Tag | Definition | Use when | Not for |
| --- | --- | --- | --- |
| `algorithm` | A named, step-by-step computational procedure | A note defines or analyzes a specific algorithm (backpropagation, k-means, ARIMA) | Generic topic notes; notes named after the algorithm already convey this |
| `pattern` | An architectural or design pattern | A note describes a reusable solution structure (RAG pipeline, feature store, training pipeline pattern) | One-off implementations; notes that just happen to follow a pattern |
| `theory` | Mathematical or statistical theory | A note proves or formally defines a theoretical concept (PAC learning, VC dimension, convexity) | Intuitive overviews; applied engineering notes |
| `workflow` | An operational process or practice | A note describes a repeatable process (experiment tracking, CI/CD pipeline, data labeling workflow) | Single-tool reference notes; conceptual introductions |

Dimension 2 — ML Lifecycle Stage

Tags that mark where in the ML/AI production lifecycle a note is relevant. Do not apply a stage tag if the note's folder already encodes that stage (e.g., don't add `training` to notes already inside `05_ml_engineering/05_model_development/`).

| Tag | Definition | Use when | Not for |
| --- | --- | --- | --- |
| `data` | Data collection, ingestion, preprocessing, validation, augmentation | A note outside the data engineering sublayer that is primarily about data concerns | Notes inside `05_ml_engineering/02_data_engineering/` |
| `feature-engineering` | Feature creation, encoding, selection, feature stores | A note outside the feature engineering sublayer that covers feature transformation | Notes inside `05_ml_engineering/04_feature_engineering/` |
| `training` | Model fitting, optimization, regularization, hyperparameter tuning | A foundations or modeling note that is specifically about the training process (not just theory) | Notes inside `05_ml_engineering/05_model_development/` |
| `evaluation` | Metrics, benchmarking, cross-validation, calibration, model validation | Any note that is primarily about measuring model quality | Generic overview notes |
| `deployment` | Serving, inference, containerization, rollout strategies, model compression | Notes about putting models into production | Notes inside `05_ml_engineering/06_deployment_and_serving/` |
| `monitoring` | Drift detection, alerting, observability, retraining triggers | Notes about tracking model health post-deployment | Notes inside `05_ml_engineering/07_monitoring_and_observability/` |
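The structural-redundancy rule above can be sketched as a small check (a hypothetical helper, not part of any vault tooling; the folder paths are taken from the "Not for" column):

```python
# Drop a lifecycle-stage tag when the note's folder already encodes that
# stage. Folder paths come from the "Not for" column of the table above.
STAGE_FOLDERS = {
    "data": "05_ml_engineering/02_data_engineering/",
    "feature-engineering": "05_ml_engineering/04_feature_engineering/",
    "training": "05_ml_engineering/05_model_development/",
    "deployment": "05_ml_engineering/06_deployment_and_serving/",
    "monitoring": "05_ml_engineering/07_monitoring_and_observability/",
}

def drop_redundant_stage_tags(note_path: str, tags: list[str]) -> list[str]:
    """Remove stage tags already implied by the note's folder."""
    return [
        tag for tag in tags
        if not (tag in STAGE_FOLDERS and STAGE_FOLDERS[tag] in note_path)
    ]
```

Note that `evaluation` has no folder entry: it applies anywhere, since no sublayer encodes it.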

Dimension 3 — Domain / Modality

Tags that mark what kind of data or problem domain a note addresses. Do not apply a domain tag if the folder already encodes the domain (e.g., don't add `nlp` to notes inside a transformer or NLP sublayer).

| Tag | Definition | Use when | Not for |
| --- | --- | --- | --- |
| `llm` | Large language model systems | A note is about LLM-specific concerns (prompting, fine-tuning, inference, evaluation of LLMs) and is not already in `06_ai_engineering/`, or is in `06_ai_engineering/` but also highly relevant from an MLOps/engineering angle | Generic NLP notes; notes already inside `06_ai_engineering/`, where `llm` is the default domain |
| `nlp` | Natural language processing tasks that are not specifically LLM-centric | Classical NLP, tokenization theory, text preprocessing, sentiment analysis, named entity recognition | LLM-specific notes (use `llm`) |
| `vision` | Computer vision, image understanding | Notes about CNNs, object detection, image segmentation, visual models | Notes already inside a CV sublayer |
| `time-series` | Temporal data, forecasting, sequential modeling of time | ARIMA, Prophet, state-space models, walk-forward validation | General sequence models (use `nlp` or `algorithm`) |
| `tabular` | Structured/relational data, classical ML | Tree ensembles, logistic regression, feature engineering for tabular data | Deep learning notes on tabular data (use `algorithm` + `tabular`) |
| `multimodal` | Cross-modal systems (vision+language, audio+text, etc.) | CLIP, BLIP, vision-language models, audio captioning | Single-modality notes |
| `recommendation` | Collaborative filtering, ranking, retrieval for recommender systems | Matrix factorization, two-tower models, CTR prediction | General retrieval notes (use `retrieval`) |

Dimension 4 — Capability / Task Type

Tags that mark what the system or model does.

| Tag | Definition | Use when | Not for |
| --- | --- | --- | --- |
| `classification` | Supervised classification | Notes about classifiers, classification loss, ROC/AUC, multiclass strategies | Regression notes |
| `regression` | Supervised regression | Notes about regression models, MSE/MAE, prediction intervals | Classification notes |
| `clustering` | Unsupervised grouping | k-means, DBSCAN, GMM, hierarchical clustering | Dimensionality reduction (unless combined) |
| `generation` | Generative models and text/image/audio generation | LLMs, diffusion models, VAEs, GANs, seq2seq generation | Discriminative models |
| `retrieval` | Search, nearest-neighbor, RAG, vector databases | Embedding search, FAISS, Chroma, BM25, hybrid search, reranking | Pure generation without retrieval |
| `reasoning` | Chain-of-thought, planning, agents, step-by-step inference | ReAct agents, DSPy, tool-calling, multi-step reasoning | Straightforward inference |
| `forecasting` | Time-series prediction | ARIMA, Prophet, temporal models predicting future values | General regression (use `regression`) |
| `anomaly-detection` | Outlier and novelty detection | Isolation Forest, one-class SVM, statistical anomaly tests | Classification of rare events (use `classification`) |
| `interpretability` | Explainability, feature attribution, fairness, model transparency | SHAP, LIME, PDP, fairness metrics, model cards | Evaluation notes (use `evaluation`) |

Dimension 5 — Infrastructure / System Concern

Tags that mark cross-cutting production system concerns.

| Tag | Definition | Use when | Not for |
| --- | --- | --- | --- |
| `mlops` | ML production lifecycle operations | Notes about the operational concerns of running ML systems (pipelines, registries, orchestration) that appear outside `05_ml_engineering/` | Notes inside `05_ml_engineering/` (structural redundancy) |
| `llmops` | LLM-specific production operations | Notes about operating LLM systems in production (model gateways, prompt management, LLM observability) that appear outside `06_ai_engineering/` | Notes inside `06_ai_engineering/08_architecture_and_feedback/` |
| `distributed` | Multi-GPU, multi-node, parallelism, sharding | Notes about distributed training or inference | Single-GPU notes |
| `quantization` | Model compression via reduced precision (INT8, INT4, GGUF, AWQ, GPTQ) | Notes about quantization methods and trade-offs | General compression/pruning (use `deployment`) |
| `fine-tuning` | Adapting pretrained models: LoRA, PEFT, RLHF, SFT, DPO, GRPO | Notes about fine-tuning methods and workflows | Pretraining from scratch; general transfer learning |
| `safety` | Alignment, guardrails, content moderation, red-teaming, constitutional AI | Notes about making AI systems safe and aligned | General evaluation (use `evaluation`) |

Synonym → Canonical Mapping

Use this when migrating old tags:

| Old tag(s) | Canonical tag | Reason |
| --- | --- | --- |
| `finetuning`, `sft`, `lora`, `qlora`, `peft`, `rlhf`, `dpo`, `grpo`, `adapter`, `adapters` | `fine-tuning` | All refer to the same lifecycle concern |
| `serving`, `model-serving`, `ml-serving`, `inference-api`, `production-ml` | `deployment` | All refer to production serving |
| `metrics`, `benchmarks`, `benchmarking`, `offline-evaluation`, `llm-evaluation`, `model-validation`, `llm-eval` | `evaluation` | All about measuring quality |
| `concept-drift`, `data-drift`, `distribution-shift`, `drift`, `drift-detection` | `monitoring` | All about production health |
| `vector-store`, `vector-database`, `vector-db`, `vector-search`, `similarity-search` | `retrieval` | All vector search patterns |
| `ranking`, `reranking`, `re-ranking` | `retrieval` | Ranking is part of the retrieval pipeline; use `retrieval` |
| `deep_learning`, `deep-learning`, `neural-networks`, `neural_networks` | remove (structural) | Redundant with `01_foundations/06_deep_learning_theory/` |
| `mlops`, `ml-ops`, `ml-pipelines`, `ml-systems`, `ml-lifecycle` | `mlops` | When outside `05_ml_engineering/`; otherwise remove |
| `computer_vision` | `vision` | Canonical form |
| `sequence_models`, `rnn`, `lstm`, `gru`, `BPTT` | remove or `nlp` | Structural (inside transformer/RNN sublayers) |
| `transformers`, `transformer`, `attention`, `bert`, `gpt` | remove (structural) | Already inside transformer sublayers |
| `python`, `go`, `typescript`, `javascript` | remove | Inside language-specific sublayers |
| `end-to-end` | remove | Folder `08_implementations/02_end_to_end_examples/` encodes this |
| `production`, `production-ml` | remove or `deployment` | Usually structural redundancy |