06 — AI Engineering

Engineering discipline for building, evaluating, and operating applications on top of foundation models — from model selection and evaluation through fine-tuning, inference optimization, and production feedback loops.

Guiding question: “How do we build reliable, useful systems with foundation models?”

This layer does NOT cover: general ML infrastructure and pipelines (→ 05_ml_engineering), classical model selection and statistical trade-offs (→ 03_modeling), or Transformer mathematical derivations (→ 01_foundations/06_deep_learning_theory).

Sublayers

01 — Foundation Models

Model families, scaling laws, alignment (RLHF/DPO/GRPO), tokenization.
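The scaling-law material lends itself to a quick numeric sketch. Below is a back-of-the-envelope calculator for the Chinchilla compute-optimal heuristic (roughly 20 training tokens per parameter) and the standard C ≈ 6·N·D training-FLOPs estimate; both constants are rules of thumb, not exact values from the paper:

```python
def chinchilla_optimal_tokens(n_params: float) -> float:
    """Approximate compute-optimal training tokens: ~20 per parameter."""
    return 20.0 * n_params

def training_flops(n_params: float, n_tokens: float) -> float:
    """Standard dense-Transformer estimate: C ≈ 6 * N * D FLOPs."""
    return 6.0 * n_params * n_tokens

n = 7e9                            # a 7B-parameter model
d = chinchilla_optimal_tokens(n)   # ≈ 1.4e11 tokens (140B)
c = training_flops(n, d)           # ≈ 5.9e21 FLOPs
```

Useful for sanity-checking whether a proposed training run is badly over- or under-trained for its compute budget.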

02 — Evaluation

LLM evaluation taxonomy, benchmarks (MMLU/HumanEval), AI-as-judge, LM Evaluation Harness.
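Many benchmark scores reduce to exact-match accuracy over normalized model outputs. A minimal sketch of that scoring loop (function names are illustrative, not from any specific harness):

```python
def normalize(text: str) -> str:
    """Lowercase and strip whitespace, a common step before exact match."""
    return text.strip().lower()

def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions that exactly match their reference answer."""
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)

preds = ["Paris", " paris ", "Lyon"]
refs  = ["Paris", "Paris", "Marseille"]
acc = exact_match_accuracy(preds, refs)  # 2/3: two of three match after normalization
```

Real harnesses add per-task normalization rules (article stripping, number formats), which is where most scoring disagreements come from.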

03 — Prompt Engineering

CoT, few-shot, structured outputs (Instructor/Guidance), prompt injection defense.
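Few-shot prompting is mostly careful string assembly: instruction, labeled examples, then the query in the same format. A minimal sketch of a few-shot prompt builder (names and template format are illustrative):

```python
def build_few_shot_prompt(instruction: str,
                          examples: list[tuple[str, str]],
                          query: str) -> str:
    """Assemble instruction + few-shot examples + query into one prompt."""
    parts = [instruction, ""]
    for inp, out in examples:
        parts.append(f"Input: {inp}")
        parts.append(f"Output: {out}")
        parts.append("")
    parts.append(f"Input: {query}")
    parts.append("Output:")  # the model continues from here
    return "\n".join(parts)

prompt = build_few_shot_prompt(
    "Classify the sentiment as positive or negative.",
    [("great movie", "positive"), ("waste of time", "negative")],
    "surprisingly good",
)
```

Ending the prompt at `Output:` is the point: the demonstrations teach the format, and the model completes the final slot.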

04 — RAG & Agents

RAG architectures, vector stores (Chroma/FAISS), agentic loop, function calling, multi-agent (CrewAI), DSPy.
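The retrieval half of RAG can be sketched end to end without a vector store: embed the documents, rank them by similarity to the query, keep the top k. Here a toy bag-of-words embedding and cosine similarity stand in for a real dense encoder and an index such as FAISS or Chroma:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system uses a dense encoder."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by similarity to the query and keep the top k."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

docs = [
    "LoRA adds low-rank adapters to frozen weights",
    "vLLM serves models with paged attention",
    "RAG augments prompts with retrieved context",
]
top = retrieve("how does RAG retrieve context", docs, k=1)
```

The retrieved passages are then spliced into the prompt; everything downstream (chunking, reranking, hybrid search) is refinement of this loop.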

05 — Fine-tuning

LoRA/QLoRA (Axolotl/LLaMA-Factory), RLHF/DPO/GRPO (TRL), fine-tuning strategy.
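LoRA's core idea fits in a few lines: the frozen weight W is augmented with a low-rank update scaled by alpha/r, so the forward pass computes y = W x + (alpha/r)·B A x. A minimal sketch with toy matrices (in practice a PEFT library applies this inside each target layer):

```python
def matmul(A: list[list[float]], B: list[list[float]]) -> list[list[float]]:
    """Naive matrix multiply, enough for small illustrative matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def lora_forward(W, A, B, x, alpha: float = 16.0, r: int = 8):
    """y = W x + (alpha / r) * B A x: frozen weight plus low-rank update."""
    scale = alpha / r
    base = matmul(W, x)                 # frozen path, never updated
    delta = matmul(B, matmul(A, x))     # trainable low-rank path
    return [[b[0] + scale * d[0]] for b, d in zip(base, delta)]

W = [[1, 0], [0, 1]]   # frozen base weight (2x2)
A = [[1, 1]]           # rank-1 down-projection (1x2)
B = [[1], [0]]         # rank-1 up-projection (2x1)
x = [[2], [3]]         # input column vector
y = lora_forward(W, A, B, x, alpha=2, r=1)  # [[12.0], [3.0]]
```

Only A and B are trained, which is why adapter checkpoints are a tiny fraction of the full model's size.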

06 — Dataset Engineering

Instruction data design, synthetic data generation (Self-Instruct, Constitutional AI).
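A key step in Self-Instruct-style pipelines is filtering generated instructions that are too similar to ones already in the pool. The paper uses a ROUGE-L threshold; the sketch below substitutes a simpler Jaccard word-overlap score to show the shape of the filter:

```python
def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of word sets; Self-Instruct uses ROUGE-L instead."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def filter_novel(candidates: list[str], pool: list[str],
                 threshold: float = 0.7) -> list[str]:
    """Keep candidates that overlap below threshold with every pooled instruction."""
    kept: list[str] = []
    for cand in candidates:
        if all(token_overlap(cand, p) < threshold for p in pool + kept):
            kept.append(cand)
    return kept

pool = ["Write a poem about the sea"]
candidates = [
    "Write a poem about the sea today",        # near-duplicate, dropped
    "Summarize this article in one sentence",  # novel, kept
]
novel = filter_novel(candidates, pool)
```

Comparing against `pool + kept` matters: without it, a batch of mutually similar candidates would all pass the filter.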

07 — Inference Optimization

Quantization (AWQ/GPTQ/GGUF/bitsandbytes), Flash Attention, KV cache, vLLM/llama.cpp.
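The simplest scheme among the quantization methods listed is symmetric absmax int8: divide by the largest absolute weight so values fit in [-127, 127], round, and keep the scale for dequantization. A minimal sketch on a flat weight list (real methods like AWQ and GPTQ quantize per group and correct for activation statistics):

```python
def quantize_absmax(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric int8 quantization: scale by max |w|, round into [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from int8 codes and the scale."""
    return [v * scale for v in q]

w = [0.12, -0.5, 0.33, 0.04]
q, s = quantize_absmax(w)
w_hat = dequantize(q, s)  # close to w, with small rounding error
```

The whole tensor now costs one byte per weight plus a single float scale, at the price of a bounded rounding error per element.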

08 — Architecture & Feedback

AI application architecture, LLM observability (LangSmith), safety (LlamaGuard), data flywheel.

Relationship to Other Layers

  • 05_ml_engineering — the production ML infrastructure this layer builds on, extended here with foundation-model-specific concerns
  • 04_software_engineering — general software patterns applied to LLM system design
  • 08_implementations — concrete implementations synthesizing this layer’s concepts into working code

8 items under this folder.