DSPy Prompt Optimization
Purpose
Implementation patterns for using DSPy (Stanford NLP) to build declarative, optimizable LLM pipelines. Instead of hand-crafting prompt strings, DSPy separates the structure of a task (a Signature) from the optimization of how to perform it. Optimizers such as BootstrapFewShot and MIPRO generate few-shot demonstrations and improved instructions from labelled data, typically improving task accuracy by 10–30% over manual prompting, though the gain depends on the task, model, and metric.
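The idea behind bootstrapped few-shot optimization can be illustrated without DSPy itself. What follows is a toy sketch, not DSPy's implementation: `toy_program`, `contains_answer`, and `bootstrap_demos` are hypothetical names, and the "program" is a fixed lookup table standing in for an LM call. It shows the core loop: run the program on labelled examples, score each output with a metric, and keep the passing runs as demonstrations.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """A labelled example: an input question and its gold answer."""
    question: str
    answer: str


@dataclass
class Prediction:
    """Stand-in for a program's output."""
    answer: str


def toy_program(question: str) -> Prediction:
    """Hypothetical stand-in for an LM-backed module (fixed lookup, no model call)."""
    canned = {
        "What is GQA?": "GQA stands for Grouped Query Attention.",
        "What is RAG?": "I am not sure.",
    }
    return Prediction(answer=canned.get(question, ""))


def contains_answer(example: Example, pred: Prediction, trace=None) -> bool:
    """Metric with the (example, pred, trace) shape: gold answer appears in the prediction."""
    return example.answer.lower() in pred.answer.lower()


def bootstrap_demos(program, trainset, metric, max_demos=4):
    """Keep (question, answer) pairs from runs that pass the metric, up to max_demos."""
    demos = []
    for example in trainset:
        if len(demos) >= max_demos:
            break
        pred = program(example.question)
        if metric(example, pred):
            demos.append((example.question, pred.answer))
    return demos


trainset = [
    Example("What is GQA?", "Grouped Query Attention"),
    Example("What is RAG?", "Retrieval-Augmented Generation"),
]
demos = bootstrap_demos(toy_program, trainset, contains_answer)
print(demos)
```

In DSPy the retained demonstrations are attached to the module's predictors as few-shot examples; this sketch stops at collecting them.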
Examples
- Multi-hop question answering with automatic few-shot examples
- RAG pipeline with auto-tuned retrieval and generation prompts
- Agent with ReAct reasoning and data-driven optimization
Architecture
Installation and LM configuration:
pip install dspy

import dspy
# Configure LM — any OpenAI-compatible, Anthropic, or local model
lm = dspy.LM("openai/gpt-4o-mini", api_key="sk-...")
# Or: dspy.LM("anthropic/claude-3-5-haiku-latest")
# Or: dspy.LM("ollama_chat/llama3.1", base_url="http://localhost:11434")
dspy.configure(lm=lm)

Signatures (the core abstraction):
# Inline form (quick)
qa = dspy.Predict("question -> answer")
# Class form (recommended — add docstring and field constraints)
class SummarizeArticle(dspy.Signature):
    """Summarize a news article into 3–5 bullet points."""
    article: str = dspy.InputField()
    summary: str = dspy.OutputField(desc="Bullet points starting with '•'")
# Chain-of-Thought adds a reasoning field automatically
cot = dspy.ChainOfThought(SummarizeArticle)
result = cot(article="Full article text here...")
print(result.reasoning) # intermediate reasoning
print(result.summary)  # final output

Composing a multi-stage module:
class MultiHopQA(dspy.Module):
    def __init__(self):
        super().__init__()  # initialize dspy.Module before registering sub-modules
        self.gen_query = dspy.ChainOfThought("question -> search_query")
        self.retrieve = dspy.Retrieve(k=3)  # needs a retriever configured
        self.gen_answer = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question: str) -> dspy.Prediction:
        # Hop 1: turn the question into a search query
        query = self.gen_query(question=question).search_query
        # Hop 2: retrieve passages and join them into one context string
        context = "\n".join(self.retrieve(query).passages)
        # Hop 3: answer from the retrieved context
        answer = self.gen_answer(context=context, question=question).answer
        return dspy.Prediction(answer=answer)

qa = MultiHopQA()
print(qa(question="Who invented the transformer architecture?").answer)

Optimization with BootstrapFewShot:
from dspy.teleprompt import BootstrapFewShot
# Labelled training examples (50–200 sufficient for BootstrapFewShot)
trainset = [
    dspy.Example(question="What is GQA?", answer="Grouped Query Attention...").with_inputs("question"),
    # ... more examples
]

def exact_match(example, pred, trace=None):
    # Lenient check despite the name: passes when the gold answer appears
    # (case-insensitively) anywhere inside the predicted answer
    return example.answer.lower() in pred.answer.lower()

optimizer = BootstrapFewShot(metric=exact_match, max_bootstrapped_demos=4)
optimized_qa = optimizer.compile(qa, trainset=trainset)
optimized_qa.save("models/multihop_qa_v1.json")

Optimization with MIPRO (stronger, needs more data):
from dspy.teleprompt import MIPROv2
optimizer = MIPROv2(metric=exact_match, auto="medium")  # auto selects the number of trials
optimized = optimizer.compile(
    qa,
    trainset=trainset,
    requires_permission_to_run=False,  # skip the interactive confirmation in releases that prompt for it
)

Typed output with Pydantic:
from pydantic import BaseModel

class Entity(BaseModel):
    name: str
    type: str  # "person" | "org" | "location"
    relevance: float

class ExtractEntities(dspy.Signature):
    """Extract named entities from the text."""
    text: str = dspy.InputField()
    entities: list[Entity] = dspy.OutputField()

# Recent DSPy releases parse typed output fields in dspy.Predict directly;
# dspy.TypedPredictor was the older API and is no longer available in them.
extractor = dspy.Predict(ExtractEntities)
result = extractor(text="DeepMind, a subsidiary of Alphabet, published Gemini...")
for e in result.entities:
    print(e.name, e.type)

ReAct agent with tools:
def search_web(query: str) -> str:
    """Search the web for current information."""
    # Placeholder: integrate with Tavily, SerpAPI, etc. and return the result text
    return f"No search backend configured; query was: {query}"

class AnswerWithSearch(dspy.Signature):
    """Answer the question, using search if needed."""
    question: str = dspy.InputField()
    answer: str = dspy.OutputField()

agent = dspy.ReAct(AnswerWithSearch, tools=[search_web])
print(agent(question="Latest vLLM release version?").answer)

Evaluation:
from dspy.evaluate import Evaluate
# testset: held-out dspy.Example items built like trainset and kept separate from it
evaluator = Evaluate(devset=testset, metric=exact_match, num_threads=4)
print("Before:", evaluator(qa))
print("After: ", evaluator(optimized_qa))

Saving and loading:
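optimized_qa.save("models/qa_v1.json")
# Load the saved state into a fresh instance of the same module class
loaded_qa = MultiHopQA()
loaded_qa.load("models/qa_v1.json")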
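
When several optimized versions accumulate, it can help to pick file names automatically and to make sure the target directory exists before saving. The following is a minimal sketch using only the standard library, assuming that `save` writes to the path as given and does not create missing parent directories. `next_version_path` is a hypothetical helper, and the `<stem>_v<N>.json` naming simply mirrors the file names used above.

```python
import re
import tempfile
from pathlib import Path


def next_version_path(directory, stem: str) -> Path:
    """Return <directory>/<stem>_v<N>.json, with N one above the highest existing version."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    pattern = re.compile(rf"^{re.escape(stem)}_v(\d+)\.json$")
    versions = [
        int(m.group(1))
        for p in directory.iterdir()
        if (m := pattern.match(p.name))
    ]
    return directory / f"{stem}_v{max(versions, default=0) + 1}.json"


# Demonstration in a temporary directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    models_dir = Path(tmp) / "models"
    first = next_version_path(models_dir, "qa")
    first.write_text("{}")  # stands in for optimized_qa.save(str(first))
    second = next_version_path(models_dir, "qa")
    first_name, second_name = first.name, second.name

print(first_name, second_name)
```

With DSPy this would be used as `optimized_qa.save(str(next_version_path("models", "qa")))`, so each optimization run writes a new file instead of overwriting the previous one.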