Instructor: Structured LLM Outputs

Purpose

Implementation patterns for using Instructor to extract validated, type-safe structured data from LLM responses. Instructor wraps a supported LLM client (OpenAI, Anthropic, Ollama, etc.) and uses a Pydantic model as the return-type schema. When the model's output fails validation, Instructor retries and passes the validation error back to the model as feedback. It is a widely used tool for building reliable data extraction, classification, and structured generation pipelines.
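The validate-then-retry idea can be sketched without any LLM or third-party package. This is a conceptual illustration only, not Instructor's implementation: `validate`, `extract_with_retry`, and `fake_model` are hypothetical names, and a hand-written check stands in for Pydantic validation.

```python
import json
from dataclasses import dataclass


@dataclass
class UserProfile:
    name: str
    age: int


def validate(raw: str) -> UserProfile:
    """Parse raw JSON and enforce the schema; raise ValueError on any problem."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if not isinstance(data.get("name"), str):
        raise ValueError("name must be a string")
    age = data.get("age")
    if not isinstance(age, int) or isinstance(age, bool) or not 0 <= age <= 120:
        raise ValueError("age must be an integer between 0 and 120")
    return UserProfile(name=data["name"], age=age)


def extract_with_retry(call_model, messages, max_retries=2):
    """Call the model, validate, and re-ask with the error text on failure."""
    messages = list(messages)  # do not mutate the caller's list
    for attempt in range(max_retries + 1):
        raw = call_model(messages)
        try:
            return validate(raw)
        except ValueError as err:
            if attempt == max_retries:
                raise
            # Feed the failed output and the error back so the model can correct it.
            messages.append({"role": "assistant", "content": raw})
            messages.append(
                {"role": "user", "content": f"Validation failed: {err}. Please fix and resend."}
            )


# Hypothetical stand-in for an LLM: wrong on the first call, right on the second.
def fake_model(messages):
    if len(messages) == 1:
        return '{"name": "Jane Doe", "age": "twenty-eight"}'
    return '{"name": "Jane Doe", "age": 28}'


profile = extract_with_retry(fake_model, [{"role": "user", "content": "Jane Doe is 28"}])
print(profile)
```

The first reply fails the age check, the error text is appended to the conversation, and the second reply validates. With Instructor, the same loop is driven by the `response_model` and `max_retries` arguments.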

Examples

Entity extraction from unstructured text
Document classification with confidence scores
Structured reasoning outputs (chain-of-thought → typed answer)
Multi-entity batch extraction pipelines

Architecture

Installation:

pip install instructor pydantic
# Provider extras:
pip install "instructor[anthropic]"  # Claude
pip install "instructor[openai]"     # OpenAI

Basic extraction pattern:

import instructor
from pydantic import BaseModel, Field
from anthropic import Anthropic
 
client = instructor.from_anthropic(Anthropic())
 
class UserProfile(BaseModel):
    name: str = Field(description="Full name")
    age: int  = Field(ge=0, le=120, description="Age in years")
    email: str = Field(description="Email address")
 
profile = client.messages.create(
    model="claude-3-5-haiku-latest",
    max_tokens=512,
    messages=[{
        "role": "user",
        "content": "Jane Doe is 28 years old, her email is jane@example.com"
    }],
    response_model=UserProfile
)
print(profile.name)   # "Jane Doe"
print(profile.age)    # 28

With OpenAI:

from openai import OpenAI
 
client = instructor.from_openai(OpenAI())
 
result = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=UserProfile,
    messages=[{"role": "user", "content": "Extract: Bob Smith, 35, bob@co.com"}]
)

Nested models and enums:

from enum import Enum
from typing import Optional
 
class Sentiment(str, Enum):
    POSITIVE  = "positive"
    NEGATIVE  = "negative"
    NEUTRAL   = "neutral"
 
class Address(BaseModel):
    city:    str
    country: str
 
class ReviewAnalysis(BaseModel):
    sentiment:       Sentiment
    confidence:      float = Field(ge=0.0, le=1.0)
    key_points:      list[str] = Field(max_length=5, description="Main points, at most 5 items")
    address_mention: Optional[Address] = None
 
analysis = client.messages.create(
    model="claude-3-5-haiku-latest",
    max_tokens=512,
    messages=[{"role": "user", "content": "Analyze: 'Great product, shipped fast from London!'"}],
    response_model=ReviewAnalysis
)

Custom validators with automatic retry:

from pydantic import field_validator
 
class StructuredCode(BaseModel):
    language: str
    code:     str
    
    @field_validator("code")
    @classmethod
    def no_placeholder_comments(cls, v: str) -> str:
        # Reject stubs: a TODO marker, or a body that is only `pass`
        if "# TODO" in v or v.strip() == "pass":
            raise ValueError("Code must be complete, not a stub")
        return v
 
# Instructor sends the ValueError message back to the LLM and retries
snippet = client.messages.create(
    model="claude-3-5-haiku-latest",
    max_tokens=512,
    max_retries=3,
    messages=[{"role": "user", "content": "Write a Python hello world function"}],
    response_model=StructuredCode
)
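A validator like this can be exercised locally before spending any API calls. The sketch below assumes Pydantic v2 and redefines the model so that it runs on its own. It only shows what the validation error looks like; the retry itself is Instructor's job.

```python
from pydantic import BaseModel, ValidationError, field_validator


class StructuredCode(BaseModel):
    language: str
    code: str

    @field_validator("code")
    @classmethod
    def no_placeholder_comments(cls, v: str) -> str:
        if "# TODO" in v or v.strip() == "pass":
            raise ValueError("Code must be complete, not a stub")
        return v


# A stub is rejected; the collected messages carry the ValueError text.
try:
    StructuredCode(language="python", code="pass")
except ValidationError as err:
    messages = [e["msg"] for e in err.errors()]
    print(messages)

# A complete snippet passes through unchanged.
ok = StructuredCode(language="python", code="def hello():\n    print('hello world')")
print(ok.language)
```

Keeping the error message specific and actionable matters, because that text is what the model sees when it is asked to try again.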

Streaming partial results:

from pydantic import BaseModel
 
class Report(BaseModel):
    title:    str
    sections: list[str]
    summary:  str
 
# Stream as fields populate — good for real-time UI updates
for partial in client.messages.create_partial(
    model="claude-3-5-haiku-latest",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a short report on transformers"}],
    response_model=Report
):
    print(f"\rTitle: {partial.title or '...'}", end="")

Streaming iterable of items:

from typing import Literal
 
class Task(BaseModel):
    title:    str
    priority: Literal["high", "medium", "low"]  # validated, not just documented
 
# Returns items one by one as LLM generates them
for task in client.messages.create_iterable(
    model="claude-3-5-haiku-latest",
    max_tokens=512,
    messages=[{"role": "user", "content": "List 5 ML project tasks"}],
    response_model=Task
):
    print(f"- [{task.priority}] {task.title}")

Batch extraction pipeline:

from concurrent.futures import ThreadPoolExecutor
 
def extract_one(text: str) -> UserProfile:
    return client.messages.create(
        model="claude-3-5-haiku-latest",
        max_tokens=256,
        messages=[{"role": "user", "content": text}],
        response_model=UserProfile
    )
 
texts = ["Alice, 30, alice@co.com", "Bob Smith 25 bob@co.com"]
with ThreadPoolExecutor(max_workers=4) as pool:
    profiles = list(pool.map(extract_one, texts))
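One caveat with `pool.map`: it re-raises the first exception when results are consumed, so a single bad input aborts the whole `list(...)`. A minimal sketch of collecting successes and failures separately follows. Here `extract_one` is a hypothetical stand-in that parses locally instead of calling an LLM, and `extract_batch` is an illustrative helper, not part of Instructor.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def extract_one(text: str) -> dict:
    """Hypothetical stand-in for the Instructor call; fails on empty input."""
    if not text.strip():
        raise ValueError("empty input")
    name, age = text.split(",")[:2]
    return {"name": name.strip(), "age": int(age)}


def extract_batch(texts, max_workers=4):
    """Run extract_one over texts; return (results, failures) keyed by input index."""
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(extract_one, t): i for i, t in enumerate(texts)}
        for fut in as_completed(futures):
            i = futures[fut]
            try:
                results[i] = fut.result()
            except Exception as exc:  # keep going; record the failure for later review
                failures[i] = repr(exc)
    return results, failures


results, failures = extract_batch(["Alice, 30, alice@co.com", "", "Bob, 25, bob@co.com"])
print(sorted(results), failures)
```

Keying both dictionaries by input index makes it easy to retry or log only the failed items afterwards.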

Using local models via Ollama:

from openai import OpenAI
 
client = instructor.from_openai(
    OpenAI(base_url="http://localhost:11434/v1", api_key="ollama"),  # key is required by the SDK but ignored by Ollama
    mode=instructor.Mode.JSON
)
 
result = client.chat.completions.create(
    model="llama3.1",
    response_model=UserProfile,
    messages=[{"role": "user", "content": "Extract: ..."}]
)

Key design guidelines:

Use Field(description=...) — the description is injected into the JSON schema the LLM sees
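Use Field(ge=..., le=..., min_length=...) to constrain values; the limits are included in the schema the LLM sees, which helps it produce valid output on the first attempt and reduces retries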
Keep models flat when possible; deeply nested models increase generation difficulty
For long batches, set max_retries=1 to fail fast and log failures separately
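To see what the first two guidelines put in front of the model, the generated schema can be inspected directly. A small sketch, assuming Pydantic v2 (`model_json_schema`) and a trimmed-down copy of the `UserProfile` model:

```python
from pydantic import BaseModel, Field


class UserProfile(BaseModel):
    name: str = Field(description="Full name")
    age: int = Field(ge=0, le=120, description="Age in years")


schema = UserProfile.model_json_schema()
age_schema = schema["properties"]["age"]

# The description and the numeric bounds are both part of the schema.
print(age_schema["description"], age_schema["minimum"], age_schema["maximum"])
```

Printing the full `schema` before running a pipeline is a cheap way to check that field descriptions read well as instructions.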

References

Instructor Documentation
