Chroma Vector Store

Purpose

Implementation patterns for using Chroma as an embedding database in RAG and semantic search applications. Chroma provides a minimal core API (add, query, get, delete), persistent storage, rich metadata filtering, and seamless LangChain/LlamaIndex integration. It is the standard choice for local development and open-source RAG projects before graduating to managed alternatives such as Pinecone or Qdrant.

Examples

  • RAG over internal documentation
  • Semantic search across product catalogue
  • Long-term agent memory store

Architecture

Persistence and client setup:

import chromadb
from chromadb.utils import embedding_functions
 
# Persistent local storage — survives process restarts
client = chromadb.PersistentClient(path="./chroma_db")
 
# OpenAI embeddings (recommended for quality)
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key="sk-...",
    model_name="text-embedding-3-small"
)
 
# Sentence-Transformers embeddings (free, runs locally)
# Note: HuggingFaceEmbeddingFunction with an api_key calls the remote
# HF Inference API; use SentenceTransformerEmbeddingFunction for local.
hf_ef = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-mpnet-base-v2"
)
 
collection = client.get_or_create_collection(
    name="docs",
    embedding_function=openai_ef,
    metadata={"hnsw:space": "cosine"}  # cosine similarity
)

Ingest documents with metadata:

from langchain_text_splitters import RecursiveCharacterTextSplitter
 
# Chunk documents
splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=50)
chunks = splitter.split_documents(raw_docs)
 
# Add to Chroma — embeddings generated automatically
collection.add(
    documents=[c.page_content for c in chunks],
    metadatas=[c.metadata for c in chunks],
    ids=[f"doc_{i}" for i in range(len(chunks))]
)
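The sequential doc_{i} ids above collide or shift if the ingest script re-runs over a changed document set; one hedged alternative (the chunk_id helper below is hypothetical, not part of Chroma) derives a deterministic id from chunk content, so repeated ingestion stays idempotent when combined with upsert:

```python
import hashlib

def chunk_id(text: str, source: str = "") -> str:
    """Deterministic id from chunk content — the same chunk always maps
    to the same id, so re-ingesting overwrites instead of duplicating."""
    digest = hashlib.sha256((source + "\x00" + text).encode("utf-8")).hexdigest()
    return f"doc_{digest[:16]}"
```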

Similarity search with metadata filtering:

# Basic query — returns top-5 most relevant chunks
results = collection.query(
    query_texts=["What is the refund policy?"],
    n_results=5
)
 
# With metadata pre-filter — narrows search space before ANN
results = collection.query(
    query_texts=["pricing information"],
    n_results=3,
    where={
        "$and": [
            {"source": {"$eq": "docs"}},
            {"page": {"$gte": 1}}
        ]
    }
)
 
# Access results
for doc, meta, dist in zip(
    results["documents"][0],
    results["metadatas"][0],
    results["distances"][0]
):
    # cosine distance, so similarity = 1 - distance
    print(f"[score={1 - dist:.3f}] {meta} {doc[:100]}")
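Because results come back nested, one inner list per query text, a small helper (rank_results is hypothetical, assuming the cosine space configured above so similarity = 1 - distance) can flatten them into scored tuples:

```python
def rank_results(results: dict, query_index: int = 0) -> list[tuple[float, str, dict]]:
    """Flatten Chroma's per-query nested lists into (score, doc, metadata)
    tuples, highest similarity first. Assumes cosine distance."""
    rows = zip(
        results["documents"][query_index],
        results["metadatas"][query_index],
        results["distances"][query_index],
    )
    ranked = [(1.0 - dist, doc, meta) for doc, meta, dist in rows]
    return sorted(ranked, key=lambda r: r[0], reverse=True)
```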

LangChain integration (RAG retriever):

from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
 
vectorstore = Chroma(
    client=client,
    collection_name="docs",
    embedding_function=OpenAIEmbeddings()
)
 
retriever = vectorstore.as_retriever(
    search_type="mmr",          # Maximum Marginal Relevance — diversity-aware
    search_kwargs={"k": 5, "fetch_k": 20}
)
docs = retriever.invoke("How do I cancel my subscription?")

Server mode (multi-process / production):

# Start Chroma HTTP server
chroma run --path ./chroma_db --port 8000
 
# Connect from application
import chromadb
client = chromadb.HttpClient(host="localhost", port=8000)

Updating and deleting documents:

# Update — re-embeds if documents changed
collection.update(ids=["doc_5"], documents=["Updated content..."])
 
# Delete stale entries by metadata filter
collection.delete(where={"source": {"$eq": "outdated_source"}})

Choosing embedding model:

Model                    Dim   Latency      Cost             Quality
text-embedding-3-small   1536  ~50ms (API)  $0.02/1M tokens  High
all-MiniLM-L6-v2 (HF)    384   ~10ms local  Free             Medium
all-mpnet-base-v2 (HF)   768   ~20ms local  Free             Medium-high

When to graduate to managed alternatives:

  • Collection size > 1M vectors → consider Qdrant or Weaviate
  • Multi-tenant production SaaS → Pinecone (fully managed)
  • Existing PostgreSQL stack → pgvector extension

References

AI Engineering

  • Vector Stores — overview of vector store options (Chroma, FAISS, pgvector, Qdrant)
  • RAG Architecture — retrieval-augmented generation pipeline this pattern serves

System Patterns