DSPy (original) (raw)

DSPy 3.3.0b1 — New ReActV2 Module and improved LM/BaseLM· learn more →

Program, don’t prompt,

your LLMs.

DSPy is a Python framework for building AI systems. Express your tasks as structured signatures, not prompts, to produce maintainable, modular, and optimizable programs.

12345 678910

lm = dspy.LM("openai/gpt-5.4-nano")

class ExtractEvent(dspy.Signature):

"""Extract event details from an email."""

email: str = dspy.InputField()

event_name: str = dspy.OutputField()

date: str = dspy.OutputField()

extract = dspy.Predict(ExtractEvent)

extract(email=inbox_message)

Prediction(

event_name="Team Offsite",

date="Thursday, June 5"

)

Compose programs with reusable primitives.

Signatures

Declare your task.

Define your task as typed inputs and outputs instead of managing messy prompts. Portable, maintainable, and easy to iterate on.

class Triage(dspy.Signature):

"""Route a support ticket."""

ticket: str = dspy.InputField()

urgency: Literal["low", "high"] = dspy.OutputField()

team: str = dspy.OutputField()

Modules

Same interface, different strategy.

Modules control how your signature executes. Reason, run ensembles, use tools, add a REPL, and more without rewriting your task.

# Direct completion

classify = dspy.Predict(Triage)

# Add step-by-step reasoning

classify = dspy.ChainOfThought(Triage)

# Add tools and a reasoning loop

classify = dspy.ReAct(Triage, tools=[search])

Optimizers

Compile your program against a metric.

Give DSPy examples and a scoring function. It tunes your prompts automatically until quality converges.

tp = dspy.GEPA(

metric=semantic_f1,

auto="medium")

opt = tp.compile(rag, trainset)

# Before: 0.41 F1

# After: 0.63 F1

opt.save("rag.v2.json")

Extract Agent Pipeline Multimodal Optimize

def search(query: str) -> list[str]:

"""Search a knowledge base."""

return kb.query(query, k=3)

def calc(expr: str) -> float:

"""Evaluate a math expression."""

return dspy.PythonInterpreter({}).execute(expr)

agent = dspy.ReAct(

"question -> answer",

tools=[search, calc])

agent(question="GDP per capita of France?")

# thought 1: I need France's GDP and population.

# action 1: search("France GDP") → ...

# thought 2: Now divide GDP by population.

# action 2: calc("3.13e12 / 68e6") → 46029.4

Prediction(answer="$46,029")

class FactCheck(dspy.Module):

def __init__(self):

self.find = dspy.ChainOfThought(

"article -> claims: list[str]")

self.verify = dspy.ChainOfThought(

"claim, source -> verdict")

def forward(self, article):

found = self.find(article=article)

return [

self.verify(claim=c, source=article)

for c in found.claims]

# >>> FactCheck()(article=news_article)

[Prediction(verdict="supported"),

Prediction(verdict="unsupported"),

Prediction(verdict="supported")]

class AnalyzeChart(dspy.Signature):

"""Describe the trend and key data points in a chart."""

chart: dspy.Image = dspy.InputField()

title: str = dspy.OutputField()

trend: str = dspy.OutputField()

data_points: list[dict] = dspy.OutputField()

analyze = dspy.Predict(AnalyzeChart)

analyze(chart=dspy.Image("quarterly_revenue.png"))

Prediction(

title="Quarterly Revenue (2024)",

trend="Steady growth, Q3 dip, strong Q4 recovery",

data_points=[{"q": "Q1", "rev": "$4.2M"}, ...]

)

optimizer = dspy.GEPA(

metric=accuracy, auto="medium")

optimized = optimizer.compile(

extract, trainset=labeled_emails)

optimized.save("extract_v2.json")

# Baseline 62% (gpt-5.4-mini, zero-shot)

# Optimized 89% (gpt-5.4-mini + GEPA compile)

# Cost $2.18 · 200 examples

# Saved to → extract_v2.json

Built in the open, since Dec 2022.

DSPy started at Stanford NLP and grew into a research community. New optimizers and module types land here first — then show up in production systems at companies you’ve heard of.

Dec 2025

Recursive Language Models

Jul 2025

GEPA: Reflective Prompt Evolution

Jul 2024

BetterTogether: Fine-Tuning + Prompt Opt.

Jun 2024

MIPROv2: Optimizing Instructions & Demos

Feb 2024

STORM: Writing Wikipedia-like Articles

Oct 2023

DSPy: Compiling Declarative LM Calls

Dec 2022

Demonstrate-Search-Predict

DSPy in production

Metadata extraction across all shops; ~550× cost reduction

Optimized Dash relevance judge for ranking and evaluation

Prompt migration from larger to smaller models on Amazon Nova

Multiple chatbot use cases on Databricks

Code repair pipeline using code LLMs to synthesize diffs

LM judges, RAG, classification, and customer solutions

Evolutionary self-improvement for the Hermes agent

See all companies using DSPy in production