DSPy agent cost cap at runtime: circuit breaker for DSPy programs

DSPy (Declarative Self-improving Python) changed how teams build LLM pipelines by separating program logic from prompt engineering. You write typed Signature specs and compose them into Module classes; the teleprompter (optimizer) finds the prompts that maximize your metric on a training set. The same architecture that makes DSPy powerful for systematic prompt optimization also makes it expensive when things go wrong: a BootstrapFewShot optimizer running on a 200-example training set with a slow-converging metric can make thousands of LLM calls before you notice. And a ReAct agent module that gets stuck in a tool loop will keep calling the LM until it reaches max_iters, the only ceiling DSPy itself imposes; by default there is no dollar ceiling at all. This page shows how to add a per-run cost cap and loop detector to any DSPy program without touching your signatures, modules, or teleprompters.

Where DSPy programs accumulate cost

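DSPy has two distinct cost profiles depending on how you’re using it: optimization, where a teleprompter such as BootstrapFewShot makes many LLM calls while compiling a program against a training set, and inference, where an agent module such as ReAct can loop on tool calls at runtime.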

DSPy’s built-in cost controls and their limits

DSPy exposes three knobs that partially address cost:

None of these are per-run dollar caps. None detect repeated tool-call fingerprints. They are reasonable defaults for a well-formed program; they are insufficient guardrails for a program that hits an unexpected edge case in production.

The right interception layer in DSPy

DSPy’s architecture separates the LM (the language model client: dspy.LM in DSPy 2.5 and later, or legacy clients such as dspy.OpenAI and dspy.Claude in older releases) from the Module (the program logic). The LM is configured globally via dspy.configure(lm=...) and is called whenever a predictor runs inside a module, for example via self.predict(...) or via the compiled predictor’s __call__.

The two viable interception points are:

  1. Subclass the LM. DSPy’s LM classes are subclassable. You can create a BudgetedLM that wraps the underlying client and adds cost accounting on each __call__. This catches every LLM call regardless of which module made it, which makes it ideal for optimizer cost caps.
  2. Wrap individual tool functions. For ReAct loop detection, wrap the Python functions registered as tools with guard(). The ReAct loop calls these functions; the guard sees the repetition and raises before the LM gets to reason about the repeated failure.

Subclassing the DSPy LM for budget enforcement

import dspy
from runguard import BudgetTracker, BudgetExceededError

class BudgetedLM(dspy.LM):
    """Wraps any DSPy LM with a per-run budget cap (sketch for DSPy 2.5+)."""

    def __init__(self, base_lm: dspy.LM, max_usd: float):
        # Initialize dspy.LM state (history, callbacks, model_type) from the
        # base LM's config so adapters and predictors can treat this as an LM
        super().__init__(model=base_lm.model, model_type=base_lm.model_type, **base_lm.kwargs)
        self.base_lm = base_lm
        self.tracker = BudgetTracker(max_usd=max_usd)

    def __call__(self, prompt=None, messages=None, **kwargs):
        # Preflight budget check (raises BudgetExceededError if over limit)
        self.tracker.check(
            model=self.model,
            prompt_tokens=self._estimate_tokens(prompt or messages),
        )
        outputs = self.base_lm(prompt=prompt, messages=messages, **kwargs)
        # dspy.LM returns a list of completions rather than a raw response, so
        # read token usage from the base LM's history (empty on a cache hit)
        history = self.base_lm.history
        usage = (history[-1].get("usage") or {}) if history else {}
        self.tracker.record(
            model=self.model,
            prompt_tokens=usage.get("prompt_tokens", 0),
            completion_tokens=usage.get("completion_tokens", 0),
        )
        return outputs

    def _estimate_tokens(self, content) -> int:
        if isinstance(content, str):
            return len(content) // 4  # rough estimate: ~4 characters per token
        # Messages: sum the estimate over each message's content (may be None)
        return sum(len(str(m.get("content") or "")) // 4 for m in (content or []))

# Usage: replace your existing LM with the budgeted wrapper
# dspy.LM (DSPy 2.5+) takes a LiteLLM-style "provider/model" string
base = dspy.LM("anthropic/claude-sonnet-4-6", max_tokens=2000)
lm = BudgetedLM(base, max_usd=5.0)
dspy.configure(lm=lm)

# Your DSPy program runs unchanged; BudgetExceededError raises
# if the run would exceed $5.00
try:
    result = my_program(question=user_query)
except BudgetExceededError as e:
    print(f"Run halted at ${e.accumulated_usd:.2f} (limit: ${e.max_usd:.2f})")
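
To make the accounting concrete, here is a minimal sketch of what a per-run budget tracker of this kind might do internally. It is not RunGuard's implementation: the SimpleBudgetTracker class, the BudgetExceeded exception, the "example-model" name, and the per-million-token prices are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Hypothetical error raised when a run would pass its dollar cap."""


# Placeholder prices in USD per million tokens (input, output); NOT real pricing.
PRICES = {"example-model": (3.00, 15.00)}


@dataclass
class SimpleBudgetTracker:
    max_usd: float
    spent_usd: float = 0.0

    def cost(self, model: str, prompt_tokens: int, completion_tokens: int = 0) -> float:
        price_in, price_out = PRICES[model]
        return (prompt_tokens * price_in + completion_tokens * price_out) / 1_000_000

    def check(self, model: str, prompt_tokens: int) -> None:
        # Preflight: refuse the call if the prompt alone would cross the cap.
        projected = self.spent_usd + self.cost(model, prompt_tokens)
        if projected > self.max_usd:
            raise BudgetExceeded(f"projected ${projected:.4f} > cap ${self.max_usd:.2f}")

    def record(self, model: str, prompt_tokens: int, completion_tokens: int) -> None:
        # Post-call: add the actual usage reported for the call.
        self.spent_usd += self.cost(model, prompt_tokens, completion_tokens)


tracker = SimpleBudgetTracker(max_usd=0.01)
tracker.check("example-model", prompt_tokens=1_000)
tracker.record("example-model", prompt_tokens=1_000, completion_tokens=400)
```

The split between a preflight check on estimated prompt tokens and a post-call record of actual usage mirrors the two tracker calls in BudgetedLM: the check stops a call before it is paid for, and the record keeps the running total accurate.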

Wrapping ReAct tools with loop detection

import dspy
from runguard import guard, LoopDetectedError

# Wrap each tool function before passing it to ReAct.
# retriever and knowledge_base stand in for your own clients.
def search(query: str) -> str:
    return retriever.search(query)

def lookup(term: str) -> str:
    return knowledge_base.lookup(term)

# Apply guard — each tool gets its own loop window
guarded_search = guard(search, loop_window=10, loop_threshold=2)
guarded_lookup = guard(lookup, loop_window=10, loop_threshold=2)

# ReAct sees the guarded versions as normal callables
react = dspy.ReAct(
    signature="question -> answer",
    tools=[guarded_search, guarded_lookup],
    max_iters=10  # RunGuard trips before max_iters if a tool loops
)

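# Note: some DSPy versions catch exceptions raised inside a tool and hand them
# to the LM as an observation; confirm LoopDetectedError propagates in yours
try:
    pred = react(question="What caused the 2026 AI regulatory shift?")
except LoopDetectedError as e:
    print(f"ReAct loop detected: tool {e.tool_name} repeated {e.count}x")
    print(f"Signature: {e.signature}")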

With loop_threshold=2, the guard trips on the second repeat — tighter than the default 3 because ReAct’s loop is short (typically 5-10 iters max) and a repeated tool call at iteration 2 almost never recovers by iteration 3.
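
The window and threshold semantics can be sketched in a few lines. This is an illustrative stand-in, not RunGuard's guard(): the guard_sketch function and LoopDetected exception are hypothetical names, and the fingerprint (tool name plus JSON-serialized arguments) is an assumption about what counts as a repeat.

```python
import functools
import json
from collections import deque


class LoopDetected(RuntimeError):
    """Hypothetical error raised when a tool call repeats too often."""


def guard_sketch(fn, loop_window=10, loop_threshold=2):
    """Wrap fn so identical calls within a sliding window trip an error."""
    recent = deque(maxlen=loop_window)  # fingerprints of the last N calls

    @functools.wraps(fn)  # keep the name and docstring the agent framework sees
    def wrapper(*args, **kwargs):
        # Fingerprint = tool name plus canonicalized arguments.
        fingerprint = json.dumps([fn.__name__, args, kwargs], sort_keys=True, default=repr)
        repeats = recent.count(fingerprint)  # earlier identical calls in the window
        recent.append(fingerprint)
        if repeats >= loop_threshold:
            raise LoopDetected(f"{fn.__name__} repeated {repeats}x with identical arguments")
        return fn(*args, **kwargs)

    return wrapper
```

In this sketch the first identical call counts as zero repeats, so with loop_threshold=2 the third identical call inside the window is the one that raises, and the wrapped tool is not executed for that call.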

Cost control matrix: DSPy built-ins vs RunGuard

| Scenario | DSPy built-in | RunGuard |
|---|---|---|
| ReAct loop to max_iters | Capped at max_iters (default: 5) | Trips at 2nd repeat, before max_iters |
| Optimizer spending too much | No dollar cap | BudgetExceededError at LM subclass level |
| Repeated tool call (same args, same error) | max_iters only | LoopDetectedError at 2nd repeat |
| LM cache hit (identical prompt) | Returns cached result (free) | N/A: an identical prompt is fine; RunGuard watches for repeated call patterns |
| Gradual cost accumulation across many calls | No visibility until run ends | BudgetTracker fires on next call that exceeds ceiling |
| Context window approaching limit | LM raises on next call | ContextOverflowError at configurable threshold (e.g., 85%) |
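
The context-window row amounts to a threshold check on estimated prompt size. A minimal sketch, assuming a hypothetical check_context helper and ContextOverflow exception (neither is RunGuard's API) and the same rough 4-characters-per-token estimate used in BudgetedLM:

```python
class ContextOverflow(RuntimeError):
    """Hypothetical error raised when a prompt nears the context limit."""


def estimate_tokens(messages):
    # Rough heuristic: about 4 characters per token.
    return sum(len(str(m.get("content") or "")) // 4 for m in messages)


def check_context(messages, context_window, threshold=0.85):
    """Raise if the estimated prompt size reaches threshold * context_window."""
    used = estimate_tokens(messages)
    limit = int(context_window * threshold)
    if used >= limit:
        raise ContextOverflow(f"~{used} tokens used, limit {limit} of {context_window}")
    return used
```

Raising at a fraction of the window rather than at the hard limit leaves room for the completion and for estimation error in the token count.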

When this matters most

The DSPy cost-cap pattern is most valuable in three scenarios:

Add a cost cap to your DSPy program today

RunGuard’s Python SDK installs with pip install runguard. Subclass your LM for optimizer cost caps; wrap your ReAct tools for loop detection. Both patterns take only a few lines of integration code at the call site and require no changes to your existing DSPy signatures or teleprompters.

Get started with RunGuard — or see the same pattern applied to AutoGen, smolagents, and general runaway cost prevention.