DSPy agent cost cap at runtime: circuit breaker for DSPy programs

DSPy (Declarative Self-improving Python) changed how teams build LLM pipelines by separating program logic from prompt engineering. You write typed Signature specs and compose them into Module classes; the teleprompter (optimizer) finds the prompts that maximize your metric on a training set. The same architecture that makes DSPy powerful for systematic prompt optimization also makes it expensive when things go wrong: a BootstrapFewShot optimizer running on a 200-example training set and a slow-converging metric can make thousands of LLM calls before you notice. And a ReAct agent module that gets stuck on a tool loop will keep calling your metric until it hits whatever ceiling DSPy itself imposes — which, by default, is no ceiling at all. This page shows how to add a per-run cost cap and loop detector to any DSPy program without touching your signatures, modules, or teleprompters.

Where DSPy programs accumulate cost

DSPy has two distinct cost profiles depending on how you’re using it:

Optimization runs (offline). DSPy’s optimizers — BootstrapFewShot, BootstrapFewShotWithRandomSearch, MIPRO, BayesianSignatureOptimizer — run your compiled program against a training set many times to find better demonstrations or instructions. MIPRO with num_candidates=10 on a 50-example training set with a 3-module pipeline can make 1,500+ LLM calls. At GPT-4.1 prices, that’s $5-30 depending on prompt size. A misconfigured optimizer, a typo in the metric function that makes every candidate look equally bad, or a training set with adversarial examples can triple that.
Inference with ReAct (online). DSPy’s ReAct module implements the Reason + Act loop. It calls the LM, parses tool invocations, executes tools, and feeds results back to the LM. The loop terminates when the LM emits a Finish action. If the LM never emits Finish — because the tools never return satisfying results, or because the LM’s stopping heuristic is miscalibrated — the ReAct loop runs until max_iters (default: 5). Five iterations on a complex query with GPT-4.1 costs $0.50-2.00. Reasonable for one call; alarming if your production agent handles 1,000 queries per hour and 5% of them loop to max_iters.

DSPy’s built-in cost controls and their limits

DSPy exposes three knobs that partially address cost:

ReAct(max_iters=N) — caps the number of Reason+Act iterations. Default is 5. This prevents infinite loops in the ReAct case. It does not prevent the optimizer from making expensive redundant calls during training, and it does not distinguish between a legitimate 5-step task and a task that got stuck on step 1 and repeated it 4 more times.
LM(cache=True) — DSPy’s built-in LM caching deduplicates identical prompt+completion pairs. This helps significantly during optimizer runs when the same demonstration set is evaluated multiple times. It helps zero for the “same query, same wrong tool call, new LLM invocation” loop pattern — each loop iteration has a different accumulated context, so the prompt is never identical.
BootstrapFewShot(max_bootstrapped_demos=N) — limits the number of demonstrations the optimizer collects in the bootstrap phase. This caps one component of optimization cost but not the evaluation phase, which can dominate.

None of these are per-run dollar caps. None detect repeated tool-call fingerprints. They are reasonable defaults for a well-formed program; they are insufficient guardrails for a program that hits an unexpected edge case in production.

The right interception layer in DSPy

DSPy’s architecture separates the LM (the language model client, e.g., dspy.OpenAI or dspy.Claude) from the Module (the program logic). The LM is configured globally via dspy.configure(lm=...) and called inside modules via self.predict(...) or via the compiled predictor’s __call__.

The two viable interception points are:

Subclass the LM. DSPy’s LM classes are subclassable. You can create a GuardedLM that wraps the underlying client and adds cost accounting + loop detection on each __call__. This catches every LLM call regardless of which module made it — ideal for optimizer cost caps.
Wrap individual tool functions. For ReAct loop detection, wrap the Python functions registered as tools with guard(). The ReAct loop calls these functions; the guard sees the repetition and raises before the LM gets to reason about the repeated failure.

Subclassing the DSPy LM for budget enforcement

import dspy
from runguard import BudgetTracker, BudgetExceededError

class BudgetedLM(dspy.LM):
    """Wraps any DSPy LM with a per-run budget cap."""

    def __init__(self, base_lm: dspy.LM, max_usd: float):
        self.base_lm = base_lm
        self.tracker = BudgetTracker(max_usd=max_usd)
        # Delegate DSPy's model config attributes to the base LM
        self.model = base_lm.model
        self.kwargs = base_lm.kwargs

    def __call__(self, prompt=None, messages=None, **kwargs):
        # Preflight budget check (raises BudgetExceededError if over limit)
        self.tracker.check(
            model=self.model,
            prompt_tokens=self._estimate_tokens(prompt or messages),
        )
        response = self.base_lm(prompt=prompt, messages=messages, **kwargs)
        # Record actual usage from response
        self.tracker.record(
            model=self.model,
            prompt_tokens=response.usage.prompt_tokens,
            completion_tokens=response.usage.completion_tokens,
        )
        return response

    def _estimate_tokens(self, content) -> int:
        if isinstance(content, str):
            return len(content) // 4  # rough estimate
        return sum(len(m.get("content", "")) // 4 for m in (content or []))

# Usage: replace your existing LM with the budgeted wrapper
base = dspy.Claude(model="claude-sonnet-4-6", max_tokens=2000)
lm = BudgetedLM(base, max_usd=5.0)
dspy.configure(lm=lm)

# Your DSPy program runs unchanged; BudgetExceededError raises
# if the run would exceed $5.00
try:
    result = my_program(question=user_query)
except BudgetExceededError as e:
    print(f"Run halted at ${e.accumulated_usd:.2f} (limit: ${e.max_usd:.2f})")

Wrapping ReAct tools with loop detection

import dspy
from runguard import guard, LoopDetectedError

# Wrap each tool function before passing it to ReAct
def search(query: str) -> str:
    return retriever.search(query)

def lookup(term: str) -> str:
    return knowledge_base.lookup(term)

# Apply guard — each tool gets its own loop window
guarded_search = guard(search, loop_window=10, loop_threshold=2)
guarded_lookup = guard(lookup, loop_window=10, loop_threshold=2)

# ReAct sees the guarded versions as normal callables
react = dspy.ReAct(
    signature="question -> answer",
    tools=[guarded_search, guarded_lookup],
    max_iters=10  # RunGuard trips before max_iters if a tool loops
)

try:
    pred = react(question="What caused the 2026 AI regulatory shift?")
except LoopDetectedError as e:
    print(f"ReAct loop detected: tool {e.tool_name} repeated {e.count}x")
    print(f"Signature: {e.signature}")

With loop_threshold=2, the guard trips on the second repeat — tighter than the default 3 because ReAct’s loop is short (typically 5-10 iters max) and a repeated tool call at iteration 2 almost never recovers by iteration 3.

Cost control matrix: DSPy built-ins vs RunGuard

Scenario	DSPy built-in	RunGuard
ReAct loop to max_iters	Capped at max_iters (default: 5)	Trips at 2nd repeat — before max_iters
Optimizer spending too much	No dollar cap	BudgetExceededError at LM subclass level
Repeated tool call (same args, same error)	max_iters only	LoopDetectedError at 2nd repeat
LM cache hit (identical prompt)	Returns cached result (free)	N/A — identical prompt is fine, it's patterns RunGuard watches
Gradual cost accumulation across many calls	No visibility until run ends	BudgetTracker fires on next call that exceeds ceiling
Context window approaching limit	LM raises on next call	ContextOverflowError at configurable threshold (e.g., 85%)

When this matters most

The DSPy cost-cap pattern is most valuable in three scenarios:

Running optimizers overnight or in CI. A scheduled optimizer run that encounters a pathological training example at 2 AM has no human to interrupt it. A budget cap transforms a potential $200 surprise into a $20 run that exits with a clear error and a logged cost breakdown.
Production ReAct agents with user-provided inputs. Users’ queries are adversarial by nature — not maliciously, but because they describe edge cases your training data didn’t cover. A query that routes to a tool your knowledge base doesn’t index will cycle through all max_iters without resolution. A loop detector exits at the first repeat.
Multi-agent DSPy programs. When you compose multiple ReAct or Predict modules into a pipeline, a loop in module 2 generates calls in module 3 as well. Budget accounting at the LM level captures the total cost across all modules in the run, regardless of which module is looping.

Add a cost cap to your DSPy program today

RunGuard’s Python SDK installs with pip install runguard. Subclass your LM for optimizer cost caps; wrap your ReAct tools for loop detection. Both patterns are five lines of code and require no changes to your existing DSPy signatures or teleprompters.

Get started with RunGuard — or see the same pattern applied to AutoGen, smolagents, and general runaway cost prevention.