DSPy agent cost cap at runtime: circuit breaker for DSPy programs
DSPy (Declarative Self-improving Python) changed how teams build LLM pipelines by separating program logic from prompt engineering. You write typed Signature specs and compose them into Module classes; the teleprompter (optimizer) finds the prompts that maximize your metric on a training set. The same architecture that makes DSPy powerful for systematic prompt optimization also makes it expensive when things go wrong: a BootstrapFewShot optimizer running on a 200-example training set with a slow-converging metric can make thousands of LLM calls before you notice. And a ReAct agent module that gets stuck in a tool loop will keep calling the LM and its tools until it hits whatever ceiling DSPy itself imposes: an iteration limit (max_iters), but by default no dollar ceiling at all. This page shows how to add a per-run cost cap and loop detector to any DSPy program without touching your signatures, modules, or teleprompters.
Where DSPy programs accumulate cost
DSPy has two distinct cost profiles depending on how you’re using it:
- Optimization runs (offline). DSPy’s optimizers (BootstrapFewShot, BootstrapFewShotWithRandomSearch, MIPRO, BayesianSignatureOptimizer) run your compiled program against a training set many times to find better demonstrations or instructions. MIPRO with num_candidates=10 on a 50-example training set with a 3-module pipeline can make 1,500+ LLM calls. At GPT-4.1 prices, that’s $5-30 depending on prompt size. A misconfigured optimizer, a typo in the metric function that makes every candidate look equally bad, or a training set with adversarial examples can triple that.
- Inference with ReAct (online). DSPy’s ReAct module implements the Reason + Act loop. It calls the LM, parses tool invocations, executes tools, and feeds results back to the LM. The loop terminates when the LM emits a Finish action. If the LM never emits Finish, because the tools never return satisfying results or because the LM’s stopping heuristic is miscalibrated, the ReAct loop runs until max_iters (default: 5). Five iterations on a complex query with GPT-4.1 costs $0.50-2.00. Reasonable for one call; alarming if your production agent handles 1,000 queries per hour and 5% of them loop to max_iters.
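The arithmetic behind both profiles can be checked in a few lines. This is a rough sketch, not a DSPy API: it assumes the optimizer call count scales as candidates × examples × modules (which reproduces the 1,500 figure above) and reuses the per-query cost range quoted above. The function names are illustrative, and real optimizers add further calls for bootstrapping and instruction proposal.

```python
def optimizer_calls(num_candidates: int, num_examples: int, num_modules: int) -> int:
    """Lower-bound estimate: every candidate is evaluated on every example,
    and each evaluation invokes every module once."""
    return num_candidates * num_examples * num_modules


def hourly_loop_cost(queries_per_hour: int, loop_rate: float, cost_per_looped_query: float) -> float:
    """Hourly spend attributable to queries that run all the way to max_iters."""
    return queries_per_hour * loop_rate * cost_per_looped_query


calls = optimizer_calls(num_candidates=10, num_examples=50, num_modules=3)
low = hourly_loop_cost(1_000, 0.05, 0.50)
high = hourly_loop_cost(1_000, 0.05, 2.00)
print(f"optimizer calls: {calls}")
print(f"looping queries cost about ${low:.0f}-${high:.0f} per hour")
```

Under these assumptions, the 5% of queries that loop (50 per hour) account for roughly $25 to $100 per hour on their own.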
DSPy’s built-in cost controls and their limits
DSPy exposes three knobs that partially address cost:
- ReAct(max_iters=N): caps the number of Reason+Act iterations. Default is 5. This prevents infinite loops in the ReAct case. It does not prevent the optimizer from making expensive redundant calls during training, and it does not distinguish between a legitimate 5-step task and a task that got stuck on step 1 and repeated it 4 more times.
- LM(cache=True): DSPy’s built-in LM caching deduplicates identical requests (same prompt, same parameters). This helps significantly during optimizer runs when the same demonstration set is evaluated multiple times. It does nothing for the “same query, same wrong tool call, new LLM invocation” loop pattern: each loop iteration has a different accumulated context, so the prompt is never identical.
- BootstrapFewShot(max_bootstrapped_demos=N): limits the number of demonstrations the optimizer collects in the bootstrap phase. This caps one component of optimization cost but not the evaluation phase, which can dominate.
None of these are per-run dollar caps. None detect repeated tool-call fingerprints. They are reasonable defaults for a well-formed program; they are insufficient guardrails for a program that hits an unexpected edge case in production.
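To see why prompt-level caching and tool-call fingerprinting catch different things, consider a stuck agent whose context grows on every step. The sketch below is illustrative only: prompt_cache_key is a simplified stand-in for a prompt-keyed cache (DSPy’s real cache key is built from the full request), and tool_fingerprint is a hypothetical helper, not part of DSPy or RunGuard.

```python
import hashlib
import json


def prompt_cache_key(prompt: str) -> str:
    """Stand-in for a prompt-level cache key: a hash of the full prompt text."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


def tool_fingerprint(tool_name: str, args: dict) -> str:
    """Hash of the tool name plus canonicalized arguments, ignoring context."""
    payload = json.dumps({"tool": tool_name, "args": args}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Simulate a stuck agent: the same tool call each step, but the trajectory grows.
context = "Question: who maintains the billing service?"
cache_keys = set()
fingerprints = []
for step in range(4):
    cache_keys.add(prompt_cache_key(context))
    fingerprints.append(tool_fingerprint("search", {"query": "billing service owner"}))
    context += f"\nStep {step}: search('billing service owner') -> no results"

print(len(cache_keys))         # distinct prompts the cache would see
print(len(set(fingerprints)))  # distinct tool-call fingerprints
```

The four prompts hash to four different cache keys, so the cache never hits, while all four tool calls share a single fingerprint. That repeated fingerprint is the signal a loop detector keys on.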
The right interception layer in DSPy
DSPy’s architecture separates the LM (the language model client, e.g., dspy.LM, or the legacy dspy.OpenAI and dspy.Claude clients in older releases) from the Module (the program logic). The LM is configured globally via dspy.configure(lm=...) and called inside modules via self.predict(...) or via the compiled predictor’s __call__.
The two viable interception points are:
- Subclass the LM. DSPy’s LM classes are subclassable. You can create a guarded LM subclass (BudgetedLM in the next section) that wraps the underlying client and adds cost accounting on each __call__. This catches every LLM call regardless of which module made it, which makes it ideal for optimizer cost caps.
- Wrap individual tool functions. For ReAct loop detection, wrap the Python functions registered as tools with guard(). The ReAct loop calls these functions; the guard sees the repetition and raises before the LM gets to reason about the repeated failure.
Subclassing the DSPy LM for budget enforcement
import dspy
from runguard import BudgetTracker, BudgetExceededError


class BudgetedLM(dspy.LM):
    """Wraps any DSPy LM with a per-run budget cap."""

    def __init__(self, base_lm: dspy.LM, max_usd: float):
        # super().__init__() is deliberately not called: this wrapper
        # delegates to base_lm instead of creating a second client.
        self.base_lm = base_lm
        self.tracker = BudgetTracker(max_usd=max_usd)
        # Delegate DSPy's model config attributes to the base LM
        self.model = base_lm.model
        self.kwargs = base_lm.kwargs

    def __getattr__(self, name):
        # Any other attribute DSPy reads (history, cache, ...) comes from base_lm
        if name == "base_lm":  # not set yet (e.g. during copy): avoid recursion
            raise AttributeError(name)
        return getattr(self.base_lm, name)

    def __call__(self, prompt=None, messages=None, **kwargs):
        # Preflight budget check (raises BudgetExceededError if over limit)
        self.tracker.check(
            model=self.model,
            prompt_tokens=self._estimate_tokens(prompt or messages),
        )
        outputs = self.base_lm(prompt=prompt, messages=messages, **kwargs)
        # dspy.LM returns a list of completions, not a response object, so read
        # token usage from the base LM's last history entry (may be empty on a
        # cache hit; approximate under concurrent calls).
        history = self.base_lm.history
        usage = history[-1].get("usage", {}) if history else {}
        self.tracker.record(
            model=self.model,
            prompt_tokens=usage.get("prompt_tokens", 0),
            completion_tokens=usage.get("completion_tokens", 0),
        )
        return outputs

    def _estimate_tokens(self, content) -> int:
        if isinstance(content, str):
            return len(content) // 4  # rough estimate: ~4 characters per token
        return sum(len(str(m.get("content", ""))) // 4 for m in (content or []))


# Usage: replace your existing LM with the budgeted wrapper
base = dspy.LM("anthropic/claude-sonnet-4-6", max_tokens=2000)
lm = BudgetedLM(base, max_usd=5.0)
dspy.configure(lm=lm)

# Your DSPy program runs unchanged; BudgetExceededError raises
# if the run would exceed $5.00
try:
    result = my_program(question=user_query)
except BudgetExceededError as e:
    print(f"Run halted at ${e.accumulated_usd:.2f} (limit: ${e.max_usd:.2f})")
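BudgetTracker and BudgetExceededError come from RunGuard’s SDK, and their internals are not shown on this page. As a mental model only, a tracker with the same check/record surface might look like the sketch below. Everything here is an assumption made for illustration: the price table and the model name "example-model" are hypothetical, and the real SDK may compute cost differently.

```python
from dataclasses import dataclass

# Hypothetical USD prices per 1M tokens as (input, output); not real rates.
PRICES_PER_MTOK = {"example-model": (2.00, 8.00)}


class BudgetExceededError(RuntimeError):
    def __init__(self, accumulated_usd: float, max_usd: float):
        super().__init__(f"budget exceeded: ${accumulated_usd:.4f} of ${max_usd:.2f}")
        self.accumulated_usd = accumulated_usd
        self.max_usd = max_usd


@dataclass
class BudgetTracker:
    max_usd: float
    accumulated_usd: float = 0.0

    def _cost(self, model: str, prompt_tokens: int, completion_tokens: int = 0) -> float:
        in_price, out_price = PRICES_PER_MTOK[model]
        return (prompt_tokens * in_price + completion_tokens * out_price) / 1_000_000

    def check(self, model: str, prompt_tokens: int) -> None:
        """Preflight: raise if the prompt alone would push the run over budget."""
        projected = self.accumulated_usd + self._cost(model, prompt_tokens)
        if projected > self.max_usd:
            raise BudgetExceededError(self.accumulated_usd, self.max_usd)

    def record(self, model: str, prompt_tokens: int, completion_tokens: int) -> None:
        """Add the actual cost of a completed call to the running total."""
        self.accumulated_usd += self._cost(model, prompt_tokens, completion_tokens)


tracker = BudgetTracker(max_usd=0.01)
tracker.check("example-model", prompt_tokens=1_000)  # well under budget
tracker.record("example-model", prompt_tokens=1_000, completion_tokens=500)
try:
    tracker.check("example-model", prompt_tokens=3_000)  # would cross $0.01
except BudgetExceededError as e:
    print(f"halted at ${e.accumulated_usd:.4f} (limit: ${e.max_usd:.2f})")
```

The split between a preflight check and a post-call record is the design point: the check uses an estimate so that an oversized call can be refused before it is paid for, while the record uses actual usage so the running total stays accurate.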
Wrapping ReAct tools with loop detection
import dspy
from runguard import guard, LoopDetectedError


# Wrap each tool function before passing it to ReAct.
# retriever and knowledge_base stand for your application's existing clients.
def search(query: str) -> str:
    return retriever.search(query)


def lookup(term: str) -> str:
    return knowledge_base.lookup(term)


# Apply guard; each tool gets its own loop window
guarded_search = guard(search, loop_window=10, loop_threshold=2)
guarded_lookup = guard(lookup, loop_window=10, loop_threshold=2)

# ReAct sees the guarded versions as normal callables
react = dspy.ReAct(
    signature="question -> answer",
    tools=[guarded_search, guarded_lookup],
    max_iters=10,  # RunGuard trips before max_iters if a tool loops
)

try:
    pred = react(question="What caused the 2026 AI regulatory shift?")
except LoopDetectedError as e:
    print(f"ReAct loop detected: tool {e.tool_name} repeated {e.count}x")
    print(f"Signature: {e.signature}")
With loop_threshold=2, the guard trips on the second repeat — tighter than the default 3 because ReAct’s loop is short (typically 5-10 iters max) and a repeated tool call at iteration 2 almost never recovers by iteration 3.
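For intuition about what a guard with these parameters does, here is a minimal sketch of a sliding-window loop detector. It is not RunGuard’s implementation: the fingerprinting scheme, the exact meaning of loop_window, and the reading of "second repeat" as the third identical call are all assumptions made for illustration.

```python
import functools
import hashlib
import json
from collections import deque


class LoopDetectedError(RuntimeError):
    def __init__(self, tool_name: str, count: int, signature: str):
        super().__init__(f"{tool_name} repeated {count}x")
        self.tool_name = tool_name
        self.count = count
        self.signature = signature


def guard(fn, loop_window: int = 10, loop_threshold: int = 3):
    """Raise once an identical call has been repeated `loop_threshold`
    times within the last `loop_window` calls to this tool."""
    recent = deque(maxlen=loop_window)  # fingerprints of the most recent calls

    @functools.wraps(fn)  # keep the tool's name, docstring, and signature
    def wrapper(*args, **kwargs):
        payload = json.dumps([args, kwargs], sort_keys=True, default=repr)
        signature = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
        repeats = recent.count(signature)  # earlier identical calls in window
        recent.append(signature)
        if repeats >= loop_threshold:
            raise LoopDetectedError(fn.__name__, repeats, signature)
        return fn(*args, **kwargs)

    return wrapper


def search(query: str) -> str:
    return "no results"


guarded = guard(search, loop_window=10, loop_threshold=2)
guarded("llm pricing")  # first call
guarded("llm pricing")  # first repeat, still allowed
try:
    guarded("llm pricing")  # second repeat, trips the guard
except LoopDetectedError as e:
    print(e.tool_name, e.count)
```

Because the window is created per wrapped function, each tool keeps its own history, and a call with different arguments produces a different fingerprint and does not count toward the threshold.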
Cost control matrix: DSPy built-ins vs RunGuard
| Scenario | DSPy built-in | RunGuard |
|---|---|---|
| ReAct loop to max_iters | Capped at max_iters (default: 5) | Trips at 2nd repeat — before max_iters |
| Optimizer spending too much | No dollar cap | BudgetExceededError at LM subclass level |
| Repeated tool call (same args, same error) | max_iters only | LoopDetectedError at 2nd repeat |
| LM cache hit (identical prompt) | Returns cached result (free) | N/A: an identical cached prompt is harmless; RunGuard watches for repeated call patterns |
| Gradual cost accumulation across many calls | No visibility until run ends | BudgetTracker fires on next call that exceeds ceiling |
| Context window approaching limit | LM raises on next call | ContextOverflowError at configurable threshold (e.g., 85%) |
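
The last row of the matrix describes a threshold check on context usage. A minimal sketch of that idea follows; the function name check_context, its signature, and this ContextOverflowError class are hypothetical stand-ins rather than RunGuard’s actual API, and the token counts are made-up inputs.

```python
class ContextOverflowError(RuntimeError):
    pass


def check_context(prompt_tokens: int, context_window: int, threshold: float = 0.85) -> float:
    """Return the fraction of the window in use; raise once it reaches `threshold`."""
    usage = prompt_tokens / context_window
    if usage >= threshold:
        raise ContextOverflowError(
            f"prompt uses {usage:.0%} of a {context_window}-token window "
            f"(threshold {threshold:.0%})"
        )
    return usage


print(check_context(64_000, 128_000))  # half the window: fine
try:
    check_context(110_000, 128_000)    # about 86% of the window: over threshold
except ContextOverflowError as e:
    print(f"halted: {e}")
```

Failing at a configurable fraction of the window, rather than at the hard limit, leaves room to summarize or truncate the trajectory before the provider rejects the request.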
When this matters most
The DSPy cost-cap pattern is most valuable in three scenarios:
- Running optimizers overnight or in CI. A scheduled optimizer run that encounters a pathological training example at 2 AM has no human to interrupt it. A budget cap transforms a potential $200 surprise into a $20 run that exits with a clear error and a logged cost breakdown.
- Production ReAct agents with user-provided inputs. Users’ queries are adversarial by nature: not maliciously, but because they describe edge cases your training data didn’t cover. A query that routes to a tool your knowledge base doesn’t index will cycle through all max_iters without resolution. A loop detector exits at the second repeat instead.
- Multi-agent DSPy programs. When you compose multiple ReAct or Predict modules into a pipeline, a loop in module 2 generates calls in module 3 as well. Budget accounting at the LM level captures the total cost across all modules in the run, regardless of which module is looping.
Add a cost cap to your DSPy program today
RunGuard’s Python SDK installs with pip install runguard. Subclass your LM for optimizer cost caps; wrap your ReAct tools for loop detection. Both patterns take only a few lines of code and require no changes to your existing DSPy signatures or teleprompters.
Get started with RunGuard — or see the same pattern applied to AutoGen, smolagents, and general runaway cost prevention.