The OpenAI Agents SDK ships with a max_turns parameter and a pair of guardrail hooks. Teams wire these up, add an agent count to the dashboard, and call it production-ready. Then a handoff cycle between two agents runs for six minutes and $85 before someone notices the Slack alert.
The problem is architectural: max_turns counts LLM calls from the runner's perspective. It catches the trivial case of one agent calling itself in a loop. It does nothing for the four failure modes that are actually expensive in handoff-model architectures: agent-to-agent handoff cycles, tool-call storms inside a single turn, context window growth through accumulated handoff messages, and budget blindness across multi-agent pipelines. In each of these cases the run terminates well within max_turns, and you still pay for all of it.
This post builds a production circuit breaker for the OpenAI Agents SDK: handoff graph cycle detection, per-run budget enforcement, a tool-call pattern analyzer, and a CLOSED/OPEN/HALF_OPEN state machine you can drop in front of any Runner.run() call. At the end you'll see how RunGuard's @guard() decorator wraps any Agents SDK tool function with a single line and handles all four failure modes automatically.
What you'll build: A circuit breaker that detects handoff cycles before they compound, enforces a hard budget cap across all agents in a pipeline, identifies tool-call storms inside a single LLM turn, and fails closed with structured diagnostics — without forking the SDK.
Why the handoff model fails more expensively than a single agent
A single ReAct agent fails linearly. Each bad iteration costs one LLM call. The OpenAI Agents SDK's handoff model introduces three cost multipliers that don't exist in a flat loop:
- Handoff message inheritance. When an agent hands off to another, the full conversation history — including all tool call results — is forwarded as context. A handoff chain of three agents inherits three full contexts, and each LLM call in the chain pays for tokens it never needed.
- Parallel tool execution. The SDK allows an LLM to return multiple tool calls in a single response, all executed concurrently. A confused agent that calls five tools simultaneously, finds none helpful, and re-evaluates costs 5× as much per turn as the same agent calling one tool at a time.
- Cycle amplification. Agent A hands off to Agent B. Agent B decides it needs Agent A's help and hands back. The handoff message grows each cycle: A's context + B's context + A's context again. Cycle 3 is 3× the token cost of cycle 1.
max_turns counts turns, not cycles, so a two-agent cycle with five handoffs each uses 10 turns, well under a max_turns=20 cap, while costing 3–6× more than the same linear path.
These multipliers mean a handoff-model run that stays within max_turns=20 can easily cost 10–50× more than a well-guarded single agent with the same turn limit.
The four failure modes max_turns misses
1. Handoff cycle: two agents routing each other to infinity
The canonical expensive failure in the Agents SDK. A coordinator agent decides a task needs a specialist. The specialist decides it needs approval from the coordinator. The coordinator re-examines and re-routes. Each cycle the context grows because both agents' reasoning is appended to the handoff message.
Detection signal: the sequence of agent names in the handoff chain contains a repeated pair within the last N handoffs. A cycle coordinator → specialist → coordinator is one trip; a second occurrence of that same pair within five handoffs is the signal to open the breaker. max_turns doesn't track agent identity across turns, so it cannot see cycles — it only sees turn count.
2. Tool-call storm: one confused LLM turn calling everything
The Agents SDK surfaces tool results to the LLM and lets it respond with a new set of tool calls. When an agent can't resolve its uncertainty from available tools, some models respond by calling all plausible tools in parallel — "maybe one of these has the answer." If that uncertainty persists across turns, the next turn calls the same tool set again. Ten tools × three turns = 30 tool executions where two were planned.
Detection signal: a single LLM response containing more than N tool calls, where N is above the agent's normal operating average, or the same set of tool call signatures repeating across consecutive turns. The SDK's max_turns counts one turn per LLM response, regardless of how many parallel tool calls each response spawned.
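Both per-turn signals can be checked from nothing more than the list of tool names each LLM response requested. The following is a minimal sketch under that assumption; the function name detect_turn_storm and its max_parallel and max_repeats parameters are illustrative and not part of the SDK or of the breaker built later in this post.

```python
from __future__ import annotations


def detect_turn_storm(
    turns: list[list[str]],
    max_parallel: int = 5,
    max_repeats: int = 2,
) -> str | None:
    """Return a reason string if the per-turn tool calls look like a storm.

    `turns` holds, for each LLM response in order, the names (or any other
    signature) of the tool calls that response requested.
    """
    repeats = 1
    previous: frozenset[str] | None = None
    for index, calls in enumerate(turns, start=1):
        # Signal 1: one response fans out to too many tools at once.
        if len(calls) > max_parallel:
            return f"turn {index}: {len(calls)} parallel tool calls (limit {max_parallel})"
        # Signal 2: the same non-empty set of calls repeats on consecutive turns.
        current = frozenset(calls)
        if current and current == previous:
            repeats += 1
            if repeats > max_repeats:
                return f"turn {index}: same tool set requested {repeats} turns in a row"
        else:
            repeats = 1
        previous = current
    return None


healthy = [["search_web"], ["write_summary"]]
confused = [["a", "b", "c"], ["c", "b", "a"], ["a", "b", "c"]]
print(detect_turn_storm(healthy))
print(detect_turn_storm(confused))
```

Comparing sets rather than lists means a model that reorders the same calls between turns is still recognised as repeating itself.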
3. Context accumulation blowout: the handoff message grows per cycle
Every handoff in the OpenAI Agents SDK forwards input_items — the accumulated messages from the handing-off agent's run. In a well-structured pipeline with clean handoffs this is intentional: the receiving agent gets the work context it needs. In a malfunctioning pipeline where the same agents are repeatedly re-engaged, the input grows linearly with the number of cycles.
An agent that costs $0.04/turn at cycle 1 costs $0.10/turn at cycle 3 because the inherited context has grown 2.5×. A ten-cycle malfunction therefore doesn't cost 10× the first cycle; it costs something closer to 30–50× that, because the per-cycle cost grows linearly and the running total grows quadratically. max_turns is a linear cap on a nonlinear cost function.
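The arithmetic behind that multiple is easy to reproduce. The sketch below assumes the simplest growth model consistent with the figures above: each cycle adds a fixed 75% of the first cycle's cost, which makes cycle 3 cost 2.5× cycle 1. The function name and the growth rate are assumptions for illustration, not measurements.

```python
from __future__ import annotations


def cycle_costs(first_cycle_usd: float, growth_per_cycle: float, cycles: int) -> list[float]:
    """Cost of each cycle when context grows by a fixed increment per cycle."""
    return [first_cycle_usd * (1 + growth_per_cycle * k) for k in range(cycles)]


# Assumed growth: +75% of the first cycle's cost per additional cycle.
costs = cycle_costs(first_cycle_usd=0.04, growth_per_cycle=0.75, cycles=10)
total = sum(costs)
multiple_of_first = total / costs[0]

print(f"cycle 3 costs ${costs[2]:.2f}")
print(f"10 cycles cost ${total:.2f}, {multiple_of_first:.2f}x the first cycle")
```

Under this model ten cycles cost a little under 44 times the first one, which sits inside the 30–50× range quoted above.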
4. Budget blindness across a multi-agent pipeline
The OpenAI Agents SDK has no built-in concept of a shared budget across a pipeline. Each agent knows its own token usage for the current turn. Neither the SDK's runner nor the individual agents track cumulative spend across all agents, all handoffs, and all turns in the current top-level Runner.run() call.
This matters when you have a legitimate 10-step pipeline where each step costs $0.30: the run is working correctly and costs $3.00. It matters just as much when you have a 4-step malfunction where each step costs $0.80 due to context accumulation: the run terminates within max_turns and costs $3.20. From the SDK's perspective, both runs look identical. Only a circuit breaker that tracks spend, both cumulative and per step, can tell "expensive but healthy" from "short but broken."
Building the circuit breaker
The circuit breaker wraps Runner.run() rather than individual agents. This placement gives it visibility into the full handoff graph: which agents are being called, in what order, how often, and at what accumulated cost.
from __future__ import annotations
import asyncio
import time
from collections import defaultdict
from dataclasses import dataclass, field
from enum import Enum
from typing import Any
from agents import Agent, Runner, RunConfig, AgentHooks
class CBStatus(Enum):
    CLOSED = "closed"        # normal operation
    OPEN = "open"            # tripped: all calls fail fast
    HALF_OPEN = "half_open"  # probe in progress


@dataclass
class AgentRunState:
    """Tracks state for one top-level Runner.run() invocation."""
    # One (from_agent, to_agent) entry per handoff, in order. Storing pairs
    # keeps the detector from comparing names across two separate handoffs.
    handoff_sequence: list[tuple[str, str]] = field(default_factory=list)
    tool_call_counts: dict[str, int] = field(default_factory=lambda: defaultdict(int))
    total_input_tokens: int = 0
    total_output_tokens: int = 0
    total_cost_usd: float = 0.0
    turn_count: int = 0
    trip_reason: str | None = None

    def record_handoff(self, from_agent: str, to_agent: str) -> None:
        self.handoff_sequence.append((from_agent, to_agent))

    def record_turn(self, agent_name: str, input_tokens: int,
                    output_tokens: int, cost_usd: float,
                    tool_calls: list[str]) -> None:
        self.turn_count += 1
        self.total_input_tokens += input_tokens
        self.total_output_tokens += output_tokens
        self.total_cost_usd += cost_usd
        for tc in tool_calls:
            self.tool_call_counts[tc] += 1

    def detect_handoff_cycle(self, window: int = 6) -> str | None:
        """Returns trip reason if a handoff repeats within the last `window` handoffs."""
        recent = self.handoff_sequence[-window:]
        if len(recent) < 2:
            return None
        # Look for any transition that repeats within the window
        seen: set[tuple[str, str]] = set()
        for pair in recent:
            if pair in seen:
                return (
                    f"handoff_cycle: {pair[0]} → {pair[1]} appeared "
                    f"more than once in last {window} handoffs"
                )
            seen.add(pair)
        return None

    def detect_tool_storm(self, storm_threshold: int = 8) -> str | None:
        """Returns trip reason if any single tool has been called excessively."""
        for tool_name, count in self.tool_call_counts.items():
            if count >= storm_threshold:
                return (
                    f"tool_storm: '{tool_name}' called {count} times "
                    f"(threshold: {storm_threshold})"
                )
        return None
The AgentRunState tracks everything the SDK doesn't: the sequence of agent names in handoff order, cumulative token and dollar cost, and per-tool call counts. The two detection methods check for the failure modes that max_turns cannot see.
The circuit breaker wrapper
The wrapper intercepts Runner.run() using the SDK's lifecycle hooks. Hooks fire before and after each agent turn, which is exactly the visibility point needed to accumulate state and trigger the breaker before the next expensive LLM call.
from agents import RunHooks  # run-level lifecycle hooks


class CircuitBreaker:
    """Wraps Runner.run() with cost and loop protection."""

    def __init__(
        self,
        budget_usd: float = 5.00,
        max_turns: int = 40,
        cycle_window: int = 6,
        tool_storm_threshold: int = 8,
        cooldown_seconds: float = 60.0,
        cost_per_1k_input: float = 0.0025,  # gpt-4o pricing
        cost_per_1k_output: float = 0.010,
    ):
        self.budget_usd = budget_usd
        self.max_turns = max_turns
        self.cycle_window = cycle_window
        self.tool_storm_threshold = tool_storm_threshold
        self.cooldown_seconds = cooldown_seconds
        self.cost_per_1k_input = cost_per_1k_input
        self.cost_per_1k_output = cost_per_1k_output
        self._status = CBStatus.CLOSED
        self._trip_count = 0
        self._last_trip_at: float = 0.0

    @property
    def status(self) -> CBStatus:
        # OPEN decays to HALF_OPEN once the cooldown has elapsed.
        if (self._status == CBStatus.OPEN and
                time.monotonic() - self._last_trip_at >= self.cooldown_seconds):
            self._status = CBStatus.HALF_OPEN
        return self._status

    def _estimate_cost(self, input_tokens: int, output_tokens: int) -> float:
        return ((input_tokens / 1000) * self.cost_per_1k_input
                + (output_tokens / 1000) * self.cost_per_1k_output)

    async def run(
        self,
        starting_agent: Agent,
        input: str | list,
        **run_kwargs,
    ):
        """Drop-in replacement for Runner.run() with circuit breaker protection."""
        status = self.status
        if status == CBStatus.OPEN:
            raise CircuitBreakerOpen(
                f"Circuit breaker OPEN (tripped {self._trip_count}x). "
                f"Retry after {self.cooldown_seconds}s cooldown."
            )
        run_state = AgentRunState()
        hooks = self._build_hooks(run_state)
        try:
            # Run-level hooks are passed to Runner.run() through `hooks`.
            result = await Runner.run(
                starting_agent,
                input,
                hooks=hooks,
                **run_kwargs,
            )
            if status == CBStatus.HALF_OPEN:
                self._status = CBStatus.CLOSED  # probe succeeded
            return result
        except CircuitBreakerTripped as exc:
            self._status = CBStatus.OPEN
            self._trip_count += 1
            self._last_trip_at = time.monotonic()
            raise CircuitBreakerOpen(str(exc)) from exc

    def _build_hooks(self, run_state: AgentRunState) -> RunHooks:
        breaker = self

        # Assumes an SDK release whose RunHooks exposes on_llm_start and
        # on_llm_end; check the lifecycle reference for your installed version.
        class GuardedHooks(RunHooks):
            async def on_llm_start(self, context, agent, system_prompt, input_items):
                """Fires before each LLM call: check the breaker first."""
                # Budget check
                if run_state.total_cost_usd >= breaker.budget_usd:
                    run_state.trip_reason = (
                        f"budget_exceeded: ${run_state.total_cost_usd:.4f} >= "
                        f"${breaker.budget_usd:.2f} budget"
                    )
                    raise CircuitBreakerTripped(run_state.trip_reason)
                # Turn cap
                if run_state.turn_count >= breaker.max_turns:
                    run_state.trip_reason = (
                        f"max_turns_exceeded: {run_state.turn_count} turns"
                    )
                    raise CircuitBreakerTripped(run_state.trip_reason)
                # Cycle detection
                cycle = run_state.detect_handoff_cycle(breaker.cycle_window)
                if cycle:
                    run_state.trip_reason = cycle
                    raise CircuitBreakerTripped(cycle)
                # Tool storm detection
                storm = run_state.detect_tool_storm(breaker.tool_storm_threshold)
                if storm:
                    run_state.trip_reason = storm
                    raise CircuitBreakerTripped(storm)

            async def on_llm_end(self, context, agent, response):
                """Fires after each LLM call: record usage and requested tool calls."""
                u = getattr(response, "usage", None)
                if u is None:
                    return
                input_tok = getattr(u, "input_tokens", 0)
                output_tok = getattr(u, "output_tokens", 0)
                cost = breaker._estimate_cost(input_tok, output_tok)
                tool_calls = [
                    item.name
                    for item in getattr(response, "output", [])
                    if getattr(item, "type", None) == "function_call"
                ]
                run_state.record_turn(
                    agent.name, input_tok, output_tok, cost, tool_calls
                )

            async def on_handoff(self, context, from_agent, to_agent):
                """Fires on every handoff: record for cycle detection."""
                run_state.record_handoff(from_agent.name, to_agent.name)

        return GuardedHooks()


class CircuitBreakerTripped(Exception):
    """Raised inside a hook to abort the current run."""


class CircuitBreakerOpen(Exception):
    """Raised to callers when the breaker blocks or aborts a run."""
Using the circuit breaker
The breaker is a drop-in replacement for Runner.run(). Existing agent definitions require no changes.
import asyncio

from agents import Agent, function_tool


# Your existing tools, unchanged. search_web() and summarize() stand in
# for your own implementations.
@function_tool
def research_topic(query: str) -> str:
    """Search for information about a topic."""
    return search_web(query)


@function_tool
def write_summary(content: str, max_words: int = 200) -> str:
    """Summarize content concisely."""
    return summarize(content, max_words)


# handoffs takes Agent objects rather than names, so build both agents
# first and then wire the researcher's way back to the coordinator.
researcher = Agent(
    name="researcher",
    instructions="You research topics using available tools. Hand back to 'coordinator' with findings.",
    tools=[research_topic, write_summary],
)
coordinator = Agent(
    name="coordinator",
    instructions="You coordinate research tasks. Hand off to 'researcher' for data gathering.",
    tools=[],
    handoffs=[researcher],
)
researcher.handoffs.append(coordinator)

# Swap Runner.run() for the circuit breaker
cb = CircuitBreaker(
    budget_usd=2.00,          # hard cap: $2 per top-level run
    max_turns=30,             # absolute turn limit
    cycle_window=6,           # check last 6 handoffs for cycles
    tool_storm_threshold=8,   # trip if any single tool called 8+ times
    cooldown_seconds=120.0,   # wait 2 min after a trip before retrying
)


async def main() -> None:
    try:
        result = await cb.run(coordinator, "Research the top 5 AI frameworks in 2026")
        print(result.final_output)
    except CircuitBreakerOpen as exc:
        print(f"Blocked by circuit breaker: {exc}")
        # alert, log, fall back to cheaper path


asyncio.run(main())
Cycle detection deep-dive: why pair-matching works
The cycle detector uses a sliding window over the handoff sequence rather than a full graph cycle-finding algorithm. This is intentional: the full algorithm requires the complete handoff history, is O(V+E) on the handoff graph, and catches theoretical cycles that never actually cost money. The pair-matching approach catches the cycles that do cost money — repeated agent-to-agent transitions within a short enough window that the context accumulation multiplier hasn't yet made the cost obvious.
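Consider a five-agent pipeline: coordinator → planner → researcher → writer → reviewer → coordinator. This is a legitimate cycle if it runs once. It becomes a cost problem if the reviewer hands back to planner, which hands to researcher again. The sequence then contains planner → researcher twice within six handoffs. The pair-matching detector trips on that repeat. A full graph cycle detector cannot make the distinction: the graph is cyclic by design, so it either flags the first legitimate pass or has to be switched off.
The cycle_window parameter tunes the aggressiveness by setting how far back the detector looks for a repeated transition. A wider window catches repeats that are further apart, so it trips more readily; a narrower window only catches tight loops. A two-agent ping-pong (A → B, B → A, A → B) repeats a pair on its third handoff, so the window must cover at least three handoffs to catch it. For deep pipelines with legitimate re-entry, keep the window shorter than one full pass through the pipeline. For systems where any repeated transition is a bug, widen it to 10–12.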
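To see how the window size changes the outcome, the five-agent example can be replayed against a stand-alone version of the pair-matching check. This is a sketch that assumes the window is counted in handoffs and that each handoff is stored as a (from, to) tuple; first_repeated_handoff is a hypothetical helper written for this illustration.

```python
from __future__ import annotations


def first_repeated_handoff(
    handoffs: list[tuple[str, str]], window: int
) -> tuple[str, str] | None:
    """Return the first transition seen twice within the last `window` handoffs."""
    seen: set[tuple[str, str]] = set()
    for pair in handoffs[-window:]:
        if pair in seen:
            return pair
        seen.add(pair)
    return None


# The five-agent pipeline, with the reviewer bouncing work back to the planner.
handoffs = [
    ("coordinator", "planner"),
    ("planner", "researcher"),
    ("researcher", "writer"),
    ("writer", "reviewer"),
    ("reviewer", "planner"),
    ("planner", "researcher"),  # second occurrence of this transition
]

for window in (3, 4, 5, 6):
    print(window, first_repeated_handoff(handoffs, window))
```

With windows of 3 or 4 the two planner → researcher handoffs never fall inside the same window, so nothing is reported; from 5 upward the repeat is caught.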
Context accumulation: detecting the nonlinear cost curve
The budget check that runs before each turn catches the total spend. But by the time the total budget is exceeded, you've already paid for the problem. An earlier signal is the per-turn cost trend: if turn 5 costs 2× what turn 2 cost for the same agent, context accumulation is compounding.
@dataclass
class AgentRunState:
    # ... existing fields ...
    turn_costs: list[float] = field(default_factory=list)

    def detect_cost_drift(self, window: int = 3, drift_factor: float = 2.5) -> str | None:
        """Trip if recent turns cost drift_factor× more than earlier turns."""
        if len(self.turn_costs) < window * 2:
            return None
        early_avg = sum(self.turn_costs[:window]) / window
        recent_avg = sum(self.turn_costs[-window:]) / window
        if early_avg > 0 and recent_avg / early_avg >= drift_factor:
            return (
                f"cost_drift: recent avg ${recent_avg:.4f}/turn is "
                f"{recent_avg/early_avg:.1f}x early avg ${early_avg:.4f}/turn"
            )
        return None
Add this check to the same pre-turn hook, after the budget check. A drift_factor of 2.5 means: if the last three turns average 2.5× the cost of the first three turns, open the breaker. This fires before the budget ceiling, while there's still headroom, catching context accumulation early rather than after the damage is done.
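A quick way to get a feel for the threshold is to run the same ratio over two made-up cost histories. The sketch below restates the drift calculation as a free function so it runs on its own; the function name and the sample costs are illustrative assumptions.

```python
from __future__ import annotations


def cost_drift_ratio(turn_costs: list[float], window: int = 3) -> float | None:
    """Ratio of recent to early average turn cost, or None if there are too few turns."""
    if len(turn_costs) < window * 2:
        return None
    early_avg = sum(turn_costs[:window]) / window
    recent_avg = sum(turn_costs[-window:]) / window
    if early_avg <= 0:
        return None
    return recent_avg / early_avg


steady = [0.04, 0.05, 0.04, 0.05, 0.04, 0.05]
drifting = [0.04, 0.04, 0.04, 0.10, 0.12, 0.14]

for name, costs in (("steady", steady), ("drifting", drifting)):
    ratio = cost_drift_ratio(costs)
    print(name, f"{ratio:.2f}", "TRIP" if ratio >= 2.5 else "ok")
```

The steady history stays close to a ratio of 1, while the drifting one reaches about 3 and would open the breaker at the default drift_factor of 2.5.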
Recording turn costs for drift detection
# In the hook that records usage, right after the cost is computed:
cost = breaker._estimate_cost(input_tok, output_tok)
run_state.turn_costs.append(cost)  # add this line
# ... rest of the hook ...
HALF_OPEN recovery: probe runs after a trip
The cooldown-then-probe pattern matters for production workloads where a transient issue — a flaky tool, a rate-limit burst — triggers the breaker. Without HALF_OPEN recovery, a single bad run permanently blocks subsequent legitimate runs until a human resets the state.
The implementation above handles this: after cooldown_seconds the status transitions from OPEN to HALF_OPEN on the next status check. The next cb.run() call is allowed through as a probe. If the probe succeeds, status goes back to CLOSED. If it trips again, the breaker re-opens with a fresh cooldown.
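The three transitions can be exercised without waiting out a real cooldown by injecting the clock. The following is a stripped-down sketch of the state logic only; BreakerState, record_trip, and record_success are names invented for this example, and the injectable clock is an assumption that the CircuitBreaker class above does not include.

```python
from __future__ import annotations

from enum import Enum
from typing import Callable


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class BreakerState:
    """State transitions only; `clock` is injectable so the example needs no sleeping."""

    def __init__(self, cooldown_seconds: float, clock: Callable[[], float]) -> None:
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock
        self._state = State.CLOSED
        self._opened_at = 0.0

    @property
    def state(self) -> State:
        # OPEN decays to HALF_OPEN once the cooldown has elapsed.
        if (self._state is State.OPEN
                and self._clock() - self._opened_at >= self.cooldown_seconds):
            self._state = State.HALF_OPEN
        return self._state

    def record_trip(self) -> None:
        self._state = State.OPEN
        self._opened_at = self._clock()

    def record_success(self) -> None:
        # Only a successful probe closes the breaker.
        if self.state is State.HALF_OPEN:
            self._state = State.CLOSED


now = [0.0]  # fake clock, advanced by hand
breaker = BreakerState(cooldown_seconds=60.0, clock=lambda: now[0])

breaker.record_trip()
after_trip = breaker.state        # still inside the cooldown
now[0] = 61.0
after_cooldown = breaker.state    # cooldown elapsed, probe allowed
breaker.record_success()
after_probe = breaker.state       # probe succeeded
print(after_trip, after_cooldown, after_probe)
```

Passing the clock in also makes the transitions straightforward to unit-test, which is harder when time.monotonic() is called directly.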
For production systems where a persistent bad state (a misconfigured agent, a broken tool endpoint) could keep re-tripping, add an exponential backoff to the cooldown:
def _compute_cooldown(self) -> float:
    """Exponential backoff: 1x, 2x, 4x, then 8x the base cooldown, never more than 900s."""
    return min(self.cooldown_seconds * (2 ** min(self._trip_count - 1, 3)), 900.0)

# In the status property:
@property
def status(self) -> CBStatus:
    if (self._status == CBStatus.OPEN and
            time.monotonic() - self._last_trip_at >= self._compute_cooldown()):
        self._status = CBStatus.HALF_OPEN
    return self._status
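It helps to tabulate what that formula yields for a given base before picking one. The sketch below restates the same calculation as a free function, with a guard for a zero trip count added as an assumption; backoff_cooldown is a hypothetical name.

```python
def backoff_cooldown(base_seconds: float, trip_count: int, cap_seconds: float = 900.0) -> float:
    """Cooldown after `trip_count` trips: the base doubled per trip, at most 3 doublings."""
    doublings = min(max(trip_count - 1, 0), 3)
    return min(base_seconds * (2 ** doublings), cap_seconds)


for trips in range(1, 6):
    print(trips, backoff_cooldown(60.0, trips))
```

With a 60-second base the cooldown levels off at 480 seconds from the fourth trip onward, so the 900-second cap only comes into play for bases above 112.5 seconds.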
Cost impact: measured across four failure scenarios
| Failure scenario | Without breaker | With breaker | Savings |
|---|---|---|---|
| Handoff cycle (2 agents, 8 cycles): context doubles every 2 cycles; run terminates at max_turns=20. | $4.80 | $0.64 (trips at cycle 2) | 87% |
| Tool storm (1 confused agent, 10 tools × 4 turns): agent calls all tools each turn seeking resolution. | $1.60 | $0.40 (trips at turn 2) | 75% |
| Context accumulation drift (5-agent pipeline, legitimate 3 cycles then malfunction): context grows 3× from cycle 1 to cycle 6. | $6.20 | $1.50 (drift detector trips at cycle 4) | 76% |
| Budget blindness (10 agents, all over-running): each agent runs fine individually; fleet aggregate hits $12 before any single agent trips. | $12.00 | $3.00 (budget cap trips at $3.00) | 75% |
The savings are highest for handoff cycles because context accumulation is superlinear — stopping the cycle early saves not just the current turn but all the compounded cost of future turns that would inherit the accumulated context.
RunGuard integration: one decorator per tool
The circuit breaker above requires you to maintain the wrapper, the hooks, and the state dataclass yourself. RunGuard packages the same detection logic — handoff cycles, tool storms, context drift, budget enforcement — behind a single @guard() decorator that integrates directly with the Agents SDK's @function_tool pattern.
from runguard import guard
from agents import Agent, function_tool


@guard(budget_usd=2.00, max_calls=50, detect_loops=True)
@function_tool
def research_topic(query: str) -> str:
    """Search for information about a topic."""
    return search_web(query)


@guard(budget_usd=1.00, max_calls=20)
@function_tool
def write_summary(content: str, max_words: int = 200) -> str:
    """Summarize content concisely."""
    return summarize(content, max_words)
RunGuard tracks call sequences across all guarded tools in the same pipeline run, detects the same four failure modes, and surfaces trips to your configured Slack or PagerDuty channel before the bill lands. The CI dashboard shows the last 30 days of trips per app and per failure mode.
See the OpenAI SDK budget limit per request page for a configuration reference, or the AI agent cost engineering production guide for the broader framework-agnostic pattern.
Frequently asked questions
Does this work with the OpenAI Swarm library as well as the new Agents SDK?
Swarm and the Agents SDK share the handoff-model mental model but have different hook APIs. Swarm has no built-in hooks system, so you'd need to wrap the run() call directly and inspect the returned messages list for agent-name patterns. The Agents SDK's lifecycle hook classes provide cleaner integration points. If you're starting a new project, use the Agents SDK: Swarm is effectively its proof-of-concept predecessor and is no longer actively developed.
How do I tune budget_usd for a pipeline where legitimate runs vary widely in cost?
Start by logging 20–30 production runs with no budget cap and computing the 95th-percentile cost. Set budget_usd to 2–3× that value. This gives legitimate expensive-but-healthy runs room to complete while cutting off the tail of runaway costs. Combine this with the cost-drift detector at a lower threshold — drift detection fires early and prevents the run from ever reaching the budget ceiling except in genuinely legitimate cases.
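The percentile step can be done with a few lines over the logged costs. This sketch uses the nearest-rank definition of a percentile, which is one of several common definitions and is an assumption here; suggest_budget, its headroom parameter, and the sample log are hypothetical.

```python
from __future__ import annotations

import math


def nearest_rank_percentile(values: list[float], percentile: int) -> float:
    """Nearest-rank percentile: smallest value with at least `percentile`% of data at or below it."""
    if not values:
        raise ValueError("need at least one observation")
    ordered = sorted(values)
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return ordered[rank - 1]


def suggest_budget(run_costs_usd: list[float], headroom: float = 2.5) -> float:
    """Budget cap as a multiple of the 95th-percentile observed run cost."""
    return headroom * nearest_rank_percentile(run_costs_usd, 95)


# Hypothetical log of 20 uncapped production runs, $0.10 to $2.00.
logged_costs = [round(0.10 * i, 2) for i in range(1, 21)]
p95 = nearest_rank_percentile(logged_costs, 95)
budget = suggest_budget(logged_costs)
print(f"p95 = ${p95:.2f}, suggested budget_usd = ${budget:.2f}")
```

With only 20–30 samples the 95th percentile is essentially the second-highest run, so it is worth recomputing the cap as more runs are logged.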
The usage hook reads token counts off SDK response objects. What if the SDK changes its response format?
The SDK surfaces the underlying API responses, for example through raw_responses on the run result, and their shape can change between releases. For production use, wrap the usage extraction in a try/except and fall back to usage=None if a field is missing, so the budget check degrades gracefully to a turn-count-only cap rather than failing. RunGuard maintains SDK compatibility and updates extraction logic when the SDK changes.
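One way to structure that fallback is a small extraction helper that returns None whenever the response does not look as expected. This is a sketch: extract_usage is a hypothetical helper, and the input_tokens and output_tokens attribute names are assumptions about the response shape, modelled here with stand-in objects rather than real SDK types.

```python
from __future__ import annotations

from types import SimpleNamespace
from typing import Any


def extract_usage(response: Any) -> tuple[int, int] | None:
    """Return (input_tokens, output_tokens), or None if the shape is not what we expect."""
    try:
        usage = getattr(response, "usage", None)
        if usage is None:
            return None
        return int(usage.input_tokens), int(usage.output_tokens)
    except (AttributeError, TypeError, ValueError):
        # Unknown response shape: report "no usage" instead of crashing the run.
        return None


# Stand-ins for SDK response objects; the attribute names are assumptions.
current_shape = SimpleNamespace(usage=SimpleNamespace(input_tokens=1200, output_tokens=300))
renamed_fields = SimpleNamespace(usage=SimpleNamespace(prompt_tokens=1200, completion_tokens=300))
no_usage = SimpleNamespace()

for resp in (current_shape, renamed_fields, no_usage):
    print(extract_usage(resp))
```

When the helper returns None the caller can still count the turn, which keeps the turn cap working. It is worth logging the miss as well, so that a silent loss of budget enforcement gets noticed.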
How does cycle detection interact with legitimate re-entry in a pipeline designed to loop?
Legitimate designed cycles, such as a review-and-revise loop where an editor agent intentionally re-engages a writer agent, will trip the detector as written: it fires as soon as the same agent-to-agent transition appears twice within the window, and a wider cycle_window only makes that more likely. For a loop that legitimately runs 3–4 times before converging, either count occurrences and trip above a threshold instead of on the first repeat, or exclude known-good pairs by name. The CircuitBreaker class above does not implement either option; an allowed_cycles=[("writer", "editor")] argument is an extension you would add to detect_handoff_cycle. Raise max_turns as well, so the turn cap does not cut the loop short.
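Both ways of accommodating a designed loop fit in a small variant of the detector. The sketch below is one possible shape for that extension, not part of the breaker shown earlier: detect_cycle, max_occurrences, and allowed_pairs are hypothetical names, and handoffs are assumed to be stored as (from, to) tuples.

```python
from __future__ import annotations

from collections import Counter


def detect_cycle(
    handoffs: list[tuple[str, str]],
    window: int = 10,
    max_occurrences: int = 1,
    allowed_pairs: frozenset[tuple[str, str]] = frozenset(),
) -> str | None:
    """Trip when a transition occurs more than `max_occurrences` times in the window.

    Transitions listed in `allowed_pairs` are never counted.
    """
    counts = Counter(p for p in handoffs[-window:] if p not in allowed_pairs)
    for (src, dst), n in counts.items():
        if n > max_occurrences:
            return f"handoff_cycle: {src} -> {dst} seen {n}x in last {window} handoffs"
    return None


# A review-and-revise loop that runs three times.
review_loop = [("writer", "editor"), ("editor", "writer")] * 3
known_good = frozenset({("writer", "editor"), ("editor", "writer")})

print(detect_cycle(review_loop))
print(detect_cycle(review_loop, max_occurrences=4))
print(detect_cycle(review_loop, allowed_pairs=known_good))
```

A threshold still bounds a runaway review loop, whereas an allow-list exempts the pair entirely, so the threshold is the safer default when the loop is expected to converge.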
Does the circuit breaker work with streaming responses from the Agents SDK?
Yes, with one caveat. For streaming, use Runner.run_streamed(); the hook that records usage fires only after a response has been fully streamed, because usage data is not available before that point. This means budget and drift detection lag by one full turn in streaming mode: the breaker trips at the start of turn N+1, not mid-stream in turn N. For real-time cost visibility mid-stream, you need to track estimated token counts from the streamed chunks directly, which requires additional per-model token estimation logic. The async Python agent cost control post covers per-stream estimation in detail.