OpenAI Agents SDK: two loop failure modes you need to guard against before production

OpenAI’s Agents SDK (the successor to Swarm, released in 2025 as a production-ready multi-agent framework) makes it easy to build networks of specialized agents that hand off to each other. That convenience creates the first failure mode: a triage_agent hands off to a billing_agent, which hands back to triage_agent, which hands off to billing_agent again. Unless something breaks the cycle, the handoff loop keeps going, and every hop triggers another planning LLM call. The second failure mode occurs within a single agent: a tool-call loop, in which the agent calls the same function with the same arguments repeatedly because the function’s output does not advance the agent’s goal. Beyond the max_turns limit, the Agents SDK has no built-in loop detection or per-run dollar cap. This page shows how to add both.

Failure mode 1: the handoff loop
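
The cycle is easy to reproduce without any model in the loop. The following is a toy simulation, not SDK code: `ROUTES` and `run_handoffs` are hypothetical names, and each agent is reduced to the single agent it always hands off to. It only illustrates that, with nothing else to stop it, the number of hops equals whatever turn limit is in force.

```python
# Toy model of a handoff loop: no SDK, no LLM calls.
# Each "agent" is reduced to the name of the agent it always hands off to.
ROUTES = {
    "triage_agent": "billing_agent",
    "billing_agent": "triage_agent",
}


def run_handoffs(start: str, max_turns: int) -> list[str]:
    """Follow handoffs until an agent has no route or max_turns is reached."""
    trail = [start]
    current = start
    while len(trail) < max_turns and current in ROUTES:
        current = ROUTES[current]
        trail.append(current)
    return trail


trail = run_handoffs("triage_agent", max_turns=10)
print(" -> ".join(trail))
print(f"{len(trail)} turns; in a real run each one would be a billed LLM call")
```

Raising the turn limit in this model only lengthens the trail. Neither agent ever produces a final answer, so the limit is the only thing that ends the run.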

Failure mode 2: tool-call loop within an agent
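
One way to spot this pattern is to fingerprint each tool call by name and arguments and count how often the same fingerprint recurs. The sketch below is a minimal illustration under assumptions: `fingerprint` and `RepeatDetector` are hypothetical names, the arguments are assumed to be JSON-serializable, and tripping on the third identical call is just one reading of a repeat threshold of 3.

```python
import json
from collections import Counter


def fingerprint(tool_name: str, arguments: dict) -> str:
    """Stable key for a tool call: the name plus JSON args with sorted keys."""
    return f"{tool_name}:{json.dumps(arguments, sort_keys=True)}"


class RepeatDetector:
    """Counts identical tool calls within one run (hypothetical helper)."""

    def __init__(self, repeats: int = 3):
        self.repeats = repeats
        self.counts = Counter()

    def record(self, tool_name: str, arguments: dict) -> bool:
        """Record a call; return True once it has been seen `repeats` times."""
        key = fingerprint(tool_name, arguments)
        self.counts[key] += 1
        return self.counts[key] >= self.repeats


detector = RepeatDetector(repeats=3)
for attempt in range(1, 4):
    looping = detector.record("lookup_invoice", {"invoice_id": "A1"})
    print(f"call {attempt}: looping={looping}")
```

Sorting the keys means that two calls whose arguments differ only in key order count as the same call. Calls with genuinely different arguments get separate counters, so an agent that is paging through results is not flagged.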

Adding a loop guard to OpenAI Agents SDK
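
A guard for these failure modes needs at least two checks: a budget check before each LLM call and a cycle check after each handoff. The sketch below shows one way those checks could look. It is not RunGuard's implementation and it does not call the Agents SDK: `SimpleGuard`, `GuardTripped`, `find_cycle`, and the per-call cost estimates are all hypothetical, and the code that invokes the guard from a real run is left to the caller.

```python
class GuardTripped(RuntimeError):
    """Raised when the budget cap or the cycle detector fires."""


def find_cycle(trail, max_cycle_len=4, repeats=3):
    """Return the repeating block if `trail` ends with `repeats` copies of it."""
    end = len(trail)
    for size in range(2, max_cycle_len + 1):
        if end < size * repeats:
            break  # longer blocks would need even more history
        block = trail[-size:]
        if all(trail[end - (i + 1) * size : end - i * size] == block
               for i in range(repeats)):
            return block
    return None


class SimpleGuard:
    """Hypothetical per-run guard: a dollar cap plus handoff cycle detection."""

    def __init__(self, max_usd, max_cycle_len=4, repeats=3):
        self.max_usd = max_usd
        self.max_cycle_len = max_cycle_len
        self.repeats = repeats
        self.spent_usd = 0.0
        self.trail = []  # names of agents that received a handoff

    def before_call(self, estimated_usd):
        """Check the budget before an LLM call; the estimate is caller-supplied."""
        if self.spent_usd + estimated_usd > self.max_usd:
            raise GuardTripped(f"budget cap of ${self.max_usd:.2f} would be exceeded")
        self.spent_usd += estimated_usd

    def on_handoff(self, to_agent):
        """Record a handoff target and trip if the recent trail is a cycle."""
        self.trail.append(to_agent)
        cycle = find_cycle(self.trail, self.max_cycle_len, self.repeats)
        if cycle:
            raise GuardTripped("handoff cycle: " + " -> ".join(cycle))


guard = SimpleGuard(max_usd=5.00)
try:
    for target in ["billing_agent", "triage_agent"] * 10:
        guard.before_call(estimated_usd=0.01)
        guard.on_handoff(target)
except GuardTripped as exc:
    print(f"stopped after {len(guard.trail)} handoffs: {exc}")
```

Checking the budget before the call rather than after means the call that would cross the cap is never made. In a real integration, the two methods would be invoked wherever your code observes model calls and handoffs. The SDK's run lifecycle hooks are one plausible place, but that wiring is an assumption and is not shown here.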

Agents SDK built-in limits vs. RunGuard

| Control | Agents SDK built-in | RunGuard |
| --- | --- | --- |
| Max turns | max_turns parameter (default varies) | Not needed (loop detector fires first) |
| Per-run cost cap | Not supported | budget: max_usd, fires before each call |
| Handoff loop detection | Not supported | loop: max_cycle_len=4 catches A↔B cycles |
| Tool-call loop detection | Not supported | loop: repeats=3, fires on 3rd repeat |
| Slack/PagerDuty alert on trip | Not supported | alerts: slack_webhook or pagerduty_key |
| Graceful partial output | Not supported (raises internally) | BudgetExceededError exposes accumulated context |

Fixing the underlying causes alongside the guard