OpenAI Agents SDK: two loop failure modes you need to guard against before production

OpenAI’s Agents SDK (the successor to Swarm, released in 2025 as a production-ready multi-agent framework) makes it trivially easy to build networks of specialized agents that hand off to each other. A triage_agent hands off to a billing_agent, which hands back to triage_agent, which hands off to billing_agent again. Unless something breaks the cycle, this handoff loop keeps going until the runner’s turn limit trips, and every hop is a full LLM call. The second failure mode lives inside a single agent: a tool-call loop, where the agent calls the same function with the same arguments over and over because the function’s output never advances the agent’s goal. Beyond a coarse max_turns limit, the Agents SDK has no built-in loop detection and no per-run dollar cap. This page shows how to add both.

Failure mode 1: the handoff loop

How Agents SDK handoffs work. In the Agents SDK, an agent signals a handoff by calling a transfer_to_<agent_name> function, which the SDK registers as a tool for each entry in the agent’s handoffs list. The runner intercepts that tool call, switches the active agent to the target, and continues the conversation with the new agent. Handoffs are the core multi-agent coordination primitive: a routing agent classifies the user’s intent and hands off to the appropriate specialist.
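To make the mechanism concrete, here is a toy simulation of a runner that switches the active agent whenever the “model” emits a transfer_to_* call and stops at a turn limit. It is a sketch under heavy simplifying assumptions, not the SDK’s runner: each agent is a hypothetical plain function, and MaxTurnsExceeded is a local stand-in defined in the block.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical stand-in for an agent: takes the user message and returns either
# a handoff tool name like "transfer_to_billing" or a final answer.
AgentFn = Callable[[str], str]

HANDOFF_PREFIX = "transfer_to_"


class MaxTurnsExceeded(Exception):
    """Local stand-in for the SDK exception of the same name."""


def run(agents: dict[str, AgentFn], start: str, message: str, max_turns: int = 10):
    """Toy runner: one agent call per turn, switching agents on transfer_to_*."""
    active = start
    handoffs: list[str] = []
    for _ in range(max_turns):
        output = agents[active](message)
        if output.startswith(HANDOFF_PREFIX):
            handoffs.append(output)
            active = output[len(HANDOFF_PREFIX):]  # switch the active agent
            continue
        return output, handoffs  # a non-handoff output ends the run
    raise MaxTurnsExceeded(f"gave up after {max_turns} turns; handoffs: {handoffs}")


# Triage always routes to billing; billing bounces anything that is not about
# invoices back to triage, producing an A<->B handoff loop.
agents = {
    "triage": lambda msg: "transfer_to_billing",
    "billing": lambda msg: "Your invoice is attached." if "invoice" in msg else "transfer_to_triage",
}
```

An invoice question resolves after one handoff; an off-topic request such as “reset my password” bounces between the two agents until the turn limit raises.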
The loop pattern. A handoff loop occurs when Agent A hands off to Agent B, and Agent B’s response causes Agent A to be selected again. The most common cause: Agent B receives a task it cannot handle (wrong scope, missing information, or ambiguous intent), so it hands back to Agent A (or to a triage agent that re-routes to Agent B). The loop continues until the runner’s max_turns limit (10 by default) trips and Runner.run() raises MaxTurnsExceeded. At GPT-4o pricing, a 10-turn handoff loop between two agents costs roughly $0.10–$0.50 depending on context size: small for a single loop, but it multiplies across every user who hits the same routing gap at the same time.
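The $0.10–$0.50 range follows from simple arithmetic. The sketch below assumes gpt-4o list prices of $2.50 per million input tokens and $10 per million output tokens, and illustrative token counts (4,000 or 20,000 input tokens and 100 output tokens per call); the 1,000-user figure is hypothetical.

```python
# gpt-4o list prices in USD per million tokens (assumed current; check the pricing page).
INPUT_USD_PER_M = 2.50
OUTPUT_USD_PER_M = 10.00


def loop_cost(turns: int, input_tokens: int, output_tokens: int) -> float:
    """USD cost of `turns` model calls, each sending `input_tokens` of context
    and producing `output_tokens`. Real loops grow their context every hop,
    so holding context constant understates the cost of a real loop."""
    per_turn = (input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M) / 1_000_000
    return turns * per_turn


small = loop_cost(turns=10, input_tokens=4_000, output_tokens=100)   # about $0.11
large = loop_cost(turns=10, input_tokens=20_000, output_tokens=100)  # about $0.51
fleet = 1_000 * large  # the same loop hit by 1,000 users: about $510
```

Input tokens dominate: every hop re-sends the conversation, so context size, not output length, sets the cost of a loop.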
Detection: track handoff signatures. Each handoff is a tool call whose function name starts with transfer_to_, so the handoff sequence shows up as a repeating pattern in the run’s tool-call history: transfer_to_billing → transfer_to_triage → transfer_to_billing → … That is a period-2 cycle in the handoff sequence. RunGuard’s max_cycle_len=4 setting catches period-1 and period-2 cycles, and with repeats=3 it fires on the third repetition.
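A minimal sketch of period-k cycle detection over the call history, assuming repeats and max_cycle_len mean what their names suggest: fire when the tail of the sequence is `repeats` back-to-back copies of some pattern of length 1 to `max_cycle_len`. This is not RunGuard’s implementation; detect_cycle is a hypothetical helper.

```python
from __future__ import annotations


def detect_cycle(sigs: list[str], max_cycle_len: int = 4, repeats: int = 3) -> list[str] | None:
    """Return the repeating pattern if the tail of `sigs` is `repeats` copies
    of a block of length 1..max_cycle_len, else None. Shortest period wins."""
    for period in range(1, max_cycle_len + 1):
        window = period * repeats
        if len(sigs) < window:
            break  # longer periods need even longer histories
        tail = sigs[-window:]
        pattern = tail[:period]
        if all(tail[i] == pattern[i % period] for i in range(window)):
            return pattern
    return None
```

Running it after every call means the check fires as soon as the third copy of the pattern completes, before the next LLM call goes out.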

Failure mode 2: tool-call loop within an agent

The pattern. The agent calls a tool, gets a result, decides the result is insufficient, calls the same tool with the same (or very similar) arguments, gets the same insufficient result, and calls the tool again. The Agents SDK’s max_turns bounds this only coarsely. A turn is one model invocation, so each sequential retry does consume a turn, but a single turn can fan out into many parallel tool calls, and max_turns cannot tell a legitimate ten-step task from the same call repeated ten times. Nor does it know what a turn costs: late in a loop, with the whole conversation re-sent as context, one turn can cost many times more than the first. Raising max_turns to accommodate genuinely long tasks also raises the ceiling on what a tool-call loop can spend.
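Because a retried call often differs only trivially (reordered keys, different casing or spacing), it can help to normalize arguments before comparing signatures. A minimal sketch, assuming arguments arrive as the JSON string the model emits; normalize_call and consecutive_repeats are hypothetical helpers, and the normalization rules are a heuristic, not something the SDK or RunGuard defines.

```python
from __future__ import annotations

import json
import re


def normalize_call(name: str, arguments: str) -> str:
    """Canonical signature: tool name plus JSON arguments with keys sorted and
    string values lower-cased and whitespace-collapsed."""
    def norm(value):
        if isinstance(value, str):
            return re.sub(r"\s+", " ", value).strip().lower()
        if isinstance(value, dict):
            return {k: norm(v) for k, v in value.items()}
        if isinstance(value, list):
            return [norm(v) for v in value]
        return value

    try:
        args = norm(json.loads(arguments or "{}"))
    except json.JSONDecodeError:
        args = arguments  # leave unparseable arguments as raw text
    return f"{name}:{json.dumps(args, sort_keys=True)}"


def consecutive_repeats(sigs: list[str]) -> int:
    """How many times the most recent signature has occurred back-to-back."""
    if not sigs:
        return 0
    count = 1
    for prev in reversed(sigs[:-1]):
        if prev != sigs[-1]:
            break
        count += 1
    return count
```

Feeding normalized signatures into a repeat counter (or a cycle detector) catches “very similar” retries that an exact string match would miss.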
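The error-string masking trigger. The most common trigger for tool-call loops in Agents SDK applications is a tool that returns an error string (“Error: no results found”) instead of raising an exception when it fails. The model reads that string as an ordinary, if disappointing, result and calls the tool again in the hope of a better one. The fix is to make the failure explicit and to control what the model is told about it. Raise a typed exception from the tool. By default, @function_tool catches the exception and sends the model a generic error message that includes “Please try again”, which can itself invite retries, so also pass a failure_error_function that returns a message telling the model not to retry, or pass failure_error_function=None to re-raise the exception and end the run.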
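The sketch below keeps the SDK out of the block so it runs standalone; lookup_order, ORDERS, ToolFailure and no_retry_error are hypothetical names. no_retry_error has the (context, error) -> str shape used for failure_error_function, so in an application you would attach it with @function_tool(failure_error_function=no_retry_error); treat that wiring as an assumption to check against your SDK version.

```python
from typing import Any


class ToolFailure(Exception):
    """Hypothetical typed exception for a tool that could not do its job."""


ORDERS = {"42": "shipped"}  # hypothetical in-memory data source


def lookup_order(order_id: str) -> str:
    # Raise instead of returning "Error: not found", so the failure cannot be
    # mistaken for a (disappointing) result.
    if order_id not in ORDERS:
        raise ToolFailure(f"no order with id {order_id!r}")
    return ORDERS[order_id]


def no_retry_error(ctx: Any, error: Exception) -> str:
    """Error-message callback: the returned text is what the model sees in
    place of the tool output. It states plainly that retrying will not help."""
    if isinstance(error, ToolFailure):
        return (f"Tool failed permanently: {error}. Do not call this tool again "
                "with the same arguments; tell the user what is missing.")
    return f"Unexpected tool error: {error}"
```

The design choice is to put the “stop” signal in words the model actually reads; a typed exception alone only helps if the message it turns into does not say “try again”.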

Adding a loop guard to OpenAI Agents SDK

The interception point: the model provider. The Agents SDK routes every LLM call through a model provider (by default one backed by the OpenAI API). A custom provider is supplied through RunConfig(model_provider=...), passed to Runner.run() as run_config. The natural place to add RunGuard is a custom provider that wraps the underlying OpenAI model and applies guard logic around each call before it goes out.

Python: custom model provider with RunGuard.

from __future__ import annotations

from openai import AsyncOpenAI
from agents import Agent, MaxTurnsExceeded, Model, ModelProvider, RunConfig, Runner
from agents.models.openai_responses import OpenAIResponsesModel
from runguard import guard, LoopDetectedError, BudgetExceededError

DEFAULT_MODEL = "gpt-4o"
# gpt-4o list prices in USD per million tokens; adjust if you route other models.
INPUT_USD_PER_M = 2.50
OUTPUT_USD_PER_M = 10.00


class _GuardedModel(Model):
    """Wraps one underlying Model so every call goes through the provider's guard."""

    def __init__(self, provider: GuardedModelProvider, underlying: Model):
        self._provider = provider
        self._underlying = underlying

    async def get_response(self, *args, **kwargs):
        # Forward the SDK's get_response arguments untouched (system instructions,
        # input, settings, tools, handoffs, tracing, ...); the exact signature
        # varies across SDK versions.
        result = await self._provider.guard(self._underlying, args, kwargs)
        return result["response"]

    def stream_response(self, *args, **kwargs):
        # Streaming is NOT guarded here; use Runner.run, not Runner.run_streamed.
        return self._underlying.stream_response(*args, **kwargs)


class GuardedModelProvider(ModelProvider):
    """Agents SDK ModelProvider with RunGuard budget + loop detection."""

    def __init__(self, max_usd: float = 2.0, client: AsyncOpenAI | None = None):
        self._client = client or AsyncOpenAI()
        # One guard per provider instance: every agent and every handoff hop
        # routed through this provider shares the same budget and loop window.
        self.guard = guard(
            self._call_and_measure,
            budget={"max_usd": max_usd},
            loop={"repeats": 3, "max_cycle_len": 4},  # catches handoff A↔B and tool repeats
        )

    @staticmethod
    async def _call_and_measure(underlying: Model, args: tuple, kwargs: dict):
        response = await underlying.get_response(*args, **kwargs)

        # USD cost from the Usage dataclass on the ModelResponse (not a dict)
        usage = response.usage
        usd = (usage.input_tokens * INPUT_USD_PER_M
               + usage.output_tokens * OUTPUT_USD_PER_M) / 1_000_000

        # Signature: every tool/handoff call in this response, name plus arguments,
        # so a retry with identical arguments yields an identical signature.
        # Falls back to "end_turn" when the model only produced a message.
        calls = [
            f"{item.name}({item.arguments})"
            for item in response.output
            if getattr(item, "type", None) == "function_call"
        ]
        sig = "|".join(calls) or "end_turn"
        return {"response": response, "usd": usd, "sig": sig}

    def get_model(self, model_name: str | None) -> Model:
        # model_name is the model configured on the agent or RunConfig (None if unset).
        underlying = OpenAIResponsesModel(
            model=model_name or DEFAULT_MODEL,
            openai_client=self._client,
        )
        return _GuardedModel(self, underlying)


# --- Usage ---

triage_agent = Agent(
    name="Triage",
    instructions="Classify the user request and hand off to the appropriate specialist.",
)
billing_agent = Agent(
    name="Billing",
    instructions="Handle billing questions. Hand back to Triage if the question is off-topic.",
)

# Wire up handoffs (exposed to the model as transfer_to_billing / transfer_to_triage tools)
triage_agent.handoffs = [billing_agent]
billing_agent.handoffs = [triage_agent]

guarded_provider = GuardedModelProvider(max_usd=1.50)

async def handle_request(user_message: str):
    try:
        result = await Runner.run(
            triage_agent,
            input=user_message,
            run_config=RunConfig(model_provider=guarded_provider),
            max_turns=20,   # outer backstop; the guard should fire first
        )
        return result.final_output
    except LoopDetectedError as e:
        return f"Routing loop detected (pattern: {e.pattern!r}). Please try again with a clearer request."
    except BudgetExceededError as e:
        return f"Response budget exceeded (${e.spent:.3f}). Task was too complex for this session."
    except MaxTurnsExceeded:
        return "Turn limit reached before the request was resolved."

Scoping the guard per user request. The example above uses a single module-level GuardedModelProvider, so every request shares one guard. For production multi-user systems, create a new provider instance (and therefore a new guard) per user request, so that one user’s call history cannot eat into another user’s budget or pollute their loop window. Pass it to Runner.run() through RunConfig(model_provider=...) as a request-scoped object, not a singleton.
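A small simulation shows why a shared guard misbehaves. Budget, BudgetExceeded and serve are hypothetical stand-ins for the guard’s budget state, not RunGuard’s API; like the provider example, the budget is checked before each call and charged after it.

```python
class BudgetExceeded(Exception):
    pass


class Budget:
    """Hypothetical per-guard budget state: checked before each call, charged after."""

    def __init__(self, max_usd: float):
        self.max_usd = max_usd
        self.spent = 0.0

    def before_call(self) -> None:
        if self.spent >= self.max_usd:
            raise BudgetExceeded(f"spent ${self.spent:.2f} of ${self.max_usd:.2f}")

    def after_call(self, usd: float) -> None:
        self.spent += usd


def serve(budget: Budget, calls: int, usd_per_call: float) -> bool:
    """Simulate one user request making `calls` model calls; True if it finishes."""
    try:
        for _ in range(calls):
            budget.before_call()
            budget.after_call(usd_per_call)
        return True
    except BudgetExceeded:
        return False


shared = Budget(1.50)
user_a_ok = serve(shared, calls=5, usd_per_call=0.25)             # spends $1.25
user_b_shared_ok = serve(shared, calls=3, usd_per_call=0.25)      # starved by A's spend
user_b_own_ok = serve(Budget(1.50), calls=3, usd_per_call=0.25)   # fresh budget per request
```

With one shared budget, user B’s modest request fails because user A already spent most of it; a per-request budget lets both finish.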

Agents SDK built-in limits vs. RunGuard

Control	Agents SDK built-in	RunGuard
Max turns	max_turns on Runner.run() (default 10)	Keep as an outer backstop; the loop detector usually fires first
Per-run cost cap	Not supported	budget: max_usd, checked before each call
Handoff loop detection	Not supported	loop: max_cycle_len=4 catches A↔B cycles
Tool-call loop detection	Not supported (max_turns bounds it only coarsely)	loop: repeats=3, fires on the 3rd repeat
Slack/PagerDuty alert on trip	Not supported	alerts: slack_webhook or pagerduty_key
Graceful partial output	Not supported (raises MaxTurnsExceeded)	BudgetExceededError exposes accumulated context

Fixing the underlying causes alongside the guard

Fix 1: clear handoff conditions. Every agent should have an explicit condition for when NOT to hand off. An agent that can say “I cannot help with this request — please contact support directly” is less likely to create a handoff loop than one whose only exit is to hand off to another agent. Add a fallback tool or a terminal instruction to every agent in your network.
Fix 2: raise exceptions from tools, never return error strings. Every tool in your Agents SDK application should raise a typed exception on failure. Returning “Error: no results found” as a string looks like a result and invites a retry loop; raising, say, a custom ToolExecutionError("no results found for query: ...") marks the call as failed. Then decide what the model sees: the SDK’s default tool-error message includes “Please try again”, so pair the exception with a failure_error_function whose message says not to retry, or set failure_error_function=None to end the run.
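Fix 3: add a per-agent limit, not just the runner’s max_turns. The SDK sets max_turns per Runner.run() call, not per agent, so a per-agent cap on critical paths has to come from your own code, for example a counter driven by the agent’s lifecycle hooks that raises once the agent exceeds its tool-call allowance. A billing agent that genuinely needs more than 5 tool calls to complete a task is doing something unexpected. Layer the guard (fires at 3 repeats) with turn and tool-call limits (fire at N) for defense in depth.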
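A minimal sketch of such a per-agent counter. PerAgentToolLimit and ToolCallLimitExceeded are hypothetical, not part of the Agents SDK; you would call record() wherever your application observes tool calls, for example from an on_tool_start lifecycle hook, but that wiring is an assumption and is left out so the block runs standalone. Create one instance per request, like the guard.

```python
from __future__ import annotations

from collections import Counter


class ToolCallLimitExceeded(Exception):
    pass


class PerAgentToolLimit:
    """Counts tool calls per agent name and raises once an agent exceeds its
    own limit (agents without an explicit limit get `default`)."""

    def __init__(self, limits: dict[str, int], default: int = 10):
        self.limits = limits
        self.default = default
        self.counts = Counter()

    def record(self, agent_name: str, tool_name: str) -> None:
        self.counts[agent_name] += 1
        limit = self.limits.get(agent_name, self.default)
        if self.counts[agent_name] > limit:
            raise ToolCallLimitExceeded(
                f"{agent_name} made {self.counts[agent_name]} tool calls "
                f"(limit {limit}); last tool: {tool_name}"
            )
```

With limits={"Billing": 5}, the sixth tool call from the billing agent raises, while other agents fall back to the default limit.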