Agent tool call timeout handling: the three failure modes and how to prevent each one
When an AI agent’s tool call hangs, stalls, or returns slowly, you face three distinct failure modes: the agent blocks indefinitely waiting for a result, the agent retries the same tool call repeatedly (a retry storm), or the agent receives an error and enters a loop trying to recover. All three are expensive. The first costs wall-clock time and memory. The second costs API tokens (every retry is an LLM call that processes the retry result). The third costs both. This page explains how to configure per-tool timeouts, how retry storms form, and how RunGuard’s loop detector catches them before they drain your budget.
Failure mode 1: the hanging tool call
- How it happens. An agent calls a tool that makes a network request — a web scraper, a database query, an external API. The downstream service is slow or unresponsive. The tool’s implementation has no timeout (or a very long one). The agent’s event loop blocks on the tool call. If you’re running agents for multiple users, a single slow tool call ties up the agent’s thread or async task until the underlying connection times out — which in some HTTP clients is 5 minutes or longer.
- The agent-framework amplification. Most agent frameworks (LangChain, CrewAI, Agents SDK) run tool calls synchronously within the agent’s ReAct loop. If a tool blocks, the entire agent blocks. There is no built-in per-tool deadline in any of these frameworks. `asyncio.wait_for` works for async tools, but many tools, particularly third-party integrations, are synchronous, and wrapping them with timeout logic requires boilerplate per tool.
- Python: a reusable tool timeout decorator.
```python
import functools
import signal
from typing import Any, Callable

import requests
from langchain.tools import tool


class ToolTimeoutError(Exception):
    def __init__(self, tool_name: str, timeout_s: float):
        super().__init__(f"Tool '{tool_name}' timed out after {timeout_s}s")
        self.tool_name = tool_name
        self.timeout_s = timeout_s


def with_timeout(seconds: float):
    """Decorator that adds a wall-clock timeout to a synchronous tool function.

    Relies on SIGALRM, so it only works on Unix and only in the main thread.
    """
    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args, **kwargs) -> Any:
            def _handler(signum, frame):
                raise ToolTimeoutError(fn.__name__, seconds)

            # Install the handler, arm the timer, and always restore both.
            old = signal.signal(signal.SIGALRM, _handler)
            signal.setitimer(signal.ITIMER_REAL, seconds)
            try:
                return fn(*args, **kwargs)
            finally:
                signal.setitimer(signal.ITIMER_REAL, 0)
                signal.signal(signal.SIGALRM, old)
        return wrapper
    return decorator


# Usage with LangChain tools
@tool
@with_timeout(seconds=10)
def fetch_web_page(url: str) -> str:
    """Fetch and return the text content of a web page."""
    # The HTTP client timeout (8s) is kept below the outer tool timeout (10s).
    resp = requests.get(url, timeout=8)
    return resp.text[:4000]
```

- TypeScript: per-tool AbortController. In TypeScript agents (LangChain.js, Vercel AI SDK), tools are typically async functions. Wrap with an AbortController timeout:
```typescript
import { DynamicTool } from "@langchain/core/tools";

// Runs fn with an AbortSignal that fires after timeoutMs, so the in-flight
// request is cancelled instead of being left running in the background.
function withTimeout<T>(
  fn: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number
): Promise<T> {
  const controller = new AbortController();
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => {
      controller.abort();
      reject(new Error(`Tool timed out after ${timeoutMs}ms`));
    }, timeoutMs);
    fn(controller.signal).then(
      (v) => { clearTimeout(timer); resolve(v); },
      (e) => { clearTimeout(timer); reject(e); }
    );
  });
}

const fetchWebPageTool = new DynamicTool({
  name: "fetch_web_page",
  description: "Fetch text content from a URL",
  func: async (url: string) => {
    return withTimeout(async (signal) => {
      const resp = await fetch(url, { signal });
      const text = await resp.text();
      return text.slice(0, 4000);
    }, 10_000); // 10 second timeout
  },
});
```
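For tools that are already async, the `asyncio.wait_for` approach mentioned above needs much less machinery than the signal-based decorator. The following is a minimal sketch rather than a framework feature: `with_async_timeout`, `AsyncToolTimeoutError`, and the two demo tools are hypothetical names introduced for illustration.

```python
import asyncio
import functools
from typing import Any, Awaitable, Callable


class AsyncToolTimeoutError(Exception):
    """Hypothetical error type for this sketch."""


def with_async_timeout(seconds: float):
    """Wrap an async tool so it fails with a typed error after `seconds`."""
    def decorator(fn: Callable[..., Awaitable[Any]]) -> Callable[..., Awaitable[Any]]:
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs) -> Any:
            try:
                # wait_for cancels the underlying task when the deadline passes.
                return await asyncio.wait_for(fn(*args, **kwargs), timeout=seconds)
            except asyncio.TimeoutError:
                raise AsyncToolTimeoutError(
                    f"Tool '{fn.__name__}' timed out after {seconds}s"
                ) from None
        return wrapper
    return decorator


@with_async_timeout(0.05)
async def slow_tool() -> str:
    await asyncio.sleep(1)  # stands in for an unresponsive downstream service
    return "done"


@with_async_timeout(1.0)
async def fast_tool() -> str:
    await asyncio.sleep(0)
    return "done"
```

Unlike the `SIGALRM` decorator, this variant does not depend on Unix signals or the main thread, but it only helps when the tool body actually yields to the event loop; a blocking call inside an `async def` would still stall.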
Failure mode 2: the retry storm
- How retry storms form. A tool call times out or returns an error. The agent sees the error as a reason to try again (the model is trained to be persistent). It calls the same tool with the same, or slightly varied, arguments. The tool times out again. The model calls it a third time. This is a retry storm: the agent loops on a failing tool, generating an LLM call on every loop iteration to process the new error response and decide what to do. Each iteration costs API tokens. A typical retry storm with GPT-4o costs $0.05–$0.20 per loop iteration, and agents frequently run 5–15 iterations before hitting a `max_iter` backstop, which puts a single storm at roughly $0.25–$3.00.
- Why agents don’t self-stop. Modern LLMs are trained to be helpful and persistent. The model sees “Error: request timed out” as a signal to try a different approach or retry. It does not have a concept of “I have called this tool 5 times with the same arguments and I should stop.” Without explicit loop detection, the model will retry until it hits a hard iteration limit or until the tool eventually succeeds.
- Detecting retry storms with RunGuard. RunGuard’s loop detector tracks the signature of each LLM response: specifically, which tool it calls and with what arguments. If the same tool-call signature appears 3 times within a sliding window of 5 responses, RunGuard raises `LoopDetectedError` and stops the agent before the retry storm exhausts the session budget.
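To make the "3 repeats within a window of 5" rule concrete, here is a small sketch of sliding-window signature counting. It illustrates the idea only and is not RunGuard's implementation; `SlidingWindowLoopDetector` and `LoopDetected` are hypothetical names.

```python
from collections import Counter, deque


class LoopDetected(Exception):
    """Hypothetical exception for this sketch (not RunGuard's LoopDetectedError)."""


class SlidingWindowLoopDetector:
    """Raise once a signature occurs `repeats` times in the last `window` responses."""

    def __init__(self, repeats: int = 3, window: int = 5):
        self.repeats = repeats
        # deque(maxlen=...) drops the oldest signature automatically.
        self.recent = deque(maxlen=window)

    def observe(self, signature: str) -> None:
        self.recent.append(signature)
        if Counter(self.recent)[signature] >= self.repeats:
            raise LoopDetected(
                f"'{signature}' seen {self.repeats} times in the last "
                f"{len(self.recent)} responses"
            )


if __name__ == "__main__":
    detector = SlidingWindowLoopDetector()
    for sig in ["fetch_web_page:https://a", "text", "fetch_web_page:https://a"]:
        detector.observe(sig)  # two matches so far, no error yet
    try:
        detector.observe("fetch_web_page:https://a")  # third match in the window
    except LoopDetected as exc:
        print(f"stopped: {exc}")
```

Because old signatures fall out of the window, a tool that is legitimately called again much later in a session does not trip the detector.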
Failure mode 3: the error-recovery loop
- The pattern. A tool returns an error string (not a raised exception). The model interprets the error string as a piece of information to act on — it tries to recover by calling a different tool, gathering more context, and eventually calling the original tool again in a slightly different way. This produces a longer loop than a direct retry: the model takes 3–4 steps between each re-attempt at the failing tool, making it harder to detect as a simple repeat.
- Fix: raise exceptions from tools, never return error strings. When your tool fails, raise a typed exception. Agent frameworks surface tool exceptions as error messages in the conversation context with clear “tool call failed” framing, which the model interprets as a signal to stop trying that tool. Returning an error string looks like a result: the model sees it as partial information and tries to get more. The single most impactful change you can make to prevent error-recovery loops is to change all tool error returns from `return "Error: ..."` to `raise ToolExecutionError("...")`.
- Using RunGuard’s cycle detector for multi-step recovery loops.
```python
import json

from openai import AsyncOpenAI
from runguard import guard, LoopDetectedError

# Assumption: an async OpenAI client, with OPENAI_API_KEY set in the environment.
openai_client = AsyncOpenAI()


def make_agent_guard(max_usd: float = 2.0):
    async def wrapped_llm_call(messages, tools):
        # Your existing LLM call here
        response = await openai_client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=tools,
        )
        tool_calls = response.choices[0].message.tool_calls or []

        # Signature: name + first 30 chars of first arg (catches same-arg retries)
        sig = "text"
        if tool_calls:
            first = tool_calls[0]
            args = json.loads(first.function.arguments)
            first_arg = str(list(args.values())[0])[:30] if args else ""
            sig = f"{first.function.name}:{first_arg}"

        # Cost in USD at $2.50 per 1M prompt tokens and $10.00 per 1M completion tokens
        cost_usd = (
            response.usage.prompt_tokens * 2.50
            + response.usage.completion_tokens * 10.0
        ) / 1_000_000
        return {"response": response, "usd": cost_usd, "sig": sig}

    return guard(
        wrapped_llm_call,
        budget={"max_usd": max_usd},
        # catches retry storms and 3-step recovery loops
        loop={"repeats": 3, "window": 8, "max_cycle_len": 3},
    )
```
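The "raise, never return" fix from this section can be shown side by side. This is a minimal sketch: the name `ToolExecutionError` comes from the text above, but its definition here is an assumption, and `INVENTORY` and both `check_stock` functions are hypothetical stand-ins for a real tool.

```python
class ToolExecutionError(Exception):
    """Typed tool failure. The name follows the text; this definition is assumed."""

    def __init__(self, tool_name: str, reason: str):
        super().__init__(f"{tool_name} failed: {reason}")
        self.tool_name = tool_name
        self.reason = reason


INVENTORY = {"sku-1": 12}  # hypothetical stand-in for a real backend


def check_stock_returning_string(sku: str) -> str:
    # Anti-pattern: the failure travels down the same path as a normal result,
    # so the caller (and the model) cannot tell it apart from data.
    if sku not in INVENTORY:
        return f"Error: unknown SKU {sku}"
    return str(INVENTORY[sku])


def check_stock(sku: str) -> str:
    # Preferred: the failure leaves the normal return path entirely.
    if sku not in INVENTORY:
        raise ToolExecutionError("check_stock", f"unknown SKU {sku}")
    return str(INVENTORY[sku])
```

Carrying `tool_name` and `reason` as attributes also gives any surrounding guard code something structured to log or count, instead of parsing strings.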
Tool timeout patterns comparison
| Pattern | Problem | Solution |
|---|---|---|
| No timeout on tool | Agent hangs indefinitely on slow network | Add per-tool timeout (10–30s depending on tool) |
| Tool returns error string on timeout | Triggers retry storm — model sees error as data | Raise exception instead of returning error string |
| Tool raises exception on timeout | Model may still retry, but interprets as hard failure | RunGuard loop detector catches 3+ repeats |
| Global max_iter limit | Counts iterations not cost — still expensive | RunGuard budget cap fires on dollar spend, not iteration count |
| Exponential backoff in tool | Adds real delays — still costs tokens on each retry | Combine backoff with RunGuard retry-count detection |
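The difference between an iteration limit and a dollar cap in the table can be made concrete with a small sketch. It is not RunGuard's budget implementation: `DollarBudget` and `BudgetExceeded` are hypothetical names, and the default rates are the per-million-token prices used in the GPT-4o example on this page.

```python
class BudgetExceeded(Exception):
    """Hypothetical exception for this sketch."""


class DollarBudget:
    """Accumulate LLM spend per call and fail once it passes `max_usd`."""

    def __init__(
        self,
        max_usd: float,
        prompt_usd_per_mtok: float = 2.50,
        completion_usd_per_mtok: float = 10.0,
    ):
        self.max_usd = max_usd
        self.prompt_rate = prompt_usd_per_mtok
        self.completion_rate = completion_usd_per_mtok
        self.spent_usd = 0.0

    def charge(self, prompt_tokens: int, completion_tokens: int) -> float:
        """Record one call's cost and return it; raise if the cap is exceeded."""
        cost = (
            prompt_tokens * self.prompt_rate
            + completion_tokens * self.completion_rate
        ) / 1_000_000
        self.spent_usd += cost
        if self.spent_usd > self.max_usd:
            raise BudgetExceeded(
                f"spent ${self.spent_usd:.2f} of ${self.max_usd:.2f} budget"
            )
        return cost
```

With a cap of $0.10 and calls that each carry 20,000 prompt tokens and 1,000 completion tokens, the second call already crosses the cap, long before a typical iteration limit would fire.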
Per-tool timeout recommendations by tool type
- Web scraping / HTTP fetch tools: 10–15 seconds. Most legitimate web pages respond in under 5 seconds. A 10–15 second timeout catches genuinely slow pages while preventing indefinite hangs on unresponsive hosts. Set an explicit timeout on the HTTP client as well as the outer tool wrapper — HTTP client timeouts handle connection and read phases independently.
- Database query tools: 5–10 seconds. Database queries that take longer than 10 seconds are almost always doing a full table scan or hitting a missing index. A tight timeout forces the issue to surface as a query performance problem rather than an agent hang. Add a `LIMIT` clause to all agent-generated SQL queries as an additional guardrail.
- LLM-as-tool (nested agents): 30–60 seconds. Tools that call another LLM (e.g., a summarizer sub-agent) have longer inherent latency. Set a 30–60 second timeout, but also apply RunGuard’s budget cap to the inner LLM call so the nested agent can’t consume the outer agent’s entire budget.
- External API tools: 5–20 seconds depending on SLA. For APIs with documented SLAs (e.g., a weather API that responds in under 2 seconds), set a tight 5-second timeout. For batch-processing APIs that can legitimately take longer, use a polling pattern rather than a synchronous timeout — the tool submits a job and returns a job ID, and a separate tool polls for the result.
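The polling pattern for batch-style APIs can be sketched as two small tools: one that submits work and returns immediately, and one that reports status. Everything here is illustrative: `submit_job`, `poll_job`, and the in-memory `_jobs` store are hypothetical, and a background thread stands in for the external batch service.

```python
import threading
import uuid
from typing import Callable, Dict

_jobs: Dict[str, dict] = {}   # hypothetical in-memory job store
_lock = threading.Lock()


def submit_job(work: Callable[[], str]) -> str:
    """Tool 1: start the work in the background and return a job ID at once."""
    job_id = uuid.uuid4().hex
    with _lock:
        _jobs[job_id] = {"status": "running", "result": None}

    def run() -> None:
        try:
            update = {"status": "done", "result": work()}
        except Exception as exc:  # record the failure instead of losing it
            update = {"status": "failed", "result": str(exc)}
        with _lock:
            _jobs[job_id] = update

    threading.Thread(target=run, daemon=True).start()
    return job_id


def poll_job(job_id: str) -> dict:
    """Tool 2: return the current status without blocking on the work."""
    with _lock:
        job = _jobs.get(job_id)
    if job is None:
        raise KeyError(f"Unknown job id: {job_id}")
    return dict(job)
```

Both tools return in milliseconds, so each can keep a tight timeout even when the underlying job takes minutes. Repeated `poll_job` calls with the same job ID still look like a repeated tool-call signature, so a loop detector and a delay between polls remain relevant.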