Agent tool call timeout handling: the three failure modes and how to prevent each one

When an AI agent’s tool call hangs, stalls, or returns slowly, you face three distinct failure modes: the agent blocks indefinitely waiting for a result, the agent retries the same tool call repeatedly (a retry storm), or the agent receives an error and enters a loop trying to recover. All three are expensive. The first costs wall-clock time and memory. The second costs API tokens (every retry is an LLM call that processes the retry result). The third costs both. This page explains how to configure per-tool timeouts, how retry storms form, and how RunGuard’s loop detector catches them before they drain your budget.

Failure mode 1: the hanging tool call

How it happens. An agent calls a tool that makes a network request — a web scraper, a database query, an external API. The downstream service is slow or unresponsive. The tool’s implementation has no timeout (or a very long one). The agent’s event loop blocks on the tool call. If you’re running agents for multiple users, a single slow tool call ties up the agent’s thread or async task until the underlying connection times out — which in some HTTP clients is 5 minutes or longer.
The agent-framework amplification. Most agent frameworks (LangChain, CrewAI, Agents SDK) run tool calls synchronously within the agent’s ReAct loop, so if a tool blocks, the entire agent blocks. These frameworks do not enforce a per-tool deadline by default. asyncio.wait_for works for async tools, but many tools, particularly third-party integrations, are synchronous, and wrapping them with timeout logic requires boilerplate per tool.
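For the async case mentioned above, a minimal sketch of an asyncio.wait_for wrapper might look like the following. The names call_with_deadline, AsyncToolTimeoutError, and slow_lookup are hypothetical and exist only for illustration; they are not part of any framework.

```python
import asyncio
from typing import Any, Awaitable, Callable


class AsyncToolTimeoutError(Exception):
    """Raised when an async tool exceeds its deadline (hypothetical name)."""


async def call_with_deadline(
    tool: Callable[..., Awaitable[Any]], timeout_s: float, *args: Any, **kwargs: Any
) -> Any:
    """Await tool(*args, **kwargs), cancelling it if it runs past timeout_s."""
    try:
        return await asyncio.wait_for(tool(*args, **kwargs), timeout=timeout_s)
    except asyncio.TimeoutError:
        # wait_for has already cancelled the tool's task at this point.
        raise AsyncToolTimeoutError(
            f"Tool '{tool.__name__}' timed out after {timeout_s}s"
        ) from None


async def slow_lookup(query: str) -> str:
    """Stand-in for a tool that waits on a slow downstream service."""
    await asyncio.sleep(5)
    return f"results for {query}"
```

Calling asyncio.run(call_with_deadline(slow_lookup, 0.05, "x")) raises AsyncToolTimeoutError instead of waiting the full five seconds. This only helps when the tool is a real coroutine; a synchronous call hidden inside an async def still blocks the event loop.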

Python: a reusable tool timeout decorator.

import signal
import functools
from typing import Any, Callable

class ToolTimeoutError(Exception):
    def __init__(self, tool_name: str, timeout_s: float):
        super().__init__(f"Tool '{tool_name}' timed out after {timeout_s}s")
        self.tool_name = tool_name
        self.timeout_s = timeout_s

def with_timeout(seconds: float):
    """Decorator that adds a wall-clock timeout to a synchronous tool function.

    Relies on SIGALRM, so it works only on Unix and only when the tool runs in
    the main thread; signal.signal raises ValueError in any other thread.
    """
    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args, **kwargs) -> Any:
            def _handler(signum, frame):
                raise ToolTimeoutError(fn.__name__, seconds)
            # Install the handler and arm a one-shot timer for this call.
            old = signal.signal(signal.SIGALRM, _handler)
            signal.setitimer(signal.ITIMER_REAL, seconds)
            try:
                return fn(*args, **kwargs)
            finally:
                # Disarm the timer and restore the previous handler.
                signal.setitimer(signal.ITIMER_REAL, 0)
                signal.signal(signal.SIGALRM, old)
        return wrapper
    return decorator

# Usage with LangChain tools
from langchain.tools import tool

@tool
@with_timeout(seconds=10)
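def fetch_web_page(url: str) -> str:
    """Fetch and return the text content of a web page."""
    import requests
    # The client timeout applies to the connect and read phases separately;
    # the 10 s decorator above caps the total wall-clock time.
    resp = requests.get(url, timeout=8)
    resp.raise_for_status()  # raise on 4xx/5xx instead of returning an error page as data
    return resp.text[:4000]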
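
The SIGALRM approach works only on Unix and only in the main thread, which rules it out for agents served from worker threads. One possible alternative, sketched here under the assumption that leaving a stuck call running in the background is acceptable, is to run the tool in a thread pool and stop waiting after the deadline. The names with_thread_timeout, ToolDeadlineExceeded, and slow_tool are hypothetical.

```python
import concurrent.futures
import functools
import time
from typing import Any, Callable

# One shared pool for all tool calls; the size is an arbitrary example value.
_TOOL_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=8)


class ToolDeadlineExceeded(Exception):
    """Hypothetical exception raised when the caller stops waiting for a tool."""


def with_thread_timeout(seconds: float) -> Callable:
    """Run a synchronous tool in a worker thread and stop waiting after `seconds`."""
    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            future = _TOOL_POOL.submit(fn, *args, **kwargs)
            try:
                return future.result(timeout=seconds)
            except concurrent.futures.TimeoutError:
                # cancel() only stops a call that has not started yet; a running
                # call keeps its worker thread until it returns on its own.
                future.cancel()
                raise ToolDeadlineExceeded(
                    f"Tool '{fn.__name__}' timed out after {seconds}s"
                ) from None
        return wrapper
    return decorator


@with_thread_timeout(seconds=0.1)
def slow_tool() -> str:
    time.sleep(0.5)  # stand-in for a blocking network call
    return "done"
```

The trade-off is that the agent is unblocked but the worker thread is not reclaimed until the call finishes, so the tool should still set its own client-level timeout.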

TypeScript: per-tool AbortController. In TypeScript agents (LangChain.js, Vercel AI SDK), tools are typically async functions. Wrap with an AbortController timeout:

import { DynamicTool } from "@langchain/core/tools";

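// Runs fn with an AbortSignal that fires after timeoutMs, so the underlying
// request is cancelled instead of continuing in the background.
function withTimeout<T>(fn: (signal: AbortSignal) => Promise<T>, timeoutMs: number): Promise<T> {
  const controller = new AbortController();
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => {
      controller.abort(); // cancel the in-flight request
      reject(new Error(`Tool timed out after ${timeoutMs}ms`));
    }, timeoutMs);
    fn(controller.signal).then(
      (v) => { clearTimeout(timer); resolve(v); },
      (e) => { clearTimeout(timer); reject(e); }
    );
  });
}

const fetchWebPageTool = new DynamicTool({
  name: "fetch_web_page",
  description: "Fetch text content from a URL",
  func: async (url: string) => {
    return withTimeout(async (signal) => {
      const resp = await fetch(url, { signal }); // aborted when the timer fires
      const text = await resp.text();
      return text.slice(0, 4000);
    }, 10_000); // 10 second timeout
  },
});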

Failure mode 2: the retry storm

How retry storms form. A tool call times out or returns an error. The agent sees the error as a reason to try again (the model is trained to be persistent). It calls the same tool with the same — or slightly varied — arguments. The tool times out again. The model calls it a third time. This is a retry storm: the agent loops on a failing tool, generating an LLM call on every loop iteration to process the new error response and decide what to do. Each iteration costs API tokens. A typical retry storm with GPT-4o costs $0.05–$0.20 per loop iteration, and agents frequently run 5–15 iterations before hitting a max_iter backstop.
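To put the per-iteration figures above into session terms, here is a small worked calculation. It assumes a flat cost per iteration, which is a simplification, and storm_cost_usd is a hypothetical helper.

```python
def storm_cost_usd(cost_per_iteration: float, iterations: int) -> float:
    """Total spend for a retry storm: every iteration is one more LLM call."""
    return cost_per_iteration * iterations


low = storm_cost_usd(0.05, 5)     # cheapest iteration, shortest storm
high = storm_cost_usd(0.20, 15)   # priciest iteration, longest storm
print(f"${low:.2f} to ${high:.2f} per stuck session")
```

Under those figures a single stuck session costs between $0.25 and $3.00 before the iteration backstop fires.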
Why agents don’t self-stop. Modern LLMs are trained to be helpful and persistent. The model sees “Error: request timed out” as a signal to try a different approach or retry. It does not have a concept of “I have called this tool 5 times with the same arguments and I should stop.” Without explicit loop detection, the model will retry until it hits a hard iteration limit or until the tool eventually succeeds.
Detecting retry storms with RunGuard. RunGuard’s loop detector tracks the signature of each LLM response — specifically which tool it calls and with what arguments. If the same tool-call signature appears 3 times within a sliding window of 5 responses, RunGuard raises LoopDetectedError and stops the agent before the retry storm exhausts the session budget.
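The sliding-window idea can be illustrated with a few lines of plain Python. This is a sketch of the general technique only, not RunGuard's implementation; RepeatDetector and the example signatures are hypothetical.

```python
from collections import deque


class RepeatDetector:
    """Flags a loop when one signature appears `repeats` times in the last `window` calls."""

    def __init__(self, repeats: int = 3, window: int = 5) -> None:
        self.repeats = repeats
        self.recent: deque = deque(maxlen=window)  # oldest entries fall off automatically

    def observe(self, signature: str) -> bool:
        """Record a signature; return True if it now counts as a loop."""
        self.recent.append(signature)
        return self.recent.count(signature) >= self.repeats


detector = RepeatDetector()
calls = ["fetch:a", "search:b", "fetch:a", "fetch:a"]
flags = [detector.observe(c) for c in calls]
```

Here flags ends as [False, False, False, True]: the third fetch:a inside the five-call window is what trips the detector, even though a different call happened in between.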

Failure mode 3: the error-recovery loop

The pattern. A tool returns an error string (not a raised exception). The model interprets the error string as a piece of information to act on — it tries to recover by calling a different tool, gathering more context, and eventually calling the original tool again in a slightly different way. This produces a longer loop than a direct retry: the model takes 3–4 steps between each re-attempt at the failing tool, making it harder to detect as a simple repeat.
Fix: raise exceptions from tools, never return error strings. When your tool fails, raise a typed exception. Agent frameworks typically surface tool exceptions as error messages in the conversation context with clear “tool call failed” framing, which the model is more likely to read as a hard failure than as data to act on. The model may still retry, but a returned error string is worse: it looks like a result, so the model treats it as partial information and tries to get more. The single most impactful change you can make to prevent error-recovery loops is to change all tool error returns from return "Error: ..." to raise ToolExecutionError("...").
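A minimal before-and-after sketch of that change follows. ToolExecutionError takes its name from the text above, but its fields, the lookup_order tools, and the injected query callable are assumptions made for illustration.

```python
from typing import Callable


class ToolExecutionError(Exception):
    """Typed failure for a tool call (fields here are illustrative)."""

    def __init__(self, tool_name: str, reason: str) -> None:
        super().__init__(f"{tool_name} failed: {reason}")
        self.tool_name = tool_name
        self.reason = reason


def lookup_order_bad(order_id: str, query: Callable[[str], dict]) -> str:
    """Anti-pattern: the failure comes back looking like an ordinary result."""
    try:
        return str(query(order_id))
    except TimeoutError:
        return "Error: request timed out"


def lookup_order(order_id: str, query: Callable[[str], dict]) -> str:
    """Preferred: the failure leaves the tool as a typed exception."""
    try:
        return str(query(order_id))
    except TimeoutError as exc:
        raise ToolExecutionError("lookup_order", "request timed out") from exc
```

The first version hands the framework a string it cannot tell apart from real output; the second gives it an exception it can label as a failed tool call.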

Using RunGuard’s cycle detector for multi-step recovery loops.

import json

from openai import AsyncOpenAI
from runguard import guard, LoopDetectedError

openai_client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

# Per-million-token prices (USD) used in this example; update if pricing changes.
INPUT_USD_PER_M = 2.50
OUTPUT_USD_PER_M = 10.0

def make_agent_guard(max_usd: float = 2.0):
    async def wrapped_llm_call(messages, tools):
        # Your existing LLM call here
        response = await openai_client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=tools,
        )
        tool_calls = response.choices[0].message.tool_calls or []
        # Signature: name + first 30 chars of first arg (catches same-arg retries).
        # Responses without a tool call use the fixed signature "text".
        sig = "text"
        if tool_calls:
            first = tool_calls[0]
            args = json.loads(first.function.arguments or "{}")
            first_arg = str(next(iter(args.values())))[:30] if args else ""
            sig = f"{first.function.name}:{first_arg}"
        cost_usd = (
            response.usage.prompt_tokens * INPUT_USD_PER_M +
            response.usage.completion_tokens * OUTPUT_USD_PER_M
        ) / 1_000_000
        return {"response": response, "usd": cost_usd, "sig": sig}

    return guard(
        wrapped_llm_call,
        budget={"max_usd": max_usd},
        loop={"repeats": 3, "window": 8, "max_cycle_len": 3},  # catches retry storms and 3-step recovery loops
    )
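
To make the idea of a multi-step cycle concrete, the sketch below checks whether the most recent signatures consist of one short pattern repeated back to back. It is a simplified illustration of cycle detection in general, not a description of how RunGuard works internally, and find_repeated_cycle is a hypothetical helper.

```python
from typing import Optional, Sequence


def find_repeated_cycle(
    history: Sequence[str], repeats: int = 3, max_cycle_len: int = 3
) -> Optional[int]:
    """Return the shortest cycle length whose pattern ends the history `repeats` times in a row."""
    for cycle_len in range(1, max_cycle_len + 1):
        span = cycle_len * repeats
        if len(history) < span:
            break  # longer cycles need even more history
        tail = list(history[-span:])
        pattern = tail[:cycle_len]
        if tail == pattern * repeats:
            return cycle_len
    return None


recovery_loop = ["fetch:a", "search:x", "read:y"] * 3
direct_retry = ["search:x", "fetch:a", "fetch:a", "fetch:a"]
```

find_repeated_cycle(recovery_loop) returns 3 and find_repeated_cycle(direct_retry) returns 1, so one check covers both the plain retry storm and the three-step recovery loop. A production detector would also need to tolerate small variations between passes, which this exact-match version does not.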

Tool timeout patterns comparison

Pattern	Problem	Solution
No timeout on tool	Agent hangs indefinitely on slow network	Add per-tool timeout (10–30s depending on tool)
Tool returns error string on timeout	Triggers retry storm — model sees error as data	Raise exception instead of returning error string
Tool raises exception on timeout	Model may still retry, but interprets as hard failure	RunGuard loop detector catches 3+ repeats
Global max_iter limit	Counts iterations not cost — still expensive	RunGuard budget cap fires on dollar spend, not iteration count
Exponential backoff in tool	Adds real delays — still costs tokens on each retry	Combine backoff with RunGuard retry-count detection

Per-tool timeout recommendations by tool type

Web scraping / HTTP fetch tools: 10–15 seconds. Most legitimate web pages respond in under 5 seconds. A 10–15 second timeout catches genuinely slow pages while preventing indefinite hangs on unresponsive hosts. Set an explicit timeout on the HTTP client as well as the outer tool wrapper — HTTP client timeouts handle connection and read phases independently.
Database query tools: 5–10 seconds. Database queries that take longer than 10 seconds are almost always doing a full table scan or hitting a missing index. A tight timeout forces the issue to surface as a query performance problem rather than an agent hang. Add a LIMIT clause to all agent-generated SQL queries as an additional guardrail.
LLM-as-tool (nested agents): 30–60 seconds. Tools that call another LLM (e.g., a summarizer sub-agent) have longer inherent latency. Set a 30–60 second timeout, but also apply RunGuard’s budget cap to the inner LLM call so the nested agent can’t consume the outer agent’s entire budget.
External API tools: 5–20 seconds depending on SLA. For APIs with documented SLAs (e.g., a weather API that responds in under 2 seconds), set a tight 5-second timeout. For batch-processing APIs that can legitimately take longer, use a polling pattern rather than a synchronous timeout — the tool submits a job and returns a job ID, and a separate tool polls for the result.
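The submit-and-poll split described for batch APIs can be sketched as two small tools. The in-memory job store below stands in for a real batch service, and submit_job, poll_job, and the polls_left counter are all hypothetical.

```python
import itertools
from typing import Dict

# In-memory stand-in for a batch API; a real tool would call the remote service.
_jobs: Dict[str, dict] = {}
_ids = itertools.count(1)


def submit_job(payload: str) -> str:
    """Tool 1: start the work and return immediately with a job ID."""
    job_id = f"job-{next(_ids)}"
    _jobs[job_id] = {"payload": payload, "polls_left": 2}
    return job_id


def poll_job(job_id: str) -> dict:
    """Tool 2: report status without blocking; the agent calls it again later."""
    job = _jobs.get(job_id)
    if job is None:
        raise KeyError(f"unknown job id: {job_id}")
    if job["polls_left"] > 0:
        job["polls_left"] -= 1  # simulates the backend still working
        return {"status": "pending"}
    return {"status": "done", "result": job["payload"].upper()}
```

Each call returns quickly, so both tools can keep a tight timeout even when the underlying job takes minutes.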