Your TypeScript AI agent calls a tool. The tool returns a result. The model decides to call the same tool again with the same arguments. Then again. Your bill climbs. The run never terminates. This is not a hypothetical: runaway tool-call loops are one of the most common production incidents in AI agent engineering, and they are almost always preventable.
This guide covers four root causes of TypeScript agent loops, three native approaches that look like fixes but aren't, and one pattern-based solution that actually stops the loop before it costs you money.
The key distinction: A step counter tells you how many times the loop has run. A circuit breaker stops the loop before the next step fires. You want the latter.
Why TypeScript agents loop
Not all infinite loops look the same. There are four distinct failure modes, and each requires a different detection strategy.
1. Tool-call signature lock
The most common pattern: the model calls search("Q"), gets a result, and decides the answer requires calling search("Q") again. The model's context now contains the result of the first call, but the model treats the result as insufficient and retries the identical query. The loop signature is search("Q") → search("Q") → search("Q") — same function name, same arguments hash, repeating.
This happens most often with retrieval tools (web search, vector DB lookup, file read) where the model can't distinguish "I already tried this" from "I should try this." The result fills more context, the model becomes less confident, and it retries in hope of a better answer.
2. State mutation miss
The agent calls updateRecord(id, {status: "processing"}), the tool succeeds, and the agent calls getRecord(id) to confirm. The record shows "processing". The agent calls updateRecord(id, {status: "processing"}) again — because the desired end-state is "done", not "processing", and the model re-evaluates each step without maintaining awareness of its own prior mutations.
This pattern is especially common in agents orchestrating database workflows, CRM updates, or any state machine where the model is responsible for driving transitions.
3. Retry storm
A tool call fails with a transient error. The model retries. The retry fails. The model decides the best next step is another retry. Some frameworks have built-in retry logic that compounds with model-level retry decisions: LangChain's withRetry(), the OpenAI SDK's automatic retries, and the model's own learned "try again" behavior can stack, and because each layer retries every attempt made by the layer above it, the counts multiply rather than add. A single transient failure can become dozens of back-to-back calls before any timeout fires.
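To see why the layers multiply, here is a small synchronous simulation under assumed numbers. A hypothetical withRetries helper stands in for a framework-level wrapper such as withRetry() (it is not that method's implementation) and makes 3 attempts per call; the "model" retries 5 times; the tool always fails with a 429.

```typescript
// Hypothetical stand-in for a framework-level retry wrapper: calls `fn` up to
// `attempts` times and rethrows the last error if every attempt fails.
function withRetries<T>(fn: () => T, attempts: number): () => T {
  return () => {
    let lastError: unknown;
    for (let i = 0; i < attempts; i++) {
      try {
        return fn();
      } catch (err) {
        lastError = err;
      }
    }
    throw lastError;
  };
}

let toolCalls = 0;

// A tool that always fails with a transient-looking error.
function flakyFetch(): string {
  toolCalls++;
  throw new Error("429 Too Many Requests");
}

// The "model" layer: on failure it simply decides to try again.
function modelLoop(modelRetries: number, tool: () => string): void {
  for (let i = 0; i < modelRetries; i++) {
    try {
      tool();
      return;
    } catch {
      // Learned "try again" behavior: loop and call the tool once more.
    }
  }
}

modelLoop(5, withRetries(flakyFetch, 3));
console.log(toolCalls); // 15: 5 model retries x 3 framework attempts each
```

A third layer with its own 3 attempts, such as an HTTP client retrying inside the tool, would turn the same single failure into 45 calls.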
4. Context window drift
As the agent's context fills, older tool results get truncated. The model loses visibility into what it has already done. With the prior steps truncated away, the model re-evaluates the task as if starting fresh — and re-executes the same tools it ran in the now-truncated section. This is the most expensive loop: it accelerates as the context fills, because each additional tool call adds more tokens that push out more prior results.
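One way to act before drift starts is to watch how full the context is and stop the run near the limit. The sketch below is a hypothetical ContextGuard, not any library's API. It assumes a rough 4-characters-per-token estimate (a common heuristic for English text, not a real tokenizer) and a 128,000-token window; the 0.85 threshold mirrors the contextThresholdPct default listed in the configuration reference.

```typescript
interface Message {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

// Rough token estimate: ~4 characters per token. Use your model's real
// tokenizer in production; this heuristic is only an approximation.
function estimateTokens(messages: Message[]): number {
  const chars = messages.reduce((sum, m) => sum + m.content.length, 0);
  return Math.ceil(chars / 4);
}

class ContextGuard {
  private readonly limitTokens: number;

  constructor(contextWindowTokens: number, thresholdPct = 0.85) {
    // Trip once usage reaches thresholdPct of the (assumed) context window.
    this.limitTokens = contextWindowTokens * thresholdPct;
  }

  // Call before each model turn: throws while earlier tool results are
  // still in context, i.e. before truncation hides them from the model.
  check(messages: Message[]): void {
    const used = estimateTokens(messages);
    if (used >= this.limitTokens) {
      throw new Error(
        `Context guard tripped: ~${used} tokens used, limit ${Math.floor(this.limitTokens)}`
      );
    }
  }
}

const contextGuard = new ContextGuard(128_000);
const history: Message[] = [{ role: "user", content: "Summarize the report." }];
contextGuard.check(history); // far below the limit: no throw

// Simulate a run where each tool result adds 20,000 characters (~5,000 tokens).
let tripped = false;
try {
  for (let step = 0; step < 50; step++) {
    history.push({ role: "tool", content: "x".repeat(20_000) });
    contextGuard.check(history);
  }
} catch {
  tripped = true;
}
console.log(tripped, history.length); // true 23
```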
See how to detect LLM tool call loops in production for observability strategies that surface which of these four patterns is hitting your system.
Three native approaches — and why they fall short
Before reaching for a library, most teams try to solve this in framework configuration. Each approach works partially and fails specifically.
recursionLimit (LangGraph)
LangGraph's recursionLimit is a step counter. It counts graph steps (supersteps, in each of which one or more nodes execute) and raises GraphRecursionError when the count exceeds the limit. This is better than nothing, but it has two problems.
First, it fires on the Nth step regardless of whether those steps are repetitive. A legitimate 50-step research workflow hits the same limit as a 50-iteration loop. You end up either setting the limit high enough to allow legitimate runs (which allows long loops) or low enough to catch loops (which breaks legitimate runs). The limit is blunt.
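The bluntness shows up clearly when both strategies see the same traces. This sketch uses two synthetic traces (a legitimate 50-step run that never repeats a call, and a 50-iteration loop repeating one call), a step limit of 30, and a per-signature threshold of 3; all of these are illustrative assumptions, and neither function is LangGraph's implementation.

```typescript
interface ToolCall {
  tool: string;
  args: Record<string, unknown>;
}

// Plain step limit: returns the 1-based step that exceeds maxSteps, or null.
function stepLimitTrip(trace: ToolCall[], maxSteps: number): number | null {
  return trace.length > maxSteps ? maxSteps + 1 : null;
}

// Per-signature count: returns the 1-based step at which any (tool, args)
// pair is seen for the `threshold`-th time, or null. The args objects here
// have a single key, so JSON.stringify is a stable enough key.
function signatureTrip(trace: ToolCall[], threshold: number): number | null {
  const counts = new Map<string, number>();
  for (let i = 0; i < trace.length; i++) {
    const sig = `${trace[i].tool}:${JSON.stringify(trace[i].args)}`;
    const count = (counts.get(sig) ?? 0) + 1;
    counts.set(sig, count);
    if (count >= threshold) return i + 1;
  }
  return null;
}

// 50 distinct searches (legitimate research) vs. the same search 50 times (a loop).
const legitimate: ToolCall[] = Array.from({ length: 50 }, (_, i) => ({
  tool: "search",
  args: { q: `subtopic ${i}` },
}));
const looping: ToolCall[] = Array.from({ length: 50 }, () => ({
  tool: "search",
  args: { q: "Q" },
}));

console.log(stepLimitTrip(legitimate, 30), signatureTrip(legitimate, 3)); // 31 null
console.log(stepLimitTrip(looping, 30), signatureTrip(looping, 3)); // 31 3
```

The step limit stops both runs at the same point; the signature count leaves the legitimate run alone and stops the loop at its third identical call.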
Second, it fires after the damage is done. If your limit is 30 and your agent loops 30 times at $0.20 per loop, you've spent $6 before the error fires. A circuit breaker that checks each call before it goes out and trips on the third identical signature blocks that third call, so the same failure costs two loops, or $0.40.
See LangGraph infinite loop guard for a detailed breakdown of how recursionLimit interacts with RunGuard's pattern detection in conditional-edge cycles.
while loop with a counter
The handwritten alternative to recursionLimit:
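```typescript
// `agent`, `input`, and `checkTermination` come from your own agent code.
const MAX_STEPS = 20;
let steps = 0;
let done = false;
while (!done) {
  // Pre-increment so at most MAX_STEPS steps run (`steps++ > MAX_STEPS` allows 21).
  if (++steps > MAX_STEPS) throw new Error(`Too many steps (limit ${MAX_STEPS})`);
  const result = await agent.step(input);
  done = checkTermination(result);
}
```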
Same problem as recursionLimit: it's a step counter, not a pattern counter. It cannot distinguish a loop from a long legitimate run. Setting MAX_STEPS = 20 is a guess, not a detection strategy.
Timeouts
An AbortController or setTimeout kill stops the run after a wall-clock duration, regardless of what the agent is doing. Timeouts are useful as a backstop, but they don't tell you why the run timed out or which tool call caused the loop. They also react too late to fast loops: an agent can issue 50 identical search calls in 8 seconds, every one of them billed, well before a 30-second timeout fires.
Pattern-based detection: how it actually works
The fundamental insight is that a loop is a pattern, not a count. A circuit breaker should trip when it sees the same pattern repeating N times — not when it sees N steps, some of which might be different.
A "loop signature" is a hash of the tool name and the arguments. Two calls produce the same signature when they call the same function with the same inputs:
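```typescript
// Recursively sort object keys (assumes JSON-like args) so key order never matters.
function canonicalize(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    return Object.fromEntries(
      Object.keys(obj).sort().map((key) => [key, canonicalize(obj[key])])
    );
  }
  return value;
}

function signature(toolName: string, args: unknown): string {
  // Stable hash of (tool, args): key order at any depth does not affect it
  const canonical = JSON.stringify(canonicalize(args) ?? null);
  return `${toolName}:${hash(canonical)}`;
}

function hash(s: string): string {
  // FNV-1a 32-bit: fast, non-cryptographic. Math.imul keeps the multiply in
  // 32-bit integer space; a plain `*` can exceed 2^53 and drop low bits.
  let h = 2166136261;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 16777619) >>> 0;
  }
  return h.toString(16);
}
```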
With signatures in hand, a minimal circuit breaker is a simple per-signature counter:
```typescript
class LoopBreaker {
  private counts = new Map<string, number>();
  private readonly threshold: number;
  constructor(threshold = 3) {
    this.threshold = threshold;
  }
  check(toolName: string, args: unknown): void {
    const sig = signature(toolName, args);
    const count = (this.counts.get(sig) ?? 0) + 1;
    this.counts.set(sig, count);
    if (count >= this.threshold) {
      throw new Error(
        `Circuit breaker tripped: ${toolName} called with identical args ${count} times`
      );
    }
  }
}
```
You'd call breaker.check(toolName, args) before every tool invocation. If the same signature appears three times, the breaker trips before the third call goes out.
This is the shape of pattern-based detection. The production version has a few more edges to handle: argument normalization (so {a: 1, b: 2} and {b: 2, a: 1} produce the same signature), windowed counting (so old signatures decay and don't trigger on a new request that happens to call the same tool much later), budget-aware counting (weight each check by token cost), and context-size monitoring.
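Windowed counting is the main gap in the minimal class above. One way to add it is sketched below, assuming a fixed-size window over the most recent calls; the defaults of 3 and 20 mirror the documented loopThreshold and windowSize defaults, and this is not RunGuard's implementation. The breaker keeps only the last windowSize signatures, so a call repeated much later in a long run no longer counts toward the threshold. For brevity it uses the key-sorted JSON string itself as the signature instead of hashing it.

```typescript
// Recursively sort object keys so {a: 1, b: 2} and {b: 2, a: 1} match.
function sortKeys(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(sortKeys);
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    return Object.fromEntries(Object.keys(obj).sort().map((k) => [k, sortKeys(obj[k])]));
  }
  return value;
}

function canonicalKey(toolName: string, args: unknown): string {
  return `${toolName}:${JSON.stringify(sortKeys(args) ?? null)}`;
}

class WindowedLoopBreaker {
  private readonly window: string[] = []; // recent signatures, oldest first
  private readonly counts = new Map<string, number>(); // occurrences in window
  private readonly threshold: number;
  private readonly windowSize: number;

  constructor(threshold = 3, windowSize = 20) {
    this.threshold = threshold;
    this.windowSize = windowSize;
  }

  // Call before every tool invocation; throws instead of allowing the call.
  check(toolName: string, args: unknown): void {
    const sig = canonicalKey(toolName, args);
    const count = (this.counts.get(sig) ?? 0) + 1;
    if (count >= this.threshold) {
      throw new Error(
        `Circuit breaker tripped: ${toolName} called with identical args ${count} times within the recent window`
      );
    }
    // Record the call, then evict the oldest signature once the window is full.
    this.window.push(sig);
    this.counts.set(sig, count);
    if (this.window.length > this.windowSize) {
      const oldest = this.window.shift()!;
      const remaining = (this.counts.get(oldest) ?? 1) - 1;
      if (remaining === 0) this.counts.delete(oldest);
      else this.counts.set(oldest, remaining);
    }
  }
}

const breaker = new WindowedLoopBreaker(3, 20);
breaker.check("search", { q: "Q" });
breaker.check("search", { q: "Q" });
// 20 other calls push both earlier searches out of the window.
for (let i = 0; i < 20; i++) breaker.check("read_file", { path: `doc-${i}.md` });
breaker.check("search", { q: "Q" }); // allowed: the earlier copies have decayed

let tripped = false;
try {
  breaker.check("search", { q: "Q" });
  breaker.check("search", { q: "Q" }); // third copy inside the window
} catch {
  tripped = true;
}
console.log(tripped); // true
```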
Using RunGuard in TypeScript
RunGuard packages the above logic — including windowed counting, argument normalization, budget tracking, and Slack alerting — as a single wrapper function. Install it:
```bash
npm install @runguard/sdk
```
The guard() function wraps any async tool function and adds a pre-call circuit breaker:
```typescript
import { guard } from "@runguard/sdk";
const safeSearch = guard(search, {
  loopThreshold: 3, // trip on 3rd identical signature
  budgetLimitUsd: 2.00, // trip if run exceeds $2
  onTrip: (reason) => {
    console.error("Circuit breaker tripped:", reason);
    // reason.type: "loop" | "budget" | "context"
    // reason.signature: the repeated tool+args hash
    // reason.count: how many times it fired
  }
});
// Use exactly like the unwrapped function
const result = await safeSearch("AI agent circuit breaker TypeScript");
```
When the breaker trips, it throws a typed GuardTripError that your error boundary can catch and route to your on-call channel, a dead-letter queue, or a fallback path:
```typescript
import { GuardTripError } from "@runguard/sdk";

// `agent` and `slack` stand in for your own agent and Slack client.
async function runSafely(input: string) {
  try {
    return await agent.run(input);
  } catch (err) {
    if (err instanceof GuardTripError) {
      await slack.send("#incidents", `RunGuard tripped: ${err.reason.type} on ${err.reason.tool}`);
      // Return a safe fallback rather than letting the agent retry
      return { status: "circuit_open", reason: err.reason.type };
    }
    throw err;
  }
}
```
LangChain composition
For agents built on LangChain, wrap the tool function before passing it to DynamicTool:
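```typescript
import { DynamicTool } from "@langchain/core/tools";
import { createReactAgent } from "@langchain/langgraph/prebuilt";
import { guard } from "@runguard/sdk";

// `fetchWebPage` (your tool function) and `llm` (your chat model) are defined elsewhere.
const safeFetch = guard(fetchWebPage, { loopThreshold: 3 });
const fetchTool = new DynamicTool({
  name: "fetch_web_page",
  description: "Fetch the content of a URL",
  func: async (url: string) => safeFetch(url),
});
// LangSmith traces what RunGuard permits: the two compose cleanly.
// createReactAgent returns the compiled agent synchronously, so no await is needed.
const agent = createReactAgent({ llm, tools: [fetchTool] });
```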
The composition is guard(toolFn): RunGuard wraps at the function call level, inside the tool and below LangChain's CallbackManager. LangChain still emits on_tool_start for the attempted call, but the breaker throws before fetchWebPage runs, so the call is recorded as a tool error and never appears in your LangSmith trace as a successful invocation. This is the key architectural difference between a circuit breaker and an observability tool; see LangSmith alternative: RunGuard for the full technical comparison.
LangGraph composition
In a LangGraph workflow, guard each node's tool calls individually rather than wrapping the graph:
```typescript
import { StateGraph, START, END } from "@langchain/langgraph";
import { guard } from "@runguard/sdk";

// AgentState, search, analyzeNode, and shouldSearch are defined elsewhere.
// Create a per-run breaker instance, scoped to this execution
const breaker = guard.createBreaker({ loopThreshold: 3, budgetLimitUsd: 5 });
const searchNode = breaker.wrap(async (state) => {
  const results = await search(state.query);
  return { ...state, searchResults: results };
});
const graph = new StateGraph(AgentState)
  .addNode("search", searchNode)
  .addNode("analyze", analyzeNode)
  .addEdge(START, "search")
  .addEdge("search", "analyze")
  .addConditionalEdges("analyze", shouldSearch, {
    yes: "search",
    no: END,
  })
  .compile();
// If shouldSearch keeps returning "yes" with the same query, the breaker trips
// on the third identical search call (loopThreshold: 3), before that call runs.
```
The breaker.wrap(nodeFunction) form is equivalent to calling guard(fn) on each tool inside the node. The breaker instance is shared across all nodes in the graph, so a loop that cycles through two different nodes with the same net effect can still be detected if either node's tool signature repeats.
For a deeper look at why LangGraph's recursionLimit is insufficient for the most common LangGraph loop patterns, see LangGraph infinite loop guard.
The five loop patterns RunGuard catches that counters miss
| Pattern | What happens | Why counters miss it |
|---|---|---|
| Signature lock: search(q) → search(q) → search(q) | Same tool, same args, repeating indefinitely | Counter fires at step N regardless of whether the steps are repetitive |
| Two-tool cycle: getRecord(id) → updateRecord(id, x) → getRecord(id) → ... | Agent alternates between two tools in a cycle; neither alone repeats N times | Counter counts total steps, so the cycle runs until the limit; a check for consecutive repeats never fires because the tool keeps changing |
| Multi-agent handoff: agentA.run() → agentB.run() → agentA.run() → ... | Two sub-agents hand off to each other; the parent sees only two tool calls, but they recur | Parent-level counter only sees two distinct tool calls |
| Context-drift re-execution: earlier steps truncated → model reruns them | Model loses visibility of past steps and reruns the same sequence from a filled context | Counter starts from zero per run; re-execution looks like a fresh start |
| Retry storm: fetchURL(u) → 429 → fetchURL(u) → 429 → ... | A transient error triggers model retry AND framework retry AND SDK retry | Counter doesn't know which calls are retries vs. legitimate new calls |
Pattern 1 (signature lock) is caught by signature-based counting. Patterns 2 and 3 require RunGuard's pair-cycle detector — which looks at the last N signatures as a sequence rather than individual counts. Patterns 4 and 5 require context-size monitoring and error-type tracking respectively, both of which RunGuard monitors in parallel with signature counting.
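Sequence-based detection can be sketched in a few lines. The detectCycle function below is hypothetical (not RunGuard's pair-cycle detector, and its depth parameter only loosely mirrors pairCycleDepth): it takes the most recent depth signatures, shortened here to readable strings, and reports a cycle when they are one short block repeated end to end. Counting each signature separately misses an alternation until one of them reaches the threshold; looking at the sequence catches it after two rounds.

```typescript
// Returns the cycle length p if the last `depth` signatures repeat with
// period p (1 <= p <= depth / 2, so the block occurs at least twice), else null.
function detectCycle(signatures: string[], depth = 6): number | null {
  if (signatures.length < depth) return null;
  const recent = signatures.slice(-depth);
  for (let p = 1; p <= Math.floor(depth / 2); p++) {
    // Periodic with period p: every element equals the one p positions earlier.
    let periodic = true;
    for (let i = p; i < recent.length; i++) {
      if (recent[i] !== recent[i - p]) {
        periodic = false;
        break;
      }
    }
    if (periodic) return p;
  }
  return null;
}

// Two-tool cycle: each signature has appeared only twice, below a threshold of 3.
const cycling = ["getRecord:42", "updateRecord:42", "getRecord:42", "updateRecord:42"];
// Legitimate mixed work: no repeating block.
const mixed = ["search:a", "search:b", "readFile:c", "search:d"];

console.log(detectCycle(cycling, 4)); // 2
console.log(detectCycle(mixed, 4)); // null
```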
For detection strategies specific to LangChain agents — including how on_tool_start can observe patterns but cannot halt them — see LangChain circuit breaker: detection vs. prevention.
Configuration reference
The full set of options for guard(fn, options):
```typescript
interface GuardOptions {
  // Loop detection
  loopThreshold?: number; // default: 3 — trips on Nth identical signature
  windowSize?: number; // default: 20 — signatures older than this decay
  pairCycleDepth?: number; // default: 6 — depth of cycle-pair detection
  // Budget guard
  budgetLimitUsd?: number; // default: undefined (no budget cap)
  budgetModelPricing?: Record<string, { inputPer1k: number; outputPer1k: number }>;
  // Context guard
  contextThresholdPct?: number; // default: 0.85 — trip at 85% of model's context window
  // Alerting
  onTrip?: (reason: TripReason) => void | Promise<void>;
  slackWebhookUrl?: string; // posts to Slack when breaker trips
  // Recovery
  allowAfterCooldown?: boolean; // default: false — trip is permanent for this run
  cooldownMs?: number; // if allowAfterCooldown, how long to wait
}
```
The defaults are conservative by design: loopThreshold: 3 catches loops quickly, windowSize: 20 prevents false positives on legitimate tools that get called many times across a long run, and allowAfterCooldown: false means a tripped breaker stays open — the run should be inspected, not automatically retried.
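The budget options can be approximated in a few lines. The sketch below is a hypothetical BudgetGuard class, not RunGuard's implementation: it assumes you read input and output token counts from each model response (most provider SDKs report usage, under varying field names) and prices them with a table shaped like budgetModelPricing. The model name and per-1k prices are placeholders, not real rates.

```typescript
interface Pricing {
  inputPer1k: number; // USD per 1,000 input tokens
  outputPer1k: number; // USD per 1,000 output tokens
}

class BudgetGuard {
  private spentUsd = 0;
  private readonly limitUsd: number;
  private readonly pricing: Record<string, Pricing>;

  constructor(limitUsd: number, pricing: Record<string, Pricing>) {
    this.limitUsd = limitUsd;
    this.pricing = pricing;
  }

  // Add one model call's cost; throw once cumulative spend exceeds the limit.
  record(model: string, inputTokens: number, outputTokens: number): void {
    const price = this.pricing[model];
    if (!price) throw new Error(`No pricing configured for model "${model}"`);
    this.spentUsd +=
      (inputTokens / 1000) * price.inputPer1k +
      (outputTokens / 1000) * price.outputPer1k;
    if (this.spentUsd > this.limitUsd) {
      throw new Error(
        `Budget guard tripped: $${this.spentUsd.toFixed(2)} spent, limit $${this.limitUsd.toFixed(2)}`
      );
    }
  }
}

// Placeholder model name and prices, for illustration only.
const budget = new BudgetGuard(2.0, {
  "example-model": { inputPer1k: 0.01, outputPer1k: 0.03 },
});

// A looping run re-sends a growing context: 10,000 more input tokens each step.
let completedSteps = 0;
let tripMessage = "";
try {
  for (let step = 1; step <= 100; step++) {
    budget.record("example-model", 10_000 * step, 500);
    completedSteps++;
  }
} catch (err) {
  tripMessage = (err as Error).message;
}
console.log(completedSteps); // 5: the sixth step takes spend from about $1.58 to about $2.19
```

Because a looping agent re-sends an ever-larger context, per-step cost grows with each iteration, which is why a dollar cap can trip in fewer steps than a fixed step limit would allow.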
Frequently asked questions
Does RunGuard work with Python agents?
Yes: pip install runguard gives you the same guard() decorator pattern for Python. The TypeScript SDK shipped first because much of the agent tooling teams build on is TypeScript-native (the Vercel AI SDK and Mastra are TypeScript-first, and LangChain.js, LangGraph.js, and Composio all ship full TypeScript SDKs), but the Python SDK has reached feature parity. See LangSmith alternative loop detection for Python-specific examples using LangSmith's traceable decorator.
Will the circuit breaker break my streaming responses?
guard() wraps at the tool call level, not the LLM inference level. It fires before the tool function is invoked. Streaming from the model itself is not affected — only the outbound tool calls are gated. If your architecture calls the model in a streaming loop and invokes tools based on stream events, wrap the tool-invocation function specifically rather than wrapping the streaming call.
How is a loopThreshold of 3 different from recursionLimit: 3 in LangGraph?
loopThreshold: 3 counts identical signatures: it fires when the same tool is called with the same arguments a third time. recursionLimit: 3 counts graph steps in total: it raises GraphRecursionError as soon as the run exceeds three steps, whatever those steps were. In a 20-step agent that calls ten different tools twice each, recursionLimit: 3 would stop the run almost immediately, while loopThreshold: 3 would never trip, because no signature appears more than twice. The legitimate multi-step run is left alone.
What happens if the breaker trips inside a try/catch that swallows errors?
GuardTripError extends Error with a readonly isTripError = true discriminant. We recommend checking err instanceof GuardTripError (or err.isTripError) before swallowing in a catch block, or using the onTrip callback — which fires synchronously before the throw — to log and alert regardless of downstream error handling.
Can RunGuard detect loops that span multiple invocations of the same agent function?
Not by default: the breaker instance is scoped to a single run. If you want cross-run detection (e.g., an orchestration loop that calls agent.run() multiple times with the same input), create a persistent breaker with guard.createBreaker() outside the run function and pass that same instance into every run. Cross-run detection is also available in the Team plan via the RunGuard dashboard, which aggregates signature hashes across runs and surfaces patterns that individual breakers would miss.
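The scoping difference comes down to where the breaker object lives. Here is a minimal sketch using a hypothetical in-memory SignatureCounter and a hypothetical one-call runAgent function (neither is RunGuard's API): create the counter once, outside the function that performs a run, and every run shares its counts. An orchestration loop that re-runs the agent with the same input then trips on the third run, even though each individual run makes only one call.

```typescript
// Hypothetical in-memory counter; counts live only as long as this object.
class SignatureCounter {
  private readonly counts = new Map<string, number>();
  private readonly threshold: number;

  constructor(threshold = 3) {
    this.threshold = threshold;
  }

  // Counts a signature and throws when it reaches the threshold.
  check(signature: string): void {
    const count = (this.counts.get(signature) ?? 0) + 1;
    this.counts.set(signature, count);
    if (count >= this.threshold) {
      throw new Error(`Tripped: ${signature} seen ${count} times`);
    }
  }
}

// Hypothetical agent run: one guarded tool call per run.
function runAgent(input: string, counter: SignatureCounter): string {
  counter.check(`search:${JSON.stringify({ q: input })}`);
  return `results for ${input}`;
}

// Per-run scoping: a fresh counter each time never sees a repeat.
for (let i = 0; i < 5; i++) runAgent("same input", new SignatureCounter());

// Cross-run scoping: one counter created outside the run and passed in.
const shared = new SignatureCounter();
let completedRuns = 0;
try {
  for (let i = 0; i < 5; i++) {
    runAgent("same input", shared);
    completedRuns++;
  }
} catch {
  // Tripped on the third run.
}
console.log(completedRuns); // 2
```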