A runtime infinite-loop guard for LangGraph
LangGraph ships `recursion_limit` on the compiled graph — default 25 — and raises `GraphRecursionError` when a run crosses it. That is a step counter, not a budget alert. By the time the limit trips on a gpt-4o agent with a 30K-token state, you’ve already paid for 25 planner LLM calls; if you bumped the limit to 100 because a legitimate research run needed it, a stuck conditional edge can chew through twenty dollars before it stops. `recursion_limit` bounds the loop length; it does not bound the bill, and it does not page you when the breaker trips. This page covers the runtime infinite-loop breaker we ship and how it slots into a LangGraph node in eight lines of Python.
Where the dollars actually accumulate inside a LangGraph run
- The planner LLM call on every agent-node visit. The standard agent pattern is two nodes — `agent` calls the model, `tools` (a `ToolNode`) executes proposed tool calls, and `tools_condition` routes back to `agent` until the model emits a final answer. Each visit to `agent` is a fresh chat-completions call against the configured model with the entire `messages` reducer state in the prompt. After ten visits, the state has accumulated the original prompt plus ten assistant turns plus ten tool outputs — on a search-tool agent that returns 8K of HTML each call, the `agent` node is staring at 80K input tokens by visit eleven, dominated by accumulated tool outputs.
- State reducers that grow without bound. The `add_messages` reducer concatenates — nothing in the default reducer truncates or summarises. A graph that runs for fifty visits has fifty turns of message state on every subsequent planner call. Custom reducers can shrink the state, but the default is unbounded growth.
- Conditional-edge ping-pong. An `add_conditional_edges` from `agent` that routes to `tools` when the message has tool calls, paired with a `tools` node whose output (a tool error, a degenerate tool result, an empty list) keeps making the `agent` emit the same tool call again, is the canonical LangGraph infinite loop. `recursion_limit` catches it — eventually — but not before twenty-five visits’ worth of input-token bills land.
- Subagents inherit a fresh default. A subagent invoked from inside a parent graph runs with its own `recursion_limit` — if the parent is configured for 100 but the subagent is wrapped via middleware that doesn’t propagate config (a known LangGraph SubAgentMiddleware bug surfaced in March 2026), the child silently uses the default 25. That’s the small case. The bigger case is a graph with three subagents that each, in their own subgraph, hit a 25-visit ceiling before bubbling back — a single parent visit can be 75 hidden planner LLM calls.
- Checkpointing prolongs the bleeding. The `MemorySaver`/`SqliteSaver`/`PostgresSaver` checkpointers persist the graph state. A graph that legitimately runs for hours, saved every step, can resume mid-loop the next morning — the loop counter resets, the bill does not.
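The accumulation arithmetic behind the 80K-token figure above can be sketched in a few lines, assuming a 2K-token base prompt and roughly 8K tokens of tool output plus 200 tokens of assistant text per visit (illustrative numbers, not measurements):

```python
def input_tokens_at_visit(visit, base=2_000, tool_out=8_000, assistant=200):
    """Tokens the agent node reads on visit `visit` (1-indexed), assuming the
    default add_messages reducer keeps every prior assistant turn and tool output."""
    return base + (visit - 1) * (tool_out + assistant)

# By visit eleven the planner reads the base prompt plus ten rounds of
# accumulated tool output: roughly the 80K-token figure quoted above.
print(input_tokens_at_visit(11))  # 84000
```

The growth is linear per visit, but the run's cumulative input-token bill is the sum over visits, so it grows quadratically with loop length.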
What LangGraph’s existing knobs give you and what they don’t
LangGraph’s primitives are correct in shape and wrong in unit. `recursion_limit` on `graph.invoke(input, config={"recursion_limit": 25})` is a step count, not a dollar cap. A step on a 30K-token state costs ten times what a step on a 3K-token state costs, and the cap doesn’t know the difference. `GraphRecursionError` is raised mid-step, after the offending step’s LLM call has already been billed; the error tells you the limit hit, not the cumulative spend, and there’s no built-in `on_recursion_limit` hook to page Slack. The `tools_condition` helper is a routing rule, not a budget rule; it can branch on whether the assistant emitted tool calls, but it can’t branch on whether the run has spent too much. Custom checkpointers persist state across resumes — useful for durability, useless for cost prevention; a stuck loop saved at step 24 resumes at step 1 the next morning. The new `Command(goto=…)` primitive lets a node deflect, but only if you wire the deflection by hand. None of these look at cumulative dollars spent so far in this run and none of them stop the next node before it fires. A run that legitimately needs forty visits to refine a research summary and a run that’s been firing the same broken tool call against the same arguments for forty visits both look identical to the executor — they just produce different invoices.
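To make the unit mismatch concrete, here is the same 25-step cap applied to two runs at gpt-4o's published input price of $2.50 per million tokens (a sketch counting input tokens only; real runs also pay for output tokens):

```python
GPT4O_INPUT_USD_PER_TOKEN = 2.5e-6  # $2.50 per 1M input tokens

def input_cost(steps, tokens_per_step):
    """Input-token spend for a run of `steps` planner calls at a flat state size."""
    return steps * tokens_per_step * GPT4O_INPUT_USD_PER_TOKEN

lean = input_cost(25, 3_000)    # 25 steps on a 3K-token state
heavy = input_cost(25, 30_000)  # 25 steps on a 30K-token state
# Both runs are identical to recursion_limit=25; the invoices differ 10x.
print(f"${lean:.2f} vs ${heavy:.2f}")
```

A step counter treats both runs the same; a dollar accumulator does not.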
What a runtime infinite-loop guard actually has to do
- Detect the cycle on a fingerprint, not a node-visit count. The same node firing forty times with forty different inputs is research; the same node firing three times with the same input is a loop. The detector takes a per-step signature — the proposed tool name, the canonicalised tool arguments, optionally a hash of the relevant slice of state — and looks for any cycle of length 1–8 repeating 3+ times in the most-recent 32 steps. Cycle of length 1 catches the stuck-tool loop. Cycle of length 2 catches the agent/tools ping-pong (tool errors, agent retries with the same arguments). Higher lengths cover the multi-node retry shapes a planner falls into when a conditional edge keeps sending it back through the same three or four nodes.
- Track real dollars, not step counts. A step that hit a 50K-token state costs ten times what a step on a 5K-token state costs. The tracker takes a USD number from the host after each node call — pulled from the `response_metadata` on the `AIMessage` if you’re calling `ChatOpenAI` or similar directly, or computed from input/output tokens times the published per-token price for the model.
- Trip before the next node fires, not after. The check is in-process, on a numeric accumulator and a small ring buffer. It runs in microseconds. When the cap is crossed or the cycle threshold is hit, the next call into the wrapped node raises a typed error and the graph halts — the next planner call never goes out, the next conditional edge never evaluates, the next subagent never spawns.
- Be a primitive, not a framework opinion. The same wrap should compose with LangGraph nodes, with the bare LangChain LCEL chains the nodes wrap, with the raw OpenAI SDK underneath, with whatever framework lands next quarter. A breaker that ships as a LangGraph `NodeMiddleware` subclass or a custom `StateGraph` mixin is brittle; a breaker that wraps any callable is portable.
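The cycle scan described in the first bullet translates to a short Python sketch (the shipped primitive is TypeScript; this mirror only illustrates the tail-of-window scan, with the stated defaults of a 32-entry window, cycle lengths 1–8, and 3 repeats):

```python
from collections import deque

def find_cycle(window, min_len=1, max_len=8, repeats=3):
    """Return (cycle_len, pattern) when the tail of `window` is a pattern of
    length min_len..max_len repeated `repeats` or more times, else None."""
    sigs = list(window)
    for n in range(min_len, max_len + 1):
        span = n * repeats
        if len(sigs) < span:
            break  # window too short to hold a longer cycle repeated enough times
        tail = sigs[-span:]
        pattern = tail[-n:]
        if all(tail[i] == pattern[i % n] for i in range(span)):
            return n, pattern
    return None

window = deque(maxlen=32)  # the 32-entry sliding window from the text
for _ in range(3):
    # a stuck tool loop degenerates to the same proposed call every visit
    window.append("agent:search_web:{'q': 'Q3 SEC filings'}")
print(find_cycle(window))  # length-1 cycle found on the third repeat
```

Forty visits with forty different signatures never match; three visits with one signature match at length 1, and an alternating pair of signatures matches at length 2.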
Wrapping a LangGraph node with runguard
```python
# langgraph + runguard. The graph stays a graph; we wrap the agent node so
# the loop detector and budget tracker see every paid call before the next.
from langgraph.graph import StateGraph, MessagesState, END
from langgraph.prebuilt import ToolNode, tools_condition
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI

from runguard import guard, BudgetExceededError, LoopDetectedError


# Stub tool definitions so the example is self-contained; swap in your real tools.
@tool
def search_web(query: str) -> str:
    """Search the web and return raw results."""
    ...


@tool
def summarise(text: str) -> str:
    """Summarise a block of text."""
    ...


llm = ChatOpenAI(model="gpt-4o").bind_tools([search_web, summarise])


def _agent_step(state: MessagesState):
    msg = llm.invoke(state["messages"])
    usage = msg.response_metadata.get("token_usage", {})
    usd = (usage.get("prompt_tokens", 0) * 2.5e-6        # $2.50 / 1M input tokens
           + usage.get("completion_tokens", 0) * 10e-6)  # $10 / 1M output tokens
    proposed = (msg.tool_calls or [{}])[0]
    sig = f"agent:{proposed.get('name', 'final')}:{str(proposed.get('args', ''))[:64]}"
    return {"messages": [msg], "usd": usd, "sig": sig}


guarded_agent = guard(
    _agent_step,
    signature=lambda _state, out: out["sig"],
    budget={"max_usd": 5, "window_ms": 60_000},
    loop={"repeats": 3, "max_cycle_len": 8},
    cost=lambda _state, out: out["usd"],
    on_trip=lambda e: print("[runguard]", e["reason"], e.get("spent"), "of", e.get("cap")),
)

g = StateGraph(MessagesState)
g.add_node("agent", guarded_agent)
g.add_node("tools", ToolNode([search_web, summarise]))
g.set_entry_point("agent")
g.add_conditional_edges("agent", tools_condition)
g.add_edge("tools", "agent")
graph = g.compile()

try:
    out = graph.invoke({"messages": [("user", "Brief me on Q3 SEC filings for $TICK")]})
except (BudgetExceededError, LoopDetectedError) as e:
    print("halted:", e)
```
The loop primitive is the `LoopDetector` shipped at `product/sdk/src/loop-detector.ts`: defaults `windowSize: 32`, `minCycleLen: 1`, `maxCycleLen: 8`, `repeats: 3` — a `push(signature)` the wrap calls per step, a `scan()` that returns a typed match, a `reset()` for fresh runs, and constructor-time validation that rejects `repeats < 2` and `windowSize < maxCycleLen * repeats`. The budget primitive is the `BudgetTracker` at `product/sdk/src/budget.ts`: `maxUsd` for the cap, optional `windowMs` for rolling-window throttles, an `add(usd)` the host calls post-call (which silently no-ops on zero, `if (usd === 0) return`), and an `exceeded()` the wrap reads pre-call. The `BudgetTracker` file is 84 lines; the `LoopDetector` is 111 lines — both are pure in-process primitives, no daemon, no telemetry. The fingerprint-and-window approach is documented at how to detect LLM tool-call loops in production; the LangChain AgentExecutor wrap is here; the multi-agent CrewAI wrap is here; the browser-use wrap is here; the OpenAI AgentKit wrap is here.
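For readers who would rather not open the TypeScript, the budget semantics above translate to roughly this Python sketch (illustrative names; `now_ms` is threaded explicitly so the rolling window is easy to reason about):

```python
import time

class BudgetSketch:
    """Mirror of the budget semantics above: a USD cap, an optional rolling
    window, an add() that no-ops on zero, an exceeded() read before each paid call."""

    def __init__(self, max_usd, window_ms=None):
        self.max_usd = max_usd
        self.window_ms = window_ms
        self.entries = []  # (timestamp_ms, usd) per paid node visit

    def _clock(self, now_ms):
        return time.time() * 1000 if now_ms is None else now_ms

    def add(self, usd, now_ms=None):
        if usd == 0:
            return  # zero-cost visits (routers, post-processing) never count
        self.entries.append((self._clock(now_ms), usd))

    def spent(self, now_ms=None):
        if self.window_ms is None:
            return sum(u for _, u in self.entries)  # cumulative for the run
        cutoff = self._clock(now_ms) - self.window_ms
        return sum(u for t, u in self.entries if t >= cutoff)  # old spend evicts

    def exceeded(self, now_ms=None):
        return self.spent(now_ms) > self.max_usd
```

With the window unset the cap is cumulative for the run; with a 60-second window, spend older than a minute evicts, bounding the per-minute spike rather than the total invoice.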
How the breaker behaves inside graph.invoke()
- Costs accumulate after each node call. The wrap reads the `usd` field on the node’s output dict and pushes it into the `BudgetTracker`. Successful node visits under the cap pass through transparently — the merged state goes back to the graph executor and the next conditional edge evaluates. Zero-cost visits (a deterministic post-processing node, a router node that doesn’t call an LLM) never trip the budget; the tracker explicitly skips zero entries via `if (usd === 0) return`.
- The first visit over the cap throws before its planner call goes out. `BudgetExceededError` is constructed with the cumulative spend, the cap, and a reason field. It propagates out of the wrapped node up through the graph executor — no in-flight planner LLM call, no in-flight tool execution, no further conditional-edge evaluations. The graph state at the moment of trip is preserved; when the host catches the error, it can persist that state via the active checkpointer for a human review or a clean resume.
- The loop detector trips on the third repeat of any signature cycle. The wrap pushes the signature into a 32-entry sliding window after each step and scans for a length-1 to length-8 cycle that’s repeated three or more times in a row at the tail of the window. `LoopDetectedError` carries the cycle length, the pattern itself, and the repeat count — the calling code dispatches on the type. A length-1 trip is the canonical LangGraph stuck-tool loop (same proposed tool with same arguments, three times in a row); a length-2 trip is the agent/tools ping-pong; a length-3 trip is the three-node retry shape a planner falls into under a conditional-edge regression.
- Your `on_trip` hook fires before the throw. Page Slack with the spend curve and the offending cycle pattern, write a row to a trip log keyed on the graph name plus the run id, snapshot the graph state through the active checkpointer — whatever you wire. Sync hooks run inline; async hooks are awaited. An `on_trip` exception propagates instead of the trip error, by design (the host explicitly opted in to side-effecting on trip).
- Reset is explicit. When a fresh graph invocation starts, call `guarded_agent.reset()` to clear both the spend ledger and the loop window. The tracker is per-guarded-fn, not per-process — you can wrap each LLM-calling node independently for per-node isolation, or share one `guard()` across multiple nodes for graph-wide cumulative caps. Wrap a subagent’s entry node the same way and the parent and child get independent breakers; share the `guard()` reference between them and a parent budget covers the whole subgraph.
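The check-before, accumulate-after ordering the list above describes can be compressed into a toy wrap. This is an illustration of the ordering only, not the shipped `guard()` API:

```python
class BudgetExceeded(RuntimeError):
    pass

def toy_guard(fn, max_usd, cost, on_trip=None):
    """Wrap a node callable: check the ledger BEFORE calling, accumulate AFTER.
    Once tripped, the next paid call never leaves the process."""
    ledger = {"usd": 0.0}

    def wrapped(state):
        if ledger["usd"] > max_usd:  # trip before the next paid call fires
            if on_trip:
                on_trip({"reason": "budget", "spent": ledger["usd"], "cap": max_usd})
            raise BudgetExceeded(f"spent ${ledger['usd']:.2f} of ${max_usd:.2f}")
        out = fn(state)
        ledger["usd"] += cost(state, out)  # only completed calls accumulate
        return out

    wrapped.reset = lambda: ledger.update(usd=0.0)  # explicit per-wrap reset
    return wrapped
```

Sharing one wrapped callable across several nodes shares the ledger (a graph-wide cap); wrapping each node separately isolates budgets, which is the per-node-or-shared choice described above.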
Tuning for LangGraph cost shapes
LangGraph’s default `recursion_limit` is 25 visits. On gpt-4o at a typical agent-pattern prompt size, a mid-run agent-node visit lands around $0.05–$0.20 of input tokens before assistant output, climbing as the `messages` reducer accumulates. The default `max_usd: 5` on the budget tracker corresponds to roughly 25–100 visits on the small end and 12–50 on the heavy end — an honest research run with one or two subagents finishes well inside the cap; a stuck retry loop trips the breaker before the bill triples. For long-running orchestrations behind a `SqliteSaver` checkpointer (a daily digest agent that resumes from the saved state every morning, a continuous monitor agent), set `window_ms: 60_000` with the same `max_usd: 5`: the cap rolls; old spend evicts; the cumulative invoice over an hour is unbounded but the per-minute spike is bounded. For high-stakes work where an over-spend is worse than an under-spend (production fan-out, paid lead enrichment, paid market data lookups), drop to `max_usd: 1` — a tighter cap costs you one re-run on legitimate workflows; a looser cap costs you one Friday-night incident. Stack the budget guard with the loop detector on the same wrap: a stuck conditional edge usually trips the loop guard first (the proposed tool name plus arguments hash to the same signature on each visit), but a slow-burn drift on slightly-different-each-time tool inputs trips the budget instead — both stop the run, both leave a typed error, both are cheap to retry. Keep `recursion_limit` at its default or higher: it’s a backstop, not the primary defence.
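Illustrative arithmetic behind the visit ranges in that paragraph. The first line counts input cost only at the quoted per-visit range; the second assumes assistant output roughly doubles the per-visit cost, which is where the lower halves of the quoted ranges come from:

```python
def visits_under_cap(max_usd, usd_per_visit):
    """How many agent-node visits a USD cap buys at a flat per-visit cost."""
    return round(max_usd / usd_per_visit)

# input tokens only, at the quoted $0.05-$0.20 per visit:
print(visits_under_cap(5, 0.05), visits_under_cap(5, 0.20))  # 100 25
# assuming assistant output roughly doubles the per-visit cost:
print(visits_under_cap(5, 0.10), visits_under_cap(5, 0.40))  # 50 12
```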
The agent/tools ping-pong, on the same wrap
- Signature is the run fingerprint. The default `agent:<tool_name>:<first_64_chars_of_args>` covers the canonical LangGraph loop — the `agent` node proposes the same tool with the same arguments after the `tools` node returns a degenerate result, three times in a row, and the run halts before the fourth visit. For per-graph loop detection across subagents, key the signature on the subagent name plus the proposed tool plus the canonicalised arguments. The detector pushes the signature into a 32-entry sliding window and looks for any cycle of length 1–8 repeating 3+ times. Length 1 catches the stuck-tool loop. Length 2 catches the agent/tools ping-pong: `agent` proposes, `tools` returns the same error, `agent` re-proposes, `tools` re-errors. Higher lengths cover the multi-node retry shapes a graph falls into when a conditional edge keeps sending the run through the same three or four nodes.
- Trip event tells you which fired. `reason: "loop"` for a cycle hit; `reason: "budget"` for a cost cap; `reason: "context"` if you also pass a context-window guard for messages-state bloat. The typed error is one of `LoopDetectedError`, `BudgetExceededError`, `ContextLimitError` — the calling code dispatches on the type, not on string parsing.
- Per-node or shared. One `guard()` per LLM-calling node gives you per-node isolation — the planner agent has one budget, a downstream summariser node has another. One shared `guard()` across the whole graph (wrap each LLM node with the same guarded callable) gives you cross-node loop detection — useful when an `add_conditional_edges` regression keeps routing the same graph state through the same three nodes and you want the third repeat to fire even though no single node is repeating.
- Plays nicely with checkpointers. The wrap is in-process and stateless across runs — it doesn’t persist its loop window or spend ledger to the checkpointer. That’s deliberate: a graph that resumes from a saved state is a fresh run from the breaker’s point of view, which is what you want for the legitimate-resume case. For the breaker-state-must-survive-restart case (the cross-process mirror of the dogfood story below), persist the trip event yourself in your `on_trip` hook and refuse to start the next run if the trip event is still open.
- Zero outbound calls. The whole check is pure data flow inside your Python process. No telemetry, no daemon, no SaaS, nothing leaves your VPC. The wrap is the only thing in your process that knows the agent is loop-stuck or over-budget.
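The default fingerprint from the first bullet, written out as a standalone helper. Sorting the argument keys is what canonicalisation buys: the same tool call maps to the same signature regardless of dict ordering. (A sketch; the shipped wrap accepts whatever string your `signature` lambda returns.)

```python
import json

def step_signature(node: str, tool_call: dict) -> str:
    """Node name + proposed tool + first 64 chars of key-sorted JSON args."""
    name = tool_call.get("name", "final")
    args = json.dumps(tool_call.get("args", {}), sort_keys=True)
    return f"{node}:{name}:{args[:64]}"

a = step_signature("agent", {"name": "search_web", "args": {"q": "Q3 filings", "limit": 5}})
b = step_signature("agent", {"name": "search_web", "args": {"limit": 5, "q": "Q3 filings"}})
print(a == b)  # True: argument ordering can no longer defeat the loop detector
```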
The first loop our SDK caught was ours
It wasn’t a LangGraph run — it was our own launch script firing a six-tweet thread against a paid X API. The first attempt came back with HTTP 402 `CreditsDepleted`. Six consecutive sessions later, six identical signatures — `post_tweet:402:CreditsDepleted` — were sitting in a flat JSON file on disk. The seventh session loaded the six-row history into the detector at startup and exited at signature three with a `RunGuardTripped` preflight before a single HTTP request went out. It has held the breaker open every session since. Read the dogfood story on the 30-day log; the same pattern slots into a LangGraph run when the agent node proposes the same stuck tool against the same arguments three visits in a row.
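The cross-session preflight in that story reduces to a few lines. A sketch, assuming the persisted history is simply a list of signature strings loaded from the flat JSON file (the helper name is hypothetical):

```python
def preflight_ok(history, repeats=3):
    """Refuse to start a session when the last `repeats` persisted signatures
    are identical: the length-1 stuck-tool case from the launch-script story."""
    tail = history[-repeats:]
    return len(tail) < repeats or len(set(tail)) > 1

history = ["post_tweet:402:CreditsDepleted"] * 6  # six failed sessions on disk
print(preflight_ok(history))  # False: hold the breaker open, send nothing
```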
What this is not
- Not a LangGraph node middleware. RunGuard does not subclass `NodeMiddleware`, ship a `StateGraph` mixin, or hook into `add_conditional_edges`. It wraps the underlying callable. That is the design — the same wrap composes with the LangChain `AgentExecutor`, with the raw OpenAI SDK, with browser-use, with whatever framework lands next quarter. The SDK at `product/sdk/src/budget.ts` is 84 lines; the loop detector at `product/sdk/src/loop-detector.ts` is 111 lines; both are in-process primitives.
- Not a replacement for `recursion_limit`. Keep `recursion_limit` at its default or higher — it’s a graph-runtime backstop that catches the case where the wrap was never installed, where a node was added later without the guard, where a conditional-edge regression isn’t signature-detectable. The two are complementary: `recursion_limit` bounds the worst case at the graph executor level; `guard()` bounds the realistic case at the node-call level, and trips earlier with a typed error and a Slack page.
- Not LangSmith tracing. LangSmith answers “what did the graph do yesterday and how much did it cost?”. A runtime infinite-loop guard answers “should the next paid node fire?”. The two are complementary — one for finance, one for prevention. Run both. The trace is your morning-after audit; the breaker is your tonight-before-bed insurance.
- Not a server. No outbound network, no telemetry, no cookies, no daemon, no SaaS. The loop check is pure data flow inside your Python process. The same in-process discipline shows up in the embed-preview widget; the policy is one repo away in llms.txt.
The minimum LangGraph integration
One `pip install runguard`, one `guard()` wrap around a thin `_agent_step` that calls the LLM and returns `{messages, usd, sig}`, and one `on_trip` that pages the channel you actually read. Eight lines of wrap, no `NodeMiddleware` subclass to register, no `StateGraph` override, no agent decorator. The breaker trips on the dollar cap or the third repeat of any agent-step signature, halts the graph, and leaves a structured event and a typed error behind for the post-mortem — long before `recursion_limit` would have fired and long before the bill arrives. RunGuard ships it as `runguard` on PyPI and `@runguard/sdk` on npm — same primitive, both runtimes, in-process, zero deps.