A runtime infinite-loop guard for LangGraph

LangGraph ships recursion_limit on the compiled graph — default 25 — and raises GraphRecursionError when a run crosses it. That is a step counter, not a budget alert. By the time the limit trips on a gpt-4o agent with a 30K-token state, you’ve already paid for 25 planner LLM calls; if you bumped the limit to 100 because a legitimate research run needed it, a stuck conditional edge can chew through twenty dollars before it stops. recursion_limit bounds the loop length; it does not bound the bill, and it does not page you when the breaker trips. This page covers the runtime infinite-loop breaker we ship and shows how it slots into a LangGraph node in eight lines of Python.

Where the dollars actually accumulate inside a LangGraph run

What LangGraph’s existing knobs give you and what they don’t

LangGraph’s primitives are correct in shape and wrong in unit. recursion_limit on graph.invoke(input, config={"recursion_limit": 25}) is a step count, not a dollar cap. A step on a 30K-token state costs ten times what a step on a 3K-token state costs, and the cap doesn’t know the difference. GraphRecursionError is raised mid-step, after the offending step’s LLM call has already been billed; the error tells you the limit hit, not the cumulative spend, and there’s no built-in on_recursion_limit hook to page Slack. The tools_condition helper is a routing rule, not a budget rule; it can branch on whether the assistant emitted tool calls, but it can’t branch on whether the run has spent too much. Custom checkpointers persist state across resumes — useful for durability, useless for cost prevention; a stuck loop saved at step 24 resumes at step 1 the next morning. The new Command(goto=…) primitive lets a node deflect, but only if you wire the deflection by hand. None of these look at cumulative dollars spent so far in this run and none of them stop the next node before it fires. A run that legitimately needs forty visits to refine a research summary and a run that’s been firing the same broken tool call against the same arguments for forty visits both look identical to the executor — they just produce different invoices.
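
For contrast, here is the entire surface the stock knob gives you, assuming graph is any compiled StateGraph; GraphRecursionError lives in langgraph.errors, and it reports only that the step limit was hit, never what the run spent.

from langgraph.errors import GraphRecursionError

try:
    # recursion_limit caps visit count, not dollars: 25 cheap steps and 25 steps
    # on a 30K-token state pass the same check, and every one of those LLM calls
    # has already been billed by the time the error surfaces.
    graph.invoke(
        {"messages": [("user", "Brief me on Q3 SEC filings for $TICK")]},
        config={"recursion_limit": 25},
    )
except GraphRecursionError:
    # No spend figure attached, no hook to page anyone.
    pass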

What a runtime infinite-loop guard actually has to do

Wrapping a LangGraph node with runguard

# langgraph + runguard. The graph stays a graph; we wrap the agent node so
# the loop detector and budget tracker see every paid call before the next.
from langgraph.graph import StateGraph, MessagesState
from langgraph.prebuilt import ToolNode, tools_condition
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from runguard import guard, BudgetExceededError, LoopDetectedError

# Placeholder tools so the snippet runs end-to-end; swap in your real implementations.
@tool
def search_web(query: str) -> str:
    """Search the web for a query."""
    return f"[stub] results for {query}"

@tool
def summarise(text: str) -> str:
    """Summarise a block of text."""
    return f"[stub] summary of {text[:40]}"

# MessagesState plus the two per-step fields the guard reads from the node's output.
class AgentState(MessagesState):
    usd: float
    sig: str

llm = ChatOpenAI(model="gpt-4o").bind_tools([search_web, summarise])

def _agent_step(state: AgentState):
    msg = llm.invoke(state["messages"])
    usage = msg.response_metadata.get("token_usage", {})
    # gpt-4o list pricing: $2.50 per 1M input tokens, $10 per 1M output tokens.
    usd = (usage.get("prompt_tokens", 0) * 2.5e-6
         + usage.get("completion_tokens", 0) * 10e-6)
    # Fingerprint the proposed next action: tool name plus the first 64 chars of its args.
    proposed = (msg.tool_calls or [{}])[0]
    sig = f"agent:{proposed.get('name', 'final')}:{str(proposed.get('args', ''))[:64]}"
    return {"messages": [msg], "usd": usd, "sig": sig}

guarded_agent = guard(
    _agent_step,
    signature=lambda _state, out: out["sig"],
    budget={"max_usd": 5, "window_ms": 60_000},
    loop={"repeats": 3, "max_cycle_len": 8},
    cost=lambda _state, out: out["usd"],
    on_trip=lambda e: print("[runguard]", e["reason"], e.get("spent"), "of", e.get("cap")),
)

g = StateGraph(AgentState)
g.add_node("agent", guarded_agent)
g.add_node("tools", ToolNode([search_web, summarise]))
g.set_entry_point("agent")
g.add_conditional_edges("agent", tools_condition)
g.add_edge("tools", "agent")
graph = g.compile()

try:
    out = graph.invoke({"messages": [("user", "Brief me on Q3 SEC filings for $TICK")]})
except (BudgetExceededError, LoopDetectedError) as e:
    print("halted:", e)

The loop primitive is the LoopDetector shipped at product/sdk/src/loop-detector.ts: defaults windowSize: 32, minCycleLen: 1, maxCycleLen: 8, repeats: 3 — a push(signature) the wrap calls per step, a scan() that returns a typed match, a reset() for fresh runs, and constructor-time validation that rejects repeats < 2 and windowSize < maxCycleLen * repeats. The budget primitive is the BudgetTracker at product/sdk/src/budget.ts: maxUsd for the cap, optional windowMs for rolling-window throttles, an add(usd) the host calls post-call (which silently no-ops on zero, if (usd === 0) return), and an exceeded() the wrap reads pre-call. The BudgetTracker file is 84 lines; the LoopDetector is 111 lines — both are pure in-process primitives, no daemon, no telemetry. The fingerprint-and-window approach is documented at how to detect LLM tool-call loops in production; the LangChain AgentExecutor wrap is here; the multi-agent CrewAI wrap is here; the browser-use wrap is here; the OpenAI AgentKit wrap is here.
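
For readers who want the shape of that scan without opening the TS file, here is an illustrative Python sketch of the fingerprint-and-window check with the documented defaults; the class name, return shape, and variable names are stand-ins for explanation, not the shipped source.

from collections import deque

class LoopDetectorSketch:
    """Windowed cycle check over step signatures (illustrative, not the shipped code)."""

    def __init__(self, window_size=32, min_cycle_len=1, max_cycle_len=8, repeats=3):
        # Constructor-time validation mirroring the documented rules.
        if repeats < 2:
            raise ValueError("repeats must be >= 2")
        if window_size < max_cycle_len * repeats:
            raise ValueError("window_size must be >= max_cycle_len * repeats")
        self.min_cycle_len = min_cycle_len
        self.max_cycle_len = max_cycle_len
        self.repeats = repeats
        self.window = deque(maxlen=window_size)  # old signatures evict automatically

    def push(self, signature: str) -> None:
        self.window.append(signature)

    def scan(self):
        # Look for a cycle of length 1..max_cycle_len repeated `repeats` times
        # at the tail of the window; return a match dict (stand-in for the typed match).
        sigs = list(self.window)
        for cycle_len in range(self.min_cycle_len, self.max_cycle_len + 1):
            span = cycle_len * self.repeats
            if len(sigs) < span:
                break
            tail = sigs[-span:]
            cycle = tail[:cycle_len]
            if all(tail[i] == cycle[i % cycle_len] for i in range(span)):
                return {"cycle": cycle, "repeats": self.repeats}
        return None

    def reset(self) -> None:
        self.window.clear()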

How the breaker behaves inside graph.invoke()

Tuning for LangGraph cost shapes

LangGraph’s default recursion_limit is 25 visits. On gpt-4o at a typical agent-pattern prompt size, a mid-run agent-node visit lands around $0.05–$0.20 of input tokens before assistant output, climbing as the messages reducer accumulates. The default max_usd: 5 on the budget tracker therefore covers somewhere between roughly a dozen visits at the heavy end of that range and a hundred at the light end. An honest research run with one or two subagents finishes well inside the cap; a stuck retry loop trips the breaker before the bill triples. For long-running orchestrations behind a SqliteSaver checkpointer (a daily digest agent that resumes from the saved state every morning, a continuous monitor agent), set window_ms: 60_000 with the same max_usd: 5: the cap rolls; old spend evicts; the cumulative invoice over an hour is unbounded but the per-minute spike is bounded. For high-stakes work where an over-spend is worse than an under-spend (production fan-out, paid lead enrichment, paid market data lookups), drop to max_usd: 1 — a tighter cap costs you one re-run on legitimate workflows; a looser cap costs you one Friday-night incident. Stack the budget guard with the loop detector on the same wrap: a stuck conditional edge usually trips the loop guard first (the proposed tool name plus arguments hash to the same signature on each visit), but a slow-burn drift on slightly-different-each-time tool inputs trips the budget instead — both stop the run, both leave a typed error, both are cheap to retry. Keep recursion_limit set to its default or higher: it’s a backstop, not the primary defence.
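
In guard() terms, the two postures above are only a different budget dict on the same wrap. A sketch, reusing the _agent_step and lambdas from the example earlier on this page:

# Long-running, checkpointer-backed orchestration: rolling one-minute window.
# Old spend evicts, so the hourly total is unbounded but any single minute caps at $5.
guarded_monitor_agent = guard(
    _agent_step,
    signature=lambda _state, out: out["sig"],
    cost=lambda _state, out: out["usd"],
    budget={"max_usd": 5, "window_ms": 60_000},
    loop={"repeats": 3, "max_cycle_len": 8},
)

# High-stakes fan-out (paid enrichment, paid market data): hard $1 ceiling,
# trading an occasional re-run of a legitimate workflow for a bounded worst case.
guarded_fanout_agent = guard(
    _agent_step,
    signature=lambda _state, out: out["sig"],
    cost=lambda _state, out: out["usd"],
    budget={"max_usd": 1},
    loop={"repeats": 3, "max_cycle_len": 8},
)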

The agent/tools ping-pong, on the same wrap

The first loop our SDK caught was ours

It wasn’t a LangGraph run — it was our own launch script firing a six-tweet thread against a paid X API. The first attempt came back with HTTP 402 CreditsDepleted. Six consecutive sessions later, six identical signatures — post_tweet:402:CreditsDepleted — were sitting in a flat JSON file on disk. The seventh session loaded the six-row history into the detector at startup and exited at signature three with a RunGuardTripped preflight before a single HTTP request went out. It has held the breaker open every session since. Read the dogfood story on the 30-day log; the same pattern slots into a LangGraph run when the agent node proposes the same stuck tool against the same arguments three visits in a row.

What this is not

The minimum LangGraph integration

One pip install runguard, one guard() wrap around a thin _agent_step that calls the LLM and returns {messages, usd, sig}, and one on_trip that pages the channel you actually read. Eight lines of wrap, no NodeMiddleware subclass to register, no StateGraph override, no agent decorator. The breaker trips on the dollar cap or the third repeat of any agent-step signature, halts the graph, and leaves a structured event and a typed error behind for the post-mortem — long before recursion_limit would have fired and long before the bill arrives. RunGuard ships it as runguard on PyPI and @runguard/sdk on npm — same primitive, both runtimes, in-process, zero deps.
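
If the channel you actually read is Slack, the on_trip hook is where it plugs in. A minimal sketch, assuming an incoming-webhook URL in a SLACK_WEBHOOK_URL environment variable; the event fields are the same reason, spent, and cap the earlier wrap printed.

import json
import os
import urllib.request

def page_slack(event: dict) -> None:
    # event carries the trip reason plus spent/cap when the budget guard fired.
    text = (f"runguard tripped: {event['reason']} "
            f"(spent {event.get('spent')} of {event.get('cap')})")
    req = urllib.request.Request(
        os.environ["SLACK_WEBHOOK_URL"],  # your own incoming-webhook secret
        data=json.dumps({"text": text}).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req, timeout=5)

# Then pass it where the print was: guard(_agent_step, ..., on_trip=page_slack)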