A circuit breaker for LangChain agents
LangChain’s AgentExecutor ships with max_iterations and max_execution_time. They are the right primitives for a 2023 agent that ran in seconds and called free models. In 2026 they catch the loop after the bill, not before it. This page covers the runtime breaker we ship and how it slots into a LangChain Tool in eight lines.
Where loops actually happen inside a LangChain agent
- Tool returns a 4XX, the model retries. The most common shape: a Tool hits a paid upstream that is rate-limited, out of credits, or refusing a malformed payload. The error string lands in the agent’s scratchpad. The next iteration sees the failure, “reasons” about it, and emits the same call. Same name, same arguments, same upstream, same 4XX. Three or four rounds in, you have a fixed point.
- An OutputParserException the agent thinks it can fix. The model emits an action whose JSON the parser rejects. LangChain re-prompts. The model emits the same broken JSON. Each re-prompt is a full token-cost call.
- A retrieval tool that returns the same passages. The agent re-asks the same vector store for the same query, gets the same top-k, and tries to pull a different conclusion out of identical evidence. The signatures match; the bill climbs.
- An AgentExecutor stuck in a two-tool ping-pong. search → calculator → search → calculator with the same intermediate values. Per-step succeeds, the run loops as a whole (the sketch after this list writes two of these shapes out as traces). max_iterations is the only thing that stops it — eventually, expensively.
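Written as the call fingerprints the next section introduces, the first and last shapes reduce to traces like these; the parser and retrieval loops collapse to the same length-1 form, and every value below is an invented example:

// Invented traces: each entry is one (tool, args, outcome) fingerprint.
// Length-1 fixed point: the same failing call, verbatim, every iteration.
const fixedPoint = [
  'http_get:api.example.com/charge:402:CreditsDepleted',
  'http_get:api.example.com/charge:402:CreditsDepleted',
  'http_get:api.example.com/charge:402:CreditsDepleted',
];

// Length-2 ping-pong: every step succeeds, the run as a whole repeats.
const pingPong = [
  'search:"q3 revenue":200:ok',
  'calculator:"1200000 / 4":200:ok',
  'search:"q3 revenue":200:ok',
  'calculator:"1200000 / 4":200:ok',
];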
Why max_iterations and max_execution_time miss it
The two knobs LangChain gives you are correct in shape and wrong in granularity. max_iterations is a count of how many (action, observation) rounds the executor is allowed to take; the default is 15, the agent runs to that ceiling, and only then does the executor force-stop the run. By the time round 15 fires, you have made fifteen LLM calls plus fifteen tool calls. If each round costs $0.10 of model spend plus a $1 paid-API call, that’s $16.50 per loop event — before you knew there was a loop. max_execution_time is wall-clock; on agents that legitimately take minutes, it never trips early enough. Neither knob looks at the content of the calls. A run that legitimately needs 12 distinct steps and a run that fires the same broken call 12 times look identical to the executor.
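The ceiling arithmetic as a two-liner, using the example prices above; the function name is illustrative:

// Worst-case spend before a count ceiling fires: each round is one LLM call
// plus one paid tool call. Prices are the example numbers from the paragraph above.
const spendAtCeiling = (rounds: number, llmUsd = 0.10, toolUsd = 1.00) =>
  rounds * (llmUsd + toolUsd);

spendAtCeiling(15); // 16.50: the max_iterations default, the loop caught after the bill
spendAtCeiling(3);  // 3.30: a content-aware breaker tripping on the third repeat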
What a circuit breaker actually has to do
- Fingerprint each call before the model spends another token on it. Tool name plus canonicalized arguments plus — for failures — the response status and error title. http_get:tinyurl.com/x:429:RateLimited is one signature; vector_search:"refund policy":200:ok is another.
- Watch for the same fingerprint repeating in a sliding window. The runs you care about loop fast. A window of 32 entries spans more iterations than most well-formed agents need; cycles of length 1 (same call), 2 (ping-pong), and up to 8 cover the realistic bad shapes. A minimal detector along these lines is sketched after this list.
- Trip before the next call goes out, not after. The check is in-process, on a Map-backed buffer. It runs in well under a millisecond per call. When the third repeat lands, the next tool.invoke() raises a typed error and the executor halts.
- Be a primitive, not a framework opinion. Loops happen on retrieval tools, on HTTP tools, on shell tools, on tool-use you wrote yesterday. A breaker that wraps any async function is the right shape; a breaker that ships as a LangChain-only callback handler is not.
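A minimal sketch of the fingerprint-and-window idea those four requirements describe. A plain array stands in for the SDK’s Map-backed buffer; the class and method names are illustrative, not @runguard/sdk internals:

// Sliding window of recent signatures. A trip fires when the last
// (cycleLen * repeats) entries are one cycle repeated `repeats` times,
// for every cycle length from 1 up to maxCycleLen.
type Detection = { detected: boolean; cycleLen?: number };

class LoopDetector {
  private window: string[] = [];
  constructor(
    private windowSize = 32,
    private repeats = 3,
    private maxCycleLen = 8,
  ) {}

  push(signature: string): Detection {
    this.window.push(signature);
    if (this.window.length > this.windowSize) this.window.shift();
    for (let len = 1; len <= this.maxCycleLen; len++) {
      const span = len * this.repeats;
      if (this.window.length < span) break; // longer cycles need more history
      const tail = this.window.slice(-span);
      const cycle = tail.slice(0, len);
      if (tail.every((sig, i) => sig === cycle[i % len])) {
        return { detected: true, cycleLen: len };
      }
    }
    return { detected: false };
  }

  reset(): void {
    this.window = [];
  }
}

// Third identical push reports the length-1 loop.
const detector = new LoopDetector();
detector.push('http_get:tinyurl.com/x:429:RateLimited'); // { detected: false }
detector.push('http_get:tinyurl.com/x:429:RateLimited'); // { detected: false }
detector.push('http_get:tinyurl.com/x:429:RateLimited'); // { detected: true, cycleLen: 1 }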
Wrapping a LangChain Tool with @runguard/sdk
// langchain.js + @runguard/sdk. The Tool stays a Tool; only its func gets
// wrapped. AgentExecutor sees the same interface, the breaker sees every call.
import { DynamicTool } from 'langchain/tools';
import { guard, LoopDetectedError, BudgetExceededError } from '@runguard/sdk';

const guardedFetch = guard(
  async ({ url }: { url: string }) => {
    const r = await fetch(url);
    return { status: r.status, body: await r.text() };
  },
  {
    signature: ({ url }) => `http_get:${url}`,
    loop: { repeats: 3, maxCycleLen: 8 },
    budget: { maxUsd: 5 },
    cost: (_in, out) => (out.status >= 400 ? 0 : 0.001),
    onTrip: async (e) => { console.error('[runguard]', e.reason, e.signature); },
  },
);

const tool = new DynamicTool({
  name: 'http_get',
  description: 'Fetch a URL and return status + body. Trips on third identical call.',
  func: async (input: string) => {
    try {
      return JSON.stringify(await guardedFetch({ url: input }));
    } catch (e) {
      if (e instanceof LoopDetectedError) throw e; // stop the executor
      if (e instanceof BudgetExceededError) throw e;
      return `error: ${(e as Error).message}`;
    }
  },
});
Defaults match every other surface in the SDK: windowSize: 32, minCycleLen: 1, maxCycleLen: 8, repeats: 3. The wrapped function is plain, non-LangChain async — so the same wrap composes with raw fetch, with OpenAI.chat.completions.create, with whatever you reach for next sprint. The fingerprint-and-window approach is documented at how to detect LLM tool-call loops in production.
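Because the wrapped function is plain async, the same wrap composes with the OpenAI SDK unchanged. A sketch assuming openai-node v4; the signature and cost functions are illustrative choices, not SDK defaults:

import OpenAI from 'openai';
import { guard } from '@runguard/sdk';

const openai = new OpenAI();

const guardedChat = guard(
  async (params: OpenAI.Chat.Completions.ChatCompletionCreateParamsNonStreaming) =>
    openai.chat.completions.create(params),
  {
    // Fingerprint on model + latest message: an agent re-sending the same
    // prompt loop-matches even though the provider call itself succeeds.
    signature: (params) =>
      `chat:${params.model}:${JSON.stringify(params.messages.at(-1))}`,
    loop: { repeats: 3, maxCycleLen: 8 },
    budget: { maxUsd: 5 },
    // Illustrative flat rate per 1K total tokens; substitute your model's pricing.
    cost: (_in, out) => ((out.usage?.total_tokens ?? 0) / 1000) * 0.01,
  },
);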
How the breaker behaves inside an AgentExecutor
- First two identical calls return normally. The detector pushes the signature into the window and returns { detected: false }. The agent sees its observation and continues. This is critical — legitimate retries against a transient 429 with backoff are common, and a breaker that trips on attempt two is more annoying than the loop it was supposed to catch.
- The third identical call throws before the wrapped fetch runs. LoopDetectedError is constructed with the cycle length, the repeats count, and the matching pattern. The DynamicTool shim above deliberately re-throws it instead of flattening it into an error string — it propagates up through tool.invoke(), into the executor, out to the caller.
- Your onTrip hook fires before the throw. Page Slack, write a row to a trip log, kill the AWS task — whatever you wire. It is awaited; an onTrip error propagates instead of the trip error, by design (the host explicitly opted in to side-effecting on trip).
- Reset is explicit. When the executor is restarted for a fresh run, call guardedFetch.reset() to clear the window. The loop counter is per-guarded-fn, not per-process — you can keep one breaker per tool and reset them independently. A call-site sketch follows this list.
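At the call site that owns the run, the typed errors and the explicit reset look roughly like this; a sketch that assumes the tool and guardedFetch from the block above, with the agent constructed elsewhere:

import { AgentExecutor } from 'langchain/agents';
import { LoopDetectedError, BudgetExceededError } from '@runguard/sdk';

declare const agent: any; // assumed: your agent, built however you build it

const executor = new AgentExecutor({ agent, tools: [tool] });

try {
  const result = await executor.invoke({ input: 'summarize the refund policy' });
  console.log(result.output);
} catch (e) {
  if (e instanceof LoopDetectedError) {
    console.error('loop tripped:', e); // cycle length, repeats, pattern ride along
  } else if (e instanceof BudgetExceededError) {
    console.error('budget tripped:', e);
  } else {
    throw e; // not a breaker trip; let it surface
  }
} finally {
  guardedFetch.reset(); // fresh window for the next run
}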
Tuning for LangChain’s loop shapes
LangChain’s AgentExecutor defaults to max_iterations: 15. A breaker tuned to repeats: 3, maxCycleLen: 8 can catch a length-1 loop on iteration 3 and a length-2 ping-pong on iteration 6 — both well inside the executor’s ceiling. If your tools genuinely retry idempotent reads (eventually-consistent stores, slow upstreams that recover), pass retryable: true on the call site so the detector skips that signature, or split your tool into a per-attempt one that the detector watches and an outer-retry one that it does not. For high-cost runs — a research agent paying $0.50 per LLM step — consider repeats: 2 on tools whose loop signatures are unique enough that a false-positive trip is cheap. The cost of a missed loop is the bill; the cost of a false-positive trip is one re-run.
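A sketch of that high-cost tuning; runResearchStep is a hypothetical $0.50-per-step tool, and the option names come from this section:

// Hypothetical expensive step: one paid LLM call per invocation.
async function runResearchStep({ query }: { query: string }) {
  // ...$0.50 of model spend per call...
  return { query, findings: [] as string[] };
}

// Signatures here are unique enough that a false-positive trip costs one
// re-run, so trip on the second repeat instead of the third.
const guardedResearch = guard(runResearchStep, {
  signature: ({ query }) => `research:${query.trim().toLowerCase()}`,
  loop: { repeats: 2, maxCycleLen: 8 },
});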
Budget and context guards, on the same wrap
- Budget cap. Pass budget: { maxUsd: 5 } with a cost function. The tracker accumulates after each successful call. The next call after the cap throws BudgetExceededError. This is the answer to “the model shouldn’t ever spend more than $5 on this run” — an LLM doesn’t volunteer to stop, the wrapper does.
- Rolling-window throttle. Add windowMs: 60_000 for “$5 per minute” instead of cumulative-only. Old spend rolls off the front.
- Context-window guard. Pass context: { maxContextTokens: 200_000, headroom: 4_000 } with a tokens function that projects total tokens (system + history + new input + reserved-for-output) using your provider’s tokenizer. The breaker trips 4,000 tokens before Anthropic’s 200k ceiling, so the host can summarize, checkpoint, or fork a fresh thread — before the next call would 400.
- One guard(), three reasons. The same wrap watches loop, budget, and context simultaneously. The TripEvent.reason field on onTrip tells you which fired; the typed error tells the calling code which to handle. All three appear together in the sketch after this list.
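All three guards on one wrap. callModel is a hypothetical provider call, the chars/4 token projection is a placeholder for your provider’s real tokenizer, and nesting windowMs under budget is an assumption based on the rolling-window bullet above:

// Hypothetical provider call; `usd` on the result feeds the cost function.
async function callModel(input: { system: string; history: string; prompt: string }) {
  // ...real provider call goes here...
  return { text: '', usd: 0.1 };
}

const guardedStep = guard(callModel, {
  signature: ({ prompt }) => `model:${prompt}`,
  loop: { repeats: 3, maxCycleLen: 8 },                    // loop guard
  budget: { maxUsd: 5, windowMs: 60_000 },                 // $5 per rolling minute
  cost: (_in, out) => out.usd,
  context: { maxContextTokens: 200_000, headroom: 4_000 }, // trip 4k before the ceiling
  // Placeholder projection (~4 chars per token) plus reserved output tokens.
  tokens: ({ system, history, prompt }) =>
    Math.ceil((system.length + history.length + prompt.length) / 4) + 4_000,
});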
The first loop our SDK caught was ours
It wasn’t a LangChain agent — it was our own launch script firing a six-tweet thread against a shared paid API. The first attempt came back with HTTP 402 CreditsDepleted. Six consecutive sessions later, six identical signatures — post_tweet:402:CreditsDepleted — were sitting in a flat JSON file on disk. The seventh session loaded the six-row history, pushed it into the detector at startup, and exited at signature three with RunGuardTripped before a single HTTP request went out. It has held the breaker open every session since. Read the dogfood story on the 30-day log; the same pattern slots into LangChain’s AgentExecutor when the loop is across iterations instead of across sessions.
What this is not
- Not a LangChain plugin. RunGuard does not subclass BaseTool, register a BaseCallbackHandler, or hook into the AgentExecutor internals. It wraps the underlying async function. That is the design — the same wrap composes with raw fetch, with the OpenAI SDK, with browser-use, with whatever framework lands next quarter.
- Not a replacement for LangSmith. A trace viewer answers “what happened on the run that already finished?” A runtime breaker answers “should the next call go out?” The two are complementary — one for forensics, one for prevention. Run both.
- Not a server. No outbound network, no telemetry, no cookies. The loop check is pure data flow inside your process. The same in-process discipline shows up in the embed-preview widget; the policy is one repo away in llms.txt.
The minimum LangChain integration
One npm i @runguard/sdk, one guard() call per tool whose loop you want to catch, one onTrip that pages the channel you actually read. Eight lines of wrap per tool, no callback handler to register, no executor fork. The breaker trips on the third repeat of any signature, halts the executor, and leaves a structured error and a trip event behind for the post-mortem you would have written on Sunday anyway. RunGuard ships it as @runguard/sdk on npm and runguard on PyPI — same primitive, both runtimes, in-process, zero deps.