How to detect LLM tool-call loops in production

Most production LLM agents loop the same way: a single tool call comes back with a transient error, the model retries it, gets the same error, retries again, and keeps going until something — a budget, a wall-clock, a tired engineer on Slack — stops the run. This page is the working approach we ship with, plus the TypeScript that runs the check.

What a loop actually looks like on the wire
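Concretely, the loop shows up as a periodic tail in the step-by-step signature stream. The traces below are hypothetical, written in the `name:status:errorTitle` signature format the detector code uses:

```typescript
// A healthy run: signatures vary as the agent makes progress.
const healthy = ['search:200:ok', 'fetch_page:200:ok', 'post_tweet:200:ok'];

// A looping run: a fast upstream failure produces the same signature every
// step, each one a "fresh" call from the model's point of view.
const looping = [
  'search:200:ok',
  'post_tweet:429:RateLimited',
  'post_tweet:429:RateLimited',
  'post_tweet:429:RateLimited', // cycle length 1, repeated 3 times
];
```

Nothing in any single entry of `looping` is anomalous on its own; only the tail, read as a sequence, is.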

Why retry counts and per-step timeouts miss it

The naive guards — a per-call retry counter, a per-step timeout, an outer wall-clock — are the right primitives for the wrong granularity. A single tool call retrying twice is fine; a model invoking that same tool eight times across eight separate “steps” (each at attempt 1) is the same loop, and the per-call counter never ticks. Per-step timeouts only fire if the call hangs; a fast 429 from the same upstream finishes in 80 ms and resets the timer cleanly each time. Outer wall-clocks fire eventually, but on agent runs that take minutes by design, the bill is already in four-figure territory by the time they trip.
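The mismatch is easy to reproduce. Here is a sketch of the naive per-call guard (the names are illustrative, not from any SDK): each step builds a fresh retry budget, so a loop spread across steps never exceeds it.

```typescript
// Naive guard: retries live inside one call, so the counter resets per step.
function withRetries<T>(fn: () => T, maxAttempts = 2): T {
  let lastErr: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return fn();
    } catch (e) {
      lastErr = e; // transient error: try again within this call's budget
    }
  }
  throw lastErr;
}

// A model looping across eight separate steps invokes withRetries eight
// times; every step starts back at attempt 1, so the guard never fires
// for the run as a whole.
```

Eight steps at two attempts each is sixteen upstream calls, and no single step ever looks abnormal. That is the loop the per-call counter cannot see.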

The fingerprint-and-window approach
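In short: every guarded step is collapsed to a cheap, comparable fingerprint, and a sliding window of recent fingerprints is scanned for a repeating tail. A sketch of both halves (the helper name here is illustrative; the string format matches the signature built in the detector code):

```typescript
// Fingerprint: tool name, status, and error class collapse one step into a
// string that is cheap to store and compare.
const fingerprint = (name: string, status: number | string, errorTitle?: string) =>
  `${name}:${status}:${errorTitle ?? 'ok'}`;

// Window: with the documented defaults, the longest detectable pattern is
// maxCycleLen 8 x repeats 3 = 24 signatures, which fits inside windowSize 32.
const longestDetectablePattern = 8 * 3; // 24 <= 32
```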

The detector, in TypeScript

// product/sdk/src/loop-detector.ts — the core primitive in @runguard/sdk.
// Maintain a sliding window; after each new signature, scan the tail for a
// cycle of length L (minCycleLen..maxCycleLen) repeated `repeats` times.
import { LoopDetector } from '@runguard/sdk';

const detector = new LoopDetector({ repeats: 3, maxCycleLen: 8 });

// `ToolCall` and `tool` stand in for your agent runtime's own call type
// and tool executor; only the detector comes from @runguard/sdk.
async function runStep(call: ToolCall) {
  const result = await tool.invoke(call);
  // One signature per guarded step: tool name, status, and error class.
  const sig = `${call.name}:${result.status}:${result.errorTitle ?? 'ok'}`;
  const match = detector.push(sig);
  if (match.detected) {
    throw new RunGuardTripped({
      reason: 'loop',
      pattern: match.pattern,
      cycleLength: match.cycleLength,
      repeats: match.repeats,
    });
  }
  return result;
}

Defaults are windowSize 32, minCycleLen 1, maxCycleLen 8, and repeats 3. The check is in-process and resolves in well under a millisecond per guarded call: no network hop, no remote service. The only operation that takes ~40 ms is the trip itself.

Tuning the thresholds for your agent
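A reasonable starting point is to spell the documented defaults out explicitly. Note one assumption: the snippet above only passes `repeats` and `maxCycleLen`, so treating `windowSize` and `minCycleLen` as constructor options is extrapolated from the defaults list.

```typescript
import { LoopDetector } from '@runguard/sdk';

// The documented defaults, written out. Raising `repeats` makes the guard
// slower to trip (more tolerant of agents that legitimately re-call a tool).
const detector = new LoopDetector({
  windowSize: 32,  // signatures kept in the sliding window
  minCycleLen: 1,  // catch a single call repeated back-to-back
  maxCycleLen: 8,  // longest repeating pattern to scan for
  repeats: 3,      // pattern must recur this many times to trip
});
```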

The first loop our SDK caught was ours

Our launch flow fires a six-tweet thread via deploy/post-launch-thread.js. The first attempt came back with HTTP 402 CreditsDepleted against a shared upstream account. Six consecutive sessions produced six identical signatures (post_tweet:402:CreditsDepleted), logged to a flat JSON file on disk. We wired the detector in on the seventh session. It rehydrated the six-row history, matched a length-1 cycle repeated 3 times at entry three of the replay, exited with RunGuardTripped before a single HTTP request went out, and has held the breaker open every session since. The full detail is in the 30-day log; that dogfood dataset is the canonical anchor for the day-0 post.
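Rehydration is just replaying persisted signatures through `push` before taking live traffic. A hypothetical sketch (neither `rehydrate` nor the `Pushable` shape is a documented @runguard/sdk API):

```typescript
// Any detector exposing push(sig) works here, including the SDK's.
interface Pushable {
  push(sig: string): { detected: boolean };
}

// Replay persisted signatures so a loop spanning process restarts still
// trips at startup, before any live call goes out.
function rehydrate(detector: Pushable, history: string[]): boolean {
  for (const sig of history) {
    if (detector.push(sig).detected) return true;
  }
  return false;
}
```

With a six-row history of identical 402 signatures and repeats 3, this trips on the third replayed row, which is what "at entry three" refers to.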

What this is not

The minimum you need in production

If you’re writing this from scratch, the smallest defensible version is: a function that takes a tool name and arguments, returns a signature string, pushes it into a list, and trips when the last 3 × L entries are L-periodic for any L from 1 to 8. That’s ~40 lines. The reason it isn’t shipped by every agent team already is that writing it on a Sunday after the bill arrives is exhausting, and almost nobody does it twice. RunGuard ships it as @runguard/sdk on npm and runguard on PyPI — one line of install, the same primitive the dogfood story uses.
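That minimal version can be sketched directly from the description above. Everything here is illustrative rather than the @runguard/sdk API, and the signature folds in the arguments as described (the SDK snippet earlier folds in the result status instead):

```typescript
// Collapse a call to a comparable string: tool name plus serialized args.
function signature(name: string, args: unknown): string {
  return `${name}:${JSON.stringify(args)}`;
}

// True when the last cycleLen * repeats entries are cycleLen-periodic.
function isPeriodicTail(history: string[], cycleLen: number, repeats: number): boolean {
  const span = cycleLen * repeats;
  if (history.length < span) return false;
  const tail = history.slice(-span);
  for (let i = cycleLen; i < span; i++) {
    if (tail[i] !== tail[i % cycleLen]) return false;
  }
  return true;
}

// Push a signature; return the cycle length if the tail now loops, else null.
function pushAndCheck(
  history: string[],
  sig: string,
  { minCycleLen = 1, maxCycleLen = 8, repeats = 3, windowSize = 32 } = {},
): number | null {
  history.push(sig);
  if (history.length > windowSize) history.shift();
  for (let L = minCycleLen; L <= maxCycleLen; L++) {
    if (isPeriodicTail(history, L, repeats)) return L;
  }
  return null;
}
```

Three pushes of the same signature return 1 on the third push; an alternating pair trips as a length-2 cycle on its sixth entry.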