Loop detection for CrewAI multi-agent crews

CrewAI gives every Agent a max_iter and a max_rpm, and every Crew a max_execution_time. They are the right primitives for a single-agent demo. In a three-agent hierarchical crew running paid tools, they catch the loop after the bill, not before. This page covers the runtime breaker we ship and how it slots into a CrewAI Tool in eight lines of Python.

Where loops actually happen inside a CrewAI crew

Why max_iter, max_rpm, and max_execution_time miss it

The three knobs CrewAI gives you are correct in shape and wrong in granularity. max_iter is the per-agent ceiling on reasoning iterations; the default is 25, the agent runs to that ceiling, and only then raises a stop. By the time round 25 fires inside one worker, you have made twenty-five manager LLM calls plus twenty-five worker LLM calls plus twenty-five tool calls, because in a hierarchical process the manager round-trips on every step. max_rpm throttles request rate per agent; a loop running at the cap is just a slower loop with the same final bill. max_execution_time is wall-clock at the crew level; on multi-agent runs that legitimately take minutes, it never trips early enough. None of the three looks at the content of the calls: a run that genuinely needs 12 distinct steps and a run that fires the same broken call 12 times look identical to the executor.
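For orientation, here is where those three knobs sit in code. A minimal sketch only: parameter names follow CrewAI's documented surface and the placement described above, so verify them against your installed version; the role, goal, task, and model strings are placeholders.

from crewai import Agent, Crew, Process, Task

worker = Agent(
    role="researcher",            # placeholder
    goal="answer the task",       # placeholder
    backstory="a worker agent",   # placeholder
    max_iter=25,                  # per-agent reasoning ceiling; 25 is the default
    max_rpm=10,                   # per-agent throttle; a capped loop is a slower loop
)

crew = Crew(
    agents=[worker],
    tasks=[Task(description="placeholder", expected_output="placeholder", agent=worker)],
    process=Process.hierarchical,  # manager round-trips on every worker step
    manager_llm="gpt-4o",          # placeholder manager model for the hierarchical process
    max_execution_time=600,        # crew-level wall clock, as described above
)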

What a circuit breaker actually has to do
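In one line: fingerprint every call, keep a sliding window of those fingerprints, and trip the moment the same cycle repeats back to back. A minimal sketch of that fingerprint-and-window idea, using the SDK's documented defaults but assuming nothing about runguard's internals; illustrative only.

# Illustrative sketch, not runguard's implementation.
from collections import deque

class WindowLoopDetector:
    def __init__(self, window_size=32, min_cycle_len=1, max_cycle_len=8, repeats=3):
        self.window = deque(maxlen=window_size)  # last N call fingerprints
        self.min_cycle_len = min_cycle_len
        self.max_cycle_len = max_cycle_len
        self.repeats = repeats

    def observe(self, signature: str) -> bool:
        """Record one fingerprint; return True if a loop just closed."""
        self.window.append(signature)
        recent = list(self.window)
        for cycle_len in range(self.min_cycle_len, self.max_cycle_len + 1):
            span = cycle_len * self.repeats
            if span > len(recent):
                break
            cycle = recent[-cycle_len:]
            # a loop is `repeats` back-to-back copies of the same cycle
            if recent[-span:] == cycle * self.repeats:
                return True
        return False

det = WindowLoopDetector()
for sig in ["delegate:research", "http_get:x"] * 3:
    tripped = det.observe(sig)
print(tripped)  # True: a length-2 ping-pong closes on the sixth call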

Wrapping a CrewAI Tool with runguard

# crewai + runguard. The Tool stays a Tool; only its underlying func gets
# wrapped. Crew sees the same interface, the breaker sees every call.
import json
import requests
from crewai.tools import BaseTool
from runguard import guard, LoopDetectedError, BudgetExceededError

def _http_get(payload: dict) -> dict:
    # the raw call the agent actually needs; nothing CrewAI-specific here
    r = requests.get(payload["url"], timeout=10)
    return {"status": r.status_code, "body": r.text}

guarded_http = guard(
    _http_get,
    signature=lambda i: f"http_get:{i['url']}",  # fingerprint for the detection window
    loop={"repeats": 3, "max_cycle_len": 8},     # trip on the third identical signature
    budget={"max_usd": 5},                       # hard spend ceiling on the same wrap
    cost=lambda _i, o: 0 if o["status"] >= 400 else 0.001,  # failed calls bill nothing
    on_trip=lambda e: print("[runguard]", e["reason"], e.get("signature")),
)

class HttpGet(BaseTool):
    name: str = "http_get"
    description: str = "Fetch a URL. Trips on third identical signature."

    def _run(self, url: str) -> str:
        try:
            return json.dumps(guarded_http({"url": url}))
        except (LoopDetectedError, BudgetExceededError):
            raise  # propagate up, halt the crew
        except Exception as e:
            return f"error: {e}"

Defaults match every other surface in the SDK (window_size: 32, min_cycle_len: 1, max_cycle_len: 8, repeats: 3): snake_case in Python, camelCase in TypeScript, same numbers either way. The wrapped function is plain, non-CrewAI Python, so the same wrap composes with raw requests, with openai.chat.completions.create, and with whatever framework you reach for next sprint. The fingerprint-and-window approach is documented in how to detect LLM tool-call loops in production; the TypeScript equivalent for LangChain has its own write-up.
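As a concrete composition, here is a sketch of the same guard() shape around the raw OpenAI client, with the cost lambda metering actual token usage instead of a flat per-call figure. The model name and the per-token prices are placeholders, and the guard() keywords simply mirror the tool wrap above; adjust all of them to your deployment.

# Same primitive, no framework: guard a raw OpenAI chat call.
from openai import OpenAI
from runguard import guard

client = OpenAI()

def _chat(messages):
    return client.chat.completions.create(model="gpt-4o-mini", messages=messages)

guarded_chat = guard(
    _chat,
    # fingerprint on the last user message; identical prompts signal a loop
    signature=lambda msgs: f"chat:{msgs[-1]['content'][:80]}",
    loop={"repeats": 3, "max_cycle_len": 8},
    budget={"max_usd": 5},
    # meter real token usage; per-million-token prices are placeholders
    cost=lambda _msgs, resp: (
        resp.usage.prompt_tokens * 0.15 + resp.usage.completion_tokens * 0.60
    ) / 1_000_000,
)

reply = guarded_chat([{"role": "user", "content": "Summarize the incident log."}])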

How the breaker behaves inside Crew.kickoff()
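In brief: the typed errors re-raised from _run above are intended to propagate through the executor and surface from kickoff(), so the outermost caller can catch them. A minimal sketch, assuming `crew` is your existing Crew and assuming your CrewAI version lets the tool's exception propagate rather than feeding it back to the LLM; verify that on your install.

from runguard import LoopDetectedError, BudgetExceededError

try:
    result = crew.kickoff()
except LoopDetectedError as e:
    print(f"halted by breaker: {e}")  # loop closed; do not blindly re-kickoff
except BudgetExceededError as e:
    print(f"halted by breaker: {e}")  # spend ceiling hit mid-run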

Tuning for CrewAI’s loop shapes

CrewAI’s Agent defaults to max_iter: 25; in a hierarchical crew, the manager and worker each get their own ceiling, so the effective worst-case bill is roughly (manager_iter × worker_iter) LLM calls before anything halts. A breaker tuned to repeats: 3, max_cycle_len: 8 can catch a length-1 worker loop on iteration 3 and a length-2 manager-worker ping-pong on iteration 6 — both well inside any per-agent ceiling. For multi-agent crews where loops live at the crew level, share one guard instance across the agents that participate in the cycle so the detector sees every call in one window; per-agent guards will each see their own slice and miss a cross-agent ping-pong. If your tools genuinely retry idempotent reads, mark them as such by raising retryable errors that the detector skips, or split the tool into a per-attempt callable that the detector watches and an outer-retry wrapper that it does not. For high-cost runs — a research crew paying $0.50 per LLM step on the manager and worker combined — consider repeats: 2 on tools whose loop signatures are unique enough that a false-positive trip is cheap. The cost of a missed loop is the bill; the cost of a false-positive trip is one re-run.
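A sketch of the shared-guard pattern from the paragraph above: two tools used by two different agents delegate to one guarded callable, so every call from either side lands in the same detection window. The tool names and the _search upstream are hypothetical.

# One guard instance, two tools, one detection window across agents.
from crewai.tools import BaseTool
from runguard import guard

def _search(payload: dict) -> dict:
    # hypothetical shared upstream call
    return {"query": payload["query"], "hits": []}

shared_search = guard(
    _search,
    signature=lambda i: f"search:{i['query']}",
    loop={"repeats": 3, "max_cycle_len": 8},
)

class ManagerSearch(BaseTool):
    name: str = "manager_search"
    description: str = "Search, guarded by the shared breaker."

    def _run(self, query: str) -> str:
        return str(shared_search({"query": query}))

class WorkerSearch(BaseTool):
    name: str = "worker_search"
    description: str = "Same upstream, same breaker, same window."

    def _run(self, query: str) -> str:
        return str(shared_search({"query": query}))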

Budget and context guards, on the same wrap

The first loop our SDK caught was ours

It wasn’t a CrewAI crew — it was our own launch script firing a six-tweet thread against a shared paid API. The first attempt came back with HTTP 402 CreditsDepleted. Six consecutive sessions later, six identical signatures — post_tweet:402:CreditsDepleted — were sitting in a flat JSON file on disk. The seventh session loaded the six-row history, pushed it into the detector at startup, and exited at signature three with a RunGuardTripped preflight before a single HTTP request went out. It has held the breaker open every session since. Read the dogfood story on the 30-day log; the same pattern slots into a CrewAI hierarchical process when the manager keeps re-delegating to the same worker on the same broken upstream.

What this is not

The minimum CrewAI integration

One pip install runguard, one guard() call per tool whose loop you want to catch, one on_trip that pages the channel you actually read. Eight lines of wrap per tool, no callback to register, no agent or crew subclass. The breaker trips on the third repeat of any signature, halts the crew, and leaves a structured event and a typed error behind for the post-mortem you would have written on Sunday anyway. RunGuard ships it as runguard on PyPI and @runguard/sdk on npm — same primitive, both runtimes, in-process, zero deps.