AI agent infinite loop in Python: how to detect and stop it

Every Python developer who has shipped an AI agent to production has eventually watched their monitoring dashboard as LLM API calls spike, the cost counter climbs, and the agent returns no result. The agent hit an infinite loop — or close enough to one that it exhausted its iteration limit having accomplished nothing. The specific failure mode varies by framework (LangChain, LangGraph, PydanticAI, AutoGen, Phidata), but the root causes are consistent across all of them: a tool returned an unhelpful result, the LLM generated a semantically valid follow-up call, and there was no runtime component to recognize the pattern as a loop and interrupt it. This page covers the four main infinite-loop patterns in Python AI agents, how to detect each one, and how RunGuard’s circuit breaker library adds detection + interruption with minimal code changes regardless of which Python agent framework you use.

Why Python AI agents loop: the four root causes

1. Empty or unhelpful tool results

The most common cause. Your tool returns [], "", None, or a dict like {"error": "not found"}. The LLM receives this as a valid tool response, interprets it as “I need to try again differently,” generates a new tool call with rephrased arguments, and gets the same unhelpful result. The loop continues until the framework’s iteration limit is hit. With default limits of 10–25 iterations and LLM calls at $0.01–$0.10 each, a single looping agent invocation costs $0.10–$2.50 before it terminates.

2. Silent exception masking

Many agent frameworks catch exceptions thrown by tools and convert them to error strings that get fed back to the LLM as a “tool result.” LangChain’s AgentExecutor does this; so does AutoGen’s function calling layer. A ConnectionError, TimeoutError, or even a KeyError in your tool code becomes a string like "Error: connection refused" in the agent’s message history. The LLM reads this as a retryable failure and generates another call. If the underlying service is down, every retry produces the same error string, and the loop runs to the iteration limit.

3. LLM prompt drift in long context windows

As the conversation history grows (each tool call appending input and output to the messages list), later LLM calls have larger contexts. At high context sizes, the model’s attention to the original instruction degrades and it may start generating tool calls that are less relevant to the original task — essentially exploring rather than solving. This produces a “semantic loop” where the tool calls are all different (so fingerprinting by exact args doesn’t catch them) but they’re all equally unhelpful. Detecting this requires context-size monitoring, not just call fingerprinting.

4. Multi-agent routing loops

In multi-agent architectures (AutoGen group chats, LangGraph multi-node graphs, CrewAI crews), a coordinator agent routes tasks to specialist agents. If a specialist fails and returns a fallback, the coordinator may route the same task to a different specialist or back to the same one. This creates a cross-agent loop that is invisible to single-agent loop detection: each individual agent executes its steps correctly, but the system as a whole cycles on the same unresolvable task.

Basic loop detection: fingerprinting tool calls

The most effective general approach is fingerprinting each tool call by its (function name, arguments, return value) tuple and raising an exception on the N-th repetition. RunGuard’s guard() wraps any Python callable and implements this check.

from runguard import guard, LoopDetectedError

# Works with any callable — LangChain tools, PydanticAI tools, raw functions
def my_search_tool(query: str) -> str:
    results = search_engine.search(query)
    return " | ".join(r.text for r in results) if results else ""

guarded_search = guard(
    my_search_tool,
    loop_window=10,     # evaluate the last 10 tool invocations
    loop_threshold=2,   # raise on 2nd identical fingerprint within window
)

# Raises LoopDetectedError if the same (query, result) pair appears twice
# in the last 10 calls to guarded_search
try:
    result = guarded_search("documentation for deprecated API")
except LoopDetectedError as e:
    print(f"Loop: {e.tool_name} got '{e.result}' for args {e.args} twice")

Budget cap: stopping cost accumulation

Loop detection prevents repetition. A budget cap prevents cost accumulation even when the calls are all different (the semantic loop case). BudgetTracker monitors cumulative LLM spend across a run and raises before the next call would exceed the limit.

from runguard import BudgetTracker, BudgetExceededError

# Track spend across all LLM calls in a run
budget = BudgetTracker(max_usd=0.50)

# Integrate with any LLM client via a callback / middleware
# Example: LangChain callback handler
from langchain.callbacks.base import BaseCallbackHandler

class BudgetCallbackHandler(BaseCallbackHandler):
    def __init__(self, tracker: BudgetTracker):
        self.tracker = tracker

    def on_llm_end(self, response, **kwargs):
        usage = response.llm_output.get("token_usage", {})
        self.tracker.record(
            model=response.llm_output.get("model_name", "unknown"),
            prompt_tokens=usage.get("prompt_tokens", 0),
            completion_tokens=usage.get("completion_tokens", 0),
        )
        self.tracker.check()  # raises BudgetExceededError if over limit

handler = BudgetCallbackHandler(budget)
# Pass to any LangChain LLM: ChatOpenAI(callbacks=[handler])

Context-size monitoring: stopping semantic loops

For the context-window-induced semantic loop, instrument the messages list at the framework entry point and raise before it exceeds your threshold:

from runguard import ContextGuard, ContextOverflowError

ctx_guard = ContextGuard(
    model="claude-sonnet-4-6",
    max_fraction=0.80,   # trip at 80% of model's context window
    count_method="tiktoken"
)

def safe_llm_call(messages: list[dict], model_client) -> str:
    try:
        ctx_guard.check(messages)  # raises ContextOverflowError if >80% full
    except ContextOverflowError as e:
        raise RuntimeError(
            f"Context window {e.fraction:.0%} full after {e.message_count} tool calls. "
            f"Agent likely in semantic loop — aborting."
        )
    return model_client.complete(messages)

Framework-specific loop controls vs RunGuard

FrameworkBuilt-in loop controlWhat RunGuard adds
LangChain AgentExecutormax_iterations (default 15)Fingerprint detection + dollar cap + context guard
LangGraphrecursion_limit on graphFingerprint detection across node transitions
AutoGenmax_consecutive_auto_replyCross-agent sequence detection + budget cap
PydanticAINo loop detectionFingerprint detection + budget cap + session guard
Phidata / Agnomax_steps (count only)Fingerprint detection + team routing guard
CrewAImax_iter per task (default 25)Fingerprint detection + inter-agent loop detection

Writing a minimal loop detector without a library

If you want to understand the mechanism before adding a dependency, here is the core fingerprinting algorithm that RunGuard implements under the hood:

from collections import deque
import hashlib, json

class SimpleLoopDetector:
    def __init__(self, window: int = 10, threshold: int = 2):
        self.window = deque(maxlen=window)
        self.threshold = threshold

    def _fingerprint(self, tool_name, args, result) -> str:
        payload = json.dumps(
            {"tool": tool_name, "args": args, "result": result},
            sort_keys=True, default=str
        )
        return hashlib.sha256(payload.encode()).hexdigest()[:16]

    def record(self, tool_name: str, args: dict, result) -> None:
        fp = self._fingerprint(tool_name, args, result)
        self.window.append(fp)
        count = list(self.window).count(fp)
        if count >= self.threshold:
            raise RuntimeError(
                f"Loop detected: {tool_name}({args}) returned same result {count}x"
            )

# Use it in any agent run loop
detector = SimpleLoopDetector(window=10, threshold=2)
# call detector.record(tool_name, args, result) after each tool call

This is the core mechanism. RunGuard adds: async compatibility, Slack/PagerDuty alerting, budget tracking, context-size monitoring, cross-tool sequence detection, structured error payloads for your monitoring system, and pre-built integrations for LangChain, LangGraph, AutoGen, PydanticAI, Phidata, and Haystack.

Stop your Python AI agent infinite loops today

RunGuard’s Python SDK installs with pip install runguard. It works with any Python agent framework: wrap your tool functions with guard(), add a BudgetTracker to your LLM client, and catch LoopDetectedError and BudgetExceededError in your run loop. Framework-specific integrations are in the SDK docs.

Get started with RunGuard — or see framework-specific guides for LangChain, PydanticAI, AutoGen, and CrewAI.