Google ADK agent loop prevention: adding a circuit breaker before your Gemini bill arrives

Google’s Agent Development Kit (ADK), released in 2025, is a Python framework for building multi-agent systems on top of Gemini. It handles agent orchestration, tool registration, and multi-agent routing with a clean declarative API. What it does not include is a runtime circuit breaker: if an ADK agent enters a tool-call loop, a handoff cycle between sub-agents, or accumulates tokens past Gemini’s context limit, ADK will keep calling the model until the task succeeds, the process crashes, or your billing quota runs out. This page shows how to intercept ADK’s Gemini calls, detect repeating patterns, enforce per-run dollar caps, and recover cleanly when the guard trips.

How ADK agents loop: the three failure modes

Tool-error loop. An ADK agent calls a tool, receives an error response, and decides to call the tool again with the same (or slightly modified) arguments to try to get a better result. The tool keeps failing for the same underlying reason: a missing permission, a network partition, a malformed query. The agent keeps retrying. ADK has no retry limit on tool calls within a single turn; the max_iterations parameter (set on LoopAgent) caps how many times the loop re-runs its sub-agents, not the number of tool calls within a turn. A tight loop of failed tool calls can fill a significant fraction of the model’s context window with repetitive error text.
Sub-agent routing cycle. ADK’s SequentialAgent and ParallelAgent composites route tasks to sub-agents. If the routing logic is underspecified (for example, a triage agent routes to a specialist that hands back to the triage agent when it encounters a task outside its scope), the system enters a routing cycle. Each hop is a full Gemini API call with the accumulated conversation state. At Gemini 1.5 Pro pricing, 50 hops between two agents in a cycle cost roughly $1–$5 depending on context size (at $1.25 per 1M input tokens, that range corresponds to about 16k–80k input tokens per hop). That is cheap per incident and catastrophic at production traffic.
Token accumulation past the context limit. ADK’s default session handling appends every tool result to the agent’s conversation state. A research or data-processing agent that calls many tools over a long session can accumulate hundreds of thousands of tokens. Gemini 1.5 Pro supports up to 1M token context — this takes longer to hit than with smaller-context models but is still reachable by a sufficiently long-running agent. Gemini 2.0 Flash has a 1M context limit as well. When the limit is hit, ADK raises an exception from the model client rather than compacting the context automatically.
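The tool-error loop above can also be bounded at the tool itself, independently of any model-level guard. The following is a minimal, framework-agnostic sketch, not an ADK feature: cap_tool_failures and ToolRetryLimitError are hypothetical names, and it assumes the tool signals failure by raising an exception. It only catches retries with identical arguments; "slightly modified" retries still need a pattern-level detector.

```python
import functools
import json
from collections import Counter
from typing import Any, Callable


class ToolRetryLimitError(RuntimeError):
    """Raised when the same tool call has already failed too many times."""


def cap_tool_failures(tool: Callable[..., Any], max_failures: int = 3) -> Callable[..., Any]:
    """Wrap a tool so that identical failing calls stop after max_failures attempts."""
    failures: Counter = Counter()

    @functools.wraps(tool)  # keeps the tool's name, docstring, and signature metadata
    def wrapped(*args: Any, **kwargs: Any) -> Any:
        # Key on the arguments so a call with changed arguments gets a fresh allowance.
        key = json.dumps({"args": args, "kwargs": kwargs}, sort_keys=True, default=str)
        if failures[key] >= max_failures:
            raise ToolRetryLimitError(
                f"{tool.__name__} already failed {failures[key]} times with these arguments"
            )
        try:
            return tool(*args, **kwargs)
        except Exception:
            failures[key] += 1
            raise

    return wrapped


# Example: a tool that always fails for the same underlying reason.
def read_report(path: str) -> str:
    raise PermissionError(f"no access to {path}")


guarded_read = cap_tool_failures(read_report, max_failures=2)
```

Once the cap is reached, the wrapper raises a distinct error type, which gives the calling code something concrete to stop on instead of feeding another copy of the same error text back to the model.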

Adding a circuit breaker to ADK: the interception point

ADK’s model call surface. ADK routes all LLM calls through a BaseLlm subclass (by default Gemini from google.adk.models.google_llm; LiteLlm from google.adk.models.lite_llm is the LiteLLM-backed alternative). You can substitute a custom BaseLlm implementation by passing it as the model parameter to any agent constructor. The custom implementation wraps the actual Gemini client and adds pre-call guard logic.

Python: custom ADK model wrapper with RunGuard.

from google.adk.agents import Agent
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from runguard import guard_async, LoopDetectedError, BudgetExceededError, ContextOverflowError
import litellm


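class GuardedGeminiModel:
    """ADK-compatible model wrapper with RunGuard loop detection and budget cap.

    Sketch only: ADK agents accept a model name or a BaseLlm instance, so a
    production wrapper would subclass BaseLlm and translate between ADK's
    LlmRequest/LlmResponse types and the message list used here.
    """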

    def __init__(
        self,
        model_name: str = "gemini/gemini-2.0-flash",
        max_usd: float = 2.0,
        max_context_tokens: int = 900_000,
    ):
        self.model_name = model_name
        self._max_usd = max_usd
        self._max_context_tokens = max_context_tokens
        self._guarded = None

    def _build_guard(self):
        async def _call_gemini(messages, **kwargs):
            response = await litellm.acompletion(
                model=self.model_name,
                messages=messages,
                **kwargs,
            )
            usage = response.usage
            input_tokens = usage.prompt_tokens if usage else 0
            output_tokens = usage.completion_tokens if usage else 0

            # Gemini 2.0 Flash pricing (~$0.075/$0.30 per 1M tokens)
            usd = (input_tokens * 0.075 + output_tokens * 0.30) / 1_000_000

            # Extract function call name as signature, or "end_turn"
            choice = response.choices[0]
            sig = "end_turn"
            tool_calls = getattr(choice.message, "tool_calls", None) or []
            if tool_calls:
                sig = tool_calls[0].function.name

            return {"response": response, "usd": usd, "sig": sig}

        return guard_async(
            _call_gemini,
            budget={"max_usd": self._max_usd},
            loop={"repeats": 3, "max_cycle_len": 4},
            context={
                "maxContextTokens": self._max_context_tokens,
                "headroom": 100_000,  # reserve 100k for completion
            },
            tokens=lambda msgs, **_: sum(
                len(str(m.get("content", ""))) // 4 for m in msgs
            ),  # character heuristic; replace with Gemini countTokens for accuracy
        )

    async def generate_content_async(self, messages, **kwargs):
        if self._guarded is None:
            self._guarded = self._build_guard()
        result = await self._guarded(messages, **kwargs)
        return result["response"]


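# --- Usage ---

from google.genai import types  # Content/Part types that Runner.run_async expects


async def run_adk_agent(user_input: str) -> str:
    guarded_model = GuardedGeminiModel(max_usd=1.50)

    # ADK standard agent wiring: model is injected at construction.
    # search_tool and summarize_tool are placeholders for your own tool functions.
    agent = Agent(
        name="research_agent",
        model=guarded_model,
        description="Research agent with RunGuard protection",
        instruction="You are a research assistant. Use the provided tools to answer questions.",
        tools=[search_tool, summarize_tool],
    )

    session_service = InMemorySessionService()
    session = await session_service.create_session(app_name="demo", user_id="u1")
    runner = Runner(agent=agent, app_name="demo", session_service=session_service)

    # Runner.run_async takes a types.Content message, not a bare string.
    message = types.Content(role="user", parts=[types.Part(text=user_input)])

    try:
        async for event in runner.run_async(
            user_id="u1",
            session_id=session.id,
            new_message=message,
        ):
            # Guard against final events that carry no text part.
            if event.is_final_response() and event.content and event.content.parts:
                return event.content.parts[0].text or ""
        return ""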
    except LoopDetectedError as e:
        return f"Loop detected (pattern: {e.pattern!r}). Agent aborted."
    except BudgetExceededError as e:
        return f"Budget cap reached (${e.spent:.3f}). Agent aborted."
    except ContextOverflowError as e:
        return f"Context limit approaching ({e.projected_tokens} tokens). Agent aborted."

Per-session guard scoping. The example above constructs a new GuardedGeminiModel inside run_adk_agent, so each call gets its own budget and pattern window. If you hoist the wrapper to module level or otherwise reuse it across requests, every session shares one guard. For production multi-user systems, instantiate a new model wrapper per user session so that one session’s loop history doesn’t affect another session’s budget or pattern window. The guard’s internal state (accumulated spend, call signatures) is instance-scoped.
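One way to keep guard state per session when wrappers outlive a single request is a small registry keyed by session ID. This is a sketch under stated assumptions: PerSessionRegistry and SpendTracker are hypothetical names, and SpendTracker is a stand-in for whatever instance-scoped object holds the state (in this article, a GuardedGeminiModel).

```python
from dataclasses import dataclass
from typing import Callable, Dict, Generic, TypeVar

T = TypeVar("T")


class PerSessionRegistry(Generic[T]):
    """Lazily creates one guard object per session ID and hands back the same one."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._by_session: Dict[str, T] = {}

    def get(self, session_id: str) -> T:
        # Create the guard on first use so every session starts with clean state.
        if session_id not in self._by_session:
            self._by_session[session_id] = self._factory()
        return self._by_session[session_id]

    def release(self, session_id: str) -> None:
        # Drop state when the session ends so the registry does not grow without bound.
        self._by_session.pop(session_id, None)


@dataclass
class SpendTracker:
    """Stand-in for a guard: tracks spend for exactly one session."""
    spent_usd: float = 0.0


registry = PerSessionRegistry(SpendTracker)
registry.get("session-a").spent_usd += 1.25  # does not touch session-b
```

With the wrapper from this article, the factory would be something like lambda: GuardedGeminiModel(max_usd=1.50), and release would be called when the session closes.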

ADK multi-agent routing: detecting handoff cycles

How ADK routes between agents. ADK’s SequentialAgent runs sub-agents in order; LlmAgent with sub_agents can route dynamically by choosing which sub-agent to invoke. The routing decision is itself an LLM call. A routing cycle occurs when the triage agent routes to Agent B, Agent B routes back to the triage agent (because the task is outside its scope), and the triage agent routes to Agent B again with no new information.
Detecting routing cycles via function-call signatures. Each sub-agent invocation is represented in the conversation history as a function call with the sub-agent’s name. The signature stream for a routing cycle looks like: research_agent → triage_agent → research_agent → triage_agent. RunGuard’s loop detector with max_cycle_len=4 and repeats=3 fires after the third repetition of any cycle up to four signatures long, which covers the period-1 case (the same tool called repeatedly) and the period-2 case (an A↔B handoff) in this stream.
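To make the detection rule concrete, here is a minimal sketch of a tail-repetition check over a signature stream. It is an illustration of the idea, not RunGuard's actual implementation; find_repeating_cycle is a hypothetical helper whose parameters mirror the repeats and max_cycle_len settings used above.

```python
from typing import List, Optional, Sequence


def find_repeating_cycle(
    sigs: Sequence[str], repeats: int = 3, max_cycle_len: int = 4
) -> Optional[List[str]]:
    """Return the cycle if the tail of sigs is one pattern repeated `repeats` times."""
    for period in range(1, max_cycle_len + 1):
        window = period * repeats
        if len(sigs) < window:
            break  # longer periods need even more history, so stop here
        tail = list(sigs[-window:])
        pattern = tail[:period]
        # The tail is a loop if it equals the candidate pattern laid end to end.
        if tail == pattern * repeats:
            return pattern
    return None


stream = ["research_agent", "triage_agent"] * 3
cycle = find_repeating_cycle(stream)  # the two-agent handoff cycle
```

Checking only the tail keeps the test cheap enough to run before every model call, and it means an agent that escaped an earlier loop is not penalized for old history.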
Fixing the root cause: add escape conditions to every sub-agent. Every sub-agent in an ADK network should have an explicit condition for terminating rather than routing. A specialist agent that cannot handle a task should output a structured response indicating failure with a reason code — not route back to the orchestrator with an empty answer. Add a termination tool (task_failed, out_of_scope) that the agent calls when it cannot proceed, and handle that structured failure in the orchestrator rather than re-routing.
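A termination tool of this kind can be a plain Python function that returns a structured result. The sketch below assumes a simple dictionary shape; the field names (status, reason_code, detail, answer) and the handle_specialist_result helper are illustrative choices, not an ADK convention.

```python
from typing import Dict


def task_failed(reason_code: str, detail: str) -> Dict[str, str]:
    """Report that this agent cannot complete the task. Call this instead of handing the task back.

    Args:
        reason_code: Short machine-readable code, e.g. "out_of_scope".
        detail: One-sentence human-readable explanation.
    """
    return {"status": "failed", "reason_code": reason_code, "detail": detail}


def handle_specialist_result(result: Dict[str, str]) -> str:
    """Orchestrator side: surface a structured failure instead of re-routing."""
    if result.get("status") == "failed":
        return f"Could not complete the task ({result['reason_code']}): {result['detail']}"
    return result.get("answer", "")


failure = task_failed("out_of_scope", "Billing questions are not handled by the research agent.")
message = handle_specialist_result(failure)
```

The important property is that the failure path ends the handoff chain: the orchestrator reads the reason code and responds, rather than routing the same task to the same specialist again.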

ADK built-in limits vs. RunGuard

Control	ADK built-in	RunGuard
Max agent turns	max_iterations parameter	Loop detector fires before turn limit on repeating patterns
Per-run cost cap	Not supported	budget: max_usd — fires before each Gemini call
Tool-call loop detection	Not supported	loop: repeats=3 fires on 3rd repeat of same tool call
Sub-agent routing cycle detection	Not supported	loop: max_cycle_len=4 catches A↔B routing cycles
Context-window guard	Exception on overflow only	Pre-call ContextOverflowError before request is sent
Slack/PagerDuty alert on trip	Not supported	alerts: slack_webhook or pagerduty_key

Gemini token pricing and budget calibration

Gemini 2.0 Flash (default ADK model): $0.075/$0.30 per 1M input/output tokens (standard, ≤200k context). For an agent making 100 calls at 5k input tokens each, the input cost alone is 500k tokens × $0.075 per 1M ≈ $0.038 per run. A loop that runs 1,000 calls before the process dies costs about $0.375 in input tokens at a constant 5k tokens per call, and more as accumulated context inflates the per-call input. Set max_usd: 0.50 for development agents, max_usd: 5.0 for production agents where a single run has bounded scope.
Gemini 1.5 Pro (large-context tasks): $1.25/$5.00 per 1M input/output tokens (≤128k context). An agent using 50k tokens per call costs $0.0625 per call. At 100 calls in a runaway loop, that’s $6.25. Set max_usd: 2.0 as a conservative cap for most research tasks.
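The per-run arithmetic above is easy to script when sizing a budget. The sketch below hard-codes the per-1M-token prices quoted in this section; the PRICES table and run_cost helper are illustrative names, and the figures should be replaced with whatever your current price sheet says.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    input_per_million: float   # USD per 1M input tokens
    output_per_million: float  # USD per 1M output tokens


# Prices as quoted in this section; treat them as assumptions.
PRICES = {
    "gemini-2.0-flash": Price(0.075, 0.30),
    "gemini-1.5-pro": Price(1.25, 5.00),
}


def run_cost(model: str, calls: int, input_tokens_per_call: int,
             output_tokens_per_call: int = 0) -> float:
    """Estimate USD cost of a run, assuming every call has the same token counts."""
    price = PRICES[model]
    input_usd = calls * input_tokens_per_call * price.input_per_million / 1_000_000
    output_usd = calls * output_tokens_per_call * price.output_per_million / 1_000_000
    return input_usd + output_usd


flash_run = run_cost("gemini-2.0-flash", calls=100, input_tokens_per_call=5_000)
pro_loop = run_cost("gemini-1.5-pro", calls=100, input_tokens_per_call=50_000)
```

The constant-tokens-per-call assumption understates real loops, where the context grows on every hop, so treat the result as a lower bound.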
Calibrating max_usd from P95 run costs. Run your agent on a representative sample of production inputs, log the dollar cost per run, take the 95th percentile, and set max_usd to 2.5× that value. This gives a cap that rarely fires on legitimate long runs while still catching runaway loops that cost 10× the P95. See autonomous agent cost control best practices for the full calibration methodology.
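The calibration rule can be expressed in a few lines. This sketch uses the nearest-rank definition of a percentile, which is one of several common definitions; percentile and calibrate_max_usd are hypothetical helper names, and the sample costs are synthetic.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("need at least one logged run cost")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def calibrate_max_usd(run_costs: Sequence[float], pct: int = 95,
                      multiplier: float = 2.5) -> float:
    """Set the budget cap to a multiple of the chosen percentile of observed run costs."""
    return percentile(run_costs, pct) * multiplier


# Synthetic sample: 100 runs costing $0.01, $0.02, ..., $1.00.
sample_costs = [i / 100 for i in range(1, 101)]
cap = calibrate_max_usd(sample_costs)  # 2.5 x the $0.95 run
```

Re-run the calibration whenever prompts, tools, or the model change, since each of those shifts the cost distribution the cap was derived from.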