Google ADK agent loop prevention: adding a circuit breaker before your Gemini bill arrives
Google’s Agent Development Kit (ADK), released in 2025, is a Python framework for building multi-agent systems on top of Gemini. It handles agent orchestration, tool registration, and multi-agent routing with a clean declarative API. What it does not include is a runtime circuit breaker that understands cost or repetition. If an ADK agent enters a tool-call loop or a handoff cycle between sub-agents, ADK keeps calling the model until one of four things happens: the task succeeds, a blunt per-run call-count cap trips, the process crashes, or your billing quota runs out. If the agent accumulates tokens past Gemini’s context limit, the run fails with an exception. This page shows how to intercept ADK’s Gemini calls, detect repeating patterns, enforce per-run dollar caps, and recover cleanly when the guard trips.
How ADK agents loop: the three failure modes
- Tool-error loop. An ADK agent calls a tool, receives an error response, and calls the tool again with the same (or slightly modified) arguments in the hope of a better result. The tool keeps failing for the same underlying reason (a missing permission, a network partition, a malformed query), and the agent keeps retrying. ADK has no per-tool retry limit. `LoopAgent`’s `max_iterations` only bounds how many times a loop workflow runs its sub-agents. `RunConfig.max_llm_calls` (500 by default) is a blunt cap on total model calls per run, and it cannot tell a productive call from a repeated failure. Before that cap trips, a tight loop of failed tool calls can burn hundreds of model calls and fill a significant fraction of the context window with repetitive error text.
- Sub-agent routing cycle. ADK’s `SequentialAgent` and `ParallelAgent` run sub-agents in a fixed order or concurrently. An `LlmAgent` with `sub_agents` lets the model decide which agent handles the task next. Suppose that routing logic is underspecified: a triage agent routes to a specialist, and the specialist hands back to the triage agent whenever a task falls outside its scope. The system then enters a routing cycle. Each hop is a full Gemini API call carrying the accumulated conversation state. At Gemini 1.5 Pro pricing ($1.25 per 1M input tokens), 50 hops between two agents cost roughly $1–$5, depending on context size (about 16k to 80k tokens per hop). That is cheap per incident and catastrophic at production traffic.
- Token accumulation past the context limit. ADK’s default session handling appends every tool result to the agent’s conversation state. A research or data-processing agent that calls many tools over a long session can accumulate hundreds of thousands of tokens. Gemini 1.5 Pro accepts up to 2M tokens and Gemini 2.0 Flash up to 1M. That limit takes longer to hit than with smaller-context models, but a sufficiently long-running agent can still reach it. When the limit is hit, ADK raises an exception from the model client rather than compacting the context automatically.
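Until a guard is in place, one mitigation for the tool-error loop needs no extra library. Make the tool itself stop after a few consecutive failures and return a structured, non-retryable error the model can read. This sketch makes three assumptions:

- ADK wraps plain Python functions passed in `tools=[...]` as function tools and sends a returned dict back to the model.
- `cap_failures` and `query_warehouse` are hypothetical names.
- ADK’s schema generation sees the original signature through `functools.wraps`. Verify this on your ADK version.

```python
import functools
from typing import Any, Callable, Dict


def cap_failures(max_consecutive: int = 3) -> Callable:
    """Hypothetical decorator: after `max_consecutive` raised errors in a row,
    stop calling the tool and return a terminal error the model can act on."""

    def decorator(fn: Callable[..., Dict[str, Any]]) -> Callable[..., Dict[str, Any]]:
        # Consecutive-failure count. It is shared by every caller of this tool;
        # scope it per session in a multi-user service.
        failures = 0

        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Dict[str, Any]:
            nonlocal failures
            if failures >= max_consecutive:
                # Refuse without touching the backend and tell the model to stop.
                return {
                    "status": "error",
                    "retryable": False,
                    "message": f"{fn.__name__} failed {failures} times in a row; "
                               "do not call it again in this run.",
                }
            try:
                result = fn(*args, **kwargs)
            except Exception as exc:  # only raised errors count as failures here
                failures += 1
                return {"status": "error",
                        "retryable": failures < max_consecutive,
                        "message": str(exc)}
            failures = 0  # a success resets the streak
            return result

        return wrapper

    return decorator


@cap_failures(max_consecutive=3)
def query_warehouse(sql: str) -> Dict[str, Any]:
    """Hypothetical tool that always fails, to show the cap in action."""
    raise PermissionError("missing permission: bigquery.jobs.create")
```

Calling `query_warehouse` five times returns `retryable` values of `True, True, False, False, False`. From the fourth call on, the backend is never touched.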
Adding a circuit breaker to ADK: the interception point
- ADK’s model call surface. ADK routes all LLM calls through a `BaseLlm` subclass. A Gemini model name string resolves to `Gemini` from `google.adk.models.google_llm`, and `LiteLlm` from `google.adk.models.lite_llm` routes calls through LiteLLM. You can substitute a custom `BaseLlm` implementation by passing it as the `model` parameter to any agent constructor. The custom implementation delegates to a real model class and adds pre-call guard logic.
- Python: custom ADK model wrapper with RunGuard.
```python
from typing import Any, AsyncGenerator, Optional

from google.adk.agents import Agent
from google.adk.models.base_llm import BaseLlm
from google.adk.models.lite_llm import LiteLlm
from google.adk.models.llm_request import LlmRequest
from google.adk.models.llm_response import LlmResponse
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.genai import types
from pydantic import PrivateAttr
from runguard import guard_async, LoopDetectedError, BudgetExceededError, ContextOverflowError


class GuardedGeminiModel(BaseLlm):
    """ADK-compatible model (a BaseLlm subclass) with RunGuard loop detection and budget cap."""

    model: str = "gemini/gemini-2.0-flash"  # LiteLLM model id for the Gemini API
    max_usd: float = 2.0
    max_context_tokens: int = 900_000

    # Guard state (spend, call signatures) lives on the instance
    _inner: Optional[LiteLlm] = PrivateAttr(default=None)
    _guarded: Any = PrivateAttr(default=None)

    def _build_guard(self):
        self._inner = LiteLlm(model=self.model)

        async def _call_gemini(llm_request: LlmRequest):
            # Non-streaming call to the real model; collect its LlmResponse(s)
            responses = [
                r async for r in self._inner.generate_content_async(llm_request, stream=False)
            ]
            final = responses[-1]
            usage = final.usage_metadata
            input_tokens = (usage.prompt_token_count or 0) if usage else 0
            output_tokens = (usage.candidates_token_count or 0) if usage else 0
            # Gemini 2.0 Flash list pricing (~$0.10/$0.40 per 1M tokens)
            usd = (input_tokens * 0.10 + output_tokens * 0.40) / 1_000_000

            # First function-call name is the loop signature; a plain answer is "end_turn"
            sig = "end_turn"
            parts = final.content.parts if final.content and final.content.parts else []
            for part in parts:
                if part.function_call:
                    sig = part.function_call.name
                    break
            return {"response": responses, "usd": usd, "sig": sig}

        return guard_async(
            _call_gemini,
            budget={"max_usd": self.max_usd},
            loop={"repeats": 3, "max_cycle_len": 4},
            context={
                "maxContextTokens": self.max_context_tokens,
                "headroom": 100_000,  # reserve 100k for completion
            },
            # Assumes RunGuard calls `tokens` with the same arguments as the wrapped call.
            # Character heuristic (~4 chars/token); replace with Gemini countTokens for accuracy.
            tokens=lambda req, **_: sum(
                len(c.model_dump_json(exclude_none=True)) // 4 for c in req.contents
            ),
        )

    async def generate_content_async(
        self, llm_request: LlmRequest, stream: bool = False
    ) -> AsyncGenerator[LlmResponse, None]:
        # Streaming is not supported by this wrapper; `stream` is ignored.
        if self._guarded is None:
            self._guarded = self._build_guard()
        # The guard checks budget, loop pattern, and context size before calling Gemini
        result = await self._guarded(llm_request)
        for response in result["response"]:
            yield response


# --- Usage ---
async def run_adk_agent(user_input: str) -> str:
    # A fresh wrapper per call, so guard state is not shared between runs
    guarded_model = GuardedGeminiModel(max_usd=1.50)

    # ADK standard agent wiring: the model is injected at construction
    agent = Agent(
        name="research_agent",
        model=guarded_model,
        description="Research agent with RunGuard protection",
        instruction="You are a research assistant. Use the provided tools to answer questions.",
        tools=[search_tool, summarize_tool],  # your own tool functions (not shown)
    )
    session_service = InMemorySessionService()
    session = await session_service.create_session(app_name="demo", user_id="u1")
    runner = Runner(agent=agent, app_name="demo", session_service=session_service)

    try:
        async for event in runner.run_async(
            user_id="u1",
            session_id=session.id,
            new_message=types.Content(role="user", parts=[types.Part(text=user_input)]),
        ):
            if event.is_final_response() and event.content and event.content.parts:
                return event.content.parts[0].text or ""
        return ""
    except LoopDetectedError as e:
        return f"Loop detected (pattern: {e.pattern!r}). Agent aborted."
    except BudgetExceededError as e:
        return f"Budget cap reached (${e.spent:.3f}). Agent aborted."
    except ContextOverflowError as e:
        return f"Context limit approaching ({e.projected_tokens} tokens). Agent aborted."
```

- Per-session guard scoping. The example above builds a fresh `GuardedGeminiModel` (and agent) on every call to `run_adk_agent`, so each run gets its own guard. If you instead construct one agent at startup and reuse it for every user, all sessions share a single wrapper instance. The guard’s internal state (accumulated spend, call signatures) is instance-scoped. In production multi-user systems, create one model wrapper per user session so that one session’s loop history doesn’t affect another session’s budget or pattern window.
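The sketch below shows per-session scoping with a small registry keyed by session ID. It assumes your service knows the session ID before it builds the agent. `SessionScoped` and `SpendTracker` are hypothetical helpers. In an ADK service, the factory would build a new `GuardedGeminiModel` (plus the agent and runner around it) instead of the stand-in tracker used here.

```python
from typing import Callable, Dict, Generic, TypeVar

T = TypeVar("T")


class SessionScoped(Generic[T]):
    """Hypothetical helper: one lazily built object per session ID, so guard
    state (spend, call signatures) never leaks between sessions."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._items: Dict[str, T] = {}

    def get(self, session_id: str) -> T:
        # Build on first use; later calls in the same session reuse the instance.
        if session_id not in self._items:
            self._items[session_id] = self._factory()
        return self._items[session_id]

    def close(self, session_id: str) -> None:
        # Drop the session's state when it ends so memory does not grow unbounded.
        self._items.pop(session_id, None)

    def __len__(self) -> int:
        return len(self._items)


class SpendTracker:
    """Stand-in for per-session guard state (hypothetical)."""

    def __init__(self) -> None:
        self.spent_usd = 0.0
```

Spend recorded through `guards.get("session-a")` is invisible to `guards.get("session-b")`, where `guards = SessionScoped(SpendTracker)`. Closing a session discards its state, so a returning user starts with a fresh budget.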
ADK multi-agent routing: detecting handoff cycles
- How ADK routes between agents. ADK’s `SequentialAgent` runs sub-agents in order. An `LlmAgent` with `sub_agents` can route dynamically by choosing which sub-agent to invoke, and that routing decision is itself an LLM call. A routing cycle occurs in three steps: the triage agent routes to agent B, agent B routes back to the triage agent (because the task is outside its scope), and the triage agent routes to agent B again with no new information.
- Detecting routing cycles via function-call signatures. Each handoff appears in the conversation history as a function call. For LLM-driven delegation, ADK emits a `transfer_to_agent` call whose `agent_name` argument names the target. An agent wrapped as a tool is called under its own name. When the signature includes the target agent, the stream for a routing cycle looks like this: `research_agent` → `triage_agent` → `research_agent` → `triage_agent`. RunGuard’s loop detector with `max_cycle_len=4` and `repeats=3` fires after the third repetition of any period-1 or period-2 cycle in this stream.
- Fixing the root cause: add escape conditions to every sub-agent. Every sub-agent in an ADK network should have an explicit condition for terminating rather than routing. A specialist agent that cannot handle a task should return a structured response indicating failure with a reason code, not route back to the orchestrator with an empty answer. Add a termination tool (`task_failed`, `out_of_scope`) that the agent calls when it cannot proceed. Then handle that structured failure in the orchestrator rather than re-routing.
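The escape hatch turns a failure into data that the orchestrator branches on in code, instead of another routing decision for the LLM. Here is a minimal sketch of it, assuming plain Python function tools. The tool names `task_failed` and `out_of_scope` come from the text above. The reason codes, `FailureReason`, and `handle_specialist_result` are hypothetical.

```python
from enum import Enum
from typing import Any, Dict


class FailureReason(str, Enum):
    """Hypothetical reason codes a specialist can report."""
    OUT_OF_SCOPE = "out_of_scope"
    MISSING_PERMISSION = "missing_permission"
    UPSTREAM_UNAVAILABLE = "upstream_unavailable"
    UNKNOWN = "unknown"


def out_of_scope(explanation: str) -> Dict[str, Any]:
    """Call this when the request is outside your specialty. Do not hand the
    task back to the triage agent; this ends your attempt."""
    return {"status": "failed", "reason": FailureReason.OUT_OF_SCOPE.value,
            "explanation": explanation}


def task_failed(reason: str, explanation: str) -> Dict[str, Any]:
    """Call this when you cannot finish the task. `reason` should be one of:
    missing_permission, upstream_unavailable."""
    valid = {r.value for r in FailureReason}
    # Map codes the model invents to UNKNOWN instead of raising inside a tool.
    code = reason if reason in valid else FailureReason.UNKNOWN.value
    return {"status": "failed", "reason": code, "explanation": explanation}


def handle_specialist_result(result: Dict[str, Any], attempts: Dict[str, int],
                             max_fallbacks: int = 1) -> str:
    """Orchestrator-side decision made in code, not by another routing call."""
    if result.get("status") != "failed":
        return "deliver"
    reason = result.get("reason", FailureReason.UNKNOWN.value)
    attempts[reason] = attempts.get(reason, 0) + 1
    # Allow a bounded number of hand-offs to a *different* specialist for
    # out-of-scope tasks; everything else is reported rather than re-routed.
    if reason == FailureReason.OUT_OF_SCOPE.value and attempts[reason] <= max_fallbacks:
        return "route_to_fallback"
    return "report_failure_to_user"
```

The docstrings are part of the design: ADK builds a function tool’s description from its docstring, so they are where the model learns not to route back. The `attempts` counter caps fallbacks, which keeps the fallback path itself from becoming a new cycle.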
ADK built-in limits vs. RunGuard
| Control | ADK built-in | RunGuard |
|---|---|---|
| Iteration / call limits | `LoopAgent` `max_iterations`; `RunConfig` `max_llm_calls` (default 500) | Loop detector fires on repeating patterns before a count limit is reached |
| Per-run cost cap | Not supported | budget: max_usd — fires before each Gemini call |
| Tool-call loop detection | Not supported | loop: repeats=3 fires on 3rd repeat of same tool call |
| Sub-agent routing cycle detection | Not supported | loop: max_cycle_len=4 catches A↔B routing cycles |
| Context-window guard | Exception on overflow only | Pre-call ContextOverflowError before request is sent |
| Slack/PagerDuty alert on trip | Not supported | alerts: slack_webhook or pagerduty_key |
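The two loop rows in the table describe a detector running over the stream of call signatures. RunGuard’s implementation is not shown on this page, so here is an independent sketch of those semantics. It assumes “fires” means the most recent calls form one cycle of length 1 to `max_cycle_len`, repeated `repeats` times back to back. `CycleDetector` and `LoopTripped` are hypothetical names, not RunGuard’s API.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class LoopTripped(Exception):
    """Raised when the recent signature stream is a repeated cycle."""

    def __init__(self, pattern: Tuple[str, ...]) -> None:
        super().__init__(f"repeating pattern {pattern!r}")
        self.pattern = pattern


class CycleDetector:
    def __init__(self, repeats: int = 3, max_cycle_len: int = 4) -> None:
        self.repeats = repeats
        self.max_cycle_len = max_cycle_len
        # Only the last repeats * max_cycle_len signatures can ever matter.
        self._window: Deque[str] = deque(maxlen=repeats * max_cycle_len)

    def _find_cycle(self) -> Optional[Tuple[str, ...]]:
        sigs = list(self._window)
        for period in range(1, self.max_cycle_len + 1):
            span = period * self.repeats
            if len(sigs) < span:
                break  # longer periods need even more history
            tail = sigs[-span:]
            pattern = tail[:period]
            # True when the tail is `repeats` back-to-back copies of `pattern`.
            if all(tail[i] == pattern[i % period] for i in range(span)):
                return tuple(pattern)
        return None

    def record(self, sig: str) -> None:
        """Record one model call's signature; raise if it completes a loop."""
        self._window.append(sig)
        pattern = self._find_cycle()
        if pattern is not None:
            raise LoopTripped(pattern)
```

The detector behaves as follows on the table’s two cases:

- A routing cycle `research_agent`, `triage_agent` trips on the sixth call: a period-2 cycle repeated three times.
- A tool retried with the same name trips on the third call.

The signature choice decides what counts as a loop. If every handoff shares one function name (ADK’s LLM-driven delegation goes through `transfer_to_agent`), fold the target agent’s name into the signature. Otherwise three legitimate transfers in a row would read as a period-1 loop.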
Gemini token pricing and budget calibration
- Gemini 2.0 Flash (the model used in ADK’s quickstart examples): $0.10/$0.40 per 1M input/output text tokens on the Gemini API. For an agent making 100 calls at 5k input tokens each, the input cost alone is 500k × $0.10/1M = $0.05 per run. A runaway loop of 1,000 calls at that size costs $0.50 in input tokens, but loops usually grow the context as they go. The same 1,000 calls late in a long session, each carrying ~500k tokens of accumulated context, cost 1,000 × 500k × $0.10/1M = $50 in input tokens alone. Set `max_usd: 0.50` for development agents and `max_usd: 5.0` for production agents where a single run has bounded scope.
- Gemini 1.5 Pro (large-context tasks): $1.25/$5.00 per 1M input/output tokens (prompts ≤128k tokens). An agent using 50k input tokens per call pays 50k × $1.25/1M = $0.0625 per call, so 100 calls in a runaway loop cost $6.25. Set `max_usd: 2.0` as a conservative cap for most research tasks.
- Calibrating `max_usd` from P95 run costs. Run your agent on a representative sample of production inputs, log the dollar cost per run, take the 95th percentile, and set `max_usd` to 2.5× that value. A legitimate run would have to cost more than 2.5× the P95 to trip this cap, so the cap should rarely fire on long but healthy runs. It still catches runaway loops that cost 10× the P95. See autonomous agent cost control best practices for the full calibration methodology.
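The per-call arithmetic and the P95 rule above fit in a few lines of Python. This is a sketch with three caveats:

- `PRICES_PER_1M`, `call_usd`, and `calibrate_max_usd` are hypothetical helpers.
- Only the Gemini 1.5 Pro rate quoted in this section is filled in. List prices change, so refresh the table from Google’s current pricing page.
- The percentile is nearest-rank, which is one of several reasonable definitions.

```python
import math
from typing import Dict, Iterable, Tuple

# (input, output) USD per 1M tokens. Gemini 1.5 Pro rates for prompts up to
# 128k tokens, as quoted above; add the models you actually use.
PRICES_PER_1M: Dict[str, Tuple[float, float]] = {
    "gemini-1.5-pro": (1.25, 5.00),
}


def call_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one model call from its token counts."""
    in_rate, out_rate = PRICES_PER_1M[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


def calibrate_max_usd(run_costs: Iterable[float], multiplier: float = 2.5) -> float:
    """max_usd = multiplier x P95 of observed per-run costs."""
    costs = sorted(run_costs)
    if len(costs) < 20:
        raise ValueError("need at least 20 runs for a meaningful P95")
    # Nearest-rank P95: the smallest cost that is >= 95% of the sample.
    rank = math.ceil(0.95 * len(costs))
    return multiplier * costs[rank - 1]
```

`call_usd("gemini-1.5-pro", 50_000, 0)` reproduces the $0.0625-per-call figure above. Feeding `calibrate_max_usd` the logged costs of 100 representative runs gives 2.5× the 95th-ranked cost.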