Google's Agent Development Kit ships with max_iterations on LoopAgent, Gemini's built-in safety filters, and structured session state via InMemorySessionService. A team builds a document processing pipeline: a SequentialAgent orchestrates a classification LlmAgent and an extraction LlmAgent, with the extraction agent running inside a LoopAgent that retries on low-confidence results. max_iterations=8 bounds each document. The extraction agent calls an external document parsing tool that returns a partial-parse error as a structured JSON response — not a Python exception, just a success response with "status": "partial_parse_error" in the payload. The extraction agent sees the partial parse, generates a refined extraction prompt, and calls the tool again. This repeats eight times. max_iterations fires. The classification agent receives "extraction loop hit max_iterations" and re-submits the document with a different classification hint. The LoopAgent runs eight more times. Fourteen re-submissions later, the session budget is gone.
The problem is structural. ADK's max_iterations counts turns but cannot evaluate whether turns are making progress. It cannot see that all eight iterations are calling the same parsing tool with semantically identical document fragments and receiving the same partial_parse_error response. It cannot see that the classification agent has re-submitted the same document fourteen times, each re-submission spawning a fresh eight-iteration loop on top of a growing session event log. Gemini's safety filters catch policy violations; they don't catch runaway spend driven by soft-failure tool responses. The mechanism you need, a circuit breaker that detects stalled progress and trips before cost multiplies, does not ship in the box.
This post builds that circuit breaker for Google ADK: LoopAgent non-progress detection, subagent back-delegation cycle prevention, session event log inflation monitoring, and ParallelAgent over-spawn limits. Everything is built on ADK's native callback hooks and the Runner's event stream, without monkey-patching or forking ADK. At the end you'll see how RunGuard's runguard.install() wraps any ADK pipeline with one call and handles all four failure modes automatically.
What you'll build: A circuit breaker that detects when LoopAgent iterations keep calling the same failing tool and getting the same result, catches subagent delegation chains that route back to an earlier agent, monitors per-event token cost growth across a long session, and limits concurrent LlmAgent spawns in dynamic ParallelAgent scenarios. Three of the four detectors run inside before_tool_callback, after_tool_callback, before_agent_callback, and after_agent_callback; the inflation monitor reads usage_metadata from the Runner's event stream. The design works with both InMemorySessionService and database-backed sessions.
Why ADK costs more than you modeled
ADK adds three cost amplifiers on top of a plain Gemini API call. None of them are visible to max_iterations or safety filters:
- Session events accumulate in full context. Every model response, tool call, and tool result is stored as an Event in the session object. When the agent makes its next call, ADK serializes the full session event history into the context window. Turn 1 costs the initial prompt plus current input. Turn 5 costs the initial prompt plus all four prior exchanges plus the current input. Turn 10 costs everything from turns 1–9 plus the current turn. If the average exchange adds 500 tokens, turn 10 carries roughly 4,500 tokens of prior context before the current turn's own content. This is intentional (it gives ADK agents conversational memory), but it means cost per turn grows linearly with turn count, and total session cost grows quadratically with the number of turns.
- Subagent transfers carry the full session context. When an LlmAgent transfers to a sub_agent, the target agent receives the complete session event history from the calling agent's session. A transfer at turn 8 of a 500-token-per-turn session carries 4,000 tokens of accumulated history into the subagent's initial context before its first instruction. If that subagent transfers onward to another agent, the accumulated history travels with it again. A five-hop delegation chain at turn 10 pays for the full 10-turn history five separate times, once at each transfer, in addition to each agent's own turn costs.
- LoopAgent iterations compound both amplifiers. LoopAgent runs its sub-agent repeatedly in the same session, so each loop iteration starts with the event history from all prior iterations. Iteration 1 is clean. Iteration 8 carries seven prior iterations of tool calls and model responses. If each iteration costs 800 tokens at iteration 1, iteration 8 costs 800 + (7 × 800) = 6,400 tokens, eight times more than you modeled. max_iterations=8 bounds the count, but by the time it fires you've paid for the full quadratic growth across iterations 5, 6, 7, and 8.
These amplifiers mean an ADK pipeline that respects every per-call parameter can still produce 5–20× the cost you estimated, depending on how many loop iterations accumulate and how many agent transfers occur before the session closes.
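A small cost model makes the first and third amplifiers concrete. It is a sketch that assumes, as the list above does, that every exchange adds the same number of tokens and that each turn re-reads all earlier exchanges in full. turn_costs and session_total are illustrative helpers, not ADK APIs.

```python
def turn_costs(tokens_per_exchange: int, turns: int, fixed_overhead: int = 0) -> list[int]:
    """Tokens billed on each turn when every turn replays all earlier exchanges.

    Turn n pays the fixed overhead (system instruction, tool schemas, initial prompt),
    its own exchange, and the n - 1 earlier exchanges re-read from session history.
    """
    return [fixed_overhead + tokens_per_exchange * n for n in range(1, turns + 1)]


def session_total(tokens_per_exchange: int, turns: int, fixed_overhead: int = 0) -> int:
    """Total tokens billed across the whole session."""
    return sum(turn_costs(tokens_per_exchange, turns, fixed_overhead))


print(turn_costs(500, 10)[-1])  # 5000: 4,500 tokens of history plus 500 of its own
print(session_total(500, 10))   # 27500
print(turn_costs(800, 8)[-1])   # 6400: the LoopAgent iteration-8 figure
```

Against a naive estimate of 10 × 500 = 5,000 tokens, the 10-turn session costs 27,500, a 5.5× gap. With no fixed overhead the gap in this model is (n + 1) / 2 for n turns, which is how per-call-correct pipelines land in the 5–20× range.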
The four failure modes ADK's built-in controls miss
1. LoopAgent non-progress: identical tool results across consecutive iterations
ADK's LoopAgent runs its sub-agents in order, over and over, until one of them emits an event with actions.escalate set to True (typically from a tool that sets tool_context.actions.escalate = True). This is ADK's way of signaling that the loop condition has been satisfied and execution should continue. max_iterations provides a hard ceiling on iterations before forced termination. Neither mechanism evaluates whether iterations are making progress toward the escalation condition. The failure mode: a tool returns a soft-failure response (a status field indicating partial failure, a "retry later" message, an empty result set) that the LLM interprets as "insufficient data, try again with a more targeted query." The model generates a different query and calls the tool again. Each iteration generates a different prompt, but the tool result is structurally identical: same error status, same empty data payload. The loop runs until max_iterations, and the escalation condition is never reached because it depends on the tool succeeding.
Detection signal: the hash of the tool's return value is identical across the last N consecutive iterations for a given tool name. Different call arguments but the same result means the arguments aren't affecting the outcome — the tool is in a failure state that doesn't respond to prompt variation. The trip condition: the same tool returns the same result (by content hash) N consecutive times within the session. max_iterations fires only after the full count is exhausted; the circuit breaker can fire at iteration 3 when the non-progress pattern is established, before iterations 4–8 pay for the remaining quadratic session history growth.
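The trip condition fits in a few lines. The following is a framework-independent sketch; NonProgressDetector is a hypothetical helper, not part of ADK. It hashes a canonical JSON serialization with sorted keys rather than str(), so two dicts with the same content but different key order still compare equal, and a bounded deque keeps only the last N fingerprints.

```python
import hashlib
import json
from collections import deque


class NonProgressDetector:
    """Flags a stall when one tool returns the same (hashed) result N times in a row."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        # Only the most recent `threshold` fingerprints are retained
        self._recent: deque = deque(maxlen=threshold)

    @staticmethod
    def _fingerprint(tool_name: str, response: dict) -> tuple:
        # Sorted keys make the serialization independent of dict insertion order
        canonical = json.dumps(response, sort_keys=True, default=str)
        return (tool_name, hashlib.md5(canonical.encode()).hexdigest())

    def observe(self, tool_name: str, response: dict) -> bool:
        """Record one tool result; return True once the stall condition is met."""
        self._recent.append(self._fingerprint(tool_name, response))
        return len(self._recent) == self.threshold and len(set(self._recent)) == 1


detector = NonProgressDetector(threshold=3)
stuck = {"status": "partial_parse_error", "data": None}
print([detector.observe("parse_document_fragment", stuck) for _ in range(3)])
# [False, False, True]
```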
2. Subagent back-delegation cycle: transfer chains that loop back
ADK's LlmAgent accepts a sub_agents list — child agents the parent can transfer control to during a session. Transfers are one-way per call: the model emits a transfer-to-agent signal, ADK routes to the target, and the target runs until it either produces a final response or transfers onward. The failure mode emerges when Agent A transfers to Agent B, Agent B transfers to Agent C, and Agent C's logic triggers a transfer back to Agent A — either because Agent C has Agent A in its own sub_agents, or because Agent C calls a shared routing tool that dispatches to Agent A based on the current task type.
ADK tracks which agent is currently executing in the session's invocation context, but it does not maintain a transfer history spanning the full current delegation chain. It cannot see that the current invocation of Agent A is a re-entry from Agent C, which was itself delegated by Agent B, which was delegated by Agent A three hops ago. The model driving Agent A has no awareness that it's running inside a cycle; it sees only its own session context. Detection signal: before any agent begins executing, check whether that agent's name is already present in the active delegation chain for the current session invocation. An agent that appears a second time in the chain is cycling — not running a second legitimate session, but re-entering mid-execution of the same delegation tree. Trip at the first re-entry.
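The detection rule amounts to a stack of agent names: push when an agent starts, pop when it returns, and refuse any push of a name that is already on the stack. Here is a framework-independent sketch; DelegationTracker is a hypothetical name, and in ADK you would call enter() from before_agent_callback and exit() from after_agent_callback. It also enforces a maximum nesting depth, since a cycle through distinct agent names is not the only way a chain can run away.

```python
class DelegationTracker:
    """Tracks the active (nested) delegation chain for one session invocation."""

    def __init__(self, max_depth: int = 5):
        self.max_depth = max_depth
        self.chain: list[str] = []

    def enter(self, agent_name: str) -> None:
        # Re-entry: the agent is already waiting somewhere up the chain
        if agent_name in self.chain:
            raise RuntimeError(f"cycle: {' -> '.join(self.chain + [agent_name])}")
        if len(self.chain) >= self.max_depth:
            raise RuntimeError(f"depth {len(self.chain) + 1} exceeds {self.max_depth}")
        self.chain.append(agent_name)

    def exit(self, agent_name: str) -> None:
        # Remove the most recent occurrence; nested agents normally exit in LIFO order
        for i in range(len(self.chain) - 1, -1, -1):
            if self.chain[i] == agent_name:
                del self.chain[i]
                return


tracker = DelegationTracker(max_depth=5)
for name in ("agent_a", "agent_b", "agent_c"):
    tracker.enter(name)
try:
    tracker.enter("agent_a")  # agent_c routes back to agent_a
except RuntimeError as e:
    print(e)  # cycle: agent_a -> agent_b -> agent_c -> agent_a
```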
3. Session event log inflation: O(N²) token cost from accumulated history
ADK's session model stores every event (user message, model response, tool call arguments, tool result) as an Event object in the session. This is the mechanism that gives ADK agents memory within a session. The cost implication: the context window for turn N includes all events from turns 1 through N−1. If the average event pair (model call + response) is 600 tokens, turn 10 carries roughly 5,400 tokens of prior context on top of the current turn's own content. Turn 20 carries 11,400 tokens. The total token spend across a 20-turn session is not 20 × 600 = 12,000 tokens. Because each new turn re-includes all prior turns, it is closer to 600 × (1 + 2 + ... + 20) = 126,000 tokens, and it grows quadratically with turn count.
Detection signal: the token count of recent model invocations is growing faster than the early-session baseline. If the first five events average 800 tokens each and the most recent five events average 2,400 tokens each (a 3× ratio), session history inflation is the dominant cost driver. This growth will continue — turn 25 will cost more than turn 20, which cost more than turn 15. The circuit breaker's job is to alert and optionally trip before the session reaches the long tail of high-cost turns where accumulated history dominates per-turn spend. Unlike max_iterations, the inflation detector fires in proportion to cost growth, not raw turn count.
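The ratio test reduces to a pure function over the per-event token counts. inflation_ratio is a hypothetical helper. Mirroring the signal described above, it waits for two non-overlapping windows of samples before returning a value, so a short session never trips on noise from its first few calls.

```python
from typing import Optional


def inflation_ratio(token_counts: list[int], window: int = 5) -> Optional[float]:
    """Mean of the latest `window` events divided by the mean of the first `window`.

    Returns None until there are at least 2 * window samples, or when the
    baseline mean is zero.
    """
    if len(token_counts) < 2 * window:
        return None
    baseline = sum(token_counts[:window]) / window
    recent = sum(token_counts[-window:]) / window
    return recent / baseline if baseline > 0 else None


# Early events average 800 tokens, the latest five average 2,400
print(inflation_ratio([800] * 5 + [1600] * 5 + [2400] * 5))  # 3.0
```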
4. ParallelAgent over-spawn: concurrent LLM calls that multiply unexpectedly
ADK's ParallelAgent runs all of its sub_agents simultaneously and merges their results. This is intentional and efficient for workloads that parallelize cleanly. The failure mode emerges in dynamic pipelines where the number of ParallelAgent sub-agents is determined at runtime from tool output. An orchestrating LlmAgent calls an index-lookup tool that returns a list of items to process in parallel. It constructs a ParallelAgent with one sub-agent per item. When the index works normally it returns 5–8 items. When the index has a bug — an off-by-one in a pagination cursor, a filter that fails open instead of closed — it returns 400 items. The orchestrator constructs a ParallelAgent with 400 concurrent sub-agents, each making an independent Gemini API call. No individual call is harmful, so safety filters pass. No ADK session mechanism detects the over-spawn. The bill for 400 simultaneous model calls lands before any monitoring fires.
Detection signal: the count of concurrently active LlmAgent invocations in the current session exceeds a configurable limit. Track concurrent invocations via before_agent_callback (increment) and after_agent_callback (decrement). Trip if the active count exceeds max_concurrent_agents. In a production pipeline that normally spawns 5–10 parallel agents, a limit of 20–30 provides headroom for legitimate peaks while catching index-returns-400-items anomalies before they translate into 400 simultaneous API calls.
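The increment/decrement pattern can be exercised without ADK. The following is a simulation sketch: ConcurrencyGuard, fake_agent, and spawn are hypothetical names, asyncio.sleep stands in for a model call, and the enter() and exit() calls mark where before_agent_callback and after_agent_callback would run. It spawns 400 concurrent tasks against a limit of 20.

```python
import asyncio


class ConcurrencyGuard:
    """Counts in-flight agent invocations and refuses any beyond the limit."""

    def __init__(self, max_concurrent: int):
        self.max_concurrent = max_concurrent
        self.active = 0
        self.peak = 0

    def enter(self) -> None:
        # Check before incrementing so a refused invocation is never counted
        if self.active >= self.max_concurrent:
            raise RuntimeError(
                f"over-spawn: invocation #{self.active + 1} exceeds limit {self.max_concurrent}"
            )
        self.active += 1
        self.peak = max(self.peak, self.active)

    def exit(self) -> None:
        self.active = max(0, self.active - 1)


async def fake_agent(guard: ConcurrencyGuard) -> str:
    guard.enter()                  # before_agent_callback equivalent
    try:
        await asyncio.sleep(0.01)  # stands in for a Gemini call
        return "ok"
    finally:
        guard.exit()               # after_agent_callback equivalent


async def spawn(n_items: int, limit: int) -> tuple[int, int, int]:
    """Run n_items agents at once; return (peak concurrency, completed, blocked)."""
    guard = ConcurrencyGuard(limit)
    results = await asyncio.gather(
        *(fake_agent(guard) for _ in range(n_items)), return_exceptions=True
    )
    blocked = sum(isinstance(r, RuntimeError) for r in results)
    return guard.peak, n_items - blocked, blocked


print(asyncio.run(spawn(400, 20)))  # (20, 20, 380)
```

All 400 tasks start before any simulated model call finishes, so the first 20 run and the other 380 are refused at entry. A normal run with 8 items never reaches the limit.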
Building the ADK circuit breaker
ADK's callback hooks are the right instrumentation layer for a circuit breaker. They fire at the boundaries of tool calls and agent invocations, which are exactly the points where cost accumulates. The implementation below uses ADK's four callback hooks to cover non-progress, back-delegation, and over-spawn. Inflation monitoring uses record_event_tokens(), which you feed from the Runner's event stream. None of this requires modifying tool functions, agent definitions, or session service configuration:
```python
import hashlib
import time
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Optional

from google.adk.agents import LlmAgent
from google.adk.agents.callback_context import CallbackContext
from google.adk.tools import BaseTool
from google.adk.tools.tool_context import ToolContext
from google.genai import types


class BreakerState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


@dataclass
class ADKCostBreaker:
    # LoopAgent non-progress detection
    max_identical_tool_results: int = 3
    # Back-delegation cycle detection
    max_delegation_depth: int = 5
    # Session event inflation monitoring
    max_event_cost_ratio: float = 3.0
    cost_baseline_events: int = 5
    # Parallel spawn limit
    max_concurrent_agents: int = 20
    # Seconds an OPEN breaker waits before letting a HALF_OPEN probe through
    half_open_reset_seconds: int = 60

    state: BreakerState = BreakerState.CLOSED
    trip_reason: Optional[str] = None

    # Internal tracking state, shared by every agent this breaker is installed on
    _tool_result_history: list = field(default_factory=list)
    _delegation_chain: list = field(default_factory=list)
    _event_token_counts: list = field(default_factory=list)
    _active_agent_count: int = 0
    _trip_at: Optional[float] = None

    def _check_state(self) -> None:
        if self.state == BreakerState.OPEN:
            if self._trip_at and time.time() - self._trip_at > self.half_open_reset_seconds:
                self.state = BreakerState.HALF_OPEN  # cool-down over: allow one probe
            else:
                raise RuntimeError(
                    f"ADKCostBreaker OPEN, circuit tripped: {self.trip_reason}"
                )

    def before_tool_callback(
        self, tool: BaseTool, args: dict[str, Any], tool_context: ToolContext
    ) -> Optional[dict]:
        self._check_state()
        return None  # None = run the tool normally

    def after_tool_callback(
        self,
        tool: BaseTool,
        args: dict[str, Any],
        tool_context: ToolContext,
        tool_response: dict,
    ) -> Optional[dict]:
        response_str = str(tool_response)
        result_hash = hashlib.md5(response_str.encode()).hexdigest()
        self._tool_result_history.append((tool.name, result_hash))
        # Non-progress: same tool + same result hash N times in a row
        recent = self._tool_result_history[-self.max_identical_tool_results:]
        if len(recent) == self.max_identical_tool_results and len(set(recent)) == 1:
            self._trip(
                f"LoopAgent non-progress: tool '{tool.name}' returned an "
                f"identical result {self.max_identical_tool_results} consecutive "
                f"times; the loop is not making progress toward escalation"
            )
        return None  # None = keep the original tool_response

    def before_agent_callback(
        self, callback_context: CallbackContext
    ) -> Optional[types.Content]:
        self._check_state()
        agent_name = callback_context.agent_name
        # Back-delegation cycle: agent already in the active chain
        if agent_name in self._delegation_chain:
            self._trip(
                f"Subagent back-delegation cycle: agent '{agent_name}' is already "
                f"in the active delegation chain {self._delegation_chain}; "
                f"this transfer re-enters the chain"
            )
        if len(self._delegation_chain) >= self.max_delegation_depth:
            self._trip(
                f"Delegation depth {len(self._delegation_chain) + 1} exceeds "
                f"max_delegation_depth={self.max_delegation_depth}; "
                f"chain: {self._delegation_chain} -> {agent_name}"
            )
        # Parallel spawn guard: checked before counting this agent, so a refused
        # invocation never leaves the counter or the chain incremented
        if self._active_agent_count >= self.max_concurrent_agents:
            self._trip(
                f"ParallelAgent over-spawn: invocation #{self._active_agent_count + 1} "
                f"would exceed the limit of {self.max_concurrent_agents} concurrent agents"
            )
        self._delegation_chain.append(agent_name)
        self._active_agent_count += 1
        return None  # None = run the agent normally

    def after_agent_callback(
        self, callback_context: CallbackContext
    ) -> Optional[types.Content]:
        # ADK passes only callback_context to agent callbacks (no llm_response)
        agent_name = callback_context.agent_name
        # Pop this agent from the delegation chain (last occurrence only)
        for i in range(len(self._delegation_chain) - 1, -1, -1):
            if self._delegation_chain[i] == agent_name:
                del self._delegation_chain[i]
                break
        self._active_agent_count = max(0, self._active_agent_count - 1)
        # Recover from HALF_OPEN if this agent completed without tripping
        if self.state == BreakerState.HALF_OPEN:
            self.state = BreakerState.CLOSED
            self.trip_reason = None
        return None  # None = keep the agent's own output

    def record_event_tokens(self, token_count: int) -> None:
        """Call with total_token_count from event.usage_metadata after each model response."""
        self._event_token_counts.append(token_count)
        if len(self._event_token_counts) >= self.cost_baseline_events * 2:
            baseline_window = self._event_token_counts[: self.cost_baseline_events]
            baseline_avg = sum(baseline_window) / len(baseline_window)
            recent_window = self._event_token_counts[-self.cost_baseline_events:]
            recent_avg = sum(recent_window) / len(recent_window)
            if baseline_avg > 0 and recent_avg / baseline_avg >= self.max_event_cost_ratio:
                self._trip(
                    f"Session event log inflation: recent {self.cost_baseline_events}-event "
                    f"average is {recent_avg:.0f} tokens "
                    f"({recent_avg / baseline_avg:.1f}× the early baseline of "
                    f"{baseline_avg:.0f} tokens); session history is dominating "
                    f"per-turn cost"
                )

    def _trip(self, reason: str) -> None:
        self.state = BreakerState.OPEN
        self.trip_reason = reason
        self._trip_at = time.time()
        raise RuntimeError(f"RunGuard ADKCostBreaker tripped: {reason}")

    def install_on_agent(self, agent: LlmAgent) -> LlmAgent:
        """Attach all four breaker callbacks to an LlmAgent (replacing any already set)."""
        agent.before_tool_callback = self.before_tool_callback
        agent.after_tool_callback = self.after_tool_callback
        agent.before_agent_callback = self.before_agent_callback
        agent.after_agent_callback = self.after_agent_callback
        return agent

    def reset(self) -> None:
        """Reset all tracking state. Call between sessions, not between turns."""
        self.state = BreakerState.CLOSED
        self.trip_reason = None
        self._tool_result_history.clear()
        self._delegation_chain.clear()
        self._event_token_counts.clear()
        self._active_agent_count = 0
        self._trip_at = None
```
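The install_on_agent() method attaches all four callbacks to an LlmAgent, replacing any callbacks already set on it. For a pipeline with multiple LlmAgent instances (an orchestrator, a LoopAgent sub-agent, and parallel specialist agents), attach the same breaker instance to all of them. The shared state (delegation chain, tool history, concurrent count) is what enables cross-agent detection. A breaker installed only on the orchestrator won't see tool calls made inside the loop agent's sub-agent. One caveat: the delegation chain is a single flat list, so sibling agents running concurrently under a ParallelAgent all count toward max_delegation_depth. If you fan out in parallel, set max_delegation_depth above your expected fan-out and let max_concurrent_agents govern that part of the tree.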
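Attaching the breaker agent by agent is easy to get wrong as a pipeline grows. A minimal sketch that walks the tree instead, assuming only that each agent exposes its children in a sub_agents list (as ADK's BaseAgent does); iter_agent_tree and install_on_tree are hypothetical helpers, not ADK APIs. With ADK you would call install_on_tree(breaker, pipeline, lambda a: isinstance(a, LlmAgent)).

```python
from typing import Any, Callable, Iterator


def iter_agent_tree(root: Any) -> Iterator[Any]:
    """Depth-first walk over an agent and everything reachable via sub_agents."""
    stack = [root]
    while stack:
        agent = stack.pop()
        yield agent
        # Push children reversed so they are visited in declaration order
        stack.extend(reversed(list(getattr(agent, "sub_agents", None) or [])))


def install_on_tree(breaker: Any, root: Any, should_install: Callable[[Any], bool]) -> int:
    """Attach one shared breaker to every agent matching should_install.

    Returns the number of agents the breaker was installed on.
    """
    installed = 0
    for agent in iter_agent_tree(root):
        if should_install(agent):
            breaker.install_on_agent(agent)
            installed += 1
    return installed
```

The predicate keeps workflow agents such as SequentialAgent and LoopAgent out of the delegation chain, matching the per-agent installs in the examples that follow.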
Catching LoopAgent non-progress: the document processing example
The after_tool_callback captures the MD5 hash of every tool response and appends it to the session-level history. Once the last N entries share the same tool name and response hash, the session is looping without progress. Here's the full pipeline from the intro, with the breaker installed:
```python
from google.adk.agents import LlmAgent, LoopAgent, SequentialAgent
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.adk.tools.tool_context import ToolContext
from google.genai import types as genai_types


def parse_document_fragment(fragment: str, hint: str = "") -> dict:
    """Call external document parser. Returns structured data or error status."""
    # document_parser_api is your parser client (hypothetical, not shown)
    response = document_parser_api.parse(fragment, hint=hint)
    if not response.ok:
        return {
            "status": "partial_parse_error",
            "data": None,
            "retry_hint": response.error_code,
        }
    return {"status": "success", "data": response.structured_data}


def exit_loop(tool_context: ToolContext) -> dict:
    """Signal the enclosing LoopAgent that extraction succeeded."""
    # LoopAgent stops iterating when a sub-agent event carries actions.escalate=True
    tool_context.actions.escalate = True
    return {}


breaker = ADKCostBreaker(
    max_identical_tool_results=3,  # trip after 3 identical parse results
    max_delegation_depth=4,
    max_event_cost_ratio=3.0,
    max_concurrent_agents=15,
)

extraction_agent = LlmAgent(
    name="extraction_agent",
    model="gemini-2.0-flash",
    instruction=(
        "Extract structured data from document fragments. "
        "Call parse_document_fragment with the fragment and your best refinement hint. "
        "If extraction succeeds, call exit_loop. "
        "If partial_parse_error, refine your extraction approach and retry."
    ),
    tools=[parse_document_fragment, exit_loop],
)
breaker.install_on_agent(extraction_agent)

# An ADK agent can have only one parent, so extraction_agent lives under
# loop_agent; the SequentialAgent hands control to the loop after this step.
classification_agent = LlmAgent(
    name="classification_agent",
    model="gemini-2.0-flash",
    instruction=(
        "Classify the document type and prepare it for extraction. "
        "Reply with the document type and any extraction hints."
    ),
)
breaker.install_on_agent(classification_agent)

loop_agent = LoopAgent(
    name="document_loop",
    sub_agents=[extraction_agent],
    max_iterations=8,  # hard ceiling; the breaker fires earlier on non-progress
)

pipeline = SequentialAgent(
    name="document_pipeline",
    sub_agents=[classification_agent, loop_agent],
)


async def process_document(document_text: str) -> str:
    session_service = InMemorySessionService()
    session = await session_service.create_session(
        app_name="document_pipeline",
        user_id="batch_processor",
    )
    runner = Runner(
        agent=pipeline,
        app_name="document_pipeline",
        session_service=session_service,
    )
    final_text = "NO_OUTPUT"
    try:
        async for event in runner.run_async(
            user_id="batch_processor",
            session_id=session.id,
            new_message=genai_types.Content(
                role="user",
                parts=[genai_types.Part(text=document_text)],
            ),
        ):
            # Record token counts for inflation monitoring
            if event.usage_metadata and event.usage_metadata.total_token_count:
                breaker.record_event_tokens(event.usage_metadata.total_token_count)
            # Each sub-agent emits its own final response; keep the last one
            # rather than returning as soon as the classifier finishes
            if event.is_final_response() and event.content and event.content.parts:
                final_text = event.content.parts[0].text or final_text
    except RuntimeError as e:
        return f"BREAKER_TRIPPED: {e}"
    finally:
        breaker.reset()
    return final_text
```
With max_identical_tool_results=3, the breaker fires at the third consecutive partial_parse_error (assuming one parse call per iteration) instead of letting the loop run all eight iterations. At iteration 3 the session history is still small; iterations 4–8 would each carry the growing accumulated context from all prior iterations. The early trip saves roughly 70–79% of what the full max_iterations run would have cost, because iterations 1–3 are the cheapest in the run and iterations 7 and 8 are the most expensive.
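Where the saving lands depends on how much of each iteration is fixed overhead that does not grow. A quick sketch under the per-iteration model from the amplifier list (iteration i pays its own 800 tokens plus the replayed history of the i - 1 earlier iterations); early_trip_savings is a hypothetical helper, and the 2,000-token overhead below is an arbitrary illustration, not a measured figure.

```python
def early_trip_savings(per_iter_tokens: int, trip_at: int, max_iters: int,
                       fixed_overhead: int = 0) -> float:
    """Fraction of loop tokens saved by stopping at trip_at instead of max_iters.

    Iteration i costs fixed_overhead + i * per_iter_tokens: its own content plus
    the replayed history of the i - 1 earlier iterations.
    """
    def cost(n: int) -> int:
        return sum(fixed_overhead + per_iter_tokens * i for i in range(1, n + 1))

    return 1 - cost(trip_at) / cost(max_iters)


print(round(early_trip_savings(800, 3, 8), 3))                       # 0.833
print(round(early_trip_savings(800, 3, 8, fixed_overhead=2000), 3))  # 0.759
```

With no fixed overhead the saving is 4,800 / 28,800 tokens avoided in reverse, i.e. 5/6 or about 83%. Adding a non-growing per-call overhead (system instruction, tool schemas) pulls it down toward the mid-70s.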
Catching subagent back-delegation: the orchestrator cycle example
The delegation chain check fires in before_agent_callback — before the agent makes any model call — so cycles are caught before they generate tokens at the re-entry level. Here's a pipeline where an orchestrator and a research specialist can inadvertently cycle through a shared routing tool:
```python
from google.adk.agents import LlmAgent
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.genai import types as genai_types


def route_to_specialist(task_type: str, task: str) -> str:
    """Shared routing tool. If task_type == 'orchestrate', routes to orchestrator.
    This creates a back-delegation cycle when the specialist calls it."""
    # dispatch_to_agent is your application's router (hypothetical, not shown).
    # It is assumed to run the named agent, so that agent's before_agent_callback
    # fires on the shared breaker.
    return dispatch_to_agent(task_type, task)


# Both agents share the same breaker: delegation tracking spans every agent it is on
cycle_breaker = ADKCostBreaker(max_delegation_depth=3)

research_specialist = LlmAgent(
    name="research_specialist",
    model="gemini-2.0-flash",
    instruction=(
        "Perform deep research on a topic. "
        "Use route_to_specialist if you need orchestration support."
    ),
    tools=[route_to_specialist],
)
cycle_breaker.install_on_agent(research_specialist)

orchestrator = LlmAgent(
    name="orchestrator",
    model="gemini-2.0-flash",
    instruction="Coordinate research tasks and delegate to research_specialist.",
    sub_agents=[research_specialist],
    tools=[route_to_specialist],
)
cycle_breaker.install_on_agent(orchestrator)


async def run_research_pipeline(query: str) -> str:
    session_service = InMemorySessionService()
    session = await session_service.create_session(
        app_name="research_pipeline",
        user_id="researcher",
    )
    runner = Runner(
        agent=orchestrator,
        app_name="research_pipeline",
        session_service=session_service,
    )
    try:
        async for event in runner.run_async(
            user_id="researcher",
            session_id=session.id,
            new_message=genai_types.Content(
                role="user",
                parts=[genai_types.Part(text=query)],
            ),
        ):
            if event.is_final_response() and event.content and event.content.parts:
                return event.content.parts[0].text
    except RuntimeError as e:
        return f"CYCLE_DETECTED: {e}"
    finally:
        cycle_breaker.reset()
    return "NO_OUTPUT"
```
When the orchestrator starts, the delegation chain is ["orchestrator"]. When it transfers to research_specialist, the chain becomes ["orchestrator", "research_specialist"]. If the specialist calls route_to_specialist("orchestrate", ...) and that dispatches back to the orchestrator, before_agent_callback fires with agent_name="orchestrator", finds "orchestrator" already in the chain, and trips the breaker before the orchestrator makes a single model call at depth 3. No re-entry tokens are generated at the cycle point.
Monitoring session event log inflation
The record_event_tokens() method takes the total_token_count from the usage_metadata on each model response event. The first cost_baseline_events calls establish the per-turn baseline. Once the session has at least 2 × cost_baseline_events data points, the detector compares the average of the most recent cost_baseline_events events against that baseline, and a ratio at or above max_event_cost_ratio triggers a trip. For sessions where you want inflation monitoring without stopping execution, turn the trip into a soft alert:
```python
# Option 1, soft alert: keep the real threshold but log instead of raising.
# Use this instance for inflation monitoring only: overriding _trip also
# silences every other detector on it.
class SoftAlertBreaker(ADKCostBreaker):
    def _trip(self, reason: str) -> None:
        # Do not raise or open the circuit; record the signal and continue
        print(f"COST_ALERT: {reason}")


inflation_breaker = SoftAlertBreaker(
    max_event_cost_ratio=3.0,
    cost_baseline_events=5,
)


# Option 2, alternatively or additionally: log the ratio yourself on every turn
async def run_with_cost_logging(runner, user_id: str, session_id: str, new_message):
    n = inflation_breaker.cost_baseline_events
    async for event in runner.run_async(
        user_id=user_id, session_id=session_id, new_message=new_message
    ):
        if event.usage_metadata and event.usage_metadata.total_token_count:
            inflation_breaker.record_event_tokens(event.usage_metadata.total_token_count)
            # Manual ratio check for logging only
            counts = inflation_breaker._event_token_counts
            if len(counts) >= 2 * n:
                baseline = sum(counts[:n]) / n
                recent = sum(counts[-n:]) / n
                if baseline > 0:
                    print(f"Turn cost ratio: {recent / baseline:.2f}× baseline")
        # No early break: in multi-agent pipelines every sub-agent emits its own
        # final response, so let the stream run to completion
```
For sessions where correctness requires the full turn count and early termination is unacceptable, use the soft-alert pattern to build cost visibility data without blocking execution. Feed the ratio data into your observability platform and set separate alerting thresholds there. The circuit breaker's role in that context shifts from prevention to early warning — still valuable, and easier to tune when you have historical ratio data to calibrate against.
RunGuard integration: one call for all four ADK failure modes
The circuit breaker above requires instantiating ADKCostBreaker and calling install_on_agent() on each agent in the pipeline. RunGuard's runguard.install() handles this automatically: it patches ADK's LlmAgent.__init__ to attach the circuit breaker callbacks at construction time, instruments the Runner's event stream to capture token counts without modifying your event loop, and maintains a shared breaker state across all agents in the current session. The result is single-line protection with no changes to agent definitions, tool functions, or session service configuration:
```python
import runguard
from google.adk.agents import SequentialAgent
from google.adk.runners import Runner

# Patch at import time: all subsequent LlmAgent instances and Runner
# event streams are automatically protected
runguard.install(
    max_identical_tool_results=3,
    max_delegation_depth=5,
    max_event_cost_ratio=3.0,
    max_concurrent_agents=20,
    # send_slack_alert is your own alerting function (not shown)
    on_trip=lambda reason: send_slack_alert(f"ADK breaker tripped: {reason}"),
)

# Your existing code is unchanged:
pipeline = SequentialAgent(
    name="document_pipeline",
    sub_agents=[classification_agent, loop_agent],
)
runner = Runner(agent=pipeline, app_name="document_pipeline", ...)


async def main():
    result = None
    async for event in runner.run_async(...):
        if event.is_final_response():
            result = event.content.parts[0].text
    return result
```
RunGuard instruments ADK's callback hooks at the framework level using the same before_tool_callback and before_agent_callback mechanism shown above, so it's compatible with any ADK callbacks you've already registered. If you have existing callbacks, RunGuard chains them: the breaker runs first (and raises if the circuit is open), then your callback runs. For after_tool_callback and after_agent_callback, RunGuard runs after your existing callback to inspect the final response. Your existing callbacks are preserved; the breaker wraps the full sequence.
What this saves in practice
| Scenario | Without breaker | With breaker | Saved |
|---|---|---|---|
| LoopAgent non-progress: partial_parse_error on all 8 iterations; each iteration pays for quadratic session history growth | $4.20 (8 iterations, escalating history cost) | $0.90 (3 iterations, trip on identical results) | 79% |
| Back-delegation cycle: orchestrator → specialist → route_to_specialist → orchestrator, 6 re-entry levels before stack overflow | $5.80 (6 delegation levels, full session context at each) | $0.95 (2 levels, cycle detected at re-entry) | 84% |
| Session event inflation: 40-turn research session, per-turn cost reaches 3.5× baseline at turn 22 | $9.40 (40 turns with quadratic history) | $3.20 (22 turns before inflation trip) | 66% |
| ParallelAgent over-spawn: index tool returns 400 items instead of the expected 8; pipeline spawns 400 concurrent agents | $380.00 (400 simultaneous Gemini API calls) | $19.00 (20 agents before spawn-limit trip) | 95% |

The ParallelAgent over-spawn scenario shows the largest absolute savings because cost scales linearly with the number of spawned agents: 400 simultaneous calls cost 50× more than the expected 8. The circuit breaker fires at the 21st concurrent invocation. The 20 agents already in flight complete normally; the 21st and all subsequent spawns are blocked. At the same $0.95 per agent implied by the unchecked case, those 20 agents cost $19.00, which is still 95% below the $380.00 run. This is the failure mode that produces $300–$400 Gemini bills from a single misconfigured pipeline run, and it's the most important one to gate against before you have production traffic at scale.
Frequently asked questions
Does this work with Runner.run() (synchronous) in addition to Runner.run_async()?
Yes, with one caveat. The ADKCostBreaker callbacks don't use asyncio primitives internally: they use plain Python data structures and raise a RuntimeError synchronously when a trip condition is met, so they behave the same under Runner.run() and Runner.run_async(). The caveat is propagation. In recent ADK releases, Runner.run() is a convenience wrapper that drives run_async() on a background thread and hands events back through a queue, so an exception raised inside a callback may end the event stream without reaching your try/except. Check this against your ADK version; if the exception doesn't surface, inspect breaker.state and breaker.trip_reason after the loop finishes. The record_event_tokens() method works with both sync and async iteration: call it on each event as you iterate, whether the iterator is a regular for-loop or an async for-loop. RunGuard's runguard.install() detects the execution context at runtime and instruments the appropriate code path automatically.
The non-progress detector hashes the full tool response. What if a legitimate response varies slightly each time (timestamps, request IDs) but is functionally the same failure?
The default MD5 approach hashes str(tool_response) in full. If your tool response includes a varying request_id, timestamp, or trace_id field alongside an otherwise static error status, the hashes will differ even though the response indicates the same failure. Normalize the response before hashing: strip the varying fields before passing to str(), or hash only the semantically meaningful fields (status, error_code, data). RunGuard's runguard.install() accepts a tool_result_normalizer parameter — a dict[str, Callable] mapping tool names to normalization functions applied before hashing. For the document parser example: tool_result_normalizer={"parse_document_fragment": lambda r: {k: v for k, v in r.items() if k != "request_id"}}.
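A recursive version of that normalization, as a sketch: the VOLATILE_KEYS names are assumptions, so substitute the fields your tool actually emits. It also serializes with sorted keys, so two responses built with different key order hash the same, which str() does not guarantee. To use it with ADKCostBreaker, replace the str(tool_response) hashing in after_tool_callback with a call to result_hash.

```python
import hashlib
import json
from typing import Any

# Fields that change on every call but say nothing about success or failure.
# Hypothetical names: adjust to whatever your tool actually returns.
VOLATILE_KEYS = frozenset({"request_id", "timestamp", "trace_id"})


def normalize(response: Any, volatile: frozenset = VOLATILE_KEYS) -> Any:
    """Recursively drop volatile keys so equivalent failures compare equal."""
    if isinstance(response, dict):
        return {k: normalize(v, volatile) for k, v in response.items() if k not in volatile}
    if isinstance(response, list):
        return [normalize(v, volatile) for v in response]
    return response


def result_hash(response: Any) -> str:
    """MD5 of the normalized response, serialized as canonical JSON."""
    canonical = json.dumps(normalize(response), sort_keys=True, default=str)
    return hashlib.md5(canonical.encode()).hexdigest()


a = {"status": "partial_parse_error", "data": None, "request_id": "r-1", "timestamp": 1718000000}
b = {"timestamp": 1718000042, "request_id": "r-2", "data": None, "status": "partial_parse_error"}
print(result_hash(a) == result_hash(b))  # True
```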
Can I use the breaker alongside ADK's built-in max_iterations, or do they conflict?
max_iterations and ADKCostBreaker are complementary. Set max_iterations as the absolute hard ceiling on loop count: it's ADK's own guardrail and fires regardless of external breaker state. Set the breaker's max_identical_tool_results to a lower threshold that fires earlier when non-progress is detected. In the document processing example, max_iterations=8 is the hard cap and max_identical_tool_results=3 is the early trip. Normal runs where the tool succeeds finish before reaching either limit. Runs where the tool is stuck trip the breaker at iteration 3, saving the cost of iterations 4–8. The two mechanisms operate independently (the breaker fires via callback hooks, max_iterations via ADK's loop termination logic), so they don't interfere.
How does the delegation depth limit interact with pipelines that have legitimate deep agent hierarchies?
The delegation chain tracks the active transfer path, not the total number of agent invocations in the session. An orchestrator that calls specialist A, gets a result, then calls specialist B, then calls specialist C has a maximum delegation depth of 2 at any moment (orchestrator + one active specialist), not 4. The depth counter increments when an agent starts and decrements when it returns. Only agents that are concurrently nested (A waiting for B, which is waiting for C) count against the depth limit. A sequential orchestrator that makes 20 transfers to specialists, one at a time, stays at depth 2 throughout the session. Set max_delegation_depth to the maximum simultaneous nesting your pipeline intentionally reaches, plus 2 as headroom. A pipeline with intentional 3-level nesting (orchestrator → specialist → sub-specialist) should set the limit at 5 or 6.
What's the performance overhead of the callback instrumentation?
The callback hooks execute hash computation (hashlib.md5() on the tool response string), list operations (append, last-N comparison), and integer increments/decrements. For a 200-token tool response, the MD5 hash takes roughly 1–2 microseconds on modern hardware. The list comparison for non-progress detection is O(N) over the last max_identical_tool_results entries, typically 3–5 comparisons. The total overhead per tool call is under 10 microseconds, which is at most about 0.002% of the latency of a Gemini API call (typically 500–2000 ms). The inflation monitor appends each token count to a list and averages two fixed-size windows, so its per-event cost stays constant however long the session runs. The instrumentation cost is negligible relative to model inference latency.
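To check the hashing cost on your own hardware, a short timeit sketch works. The sample payload is an assumption: roughly 200 tokens of text, using the common approximation of about 4 characters per token. Results vary by machine, so no expected figure is given.

```python
import hashlib
import timeit

# About 200 tokens of response text (assumes ~4 characters per token)
response_str = str({"status": "partial_parse_error", "data": None, "detail": "x" * 760})


def per_call_microseconds(number: int = 100_000) -> float:
    """Average time for one str-encode-MD5 round, in microseconds."""
    total = timeit.timeit(
        lambda: hashlib.md5(response_str.encode()).hexdigest(), number=number
    )
    return total / number * 1e6


print(f"{per_call_microseconds():.2f} µs per hash")
```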