Google ADK Cost Control: Loop Detection and Budget Enforcement in Production

Google's Agent Development Kit ships with max_iterations on LoopAgent, Gemini's built-in safety filters, and structured session state via InMemorySessionService. A team builds a document processing pipeline: a SequentialAgent orchestrates a classification LlmAgent and an extraction LlmAgent, with the extraction agent running inside a LoopAgent that retries on low-confidence results. max_iterations=8 bounds each document. The extraction agent calls an external document parsing tool that returns a partial-parse error as a structured JSON response — not a Python exception, just a success response with "status": "partial_parse_error" in the payload. The extraction agent sees the partial parse, generates a refined extraction prompt, and calls the tool again. This repeats eight times. max_iterations fires. The classification agent receives "extraction loop hit max_iterations" and re-submits the document with a different classification hint. The LoopAgent runs eight more times. Fourteen re-submissions later, the session budget is gone.

The problem is structural. ADK's max_iterations counts turns but cannot evaluate whether turns are making progress. It cannot see that all eight iterations are calling the same parsing tool with semantically identical document fragments and receiving the same partial_parse_error response. It cannot see that the classification agent has submitted the same document fifteen times (the original plus fourteen re-submissions), each spawning a fresh eight-iteration loop with a growing session event log. ADK's safety filters catch policy violations; they don't catch runaway bill generation via soft-failure tool responses. The mechanism you need, a circuit breaker that detects stalled progress and trips before cost multiplies, does not ship in the box.

This post builds that circuit breaker for Google ADK: LoopAgent non-progress detection, subagent back-delegation cycle prevention, session event log inflation monitoring, and ParallelAgent over-spawn limits — all via ADK's native callback hooks, without monkey-patching or forking ADK. At the end you'll see how RunGuard's runguard.install() wraps any ADK pipeline with one call and handles all four failure modes automatically.

What you'll build: A circuit breaker that detects when LoopAgent iterations are calling the same failing tool with the same result, catches subagent delegation chains that route back to an earlier agent, monitors per-event token cost growth across a long session, and limits concurrent LlmAgent spawns in dynamic ParallelAgent scenarios — all implemented via before_tool_callback, after_tool_callback, before_agent_callback, and after_agent_callback, compatible with both InMemorySessionService and database-backed sessions.

Why ADK costs more than you modeled

ADK adds three cost amplifiers on top of a plain Gemini API call. None of them are visible to max_iterations or safety filters:

Session events accumulate in full context. Every model response, tool call, and tool result is stored as an Event in the session object. When the agent makes its next call, ADK serializes the full session event history into the context window. Turn 1 costs the initial prompt plus current input. Turn 5 costs the initial prompt plus all four prior exchanges plus the current input. Turn 10 costs everything from turns 1–9 plus the current turn. If the average exchange adds 500 tokens, turn 10 carries roughly 4,500 tokens of prior context before the current turn's own content. This is intentional — it gives ADK agents conversational memory — but it means cost per turn grows linearly with turn count, and total session cost grows quadratically with the number of turns.
Subagent transfers carry the full session context. When an LlmAgent transfers to a sub_agent, the target agent receives the complete session event history from the calling agent's session. A transfer at turn 8 of a 500-token-per-turn session carries 4,000 tokens of accumulated history into the subagent's initial context before its first instruction. If that subagent transfers onward to another agent, the accumulated history travels with it again. A five-hop delegation chain at turn 10 pays for the full 10-turn history five separate times — once at each transfer — in addition to each agent's own turn costs.
LoopAgent iterations compound both amplifiers. LoopAgent runs its sub-agent repeatedly in the same session, so each loop iteration starts with the event history from all prior iterations. Iteration 1 is clean. Iteration 8 carries seven prior iterations of tool calls and model responses. If each iteration adds 800 tokens, iteration 8 costs 800 + (7 × 800) = 6,400 tokens, eight times the 800 tokens you modeled per iteration. max_iterations=8 bounds the count, but by the time it fires you've already paid for the most expensive iterations, 5 through 8, where accumulated history dominates the bill.

These amplifiers mean an ADK pipeline that respects every per-call parameter can still produce 5–20× the cost you estimated, depending on how many loop iterations accumulate and how many agent transfers occur before the session closes.
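
The arithmetic behind these amplifiers can be sketched in a few lines. This is a simplified model rather than ADK code: it assumes every turn adds the same number of tokens and that the full history is resent on each call. The function names are illustrative, not part of any library.

```python
def prior_context(turn: int, tokens_per_turn: int) -> int:
    """Tokens of accumulated history carried into a given turn."""
    return (turn - 1) * tokens_per_turn


def turn_cost(turn: int, tokens_per_turn: int) -> int:
    """Tokens billed on a given turn: all prior turns plus the current one."""
    return turn * tokens_per_turn


def session_total(turns: int, tokens_per_turn: int) -> int:
    """Total tokens billed across a session: (1 + 2 + ... + turns) * tokens_per_turn."""
    return sum(turn_cost(t, tokens_per_turn) for t in range(1, turns + 1))


if __name__ == "__main__":
    print(prior_context(10, 500))  # history carried into turn 10
    print(turn_cost(8, 800))       # cost of loop iteration 8
    print(session_total(8, 800))   # the whole 8-iteration loop
    print(8 * 800)                 # what a flat per-iteration estimate predicts
```

Under these assumptions an eight-iteration loop at 800 tokens per iteration bills 28,800 tokens, 4.5 times the flat estimate of 6,400. Real sessions add fixed overhead (system instructions, tool schemas) on top, so treat the model as a lower bound on how fast cost grows.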

The four failure modes ADK's built-in controls miss

1. LoopAgent non-progress: identical tool results across consecutive iterations

ADK's LoopAgent calls its sub-agents in a loop until one of them escalates, that is, emits an event whose actions.escalate flag is set (typically from a tool that sets tool_context.actions.escalate = True). Escalation is ADK's convention for signaling that the loop condition has been satisfied and execution should continue past the loop. max_iterations provides a hard ceiling on iterations before forced termination. Neither mechanism evaluates whether iterations are making progress toward the escalation condition. The failure mode: a tool returns a soft-failure response (a status field indicating partial failure, a "retry later" message, an empty result set) and the LLM interprets it as "insufficient data, try again with a more targeted query." The model generates a different query and calls the tool again. Each iteration generates a different prompt, but the tool result is structurally identical: same error status, same empty data payload. The loop runs until max_iterations, and the escalation condition is never reached because it depends on the tool succeeding.

Detection signal: the hash of the tool's return value is identical across the last N consecutive iterations for a given tool name. Different call arguments but the same result means the arguments aren't affecting the outcome — the tool is in a failure state that doesn't respond to prompt variation. The trip condition: the same tool returns the same result (by content hash) N consecutive times within the session. max_iterations fires only after the full count is exhausted; the circuit breaker can fire at iteration 3 when the non-progress pattern is established, before iterations 4–8 pay for the remaining quadratic session history growth.

2. Subagent back-delegation cycle: transfer chains that loop back

ADK's LlmAgent accepts a sub_agents list — child agents the parent can transfer control to during a session. Transfers are one-way per call: the model emits a transfer-to-agent signal, ADK routes to the target, and the target runs until it either produces a final response or transfers onward. The failure mode emerges when Agent A transfers to Agent B, Agent B transfers to Agent C, and Agent C's logic triggers a transfer back to Agent A — either because Agent C has Agent A in its own sub_agents, or because Agent C calls a shared routing tool that dispatches to Agent A based on the current task type.

ADK tracks which agent is currently executing in the session's invocation context, but it does not maintain a transfer history spanning the full current delegation chain. It cannot see that the current invocation of Agent A is a re-entry from Agent C, which was itself delegated by Agent B, which was delegated by Agent A three hops ago. The model driving Agent A has no awareness that it's running inside a cycle; it sees only its own session context. Detection signal: before any agent begins executing, check whether that agent's name is already present in the active delegation chain for the current session invocation. An agent that appears a second time in the chain is cycling — not running a second legitimate session, but re-entering mid-execution of the same delegation tree. Trip at the first re-entry.

3. Session event log inflation: O(N²) token cost from accumulated history

ADK's session model stores every event (model invocation input, model response, tool call arguments, tool result) as an Event object in the session. This is the mechanism that gives ADK agents memory within a session. The cost implication: the context window for turn N includes all events from turns 1 through N−1. If the average event pair (model call + response) is 600 tokens, turn 10 carries roughly 5,400 tokens of prior context on top of the current turn's own content. Turn 20 carries 11,400 tokens. The total token spend across a 20-turn session is not 20 × 600 = 12,000 tokens. Because each new turn re-includes all prior turns in the context, it is closer to 600 × (1 + 2 + ... + 20) = 126,000 tokens, and it keeps growing quadratically with turn count.

Detection signal: the token count of recent model invocations is growing faster than the early-session baseline. If the first five events average 800 tokens each and the most recent five events average 2,400 tokens each (a 3× ratio), session history inflation is the dominant cost driver. This growth will continue — turn 25 will cost more than turn 20, which cost more than turn 15. The circuit breaker's job is to alert and optionally trip before the session reaches the long tail of high-cost turns where accumulated history dominates per-turn spend. Unlike max_iterations, the inflation detector fires in proportion to cost growth, not raw turn count.

4. ParallelAgent over-spawn: concurrent LLM calls that multiply unexpectedly

ADK's ParallelAgent runs all of its sub_agents simultaneously and merges their results. This is intentional and efficient for workloads that parallelize cleanly. The failure mode emerges in dynamic pipelines where the number of ParallelAgent sub-agents is determined at runtime from tool output. An orchestrating LlmAgent calls an index-lookup tool that returns a list of items to process in parallel. It constructs a ParallelAgent with one sub-agent per item. When the index works normally it returns 5–8 items. When the index has a bug — an off-by-one in a pagination cursor, a filter that fails open instead of closed — it returns 400 items. The orchestrator constructs a ParallelAgent with 400 concurrent sub-agents, each making an independent Gemini API call. No individual call is harmful, so safety filters pass. No ADK session mechanism detects the over-spawn. The bill for 400 simultaneous model calls lands before any monitoring fires.

Detection signal: the count of concurrently active LlmAgent invocations in the current session exceeds a configurable limit. Track concurrent invocations via before_agent_callback (increment) and after_agent_callback (decrement). Trip if the active count exceeds max_concurrent_agents. In a production pipeline that normally spawns 5–10 parallel agents, a limit of 20–30 provides headroom for legitimate peaks while catching index-returns-400-items anomalies before they translate into 400 simultaneous API calls.

Building the ADK circuit breaker

ADK's callback hooks are the right instrumentation layer for a circuit breaker. They fire at the boundaries of tool calls and agent invocations — exactly the points where cost accumulates. The implementation below uses ADK's four callback hooks to cover all four failure modes without modifying tool functions, agent definitions, or session service configuration:

from google.adk.agents import LlmAgent
from google.adk.tools import BaseTool
import hashlib, time
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional, Any


class BreakerState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


@dataclass
class ADKCostBreaker:
    # LoopAgent non-progress detection
    max_identical_tool_results: int = 3
    # Back-delegation cycle detection
    max_delegation_depth: int = 5
    # Session event inflation monitoring
    max_event_cost_ratio: float = 3.0
    cost_baseline_events: int = 5
    # Parallel spawn limit
    max_concurrent_agents: int = 20
    # HALF_OPEN recovery window
    half_open_reset_seconds: int = 60

    state: BreakerState = BreakerState.CLOSED
    trip_reason: Optional[str] = None

    # Internal tracking state
    _tool_result_history: list = field(default_factory=list)
    _delegation_chain: list = field(default_factory=list)
    _event_token_counts: list = field(default_factory=list)
    _active_agent_count: int = 0
    _trip_at: Optional[float] = None

    def _check_state(self):
        if self.state == BreakerState.OPEN:
            if self._trip_at and time.time() - self._trip_at > self.half_open_reset_seconds:
                self.state = BreakerState.HALF_OPEN
            else:
                raise RuntimeError(
                    f"ADKCostBreaker OPEN — circuit tripped: {self.trip_reason}"
                )

    def before_tool_callback(
        self, tool: BaseTool, args: dict[str, Any], tool_context
    ) -> Optional[dict]:
        self._check_state()
        return None  # None = proceed normally

    def after_tool_callback(
        self,
        tool: BaseTool,
        args: dict[str, Any],
        tool_context,
        tool_response: dict,
    ) -> Optional[dict]:
        response_str = str(tool_response)
        result_hash = hashlib.md5(response_str.encode()).hexdigest()
        self._tool_result_history.append((tool.name, result_hash))

        # Non-progress: same tool + same result hash N times in a row
        if len(self._tool_result_history) >= self.max_identical_tool_results:
            recent = self._tool_result_history[-self.max_identical_tool_results:]
            if all(
                r[0] == recent[0][0] and r[1] == recent[0][1]
                for r in recent
            ):
                self._trip(
                    f"LoopAgent non-progress: tool '{tool.name}' returned "
                    f"identical result {self.max_identical_tool_results} consecutive "
                    f"times — loop is not making progress toward escalation condition"
                )
        return None  # None = use original tool_response

    def before_agent_callback(self, callback_context) -> Optional[Any]:
        self._check_state()
        agent_name = callback_context.agent_name

        # Back-delegation cycle: agent already in active chain
        if agent_name in self._delegation_chain:
            self._trip(
                f"Subagent back-delegation cycle: agent '{agent_name}' is already "
                f"in the active delegation chain {self._delegation_chain} — "
                f"transfer is creating a re-entry loop"
            )

        if len(self._delegation_chain) >= self.max_delegation_depth:
            self._trip(
                f"Delegation depth {len(self._delegation_chain) + 1} exceeds "
                f"max_delegation_depth={self.max_delegation_depth} — "
                f"chain: {self._delegation_chain} → {agent_name}"
            )

        self._delegation_chain.append(agent_name)
        self._active_agent_count += 1

        # Parallel spawn guard
        if self._active_agent_count > self.max_concurrent_agents:
            self._trip(
                f"ParallelAgent over-spawn: {self._active_agent_count} concurrent "
                f"agent invocations active, limit is {self.max_concurrent_agents}"
            )
        return None  # None = proceed normally

    def after_agent_callback(self, callback_context) -> Optional[Any]:
        # ADK passes only the callback context to after_agent_callback;
        # the (callback_context, llm_response) pair belongs to after_model_callback.
        agent_name = callback_context.agent_name
        # Pop this agent from the delegation chain (last occurrence only)
        for i in range(len(self._delegation_chain) - 1, -1, -1):
            if self._delegation_chain[i] == agent_name:
                self._delegation_chain.pop(i)
                break
        self._active_agent_count = max(0, self._active_agent_count - 1)

        # Recover from HALF_OPEN if this agent completed without tripping
        if self.state == BreakerState.HALF_OPEN:
            self.state = BreakerState.CLOSED
            self.trip_reason = None

        return None  # None = keep the agent's original output

    def record_event_tokens(self, token_count: int):
        """Call with total_token_count from event.usage_metadata after each model response."""
        self._event_token_counts.append(token_count)

        if len(self._event_token_counts) >= self.cost_baseline_events * 2:
            baseline_window = self._event_token_counts[: self.cost_baseline_events]
            baseline_avg = sum(baseline_window) / len(baseline_window)
            recent_window = self._event_token_counts[-self.cost_baseline_events:]
            recent_avg = sum(recent_window) / len(recent_window)

            if baseline_avg > 0 and recent_avg / baseline_avg >= self.max_event_cost_ratio:
                self._trip(
                    f"Session event log inflation: recent {self.cost_baseline_events}-event "
                    f"average is {recent_avg:.0f} tokens "
                    f"({recent_avg / baseline_avg:.1f}× the early baseline of "
                    f"{baseline_avg:.0f} tokens) — session history is dominating "
                    f"per-turn cost"
                )

    def _trip(self, reason: str):
        self.state = BreakerState.OPEN
        self.trip_reason = reason
        self._trip_at = time.time()
        raise RuntimeError(f"RunGuard ADKCostBreaker tripped: {reason}")

    def install_on_agent(self, agent: LlmAgent) -> LlmAgent:
        """Attach all four breaker callbacks to an LlmAgent. Returns the agent."""
        agent.before_tool_callback = self.before_tool_callback
        agent.after_tool_callback = self.after_tool_callback
        agent.before_agent_callback = self.before_agent_callback
        agent.after_agent_callback = self.after_agent_callback
        return agent

    def reset(self):
        """Reset all tracking state. Call between sessions, not between turns."""
        self.state = BreakerState.CLOSED
        self.trip_reason = None
        self._tool_result_history.clear()
        self._delegation_chain.clear()
        self._event_token_counts.clear()
        self._active_agent_count = 0
        self._trip_at = None

The install_on_agent() method attaches all four callbacks to an LlmAgent. For a pipeline with multiple LlmAgent instances — an orchestrator, a LoopAgent sub-agent, and parallel specialist agents — attach the same breaker instance to all of them. The shared state (delegation chain, tool history, concurrent count) is what enables cross-agent detection. A breaker installed only on the orchestrator won't see tool calls made inside the loop agent's sub-agent.
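
Attaching the breaker agent by agent is easy to get wrong as a pipeline grows. One way to avoid missing an agent is to walk the agent tree once and instrument everything that qualifies. The sketch below is a hypothetical helper, not an ADK API: it assumes composite agents expose their children through a sub_agents attribute and that each agent has a name, and it takes the install function and the filter as parameters so it stays independent of any particular agent class.

```python
from typing import Any, Callable, Iterator


def walk_agents(root: Any) -> Iterator[Any]:
    """Yield root and every descendant reachable through `sub_agents`, depth-first."""
    yield root
    for child in getattr(root, "sub_agents", None) or []:
        yield from walk_agents(child)


def install_on_tree(
    root: Any,
    install: Callable[[Any], Any],
    should_install: Callable[[Any], bool],
) -> list[str]:
    """Apply `install` to every agent in the tree that passes `should_install`.

    Returns the names of the agents that were instrumented, in visit order.
    """
    installed = []
    for agent in walk_agents(root):
        if should_install(agent):
            install(agent)
            installed.append(agent.name)
    return installed
```

With the breaker from this post, a call along the lines of install_on_tree(pipeline, breaker.install_on_agent, lambda a: isinstance(a, LlmAgent)) would cover every LlmAgent under the root. Logging the returned names at startup gives a quick check that no agent was left uninstrumented.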

Catching LoopAgent non-progress: the document processing example

The after_tool_callback captures the MD5 hash of every tool response and appends it to the session-level history. Once the last N entries share the same tool name and response hash, the session is looping without progress. Here's the full pipeline from the intro, with the breaker installed:

from google.adk.agents import LlmAgent, LoopAgent, SequentialAgent
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.genai import types as genai_types


def parse_document_fragment(fragment: str, hint: str = "") -> dict:
    """Call external document parser. Returns structured data or error status."""
    # document_parser_api is a placeholder for your own parser client; it is not part of ADK.
    response = document_parser_api.parse(fragment, hint=hint)
    if not response.ok:
        return {
            "status": "partial_parse_error",
            "data": None,
            "retry_hint": response.error_code,
        }
    return {"status": "success", "data": response.structured_data}


breaker = ADKCostBreaker(
    max_identical_tool_results=3,   # trip after 3 identical parse results
    max_delegation_depth=4,
    max_event_cost_ratio=3.0,
    max_concurrent_agents=15,
)

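from google.adk.tools.tool_context import ToolContext  # used by exit_loop below


def exit_loop(tool_context: ToolContext) -> dict:
    """Stop the enclosing LoopAgent by setting the escalate action."""
    # LoopAgent ends when a sub-agent escalates; a tool requests that via its context.
    tool_context.actions.escalate = True
    return {"status": "loop_exit_requested"}


extraction_agent = LlmAgent(
    name="extraction_agent",
    model="gemini-2.0-flash",
    instruction=(
        "Extract structured data from document fragments. "
        "Call parse_document_fragment with the fragment and your best refinement hint. "
        "If extraction succeeds, call exit_loop. "
        "If partial_parse_error, refine your extraction approach and retry."
    ),
    tools=[parse_document_fragment, exit_loop],
)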
breaker.install_on_agent(extraction_agent)

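classification_agent = LlmAgent(
    name="classification_agent",
    model="gemini-2.0-flash",
    instruction=(
        "Classify the document type and prepare it for extraction. "
        "State the classification in your response; the pipeline runs extraction next."
    ),
    # No sub_agents here: extraction_agent belongs to document_loop below, and an
    # ADK agent instance can have only one parent. SequentialAgent handles the handoff.
)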
breaker.install_on_agent(classification_agent)

loop_agent = LoopAgent(
    name="document_loop",
    sub_agents=[extraction_agent],
    max_iterations=8,  # hard ceiling — breaker fires earlier on non-progress
)

pipeline = SequentialAgent(
    name="document_pipeline",
    sub_agents=[classification_agent, loop_agent],
)


async def process_document(document_text: str) -> str:
    session_service = InMemorySessionService()
    session = await session_service.create_session(
        app_name="document_pipeline",
        user_id="batch_processor",
    )
    runner = Runner(
        agent=pipeline,
        app_name="document_pipeline",
        session_service=session_service,
    )

    try:
        async for event in runner.run_async(
            user_id="batch_processor",
            session_id=session.id,
            new_message=genai_types.Content(
                role="user",
                parts=[genai_types.Part(text=document_text)],
            ),
        ):
            # Record token counts for inflation monitoring
            if event.usage_metadata and event.usage_metadata.total_token_count:
                breaker.record_event_tokens(event.usage_metadata.total_token_count)
            if event.is_final_response():
                return event.content.parts[0].text
    except RuntimeError as e:
        return f"BREAKER_TRIPPED: {e}"
    finally:
        breaker.reset()

    return "NO_OUTPUT"

With max_identical_tool_results=3, the breaker fires at the third consecutive partial_parse_error instead of letting the loop run all eight iterations. At iteration 3 the session history is still small; iterations 4–8 would each carry the growing accumulated context from all prior iterations. The early trip saves roughly 70–79% of what the full max_iterations run would have cost, because the five iterations it skips are the most expensive ones: per-iteration cost is lowest at iteration 1 and highest at iterations 7 and 8.
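
To see where a savings figure in that range comes from, the calculation can be sketched as below. It is a back-of-the-envelope model, not a measurement: it assumes iteration k resends k iterations of history, and the fixed_overhead parameter (system instructions, tool schemas resent on every call) is a hypothetical knob added here for illustration.

```python
def loop_cost(iterations: int, tokens_per_iteration: int, fixed_overhead: int = 0) -> int:
    """Tokens billed for a loop in which iteration k resends k iterations of history."""
    return sum(
        fixed_overhead + k * tokens_per_iteration
        for k in range(1, iterations + 1)
    )


def early_trip_savings(
    trip_at: int,
    max_iterations: int,
    tokens_per_iteration: int,
    fixed_overhead: int = 0,
) -> float:
    """Fraction of the full-loop cost avoided by stopping at iteration `trip_at`."""
    full = loop_cost(max_iterations, tokens_per_iteration, fixed_overhead)
    stopped = loop_cost(trip_at, tokens_per_iteration, fixed_overhead)
    return 1 - stopped / full


if __name__ == "__main__":
    print(f"{early_trip_savings(3, 8, 800):.1%}")        # pure history growth
    print(f"{early_trip_savings(3, 8, 800, 800):.1%}")   # with per-call overhead
```

With no overhead the model gives about 83% saved; adding a fixed 800 tokens per call brings it down to about 80%. The more of each call that is fixed overhead rather than accumulated history, the smaller the share an early trip recovers.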

Catching subagent back-delegation: the orchestrator cycle example

The delegation chain check fires in before_agent_callback — before the agent makes any model call — so cycles are caught before they generate tokens at the re-entry level. Here's a pipeline where an orchestrator and a research specialist can inadvertently cycle through a shared routing tool:

from google.adk.agents import LlmAgent
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.genai import types as genai_types


def route_to_specialist(task_type: str, task: str) -> str:
    """Shared routing tool. If task_type == 'orchestrate', routes to orchestrator.
    This creates a back-delegation cycle when the specialist calls it."""
    # dispatch_to_agent is a placeholder for your own routing layer; it is not an ADK function.
    return dispatch_to_agent(task_type, task)


# Both agents share the same breaker — delegation tracking is session-wide
cycle_breaker = ADKCostBreaker(max_delegation_depth=3)

research_specialist = LlmAgent(
    name="research_specialist",
    model="gemini-2.0-flash",
    instruction=(
        "Perform deep research on a topic. "
        "Use route_to_specialist if you need orchestration support."
    ),
    tools=[route_to_specialist],
)
cycle_breaker.install_on_agent(research_specialist)

orchestrator = LlmAgent(
    name="orchestrator",
    model="gemini-2.0-flash",
    instruction="Coordinate research tasks and delegate to research_specialist.",
    sub_agents=[research_specialist],
    tools=[route_to_specialist],
)
cycle_breaker.install_on_agent(orchestrator)


async def run_research_pipeline(query: str) -> str:
    session_service = InMemorySessionService()
    session = await session_service.create_session(
        app_name="research_pipeline",
        user_id="researcher",
    )
    runner = Runner(
        agent=orchestrator,
        app_name="research_pipeline",
        session_service=session_service,
    )

    try:
        async for event in runner.run_async(
            user_id="researcher",
            session_id=session.id,
            new_message=genai_types.Content(
                role="user",
                parts=[genai_types.Part(text=query)],
            ),
        ):
            if event.is_final_response():
                return event.content.parts[0].text
    except RuntimeError as e:
        return f"CYCLE_DETECTED: {e}"
    finally:
        cycle_breaker.reset()

    return "NO_OUTPUT"

When the orchestrator starts, the delegation chain is ["orchestrator"]. When it transfers to research_specialist, the chain becomes ["orchestrator", "research_specialist"]. If the specialist calls route_to_specialist("orchestrate", ...) and that dispatches back to the orchestrator, before_agent_callback fires with agent_name="orchestrator", finds "orchestrator" already in the chain, and trips the breaker before the orchestrator makes a single model call at depth 3. No re-entry tokens are generated at the cycle point.

Monitoring session event log inflation

The record_event_tokens() method takes the total_token_count from ADK's UsageMetadata on each model response event. The first cost_baseline_events calls establish the per-turn baseline. Once the session has at least 2 × cost_baseline_events data points, the detector compares the average of the most recent cost_baseline_events events against that baseline. A ratio at or above max_event_cost_ratio triggers a trip. For sessions where you want inflation monitoring without stopping execution, turn the trip into a soft alert:

# Soft monitoring: alert without tripping.
# Use a dedicated instance for this: replacing _trip silences every trip on
# the breaker, not only the inflation check.
inflation_breaker = ADKCostBreaker(
    max_event_cost_ratio=3.0,  # ratio at which the soft alert fires
    cost_baseline_events=5,
)

# Replace the trip method so the breaker logs instead of raising
def soft_alert(reason: str):
    print(f"COST_ALERT: {reason}")
    # Do not raise: let the session continue but record the signal
inflation_breaker._trip = soft_alert

# Alternatively: check manually each turn (run this inside an async function)
async for event in runner.run_async(...):
    if event.usage_metadata and event.usage_metadata.total_token_count:
        token_count = event.usage_metadata.total_token_count
        inflation_breaker.record_event_tokens(token_count)

        # Manual ratio check for logging only
        counts = inflation_breaker._event_token_counts
        if len(counts) >= 10:
            baseline = sum(counts[:5]) / 5
            recent = sum(counts[-5:]) / 5
            if baseline > 0:
                print(f"Turn cost ratio: {recent / baseline:.2f}× baseline")

    if event.is_final_response():
        break

For sessions where correctness requires the full turn count and early termination is unacceptable, use the soft-alert pattern to build cost visibility data without blocking execution. Feed the ratio data into your observability platform and set separate alerting thresholds there. The circuit breaker's role in that context shifts from prevention to early warning — still valuable, and easier to tune when you have historical ratio data to calibrate against.

RunGuard integration: one call for all four ADK failure modes

The circuit breaker above requires instantiating ADKCostBreaker and calling install_on_agent() on each agent in the pipeline. RunGuard's runguard.install() handles this automatically: it patches ADK's LlmAgent.__init__ to attach the circuit breaker callbacks at construction time, instruments the Runner's event stream to capture token counts without modifying your event loop, and maintains a shared breaker state across all agents in the current session. The result is single-line protection with no changes to agent definitions, tool functions, or session service configuration:

import runguard

# Patch at import time — all subsequent LlmAgent instances and Runner
# event streams are automatically protected
runguard.install(
    max_identical_tool_results=3,
    max_delegation_depth=5,
    max_event_cost_ratio=3.0,
    max_concurrent_agents=20,
    on_trip=lambda reason: send_slack_alert(f"ADK breaker tripped: {reason}"),
)

# Your existing code is unchanged:
pipeline = SequentialAgent(
    name="document_pipeline",
    sub_agents=[classification_agent, loop_agent],
)

runner = Runner(agent=pipeline, app_name="document_pipeline", ...)
async for event in runner.run_async(...):
    if event.is_final_response():
        result = event.content.parts[0].text

RunGuard instruments ADK's callback hooks at the framework level using the same before_tool_callback and before_agent_callback mechanism shown above, so it's compatible with any ADK callbacks you've already registered. If you have existing callbacks, RunGuard chains them: the breaker runs first (and raises if the circuit is open), then your callback runs. For after_tool_callback and after_agent_callback, RunGuard runs after your existing callback to inspect the final response. Your existing callbacks are preserved; the breaker wraps the full sequence.
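
If you are using the hand-built ADKCostBreaker rather than RunGuard, note that install_on_agent() assigns the callback attributes directly, so it replaces any callback already set on the agent. A small composition helper avoids that. The sketch below is not RunGuard's implementation; chain_callbacks is a hypothetical helper that relies only on the ADK convention that a callback returning None means "proceed with the default behavior".

```python
from typing import Any, Callable, Optional

Callback = Callable[..., Optional[Any]]


def chain_callbacks(*callbacks: Optional[Callback]) -> Callback:
    """Run callbacks in order; the first non-None return value wins.

    A non-None return overrides the default behavior, so later callbacks are
    skipped once one of them returns a value. Exceptions (such as a breaker
    trip) propagate immediately and stop the chain.
    """
    active = [cb for cb in callbacks if cb is not None]

    def chained(*args: Any, **kwargs: Any) -> Optional[Any]:
        for cb in active:
            result = cb(*args, **kwargs)
            if result is not None:
                return result
        return None

    return chained
```

For example, agent.before_tool_callback = chain_callbacks(breaker.before_tool_callback, agent.before_tool_callback) runs the breaker first and your existing callback second. In this sketch the breaker should also come first in the after hooks: it always returns None, so it observes the raw response and your own callback still runs afterwards.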

What this saves in practice

Scenario	Without breaker	With breaker	Saved
LoopAgent non-progress: partial_parse_error on all 8 iterations; each iteration pays for quadratic session history growth	$4.20 (8 iterations, escalating history cost)	$0.90 (3 iterations, trip on identical results)	79%
Back-delegation cycle: orchestrator → specialist → route_to_specialist → orchestrator, 6 re-entry levels before stack overflow	$5.80 (6 delegation levels, full session context at each)	$0.95 (2 levels, cycle detected at re-entry)	84%
Session event inflation: 40-turn research session, per-turn cost reaches 3.5× baseline at turn 22	$9.40 (40 turns with quadratic history)	$3.20 (22 turns before inflation trip)	66%
ParallelAgent over-spawn: index tool returns 400 items instead of expected 8; pipeline spawns 400 concurrent agents	$380.00 (400 simultaneous Gemini API calls)	$19.00 (20 agents before spawn limit trip)	95%

The ParallelAgent over-spawn scenario shows the largest absolute savings because cost scales linearly with the number of spawned agents: 400 simultaneous calls cost 50× as much as the expected 8. The circuit breaker fires at the 21st concurrent invocation. The 20 agents already in flight complete normally; the 21st and all subsequent spawns are blocked. At the table's $0.95 per call, those 20 agents cost $19.00, so total spend is still 95% below the unchecked case. This is the failure mode that produces $300–$400 Gemini bills from a single misconfigured pipeline run, and it's the most important one to gate against before you have production traffic at scale.

Frequently asked questions

Does this work with Runner.run() (synchronous) in addition to Runner.run_async()?

Yes. ADK's callback hooks fire synchronously within the agent execution path regardless of whether you're using Runner.run() or Runner.run_async(). The ADKCostBreaker callbacks don't use asyncio primitives internally — they use plain Python data structures and raise a RuntimeError synchronously when a trip condition is met. For synchronous usage, the RuntimeError propagates through ADK's synchronous execution path and can be caught with a regular try/except. The record_event_tokens() method works with both sync and async event loops — call it on each event as you iterate, whether the iterator is a regular for-loop or an async for-loop. RunGuard's runguard.install() detects the execution context at runtime and instruments the appropriate code path automatically.

The non-progress detector hashes the full tool response. What if a legitimate response varies slightly each time (timestamps, request IDs) but is functionally the same failure?

The default MD5 approach hashes str(tool_response) in full. If your tool response includes a varying request_id, timestamp, or trace_id field alongside an otherwise static error status, the hashes will differ even though the response indicates the same failure. Normalize the response before hashing: strip the varying fields before passing to str(), or hash only the semantically meaningful fields (status, error_code, data). RunGuard's runguard.install() accepts a tool_result_normalizer parameter — a dict[str, Callable] mapping tool names to normalization functions applied before hashing. For the document parser example: tool_result_normalizer={"parse_document_fragment": lambda r: {k: v for k, v in r.items() if k != "request_id"}}.
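A minimal sketch of normalize-then-hash, using only the standard library. The field names in `VOLATILE_FIELDS` and the helper names `normalize` and `result_hash` are illustrative assumptions, and serializing with `json.dumps(sort_keys=True)` instead of `str()` is a swapped-in choice so that key order does not affect the digest.

```python
import hashlib
import json

# Fields assumed to change on every call without changing the outcome.
VOLATILE_FIELDS = {"request_id", "timestamp", "trace_id"}


def normalize(response: dict) -> dict:
    """Drop top-level fields that vary per call but carry no outcome information."""
    return {k: v for k, v in response.items() if k not in VOLATILE_FIELDS}


def result_hash(response: dict) -> str:
    # sort_keys makes the serialization independent of key insertion order;
    # default=str keeps non-JSON values (e.g. datetimes) from raising.
    canonical = json.dumps(normalize(response), sort_keys=True, default=str)
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


first = {"status": "partial_parse_error", "request_id": "req-001", "data": None}
second = {"status": "partial_parse_error", "request_id": "req-002", "data": None}
success = {"status": "ok", "request_id": "req-003", "data": {"pages": 4}}

# Hashing the raw string treats the two identical failures as different results.
raw_first = hashlib.md5(str(first).encode("utf-8")).hexdigest()
raw_second = hashlib.md5(str(second).encode("utf-8")).hexdigest()

print(raw_first == raw_second)                     # raw hashes differ
print(result_hash(first) == result_hash(second))   # normalized hashes match
print(result_hash(first) == result_hash(success))  # a real change still differs
```

This sketch only strips top-level keys. If your tool nests a request ID inside a sub-object, the normalizer needs to reach into that structure, or you can hash an explicit allowlist of fields instead.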

Can I use the breaker alongside ADK's built-in max_iterations, or do they conflict?

max_iterations and ADKCostBreaker are complementary. Set max_iterations as the absolute hard ceiling on loop count — it's ADK's own guardrail and fires regardless of external breaker state. Set the breaker's max_identical_tool_results to a lower threshold that fires earlier when non-progress is detected. In the document processing example: max_iterations=8 is the hard cap, max_identical_tool_results=3 is the early trip. Normal runs where the tool succeeds complete before both limits. Runs where the tool is stuck trip the breaker at iteration 3, saving the cost of iterations 4–8. The two mechanisms operate independently — the breaker fires via callback hooks, max_iterations fires via ADK's loop termination logic — so they don't interfere.
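The interaction between the two limits can be simulated without ADK. The sketch below is a plain-Python loop, not a LoopAgent: `NonProgressDetector`, `run_loop`, and the three fake tools are hypothetical names for illustration, with `max_iterations` and `max_identical_tool_results` mirroring the parameters discussed above.

```python
import hashlib


class NonProgressDetector:
    """Trips when the last N tool responses hash to the same value."""

    def __init__(self, max_identical_tool_results: int = 3) -> None:
        self.max_identical = max_identical_tool_results
        self.hashes = []

    def record(self, tool_response) -> None:
        digest = hashlib.md5(str(tool_response).encode("utf-8")).hexdigest()
        self.hashes.append(digest)
        tail = self.hashes[-self.max_identical:]
        if len(tail) == self.max_identical and len(set(tail)) == 1:
            raise RuntimeError(
                f"{self.max_identical} identical tool results in a row"
            )


def run_loop(tool, max_iterations: int = 8, max_identical_tool_results: int = 3):
    """Return (outcome, iterations_used) for a simulated retry loop."""
    detector = NonProgressDetector(max_identical_tool_results)
    for iteration in range(1, max_iterations + 1):
        response = tool(iteration)
        if response["status"] == "ok":
            return ("completed", iteration)
        try:
            detector.record(response)
        except RuntimeError:
            return ("breaker_tripped", iteration)
    return ("max_iterations", max_iterations)  # the hard ceiling


def stuck_tool(iteration):
    return {"status": "partial_parse_error"}


def recovering_tool(iteration):
    return {"status": "ok"} if iteration == 2 else {"status": "partial_parse_error"}


def varying_tool(iteration):
    # Fails differently each time, so only the hard ceiling stops it.
    return {"status": "partial_parse_error", "attempt": iteration}


print(run_loop(stuck_tool))
print(run_loop(recovering_tool))
print(run_loop(varying_tool))
```

The stuck tool trips the detector on the third identical response, the recovering tool finishes before either limit, and the varying tool runs to the ceiling. The last case is why the hard cap stays in place even with a breaker installed.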

How does the delegation depth limit interact with pipelines that have legitimate deep agent hierarchies?

The delegation chain tracks the active transfer path, not the total number of agent invocations in the session. An orchestrator that calls specialist A, gets a result, then calls specialist B, then calls specialist C has a maximum delegation depth of 2 at any moment (orchestrator + one active specialist), not 4. The depth counter increments when an agent starts and decrements when it returns. Only agents that are concurrently nested (A waiting for B, which is waiting for C) count against the depth limit. A sequential orchestrator that makes 20 transfers to specialists but only one at a time stays at depth 2 throughout the session. Set max_delegation_depth to the maximum simultaneous nesting your pipeline intentionally reaches, plus 2 as headroom. A pipeline with intentional 3-level nesting (orchestrator → specialist → sub-specialist) should set the limit at 5.
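A sketch of the increment-on-start, decrement-on-return bookkeeping, counting the orchestrator itself as one level as in the "orchestrator + one active specialist" example. `DelegationTracker`, `enter`, `leave`, and `peak_depth` are hypothetical names for this illustration. In a real pipeline the enter and leave calls would sit in before-agent and after-agent hooks, which are not shown here.

```python
class DelegationTracker:
    """Tracks the active delegation path, not total invocations."""

    def __init__(self, max_delegation_depth: int = 5) -> None:
        self.max_delegation_depth = max_delegation_depth
        self.chain = []      # agents currently waiting on a nested agent
        self.peak_depth = 0  # deepest nesting seen so far

    def enter(self, agent_name: str) -> None:
        if len(self.chain) >= self.max_delegation_depth:
            path = " -> ".join(self.chain + [agent_name])
            raise RuntimeError(f"delegation depth limit exceeded: {path}")
        self.chain.append(agent_name)
        self.peak_depth = max(self.peak_depth, len(self.chain))

    def leave(self) -> None:
        self.chain.pop()


# Sequential: 20 transfers, one specialist active at a time.
sequential = DelegationTracker(max_delegation_depth=5)
sequential.enter("orchestrator")
for i in range(20):
    sequential.enter(f"specialist_{i}")
    sequential.leave()
sequential.leave()

# Intentional 3-level nesting stays well under a limit of 5.
nested = DelegationTracker(max_delegation_depth=5)
for name in ("orchestrator", "specialist", "sub_specialist"):
    nested.enter(name)

# Runaway back-delegation: each agent hands off before the previous returns.
runaway = DelegationTracker(max_delegation_depth=5)
tripped_at = None
for level in range(1, 11):
    try:
        runaway.enter(f"agent_{level}")
    except RuntimeError:
        tripped_at = level
        break

print(sequential.peak_depth, nested.peak_depth, tripped_at)
```

Twenty sequential transfers never push the chain past two entries, while the runaway chain is refused at the first level beyond the limit.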

What's the performance overhead of the callback instrumentation?

The callback hooks execute hash computation (hashlib.md5() on the tool response string), list operations (append, last-N comparison), and integer increments/decrements. For a 200-token tool response, the MD5 hash takes roughly 1–2 microseconds on modern hardware. The list comparison for non-progress detection is O(N) over the last max_identical_tool_results entries, typically 3–5 comparisons. The total overhead per tool call is under 10 microseconds, which is at most about 0.002% of the latency of a Gemini API call (typically 500–2000ms). The inflation monitoring accumulates token counts in a list and computes averages, which is a comparably small amount of work per event. The instrumentation cost is negligible relative to model inference latency.
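These figures are easy to check on your own hardware. The sketch below times an approximation of the per-tool-call work (one MD5 over a response of about 800 characters plus a last-N comparison) with the standard library's `timeit`. The payload, the assumed ratio of four characters per token, and the `callback_work` helper are assumptions for the measurement. Results vary by machine and Python version.

```python
import hashlib
import timeit

# Roughly 200 tokens of tool output, assuming about 4 characters per token.
response_text = '{"status": "partial_parse_error", "detail": "' + "x" * 750 + '"}'
recent_hashes = ["a" * 32, "b" * 32]  # placeholder digests from earlier calls
MAX_IDENTICAL = 3


def callback_work() -> bool:
    """Approximate the per-call work: hash, then compare the last N digests."""
    digest = hashlib.md5(response_text.encode("utf-8")).hexdigest()
    tail = (recent_hashes + [digest])[-MAX_IDENTICAL:]
    return len(tail) == MAX_IDENTICAL and len(set(tail)) == 1


runs = 100_000
per_call_seconds = timeit.timeit(callback_work, number=runs) / runs
fraction_of_fast_call = per_call_seconds / 0.5  # compared with a 500 ms model call

print(f"{per_call_seconds * 1e6:.2f} microseconds per callback")
print(f"{fraction_of_fast_call:.6%} of a 500 ms model call")
```

If the measured number is far above a few microseconds, the likely cause is a much larger tool response than assumed here, since hashing cost grows with payload size.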

Stop runaway Google ADK agents before the bill lands

RunGuard wraps ADK's callback hooks with one runguard.install() call. LoopAgent non-progress detection, back-delegation cycle prevention, session event log monitoring, and ParallelAgent spawn limits — none of it requires changing your agent definitions, tool functions, or session service configuration.
