Vertex AI agents and the loop problem: why Gemini’s 1M context window makes runaway loops more expensive, not less

Vertex AI Agent Builder and Google’s Agent Development Kit (ADK) let you deploy agents backed by Gemini 1.5 Pro — a model with a 1-million-token context window. This is both an advantage and a risk. The large context means agents can process enormous amounts of tool result data without hitting context overflow errors. But it also means a runaway agent loop can run for far more iterations before any hard limit fires, accumulating enormous cost before you notice. Gemini 1.5 Pro charges $3.50/M input tokens for prompts over 128k tokens — a 500k-token agent context in a tight loop can cost $1.75 per iteration. At 20 loop iterations, that’s $35 in a single failed agent run. This page shows how to add loop detection and budget guardrails to Vertex AI agents before that happens.

Vertex AI agent deployment models and their loop risks

Agent Builder (no-code / low-code). Vertex AI Agent Builder provides a managed runtime for building agents with tools, data stores, and playbooks. Agents are configured via the console or Terraform and deployed as managed endpoints. Agent Builder has a per-session message limit (configurable, default varies) but no built-in cost cap or loop detection. A playbook that routes to itself (circular routing condition) creates an infinite delegation loop at the managed runtime level.
ADK (Agent Development Kit) agents. Google’s ADK is a Python SDK for building custom Gemini-backed agents with function calling and multi-agent orchestration. ADK agents are structured similarly to LangChain agents: they run a ReAct loop, call functions, and process results. ADK has no built-in loop detection. The closest primitive is a max-iterations parameter on the ADK runner, which limits total iterations but not cost.
Custom Gemini API agents. Many Vertex AI deployments use the Gemini API directly (via google-generativeai or the Vertex AI SDK) to build custom agent loops. These have no framework-level iteration limits, no cost caps, and no loop detection — you get exactly what you build. This is the highest-flexibility and highest-risk deployment model.
The long-context loop amplifier. Most loop-detection thinking is calibrated for 128k-context models, where a 20-iteration loop might consume $5–$10. With Gemini 1.5 Pro’s 1M context, each iteration can resend a much larger context, and the per-iteration cost scales with context size. An agent that accumulates tool results across a 20-iteration run might reach 400k–600k tokens in context by iteration 10, paying the over-128k price tier on every subsequent call.
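To make the amplifier effect concrete, here is a rough sketch of how cumulative input cost grows as context accumulates across iterations. The rates are the figures quoted in this article, the linear growth pattern is an assumption, and the sketch also assumes the whole prompt is billed at the higher rate once it passes 128k tokens. The function names are illustrative, not part of any SDK.

```python
# Illustrative rates (USD per 1M input tokens) taken from this article;
# verify against current Vertex AI pricing before relying on them.
INPUT_USD_PER_M_UNDER_128K = 1.25
INPUT_USD_PER_M_OVER_128K = 3.50
TIER_BOUNDARY_TOKENS = 128_000


def input_cost_usd(context_tokens: int) -> float:
    """Input cost of one call, assuming the whole prompt is billed at its tier."""
    rate = (
        INPUT_USD_PER_M_OVER_128K
        if context_tokens > TIER_BOUNDARY_TOKENS
        else INPUT_USD_PER_M_UNDER_128K
    )
    return context_tokens * rate / 1_000_000


def simulate_loop(start_tokens: int, growth_per_step: int, steps: int) -> list[float]:
    """Return cumulative input cost after each step as context grows linearly."""
    cumulative = []
    total = 0.0
    for step in range(steps):
        context = start_tokens + step * growth_per_step
        total += input_cost_usd(context)
        cumulative.append(total)
    return cumulative


# Hypothetical run: 20k tokens to start, 50k tokens of tool results per step.
costs = simulate_loop(start_tokens=20_000, growth_per_step=50_000, steps=20)
for step, usd in enumerate(costs, start=1):
    print(f"step {step:2d}: cumulative input cost ${usd:.2f}")
```

With these assumed numbers the context passes 128k tokens by the fourth call, so every later iteration is billed at the higher rate on a context that keeps growing.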

Loop failure modes in Vertex AI agents

Function call loop. The agent calls the same function with the same (or trivially varied) arguments repeatedly because the function’s result never satisfies the agent’s termination condition. Common triggers: a search function that returns “no results”, a data-fetch function that returns empty due to a filter condition the agent keeps trying to correct, or a code-execution function whose output doesn’t pass a test the agent wrote incorrectly.
Playbook routing loop (Agent Builder). Agent Builder supports playbooks with routing conditions. If Playbook A routes to Playbook B under condition X, and Playbook B routes back to Playbook A under condition Y, and both X and Y can be simultaneously true, the session enters an infinite routing loop. Agent Builder’s session message limit fires eventually, but not before multiple round-trips have accumulated significant token usage.
Sub-agent delegation loop (ADK multi-agent). ADK supports multi-agent orchestration via AgentTool: one agent can call another as a tool. A delegation loop forms when Agent A calls Agent B (via AgentTool), and Agent B decides the task requires Agent A’s capabilities and returns an instruction that triggers A to call B again. ADK’s runner processes each sub-agent call as a nested inference, so the loop generates multiple model calls per “iteration”, compounding cost faster than a single-agent loop.
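All three failure modes show up as a short sequence of calls repeating itself. The following is a minimal sketch of one way to detect that pattern. It is not RunGuard’s implementation; the class, function, and parameter names are hypothetical, chosen only to echo the configuration shown later on this page.

```python
import hashlib
import json
from collections import deque


def call_signature(name: str, args: dict) -> str:
    """Stable signature for a call: name plus a short hash of its sorted arguments."""
    payload = json.dumps(args, sort_keys=True, default=str)
    digest = hashlib.sha256(payload.encode()).hexdigest()[:8]
    return f"{name}:{digest}"


class RepeatDetector:
    """Flags a loop when the most recent signatures repeat a short cycle."""

    def __init__(self, repeats: int = 3, window: int = 12, max_cycle_len: int = 4):
        self.repeats = repeats
        self.max_cycle_len = max_cycle_len
        # Only the last `window` signatures are kept.
        self.history = deque(maxlen=window)

    def observe(self, sig: str):
        """Record one signature; return the repeating cycle if found, else None."""
        self.history.append(sig)
        recent = list(self.history)
        for cycle_len in range(1, self.max_cycle_len + 1):
            needed = cycle_len * self.repeats
            if len(recent) < needed:
                break  # longer cycles need even more history
            tail = recent[-needed:]
            cycle = tail[:cycle_len]
            if all(tail[i] == cycle[i % cycle_len] for i in range(needed)):
                return cycle
        return None


# A -> B -> A -> B ... delegation pattern, using agent names as signatures.
detector = RepeatDetector(repeats=3, window=12, max_cycle_len=4)
found = None
for sig in ["agent_a", "agent_b"] * 3:
    found = detector.observe(sig)
print(found)
```

Exact hashing only catches identical arguments. Catching “trivially varied” arguments would need some normalization step before hashing (for example lowercasing or stripping whitespace from a query), which is left out of this sketch.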

Adding RunGuard to Vertex AI / ADK agents

For direct Gemini API calls. Wrap the Gemini generate_content call with RunGuard’s guard function:

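import google.generativeai as genai
from runguard import guard, LoopDetectedError, BudgetExceededError

# google-generativeai authenticates with a Gemini API key
genai.configure(api_key="YOUR_GEMINI_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro")

# Pricing figures used in this article (USD per 1M tokens); check current rates
INPUT_USD_PER_M_UNDER_128K = 1.25
INPUT_USD_PER_M_OVER_128K = 3.50
OUTPUT_USD_PER_M = 10.50

def first_function_call_name(response):
    # Return the name of the first function call in the response, or None
    for candidate in response.candidates:
        for part in candidate.content.parts:
            if hasattr(part, "function_call") and part.function_call.name:
                return part.function_call.name
    return None

async def gemini_agent_step(contents, tools_config):
    # Use the async client method so the call does not block the event loop
    response = await model.generate_content_async(
        contents=contents,
        tools=tools_config,
        generation_config={"temperature": 0.1},
    )
    # Extract cost from token metadata
    usage = response.usage_metadata
    input_tokens = usage.prompt_token_count
    output_tokens = usage.candidates_token_count
    # Apply the higher input rate only when the prompt exceeds 128k tokens
    if input_tokens > 128_000:
        input_rate = INPUT_USD_PER_M_OVER_128K
    else:
        input_rate = INPUT_USD_PER_M_UNDER_128K
    usd = (input_tokens * input_rate + output_tokens * OUTPUT_USD_PER_M) / 1_000_000

    # Signature is the first function call's name, or "end" if there is none
    sig = first_function_call_name(response) or "end"

    return {"response": response, "usd": usd, "sig": sig}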

guarded_step = guard(
    gemini_agent_step,
    budget={"max_usd": 2.0},
    loop={"repeats": 3, "window": 6, "max_cycle_len": 4},
)

async def run_agent(task: str, tools_config):
    contents = [{"role": "user", "parts": [task]}]
    while True:
        try:
            result = await guarded_step(contents, tools_config)
        except LoopDetectedError as e:
            return f"Agent loop detected (pattern: {e.pattern!r}). Task halted."
        except BudgetExceededError as e:
            return f"Budget exceeded (${e.spent:.3f}). Task halted at step {e.steps}."

        response = result["response"]
        # Process the response — check for function calls, append results, etc.
        # ... your existing agent loop logic ...
        if is_final_answer(response):
            break
        contents = update_contents_with_result(contents, response)
    return extract_final_answer(response)

For ADK agents. ADK’s agent runner uses an internal BaseLlmFlow to process model calls. RunGuard integrates at the LLM client level — configure your ADK agent with a LiteLlm or custom model backend that wraps the Gemini call with RunGuard logic, or use ADK’s before_model_callback hook to inject budget and loop tracking:

from google.adk.agents import LlmAgent
# GuardedLiteLlm is used in place of ADK's LiteLlm model wrapper
from runguard.litellm import GuardedLiteLlm

guarded_model = GuardedLiteLlm(
    model="vertex_ai/gemini-1.5-pro",
    budget={"max_usd": 1.50},
    loop={"repeats": 3, "window": 5},
)

root_agent = LlmAgent(
    name="research_agent",
    model=guarded_model,
    description="Researches topics and provides comprehensive summaries",
    instruction="You are a research assistant. Use your tools to find accurate information.",
    tools=[search_tool, fetch_tool, summarize_tool],
)
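For the before_model_callback route, a minimal sketch might look like the following. It only counts model calls, as a simple stand-in for fuller budget and loop tracking. It assumes the callback is invoked with callback_context and llm_request arguments, that callback_context.state behaves like a dict, and that returning None lets the model call proceed. The helper and exception names are hypothetical. A SimpleNamespace stands in for the ADK context so the example runs without ADK installed.

```python
from types import SimpleNamespace


class ModelCallLimitExceeded(RuntimeError):
    """Raised when an agent run makes more model calls than allowed."""


def make_call_limiter(max_calls: int, state_key: str = "model_call_count"):
    """Build a callback that counts model calls in the shared state mapping."""

    def before_model(callback_context, llm_request):
        count = callback_context.state.get(state_key, 0) + 1
        callback_context.state[state_key] = count
        if count > max_calls:
            raise ModelCallLimitExceeded(
                f"{count} model calls exceeds the limit of {max_calls}"
            )
        return None  # None means: go ahead with the model call

    return before_model


# Stand-in for the ADK callback context (assumption: only .state is needed).
fake_context = SimpleNamespace(state={})
limiter = make_call_limiter(max_calls=2)

limiter(callback_context=fake_context, llm_request=None)
limiter(callback_context=fake_context, llm_request=None)
try:
    limiter(callback_context=fake_context, llm_request=None)
    halted = False
except ModelCallLimitExceeded:
    halted = True
print(halted, fake_context.state)
```

In an ADK agent the callback would be passed as before_model_callback=make_call_limiter(25) when constructing the LlmAgent. Raising is one way to stop the run; returning a response object from the callback to skip the model call is the other option, not shown here. Because the counter lives in session state, it would need resetting between runs that share a session.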

The Gemini long-context cost tiers explained

Two pricing tiers for Gemini 1.5 Pro. Gemini 1.5 Pro has a sharp pricing cliff at 128k tokens: under 128k, input costs $1.25/M tokens; over 128k, input costs $3.50/M tokens (nearly 3x higher). For agents that accumulate tool results and conversation history, this cliff is easy to cross by step 10–15 of a complex task. An agent loop that starts below the cliff may cross it partway through, with subsequent iterations paying the higher rate.
Setting the budget cap relative to the cliff. A practical budget cap for production Vertex AI agents: estimate the expected number of steps (N), estimate average context size per step, and set the cap at 2–3x the expected cost at the higher pricing tier. This ensures the guard fires well before runaway loops reach dangerous spend levels, while leaving enough headroom for legitimately complex tasks.
Token-aware cap vs. dollar cap. For Vertex AI agents, a token-aware cap (fire at 100k input tokens) is often more predictable than a pure dollar cap, because the cost-per-token changes non-linearly at the 128k boundary. RunGuard supports both max_usd and max_input_tokens — using both in combination gives you cost protection and context-window safety in a single configuration.
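The sizing rule above can be turned into a small calculation. This is a sketch under stated assumptions: the rates are this article’s figures, the step count and token averages are made-up inputs you would replace with your own measurements, and the function name is hypothetical. The max_usd and max_input_tokens key names follow the text above; how they are passed to RunGuard is not shown here.

```python
# Article's figures (USD per 1M tokens); verify against current pricing.
INPUT_USD_PER_M_OVER_128K = 3.50
OUTPUT_USD_PER_M = 10.50


def budget_cap_usd(
    expected_steps: int,
    avg_input_tokens: int,
    avg_output_tokens: int,
    headroom: float = 2.5,
) -> float:
    """Expected run cost at the higher pricing tier, times a 2-3x headroom factor."""
    per_step = (
        avg_input_tokens * INPUT_USD_PER_M_OVER_128K
        + avg_output_tokens * OUTPUT_USD_PER_M
    ) / 1_000_000
    return expected_steps * per_step * headroom


# Hypothetical task profile: 12 steps, ~60k input and ~1k output tokens per step.
cap = budget_cap_usd(expected_steps=12, avg_input_tokens=60_000, avg_output_tokens=1_000)

# Pair the dollar cap with a token cap set below the 128k pricing boundary.
limits = {"max_usd": round(cap, 2), "max_input_tokens": 100_000}
print(limits)
```

Pricing every step at the higher tier deliberately overestimates the cost of a healthy run, so the cap leaves room for legitimate tasks that cross the cliff late.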

Vertex AI agent built-in controls vs. RunGuard

Control	Vertex AI / ADK built-in	RunGuard
Per-run cost cap	Not supported	budget: max_usd — fires before each call
Function call loop detection	Not supported	loop: repeats=3 — catches same-function loops
Sub-agent delegation loop	Not supported	sig_fn can extract delegation target from ADK responses
Max iterations	ADK runner max_iterations (default varies)	Not needed (RunGuard loop detector fires first)
Long-context pricing cliff detection	Not supported	max_input_tokens cap fires before 128k cliff
Alert on budget exceeded	Cloud Monitoring budget alerts (account-level, not per-run)	alerts: slack_webhook or pagerduty_key per run