Vertex AI agents and the loop problem: why Gemini’s 1M context window makes runaway loops more expensive, not less
Vertex AI Agent Builder and Google’s Agent Development Kit (ADK) let you deploy agents backed by Gemini 1.5 Pro, a model with a 1-million-token context window. This is both an advantage and a risk. The large context means agents can process enormous amounts of tool result data without hitting context overflow errors. But it also means a runaway agent loop can run for far more iterations before any hard limit fires, accumulating significant cost before you notice. Gemini 1.5 Pro’s input price doubles, from $1.25 to $2.50 per million tokens, once a prompt exceeds 128k tokens, so a 500k-token agent context in a tight loop costs about $1.25 per iteration in input alone. At 20 loop iterations, that’s $25 in a single failed agent run. This page shows how to add loop detection and budget guardrails to Vertex AI agents before that happens.
Vertex AI agent deployment models and their loop risks
- Agent Builder (no-code / low-code). Vertex AI Agent Builder provides a managed runtime for building agents with tools, data stores, and playbooks. Agents are configured via the console or Terraform and deployed as managed endpoints. Agent Builder has a per-session message limit (configurable, default varies) but no built-in cost cap or loop detection. A playbook that routes to itself (circular routing condition) creates an infinite delegation loop at the managed runtime level.
- ADK (Agent Development Kit) agents. Google’s ADK is a Python SDK for building custom Gemini-backed agents with function calling and multi-agent orchestration. ADK agents are structured similarly to LangChain agents: they run a ReAct loop, call functions, and process the results. ADK has no built-in loop detection. The closest primitive is the per-run cap on model calls (`max_llm_calls` in ADK’s `RunConfig`), which limits how many LLM calls a run can make but not what they cost.
- Custom Gemini API agents. Many Vertex AI deployments call the Gemini API directly (via `google-generativeai` or the Vertex AI SDK) to build custom agent loops. These have no framework-level iteration limits, no cost caps, and no loop detection: you get exactly what you build. This is the highest-flexibility and highest-risk deployment model.
- The long-context loop amplifier. Most loop-detection thinking is calibrated for 128k-context models, where a 20-iteration loop might consume $5–$10. With Gemini 1.5 Pro’s 1M-token window, each iteration can send far more context, and because every call resends the full history, the per-iteration cost grows with the context. An agent that accumulates tool results across a 20-iteration run might reach 400k–600k tokens of context by iteration 10, paying the over-128k price tier on every call after that.
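To make the amplifier concrete, here is a back-of-the-envelope simulation. It is a sketch under stated assumptions, not a billing calculator: it assumes Gemini 1.5 Pro list prices of $1.25 per 1M input tokens for prompts up to 128k tokens and $2.50 above that, a 20k-token starting prompt, and 50k tokens of tool output appended per iteration; all numbers are illustrative and output tokens are ignored.

```python
# ASSUMPTIONS (illustrative only): Gemini 1.5 Pro list prices of $1.25 per 1M input
# tokens for prompts up to 128k tokens and $2.50 above that; output tokens ignored.
CLIFF_TOKENS = 128_000
RATE_LOW = 1.25   # USD per 1M input tokens, prompt <= 128k
RATE_HIGH = 2.50  # USD per 1M input tokens, prompt > 128k


def input_cost_usd(prompt_tokens: int) -> float:
    """Input cost of one call: the whole prompt is billed at the tier its size falls in."""
    rate = RATE_HIGH if prompt_tokens > CLIFF_TOKENS else RATE_LOW
    return prompt_tokens * rate / 1_000_000


def simulate_loop(base_tokens: int, tokens_per_step: int, steps: int):
    """Return (per-call costs, first step over the cliff or None, total cost)."""
    costs = []
    first_over_cliff = None
    for step in range(1, steps + 1):
        # Every call resends the base prompt plus all earlier tool results.
        prompt_tokens = base_tokens + tokens_per_step * (step - 1)
        if first_over_cliff is None and prompt_tokens > CLIFF_TOKENS:
            first_over_cliff = step
        costs.append(input_cost_usd(prompt_tokens))
    return costs, first_over_cliff, sum(costs)


if __name__ == "__main__":
    costs, first_over, total = simulate_loop(20_000, 50_000, steps=20)
    print(f"first call over 128k: step {first_over}")  # first call over 128k: step 4
    print(f"call 1: ${costs[0]:.3f}, call 20: ${costs[-1]:.3f}, total: ${total:.2f}")
```

With these assumptions the prompt crosses 128k at step 4 and reaches 470k tokens by step 10; the 20th call alone costs about $2.4 against $0.025 for the first, and the run totals roughly $24.5 in input cost. Because the context grows linearly, cumulative cost grows roughly quadratically with the number of iterations.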
Loop failure modes in Vertex AI agents
- Function call loop. The agent calls the same function with the same (or trivially varied) arguments repeatedly because the function’s result never satisfies the agent’s termination condition. Common triggers: a search function that returns “no results”, a data-fetch function that returns empty due to a filter condition the agent keeps trying to correct, or a code-execution function whose output doesn’t pass a test the agent wrote incorrectly.
- Playbook routing loop (Agent Builder). Agent Builder supports playbooks with routing conditions. If Playbook A routes to Playbook B under condition X, and Playbook B routes back to Playbook A under condition Y, and both X and Y can be simultaneously true, the session enters an infinite routing loop. Agent Builder’s session message limit fires eventually, but not before multiple round-trips have accumulated significant token usage.
- Sub-agent delegation loop (ADK multi-agent). ADK supports multi-agent orchestration via `AgentTool`, which lets one agent call another as a tool. A delegation loop forms when Agent A calls Agent B (via `AgentTool`), and Agent B decides the task requires Agent A’s capabilities and returns an instruction that makes A call B again. ADK’s runner processes each sub-agent call as a nested inference, so the loop generates multiple model calls per “iteration”, compounding cost faster than single-agent loops.
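All three failure modes share one shape: the sequence of per-step signatures (the function called, the playbook routed to, or the sub-agent delegated to) starts repeating. The sketch below is a minimal sliding-window detector for that shape. It is hypothetical: its parameter names echo the `repeats`/`window`/`max_cycle_len` loop settings used with RunGuard on this page, but it is not RunGuard’s implementation and its semantics may differ.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class LoopDetector:
    """Flags a loop when the newest step signatures are one short pattern repeated.

    Hypothetical sketch: parameter names echo the RunGuard loop settings, but this
    is not RunGuard's implementation. Only cycles with repeats * cycle_len <= window
    can be detected, so size the window accordingly.
    """

    def __init__(self, repeats: int = 3, window: int = 12, max_cycle_len: int = 4) -> None:
        self.repeats = repeats
        self.max_cycle_len = max_cycle_len
        self.history: Deque[str] = deque(maxlen=window)  # last `window` signatures

    def observe(self, sig: str) -> Optional[Tuple[str, ...]]:
        """Record one step's signature; return the repeating pattern if a loop is found."""
        self.history.append(sig)
        recent = list(self.history)
        for cycle_len in range(1, self.max_cycle_len + 1):
            span = cycle_len * self.repeats
            if span > len(recent):
                break  # too little history for this cycle length or any longer one
            tail = recent[-span:]
            pattern = tuple(tail[:cycle_len])
            chunks = (tuple(tail[i:i + cycle_len]) for i in range(0, span, cycle_len))
            if all(chunk == pattern for chunk in chunks):
                return pattern
        return None


if __name__ == "__main__":
    # Function call loop: the same search call three times in a row.
    detector = LoopDetector()
    hits = [detector.observe(s) for s in ["search", "search", "search"]]
    print(hits[-1])  # ('search',)

    # Delegation loop: agent A -> agent B -> agent A -> ...
    detector = LoopDetector()
    hits = [detector.observe(s) for s in ["agent_a", "agent_b"] * 3]
    print(hits[-1])  # ('agent_a', 'agent_b')
```

Keying signatures on function or agent names alone catches retries with trivially varied arguments, at the cost of also flagging legitimate repeated calls. The window must be at least `repeats * max_cycle_len` for the longest cycle to be detectable.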
Adding RunGuard to Vertex AI / ADK agents
- For direct Gemini API calls. Wrap the Gemini `generate_content` call with RunGuard’s `guard` function:

```python
import google.generativeai as genai
from runguard import guard, LoopDetectedError, BudgetExceededError

# google-generativeai authenticates with a Gemini API key; the Vertex AI SDK
# (vertexai.generative_models) uses project/location credentials instead.
genai.configure(api_key="YOUR_GEMINI_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro")

# Gemini 1.5 Pro list prices (USD per 1M tokens). The prompt size selects the tier
# for the whole call. Prices change; check the current pricing page.
CLIFF_TOKENS = 128_000
INPUT_RATE = {"low": 1.25, "high": 2.50}
OUTPUT_RATE = {"low": 5.00, "high": 10.00}


async def gemini_agent_step(contents, tools_config):
    # Use the async variant so the event loop is not blocked during the call.
    response = await model.generate_content_async(
        contents,
        tools=tools_config,
        generation_config={"temperature": 0.1},
    )

    # Estimate cost from token metadata.
    usage = response.usage_metadata
    input_tokens = usage.prompt_token_count
    output_tokens = usage.candidates_token_count or 0
    tier = "high" if input_tokens > CLIFF_TOKENS else "low"
    usd = (input_tokens * INPUT_RATE[tier] + output_tokens * OUTPUT_RATE[tier]) / 1_000_000

    # Signature for loop detection: the first function called, or "end" if none.
    sig = next(
        (
            part.function_call.name
            for candidate in response.candidates
            for part in candidate.content.parts
            if hasattr(part, "function_call") and part.function_call.name
        ),
        "end",
    )
    return {"response": response, "usd": usd, "sig": sig}


guarded_step = guard(
    gemini_agent_step,
    budget={"max_usd": 2.0},
    loop={"repeats": 3, "window": 6, "max_cycle_len": 4},
)


async def run_agent(task: str, tools_config):
    contents = [{"role": "user", "parts": [task]}]
    while True:
        try:
            result = await guarded_step(contents, tools_config)
        except LoopDetectedError as e:
            return f"Agent loop detected (pattern: {e.pattern!r}). Task halted."
        except BudgetExceededError as e:
            return f"Budget exceeded (${e.spent:.3f}). Task halted at step {e.steps}."

        response = result["response"]
        # is_final_answer, update_contents_with_result and extract_final_answer are
        # placeholders for your existing agent loop logic (run the requested functions,
        # append their results to `contents`, detect the final answer).
        if is_final_answer(response):
            return extract_final_answer(response)
        contents = update_contents_with_result(contents, response)
```

- For ADK agents. ADK’s agent runner uses an internal `BaseLlmFlow` to process model calls. RunGuard integrates at the LLM client level: configure your ADK agent with a `LiteLlm` or custom model backend that wraps the Gemini call with RunGuard logic, or use ADK’s `before_model_callback` hook to inject budget and loop tracking:

```python
from google.adk.agents import LlmAgent
from runguard.litellm import GuardedLiteLlm

# Used in place of ADK's LiteLlm wrapper (google.adk.models.lite_llm.LiteLlm).
guarded_model = GuardedLiteLlm(
    model="vertex_ai/gemini-1.5-pro",
    budget={"max_usd": 1.50},
    loop={"repeats": 3, "window": 5},
)

root_agent = LlmAgent(
    name="research_agent",
    model=guarded_model,
    description="Researches topics and provides comprehensive summaries",
    instruction="You are a research assistant. Use your tools to find accurate information.",
    # search_tool, fetch_tool and summarize_tool are your own tool functions.
    tools=[search_tool, fetch_tool, summarize_tool],
)
```
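The direct-API step function keys loop detection on the function name alone. A stricter alternative, sketched below as a hypothetical helper (it is not part of RunGuard or the Gemini SDK), hashes the function name together with normalized arguments, so exact and near-exact retries produce the same signature while calls with genuinely different arguments do not.

```python
import hashlib
import json
from collections.abc import Mapping, Sequence
from typing import Any, Optional


def _normalize(value: Any) -> Any:
    """Collapse trivial variation: whitespace and case in strings, container types."""
    if isinstance(value, str):
        return " ".join(value.split()).lower()
    if isinstance(value, Mapping):
        return {str(k): _normalize(v) for k, v in value.items()}
    if isinstance(value, Sequence) and not isinstance(value, bytes):
        return [_normalize(v) for v in value]
    return value


def call_signature(name: Optional[str], args: Mapping[str, Any]) -> str:
    """Hypothetical helper: loop-detection signature from a call's name and arguments."""
    if not name:
        return "end"  # no function call: the model produced a final answer
    # sort_keys makes the signature independent of argument order.
    payload = json.dumps(_normalize(args), sort_keys=True, default=str)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
    return f"{name}:{digest}"


if __name__ == "__main__":
    print(call_signature("search", {"query": "Acme  Corp", "limit": 5})
          == call_signature("search", {"limit": 5, "query": "acme corp"}))  # True
```

Inside the step function it would be called as `call_signature(part.function_call.name, part.function_call.args)`, assuming `args` can be read as a mapping, as it can in `google-generativeai`. The trade-off: name-only signatures catch loops where the model keeps tweaking arguments, while argument-aware signatures avoid flagging legitimate repeated calls with different inputs.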
The Gemini long-context cost tiers explained
- Two pricing tiers for Gemini 1.5 Pro. Gemini 1.5 Pro has a sharp pricing cliff at 128k tokens: for prompts up to 128k tokens, input costs $1.25/M tokens; for longer prompts, input costs $2.50/M tokens (2x higher), and the output rate doubles as well, from $5.00/M to $10.00/M. The higher rate applies to the whole prompt, not just the tokens past 128k. For agents that accumulate tool results and conversation history, this cliff is easy to cross by step 10–15 of a complex task, and an agent loop that starts below it pays the higher rate on every iteration after it crosses.
- Setting the budget cap relative to the cliff. A practical budget cap for production Vertex AI agents: estimate the expected number of steps (N), estimate average context size per step, and set the cap at 2–3x the expected cost at the higher pricing tier. This ensures the guard fires well before runaway loops reach dangerous spend levels, while leaving enough headroom for legitimately complex tasks.
- Token-aware cap vs. dollar cap. For Vertex AI agents, a token-aware cap (for example, firing when a call’s input reaches 100k tokens, safely below the 128k boundary) is often more predictable than a pure dollar cap, because the cost per token jumps at the 128k boundary instead of staying constant. RunGuard supports both `max_usd` and `max_input_tokens`; using both in combination gives you cost protection and context-window safety in a single configuration.
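The sizing rule and the two caps described above can be sketched in a few lines. This is a hypothetical illustration, not RunGuard’s API: it assumes Gemini 1.5 Pro list prices of $1.25/$2.50 per 1M input tokens and $5.00/$10.00 per 1M output tokens below/above the 128k cliff, raises a plain `RuntimeError` in place of RunGuard’s exceptions, and applies the input-token cap per call.

```python
from dataclasses import dataclass

# ASSUMED Gemini 1.5 Pro list prices in USD per 1M tokens; verify against current pricing.
CLIFF_TOKENS = 128_000
INPUT_RATE = {"low": 1.25, "high": 2.50}
OUTPUT_RATE = {"low": 5.00, "high": 10.00}


def call_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one call; the prompt size selects the tier for input and output alike."""
    tier = "high" if input_tokens > CLIFF_TOKENS else "low"
    return (input_tokens * INPUT_RATE[tier] + output_tokens * OUTPUT_RATE[tier]) / 1_000_000


def suggested_cap_usd(steps: int, avg_input_tokens: int, avg_output_tokens: int,
                      headroom: float = 2.5) -> float:
    """Expected run cost priced at the higher tier, times a 2-3x headroom factor."""
    per_call = (avg_input_tokens * INPUT_RATE["high"]
                + avg_output_tokens * OUTPUT_RATE["high"]) / 1_000_000
    return steps * per_call * headroom


@dataclass
class RunBudget:
    """Hypothetical per-run guard: a dollar cap plus a per-call input-token cap."""
    max_usd: float
    max_input_tokens: int = 100_000  # fire before a prompt reaches the 128k cliff
    spent_usd: float = 0.0

    def check_before_call(self, input_tokens: int) -> None:
        """Raise instead of sending a call that would break either cap."""
        if input_tokens > self.max_input_tokens:
            raise RuntimeError(
                f"prompt of {input_tokens} tokens exceeds cap of {self.max_input_tokens}")
        if self.spent_usd >= self.max_usd:
            raise RuntimeError(
                f"budget exhausted: ${self.spent_usd:.3f} spent of ${self.max_usd:.2f}")

    def record(self, input_tokens: int, output_tokens: int) -> None:
        """Add the actual cost of a completed call to the running total."""
        self.spent_usd += call_cost_usd(input_tokens, output_tokens)


if __name__ == "__main__":
    # Example: 12 expected steps averaging 60k input and 1k output tokens.
    cap = suggested_cap_usd(steps=12, avg_input_tokens=60_000, avg_output_tokens=1_000)
    print(f"suggested cap: ${cap:.2f}")  # suggested cap: $4.80
    budget = RunBudget(max_usd=cap)
    budget.check_before_call(60_000)  # passes: under both caps
    budget.record(60_000, 1_000)
```

Because the dollar check runs before each call, the call that pushes spend past `max_usd` still completes; a stricter guard would also estimate the next call’s cost from its prompt size before sending it.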
Vertex AI agent built-in controls vs. RunGuard
| Control | Vertex AI / ADK built-in | RunGuard |
|---|---|---|
| Per-run cost cap | Not supported | budget: max_usd — fires before each call |
| Function call loop detection | Not supported | loop: repeats=3 — catches same-function loops |
| Sub-agent delegation loop | Not supported | sig_fn can extract delegation target from ADK responses |
| Max iterations | ADK `RunConfig.max_llm_calls` (caps LLM calls per run) | Not needed (RunGuard loop detector fires first) |
| Long-context pricing cliff detection | Not supported | max_input_tokens cap fires before 128k cliff |
| Alert on budget exceeded | Cloud Billing budget alerts (billing-account or project level, not per-run) | alerts: slack_webhook or pagerduty_key per run |