Vertex AI Agent Builder Cost Control: Loop Detection and Budget Enforcement in Production

Vertex AI Agent Builder is Google Cloud's managed platform for deploying conversational agents powered by Gemini models. Released into general availability in 2024, it unifies two previously separate products: Dialogflow CX (structured flows and pages) and a newer Playbook-based paradigm where you describe agent behavior in natural language instructions rather than explicit state machine transitions. In a Playbook agent, you define goals and examples in plain language, attach tools (OpenAPI-defined external APIs, code snippets, or Vertex AI Search data stores), and the Gemini model decides which tools to invoke, in what order, and when to respond.

The managed orchestration means you submit a user message via detect_intent and receive a response — the model's tool selection, invocations, result injection, and final answer generation all happen server-side in a single synchronous round-trip. This is the same architectural trade-off as Azure AI Agent Service and AWS Bedrock Agents: the managed loop is efficient and requires no orchestration code on your side, but cost-related failure modes that manifest as patterns across tool calls within a turn, or across consecutive turns within a session, are invisible to the platform's built-in safeguards.

This post covers four failure modes specific to Vertex AI Agent Builder's Playbook architecture and shows how to build a VertexAgentBreaker circuit breaker that wraps the google-cloud-dialogflow-cx session client and intercepts all four before the bill lands.

Why session limits are not a circuit breaker

Vertex AI Agent Builder exposes session-level configuration for controlling agent behavior: a max_turns parameter (when set via API) caps the number of conversation turns in a session, and session TTLs expire idle sessions after a configurable timeout. When you need finer per-query control, you can set token count guidance in your Playbook instructions or via system prompt framing — but these are soft guidance to the model, not enforced limits.

None of these mechanisms detect behavioral patterns. A session turn counter increments once per detect_intent call regardless of how many tool invocations occurred within that turn, whether those invocations made any forward progress, or whether the same tool was called eight times with near-identical parameters. A session TTL expires the session after a period of inactivity, not after the agent has spent N tokens unsuccessfully.

The distinction between a counter and a circuit breaker matters acutely for Playbook agents because:

Playbook agents make tool calls entirely server-side. Unlike code-first frameworks where you write the tool dispatch loop and can insert detection logic inline, here the only observation surface is the DetectIntentResponse and the diagnostic information it carries.
Gemini models in Agent Builder are trained to be helpful and persistent. A Playbook whose goal is "answer the user's question by searching the data store" will continue querying the data store across turns if it hasn't found an answer — this is intentional behavior that becomes a failure mode when the right answer isn't in the data store at all.
Multi-turn session context accumulates automatically. You do not opt in to session history — by default, Vertex AI Agent Builder sends the full conversation history to the model on every detect_intent call. A 40-turn support session adds substantial token overhead to every turn that follows.

A pattern-aware circuit breaker fills these gaps by observing response structure across turns, tracking tool call sequences, grounding query repetition, cross-playbook routing decisions, and session-level token accumulation — and tripping before the pattern compounds into a significant bill.

Vertex AI Agent Builder architecture overview

A brief orientation to the components involved in the four failure modes:

Sessions are the durable conversation containers. Each session has an ID you generate, lives at projects/{project}/locations/{location}/agents/{agent}/sessions/{session_id}, and accumulates ConversationTurn records across calls. The session context — all prior user and agent messages — is injected as input on every detect_intent call.
Playbooks are natural-language instruction sets attached to an agent. A Playbook defines a goal (e.g., "Help the user troubleshoot their order"), a set of steps, and example conversations. The Gemini model reads the Playbook on every turn and decides which action to take next.
Tools are callable functions attached to a Playbook — either OpenAPI-spec'd external APIs, inline code snippets (Code Interpreter), or Vertex AI Search data stores for grounding. Tool calls are resolved server-side; you receive the tool call inputs and outputs in the diagnostic_info map of the response.
Multi-playbook agents use a root Playbook that can hand off to sub-Playbooks via a TRANSFER_TO_PLAYBOOK action. The sub-Playbook can escalate back to the root via a resume action. This routing is fully managed and opaque unless you parse the diagnostic information in each response.

Failure mode 1: Playbook tool invocation spiral

A Playbook whose goal depends on a tool call that doesn't converge — for example, a search tool where the results never satisfy the Playbook's success criteria — will call that tool repeatedly within a single session turn or across consecutive turns, each time with near-identical input parameters.

Vertex AI Agent Builder processes all tool calls for a given turn on the server side. From the client, each detect_intent response carries a query_result.diagnostic_info proto struct that, when populate_diagnostic_info is set in the query parameters, includes the sequence of actions taken during that turn — including tool call inputs and outputs. In the v3beta1 API, this diagnostic map has an Action Results key whose value contains the ordered list of tool invocations.

The spiral detection logic: after each detect_intent call, parse the tool call inputs from the diagnostic info, compute a token-set fingerprint of each call, and keep a session-wide history of recent calls that is filtered by tool name. When the same tool appears in the most recent window three or more times and the average Jaccard similarity between its consecutive call fingerprints is ≥ 0.80, the agent is treated as spiraling.
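
To make the threshold concrete before the full implementation, here is a small standalone sketch of the fingerprint and similarity steps. The tool name search_orders and its inputs are hypothetical, chosen only for illustration.

```python
import re


def token_set(text: str) -> frozenset:
    """Lowercased word tokens of a string."""
    return frozenset(re.findall(r"\w+", text.lower()))


def jaccard(a: frozenset, b: frozenset) -> float:
    """Intersection over union; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def fingerprint(tool_name: str, inputs: dict) -> frozenset:
    """Token set built from the tool name plus sorted key=value pairs."""
    raw = tool_name + " " + " ".join(f"{k}={v}" for k, v in sorted(inputs.items()))
    return token_set(raw)


# Hypothetical tool calls, for illustration only.
call_1 = fingerprint("search_orders", {"query": "refund policy damaged items"})
call_2 = fingerprint("search_orders", {"query": "refund policy damaged items please"})
call_3 = fingerprint("search_orders", {"query": "shipping times europe"})

near_duplicate = jaccard(call_1, call_2)  # 6 shared tokens out of 7
unrelated = jaccard(call_1, call_3)       # only the tool name and key are shared
```

The near-duplicate pair scores 6/7 (about 0.86) and would count toward a spiral at the 0.80 default, while the unrelated query scores 2/9 and would not. Because the tool name and parameter keys are part of the fingerprint, calls with very short inputs start from a higher baseline similarity than calls with long ones.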

from collections import defaultdict
from dataclasses import dataclass, field
from enum import Enum
import re
import time
from typing import Any

from google.cloud import dialogflowcx_v3beta1 as dialogflow
from google.protobuf import struct_pb2


class BreakerState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class AgentBreakerTripError(Exception):
    def __init__(self, reason: str, detail: str):
        self.reason = reason
        self.detail = detail
        super().__init__(f"[{reason}] {detail}")


def _token_set(text: str) -> frozenset:
    return frozenset(re.findall(r"\w+", text.lower()))


def _jaccard(a: frozenset, b: frozenset) -> float:
    if not a and not b:
        return 1.0
    union = a | b
    return len(a & b) / len(union)


def _call_fingerprint(tool_name: str, inputs: dict) -> frozenset:
    raw = tool_name + " " + " ".join(f"{k}={v}" for k, v in sorted(inputs.items()))
    return _token_set(raw)


@dataclass
class BreakerConfig:
    spiral_window: int = 5
    spiral_similarity: float = 0.80
    spiral_min_repeats: int = 3
    grounding_window: int = 3
    grounding_similarity: float = 0.75
    escalation_max_cycles: int = 3
    drift_max_turns: int = 40  # declared for tuning, but not read by any check below
    drift_max_input_chars: int = 80_000
    reset_timeout_s: float = 60.0


@dataclass
class SessionState:
    tool_history: list = field(default_factory=list)
    grounding_query_history: list = field(default_factory=list)
    playbook_route_history: list = field(default_factory=list)
    turn_count: int = 0
    total_input_chars: int = 0


class VertexAgentBreaker:
    """Circuit breaker wrapping Vertex AI Agent Builder detect_intent."""

    def __init__(
        self,
        project_id: str,
        location: str,
        agent_id: str,
        config: BreakerConfig | None = None,
    ):
        self._project_id = project_id
        self._location = location
        self._agent_id = agent_id
        self._cfg = config or BreakerConfig()
        self._state = BreakerState.CLOSED
        self._tripped_at: float | None = None
        self._client = dialogflow.SessionsClient()
        self._sessions: dict[str, SessionState] = {}

    # ── public API ──────────────────────────────────────────────────────────

    def detect_intent(
        self,
        session_id: str,
        user_message: str,
        language_code: str = "en",
        query_params: dialogflow.QueryParameters | None = None,
    ) -> dialogflow.DetectIntentResponse:
        if self._state == BreakerState.OPEN:
            elapsed = time.monotonic() - (self._tripped_at or 0)
            if elapsed < self._cfg.reset_timeout_s:
                raise AgentBreakerTripError(
                    "CIRCUIT_OPEN",
                    f"Breaker is OPEN. Retry after {self._cfg.reset_timeout_s - elapsed:.1f}s.",
                )
            self._state = BreakerState.HALF_OPEN

        session_state = self._sessions.setdefault(session_id, SessionState())

        # Pre-call drift check on session size
        message_chars = len(user_message)
        session_state.total_input_chars += message_chars
        session_state.turn_count += 1
        self._check_session_drift(session_state)

        params = query_params or dialogflow.QueryParameters(
            populate_diagnostic_info=True
        )

        session_path = self._client.session_path(
            self._project_id, self._location, self._agent_id, session_id
        )
        request = dialogflow.DetectIntentRequest(
            session=session_path,
            query_input=dialogflow.QueryInput(
                text=dialogflow.TextInput(text=user_message),
                language_code=language_code,
            ),
            query_params=params,
        )

        response = self._client.detect_intent(request=request)

        # Post-call pattern analysis
        self._analyze_response(response, session_state)

        if self._state == BreakerState.HALF_OPEN:
            self._state = BreakerState.CLOSED
            self._tripped_at = None

        return response

    def reset_session(self, session_id: str) -> None:
        self._sessions.pop(session_id, None)

    # ── analysis ────────────────────────────────────────────────────────────

    def _analyze_response(
        self, response: dialogflow.DetectIntentResponse, state: SessionState
    ) -> None:
        result = response.query_result
        diagnostic = result.diagnostic_info

        tool_calls = self._extract_tool_calls(diagnostic)
        for tool_name, call_inputs in tool_calls:
            fp = _call_fingerprint(tool_name, call_inputs)
            state.tool_history.append((tool_name, fp))
            self._check_tool_spiral(state, tool_name)

            if "data_store" in tool_name.lower() or "search" in tool_name.lower():
                # Struct values may be numbers or nested values; coerce to text before tokenizing
                query_text = str(call_inputs.get("query") or call_inputs.get("input") or "")
                if query_text:
                    state.grounding_query_history.append(_token_set(query_text))
                    self._check_grounding_fixation(state)

        playbook_name = self._extract_current_playbook(diagnostic)
        if playbook_name:
            state.playbook_route_history.append(playbook_name)
            self._check_escalation_loop(state)

    def _check_tool_spiral(self, state: SessionState, tool_name: str) -> None:
        recent = [
            (t, fp) for t, fp in state.tool_history[-self._cfg.spiral_window:]
            if t == tool_name
        ]
        if len(recent) < self._cfg.spiral_min_repeats:
            return
        fps = [fp for _, fp in recent[-self._cfg.spiral_min_repeats:]]
        pairs = [_jaccard(fps[i], fps[i + 1]) for i in range(len(fps) - 1)]
        avg_sim = sum(pairs) / len(pairs) if pairs else 0.0
        if avg_sim >= self._cfg.spiral_similarity:
            self._trip(
                "TOOL_SPIRAL",
                f"Tool '{tool_name}' called {len(recent)}× with avg similarity "
                f"{avg_sim:.2f} >= {self._cfg.spiral_similarity} in last "
                f"{self._cfg.spiral_window} calls",
            )

    def _check_grounding_fixation(self, state: SessionState) -> None:
        history = state.grounding_query_history
        if len(history) < self._cfg.grounding_window:
            return
        recent = history[-self._cfg.grounding_window:]
        pairs = [_jaccard(recent[i], recent[i + 1]) for i in range(len(recent) - 1)]
        avg_sim = sum(pairs) / len(pairs) if pairs else 0.0
        if avg_sim >= self._cfg.grounding_similarity:
            self._trip(
                "GROUNDING_FIXATION",
                f"Data store search query repeated {self._cfg.grounding_window} times "
                f"with avg similarity {avg_sim:.2f} >= {self._cfg.grounding_similarity}",
            )

    def _check_escalation_loop(self, state: SessionState) -> None:
        route_history = state.playbook_route_history
        if len(route_history) < self._cfg.escalation_max_cycles * 2:
            return
        recent = route_history[-self._cfg.escalation_max_cycles * 2:]
        # detect an A→B→A→B→A→B ping-pong between at most two Playbooks
        if len(set(recent)) <= 2:
            # count overlapping bounces: every A→B→A triple counts once,
            # so a six-entry A→B→A→B→A→B window yields four
            cycles = sum(
                1 for i in range(1, len(recent) - 1)
                if recent[i] != recent[i - 1] and recent[i + 1] == recent[i - 1]
            )
            if cycles >= self._cfg.escalation_max_cycles:
                self._trip(
                    "ESCALATION_LOOP",
                    f"Multi-playbook escalation cycle detected {cycles}× between "
                    f"{set(recent)!r} in last {len(recent)} routing decisions",
                )

    def _check_session_drift(self, state: SessionState) -> None:
        if state.total_input_chars > self._cfg.drift_max_input_chars:
            self._trip(
                "SESSION_DRIFT",
                f"Cumulative session input chars {state.total_input_chars} exceeds "
                f"budget {self._cfg.drift_max_input_chars} after {state.turn_count} turns",
            )

    # ── helpers ─────────────────────────────────────────────────────────────

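    def _extract_tool_calls(
        self, diagnostic: struct_pb2.Struct
    ) -> list[tuple[str, dict]]:
        calls: list[tuple[str, dict]] = []
        raw = self._to_plain(diagnostic) if diagnostic else {}
        # Key names follow the diagnostic layout described above; adjust them
        # if your agent's diagnostic_info uses different keys.
        action_results = raw.get("Action Results", []) if isinstance(raw, dict) else []
        if isinstance(action_results, list):
            for action in action_results:
                if isinstance(action, dict):
                    tool_name = action.get("tool", action.get("toolName", ""))
                    inputs = action.get("input", action.get("parameters", {}))
                    if tool_name:
                        calls.append((str(tool_name), inputs if isinstance(inputs, dict) else {}))
        return calls

    def _extract_current_playbook(self, diagnostic: struct_pb2.Struct) -> str | None:
        raw = self._to_plain(diagnostic) if diagnostic else {}
        if not isinstance(raw, dict):
            return None
        name = raw.get("triggered_condition", raw.get("current_playbook"))
        return str(name) if name else None

    @staticmethod
    def _to_plain(value: Any) -> Any:
        """Recursively convert Struct-like mappings and lists to plain dict/list.

        The client may return diagnostic_info as a protobuf Struct or a mapping
        wrapper rather than a dict, so duck-type on Mapping and Sequence.
        """
        from collections.abc import Mapping, Sequence

        if isinstance(value, Mapping):
            return {str(k): VertexAgentBreaker._to_plain(v) for k, v in value.items()}
        if isinstance(value, Sequence) and not isinstance(value, (str, bytes)):
            return [VertexAgentBreaker._to_plain(v) for v in value]
        return value
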
    def _trip(self, reason: str, detail: str) -> None:
        self._state = BreakerState.OPEN
        self._tripped_at = time.monotonic()
        raise AgentBreakerTripError(reason, detail)

Wiring it into your application is a small change from the raw SDK:

import os
from google.cloud import dialogflowcx_v3beta1 as dialogflow

# Before:
client = dialogflow.SessionsClient()
session_path = client.session_path(PROJECT, LOCATION, AGENT_ID, session_id)
response = client.detect_intent(request=dialogflow.DetectIntentRequest(...))

# After:
from vertex_agent_breaker import VertexAgentBreaker, AgentBreakerTripError, BreakerConfig

breaker = VertexAgentBreaker(
    project_id=os.environ["VERTEX_PROJECT_ID"],
    location="us-central1",
    agent_id=os.environ["VERTEX_AGENT_ID"],
    config=BreakerConfig(spiral_min_repeats=3, grounding_window=3),
)

try:
    response = breaker.detect_intent(session_id=session_id, user_message=user_input)
except AgentBreakerTripError as exc:
    print(f"Breaker tripped: {exc.reason} — {exc.detail}")
    # return a fallback response, log the event, or escalate to human

Failure mode 2: Data store grounding query fixation

Vertex AI Agent Builder supports attaching a Vertex AI Search data store to a Playbook as a grounding tool — the agent can query the data store to retrieve relevant documents and ground its response. When the data store doesn't contain an answer to the user's question, a Playbook instructed to "only answer from the data store" will query it repeatedly across turns, each time hoping for different results.

Each data store lookup generates a Vertex AI Search API call (billed separately from the Gemini inference) and adds the retrieved documents to the model's context window. For large data stores with long documents, three failed lookup attempts can add several thousand tokens to the context before the Playbook admits it doesn't have an answer.

The grounding fixation check in VertexAgentBreaker identifies data store tool calls by name (any tool whose name contains data_store or search, customizable per deployment), extracts the query parameter from the invocation inputs, and computes Jaccard similarity between consecutive query token sets. When the average similarity across the last three queries is ≥ 0.75, the breaker trips with the GROUNDING_FIXATION reason.

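The threshold asymmetry between tool spiral (0.80) and grounding fixation (0.75) is intentional. A search query that adds one term each iteration is still semantically fixated, and a slightly lower threshold catches more of those expansions. How much it catches depends on query length: appending one new word to an n-word query scores n/(n+1), so "order status tracking" → "order status delivery tracking" scores 3/4 = 0.75, while the shorter step "order status" → "order status tracking" scores only 2/3 ≈ 0.67. Averaged over that three-query sequence the similarity is about 0.71, just under the default, so agents that issue very short queries may need a grounding_similarity closer to 0.70. The token-set approach compares words rather than meanings: it handles expansion predictably because each added word lowers the overlap ratio by a known amount, but replacing a word with a true synonym counts as one token removed and one added.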
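
One way to choose grounding_similarity is to ask how long a query must be before a one-word expansion still clears the threshold. The sketch below works that out and scores the example query sequence. It is a standalone illustration, not part of the breaker, and the helper names are made up for this example.

```python
import re


def token_set(text: str) -> frozenset:
    """Lowercased word tokens of a query string."""
    return frozenset(re.findall(r"\w+", text.lower()))


def jaccard(a: frozenset, b: frozenset) -> float:
    """Intersection over union; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def min_tokens_for_one_word_expansion(threshold: float) -> int:
    """Smallest n for which growing an n-token query by one token scores >= threshold."""
    n = 1
    while n / (n + 1) < threshold:
        n += 1
    return n


queries = ["order status", "order status tracking", "order status delivery tracking"]
sets = [token_set(q) for q in queries]
pair_scores = [jaccard(sets[i], sets[i + 1]) for i in range(len(sets) - 1)]
average_score = sum(pair_scores) / len(pair_scores)

needed_at_075 = min_tokens_for_one_word_expansion(0.75)
needed_at_080 = min_tokens_for_one_word_expansion(0.80)
```

Under this model a one-word expansion only reaches 0.75 once the query already has three tokens, and 0.80 once it has four. The example sequence starts from two tokens, which is why its average lands near 0.71.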

Failure mode 3: Multi-playbook escalation loop

Vertex AI Agent Builder supports multi-playbook architectures where a root Playbook routes to specialized sub-Playbooks based on user intent. A routing failure — where the sub-Playbook cannot satisfy the goal and escalates back to root, and root re-routes to the same sub-Playbook because the intent classification hasn't changed — creates an escalation cycle that generates a full Gemini inference call on every hop.

The routing decision is exposed via the diagnostic_info field in each DetectIntentResponse. By parsing the current Playbook name from the diagnostic map and maintaining a history of routing decisions across turns in the session, the breaker detects alternating or cyclic patterns. A cycle where root → specialized → root → specialized → root → specialized appears three times within a session window trips the ESCALATION_LOOP reason.

This failure mode differs from the tool spiral in an important way: each hop in an escalation cycle is a separate detect_intent call, not a sub-call within a single turn. The session turn counter increments with each hop, so a six-turn escalation loop looks like six legitimate turns to the built-in session limit. Only a circuit breaker that tracks the sequence of Playbooks visited across those turns can identify the pattern.

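Configuring escalation_max_cycles requires understanding your intended routing depth. An agent with a legitimate three-level deep escalation (root → general support → technical support → L2 engineering) should set escalation_max_cycles to 4 or higher. Note that the check as written only fires when the inspected window contains at most two distinct Playbooks, so a chain that passes through three or more Playbooks will not trip it; the setting mainly controls how many bounces between one pair are tolerated. For a two-Playbook architecture (root + one specialist), the default of 3 is a reasonable starting point: one hop out and one hop back is an intentional round-trip, and a third consecutive round-trip between the same pair indicates the sub-Playbook is not resolving the issue.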
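
The window logic is easier to reason about on concrete routing histories. This standalone sketch re-implements the escalation check as a pure function under the same rules (window of 2 × max_cycles entries, at most two distinct Playbooks). The Playbook names are hypothetical.

```python
def count_bounces(routes: list[str]) -> int:
    """Count positions where routes[i-1] -> routes[i] -> routes[i+1] is an A -> B -> A bounce."""
    return sum(
        1
        for i in range(1, len(routes) - 1)
        if routes[i] != routes[i - 1] and routes[i + 1] == routes[i - 1]
    )


def is_escalation_loop(routes: list[str], max_cycles: int = 3) -> bool:
    """True when the last 2 * max_cycles routes ping-pong between at most two Playbooks."""
    window = routes[-max_cycles * 2:]
    if len(window) < max_cycles * 2 or len(set(window)) > 2:
        return False
    return count_bounces(window) >= max_cycles


# Hypothetical routing histories, one entry per turn.
ping_pong = ["root", "billing", "root", "billing", "root", "billing"]
deep_chain = ["root", "general", "technical", "l2", "technical", "general"]
too_short = ["root", "billing", "root", "billing"]
settled = ["billing"] * 6
```

Only ping_pong trips at the default setting. The deep chain never trips regardless of max_cycles because its window holds four distinct Playbooks, which is worth knowing before relying on this check for multi-level routing trees.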

Failure mode 4: Session context token drift

Every detect_intent call in an active session includes the full prior conversation history — all user messages and agent responses — as part of the model's input context. Vertex AI Agent Builder injects this automatically; there is no client-side option to control it without explicit session management.

In a support use case with an agent that handles complex multi-step issues, a 40-turn session accumulates substantial context. If the user message in turn 40 is "please summarize what we've discussed," the input to the Gemini model includes the 39 prior turns plus the current message. This session-length overhead dominates the token count and means the cost of turn 40 is far higher than the cost of turn 1 — even though the user message is shorter.

Token drift becomes a failure mode when a session grows without the user making meaningful progress. A confused user cycling through troubleshooting steps that don't resolve their issue can generate a 60-turn session where each turn costs more than the last. The VertexAgentBreaker tracks the cumulative character count of the user messages it sends as a rough proxy for input token growth (Vertex AI charges by character at the Dialogflow CX billing tier for Agent Builder, and roughly linearly by token for the underlying Gemini model). Agent responses and tool outputs also land in the context but are not counted by this proxy, so treat the budget as a lower bound on real usage.

The pre-call check fires before the detect_intent request when the cumulative session input has exceeded the configured budget. This prevents spending on the turn that would push cost above the limit rather than waiting for the response to arrive. For the threshold, use approximately 80,000 characters as a starting point — this corresponds to roughly 20,000 tokens at a 4:1 character-to-token ratio, a reasonable session limit for most support use cases.
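
A quick back-of-the-envelope calculation shows why replayed history dominates cost. The sketch below assumes a hypothetical session in which every turn adds 2,000 characters of user message plus agent reply, and that the whole transcript is resent on each turn. Both figures are illustrative assumptions, not measurements.

```python
def resent_chars_per_turn(chars_added_per_turn: list[int]) -> list[int]:
    """Context size on each turn when the whole transcript so far is replayed."""
    sizes, running = [], 0
    for added in chars_added_per_turn:
        running += added
        sizes.append(running)
    return sizes


def estimate_tokens(chars: int, chars_per_token: float = 4.0) -> int:
    """Rough token estimate using a 4:1 character-to-token ratio."""
    return round(chars / chars_per_token)


# Hypothetical session: 40 turns, each adding 2,000 characters to the transcript.
per_turn = resent_chars_per_turn([2_000] * 40)

turn_1_tokens = estimate_tokens(per_turn[0])
turn_40_tokens = estimate_tokens(per_turn[-1])
session_tokens = estimate_tokens(sum(per_turn))
```

Under these assumptions turn 1 carries about 500 input tokens and turn 40 about 20,000. Summed over the session, the model processes roughly 410,000 input tokens even though the transcript itself is only about 20,000 tokens long, because per-turn cost grows linearly and session cost grows quadratically with turn count.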

BreakerConfig tuning

Parameter	Default	Raise when	Lower when
spiral_window (int)	5	Agents with legitimate multi-step tool use that calls the same tool across several turns with evolving inputs	Agents where any repeated tool call within a short window indicates a problem
spiral_similarity (float)	0.80	Tool calls where input parameters legitimately overlap heavily across calls (e.g. pagination: `{"page": 1}` → `{"page": 2}` has moderate overlap)	Agents where any near-duplicate call is suspicious; stricter enforcement environments
grounding_window (int)	3	Conversational agents where a user may naturally ask about a topic multiple turns in a row before moving on	Task-focused agents where repeating the same data store query twice is already suspicious
grounding_similarity (float)	0.75	Data stores where refined queries (adding synonyms, narrowing scope) are an intentional agent strategy	Narrow-domain agents where slight query variation still indicates fixation
escalation_max_cycles (int)	3	Deep multi-playbook routing trees where three round-trips between root and specialist is expected before resolution	Flat two-playbook architectures where a second round-trip already signals routing misconfiguration
drift_max_input_chars (int)	80,000	Long-session workflows (legal research, document drafting) where 60+ turn sessions are legitimately expected	Short-answer support agents where sessions should not exceed 15–20 turns

For initial deployment, run the breaker in logging mode for a week before enabling enforcement: replace self._trip() with a structured log write and increment a metric counter. This surfaces the trip rate and which reasons fire, letting you tune thresholds to eliminate false positives before the breaker starts raising.
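
A logging-mode switch can be as small as the sketch below. TripHandler and TripError are hypothetical stand-ins for the breaker's _trip method and AgentBreakerTripError, written standalone so the pattern is visible without the Google Cloud client installed. The in-memory counter stands in for whatever metrics client you use.

```python
import logging
from collections import Counter

logger = logging.getLogger("agent_breaker")


class TripError(Exception):
    """Stand-in for AgentBreakerTripError in this sketch."""


class TripHandler:
    """Records every trip; raises only when enforcement is switched on."""

    def __init__(self, enforce: bool = False):
        self.enforce = enforce
        self.counts: Counter = Counter()  # stand-in for a real metric counter

    def trip(self, reason: str, detail: str) -> None:
        self.counts[reason] += 1
        logger.warning(
            "breaker_trip reason=%s enforce=%s detail=%s", reason, self.enforce, detail
        )
        if self.enforce:
            raise TripError(f"[{reason}] {detail}")


# Logging mode: the trip is counted and logged, and the call path continues.
shadow = TripHandler(enforce=False)
shadow.trip("TOOL_SPIRAL", "example detail")
```

Keeping the enforce flag in configuration rather than in code means the same build can move from observation to enforcement without a redeploy.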

Connecting to Google Cloud Monitoring

Vertex AI Agent Builder emits session-level telemetry to Cloud Monitoring automatically. The VertexAgentBreaker's trip events complement the native telemetry:

Forward all AgentBreakerTripError events to a custom Cloud Monitoring metric (namespace custom.googleapis.com/runguard/agent_breaker) with labels for agent_id, session_id, and trip_reason. The Cloud Monitoring custom metrics API accepts data points via the Monitoring client library.
Create a Cloud Monitoring alert policy that fires when any trip_reason label exceeds 5 events per hour — this surfaces systematic Playbook instruction problems that cause recurring spirals rather than one-off incidents.
Log the full detail string from each AgentBreakerTripError to Cloud Logging as a structured JSON log entry with severity=WARNING and a labels.trip_reason field. Cloud Logging automatically indexes label values, making it easy to filter for GROUNDING_FIXATION events and correlate them with data store query latency from the same time window.
Wire a SESSION_DRIFT trip to a PagerDuty or Google Chat webhook via a Cloud Monitoring notification channel — session drift that reaches the budget cap indicates a user in a stuck troubleshooting loop, which may warrant a human handoff rather than a breaker response message.

This mirrors the layered observability pattern described in our AI Agent Cost Engineering Production Guide — the VertexAgentBreaker is the trip sensor; Cloud Monitoring is the alarm layer; Cloud Logging is the incident log.
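
For the logging step, a minimal sketch of a structured entry follows. It assumes the service runs somewhere that ingests JSON lines from stdout into Cloud Logging (for example Cloud Run or GKE) and that the severity and logging.googleapis.com/labels keys are interpreted as special fields there. Verify both against your runtime before relying on them. The function name and the sample IDs are hypothetical.

```python
import json


def trip_log_entry(agent_id: str, session_id: str, reason: str, detail: str) -> str:
    """Build one JSON log line describing a breaker trip."""
    entry = {
        "severity": "WARNING",
        "message": f"Breaker tripped: {reason}",
        "detail": detail,
        # Assumed special key for log labels; confirm for your logging agent.
        "logging.googleapis.com/labels": {
            "trip_reason": reason,
            "agent_id": agent_id,
            "session_id": session_id,
        },
    }
    return json.dumps(entry, sort_keys=True)


# One line per trip event, written to stdout.
print(trip_log_entry("agent-123", "session-abc", "GROUNDING_FIXATION", "3 similar queries"))
```

Session IDs are fine as log labels, but they are usually too high-cardinality for metric labels, so consider dropping session_id from the custom metric and keeping it only in the log entry.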

Session management to prevent drift at the source

Session context drift is the failure mode most worth preventing at design time rather than detecting at runtime. Four patterns reduce drift structurally in Vertex AI Agent Builder:

Session-per-task, not session-per-user. Create a new session for each distinct user goal rather than one session per user that accumulates all conversations indefinitely. The cost of a new session is a UUID generation; the cost of a 60-turn session on the next query is substantial. For authentication-dependent use cases where session continuity is required, use a session summarization handoff: when a session exceeds N turns, generate a summary assistant message and start a new session with only the summary as prior context.
Explicit context window management via Playbook instructions. Include a step in your Playbook such as "If the conversation is getting long, summarize what has been established so far and ask the user to confirm before proceeding." This is a soft signal to the model to compress context, not a hard limit — but it reduces the likelihood of the model generating long verbose responses that inflate subsequent turn input sizes.
Set session TTLs deliberately for asynchronous channels. Dialogflow CX sessions stay active for 30 minutes after the last request by default, and the session_ttl query parameter can extend that to as much as 24 hours. For channels where a user might return hours later and start a new topic, keep the TTL short (e.g., 30 minutes for web chat) so old context doesn't inflate cost on what is functionally a new conversation.
Fresh sessions for API-triggered agents. For agents that handle structured API requests rather than open-ended conversation (data enrichment, classification pipelines, automated report generation), do not reuse sessions across requests. Each invocation should use a fresh session ID, so the context window contains only the current request and no historical noise.
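
The summarization handoff from the first pattern can be sketched as a small rotation helper. Everything here is hypothetical scaffolding: SessionRotator and TaskSession are illustrative names, and summarize is a callable you would supply (for example, one extra agent call that asks for a recap of the old session).

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TaskSession:
    session_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    turns: int = 0
    carried_summary: str = ""  # recap to send as the first context of the new session


class SessionRotator:
    """Hands out a session per turn, rotating to a fresh one at a turn cap."""

    def __init__(self, max_turns: int = 20):
        self.max_turns = max_turns
        self.current = TaskSession()

    def next_turn(self, summarize: Callable[[str], str]) -> TaskSession:
        # At the cap, summarize the old session and start a new one that
        # carries only the summary forward.
        if self.current.turns >= self.max_turns:
            summary = summarize(self.current.session_id)
            self.current = TaskSession(carried_summary=summary)
        self.current.turns += 1
        return self.current
```

The caller would send carried_summary as the opening message of the new session, so the model sees a short recap instead of the full transcript.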

FAQ

Does VertexAgentBreaker work with Vertex AI Agent Engine (formerly Reasoning Engine), or only Playbook-based agents?

The VertexAgentBreaker as shown wraps the dialogflowcx_v3beta1 Sessions client, which is the API for Playbook-based Vertex AI Conversational Agents. Vertex AI Agent Engine (the serverless Python agent runtime) uses a different client — vertexai.preview.reasoning_engines.ReasoningEngine — with a query() method and event stream. The same detection logic (tool spiral, grounding fixation, escalation loop, session drift) applies conceptually to Agent Engine deployments, but the integration point is different: you would wrap the agent.query() call and parse the streamed events rather than the DetectIntentResponse diagnostic info.

The diagnostic_info field is only available in v3beta1 — should I use a beta API in production?

Vertex AI Agent Builder's dialogflowcx_v3beta1 API has been stable in practice for over a year and Google promotes it in Agent Builder documentation. The populate_diagnostic_info flag is the only way to access server-side tool call detail without maintaining separate logging infrastructure. If your organization has a strict GA-only policy, you can deploy the breaker in degraded mode: omit the populate_diagnostic_info flag, skip tool spiral and grounding fixation detection, and rely only on session drift (which requires no diagnostic info) and escalation loop detection (which uses the current page/playbook name from the GA query_result.current_page field). This gives you partial protection with zero beta API exposure.

How do I handle a tripped breaker gracefully from the user's perspective?

The recommended pattern is to catch AgentBreakerTripError, log the event, and return a fixed fallback response rather than forwarding the error to the user. For a support agent, this fallback might be: "I'm having trouble answering this question through our automated system. Let me connect you with a support specialist." Classify the trip reason before generating the fallback: GROUNDING_FIXATION indicates the data store doesn't contain the answer (surface an escalation path), SESSION_DRIFT suggests the user is stuck and needs a human (route to live support), and TOOL_SPIRAL or ESCALATION_LOOP indicates a Playbook configuration problem (log with high severity for engineering review).
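
The classification step can be a plain lookup table. In this sketch the route names (escalation_path, live_support, engineering_review) and the severity choices are hypothetical placeholders for whatever your support tooling uses. Only the trip reason strings come from the breaker.

```python
from dataclasses import dataclass

GENERIC_MESSAGE = (
    "I'm having trouble answering this question through our automated system. "
    "Let me connect you with a support specialist."
)


@dataclass(frozen=True)
class Fallback:
    route: str          # where to send the conversation next (hypothetical names)
    log_severity: str   # severity to log the trip with
    user_message: str   # fixed text shown to the user


FALLBACKS = {
    "GROUNDING_FIXATION": Fallback("escalation_path", "WARNING", GENERIC_MESSAGE),
    "SESSION_DRIFT": Fallback("live_support", "WARNING", GENERIC_MESSAGE),
    "TOOL_SPIRAL": Fallback("engineering_review", "ERROR", GENERIC_MESSAGE),
    "ESCALATION_LOOP": Fallback("engineering_review", "ERROR", GENERIC_MESSAGE),
}

# Used for CIRCUIT_OPEN and any reason added later.
DEFAULT_FALLBACK = Fallback("live_support", "WARNING", GENERIC_MESSAGE)


def fallback_for(reason: str) -> Fallback:
    """Pick the fallback for a trip reason, defaulting to a human handoff."""
    return FALLBACKS.get(reason, DEFAULT_FALLBACK)
```

In the except block, fallback_for(exc.reason) gives you the message to return and the severity to log with, so the user never sees the raw exception text.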

Can I run VertexAgentBreaker in a multi-process deployment without state sharing?

The per-session state in VertexAgentBreaker._sessions is in-process memory, which means separate process replicas maintain separate state. For a single-replica deployment or a deployment where session affinity (sticky sessions) routes all turns of a session to the same replica, this is fine. For multi-replica deployments without session affinity, either implement session affinity at the load balancer level (Vertex AI Agent Builder session IDs are suitable as affinity keys), or externalize the SessionState to a shared store (Redis, Firestore) keyed by session_id. The SessionState dataclass holds only strings, integers, and sets of string tokens, so it serializes to JSON once the frozensets are converted to lists, and it is small enough to read and write on every turn.
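
A minimal sketch of that externalization step is shown below. The SessionState here mirrors the dataclass from the breaker so the example runs standalone. dumps_state and loads_state are hypothetical helper names, and the shared store itself is left out.

```python
import json
from dataclasses import dataclass, field


@dataclass
class SessionState:
    tool_history: list = field(default_factory=list)             # [(tool_name, frozenset)]
    grounding_query_history: list = field(default_factory=list)  # [frozenset]
    playbook_route_history: list = field(default_factory=list)   # [str]
    turn_count: int = 0
    total_input_chars: int = 0


def dumps_state(state: SessionState) -> str:
    """Serialize to JSON, turning each frozenset into a sorted list."""
    return json.dumps({
        "tool_history": [[name, sorted(fp)] for name, fp in state.tool_history],
        "grounding_query_history": [sorted(q) for q in state.grounding_query_history],
        "playbook_route_history": list(state.playbook_route_history),
        "turn_count": state.turn_count,
        "total_input_chars": state.total_input_chars,
    })


def loads_state(payload: str) -> SessionState:
    """Rebuild the state, restoring tuples and frozensets."""
    data = json.loads(payload)
    return SessionState(
        tool_history=[(name, frozenset(fp)) for name, fp in data["tool_history"]],
        grounding_query_history=[frozenset(q) for q in data["grounding_query_history"]],
        playbook_route_history=list(data["playbook_route_history"]),
        turn_count=data["turn_count"],
        total_input_chars=data["total_input_chars"],
    )
```

Because the histories are only ever read through a trailing window, you can also trim them to the largest configured window before writing, which keeps the stored document size bounded.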

Does this work with RunGuard's hosted dashboard?

The VertexAgentBreaker pattern shown here is a standalone implementation of RunGuard's circuit-breaker logic. RunGuard's SDK provides the same detection logic as a hosted service: trip events are forwarded to the RunGuard dashboard, which shows trip rate per agent, trip reason distribution, and session cost trend across your entire Vertex AI Agent Builder fleet — without requiring you to instrument each agent individually or manage per-session state. The Solo plan at $19/mo covers one agent with up to 1M guarded detect_intent calls per month.

Stop paying for loops

VertexAgentBreaker is one implementation of RunGuard's circuit-breaker pattern for managed agent platforms. RunGuard monitors playbook tool spirals, data store grounding fixation, multi-playbook escalation cycles, and session token drift across your Vertex AI Agent Builder fleet — alerting before a single looping session compounds into an unexpected Cloud bill.
