AWS Bedrock Agents Cost Control: Loop Detection and Budget Enforcement in Production

AWS Bedrock Agents is a fully managed service for deploying autonomous AI agents on AWS infrastructure. You define action groups — Lambda functions or OpenAPI schemas that become the agent's tools — and optionally connect knowledge bases for retrieval-augmented generation. The Bedrock orchestrator handles the reasoning loop: it calls the underlying model (Claude, Titan, Llama, or others), interprets tool use responses, invokes the appropriate action groups, and produces a final response.

The managed model means you don't write the ReAct loop yourself. The trade-off is that the loop is a black box: you can't intercept individual orchestration steps without a custom implementation. Bedrock exposes maxLength — a cap on total orchestration steps per invocation — as the primary built-in safety valve. Teams deploying agents with long-running tasks routinely raise this limit to 20, 40, or higher to avoid early termination. The higher you set it, the more expensive a runaway agent becomes.

This post covers four failure modes specific to Bedrock Agents' architecture and shows how to build a BedrockBreaker circuit breaker that wraps boto3's invoke_agent call and trips before the bill lands.

Why maxLength is not a circuit breaker

Bedrock's maxLength parameter (passed in sessionState or the agent configuration) sets the maximum number of orchestration steps the agent will take before returning a final response or an incomplete-action indicator. A step is one full cycle: model inference → tool selection → action group invocation → response processing.

A circuit breaker detects a pattern — repeated behavior that signals the agent is spending without converging — and halts specifically because progress has stalled. maxLength is a hard ceiling: it stops the agent after N steps regardless of whether those steps were productive. The distinction matters in both directions:

An agent legitimately working through a 30-step research task should not be blocked at step 15 because 15 happens to be the limit. Raising maxLength to 40 to accommodate legitimate tasks also raises the blast radius when the agent loops.
An agent calling the same Lambda action group with identical parameters ten times in a row has looped. A pattern-aware breaker trips after step 3 of the spiral; maxLength lets it run to the ceiling.
In multi-agent configurations, each sub-agent has its own maxLength. A supervisor that re-invokes a sub-agent five times because it doesn't like the response generates five independent step budgets — the per-invocation limit offers no protection against supervisor-level over-retry.

The four failure modes below each operate within what looks like valid Bedrock agent behavior at the individual step level. Only a pattern-level view across steps — or across supervisor/sub-agent invocation boundaries — reveals the loop.

Bedrock Agents architecture overview

Understanding the four failure modes requires a brief orientation to how Bedrock routes calls:

Action groups are Lambda functions (or OpenAPI-described HTTP endpoints) that Bedrock invokes as tools. Each action group can have multiple API paths. The model selects which action group and path to call; Bedrock proxies the call and feeds the response back into the orchestration loop as a tool result.
Knowledge bases are Bedrock-managed vector stores (backed by OpenSearch Serverless, Aurora, or other datastores). The orchestrator queries them automatically when the model determines retrieval is needed. Knowledge base retrieval is billed separately from agent model inference: embedding each query and running the vector store add to your token costs.
Sessions carry conversation history across multiple invoke_agent calls via a sessionId. Within a session, each invocation receives the prior conversation context, growing the input token count on every call.
Multi-agent collaboration (Bedrock's supervisor/collaborator model) lets one agent act as a supervisor that delegates to specialized sub-agents. The supervisor sends a task; the sub-agent runs its own full orchestration loop and returns a result; the supervisor resumes. Each actor has independent maxLength and billing.

Each of these components introduces a distinct looping failure mode.

Failure mode 1: Action group invocation spiral

Action group spirals occur when the model selects the same action group and API path repeatedly with near-identical parameters, each time receiving a response that doesn't satisfy the current reasoning step. The pattern: the model calls get_customer_order(order_id="ORD-7821"), receives a result, determines the result is incomplete, and calls the same action again — perhaps with a slightly different parameter (order_id="7821", without the prefix). The Lambda returns the same result. The model tries once more.

This failure mode appears most often when:

The action group's Lambda returns partial results for a query the model expected to be complete (e.g., paginated responses where the model doesn't handle the pagination token correctly).
The model's instructions tell it to "confirm the data before proceeding" but the confirmation tool is the same tool that fetched the data in the first place.
The action group response schema has an optional field the model is looking for; the field is absent; the model retries hoping it appears.

At maxLength=20, a spiral that repeats the same action group call 18 times before the ceiling hits costs 18 Lambda invocations (billed by AWS) plus 18 model inference steps (billed per token). The Lambda cost is usually negligible; the model cost is not. At a typical Bedrock input/output rate, 18 orchestration cycles on a moderately complex task can cost $0.50–$2.00 for a task that should have cost $0.05.
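The arithmetic behind figures like these can be sketched as follows. The prices (3 USD per million input tokens, 15 USD per million output tokens) and the per-step token counts are assumptions chosen for illustration, not quoted Bedrock rates. Substitute your model's current pricing and the token counts from your own traces.

```python
def orchestration_cost(steps: int, input_tokens_per_step: int, output_tokens_per_step: int,
                       usd_per_m_input: float = 3.00, usd_per_m_output: float = 15.00) -> float:
    """Estimate model cost for `steps` orchestration cycles with flat per-step token counts.

    Prices are illustrative assumptions (USD per million tokens), not official rates.
    Lambda charges are ignored because they are usually negligible next to inference.
    """
    per_step = (input_tokens_per_step * usd_per_m_input
                + output_tokens_per_step * usd_per_m_output) / 1_000_000
    return steps * per_step


# Hypothetical step profile: 8,000 input tokens (instructions, history, tool results)
# and 400 output tokens per orchestration cycle.
spiral = orchestration_cost(18, 8_000, 400)
healthy = orchestration_cost(2, 8_000, 400)
print(f"spiral: ${spiral:.2f}, healthy: ${healthy:.2f}")  # spiral: $0.54, healthy: $0.06
```

Input tokens usually grow from step to step as tool results accumulate in the orchestration context. A flat per-step count like this one therefore tends to understate a spiral's real cost.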

Detection requires tracking the sequence of action group invocations across the orchestration loop and computing parameter similarity between consecutive calls to the same action group. Token-set Jaccard similarity on the serialized parameter map is a simple, workable heuristic: if the same action group is called three or more times in a sliding window of five steps with parameter similarity of 0.80 or higher, the agent is likely spiraling rather than making progress.
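The rule above can be sketched as a standalone function, assuming each call has already been reduced to an (action group name, parameter dict) pair taken from the trace. The function names are illustrative and not part of any AWS SDK.

```python
def param_tokens(params: dict) -> set[str]:
    """Flatten a parameter map into lowercase tokens: every key plus each word of each value."""
    tokens: set[str] = set()
    for key, value in params.items():
        tokens.add(key.lower())
        tokens.update(str(value).lower().split())
    return tokens


def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def is_spiral(calls: list[tuple[str, dict]], window: int = 5,
              min_repeats: int = 3, threshold: float = 0.80) -> bool:
    """True if, within the last `window` calls, one action group was called at least
    `min_repeats` times and its consecutive calls were at least `threshold` similar."""
    by_group: dict[str, list[set[str]]] = {}
    for group, params in calls[-window:]:
        by_group.setdefault(group, []).append(param_tokens(params))
    for token_sets in by_group.values():
        if len(token_sets) < min_repeats:
            continue
        similar_pairs = sum(
            1 for prev, cur in zip(token_sets, token_sets[1:])
            if jaccard(prev, cur) >= threshold
        )
        if similar_pairs >= min_repeats - 1:
            return True
    return False


identical = [("get_customer_order", {"order_id": "ORD-7821"})] * 3
prefix_stripped = [
    ("get_customer_order", {"order_id": "ORD-7821"}),
    ("get_customer_order", {"order_id": "7821"}),
    ("get_customer_order", {"order_id": "ORD-7821"}),
]
print(is_spiral(identical), is_spiral(prefix_stripped))  # True False
```

The second case shows a limit of plain token-set Jaccard. "ord-7821" and "7821" are different tokens, so the prefix-stripping retry from the example above scores 1/3 and is not counted. Normalizing values before tokenizing (for example, keeping only the digits of ID-like fields) would catch that variant, at the cost of more false positives.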

Failure mode 2: Knowledge base RAG query fixation

Knowledge base query fixation occurs when the orchestrator issues semantically identical retrieval queries across consecutive orchestration steps. Bedrock's knowledge base integration is opaque — the orchestrator decides when to query it without exposing a callable hook in the standard invoke_agent API. However, the retrieval intent is visible in the model's reasoning: successive steps that contain "let me search for…" or tool use indicating KB retrieval with near-identical queries are a reliable signal.

The more direct detection surface is the trace output. When you call invoke_agent with enableTrace=True, Bedrock streams trace events that include knowledgeBaseLookupInput records, which contain the exact query string sent to the vector store. These arrive as trace events interleaved with the completion chunks in the streaming response. A circuit breaker that parses the trace stream can compare consecutive KB query strings and trip when similarity exceeds a threshold.
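Here is a sketch of that parsing against hand-built events. The nesting used here (event["trace"]["trace"]["orchestrationTrace"]["invocationInput"], with invocationType "KNOWLEDGE_BASE" and the query under knowledgeBaseLookupInput.text) follows the invoke_agent trace shape as we understand it; verify it against the response format in your boto3 version. The sample events are synthetic and trimmed to the fields used.

```python
from typing import Iterable, Optional


def kb_query(event: dict) -> Optional[str]:
    """Return the knowledge base query text carried by one stream event, or None."""
    orch = event.get("trace", {}).get("trace", {}).get("orchestrationTrace", {})
    invocation = orch.get("invocationInput", {})
    if invocation.get("invocationType") != "KNOWLEDGE_BASE":
        return None
    return invocation.get("knowledgeBaseLookupInput", {}).get("text") or None


def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def fixated(events: Iterable[dict], window: int = 3, threshold: float = 0.75) -> bool:
    """True if each consecutive pair among the last `window` KB queries is at least
    `threshold` similar by token-set Jaccard (defaults mirror BreakerConfig)."""
    queries = [q for q in map(kb_query, events) if q]
    if len(queries) < window:
        return False
    recent = [set(q.lower().split()) for q in queries[-window:]]
    return all(jaccard(a, b) >= threshold for a, b in zip(recent, recent[1:]))


def kb_event(text: str) -> dict:
    """Synthetic stream event trimmed to the fields kb_query reads."""
    return {"trace": {"trace": {"orchestrationTrace": {"invocationInput": {
        "invocationType": "KNOWLEDGE_BASE",
        "knowledgeBaseLookupInput": {"text": text},
    }}}}}


stream = [
    {"chunk": {"bytes": b"Let me check the refund policy."}},
    kb_event("refund policy for annual plans"),
    kb_event("refund policy annual plans"),
    kb_event("refund policy for annual plans"),
]
print(fixated(stream))  # True: both consecutive pairs score 4/5 = 0.8
```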

The cost structure is important: knowledge base queries add retrieval costs on top of model inference. Each query is embedded with the knowledge base's embedding model, and OpenSearch Serverless, the most common Bedrock KB backend, bills for search and indexing capacity (OCU-hours) and storage. An agent that fires 15 semantically identical KB queries in one session is paying for 15 retrieval operations, 15 × top-K document chunks injected back into context (inflating input tokens), and 15 model inference steps to process the retrieved content. The multiplier effect makes KB fixation one of the more expensive Bedrock failure modes per unit of compute time.

Query fixation typically appears when:

The KB doesn't contain information that satisfies the model's question (the document was never indexed, or the terminology doesn't match the query vocabulary).
The retrieved passages partially answer the question but leave a follow-up that the model resolves by re-querying with a refinement — which returns the same passages.
The agent is following an instruction to "look up X in the knowledge base before answering" without a mechanism to stop if X isn't found after N attempts.

Failure mode 3: Multi-agent supervisor cascade

Bedrock's multi-agent collaboration feature introduces a class of failure modes that can't be caught by any per-invocation limit. A supervisor agent delegates tasks to sub-agents via invoke_agent calls against collaborator agent IDs. Each sub-agent runs its own complete orchestration loop with its own maxLength budget. The supervisor receives the sub-agent's response, evaluates it, and — if dissatisfied — either re-invokes the same sub-agent with additional context or invokes a second sub-agent to verify.

The failure cascade has a multiplicative cost structure:

total_cost = supervisor_steps × (sub_agent_steps × token_cost_per_step)
           + sub_agent_invocations × per_invocation_overhead

A supervisor with maxLength=10 that invokes the same sub-agent five times because each response is "not quite right" generates five independent sub-agent runs, each of which can run to its own maxLength=15. Those five runs alone can reach 5 × 15 = 75 sub-agent inference cycles. In the worst case, where every one of the supervisor's 10 steps delegates, the formula gives 10 × (15 × token_cost) = 150 billable inference cycles, plus overhead for each cross-agent invocation and the supervisor's own steps. On tasks where the sub-agent legitimately uses multiple action groups per step, this compounds further.
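The bound can be written as a small helper. This sketch counts inference cycles only (no prices) and treats every step as one model call. It also assumes each delegation consumes one supervisor step. Unlike the formula above, it adds the supervisor's own steps, so the worst case comes to 150 sub-agent cycles plus 10 supervisor cycles. The function name is illustrative.

```python
def cascade_cycles(supervisor_steps: int, delegations: int, sub_agent_max_steps: int) -> int:
    """Upper bound on inference cycles for one supervisor invocation.

    Assumes each delegation uses one supervisor step and starts a fresh sub-agent
    run that may use its full step budget (its own maxLength).
    """
    if delegations > supervisor_steps:
        raise ValueError("each delegation consumes a supervisor step")
    sub_agent_cycles = delegations * sub_agent_max_steps
    return supervisor_steps + sub_agent_cycles


print(cascade_cycles(10, 5, 15))   # 85  = 10 supervisor + 75 sub-agent cycles
print(cascade_cycles(10, 10, 15))  # 160 = 10 supervisor + 150 sub-agent cycles
```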

The supervisor cascade failure mode is particularly common when:

The supervisor's instructions use quality criteria that the sub-agent's responses never fully satisfy (e.g., "delegate to the research agent until you have a complete answer").
The sub-agent returns a response that is technically correct but formatted differently than the supervisor expected, causing the supervisor to request reformatting — which requires another full sub-agent run.
Two sub-agents are asked to validate each other's output, creating a ping-pong where each flags the other's result as needing revision.

Detection at the supervisor level requires tracking the sub-agent invocation log — which sub-agent was called, with what input, and what it returned — and detecting when the supervisor is re-calling the same collaborator with semantically similar task inputs. A supervisor that invokes the same sub-agent more than three times in a single session without a diverging task specification is in a cascade.

Failure mode 4: Session token drift

Session token drift is not a loop in the traditional sense — it doesn't repeat a specific action. It's a cost accumulation failure mode that compounds across invoke_agent calls within a long-running session. Each invocation on a session with sessionId carries the prior conversation history as input context. As the session grows, the input token count per invocation grows with it. An agent handling a 20-message session sees dramatically more input tokens per step than it did at the start of the session.
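Here is a toy model of that growth. It assumes a fixed 2,000-token base context and 500 tokens of new transcript per turn; both numbers are hypothetical. Because each turn re-sends everything before it, cumulative input tokens grow quadratically with the number of turns, not linearly.

```python
def per_turn_input(turn: int, base_tokens: int, tokens_per_turn: int) -> int:
    """Input tokens for the `turn`-th invocation (0-based): base context plus all prior turns."""
    return base_tokens + turn * tokens_per_turn


def cumulative_input(turns: int, base_tokens: int, tokens_per_turn: int) -> int:
    """Total input tokens billed across `turns` invocations on one session."""
    return sum(per_turn_input(t, base_tokens, tokens_per_turn) for t in range(turns))


first = per_turn_input(0, 2_000, 500)
twentieth = per_turn_input(19, 2_000, 500)
total = cumulative_input(20, 2_000, 500)
print(first, twentieth, total)  # 2000 11500 135000
```

Under these assumptions, 20 turns bill 135,000 input tokens. If each turn carried only the base context, the total would be 40,000.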

The drift becomes a budget problem when:

The agent is handling a long customer support session where each new user message triggers a full invocation with the entire prior transcript as context.
The agent revisits resolved topics — asking the same clarifying question it already asked three turns earlier because the prior answer is buried in a long context — incurring cost for redundant reasoning.
The model starts summarizing or recapping prior steps within its response, inflating output tokens on steps where the actual task output is small.

Detection uses the token usage Bedrock reports in the trace stream. With enableTrace=True, each model call in the orchestration trace carries usage metadata (inputTokens and outputTokens under modelInvocationOutput.metadata.usage). A circuit breaker that accumulates these per session can trip in two cases: when input tokens grow faster than an allowed ratio for several consecutive steps, or when the session total crosses a budget ceiling.
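This standalone sketch covers both steps. It pulls per-call usage out of trace events, then applies a consecutive-growth-ratio check and a hard budget. The usage path (orchestrationTrace → modelInvocationOutput → metadata → usage) is an assumption to verify against the invoke_agent response shape in your boto3 version. The thresholds match the BreakerConfig defaults, and the events are synthetic.

```python
from typing import List, Optional, Tuple


def usage_from_event(event: dict) -> Optional[Tuple[int, int]]:
    """(inputTokens, outputTokens) if the event is an orchestration trace with usage, else None."""
    orch = event.get("trace", {}).get("trace", {}).get("orchestrationTrace", {})
    usage = orch.get("modelInvocationOutput", {}).get("metadata", {}).get("usage")
    if not usage:
        return None
    return usage.get("inputTokens", 0), usage.get("outputTokens", 0)


def drift_tripped(token_log: List[Tuple[int, int]], budget: int = 50_000,
                  window: int = 3, max_ratio: float = 1.25) -> bool:
    """Trip if the session total reaches `budget`, or if input tokens grew by more
    than `max_ratio` at every step across the last `window` model calls."""
    if sum(i + o for i, o in token_log) >= budget:
        return True
    recent = [i for i, _ in token_log[-window:]]
    if len(recent) < max(window, 2):  # need at least one step-to-step ratio
        return False
    return all(prev > 0 and cur / prev > max_ratio for prev, cur in zip(recent, recent[1:]))


def usage_event(input_tokens: int, output_tokens: int) -> dict:
    """Synthetic trace event trimmed to the usage fields."""
    return {"trace": {"trace": {"orchestrationTrace": {"modelInvocationOutput": {
        "metadata": {"usage": {"inputTokens": input_tokens, "outputTokens": output_tokens}},
    }}}}}


events = [usage_event(4_000, 300), usage_event(5_200, 300), usage_event(6_900, 300)]
token_log = [u for u in map(usage_from_event, events) if u]
print(drift_tripped(token_log))  # True: 5200/4000 = 1.30 and 6900/5200 ≈ 1.33, both > 1.25
```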

Building the BedrockBreaker

The BedrockBreaker wraps the bedrock-agent-runtime boto3 client's invoke_agent call. It uses enableTrace=True to stream trace events and parses them in real time, building the invocation log that feeds the four detection methods. When any method trips, the breaker stops consuming the stream and raises a CircuitBreakerTripError with a diagnostic payload.

import boto3
import json
import hashlib
from collections import defaultdict
from dataclasses import dataclass, field
from enum import Enum
from typing import Iterator

class BreakerState(Enum):
    CLOSED = "closed"      # normal operation
    OPEN = "open"          # tripped, rejecting calls
    HALF_OPEN = "half_open"  # probe allowed, watching for recovery

class TripReason(Enum):
    ACTION_GROUP_SPIRAL = "action_group_spiral"
    KB_QUERY_FIXATION = "kb_query_fixation"
    SUPERVISOR_CASCADE = "supervisor_cascade"
    SESSION_TOKEN_DRIFT = "session_token_drift"

class CircuitBreakerTripError(Exception):
    def __init__(self, reason: TripReason, detail: str, partial_response: str = ""):
        self.reason = reason
        self.detail = detail
        self.partial_response = partial_response
        super().__init__(f"BedrockBreaker tripped: {reason.value} — {detail}")

@dataclass
class BreakerConfig:
    # Action group spiral detection
    action_spiral_window: int = 5          # steps to look back
    action_spiral_min_repeats: int = 3     # same group in window to trip
    action_spiral_similarity: float = 0.80 # min Jaccard similarity to count as "same"

    # KB query fixation
    kb_fixation_window: int = 3            # consecutive near-identical queries to trip
    kb_fixation_similarity: float = 0.75   # min Jaccard similarity

    # Supervisor cascade
    supervisor_max_subagent_calls: int = 3  # same sub-agent invocations before trip
    supervisor_similarity: float = 0.70     # task input similarity threshold

    # Session token drift
    session_budget_tokens: int = 50_000    # total session input+output token cap
    session_drift_window: int = 3           # consecutive steps to measure growth rate
    session_drift_max_ratio: float = 1.25  # max allowed growth ratio per step

@dataclass
class InvocationRecord:
    action_group: str
    api_path: str
    params_hash: str
    params_tokens: set  # for Jaccard

@dataclass
class BreakerRunState:
    action_log: list = field(default_factory=list)  # InvocationRecord per step
    kb_query_log: list = field(default_factory=list)  # str per KB query
    token_log: list = field(default_factory=list)    # (input_tokens, output_tokens) per call
    session_tokens_total: int = 0
    partial_response_chunks: list = field(default_factory=list)

def _jaccard(a: set, b: set) -> float:
    if not a and not b:
        return 1.0
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)

def _param_tokens(params: dict) -> set:
    """Tokenize flattened param map for Jaccard comparison."""
    tokens = set()
    for k, v in params.items():
        tokens.add(k.lower())
        for word in str(v).lower().split():
            tokens.add(word[:12])  # cap token length
    return tokens

def _params_hash(params: dict) -> str:
    return hashlib.md5(json.dumps(params, sort_keys=True).encode()).hexdigest()[:8]

class BedrockBreaker:
    def __init__(
        self,
        agent_id: str,
        agent_alias_id: str,
        config: BreakerConfig = None,
        region: str = "us-east-1",
    ):
        self.agent_id = agent_id
        self.agent_alias_id = agent_alias_id
        self.config = config or BreakerConfig()
        self.state = BreakerState.CLOSED
        self.consecutive_trips = 0
        # (input_tokens, output_tokens) per model call, keyed by session ID and kept
        # across invoke() calls so the budget and drift checks cover the whole session.
        self._session_token_logs = defaultdict(list)
        self.client = boto3.client("bedrock-agent-runtime", region_name=region)

    def forget_session(self, session_id: str):
        """Drop stored token history once a session ends, so memory stays bounded."""
        self._session_token_logs.pop(session_id, None)

    def invoke(
        self,
        session_id: str,
        input_text: str,
        session_state: dict = None,
    ) -> str:
        """
        Wraps invoke_agent with circuit breaker protection.
        Returns the final text response.
        Raises CircuitBreakerTripError if a failure mode is detected.
        """
        if self.state == BreakerState.OPEN:
            raise CircuitBreakerTripError(
                TripReason.ACTION_GROUP_SPIRAL,
                "Breaker is OPEN — refusing invocation until reset or probe succeeds."
            )

        run = BreakerRunState()

        kwargs = dict(
            agentId=self.agent_id,
            agentAliasId=self.agent_alias_id,
            sessionId=session_id,
            inputText=input_text,
            enableTrace=True,
        )
        if session_state:
            kwargs["sessionState"] = session_state

        response = self.client.invoke_agent(**kwargs)
        final_text = self._consume_stream(response["completion"], run, session_id)

        # Successful run — reset HALF_OPEN to CLOSED
        if self.state == BreakerState.HALF_OPEN:
            self.state = BreakerState.CLOSED
            self.consecutive_trips = 0

        return final_text

    def _consume_stream(
        self, stream: Iterator, run: BreakerRunState, session_id: str
    ) -> str:
        # Resume this session's token history; appends below mutate the shared list.
        run.token_log = self._session_token_logs[session_id]
        run.session_tokens_total = sum(i + o for i, o in run.token_log)

        for event in stream:
            # Accumulate final response text
            if "chunk" in event:
                chunk_bytes = event["chunk"].get("bytes", b"")
                run.partial_response_chunks.append(chunk_bytes.decode("utf-8", errors="replace"))

            # Parse trace events for pattern detection and token accounting
            if "trace" in event:
                self._process_trace(event["trace"], run)
                self._record_usage(event["trace"], run)

        return "".join(run.partial_response_chunks)

    def _record_usage(self, trace: dict, run: BreakerRunState):
        # Completion chunks carry text (and citations), not token counts. Usage is
        # reported per model call in the orchestration trace:
        # orchestrationTrace → modelInvocationOutput → metadata → usage
        orch = trace.get("trace", {}).get("orchestrationTrace", {})
        usage = orch.get("modelInvocationOutput", {}).get("metadata", {}).get("usage", {})
        if usage:
            input_t = usage.get("inputTokens", 0)
            output_t = usage.get("outputTokens", 0)
            run.token_log.append((input_t, output_t))
            run.session_tokens_total += input_t + output_t
            self._check_session_drift(run)

    def _process_trace(self, trace: dict, run: BreakerRunState):
        trace_inner = trace.get("trace", {})

        # Action group invocations appear in orchestrationTrace → invocationInput (invocationType ACTION_GROUP)
        orch = trace_inner.get("orchestrationTrace", {})

        invoc = orch.get("invocationInput", {})
        if invoc.get("invocationType") == "ACTION_GROUP":
            ag_input = invoc.get("actionGroupInvocationInput", {})
            group = ag_input.get("actionGroupName", "unknown")
            path = ag_input.get("apiPath", "")
            params_raw = ag_input.get("parameters", [])
            params_dict = {p["name"]: p["value"] for p in params_raw if "name" in p}
            record = InvocationRecord(
                action_group=group,
                api_path=path,
                params_hash=_params_hash(params_dict),
                params_tokens=_param_tokens(params_dict),
            )
            run.action_log.append(record)
            self._check_action_spiral(run)

        # Knowledge base queries appear in orchestrationTrace → invocationInput → knowledgeBaseLookupInput
        if invoc.get("invocationType") == "KNOWLEDGE_BASE":
            kb_input = invoc.get("knowledgeBaseLookupInput", {})
            query = kb_input.get("text", "")
            if query:
                run.kb_query_log.append(query)
                self._check_kb_fixation(run)

    def _check_action_spiral(self, run: BreakerRunState):
        cfg = self.config
        log = run.action_log
        if len(log) < cfg.action_spiral_min_repeats:
            return

        window = log[-cfg.action_spiral_window:]
        # Group by action group name within window
        by_group: dict[str, list] = defaultdict(list)
        for rec in window:
            by_group[rec.action_group].append(rec)

        for group, records in by_group.items():
            if len(records) < cfg.action_spiral_min_repeats:
                continue
            # Check pairwise Jaccard similarity between consecutive records
            spiral_count = 0
            for i in range(1, len(records)):
                sim = _jaccard(records[i-1].params_tokens, records[i].params_tokens)
                if sim >= cfg.action_spiral_similarity:
                    spiral_count += 1
            if spiral_count >= cfg.action_spiral_min_repeats - 1:
                partial = "".join(run.partial_response_chunks)
                self._trip(TripReason.ACTION_GROUP_SPIRAL,
                           f"Action group '{group}' called {len(records)} times "
                           f"with ≥{cfg.action_spiral_similarity:.0%} parameter similarity "
                           f"in last {cfg.action_spiral_window} steps.",
                           partial)

    def _check_kb_fixation(self, run: BreakerRunState):
        cfg = self.config
        queries = run.kb_query_log
        if len(queries) < cfg.kb_fixation_window:
            return

        recent = queries[-cfg.kb_fixation_window:]
        token_sets = [set(q.lower().split()) for q in recent]
        # Check all consecutive pairs in the window
        all_similar = all(
            _jaccard(token_sets[i], token_sets[i+1]) >= cfg.kb_fixation_similarity
            for i in range(len(token_sets) - 1)
        )
        if all_similar:
            partial = "".join(run.partial_response_chunks)
            self._trip(TripReason.KB_QUERY_FIXATION,
                       f"Knowledge base queried {cfg.kb_fixation_window} consecutive times "
                       f"with ≥{cfg.kb_fixation_similarity:.0%} query similarity. "
                       f"Last query: '{recent[-1][:80]}'",
                       partial)

    def _check_session_drift(self, run: BreakerRunState):
        cfg = self.config

        # Hard session token cap
        if run.session_tokens_total >= cfg.session_budget_tokens:
            partial = "".join(run.partial_response_chunks)
            self._trip(TripReason.SESSION_TOKEN_DRIFT,
                       f"Session token total {run.session_tokens_total} exceeded "
                       f"budget of {cfg.session_budget_tokens}.",
                       partial)

        # Growth rate check
        tlog = run.token_log
        if len(tlog) < cfg.session_drift_window:
            return

        recent_inputs = [t[0] for t in tlog[-cfg.session_drift_window:]]
        # Check if each step's input tokens grew by more than the allowed ratio
        drift_steps = sum(
            1 for i in range(1, len(recent_inputs))
            if recent_inputs[i-1] > 0
            and recent_inputs[i] / recent_inputs[i-1] > cfg.session_drift_max_ratio
        )
        if drift_steps >= cfg.session_drift_window - 1:
            projected_next = recent_inputs[-1] * cfg.session_drift_max_ratio
            partial = "".join(run.partial_response_chunks)
            self._trip(TripReason.SESSION_TOKEN_DRIFT,
                       f"Input tokens grew by more than {cfg.session_drift_max_ratio - 1:.0%} per step "
                       f"for {drift_steps} consecutive steps. "
                       f"Projected next-step input: over {int(projected_next)} tokens.",
                       partial)

    def _trip(self, reason: TripReason, detail: str, partial: str):
        self.state = BreakerState.OPEN
        self.consecutive_trips += 1
        raise CircuitBreakerTripError(reason, detail, partial)

    def reset(self):
        """Manually reset to HALF_OPEN to allow one probe invocation."""
        self.state = BreakerState.HALF_OPEN

Handling a supervisor cascade

The supervisor cascade failure mode requires wrapping the supervisor's own BedrockBreaker with a sub-agent invocation tracker. Sub-agent delegations appear in the supervisor's trace stream as invocationInput events with invocationType == "AGENT_COLLABORATOR", so you can detect them at the supervisor's orchestration layer without instrumenting the sub-agents themselves.

@dataclass
class SubAgentRecord:
    collaborator_id: str
    task_input: str
    task_tokens: set  # for Jaccard on task text

@dataclass
class SupervisorRunState(BreakerRunState):
    subagent_log: list = field(default_factory=list)  # SubAgentRecord per delegation

class SupervisorBedrockBreaker(BedrockBreaker):
    """Extended breaker for supervisor agents with multi-agent cascade detection."""

    def _process_trace(self, trace: dict, run: BreakerRunState):
        super()._process_trace(trace, run)

        trace_inner = trace.get("trace", {})
        orch = trace_inner.get("orchestrationTrace", {})
        invoc = orch.get("invocationInput", {})

        if invoc.get("invocationType") == "AGENT_COLLABORATOR":
            collab_input = invoc.get("agentCollaboratorInvocationInput", {})
            collab_id = collab_input.get("agentCollaboratorAliasArn", "unknown").split("/")[-1]
            task_text = collab_input.get("input", {}).get("text", "")
            task_tokens = set(task_text.lower().split())

            record = SubAgentRecord(
                collaborator_id=collab_id,
                task_input=task_text,
                task_tokens=task_tokens,
            )
            if hasattr(run, "subagent_log"):
                run.subagent_log.append(record)
                self._check_supervisor_cascade(run)

    def _check_supervisor_cascade(self, run):
        cfg = self.config
        log = run.subagent_log

        by_collaborator: dict[str, list] = defaultdict(list)
        for rec in log:
            by_collaborator[rec.collaborator_id].append(rec)

        for collab_id, records in by_collaborator.items():
            if len(records) <= cfg.supervisor_max_subagent_calls:
                continue
            # Check if task inputs are semantically similar across re-invocations
            similar_pairs = sum(
                1 for i in range(1, len(records))
                if _jaccard(records[i-1].task_tokens, records[i].task_tokens)
                   >= cfg.supervisor_similarity
            )
            if similar_pairs >= cfg.supervisor_max_subagent_calls:
                partial = "".join(run.partial_response_chunks)
                self._trip(TripReason.SUPERVISOR_CASCADE,
                           f"Supervisor invoked sub-agent '{collab_id}' "
                           f"{len(records)} times with ≥{cfg.supervisor_similarity:.0%} "
                           f"task similarity — cascade detected.",
                           partial)

    def invoke(self, session_id: str, input_text: str, session_state: dict = None) -> str:
        if self.state == BreakerState.OPEN:
            raise CircuitBreakerTripError(
                TripReason.SUPERVISOR_CASCADE,
                "Breaker is OPEN — refusing supervisor invocation until reset."
            )
        run = SupervisorRunState()

        kwargs = dict(
            agentId=self.agent_id,
            agentAliasId=self.agent_alias_id,
            sessionId=session_id,
            inputText=input_text,
            enableTrace=True,
        )
        if session_state:
            kwargs["sessionState"] = session_state

        response = self.client.invoke_agent(**kwargs)
        final_text = self._consume_stream(response["completion"], run, session_id)

        if self.state == BreakerState.HALF_OPEN:
            self.state = BreakerState.CLOSED
            self.consecutive_trips = 0

        return final_text

Wiring the breaker into a production Bedrock agent

The BedrockBreaker is a drop-in wrapper around your existing invoke_agent call. If you're currently invoking Bedrock agents like this:

import boto3

client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

def handle_user_message(session_id: str, message: str) -> str:
    response = client.invoke_agent(
        agentId="ABCDE12345",
        agentAliasId="TSTALIASID",
        sessionId=session_id,
        inputText=message,
    )
    chunks = []
    for event in response["completion"]:
        if "chunk" in event:
            chunks.append(event["chunk"]["bytes"].decode("utf-8"))
    return "".join(chunks)

The migration to BedrockBreaker replaces the hand-written streaming loop with a single breaker.invoke() call wrapped in an exception handler:

from bedrock_breaker import BedrockBreaker, BreakerConfig, CircuitBreakerTripError

breaker = BedrockBreaker(
    agent_id="ABCDE12345",
    agent_alias_id="TSTALIASID",
    config=BreakerConfig(
        session_budget_tokens=40_000,     # trip at 40K session tokens
        action_spiral_min_repeats=3,      # trip after 3 similar action group calls
        kb_fixation_window=3,             # trip after 3 similar KB queries
        supervisor_max_subagent_calls=3,  # trip on the 4th similar sub-agent call
    ),
    region="us-east-1",
)

def handle_user_message(session_id: str, message: str) -> str:
    try:
        return breaker.invoke(session_id, message)
    except CircuitBreakerTripError as e:
        # Log to CloudWatch, emit an alarm, return graceful degradation response
        print(f"[BedrockBreaker] {e.reason.value}: {e.detail}")
        if e.partial_response:
            return e.partial_response + "\n\n[Response truncated: circuit breaker tripped]"
        return "I was unable to complete this request. Please try again or contact support."

Resetting after a trip

After a trip, the breaker moves to OPEN and rejects all subsequent calls. To probe for recovery — for example, after you've updated the agent's instructions to prevent the looping behavior — call breaker.reset() to move to HALF_OPEN. The next invocation is allowed through; if it succeeds without tripping, the breaker returns to CLOSED and resets the consecutive trip counter. If it trips again, the breaker stays OPEN.

# After fixing the underlying agent configuration
breaker.reset()

# Next invocation is a probe
try:
    result = breaker.invoke(session_id, probe_message)
    # Breaker is now CLOSED — normal operation resumes
except CircuitBreakerTripError as e:
    # Still looping — investigate further before re-probing
    print(f"Probe failed: {e.detail}")

Tuning the configuration for your workload

The defaults in BreakerConfig are conservative starting points. The right values depend on your agent's legitimate behavior patterns:

Parameter	Default	When to raise	When to lower
`action_spiral_min_repeats`	3	Agent legitimately retries the same action group on failure (e.g., idempotent write-with-confirmation pattern)	Simple question-answering agents that should never call the same action twice
`kb_fixation_window`	3	Agent uses progressive KB refinement (broader query → narrower query is legitimate)	KB-heavy agents where any repeated retrieval is a signal of failure
`session_budget_tokens`	50,000	Long-running sessions with large knowledge context are expected	Short-session agents (e.g., single-turn customer support) where 10K tokens should suffice
`supervisor_max_subagent_calls`	3	Supervisor legitimately uses iterative refinement (e.g., research → edit → review cycle)	Simple delegation pipelines where one sub-agent call per task is expected

The most reliable calibration approach: run a week's worth of production traces through the breaker logic in dry-run mode (logging trips without actually raising the exception) and adjust thresholds to minimize false positives before switching to enforcement mode.
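The breaker as written has no dry-run switch. One minimal way to add one is a hypothetical mixin (DryRunMixin and would_trip are illustrative names) that overrides _trip so it records and logs the trip instead of opening the breaker and raising. Because nothing raises, a check can fire again on later steps of the same run while the pattern persists. Treat the counts as a relative signal rather than an incident count.

```python
import logging
from collections import Counter

logger = logging.getLogger("bedrock_breaker.dry_run")


class DryRunMixin:
    """Hypothetical mixin: list it before BedrockBreaker so this _trip wins in the MRO,
    e.g. class DryRunBedrockBreaker(DryRunMixin, BedrockBreaker): pass
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.would_trip = Counter()  # trip reason -> number of would-be trips

    def _trip(self, reason, detail, partial):
        # Record and log instead of setting state to OPEN and raising, so the
        # invocation completes and the breaker stays CLOSED.
        self.would_trip[reason] += 1
        logger.warning("dry-run trip %s: %s", getattr(reason, "value", reason), detail)
```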

Connecting to Bedrock's native observability

AWS CloudWatch natively receives Bedrock agent traces when you enable enableTrace. The BedrockBreaker's trip events complement — but don't replace — the full trace logging. A recommended setup:

Forward all CircuitBreakerTripError events to a CloudWatch custom metric namespace (e.g., RunGuard/BedrockBreaker) with dimensions for AgentId, TripReason, and SessionId.
Create a CloudWatch alarm that fires when any trip reason exceeds 5 events per hour — this surfaces systematic agent instruction problems that produce recurring spirals rather than one-off incidents.
Log the partial_response from tripped invocations to an S3 prefix for post-incident analysis — this is often the most actionable artifact for diagnosing what the agent was attempting when it looped.
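The first two items can be sketched with boto3's CloudWatch client. put_metric_data and put_metric_alarm are real CloudWatch APIs; the metric name "Trips", the alarm naming, and the helper functions are illustrative. One wrinkle: CloudWatch treats each distinct combination of dimensions as its own metric and does not roll up across dimensions automatically. An alarm on AgentId and TripReason therefore will not see datapoints published only with a SessionId dimension. The sketch publishes each trip twice, once without SessionId for alarming and once with it for drill-down. The per-session series is high-cardinality, and each distinct series is billed as a custom metric.

```python
NAMESPACE = "RunGuard/BedrockBreaker"


def _cloudwatch(client):
    if client is None:
        import boto3  # imported lazily so a stub client can be injected
        client = boto3.client("cloudwatch")
    return client


def publish_trip(agent_id: str, trip_reason: str, session_id: str, client=None) -> None:
    """Emit one trip as a count datapoint, with and without the SessionId dimension."""
    base = [
        {"Name": "AgentId", "Value": agent_id},
        {"Name": "TripReason", "Value": trip_reason},
    ]
    _cloudwatch(client).put_metric_data(
        Namespace=NAMESPACE,
        MetricData=[
            # Aggregate series: the alarm watches this one.
            {"MetricName": "Trips", "Dimensions": base, "Value": 1, "Unit": "Count"},
            # Per-session series for drill-down (one custom metric per session).
            {"MetricName": "Trips",
             "Dimensions": base + [{"Name": "SessionId", "Value": session_id}],
             "Value": 1, "Unit": "Count"},
        ],
    )


def create_trip_alarm(agent_id: str, trip_reason: str, client=None) -> None:
    """Alarm when one trip reason exceeds 5 trips in an hour for one agent."""
    _cloudwatch(client).put_metric_alarm(
        AlarmName=f"bedrock-breaker-{agent_id}-{trip_reason}",
        Namespace=NAMESPACE,
        MetricName="Trips",
        Dimensions=[
            {"Name": "AgentId", "Value": agent_id},
            {"Name": "TripReason", "Value": trip_reason},
        ],
        Statistic="Sum",
        Period=3600,
        EvaluationPeriods=1,
        Threshold=5.0,
        ComparisonOperator="GreaterThanThreshold",
        TreatMissingData="notBreaching",
    )
```

In the handler, publish_trip(agent_id, e.reason.value, session_id) would go inside the except CircuitBreakerTripError block. The alarms would be created once per reason at deploy time, for example by looping over TripReason.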

This mirrors the approach described in our AI Agent Cost Engineering Production Guide — the BedrockBreaker is the trip sensor; CloudWatch is the alarm layer; S3 is the incident log.

FAQ

Does BedrockBreaker work with all Bedrock model providers, or only Claude?

The breaker operates at the invoke_agent API level, which is model-agnostic. Bedrock Agents orchestrates the loop regardless of the underlying model (Claude, Llama 3, Titan, Mistral, or others). The trace event schema is consistent across providers, so all four detection methods work the same way. The one model-specific aspect is token counting: each model family uses its own tokenizer, so the same text yields different token counts, and a session_budget_tokens value tuned on one model may need adjusting on another. The counts themselves come from Bedrock's usage metadata, so no provider-specific parsing is required.

Does enableTrace add latency or cost?

Enabling trace adds a small amount of data to the streaming response — the trace events are lightweight JSON records. In practice, the overhead is negligible compared to the model inference and action group invocation latency. There is no additional per-event billing for trace output; you pay for the model tokens and action group invocations regardless. The BedrockBreaker's trace parsing happens in the streaming loop, adding no round-trip latency.

How do I handle legitimate retry patterns without false-positive trips?

There are two options. The first is to raise both action_spiral_min_repeats and action_spiral_similarity for agents with known retry patterns. A higher similarity threshold means only near-identical calls count toward a spiral; lowering it would make the breaker trip more often, not less. The second is to add a per-action-group allowlist of action groups that are exempt from spiral detection (e.g., a health-check or status-poll action that is legitimately called multiple times). The BreakerConfig can be subclassed with an exempt_action_groups: set[str] field, and the check can skip records in that set.
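Here is a sketch of the allowlist idea. ExemptionConfig and without_exempt are hypothetical names. In the BedrockBreaker, the field would live on a BreakerConfig subclass (a dataclass subclass works because every BreakerConfig field has a default). The filter would run on the action log at the top of _check_action_spiral, before the window is sliced, so exempt calls neither count toward a spiral nor push real calls out of the window.

```python
from dataclasses import dataclass, field
from typing import List, NamedTuple, Set


class Call(NamedTuple):
    action_group: str
    api_path: str


@dataclass
class ExemptionConfig:
    # Action groups that may legitimately repeat (status polls, health checks).
    exempt_action_groups: Set[str] = field(default_factory=set)


def without_exempt(action_log: List[Call], cfg: ExemptionConfig) -> List[Call]:
    """Drop calls to exempt action groups before spiral detection looks at the log."""
    return [c for c in action_log if c.action_group not in cfg.exempt_action_groups]


cfg = ExemptionConfig(exempt_action_groups={"job_status"})
action_log = [Call("job_status", "/status"), Call("orders", "/order"), Call("job_status", "/status")]
print(without_exempt(action_log, cfg))  # [Call(action_group='orders', api_path='/order')]
```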

Can I run multiple BedrockBreaker instances in parallel for concurrent sessions?

Yes. Each BedrockBreaker instance tracks its own state independently. For a high-concurrency service, instantiate one breaker per agent configuration (one per agentId/agentAliasId pair). The per-invocation detection state lives in a BreakerRunState that invoke() creates fresh on each call, so concurrent sessions do not mix their action or KB logs. The breaker's open/closed state, however, is shared across sessions on the same instance. If one session trips the breaker, all subsequent sessions on that instance are blocked until reset() is called. This is intentional: a trip often indicates an instruction-level problem that affects all sessions, not just one.

Does this work with Bedrock Inline Agents?

Bedrock Inline Agents (launched in late 2024) use invoke_inline_agent rather than invoke_agent. The trace schema is equivalent, and the same parsing logic applies. To support inline agents, create a parallel InlineBedrockBreaker subclass that calls self.client.invoke_inline_agent() in place of invoke_agent. It passes the inline agent configuration (foundation model, instruction, action groups) in the call kwargs instead of agentId and agentAliasId, and session state goes in inlineSessionState rather than sessionState. All four detection methods transfer directly.

Stop paying for loops

BedrockBreaker is one implementation of RunGuard's circuit-breaker pattern for managed agent platforms. RunGuard monitors action groups, KB queries, and session token growth across your Bedrock fleet — alerting before a single spiraling agent run hits your AWS bill.

