AWS Bedrock agents circuit breaker: stop knowledge base loops and action group retry storms

AWS Bedrock Agents orchestrate Foundation Models, Knowledge Bases, and Action Groups into autonomous pipelines that run against live AWS infrastructure. What they do not include is a circuit breaker. There is no per-session cost cap and no loop detection. The only way to stop an agent that has entered a repetitive failure mode is to kill the invocation from outside. When a Bedrock agent loops, it bills per FM token with nothing to stop it. It might query the same Knowledge Base fragment again and again, retry a failing Lambda action, or amplify tokens through the orchestration layer. A single bad run on Claude 3.7 Sonnet at $3 per million input tokens can cost $3–$20 before anyone notices. This page explains the four Bedrock-specific failure modes, shows how to add a circuit breaker to invoke_agent() with RunGuard, and gives a tuning guide for Knowledge Base retrieval patterns.

The four Bedrock agent failure modes

Bedrock agent pricing and why loops are expensive

AWS Bedrock charges for Foundation Model inference on a per-token basis. Account-level service quotas throttle request and token rates, but nothing caps what a single session can spend. There is also no built-in spending alarm scoped to a single agent run. The only alerts available are monthly, account-level ones from AWS Budgets or Cost Explorer. The prices that matter for Bedrock agent workloads in 2026:

The practical consequence: a Bedrock agent session that fails catastrophically can spend more in one run than your entire monthly LLM budget for non-agentic use. Per-run circuit breaking is not optional for production Bedrock deployments; it is the difference between a recoverable incident and an unexpected AWS bill line item.
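
The following is a back-of-the-envelope cost model that shows how a stuck run reaches that range. It uses the Claude 3.7 Sonnet prices quoted elsewhere on this page: $3 per million input tokens and $15 per million output tokens. The step count, starting prompt size, and per-step context growth are hypothetical inputs chosen for illustration, not measurements. The model assumes that each orchestration step resends the whole, growing context.

```python
# Back-of-the-envelope cost of a looping Bedrock agent run.
# Prices: Claude 3.7 Sonnet on Bedrock, USD per token.
INPUT_USD_PER_TOKEN = 3.00 / 1_000_000
OUTPUT_USD_PER_TOKEN = 15.00 / 1_000_000


def looping_run_cost(steps: int, base_input_tokens: int,
                     context_growth_per_step: int, output_tokens_per_step: int) -> float:
    """Cost of `steps` orchestration steps where each step resends the full context.

    Step i (0-based) sends base_input_tokens + i * context_growth_per_step input
    tokens, assuming every retrieved chunk and tool result stays in the prompt.
    """
    input_tokens = sum(base_input_tokens + i * context_growth_per_step for i in range(steps))
    output_tokens = steps * output_tokens_per_step
    return input_tokens * INPUT_USD_PER_TOKEN + output_tokens * OUTPUT_USD_PER_TOKEN


# Hypothetical stuck run: 40 steps, 20k-token starting prompt, +2k tokens per step,
# 500 output tokens per step -> 2.36M input tokens + 20k output tokens.
cost = looping_run_cost(steps=40, base_input_tokens=20_000,
                        context_growth_per_step=2_000, output_tokens_per_step=500)
print(f"${cost:.2f}")  # $7.38
```

Context growth dominates the total. Input tokens grow roughly quadratically with the number of steps, so the last ten steps of a long loop cost far more than the first ten.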

For a broader treatment of per-run cost capping methodology, see how to set max cost per LLM request and autonomous agent cost control best practices.

The three loop failure modes in detail

Knowledge base retrieval loop

The KB loop is the most common Bedrock-specific failure mode because Bedrock’s Knowledge Base integration is tightly coupled to the orchestration loop. The agent’s planning prompt instructs it to use the KB to answer questions. When the KB returns semantically similar but incomplete results, the agent is stuck: it cannot answer the question, but the KB appears to have relevant content. The natural LLM response is to try again with a related query. A concrete example:

  1. User asks: “What were the Q3 revenue figures for the APAC region?”
  2. Agent queries KB: APAC Q3 revenue → KB returns a chunk about Q3 global revenue without APAC breakdown.
  3. Agent concludes it needs more specific data and queries KB: Asia Pacific Q3 revenue breakdown → same chunk, slightly different framing.
  4. Agent queries KB: Q3 APAC quarterly financial results → same chunk again.
  5. Loop continues until session timeout or token budget exhaustion.

From RunGuard’s perspective, this loop has a clear signature: the agent’s knowledgeBaseLookupInput step repeats within a short window with semantically similar, and often identical, query text. The wrapper shown below signs every KB lookup by its knowledge base ID and ignores the query text, so rephrased queries still match. With LoopDetector(repeats=3, max_cycle_len=4), the detector fires when the third consecutive lookup against the same knowledge base is recorded. A fourth retrieval never goes out.
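
Here is a minimal sketch of that signature logic. The dicts are simplified stand-ins for the orchestration trace’s invocationInput entries. Their field names follow the invoke_agent() trace shape used later on this page. The knowledge base ID KB12345678 and the small consecutive-repeat counter are hypothetical and are not RunGuard’s implementation. The three rephrased queries from the example collapse to one signature, so a plain consecutive-repeat count trips on the third lookup.

```python
from typing import List, Optional


def kb_signature(invocation_input: dict) -> Optional[str]:
    """Signature for a KB lookup step: the knowledge base ID, ignoring query text."""
    if invocation_input.get("invocationType") != "KNOWLEDGE_BASE":
        return None
    kb = invocation_input.get("knowledgeBaseLookupInput", {})
    return f"kb_lookup::{kb.get('knowledgeBaseId', 'kb')}"


def first_trip(signatures: List[Optional[str]], repeats: int = 3) -> Optional[int]:
    """Return the 0-based index where a signature first repeats `repeats` times in a row."""
    run, prev = 0, None
    for i, sig in enumerate(signatures):
        if sig is None:  # untracked step (e.g. FINISH): breaks any run
            run, prev = 0, None
            continue
        run = run + 1 if sig == prev else 1
        prev = sig
        if run >= repeats:
            return i
    return None


queries = [
    "APAC Q3 revenue",
    "Asia Pacific Q3 revenue breakdown",
    "Q3 APAC quarterly financial results",
    "APAC revenue Q3 by country",
]
steps = [
    {"invocationType": "KNOWLEDGE_BASE",
     "knowledgeBaseLookupInput": {"knowledgeBaseId": "KB12345678", "text": q}}
    for q in queries
]
sigs = [kb_signature(s) for s in steps]
print(sigs[0])           # kb_lookup::KB12345678
print(first_trip(sigs))  # 2 (the third lookup trips the guard)
```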

Action group error escalation loop

Action Groups execute Lambda functions in response to FM-generated tool calls. When the Lambda returns an error, Bedrock’s orchestrator presents the error text to the FM and asks it to try again. This is the correct behavior for transient errors (a downstream API that was briefly unavailable), but it is catastrophic for permanent errors. A Lambda that fails because an IAM role is missing a permission will fail identically on every retry. A Lambda that calls a DynamoDB table that does not exist will fail on every retry. The agent has no mechanism to distinguish permanent from transient errors unless the Lambda explicitly encodes that distinction in its response — and most Lambda action handlers do not.
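
One way to encode that distinction is to classify the AWS error code before building the Lambda response. With boto3, the code is available as exc.response["Error"]["Code"] on a botocore ClientError. For example, a missing IAM permission surfaces as AccessDeniedException, and a DynamoDB table that does not exist surfaces as ResourceNotFoundException. The code sets below are a hypothetical, deliberately short starting point rather than an exhaustive list. Treat them as assumptions to extend for the services your Lambda calls.

```python
# Hypothetical classification tables (DynamoDB-style error codes); extend as needed.
PERMANENT_ERROR_CODES = {
    "AccessDeniedException",       # IAM role is missing a permission
    "ResourceNotFoundException",   # e.g. the DynamoDB table does not exist
    "ValidationException",         # malformed request: a retry cannot fix it
}
TRANSIENT_ERROR_CODES = {
    "ThrottlingException",
    "ProvisionedThroughputExceededException",
    "RequestLimitExceeded",
    "InternalServerError",
}


def classify(error_code: str) -> str:
    """Return 'transient', 'permanent', or 'unknown' for an AWS error code."""
    if error_code in TRANSIENT_ERROR_CODES:
        return "transient"
    if error_code in PERMANENT_ERROR_CODES:
        return "permanent"
    return "unknown"


def error_body(error_code: str, message: str) -> dict:
    """JSON body for the agent, with an explicit retry hint the model can read."""
    # Unknown codes are treated as non-retryable: failing fast is cheaper than a loop.
    retryable = classify(error_code) == "transient"
    hint = ("A retry may succeed." if retryable
            else "Do not retry this action; report the failure to the user.")
    return {"error": error_code, "message": f"{message} {hint}", "retryable": retryable}


print(error_body("AccessDeniedException", "dynamodb:GetItem was denied.")["retryable"])  # False
```

Treating unknown codes as permanent is a judgment call. It trades an occasional missed recovery for never letting an unclassified failure feed a retry loop.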

The solution has two components: (1) the Lambda handler itself should detect when it is being called in a loop (same parameters, same intent) and return a structured “permanently failed” response that tells the orchestrator to stop retrying; (2) a circuit breaker at the invoke_agent() level should catch the action group error loop as a repeating signature pattern regardless of what the Lambda returns. See the Lambda protection section below for the handler-level implementation.

Orchestration retry storm

An orchestration retry storm occurs when multiple failure signals combine. A KB retrieval loop feeds incorrect context to an action group call, which fails. The failure triggers additional KB lookups, which provide different incorrect context, which triggers a different action group failure. The result is not a simple A→A loop but a longer cycle: KB-lookup → action-call → error → KB-lookup → action-call → error. This multi-step cycle is harder to detect with simple “same tool called twice in a row” heuristics. The error is not a separate call in the signature stream because it comes back as the action’s result. What the guard actually sees is KB-lookup, action-call, KB-lookup, action-call. RunGuard’s max_cycle_len parameter detects cycles of length 1 through 4 in that stream, where length 1 is a plain repeat. It therefore catches this pattern as a period-2 cycle once the pattern has repeated three times. For more on retry storm detection generally, see AI agent retry storm prevention.
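
The following minimal periodic-tail check makes the period idea concrete. It is an illustrative reimplementation, not RunGuard’s source, and the real LoopDetector may differ in details. It assumes that a cycle of length p counts as detected once the last p * repeats signatures consist of the same p-step pattern.

```python
from typing import Optional, Sequence


def detect_cycle(signatures: Sequence[str], repeats: int = 3,
                 max_cycle_len: int = 4) -> Optional[int]:
    """Return the shortest period p (1..max_cycle_len) whose pattern fills the last
    p * repeats signatures, or None if no such cycle is present yet."""
    for p in range(1, max_cycle_len + 1):
        window = p * repeats
        if len(signatures) < window:
            break  # longer periods need even more history
        tail = signatures[-window:]
        # Periodic with period p: every element equals the one p positions earlier.
        if all(tail[i] == tail[i - p] for i in range(p, window)):
            return p
    return None


kb, act = "kb_lookup::KB12345678", "FinanceActions::/revenue"
print(detect_cycle([kb, act] * 2 + [kb]))  # None: only 2.5 repetitions so far
print(detect_cycle([kb, act] * 3))         # 2
print(detect_cycle([act, kb, kb, kb]))     # 1: a plain repeat is a period-1 cycle
```

Checking short periods first means a plain repeat is always reported as period 1, even when it would also satisfy a longer period.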

Adding RunGuard to AWS Bedrock agents

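RunGuard wraps your invoke_agent() call with a guard that tracks two things across the streaming response. The first is the sequence of calls: action group name plus API path for Action Groups, and knowledge base ID for KB lookups. The second is accumulated spend. Both come from trace events, which invoke_agent() only emits when it is called with enableTrace=True. Install the SDK and wrap your existing boto3 call: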

pip install runguard boto3

Basic invoke_agent() wrapper

import boto3
from typing import Optional

from runguard import LoopDetector, BudgetTracker, LoopDetectedError, BudgetExceededError

# Bedrock agent runtime client (invoke_agent lives on bedrock-agent-runtime)
bedrock_agent = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

# Claude 3.7 Sonnet pricing on Bedrock (per token); assumes the agent runs on this model
SONNET_INPUT_PER_TOKEN  = 3.00 / 1_000_000   # $3.00 / 1M input tokens
SONNET_OUTPUT_PER_TOKEN = 15.00 / 1_000_000  # $15.00 / 1M output tokens

# Trace sections whose modelInvocationOutput can carry token usage metadata
MODEL_TRACE_KEYS = ("preProcessingTrace", "orchestrationTrace", "postProcessingTrace")


def call_signature(invocation_input: dict) -> Optional[str]:
    """Map an orchestration invocationInput to a stable loop-detection signature."""
    inv_type = invocation_input.get("invocationType", "")

    if inv_type == "ACTION_GROUP":
        ag   = invocation_input.get("actionGroupInvocationInput", {})
        name = ag.get("actionGroupName", "unknown_action")
        # OpenAPI-schema action groups set apiPath; function-details groups set function
        op   = ag.get("apiPath") or ag.get("function") or ""
        return f"{name}::{op}" if op else name

    if inv_type == "KNOWLEDGE_BASE":
        kb_input = invocation_input.get("knowledgeBaseLookupInput", {})
        # Query text is ignored on purpose: rephrased lookups against one KB still match
        return f"kb_lookup::{kb_input.get('knowledgeBaseId', 'kb')}"

    return None  # FINISH and other invocation types are not tracked


def invoke_bedrock_agent(
    agent_id: str,
    agent_alias_id: str,
    session_id: str,
    input_text: str,
    loop_detector: Optional[LoopDetector] = None,
    budget_tracker: Optional[BudgetTracker] = None,
) -> str:
    """Invoke a Bedrock agent with loop detection and budget enforcement.

    Fresh guards are created per call (one agent run). For a multi-turn agent, pass
    the same pair on every turn to guard the whole session instead.
    Raises LoopDetectedError or BudgetExceededError when a guard trips.
    """
    if loop_detector is None:
        loop_detector = LoopDetector(repeats=3, max_cycle_len=4)
    if budget_tracker is None:
        budget_tracker = BudgetTracker(max_usd=5.00)

    response = bedrock_agent.invoke_agent(
        agentId=agent_id,
        agentAliasId=agent_alias_id,
        sessionId=session_id,
        inputText=input_text,
        enableTrace=True,  # without this the stream carries no trace events to inspect
    )

    output_parts = []

    # Stream the response event-by-event
    for event in response["completion"]:
        # --- Chunk: final text output ---
        if "chunk" in event:
            chunk = event["chunk"]
            if "bytes" in chunk:
                output_parts.append(chunk["bytes"].decode("utf-8"))

        # --- Trace: orchestration step metadata ---
        elif "trace" in event:
            trace = event["trace"].get("trace", {})

            # Loop detection: record each action group call / KB lookup as it is issued.
            # LoopDetectedError propagates to the caller.
            orch = trace.get("orchestrationTrace", {})
            if "invocationInput" in orch:
                sig = call_signature(orch["invocationInput"])
                if sig is not None:
                    loop_detector.record(sig)

            # Budget: price each model invocation and record that step's cost only
            # (recording a running total would count earlier steps again).
            for key in MODEL_TRACE_KEYS:
                model_output = trace.get(key, {}).get("modelInvocationOutput", {})
                usage = model_output.get("metadata", {}).get("usage", {})
                if usage:
                    step_usd = (
                        usage.get("inputTokens", 0)  * SONNET_INPUT_PER_TOKEN +
                        usage.get("outputTokens", 0) * SONNET_OUTPUT_PER_TOKEN
                    )
                    budget_tracker.record(step_usd)  # raises BudgetExceededError at the cap

    return "".join(output_parts)


# --- Usage ---
try:
    result = invoke_bedrock_agent(
        agent_id="ABCDEF1234",
        agent_alias_id="TSTALIASID",
        session_id="session-001",
        input_text="Summarise the Q3 APAC revenue figures from our finance knowledge base.",
    )
    print(result)
except LoopDetectedError as e:
    print(f"Loop detected: action '{e.pattern}' repeated {e.repeats} times. Aborting.")
except BudgetExceededError as e:
    print(f"Budget cap reached: ${e.spent:.4f} spent (cap: ${e.limit:.2f}). Aborting.")
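
A note on the budget check: the wrapper calls budget_tracker.record() once per model invocation step. This note assumes that record() adds its argument to a running total, which is what a per-run cap implies. Under that assumption, the value passed must be that step’s own cost, not the cumulative cost so far. Recording cumulative totals counts early steps again on every later step. The pure-Python comparison below uses the same Claude 3.7 Sonnet prices and hypothetical per-step token counts.

```python
INPUT_USD_PER_TOKEN = 3.00 / 1_000_000
OUTPUT_USD_PER_TOKEN = 15.00 / 1_000_000

# Hypothetical (inputTokens, outputTokens) reported by four orchestration steps
step_usage = [(10_000, 400), (12_000, 400), (14_000, 400), (16_000, 400)]


def step_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one model invocation step."""
    return input_tokens * INPUT_USD_PER_TOKEN + output_tokens * OUTPUT_USD_PER_TOKEN


# Correct: record each step's own cost, so the tracker's total equals the true spend.
recorded_deltas = sum(step_cost(i, o) for i, o in step_usage)

# Buggy: record the running total after every step, so earlier steps are counted repeatedly.
running, recorded_cumulative = 0.0, 0.0
for i, o in step_usage:
    running += step_cost(i, o)
    recorded_cumulative += running

print(f"{recorded_deltas:.3f} vs {recorded_cumulative:.3f}")  # 0.180 vs 0.420
```

The overcount grows with the number of steps, so a cumulative-recording tracker trips earlier and earlier on longer runs.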

Async version for high-throughput invocations

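import aioboto3
from runguard import LoopDetector, BudgetTracker, LoopDetectedError, BudgetExceededError

# Claude 3.7 Sonnet pricing on Bedrock (per token); adjust for your agent's model
INPUT_PER_TOKEN  = 3.00 / 1_000_000
OUTPUT_PER_TOKEN = 15.00 / 1_000_000


def _process_trace_event(trace_part: dict, loop_detector, budget_tracker) -> None:
    """Feed one trace event to the guards; raises LoopDetectedError / BudgetExceededError."""
    trace = trace_part.get("trace", {})

    # Loop detection on action group calls and KB lookups
    inv = trace.get("orchestrationTrace", {}).get("invocationInput", {})
    if inv.get("invocationType") == "ACTION_GROUP":
        ag   = inv.get("actionGroupInvocationInput", {})
        name = ag.get("actionGroupName", "unknown_action")
        op   = ag.get("apiPath") or ag.get("function") or ""
        loop_detector.record(f"{name}::{op}" if op else name)
    elif inv.get("invocationType") == "KNOWLEDGE_BASE":
        kb_id = inv.get("knowledgeBaseLookupInput", {}).get("knowledgeBaseId", "kb")
        loop_detector.record(f"kb_lookup::{kb_id}")

    # Budget: record the cost of this step's model invocation only
    for key in ("preProcessingTrace", "orchestrationTrace", "postProcessingTrace"):
        model_output = trace.get(key, {}).get("modelInvocationOutput", {})
        usage = model_output.get("metadata", {}).get("usage", {})
        if usage:
            budget_tracker.record(
                usage.get("inputTokens", 0) * INPUT_PER_TOKEN
                + usage.get("outputTokens", 0) * OUTPUT_PER_TOKEN
            )


async def invoke_bedrock_agent_async(
    agent_id: str,
    agent_alias_id: str,
    session_id: str,
    input_text: str,
    max_usd: float = 5.00,
) -> str:
    """Async Bedrock agent invocation with RunGuard protection (fresh guards per call)."""
    loop_detector = LoopDetector(repeats=3, max_cycle_len=4)
    budget_tracker = BudgetTracker(max_usd=max_usd)

    session = aioboto3.Session()
    async with session.client("bedrock-agent-runtime", region_name="us-east-1") as client:
        response = await client.invoke_agent(
            agentId=agent_id,
            agentAliasId=agent_alias_id,
            sessionId=session_id,
            inputText=input_text,
            enableTrace=True,  # trace events are required for the loop and budget checks
        )

        output_parts = []
        async for event in response["completion"]:
            if "chunk" in event and "bytes" in event["chunk"]:
                output_parts.append(event["chunk"]["bytes"].decode("utf-8"))
            elif "trace" in event:
                _process_trace_event(event["trace"], loop_detector, budget_tracker)

        return "".join(output_parts)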
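
For high-throughput use, a common pattern is to fan out many invocations while bounding concurrency and keeping each run’s guard trips isolated. The sketch below uses asyncio.Semaphore to cap in-flight calls. It also uses asyncio.gather(..., return_exceptions=True), so a tripped run appears as an exception object in its own result slot. Without that flag, gather() would raise the first error and discard the other results. Three names here are hypothetical. fake_invoke is a stand-in for invoke_bedrock_agent_async so the example runs without AWS credentials, and CircuitTripped stands in for RunGuard’s exceptions. The concurrency limit of 8 is an arbitrary assumption to tune against your Bedrock service quotas.

```python
import asyncio
from typing import List, Union


class CircuitTripped(Exception):
    """Hypothetical stand-in for RunGuard's LoopDetectedError / BudgetExceededError."""


async def fake_invoke(session_id: str, input_text: str) -> str:
    """Hypothetical stand-in for invoke_bedrock_agent_async()."""
    await asyncio.sleep(0.01)  # simulate network latency
    if "loop" in input_text:
        raise CircuitTripped(f"{session_id}: loop detected")
    return f"{session_id}: ok"


async def run_batch(prompts: List[str], max_concurrency: int = 8) -> List[Union[str, BaseException]]:
    """Run one agent invocation per prompt, at most max_concurrency at a time."""
    sem = asyncio.Semaphore(max_concurrency)

    async def one(idx: int, prompt: str) -> str:
        async with sem:
            # In the real function each call builds its own guards, so a trip
            # affects only this run.
            return await fake_invoke(f"session-{idx:03d}", prompt)

    # return_exceptions=True: failures become result entries instead of being raised
    return await asyncio.gather(*(one(i, p) for i, p in enumerate(prompts)),
                                return_exceptions=True)


results = asyncio.run(run_batch(["summarise Q3", "force a loop", "list regions"]))
for r in results:
    print("TRIPPED" if isinstance(r, CircuitTripped) else "OK", r)
```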

Action group Lambda protection

In addition to the circuit breaker at the invoke_agent() call site, you can add loop detection inside the Lambda handler itself. This gives you a second layer of protection: the Lambda can detect when Bedrock is calling it repeatedly with the same parameters and return a structured error that tells the orchestrator to stop retrying, rather than returning a generic error that causes another rephrase-and-retry cycle.

import json
import hashlib
import boto3

# DynamoDB table (partition key: sessionId) that tracks per-session call history across
# Lambda invocations. Lambda is stateless, so state must be externalised; a module-level
# in-memory detector would be shared by every session that reaches the same warm container.
dynamodb = boto3.resource("dynamodb")
call_history_table = dynamodb.Table("bedrock-agent-call-history")

HISTORY_LEN = 12          # keep the last 12 signatures per session
MAX_IDENTICAL_CALLS = 3   # refuse the 3rd consecutive call with identical parameters


def _build_response(event: dict, status: int, body: dict) -> dict:
    """Wrap a JSON body in the envelope Bedrock expects from an OpenAPI-schema action group."""
    return {
        "messageVersion": "1.0",
        "response": {
            "actionGroup": event.get("actionGroup", ""),
            "apiPath": event.get("apiPath", ""),
            "httpMethod": event.get("httpMethod", "POST"),
            "httpStatusCode": status,
            "responseBody": {"application/json": {"body": json.dumps(body)}},
        },
    }


def lambda_handler(event: dict, context) -> dict:
    """
    Bedrock Action Group Lambda handler with loop detection.
    Bedrock passes (among others): actionGroup, apiPath, httpMethod, parameters,
    requestBody, sessionId.
    Expected return: {"messageVersion": "1.0", "response": {...}}
    """
    action_group = event.get("actionGroup", "")
    api_path     = event.get("apiPath", "")
    parameters   = event.get("requestBody", {}).get("content", {})
    session_id   = event.get("sessionId", "unknown")

    # Stable signature: action + API path + hash of path/query parameters and request body
    params_hash = hashlib.sha256(
        json.dumps({"parameters": event.get("parameters", []), "body": parameters},
                   sort_keys=True).encode()
    ).hexdigest()[:12]
    call_sig = f"{action_group}::{api_path}::{params_hash}"

    # Load this session's history; on any DynamoDB error, detection degrades gracefully
    try:
        item = call_history_table.get_item(Key={"sessionId": session_id})
        history = item.get("Item", {}).get("callHistory", [])
    except Exception:
        history = []

    recent = (history + [call_sig])[-HISTORY_LEN:]

    # Persist updated history
    try:
        call_history_table.put_item(Item={"sessionId": session_id, "callHistory": recent})
    except Exception:
        pass  # non-fatal; detection degrades gracefully

    # Count how many times in a row (including this call) the exact signature appears
    consecutive = 0
    for past_sig in reversed(recent):
        if past_sig != call_sig:
            break
        consecutive += 1

    if consecutive >= MAX_IDENTICAL_CALLS:
        # Third consecutive call with identical parameters: refuse and say so explicitly
        return _build_response(event, 422, {
            "error": "PERMANENT_FAILURE",
            "message": (
                f"Action {action_group}::{api_path} has been called {consecutive} times "
                f"in a row with identical parameters. Do not retry this action. "
                f"Report the failure to the user."
            ),
            "retryable": False,
        })

    # --- Normal action execution ---
    try:
        result = execute_action(api_path, parameters)
        return _build_response(event, 200, result)
    except Exception as exc:
        # Unclassified errors are reported as retryable; the repeat check above caps retries
        return _build_response(event, 500, {"error": str(exc), "retryable": True})


def execute_action(api_path: str, parameters: dict) -> dict:
    """Dispatch to actual action implementation."""
    # Replace with your action logic
    raise NotImplementedError(f"No handler for {api_path}")

The key design decision in this handler is what it returns for a detected loop. It sends HTTP 422 (Unprocessable Entity) with "retryable": false and an explicit “Do not retry” message. The status code alone does not end the session. For an OpenAPI-schema action group, the orchestrator passes the response back to the foundation model as the result of the step. The message body is what steers the model toward reporting the failure instead of rephrasing and retrying. The model can still ignore that instruction, so keep the invoke_agent()-level circuit breaker as the hard stop.
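
Action groups defined with function details, rather than an OpenAPI schema, use a different response envelope. Our understanding of the Bedrock documentation at the time of writing is that this envelope’s functionResponse accepts an optional responseState field. A value of FAILURE makes the agent end the session with a DependencyFailedException, while REPROMPT hands the text back to the model. If that holds, FAILURE is a terminal stop that the HTTP status code cannot provide. The builder below is a sketch based on that understanding. Treat the field names and semantics as assumptions to verify against the current docs. The helper name function_failure_response and the get_revenue function are hypothetical.

```python
import json


def function_failure_response(event: dict, message: str, terminal: bool = True) -> dict:
    """Build a function-details action group response that reports a failure.

    Assumed format (verify against current Bedrock docs): responseState "FAILURE"
    ends the session with a dependency failure; "REPROMPT" returns the text to the model.
    """
    return {
        "messageVersion": event.get("messageVersion", "1.0"),
        "response": {
            "actionGroup": event.get("actionGroup", ""),
            "function": event.get("function", ""),
            "functionResponse": {
                "responseState": "FAILURE" if terminal else "REPROMPT",
                "responseBody": {
                    "TEXT": {"body": json.dumps({"error": "PERMANENT_FAILURE",
                                                 "message": message,
                                                 "retryable": False})}
                },
            },
        },
    }


# Example: the third identical call to a hypothetical get_revenue function
resp = function_failure_response(
    {"actionGroup": "FinanceActions", "function": "get_revenue", "messageVersion": "1.0"},
    "get_revenue was called 3 times in a row with identical parameters. Do not retry.",
)
print(resp["response"]["functionResponse"]["responseState"])  # FAILURE
```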

The handler keeps its cross-invocation state in DynamoDB as a call history store keyed by sessionId. That state covers every Lambda call made within one agent session. In a high-throughput system, you can replace DynamoDB with ElastiCache (Redis) for lower latency. The history read and write add roughly 5–10ms to each Lambda execution, which is negligible compared to the FM inference time. For the multi-agent orchestration case, see multi-agent orchestration cost control (coming soon).

AWS Bedrock built-in guardrails vs. RunGuard

AWS Bedrock Guardrails is a content moderation and policy enforcement product. It is not a circuit breaker. The table below compares what Bedrock provides natively versus what RunGuard adds at the invocation layer:

| Capability | AWS Bedrock built-in | RunGuard |
|---|---|---|
| Loop detection (same action repeated) | Not supported | LoopDetector(repeats=3, max_cycle_len=4); fires when the third consecutive repeat is recorded |
| Per-run dollar budget cap | Not supported (account-level AWS Budgets / Cost Explorer alerts only) | BudgetTracker(max_usd=5.00); checked after each model invocation step |
| Knowledge base retrieval loop detection | Not supported | KB lookup signatures tracked; fires on repeating KB query patterns |
| Action group error escalation detection | Not supported | Action group name + API path signature tracked across orchestration steps |
| Signature-based cycle detection (A→B→A pattern) | Not supported | max_cycle_len=4 detects period-1 through period-4 cycles in the action stream |
| Cross-call state tracking across streaming events | Not supported | State persists across all events in the invoke_agent() response stream |
| Real-time Slack / PagerDuty alert on trip | Not supported | alerts: slack_webhook or pagerduty_key in guard config |
| Content moderation (PII, harmful content) | Yes, via Bedrock Guardrails (separate, separately priced product) | Not in scope (use Bedrock Guardrails for content policy) |
| Prompt injection blocking | Partial: Bedrock Guardrails “prompt attacks” filter | Indirect: loop detection catches the downstream effect of a successful injection |

The two products are complementary: Bedrock Guardrails handles content policy (what the agent is allowed to say and do), while RunGuard handles operational safety (how many times the agent is allowed to try and how much it is allowed to spend). You should run both in production.

Tuning guide: LoopDetector settings for Bedrock agents

The default RunGuard settings (repeats=3, max_cycle_len=4) work well for most agent frameworks, but Bedrock’s Knowledge Base retrieval pattern has characteristics that affect tuning. Here is how to calibrate the settings for common Bedrock use cases:
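
The coarseness of the KB lookup signature is a calibration lever that sits alongside repeats and max_cycle_len. Signing lookups by knowledge base ID alone, as the wrapper on this page does, catches rephrasing loops early. However, it can also trip on legitimate multi-part research against a single KB. Adding a normalised form of the query makes the signature stricter. The sketch below shows both options. The normalisation here (lower-casing and sorting the unique words) is a simple assumption of ours, and it does not catch synonyms such as “APAC” vs. “Asia Pacific”.

```python
import hashlib
import re


def kb_lookup_signature(kb_id: str, query: str, include_query: bool) -> str:
    """Loop-detection signature for a KB lookup.

    include_query=False: one signature per knowledge base (catches rephrasings).
    include_query=True:  adds a hash of the query's sorted unique words, so only
                         lookups with the same word set match.
    """
    if not include_query:
        return f"kb_lookup::{kb_id}"
    words = sorted(set(re.findall(r"[a-z0-9]+", query.lower())))
    digest = hashlib.sha256(" ".join(words).encode()).hexdigest()[:12]
    return f"kb_lookup::{kb_id}::{digest}"


queries = ["APAC Q3 revenue", "Q3 revenue, APAC", "Asia Pacific Q3 revenue breakdown"]
coarse = {kb_lookup_signature("KB12345678", q, include_query=False) for q in queries}
strict = {kb_lookup_signature("KB12345678", q, include_query=True) for q in queries}
print(len(coarse), len(strict))  # 1 2
```

If the strict form is used, the same repeats threshold only catches word-for-word re-queries. Pair it with a lower repeats value, or keep the coarse form for agents that are expected to make few lookups per question.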

For LangChain agents that invoke Bedrock models through the ChatBedrock integration (formerly BedrockChat), see LangChain agent budget limit for the corresponding LangChain-level guard configuration.

Add a circuit breaker to your Bedrock agent in 5 minutes

RunGuard’s Python SDK installs with pip install runguard. For Node.js Bedrock integrations, the TypeScript SDK installs with npm install @runguard/sdk. Wrap your invoke_agent() call with LoopDetector and BudgetTracker, and pass enableTrace=True so the guards can see each orchestration step. Then catch LoopDetectedError and BudgetExceededError. Your Bedrock agent now has a production-grade circuit breaker with no changes to your agent definition, Knowledge Base configuration, or Action Group Lambda handlers.

RunGuard pricing: Solo at $19/month (1 app), Team at $79/month (3 apps). Both plans include a 14-day free trial with no credit card required.

Start your free trial — add a circuit breaker to your Bedrock agent today

Already using a different framework with Bedrock as the backend? See the same circuit breaker pattern for LangChain, autonomous agents generally, and retry storm prevention.