Bee Agent Framework Cost Control: Loop Detection and Budget Enforcement in Production

IBM Research's Bee Agent Framework takes an opinionated TypeScript-first approach to building production AI agents. Where many frameworks give you primitives and leave orchestration to you, Bee provides a cohesive stack: a BeeAgent class built on a ReAct (Reason + Act) core, a pluggable memory system with multiple strategies, typed tool contracts enforced at compile time, and a rich event emitter system that exposes every step of agent execution. The result is a framework that is genuinely well-suited to enterprise deployments — structured, observable, and designed for integration with IBM's broader AI portfolio.

That architecture also creates failure modes that are specific to Bee's design. The maxIterations parameter — the primary built-in safeguard — limits how many ReAct loop iterations an agent can execute before the framework halts execution. What it cannot do is detect whether those iterations are making progress. A Bee agent that calls the same tool with near-identical inputs ten times in a row, each time receiving an observation that doesn't satisfy its reasoning, will consume all ten iterations and exit at the limit. A circuit breaker would have tripped after the third repetition and preserved the remaining budget for a retry with a different strategy.

This post covers four Bee-specific failure modes and shows how to build a BeeAgentBreaker circuit breaker in TypeScript using Bee's native event emitter API.

Bee Agent Framework architecture in brief

Understanding the failure modes requires a mental model of how Bee runs an agent turn. At the core is a ReAct loop: the LLM receives the current memory (system prompt + conversation history) and produces either a thought + action (tool call) or a final answer. When it produces a tool call, Bee resolves the tool, executes it, appends the observation to memory, and feeds the updated context back to the LLM for the next iteration. When it produces a final answer, the loop exits and the answer is returned to the caller.
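
The loop described above can be sketched in a few lines. This is an illustrative model rather than Bee's implementation: the Step, Llm, and ToolFn types, the runReAct function, and the stub LLMs are all hypothetical names introduced here, and the real framework is asynchronous and far richer.

```typescript
// Illustrative ReAct loop. All names here are hypothetical, not Bee APIs.
type Step =
  | { kind: "tool"; tool: string; input: string }
  | { kind: "final"; answer: string };

type Llm = (memory: string[]) => Step;
type ToolFn = (input: string) => string;

function runReAct(
  llm: Llm,
  tools: Record<string, ToolFn>,
  prompt: string,
  maxIterations: number
): string {
  const memory: string[] = [`user: ${prompt}`];
  for (let i = 0; i < maxIterations; i++) {
    const step = llm(memory);
    // A final answer ends the loop immediately.
    if (step.kind === "final") return step.answer;
    // Otherwise run the tool and append the observation to memory.
    const tool = tools[step.tool];
    const observation = tool ? tool(step.input) : `unknown tool: ${step.tool}`;
    memory.push(`action: ${step.tool}(${step.input})`);
    memory.push(`observation: ${observation}`);
  }
  // The iteration cap counts loops; it says nothing about progress.
  throw new Error("iteration limit reached");
}

// Stub LLM that searches once, then answers.
const stubLlm: Llm = (memory) =>
  memory.some((m) => m.startsWith("observation:"))
    ? { kind: "final", answer: "done" }
    : { kind: "tool", tool: "search", input: "bee framework" };

// Stub LLM that never converges: it repeats the same tool call forever.
const stuckLlm: Llm = () => ({ kind: "tool", tool: "search", input: "bee framework" });

const demoTools: Record<string, ToolFn> = { search: (q) => `results for ${q}` };
```

With stuckLlm, the loop spends every iteration on an identical tool call before the cap stops it, which is the gap the rest of this post addresses.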

Memory in Bee is pluggable. The three common strategies are:

UnconstrainedMemory — appends every message unconditionally. Grows without bound. Suitable only for short-lived agents where the total turn count is provably small.
TokenMemory — tracks token counts and evicts oldest messages when the limit is approached. The eviction is by message, not by token; evicting one large observation can free more space than evicting several small messages.
SlidingMemory — keeps the N most recent messages, discarding older ones. Predictable memory footprint at the cost of lost early context.
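
To make the difference between the two bounded strategies concrete, here is a minimal sketch of message-level eviction. It is not Bee's code: the Msg shape and the slidingEvict and tokenEvict functions are hypothetical stand-ins that only model the policies described above.

```typescript
// Hypothetical message shape; token counts are assumed to be precomputed.
interface Msg {
  text: string;
  tokens: number;
}

// Sliding strategy: keep only the N most recent messages.
function slidingEvict(messages: Msg[], maxMessages: number): Msg[] {
  return maxMessages <= 0 ? [] : messages.slice(-maxMessages);
}

// Token strategy: drop whole messages, oldest first, until the total fits.
function tokenEvict(messages: Msg[], maxTokens: number): Msg[] {
  const kept = [...messages];
  let total = kept.reduce((sum, m) => sum + m.tokens, 0);
  while (kept.length > 0 && total > maxTokens) {
    const evicted = kept.shift()!;
    total -= evicted.tokens;
  }
  return kept;
}

const sampleMessages: Msg[] = [
  { text: "large tool observation", tokens: 900 },
  { text: "thought a", tokens: 50 },
  { text: "thought b", tokens: 50 },
  { text: "thought c", tokens: 50 },
];

// One eviction of the large observation is enough to fit a 200-token limit.
const afterTokenEvict = tokenEvict(sampleMessages, 200);
const afterSlidingEvict = slidingEvict(sampleMessages, 2);
```

Because eviction works on whole messages, a single large observation leaving memory can free more room than several small messages would.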

Tools are defined as Tool<TInput, TOutput> subclasses with an input schema declared in Zod: the inferred TypeScript types are checked at compile time, and the schema validates inputs at runtime. Each tool has an optional maxNumRetries that tells Bee how many times to retry a failed tool call before surfacing the error to the LLM as an observation. This retry mechanism is separate from maxIterations, and the gap it leaves is behind one of the four failure modes below.

Observability is first-class: Bee's event system emits AgentStartEvent, AgentIterationEvent, AgentToolCallEvent, AgentToolSuccessEvent, AgentToolErrorEvent, AgentSuccessEvent, and AgentErrorEvent on a per-run emitter. These events are the hook points for a circuit breaker — every tool call and iteration is observable without patching the agent internals.

Why maxIterations is not a circuit breaker

Bee's maxIterations is a hard upper bound on the number of ReAct loop iterations before the framework raises an AgentError with the message "Agent was unable to complete the task in the allocated number of iterations." The default depends on the framework version; production configurations often set it to 10–20, and teams commonly raise it to 30–50 for complex research tasks.

A circuit breaker detects a pattern — a sequence of behavior that signals the agent is burning budget without advancing toward a solution — and halts before the full limit is consumed. maxIterations answers "has this agent run long enough?" A pattern detector answers "is this agent going in circles?" The distinction is critical in production:

A research agent performing 20 distinct tool calls across different data sources should not be halted. An agent calling the same search tool 20 times with semantically identical queries should trip after the third repetition.
With maxIterations=20, the spiraling agent consumes all 20 iterations before stopping. A circuit breaker trips at iteration 3–5 and preserves 15–17 iterations for a retry with different parameters or a graceful fallback.
In multi-agent setups where one BeeAgent invokes another as a tool, each nested agent run starts a fresh maxIterations counter. The outer agent's counter doesn't account for work done in nested runs. A 5-iteration outer agent that triggers 4 nested agent runs of 5 iterations each will consume 25 total LLM calls (5 outer + 4 × 5 inner), all within the per-agent iteration limits, but multiplied by the number of nested runs.

The four failure modes below each exploit the gap between "iterations consumed" and "progress made."

Failure mode 1: Tool observation fixation spiral

The tool observation fixation spiral is Bee's most common production cost failure. It occurs when a tool returns an observation that is informative but not conclusive: the agent receives real data, incorporates it into its reasoning, determines it needs more information, and calls the same tool again with a slightly modified input. The cycle repeats because each new observation is literally different (the query changed slightly) but semantically equivalent (the tool is returning the same class of result).

A concrete example: an agent using a web search tool to research a topic queries "latest benchmarks for transformer architecture 2026", gets back a results page with summaries, determines the summaries are too high-level, and queries "transformer architecture benchmark results 2026 detailed". Gets similar results. Queries "2026 transformer benchmark numbers accuracy latency". The tool is working correctly. The agent is not in an infinite loop by any strict definition. But it is consuming one LLM inference and one tool call per iteration to retrieve increasingly similar information that it is unable to synthesize into a final answer.

Bee's maxNumRetries per tool applies to failed tool calls — errors, timeouts, schema violations. It does not apply to tool calls that succeed and return valid output. A tool that consistently returns valid-but-unsatisfying observations will execute on every iteration with no per-tool limit applying. Detection requires tracking the sequence of successful tool calls and their observations:

// Track (toolName, normalizedInput) pairs across iterations
// to detect when an agent is calling the same tool with
// semantically near-identical inputs.

function normalizeToolInput(input: unknown): string {
  const raw = JSON.stringify(input ?? "");
  return raw
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ")
    .split(/\s+/)
    .filter(Boolean)
    .sort()
    .join(" ");
}

function jaccardSimilarity(a: string, b: string): number {
  // filter(Boolean) drops the empty token that "".split(" ") would produce
  const setA = new Set(a.split(" ").filter(Boolean));
  const setB = new Set(b.split(" ").filter(Boolean));
  const intersection = new Set([...setA].filter(x => setB.has(x)));
  const union = new Set([...setA, ...setB]);
  return union.size === 0 ? 0 : intersection.size / union.size;
}

// In the breaker: per-tool sliding window of normalized inputs
// Check: if last N inputs for toolName have pairwise Jaccard >= 0.70,
// trip the breaker.

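The 0.70 Jaccard threshold is a reasonable starting point: it catches reordered and lightly extended queries while allowing tools that legitimately need several similar-but-distinct queries (pagination-style refinement, faceted search) to proceed. Scoring the bare query strings with the token-level normalization above, "Company X annual revenue 2025" and "annual revenue Company X 2025 report" score 5/6 ≈ 0.83 and match; "what is the revenue of Company X" and "what is the market cap of Company X" score 6/9 ≈ 0.67 and do not. Looser paraphrases share fewer tokens ("what is the revenue of Company X" versus "Company X annual revenue" scores only 3/8 ≈ 0.38), so token-level Jaccard will not catch them at this threshold.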
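
The per-tool sliding window mentioned in the comments above could look roughly like this. It is a self-contained sketch: tokenSet, jaccard, and isFixated are hypothetical names, and the window of 3 used in the example is an illustrative choice rather than a recommended setting.

```typescript
// Lowercase, strip punctuation, and split into a set of tokens.
function tokenSet(query: string): Set<string> {
  return new Set(
    query
      .toLowerCase()
      .replace(/[^a-z0-9\s]/g, " ")
      .split(/\s+/)
      .filter(Boolean)
  );
}

// Jaccard similarity: shared tokens divided by total distinct tokens.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 1;
  let shared = 0;
  a.forEach((token) => {
    if (b.has(token)) shared++;
  });
  return shared / (a.size + b.size - shared);
}

// True when every pair among the last `window` queries meets the threshold.
function isFixated(queries: string[], window: number, threshold: number): boolean {
  if (queries.length < window) return false;
  const recent = queries.slice(-window).map(tokenSet);
  return recent.every((a, i) =>
    recent.slice(i + 1).every((b) => jaccard(a, b) >= threshold)
  );
}

// Reordered and lightly extended queries: flagged.
const repeated = [
  "transformer benchmark results 2026",
  "2026 transformer benchmark results detailed",
  "detailed results transformer benchmark 2026",
];

// Same entity, different facets: not flagged.
const distinct = [
  "revenue of company x",
  "market cap of company x",
  "headcount of company x",
];

const repeatedFlagged = isFixated(repeated, 3, 0.7);
const distinctFlagged = isFixated(distinct, 3, 0.7);
```

Requiring every pair in the window to clear the threshold is stricter than comparing each query to its predecessor, which reduces false positives when a query drifts gradually toward a new topic.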

Failure mode 2: ReAct reasoning echo loop

The reasoning echo loop is harder to detect than the tool fixation spiral because it doesn't repeat tool calls — it repeats reasoning. Bee's ReAct loop gives the LLM a chance on each iteration to either call a tool or produce a final answer. In the echo loop failure mode, the LLM consistently chooses to call a tool rather than synthesize what it already has. But the "thinking" step — the LLM's reasoning about what to do next — becomes self-referential: it cites earlier observations as justification for further investigation, which produces new observations that are cited as justification for even more investigation.

The pattern is common when:

The system prompt instructs the agent to "be thorough" or "verify information from multiple sources" without a stopping condition.
The task is open-ended enough that "more research" always seems like a valid next step.
The agent is designed with a low temperature setting that makes it conservative — it hesitates to commit to a final answer until it has high confidence, and the confidence threshold is never reached.

From the tool call perspective, each iteration may call a different tool — search, then summarize, then search again, then fact-check. So the per-tool observation fixation detector won't fire. The signal for a reasoning echo loop is at the iteration level: the agent's tool diversity relative to its iteration count drops over time. In early iterations, each new tool call explores a new facet of the problem. In a reasoning echo loop, the agent cycles through the same 2–3 tools repeatedly, just in different orders, without converging on an answer.

// Track unique (toolName, normalizedInput) pairs seen across all iterations.
// If after N iterations the ratio of unique_pairs / total_calls < 0.4,
// the agent is repeating rather than exploring.

interface IterationState {
  toolName: string;
  normalizedInput: string;
}

function explorationRatio(history: IterationState[]): number {
  if (history.length === 0) return 1.0;
  const unique = new Set(history.map(h => `${h.toolName}::${h.normalizedInput}`));
  return unique.size / history.length;
}

// Trip when: history.length >= minIterationsBeforeCheck (e.g., 5)
// AND explorationRatio(history) < 0.40

The exploration ratio check requires a minimum history before it fires: you don't want to trip after 2 iterations just because the agent reasonably called the same tool twice. With minIterationsBeforeCheck=5 and explorationThreshold=0.40, the breaker trips once at least five tool calls have been recorded and fewer than 40% of all calls so far are unique (tool, input) pairs, that is, more than 60% are repetitions of pairs already seen. At exactly five calls that means all five were identical; at ten calls, three or fewer distinct pairs. This is a strong signal that the agent is circling rather than advancing.
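
A small worked example helps show when the ratio actually crosses the threshold. The sketch below is self-contained and uses hypothetical names (Call, shouldTripEcho, repeatCycle); it models an agent that cycles through the same three (tool, input) pairs.

```typescript
interface Call {
  toolName: string;
  normalizedInput: string;
}

// Share of calls that are distinct (tool, input) pairs.
function explorationRatio(history: Call[]): number {
  if (history.length === 0) return 1;
  const unique = new Set(history.map((c) => `${c.toolName}::${c.normalizedInput}`));
  return unique.size / history.length;
}

// Trip only after a minimum number of calls, and only below the threshold.
function shouldTripEcho(history: Call[], minCalls = 5, threshold = 0.4): boolean {
  return history.length >= minCalls && explorationRatio(history) < threshold;
}

// Three distinct pairs that the agent keeps revisiting.
const cycle: Call[] = [
  { toolName: "search", normalizedInput: "bee framework" },
  { toolName: "summarize", normalizedInput: "bee framework" },
  { toolName: "fact_check", normalizedInput: "bee framework" },
];

// Build a history of n calls by repeating the cycle.
function repeatCycle(n: number): Call[] {
  return Array.from({ length: n }, (_, i) => cycle[i % cycle.length]!);
}

const tripsAtSeven = shouldTripEcho(repeatCycle(7)); // 3 / 7 ≈ 0.43, no trip
const tripsAtEight = shouldTripEcho(repeatCycle(8)); // 3 / 8 = 0.375, trips
```

With these defaults, a three-pair cycle is first flagged on the eighth call, so the check is deliberately slower than the per-tool fixation check and complements it rather than replacing it.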

Failure mode 3: Memory token drift

Memory token drift is a cost failure that doesn't feel like a loop because the agent is making apparent progress — each iteration produces genuinely different tool calls and observations. The problem is that each new observation appended to memory increases the size of the context sent to the LLM on the next iteration. In agents using UnconstrainedMemory, this growth is unbounded. With TokenMemory, the growth is bounded but the eviction policy can create its own problems (evicting early context that the agent relies on for coherence, causing the agent to re-fetch information it already retrieved).

The cost curve for token drift is quadratic, not linear. An agent that starts with a 2K-token context and adds 500 tokens per iteration will be sending a 6.5K-token context by iteration 10. With a 15-iteration budget, the total input tokens across all iterations are not 15 × 2,000: they are the sum 2,000 + 2,500 + 3,000 + ... + 9,000, which totals 82,500 tokens (an average of 5,500 per iteration). The same 15-iteration agent with a stable 2K context costs 30,000 input tokens. The drift multiplier is 2.75×. At GPT-4-class prices, the difference between a well-bounded agent and a drifting one is material, and the drift is invisible unless you're tracking token counts per iteration.
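
The arithmetic is easy to reproduce for your own numbers. The sketch below assumes the simplest model, where the context grows by a fixed number of tokens each iteration; the function names are hypothetical.

```typescript
// Total input tokens when the context starts at `base` tokens and
// grows by `perIteration` tokens on every subsequent iteration.
function cumulativeInputTokens(base: number, perIteration: number, iterations: number): number {
  let total = 0;
  for (let i = 0; i < iterations; i++) {
    total += base + perIteration * i; // context size sent on iteration i + 1
  }
  return total;
}

// Closed form of the same sum: n * base + perIteration * n * (n - 1) / 2.
// The n * (n - 1) term is why cumulative cost grows with the square of n.
function cumulativeClosedForm(base: number, perIteration: number, n: number): number {
  return n * base + (perIteration * n * (n - 1)) / 2;
}

const driftingTotal = cumulativeInputTokens(2_000, 500, 15);
const stableTotal = cumulativeInputTokens(2_000, 0, 15);
const driftMultiplier = driftingTotal / stableTotal;

console.log({ driftingTotal, stableTotal, driftMultiplier });
```

Plugging in your agent's real starting context and average observation size gives a quick estimate of how much of a run's input cost comes from drift rather than from the task itself.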

Bee's memory API exposes the current message list, but not a running token count per iteration. The breaker needs to estimate this from iteration observations:

// Rough token estimation: 1 token ≈ 4 characters (English prose/code)
function estimateTokens(content: string): number {
  return Math.ceil(content.length / 4);
}

// In the breaker: track cumulative estimated input tokens
// across all iterations. Trip when growth rate exceeds threshold
// OR total budget cap is reached.

interface TokenDriftState {
  iterationTokens: number[];  // estimated tokens per iteration
}

function isDrifting(state: TokenDriftState, growthThreshold = 1.35): boolean {
  const tokens = state.iterationTokens;
  if (tokens.length < 3) return false;
  // Check if last 3 iterations show compounding growth
  const recent = tokens.slice(-3);
  return (
    recent[1] / recent[0] >= growthThreshold &&
    recent[2] / recent[1] >= growthThreshold
  );
}

The drift check compares consecutive iteration token sizes. If the last two iteration-over-iteration steps (spanning three iterations) each show context growth of 35% or more, the breaker flags a drift condition. The goal is to catch compounding growth early, before the 8th or 10th iteration where the multiplied cost has already landed. Steady additive growth, such as the 500-tokens-per-iteration example above (ratios of 1.25 and falling), stays under this threshold and is caught by the total token cap instead.
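
To see what the ratio check does and does not catch, the following self-contained sketch runs it against two synthetic series. The sample numbers are made up for illustration, and the function is a restatement of the check above rather than new behavior.

```typescript
// True when the last two step-over-step growth ratios both meet the threshold.
function isDrifting(sizes: number[], growthThreshold = 1.35): boolean {
  if (sizes.length < 3) return false;
  const recent = sizes.slice(-3);
  const first = recent[0]!;
  const second = recent[1]!;
  const third = recent[2]!;
  // Math.max(..., 1) guards against division by zero on an empty context.
  return (
    second / Math.max(first, 1) >= growthThreshold &&
    third / Math.max(second, 1) >= growthThreshold
  );
}

// Compounding growth: each iteration is 40% larger than the last.
const compounding = [2_000, 2_800, 3_920];

// Additive growth: 500 tokens per iteration, ratios 1.25 then 1.20.
const additive = [2_000, 2_500, 3_000];

const compoundingFlagged = isDrifting(compounding);
const additiveFlagged = isDrifting(additive);
```

The additive series is not flagged, which is why the ratio check is paired with a hard cap on total estimated tokens in the full breaker.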

For production Bee agents, the right response to a drift trip is not to give up on the task but to flush memory and retry: call the agent's memory reset() method, inject a summary of what was learned so far, and rerun from a stable context baseline. This is the "graceful degrade" path that preserves most of the agent's progress while cutting future per-iteration costs back to baseline.

Failure mode 4: Nested sub-agent back-delegation cycle

Bee supports a multi-agent pattern where one BeeAgent wraps another inside a custom Tool implementation — the outer agent calls the inner agent as a tool call, passes it a task, receives a response, and proceeds. This pattern is clean and well-structured. It is also the source of Bee's most expensive failure mode when it goes wrong.

The back-delegation cycle occurs when the inner agent cannot complete its assigned task and returns a response that looks like progress to the outer agent but is actually a request for more information. The outer agent interprets the response, determines it needs to provide more context, and calls the inner agent again. The inner agent receives the additional context but still cannot resolve the ambiguity — it produces another informative-but-inconclusive response. The cycle continues until the outer agent's maxIterations is exhausted.

The cost multiplication is significant. Each outer agent iteration that triggers an inner agent run incurs: one LLM call for the outer agent's decision to call the inner agent tool, the full inner agent run (up to its own maxIterations), and the outer agent's next LLM call to process the response. With an outer agent configured for 10 iterations and an inner agent for 10 iterations, a back-delegation cycle that runs to the outer agent's limit costs up to 10 outer-agent LLM calls plus 10 inner-agent runs of up to 10 iterations each, or as many as 110 LLM calls in total.
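
A quick upper-bound calculation shows what halting the cycle early is worth. The sketch assumes one LLM call per iteration and that every outer iteration triggers a full inner run; worstCaseLlmCalls is a hypothetical helper, and stopping after three sub-agent calls is an illustrative cutoff.

```typescript
// Upper bound on LLM calls when each outer iteration triggers one inner run
// that uses its full iteration budget (one LLM call per iteration assumed).
function worstCaseLlmCalls(outerIterations: number, innerIterations: number): number {
  return outerIterations + outerIterations * innerIterations;
}

// Cycle runs until the outer agent's limit of 10 iterations.
const unguardedCalls = worstCaseLlmCalls(10, 10); // 110

// Cycle halted after 3 sub-agent calls.
const guardedCalls = worstCaseLlmCalls(3, 10); // 33

const callsAvoided = unguardedCalls - guardedCalls; // 77
```

These are ceilings, not estimates: real inner runs often finish in fewer iterations, but the ratio between the two cases is what matters when sizing the breaker.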

Detection at the outer agent level requires tracking the tool call history for the sub-agent tool specifically: how many times has the same inner agent tool been called, and are consecutive responses from that tool semantically near-identical?

// Track responses from sub-agent tools by tool name.
// If the same sub-agent tool is called more than maxCalls times, OR
// two consecutive responses have Jaccard similarity >= 0.65, the
// inner agent is treated as stuck in an indeterminate state.

interface SubAgentCallRecord {
  callIndex: number;
  normalizedResponse: string;
}

function detectBackDelegation(
  records: SubAgentCallRecord[],
  maxCalls: number,
  similarityThreshold: number
): boolean {
  if (records.length > maxCalls) return true; // more calls than the allowed maximum
  if (records.length < 2) return false;
  const last = records[records.length - 1];
  const prev = records[records.length - 2];
  return jaccardSimilarity(last.normalizedResponse, prev.normalizedResponse)
    >= similarityThreshold;
}

Building BeeAgentBreaker

Bee's event emitter API makes it possible to implement all four circuit breaker checks without modifying agent or tool internals. Every BeeAgent.run() call exposes an emitter that fires synchronously before and after each tool call and each iteration. The breaker subscribes to AgentToolCallEvent, AgentToolSuccessEvent, and AgentIterationEvent, tracks state across the run, and throws a BreakerOpenError when a trip condition fires. Bee surfaces this as an AgentError, which callers already handle; if the original error reaches your code wrapped in that way, look for BreakerOpenError on the wrapper's cause chain rather than relying on instanceof alone.

import { BeeAgent } from "bee-agent-framework/agents/bee/agent";
import {
  AgentToolCallEvent,
  AgentToolSuccessEvent,
  AgentIterationEvent,
} from "bee-agent-framework/agents/bee/events";

// --- Types ---

export interface BreakerConfig {
  // Failure mode 1: tool observation fixation
  toolFixationWindow: number;          // consecutive calls to check (default: 4)
  toolFixationJaccardThreshold: number; // similarity threshold (default: 0.70)

  // Failure mode 2: reasoning echo loop
  echoMinIterations: number;           // minimum iterations before echo check (default: 5)
  echoExplorationThreshold: number;    // min ratio of unique calls (default: 0.40)

  // Failure mode 3: memory token drift
  driftGrowthThreshold: number;        // per-iteration growth ratio to flag (default: 1.35)
  driftMaxTotalTokens: number;         // hard cap on total estimated input tokens (default: 100000)

  // Failure mode 4: sub-agent back-delegation
  subAgentToolNames: string[];         // names of tools that wrap sub-agents
  subAgentMaxCalls: number;            // max calls before tripping (default: 3)
  subAgentSimilarityThreshold: number; // response similarity threshold (default: 0.65)
}

export const DEFAULT_BREAKER_CONFIG: BreakerConfig = {
  toolFixationWindow: 4,
  toolFixationJaccardThreshold: 0.70,
  echoMinIterations: 5,
  echoExplorationThreshold: 0.40,
  driftGrowthThreshold: 1.35,
  driftMaxTotalTokens: 100_000,
  subAgentToolNames: [],
  subAgentMaxCalls: 3,
  subAgentSimilarityThreshold: 0.65,
};

export class BreakerOpenError extends Error {
  constructor(
    public readonly reason: string,
    public readonly tripCondition: string
  ) {
    super(`BeeAgentBreaker tripped: ${tripCondition} — ${reason}`);
    this.name = "BreakerOpenError";
  }
}

// --- State ---

interface RunState {
  toolCallHistory: Array<{ toolName: string; normalizedInput: string }>;
  iterationTokenEstimates: number[];
  subAgentCalls: Map<string, Array<{ normalizedResponse: string }>>;
  totalEstimatedTokens: number;
}

function initRunState(): RunState {
  return {
    toolCallHistory: [],
    iterationTokenEstimates: [],
    subAgentCalls: new Map(),
    totalEstimatedTokens: 0,
  };
}

// --- Helpers ---

function normalizeText(input: unknown): string {
  return JSON.stringify(input ?? "")
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ")
    .split(/\s+/)
    .filter(Boolean)
    .sort()
    .join(" ");
}

function jaccardSimilarity(a: string, b: string): number {
  const setA = new Set(a.split(" ").filter(Boolean));
  const setB = new Set(b.split(" ").filter(Boolean));
  if (setA.size === 0 && setB.size === 0) return 1.0;
  const intersection = new Set([...setA].filter(x => setB.has(x)));
  const union = new Set([...setA, ...setB]);
  return union.size === 0 ? 0 : intersection.size / union.size;
}

function estimateTokens(value: unknown): number {
  return Math.ceil(JSON.stringify(value ?? "").length / 4);
}

// --- Main breaker ---

export class BeeAgentBreaker {
  private readonly config: BreakerConfig;
  private state: RunState = initRunState();

  constructor(config: Partial<BreakerConfig> = {}) {
    this.config = { ...DEFAULT_BREAKER_CONFIG, ...config };
  }

  reset(): void {
    this.state = initRunState();
  }

  private checkToolFixation(): void {
    const { toolCallHistory } = this.state;
    const { toolFixationWindow, toolFixationJaccardThreshold } = this.config;
    if (toolCallHistory.length < toolFixationWindow) return;

    const window = toolCallHistory.slice(-toolFixationWindow);
    const firstName = window[0].toolName;

    // All calls in the window must be to the same tool
    if (!window.every(c => c.toolName === firstName)) return;

    // Check pairwise Jaccard similarity across the window:
    // every pair of inputs must meet the threshold.
    for (let i = 0; i < window.length; i++) {
      for (let j = i + 1; j < window.length; j++) {
        const sim = jaccardSimilarity(window[i].normalizedInput, window[j].normalizedInput);
        if (sim < toolFixationJaccardThreshold) return;
      }
    }

    throw new BreakerOpenError(
      `Tool "${firstName}" called ${toolFixationWindow} times with >=${Math.round(toolFixationJaccardThreshold * 100)}% input similarity`,
      "TOOL_FIXATION_SPIRAL"
    );
  }

  private checkEchoLoop(): void {
    const { toolCallHistory } = this.state;
    const { echoMinIterations, echoExplorationThreshold } = this.config;
    if (toolCallHistory.length < echoMinIterations) return;

    const unique = new Set(
      toolCallHistory.map(h => `${h.toolName}::${h.normalizedInput}`)
    );
    const ratio = unique.size / toolCallHistory.length;
    if (ratio < echoExplorationThreshold) {
      throw new BreakerOpenError(
        `Exploration ratio ${ratio.toFixed(2)} below threshold ${echoExplorationThreshold} after ${toolCallHistory.length} calls`,
        "REASONING_ECHO_LOOP"
      );
    }
  }

  private checkTokenDrift(iterationTokens: number): void {
    const { iterationTokenEstimates, totalEstimatedTokens } = this.state;
    const { driftGrowthThreshold, driftMaxTotalTokens } = this.config;

    if (totalEstimatedTokens > driftMaxTotalTokens) {
      throw new BreakerOpenError(
        `Total estimated input tokens ${totalEstimatedTokens.toLocaleString()} exceeded hard cap ${driftMaxTotalTokens.toLocaleString()}`,
        "TOKEN_BUDGET_EXCEEDED"
      );
    }

    const estimates = iterationTokenEstimates;
    if (estimates.length < 3) return;

    const recent = estimates.slice(-3);
    const growth1 = recent[1] / Math.max(recent[0], 1);
    const growth2 = recent[2] / Math.max(recent[1], 1);

    if (growth1 >= driftGrowthThreshold && growth2 >= driftGrowthThreshold) {
      throw new BreakerOpenError(
        `Memory token drift detected: last 3 iteration sizes ${recent.map(n => n.toLocaleString()).join(" → ")} tokens`,
        "MEMORY_TOKEN_DRIFT"
      );
    }
  }

  private checkSubAgentBackDelegation(toolName: string, responseNorm: string): void {
    const { subAgentToolNames, subAgentMaxCalls, subAgentSimilarityThreshold } = this.config;
    if (!subAgentToolNames.includes(toolName)) return;

    if (!this.state.subAgentCalls.has(toolName)) {
      this.state.subAgentCalls.set(toolName, []);
    }
    const calls = this.state.subAgentCalls.get(toolName)!;
    calls.push({ normalizedResponse: responseNorm });

    if (calls.length > subAgentMaxCalls) {
      throw new BreakerOpenError(
        `Sub-agent tool "${toolName}" called ${calls.length} times — exceeded max ${subAgentMaxCalls}`,
        "SUB_AGENT_MAX_CALLS"
      );
    }

    if (calls.length >= 2) {
      const last = calls[calls.length - 1].normalizedResponse;
      const prev = calls[calls.length - 2].normalizedResponse;
      const sim = jaccardSimilarity(last, prev);
      if (sim >= subAgentSimilarityThreshold) {
        throw new BreakerOpenError(
          `Sub-agent tool "${toolName}" returned near-identical responses (similarity ${sim.toFixed(2)})`,
          "SUB_AGENT_BACK_DELEGATION"
        );
      }
    }
  }

  // Subscribe to a BeeAgent run's emitter
  attach(emitter: ReturnType<BeeAgent["run"]>["emitter"]): void {
    this.reset();

    emitter.on(AgentToolCallEvent, (event) => {
      const normalizedInput = normalizeText(event.data.input);
      this.state.toolCallHistory.push({
        toolName: event.data.tool.name,
        normalizedInput,
      });
      this.checkToolFixation();
      this.checkEchoLoop();
    });

    emitter.on(AgentToolSuccessEvent, (event) => {
      const responseNorm = normalizeText(event.data.output);
      this.checkSubAgentBackDelegation(event.data.tool.name, responseNorm);
    });

    emitter.on(AgentIterationEvent, (event) => {
      const iterTokens = estimateTokens(event.data.state.messages);
      this.state.iterationTokenEstimates.push(iterTokens);
      this.state.totalEstimatedTokens += iterTokens;
      this.checkTokenDrift(iterTokens);
    });
  }
}

Wiring BeeAgentBreaker into your agent runs

The attach() method subscribes to all relevant events before the run starts. Because Bee's emitter fires synchronously, throwing inside an event handler immediately surfaces as a rejected promise on the agent.run() call — no polling, no separate abort controller needed.

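import { BeeAgent } from "bee-agent-framework/agents/bee/agent";
import { TokenMemory } from "bee-agent-framework/memory/tokenMemory";
import { BeeAgentBreaker, BreakerOpenError } from "./bee-agent-breaker";

// Create agent (standard Bee setup); llm, searchTool, and calculatorTool
// are assumed to be defined elsewhere in your application.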
const agent = new BeeAgent({
  llm,
  memory: new TokenMemory({ llm }),
  tools: [searchTool, calculatorTool],
  maxIterations: 20,
});

// Create breaker — configure for your agent's expected behavior
const breaker = new BeeAgentBreaker({
  toolFixationWindow: 4,
  toolFixationJaccardThreshold: 0.72,
  echoMinIterations: 5,
  echoExplorationThreshold: 0.40,
  driftGrowthThreshold: 1.35,
  driftMaxTotalTokens: 80_000,
  subAgentToolNames: [],  // list any sub-agent tool names here
  subAgentMaxCalls: 3,
  subAgentSimilarityThreshold: 0.65,
});

async function runWithBreaker(prompt: string): Promise<string> {
  const run = agent.run({ prompt });

  // Attach breaker BEFORE the run starts
  breaker.attach(run.emitter);

  try {
    const result = await run;
    return result.result.text;
  } catch (error) {
    if (error instanceof BreakerOpenError) {
      console.warn(`[RunGuard] Breaker tripped: ${error.tripCondition}`);
      console.warn(`[RunGuard] Reason: ${error.reason}`);
      // Return a graceful fallback rather than propagating the error
      return `I was unable to complete the task due to a detected loop condition (${error.tripCondition}). Please try rephrasing your request or providing more specific constraints.`;
    }
    throw error;
  }
}
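
One caveat with the catch block above: the earlier section notes that Bee surfaces handler errors as an AgentError, so a direct instanceof check on the caught value may miss a wrapped BreakerOpenError. The helper below is a hedged sketch that walks the standard Error cause chain. Whether your Bee version stores the original error on cause is an assumption to verify; findInCauseChain and DemoBreakerError are hypothetical names.

```typescript
// Walks error.cause links looking for an instance of the given class.
// Assumes wrappers keep the original error on the standard `cause` property.
function findInCauseChain<T extends Error>(
  error: unknown,
  ctor: new (...args: any[]) => T,
  maxDepth = 10
): T | undefined {
  let current: unknown = error;
  for (let depth = 0; depth < maxDepth && current != null; depth++) {
    if (current instanceof ctor) return current;
    current = (current as { cause?: unknown }).cause;
  }
  return undefined;
}

// Stand-in for BreakerOpenError so the sketch runs on its own.
class DemoBreakerError extends Error {
  constructor(public readonly tripCondition: string) {
    super(`tripped: ${tripCondition}`);
    this.name = "DemoBreakerError";
  }
}

// Simulate a framework wrapper around the original breaker error.
const innerError = new DemoBreakerError("TOOL_FIXATION_SPIRAL");
const wrappedError = Object.assign(new Error("agent failed"), { cause: innerError });

const foundError = findInCauseChain(wrappedError, DemoBreakerError);
const notFound = findInCauseChain(new Error("unrelated"), DemoBreakerError);
```

In runWithBreaker, the instanceof test could be replaced with a call to this helper so that both direct and wrapped breaker errors reach the fallback path.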

Wiring for multi-agent setups with sub-agent tools

When using a BeeAgent that invokes a nested agent as a tool, configure subAgentToolNames with the names of those wrapper tools. The breaker will track how many times the outer agent calls each sub-agent tool and detect when consecutive responses are near-identical:

import { Tool, ToolInput, ToolOutput } from "bee-agent-framework/tools/base";
import { z } from "zod";

// Sub-agent wrapped as a Tool
class ResearchAgentTool extends Tool<{ task: string }, string> {
  name = "research_agent";
  description = "Delegates research tasks to a specialized research agent.";

  inputSchema = z.object({ task: z.string() });

  protected async _run(input: ToolInput<this>): Promise<ToolOutput<this>> {
    const subRun = researchAgent.run({ prompt: input.task });
    // Optionally attach a child breaker for the sub-agent run
    subBreaker.attach(subRun.emitter);
    const result = await subRun;
    return result.result.text;
  }
}

// Outer agent with sub-agent as tool
const outerAgent = new BeeAgent({
  llm,
  memory: new TokenMemory({ llm }),
  tools: [new ResearchAgentTool(), calculatorTool],
  maxIterations: 10,
});

// Outer breaker knows "research_agent" is a sub-agent
const outerBreaker = new BeeAgentBreaker({
  subAgentToolNames: ["research_agent"],
  subAgentMaxCalls: 3,
  subAgentSimilarityThreshold: 0.65,
});

const run = outerAgent.run({ prompt: userQuery });
outerBreaker.attach(run.emitter);
const result = await run;

Tuning reference

Parameter	Default	Raise when	Lower when
`toolFixationWindow` Consecutive same-tool calls before fixation check	`4`	Agent legitimately needs 5+ similar queries (e.g., paginated search)	Cost overruns are frequent and quick — trip earlier at 3
`toolFixationJaccardThreshold` Input similarity to count as repetition	`0.70`	False positives on genuinely different queries with shared vocabulary	Agent paraphrases queries heavily — lower to 0.60 to catch loose paraphrases
`echoMinIterations` Minimum iterations before echo check fires	`5`	Long legitimate multi-tool sequences before patterns emerge	Short-running agents where 3 iterations is already a signal
`echoExplorationThreshold` Min ratio of unique (tool, input) pairs	`0.40`	Any unique-pair ratio below 0.50 is a problem in your agent design	Agent design involves intentional repetition of validation steps
`driftGrowthThreshold` Per-iteration context growth to flag as drift	`1.35`	Agent processes large documents where per-iteration growth is expected	Memory management is critical — catch any sustained growth above 1.20
`driftMaxTotalTokens` Hard cap on total estimated input tokens	`100,000`	High-budget research agents with large document corpora	Cost-sensitive production agents — set to 40,000–60,000 for typical tasks
`subAgentMaxCalls` Max calls to a sub-agent tool before tripping	`3`	Outer agent legitimately needs 4+ inner-agent calls per run	Sub-agent calls are expensive and any repetition is a signal

Memory strategy and the drift failure mode

The memory token drift failure mode is significantly worse with UnconstrainedMemory than with the bounded strategies. If you are using UnconstrainedMemory in production, replace it with TokenMemory and set the limit to 60–80% of your model's context window. This gives the agent access to most of its context budget while preventing runaway growth from consuming the entire window and forcing the LLM into degraded coherence at the tail of a long context.

When the breaker trips the MEMORY_TOKEN_DRIFT condition, the correct recovery path is to flush memory and re-inject a compressed summary:

async function runWithDriftRecovery(
  agent: BeeAgent,
  prompt: string,
  breaker: BeeAgentBreaker
): Promise<string> {
  let attempt = 0;
  let collectedContext = "";

  while (attempt < 2) {
    const run = agent.run({ prompt: attempt === 0 ? prompt : `${prompt}\n\nContext so far: ${collectedContext}` });
    breaker.attach(run.emitter);

    // Collect observations before any potential trip
    run.emitter.on(AgentToolSuccessEvent, (ev) => {
      collectedContext += ` [${ev.data.tool.name}]: ${JSON.stringify(ev.data.output).slice(0, 200)}`;
    });

    try {
      const result = await run;
      return result.result.text;
    } catch (error) {
      if (
        error instanceof BreakerOpenError &&
        error.tripCondition === "MEMORY_TOKEN_DRIFT" &&
        attempt === 0
      ) {
        // Reset memory, summarize what we have, and retry
        await agent.memory.reset();
        attempt++;
        continue;
      }
      throw error;
    }
  }
  return "Unable to complete: memory drift detected on retry.";
}

A pre-flight check is also worth adding for agents that use UnconstrainedMemory by design (where large context is the feature, not the bug). Before starting a run, check whether the current memory size is already above a threshold that makes successful completion unlikely:

async function preflightMemoryCheck(agent: BeeAgent, maxTokens: number): Promise<boolean> {
  const messages = await agent.memory.getMessages();
  const estimatedTokens = messages
    .map(m => estimateTokens(m.text))
    .reduce((a, b) => a + b, 0);
  return estimatedTokens <= maxTokens;
}

// Before each run:
if (!(await preflightMemoryCheck(agent, 40_000))) {
  await agent.memory.reset();
  // Optionally inject a summary message before continuing
}

This is the same pre-flight pattern described in our Python circuit breaker guide — checking preconditions before entering the agent loop prevents the most expensive drift scenarios from starting at all.

Connecting to RunGuard

The BeeAgentBreaker above is a standalone, zero-dependency implementation. If you want persistent trip state across restarts, multi-agent breaker coordination, and a dashboard showing which agents are tripping and why, that's what RunGuard's SDK provides. The integration is a guard() wrapper around your runWithBreaker function — the same API described in our cost engineering guide.

import { guard, BudgetTracker } from "@runguard/sdk";

const budget = new BudgetTracker({ maxCostUsd: 1.50 });

const guardedRun = guard(runWithBreaker, { budget, loopDetector: true });

// Use identically to runWithBreaker:
const result = await guardedRun("Research the latest AI safety benchmarks");

The guard() wrapper adds dollar-denominated budget enforcement that BeeAgentBreaker doesn't track: actual LLM API cost per invocation, accumulated against a configurable per-task cap. Combined with the four pattern-based trip conditions above, this covers both "agent looping without spending much per step" and "agent spending a lot per step without looping" — the two cost failure modes that together account for the majority of runaway agent bills.
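To make "dollar-denominated" concrete, here is a minimal sketch of the idea: accumulate the cost of each LLM call and throw once the running total passes the cap. This is illustrative only and is not RunGuard's BudgetTracker. SimpleCostTracker, BudgetExceededError, and TokenPrices are hypothetical names, and the prices are whatever you pass in.

```typescript
// Illustrative only: this is not RunGuard's BudgetTracker. Supply your
// provider's actual per-token rates.
interface TokenPrices {
  inputPer1kUsd: number;
  outputPer1kUsd: number;
}

class BudgetExceededError extends Error {
  constructor(readonly spentUsd: number, readonly capUsd: number) {
    super(`Budget exceeded: $${spentUsd.toFixed(4)} of $${capUsd.toFixed(2)}`);
    this.name = "BudgetExceededError";
    // Keeps instanceof working when compiled to older JS targets
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

class SimpleCostTracker {
  private spentUsd = 0;

  constructor(
    private readonly capUsd: number,
    private readonly prices: TokenPrices,
  ) {}

  // Add the cost of one LLM call; throw once the running total passes the cap.
  record(inputTokens: number, outputTokens: number): number {
    this.spentUsd +=
      (inputTokens / 1000) * this.prices.inputPer1kUsd +
      (outputTokens / 1000) * this.prices.outputPer1kUsd;
    if (this.spentUsd > this.capUsd) {
      throw new BudgetExceededError(this.spentUsd, this.capUsd);
    }
    return this.spentUsd;
  }
}
```

A tracker like this catches the expensive-step case that pattern detection misses: a single call with a very large context can exhaust the cap without any repetition at all.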

FAQ

Does BeeAgentBreaker work with Bee's streaming output mode?

Yes. The breaker attaches to the run emitter and fires on AgentToolCallEvent, AgentToolSuccessEvent, and AgentIterationEvent — all of which are emitted regardless of whether the final output is collected via await run or via an async generator stream. The synchronous throw inside an event handler will surface as a rejected iteration on the stream. Wrap your stream consumption in a try/catch that handles BreakerOpenError the same way as the non-streaming example.
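The try/catch shape for stream consumption can be sketched as follows. To keep the example runnable on its own, it uses a stand-in BreakerOpenError and a fake async generator rather than Bee's streaming API. consumeStream and fakeStream are hypothetical names, and the way you obtain the real stream from a Bee run is not shown here.

```typescript
// Stand-in for the BreakerOpenError defined earlier in this post, so the
// sketch runs on its own.
class BreakerOpenError extends Error {
  constructor(readonly tripCondition: string) {
    super(`Breaker open: ${tripCondition}`);
    this.name = "BreakerOpenError";
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Drain any async iterable of text chunks. A breaker trip ends the stream
// early but keeps the text received so far; other errors propagate.
async function consumeStream(
  stream: AsyncIterable<string>,
): Promise<{ text: string; tripped: string | null }> {
  let text = "";
  try {
    for await (const chunk of stream) {
      text += chunk;
    }
    return { text, tripped: null };
  } catch (error) {
    if (error instanceof BreakerOpenError) {
      return { text, tripped: error.tripCondition };
    }
    throw error;
  }
}

// Fake stream for demonstration: two chunks, then a simulated trip.
async function* fakeStream(): AsyncGenerator<string> {
  yield "partial ";
  yield "answer";
  throw new BreakerOpenError("MEMORY_TOKEN_DRIFT");
}
```

Returning the partial text alongside the trip condition lets the caller decide whether to show what was produced or discard it and retry.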

The tool fixation check keeps tripping during legitimate paginated searches. How should I adjust it?

Raise toolFixationWindow to 6–8 and raise toolFixationJaccardThreshold (0.80 is a reasonable target if your current value is lower). A larger window allows more consecutive same-tool calls, and a higher threshold requires closer similarity before tripping; lowering the threshold would make the check more sensitive, not less. Alternatively, exclude the pagination tool by checking event.data.tool.name before pushing to toolCallHistory, and put that check in your attach() override. Fixation spiral detection is most valuable for unstructured search tools, where similarity across queries is the right signal. For paginated APIs whose input contains an explicit page offset, the changing offset lowers the similarity score and often keeps it below the threshold. When the rest of the input is long, the offset has less effect, and excluding the tool is the more reliable fix.
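To see how a page offset affects the score, here is a small sketch of Jaccard similarity over word tokens, plus a name-based exclusion filter. The tokenizer is an assumption (the breaker's actual input normalization may differ), and PAGINATED_TOOLS, countsTowardFixation, and the tool name "paginated_search" are hypothetical.

```typescript
// Assumption: inputs are compared as sets of lowercase alphanumeric tokens.
// The breaker's actual normalization may differ.
function tokenize(input: string): Set<string> {
  return new Set(input.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean));
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|, in the range 0..1.
function jaccard(a: string, b: string): number {
  const setA = tokenize(a);
  const setB = tokenize(b);
  if (setA.size === 0 && setB.size === 0) return 1; // two empty inputs match
  let shared = 0;
  for (const token of setA) {
    if (setB.has(token)) shared++;
  }
  return shared / (setA.size + setB.size - shared);
}

// Hypothetical filter: skip fixation tracking for tools known to paginate.
const PAGINATED_TOOLS = new Set(["paginated_search"]);

function countsTowardFixation(toolName: string): boolean {
  return !PAGINATED_TOOLS.has(toolName);
}

// Two calls that differ only in the page number
const page1 = '{"query":"ai safety","page":1}';
const page2 = '{"query":"ai safety","page":2}';
console.log(jaccard(page1, page2).toFixed(2)); // prints "0.67" (4 shared of 6 total tokens)
```

With this tokenizer the short query scores 4/6, under a 0.80 threshold. A longer query adds shared tokens while the page number still contributes only one differing token per side, so the score rises toward 1. That is one way paginated calls can end up tripping the check.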

Can I use BeeAgentBreaker with Bee's SlidingMemory to prevent the drift failure mode entirely?

SlidingMemory prevents unbounded growth, so the token drift failure mode in its severe form doesn't apply. However, sliding eviction creates its own failure mode: if early context (the original task description, key constraints, the user's initial instructions) is evicted to make room for recent observations, the agent may "forget" what it was supposed to do and begin generating directionless tool calls. Keep the drift check enabled even with SlidingMemory — it will catch cases where observation size is growing per-iteration even within a bounded window. Set driftGrowthThreshold to 1.50 to allow for legitimate size variation before tripping.
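As a rough illustration of what a 1.50 growth threshold tolerates, the sketch below compares the newest observation's size to the mean of the observations before it within a small window. This interpretation of "growth" is an assumption made for the example; the breaker's own definition of driftGrowthThreshold may differ. exceedsDriftThreshold and windowSize are hypothetical names.

```typescript
// Assumption: "growth" means the newest observation's size divided by the
// mean size of the earlier observations in the window.
function exceedsDriftThreshold(
  observationSizes: number[],
  threshold = 1.5,
  windowSize = 4,
): boolean {
  const recent = observationSizes.slice(-windowSize);
  if (recent.length < 2) return false; // not enough history to compare

  const latest = recent[recent.length - 1];
  const earlier = recent.slice(0, -1);
  const mean = earlier.reduce((sum, n) => sum + n, 0) / earlier.length;

  // Guard against division by zero when earlier observations were empty
  return mean > 0 && latest / mean > threshold;
}
```

Under this reading, observations of 100, 110, 105, and then 120 tokens stay within tolerance, while a jump to 300 after the same history would trip.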

How does BeeAgentBreaker handle concurrent agent runs?

Each BeeAgentBreaker instance holds its own run state. Create one instance per concurrent run — do not share a breaker across multiple agent.run() calls. The reset() method clears all state, so you can reuse an instance across sequential runs: call reset() before each attach(). In a service that handles multiple simultaneous requests, instantiate a breaker per request: const breaker = new BeeAgentBreaker(config) inside your request handler.
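The per-request pattern can be sketched like this. CountingBreaker is a stand-in with the same lifecycle as BeeAgentBreaker (per-run state plus reset()), and handleRequest simulates a run by issuing tool calls in a loop. Both are hypothetical and exist only to show the isolation between concurrent runs.

```typescript
// Stand-in with the same lifecycle as BeeAgentBreaker; the real class
// tracks far more than a call count.
class CountingBreaker {
  private toolCalls = 0;

  constructor(private readonly maxToolCalls: number) {}

  onToolCall(): void {
    this.toolCalls++;
    if (this.toolCalls > this.maxToolCalls) throw new Error("breaker open");
  }

  reset(): void {
    this.toolCalls = 0;
  }
}

// One breaker per request: concurrent runs never see each other's counts.
async function handleRequest(toolCallCount: number): Promise<"ok" | "tripped"> {
  const breaker = new CountingBreaker(3);
  try {
    for (let i = 0; i < toolCallCount; i++) {
      await Promise.resolve(); // yield so concurrent requests interleave
      breaker.onToolCall();
    }
    return "ok";
  } catch {
    return "tripped";
  }
}
```

If the three requests in Promise.all([handleRequest(2), handleRequest(3), handleRequest(5)]) shared one breaker, their ten combined calls would trip it for well-behaved runs too. With one instance each, only the five-call run trips.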

Bee's maxIterations also fires a hard stop. Should I keep it or rely on BeeAgentBreaker?

Keep both. maxIterations is the absolute backstop — it guarantees the agent cannot run forever regardless of what the breaker does. BeeAgentBreaker is the early-warning pattern detector that trips before maxIterations is reached when a specific failure mode is detected. In a healthy run, the breaker never fires and the agent completes well within maxIterations. In a looping run, the breaker fires after 3–6 iterations and preserves most of the remaining iteration budget for retry logic. Set maxIterations to your genuine worst-case legitimate run length; set breaker thresholds to catch pathological patterns early within that budget.

Stop Bee agents from billing you for loops

RunGuard wraps any Bee Agent Framework run with persistent trip state, dollar-denominated budget caps, and a dashboard showing which agents are spiraling and why — without modifying your agent or tool definitions.
