Flowise Cost Control: Loop Detection and Budget Enforcement in Production

Flowise is an open-source, low-code AI agent builder that lets you connect nodes on a drag-and-drop visual canvas to build LLM-powered flows. It runs on LangChain.js, which means every Chat Flow agent is an AgentExecutor loop under the hood, and every Agentflow v2 multi-agent graph is a LangGraph state machine traversal. Both execution models share the same fundamental problem: they keep running until the model decides it's done, and neither ships with a real-time cost guard.

The "Max Iterations" field on each Agent node is Flowise's only built-in protection. It caps the number of loop iterations — not the number of semantically identical tool calls within those iterations, not the dollar cost of the session, not the depth of a supervisor-to-worker delegation cascade. If you set Max Iterations to 30 to accommodate a legitimate research agent, a spiraling agent gets 30 shots at your LLM budget before anything stops it.

This post covers four failure modes specific to Flowise's architecture — Chat Flow tool call spirals, Agentflow v2 multi-agent supervisor loops, context history accumulation with Summary Buffer Memory, and HTTP Request node retry cascades — and shows you exactly how to implement circuit breakers using Flowise's own Custom Function node environment. If you prefer a managed solution, the final section explains how to wire RunGuard into any Flowise flow via the HTTP Request tool node. For additional context on the broader problem space, the AI agent cost engineering guide covers framework-agnostic principles that apply across Flowise, LangGraph, and similar systems.

How Flowise executes agent loops

Understanding the failure modes requires a clear picture of how Flowise actually runs agents. There are two distinct execution models:

Chat Flows (classic mode)

You drag an Agent node onto the canvas — OpenAI Functions Agent, ReAct Agent, or Conversational Agent — connect it to a chat model, attach tool nodes, and wire up a memory node. At runtime, Flowise constructs a LangChain.js AgentExecutor and calls executor.invoke(). The executor loop:

  1. Passes the system prompt, conversation history from the memory node, and tool schema definitions to the chat model.
  2. Receives either a final answer or one or more tool calls from the model.
  3. Executes the referenced tool node(s) — HTTP Request, Custom Function, Zapier NLA, OpenAPI toolkit, retrieval tools, etc.
  4. Appends tool results to the conversation history and loops back to step 1.
  5. Stops when the model outputs a Final Answer action or the Max Iterations counter is exhausted.
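The five steps above can be sketched in plain JavaScript. This is a simplified illustration, not Flowise's or LangChain.js's actual implementation: `runAgentLoop`, the `model` callback, and the `tools` map are hypothetical names, and the calls are synchronous only to keep the sketch short.

```javascript
// Simplified sketch of an agent executor loop (all names are hypothetical).
// model(history) returns either { finalAnswer } or { toolCalls: [{ name, args }] }.
function runAgentLoop(model, tools, userInput, maxIterations = 15) {
  const history = [{ role: 'user', content: userInput }];

  for (let iteration = 1; iteration <= maxIterations; iteration++) {
    const step = model(history); // steps 1-2: send history, receive answer or tool calls

    if (step.finalAnswer !== undefined) {
      return { output: step.finalAnswer, iterations: iteration, stoppedBy: 'final_answer' };
    }

    for (const call of step.toolCalls || []) {
      const result = tools[call.name](call.args); // step 3: run the tool
      history.push({ role: 'tool', name: call.name, content: String(result) }); // step 4
    }
  }

  // Step 5: the counter is the only thing that stops a model that never finishes.
  return { output: null, iterations: maxIterations, stoppedBy: 'max_iterations' };
}

// A model that always asks for another search never reaches a final answer.
const alwaysSearching = () => ({ toolCalls: [{ name: 'search', args: { q: 'flowise' } }] });
const run = runAgentLoop(alwaysSearching, { search: () => 'partial result' }, 'question', 5);
console.log(run.stoppedBy, run.iterations);
```

Note that nothing in the loop inspects what the tool calls contain. That is the property the guards later in this post add.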

Memory nodes Flowise supports: Buffer Memory (full conversation history), Buffer Window Memory (last K message pairs), Summary Buffer Memory (LLM-summarizes older messages when history exceeds a token threshold), Zep Memory (vector-based persistent memory), and Upstash Redis Memory (Redis-backed session storage for multi-instance deployments). Each memory type creates a distinct cost profile — Buffer Memory accumulates without limit, Summary Buffer Memory adds LLM calls for every summarization cycle.
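To make the difference in cost profile concrete, here is a rough model of how many history tokens are re-sent to the model over a run. It assumes every step appends the same number of tokens and ignores the system prompt and tool schemas; `cumulativePromptTokens` and all of the numbers are illustrative assumptions, not Flowise measurements.

```javascript
// Total history tokens re-sent across a run.
// windowSize = Infinity models Buffer Memory; a finite value models Buffer Window Memory
// (simplified here to "the last K steps are kept").
function cumulativePromptTokens(steps, tokensPerStep, windowSize = Infinity) {
  let total = 0;
  for (let step = 1; step <= steps; step++) {
    const priorStepsKept = Math.min(step - 1, windowSize);
    total += priorStepsKept * tokensPerStep;
  }
  return total;
}

const fullBuffer = cumulativePromptTokens(30, 1000);  // grows quadratically with step count
const windowed = cumulativePromptTokens(30, 1000, 5); // grows linearly once the window fills
console.log({ fullBuffer, windowed }); // { fullBuffer: 435000, windowed: 135000 }
```

Under these assumptions the unbounded buffer re-sends about three times as many tokens over 30 steps, and the gap widens as the run gets longer.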

Agentflows v2

Flowise v2 introduced a visual sequential-agent and multi-agent builder backed by LangGraph. You connect Sequential Agent nodes (each with its own system prompt, model, and optional tools), Condition nodes (JavaScript-evaluated routing logic), and Start/End nodes in a directed graph. The execution engine is a LangGraph state machine: each agent node receives the current shared state object and updates it, and the graph then transitions along its edges.

The Multi-Agent pattern within Agentflows v2 adds a Supervisor node and one or more Worker nodes. The Supervisor uses function calling to delegate tasks to Workers. Workers execute their assigned subtask and return results to the Supervisor. The Supervisor evaluates results and either delegates more work or outputs a final answer. Each delegation is a graph edge traversal; the shared state accumulates messages from every step.

In Agentflows v2, LangGraph's recursion limit (recursionLimit in LangGraph.js, configurable in Flowise's flow settings) acts as a hard backstop on graph traversal count, analogous to Max Iterations in Chat Flows. It counts state machine transitions, not semantic content. A Supervisor that keeps delegating to the same Worker with the same instructions can cycle through the full recursion limit without any semantic repetition detector triggering.

The gap: Max Iterations and LangGraph's recursion limit count steps. They cannot detect that consecutive steps are calling the same tool with semantically identical arguments, that a Supervisor is delegating to the same Worker in an unproductive loop, that the conversation history has grown to 80,000 tokens and the model is re-researching already-answered questions, or that an HTTP Request tool is triggering a multiplicative retry storm at the transport layer and the agent layer simultaneously. None of these failure modes produces an iteration count high enough to trip the built-in limits before significant cost has accumulated.

Failure mode 1: Tool call spiral in a Chat Flow agent

The most common Flowise production failure. A ReAct Agent or OpenAI Functions Agent calls a search or retrieval tool, receives a result that partially satisfies its objective, and calls the same tool again with a slightly rephrased query. Each iteration looks like progress to the agent — a marginally different result arrives — but the model never converges because the underlying data doesn't contain a satisfying answer, or the prompt's objective is ambiguous enough that "sufficiency" is never reached.

In a Flowise Chat Flow connected to a web search tool, this looks like:

  • Iteration 1: search("flowise production deployment best practices 2026")
  • Iteration 2: search("flowise production configuration guide 2026")
  • Iteration 3: search("best practices deploying flowise in production")
  • Iteration 4: search("flowise deployment configuration production environment")
  • ... (continues until Max Iterations)

Each query is syntactically different, and Max Iterations simply counts four steps. Semantically, though, all four queries are near-identical: the model is searching the same topic with cosmetically different wording. With GPT-4o billed per input and output token, and a search API charging per call, 30 iterations of this spiral cost real money before the counter trips.

The detection strategy is Jaccard similarity on normalized, sorted word bags of tool argument strings, applied across a 4-call sliding window. If 3 or more pairs in the window exceed a similarity threshold of 0.72, the agent has entered a spiral and should be stopped.
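Before settling on a threshold, it helps to measure similarity on real queries. The sketch below applies the same word-bag normalization and Jaccard measure to the four example searches listed earlier. On those paraphrases the pairwise scores range from about 0.22 to 0.50, so a 0.72 threshold mainly catches near-verbatim repeats. Catching looser paraphrases would likely need a lower threshold or a different similarity measure, and either choice is an assumption to validate against your own traffic.

```javascript
// Same normalization idea as the guard: lowercase, strip punctuation, unique words.
function wordSet(text) {
  return new Set(
    text.toLowerCase().replace(/[^a-z0-9\s]/g, ' ').split(/\s+/).filter(Boolean)
  );
}

function jaccard(a, b) {
  const intersection = [...a].filter(word => b.has(word)).length;
  const union = new Set([...a, ...b]).size;
  return union === 0 ? 0 : intersection / union;
}

const queries = [
  'flowise production deployment best practices 2026',
  'flowise production configuration guide 2026',
  'best practices deploying flowise in production',
  'flowise deployment configuration production environment',
];

// Score every pair in the 4-call window (6 pairs in total).
const sets = queries.map(wordSet);
const pairScores = [];
for (let i = 0; i < sets.length - 1; i++) {
  for (let j = i + 1; j < sets.length; j++) {
    pairScores.push({ pair: `${i + 1}-${j + 1}`, score: jaccard(sets[i], sets[j]) });
  }
}

console.table(pairScores.map(p => ({ pair: p.pair, score: p.score.toFixed(3) })));
```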

In Flowise, this guard lives in a Custom Function tool node that wraps execution before the real tool fires. The Custom Function node receives the tool name and arguments as inputs and has access to a module-level global map for per-session state (Flowise runs in Node.js, and in-memory state is valid for the duration of a session if each chat session is scoped to a single process instance). For production multi-instance deployments, the Redis alternative shown below is the correct approach.

// Flowise Custom Function node: Tool Call Spiral Guard
// Wire this node BEFORE each tool node in your Chat Flow.
// Inputs: $toolName (string), $toolArgs (object or string)
// Output: passes through if clean; throws if spiral detected.

// In-memory store — valid per Node.js process instance (single-server deployments)
// For multi-instance production, replace with the Redis alternative below.
const SPIRAL_STORE = globalThis.__rg_spiralStore || (globalThis.__rg_spiralStore = new Map());

const JACCARD_THRESHOLD = 0.72;
const WINDOW_SIZE = 4;
const MIN_HIGH_SIM_PAIRS = 3; // 3+ similar pairs in last 4 calls = spiral

// Normalize tool arguments: lowercase, strip punctuation, sort words
function normalizeArgs(args) {
  const raw = typeof args === 'string' ? args : JSON.stringify(args);
  return raw
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, ' ')
    .split(/\s+/)
    .filter(Boolean)
    .sort()
    .join(' ');
}

function jaccardSimilarity(a, b) {
  const setA = new Set(a.split(' '));
  const setB = new Set(b.split(' '));
  const intersection = new Set([...setA].filter(x => setB.has(x)));
  const union = new Set([...setA, ...setB]);
  return union.size === 0 ? 0 : intersection.size / union.size;
}

// Build a session key from $flow.sessionId if available, else use a fallback
const sessionId = $flow.sessionId || 'default';
const storeKey = `${sessionId}::${$toolName}`;

if (!SPIRAL_STORE.has(storeKey)) {
  SPIRAL_STORE.set(storeKey, []);
}

const history = SPIRAL_STORE.get(storeKey);
const fingerprint = normalizeArgs($toolArgs);

// Append to sliding window
history.push({ fingerprint, ts: Date.now() });
if (history.length > WINDOW_SIZE) {
  history.shift();
}

// Evict sessions older than 2 hours to prevent memory growth
const now = Date.now();
for (const [key, entries] of SPIRAL_STORE.entries()) {
  if (entries.length > 0 && now - entries[0].ts > 7200000) {
    SPIRAL_STORE.delete(key);
  }
}

// Compute pairwise Jaccard similarities across the window
if (history.length >= 3) {
  const similarities = [];
  for (let i = 0; i < history.length - 1; i++) {
    for (let j = i + 1; j < history.length; j++) {
      similarities.push(jaccardSimilarity(history[i].fingerprint, history[j].fingerprint));
    }
  }

  const highSimilarityPairs = similarities.filter(s => s >= JACCARD_THRESHOLD).length;
  const maxSim = Math.max(...similarities);

  if (highSimilarityPairs >= MIN_HIGH_SIM_PAIRS) {
    throw new Error(
      `[RunGuard] Tool call spiral detected on "${$toolName}": ` +
      `${highSimilarityPairs} near-identical call pairs in last ${history.length} invocations ` +
      `(max Jaccard similarity: ${maxSim.toFixed(3)}, threshold: ${JACCARD_THRESHOLD}). ` +
      `Session: ${sessionId}. Stopping agent to prevent runaway cost.`
    );
  }
}

// Guard passed — return tool args unchanged for downstream execution
return { toolName: $toolName, toolArgs: $toolArgs, spiralCheck: 'passed' };

Redis alternative for multi-instance production

If you run Flowise across multiple Node.js instances (Docker Swarm, Kubernetes, PM2 cluster mode), the in-memory global map is not shared across instances. Each instance has its own globalThis.__rg_spiralStore, so a session that switches instances mid-conversation resets its spiral history. Use Flowise's built-in Upstash Redis Memory integration — or a direct Redis client in the Custom Function node — to share state:

// Flowise Custom Function node: Redis-backed spiral guard
// Requires: ioredis installed in Flowise's node_modules, or use the Upstash REST API

// Using Upstash Redis REST API (no native dependency needed)
const UPSTASH_URL = process.env.UPSTASH_REDIS_REST_URL;
const UPSTASH_TOKEN = process.env.UPSTASH_REDIS_REST_TOKEN;
const JACCARD_THRESHOLD = 0.72;
const WINDOW_SIZE = 4;
const MIN_HIGH_SIM_PAIRS = 3;
const TTL_SECONDS = 7200; // 2-hour session expiry

async function redisGet(key) {
  const res = await fetch(`${UPSTASH_URL}/get/${encodeURIComponent(key)}`, {
    headers: { Authorization: `Bearer ${UPSTASH_TOKEN}` },
  });
  const data = await res.json();
  return data.result ? JSON.parse(data.result) : [];
}

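async function redisSet(key, value) {
  // Upstash REST accepts a full Redis command as a JSON array POSTed to the base URL.
  // POSTing the array to /set/<key> would store the array itself as the value
  // and never apply the TTL.
  await fetch(UPSTASH_URL, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${UPSTASH_TOKEN}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(['SET', key, JSON.stringify(value), 'EX', TTL_SECONDS]),
  });
}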

function normalizeArgs(args) {
  return (typeof args === 'string' ? args : JSON.stringify(args))
    .toLowerCase().replace(/[^a-z0-9\s]/g, ' ')
    .split(/\s+/).filter(Boolean).sort().join(' ');
}

function jaccard(a, b) {
  const sA = new Set(a.split(' ')), sB = new Set(b.split(' '));
  const inter = [...sA].filter(x => sB.has(x)).length;
  const union = new Set([...sA, ...sB]).size;
  return union === 0 ? 0 : inter / union;
}

const sessionId = $flow.sessionId || 'default';
const storeKey = `rg:spiral:${sessionId}:${$toolName}`;

const history = await redisGet(storeKey);
history.push(normalizeArgs($toolArgs));
if (history.length > WINDOW_SIZE) history.shift();
await redisSet(storeKey, history);

if (history.length >= 3) {
  const sims = [];
  for (let i = 0; i < history.length - 1; i++)
    for (let j = i + 1; j < history.length; j++)
      sims.push(jaccard(history[i], history[j]));

  const highPairs = sims.filter(s => s >= JACCARD_THRESHOLD).length;
  if (highPairs >= MIN_HIGH_SIM_PAIRS) {
    throw new Error(
      `[RunGuard] Tool spiral on "${$toolName}": ${highPairs} near-identical ` +
      `call pairs in window of ${history.length}. Max similarity: ${Math.max(...sims).toFixed(3)}.`
    );
  }
}

return { toolName: $toolName, toolArgs: $toolArgs, spiralCheck: 'passed' };

This approach is also documented in our guide to stopping AI agent infinite loops in TypeScript, which covers Jaccard fingerprinting in depth including edge cases around paginated tool calls and cursor-based pagination where consecutive calls legitimately share most argument content.
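One possible way to handle the pagination case, sketched here as an assumption rather than as the method from that guide, is to recognize calls that differ only in a pagination field and exempt them from the similarity check. `PAGINATION_KEYS` and both helpers are hypothetical; adjust the field names to whatever your tools actually use.

```javascript
// Field names assumed to carry pagination state (hypothetical; tune per tool).
const PAGINATION_KEYS = ['cursor', 'page', 'offset', 'next_token'];

// Serialize the arguments with pagination fields removed and keys sorted,
// so two calls for the same query compare equal regardless of page.
function stripPagination(args) {
  const rest = { ...args };
  for (const key of PAGINATION_KEYS) delete rest[key];
  return JSON.stringify(Object.keys(rest).sort().map(key => [key, rest[key]]));
}

// True when the query is unchanged and at least one pagination field moved.
function isPaginationStep(prevArgs, nextArgs) {
  const sameQuery = stripPagination(prevArgs) === stripPagination(nextArgs);
  const cursorMoved = PAGINATION_KEYS.some(key => prevArgs[key] !== nextArgs[key]);
  return sameQuery && cursorMoved;
}

console.log(isPaginationStep({ q: 'pricing', page: 1 }, { q: 'pricing', page: 2 })); // true
console.log(isPaginationStep({ q: 'pricing', page: 1 }, { q: 'pricing', page: 1 })); // false
```

A guard could skip the Jaccard comparison when `isPaginationStep` returns true. An agent can still page forever, so a cap on pages per query is worth keeping alongside it.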

Failure mode 2: Multi-agent supervisor loop in Agentflow v2

Flowise Agentflows v2's Multi-Agent pattern places a Supervisor node at the center of a delegation graph. The Supervisor receives a task, uses function calling to select which Worker node to invoke, the Worker executes its specialized subtask and returns results, and the Supervisor evaluates those results before deciding what to do next. The design is powerful — and it contains a failure mode that Max Iterations and LangGraph's recursion limit both miss.

The delegation cascade failure looks like this:

  • Step 1: Supervisor delegates "research competitor pricing" to Worker A (Research Agent)
  • Step 2: Worker A returns partial data — pricing found for 3 of 5 competitors
  • Step 3: Supervisor, unsatisfied with partial data, re-delegates "complete competitor pricing research" to Worker A
  • Step 4: Worker A returns similar partial data — the missing 2 competitors simply don't have public pricing
  • Step 5: Supervisor delegates "find alternative pricing signals" to Worker B (Web Search Agent)
  • Step 6: Worker B returns some indirect indicators
  • Step 7: Supervisor re-delegates "verify pricing from Worker B results" to Worker A again
  • Step 8: ... cycle continues

In LangGraph terms, each delegation is an edge traversal with the shared state object accumulating new messages. The state grows at every step. The Supervisor's context window fills with the history of all prior delegations, making it increasingly likely to repeat earlier decisions rather than converge. The recursion limit stops the cascade eventually — but at the cost of every state transition along the way, each involving one or more LLM calls.

The correct guard is a Condition node placed before the Supervisor's output routing. In Flowise Agentflows v2, Condition nodes evaluate a JavaScript expression against the current flow state and route to different downstream nodes. You keep a _rg_delegations tracking object in the shared state (a total count plus a per-Worker count) and route to an "End" node (with an error message) when any Worker has been called more than 3 times or total delegations exceed a configurable limit.

Here is the JavaScript code for the Condition node that reads delegation counts from the flow state:

// Flowise Agentflow v2: Condition node — Delegation Guard
// Placed immediately AFTER the Supervisor node, BEFORE routing to Worker nodes.
// This node reads from the shared state object and routes to "End" if thresholds exceeded.
//
// In Flowise Agentflows v2, Condition node code receives `$flow.state` as the shared
// LangGraph state object. The code must return true (proceed) or false (route to fallback).

const state = $flow.state || {};

// Initialize delegation tracking in state if not present
if (!state._rg_delegations) {
  state._rg_delegations = {
    total: 0,
    perWorker: {},
    startedAt: Date.now(),
  };
}

const dg = state._rg_delegations;
const MAX_TOTAL_DELEGATIONS = 12;
const MAX_PER_WORKER = 3;
const MAX_SESSION_MINUTES = 10;

// Extract which worker the Supervisor just selected (from the Supervisor's last tool call).
// This assumes the Supervisor's output is the last entry of state.messages, with tool_calls attached.
const messages = state.messages || [];
const lastMsg = messages[messages.length - 1];
let selectedWorker = null;

if (lastMsg && lastMsg.tool_calls && lastMsg.tool_calls.length > 0) {
  // Flowise multi-agent: the tool call name is the worker node's name
  selectedWorker = lastMsg.tool_calls[0].name || null;
}

if (selectedWorker) {
  dg.total += 1;
  dg.perWorker[selectedWorker] = (dg.perWorker[selectedWorker] || 0) + 1;
}

// Check total delegation limit
if (dg.total > MAX_TOTAL_DELEGATIONS) {
  // Inject a STOP signal into state so the next Supervisor context sees it
  state._rg_stop = {
    reason: 'max_delegations',
    message: `[RunGuard] Delegation limit reached: ${dg.total} total delegations ` +
             `(max: ${MAX_TOTAL_DELEGATIONS}). Forcing task completion with available results.`,
  };
  // Return false to route to the "End / Force Complete" branch
  return false;
}

// Check per-worker delegation limit
for (const [worker, count] of Object.entries(dg.perWorker)) {
  if (count > MAX_PER_WORKER) {
    state._rg_stop = {
      reason: 'worker_loop',
      worker,
      message: `[RunGuard] Worker "${worker}" has been delegated to ${count} times ` +
               `(max: ${MAX_PER_WORKER}). The task may be unanswerable — forcing completion.`,
    };
    return false;
  }
}

// Check session time limit
const elapsedMinutes = (Date.now() - dg.startedAt) / 60000;
if (elapsedMinutes > MAX_SESSION_MINUTES) {
  state._rg_stop = {
    reason: 'timeout',
    message: `[RunGuard] Session exceeded ${MAX_SESSION_MINUTES} minutes ` +
             `(${elapsedMinutes.toFixed(1)} min elapsed). Forcing completion.`,
  };
  return false;
}

// All checks passed — proceed to Worker routing
return true;

Wire the Condition node so that true routes to your normal Worker routing logic and false routes to a final Agent node (or a simple End node) that reads $flow.state._rg_stop.message and returns it as the flow's final answer. This ensures the user receives a meaningful response — "research was stopped after 12 delegations; here are the partial results" — rather than a silent truncation or an error page.
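The final node on the false branch needs to turn the stop signal into something readable. A minimal sketch of that formatting step follows. `buildForcedCompletion` is a hypothetical helper, and passing partial results as a plain array of strings is an assumption about how your flow stores them.

```javascript
// Hypothetical formatter for the "false" branch of the Condition node.
// stop: the object written to state._rg_stop; partialResults: strings collected so far.
function buildForcedCompletion(stop, partialResults = []) {
  const header = (stop && stop.message) || '[RunGuard] Run stopped by guard.';
  if (partialResults.length === 0) {
    return `${header}\nNo partial results were collected.`;
  }
  const lines = partialResults.map(result => `- ${result}`);
  return [header, 'Partial results so far:', ...lines].join('\n');
}

console.log(
  buildForcedCompletion(
    { reason: 'max_delegations', message: '[RunGuard] Delegation limit reached.' },
    ['Pricing found for 3 of 5 competitors']
  )
);
```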

For comparison, LangGraph-native applications have access to the same pattern — the LangGraph circuit breaker patterns post covers how to inject guards into graph edges directly, which is the mechanism Flowise Agentflows v2 uses under the hood.

Failure mode 3: Context history accumulation and reprocessing spiral

Flowise Chat Flow agents with Buffer Memory accumulate every message in the conversation history indefinitely. For a long-running research or multi-step workflow session, this is the slow-building failure: the agent works efficiently for the first dozen iterations, then costs per iteration start climbing because the context passed to the model at each step includes the full prior history. At some point, the context exceeds the model's effective context window and one of two things happens.

The first outcome: the provider silently truncates the context window, dropping the oldest messages. The agent loses its earlier conclusions, re-researches already-answered subtasks, and the total iteration count climbs further. The second outcome: the provider returns an error or the model's reasoning quality degrades sharply because it can no longer hold all the relevant state simultaneously.

With Summary Buffer Memory, Flowise attempts to mitigate context growth by running an LLM summarization call when the conversation history exceeds a token threshold. This helps — but it introduces a second cost loop. Each summarization is a synchronous LLM call that adds latency and cost to every agent step once the threshold is crossed. If the agent is in a spiral and the history is growing faster than summaries can compact it, you get a summarization loop on top of the research loop: every step summarizes the history and then adds 3 new messages that push the history past the threshold again, triggering another summarization on the next step.
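A small simulation makes the summarization loop easier to see. It models the history as a single token counter that collapses to a fixed summary size whenever it crosses the threshold. That is a deliberate simplification of how Summary Buffer Memory works, and `countSummarizations` and every number in it are illustrative assumptions rather than Flowise defaults.

```javascript
// Count how many steps trigger a summarization call under a simple token model.
function countSummarizations({ steps, threshold, summarySize, growthPerStep }) {
  let historyTokens = 0;
  let summarizations = 0;
  for (let step = 0; step < steps; step++) {
    historyTokens += growthPerStep;   // new messages added by this step
    if (historyTokens > threshold) {
      historyTokens = summarySize;    // history collapses to the summary
      summarizations += 1;            // one extra LLM call
    }
  }
  return summarizations;
}

// Summary leaves plenty of headroom: summarization fires only occasionally.
const healthy = countSummarizations({ steps: 20, threshold: 2000, summarySize: 500, growthPerStep: 300 });

// Summary plus one step of growth already exceeds the threshold: it fires on every step.
const looping = countSummarizations({ steps: 20, threshold: 2000, summarySize: 1800, growthPerStep: 300 });

console.log({ healthy, looping }); // { healthy: 3, looping: 14 }
```

The loop condition is simply summarySize + growthPerStep > threshold. Once that holds, every agent step pays for an extra summarization call.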

The guard here is a Custom Function node that estimates token usage from the conversation history before each agent step, injects a context-compaction signal when the history approaches the model's context limit, and throws a circuit-breaker error if the history has been summarized more than twice in the same session without the overall token count decreasing:

// Flowise Custom Function node: Context Accumulation Guard
// Place this node BEFORE the Agent node in your Chat Flow (in the pre-execution path).
// Inputs: $flow.chatHistory (array of message objects), $flow.sessionId
// The 1.3 words-to-tokens heuristic is a rough estimate for English prose. JSON-heavy
// tool results usually tokenize more densely, so treat the estimate as a lower bound.

const MAX_HISTORY_TOKENS = 80000;        // ~80% of a 100K context window
const WARN_THRESHOLD = 0.65;             // Warn at 65% of limit
const MAX_SUMMARY_CYCLES = 2;            // Hard stop if summarized 3+ times without relief
const WORDS_TO_TOKENS_RATIO = 1.3;       // Conservative estimate: 1 word ≈ 1.3 tokens

// Per-session state — use globalThis for single-instance, Redis for multi-instance
const CONTEXT_STORE = globalThis.__rg_contextStore ||
  (globalThis.__rg_contextStore = new Map());

const sessionId = $flow.sessionId || 'default';

if (!CONTEXT_STORE.has(sessionId)) {
  CONTEXT_STORE.set(sessionId, {
    summaryCycles: 0,
    peakTokens: 0,
    lastChecked: Date.now(),
    compactionMarkers: [],
  });
}

const cs = CONTEXT_STORE.get(sessionId);
cs.lastChecked = Date.now();

// Estimate tokens from conversation history
const chatHistory = $flow.chatHistory || [];
let totalWords = 0;

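for (const msg of chatHistory) {
  const raw = typeof msg === 'string' ? msg : (msg.content || msg.text || msg);
  // Content can be a non-string (for example an array of content parts), so serialize it
  const content = typeof raw === 'string' ? raw : JSON.stringify(raw);
  totalWords += content.split(/\s+/).filter(Boolean).length;
}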

const estimatedTokens = Math.ceil(totalWords * WORDS_TO_TOKENS_RATIO);

// Detect a Summary Buffer Memory cycle by comparing against the PREVIOUS check,
// not the all-time peak: the peak never drops, so comparing against it would
// count a new "cycle" on every step after the first compaction.
const prevTokens = cs.lastTokens || 0;
cs.lastTokens = estimatedTokens;
cs.peakTokens = Math.max(cs.peakTokens, estimatedTokens);

if (estimatedTokens < prevTokens * 0.6 && prevTokens > 10000) {
  // History shrank by 40%+ since the last check: a summarization cycle completed
  cs.summaryCycles += 1;
  cs.compactionMarkers.push({
    cycle: cs.summaryCycles,
    at: Date.now(),
    tokensBeforeCompaction: prevTokens,
    tokensAfterCompaction: estimatedTokens,
  });
}

// Hard stop: too many summarization cycles without meaningful progress
// This indicates a summarization loop: compaction fires, history grows immediately,
// compaction fires again, repeat.
if (cs.summaryCycles > MAX_SUMMARY_CYCLES) { // trips on the 3rd cycle, i.e. "more than twice"
  const markers = cs.compactionMarkers
    .map(m => `cycle ${m.cycle}: ${m.tokensBeforeCompaction} -> ${m.tokensAfterCompaction} tokens`)
    .join('; ');
  throw new Error(
    `[RunGuard] Context summarization loop detected: ${cs.summaryCycles} compaction ` +
    `cycles without history stabilizing. Compaction history: ${markers}. ` +
    `Current estimated tokens: ${estimatedTokens}. Stopping agent.`
  );
}

// Hard stop: approaching context limit
if (estimatedTokens >= MAX_HISTORY_TOKENS) {
  throw new Error(
    `[RunGuard] Context limit approaching: ~${estimatedTokens} estimated tokens in ` +
    `conversation history (limit: ${MAX_HISTORY_TOKENS}). ` +
    `${cs.summaryCycles} summarization cycle(s) completed. ` +
    `Agent cannot make progress without exceeding model context window. Stopping.`
  );
}

// Soft warning: inject compaction signal into system prompt context
// Return a modified context hint that the agent's system prompt can read
let contextSignal = null;
if (estimatedTokens >= MAX_HISTORY_TOKENS * WARN_THRESHOLD) {
  contextSignal = {
    type: 'context_compaction_warning',
    message: `[Context note: ~${estimatedTokens} tokens of history accumulated at step ` +
             `${chatHistory.length}. Prioritize synthesizing existing results over ` +
             `additional tool calls unless strictly necessary.]`,
    estimatedTokens,
    summaryCycles: cs.summaryCycles,
  };
}

return {
  sessionId,
  estimatedTokens,
  summaryCycles: cs.summaryCycles,
  contextSignal,
  guard: 'passed',
};

The contextSignal in the return value can be injected into the agent's system prompt by a subsequent Custom Function node or by parameterizing the Agent node's system prompt field with a Flowise variable expression. When the agent sees the context note, it becomes more conservative about additional tool calls — reducing iteration count and cost organically before the hard stop triggers.
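A minimal sketch of that injection step, assuming the guard's return value is available to a later Custom Function node. `applyContextSignal` is a hypothetical helper name, and how the resulting string reaches the Agent node's system prompt depends on how your flow is wired.

```javascript
// Append the guard's context note to the base system prompt when one is present.
function applyContextSignal(basePrompt, guardResult) {
  const signal = guardResult && guardResult.contextSignal;
  if (!signal || !signal.message) {
    return basePrompt; // below the warning threshold: prompt is unchanged
  }
  return `${basePrompt}\n\n${signal.message}`;
}

const prompt = applyContextSignal('You are a research assistant.', {
  guard: 'passed',
  contextSignal: {
    type: 'context_compaction_warning',
    message: '[Context note: prioritize synthesizing existing results.]',
  },
});
console.log(prompt);
```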

This same category of runaway context cost affects every framework that uses LangChain memory abstractions. The n8n AI Agent cost control post covers a similar guard for n8n's Window Buffer Memory and Summary Memory nodes, with additional detail on how to instrument token usage against actual billing data from your LLM provider.

Failure mode 4: HTTP Request tool retry cascade

Flowise's HTTP Request node is built on Axios and includes configurable retry logic in the node's settings panel. You can set the retry count (how many times to retry a failed request) and the retry delay. These retries are transparent to the agent — the agent fires the tool, Axios handles retries internally, and the agent sees only the final result (success or final failure) after all retry attempts complete.

When the HTTP Request node is attached to an AI Agent as a tool, this creates a two-level retry structure. Level 1: Axios retries the failing request N times at the transport layer. Level 2: the agent — receiving a final tool failure — decides that the API "might work if called with different parameters" and calls the tool again. This triggers another Axios retry cycle. The math is multiplicative:

  • Transport-level attempts: 3 per tool invocation (the initial request plus 2 Axios retries)
  • Agent-level retries: 5 invocations before Max Iterations stops the loop
  • Total HTTP calls: 5 × 3 = 15 actual HTTP requests to an endpoint that was unavailable from the first call

If those requests hit a paid API — SerpAPI, a geocoding service, a data enrichment endpoint — the 15 calls accumulate cost independently of your LLM spend. If the failing endpoint is your own infrastructure, 15 requests per agent session multiplied by concurrent sessions creates an unintended load test on a system that's already experiencing issues.
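The arithmetic is simple enough to put in a helper when you are sizing the blast radius for your own settings. `totalHttpRequests` is a hypothetical name, and the concurrent-session figure below is an illustrative assumption.

```javascript
// Requests that actually reach the endpoint when both retry layers are active.
function totalHttpRequests({ agentInvocations, attemptsPerInvocation, concurrentSessions = 1 }) {
  return agentInvocations * attemptsPerInvocation * concurrentSessions;
}

// One session: 5 agent-level invocations, 3 transport-level attempts each.
console.log(totalHttpRequests({ agentInvocations: 5, attemptsPerInvocation: 3 })); // 15

// The same failure across 40 concurrent sessions.
console.log(totalHttpRequests({ agentInvocations: 5, attemptsPerInvocation: 3, concurrentSessions: 40 })); // 600
```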

The two-part fix: disable Axios-level retries on HTTP Request nodes used as agent tools (set retry count to 0 in the node settings — the agent's own retry logic is sufficient), and add a Custom Function node that tracks per-tool failure counts across agent iterations and opens a circuit-breaker state after 2 consecutive failures on the same tool:

// Flowise Custom Function node: HTTP Tool Failure Circuit Breaker
// Place AFTER your HTTP Request tool node, BEFORE the result returns to the Agent node.
// Inputs: $toolName (string), $httpStatusCode (number), $httpError (string or null)
// The node reads from in-memory state; swap to Redis for multi-instance deployments.

const FAILURE_STORE = globalThis.__rg_httpFailures ||
  (globalThis.__rg_httpFailures = new Map());

const sessionId = $flow.sessionId || 'default';
const toolName = $toolName || 'http_request';
const storeKey = `${sessionId}::${toolName}`;

const MAX_CONSECUTIVE_FAILURES = 2;
const MAX_FAILURE_RATE = 0.6;          // 60% failure rate triggers open circuit
const MIN_CALLS_FOR_RATE_CHECK = 5;    // Don't trip on rate until 5+ calls observed

// Initialize per-tool-per-session failure tracker
if (!FAILURE_STORE.has(storeKey)) {
  FAILURE_STORE.set(storeKey, {
    successes: 0,
    failures: 0,
    consecutiveFails: 0,
    firstCallAt: Date.now(),
    lastFailureCode: null,
    circuitOpen: false,
  });
}

const tf = FAILURE_STORE.get(storeKey);

// If circuit is already open, refuse immediately (agent should not have called this)
if (tf.circuitOpen) {
  throw new Error(
    `[RunGuard] Circuit is OPEN for tool "${toolName}" — do not retry this session. ` +
    `The tool has been unavailable for ${tf.consecutiveFails} consecutive calls. ` +
    `Last error: HTTP ${tf.lastFailureCode}. Session: ${sessionId}.`
  );
}

// Determine if this invocation was a failure
const isFailure = $httpError != null || ($httpStatusCode && $httpStatusCode >= 400);

if (isFailure) {
  tf.failures += 1;
  tf.consecutiveFails += 1;
  tf.lastFailureCode = $httpStatusCode || 'error';
} else {
  tf.successes += 1;
  tf.consecutiveFails = 0;
}

const totalCalls = tf.successes + tf.failures;

// Open circuit on consecutive failures
if (tf.consecutiveFails >= MAX_CONSECUTIVE_FAILURES) {
  tf.circuitOpen = true;
  throw new Error(
    `[RunGuard] HTTP tool circuit opened for "${toolName}": ` +
    `${tf.consecutiveFails} consecutive failures (threshold: ${MAX_CONSECUTIVE_FAILURES}). ` +
    `Last HTTP status: ${tf.lastFailureCode}. ` +
    `Returning 409-equivalent to agent: tool unavailable after ${tf.consecutiveFails} ` +
    `attempts — do not retry this session. External service may be down.`
  );
}

// Open circuit on persistent high failure rate
if (totalCalls >= MIN_CALLS_FOR_RATE_CHECK) {
  const failureRate = tf.failures / totalCalls;
  if (failureRate > MAX_FAILURE_RATE) {
    tf.circuitOpen = true;
    throw new Error(
      `[RunGuard] HTTP tool circuit opened for "${toolName}": ` +
      `${Math.round(failureRate * 100)}% failure rate across ${totalCalls} calls ` +
      `(threshold: ${Math.round(MAX_FAILURE_RATE * 100)}%). ` +
      `Tool unavailable — do not retry this session.`
    );
  }
}

// Attach exponential backoff metadata so the agent knows to wait before any retry.
// No delay is recommended when the last call succeeded.
const backoffMs = tf.consecutiveFails === 0
  ? 0
  : Math.min(1000 * Math.pow(2, tf.consecutiveFails), 30000);

return {
  toolName,
  httpStatusCode: $httpStatusCode,
  circuitState: 'closed',
  consecutiveFails: tf.consecutiveFails,
  backoffRecommendationMs: backoffMs,
  guard: 'passed',
};

When the circuit opens, the error message includes the explicit phrase "do not retry this session." Well-prompted LLMs (GPT-4, Claude 3.x series, Gemini 1.5+) generally respect this instruction and stop attempting to call the tool, routing instead to a graceful failure path. You can reinforce this behavior by adding a similar instruction to your Agent node's system prompt: "If a tool returns a message containing 'do not retry this session,' stop calling that tool immediately and synthesize a response from available results."

The same multiplicative retry dynamic appears in other frameworks. The CrewAI loop detection post covers how the same failure mode manifests in CrewAI's tool execution layer and how to implement a tool-failure circuit breaker there.

Adding RunGuard to Flowise

The four guards above (three Custom Function nodes and one Condition node) work, but they require maintenance: you need to wire them into every flow, keep threshold constants consistent across flows, manage the in-memory or Redis state, and evict stale entries from the growing globalThis stores in long-running processes. RunGuard provides all four checks as a managed HTTP endpoint callable from any Flowise flow via the standard HTTP Request node.

Option 1: HTTP Request node integration

Add an HTTP Request node at the start of each tool's execution path in your flow. Configure the node as a POST request to the RunGuard API:

// Flowise HTTP Request node configuration for RunGuard
// Add this node BEFORE each tool execution node in your Chat Flow or Agentflow
//
// Method: POST
// URL: https://runguard.dev/api/guard
// Headers:
//   Content-Type: application/json
//   X-RunGuard-Key: your-api-key-here
//
// Body (JSON — use Flowise variable expressions):
{
  "session_id": "{{$flow.sessionId}}",
  "app_id": "flowise-production",
  "tool_name": "{{$toolName}}",
  "tool_args_hash": "{{$toolArgsHash}}",
  "call_count": "{{$callCount}}",
  "guard_types": ["spiral", "http_failure", "budget"]
}
//
// Response handling:
// - HTTP 200: { "allowed": true, "checks": { "spiral": "passed", ... } }
//   → Proceed to the actual tool node
// - HTTP 409: { "allowed": false, "reason": "spiral", "detail": "..." }
//   → Flowise treats non-2xx as error; configure "Continue On Fail" = false
//   → The agent receives the detail message as a tool error and stops
//
// For Multi-Agent supervisor guards:
{
  "session_id": "{{$flow.sessionId}}",
  "app_id": "flowise-production",
  "guard_types": ["delegation"],
  "delegation_data": {
    "total": "{{$flow.state._rg_delegations.total}}",
    "per_worker": "{{$flow.state._rg_delegations.perWorker}}"
  }
}

On a 409 response, Flowise surfaces the error to the agent executor as a tool failure with the RunGuard detail message. The agent receives a clear explanation of why the tool was blocked and — if the message includes "do not retry" language — stops attempting to call the tool. The 200 path is a passthrough with zero behavioral change to the flow.
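
If you would rather call the endpoint from a Custom Function node than from an HTTP Request node, the request and response handling described above could look roughly like this. It is a minimal sketch: the helper names buildGuardRequest and interpretGuardResponse are hypothetical, the field names are copied from the request body shown earlier, and the decision to block the tool on any unexpected status (fail closed) is an assumption you may want to reverse.

```javascript
// Hypothetical helper: builds the POST request described in the configuration above.
function buildGuardRequest({ apiKey, sessionId, toolName, toolArgsHash, callCount }) {
  return {
    url: 'https://runguard.dev/api/guard',
    options: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'X-RunGuard-Key': apiKey,
      },
      body: JSON.stringify({
        session_id: sessionId,
        app_id: 'flowise-production',
        tool_name: toolName,
        tool_args_hash: toolArgsHash,
        call_count: callCount,
        guard_types: ['spiral', 'http_failure', 'budget'],
      }),
    },
  };
}

// Hypothetical helper: maps the documented 200/409 responses to a decision.
function interpretGuardResponse(status, body) {
  if (status === 200 && body && body.allowed === true) {
    return { proceed: true, message: null };
  }
  if (status === 409) {
    const reason = (body && body.reason) || 'unknown';
    const detail = (body && body.detail) || 'Blocked by guard.';
    return {
      proceed: false,
      message: `Guard tripped (${reason}): ${detail} Do not retry this session.`,
    };
  }
  // Assumption: any other status blocks the tool (fail closed).
  return { proceed: false, message: `Guard check failed with HTTP ${status}.` };
}
```

In use, you would pass url and options to fetch, then feed response.status and the parsed JSON body into interpretGuardResponse, throwing the returned message as a tool error when proceed is false.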

Option 2: RunGuard custom node package

RunGuard is available as a custom node package in Flowise's component marketplace. Install it by adding the package to your Flowise instance's custom component directory (configurable via the FLOWISE_COMPONENTS_PATH environment variable). Once installed, a RunGuard Circuit Breaker node appears in the canvas node palette under the "Utilities" category.

The node exposes four configuration fields: API key, app ID, guard types to enable (checkboxes for spiral, delegation, budget, http_failure), and threshold overrides. Wire it between your tool nodes and the Agent node, and RunGuard handles all state management server-side. The node package also registers a Flowise webhook that sends trip events to your configured Slack or PagerDuty endpoint — matching the alert capability of RunGuard's Team plan.

All guard trip events are stored in the RunGuard dashboard with full tool call history, similarity scores, delegation counts, token estimates, and session timelines. The data is retained for 30 days and searchable across all your flows and app IDs. For teams running multiple Flowise instances (development, staging, production), the app ID field scopes each instance's data independently.

FAQ

Does Flowise's "Max Iterations" protect against loops?

Partially. Max Iterations is a hard cap on the number of times the AgentExecutor loop cycles, so the agent will always stop eventually. But it has three important limitations. First, it counts iterations, not cost: a model that generates 2,000 tokens of reasoning per iteration and calls three tools per step exhausts far more budget in 10 iterations than a simple calculator agent would in 30. Second, it does not detect semantic repetition: an agent calling the same search tool 10 times with near-identical queries will exhaust all 10 iterations doing pointless work. Third, in Flowise Agentflows v2, the equivalent parameter is LangGraph's recursion limit (recursionLimit in LangGraph.js, recursion_limit in the Python library), which counts graph traversals rather than agent reasoning steps. A single Supervisor delegation cycle may consume multiple traversals, making the effective limit lower than the configured number suggests. Max Iterations is a last-resort backstop, not a cost control mechanism.

Can I use Flowise's built-in error handling (Error Handler node) instead of a circuit breaker?

The Error Handler node in Flowise catches runtime errors thrown by any node in the flow and routes execution to a fallback path. It is useful for graceful degradation — showing a user-friendly error message when a tool fails — but it does not prevent the failure that triggers it. A circuit breaker intervenes before the problematic behavior completes its damage; an error handler responds after the damage is done. Additionally, error handlers catch single-node failures, not cumulative patterns like "this tool has been called 4 times with similar arguments." The combination of both is correct: use circuit breaker Custom Function nodes to detect and stop spirals early, and use Error Handler nodes to translate circuit-breaker throws into user-friendly responses. They serve different roles in the same safety stack.
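
The division of labor between the two can be sketched in plain JavaScript. This is an illustration only, not Flowise's Error Handler API: CircuitOpenError, guardToolCall, and toUserMessage are hypothetical names, and the guard here counts calls per tool rather than comparing argument similarity.

```javascript
// Hypothetical error type thrown when the guard trips.
class CircuitOpenError extends Error {
  constructor(toolName, detail) {
    super(`Circuit open for ${toolName}: ${detail}`);
    this.name = 'CircuitOpenError';
    this.toolName = toolName;
  }
}

// Circuit breaker role: tracks a cumulative pattern and throws BEFORE
// the next tool call runs.
function guardToolCall(callCounts, toolName, maxCalls) {
  const count = (callCounts.get(toolName) || 0) + 1;
  callCounts.set(toolName, count);
  if (count > maxCalls) {
    throw new CircuitOpenError(toolName, `called ${count} times, limit is ${maxCalls}`);
  }
  return count;
}

// Error handler role: translates the throw into a user-facing message
// after the fact. It does not prevent anything.
function toUserMessage(err) {
  if (err instanceof CircuitOpenError) {
    return `I stopped using the ${err.toolName} tool because it was not making progress.`;
  }
  return 'Something went wrong. Please try again.';
}

const callCounts = new Map();
try {
  for (let i = 0; i < 5; i++) guardToolCall(callCounts, 'web_search', 3);
} catch (err) {
  console.log(toUserMessage(err));
}
```

The guard stops the fourth call from executing; the handler only decides what the user sees once that has happened.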

In Flowise Agentflows v2, does LangGraph's built-in recursion limit help?

Yes — but with the same caveat as Max Iterations. LangGraph's recursion limit (set in Flowise's Agentflow graph configuration, default 25 in recent Flowise versions) counts state machine transitions, not semantic events. A Supervisor-Worker-Supervisor-Worker delegation cycle consumes 4 transitions per round-trip, so a recursion limit of 25 allows about 6 complete delegation cycles before stopping. This is a meaningful ceiling, but 6 delegation cycles with a GPT-4 Supervisor and Claude 3.5 Workers can still accumulate $5–10 in LLM costs on a complex task. The delegation guard described in Failure Mode 2 intercepts the spiral at 3–4 delegations to the same Worker before costs compound further. Use the recursion limit as a system-level safety net and the Condition node guard as your primary semantic detector.
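
The arithmetic behind the "about 6 cycles" figure is simple enough to check directly. The function name below is illustrative, and the 4-transitions-per-cycle figure is the one assumed in the answer above.

```javascript
// How many complete delegation cycles fit inside a recursion limit?
function completeDelegationCycles(recursionLimit, transitionsPerCycle) {
  return Math.floor(recursionLimit / transitionsPerCycle);
}

console.log(completeDelegationCycles(25, 4)); // 6 (24 transitions used, 1 left over)
console.log(completeDelegationCycles(50, 4)); // 12
```

Doubling the recursion limit doubles the number of delegation cycles allowed, which is why raising it to accommodate a legitimate long-running task also raises the ceiling on a runaway one.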

What's the difference between a Flowise Chat Flow agent and an Agentflow agent in terms of loop risk?

Chat Flow agents run a linear AgentExecutor loop with a flat tool list, so all loop risk sits within a single agent's iteration count. The failure modes are tool call spirals (Failure Mode 1) and context accumulation (Failure Mode 3). Agentflow agents introduce structural loop risk that doesn't exist in Chat Flows: the multi-agent delegation graph can cycle between Supervisor and Worker nodes, accumulating cost at every graph edge. A Chat Flow with Max Iterations set to 20 might run 20 LLM calls before stopping. An Agentflow with a recursion limit of 25 can run far more than 25 LLM calls, because the limit bounds graph transitions rather than LLM calls: if Workers have their own internal loops, each delegation step can trigger a full Worker loop, so the total call count is multiplicative, not additive. Agentflows v2 require delegation guards (Failure Mode 2) in addition to the spiral and budget guards that Chat Flows need. The complexity difference makes Agentflows both more powerful and more expensive to run without explicit circuit breaking in place. See the OpenAI Agents SDK post for a parallel analysis of how multi-agent handoff patterns create similar multiplicative loop risks in OpenAI's native agent framework.
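
A rough worst-case model shows the difference between additive and multiplicative growth. It is a simplification rather than a description of Flowise internals: it assumes each delegation costs one Supervisor call plus a full Worker loop, and the function names and example numbers are illustrative.

```javascript
// Chat Flow: one LLM call per iteration, so the cap is the worst case.
function chatFlowWorstCaseCalls(maxIterations) {
  return maxIterations;
}

// Agentflow (simplified assumption): every delegation costs one Supervisor
// call plus up to workerMaxIterations calls inside the Worker's own loop.
function agentflowWorstCaseCalls(delegations, workerMaxIterations) {
  return delegations * (1 + workerMaxIterations);
}

console.log(chatFlowWorstCaseCalls(20));      // 20
console.log(agentflowWorstCaseCalls(6, 10));  // 66
```

Six delegations with a Worker loop capped at 10 already exceed three times the Chat Flow's worst case, even though both configurations look similarly conservative on the canvas.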

How do I monitor Flowise agent costs without a paid observability platform?

Flowise exposes a /api/v1/chatmessage/{chatflowId} REST endpoint that returns conversation history including the model used for each message. The response does not include token counts by default, but you can cross-reference the message content length against your LLM provider's token pricing to estimate cost per session. For more precise tracking, configure Flowise's LangChain.js callback system: set the LANGCHAIN_TRACING_V2=true and LANGCHAIN_API_KEY environment variables to route all LangChain traces to LangSmith's free tier (5,000 traces per month at the time of writing). LangSmith records input/output token counts for every LLM call and aggregates them by run ID, giving you per-session cost estimates without a paid platform. Alternatively, if your Flowise instance stores its data in PostgreSQL, you can run a simple SQL query against the chat message table to sum content lengths as a cost proxy. None of these approaches is as precise as a token-aware observability platform, but they give you enough signal to identify outlier sessions before they significantly affect your monthly bill.
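
The content-length proxy can be sketched as follows. Everything here is an assumption for illustration: the message shape ({ content: string }), the 4-characters-per-token heuristic, and the price per million tokens are placeholders you would replace with your own data and your provider's current rates.

```javascript
// Hypothetical estimator: approximates tokens from character counts.
// This understates real cost because it ignores system prompts, tool
// payloads, and the history re-sent on every LLM call.
function estimateSessionCost(messages, { charsPerToken = 4, usdPerMillionTokens }) {
  const totalChars = messages.reduce(
    (sum, message) => sum + (message.content || '').length,
    0
  );
  const estimatedTokens = Math.ceil(totalChars / charsPerToken);
  const estimatedUsd = (estimatedTokens / 1e6) * usdPerMillionTokens;
  return { totalChars, estimatedTokens, estimatedUsd };
}

// Example with placeholder data and a placeholder price.
const sampleMessages = [
  { content: 'Find recent papers on retrieval-augmented generation.' },
  { content: 'Here are five papers that match your query...' },
];
console.log(estimateSessionCost(sampleMessages, { usdPerMillionTokens: 10 }));
```

Because the estimate is systematically low, it is more useful for ranking sessions against each other and spotting outliers than for reconciling against an invoice.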

Stop runaway Flowise agents before the bill lands

RunGuard wraps all four Flowise agent guards (tool call spiral detection, multi-agent supervisor delegation limiting, context token budget enforcement, and HTTP retry cascade prevention) as a managed API endpoint. Add one HTTP Request node to your Flowise flow and get a persistent 30-day trip dashboard with Slack and PagerDuty alerts. No Custom Function nodes to maintain, no Redis state to manage, no threshold drift across flows.

Start free 14-day trial →