n8n AI Agent Cost Control: Loop Detection and Budget Enforcement in Production

n8n's AI Agent node wraps a LangChain AgentExecutor loop inside a workflow node. You connect a chat model, some memory, and a set of tool nodes; the agent then calls tools, ingests results, and keeps going until it decides it's done. The only built-in limit is a Max Iterations counter. There is no per-run token budget and no spiral detector. When the model gets confused, the loop runs until it hits that counter, you kill the workflow manually, or your LLM provider cuts you off on rate limits or credits.

This is not a hypothetical. A single misconfigured AI Agent node in a production n8n workflow has run up four-figure LLM bills in a matter of hours — the model enters a tool call spiral, n8n dutifully fires each tool call, and the LangChain executor keeps passing results back for the next step. The workflow logs fill up. Your Slack stays silent. The invoice does not.

This post covers four failure modes specific to n8n's agent architecture and shows you exactly how to build a circuit breaker using n8n's own Code node — no external dependencies required. If you'd rather not maintain the guard yourself, the final section covers how RunGuard wraps all four checks as a managed service callable via n8n's HTTP Request node.

How n8n's AI Agent node works

The AI Agent node (introduced in n8n v1.x, powered by LangChain.js) follows the standard ReAct loop:

  1. The chat model receives the system prompt + current conversation history + tool definitions.
  2. The model outputs either a final answer or a tool call (or multiple tool calls for models that support parallel function calling).
  3. n8n executes the referenced tool node(s) and feeds the results back into the conversation history.
  4. The loop repeats until the model outputs a final answer or the node's Max Iterations setting is reached.

The Max Iterations field (default: 10) is n8n's only built-in guard. It counts loop iterations — not semantic repetition, not token growth, not sub-workflow depth. A model that calls the same search tool 10 times with nearly identical queries will exhaust Max Iterations on the 10th call. But if you raised Max Iterations to 50 for a legitimate research agent, the same spiral runs 50 iterations before anything stops it.
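To make that limitation concrete, here is a minimal plain-JavaScript sketch of the loop described above. It is not n8n's or LangChain's actual implementation: `runAgentLoop`, `confusedModel`, and the `tools` map are hypothetical stand-ins, and the "model" is a stub that never converges.

```javascript
// Simplified stand-in for the agent loop. NOT n8n's or LangChain's real code.
// `model(history)` returns either { final: string } or { tool: string, args: object }.
function runAgentLoop(model, tools, maxIterations) {
  const history = [];
  for (let i = 1; i <= maxIterations; i++) {
    const step = model(history);
    if (step.final !== undefined) {
      return { answer: step.final, iterations: i, stoppedByCap: false };
    }
    // Execute the requested tool and feed the result back into the history.
    const result = tools[step.tool](step.args);
    history.push({ tool: step.tool, args: step.args, result });
  }
  // Nothing but the step counter ended this run.
  return { answer: null, iterations: maxIterations, stoppedByCap: true };
}

// Hypothetical "confused" model that rephrases the same search forever.
const confusedModel = (history) => ({
  tool: 'webSearch',
  args: { query: `ai agent frameworks 2026 attempt ${history.length + 1}` },
});
const tools = { webSearch: () => 'no conclusive results' };

const capped10 = runAgentLoop(confusedModel, tools, 10);
const capped50 = runAgentLoop(confusedModel, tools, 50);
console.log(capped10.iterations, capped50.iterations); // 10 50
```

The counter stops both runs, but only after every allowed step has been paid for. Nothing in the loop looks at what the steps contained.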

n8n supports several memory options for the Agent node:

  • Window Buffer Memory — keeps the last N message pairs in memory; does not count tokens
  • Summary Memory — summarizes older messages via an LLM call; each summarization = additional cost
  • No memory — stateless; each trigger invocation starts fresh

Tool nodes that can be attached to an AI Agent include: HTTP Request, Code, Execute Workflow, Calculator, Wikipedia, SerpAPI, and any custom sub-workflow via the "Execute Workflow" tool type. This "Execute Workflow as tool" capability is what creates the most dangerous failure mode, as we'll see below.

The gap: Max Iterations counts steps. It cannot detect that five consecutive steps called the same tool with semantically identical arguments, that a sub-workflow triggered by a tool is itself running another AI Agent, or that the conversation history held in Window Buffer Memory has grown to 60,000 tokens over 12 iterations.

Failure mode 1: Tool call invocation spiral

The most common n8n agent failure. The model calls a search or data-fetch tool, receives a result that doesn't fully satisfy its objective, and calls the same tool again with a slightly rephrased query. Each iteration produces a marginally different result. The model never converges on a final answer because the data isn't there or the prompt is ambiguous — so it keeps searching.

In practice this looks like:

  • Iteration 1: webSearch("latest AI agent frameworks 2026")
  • Iteration 2: webSearch("best AI agent frameworks 2026 comparison")
  • Iteration 3: webSearch("top AI agent frameworks production 2026")
  • Iteration 4: webSearch("AI agent frameworks 2026 production ready")
  • … (continues until Max Iterations)

The queries look different to Max Iterations (each is a new step), but they are semantically near-identical. With GPT-4o at $0.005 per 1K output tokens and an agent that generates 500 tokens of reasoning per iteration, 50 iterations produce 25,000 output tokens, or about $0.13 in model output alone. That figure excludes the input tokens, which grow on every iteration as tool results accumulate in the history, and the search API calls themselves.

Detection via Jaccard similarity on normalized query fingerprints:

// n8n Code node: placed BEFORE the AI Agent node in the workflow
// Receives: $json.toolName, $json.toolArgs, workflow static data for history

const toolName = $input.first().json.toolName;
const toolArgs = JSON.stringify($input.first().json.toolArgs || {});

// Load spiral state from workflow static data
const state = $getWorkflowStaticData('global');
if (!state.spiralHistory) state.spiralHistory = {};
if (!state.spiralHistory[toolName]) state.spiralHistory[toolName] = [];

// Normalize: lowercase, strip punctuation, sort words
function normalize(str) {
  return str.toLowerCase().replace(/[^a-z0-9\s]/g, ' ')
    .split(/\s+/).filter(Boolean).sort().join(' ');
}

function jaccard(a, b) {
  const setA = new Set(a.split(' '));
  const setB = new Set(b.split(' '));
  const intersection = new Set([...setA].filter(x => setB.has(x)));
  const union = new Set([...setA, ...setB]);
  return union.size === 0 ? 0 : intersection.size / union.size;
}

const fingerprint = normalize(toolArgs);
const history = state.spiralHistory[toolName];

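// Keep a sliding window of last 4 calls for this tool
history.push({ fingerprint, ts: Date.now() });
if (history.length > 4) history.shift();
// No save call is needed: n8n persists changes made to the object returned
// by $getWorkflowStaticData (there is no $setWorkflowStaticData function).

// Trip when at least 2 pairs among the last 4 calls meet the similarity threshold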
if (history.length >= 3) {
  const pairs = [];
  for (let i = 0; i < history.length - 1; i++) {
    for (let j = i + 1; j < history.length; j++) {
      pairs.push(jaccard(history[i].fingerprint, history[j].fingerprint));
    }
  }
  const highSimilarity = pairs.filter(s => s >= 0.72).length;
  if (highSimilarity >= 2) {
    throw new Error(
      `[RunGuard] Tool call spiral detected on "${toolName}": ` +
      `${highSimilarity} near-identical call pairs in last ${history.length} invocations ` +
      `(max similarity: ${Math.max(...pairs).toFixed(2)}). Stopping agent.`
    );
  }
}

// Return an array of items, the shape the Code node expects
return [{ json: $input.first().json }];

Place this Code node between your workflow trigger and the AI Agent node in a loop-back pattern, or as a sub-workflow tool wrapper that pre-validates every tool call before execution. When the spiral threshold is crossed, the Code node throws an error — n8n surfaces this as a workflow error, halting execution cleanly.
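Before settling on a threshold, it helps to score real queries. The sketch below reuses the guard's normalization and Jaccard logic as standalone functions and applies them to the four example queries listed earlier. Two things here are assumptions: it compares the raw query text rather than the JSON-serialized arguments the guard fingerprints, and `countSimilarPairs` is a hypothetical helper name.

```javascript
// Same normalization idea as the guard: lowercase, strip punctuation, tokenize.
function tokenSet(str) {
  return new Set(
    str.toLowerCase().replace(/[^a-z0-9\s]/g, ' ').split(/\s+/).filter(Boolean)
  );
}

function jaccard(a, b) {
  const setA = tokenSet(a);
  const setB = tokenSet(b);
  const intersection = [...setA].filter((x) => setB.has(x)).length;
  const union = new Set([...setA, ...setB]).size;
  return union === 0 ? 0 : intersection / union;
}

// Count the pairs whose similarity meets the threshold, and track the best score.
function countSimilarPairs(queries, threshold) {
  let count = 0;
  let max = 0;
  for (let i = 0; i < queries.length - 1; i++) {
    for (let j = i + 1; j < queries.length; j++) {
      const score = jaccard(queries[i], queries[j]);
      if (score >= threshold) count++;
      if (score > max) max = score;
    }
  }
  return { count, max };
}

const spiralQueries = [
  'latest AI agent frameworks 2026',
  'best AI agent frameworks 2026 comparison',
  'top AI agent frameworks production 2026',
  'AI agent frameworks 2026 production ready',
];

const strict = countSimilarPairs(spiralQueries, 0.72);
const loose = countSimilarPairs(spiralQueries, 0.55);
console.log(strict.count, loose.count, strict.max.toFixed(2)); // 0 4 0.71
```

On raw query text the closest pair scores 5/7 ≈ 0.71, just under 0.72, so this particular spiral only trips a looser threshold. The guard fingerprints the serialized arguments, where shared key names such as "query" nudge every score upward, so results on your own traffic will differ. Treat 0.72 as a starting point to calibrate rather than a universal constant.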

Failure mode 2: Sub-workflow recursion via "Execute Workflow" tool

n8n's most dangerous failure mode, and the hardest to detect from inside any single workflow. The AI Agent node can call an "Execute Workflow" tool, which triggers another n8n workflow as a sub-routine. If that sub-workflow also contains an AI Agent node — or triggers yet another sub-workflow — you get recursive execution with no built-in depth limit.

A real scenario: a research orchestrator agent calls an "expand topic" sub-workflow via an Execute Workflow tool. The expand-topic sub-workflow itself contains an AI Agent node that, when it can't find a satisfying answer, calls the same "expand topic" sub-workflow again. n8n will execute each nested trigger as a new workflow run — each with its own Max Iterations counter, its own billing, and no awareness of how deep the recursion has gone.

Sub-workflow recursion is especially insidious because each recursive call appears in your n8n execution log as a separate, independent workflow execution. There's no call stack visible in the n8n UI. You only notice when the execution count spikes or when the LLM credits run out.

The guard lives in a shared Code node at the top of every workflow that can be called as a sub-workflow:

// n8n Code node: "Recursion Depth Guard" — first node in every sub-workflow
// The calling workflow must pass { _rg_depth: number, _rg_root_id: string }
// in the sub-workflow input. Root workflows start with _rg_depth: 0.

const MAX_DEPTH = 4;
const input = $input.first().json;

const depth = typeof input._rg_depth === 'number' ? input._rg_depth : 0;
const rootId = input._rg_root_id || $execution.id;

if (depth >= MAX_DEPTH) {
  throw new Error(
    `[RunGuard] Sub-workflow recursion limit reached: depth ${depth} ` +
    `(max ${MAX_DEPTH}). Root execution: ${rootId}. ` +
    `Stopping to prevent runaway execution chain.`
  );
}

// Pass incremented depth to any downstream Execute Workflow calls
return [{
  json: {
    ...input,
    _rg_depth: depth + 1,
    _rg_root_id: rootId,
  }
}];

When calling a sub-workflow from an AI Agent's Execute Workflow tool, map the _rg_depth and _rg_root_id fields from the calling workflow's input into the sub-workflow's input payload. This propagates depth context through the call chain. Any sub-workflow that receives a depth at or above the limit throws immediately before its own AI Agent node fires.
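The propagation rule is easiest to check in isolation. The following plain-JavaScript sketch simulates a sub-workflow that always calls itself, with the guard logic extracted into a function. `depthGuard`, `expandTopic`, and the `exec-N` execution IDs are hypothetical names for illustration; in n8n the recursion would be separate workflow executions rather than function calls.

```javascript
const MAX_DEPTH = 4;

// Mirrors the guard above: throws at the limit, otherwise returns the payload
// that should be forwarded to the next Execute Workflow call.
function depthGuard(input, executionId) {
  const depth = typeof input._rg_depth === 'number' ? input._rg_depth : 0;
  const rootId = input._rg_root_id || executionId;
  if (depth >= MAX_DEPTH) {
    throw new Error(`[RunGuard] depth ${depth} reached (max ${MAX_DEPTH}), root ${rootId}`);
  }
  return { ...input, _rg_depth: depth + 1, _rg_root_id: rootId };
}

// Hypothetical sub-workflow that never finds an answer and calls itself again.
function expandTopic(input, executionId, depthsSeen) {
  const guarded = depthGuard(input, executionId);
  depthsSeen.push(guarded._rg_depth);
  // Forward the guarded payload so depth and root ID travel with the call.
  return expandTopic(guarded, `exec-${depthsSeen.length + 1}`, depthsSeen);
}

const depthsSeen = [];
let stoppedWith = null;
try {
  expandTopic({ topic: 'agent frameworks' }, 'exec-1', depthsSeen);
} catch (err) {
  stoppedWith = err.message;
}
console.log(depthsSeen);  // [ 1, 2, 3, 4 ]
console.log(stoppedWith); // [RunGuard] depth 4 reached (max 4), root exec-1
```

If any call in the chain forwards the original input instead of the guarded payload, the counter silently resets to 0 at that hop, which is why the mapping has to be applied at every Execute Workflow call.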

Failure mode 3: Window Buffer Memory context accumulation

n8n's Window Buffer Memory keeps the last N conversation turns in memory. The window size is often set generously (8–20 turns) to give the agent sufficient context. But "N turns" does not translate directly to "N tokens": each turn can contain a lengthy tool result. A research agent that fetches full web page content via HTTP Request will accumulate 2,000–8,000 tokens per tool result. With a window of 10 turns and an average of 4,000 tokens per result, the context passed to the model on turn 10 includes 40,000 tokens of tool results, roughly ten times the tool-result input billed on turn 1.

The failure compounds: as context grows, the model's reasoning quality degrades, it starts making weaker tool selection decisions, and the total iteration count climbs to compensate, with each iteration now costing more tokens than the last.
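The arithmetic behind that growth can be sketched in a few lines. This is a simplified model, not a measurement: it assumes every turn adds exactly one tool result of a fixed size, that turn t carries t results until the window fills, and it ignores the system prompt and the model's own output. Both function names are hypothetical.

```javascript
// Estimated tool-result tokens sent to the model on a given turn, assuming one
// result of `tokensPerResult` per turn and a window of the last `windowSize` turns.
function contextTokensAtTurn(turn, windowSize, tokensPerResult) {
  return Math.min(turn, windowSize) * tokensPerResult;
}

// Total tool-result input tokens billed across turns 1..turns.
function cumulativeInputTokens(turns, windowSize, tokensPerResult) {
  let total = 0;
  for (let t = 1; t <= turns; t++) {
    total += contextTokensAtTurn(t, windowSize, tokensPerResult);
  }
  return total;
}

console.log(contextTokensAtTurn(1, 10, 4000));    // 4000
console.log(contextTokensAtTurn(10, 10, 4000));   // 40000
console.log(cumulativeInputTokens(10, 10, 4000)); // 220000
```

The window caps the size of any single call, but not the cumulative bill: under these assumptions ten turns cost 220,000 input tokens in total, because the same results are re-sent on every later turn.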

Build a token budget checker in a Code node that fires after each tool result is generated, before it gets written to memory:

// n8n Code node: "Memory Budget Guard"
// Place after tool execution, before passing result back to AI Agent
// Estimates tokens via character count (≈ 4 chars/token for English text)

const BUDGET_CONFIG = {
  maxSessionTokens: 40000,   // Total estimated tokens for this agent run
  maxSingleResultTokens: 6000, // Per-tool-result cap
  warnAt: 0.75,              // Log warning at 75% of budget
};

const input = $input.first().json;
const toolResult = JSON.stringify(input);
const estimatedTokens = Math.ceil(toolResult.length / 4);

const state = $getWorkflowStaticData('global');
if (!state.tokenBudget) state.tokenBudget = { used: 0, calls: 0, sessionId: $execution.id };

// Reset if new execution
if (state.tokenBudget.sessionId !== $execution.id) {
  state.tokenBudget = { used: 0, calls: 0, sessionId: $execution.id };
}

// Count only what will reach the model: oversized results are truncated below
const keptTokens = Math.min(estimatedTokens, BUDGET_CONFIG.maxSingleResultTokens);
state.tokenBudget.used += keptTokens;
state.tokenBudget.calls += 1;
// No save call is needed: n8n persists changes made to the object returned
// by $getWorkflowStaticData (there is no $setWorkflowStaticData function).

const { used, calls } = state.tokenBudget;

// Hard stop: session budget exhausted (checked first so truncated results still count)
if (used > BUDGET_CONFIG.maxSessionTokens) {
  throw new Error(
    `[RunGuard] Session token budget exhausted: ~${used} estimated tokens used ` +
    `across ${calls} tool calls (limit: ${BUDGET_CONFIG.maxSessionTokens}). ` +
    `Stopping agent to prevent runaway context accumulation.`
  );
}

// Warn at threshold
if (used > BUDGET_CONFIG.maxSessionTokens * BUDGET_CONFIG.warnAt) {
  console.warn(
    `[RunGuard] Token budget at ${Math.round(used / BUDGET_CONFIG.maxSessionTokens * 100)}%: ` +
    `~${used}/${BUDGET_CONFIG.maxSessionTokens} estimated tokens used.`
  );
}

// Soft limit: single result too large, so truncate rather than hard-fail
if (estimatedTokens > BUDGET_CONFIG.maxSingleResultTokens) {
  const truncated = toolResult.substring(0, BUDGET_CONFIG.maxSingleResultTokens * 4);
  console.warn(
    `[RunGuard] Tool result truncated: ${estimatedTokens} estimated tokens ` +
    `exceeds per-result cap of ${BUDGET_CONFIG.maxSingleResultTokens}. ` +
    `Returning roughly the first ${BUDGET_CONFIG.maxSingleResultTokens} tokens.`
  );
  // Return only the truncated text; spreading `input` here would re-attach the full payload
  return [{ json: { _truncated: true, _original_length: toolResult.length,
    content: truncated + '\n\n[Content truncated by RunGuard budget guard]' } }];
}

return [{ json: input }];

The guard does two things: it truncates oversized single results (returning a partial result rather than hard-failing) and it hard-stops the session when the total estimated token count across all tool calls exceeds your budget. Truncation is the right behavior for large HTTP responses — the agent can work with partial data. Hard-stopping is right when the session budget is exhausted — at that point, additional iterations are unlikely to produce better output anyway.

Failure mode 4: HTTP Request tool retry cascade

n8n's HTTP Request node has its own retry settings (configurable in the node itself — up to 5 retries with configurable delay). When used as an AI Agent tool, a failing HTTP Request node can trigger a two-level retry storm: n8n's own HTTP retry mechanism fires at the transport layer, and then the AI Agent's LangChain executor — seeing a tool error — may ask the model whether to retry, modify the request, or try a different approach. The model often says "retry with modified parameters." The cycle repeats.

In a worst case: HTTP Request fires, fails with a 429 or 5xx, retries 5× at the transport layer (5 HTTP calls), returns an error to the agent, the model calls the tool again with slightly different parameters (1 more agent iteration), that fires 5 more HTTP retries, and so on. With 10 agent iterations and 5 HTTP retries each, you've made 50 HTTP calls on a tool that never succeeded. If those calls hit a paid API (SerpAPI, OpenAI embeddings, etc.), the costs stack independently of your LLM spend.
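The multiplication is worth writing down, because it shows how much of the storm comes from each layer. A minimal sketch, with hypothetical function names and an illustrative per-call price that is not taken from any real API's price list:

```javascript
// Worst-case outbound HTTP calls when every attempt fails.
// attemptsPerToolCall = HTTP attempts made each time the agent invokes the tool.
function worstCaseHttpCalls(agentIterations, attemptsPerToolCall) {
  return agentIterations * attemptsPerToolCall;
}

// Cost of those calls against a paid API. `pricePerCall` is illustrative only.
function worstCaseApiCost(agentIterations, attemptsPerToolCall, pricePerCall) {
  return worstCaseHttpCalls(agentIterations, attemptsPerToolCall) * pricePerCall;
}

console.log(worstCaseHttpCalls(10, 5)); // 50  (the scenario above)
console.log(worstCaseHttpCalls(10, 1)); // 10  (node-level retries disabled)
console.log(worstCaseHttpCalls(50, 5)); // 250 (Max Iterations raised to 50)
```

Disabling node-level retries removes one factor of the product; the failure guard is what bounds the other.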

The fix has two parts: disable HTTP Request node-level retries for agent tools, and add a failure-rate guard in a Code node that watches for repeat tool failures:

// n8n Code node: "HTTP Tool Failure Guard"
// Place immediately after any HTTP Request tool node (before result returns to agent)
// Tracks failure rate per tool name and halts if failures are too frequent

const input = $input.first().json;
const toolName = $input.first().json._tool_name || 'http_request';

const state = $getWorkflowStaticData('global');
if (!state.toolFailures) state.toolFailures = {};
if (!state.toolFailures[toolName]) {
  state.toolFailures[toolName] = { successes: 0, failures: 0, consecutiveFails: 0 };
}

const tf = state.toolFailures[toolName];
const isError = input.error || (input.statusCode && input.statusCode >= 400);

if (isError) {
  tf.failures += 1;
  tf.consecutiveFails += 1;
} else {
  tf.successes += 1;
  tf.consecutiveFails = 0;
}

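// No save call is needed: n8n persists changes made to the object returned
// by $getWorkflowStaticData (there is no $setWorkflowStaticData function).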

// Stop on 3 consecutive failures from the same tool
if (tf.consecutiveFails >= 3) {
  throw new Error(
    `[RunGuard] HTTP tool failure cascade detected on "${toolName}": ` +
    `${tf.consecutiveFails} consecutive failures. ` +
    `Last error: ${JSON.stringify(input.error || input.statusCode)}. ` +
    `Stopping agent to prevent retry storm.`
  );
}

// Stop if failure rate exceeds 70% across 5+ total calls
const total = tf.successes + tf.failures;
if (total >= 5 && tf.failures / total > 0.7) {
  throw new Error(
    `[RunGuard] HTTP tool unreliable: "${toolName}" failing at ` +
    `${Math.round(tf.failures / total * 100)}% rate (${tf.failures}/${total} calls). ` +
    `Stopping agent — the endpoint may be unavailable.`
  );
}

// Return an array of items, the shape the Code node expects
return [{ json: input }];

Putting it all together: the n8n agent guard workflow

The four guards above are most effective when composed into a reusable "RunGuard" sub-workflow that any AI Agent workflow can call. The architecture:

  1. Guard sub-workflow input node receives { guardType, toolName, toolArgs, toolResult, depth, sessionId }
  2. A Switch node routes to the appropriate Code node by guardType: "spiral", "depth", "budget", or "http_failure"
  3. Each guard Code node runs its check and either passes through or throws
  4. The output routes back to the calling workflow via the Execute Workflow return

In any AI Agent workflow, wire every tool through this guard sub-workflow as an intermediate step. The pattern — trigger → guard → tool → guard → back to agent — adds one round-trip per tool call but gives you full observability and a hard stop on any of the four failure modes.

Here's the complete guard orchestrator: a single Code node, reached from all four Switch routes, that runs the matching check with configurable thresholds:

// n8n Code node: "Agent Guard Orchestrator"
// Called by all four guard routes from the Switch node
// Receives { guardType, toolName, toolArgs, toolResult, depth, sessionId }

const input = $input.first().json;
const {
  guardType,
  toolName = 'unknown',
  toolArgs = {},
  toolResult = {},
  depth = 0,
  sessionId = $execution.id,
} = input;

const THRESHOLDS = {
  spiral: { similarityThreshold: 0.72, windowSize: 4, minHighSimilarityPairs: 2 },
  depth: { maxDepth: 4 },
  budget: { maxSessionTokens: 40000, maxSingleResultTokens: 6000 },
  http_failure: { maxConsecutiveFails: 3, maxFailureRate: 0.70, minCallsForRate: 5 },
};

const state = $getWorkflowStaticData('global');
if (!state.guards) state.guards = {};
if (!state.guards[sessionId]) {
  state.guards[sessionId] = {
    spiralHistory: {},
    tokenBudget: { used: 0, calls: 0 },
    toolFailures: {},
    createdAt: Date.now(),
  };
}
const gs = state.guards[sessionId];

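// Purge sessions older than 1 hour to prevent static data bloat.
// Skip the current session so a long-running agent keeps its own state.
const now = Date.now();
Object.keys(state.guards).forEach(sid => {
  if (sid !== String(sessionId) && now - state.guards[sid].createdAt > 3600000) {
    delete state.guards[sid];
  }
});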

let result = { passed: true, guardType, toolName };

if (guardType === 'spiral') {
  const cfg = THRESHOLDS.spiral;
  if (!gs.spiralHistory[toolName]) gs.spiralHistory[toolName] = [];
  const history = gs.spiralHistory[toolName];

  function normalize(obj) {
    return JSON.stringify(obj).toLowerCase().replace(/[^a-z0-9\s]/g, ' ')
      .split(/\s+/).filter(Boolean).sort().join(' ');
  }
  function jaccard(a, b) {
    const sA = new Set(a.split(' ')), sB = new Set(b.split(' '));
    const inter = new Set([...sA].filter(x => sB.has(x)));
    const union = new Set([...sA, ...sB]);
    return union.size === 0 ? 0 : inter.size / union.size;
  }

  history.push(normalize(toolArgs));
  if (history.length > cfg.windowSize) history.shift();
  // Nothing to save explicitly: n8n persists changes to the static data object.

  if (history.length >= 3) {
    const sims = [];
    for (let i = 0; i < history.length - 1; i++)
      for (let j = i + 1; j < history.length; j++)
        sims.push(jaccard(history[i], history[j]));
    const high = sims.filter(s => s >= cfg.similarityThreshold).length;
    if (high >= cfg.minHighSimilarityPairs) {
      throw new Error(
        `[RunGuard:spiral] Tool "${toolName}" spiral detected: ` +
        `${high} near-identical call pairs in window of ${history.length}. ` +
        `Max similarity: ${Math.max(...sims).toFixed(2)}.`
      );
    }
  }

} else if (guardType === 'depth') {
  // Depth travels in the payload, so there is no stored state to update here.
  if (depth >= THRESHOLDS.depth.maxDepth) {
    throw new Error(
      `[RunGuard:depth] Recursion depth ${depth} >= max ${THRESHOLDS.depth.maxDepth}. ` +
      `Root session: ${sessionId}.`
    );
  }
  result.nextDepth = depth + 1;

} else if (guardType === 'budget') {
  const cfg = THRESHOLDS.budget;
  const resultStr = JSON.stringify(toolResult);
  const estTokens = Math.ceil(resultStr.length / 4);
  // Count only the tokens that survive truncation
  gs.tokenBudget.used += Math.min(estTokens, cfg.maxSingleResultTokens);
  gs.tokenBudget.calls += 1;

  if (estTokens > cfg.maxSingleResultTokens) {
    const truncated = resultStr.substring(0, cfg.maxSingleResultTokens * 4);
    result.toolResult = { content: truncated + '\n[truncated by RunGuard]', _truncated: true };
  }
  if (gs.tokenBudget.used > cfg.maxSessionTokens) {
    throw new Error(
      `[RunGuard:budget] Session token budget exhausted: ~${gs.tokenBudget.used} est. tokens ` +
      `across ${gs.tokenBudget.calls} calls (limit: ${cfg.maxSessionTokens}).`
    );
  }

} else if (guardType === 'http_failure') {
  const cfg = THRESHOLDS.http_failure;
  if (!gs.toolFailures[toolName]) gs.toolFailures[toolName] = { s: 0, f: 0, cf: 0 };
  const tf = gs.toolFailures[toolName];
  const isErr = toolResult.error || (toolResult.statusCode && toolResult.statusCode >= 400);
  if (isErr) {
    tf.f++;
    tf.cf++;
  } else {
    tf.s++;
    tf.cf = 0;
  }
  // Nothing to save explicitly: n8n persists changes to the static data object.

  if (tf.cf >= cfg.maxConsecutiveFails) {
    throw new Error(
      `[RunGuard:http_failure] "${toolName}" failed ${tf.cf} times consecutively. Stopping.`
    );
  }
  const total = tf.s + tf.f;
  if (total >= cfg.minCallsForRate && tf.f / total > cfg.maxFailureRate) {
    throw new Error(
      `[RunGuard:http_failure] "${toolName}" failure rate ${Math.round(tf.f/total*100)}% ` +
      `(${tf.f}/${total} calls) exceeds threshold. Stopping.`
    );
  }
}

return [{ json: { ...result, passed: true } }];

Integrating with n8n's error workflow

Every guard node throws a descriptive error when it trips. To turn those throws into actionable alerts rather than silent execution failures, set an Error Workflow in the settings of each agent workflow:

  1. Create a dedicated "Agent Alert" workflow that starts with an Error Trigger node.
  2. Add an IF node that checks whether {{ $json.execution.error.message }} contains "[RunGuard".
  3. Route RunGuard errors to a Slack node (or email, PagerDuty, webhook) with the error message, workflow name, and execution ID.
  4. Route non-RunGuard errors to a separate handler for investigation.

This gives you real-time Slack notifications on every guard trip — the same signal you'd get from RunGuard's managed service, but wired entirely within n8n's native error handling infrastructure.
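If you prefer to do the routing in a Code node rather than an IF node, the classification step could look like the sketch below. It assumes the error payload exposes the message at `execution.error.message` and the workflow name at `workflow.name`; check this against the actual output of your Error Trigger node before relying on it. `classifyGuardError` and `formatAlert` are hypothetical helper names.

```javascript
// Classify an error payload as a guard trip or something else.
// Assumes the message lives at payload.execution.error.message.
function classifyGuardError(payload) {
  const message = payload?.execution?.error?.message ?? '';
  // Matches both "[RunGuard]" and "[RunGuard:spiral]" style prefixes.
  const match = message.match(/\[RunGuard(?::([a-z_]+))?\]/);
  if (!match) return { isGuardTrip: false, guardType: null, message };
  return { isGuardTrip: true, guardType: match[1] || 'unspecified', message };
}

// Build the alert text for guard trips; returns null for everything else.
function formatAlert(payload) {
  const { isGuardTrip, guardType, message } = classifyGuardError(payload);
  if (!isGuardTrip) return null;
  const workflow = payload?.workflow?.name ?? 'unknown workflow';
  const executionId = payload?.execution?.id ?? 'unknown execution';
  return `Guard trip (${guardType}) in "${workflow}", execution ${executionId}: ${message}`;
}

const sample = {
  execution: { id: '231', error: { message: '[RunGuard:spiral] Tool "webSearch" spiral detected' } },
  workflow: { name: 'Research Agent' },
};
console.log(classifyGuardError(sample).guardType); // spiral
console.log(formatAlert({ execution: { error: { message: 'ECONNRESET' } } })); // null
```

Extracting the guard type lets you send spiral and budget trips to different channels without parsing the message again downstream.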

Configuration tuning guide

Guard | Parameter | Default | When to raise
Spiral | similarityThreshold | 0.72 | If you have tools with naturally repetitive arguments (paginated API calls with the same base query + different cursor), raise to 0.85 and add cursor-stripping to the normalize function.
Spiral | windowSize | 4 | Raise to 6 if your agent legitimately calls the same tool 3 times in quick succession for different sub-tasks within a broader research run.
Depth | maxDepth | 4 | Raise only if you have intentionally recursive workflows that need more than 4 levels of nesting. Each level should be documented with a clear purpose.
Budget | maxSessionTokens | 40000 | Raise for document-processing agents expected to ingest large content. Set to match your per-run LLM spend cap (40,000 tokens ≈ $0.20 at GPT-4o pricing).
Budget | maxSingleResultTokens | 6000 | Raise for agents that intentionally process full-length documents via HTTP Request. 6,000 tokens ≈ a 24,000-character web page, which is sufficient for most summaries.
HTTP failure | maxConsecutiveFails | 3 | Raise to 5 if your external APIs are occasionally flaky and the agent should be more patient before aborting.
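The cursor-stripping advice for paginated tools needs one extra step in practice: once the cursor is removed, consecutive pages produce identical fingerprints, which on its own would look like a spiral. One possible reading, sketched below, is to keep the stripped-out pagination fields and treat "same base query, different cursor" as a pagination step rather than a repeat. The key list and both function names are hypothetical.

```javascript
// Keys treated as pagination state. Hypothetical list; adjust to your APIs.
const PAGINATION_KEYS = ['cursor', 'page', 'offset', 'next_token'];

// Split tool arguments into a fingerprint of the base query and a pagination key.
function splitPagination(args, paginationKeys = PAGINATION_KEYS) {
  const base = {};
  const paging = {};
  for (const [key, value] of Object.entries(args || {})) {
    (paginationKeys.includes(key) ? paging : base)[key] = value;
  }
  const fingerprint = JSON.stringify(base)
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, ' ')
    .split(/\s+/)
    .filter(Boolean)
    .sort()
    .join(' ');
  return { fingerprint, pagingKey: JSON.stringify(paging) };
}

// True when two calls share a base query but advance the pagination state.
function isPaginationStep(prevArgs, nextArgs) {
  const prev = splitPagination(prevArgs);
  const next = splitPagination(nextArgs);
  return prev.fingerprint === next.fingerprint && prev.pagingKey !== next.pagingKey;
}

const page1 = { query: 'open invoices', cursor: 'abc123' };
const page2 = { query: 'open invoices', cursor: 'def456' };
console.log(splitPagination(page1).fingerprint); // invoices open query
console.log(isPaginationStep(page1, page2));     // true
console.log(isPaginationStep(page1, page1));     // false
```

A guard using this helper would skip the similarity count for pagination steps and still catch a call that repeats the same query with the same cursor.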

RunGuard integration for n8n

If you'd rather not maintain the guard Code nodes yourself, RunGuard provides all four checks as a managed HTTP endpoint callable from any n8n workflow. The integration replaces each guard Code node with a single HTTP Request node pointing to the RunGuard API:

// n8n HTTP Request node configuration for RunGuard
// Method: POST
// URL: https://api.runguard.dev/v1/check
// Authentication: Header Auth (name: X-RunGuard-Key, value: your API key)
// Body (JSON):
{
  "app_id": "n8n-research-agent",
  "guard_type": "{{ $json.guardType }}",
  "tool_name": "{{ $json.toolName }}",
  "tool_args": "{{ $json.toolArgs }}",
  "session_id": "{{ $execution.id }}",
  "depth": "{{ $json.depth || 0 }}"
}

// Response on pass:  { "passed": true, "checks": [...] }
// Response on trip:  HTTP 409 { "passed": false, "reason": "spiral", "detail": "..." }
// n8n treats 409 as an error — configure the HTTP node to stop execution on error

RunGuard stores every trip event in your dashboard with full tool call history, similarity scores, and session token estimates — the same data the local Code nodes would log to your n8n execution log, but persistent across executions and searchable across all your workflows.

The managed integration also handles the static data cleanup problem automatically. n8n's $getWorkflowStaticData persists across workflow executions — if you run thousands of agent sessions per month, the local guard state accumulates in workflow static data indefinitely. RunGuard's server-side storage is session-scoped and auto-expires after 24 hours, eliminating the need for the manual purge loop in the orchestrator code above.

FAQ

Does the spiral guard work with n8n's parallel tool execution (when the model calls multiple tools simultaneously)?

Yes, with one caveat. When the AI Agent node fires multiple parallel tool calls in a single step, all tool nodes execute concurrently. The spiral guard tracks calls per tool name in a sliding window, so if the model calls webSearch three times in parallel within one step, all three calls hit the guard in quick succession. The Jaccard similarity check will fire if the three parallel queries are near-identical (a degenerate case where the model duplicates rather than parallelizes). For legitimate parallel execution, where each of the three simultaneous calls has a distinct query, the similarity scores will be low and the guard passes. If you notice false positives on intentional parallel calls to the same tool, raise minHighSimilarityPairs from 2 to 3. Three high-similarity pairs cannot occur unless at least three of the last four calls each closely match another call in the window, whereas two pairs can come from two separate pairs of look-alike calls.

The depth guard requires passing _rg_depth through all sub-workflow inputs. What if I'm calling third-party n8n workflows I don't control?

Third-party sub-workflows won't propagate the depth counter, which means the guard only fires at the boundary of workflows you control. For sub-workflows you don't control, add a wrapper sub-workflow that you do control: your AI Agent calls your wrapper, the wrapper calls the third-party workflow, and the wrapper enforces its own depth counter independently. This gives you a circuit breaker on the outbound call even if the third-party workflow doesn't support the depth propagation protocol. Additionally, monitor third-party sub-workflow execution counts via n8n's built-in execution list — a sudden spike in executions of a particular workflow is a reliable signal of recursion even without depth tracking.

The budget guard uses character count as a token estimate. How accurate is this for non-English content or structured JSON results?

The 4-characters-per-token approximation is calibrated for English prose. For structured JSON (common in HTTP Request tool responses), the ratio is closer to 2.5–3.5 characters per token because JSON keys, punctuation, and numbers are tokenized differently. For non-Latin scripts (Chinese, Arabic, Japanese), the ratio can be as low as 1–2 characters per token. If your workflows process primarily JSON or non-English content, use a smaller divisor: replace Math.ceil(resultStr.length / 4) with Math.ceil(resultStr.length / 2.5) so that token usage is not underestimated. Better still, log estimated vs. actual token counts (from the LLM response's usage field if your chat model exposes it in n8n) for a week to calibrate the divisor for your specific workload.
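The calibration itself is a one-line ratio once you have the logs. A minimal sketch, assuming each logged sample pairs a result's character count with the provider-reported token count for the same text; the function names and the sample values are made up for illustration.

```javascript
// Derive a characters-per-token ratio from logged samples.
// Each sample: { chars: number, actualTokens: number }.
function calibrateCharsPerToken(samples) {
  const usable = samples.filter((s) => s.chars > 0 && s.actualTokens > 0);
  if (usable.length === 0) return 4; // fall back to the English-prose default
  const totalChars = usable.reduce((sum, s) => sum + s.chars, 0);
  const totalTokens = usable.reduce((sum, s) => sum + s.actualTokens, 0);
  return totalChars / totalTokens;
}

// Estimate tokens for a string using the calibrated ratio.
function estimateTokens(text, charsPerToken) {
  return Math.ceil(text.length / charsPerToken);
}

// Made-up samples for a JSON-heavy workload.
const samples = [
  { chars: 3000, actualTokens: 1000 },
  { chars: 6000, actualTokens: 2000 },
];
const ratio = calibrateCharsPerToken(samples);
console.log(ratio);                                   // 3
console.log(estimateTokens('x'.repeat(900), ratio));  // 300
```

Dividing total characters by total tokens weights large results more heavily than averaging per-sample ratios would, which suits a budget guard: the large results are the ones that consume the budget.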

Can I use these guards with n8n's self-hosted community edition, or do they require n8n Cloud?

All four guards use only n8n core features — Code nodes, Execute Workflow nodes, Switch nodes, HTTP Request nodes, and $getWorkflowStaticData — that are fully available in the self-hosted community edition. There are no n8n Cloud-specific features involved. One difference: n8n Cloud automatically limits workflow execution time (default 300 seconds per execution), which acts as a soft safety net. In self-hosted deployments with no execution timeout configured, the guards are more important because a runaway agent has no system-level backstop beyond your LLM provider's rate limit. On self-hosted instances, also configure EXECUTIONS_TIMEOUT in your n8n environment variables as an additional outer boundary.

The static data approach for tracking guard state can get stale if a workflow is updated mid-session. How do I handle this?

When an n8n workflow is updated (saved with changes), active executions continue using the prior workflow version — n8n doesn't interrupt in-flight executions. The static data from the prior execution, however, persists across the update. If you update a workflow while agent sessions are in progress, the guard state (spiral history, token budget, failure counts) remains valid for those sessions. The session-ID-based state partitioning in the orchestrator code handles this correctly: each $execution.id gets its own state bucket, so a mid-session workflow update that restarts some executions simply creates new state buckets. The only edge case is if you change the THRESHOLDS values mid-session — those take effect immediately on the next Code node execution within existing sessions, since they're read at runtime, not cached in static data.

Stop runaway n8n agent workflows before the bill lands

RunGuard wraps all four n8n agent guards — spiral detection, sub-workflow recursion limiting, token budget enforcement, and HTTP failure cascade prevention — as a managed HTTP endpoint. Replace four Code nodes with one HTTP Request node and get a persistent 30-day trip dashboard with Slack alerts included.
