SuperAGI Cost Control: Recursive Task Spawning, Tool Retry Loops, and Parallel Agent Budget Runaway
SuperAGI is an open-source autonomous agent framework designed to run long-horizon goal-directed agents that break complex objectives into task queues, execute tools at each step, and optionally spawn parallel sub-agents to accelerate throughput. Unlike frameworks that require you to script the agent's loop yourself — CrewAI, AutoGen — SuperAGI handles task decomposition, memory management, and multi-agent coordination as first-class features. This makes it fast to deploy an agent that "just runs" toward a goal. It also makes it easy to deploy an agent that "just runs" indefinitely at your expense.
The core execution model: you define an agent with a goal, a set of tools, and a model (GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro via SuperAGI's model abstraction). SuperAGI's task controller calls the LLM to produce a task list from the goal, then works through that list step by step — calling the LLM again at each step to pick the right tool, calling the tool, evaluating the result, and updating the task list if needed. Every LLM call in this loop bills at the configured model's token rates. The task list is not fixed at start-up: the agent can extend, reorder, or spawn child tasks based on what it encounters. That dynamic task list is where the four main cost failure modes originate.
The four patterns that account for the majority of unexpected SuperAGI LLM bills:
- Recursive goal decomposition: when an agent's goal is ambiguous or very broad, SuperAGI's task planner generates a large initial task list and then generates subtask lists for each item; tasks that require research may spawn additional sub-searches; a goal like "analyze our market position and produce a strategy document" can produce 30–50 task-planning LLM calls before a single tool executes.
- Tool retry spiral: an agent whose chosen tool returns an error or a low-confidence result will re-plan and retry, often choosing the same tool with slightly different arguments; a broken API endpoint or misconfigured credential can produce a loop that consumes the full iteration_limit, with each iteration paying a planning LLM call plus the (failing) tool call.
- Parallel agent multiplication: SuperAGI supports spawning multiple agents to work on subtasks in parallel; an orchestrator agent that decomposes a goal into N subtasks and spawns N specialist agents multiplies the per-task LLM cost N-fold; if any specialist agent itself loops, the cost multiplies further.
- Resource search reformulation loop: SuperAGI's resource manager lets agents query a vector store for context before deciding which tool to use; when the agent's goal requires knowledge that is not in the resource store, it reformulates the search query repeatedly before falling back to web search, paying an embedding call plus a planning LLM call per reformulation attempt.
SuperAGI's cost model
SuperAGI itself is open-source and free to self-host. The LLM costs come entirely from the model provider your deployment is configured to use. In the SuperAGI Cloud hosted product, LLM costs are bundled into a platform subscription; in self-hosted deployments, costs flow directly to your OpenAI, Anthropic, or Google API key.
Key cost multipliers in the SuperAGI execution model:
- Task planning calls: each time SuperAGI's task controller evaluates what to do next, it makes an LLM call with the full agent context (goal, completed tasks, current task, available tools, memory excerpts). A 10-task agent makes at minimum 10 planning calls, plus additional calls for any task list updates.
- Tool selection calls: separate from task planning, SuperAGI makes a dedicated LLM call to select which tool to invoke for the current task. This is a second LLM call per task step on top of the planning call in many agent configurations.
- Context accumulation: the agent's context grows with each completed task as results are appended to the conversation history. By task 15, a planning call that cost $0.01 at task 1 may cost $0.04 because the prompt is larger, even with a fixed number of active tasks remaining.
- Memory retrieval: SuperAGI queries its vector store at each step to inject relevant memories into the agent's context. Each query pays an embedding model call. High-frequency agents make hundreds of embedding calls per session.
- Parallel agent overhead: each spawned agent maintains its own context and makes its own planning + tool-selection calls independently of sibling agents. There is no shared context between parallel agents by default.
A practical cost baseline: a SuperAGI agent completing a 15-task research goal with GPT-4o (at ~$0.01 per planning call, growing to ~$0.03 by late tasks) runs roughly $0.25–$0.60 in LLM costs for a successful, non-looping run. The same agent in a retry spiral on a broken tool can consume $5–$15 for the same goal before hitting the iteration limit — a 10–25× overrun on a task that never completes.
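The baseline arithmetic can be reproduced with a small model. This is a minimal sketch, assuming per-call cost grows linearly from the first task to the last and that a tool-selection call costs about the same as a planning call; the function name and both assumptions are illustrative, not part of SuperAGI.

```python
def estimate_run_cost(
    num_tasks: int,
    first_call_usd: float = 0.01,
    last_call_usd: float = 0.03,
    calls_per_task: int = 1,
) -> float:
    """Sum per-call costs that grow linearly from first_call_usd to last_call_usd."""
    if num_tasks <= 0:
        return 0.0
    if num_tasks == 1:
        return first_call_usd * calls_per_task
    # Linear growth is an assumption; real growth depends on tool output sizes.
    step = (last_call_usd - first_call_usd) / (num_tasks - 1)
    total = sum(first_call_usd + i * step for i in range(num_tasks))
    return total * calls_per_task


# 15 tasks, planning calls only, then planning plus tool-selection calls
baseline = estimate_run_cost(15)
with_tool_selection = estimate_run_cost(15, calls_per_task=2)
print(f"planning only: ${baseline:.2f}, with tool selection: ${with_tool_selection:.2f}")
```

Under these assumptions the two figures come out at about $0.30 and $0.60, which sits inside the range quoted above.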
Failure mode 1: recursive goal decomposition
SuperAGI's task controller is designed to be adaptive. When it encounters a task that is too broad to execute with a single tool call, it can decompose that task into subtasks and insert them into the queue. This is the feature that makes SuperAGI powerful for long-horizon goals — and it is the first mechanism for runaway cost accumulation.
The decomposition risk scales with goal ambiguity. A goal like "research competitor pricing and write a comparison table" is well-scoped: the agent produces a short task list (search competitors, extract pricing, format table) and completes it. A goal like "assess our go-to-market strategy and identify growth opportunities" is open-ended: the agent may produce 10 top-level tasks, then decompose "analyze competitive landscape" into 6 subtasks, then decompose "analyze customer acquisition channels" into 8 subtasks. Each decomposition node pays a planning LLM call. A three-level decomposition tree with branching factor 6 produces up to 43 LLM planning calls before any tool runs.
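The 43-call figure is the node count of a full tree: one root planning call, 6 first-level nodes, and 36 second-level nodes. A short sketch of that count, assuming every node decomposes into the same number of children (real task trees are usually uneven, so treat this as an upper bound); the function name is illustrative.

```python
def planning_calls_for_tree(branching_factor: int, levels: int) -> int:
    """Count nodes in a full decomposition tree; each node costs one planning call."""
    return sum(branching_factor ** depth for depth in range(levels))


# Three levels with branching factor 6: 1 + 6 + 36
three_level = planning_calls_for_tree(6, 3)
# One more level of decomposition adds 216 nodes
four_level = planning_calls_for_tree(6, 4)
print(three_level, four_level)
```

The jump from three levels to four shows why a depth cap matters more than a cap on the width of any single task list.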
The max_iterations parameter in SuperAGI's agent configuration limits decomposition only indirectly: it bounds total steps, not decomposition depth. An agent with max_iterations=50 can spend all 50 iterations on task planning without executing a single useful tool call if the goal generates deep decomposition trees at each planning step.
import json
import re

import anthropic
from runguard import BudgetTracker, BudgetExceededError


def estimate_superagi_goal_complexity(
    goal: str,
    model: str = "claude-haiku-4-5-20251001",
) -> dict:
    """
    Asks a fast, cheap model to estimate task decomposition depth and
    planning call count before launching a SuperAGI agent with an expensive model.
    Returns estimated planning calls and a complexity label.
    """
    # Conservative fallback used when the estimator's reply cannot be parsed
    fallback = {"total_planning_calls": 99, "complexity": "high"}
    client = anthropic.Anthropic()
    prompt = (
        "You are a SuperAGI task planner estimator. "
        "Given the following agent goal, estimate: "
        "(1) how many top-level tasks would be generated, "
        "(2) how many of those tasks would decompose into subtasks, "
        "(3) the expected total task-planning LLM calls (top-level + subtask nodes). "
        "Respond as JSON: {\"top_level_tasks\": N, \"decomposition_nodes\": N, \"total_planning_calls\": N, \"complexity\": \"low|medium|high\"}. "
        f"Goal: {goal}"
    )
    msg = client.messages.create(
        model=model,
        max_tokens=256,
        messages=[{"role": "user", "content": prompt}],
    )
    raw = msg.content[0].text
    # Pull the JSON object out of the reply, ignoring any surrounding text
    match = re.search(r'\{.*\}', raw, re.DOTALL)
    if not match:
        return fallback
    try:
        estimate = json.loads(match.group())
    except json.JSONDecodeError:
        return fallback
    # Treat a reply that lacks the expected keys as unparseable
    if "total_planning_calls" not in estimate or "complexity" not in estimate:
        return fallback
    return estimate


def launch_superagi_agent_with_guard(
    goal: str,
    agent_config: dict,
    max_planning_calls: int = 25,
    session_budget_usd: float = 5.0,
    cost_per_planning_call_usd: float = 0.02,
) -> None:
    """
    Pre-checks goal complexity and budget before starting a SuperAGI agent run.
    Raises RuntimeError if complexity or estimated cost exceeds thresholds.
    """
    budget = BudgetTracker(cap=session_budget_usd)
    complexity = estimate_superagi_goal_complexity(goal)
    if complexity["total_planning_calls"] > max_planning_calls:
        raise RuntimeError(
            f"Goal blocked: estimated {complexity['total_planning_calls']} "
            f"planning calls exceeds max_planning_calls={max_planning_calls}. "
            f"Complexity: {complexity['complexity']}. Narrow the goal scope."
        )
    estimated_cost = complexity["total_planning_calls"] * cost_per_planning_call_usd
    try:
        budget.add(estimated_cost)
    except BudgetExceededError as e:
        raise RuntimeError(
            f"Goal blocked: estimated cost ${estimated_cost:.2f} "
            f"would exceed session budget ${session_budget_usd:.2f}."
        ) from e
    # Proceed with agent launch using your SuperAGI SDK/API integration
    print(
        f"Goal complexity check passed: "
        f"{complexity['total_planning_calls']} estimated planning calls, "
        f"~${estimated_cost:.2f}. Launching agent."
    )
The pre-check uses a fast, inexpensive model (Claude Haiku 4.5 here — at sub-cent costs) to estimate planning depth before committing the production model's budget. For SuperAGI deployments where agents run on GPT-4o or Claude 3.7 Sonnet, the planning-call cost differential between the estimation model and the execution model is 10–50×, so even a rough estimate is cost-justified. A goal that estimates 40 planning calls on Haiku should be decomposed into smaller sub-goals before being handed to the full agent.
The cost_per_planning_call_usd parameter needs calibration per deployment. Measure 10 planning calls with your actual model and context size and use the observed average. Planning calls grow in cost as context accumulates — the estimate is a lower bound for the full run.
Failure mode 2: tool retry spiral
SuperAGI agents re-plan after every tool result. When a tool returns an error — a 401 Unauthorized from an expired API key, a 503 from a temporarily unavailable service, a JSON parse failure from an LLM tool that returned malformed output — the agent's next planning call sees the error in its context and decides what to do. In most cases, the planning LLM concludes that it should retry the same tool with the same or slightly modified arguments. This is the tool retry spiral.
A concrete case: a SuperAGI agent configured with a GitHub search tool hits a rate limit (HTTP 429). The agent's planning LLM sees "GitHub search failed: rate limit exceeded" and decides to try GitHub search again — because there is no instruction in the system prompt to back off on rate limit errors, and the tool is the only available option for the current task. The retry hits the same 429. The agent retries again. At iteration_limit=20 with 2 LLM calls per iteration (planning + tool selection), this spiral costs 40 LLM calls for a task that could have been resolved by pausing 60 seconds.
The retry spiral is especially expensive because context grows with each failed attempt. Each tool error is appended to the conversation history, so by iteration 15, the planning call is paying for 14 error messages worth of tokens in the context window. The 15th planning call costs significantly more than the first — for an error that has not changed.
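The growth is easy to quantify. The sketch below assumes a fixed base prompt and a fixed number of tokens added per recorded error; both token counts are hypothetical placeholders, so substitute figures measured from your own logs.

```python
def spiral_prompt_tokens(base_tokens: int, error_tokens: int, iterations: int) -> int:
    """Total prompt tokens across a retry spiral where each failure adds error_tokens."""
    # Planning call k (0-indexed) carries the k error messages appended before it.
    return sum(base_tokens + k * error_tokens for k in range(iterations))


BASE_TOKENS = 2000   # hypothetical prompt size before any failure
ERROR_TOKENS = 150   # hypothetical size of one appended error message
ITERATIONS = 20

flat_total = BASE_TOKENS * ITERATIONS
spiral_total = spiral_prompt_tokens(BASE_TOKENS, ERROR_TOKENS, ITERATIONS)
# Prompt size of the 15th planning call, which carries 14 error messages
fifteenth_call = BASE_TOKENS + 14 * ERROR_TOKENS
print(flat_total, spiral_total, fifteenth_call)
```

Because the per-call overhead grows linearly, the total grows quadratically with the number of failed iterations, which is why cutting the spiral at 3 failures saves far more than 3/20 of the cost.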
import functools
import time

from runguard import LoopDetector, LoopDetectedError


def superagi_tool_with_retry_guard(
    tool_fn,
    max_consecutive_errors: int = 3,
    error_window_seconds: int = 120,
    backoff_seconds: float = 30.0,
):
    """
    Wraps a SuperAGI tool function with consecutive-error detection.
    After max_consecutive_errors failures within error_window_seconds,
    raises RuntimeError (chained from LoopDetectedError) to trip the circuit breaker.
    Adds backoff on rate-limit errors to prevent 429 spirals.
    """
    detector = LoopDetector(
        max_consecutive=max_consecutive_errors,
        window_seconds=error_window_seconds,
    )

    @functools.wraps(tool_fn)
    def guarded(*args, **kwargs):
        try:
            result = tool_fn(*args, **kwargs)
            detector.record_success()
            return result
        except Exception as exc:
            error_str = str(exc)
            # Rate limit: back off before recording the error
            if "429" in error_str or "rate limit" in error_str.lower():
                time.sleep(backoff_seconds)
            try:
                detector.record_error(error_str)
            except LoopDetectedError as loop_err:
                raise RuntimeError(
                    f"Tool '{tool_fn.__name__}' tripped circuit breaker: "
                    f"{max_consecutive_errors} consecutive errors. "
                    f"Last error: {error_str}. "
                    f"Agent halted to prevent cost spiral."
                ) from loop_err
            raise  # re-raise original for the agent's planning LLM to see

    return guarded


# Usage: wrap tools before registering with SuperAGI
from superagi.tools.base_tool import BaseTool  # hypothetical import path


def _raw_github_search(query: str) -> str:
    # actual GitHub API call here
    ...


# Wrap once at import time so the error counter persists across calls.
# Wrapping inside _run would build a fresh detector on every call,
# and the breaker would never see more than one consecutive error.
_guarded_github_search = superagi_tool_with_retry_guard(
    _raw_github_search,
    max_consecutive_errors=3,
    backoff_seconds=45.0,
)


class GuardedGitHubSearchTool(BaseTool):
    name = "GitHub Search"
    description = "Search GitHub repositories and issues"

    # The method to override depends on your SuperAGI version's BaseTool
    def _run(self, query: str) -> str:
        return _guarded_github_search(query)
The wrapper intercepts tool errors at the Python level rather than relying on SuperAGI's agent loop to self-limit. This is important because iteration_limit is a blunt instrument — it stops the agent after N total iterations regardless of whether those iterations made progress. The consecutive-error guard stops the agent specifically because a tool is repeatedly failing, which is the pattern that indicates a spiral rather than a slow but progressing agent.
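For rate-limit errors specifically, the wrapper sleeps before recording the error and re-raising, so the agent's next attempt lands after the limit has had time to reset and a temporary 429 is less likely to accumulate into three consecutive failures. A 30–45 second backoff is often enough for short-window rate limits on APIs such as OpenAI, GitHub, and Google to reset; hourly or daily quotas will not recover in that time. If the error persists after backoff, it is likely structural (wrong credentials, exhausted quota) and the consecutive-error count will trip the guard.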
Failure mode 3: parallel agent multiplication
SuperAGI's multi-agent mode lets an orchestrator agent spawn specialist agents to work on subtasks in parallel. This is one of the framework's headline capabilities — it can dramatically reduce wall-clock time for tasks that decompose cleanly into independent workstreams. The cost consequence is equally dramatic: spawning N parallel agents multiplies the LLM cost of any per-task overhead by N, including any loops those agents encounter.
A typical deployment: an orchestrator agent receives a goal to "research three competing products and produce a feature comparison." It spawns three specialist researcher agents, one per competitor, to run in parallel. Each researcher makes 8–12 planning calls plus tool calls. Total LLM cost: 3× the single-researcher cost. If one researcher hits a data source that returns incomplete results and enters a reformulation loop (failure mode 4), it may make 20–30 planning calls while the other two complete normally — but the orchestrator waits for all three before aggregating results, so the loop continues uninterrupted.
The multiplication risk grows with hierarchical delegation. An orchestrator that spawns 3 specialists, where each specialist spawns 2 tool-runner agents, produces a 3-level tree with 6 leaf agents and up to 9 sub-agents (3 specialists plus 6 tool-runners) running simultaneously. Any loop at any leaf propagates cost while the full tree waits.
import threading

from runguard import BudgetTracker, BudgetExceededError


class ParallelAgentGuard:
    """
    Enforces a maximum parallel agent count and a shared budget across
    all concurrently-running SuperAGI sub-agents in a session.
    Thread-safe for use with SuperAGI's concurrent agent execution.
    """

    def __init__(
        self,
        max_parallel: int = 4,
        session_budget_usd: float = 10.0,
        cost_per_agent_planning_call_usd: float = 0.025,
        max_planning_calls_per_agent: int = 20,
    ):
        self.max_parallel = max_parallel
        self.cost_per_planning_call = cost_per_agent_planning_call_usd
        self.max_calls_per_agent = max_planning_calls_per_agent
        self._budget = BudgetTracker(cap=session_budget_usd)
        self._active = 0
        self._lock = threading.Lock()

    def request_agent_slot(self, agent_name: str) -> None:
        """
        Called before spawning each sub-agent.
        Blocks spawn if parallel limit or budget would be exceeded.
        """
        with self._lock:
            if self._active >= self.max_parallel:
                raise RuntimeError(
                    f"Agent spawn blocked: {self._active} agents already running "
                    f"(max_parallel={self.max_parallel}). "
                    f"Wait for active agents to complete before spawning '{agent_name}'."
                )
            # Pre-commit the worst-case cost of the new agent
            estimated_cost = (
                self.max_calls_per_agent * self.cost_per_planning_call
            )
            try:
                self._budget.add(estimated_cost)
            except BudgetExceededError as e:
                raise RuntimeError(
                    f"Agent spawn blocked: estimated cost for '{agent_name}' "
                    f"(${estimated_cost:.2f}) would exceed session budget. "
                    f"Current session spend: ${e.spent:.2f}"
                ) from e
            self._active += 1

    def release_agent_slot(self, agent_name: str) -> None:
        with self._lock:
            self._active = max(0, self._active - 1)


# Usage in orchestrator agent logic
guard = ParallelAgentGuard(max_parallel=3, session_budget_usd=8.0)


def spawn_researcher(competitor: str):
    guard.request_agent_slot(f"researcher-{competitor}")
    try:
        # Launch SuperAGI sub-agent for competitor research.
        # run_superagi_agent is a placeholder for your own launcher;
        # pass any further arguments it needs here.
        run_superagi_agent(goal=f"Research {competitor} product features")
    finally:
        guard.release_agent_slot(f"researcher-{competitor}")
The guard enforces two limits before each spawn: a concurrency ceiling (preventing the tree from growing wider than you can monitor and budget for) and a pre-committed budget check based on the worst-case cost of the new agent. The worst-case estimate — max_planning_calls × cost_per_call — is conservative but ensures the budget check is meaningful. If actual calls are fewer, the unused budget estimate can be released via BudgetTracker.release() when the agent slot is freed.
The max_parallel=3 ceiling in the example is not arbitrary. For most research-style goals in SuperAGI, 3 parallel agents provide 80–90% of the throughput benefit relative to full parallelism, while keeping the worst-case cost overrun bounded. Beyond 4–5 parallel agents, the orchestrator's own planning overhead (waiting for, aggregating, and re-planning around N results) often dominates the wall-clock time gain.
Failure mode 4: resource search reformulation loop
SuperAGI's resource manager allows agents to query a vector store of uploaded documents, prior agent outputs, and knowledge base entries before deciding which tool to use. The intent is to ground agents in relevant context before they act. The cost risk arises when the agent's current task requires knowledge that is not in the resource store — because the agent does not know this. It queries the resource store, receives low-similarity results, and concludes it should try a better search query. After 3–5 reformulations that all return low-similarity results, the agent typically falls back to a web search tool. But those 3–5 reformulations each paid an embedding call plus a planning LLM call.
For agents that frequently encounter out-of-distribution tasks, the reformulation overhead is structural: every task involving a topic not yet in the resource store pays 3–5 extra LLM calls before any real work begins. At high agent throughput — an automated SuperAGI deployment processing 100 requests per day — that overhead adds up to hundreds of wasted LLM calls daily.
from runguard import LoopDetector, LoopDetectedError


class ResourceQueryGuard:
    """
    Tracks consecutive low-similarity resource query results.
    Trips circuit breaker after max_low_similarity_attempts to prevent
    the agent from spinning on reformulations for knowledge it does not have.
    """

    SIMILARITY_THRESHOLD = 0.45  # below this = "not in resource store"

    def __init__(self, max_low_similarity_attempts: int = 3):
        self.detector = LoopDetector(max_consecutive=max_low_similarity_attempts)
        self._reformulation_count = 0

    def check_query_result(
        self,
        query: str,
        top_similarity_score: float,
        task_context: str = "",
    ) -> bool:
        """
        Returns True if agent should proceed with resource results.
        Returns False if agent should skip to web search immediately.
        Raises RuntimeError (chained from LoopDetectedError) if max
        reformulation attempts exceeded.
        """
        if top_similarity_score >= self.SIMILARITY_THRESHOLD:
            self.detector.record_success()
            self._reformulation_count = 0
            return True
        self._reformulation_count += 1
        try:
            self.detector.record_error(
                f"low_similarity:{top_similarity_score:.3f} query='{query[:50]}'"
            )
        except LoopDetectedError as loop_err:
            raise RuntimeError(
                f"Resource search circuit breaker tripped after "
                f"{self._reformulation_count} low-similarity attempts. "
                f"Topic likely absent from resource store. "
                f"Skipping resource search for task: {task_context[:100]}"
            ) from loop_err
        # Signal to skip additional reformulation: go straight to web search
        return False
The similarity threshold of 0.45 is a starting point — calibrate it against your specific embedding model and resource corpus. A well-populated resource store with dense coverage will have most relevant queries returning scores above 0.7; scores consistently below 0.45 reliably indicate a topic gap. After 3 consecutive low-similarity results, the guard raises and signals the agent to proceed with a web search or acknowledge the knowledge gap rather than continuing to reformulate.
Unlike the other three failure modes, resource search reformulation is often invisible in standard SuperAGI logging because the reformulation steps are internal to the resource query subsystem. Monitoring embedding call counts per agent run is the practical way to detect this pattern in production — a spike in embedding calls relative to tool calls is the signature of a reformulation loop.
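A monitoring check along these lines can be sketched as a ratio test over per-run counters. The RunCounters record, the function name, and the 3.0 ratio threshold are all assumptions for illustration; how you collect the counts depends on where you instrument your embedding and tool clients.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RunCounters:
    """Per-run call counts, collected by your own instrumentation."""
    run_id: str
    embedding_calls: int
    tool_calls: int


def flag_reformulation_suspects(
    runs: List[RunCounters],
    max_ratio: float = 3.0,  # assumed threshold; calibrate on healthy runs
) -> List[str]:
    """Return run IDs whose embedding-to-tool-call ratio exceeds max_ratio."""
    suspects = []
    for run in runs:
        # Treat zero tool calls as one to avoid division by zero
        ratio = run.embedding_calls / max(run.tool_calls, 1)
        if ratio > max_ratio:
            suspects.append(run.run_id)
    return suspects


sample_runs = [
    RunCounters("run-a", embedding_calls=12, tool_calls=10),
    RunCounters("run-b", embedding_calls=45, tool_calls=6),
    RunCounters("run-c", embedding_calls=9, tool_calls=0),
]
suspects = flag_reformulation_suspects(sample_runs)
print(suspects)
```

A run with many embedding calls and no tool calls at all (run-c here) is flagged too, since it never progressed past the resource lookup.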
Putting the guards together
The four failure modes in SuperAGI are not mutually exclusive. A broad goal triggers recursive decomposition (mode 1), which produces subtasks that hit a broken API (mode 2), which the orchestrator addresses by spawning an additional specialist agent (mode 3), which searches a resource store for context it does not contain (mode 4). Each mode amplifies the others. The goal complexity pre-check catches mode 1 before the agent runs; the tool wrapper catches mode 2 at the tool boundary; the parallel spawn guard catches mode 3 at the orchestrator level; the resource similarity guard catches mode 4 at the query boundary. Together they form a layered defense for the full SuperAGI execution path.
| Failure mode | Detection point | Guard mechanism | Typical cost without guard |
|---|---|---|---|
| Recursive decomposition: broad goal spawns deep task tree | Before agent launch | Goal complexity pre-check (fast model estimate) | $2–$8 in planning calls before any tool runs |
| Tool retry spiral: broken tool retried to iteration limit | Tool wrapper (per call) | Consecutive error detector + backoff | 10–25× single-task cost overrun |
| Parallel agent multiplication: uncapped parallel spawns multiply loops | Orchestrator (before spawn) | Concurrency ceiling + pre-committed budget | N× overrun where N = parallel agent count |
| Resource reformulation loop: topic-absent queries reformulated repeatedly | Resource query wrapper | Similarity threshold + consecutive low-score detector | 3–8 extra LLM calls per topic-absent task |
For teams building with SuperAGI in production, the highest-priority guard is the goal complexity pre-check (mode 1): it costs a single call to a cheap model, which is negligible relative to a full agent run, and it prevents the most common cause of "why did this agent make 50 LLM calls and produce nothing." The second priority is the tool retry guard (mode 2), which operates at the tool boundary and catches the error patterns that SuperAGI's own iteration limit does not distinguish from forward-progress iterations. Modes 3 and 4 apply specifically to deployments using multi-agent and resource store features and add minimal overhead when those features are in use.
Teams using OpenAI Agents SDK or LangGraph alongside SuperAGI for different agent roles face the same four failure modes in their respective frameworks — the pattern of recursive decomposition, retry spirals, parallel multiplication, and knowledge-gap loops is not SuperAGI-specific, it is structural to any autonomous agent framework that gives the agent control over its own task list.
Frequently asked questions
What is SuperAGI's default iteration limit and is it enough to prevent cost overruns?
SuperAGI's default iteration_limit is typically 25 in self-hosted deployments, though this is configurable. The limit is a blunt instrument: it stops the agent after N total iterations regardless of whether those iterations made progress or were spent retrying a broken tool. For a 10-task agent with 2 LLM calls per iteration, 25 iterations produces up to 50 LLM calls — and a tool retry spiral can consume all 25 iterations without completing any task. The iteration limit prevents runaway agents from running forever, but it does not distinguish a productive agent from a stuck one. Circuit breaker guards that detect consecutive errors or low-similarity results provide finer control at lower cost than relying on the iteration limit alone.
Does SuperAGI Cloud provide any built-in cost controls?
SuperAGI Cloud includes a credit-based usage system with configurable credit limits per agent and per workspace. When credits are exhausted, agent runs stop. However, this is a post-hoc cap rather than a per-run circuit breaker: the credits are consumed as the agent runs, and the run stops when the credit ceiling is hit rather than when a loop is detected. If the iteration limit is 25 and the credit limit cuts the agent off at iteration 18, the agent terminates mid-task with incomplete results. Circuit breaker guards that halt the agent specifically on loop-indicating patterns — consecutive errors, low-similarity resource queries, exploding task lists — provide a more precise stopping condition that preserves completed results rather than truncating mid-run.
How do I measure actual planning call costs in a SuperAGI deployment?
In self-hosted deployments, SuperAGI logs LLM calls through its model abstraction layer. Enable debug logging (LOG_LEVEL=DEBUG) to capture each call's prompt token count, completion token count, and model ID. Multiply by the model's token rates to compute per-call costs. For production monitoring, instrument the LLM client directly — wrap the create_chat_completion call in your model provider client to log token counts to a time-series store. After 50–100 agent runs, you can calculate the p50 and p95 planning call costs per task type, which gives you the calibration data needed for the goal complexity pre-check thresholds.
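The calculation itself is short once token counts are logged. This sketch assumes rates are quoted in USD per million tokens and uses placeholder rate values; both function names are illustrative, and the sample data is synthetic rather than measured.

```python
import statistics
from typing import List, Tuple


def call_cost_usd(
    prompt_tokens: int,
    completion_tokens: int,
    input_rate_per_mtok: float,
    output_rate_per_mtok: float,
) -> float:
    """Cost of one LLM call given token counts and per-million-token rates."""
    return (
        prompt_tokens * input_rate_per_mtok
        + completion_tokens * output_rate_per_mtok
    ) / 1_000_000


def p50_p95(costs: List[float]) -> Tuple[float, float]:
    """Median and 95th percentile of per-call costs (needs at least 2 samples)."""
    cuts = statistics.quantiles(costs, n=100, method="inclusive")
    return cuts[49], cuts[94]


# Placeholder rates; look up the current prices for your configured model.
INPUT_RATE = 2.5
OUTPUT_RATE = 10.0

# Synthetic sample: prompts growing by 200 tokens per call
sample_costs = [call_cost_usd(2000 + 200 * i, 300, INPUT_RATE, OUTPUT_RATE) for i in range(20)]
p50, p95 = p50_p95(sample_costs)
print(f"p50=${p50:.4f} p95=${p95:.4f}")
```

Feeding the p95 figure rather than the mean into cost_per_planning_call_usd makes the pre-check more conservative, which suits a guard whose job is to block expensive runs.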
Does the parallel agent guard work with SuperAGI's async agent execution?
The ParallelAgentGuard example uses a threading lock for thread-safety, which works with both synchronous and async SuperAGI deployments that use threads for concurrency. For async-native deployments using Python's asyncio, replace threading.Lock() with asyncio.Lock() and make request_agent_slot an async def method that uses async with self._lock. The logic is otherwise identical. The key invariant is that the budget pre-commitment happens before the spawn and cannot be interleaved with concurrent spawn requests — the lock enforces that regardless of whether the concurrency model is threads or coroutines.
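A minimal asyncio variant might look like the following. To keep the sketch self-contained it tracks committed budget with a plain float instead of runguard's BudgetTracker, and the class name and worst_case_cost_per_agent_usd parameter are illustrative assumptions.

```python
import asyncio


class AsyncParallelAgentGuard:
    """Coroutine-safe concurrency ceiling plus pre-committed budget check."""

    def __init__(
        self,
        max_parallel: int = 4,
        session_budget_usd: float = 10.0,
        worst_case_cost_per_agent_usd: float = 0.5,
    ):
        self.max_parallel = max_parallel
        self.session_budget_usd = session_budget_usd
        self.worst_case_cost = worst_case_cost_per_agent_usd
        self._committed_usd = 0.0  # stand-in for a shared budget tracker
        self._active = 0
        self._lock = asyncio.Lock()

    async def request_agent_slot(self, agent_name: str) -> None:
        # The lock keeps the check and the commit from interleaving
        # with another coroutine's spawn request.
        async with self._lock:
            if self._active >= self.max_parallel:
                raise RuntimeError(
                    f"Agent spawn blocked for '{agent_name}': "
                    f"{self._active} agents already running."
                )
            if self._committed_usd + self.worst_case_cost > self.session_budget_usd:
                raise RuntimeError(
                    f"Agent spawn blocked for '{agent_name}': "
                    f"session budget would be exceeded."
                )
            self._committed_usd += self.worst_case_cost
            self._active += 1

    async def release_agent_slot(self, agent_name: str) -> None:
        async with self._lock:
            self._active = max(0, self._active - 1)


async def demo():
    # Build the guard inside the running loop so the lock binds to it
    guard = AsyncParallelAgentGuard(max_parallel=2)
    await guard.request_agent_slot("researcher-a")
    await guard.request_agent_slot("researcher-b")
    try:
        await guard.request_agent_slot("researcher-c")
        blocked = ""
    except RuntimeError as exc:
        blocked = str(exc)
    await guard.release_agent_slot("researcher-a")
    return blocked, guard._active


blocked_message, still_active = asyncio.run(demo())
print(blocked_message, still_active)
```

The guard is constructed inside the coroutine rather than at module level because older Python versions bind an asyncio.Lock to the event loop that exists when the lock is created.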
What is the relationship between SuperAGI and AutoGPT or BabyAGI cost patterns?
SuperAGI, AutoGPT, and BabyAGI all share the same underlying architecture: an LLM is called repeatedly to plan, execute, and revise a task list toward a goal. The cost failure modes are structurally identical across all three: recursive task decomposition, tool retry spirals, and unbounded iteration. SuperAGI adds multi-agent parallelism and a resource store — failure modes 3 and 4 — that AutoGPT and BabyAGI do not expose as first-class features. The guards in this post apply equally to any agent framework in the "LLM plans its own next action" family; see the AutoGen cost control guide and CrewAI cost control guide for framework-specific patterns.