CrewAI budget per agent: why a shared token pool is not enough for multi-role crews
CrewAI makes it easy to define a crew of specialized agents — a researcher, a writer, a reviewer — each with its own role, goal, and backstory. What CrewAI doesn’t give you is a per-agent budget. All roles draw from the same underlying LLM client. If your researcher agent enters an information-gathering loop — repeatedly calling search tools or web-scraping tools because results are never quite sufficient — it can consume 80% of your session’s token budget before the writer agent starts its first task. This page shows why per-role cost caps matter and how to add them to a CrewAI crew using RunGuard.
The shared-pool problem in CrewAI
- How CrewAI delegates to agents. In a typical CrewAI sequential or hierarchical crew, the crew manager delegates tasks to agents by role. Each agent uses the LLM client configured on the `Agent` object (or the crew-level fallback). There is no built-in tracking of how many tokens or dollars each agent has consumed within the current crew run. The only budget control available is the `max_iter` setting on each agent (default 25), which limits the number of iterations per task, not the dollar spend.
- Why max_iter is not a cost cap. `max_iter` counts the number of times an agent loops through its ReAct cycle (think, act, observe). A single iteration that calls a high-context tool (fetching a large web page, summarizing a long document) can cost $0.10–$0.50 at GPT-4o pricing. At the default `max_iter=25`, a single runaway researcher agent can spend $2.50–$12.50 before CrewAI stops it. If you have three agents in the crew and they all hit their iteration limits, the worst case triples to $7.50–$37.50.
- The delegation-loop variant. In hierarchical crews, the manager agent can re-delegate tasks to subordinate agents if the output is judged insufficient. A manager that repeatedly re-delegates the same task because the subtask completion criteria are vague generates a delegation loop: manager → researcher → manager → researcher →… CrewAI’s built-in `max_iter` on the manager does not detect repeated delegation of the same task; it just counts total iterations.
- The inter-agent context explosion. As a hierarchical crew runs, the manager agent’s context grows with every delegation result. By the fifth or sixth delegation, the manager is sending hundreds of tokens of prior results in every call, multiplying the per-call cost significantly. The researcher agent, meanwhile, is receiving a progressively longer task description with each re-delegation. Without an explicit per-agent dollar cap, both the manager and the researcher can consume far more than their “fair share” of the session budget.
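The iteration and context-growth arithmetic above can be sketched in a few lines. This is only an illustration: the function names are hypothetical, the per-iteration costs are the $0.10–$0.50 range quoted above rather than measurements, and the 300-token result size is an assumed value chosen to show the shape of the growth.

```python
def worst_case_spend(max_iter: int, cost_per_iter_usd: float) -> float:
    """Upper bound on one agent's spend if every iteration costs the given amount."""
    return max_iter * cost_per_iter_usd


def cumulative_context_tokens(result_tokens: int, delegations: int) -> int:
    """Total prior-result tokens re-sent when each call carries all earlier results.

    Call k (1-based) re-sends the k - 1 earlier results, so the total grows
    quadratically with the number of delegations.
    """
    return sum(result_tokens * (k - 1) for k in range(1, delegations + 1))


low = worst_case_spend(25, 0.10)
high = worst_case_spend(25, 0.50)
print(f"one agent at max_iter=25: ${low:.2f} to ${high:.2f}")
print(f"three agents at the limit: ${3 * low:.2f} to ${3 * high:.2f}")

# Assumed 300-token results across six delegations.
print(cumulative_context_tokens(result_tokens=300, delegations=6))
```

The second function is the reason a per-call view understates the problem: each re-delegation pays again for everything that came before it.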
Adding per-agent budget caps with RunGuard
- The integration point: custom LLM per agent. CrewAI’s `Agent` class accepts an `llm` parameter that overrides the crew-level LLM. RunGuard wraps the underlying LLM call and maintains a per-instance budget counter. By passing a different RunGuard-wrapped LLM to each agent, you get independent budget tracking per role.
- Python: per-role LLMs with independent budgets.

      from crewai import Agent, Task, Crew, Process
      from runguard.langchain import GuardedChatOpenAI

      # Each agent gets its own guarded LLM with an independent budget
      researcher_llm = GuardedChatOpenAI(
          model="gpt-4o",
          budget={"max_usd": 0.50},  # researcher gets 50 cents
          loop={"repeats": 3, "window": 5},
          on_budget_exceeded="raise",
      )
      writer_llm = GuardedChatOpenAI(
          model="gpt-4o",
          budget={"max_usd": 0.30},  # writer gets 30 cents
          loop={"repeats": 3, "window": 5},
          on_budget_exceeded="raise",
      )
      reviewer_llm = GuardedChatOpenAI(
          model="gpt-4o",
          budget={"max_usd": 0.20},  # reviewer gets 20 cents
          loop={"repeats": 2, "window": 4},
          on_budget_exceeded="raise",
      )

      researcher = Agent(
          role="Senior Research Analyst",
          goal="Find accurate, cited information on the topic",
          backstory="You are a meticulous researcher who verifies every fact.",
          llm=researcher_llm,
          max_iter=10,  # backstop
      )
      writer = Agent(
          role="Content Writer",
          goal="Write a clear, engaging article from the research",
          backstory="You are a professional writer who turns research into readable content.",
          llm=writer_llm,
          max_iter=5,
      )
      reviewer = Agent(
          role="Quality Reviewer",
          goal="Check the article for accuracy and completeness",
          backstory="You are an editor who ensures factual accuracy and consistent style.",
          llm=reviewer_llm,
          max_iter=3,
      )

      # research_task, write_task and review_task are Task objects defined elsewhere
      crew = Crew(
          agents=[researcher, writer, reviewer],
          tasks=[research_task, write_task, review_task],
          process=Process.sequential,
          verbose=True,
      )

      try:
          result = crew.kickoff()
          print("Crew completed:", result)
      except Exception as e:
          # RunGuard BudgetExceededError or LoopDetectedError surfaces here
          print(f"Crew halted: {e}")

- What the budget covers. Each `GuardedChatOpenAI` instance counts the USD cost of every LLM call made by the agent that holds that instance. Tool calls are not counted (they don’t hit the LLM), but the model calls that process tool results are counted. The budget is a spend limit, not a token limit, so it automatically accounts for cost differences between model tiers and prompt sizes.
- Reset between crew runs. A budget counter belongs to its `GuardedChatOpenAI` instance and starts from zero only when a new instance is created. For multi-run crews (e.g., a crew that runs once per hour), create new instances before each kickoff or call `researcher_llm.reset_budget()` between runs to avoid carry-over from the previous run.
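To make the idea of a per-instance spend counter concrete, here is a minimal standalone sketch. It is not RunGuard’s implementation: `BudgetTracker`, `BudgetExceeded`, `record` and `reset` are hypothetical names, and the per-million-token prices are assumed values that you should replace with your model’s actual rates.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised when a tracker's accumulated spend passes its cap."""


@dataclass
class BudgetTracker:
    max_usd: float
    # Assumed prices in USD per 1M tokens; substitute your model's real rates.
    input_usd_per_mtok: float = 2.50
    output_usd_per_mtok: float = 10.00
    spent_usd: float = 0.0

    def record(self, prompt_tokens: int, completion_tokens: int) -> float:
        """Add one LLM call's cost and raise if the cap is now exceeded."""
        cost = (
            prompt_tokens * self.input_usd_per_mtok
            + completion_tokens * self.output_usd_per_mtok
        ) / 1_000_000
        self.spent_usd += cost
        if self.spent_usd > self.max_usd:
            raise BudgetExceeded(
                f"spent ${self.spent_usd:.4f} of ${self.max_usd:.2f}"
            )
        return cost

    def reset(self) -> None:
        """Clear the counter, e.g. between scheduled crew runs."""
        self.spent_usd = 0.0


# One tracker per role: spend recorded for one never touches the other.
researcher = BudgetTracker(max_usd=0.50)
writer = BudgetTracker(max_usd=0.30)

researcher.record(prompt_tokens=40_000, completion_tokens=2_000)
print(f"researcher: ${researcher.spent_usd:.2f}, writer: ${writer.spent_usd:.2f}")
```

Because the counter is expressed in dollars rather than tokens, the same cap holds if a role is moved to a cheaper or more expensive model; only the two price fields change.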
Detecting delegation loops in hierarchical crews
- The signature of a delegation loop. In a hierarchical crew, the manager LLM produces `Action: Delegate work to coworker` steps in its ReAct output. A delegation loop is detectable as repeated delegation to the same agent for the same task within the same crew run, typically identifiable by matching the delegate name and the first N words of the task description.
- Configuring loop detection on the manager LLM. Apply RunGuard’s loop detection to the manager agent’s LLM with a signature function that extracts the delegation target:

      import re

      from runguard.langchain import GuardedChatOpenAI

      def extract_delegation_target(response_text: str) -> str:
          """Use the text after 'Delegate work to coworker' as the loop signature."""
          m = re.search(r'Delegate work to coworker[:\s]+([^\n]+)', response_text, re.IGNORECASE)
          if m:
              # Normalize and truncate so minor wording changes still match
              return f"delegate:{m.group(1).strip().lower()[:40]}"
          return "non-delegate"

      manager_llm = GuardedChatOpenAI(
          model="gpt-4o",
          budget={"max_usd": 0.80},
          loop={"repeats": 2, "window": 6, "sig_fn": extract_delegation_target},
          on_loop_detected="raise",
      )

- What this catches. The `sig_fn` maps each manager response to a delegation target string. If the same target appears 3 times within the last 6 manager responses, RunGuard raises `LoopDetectedError`. This fires before the manager’s own `max_iter` limit, giving you a meaningful error message (the repeated delegation target) rather than a generic timeout.
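The sliding-window check described above can be sketched independently of any library. This is an assumption-laden illustration, not RunGuard’s code: `LoopDetector`, `LoopDetected`, `threshold` and `ignore` are hypothetical names, and here `threshold` means the total number of occurrences inside the window that triggers a halt.

```python
from collections import deque


class LoopDetected(Exception):
    """Raised when one signature recurs too often inside the window."""


class LoopDetector:
    def __init__(self, threshold: int, window: int, ignore=("non-delegate",)):
        self.threshold = threshold
        self.ignore = set(ignore)
        # A bounded deque drops the oldest signature once the window is full.
        self.recent = deque(maxlen=window)

    def observe(self, signature: str) -> None:
        """Record one response signature and raise if it has looped."""
        self.recent.append(signature)
        if signature in self.ignore:
            return
        count = self.recent.count(signature)
        if count >= self.threshold:
            raise LoopDetected(
                f"{signature!r} seen {count} times in the last "
                f"{len(self.recent)} responses"
            )


detector = LoopDetector(threshold=3, window=6)
halted_on = None
signatures = [
    "delegate:researcher",
    "non-delegate",
    "delegate:researcher",
    "delegate:writer",
    "delegate:researcher",  # third occurrence inside the window
    "delegate:writer",
]
for sig in signatures:
    try:
        detector.observe(sig)
    except LoopDetected as exc:
        halted_on = sig
        print(f"halted: {exc}")
        break
```

Skipping the `"non-delegate"` signature is a design choice in this sketch: a manager that simply reasons for several turns without delegating should not be mistaken for a delegation loop.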
CrewAI built-in controls vs. RunGuard
| Control | CrewAI built-in | RunGuard |
|---|---|---|
| Per-agent cost cap | Not supported | budget: max_usd per GuardedChatOpenAI instance |
| Loop detection | Not supported | loop: repeats + window catches tool and delegation loops |
| Max iterations | max_iter (default 25) — counts ReAct steps, not cost | Not needed (cost guard fires first in most cases) |
| Delegation loop detection | Not supported | sig_fn extracts delegation target for manager loop detection |
| Alert on budget exceeded | Not supported | alerts: slack_webhook or pagerduty_key |
| Partial output on halt | Not supported — task output is None if agent hits max_iter | BudgetExceededError exposes accumulated context and partial result |
Tuning per-agent budgets for your crew
- Profile before capping. Before setting per-agent limits, run your crew in verbose mode on 5–10 representative inputs and log the per-agent cost with RunGuard’s `on_budget_exceeded="log"` mode. This gives you the median and 90th-percentile cost for each role. Start each cap at 2–3x the median, which leaves headroom for harder inputs.
- Set asymmetric limits. Research agents typically need more budget than writer or reviewer agents because they make more LLM calls to process tool results. A common initial allocation: researcher 50–60% of total session budget, writer 25–30%, reviewer 15–20%. Adjust based on your profiling data.
- Add a crew-level backstop. Even with per-agent limits, add a crew-level timeout or total-cost cap at the application layer. If one agent raises `BudgetExceededError` and your crew exception handler restarts it with a reduced task, the restarted agent has a fresh budget counter and can spend its full allowance again. The crew-level backstop catches the total run cost regardless of restarts.
- Use output caching for researcher agents. CrewAI supports tool output caching via a `cache_function` attribute on the tool. Enable caching for expensive search or fetch tools used by the researcher agent: repeated tool calls with the same inputs return cached results without re-running the tool, which reduces the cost of each repeated call and the likelihood of information-gathering loops.
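The profiling and backstop steps above can be sketched as follows. Everything here is illustrative: the cost figures are made-up placeholder numbers rather than measurements, and `suggest_cap` and `CrewBudget` are hypothetical helpers that belong to your application layer, not to CrewAI or RunGuard.

```python
import statistics


def suggest_cap(costs_usd: list[float], headroom: float = 2.0) -> float:
    """Starting cap: headroom times the median of observed per-run costs."""
    return round(headroom * statistics.median(costs_usd), 2)


class CrewBudget:
    """Run-wide spend total that survives individual agent restarts."""

    def __init__(self, max_usd: float):
        self.max_usd = max_usd
        self.spent_usd = 0.0

    def charge(self, usd: float) -> None:
        """Add spend from any agent and raise once the run-wide cap is passed."""
        self.spent_usd += usd
        if self.spent_usd > self.max_usd:
            raise RuntimeError(
                f"crew spent ${self.spent_usd:.2f} of ${self.max_usd:.2f}"
            )


# Placeholder per-run costs (USD) per role, as if logged during profiling.
profiled = {
    "researcher": [0.18, 0.22, 0.20, 0.35, 0.19, 0.21, 0.24],
    "writer": [0.08, 0.10, 0.09, 0.12, 0.11],
    "reviewer": [0.04, 0.05, 0.06, 0.05, 0.07],
}

caps = {role: suggest_cap(costs) for role, costs in profiled.items()}
for role, cap in caps.items():
    print(f"{role}: cap ${cap:.2f}")

# The backstop is charged by every agent, including restarted ones,
# so a restart cannot push the run past the overall limit unnoticed.
crew_budget = CrewBudget(max_usd=sum(caps.values()))
```

With only 5–10 profiling runs, a high percentile is a noisy estimate, which is one reason this sketch derives the cap from the median and a headroom factor instead.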