CrewAI budget per agent: why a shared token pool is not enough for multi-role crews

CrewAI makes it easy to define a crew of specialized agents — a researcher, a writer, a reviewer — each with its own role, goal, and backstory. What CrewAI doesn’t give you is a per-agent budget. All roles draw from the same underlying LLM client. If your researcher agent enters an information-gathering loop — repeatedly calling search tools or web-scraping tools because results are never quite sufficient — it can consume 80% of your session’s token budget before the writer agent starts its first task. This page shows why per-role cost caps matter and how to add them to a CrewAI crew using RunGuard.

The shared-pool problem in CrewAI

How CrewAI delegates to agents. In a typical CrewAI sequential or hierarchical crew, the crew manager delegates tasks to agents by role. Each agent uses the LLM client configured on the Agent object (or the crew-level fallback). There is no built-in tracking of how many tokens or dollars each agent has consumed within the current crew run. The closest built-in control is the per-agent max_iter setting (default 25), which limits the number of iterations per task, not the dollar spend.
Why max_iter is not a cost cap. max_iter counts the number of times an agent loops through its ReAct cycle (think, act, observe). A single iteration that calls a high-context tool — fetching a large web page, summarizing a long document — can cost $0.10–$0.50 at GPT-4o pricing. At default max_iter=25, a single runaway researcher agent can spend $2.50–$12.50 before CrewAI stops it. If you have three agents in the crew and they all hit their iteration limits, costs compound.
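The arithmetic above can be checked with a short sketch. The per-iteration cost range is the one quoted in the text; the function name and the use of integer cents are illustrative choices, not part of CrewAI.

```python
def worst_case_spend_cents(
    max_iter: int, low_cents: int, high_cents: int, agents: int = 1
) -> tuple[int, int]:
    """Upper-bound spend if every agent uses all of its iterations.

    Costs are in integer cents to avoid floating-point rounding.
    """
    return (max_iter * low_cents * agents, max_iter * high_cents * agents)


# One researcher at max_iter=25, $0.10 to $0.50 per iteration
single = worst_case_spend_cents(25, 10, 50)
# Three agents that all hit their iteration limits
crew_total = worst_case_spend_cents(25, 10, 50, agents=3)

print(f"one agent:    ${single[0] / 100:.2f} to ${single[1] / 100:.2f}")
print(f"three agents: ${crew_total[0] / 100:.2f} to ${crew_total[1] / 100:.2f}")
```

The bound scales linearly with both max_iter and the number of agents, which is why lowering max_iter alone only shrinks the ceiling rather than tying it to a dollar figure.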
The delegation-loop variant. In hierarchical crews, the manager agent can re-delegate tasks to subordinate agents if the output is judged insufficient. A manager that repeatedly re-delegates the same task — because the subtask completion criteria are vague — generates a delegation loop: manager → researcher → manager → researcher →… CrewAI’s built-in max_iter on the manager does not detect repeated delegation of the same task; it just counts total iterations.
The inter-agent context explosion. As a hierarchical crew runs, the manager agent’s context grows with every delegation result. By the fifth or sixth delegation, the manager is sending hundreds of tokens of prior results in every call, multiplying the per-call cost significantly. The researcher agent, meanwhile, is receiving a progressively longer task description with each re-delegation. Without an explicit per-agent dollar cap, both the manager and the researcher can consume far more than their “fair share” of the session budget.
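To see why re-delegation is costly, consider a rough model in which every manager call resends the base prompt plus all earlier delegation results. The token figures below are made-up placeholders and real CrewAI prompts will differ; the point is only that cumulative input tokens grow roughly quadratically with the number of delegations.

```python
def manager_input_tokens(n_calls: int, base_tokens: int, result_tokens: int) -> list[int]:
    """Input tokens sent on each manager call, assuming every call carries
    the base prompt plus all earlier delegation results."""
    return [base_tokens + k * result_tokens for k in range(n_calls)]


# Hypothetical sizes: 500-token base prompt, 150 tokens per delegation result
per_call = manager_input_tokens(n_calls=6, base_tokens=500, result_tokens=150)
total_with_growth = sum(per_call)
total_without_growth = 6 * 500

print(per_call)
print(total_with_growth, "vs", total_without_growth)
```

Under these assumptions the sixth call alone carries 750 tokens of prior results, and the run as a whole sends 5,250 input tokens instead of 3,000.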

Adding per-agent budget caps with RunGuard

The integration point: custom LLM per agent. CrewAI’s Agent class accepts an llm parameter that overrides the crew-level LLM. RunGuard wraps the underlying LLM call and maintains a per-instance budget counter. By passing a different RunGuard-wrapped LLM to each agent, you get independent budget tracking per role.
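Conceptually, a per-instance budget counter is little more than an accumulator with a ceiling. The sketch below is a framework-independent illustration of that idea, not RunGuard's implementation: BudgetCounter, BudgetExceeded, and charge are hypothetical names.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a charge would push spend past the cap (hypothetical)."""


class BudgetCounter:
    """Tracks spend for one agent role, in integer micro-dollars."""

    def __init__(self, role: str, max_usd: float) -> None:
        self.role = role
        self.max_micro = round(max_usd * 1_000_000)
        self.spent_micro = 0

    def charge(self, cost_usd: float) -> None:
        cost = round(cost_usd * 1_000_000)
        if self.spent_micro + cost > self.max_micro:
            raise BudgetExceeded(f"{self.role}: cap ${self.max_micro / 1e6:.2f} reached")
        self.spent_micro += cost


# Independent counters: exhausting one does not affect the other
researcher_budget = BudgetCounter("researcher", max_usd=0.50)
writer_budget = BudgetCounter("writer", max_usd=0.30)

researcher_budget.charge(0.45)
try:
    researcher_budget.charge(0.10)   # would total $0.55, over the $0.50 cap
    researcher_halted = False
except BudgetExceeded as err:
    researcher_halted = True
    print(err)

writer_budget.charge(0.10)           # the writer's budget is untouched
```

Because each role holds its own counter, a runaway researcher stops at its own ceiling while the writer still has its full allocation.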

Python: per-role LLMs with independent budgets.

from crewai import Agent, Task, Crew, Process
from runguard.langchain import GuardedChatOpenAI

# Each agent gets its own guarded LLM with an independent budget
researcher_llm = GuardedChatOpenAI(
    model="gpt-4o",
    budget={"max_usd": 0.50},   # researcher gets 50 cents
    loop={"repeats": 3, "window": 5},
    on_budget_exceeded="raise",
)

writer_llm = GuardedChatOpenAI(
    model="gpt-4o",
    budget={"max_usd": 0.30},   # writer gets 30 cents
    loop={"repeats": 3, "window": 5},
    on_budget_exceeded="raise",
)

reviewer_llm = GuardedChatOpenAI(
    model="gpt-4o",
    budget={"max_usd": 0.20},   # reviewer gets 20 cents
    loop={"repeats": 2, "window": 4},
    on_budget_exceeded="raise",
)

researcher = Agent(
    role="Senior Research Analyst",
    goal="Find accurate, cited information on the topic",
    backstory="You are a meticulous researcher who verifies every fact.",
    llm=researcher_llm,
    max_iter=10,   # backstop
)

writer = Agent(
    role="Content Writer",
    goal="Write a clear, engaging article from the research",
    backstory="You are a professional writer who turns research into readable content.",
    llm=writer_llm,
    max_iter=5,
)

reviewer = Agent(
    role="Quality Reviewer",
    goal="Check the article for accuracy and completeness",
    backstory="You are an editor who ensures factual accuracy and consistent style.",
    llm=reviewer_llm,
    max_iter=3,
)

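# Illustrative task definitions; adapt the descriptions and expected outputs to your use case
research_task = Task(
    description="Research the topic and collect cited facts.",
    expected_output="A bullet list of findings with sources.",
    agent=researcher,
)
write_task = Task(
    description="Write an article based on the research findings.",
    expected_output="A complete draft article.",
    agent=writer,
)
review_task = Task(
    description="Review the article for accuracy and completeness.",
    expected_output="The final article with corrections applied.",
    agent=reviewer,
)

crew = Crew(
    agents=[researcher, writer, reviewer],
    tasks=[research_task, write_task, review_task],
    process=Process.sequential,
    verbose=True,
)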

try:
    result = crew.kickoff()
    print("Crew completed:", result)
except Exception as e:
    # RunGuard BudgetExceededError or LoopDetectedError surfaces here
    print(f"Crew halted: {e}")

What the budget covers. Each GuardedChatOpenAI instance counts the USD cost of every LLM call made by the agent that holds that instance. Tool calls are not counted (they don’t hit the LLM), but the model calls that process tool results are counted. The budget is a spend limit, not a token limit, so it automatically accounts for cost differences between model tiers and prompt sizes.
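A spend limit can account for model tier and prompt size because cost is derived from token counts and per-model prices. A minimal sketch of that calculation follows; the model names are hypothetical and the prices are placeholders chosen for illustration, so check your provider's current price list before relying on any figure.

```python
# Placeholder prices in USD per 1M tokens: (input, output). Not real quotes.
PRICES = {
    "large-model": (5.00, 15.00),
    "small-model": (0.50, 1.50),
}


def call_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one LLM call, computed from its token usage."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Same token usage on two tiers: a token cap treats these as equal, a USD cap does not
big = call_cost_usd("large-model", input_tokens=20_000, output_tokens=1_000)
small = call_cost_usd("small-model", input_tokens=20_000, output_tokens=1_000)
print(f"large: ${big:.4f}  small: ${small:.4f}")
```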
Reset between crew runs. A GuardedChatOpenAI instance keeps its budget counter for as long as the instance lives; the counter starts at zero only when a new instance is created. For multi-run crews (e.g., a crew that runs once per hour), create new instances before each kickoff or call researcher_llm.reset_budget() between runs to avoid carry-over from the previous run.

Detecting delegation loops in hierarchical crews

The signature of a delegation loop. In a hierarchical crew, the manager LLM produces Action: Delegate work to coworker steps in its ReAct output. A delegation loop is detectable as repeated delegation to the same agent for the same task within the same crew run — typically identifiable by matching the delegate name and the first N words of the task description.
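One way to build such a signature, sketched independently of any framework, is to normalise the delegate name and keep only the first N words of the task description. The function name and the default of eight words are assumptions made for illustration.

```python
def delegation_signature(coworker: str, task: str, n_words: int = 8) -> str:
    """Collapse a delegation into a comparable key: delegate name + task prefix."""
    name = " ".join(coworker.lower().split())
    prefix = " ".join(task.lower().split()[:n_words])
    return f"{name}|{prefix}"


first = delegation_signature(
    "Senior Research Analyst",
    "Find recent sources on battery recycling and summarise the key findings",
)
# A re-delegation that only appends extra instructions to the same task
second = delegation_signature(
    "senior research analyst",
    "Find recent sources on battery recycling and summarise the key findings, with citations",
)
print(first == second)
```

Truncating the task text is what lets a reworded re-delegation still match: the manager often appends clarifications to the end of the description while the opening words stay the same.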

Configuring loop detection on the manager LLM. Apply RunGuard’s loop detection to the manager agent’s LLM with a signature function that extracts the delegation target:

import re

from runguard.langchain import GuardedChatOpenAI

def extract_delegation_target(response_text: str) -> str:
    """Return a loop signature built from the text after 'Delegate work to coworker'."""
    m = re.search(r'Delegate work to coworker[:\s]+([^\n]+)', response_text, re.IGNORECASE)
    if m:
        # Lower-case and truncate so small wording changes still match
        return f"delegate:{m.group(1).strip().lower()[:40]}"
    return "non-delegate"

manager_llm = GuardedChatOpenAI(
    model="gpt-4o",
    budget={"max_usd": 0.80},
    loop={"repeats": 2, "window": 6, "sig_fn": extract_delegation_target},
    on_loop_detected="raise",
)

What this catches. The sig_fn maps each manager response to a delegation target string. If the same target recurs often enough to reach the configured repeats threshold within the window (here, the last 6 manager responses), RunGuard raises LoopDetectedError. This fires before the manager’s own max_iter limit, giving you a meaningful error message (the repeated delegation target) rather than a generic iteration-limit stop.
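The windowed check itself can be sketched in a few lines. This illustrates the general technique and is not RunGuard's code: LoopDetector is a hypothetical name, and the sketch assumes that repeats means the number of occurrences of one signature inside the window that triggers detection, which may differ from RunGuard's definition.

```python
from collections import deque


class LoopDetector:
    """Flags a loop when one signature occurs `repeats` times
    within the last `window` observations."""

    def __init__(self, repeats: int, window: int) -> None:
        self.repeats = repeats
        self.recent = deque(maxlen=window)   # oldest entries fall off automatically

    def observe(self, signature: str) -> bool:
        self.recent.append(signature)
        return self.recent.count(signature) >= self.repeats


detector = LoopDetector(repeats=3, window=6)
stream = [
    "delegate:researcher",
    "non-delegate",
    "delegate:researcher",
    "delegate:writer",
    "delegate:researcher",   # third occurrence inside the window
]
flags = [detector.observe(sig) for sig in stream]
print(flags)
```

A detector written this way would also fire on a repeated "non-delegate" signature, so a real implementation would likely need to exclude that sentinel from the count.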

CrewAI built-in controls vs. RunGuard

Control	CrewAI built-in	RunGuard
Per-agent cost cap	Not supported	budget: max_usd per GuardedChatOpenAI instance
Loop detection	Not supported	loop: repeats + window catches tool and delegation loops
Max iterations	max_iter (default 25): counts ReAct steps, not cost	Keep max_iter as a backstop (cost guard fires first in most cases)
Delegation loop detection	Not supported	sig_fn extracts delegation target for manager loop detection
Alert on budget exceeded	Not supported	alerts: slack_webhook or pagerduty_key
Partial output on halt	Not supported — task output is None if agent hits max_iter	BudgetExceededError exposes accumulated context and partial result

Tuning per-agent budgets for your crew

Profile before capping. Before setting per-agent limits, run your crew in verbose mode on 5–10 representative inputs and log the per-agent cost with RunGuard’s on_budget_exceeded="log" mode. This gives you the median and 90th-percentile cost for each role. Start with a cap of 2–3x the median to allow headroom for harder inputs, and check that it is not below the 90th-percentile cost.
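Turning profiling logs into a starting cap is a small calculation. The sketch below assumes you have already collected per-run costs for one role as a list of USD values; the sample figures are invented, and suggest_cap with its 2.5x multiplier is a hypothetical helper, not a RunGuard function.

```python
import statistics


def suggest_cap(costs_usd: list[float], multiplier: float = 2.5) -> dict[str, float]:
    """Summarise profiled per-run costs for one role and suggest a starting cap."""
    ordered = sorted(costs_usd)
    median = statistics.median(ordered)
    # Nearest-rank 90th percentile: the value at rank ceil(0.9 * n),
    # computed with integer arithmetic to avoid float rounding
    rank = (9 * len(ordered) + 9) // 10
    p90 = ordered[rank - 1]
    # Use the larger of (multiplier x median) and p90 so the cap
    # is never below the 90th-percentile run
    cap = max(multiplier * median, p90)
    return {"median": median, "p90": p90, "cap": round(cap, 2)}


# Invented researcher costs from ten profiling runs, in USD
researcher_costs = [0.10, 0.12, 0.11, 0.30, 0.14, 0.13, 0.12, 0.15, 0.11, 0.45]
summary = suggest_cap(researcher_costs)
print(summary)
```

With only 5–10 runs the 90th percentile is close to the maximum observed cost, so treat the result as a first guess and revisit it as more runs accumulate.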
Set asymmetric limits. Research agents typically need more budget than writer or reviewer agents — they make more LLM calls to process tool results. A common initial allocation: researcher 50–60% of total session budget, writer 25–30%, reviewer 15–20%. Adjust based on your profiling data.
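The split can be computed from a single session figure so the per-role caps always add up. A minimal sketch, where allocate_budget is a hypothetical helper and the shares are one point within the ranges suggested above:

```python
def allocate_budget(total_usd: float, shares: dict[str, float]) -> dict[str, float]:
    """Split a session budget across roles by fractional share."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1.0")
    return {role: round(total_usd * share, 2) for role, share in shares.items()}


caps = allocate_budget(1.00, {"researcher": 0.50, "writer": 0.30, "reviewer": 0.20})
print(caps)
```

A $1.00 session split this way reproduces the 0.50 / 0.30 / 0.20 caps used in the per-role example earlier on this page.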
Add a crew-level backstop. Even with per-agent limits, add a crew-level timeout or total-cost cap at the application layer. If one agent raises BudgetExceededError and your crew exception handler restarts it with a reduced task, the restarted agent has a fresh budget counter — potentially spending again. The crew-level backstop catches the total run cost regardless of restarts.
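A crew-level backstop can be as simple as a ledger that outlives any individual agent's counter. The sketch below is an application-layer illustration with hypothetical names (SessionLedger, SessionBudgetExceeded); how you feed per-call costs into it depends on your own integration.

```python
class SessionBudgetExceeded(RuntimeError):
    """Raised when total run spend passes the session cap (hypothetical)."""


class SessionLedger:
    """Accumulates spend across all agents and restarts within one crew run."""

    def __init__(self, max_cents: int) -> None:
        self.max_cents = max_cents
        self.spent_cents = 0

    def record(self, role: str, cents: int) -> None:
        self.spent_cents += cents
        if self.spent_cents > self.max_cents:
            raise SessionBudgetExceeded(
                f"run total {self.spent_cents}c exceeds cap {self.max_cents}c (last: {role})"
            )


ledger = SessionLedger(max_cents=120)
ledger.record("researcher", 50)      # first attempt uses up its per-agent cap
ledger.record("researcher", 50)      # restarted with a fresh per-agent counter
try:
    ledger.record("writer", 30)      # total reaches 130c, over the 120c cap
    halted = False
except SessionBudgetExceeded as err:
    halted = True
    print(err)
```

The ledger is deliberately unaware of restarts: it only sees money spent, which is what makes it effective against the fresh-counter problem described above.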
Use output caching for researcher agents. CrewAI supports tool output caching, with a cache_function set on the tool to control which results are cached. Enable caching for expensive search or fetch tools used by the researcher agent: repeated tool calls with the same inputs return cached results instead of re-running the tool, which cuts tool latency and external API cost and reduces the likelihood of information-gathering loops.
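The idea behind input-keyed caching can be sketched without any framework. This is not CrewAI's caching mechanism; cached_tool and fake_search are hypothetical stand-ins that only show how identical inputs can reuse an earlier result.

```python
import json
from typing import Any, Callable


def cached_tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap a tool so identical keyword inputs reuse the earlier result."""
    cache: dict[str, Any] = {}

    def wrapper(**kwargs: Any) -> Any:
        # Sort keys so argument order does not change the cache key
        key = json.dumps(kwargs, sort_keys=True)
        if key not in cache:
            cache[key] = fn(**kwargs)
        return cache[key]

    return wrapper


executions: list[str] = []   # records each real (uncached) execution


@cached_tool
def fake_search(query: str) -> str:
    """Stand-in for an expensive search tool."""
    executions.append(query)
    return f"results for {query}"


first_result = fake_search(query="crew budgets")
second_result = fake_search(query="crew budgets")   # served from the cache
print(len(executions))
```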