Multi-agent coordinator cost allocation: track spend across sub-agents at runtime
Multi-agent systems are the fastest-growing LLM architecture in 2026: a coordinator agent decomposes a task, spawns specialized sub-agents, and aggregates their results. The architecture scales capability dramatically. It also scales cost dramatically, and in ways that are hard to predict. A coordinator that spawns four sub-agents for a research task may generate eight to twelve LLM calls per user request instead of one, and the per-call token counts for coordinator reasoning, sub-agent reasoning, and inter-agent message serialization compound in non-obvious ways. Teams that deploy multi-agent systems without cost allocation tooling routinely discover that their per-user-session cost is 4–8x their single-agent estimate. Cost allocation means attributing each LLM call to the specific agent that made it, the task it was executing, and the originating user session. It is the prerequisite for understanding where spend is going and for enforcing budgets before they are exceeded.
Why multi-agent cost tracking is harder than single-agent tracking
- Parallel sub-agent invocations. When a coordinator spawns three sub-agents concurrently, their LLM calls are interleaved in time. Naive cost tracking tied to a single session object breaks under concurrency; you need per-agent context propagation that threads a trace ID through async boundaries. Without explicit propagation, all spend aggregates to “coordinator” even though it was generated by sub-agents, making per-agent attribution impossible after the fact.
- Recursive depth. Sub-agents can spawn their own sub-agents. A research coordinator might spawn a web-search agent, which spawns a content-extraction agent, which calls a summarization agent. The cost tree can be three or four levels deep. Flat cost tracking (total tokens per session) cannot distinguish whether a $0.50 session cost came from one expensive coordinator call or from twelve cheap sub-agent calls. You need a hierarchical cost tree, not a flat aggregation.
- Coordinator overhead. The coordinator itself incurs LLM costs — for task decomposition, for evaluating sub-agent outputs, for synthesis, and for re-routing when a sub-agent fails. This overhead is often 20–40% of total session cost and is frequently invisible in single-metric tracking. If you optimize only sub-agent cost (smaller models, fewer calls) without accounting for coordinator overhead, you may save 15% on sub-agents while coordinator costs rise 25% due to more complex re-routing logic.
- Retry and fallback chains. Multi-agent systems have a larger failure surface than single agents. A sub-agent failure triggers coordinator re-routing, which may spawn a different sub-agent, which may fail again. Each failure and re-route adds LLM calls. A single user request that takes three routing cycles to complete can cost roughly 3× as much as a request that succeeds on the first try. Per-routing-cycle cost visibility is essential for diagnosing these failure-amplification patterns.
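The failure-amplification pattern in the last bullet becomes visible once each call is tagged with the routing cycle it belongs to. The following is a minimal sketch, assuming a hypothetical `CallRecord` shape with a `routingCycle` counter that the coordinator increments on every re-route. None of these names come from a specific library, and the sample figures are illustrative only.

```typescript
// Hypothetical record shape: one entry per LLM call.
interface CallRecord {
  agentId: string;
  routingCycle: number; // 1 = first attempt, 2 = first re-route, ...
  costUsd: number;
}

// Sum cost per routing cycle so retries show up as separate line items.
function costByRoutingCycle(calls: CallRecord[]): Record<number, number> {
  const totals: Record<number, number> = {};
  for (const call of calls) {
    totals[call.routingCycle] = (totals[call.routingCycle] ?? 0) + call.costUsd;
  }
  return totals;
}

// Ratio of total session cost to the cost of the first cycle alone.
// Returns 0 when no first-cycle cost was recorded.
function amplificationFactor(calls: CallRecord[]): number {
  const totals = costByRoutingCycle(calls);
  const firstCycle = totals[1] ?? 0;
  let total = 0;
  for (const key of Object.keys(totals)) {
    total += totals[Number(key)];
  }
  return firstCycle > 0 ? total / firstCycle : 0;
}

// Illustrative data only, not measurements.
const exampleCalls: CallRecord[] = [
  { agentId: "coordinator", routingCycle: 1, costUsd: 0.02 },
  { agentId: "web-search", routingCycle: 1, costUsd: 0.03 },
  { agentId: "coordinator", routingCycle: 2, costUsd: 0.02 },
  { agentId: "fallback-search", routingCycle: 2, costUsd: 0.03 },
];
```

A session whose amplification factor sits well above 1 is spending most of its budget on re-routing rather than on the first attempt, which is the signal to inspect the failing sub-agent.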
Attribution model 1: span-based cost tagging
- How it works. Model each LLM call as a span with a parent span ID. The coordinator’s initial decomposition call is the root span. Each sub-agent call is a child span. Synthesis calls are sibling spans under the root. This mirrors the OpenTelemetry distributed tracing model and enables hierarchical cost aggregation: total session cost = sum of all span costs; coordinator cost = coordinator spans only; sub-agent A cost = spans where agent_id = “subagent_a”.
- Implementation with context propagation. Use a context object (a plain dict in Python, a Map or cls-style store in TypeScript) that is passed through every LLM call and every async boundary. Populate it with session_id, agent_id, task_id, and parent_span_id before spawning sub-agents. Each sub-agent reads these keys and appends its own span ID as the new parent. On call completion, record input_tokens, output_tokens, model, and cost_usd to a local cost log keyed by span ID.
- Cost tree reconstruction. After session completion, reconstruct the cost tree by following parent span IDs. This gives you: total cost per agent type, cost per routing level, cost per task branch, and cost for failed attempts vs successful attempts. A failing sub-agent that consumed $0.08 before being abandoned is distinctly visible in the cost tree; in flat tracking it simply adds to the session total with no attribution.
- Storage overhead. A span record for a single LLM call is approximately 200–400 bytes of JSON (IDs, token counts, model, cost, timestamps). A 20-call coordinator session produces 4–8 KB of span data. At 10,000 sessions per day, that is 40–80 MB per day, which is negligible on any storage system. Store in SQLite with an index on session_id for real-time aggregation or in a log aggregator for batch analytics.
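The span model and the tree reconstruction described above can be sketched in a few functions. This is an illustration under stated assumptions, not a reference implementation: the `Span` and `CostNode` shapes and the function names are hypothetical, the fields mirror the keys listed above in camelCase, and span IDs are assumed to be unique with no cycles.

```typescript
// Hypothetical span record: one per LLM call.
interface Span {
  spanId: string;
  parentSpanId: string | null; // null marks the root span
  sessionId: string;
  agentId: string;
  inputTokens: number;
  outputTokens: number;
  model: string;
  costUsd: number;
}

interface CostNode {
  span: Span;
  children: CostNode[];
  subtreeCostUsd: number; // this span plus all of its descendants
}

// Rebuild the tree by following parent span IDs, then roll costs upward.
function buildCostTree(spans: Span[]): CostNode[] {
  const nodes: Record<string, CostNode> = {};
  for (const span of spans) {
    nodes[span.spanId] = { span, children: [], subtreeCostUsd: span.costUsd };
  }
  const roots: CostNode[] = [];
  for (const span of spans) {
    const node = nodes[span.spanId];
    const parent: CostNode | undefined =
      span.parentSpanId !== null ? nodes[span.parentSpanId] : undefined;
    if (parent) {
      parent.children.push(node);
    } else {
      roots.push(node); // root span, or an orphan whose parent was never logged
    }
  }
  // Post-order walk: each node is visited once, so no cost is double counted.
  const rollUp = (node: CostNode): number => {
    for (const child of node.children) {
      node.subtreeCostUsd += rollUp(child);
    }
    return node.subtreeCostUsd;
  };
  for (const root of roots) {
    rollUp(root);
  }
  return roots;
}

// Flat per-agent totals from the same span log.
function costByAgent(spans: Span[]): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const s of spans) {
    totals[s.agentId] = (totals[s.agentId] ?? 0) + s.costUsd;
  }
  return totals;
}
```

Treating orphaned spans as extra roots is a deliberate choice in this sketch: a span whose parent record was lost still contributes to the session total instead of silently disappearing.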
Attribution model 2: per-agent budget envelopes
- Budget envelopes as pre-conditions. Rather than tracking cost after the fact, define a budget envelope for each agent role before the session starts. The coordinator gets $0.05 for decomposition and synthesis. Each sub-agent gets $0.03. The coordinator enforces these envelopes when spawning sub-agents by passing a max_cost parameter alongside the task description. Sub-agents check remaining budget before each LLM call and fail gracefully when the envelope is exhausted.
- Envelope sizing by agent role. Budget envelopes should be sized empirically from production traces, not guessed. Sample 500 successful sessions, compute the 95th-percentile cost for each agent role, and set envelopes at 1.5× that value. This accommodates normal variance while capping pathological outliers. Coordinator agents typically need larger envelopes than sub-agents because coordinator reasoning involves more context (all sub-agent outputs plus original task); a 60/40 split (coordinator 60%, all sub-agents combined 40%) is typical for research-type tasks.
- Cascade failures from tight envelopes. Setting envelopes too tight causes predictable failures: a sub-agent runs out of budget mid-task and returns a partial result, the coordinator spends additional budget re-routing and re-running, and total cost ends up exceeding what an uncapped session would have spent. Tight envelopes do not necessarily reduce costs; they can increase costs by forcing expensive retry loops. Monitor envelope exhaustion rate per agent role; if any role exhausts its envelope on more than 10% of sessions, the envelope is too tight.
- Dynamic envelope reallocation. Allow the coordinator to transfer unused budget from one agent’s envelope to another’s. If sub-agent A completes in $0.01 against a $0.03 envelope, the unused $0.02 can be transferred to sub-agent B, which needs more tokens for a complex analysis. This prevents the common failure mode where one high-complexity sub-task fails due to budget while the overall session has budget headroom because other sub-tasks were inexpensive.
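The envelope mechanics above (a pre-call budget check, empirical sizing at 1.5× the 95th percentile, and transfer of unused budget) can be sketched as follows. The class and function names are hypothetical, the percentile uses the nearest-rank method as one reasonable choice, and amounts are kept as floating-point dollars for brevity; integer micro-dollars would avoid rounding drift.

```typescript
// Thrown when an agent's envelope cannot cover the next call.
class EnvelopeExhaustedError extends Error {}

class BudgetEnvelope {
  readonly agentId: string;
  private maxCostUsd: number;
  private spentUsd = 0;

  constructor(agentId: string, maxCostUsd: number) {
    this.agentId = agentId;
    this.maxCostUsd = maxCostUsd;
  }

  get remainingUsd(): number {
    return Math.max(0, this.maxCostUsd - this.spentUsd);
  }

  // Call before each LLM request with a worst-case cost estimate.
  assertCanSpend(estimatedCostUsd: number): void {
    if (estimatedCostUsd > this.remainingUsd) {
      throw new EnvelopeExhaustedError(this.agentId + " envelope exhausted");
    }
  }

  // Call after each LLM request with the actual cost.
  record(actualCostUsd: number): void {
    this.spentUsd += actualCostUsd;
  }

  // Move unused budget to another envelope; returns the amount actually moved.
  transferTo(target: BudgetEnvelope, amountUsd: number): number {
    const moved = Math.min(amountUsd, this.remainingUsd);
    this.maxCostUsd -= moved;
    target.maxCostUsd += moved;
    return moved;
  }
}

// Envelope size = multiplier x the chosen percentile (nearest-rank) of observed costs.
function sizeEnvelope(costsUsd: number[], percentile = 0.95, multiplier = 1.5): number {
  if (costsUsd.length === 0) throw new Error("need at least one sample");
  const sorted = costsUsd.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil(percentile * sorted.length));
  return sorted[rank - 1] * multiplier;
}
```

In the reallocation scenario from the text, sub-agent A records $0.01 against a $0.03 envelope and the coordinator calls `transferTo` to hand the unused $0.02 to sub-agent B, whose ceiling rises to roughly $0.05.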
Coordinator-level cost accounting patterns
- Message-passing cost serialization. In multi-agent systems, sub-agents communicate results back to the coordinator as serialized message objects. These messages become input tokens for the coordinator’s synthesis call. A sub-agent that returns a 3,000-token detailed research report is transferring cost to the coordinator: if the coordinator synthesizes four such reports, its synthesis call has a 12,000-token input before even counting the system prompt or task context. Budget coordinators to account for the aggregate token cost of all sub-agent messages they must synthesize.
- Coordinator model tier optimization. The coordinator is typically run on a frontier model (Sonnet, GPT-4o) for reasoning quality. Sub-agents doing deterministic or structured tasks (data extraction, classification, format conversion) can run on smaller, cheaper models. Take a coordinator on Sonnet ($3/MTok) directing sub-agents on Haiku ($0.25/MTok). With 60% of tokens on the coordinator and 40% on sub-agents, the blended rate is 0.6 × $3 + 0.4 × $0.25 = $1.90/MTok, about 37% cheaper than running everything on Sonnet. The saving grows as more of the token volume shifts to sub-agents: at a 20/80 split the blended rate is $0.80/MTok, about 73% cheaper. Model tier assignment per agent role is one of the highest-ROI cost optimizations in multi-agent systems. See LLM model routing cost optimization for routing implementation patterns.
- Sub-agent output size contracts. Define explicit output size limits for each sub-agent role. A “search” sub-agent should return at most 2,000 tokens of results. A “summarizer” sub-agent should produce 200–400 tokens. These contracts cap coordinator input tokens, make coordinator cost predictable, and expose sub-agents that fail to respect their contract. Violating a size contract is a signal that the sub-agent’s prompt needs more explicit output constraints, not that the contract is wrong.
- Per-request cost logging for billing attribution. If your multi-agent system serves multiple customers or teams, each session’s cost tree must be attributable to the originating customer for billing or showback. Thread a customer_id and team_id through the context propagation chain alongside agent IDs. At billing time, aggregate the cost tree by customer_id and session_id. This enables per-customer cost reporting, identifies heavy users whose pricing tier is below their actual consumption, and flags customers whose usage patterns are causing outsized infrastructure costs.
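The message-passing arithmetic and the size contracts above lend themselves to a small worked example. In this sketch the contract table, the `SubAgentMessage` shape, and the function names are hypothetical. The token limits and the $3/MTok price are the figures quoted in the text, and only the upper bound of each contract is enforced.

```typescript
// Hypothetical per-role output contracts, in tokens (upper bounds from the text).
const outputContracts: Record<string, number> = {
  search: 2000,
  summarizer: 400,
};

interface SubAgentMessage {
  role: string;
  outputTokens: number;
}

// Return the messages that exceed their role's contract.
// Roles without a contract are not flagged.
function contractViolations(messages: SubAgentMessage[]): SubAgentMessage[] {
  return messages.filter((m) => {
    const limit: number | undefined = outputContracts[m.role];
    return limit !== undefined && m.outputTokens > limit;
  });
}

// Coordinator synthesis input = fixed overhead (system prompt, task context)
// plus every sub-agent message it has to read.
function synthesisInputTokens(messages: SubAgentMessage[], overheadTokens: number): number {
  return messages.reduce((sum, m) => sum + m.outputTokens, overheadTokens);
}

// Input-side cost of a call at a given price per million tokens.
function inputCostUsd(tokens: number, usdPerMTok: number): number {
  return (tokens / 1e6) * usdPerMTok;
}

// Four 3,000-token research reports, as in the example above.
const reports: SubAgentMessage[] = [
  { role: "researcher", outputTokens: 3000 },
  { role: "researcher", outputTokens: 3000 },
  { role: "researcher", outputTokens: 3000 },
  { role: "researcher", outputTokens: 3000 },
];
const synthesisTokens = synthesisInputTokens(reports, 0); // 12,000 before overhead
const synthesisCostUsd = inputCostUsd(synthesisTokens, 3); // input side only
```

Running the contract check on every returned message before synthesis gives the coordinator a chance to truncate or reject an oversized report instead of paying for it as input.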
RunGuard BudgetTracker for coordinator-level enforcement
- Session-level cost ceiling. RunGuard’s BudgetTracker wraps the coordinator’s main loop with a session-level cost ceiling. Every LLM call made by any agent in the tree (coordinator or sub-agent) is counted against the session ceiling. When cumulative session cost exceeds the ceiling, BudgetTracker throws BudgetExceededError, halting the coordinator before it spawns additional sub-agents. This provides a single enforcement point for overall session cost regardless of how many agent levels deep the spending is happening.

```typescript
// spawnAgent, synthesize, and task are assumed to be defined elsewhere in the application.
const guard = new RunGuard({
  budget: { maxCostUsd: 0.25 }, // session ceiling
  agentId: 'coordinator'
});

const result = await guard.run(async (ctx) => {
  const subA = spawnAgent('researcher', ctx.childContext({ agentId: 'researcher' }));
  const subB = spawnAgent('extractor', ctx.childContext({ agentId: 'extractor' }));
  // Run both sub-agents concurrently; each carries its own child context.
  const [resA, resB] = await Promise.all([subA.run(task), subB.run(task)]);
  return await synthesize(resA, resB, ctx);
});
```

- Per-agent cost visibility. When sub-agents are initialized with ctx.childContext(), RunGuard automatically attributes each child agent’s LLM calls to its agentId in the shared cost tree. The coordinator’s dashboard shows a breakdown: “researcher: $0.08; extractor: $0.04; coordinator: $0.06; total: $0.18”. This attribution is available in real time via the RunGuard API, not just at session end, so you can detect a runaway sub-agent before it exhausts the session budget.
- Budget exhaustion callbacks. Register an onBudgetWarning callback at 80% of session ceiling. When the warning fires, the coordinator can decide to skip low-priority sub-agents (e.g., skip the “quality checker” agent and return the raw synthesized result) rather than failing the entire session. Graceful degradation under budget pressure produces a useful result at 85% of the planned cost instead of a failed session at 100% of the planned cost plus retry overhead.
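To make the ceiling-plus-warning behavior concrete without depending on any particular library, here is a self-contained stand-in. It is not RunGuard’s API: `SessionCeiling`, `SessionBudgetExceeded`, and their members are hypothetical names. The sketch only illustrates the logic of a shared ceiling, a one-time warning at 80%, and per-agent totals.

```typescript
// Hypothetical error type for this sketch.
class SessionBudgetExceeded extends Error {}

class SessionCeiling {
  readonly byAgent: Record<string, number> = {};
  private maxCostUsd: number;
  private onWarning: (spentUsd: number) => void;
  private warnFraction: number;
  private spentUsd = 0;
  private warned = false;

  constructor(maxCostUsd: number, onWarning: (spentUsd: number) => void, warnFraction = 0.8) {
    this.maxCostUsd = maxCostUsd;
    this.onWarning = onWarning;
    this.warnFraction = warnFraction;
  }

  // Record one LLM call from any agent in the tree against the shared ceiling.
  record(agentId: string, costUsd: number): void {
    this.spentUsd += costUsd;
    this.byAgent[agentId] = (this.byAgent[agentId] ?? 0) + costUsd;
    if (this.spentUsd > this.maxCostUsd) {
      throw new SessionBudgetExceeded("session spent $" + this.spentUsd.toFixed(4));
    }
    // Fire the warning once, the first time spend crosses the threshold.
    if (!this.warned && this.spentUsd >= this.warnFraction * this.maxCostUsd) {
      this.warned = true;
      this.onWarning(this.spentUsd);
    }
  }
}
```

A coordinator using this pattern would set a flag inside the warning callback and check it before spawning optional agents such as the quality checker, so that the session degrades gracefully instead of hitting the hard ceiling.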
Know what every sub-agent is spending before the bill arrives.
RunGuard’s BudgetTracker gives multi-agent coordinators real-time cost visibility across every agent level, with per-agent attribution and session-level ceilings that halt runaway sub-agents before they exhaust your budget.