Multi-agent coordinator cost allocation: track spend across sub-agents at runtime

Multi-agent systems are the fastest-growing LLM architecture in 2026: a coordinator agent decomposes a task, spawns specialized sub-agents, and aggregates their results. The architecture scales capability dramatically. It also scales cost dramatically, and in ways that are hard to predict. A coordinator that spawns four sub-agents for a research task may generate eight to twelve LLM calls per user request instead of one, and the per-call token counts for coordinator reasoning, sub-agent reasoning, and inter-agent message serialization compound in non-obvious ways. Teams that deploy multi-agent systems without cost allocation tooling routinely discover that their per-user-session cost is 4–8x their single-agent estimate. Cost allocation (attributing each LLM call to the specific agent that made it, the task it was executing, and the originating user session) is the prerequisite for understanding where spend is going and for enforcing budgets before spending overruns them.

Why multi-agent cost tracking is harder than single-agent tracking

Parallel sub-agent invocations. When a coordinator spawns three sub-agents concurrently, their LLM calls are interleaved in time. Naive cost tracking tied to a single session object breaks under concurrency; you need per-agent context propagation that threads a trace ID through async boundaries. Without explicit propagation, all spend aggregates to “coordinator” even though it was generated by sub-agents, making per-agent attribution impossible after the fact.
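A minimal sketch of that propagation in Node.js, assuming AsyncLocalStorage from node:async_hooks as the context store. callLlm, runSubAgent, runCoordinator, and the fixed per-call costs are hypothetical stand-ins for a real model client and agent framework.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Attribution context carried implicitly across await boundaries.
interface AgentContext {
  traceId: string;
  agentId: string;
}

const contextStore = new AsyncLocalStorage<AgentContext>();

// In-memory ledger of spend per agent ID (illustrative only).
const spendByAgent = new Map<string, number>();

// Hypothetical LLM call: waits a random delay so concurrent calls
// interleave, then charges whichever agent context is active.
async function callLlm(costUsd: number): Promise<void> {
  await new Promise<void>((resolve) => setTimeout(resolve, Math.random() * 10));
  const agentId = contextStore.getStore()?.agentId ?? "unattributed";
  spendByAgent.set(agentId, (spendByAgent.get(agentId) ?? 0) + costUsd);
}

// Each sub-agent runs inside its own context, so its calls are charged
// to it even while other sub-agents run concurrently.
function runSubAgent(traceId: string, agentId: string, calls: number): Promise<void> {
  return contextStore.run({ traceId, agentId }, async () => {
    for (let i = 0; i < calls; i++) {
      await callLlm(0.01);
    }
  });
}

async function runCoordinator(traceId: string): Promise<void> {
  await contextStore.run({ traceId, agentId: "coordinator" }, async () => {
    await callLlm(0.02); // decomposition
    await Promise.all([
      runSubAgent(traceId, "researcher", 3),
      runSubAgent(traceId, "extractor", 2),
      runSubAgent(traceId, "summarizer", 1),
    ]);
    await callLlm(0.02); // synthesis, back in the coordinator's context
  });
}
```

Removing the contextStore.run wrapper in runSubAgent would charge every sub-agent call to the coordinator, which is exactly the attribution failure described above.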
Recursive depth. Sub-agents can spawn their own sub-agents. A research coordinator might spawn a web-search agent, which spawns a content-extraction agent, which calls a summarization agent. The cost tree can be three or four levels deep. Flat cost tracking (total tokens per session) cannot distinguish whether a $0.50 session cost came from one expensive coordinator call or from twelve cheap sub-agent calls. You need a hierarchical cost tree, not a flat aggregation.
Coordinator overhead. The coordinator itself incurs LLM costs — for task decomposition, for evaluating sub-agent outputs, for synthesis, and for re-routing when a sub-agent fails. This overhead is often 20–40% of total session cost and is frequently invisible in single-metric tracking. If you optimize only sub-agent cost (smaller models, fewer calls) without accounting for coordinator overhead, you may save 15% on sub-agents while coordinator costs rise 25% due to more complex re-routing logic.
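Whether that trade-off saves money depends on the coordinator's share of session cost. A small sketch of the arithmetic; the function names are illustrative, and results are fractions of the original session cost.

```typescript
// Net change in session cost when sub-agent spend falls by `subAgentCut`
// and coordinator spend rises by `coordinatorRise` (both fractions).
// `coordinatorShare` is the coordinator's fraction of the original cost.
function netCostChange(
  coordinatorShare: number,
  subAgentCut: number,
  coordinatorRise: number,
): number {
  const subAgentShare = 1 - coordinatorShare;
  return coordinatorShare * coordinatorRise - subAgentShare * subAgentCut;
}

// Coordinator share at which the saving and the added overhead cancel.
function breakEvenShare(subAgentCut: number, coordinatorRise: number): number {
  return subAgentCut / (subAgentCut + coordinatorRise);
}

for (const share of [0.2, 0.3, 0.4]) {
  const delta = netCostChange(share, 0.15, 0.25);
  console.log(`coordinator share ${share}: net change ${(delta * 100).toFixed(1)}%`);
}
console.log(`break-even share: ${(breakEvenShare(0.15, 0.25) * 100).toFixed(1)}%`);
```

With the 15% and 25% figures above, a coordinator carrying 20% of session cost still yields a 7% net saving, a 40% share turns it into a 1% increase, and the break-even share is 37.5%.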
Retry and fallback chains. Multi-agent systems present a more complex failure surface than single agents. A sub-agent failure triggers coordinator re-routing, which may spawn a different sub-agent, which may fail again. Each failure and re-route adds LLM calls. A single user request that takes three routing cycles to complete can cost roughly 3× as much as a request that succeeds on the first try. Per-routing-cycle cost visibility is essential for diagnosing these failure-amplification patterns.
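Tagging each call with the routing cycle it belongs to makes that amplification measurable. A minimal sketch, assuming a hypothetical CallRecord shape whose routingCycle counter the coordinator increments on every re-route:

```typescript
// One LLM call tagged with its routing cycle (cycle 1 = first attempt).
interface CallRecord {
  agentId: string;
  routingCycle: number;
  costUsd: number;
}

// Sum cost per routing cycle so each retry loop shows up as its own row.
function costPerCycle(calls: CallRecord[]): Map<number, number> {
  const totals = new Map<number, number>();
  for (const call of calls) {
    totals.set(call.routingCycle, (totals.get(call.routingCycle) ?? 0) + call.costUsd);
  }
  return totals;
}

// Total session cost divided by first-cycle cost; 1.0 means no re-routing.
function retryAmplification(calls: CallRecord[]): number {
  const totals = costPerCycle(calls);
  const first = totals.get(1) ?? 0;
  const total = Array.from(totals.values()).reduce((a, b) => a + b, 0);
  return first > 0 ? total / first : 0;
}

// Illustrative session: two failed routing cycles before success.
const session: CallRecord[] = [
  { agentId: "coordinator", routingCycle: 1, costUsd: 0.02 },
  { agentId: "web_search", routingCycle: 1, costUsd: 0.03 }, // failed
  { agentId: "coordinator", routingCycle: 2, costUsd: 0.02 },
  { agentId: "web_search_fallback", routingCycle: 2, costUsd: 0.03 }, // failed
  { agentId: "coordinator", routingCycle: 3, costUsd: 0.02 },
  { agentId: "web_search_v2", routingCycle: 3, costUsd: 0.03 }, // succeeded
];
console.log(costPerCycle(session), retryAmplification(session));
```

Tracking the amplification ratio per request type shows which tasks trigger the most re-routing, rather than only that some sessions are expensive.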

Attribution model 1: span-based cost tagging

How it works. Model each LLM call as a span with a parent span ID. The coordinator’s initial decomposition call is the root span. Each sub-agent call is a child span. Synthesis calls are also children of the root, siblings of the sub-agent spans. This mirrors the OpenTelemetry distributed tracing model and enables hierarchical cost aggregation: total session cost = sum of all span costs; coordinator cost = sum over coordinator spans only; sub-agent A cost = sum over spans where agent_id = “subagent_a”.
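A minimal sketch of the span model and the three aggregations listed above; the CostSpan field names are illustrative, not a specific tracing library's schema.

```typescript
// One LLM call modelled as a span (illustrative field names).
interface CostSpan {
  spanId: string;
  parentSpanId: string | null; // null for the root decomposition call
  agentId: string;
  costUsd: number;
}

// Total session cost: sum over every span.
function totalCost(spans: CostSpan[]): number {
  return spans.reduce((sum, s) => sum + s.costUsd, 0);
}

// Cost for one agent: sum over spans carrying that agent ID.
function costForAgent(spans: CostSpan[], agentId: string): number {
  return totalCost(spans.filter((s) => s.agentId === agentId));
}

const spans: CostSpan[] = [
  { spanId: "s1", parentSpanId: null, agentId: "coordinator", costUsd: 0.04 }, // decomposition (root)
  { spanId: "s2", parentSpanId: "s1", agentId: "subagent_a", costUsd: 0.03 },
  { spanId: "s3", parentSpanId: "s1", agentId: "subagent_b", costUsd: 0.02 },
  { spanId: "s4", parentSpanId: "s1", agentId: "coordinator", costUsd: 0.05 }, // synthesis
];
console.log(totalCost(spans), costForAgent(spans, "coordinator"));
```

Coordinator cost here includes both the decomposition and synthesis spans, which is why filtering by agent ID rather than by tree position matters.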
Implementation with context propagation. Use a context object (a plain dict in Python, a Map or cls-style store in TypeScript) that is passed through every LLM call and every async boundary. Populate it with session_id, agent_id, task_id, and parent_span_id before spawning sub-agents. Each sub-agent reads these keys and sets its own span ID as the parent_span_id for any calls or agents it spawns. On call completion, record input_tokens, output_tokens, model, and cost_usd in a local cost log keyed by span ID.
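A sketch of the explicit-context variant, assuming hypothetical childContext and recordCall helpers and a caller-supplied cost_usd (the pricing lookup is out of scope). Span IDs come from randomUUID in node:crypto; model names are placeholders.

```typescript
import { randomUUID } from "node:crypto";

// Keys threaded through every LLM call and async boundary.
interface CallContext {
  sessionId: string;
  agentId: string;
  taskId: string;
  parentSpanId: string | null; // null for the coordinator's root call
}

// One row in the local cost log.
interface CostEntry extends CallContext {
  spanId: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
}

// Local cost log keyed by span ID.
const costLog = new Map<string, CostEntry>();

// Record a completed call; the returned span ID becomes the parent
// for anything this agent spawns next.
function recordCall(
  ctx: CallContext,
  model: string,
  inputTokens: number,
  outputTokens: number,
  costUsd: number,
): string {
  const spanId = randomUUID();
  costLog.set(spanId, { ...ctx, spanId, model, inputTokens, outputTokens, costUsd });
  return spanId;
}

// Context handed to a sub-agent: same session, its own agent and task,
// and the spawning call's span as parent.
function childContext(
  parent: CallContext,
  spawningSpanId: string,
  agentId: string,
  taskId: string,
): CallContext {
  return { sessionId: parent.sessionId, agentId, taskId, parentSpanId: spawningSpanId };
}

// Usage: the coordinator decomposes, then hands a context to a researcher.
const rootCtx: CallContext = { sessionId: "sess-1", agentId: "coordinator", taskId: "root", parentSpanId: null };
const decompositionSpan = recordCall(rootCtx, "large-model", 1200, 300, 0.008);
const researcherCtx = childContext(rootCtx, decompositionSpan, "researcher", "task-1");
const researcherSpan = recordCall(researcherCtx, "small-model", 2000, 800, 0.002);
```

Passing the context explicitly is more verbose than an implicit store, but it makes attribution visible in every function signature and works the same in any runtime.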
Cost tree reconstruction. After session completion, reconstruct the cost tree by following parent span IDs. This gives you: total cost per agent type, cost per routing level, cost per task branch, and cost for failed attempts vs successful attempts. A failing sub-agent that consumed $0.08 before being abandoned is distinctly visible in the cost tree; in flat tracking it simply adds to the session total with no attribution.
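A sketch of that reconstruction from flat span records, assuming a hypothetical SpanRecord shape with a succeeded flag. It indexes spans by parent ID, then derives branch cost, cost per depth level, and failed-versus-successful spend.

```typescript
interface SpanRecord {
  spanId: string;
  parentSpanId: string | null;
  agentId: string;
  costUsd: number;
  succeeded: boolean; // false for abandoned or failed attempts
}

// Index spans by parent ID so the tree can be walked from the root.
function buildChildren(spans: SpanRecord[]): Map<string | null, SpanRecord[]> {
  const children = new Map<string | null, SpanRecord[]>();
  for (const span of spans) {
    const siblings = children.get(span.parentSpanId) ?? [];
    siblings.push(span);
    children.set(span.parentSpanId, siblings);
  }
  return children;
}

// Cost of a span plus everything beneath it (one task branch).
function branchCost(span: SpanRecord, children: Map<string | null, SpanRecord[]>): number {
  return (children.get(span.spanId) ?? []).reduce(
    (sum, child) => sum + branchCost(child, children),
    span.costUsd,
  );
}

// Cost per depth level (index 0 = root), breadth-first.
function costByDepth(spans: SpanRecord[]): number[] {
  const children = buildChildren(spans);
  const levels: number[] = [];
  let frontier: SpanRecord[] = children.get(null) ?? [];
  while (frontier.length > 0) {
    levels.push(frontier.reduce((sum, s) => sum + s.costUsd, 0));
    const next: SpanRecord[] = [];
    for (const s of frontier) next.push(...(children.get(s.spanId) ?? []));
    frontier = next;
  }
  return levels;
}

// Spend on successful attempts versus failed ones.
function failedVsSucceeded(spans: SpanRecord[]): { succeeded: number; failed: number } {
  let succeeded = 0;
  let failed = 0;
  for (const s of spans) {
    if (s.succeeded) succeeded += s.costUsd;
    else failed += s.costUsd;
  }
  return { succeeded, failed };
}

const sampleSpans: SpanRecord[] = [
  { spanId: "root", parentSpanId: null, agentId: "coordinator", costUsd: 0.05, succeeded: true },
  { spanId: "search", parentSpanId: "root", agentId: "web_search", costUsd: 0.04, succeeded: true },
  { spanId: "extract", parentSpanId: "search", agentId: "extractor", costUsd: 0.03, succeeded: true },
  { spanId: "summ", parentSpanId: "extract", agentId: "summarizer", costUsd: 0.02, succeeded: true },
  { spanId: "dead", parentSpanId: "root", agentId: "analyst", costUsd: 0.08, succeeded: false },
];
console.log(costByDepth(sampleSpans), failedVsSucceeded(sampleSpans));
```

In the sample, the abandoned analyst span shows up as $0.08 of failed spend at depth 1, and the search branch (search, extraction, summarization) totals $0.09.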
Storage overhead. A span record for a single LLM call is approximately 200–400 bytes of JSON (IDs, token counts, model, cost, timestamps). A 20-call coordinator session produces 4–8 KB of span data. At 10,000 sessions per day, that is 40–80 MB per day — negligible on any storage system. Store in SQLite with an index on session_id for real-time aggregation or in a log aggregator for batch analytics.
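The arithmetic behind that estimate, as a quick check. The sample record's fields and values are illustrative; real records vary with ID format and how many fields you log.

```typescript
// Illustrative span record: IDs, token counts, model, cost, timestamps.
const sampleSpan = {
  span_id: "3f6c2a9e-8b1d-4c7a-9e2f-5d4b3a2c1e0f",
  parent_span_id: "a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d",
  session_id: "sess-7c1e4b2a-9d3f-4e8a-b6c5-2f1a0e9d8c7b",
  agent_id: "researcher",
  model: "example-model",
  input_tokens: 4213,
  output_tokens: 812,
  cost_usd: 0.024819,
  started_at: "2026-01-15T10:32:07.412Z",
  ended_at: "2026-01-15T10:32:11.903Z",
};

// The JSON is ASCII-only, so string length equals its UTF-8 byte size.
const bytesPerSpan = JSON.stringify(sampleSpan).length;
const callsPerSession = 20;
const sessionsPerDay = 10000;
const bytesPerSession = bytesPerSpan * callsPerSession;
const bytesPerDay = bytesPerSession * sessionsPerDay;

console.log(
  `${bytesPerSpan} B/span, ${(bytesPerSession / 1000).toFixed(1)} KB/session, ` +
    `${(bytesPerDay / 1e6).toFixed(1)} MB/day`,
);
```

This record lands inside the 200–400 byte range quoted above, so the daily volume stays within the 40–80 MB estimate.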

Attribution model 2: per-agent budget envelopes

Budget envelopes as pre-conditions. Rather than tracking cost after the fact, define a budget envelope for each agent role before the session starts. The coordinator gets $0.05 for decomposition and synthesis. Each sub-agent gets $0.03. The coordinator enforces these envelopes when spawning sub-agents by passing a max_cost parameter alongside the task description. Sub-agents check remaining budget before each LLM call and fail gracefully when the envelope is exhausted.
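A minimal sketch of an envelope checked before each call, assuming the coordinator constructs one BudgetEnvelope per sub-agent from the max_cost it passes along. BudgetEnvelope and runSubAgentSteps are hypothetical; amounts are tracked in integer micro-dollars so repeated additions do not drift past the cap through floating-point error.

```typescript
// Convert dollars to integer micro-dollars.
const toMicros = (usd: number): number => Math.round(usd * 1e6);

// Per-agent budget envelope, checked before each LLM call.
class BudgetEnvelope {
  readonly agentId: string;
  private readonly capMicros: number;
  private spentMicros = 0;

  constructor(agentId: string, maxCostUsd: number) {
    this.agentId = agentId;
    this.capMicros = toMicros(maxCostUsd);
  }

  remainingUsd(): number {
    return (this.capMicros - this.spentMicros) / 1e6;
  }

  // Pre-condition: does the estimated cost of the next call still fit?
  canAfford(estimatedCostUsd: number): boolean {
    return this.spentMicros + toMicros(estimatedCostUsd) <= this.capMicros;
  }

  charge(actualCostUsd: number): void {
    this.spentMicros += toMicros(actualCostUsd);
  }
}

// Hypothetical sub-agent loop: returns partial progress instead of
// throwing once the envelope cannot cover the next step.
function runSubAgentSteps(
  envelope: BudgetEnvelope,
  stepCosts: number[],
): { completed: number; exhausted: boolean } {
  let completed = 0;
  for (const cost of stepCosts) {
    if (!envelope.canAfford(cost)) {
      return { completed, exhausted: true };
    }
    envelope.charge(cost); // in practice, the cost reported for the real call
    completed++;
  }
  return { completed, exhausted: false };
}
```

Returning a structured "exhausted" result rather than throwing lets the coordinator decide whether a partial result is good enough.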
Envelope sizing by agent role. Budget envelopes should be sized empirically from production traces, not guessed. Sample 500 successful sessions, compute the 95th-percentile cost for each agent role, and set envelopes at 1.5× that value. This accommodates normal variance while capping pathological outliers. Coordinator agents typically need larger envelopes than sub-agents because coordinator reasoning involves more context (all sub-agent outputs plus original task); a 60/40 split (coordinator 60%, all sub-agents combined 40%) is typical for research-type tasks.
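A sketch of the sizing rule, assuming per-role costs have already been extracted from a sample of successful sessions. It uses the nearest-rank percentile, one common definition among several; interpolating methods give slightly different values on small samples.

```typescript
// Nearest-rank percentile of a non-empty list of values.
function percentile(values: number[], p: number): number {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(0, rank - 1)];
}

// Envelope per role = p-th percentile of observed cost x headroom.
function sizeEnvelopes(
  costsByRole: Record<string, number[]>,
  p = 95,
  headroom = 1.5,
): Record<string, number> {
  const envelopes: Record<string, number> = {};
  for (const role of Object.keys(costsByRole)) {
    envelopes[role] = percentile(costsByRole[role], p) * headroom;
  }
  return envelopes;
}
```

Re-running the sizing periodically on fresh traces keeps envelopes aligned with prompt or model changes that shift per-role cost.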
Cascade failures from tight envelopes. Setting envelopes too tight causes a predictable failure sequence: a sub-agent runs out of budget mid-task and returns a partial result, the coordinator spends additional budget re-routing and re-running the task, and total cost ends up exceeding what an uncapped session would have spent. Tight envelopes do not necessarily reduce costs; they can increase costs by forcing expensive retry loops. Monitor envelope exhaustion rate per agent role; if any role exhausts its envelope on more than 10% of sessions, the envelope is too tight.
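A sketch of that monitor, assuming each session logs a hypothetical EnvelopeOutcome per agent role. The 10% threshold is the rule of thumb above, and a role is flagged only when its rate is strictly above it.

```typescript
interface EnvelopeOutcome {
  role: string;
  exhausted: boolean;
}

// Fraction of sessions in which each role exhausted its envelope.
function exhaustionRates(outcomes: EnvelopeOutcome[]): Map<string, number> {
  const counts = new Map<string, { total: number; exhausted: number }>();
  for (const o of outcomes) {
    const c = counts.get(o.role) ?? { total: 0, exhausted: 0 };
    c.total++;
    if (o.exhausted) c.exhausted++;
    counts.set(o.role, c);
  }
  const rates = new Map<string, number>();
  counts.forEach((c, role) => rates.set(role, c.exhausted / c.total));
  return rates;
}

// Roles whose exhaustion rate exceeds the threshold: likely too tight.
function tooTight(outcomes: EnvelopeOutcome[], threshold = 0.1): string[] {
  const flagged: string[] = [];
  exhaustionRates(outcomes).forEach((rate, role) => {
    if (rate > threshold) flagged.push(role);
  });
  return flagged;
}
```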
Dynamic envelope reallocation. Allow the coordinator to transfer unused budget from one agent’s envelope to another’s. If sub-agent A completes in $0.01 against a $0.03 envelope, the unused $0.02 can be transferred to sub-agent B, which needs more tokens for a complex analysis. This prevents the common failure mode where one high-complexity sub-task fails due to budget while the overall session has budget headroom because other sub-tasks were inexpensive.
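A minimal sketch of the transfer, tracking amounts in integer micro-dollars; the Envelope shape and transferUnused helper are hypothetical. The transfer only makes sense once the donating agent has finished, since it caps that agent at exactly what it spent.

```typescript
// Envelope state in integer micro-dollars (hypothetical shape).
interface Envelope {
  capMicros: number;
  spentMicros: number;
}

const micros = (usd: number): number => Math.round(usd * 1e6);

// Move a finished agent's unused budget to another agent.
// Returns the amount moved, in micro-dollars.
function transferUnused(from: Envelope, to: Envelope): number {
  const unused = Math.max(0, from.capMicros - from.spentMicros);
  from.capMicros -= unused; // donor keeps exactly what it spent
  to.capMicros += unused;
  return unused;
}

const subAgentA: Envelope = { capMicros: micros(0.03), spentMicros: micros(0.01) };
const subAgentB: Envelope = { capMicros: micros(0.03), spentMicros: micros(0.028) };
const moved = transferUnused(subAgentA, subAgentB);
```

After the transfer, sub-agent B's cap rises from $0.03 to $0.05, leaving about $0.022 of headroom instead of $0.002, while the session total is unchanged.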

Coordinator-level cost accounting patterns

Message-passing cost serialization. In multi-agent systems, sub-agents communicate results back to the coordinator as serialized message objects. These messages become input tokens for the coordinator’s synthesis call. A sub-agent that returns a 3,000-token detailed research report is transferring cost to the coordinator: if the coordinator synthesizes four such reports, its synthesis call has a 12,000-token input before even counting the system prompt or task context. Size coordinator budgets to account for the aggregate token cost of all sub-agent messages they must synthesize.
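A back-of-the-envelope sketch of the coordinator's synthesis input. The 800-token system prompt, 500-token task context, and $3/MTok input rate are placeholder values for illustration.

```typescript
// Synthesis-call input: every sub-agent report becomes input tokens,
// on top of the system prompt and task context.
function synthesisInputTokens(
  subAgentOutputTokens: number[],
  systemPromptTokens: number,
  taskContextTokens: number,
): number {
  const reports = subAgentOutputTokens.reduce((sum, t) => sum + t, 0);
  return reports + systemPromptTokens + taskContextTokens;
}

// Input cost at a per-million-token rate.
function inputCostUsd(tokens: number, usdPerMTok: number): number {
  return (tokens / 1_000_000) * usdPerMTok;
}

const tokens = synthesisInputTokens([3000, 3000, 3000, 3000], 800, 500);
console.log(tokens, inputCostUsd(tokens, 3).toFixed(4));
```

Four 3,000-token reports plus the placeholder prompt and context give 13,300 input tokens, about $0.04 per synthesis call at $3/MTok before any output tokens are counted.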
Coordinator model tier optimization. The coordinator is typically run on a frontier model (Sonnet, GPT-4o) for reasoning quality. Sub-agents doing deterministic or structured tasks (data extraction, classification, format conversion) can run on smaller, cheaper models. A coordinator on Sonnet ($3/MTok) directing sub-agents on Haiku ($0.25/MTok) achieves a blended rate of 0.6 × $3 + 0.4 × $0.25 = $1.90/MTok when 60% of tokens flow through the coordinator, about 37% cheaper than running everything on Sonnet; the saving grows as more token volume shifts to sub-agents (a 40/60 split gives $1.35/MTok, 55% cheaper). Model tier assignment per agent role is one of the highest-ROI cost optimizations in multi-agent systems. See LLM model routing cost optimization for routing implementation patterns.
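The blended rate is a token-weighted average of the two per-MTok prices. A small sketch of that arithmetic; the function names are illustrative and the prices are the figures quoted above.

```typescript
// Blended $/MTok when `coordinatorShare` of tokens runs on the
// coordinator's model and the rest on the sub-agents' model.
function blendedRate(coordinatorShare: number, coordinatorRate: number, subAgentRate: number): number {
  return coordinatorShare * coordinatorRate + (1 - coordinatorShare) * subAgentRate;
}

// Coordinator token share needed to reach a target blended rate.
function shareForTarget(target: number, coordinatorRate: number, subAgentRate: number): number {
  return (target - subAgentRate) / (coordinatorRate - subAgentRate);
}

const SONNET = 3.0; // $/MTok, figure from the text
const HAIKU = 0.25; // $/MTok, figure from the text

for (const share of [0.6, 0.4, 0.2]) {
  const rate = blendedRate(share, SONNET, HAIKU);
  const saving = 1 - rate / SONNET;
  console.log(`coordinator share ${share}: $${rate.toFixed(2)}/MTok, ${(saving * 100).toFixed(0)}% cheaper`);
}
```

At a 60/40 split the weighted average works out to $1.90/MTok; reaching $0.75/MTok would require only about 18% of tokens to pass through the coordinator.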
Sub-agent output size contracts. Define explicit output size limits for each sub-agent role. A “search” sub-agent should return at most 2,000 tokens of results. A “summarizer” sub-agent should produce between 200 and 400 tokens. These contracts cap coordinator input tokens, make coordinator cost predictable, and expose sub-agents that fail to respect their contract. Violating a size contract is a signal that the sub-agent’s prompt needs more explicit output constraints, not that the contract is wrong.
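A sketch of a contract check, assuming hypothetical per-role limits and a rough four-characters-per-token estimate standing in for a real tokenizer. In practice, the output token count reported by the model API is the better input.

```typescript
// Hypothetical per-role output contracts, in tokens.
interface OutputContract {
  minTokens: number;
  maxTokens: number;
}

const contracts: Record<string, OutputContract> = {
  search: { minTokens: 0, maxTokens: 2000 },
  summarizer: { minTokens: 200, maxTokens: 400 },
};

// Rough estimate (~4 characters per token for English text); an
// assumption, not a real tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Returns a violation message, or null when the output respects its contract.
function checkContract(role: string, output: string): string | null {
  const contract = contracts[role];
  if (!contract) return null; // no contract defined for this role
  const tokens = estimateTokens(output);
  if (tokens > contract.maxTokens) return `${role}: ${tokens} tokens exceeds max ${contract.maxTokens}`;
  if (tokens < contract.minTokens) return `${role}: ${tokens} tokens below min ${contract.minTokens}`;
  return null;
}
```

Logging violations per role, rather than truncating silently, surfaces the prompts that need tighter output instructions.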
Per-request cost logging for billing attribution. If your multi-agent system serves multiple customers or teams, each session’s cost tree must be attributable to the originating customer for billing or showback. Thread a customer_id and team_id through the context propagation chain alongside agent IDs. At billing time, aggregate the cost tree by customer_id and session_id. This enables per-customer cost reporting, identifies heavy users whose pricing tier is below their actual consumption, and flags customers whose usage patterns are causing outsized infrastructure costs.
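A sketch of the billing roll-up, assuming each cost record carries hypothetical customerId and teamId fields threaded through the same context as the agent IDs. The tier limits are placeholder values.

```typescript
interface BillingRecord {
  customerId: string;
  teamId: string;
  sessionId: string;
  agentId: string;
  costUsd: number;
}

// Roll up spend per customer and, within each customer, per session.
function costByCustomer(records: BillingRecord[]): Map<string, Map<string, number>> {
  const result = new Map<string, Map<string, number>>();
  for (const r of records) {
    const sessions = result.get(r.customerId) ?? new Map<string, number>();
    sessions.set(r.sessionId, (sessions.get(r.sessionId) ?? 0) + r.costUsd);
    result.set(r.customerId, sessions);
  }
  return result;
}

// Customers whose total spend exceeds their pricing tier's allowance.
function customersOverTier(records: BillingRecord[], tierLimitUsd: Map<string, number>): string[] {
  const flagged: string[] = [];
  costByCustomer(records).forEach((sessions, customerId) => {
    let total = 0;
    sessions.forEach((cost) => {
      total += cost;
    });
    const limit = tierLimitUsd.get(customerId);
    if (limit !== undefined && total > limit) flagged.push(customerId);
  });
  return flagged;
}

const records: BillingRecord[] = [
  { customerId: "acme", teamId: "research", sessionId: "s1", agentId: "coordinator", costUsd: 0.1 },
  { customerId: "acme", teamId: "research", sessionId: "s1", agentId: "researcher", costUsd: 0.05 },
  { customerId: "acme", teamId: "ops", sessionId: "s2", agentId: "coordinator", costUsd: 0.2 },
  { customerId: "globex", teamId: "sales", sessionId: "s3", agentId: "coordinator", costUsd: 0.02 },
];
```

Grouping by teamId instead of sessionId in the inner map gives the showback view for internal teams.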

RunGuard BudgetTracker for coordinator-level enforcement

Session-level cost ceiling. RunGuard’s BudgetTracker wraps the coordinator’s main loop with a session-level cost ceiling. Every LLM call made by any agent in the tree (coordinator or sub-agent) is counted against the session ceiling. When cumulative session cost exceeds the ceiling, BudgetTracker throws BudgetExceededError, halting the coordinator before it spawns additional sub-agents. This provides a single enforcement point for overall session cost regardless of how many agent levels deep the spending is happening.
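```typescript
// RunGuard is assumed to be imported from its package; spawnAgent,
// synthesize, and task are application-defined and not shown here.
const guard = new RunGuard({
  budget: { maxCostUsd: 0.25 },   // session ceiling, shared by every agent in the tree
  agentId: 'coordinator'
});

// Top-level await requires an ES module context.
const result = await guard.run(async (ctx) => {
  // Child contexts attribute each sub-agent's calls to its own agentId.
  const subA = spawnAgent('researcher', ctx.childContext({ agentId: 'researcher' }));
  const subB = spawnAgent('extractor', ctx.childContext({ agentId: 'extractor' }));
  // Sub-agents run concurrently; both charge the same session ceiling.
  const [resA, resB] = await Promise.all([subA.run(task), subB.run(task)]);
  // Synthesis runs under the coordinator's own context.
  return await synthesize(resA, resB, ctx);
});
```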
Per-agent cost visibility. When sub-agents are initialized with ctx.childContext(), RunGuard automatically attributes each child agent’s LLM calls to its agentId in the shared cost tree. The coordinator’s dashboard shows a breakdown: “researcher: $0.08; extractor: $0.04; coordinator: $0.06; total: $0.18”. This attribution is available in real time via the RunGuard API, not just at session end, so you can detect a runaway sub-agent before it exhausts the session budget.
Budget exhaustion callbacks. Register an onBudgetWarning callback at 80% of session ceiling. When the warning fires, the coordinator can decide to skip low-priority sub-agents (e.g., skip the “quality checker” agent and return the raw synthesized result) rather than failing the entire session. Graceful degradation under budget pressure produces a useful result at 85% of the planned cost instead of a failed session at 100% of the planned cost plus retry overhead.
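RunGuard's callback signature is not shown here, so the following is a library-agnostic sketch of the same pattern rather than RunGuard code. SessionBudget and runSession are hypothetical: every agent charges one session-wide budget, a callback fires once at 80% of the ceiling, an error halts the session past the ceiling, and the coordinator drops its optional quality-checker step after the warning.

```typescript
// Hypothetical session-wide budget shared by every agent in the tree,
// kept in integer micro-dollars to avoid floating-point drift.
class SessionBudget {
  private spentMicros = 0;
  private warned = false;
  private readonly ceilingMicros: number;
  private readonly warnAtMicros: number;
  private readonly onWarning: (spentUsd: number) => void;

  constructor(ceilingUsd: number, onWarning: (spentUsd: number) => void, warnFraction = 0.8) {
    this.ceilingMicros = Math.round(ceilingUsd * 1e6);
    this.warnAtMicros = Math.round(this.ceilingMicros * warnFraction);
    this.onWarning = onWarning;
  }

  // Charge a call from any agent; throws past the ceiling and fires the
  // warning callback the first time spend reaches the warning threshold.
  charge(agentId: string, costUsd: number): void {
    this.spentMicros += Math.round(costUsd * 1e6);
    if (this.spentMicros > this.ceilingMicros) {
      throw new Error(`session budget exceeded (last charge from ${agentId})`);
    }
    if (!this.warned && this.spentMicros >= this.warnAtMicros) {
      this.warned = true;
      this.onWarning(this.spentMicros / 1e6);
    }
  }
}

// Coordinator sketch: runs the required agents, then skips the optional
// quality checker if the budget warning has already fired.
function runSession(ceilingUsd: number, costs: Record<string, number>): string[] {
  let degraded = false;
  const budget = new SessionBudget(ceilingUsd, () => {
    degraded = true;
  });
  const ran: string[] = [];
  for (const agentId of ["researcher", "extractor", "synthesizer"]) {
    budget.charge(agentId, costs[agentId]);
    ran.push(agentId);
  }
  if (!degraded) {
    budget.charge("quality_checker", costs["quality_checker"]);
    ran.push("quality_checker");
  }
  return ran;
}
```

With a $0.25 ceiling, a session whose required agents spend $0.21 crosses the 80% mark and returns without the quality check, while one that spends $0.18 still runs it.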

Know what every sub-agent is spending before the bill arrives.

RunGuard’s BudgetTracker gives multi-agent coordinators real-time cost visibility across every agent level, with per-agent attribution and session-level ceilings that halt runaway sub-agents before they exhaust your budget.
