Your cost dashboard tells you what you spent. RunGuard stops the run that is actively spending it.

Every major LLM observability platform (LangSmith, Langfuse, Braintrust, Helicone, and the native dashboards from Anthropic and OpenAI) provides cost tracking. You can see daily spend, cost per trace, cost per model, and cost per user. You can set fleet-level alerts: “notify me if daily spend exceeds $500.” These are retrospective instruments. By the time the alert fires, the money is already on your invoice. The alert tells you that you spent $437 in the last 24 hours. It does not tell you that a single agent run is currently in its forty-eighth tool-call iteration, has spent $38 in the last six minutes, and without intervention will cross $100 before the on-call engineer reads the Slack message.

The gap between fleet-level alerting and per-run circuit breaking is not a gap in dashboard coverage; it is a gap in architecture. A dashboard reads historical data. A circuit breaker reads current state, in-process, before the next LLM API call goes out. RunGuard’s BudgetTracker is that circuit breaker: it accumulates the USD cost you compute from each generation’s token usage and throws BudgetExceededError once the run reaches your cap, before the next call goes out. The bill for that next call never lands on your invoice because the call is never made.

Why fleet-level cost alerts are insufficient for runaway agent runs

How BudgetTracker works: accumulate, check, throw or pass

RunGuard’s BudgetTracker follows a three-step lifecycle for every guarded call.

Step one (preflight): before your inner function runs, the tracker checks whether the accumulated spend for this run has already reached maxUsd. If it has, it throws BudgetExceededError immediately: no API call is sent, no tokens are consumed, and no cost lands on the invoice.

Step two (invocation): your inner function runs. The LLM API call goes out, the response comes back, and your function returns a result object that includes a usd field (the cost of this specific generation, computed by you from response.usage and your per-token rate).

Step three (record): the tracker adds usd to the run’s accumulated total. If the new total exceeds maxUsd, it throws BudgetExceededError with the accumulated total in e.spent, so the next call never fires.

The important invariant is that no call goes out once the cap has been reached. Because the cost of a generation is known only after its response returns, a run can overshoot the cap by at most one generation. If the accumulated spend is $4.95, your cap is $5.00, and the next generation costs $0.20, the preflight check passes, the call brings the total to $5.15, step three throws, and nothing further is sent. Spend is therefore bounded by maxUsd plus the cost of one call; if you need a strict ceiling, set maxUsd lower by the cost of your largest expected generation.
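The three steps can be sketched in a few lines of TypeScript. This is a minimal illustration written for this page, not RunGuard’s source: the names SketchBudgetTracker, SketchBudgetExceededError, and withBudget are hypothetical, and the only behavior assumed is the preflight, invoke, record sequence described above. In this sketch a generation’s cost is known only after it returns, so the call that crosses the cap is still recorded and it is the following call that gets blocked.

```typescript
// Hypothetical stand-ins for illustration only; not the @runguard/sdk API.
class SketchBudgetExceededError extends Error {
  readonly spent: number;
  readonly maxUsd: number;

  constructor(spent: number, maxUsd: number) {
    super(`budget exceeded: spent $${spent.toFixed(2)} of $${maxUsd.toFixed(2)}`);
    this.name = "SketchBudgetExceededError";
    this.spent = spent;
    this.maxUsd = maxUsd;
  }
}

class SketchBudgetTracker {
  private spent = 0;
  private readonly maxUsd: number;

  constructor(maxUsd: number) {
    this.maxUsd = maxUsd;
  }

  // Step one: refuse to start a call once the cap has been reached.
  preflight(): void {
    if (this.spent >= this.maxUsd) {
      throw new SketchBudgetExceededError(this.spent, this.maxUsd);
    }
  }

  // Step three: add the cost of the finished call, then throw if over the cap.
  record(usd: number): void {
    this.spent += usd;
    if (this.spent > this.maxUsd) {
      throw new SketchBudgetExceededError(this.spent, this.maxUsd);
    }
  }

  get total(): number {
    return this.spent;
  }
}

type Costed = { usd: number };

// Wraps an async LLM-calling function so every call runs the three steps.
function withBudget<A extends unknown[], R extends Costed>(
  fn: (...args: A) => Promise<R>,
  maxUsd: number,
): (...args: A) => Promise<R> {
  const tracker = new SketchBudgetTracker(maxUsd);
  return async (...args: A): Promise<R> => {
    tracker.preflight();              // step one: check before sending
    const result = await fn(...args); // step two: the real call goes out
    tracker.record(result.usd);       // step three: accumulate and re-check
    return result;
  };
}
```

Keeping the tracker separate from the wrapper makes the accounting testable without any network call: you can feed record() a sequence of costs and assert exactly where the error appears.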

Implementation in TypeScript and Python

Calibrating the per-run cap: the 2–3× rule

The right value for maxUsd depends on your agent’s cost profile on successful runs. The goal is to let legitimate long runs finish while stopping runaway loops before they become expensive.

A practical calibration process: run your agent on a representative sample of real tasks and record the cost of each run, either by summing the usd values your function returns or from your observability platform. Find the 95th-percentile cost across those runs, the value that 95% of legitimate runs stay at or below. Set maxUsd to 2–3× that value. A cap at 2× the 95th-percentile cost lets any legitimate run costing up to twice that figure complete without hitting the guard, while a loop that has already doubled the cost of a near-worst-case legitimate run is stopped.

This heuristic works because loop-induced costs grow quickly: a loop that repeats three times has already spent at least 3× the per-iteration cost, and because each iteration resends a growing context, the total grows faster than linearly with the iteration count. In practice, a cap at 2–3× the 95th-percentile legitimate cost separates most looping runs from expensive-but-legitimate runs.

If your cost distribution has several distinct peaks (some tasks are inherently cheap and others inherently expensive), set different caps per task type rather than a single fleet-wide cap: wrap callModel differently for cheap-task callers and expensive-task callers, or use RunGuard’s reset capability (guard.reset()) to start a fresh budget accumulator for each new task invocation within the same process.
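The calibration arithmetic is small enough to script. The sketch below assumes you already have an array of per-run costs in USD from successful runs; percentile and suggestMaxUsd are hypothetical helper names, not part of RunGuard, and the nearest-rank percentile is just one common definition.

```typescript
// Nearest-rank percentile: the smallest sample with at least p% of
// samples at or below it. Hypothetical helper, not a RunGuard API.
function percentile(values: number[], p: number): number {
  if (values.length === 0) {
    throw new Error("need at least one run cost");
  }
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Suggested per-run cap: a multiplier (2 to 3 per the rule of thumb)
// applied to the 95th-percentile cost of legitimate runs.
function suggestMaxUsd(runCostsUsd: number[], multiplier: number = 2): number {
  if (multiplier < 1) {
    throw new Error("multiplier below 1 would cap legitimate runs");
  }
  return percentile(runCostsUsd, 95) * multiplier;
}
```

With task types that have clearly different cost profiles, call suggestMaxUsd once per task type on that type’s own sample instead of pooling every run into one array.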

Pairing budget caps with loop detection

The horror story: six consecutive $437 billing cycles before the circuit breaker

We built RunGuard while running a bespoke daily script that posts a launch thread via the X API. The script had no loop detection and no budget cap (it made a single API call per day, not an agent loop). The API call came back HTTP 402 CreditsDepleted on day one. Days two through six: same error. Six consecutive sessions, zero progress, but also zero cost escalation because the X API returned an error before we spent anything. The X API failure was not a cost problem; it was a reliability problem.

But the same failure pattern, if it had been a looping agent calling an LLM instead of a looping script calling a social API, would have looked like this: session one returns a wrong result, the agent calls the LLM again with the same input, gets the same wrong result, and calls again, and again, and again. At $0.015 per 1,000 tokens on a cheap model, six iterations of a 10,000-token context cost $0.90. With a growing context (each iteration adds 1,000 tokens of accumulated results, so the calls send 10,000 through 15,000 tokens), the same six iterations cost about $1.13. On a more expensive model like Claude Opus 4.7 at $0.075 per 1,000 input tokens, six iterations of a 10,000-token context cost $4.50, growing to about $5.63 with accumulating context overhead. A cap at maxUsd: 5 would have tripped on that last run at the sixth iteration, when the total crossed $5, and blocked every call after it. On the cheap model, six iterations stay well under the same cap, which is why the cap needs to be calibrated per model and task.

The full incident writeup is on the 30-day log, including the exact signature detection timeline.
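The loop-cost arithmetic can be reproduced with a short function. This is a sketch under stated assumptions: only input tokens are billed, the first iteration sends the base context, and each later iteration sends a fixed number of additional tokens. loopCostUsd and its parameter names are hypothetical.

```typescript
// Total input-token cost of a loop that resends its context on every
// iteration. Iteration i (starting at 0) sends
// baseTokens + i * addedTokensPerIteration tokens.
function loopCostUsd(
  iterations: number,
  baseTokens: number,
  addedTokensPerIteration: number,
  usdPer1kTokens: number,
): number {
  let totalUsd = 0;
  for (let i = 0; i < iterations; i++) {
    const contextTokens = baseTokens + i * addedTokensPerIteration;
    totalUsd += (contextTokens / 1000) * usdPer1kTokens;
  }
  return totalUsd;
}

// Example calls: a flat 10,000-token context versus one that grows
// by 1,000 tokens per iteration, at a rate of $0.015 per 1,000 tokens.
const flatLoopUsd = loopCostUsd(6, 10_000, 0, 0.015);
const growingLoopUsd = loopCostUsd(6, 10_000, 1_000, 0.015);
console.log(flatLoopUsd.toFixed(3), growingLoopUsd.toFixed(3));
```

Output tokens and provider-side caching discounts are ignored here, so treat the result as a rough estimate for input cost only.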

What this is not

The minimum integration

Two lines of new code: import { guard } from "@runguard/sdk" and const guarded = guard(yourLlmCallFn, { budget: { maxUsd: 5 } }). One new return field from your guarded function: usd (the cost of this specific generation, computed from response.usage and your per-token rate). One catch block for BudgetExceededError. That’s the entire budget guard integration. The loop guard and context guard are additive options on the same guard() call. RunGuard ships as @runguard/sdk on npm and runguard on PyPI. The full API surface is in llms.txt. The LangChain circuit breaker page and CrewAI loop detection page show the same budget + loop integration applied to specific frameworks. For context-overflow protection alongside the budget cap, see the context window truncation alert page.