A runtime budget alert for OpenAI AgentKit agents

AgentKit ships a tracing dashboard that shows you what every Runner.run() spent after the fact. That is a finance tool. A budget alert is a different tool: it sits in-process, watches the cumulative dollar count of the run that’s happening right now, and stops the next turn before the LLM call goes out once the cap is crossed. max_turns bounds the loop; it does not bound the bill. This page covers the runtime budget breaker we ship and how it slots around an AgentKit Runner.run() in eight lines of Python.

Where the dollars actually accumulate inside an AgentKit run

What AgentKit’s existing knobs give you and what they don’t

AgentKit’s primitives are correct in shape and wrong in unit. max_turns on Runner.run() is a count, not a dollar cap; a turn on a 30K-token thread costs five times what a turn on a 6K-token thread costs, and the cap doesn’t know the difference. tool_use_behavior controls when the loop ends after a tool call, not how much each turn cost. Input and output guardrails are content checks — PII detection, jailbreak filters, output schema validation — not dollar checks; a guardrail can refuse to ship a response that mentions a credit card, but it cannot refuse to start a turn that would push the run past $50. Tracing is the dashboard view of what already happened: the trace is uploaded after the run completes, and the developer reads it the next morning. None of these look at cumulative dollars spent so far in this run and none of them stop the next turn before it fires. A run that legitimately needs eight turns to research and summarise and a run that fires the same broken tool call in a loop until max_turns hits both look identical to the executor — they just produce different invoices.
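The unit mismatch is easy to put in numbers. A quick sketch, with an assumed per-token input price for a gpt-4o-class model (the rate is illustrative, not quoted from a price sheet):

```python
# Illustrative price: USD per input token for a gpt-4o-class model.
PRICE_PER_INPUT_TOKEN = 2.50 / 1_000_000

def turn_input_cost(thread_tokens: int) -> float:
    # Every turn re-sends the whole thread, so input cost scales with
    # conversation length even when max_turns stays fixed.
    return thread_tokens * PRICE_PER_INPUT_TOKEN

small = turn_input_cost(6_000)    # ~$0.015 per turn on a 6K-token thread
large = turn_input_cost(30_000)   # ~$0.075 per turn: 5x the cost, same turn count
```

Ten turns under the same max_turns cap cost about $0.15 on the short thread and $0.75 on the long one; the turn counter cannot tell them apart.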

What a runtime budget alert actually has to do

Wrapping Runner.run() with runguard

# OpenAI AgentKit + runguard. The Agent stays an Agent; we wrap the Runner
# step so the budget tracker sees every paid call and trips before the next.
from agents import Agent, Runner
from runguard import guard, BudgetExceededError, LoopDetectedError

# search_web and summarize stand in for whatever function tools you register.
researcher = Agent(name="researcher", instructions="...", tools=[search_web, summarize])

async def _run_step(payload):
    # One Runner.run() call. Returns the final output plus the dollar number
    # the tracker needs to push into the rolling-window ledger.
    result = await Runner.run(payload["agent"], input=payload["input"])
    usage = result.context_wrapper.usage  # token counts from the model client
    # Usage reports tokens, not dollars; derive USD from your model's price
    # sheet (illustrative gpt-4o-class rates, USD per 1M tokens).
    usd = usage.input_tokens * 2.50 / 1e6 + usage.output_tokens * 10.00 / 1e6
    last = result.new_items[-1] if result.new_items else None
    return {"final_output": result.final_output,
            "usd": usd,
            "last_kind": getattr(last, "type", "end")}

guarded_run = guard(
    _run_step,
    signature=lambda i: f"agentkit:{i['agent'].name}:{i['input'][:64]}",
    budget={"max_usd": 5, "window_ms": 60_000},
    loop={"repeats": 3, "max_cycle_len": 8},
    cost=lambda _i, o: o["usd"],
    on_trip=lambda e: print("[runguard]", e["reason"], e.get("spent"), "of", e.get("cap")),
)

try:
    out = await guarded_run({"agent": researcher, "input": "Brief me on Q3 SEC filings for $TICK"})
except (BudgetExceededError, LoopDetectedError) as e:
    print("halted:", e)

The budget primitive is the BudgetTracker shipped at product/sdk/src/budget.ts: maxUsd for the cap, optional windowMs for rolling-window throttles, an add(usd) the host calls post-call, and an exceeded() the wrap reads pre-call — a hard cap with a rolling-window option, no daemon, no telemetry. The BudgetTracker file is 84 lines; the LoopDetector at product/sdk/src/loop-detector.ts is 111 lines. The defaults are honest: $5 per run is generous enough for a research orchestration on a frontier model and low enough that a stuck tool-call loop doesn’t become a six-figure incident. The same wrap watches for loops on the same step signature; the fingerprint-and-window approach is documented in how to detect LLM tool-call loops in production, alongside the LangChain wrap, the multi-agent CrewAI wrap, and the browser-use wrap.

How the breaker behaves inside Runner.run()
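The check order is the whole trick: exceeded() is read before the step fires, add(usd) is called after it returns, so the breaker refuses to start a turn rather than killing one mid-flight. A simplified sketch of that ordering (a plain closure with a cumulative cap, standing in for the real guard() with its window and typed errors):

```python
def make_guard(step, max_usd: float):
    spent = 0.0

    async def guarded(payload):
        nonlocal spent
        # Pre-call check: trip before the next paid LLM call goes out.
        if spent >= max_usd:
            raise RuntimeError(f"budget exceeded: ${spent:.2f} of ${max_usd:.2f}")
        out = await step(payload)
        # Post-call accounting: this step's dollars count against the next check.
        spent += out["usd"]
        return out

    return guarded
```

A step that costs $2 runs three times under a $5 cap (spend reaches $6 only after the third step returns); the fourth attempt is refused before any request is made.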

Tuning for AgentKit cost shapes

AgentKit’s default max_turns on Runner.run() is 10. On gpt-4o at a typical research-orchestration prompt size, a mid-run turn lands around $0.05–$0.20 of input tokens before assistant output, climbing as the conversation grows. The default max_usd: 5 on the budget tracker therefore buys roughly 25–100 turns on input alone (closer to 12–50 once assistant output roughly doubles the per-turn price) — an honest research run with two or three handoffs, generous enough that legitimate workflows finish, tight enough that a stuck retry loop trips the breaker before the bill triples. For long-running orchestrations (a continuous monitor agent, a daily digest agent), set window_ms: 60_000 with the same max_usd: 5: the cap rolls, old spend evicts, and while the cumulative invoice over an hour is unbounded, the per-minute spike is capped. For high-stakes work where an over-spend is worse than an under-spend (production fan-out, paid lead enrichment), drop to max_usd: 1 — a tighter cap costs you one re-run on legitimate workflows; a looser cap costs you one Friday-night incident. Stack the budget guard with the loop detector on the same wrap: a stuck retry loop usually trips the loop guard first (the signature repeats fast on identical tool arguments), while a slow-burn drift on slightly-different-each-time tool inputs trips the budget instead — both stop the run, both leave a typed error, both are cheap to retry.

Loop detection on the same wrap
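In the same spirit, a minimal sketch of fingerprint-and-window loop detection, assuming semantics matching the repeats / max_cycle_len knobs above (illustrative; the shipped LoopDetector’s internals may differ): record each step’s signature and trip when the tail of the history is one cycle, of length up to max_cycle_len, repeated repeats times.

```python
class LoopDetected(Exception):
    pass

class LoopDetector:
    """Illustrative fingerprint-and-window loop detection; not the
    shipped 111-line implementation."""

    def __init__(self, repeats: int = 3, max_cycle_len: int = 8):
        self.repeats = repeats
        self.max_cycle_len = max_cycle_len
        self.history: list[str] = []

    def record(self, signature: str) -> None:
        # Called once per step with the step's fingerprint, e.g.
        # "agentkit:researcher:Brief me on..." or "post_tweet:402:CreditsDepleted".
        self.history.append(signature)
        for k in range(1, self.max_cycle_len + 1):
            tail = self.history[-k * self.repeats:]
            # Trip when the last k*repeats signatures are one k-cycle repeated.
            if len(tail) == k * self.repeats and tail == tail[:k] * self.repeats:
                raise LoopDetected(f"cycle {tail[:k]} repeated {self.repeats}x")
```

This catches both the flat case (the same signature three times in a row) and short A/B cycles, such as a planner alternating between two stuck tool calls.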

The first loop our SDK caught was ours

It wasn’t an AgentKit run — it was our own launch script firing a six-tweet thread against a paid X API. The first attempt came back with HTTP 402 CreditsDepleted. Six consecutive sessions later, six identical signatures — post_tweet:402:CreditsDepleted — were sitting in a flat JSON file on disk. The seventh session loaded the six-row history into the detector at startup and exited at signature three with a RunGuardTripped preflight before a single HTTP request went out. It has held the breaker open every session since. Read the dogfood story on the 30-day log; the same pattern slots into an AgentKit run when a planner replans the same stuck tool against the same arguments three times in a row.

What this is not

The minimum AgentKit integration

One pip install runguard, one guard() wrap around a thin _run_step that calls Runner.run() and pulls the dollar number from result.context_wrapper.usage, and one on_trip that pages the channel you actually read. Eight lines of wrap, no guardrail subclass to register, no Runner override, no agent mixin. The breaker trips on the dollar cap or the third repeat of any agent-input signature, halts the orchestration, and leaves a structured event and a typed error behind for the post-mortem. RunGuard ships it as runguard on PyPI and @runguard/sdk on npm — same primitive, both runtimes, in-process, zero deps.