A runtime cost cap for browser-use agents

Browser-use gives every Agent a max_actions_per_step, a max_failures, and an outer max_steps on Agent.run(). They are the right primitives for keeping the loop bounded, but they do not put a dollar cap on the run. A planner LLM call plus 4–6 browser actions per step, multiplied across 100 steps with a frontier model and a paid proxy pool, adds up to the kind of bill that arrives the morning after a stuck button traps the agent on a checkout page. This page covers the runtime budget breaker we ship and how it slots into a browser-use step in eight lines of Python.

Where the dollars actually accumulate inside a browser-use run

Why max_steps, max_actions_per_step, and max_failures don’t cap the bill

The three knobs browser-use exposes are correct in shape and wrong in unit. max_steps on Agent.run() is a count, not a dollar cap; a step on a 30K-token screenshot costs ten times what a step on a 3K-token screenshot does, and the cap doesn’t know the difference. max_actions_per_step bounds actions inside a step, which is useful, but the planner LLM call is per-step, not per-action, so the dominant per-step cost is unbounded by it. max_failures only counts hard exceptions; a no-op click that the planner quietly replans is not a failure to that counter, just another step. None of the three look at cumulative dollars spent so far. A run that legitimately needs 40 steps to fill a form and a run that fires the same broken click 40 times at $0.20 each look identical to the executor — they just produce different invoices.
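To put numbers on the unit mismatch, here is a back-of-the-envelope sketch. The $2.50-per-million-token input rate is an assumed gpt-4o-class price, not a quote, and assistant output is ignored:

```python
# Back-of-the-envelope: identical max_steps, 10x the bill.
PRICE_PER_INPUT_TOKEN = 2.50 / 1_000_000  # USD; assumed rate, check your model's pricing

def run_cost(steps: int, tokens_per_step: int) -> float:
    """Input-token cost of a run; assistant output ignored for simplicity."""
    return steps * tokens_per_step * PRICE_PER_INPUT_TOKEN

light = run_cost(100, 3_000)   # lean pages
heavy = run_cost(100, 30_000)  # dense checkout-page screenshots
print(f"light: ${light:.2f}  heavy: ${heavy:.2f}")  # same step count, 10x apart
```

max_steps=100 admits both runs; only a dollar-denominated cap tells them apart.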

What a runtime cost cap actually has to do

Wrapping a browser-use step with runguard

# browser-use + runguard. The Agent stays an Agent; we wrap the planner
# step so the budget tracker sees every paid call and trips before the next.
from browser_use import Agent
from langchain_openai import ChatOpenAI
from runguard import guard, BudgetExceededError, LoopDetectedError

llm = ChatOpenAI(model="gpt-4o")

async def _plan_and_act(payload):
    # Wraps the planner+action round trip for one step.
    result = await payload["agent"].step()
    usage = result.usage  # tokens + usd from the model client
    return {"action": result.action, "usd": usage.total_cost_usd}

guarded_step = guard(
    _plan_and_act,
    signature=lambda i: f"step:{i['agent'].state.url}:{i['agent'].state.last_action}",
    budget={"max_usd": 5, "window_ms": 60_000},
    loop={"repeats": 3, "max_cycle_len": 8},
    cost=lambda _i, o: o["usd"],
    on_trip=lambda e: print("[runguard]", e["reason"], e.get("spent"), "of", e.get("cap")),
)

import asyncio

agent = Agent(task="Book the cheapest one-way flight to Lisbon for next Friday", llm=llm)

async def main():
    while not agent.state.done:
        try:
            await guarded_step({"agent": agent})
        except (BudgetExceededError, LoopDetectedError) as e:
            print("halted:", e)
            break

asyncio.run(main())

The budget primitive is the BudgetTracker shipped at product/sdk/src/budget.ts: maxUsd for the cap, optional windowMs for rolling-window throttles, an add(usd) the host calls post-call, and an exceeded() the wrap reads pre-call — a hard cap with a rolling-window option, no daemon, no telemetry. Defaults are honest: $5 per run is enough for a non-trivial form-fill on a frontier model, and low enough that a stuck button costs you the cap, not the morning-after bill. The same wrap watches for loops on the same step signature; the fingerprint-and-window approach is documented at how to detect LLM tool-call loops in production; the LangChain-tool wrap is here; the multi-agent CrewAI wrap is here.

How the breaker behaves inside Agent.run()

Tuning for browser-use cost shapes

Browser-use defaults to max_steps=100 on Agent.run(). On gpt-4o at typical web-element-extraction prompt sizes, a step lands around $0.05–$0.15 of input tokens before assistant output. The default max_usd: 5 on the budget tracker therefore buys roughly 100 steps on the light end and about 33 on the heavy end, fewer once assistant output is counted — an honest one-shot form-fill or research run, generous enough that legitimate workflows finish, tight enough that a stuck button trips the breaker before the bill triples. For long-running agents (an hourly scraper, a continuous monitoring agent), set window_ms: 60_000 with the same max_usd: 5: the cap rolls, old spend evicts, and the cumulative invoice over an hour is unbounded but the per-minute spike is capped. For high-stakes work where an over-spend is worse than an under-spend (production checkout flow, paid lead capture), drop to max_usd: 1 — a tighter cap costs you one re-run on legitimate workflows; a looser cap costs you one Friday-night incident. Stack the budget guard with the loop detector on the same wrap: a stuck-click loop usually trips the loop guard first (signature repeats fast), but a slow-burn loop on slightly-different-each-time DOM hashes trips the budget instead. Both stop the agent, both leave a typed error, both are cheap to retry.
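The three profiles reduce to three budget dicts passed to the guard() wrap shown earlier; a sketch of the configuration, with values taken from the guidance above:

```python
# Budget profiles for guard(budget=...); numbers from the tuning guidance above.
ONE_SHOT = {"max_usd": 5}                           # bounded form-fill or research run
LONG_RUNNING = {"max_usd": 5, "window_ms": 60_000}  # per-minute rolling cap
HIGH_STAKES = {"max_usd": 1}                        # over-spend worse than a re-run
```

Swapping profiles is the only change between a one-shot run and a continuous agent; the wrap, signature, and cost function stay identical.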

Loop detection on the same wrap

The first loop our SDK caught was ours

It wasn’t a browser-use run — it was our own launch script firing a six-tweet thread against a paid X API. The first attempt came back with HTTP 402 CreditsDepleted. Six consecutive sessions later, six identical signatures — post_tweet:402:CreditsDepleted — were sitting in a flat JSON file on disk. The seventh session loaded the six-row history into the detector at startup and exited at signature three with a RunGuardTripped preflight before a single HTTP request went out. It has held the breaker open every session since. Read the dogfood story on the 30-day log; the same pattern slots into a browser-use run when a planner replans the same stuck click against the same URL three times.
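A simplified sketch of that persistence pattern, using a consecutive-repeat check rather than the SDK's full fingerprint-and-window detector; the file path, threshold, and function names here are illustrative, not the shipped code:

```python
import json
from pathlib import Path

class LoopDetectedError(RuntimeError):
    """Same signature repeated past the threshold."""

HISTORY = Path("runguard_history.json")  # illustrative path, not the SDK's
REPEATS = 3                              # trip on the third identical signature

def load_history() -> list[str]:
    """Reload past signatures at startup so a loop survives process restarts."""
    return json.loads(HISTORY.read_text()) if HISTORY.exists() else []

def record(signature: str, history: list[str]) -> None:
    """Persist the signature, then trip if the last REPEATS entries are identical."""
    history.append(signature)
    HISTORY.write_text(json.dumps(history))
    tail = history[-REPEATS:]
    if len(tail) == REPEATS and len(set(tail)) == 1:
        raise LoopDetectedError(f"{signature!r} repeated {REPEATS}x across sessions")
```

Because the signature is written before the check runs, a fresh process that calls load_history() inherits the previous sessions' repeats and can trip preflight, before any paid call goes out — the behavior the launch-script story above describes.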

What this is not

The minimum browser-use integration

One pip install runguard, one guard() wrap around the per-step round trip, one cost function that pulls a dollar number from the model SDK’s usage field, and one on_trip that pages the channel you actually read. Eight lines of wrap, no callback to register, no controller subclass, no agent mixin. The breaker trips on the dollar cap or the third repeat of any step signature, halts the run, and leaves a structured event and a typed error behind for the post-mortem. RunGuard ships it as runguard on PyPI and @runguard/sdk on npm — same primitive, both runtimes, in-process, zero deps.