Weights & Biases Weave vs RunGuard: circuit breakers for production AI agents

Weights & Biases built Weave to be for LLM calls and agent traces what W&B has always been for ML training: a tracking and evaluation layer. It integrates deeply with the W&B ecosystem, so runs, artifacts, evaluations, and dashboards all live in the same platform you use for model training. Like every other observability platform, though, it works under one architectural constraint: it records what happened; it does not interrupt what is happening. RunGuard is the interrupt. This page explains the distinction precisely and shows how the two tools compose.

W&B Weave: tracking-first LLM observability

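Weave is W&B’s LLM observability and evaluation product. It instruments LLM calls via a @weave.op() decorator (Python) or a JS wrapper, groups them into call trees, and stores every input/output pair in the W&B cloud. From there you can browse call traces, version prompts through the W&B Artifacts integration, and run dataset-based evaluations with Weave Evaluations.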

This is excellent infrastructure for the engineering workflow around LLM applications. It answers: “Is prompt version B better than A on this dataset?”, “What fraction of my agent’s calls are hallucinating?”, “What was the exact input to the model on the run that produced a bad output?”

It does not answer “How do I stop this run from making 25 identical calls?” because that question requires in-process, synchronous intervention — the opposite architectural direction from a write-side tracer.

Why @weave.op() does not stop loops

When you decorate a function with @weave.op(), Weave wraps it to capture the call inputs, outputs, and timing, then ships those to the W&B backend. The wrapped function still executes normally; the decorator adds a logging side-effect, not a gate.
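To make the "side-effect, not a gate" point concrete, here is a minimal sketch of a tracing decorator in the same spirit as @weave.op(). It is not Weave's implementation: trace, trace_log, and the stub web_search are hypothetical names for illustration. The structural point is that the wrapper records inputs, output, and timing but always calls the real function, so three identical calls produce three identical records.

```python
import functools
import time

trace_log = []  # hypothetical stand-in for the records a tracer ships to its backend


def trace(func):
    """Hypothetical minimal tracer: it logs every call and never blocks one."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)  # the real function always runs
        trace_log.append({
            "op": func.__name__,
            "inputs": {"args": args, "kwargs": kwargs},
            "output": result,
            "seconds": time.perf_counter() - start,
        })
        return result

    return wrapper


@trace
def web_search(query):
    # Hypothetical pagination bug: always returns page 1, so the agent asks again
    return {"query": query, "page": 1}


for _ in range(3):
    web_search("AI regulations 2026")
```

After the loop, trace_log holds three records with identical inputs and outputs. Nothing in the wrapper consults earlier records before running the function, which is why a tracer alone cannot stop the repetition.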

If your agent calls web_search("AI regulations 2026") three times in a row because a pagination bug keeps it from advancing past page 1, Weave records three calls to web_search, all with the same inputs and the same outputs. You will see the pattern clearly in the Weave UI after the run completes. But the decorator had no access to the call history, no window into what the previous two calls returned, and no mechanism to raise an exception on the third occurrence.

The fix would require you to add loop-detection logic explicitly to the decorated function or its caller — which is exactly what RunGuard’s guard() wrapper does, as a pre-built, configurable primitive.
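Hand-rolling that logic looks roughly like the sketch below. It is a hypothetical illustration of the technique, not RunGuard's code or API: loop_guard, RepeatedCallError, and the signature format are invented names, and the window/threshold semantics are an assumption chosen to mirror the loop_window and loop_threshold parameters used in the RunGuard example on this page. The key difference from a tracer is that the check runs before the wrapped function and can refuse to call it.

```python
import functools
from collections import deque


class RepeatedCallError(RuntimeError):
    """Hypothetical error: one call signature repeated too often."""

    def __init__(self, signature, count):
        super().__init__(f"{signature} repeated {count} times")
        self.signature = signature
        self.count = count


def loop_guard(window=24, threshold=3):
    """Hypothetical hand-rolled loop detector (not RunGuard's implementation).

    Remembers the signatures of the last `window` calls and refuses to run a
    call whose signature already appears `threshold` times in that window.
    """

    def decorator(func):
        recent = deque(maxlen=window)  # oldest signatures fall out automatically

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Fingerprint the call: function name plus a repr of its arguments
            signature = f"{func.__name__}{args!r}{sorted(kwargs.items())!r}"
            seen = recent.count(signature)
            if seen >= threshold:
                # A gate, not a log: the wrapped function does not run this time
                raise RepeatedCallError(signature, seen)
            recent.append(signature)
            return func(*args, **kwargs)

        return wrapper

    return decorator


executed = []  # records which calls actually reached the function body


@loop_guard(window=24, threshold=3)
def web_search(query):
    executed.append(query)
    return {"query": query, "page": 1}  # hypothetical pagination bug: always page 1


tripped = None
try:
    for _ in range(25):
        web_search("AI regulations 2026")
except RepeatedCallError as err:
    tripped = err  # keep a reference; `err` is unbound after the except block
```

With threshold=3, the first three identical calls run and the fourth raises before the function body executes, so executed ends up with three entries instead of 25. A production version would also have to decide how to fingerprint arguments whose repr() is unstable (timestamps, request IDs), which this sketch ignores.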

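How RunGuard’s circuit breaker is different

RunGuard’s guard() also wraps the function, but it acts in-process and synchronously, before the call executes. It fingerprints each call’s signature and raises LoopDetectedError when the same signature repeats loop_threshold times within a configurable window (loop_window). It raises BudgetExceededError when a run’s accumulated spend passes the BudgetTracker ceiling, and ContextOverflowError when context usage crosses a configurable threshold. Because the check never leaves the process, it needs no backend round-trip to stop a run.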

Using Weave and RunGuard together in Python

# Both decorators stack cleanly: apply RunGuard first (innermost), Weave second
import asyncio

import weave
from runguard import guard, BudgetTracker, LoopDetectedError, BudgetExceededError

weave.init("my-agent-project")

tracker = BudgetTracker(max_usd=3.0)

@weave.op()
@guard(budget=tracker, loop_window=24, loop_threshold=3)
async def web_search(query: str) -> dict:
    # Weave logs every call to W&B; RunGuard trips after 3 identical calls
    # search_api is a placeholder for your own async search client
    response = await search_api.search(query)
    return response

# In your agent loop (top-level await is not valid in a script, so wrap it):
async def run_agent() -> None:
    try:
        # A pagination bug makes the agent re-issue the same query every step
        for _ in range(25):
            await web_search("AI regulations 2026")
    except LoopDetectedError as e:
        # Weave has the call tree up to the trip point in W&B
        # RunGuard stopped the run before call #4
        print(f"Loop detected: {e.signature} repeated {e.count} times")
    except BudgetExceededError as e:
        print(f"Budget exceeded: ${e.accumulated_usd:.2f} of ${e.max_usd:.2f}")

asyncio.run(run_agent())

Decorator order matters. Python applies decorators bottom-up, so @guard wraps the function directly and @weave.op() wraps the guarded function. At call time, Weave’s wrapper runs first and opens the call record; the guard’s fingerprint check then runs before the underlying function executes. When the guard raises, Weave records the exception as the call result, which is exactly right: you want the loop-detected event in your W&B trace.
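You can check what this stacking does at call time with plain Python and no W&B account. In this sketch, log_calls and block_all are hypothetical stand-ins for @weave.op() and @guard respectively; neither is the real library code. The outer wrapper records the start of the call, the inner one refuses it, and the outer one then records the exception.

```python
import functools

events = []  # hypothetical stand-in for the trace store a logging decorator writes to


def log_calls(func):
    """Outer wrapper: hypothetical stand-in for a tracing decorator like @weave.op()."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        events.append(("start", func.__name__))
        try:
            result = func(*args, **kwargs)
        except Exception as exc:
            # The exception raised by the inner gate is recorded as the call result
            events.append(("error", type(exc).__name__))
            raise
        events.append(("ok", result))
        return result

    return wrapper


def block_all(func):
    """Inner wrapper: hypothetical stand-in for a gate like @guard that refuses the call."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        raise PermissionError("gate refused the call")

    return wrapper


@log_calls   # applied last, so it runs first on every call
@block_all   # applied first, so it sits closest to the function
def search(query):
    events.append(("executed", query))  # never reached: the gate refuses first
    return query


try:
    search("AI regulations 2026")
except PermissionError:
    pass  # expected: the gate refused the call
```

After the call, events contains a "start" record followed by an "error" record, and no "executed" entry. Swapping the two decorators would reverse this: the gate would run first, the tracer would never see refused calls, and the trip would vanish from the trace.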

Side-by-side capability comparison

Capability | W&B Weave | RunGuard
LLM call tracing and storage | Yes (@weave.op() decorator) | No (use Weave for this)
Prompt version tracking | Yes (W&B Artifacts integration) | No
Dataset-based evaluation | Yes (Weave Evaluations) | No
Halt on repeated call signature | No | Yes (raises LoopDetectedError)
Per-run USD budget enforcement | No (cost visibility only) | Yes (raises BudgetExceededError)
Context-window overflow protection | No | Yes (ContextOverflowError at a configurable threshold)
Real-time Slack alert on circuit trip | No (W&B alerts are post-run) | Yes (webhook fires synchronously on trip)
Works without a backend round-trip | No (logs go to W&B cloud) | Yes (in-process only)
Python support | Yes | Yes
TypeScript/JavaScript support | Yes (limited) | Yes (first-class)
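The “Per-run USD budget enforcement” row is the same gate-versus-log distinction applied to spend. The sketch below shows one way to turn a running cost total into enforcement. RunBudget, BudgetExceeded, reserve(), and record() are hypothetical names, not RunGuard’s BudgetTracker API, and the flat per-call cost estimate is an assumption the caller would have to supply.

```python
class BudgetExceeded(RuntimeError):
    """Hypothetical error carrying the spend so far and the ceiling."""

    def __init__(self, accumulated_usd, max_usd):
        super().__init__(f"Budget exceeded: ${accumulated_usd:.2f} of ${max_usd:.2f}")
        self.accumulated_usd = accumulated_usd
        self.max_usd = max_usd


class RunBudget:
    """Hypothetical per-run spend tracker that refuses calls past a USD ceiling."""

    def __init__(self, max_usd):
        self.max_usd = max_usd
        self.accumulated_usd = 0.0

    def reserve(self, estimated_usd):
        # Pre-call gate: refuse a call whose estimated cost would cross the ceiling
        if self.accumulated_usd + estimated_usd > self.max_usd:
            raise BudgetExceeded(self.accumulated_usd, self.max_usd)

    def record(self, actual_usd):
        # Post-call bookkeeping with the cost the provider actually reported
        self.accumulated_usd += actual_usd


budget = RunBudget(max_usd=3.0)
completed = 0
stopped_at = None
try:
    for _ in range(25):
        budget.reserve(1.0)  # assumed flat $1.00 estimate per call
        completed += 1       # the model call would happen here
        budget.record(1.0)
except BudgetExceeded as err:
    stopped_at = err.accumulated_usd
```

With a $3.00 ceiling and $1.00 calls, three calls complete and the fourth is refused before it runs. Gating on an estimate and recording the actual cost means a run can still overshoot slightly when a call costs more than estimated; a purely post-call check would overshoot by up to one full call.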

The production case for layering both

Teams that use W&B for model training already have strong habits around experiment tracking and evaluation. Adding Weave to the LLM side of their stack is a natural extension — the same mental model, the same platform, the same dashboards their data science team already knows.

RunGuard fills the production-safety gap that Weave (by design) does not address. When a new agent version ships and hits an edge case that turns into a loop, Weave will have the trace data you need to understand and fix it. RunGuard will have stopped the loop at 3 iterations instead of 25, capping the cost and the blast radius.

For teams already deep in the W&B ecosystem, the layering pattern is: @guard innermost for safety, @weave.op() outer for observability. Both decorators see every call. The guard decides whether the call goes through; Weave logs whatever happens.

Add a circuit breaker to your Weave-instrumented agents

RunGuard takes five minutes to add to an existing Python or TypeScript agent. Set a dollar ceiling, set a loop threshold, and deploy. Your W&B traces get more informative (looping runs terminate early with a typed error); your cost surprises stop.

Get started with RunGuard — or compare it to Braintrust, Arize Phoenix, and Langfuse.