Langfuse doesn’t have a real-time circuit breaker. Here’s what does.

Langfuse is an open-source LLM observability platform. You install the Langfuse SDK, wrap your LLM calls with langfuse.generation() (or put the @observe decorator on the function that calls your model), and Langfuse records each generation’s model name, prompt, completion, token counts, cost, latency, and metadata into a trace. After the run, those traces are queryable on cloud.langfuse.com or your self-hosted instance. You can run evals on them, score them, build dashboards, configure online sampling policies, and set up alerts in your own monitoring stack that fire when the daily cost Langfuse reports exceeds a threshold. Langfuse is excellent at all of this.

What Langfuse is not is a real-time circuit breaker: there is no onBudgetExceeded callback on langfuse.trace(), no hook the SDK calls before each generation() to ask “should this call fire?”, and no in-process mechanism that throws BudgetExceededError when your agent crosses $5 of cumulative spend. The Langfuse SDK records what happened; it does not gate what can happen next. The gap is real, the cost of leaving it open is measurable, and this page explains what sits on the other side of it.

What Langfuse actually gives you

What Langfuse does not have, and why it matters

The Langfuse SDK surfaces cost per generation in the UI. It does not provide a langfuse.checkBudget(maxUsd) function you can call before your LLM API call. It does not expose the current run’s cumulative spend as a synchronous numeric value on the trace object. It does not have an onCostCapExceeded callback you can wire to a throw. The Langfuse TypeScript SDK’s LangfuseGenerationClient type has update() and end() methods, both of which record data; neither returns a boolean your code can branch on to stop the next generation. The @observe Python decorator similarly wraps the function transparently: your function runs, Langfuse records the result. If you want to add “abort if cost crosses $5” behaviour, you write it yourself: keep your own running total summed from each response.usage, compare it to your cap, and raise BudgetExceeded. Langfuse will dutifully record the last generation’s cost, but it is not the thing that stops the run.

Similarly, Langfuse does not detect tool-call loops. The SDK records each generation’s input and output; a human reading the trace UI can see that the same tool was called with the same arguments five times in a row, but the SDK did not interrupt the fifth call. There is no langfuse.assertNoLoop() primitive, no fingerprint-and-window comparator that runs between each generation, and no typed LoopDetectedError that surfaces in the calling code.

This matters because the pattern that kills teams is not a single expensive generation but a repeated pattern that compounds: the same tool call, the same 4XX error, the same retry with the same payload, eight times in a row. At $0.03 per generation on claude-sonnet-4-6 that’s $0.24; at $0.15 per generation on claude-opus-4-7 that’s $1.20; and a more complex agent that loops 50 times over a weekend while no human is watching converts those per-call numbers into a Monday-morning invoice. The trace is excellent evidence for the post-mortem. It does not stop the fiftieth call.
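The hand-rolled cap described above (sum the cost from each response.usage, compare it to the cap, raise) might look like the following sketch. SpendTracker, BudgetExceeded, and the Usage shape are illustrative names, not part of the Langfuse SDK or of RunGuard, and the per-token rates are the Sonnet figures used in the wrap example further down this page; substitute your own model’s pricing.

```typescript
// Hand-rolled budget cap: an illustrative sketch, not a Langfuse or RunGuard API.

interface Usage {
  input_tokens: number;
  output_tokens: number;
}

// Assumed per-token rates in USD; replace with your model's actual pricing.
const INPUT_USD_PER_TOKEN = 3e-6;
const OUTPUT_USD_PER_TOKEN = 15e-6;

class BudgetExceeded extends Error {
  readonly spent: number;
  readonly cap: number;
  constructor(spent: number, cap: number) {
    super(`spent ${spent.toFixed(4)} USD, cap is ${cap.toFixed(2)} USD`);
    this.name = "BudgetExceeded";
    this.spent = spent;
    this.cap = cap;
  }
}

class SpendTracker {
  private spentUsd = 0;
  private readonly capUsd: number;

  constructor(capUsd: number) {
    this.capUsd = capUsd;
  }

  // Call after each response: adds that generation's cost to the running total.
  record(usage: Usage): number {
    this.spentUsd +=
      usage.input_tokens * INPUT_USD_PER_TOKEN +
      usage.output_tokens * OUTPUT_USD_PER_TOKEN;
    return this.spentUsd;
  }

  // Call before each request: throws once the running total has reached the cap.
  assertUnderCap(): void {
    if (this.spentUsd >= this.capUsd) {
      throw new BudgetExceeded(this.spentUsd, this.capUsd);
    }
  }
}
```

In an agent loop you would call assertUnderCap() immediately before each LLM request and record(resp.usage) immediately after it. The check is a pre-call gate, so the generation that crosses the cap still completes; it is the next one that is refused.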

The observer-vs-guardian distinction

RunGuard alongside Langfuse: the eight-line wrap

// Use both: Langfuse for post-run traces, RunGuard for pre-call circuit breaking.
import Anthropic from "@anthropic-ai/sdk";
import { Langfuse } from "langfuse";
import { guard, BudgetExceededError, LoopDetectedError } from "@runguard/sdk";

const client = new Anthropic();
const lf = new Langfuse();

async function _generate(messages, traceId) {
  const gen = lf.generation({ traceId, name: "agent-step", model: "claude-sonnet-4-6", input: messages });
  const resp = await client.messages.create({
    model: "claude-sonnet-4-6", max_tokens: 1024, messages,
  });
  const u = resp.usage;
  gen.end({ output: resp.content, usage: { input: u.input_tokens, output: u.output_tokens } });
  const usd = u.input_tokens * 3e-6 + u.output_tokens * 15e-6;  // sonnet 4.6
  const tu = resp.content.find(b => b.type === "tool_use");
  return { resp, usd, sig: `sonnet:${tu?.name ?? "end_turn"}:${JSON.stringify(tu?.input ?? {}).slice(0, 64)}` };
}

const guardedGenerate = guard(_generate, {
  signature: (_args, out) => out.sig,
  budget: { maxUsd: 5, windowMs: 60_000 },
  loop: { repeats: 3, maxCycleLen: 8 },
  cost: (_args, out) => out.usd,
  onTrip: (e) => console.log("[runguard]", e.reason, e.spent, "/", e.cap),
});

// `messages` and `done` are owned by your agent loop (not shown here).
const traceId = lf.trace({ name: "agent-run" }).id;
try {
  while (!done) await guardedGenerate(messages, traceId);
} catch (e) {
  if (e instanceof BudgetExceededError) console.log("halted: budget", e.spent, "/", e.cap);
  else if (e instanceof LoopDetectedError) console.log("halted: loop cycle", e.pattern);
  else throw e;  // not a breaker trip: rethrow instead of swallowing it
} finally {
  await lf.flushAsync();  // always flush so Langfuse captures the partial run
}

The RunGuard wrap is the outer layer: it calls _generate, reads out.usd off the return value and pushes it into the BudgetTracker, reads out.sig and pushes it into the LoopDetector’s 32-entry sliding window. If cumulative spend crosses maxUsd: 5 before the next client.messages.create fires, the wrap throws BudgetExceededError and the HTTP request never goes out. If the last 32 signatures contain a cycle of length 1–8 that repeats 3+ times at the tail, the wrap throws LoopDetectedError before the next request fires. The finally block flushes Langfuse so the trace up to the trip point is captured in full — the post-mortem audit trail is intact even when the run was halted by the breaker. Two instruments, one hot loop, zero conflict: RunGuard guards the pre-call decision, Langfuse records the post-call evidence.
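To make the “cycle of length 1–8 that repeats 3+ times at the tail” rule concrete, here is a minimal sketch of a tail-cycle check over a bounded signature window. It illustrates the technique as described above and is not RunGuard’s actual implementation; findTailCycle and pushSignature are hypothetical names.

```typescript
// Tail-cycle detection over a sliding window of call signatures (illustrative sketch).

// Returns the repeating block at the tail of `sigs`, or null if there is none.
// A cycle of length k "repeats r times at the tail" when the last k * r entries
// are the same k-entry block r times in a row.
function findTailCycle(
  sigs: string[],
  maxCycleLen: number,
  repeats: number
): string[] | null {
  for (let k = 1; k <= maxCycleLen; k++) {
    const needed = k * repeats;
    if (sigs.length < needed) break; // longer cycles need even more history
    const tail = sigs.slice(sigs.length - needed);
    let matches = true;
    for (let i = k; i < needed && matches; i++) {
      if (tail[i] !== tail[i - k]) matches = false;
    }
    if (matches) return tail.slice(0, k); // shortest repeating block wins
  }
  return null;
}

// Appends a signature and drops the oldest entries beyond `windowSize`.
function pushSignature(history: string[], sig: string, windowSize: number): void {
  history.push(sig);
  while (history.length > windowSize) history.shift();
}
```

With repeats: 3 and maxCycleLen: 8, the longest detectable pattern needs 8 × 3 = 24 entries, which fits inside a 32-entry window. Running the check after each call and before the next one is what turns it into a pre-call gate.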

What the RunGuard primitives look like alongside Langfuse’s primitives

The patterns Langfuse records that RunGuard prevents

A Langfuse trace that shows the same tool name with the same arguments in five consecutive generations is evidence of a loop. A human reviewing that trace in the Langfuse UI can clearly see the pattern: search_web(query="climate summit 2026 agreement text") called in generation 3, generation 4, generation 5, generation 6, generation 7, all returning the same result, all producing the same assistant turn. The trace is a clear picture of what happened. What it did not do is stop any of those calls. With RunGuard in the loop and repeats: 3, the same run ends earlier: once generations 3, 4, and 5 share a signature, the loop detector finds a length-1 cycle of depth 3 at the tail of the signature window and throws LoopDetectedError before generation 6’s API request goes out, so generations 6 and 7 never fire. The pattern RunGuard prevents is exactly the pattern that Langfuse records clearly. This is the intended division: Langfuse captures the evidence so the next similar run can be anticipated; RunGuard stops the run before the evidence accumulates past the threshold.

A tool-call loop that accumulates over 200 generations before a human notices the Langfuse trace is a tool-call loop that cost $6 at $0.03 per generation. A tool-call loop that the RunGuard breaker trips at generation 4 is a tool-call loop that cost $0.12. The difference is not which tool you used; it is which instrument answered “should the next call fire?” before the fiftieth invoice line arrived.

The same logic applies to context-window drift: Langfuse records input_tokens on each generation and shows a chart of token usage over the run. When that chart shows a linear climb toward the model’s context limit, the trace is revealing that the system prompt plus the accumulated tool-use history plus the prior completions is inflating with each turn. RunGuard’s context guard (a third primitive alongside the budget tracker and loop detector, wired to the same guard() wrap via a contextTokens extractor) fires before the generation whose token count would exceed the configured threshold. Langfuse records the climb; RunGuard cuts it off.
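How the context guard projects the next generation’s size is not spelled out here. One plausible approach, sketched below under that assumption, is to estimate the next request’s input from the previous turn’s usage: everything sent last time, plus the completion that gets appended to the history, plus an estimate for the new tool result. The function names and the pendingTokensEstimate parameter are hypothetical, and the projection is rough (prompt caching, for instance, can change what is reported as input_tokens).

```typescript
// Pre-call context check (illustrative sketch, not RunGuard's implementation).

interface TurnUsage {
  input_tokens: number;
  output_tokens: number;
}

// Rough projection of the next request's input size: the previous input,
// plus the previous completion, plus whatever is about to be appended.
function projectNextInputTokens(
  last: TurnUsage,
  pendingTokensEstimate: number
): number {
  return last.input_tokens + last.output_tokens + pendingTokensEstimate;
}

// True when the projected input would exceed the configured threshold,
// i.e. when the next generation should not be sent.
function shouldBlockForContext(
  last: TurnUsage,
  pendingTokensEstimate: number,
  maxContextTokens: number
): boolean {
  return projectNextInputTokens(last, pendingTokensEstimate) > maxContextTokens;
}
```

Setting the threshold somewhat below the model’s hard limit leaves room for the estimate being off and for the completion itself.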

Using Langfuse and RunGuard together without double-instrumenting

Other Langfuse-adjacent tools and where RunGuard fits

The LLM observability space in 2026 has several players with overlapping feature sets: Langfuse (open-source tracing + evals), LangSmith (LangChain’s cloud platform, tracing + hub + prompt playground), Helicone (proxy-based observability, caches by default, one-line integration), and Braintrust (evals-first, with a trace viewer built around experiment runs). All four are observers: they record what happened to LLM calls, surface cost and latency data, and support eval workflows. None of them ship an in-process circuit breaker that stops the next call before it fires. The product category gap is not Langfuse-specific; it is industry-wide. The reason is clear: observability tools are designed to be safe to add, which means they must be passive; a tool that blocks your LLM call is no longer safe to add without understanding its failure modes. RunGuard fills the other category — a guardian, not an observer — which means it has explicit failure modes (the typed errors) and an explicit contract (it will stop the call, by design). You add it deliberately, with a configured cap, not as a passive recorder. The LangChain circuit breaker page covers the AgentExecutor Tool wrap specifically; the LangGraph infinite loop guard page covers the StateGraph node wrap; the CrewAI loop detection page covers the CrewAI tool wrap. If your stack already uses Langfuse for tracing any of those frameworks, the RunGuard wrap composes at the same innermost layer without touching the Langfuse instrumentation.

The first loop our SDK caught was ours — same gap, different surface

We were running a Claude Agent SDK session once per day that posts a six-tweet launch thread via the X API. The first call came back HTTP 402 CreditsDepleted. We had Langfuse tracing on the session: the trace showed one post_tweet generation with a tool_result containing the 402 response. Useful. What it did not do was stop the next daily session from firing the same call the next morning. Session two: same 402, same trace, same cost. Sessions three through six: same. Six rows in the Langfuse dashboard, six identical traces, six identical tool inputs, six identical 402 responses. On session seven we loaded the six-row history into our LoopDetector at startup and it found a length-1 cycle of depth 6 before the first API call went out — exiting with a RunGuardTripped preflight error, no HTTP request, no new Langfuse generation created, zero additional cost. The pattern is the same one any tool-call loop produces: same signature, repeated across multiple turns (or in our case, multiple sessions), the first three alone sufficient to trip the detector. Read the dogfood story on the 30-day log; the Langfuse trace was the evidence that confirmed the pattern after session six, and RunGuard was the mechanism that stopped session seven before it added a seventh row to the evidence pile.
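The session-seven preflight can be sketched as a check over signatures persisted from earlier sessions. This is a simplified illustration, not the actual LoopDetector code: it handles only the length-1 case from the story, the function names are hypothetical, and it assumes each stored signature encodes the outcome as well as the tool call (for example "post_tweet:402"), so that a streak of successful daily posts is not mistaken for a loop.

```typescript
// Cross-session preflight check (illustrative sketch).

// Counts how many times the most recent signature repeats consecutively
// at the end of the persisted history.
function trailingRepeatDepth(history: string[]): number {
  if (history.length === 0) return 0;
  const last = history[history.length - 1];
  let depth = 0;
  for (let i = history.length - 1; i >= 0 && history[i] === last; i--) {
    depth++;
  }
  return depth;
}

// Returns false when the stored history already ends in `repeats` or more
// identical signatures, meaning the session should exit before its first call.
function preflightAllows(history: string[], repeats: number): boolean {
  return trailingRepeatDepth(history) < repeats;
}
```

With the six identical 402 rows loaded as history and a threshold of 3, preflightAllows returns false and the session can exit before any HTTP request is made.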

What this is not

The minimum integration alongside an existing Langfuse setup

One npm i @runguard/sdk (TypeScript) or pip install runguard (Python). One guard() wrap around the function that calls your LLM API and already calls generation.end() for Langfuse. Two new return fields on that function: usd (from response.usage and your per-token rate, the same numbers you already compute for Langfuse’s cost tracking) and sig (the tool name plus the first 64 characters of the JSON-serialized tool input, or "end_turn" if the response was not a tool use). One finally block that calls await lf.flushAsync() so Langfuse captures the partial trace when the breaker fires. That is the entire integration delta.

Your existing Langfuse setup (traces, evals, dashboards, prompts, alerts) is unchanged. What changes is that the next LLM call after a budget crossing or a loop cycle is prevented rather than recorded. The observer now has a guardian standing next to it.

RunGuard ships as @runguard/sdk on npm and runguard on PyPI. The primitives are documented in llms.txt for LLM-assisted integration (so an AI coding assistant reading your codebase to add RunGuard gets the canonical API surface, not a hallucinated one). The widget at embed-preview.html lets you estimate per-run cost from your actual tool-call count and token averages before you commit to a cap figure.