Langfuse doesn’t have a real-time circuit breaker. Here’s what does.

Langfuse is an open-source LLM observability platform. You install the Langfuse SDK, wrap your LLM calls with langfuse.generation() (or drop the @observe decorator on the function that calls your model), and Langfuse records each generation’s model name, prompt, completion, token counts, cost, latency, and metadata into a trace. After the run, those traces are queryable on cloud.langfuse.com or your self-hosted instance. You can run evals on them, score them, build dashboards, configure online sampling policies, and set up alerts in your own monitoring stack that fire when Langfuse’s daily cost exceeds a threshold. Langfuse is excellent at all of this. What Langfuse is not is a real-time circuit breaker: there is no onBudgetExceeded callback on langfuse.trace(), no hook the SDK calls before each generation() to ask “should this call fire?”, and no in-process mechanism that throws BudgetExceededError when your agent crosses $5 of cumulative spend. The Langfuse SDK records what happened; it does not gate what can happen next. The gap is real, the cost of leaving it open is measurable, and this page explains what sits on the other side of it.

What Langfuse actually gives you

Traces and spans built around observations. The Langfuse data model is a Trace (a top-level unit of work, usually one user request or one agent run) that contains Spans (arbitrary sub-steps: tool calls, retrieval passes, post-processing) and Generations (LLM calls specifically, which carry model, input, output, usage, promptTokens, completionTokens, and a computed calculatedTotalCost). You open a trace and log generations into it. The SDK batches those events and ships them in the background. In short-lived processes you flush explicitly at the end of the run: langfuse.flush() in Python, await langfuse.flushAsync() in TypeScript. Langfuse ingests each batch, computes cost from the model and token counts, and surfaces a per-run cost figure in the UI.
Cost accounting that runs post-hoc, per generation. Langfuse computes cost from the generation’s usage field and the model prices configured in your project. That cost appears in the dashboard only after the generation has completed and the SDK has flushed the batch. The timeline runs like this. Your LLM call fires. Anthropic (or OpenAI, or Gemini) returns, and the response reaches your process. Your code calls generation.end({ output: ..., usage: ... }), and the SDK queues the event. A background flush sends it to Langfuse’s servers, where Langfuse ingests it and multiplies tokens × model_rate. The cost row then appears in the UI, typically within a few seconds. At no point in this pipeline does Langfuse write a value that your agent’s main thread reads before the next LLM call fires.
Scores, evals, and online evaluation pipelines. Langfuse lets you attach scores to traces and generations programmatically (your own eval function, an LLM-judge, a human review) and configure online evaluation pipelines that sample a fraction of production traces, run your scorer against them, and write score records back. This is powerful for quality monitoring: you can detect semantic regressions in your agent’s output over time, track faithfulness or relevance on a rolling sample, and route low-scoring traces to a human review queue. None of this is synchronous to the agent’s own run: the score is computed after the trace exists, written back via the Langfuse API, and surfaced in dashboards or alert rules you configure in Langfuse or a connected monitoring tool.
Prompt management and versioning. Langfuse ships a prompt registry. You define prompt templates in the UI, version them, and pull the current production version at runtime via langfuse.getPrompt(). The SDK caches the last-fetched prompt locally, so the agent keeps running if the Langfuse endpoint is unreachable. Prompt management is one of Langfuse’s most-used features for teams that A/B-test prompt versions across model deployments. It is orthogonal to circuit breaking.
Dashboards, alerts, and integration hooks. Langfuse ships a cost-over-time chart, a latency-over-time chart, token-usage histograms, user-level aggregations, and a model breakdown view. You can configure alert rules that fire when daily cost, error rate, or latency exceeds a threshold, either through Langfuse’s native alert feature or through a Grafana or Datadog export of Langfuse’s metrics endpoint. Those alerts go to Slack, PagerDuty, or an email address. By the time such an alert fires, the spend that triggered it has already been incurred. The alert is a post-hoc signal that something is trending wrong, not an in-process gate that stops the next call.
The Langfuse SDK is a passive observer. langfuse.trace(), span.update(), generation.end(), and the @observe decorator all share the same contract: they record what happened. They are write-only, queue-backed, and designed to add negligible latency to your agent’s main thread. That is the right design for an observability SDK, because you do not want your tracing library to block your LLM call. The consequence of that design is that the SDK also cannot stop your LLM call. There is no read path from the trace accumulator back to the agent’s hot loop before each generation.

What Langfuse does not have, and why it matters

The Langfuse SDK surfaces cost per generation in the UI. It does not provide a langfuse.checkBudget(maxUsd) function you can call before your LLM API call. It does not expose the current run’s cumulative spend as a synchronous numeric value on the trace object. It does not have an onCostCapExceeded callback you can wire to a throw. The Langfuse TypeScript SDK’s LangfuseGenerationClient type has update() and end() methods, and both of them record data. Neither returns a boolean your code can branch on to stop the next generation. The @observe Python decorator similarly wraps the function transparently: your function runs, and Langfuse records the result. If you want “abort if cost crosses $5” behaviour, you have to write it yourself. Sum the cost of each response.usage into your own variable, compare it to your cap before the next call, and raise BudgetExceeded once the cap is crossed. Langfuse will dutifully record the last generation’s cost, but it is not the thing that stops the run.
Similarly, Langfuse does not detect tool-call loops. The SDK records each generation’s input and output. A human reading the trace UI can see that the same tool was called with the same arguments five times in a row, but the SDK did not interrupt the fifth call. There is no langfuse.assertNoLoop() primitive, no fingerprint-and-window comparator that runs between generations, and no typed LoopDetectedError that surfaces in the calling code. This matters because the pattern that kills teams is not a single expensive generation. It is a repeated pattern that compounds: the same tool call, the same 4xx error, the same retry with the same payload, eight times in a row. At $0.03 per generation on claude-sonnet-4-6, eight repeats cost $0.24. At $0.15 per generation on claude-opus-4-7, they cost $1.20. A more complex agent that loops 50 times over a weekend while no human is watching turns those per-call numbers into a Monday-morning invoice. The trace is excellent evidence for the post-mortem.
It does not stop the fiftieth call.
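Here is a minimal sketch of that hand-rolled gate in plain TypeScript with no SDK dependencies. RunBudget and BudgetExceededError are illustrative names, not Langfuse or RunGuard APIs. The $1.25 per-call cost in the usage loop is an assumed stand-in for a figure you would compute from response.usage.

```typescript
// Hypothetical hand-rolled budget gate; not part of Langfuse or RunGuard.
class BudgetExceededError extends Error {
  readonly spent: number;
  readonly cap: number;
  constructor(spent: number, cap: number) {
    super(`budget exceeded: $${spent.toFixed(2)} spent of $${cap.toFixed(2)} cap`);
    Object.setPrototypeOf(this, new.target.prototype); // keep instanceof working on old targets
    this.name = "BudgetExceededError";
    this.spent = spent;
    this.cap = cap;
  }
}

class RunBudget {
  private spentUsd = 0;
  private readonly capUsd: number;

  constructor(capUsd: number) {
    this.capUsd = capUsd;
  }

  get spent(): number {
    return this.spentUsd;
  }

  // Record the cost of a call that has already returned.
  add(usd: number): void {
    this.spentUsd += usd;
  }

  // Call BEFORE each LLM request; throws once accumulated spend reaches the cap.
  assertCanSpend(): void {
    if (this.spentUsd >= this.capUsd) {
      throw new BudgetExceededError(this.spentUsd, this.capUsd);
    }
  }
}

// Usage: a stubbed agent loop where each call costs an assumed $1.25.
const budget = new RunBudget(5);
let callsMade = 0;
try {
  for (;;) {
    budget.assertCanSpend(); // pre-call gate: nothing is sent if this throws
    callsMade += 1;          // the real client.messages.create(...) would go here
    budget.add(1.25);        // cost derived from response.usage
  }
} catch (e) {
  if (!(e instanceof BudgetExceededError)) throw e;
}
// callsMade === 4, budget.spent === 5
```

The check reads only a local number, so it costs nothing measurable. It cannot see the cost of a call until that call returns, so the final spend can land one call past the cap.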

The observer-vs-guardian distinction

An observer answers “what happened?” after the fact. It sits alongside your code, records every event, ships those events to a backend, and makes them queryable and alertable. The observer’s value grows with data volume: the more generations it sees, the better the cost trends, the more accurate the eval distributions, the more useful the per-model breakdown. It has no opinion about whether any individual call should fire: its job is faithful recording, not gatekeeping. Langfuse is an outstanding observer.
A guardian answers “should the next call fire?” before the call goes out. It reads from an in-process accumulator (cumulative spend so far, tool-call signature history) and makes a synchronous binary decision: pass or throw. Its value does not grow with data volume. It runs at every call boundary, and its only job is to prevent the next billed action when the accumulated evidence says a loop is in progress or a budget cap has been crossed. A guardian that runs post-hoc is not a guardian. It is a slow observer with extra steps.
The two instruments answer different questions at different times. “Did this agent drift toward higher cost this week vs. last week?” — Langfuse. “Is this current run crossing $5?” — RunGuard. “Which model is most expensive in production over the past 30 days?” — Langfuse. “Has this agent emitted the same tool-call signature three turns in a row?” — RunGuard. “Did my eval LLM score this trace below 0.7?” — Langfuse. “Should the next LLM call go out at all?” — RunGuard. The gap between post-hoc dashboards and pre-call prevention is not a Langfuse deficiency; it is a deliberate design split: observers are decoupled from the hot path so they are always safe to add, and guardians are tightly coupled to the hot path so they can actually stop something.
Searches for “Langfuse alternative for real-time circuit breaker” express a specific gap. People who search this phrase already use or have evaluated Langfuse. They know what it offers. They are specifically looking for the piece that Langfuse does not provide: an in-process check that halts the agent before the next expensive call. That is a different tool with a different contract, not a better tracing SDK. The “alternative” framing is technically imprecise — RunGuard does not replace Langfuse’s traces, evals, dashboards, or prompt registry — but the intent behind the search is precise: “I want real-time prevention, not just post-hoc visibility.”
The gap is open whether you use Langfuse Cloud or self-host. Self-hosted Langfuse on your own VPS gives you full data ownership and removes the cloud dependency. It does not add a real-time circuit-breaking primitive. The SDK’s contract is unchanged: write-only, queue-backed, no pre-call hook. The gap is in the SDK design, not in the hosting model.

RunGuard alongside Langfuse: the eight-line wrap

// Use both: Langfuse for post-run traces, RunGuard for pre-call circuit breaking.
import Anthropic from "@anthropic-ai/sdk";
import { Langfuse } from "langfuse";
import { guard, BudgetExceededError, LoopDetectedError } from "@runguard/sdk";

const client = new Anthropic();
const lf = new Langfuse();

async function _generate(messages, traceId) {
  const gen = lf.generation({ traceId, name: "agent-step", model: "claude-sonnet-4-6", input: messages });
  const resp = await client.messages.create({
    model: "claude-sonnet-4-6", max_tokens: 1024, messages,
    // tools: [...],  // pass your tool definitions here so the model can emit tool_use blocks
  });
  const u = resp.usage;
  gen.end({ output: resp.content, usage: { input: u.input_tokens, output: u.output_tokens } });
  const usd = u.input_tokens * 3e-6 + u.output_tokens * 15e-6;  // sonnet 4.6
  const tu = resp.content.find(b => b.type === "tool_use");
  return { resp, usd, sig: `sonnet:${tu?.name ?? "end_turn"}:${JSON.stringify(tu?.input ?? {}).slice(0, 64)}` };
}

const guardedGenerate = guard(_generate, {
  signature: (_args, out) => out.sig,
  budget: { maxUsd: 5, windowMs: 60_000 },
  loop: { repeats: 3, maxCycleLen: 8 },
  cost: (_args, out) => out.usd,
  onTrip: (e) => console.log("[runguard]", e.reason, e.spent, "/", e.cap),
});

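const traceId = lf.trace({ name: "agent-run" }).id;
const messages = [{ role: "user", content: "Summarize this week's climate summit coverage." }];
let done = false;
try {
  while (!done) {
    // Assumes guard() returns the wrapped function's result.
    const { resp } = await guardedGenerate(messages, traceId);
    messages.push({ role: "assistant", content: resp.content });
    done = resp.stop_reason !== "tool_use";
    // runTools is a placeholder for your own tool dispatcher; it must return tool_result blocks.
    if (!done) messages.push({ role: "user", content: await runTools(resp.content) });
  }
} catch (e) {
  if (e instanceof BudgetExceededError) console.log("halted: budget", e.spent, "/", e.cap);
  else if (e instanceof LoopDetectedError) console.log("halted: loop cycle", e.pattern);
  else throw e;  // anything else is a genuine failure, not a breaker trip
} finally {
  await lf.flushAsync();  // always flush so Langfuse captures the partial run
}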

The RunGuard wrap is the outer layer: it calls _generate, reads out.usd off the return value and pushes it into the BudgetTracker, reads out.sig and pushes it into the LoopDetector’s 32-entry sliding window. If cumulative spend crosses maxUsd: 5 before the next client.messages.create fires, the wrap throws BudgetExceededError and the HTTP request never goes out. If the last 32 signatures contain a cycle of length 1–8 that repeats 3+ times at the tail, the wrap throws LoopDetectedError before the next request fires. The finally block flushes Langfuse so the trace up to the trip point is captured in full — the post-mortem audit trail is intact even when the run was halted by the breaker. Two instruments, one hot loop, zero conflict: RunGuard guards the pre-call decision, Langfuse records the post-call evidence.
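The ordering described above is check, then call, then record. It is easy to get wrong when you roll your own, so here is a self-contained sketch of a wrapper with the same shape. wrapWithGuard, GuardTripped, and the isLoop callback are hypothetical names for illustration; this is not RunGuard's implementation. The loop check is pluggable, so a fuller cycle scan can replace the simple last-three-equal check used here.

```typescript
// Hypothetical guard-style wrapper; illustrates ordering, not RunGuard's source.
type Outcome = { usd: number; sig: string };

class GuardTripped extends Error {
  readonly reason: "budget" | "loop";
  constructor(reason: "budget" | "loop", message: string) {
    super(message);
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = "GuardTripped";
    this.reason = reason;
  }
}

function wrapWithGuard<A extends unknown[], R extends Outcome>(
  fn: (...args: A) => Promise<R>,
  opts: { maxUsd: number; isLoop: (sigs: readonly string[]) => boolean; windowSize?: number },
): (...args: A) => Promise<R> {
  const windowSize = opts.windowSize ?? 32;
  let spent = 0;
  const sigs: string[] = [];

  return async (...args: A): Promise<R> => {
    // 1. Pre-call checks: if either throws, fn (and its HTTP request) never runs.
    if (spent >= opts.maxUsd) {
      throw new GuardTripped("budget", `spent $${spent} of $${opts.maxUsd}`);
    }
    if (opts.isLoop(sigs)) {
      throw new GuardTripped("loop", `repeating signature ${sigs[sigs.length - 1]}`);
    }
    // 2. The real call; Langfuse recording would live inside fn.
    const out = await fn(...args);
    // 3. Post-call bookkeeping that the NEXT pre-call check reads.
    spent += out.usd;
    sigs.push(out.sig);
    if (sigs.length > windowSize) sigs.shift(); // bounded sliding window
    return out;
  };
}

// Simplest possible loop check: the last three signatures are identical.
const lastThreeEqual = (s: readonly string[]): boolean =>
  s.length >= 3 && s.slice(-3).every((x) => x === s[s.length - 1]);
```

Take a stub that always returns the same signature. This wrapper lets three calls through, then throws GuardTripped with reason "loop" before the fourth. A stub costing $2 per call with distinct signatures also gets three calls through ($6 recorded), then trips with reason "budget".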

What the RunGuard primitives look like alongside Langfuse’s primitives

Langfuse’s generation.end({ usage }) records cost; RunGuard’s BudgetTracker.add(usd) enforces it. Both consume response.usage. Langfuse converts it to a cost figure server-side, stores it in the trace, and surfaces it in the UI. RunGuard converts it to a cost figure in-process, adds it to an accumulator, and on the next call checks whether the accumulator has crossed the cap. The Langfuse cost figure is useful for dashboards; the RunGuard accumulator is useful for stopping the next call. They can read the same usage object independently — there is no conflict, no double-spending, no race condition.
Langfuse’s generation.input records the prompt; RunGuard’s signature fingerprint detects repeated tool calls. Langfuse stores the full prompt and tool-use input as the generation’s input field, which makes it queryable and diffable in the trace UI. On the RunGuard side, the signature extractor you pass to guard() builds a short signature string. It joins the model tag, the tool name, and the first 64 characters of the JSON-serialized tool input. RunGuard then checks whether that signature has appeared as part of a cycle in the last 32 turns. Langfuse can tell you, in full and after the run, what the tool was called with. RunGuard can tell you, before the next call fires, whether the same call has been cycling. The same tool-call data serves both, in different directions.
Langfuse’s cost alert fires when your daily aggregate crosses a threshold; RunGuard’s budget guard fires when this run’s cumulative spend crosses a cap. A Langfuse cost alert is configured as a daily or hourly aggregate rule at the project level. It is appropriate for detecting a gradual drift toward higher spend across all runs, catching a new model that costs more than expected, or noticing a new feature that is unexpectedly expensive. A RunGuard budget guard is per-run and per-invocation. It fires before the next call once cumulative spend has crossed the cap. Consider a single runaway agent making 200 tool calls in 10 minutes. RunGuard can trip the breaker after call 17 and leave 183 calls’ worth of cost unspent. The same agent on Langfuse alone would not trip the daily aggregate alert, because 200 tool calls is unusual for a single agent but not for an org-wide daily aggregate. The anomaly would surface in the next dashboard refresh, after the damage is done.
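Once you fix a per-call cost, the call-17 figure is simple arithmetic. The sketch below assumes a flat cost per call and a check that runs before each call, after the previous call's cost has been recorded. The ~$0.30 per call is our assumption, not a figure from the text; we chose it because it reproduces the 17 / 183 split above.

```typescript
// How many calls does a runaway loop complete before a per-run cap stops it?
// Assumes a flat per-call cost and a pre-call check (spent >= cap => stop).
function callsBeforeTrip(costPerCallUsd: number, capUsd: number, maxCalls: number): number {
  let spent = 0;
  let calls = 0;
  while (calls < maxCalls && spent < capUsd) {
    calls += 1;              // this call passed the pre-call check and fired
    spent += costPerCallUsd; // its cost is recorded after it returns
  }
  return calls;
}

const fired = callsBeforeTrip(0.3, 5, 200); // 17 calls (~$5.10 spent)
const avoided = 200 - fired;                // 183 calls never sent
```

Final spend lands one call past the cap, about $5.10 here. A check that runs before each call can only stop the call after the one that crosses the line.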
Langfuse’s @observe decorator has negligible performance impact on your LLM call; RunGuard’s guard() wrap adds a synchronous numeric comparison and a ring-buffer scan. The @observe decorator is designed to be safe to add to any function, production or otherwise, without measurable latency impact. It queues events to a background batch, never blocks the main thread, and fails silently if the Langfuse endpoint is unreachable. The RunGuard wrap adds two checks. The first is a synchronous BudgetTracker.exceeded() check (one floating-point comparison). The second is a LoopDetector.scan() (a Knuth-Morris-Pratt cycle scan over at most 32 string entries). Both run in microseconds and never touch the network. Neither the Langfuse observer nor the RunGuard guard adds meaningful latency to a call that already spends 100–2000 ms waiting for the LLM API to return.
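Here is a brute-force version of the "cycle of length 1–8 repeated 3+ times at the tail of a 32-entry window" check. It is a plain nested-loop comparison, deliberately swapped in for the Knuth-Morris-Pratt scan the text attributes to LoopDetector.scan(). At most 8 candidate lengths and a few dozen comparisons in total keep it microsecond-scale as well. The function name and defaults are illustrative.

```typescript
// Brute-force tail-cycle detection over a bounded signature window.
// Returns the repeating unit (oldest first), or null if no qualifying cycle exists.
function findTailCycle(
  sigs: readonly string[],
  maxCycleLen = 8,
  repeats = 3,
  windowSize = 32,
): string[] | null {
  const w = sigs.slice(-windowSize);
  for (let len = 1; len <= maxCycleLen; len++) {
    const need = len * repeats;
    if (need > w.length) break; // longer cycles need even more history
    const tail = w.slice(w.length - need);
    // The tail is `repeats` copies of one unit iff every entry equals
    // the entry exactly one period earlier.
    let periodic = true;
    for (let i = len; i < need; i++) {
      if (tail[i] !== tail[i - len]) {
        periodic = false;
        break;
      }
    }
    if (periodic) return tail.slice(0, len);
  }
  return null;
}

findTailCycle(["plan", "search:a", "search:a", "search:a"]); // ["search:a"]
findTailCycle(["x", "read:1", "write:1", "read:1", "write:1", "read:1", "write:1"]); // ["read:1", "write:1"]
findTailCycle(["read:1", "write:1", "read:1", "write:1"]); // null (only 2 repeats)
```

Shorter lengths are tried first, so a run of identical signatures is reported as a length-1 cycle, not a longer multiple of it.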
Both instruments are additive, not competitive. Adding RunGuard to a codebase that already uses Langfuse requires wrapping one function (the function that calls client.messages.create, or the tool-dispatch step, or the agent’s main loop step) with guard() and returning { usd, sig } alongside the response. The Langfuse instrumentation stays exactly as it was: generation.end({ usage }) still fires on each successful call, traces are still shipped, evals still run, dashboards still populate. The only new behavior is that calls which would exceed the budget or close a loop cycle are halted before the HTTP request goes out. The Langfuse trace for that run will end at the last successful generation and capture its normal completion data in the finally block flush.

The patterns Langfuse records that RunGuard prevents

A Langfuse trace that shows the same tool name with the same arguments in five consecutive generations is evidence of a loop. A human reviewing that trace in the Langfuse UI can clearly see the pattern. search_web(query="climate summit 2026 agreement text") is called in generations 3, 4, 5, 6, and 7, each returning the same result and producing the same assistant turn. The trace is a clear picture of what happened. What it cannot do is stop generation 8. Run the same agent behind RunGuard with repeats: 3 and the trace would be shorter. After generation 5, the loop detector sees a length-1 cycle repeated three times at the tail of the signature window. It throws LoopDetectedError before generation 6’s API request goes out. The pattern RunGuard prevents is exactly the pattern that Langfuse records clearly. This is the intended division: Langfuse captures the evidence so the next similar run can be anticipated; RunGuard stops the run before the evidence accumulates past the threshold. At $0.03 per generation, a tool-call loop that runs for 200 generations before a human notices the Langfuse trace costs $6. One that the RunGuard breaker halts after four generations costs $0.12. The difference is not which tool you used. It is which instrument answered “should the next call fire?” before the fiftieth invoice line arrived.
The same logic applies to context-window drift. Langfuse records input_tokens on each generation and shows a chart of token usage over the run. When that chart shows a linear climb toward the model’s context limit, the trace is revealing that the prompt is inflating with each turn: the system prompt, plus the accumulated tool-use history, plus the prior completions. RunGuard also has a context guard, a third primitive alongside the budget tracker and loop detector. It is wired to the same guard() wrap via a contextTokens extractor, and it fires before the generation whose token count would exceed the configured threshold.
Langfuse records the climb; RunGuard cuts it off.
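"Fires before the generation whose token count would exceed the threshold" implies projecting the next request's size before sending it. The sketch below is one hypothetical way to do that, not RunGuard's contextTokens extractor. It assumes the next input equals the previous input, plus the assistant turn just produced, plus any newly appended text. It estimates that appended text at roughly 4 characters per token, a crude heuristic. A real tokenizer count would be more accurate.

```typescript
// Hypothetical pre-call context guard; the ~4 chars/token estimate is a rough guess.
class ContextLimitError extends Error {
  readonly projected: number;
  readonly limit: number;
  constructor(projected: number, limit: number) {
    super(`projected ${projected} input tokens exceeds the ${limit}-token threshold`);
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = "ContextLimitError";
    this.projected = projected;
    this.limit = limit;
  }
}

// Previous input already contained the whole history; the next request adds the
// assistant turn (prevOutputTokens) plus whatever text you append (tool results).
function projectNextInputTokens(prevInputTokens: number, prevOutputTokens: number, appendedText: string): number {
  const appendedEstimate = Math.ceil(appendedText.length / 4);
  return prevInputTokens + prevOutputTokens + appendedEstimate;
}

// Call before the next request; throws instead of sending an oversized prompt.
function assertContextHeadroom(projected: number, limit: number): void {
  if (projected > limit) throw new ContextLimitError(projected, limit);
}

// Usage: last call used 150k input + 2k output tokens; a 40k-char tool result is appended.
const projected = projectNextInputTokens(150_000, 2_000, "x".repeat(40_000)); // 162000
```

With a 160,000-token threshold, assertContextHeadroom(projected, 160_000) throws before the request is built. The usage figures come straight from response.usage, the same numbers Langfuse charts.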

Using Langfuse and RunGuard together without double-instrumenting

Wrap the innermost LLM call, not the agent class. The cleanest composition is to wrap the function that calls client.messages.create (or openai.chat.completions.create or whatever SDK your stack uses) with both guard() and langfuse.generation(). The guard wraps the function and adds the pre-call check; the Langfuse generation wraps the call inside the function and records the post-call data. Order: guard fires first (pre-call), Langfuse records last (post-call). The two are not in conflict because they answer different questions at different points in the same call cycle.
Return { usd, sig } from the inner function; Langfuse already has what it needs. The RunGuard wrap needs a cost figure and a signature string. You can derive the cost from response.usage and the signature from the tool-use block’s name plus a slice of its input. That is the same data Langfuse reads for its generation.end({ usage }) call, so there are no extra API calls and no separate computation: out.usd = input_tokens * rate_in + output_tokens * rate_out and out.sig = `${model}:${tu?.name ?? "end_turn"}:${JSON.stringify(tu?.input ?? {}).slice(0, 64)}`. The ?? defaults matter. Without them, a text-only response makes JSON.stringify return undefined, and .slice throws. The inner function computes both values and returns them as structured fields on the result object, and the guard reads them. Langfuse reads the same response to fill the generation record. No redundancy, no conflict.
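As a pure, testable function, the derivation might look like this. The types are a minimal structural subset of an Anthropic Messages response, declared locally so the sketch runs without the SDK. The default rates are the $3 / $15 per million tokens used for claude-sonnet-4-6 in the wrap example. usdAndSig is a hypothetical helper name.

```typescript
// Minimal structural subset of a Messages API response (declared here, not imported).
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; name: string; input: unknown };

interface ResponseLike {
  content: ContentBlock[];
  usage: { input_tokens: number; output_tokens: number };
}

function usdAndSig(
  resp: ResponseLike,
  modelTag: string,
  rateInPerToken = 3e-6,   // $3 per million input tokens (sonnet rate from the example)
  rateOutPerToken = 15e-6, // $15 per million output tokens
): { usd: number; sig: string } {
  const { input_tokens, output_tokens } = resp.usage;
  const usd = input_tokens * rateInPerToken + output_tokens * rateOutPerToken;
  const tu = resp.content.find(
    (b): b is Extract<ContentBlock, { type: "tool_use" }> => b.type === "tool_use",
  );
  // Default to "end_turn" / {} so text-only responses still yield a string to slice.
  const name = tu?.name ?? "end_turn";
  const args = JSON.stringify(tu?.input ?? {}).slice(0, 64); // first 64 UTF-16 code units
  return { usd, sig: `${modelTag}:${name}:${args}` };
}

const toolTurn: ResponseLike = {
  content: [{ type: "tool_use", name: "search_web", input: { query: "climate summit" } }],
  usage: { input_tokens: 1000, output_tokens: 500 },
};
const out = usdAndSig(toolTurn, "sonnet");
// out.sig === 'sonnet:search_web:{"query":"climate summit"}', out.usd ≈ 0.0105
```

Keeping this function pure means the same unit test covers both what the guard sees and what you would cross-check against Langfuse's server-side cost.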
Always flushAsync() in a finally block. If RunGuard trips the breaker mid-run, the outer try block exits via an exception. Without a finally that calls await lf.flushAsync(), the Langfuse SDK’s background batch may not ship before the process exits or the event loop drains, and the partial trace can be lost. With the finally flush, Langfuse captures every generation up to and including the last one before the trip. The post-mortem then has the full picture of what the agent was doing when the breaker fired. This is standard Langfuse guidance for short-lived processes, and it applies equally whether the process exits normally or via a guard exception.
Set the RunGuard maxUsd tighter than the Langfuse alert threshold. If your Langfuse project has a cost alert configured at $20/day across all agents, set the RunGuard per-run maxUsd at $5 or lower for individual agents. The Langfuse alert is a fleet-wide backstop; the RunGuard cap is a per-job fence. With only a fleet-wide alert at $20/day, one rogue agent can burn $19 before the alert fires. With a per-job cap at $5, the same agent halts just past $5, and that job’s spend stays under $6. The check runs before each call, so the overshoot is at most the one call that crossed the cap. Use both thresholds: fleet-wide for trend detection, per-job for incident prevention.
Log the trip event to the Langfuse trace as a score or a custom event. When RunGuard fires onTrip, you can log a structured event back to the Langfuse trace: lf.score({ traceId, name: "runguard_trip", value: 0, comment: JSON.stringify({ reason: e.reason, spent: e.spent, cap: e.cap, pattern: e.pattern }) }). This gives Langfuse’s dashboard a per-trace signal that the run was halted by the circuit breaker, so the post-mortem trace is annotated and searchable (“show me all traces where runguard_trip score = 0”). The trace remains the source of truth for what happened; the score records why the run stopped.
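One way to keep that payload consistent is a small pure helper. The TripEvent shape below (reason, spent, cap, pattern) is inferred from the fields this page uses. It is an assumption about RunGuard's event, not its published type. lf.score({ traceId, name, value, comment }) is the Langfuse call shown above. The wiring lines are left as comments because they need live Langfuse credentials.

```typescript
// Assumed shape of a trip event, based on the fields used on this page.
interface TripEvent {
  reason: string;
  spent?: number;
  cap?: number;
  pattern?: unknown;
}

interface ScorePayload {
  traceId: string;
  name: string;
  value: number;
  comment: string;
}

// Build the "runguard_trip" score for a given trace; undefined fields are
// dropped by JSON.stringify, so budget trips omit pattern and vice versa.
function tripScore(traceId: string, e: TripEvent): ScorePayload {
  return {
    traceId,
    name: "runguard_trip",
    value: 0,
    comment: JSON.stringify({ reason: e.reason, spent: e.spent, cap: e.cap, pattern: e.pattern }),
  };
}

// Wiring (not executed here): capture traceId in a closure so onTrip can use it.
//   const traceId = lf.trace({ name: "agent-run" }).id;
//   const guarded = guard(_generate, { ...opts, onTrip: (e) => lf.score(tripScore(traceId, e)) });
```

A fixed name and value mean the Langfuse filter "runguard_trip score = 0" matches every halted run, whichever primitive tripped.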

Other Langfuse-adjacent tools and where RunGuard fits

The LLM observability space in 2026 has several players with overlapping feature sets:
- Langfuse: open-source tracing and evals.
- LangSmith: LangChain’s cloud platform, with tracing, a hub, and a prompt playground.
- Helicone: proxy-based observability with optional response caching and a one-line integration.
- Braintrust: evals-first, with a trace viewer built around experiment runs.
All four are observers. They record what happened to LLM calls, surface cost and latency data, and support eval workflows. None of them ship an in-process circuit breaker that stops the next call before it fires. The product category gap is not Langfuse-specific; it is industry-wide. The reason is structural. Observability tools are designed to be safe to add, which means they must be passive. A tool that blocks your LLM call is no longer safe to add without understanding its failure modes. RunGuard fills the other category: it is a guardian rather than an observer. That means it has explicit failure modes (the typed errors) and an explicit contract (it will stop the call, by design). You add it deliberately, with a configured cap, not as a passive recorder. Separate pages cover specific frameworks:
- The LangChain circuit breaker page covers the AgentExecutor Tool wrap.
- The LangGraph infinite loop guard page covers the StateGraph node wrap.
- The CrewAI loop detection page covers the CrewAI tool wrap.
If your stack already uses Langfuse for tracing any of those frameworks, the RunGuard wrap composes at the same innermost layer without touching the Langfuse instrumentation.

The first loop our SDK caught was ours — same gap, different surface

We were running a Claude Agent SDK session once per day that posts a six-tweet launch thread via the X API. The first call came back HTTP 402 CreditsDepleted. We had Langfuse tracing on the session, and the trace showed one post_tweet generation with a tool_result containing the 402 response. Useful. What it did not do was stop the next daily session from firing the same call the next morning. Session two: same 402, same trace, same cost. Sessions three through six: same. That left six rows in the Langfuse dashboard: six identical traces, six identical tool inputs, six identical 402 responses. On session seven we loaded the six-row history into our LoopDetector at startup. It found a length-1 cycle of depth 6 before the first API call went out and exited with a RunGuardTripped preflight error. There was no HTTP request, no new Langfuse generation, and zero additional cost. The pattern is the same one any tool-call loop produces: the same signature repeated across multiple turns (or, in our case, multiple sessions). The first three repeats alone would have been enough to trip the detector. Read the dogfood story on the 30-day log. The Langfuse trace was the evidence that confirmed the pattern after session six. RunGuard was the mechanism that stopped session seven before it added a seventh row to the evidence pile.
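A history-only preflight can approximate the session-seven check. Before the first API call of a new session, inspect the signatures persisted from earlier sessions, and refuse to start if one signature has repeated at the tail. trailingRepeats, preflightCheck, and PreflightLoopError are hypothetical names; the text's RunGuardTripped error belongs to RunGuard. Where the history is stored is left to you; here it is an in-memory array. The signature strings are made-up examples.

```typescript
// Hypothetical session-start preflight; not RunGuard's implementation.
class PreflightLoopError extends Error {
  readonly signature: string;
  readonly depth: number;
  constructor(signature: string, depth: number) {
    super(`signature ${signature} repeated ${depth} times across prior sessions`);
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = "PreflightLoopError";
    this.signature = signature;
    this.depth = depth;
  }
}

// Length of the run of identical signatures at the end of the history.
function trailingRepeats(history: readonly string[]): number {
  const n = history.length;
  if (n === 0) return 0;
  let depth = 1;
  while (depth < n && history[n - 1 - depth] === history[n - 1]) depth++;
  return depth;
}

// Run before the session's first API call; throws instead of starting a doomed session.
function preflightCheck(history: readonly string[], repeats = 3): void {
  const depth = trailingRepeats(history);
  if (depth >= repeats) throw new PreflightLoopError(history[history.length - 1], depth);
}

// Six prior sessions that all ended on the same post_tweet signature.
const priorSessions: string[] = Array.from({ length: 6 }, () => 'sonnet:post_tweet:{"thread":"launch"}');
// preflightCheck(priorSessions) throws PreflightLoopError with depth 6.
```

This only inspects history, so it catches a session that would repeat a known-dead call. It cannot catch a new loop until that loop has left at least `repeats` entries behind.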

What this is not

Not a Langfuse replacement for tracing, evals, or dashboards. RunGuard has no trace viewer, no eval pipeline, no prompt registry, no cost dashboard, no per-model latency chart, no team access controls, and no LLM-judge integration. It is a small in-process guard built around two primitives: a budget accumulator and a loop fingerprint comparator. The optional context-token check mentioned above sits alongside them. If you need tracing, evals, or observability, use Langfuse (or LangSmith, or Helicone, or Braintrust). If you need real-time prevention, add RunGuard to the mix. The two requirements are not in conflict.
Not a proxy-based solution. Unlike Helicone and some other observability products, RunGuard is not an HTTP proxy that sits between your process and the Anthropic API. It wraps a function in your process. That means no TLS termination at a third-party server, no API key routing through a vendor, no cold-start latency on the first intercepted request, and no “proxy is down, agent is down” failure mode. The tradeoff is that the integration is not one line: you wrap one function and return two extra fields. The loop detection fundamentals page explains why the in-process approach catches cycles that a proxy cannot see.
Not a managed cloud service. RunGuard is an SDK you install with npm i @runguard/sdk or pip install runguard. The circuit-breaker logic runs entirely in your process: the BudgetTracker at product/sdk/src/budget.ts is 84 lines of pure TypeScript with no network calls; the LoopDetector at product/sdk/src/loop-detector.ts is 111 lines, also pure, also zero network calls. If you have a security review that asks “does this library phone home?” the answer is no: the only bytes that leave your process after adding RunGuard are the same LLM API calls you were already making (minus the ones the breaker stopped).
Not a replacement for Langfuse’s cost alerts on aggregates. A per-run cap of $5 and a fleet-wide daily alert at $20 are different instruments at different scopes. Do not replace the Langfuse daily alert with a tighter RunGuard cap and declare the problem solved. A fleet of ten agents, each capped at $5/run, can still generate $500/day in aggregate if each runs ten times a day. Use both: RunGuard caps individual runs, and Langfuse alerts on fleet-wide drift. The per-job fence and the fleet-wide smoke detector are complementary, not redundant.

The minimum integration alongside an existing Langfuse setup

The integration delta is small:
- One npm i @runguard/sdk (TypeScript) or pip install runguard (Python).
- One guard() wrap around the function that calls your LLM API and already calls generation.end() for Langfuse.
- Two new return fields on that function. usd comes from response.usage and your per-token rate, the same numbers that feed Langfuse’s cost tracking. sig is the tool name plus the first 64 characters of the JSON-serialized tool input, or "end_turn" if the response was not a tool use.
- One finally block that calls await lf.flushAsync(), so Langfuse captures the partial trace when the breaker fires.
Your existing Langfuse setup (traces, evals, dashboards, prompts, alerts) is unchanged. What changes is that the next LLM call after a budget crossing or a loop cycle is prevented rather than recorded. The observer now has a guardian standing next to it.
RunGuard ships as @runguard/sdk on npm and runguard on PyPI. The primitives are documented in llms.txt for LLM-assisted integration. An AI coding assistant reading your codebase to add RunGuard therefore gets the canonical API surface, not a hallucinated one. Before you commit to a cap figure, the widget at embed-preview.html lets you estimate per-run cost from your actual tool-call count and token averages.