LangSmith doesn’t have a runtime loop detector. Here’s what does.
LangSmith is LangChain’s observability and evaluation platform. You add the LangSmith tracer to your AgentExecutor, StateGraph, or LCEL chain, and every tool call, every LLM invocation, every node transition, and every token count flows into LangSmith’s trace view in real time. After the run you can replay the trace, build an eval dataset from it, run your evaluators across it, and configure online scoring pipelines that sample production runs and flag quality regressions. LangSmith is deeply integrated with LangChain, LangGraph, and LCEL, and the @traceable decorator and traceable() wrapper make it straightforward to add to any Python or TypeScript function that is not already a LangChain object. All of this is genuinely useful. What LangSmith is not is a runtime loop detector: there is no onLoopDetected callback on the tracer, no hook the CallbackManager calls before each node re-executes to ask “has this signature appeared before?”, no in-process mechanism that throws LoopDetectedError when the same tool is called with the same arguments for the third consecutive time, and no pre-call gate that halts the next client.messages.create before it goes out. LangSmith records every step; it does not decide whether the next step should fire. The gap is real, measurable in dollars, and this page explains what closes it.
What LangSmith actually gives you
- Deep traces across the full LangChain and LangGraph call stack. LangSmith’s native integration with LangChain means that when you pass callbacks=[LangChainTracer()] to an AgentExecutor or enable the LANGCHAIN_TRACING_V2=true environment variable, every layer of the stack (the chain, the agent, the individual LLM calls, the tool calls, the retriever fetches) appears in the trace hierarchy. A LangGraph run surfaces each node invocation as a named span, each edge transition is visible, and the state object at each node entry and exit is stored so you can see exactly what data flowed through each edge. For debugging unexpected agent behaviour, this level of granularity is excellent: you can see not just that a tool was called but what the state looked like when the node that invoked it was entered.
- Token counts and cost per LLM invocation, post-hoc. LangSmith captures llm_output.token_usage from each LLM invocation and applies configured per-model rates to compute a cost estimate for each generation. That cost figure appears in the run detail view after the run completes and the trace is ingested. The cost accounting is retrospective: it runs on data that has already been shipped from your process to LangSmith’s servers, after the generation that produced it returned. At no point does the LangSmith tracer read the cumulative-cost-so-far back into your agent’s hot loop before the next LLM call.
- The LangSmith Hub for prompt versioning and collaboration. LangSmith ships a hosted prompt registry (the Hub) where you can version prompt templates, diff them, share them across team members, and pull the current production version at runtime via hub.pull(). Prompt management is one of LangSmith’s most-used features for teams running A/B tests across model deployments. It is entirely orthogonal to loop detection and does not change the runtime behaviour of the agent that uses those prompts.
- Evaluation datasets, evaluators, and online scoring pipelines. You can capture LangSmith runs into datasets, write evaluators (Python functions, LLM-judge prompts, reference-free metrics) and run them across those datasets, and configure online evaluation pipelines that sample a fraction of production runs, execute your evaluators, and write scores back. This is powerful for detecting quality regressions over time: a prompt change that degrades semantic faithfulness on a sample of production traces will surface in your evaluator metrics before your users notice. Like all LangSmith observability, this is retrospective: the evaluator runs after the trace exists, not before the next node fires.
- The LangSmith playground for interactive debugging. LangSmith provides a replay interface where you can load a production run from the trace view, edit the inputs or the prompt, and re-run it interactively to test whether a different prompt or a different model produces better output. This is useful for iterating on agent prompts after a bad production run. It does not change what happens in the next production run until you deploy the updated prompt or model; it is a debugging interface, not a runtime gate.
- Structured metadata, tags, and feedback primitives. LangSmith lets you attach metadata and tags to runs (metadata={"user_id": "...", "session_id": "..."} on the tracer or the traceable() call) and log human or programmatic feedback via the Feedback API. The feedback primitives (create_feedback(run_id, key, score)) are useful for closing the loop between production runs and offline eval: you log a thumbs-up or thumbs-down from your UI and the same run that caused the user complaint is right there in the LangSmith dataset, queryable alongside all the other feedback. None of these primitives write a synchronous value back to the running agent.
- The LangSmith tracer is a passive callback, by design. LangChain’s CallbackManager dispatches events to registered handlers: on_llm_start, on_llm_end, on_tool_start, on_tool_end, on_agent_action, on_agent_finish, and so on. The LangChainTracer implements all of these. But the callbacks are notifications, not gates: on_tool_start tells the tracer that a tool is about to be called; the tool call has already been committed to, and the tracer does not use the callback to stop it. The on_agent_action event is fired after the agent has decided to take an action; the on_llm_start event is fired once the chain has already committed to the LLM call. This is the right design for an observability callback (you do not want your tracer to add latency or failure modes to the hot path), and it is also why the tracer does not stop a loop: by the time it sees the event, the action is already under way.
What LangSmith does not have, and why the gap costs money
The LangSmith trace view for a looping AgentExecutor run is illuminating: you can see every on_tool_start event, every on_tool_end, the tool name, the tool input, the tool output, and the LLM turn that generated each action. If the agent calls search(query="2026 AI safety summit agenda") in tool-use turn 3, turn 4, turn 5, turn 6, and turn 7 (the same query, the same tool, the same 200-token result each time), that pattern is perfectly visible in the LangSmith trace. The trace is an excellent record of what happened. What the LangSmith trace does not have is a langsmith.check_loop_before_tool_call(history) function you can call inside on_tool_start that would have raised an exception before turn 5’s tool call went out. By default, an exception raised inside a callback does not bubble out of the chain and halt the run: callback handlers are designed to be fire-and-forget, logging their events and returning. If you raise inside on_tool_start, LangChain’s callback runner catches the exception, logs it, and continues, unless the handler has explicitly opted in by setting raise_error = True, which the LangSmith tracer does not use as a loop gate. There is no langsmith.stop_run(run_id) API that retroactively terminates a running agent: LangSmith can mark a run as failed via the feedback API after it completes, but it cannot interrupt one mid-execution.
The same gap exists for budget: the LangSmith Python SDK does not expose a get_current_run_cost() synchronous method you can call inside your agent loop. Token counts are in the trace, server-side, after the flush. If you want a synchronous in-process check that says “this run has now spent more than $5, stop before the next LLM call”, you write it yourself: accumulate the cost from each usage block into a local variable, compare, raise. LangSmith will faithfully trace the last generation before you raise, but it is not the mechanism that raised.
This matters for the same reason it matters everywhere in LLM agent infrastructure: the generation that crosses $5 into $6, or the tool call that executes for the fortieth time, is a call that has already been made. The cost is already on the invoice. A tracer that records the fortieth call is useful for the post-mortem; a circuit breaker that prevents the fourth call (and thus the fifth through fortieth) is useful for the budget.
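The hand-written budget check described above (accumulate, compare, raise) can be sketched in a few lines. This is a minimal illustration and not RunGuard’s or LangSmith’s implementation: BudgetGate, BudgetError, and the default per-token rates are hypothetical names and values chosen for the example.

```typescript
// Hypothetical in-process budget gate: accumulate cost locally, check before each call.
class BudgetError extends Error {
  spent: number;
  cap: number;
  constructor(spent: number, cap: number) {
    super(`budget exceeded: spent $${spent.toFixed(4)} of $${cap.toFixed(2)}`);
    this.name = "BudgetError";
    this.spent = spent;
    this.cap = cap;
  }
}

interface Usage {
  input_tokens: number;
  output_tokens: number;
}

class BudgetGate {
  private spentUsd = 0;
  private readonly capUsd: number;
  private readonly inputRate: number;
  private readonly outputRate: number;

  // Rates are assumed USD-per-token values; substitute your model's actual pricing.
  constructor(capUsd: number, inputRate = 3e-6, outputRate = 15e-6) {
    this.capUsd = capUsd;
    this.inputRate = inputRate;
    this.outputRate = outputRate;
  }

  // Call BEFORE the next LLM request: throws once the cap has been reached.
  check(): void {
    if (this.spentUsd >= this.capUsd) {
      throw new BudgetError(this.spentUsd, this.capUsd);
    }
  }

  // Call AFTER each response with its usage block; returns cumulative spend.
  record(usage: Usage): number {
    this.spentUsd +=
      usage.input_tokens * this.inputRate + usage.output_tokens * this.outputRate;
    return this.spentUsd;
  }
}
```

In an agent loop the order would be gate.check(), then the LLM call, then gate.record(resp.usage). The call that crosses the cap still completes and is billed; it is the following call that never goes out.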
LangGraph loop shapes that LangSmith traces clearly and RunGuard prevents
- The conditional-edge routing cycle. A LangGraph StateGraph uses conditional edges to decide which node to visit next based on the current state. A common pattern: the agent node runs an LLM, decides to call a tool, routes to the tool node via should_continue(), the tool returns, routes back to agent. This cycle is intentional and terminates when the LLM returns a non-tool-use stop reason. The dangerous variant: the should_continue() function reads a field from state that is never updated by the tool (because the tool errored, or returned a value the LLM didn’t parse correctly), so the condition never changes, and the edge keeps routing back to agent. LangSmith shows every agent → tool → agent → tool transition with the full state at each node entry. RunGuard sees the tool-call signature repeating: tool_name:tool_input_hash appears at turn 3, turn 6, turn 9. At turn 9 (the third repeat of the same tool call), the loop detector fires before the agent node’s LLM call goes out.
- The state-mutation miss causing repeated LLM calls. A stateful LangGraph node writes the LLM’s output to state and then checks whether a termination criterion is met. If the criterion depends on a field the LLM output was supposed to update but didn’t (wrong field name, serialization error, validation failure that silently ate the update), the criterion never triggers, and the graph re-enters the same node with the same state, producing the same LLM call, which produces the same non-update, which produces the same repeated node entry. LangSmith’s state diff view shows the state snapshot at each node entry is identical across turns; a human reviewing the trace can clearly see the mutation is missing. At runtime, without a circuit breaker, the graph runs until it hits the graph-level recursion_limit (LangGraph’s default is 25): potentially 25 LLM calls, each billable, each producing the same output, before the graph raises GraphRecursionError. RunGuard catches the repeated LLM signature at turn 3 and fires before turn 4.
- Multi-agent handoff loops. LangGraph supports multi-agent architectures where a supervisor agent routes tasks to worker agents via tool calls or graph edges. A common failure: the supervisor routes to the researcher agent, the researcher returns “I need more context, please delegate to the writer”, the writer returns “I need the researcher to complete this first”, and the supervisor routes back to the researcher. The loop can be length-2 (A→B→A→B) or length-3 (A→B→C→A→B→C). LangSmith traces the full multi-agent interaction, showing each handoff with the tool inputs and outputs. RunGuard’s maxCycleLen: 8 default catches cycles of length 1 through 8 in the signature window; a length-2 handoff loop trips the breaker once the two-step cycle has repeated the configured number of times (three with repeats: 3), before the next handoff goes out.
- LCEL with_retry() storms when the upstream is down. LangChain’s LCEL (LangChain Expression Language) provides a with_retry() combinator that retries a chain step on exception. The intent is to handle transient errors (network hiccups, rate limits) gracefully. When the upstream service is not transiently down but structurally unavailable (wrong API key, decommissioned endpoint, account suspended), every retry attempt produces the same error, and with_retry() executes all configured retries plus the exponential-backoff waits before propagating the exception. LangSmith records every retry attempt as a separate run or span, so the trace shows the retry storm clearly. RunGuard sees the same error signature repeating and, with repeats: 3, trips before the fourth identical attempt, saving the cost of the remaining retries and their backoff waits.
- ReAct agent tool-call loops with identical arguments. LangChain’s AgentExecutor running a ReAct agent can loop when the agent’s LLM keeps choosing the same tool with the same input because the tool’s output does not satisfy the LLM’s success criterion but the LLM has no other tool to try or no mechanism to escalate. LangSmith shows every AgentAction and AgentFinish event; the repeated tool call with the same input is visible as identical consecutive on_tool_start events in the trace. RunGuard’s fingerprint is tool_name:tool_input_hash: the same tool with the same input produces the same signature, and the loop detector trips after the third consecutive repeat, before the fourth identical LLM invocation goes out.
- The context-window drift in long LangGraph sessions. A LangGraph agent that accumulates tool outputs in the messages state list grows its input token count on every turn. LangSmith shows the token count per LLM invocation and the graph surfaces a linear climb. By the time the context approaches the model’s limit (200K tokens for claude-sonnet-4-6, roughly 1M for gpt-4.1), individual turns are expensive and quality tends to degrade because older context competes with newer context for the model’s attention. RunGuard’s context guard fires before the turn whose projected token count would exceed the configured ceiling; the same instrument that prevents loops also prevents context-window blow-through, wired to the same guard() call via opts.tokens.
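The loop shapes above all reduce to the same check: does the tail of the signature history consist of one short sequence repeated several times? The sketch below shows one way to implement that check. It is an illustration under stated assumptions and not RunGuard’s source: the SignatureWindow class and its method names are hypothetical, and only the repeats and maxCycleLen parameters mirror options named on this page.

```typescript
// Hypothetical sketch of a signature-window loop detector.
class SignatureWindow {
  private readonly sigs: string[] = [];
  private readonly repeats: number;
  private readonly maxCycleLen: number;
  private readonly capacity: number;

  constructor(repeats = 3, maxCycleLen = 8) {
    this.repeats = repeats;
    this.maxCycleLen = maxCycleLen;
    // Keep just enough history to see the longest cycle repeated `repeats` times.
    this.capacity = repeats * maxCycleLen;
  }

  // Record the signature of a completed call, e.g. "search:<input hash>".
  push(sig: string): void {
    this.sigs.push(sig);
    if (this.sigs.length > this.capacity) this.sigs.shift();
  }

  // Returns the repeating cycle if the tail of the window is `repeats`
  // back-to-back copies of it; returns null when no such cycle exists.
  findCycle(): string[] | null {
    const n = this.sigs.length;
    for (let len = 1; len <= this.maxCycleLen; len++) {
      const needed = len * this.repeats;
      if (needed > n) break; // longer cycles need even more history
      const tail = this.sigs.slice(n - needed);
      const isCycle = tail.every((s, i) => s === tail[i % len]);
      if (isCycle) return tail.slice(0, len);
    }
    return null;
  }
}
```

A caller would invoke findCycle() before each outgoing call and throw when it returns a non-null pattern. Shorter cycles are checked first, so a tool called three times in a row is reported as a length-1 cycle rather than as part of a longer one.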
RunGuard alongside LangSmith: the eight-line wrap for a LangGraph step
    // Use both: LangSmith for post-run traces, RunGuard for pre-call loop detection.
    import Anthropic from "@anthropic-ai/sdk";
    import { traceable } from "langsmith/traceable";
    import { guard, LoopDetectedError, BudgetExceededError } from "@runguard/sdk";

    const client = new Anthropic();

    // Inner step: LangSmith traces every call via traceable()
    const _step = traceable(
      async (messages) => {
        const resp = await client.messages.create({
          model: "claude-sonnet-4-6", max_tokens: 1024, messages,
        });
        const u = resp.usage;
        // Cost of this call in USD at $3 / $15 per million input / output tokens
        const usd = u.input_tokens * 3e-6 + u.output_tokens * 15e-6;
        // Signature: model, tool name (or "end_turn"), first 64 characters of the tool input
        const tu = resp.content.find(b => b.type === "tool_use");
        return { resp, usd, sig: `sonnet:${tu?.name ?? "end_turn"}:${JSON.stringify(tu?.input ?? {}).slice(0, 64)}` };
      },
      { name: "agent-step", project_name: process.env.LANGCHAIN_PROJECT }
    );

    // Outer guard: RunGuard trips before the next _step() call if a loop is detected
    const guardedStep = guard(_step, {
      signature: (_args, out) => out.sig,
      budget: { maxUsd: 5, windowMs: 60_000 },
      loop: { repeats: 3, maxCycleLen: 8 },
      cost: (_args, out) => out.usd,
      onTrip: (e) => console.log("[runguard]", e.reason, e.message),
    });

    // Conversation state owned by your agent loop (illustrative starting message).
    const messages = [{ role: "user", content: "Find the 2026 AI safety summit agenda." }];
    let done = false;

    try {
      while (!done) {
        // Assumes guardedStep resolves to the inner function's return value.
        const { resp } = await guardedStep(messages);
        messages.push({ role: "assistant", content: resp.content });
        // Tool execution and tool_result messages are omitted from this sketch.
        done = resp.stop_reason !== "tool_use";
      }
    } catch (e) {
      if (e instanceof LoopDetectedError) console.log("halted: loop", e.pattern);
      else if (e instanceof BudgetExceededError) console.log("halted: budget", e.spent, "/", e.cap);
      else throw e; // anything else is a real failure, not a breaker trip
    }
The composition is: traceable() wraps the inner function so LangSmith records every call; guard() wraps the traced function so RunGuard checks the signature and budget before each call goes out. Order of operations: guard fires first (pre-call decision), LangSmith traces inside the call (post-call record). If the breaker opens, the inner traceable() function never executes — LangSmith records no new span for the halted turn, so the trace ends cleanly at the last successful step. LangSmith’s record of the run up to the trip point is intact; RunGuard’s breaker prevented the turn that would have extended it.
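The effect of wrapper order on the trace can be seen with two toy wrappers. The sketch below uses stand-ins only: traced() and guarded() are hypothetical helpers written for this example, not the langsmith or @runguard/sdk APIs, and they are synchronous for brevity where the real wrappers are async.

```typescript
// Stand-ins for illustration only; not the langsmith or @runguard/sdk APIs.
type Step = (input: string) => string;

interface Span {
  input: string;
  output?: string;
}

// Tracer stand-in: opens a span as soon as the wrapped function is entered.
function traced(fn: Step, spans: Span[]): Step {
  return (input) => {
    const span: Span = { input };
    spans.push(span);
    span.output = fn(input);
    return span.output;
  };
}

// Guard stand-in: decides before the call whether the wrapped function runs at all.
function guarded(fn: Step, shouldTrip: () => boolean): Step {
  return (input) => {
    if (shouldTrip()) throw new Error("tripped before call");
    return fn(input);
  };
}

const inner: Step = (input) => `echo:${input}`;

// Guard outside, tracer inside: a tripped call leaves no span behind.
const spansGuardOutside: Span[] = [];
const guardOutside = guarded(traced(inner, spansGuardOutside), () => true);

// Tracer outside, guard inside: a tripped call leaves a span with no output.
const spansTracerOutside: Span[] = [];
const tracerOutside = traced(guarded(inner, () => true), spansTracerOutside);
```

Calling guardOutside("x") throws and leaves spansGuardOutside empty, while calling tracerOutside("x") throws after a span with no output has already been recorded. That difference is the reason for putting the guard on the outside.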
The tracing-vs-guarding distinction from a LangSmith angle
- A tracer answers “what happened?” and surfaces it after the run. LangSmith’s fundamental value proposition is making the internals of your LangChain and LangGraph runs visible: the inputs, the outputs, the token counts, the latencies, the state at each node, the tool calls, the eval scores. All of this is computed from data that was produced during the run and shipped to LangSmith’s servers after the run completed. The trace is the run’s story told in retrospect. Its value grows with every run you trace: the more data LangSmith has, the better the percentile latency charts, the more statistically meaningful the eval distributions, the more accurate the cost-trend alerts. A tracer that tried to block a run in progress would no longer be a safe-to-add observability layer; it would be a gate with failure modes.
- A circuit breaker answers “should the next call fire?” before the call goes out. RunGuard’s BudgetTracker and LoopDetector read from in-process state (a floating-point accumulator and a ring buffer of signature strings) and make a synchronous binary decision: pass or throw. The decision is made in the same thread as the agent’s hot loop, between the previous call’s response and the next call’s request. If the decision is “throw”, the HTTP request for the next call never leaves the process. The circuit breaker’s value does not grow with data volume: it fires once per decision point, and its only job is to prevent the next action when the accumulated evidence (tool-call history + cumulative spend) says a loop is in progress or a cap is crossed.
- Both instruments see the same run from different vantage points. LangSmith sees the run from outside: events are shipped to its servers and made queryable after the fact. RunGuard sees the run from inside: the accumulated state lives in process memory alongside the agent’s own variables. “Has the LangGraph graph been in this node with this tool call before, on this session?” is a question for RunGuard, answered synchronously, in the node function body, before the LLM is called. “How does this session’s node-visit count compare to the median session in production?” is a question for LangSmith, answered asynchronously, from a dashboard query. Neither question is answerable by the other instrument in the same time frame.
- Searches for “LangSmith alternative loop detection” name a missing piece, not a competing product. People searching this phrase are in one of two situations: they use LangSmith and want the part LangSmith does not provide (a runtime loop detector that fires before the fourth tool call, not after the fortieth), or they are evaluating observability tools and wondering which one includes loop detection. The answer to both is the same: loop detection and tracing are different product categories. LangSmith is the right product for tracing. RunGuard is the right product for runtime loop prevention. The two compose directly, as shown above, and using both costs you one npm i @runguard/sdk and one function wrap.
- The LangSmith callback-based architecture cannot be retroactively made into a circuit breaker. LangChain’s callback system is intentionally designed as a one-way notification channel: callbacks receive events, they do not gate them. The on_tool_start callback signature is (serialized, input_str, *, run_id, parent_run_id, tags, metadata); it returns nothing, and returning early from it does not stop the tool from executing. If you try to raise from inside a callback, LangChain by default catches the exception in the callback runner and logs it, then continues. Making LangSmith into a circuit breaker would require changing LangChain’s core callback contract: adding a return value that the AgentExecutor or StateGraph checks before proceeding. That would be a breaking change to LangChain’s architecture, not a LangSmith feature addition. RunGuard works at a different layer: it wraps the function that LangChain eventually calls, one level below the callback system, where it can make the pre-call decision without touching LangChain’s internals.
The LangGraph recursion_limit and why it is not the same as a loop detector
LangGraph has a built-in recursion_limit parameter (default 25 for most configurations) that raises GraphRecursionError when the graph has executed more steps than the limit. This is a useful backstop: without it, a looping graph would run until the process is killed or the account balance reaches zero. It is not the same as a loop detector for three reasons. First, it counts steps, not patterns: a graph that alternates between two nodes (A, B, A, B, A, B...) uses 25 steps before the limit fires, burning 12 or 13 full LLM calls; a loop detector that recognizes the (A, B) cycle can fire as early as step 4 (the first repeat of the full cycle, burning 2 LLM calls), or at step 6 with repeats: 3 (burning 3). Second, recursion_limit is a blunt instrument: it fires regardless of whether the graph is making progress (a long legitimate multi-step research run also burns its step budget) or stuck in a tight cycle (same node, same state, same output). A signature-based loop detector only fires when it detects an actual repetition pattern in the tool-call history, leaving long legitimate runs untouched. Third, the recursion_limit is a hard stop whose error says only that the step limit was reached; the LoopDetectedError from RunGuard includes the full cycle pattern (e.pattern), the number of times it repeated (e.repeats), and the index in the window where it was detected, which means your error handler has the information it needs to log a structured event, route to a fallback, or email the on-call engineer with the exact tool-call sequence that triggered the trip. LangSmith’s trace at the time of the LoopDetectedError shows every generation up to the trip point; the e.pattern string from RunGuard tells you exactly which ones formed the cycle. Used together, you get both the precise structured error signal (RunGuard) and the full run context that explains why the loop formed (LangSmith).
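A quick way to compare the two mechanisms is to count steps. A pattern detector that needs a given number of full copies of a cycle stops after cycle length times repeats steps, while a step counter always runs to its limit. The helper below is a hypothetical back-of-the-envelope calculation, not part of LangGraph or RunGuard.

```typescript
// Hypothetical helper: steps executed before each mechanism stops a graph
// that is stuck in a cycle of `cycleLen` nodes.
interface StopComparison {
  detectorSteps: number;       // steps run before a pattern detector trips
  recursionLimitSteps: number; // steps run before a plain step counter trips
  stepsSaved: number;          // difference between the two
}

function stepsBeforeStop(
  cycleLen: number,
  repeats: number,
  recursionLimit: number,
): StopComparison {
  // The detector needs `repeats` full copies of the cycle in its window,
  // but the step counter still caps the run if that comes first.
  const detectorSteps = Math.min(cycleLen * repeats, recursionLimit);
  return {
    detectorSteps,
    recursionLimitSteps: recursionLimit,
    stepsSaved: recursionLimit - detectorSteps,
  };
}
```

For a two-node cycle with repeats of 3 and a limit of 25, stepsBeforeStop(2, 3, 25) reports 6 detector steps against 25 limit steps. For very long cycles the step counter can win, which is one reason to keep both.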
Using LangSmith and RunGuard together: practical integration points
- Annotate LangSmith runs with RunGuard trip data via feedback. When RunGuard’s onTrip callback fires, you have access to the run’s LangSmith run_id (passed through as metadata or retrieved from the current traceable context). You can call client.create_feedback(run_id, key="runguard_trip", score=0, comment=json.dumps({"reason": e.reason, "pattern": e.pattern, "spent": e.spent})) to annotate the LangSmith trace with the trip reason. This means the same trace that shows you the call sequence that led to the trip also has a score record that marks it as a circuit-breaker event, making it queryable in LangSmith’s filter UI: “show me all runs where runguard_trip = 0”. The post-mortem dataset for studying loop patterns in your production agent is built from these annotated runs.
- Set the RunGuard maxUsd tighter than LangSmith’s cost alerting threshold. If your LangSmith project has a cost alert at $50/day across all runs, a single runaway agent could burn $40 before the fleet-wide alert fires. Set RunGuard’s per-run maxUsd to $5 so the per-job fence fires at $5 and the fleet-wide daily alert serves as the backstop for sustained but non-catastrophic spend. Use LangSmith’s cost view to calibrate maxUsd: look at the 95th-percentile cost for successful runs in your project and set the cap at 2–3× that figure so legitimate long runs finish but runaway loops are caught well before they reach the fleet-level alert threshold.
- Use LangSmith datasets to tune RunGuard’s repeats threshold. LangSmith makes it straightforward to build a dataset of production runs and query the tool-call patterns across them. If your production traces show that legitimate runs regularly call the same tool twice in a row (because your ReAct agent does a preliminary search then a confirmatory search with the same query), set repeats: 4 or repeats: 5 instead of the default 3. LangSmith’s trace data is the empirical input for calibrating RunGuard’s sensitivity; RunGuard’s trip data annotated back into LangSmith closes the feedback loop.
- Wrap the LangGraph node function, not the graph itself. In a LangGraph StateGraph, the loop-producing step is the LLM call inside the node function that generates the next action. The most effective place to add RunGuard is around that innermost call: wrap the function that calls client.messages.create (or the equivalent) with guard(), return { usd, sig } alongside the response, and the breaker fires exactly when the loop pattern is first detected in the call history. Wrapping the graph.invoke() call itself is less effective: the graph’s recursion_limit is already at that level, and adding RunGuard there gives you a second step-counter rather than a signature-aware loop detector.
- Use traceable() inside the guarded function, not outside it. The recommended composition is guard(traceable(innerFn)): the guard wraps the traced function, so the LangSmith trace is created only when the guard decides the call should proceed. If the guard trips, no new LangSmith span is created for the halted call, and the run tree is complete as of the last successful step. The alternative, traceable(guard(innerFn)), would create a LangSmith span for every guard call including the ones that throw before the inner function runs; the trace would have entries for tripped calls with no outputs fields, which is noisier and less useful for post-mortem analysis. Wrap in the order shown above: tracer inside, guard outside.
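The maxUsd calibration rule above (95th-percentile cost of successful runs, times 2 to 3) is simple enough to compute from an exported list of run costs. The sketch below assumes you have already pulled those costs into an array; percentile() and suggestMaxUsd() are hypothetical helpers, and the nearest-rank percentile method is one choice among several.

```typescript
// Hypothetical helpers: derive a per-run cap from observed costs of successful runs.

// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// Cap = multiplier × p95 of successful-run cost; 2 to 3 is the range suggested above.
function suggestMaxUsd(successfulRunCostsUsd: number[], multiplier = 2.5): number {
  if (multiplier < 1) throw new Error("multiplier below 1 would cut off typical runs");
  return percentile(successfulRunCostsUsd, 95) * multiplier;
}
```

With twenty successful runs costing $1 through $20, the 95th-percentile run costs $19 and suggestMaxUsd(costs, 2) suggests a $38 cap. Revisit the figure when prompts or models change, since both shift the cost distribution.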
The LangSmith observability stack and where RunGuard sits in it
LangSmith is the primary tracing and eval layer for teams that build on LangChain and LangGraph. It integrates with the broader LangChain ecosystem: LangChainTracer auto-instruments AgentExecutor and LCEL chains, the LangSmith Hub provides shared prompt versioning, and the LangSmith Python and TypeScript SDKs expose traceable() for any non-LangChain function. The LangSmith dashboard shows cost trends, latency percentiles, eval score distributions, and per-user aggregations. For teams running large LangGraph pipelines in production, LangSmith’s run-level debugging is often the first tool opened when an agent produces unexpected output. RunGuard sits one layer below LangSmith in the call stack — inside the innermost function that makes the LLM API call — and makes a pre-call decision that LangSmith never sees if the decision is “trip”. The two products are not in the same category: LangSmith is observability infrastructure, RunGuard is safety infrastructure. Adding RunGuard to a LangSmith-instrumented stack does not change what LangSmith records (it records every call that proceeds past the guard) and does not change the guard’s behaviour based on what LangSmith has seen (the guard reads only its own in-process state). The LangGraph infinite loop guard page covers the StateGraph-specific wrap in more detail, including how to share a single guard instance across multiple graph nodes so the loop detector sees the full cross-node signature sequence rather than per-node histories. The LangChain circuit breaker page covers AgentExecutor and LCEL chain wrapping. The Langfuse alternative page explains the same observer-vs-guardian distinction for the Langfuse SDK. If your stack uses both LangSmith and LangGraph, all three pages apply: the LangSmith integration is the annotation layer, the LangGraph wrap is the structural layer, and RunGuard is the pre-call decision layer.
The first loop our SDK caught was ours — same gap, different surface
We built RunGuard while running a Claude Agent SDK session that posts a six-tweet launch thread via the X API once per day. The session had no LangSmith tracing at the time — it was a bespoke script, not a LangChain agent. The first attempt came back HTTP 402 CreditsDepleted. The next day: same error. Sessions three through six: same. Six consecutive sessions, same endpoint, same payload shape, same 402 response, same zero-progress result. The exact pattern a LangSmith trace would have shown clearly — identical post_tweet spans, identical error responses, a human reviewer seeing it would immediately recognise the loop. What was missing was the mechanism that fires before the seventh attempt. At session seven we loaded the six-entry history into our LoopDetector on startup and it found a length-1 cycle of depth 6 in the signature window, opened the breaker before any HTTP call went out, and exited cleanly. The seventh, eighth, and all subsequent sessions have exited the same way: preflight detects the persisted loop history, exit code 4, zero new API calls, zero new cost. The pattern is the same one any tool-call loop produces regardless of whether the loop is inside a LangGraph StateGraph, an AgentExecutor ReAct loop, or a bespoke daily script: same signature, repeated past the threshold, detectable before the next call fires. Read the full dogfood story on the 30-day log.
What this is not
- Not a LangSmith replacement for tracing, eval, or the Hub. RunGuard has no trace viewer, no eval runner, no prompt registry, no cost dashboard, no latency percentile charts, no human feedback collection, and no LLM-judge integration. It is a two-primitive in-process guard: a cumulative budget accumulator and a tool-call signature ring buffer. If you need post-run visibility, eval workflows, or prompt versioning, use LangSmith. If you need a runtime circuit breaker, add RunGuard. The two requirements do not overlap.
- Not a LangGraph node or an AgentExecutor plugin. RunGuard does not integrate with LangChain’s callback system or LangGraph’s edge routing. It wraps the innermost function that makes LLM API calls, one layer below the LangChain abstraction. This means it works with any LangChain version, any LangGraph version, any future LangChain refactoring that preserves the client.messages.create call at the bottom of the stack. It does not depend on LangChain’s internal APIs and does not need to be updated when LangChain changes its callback contract.
- Not a replacement for LangGraph’s recursion_limit. Use both. The recursion_limit is a blunt step counter that catches runaway graphs regardless of pattern; RunGuard is a pattern-aware breaker that catches loops before the recursion limit is reached. Setting recursion_limit=25 and guard({ loop: { repeats: 3 } }) together means: legitimate long runs finish if they don’t loop (RunGuard doesn’t fire), loops are caught at step 3×cycle_length (RunGuard fires), and genuinely unexpected infinite recursion is still caught at 25 steps (LangGraph fires). The two are complementary at different granularities.
- Not a cloud service or a proxy. RunGuard is an SDK you install with npm i @runguard/sdk or pip install runguard. The circuit-breaker logic runs entirely in your process: no LangSmith-style backend, no HTTP calls from the guard itself, no third-party service in the hot path. The BudgetTracker and LoopDetector are pure TypeScript and pure Python with no runtime dependencies. If your security review asks whether RunGuard phones home, the answer is no: the only bytes that leave your process after adding RunGuard are the LLM API calls it decided not to block.
- Not a replacement for your monitoring stack. LangSmith’s cost alerts, Grafana dashboards, PagerDuty integrations, and Slack notifications are fleet-wide monitoring instruments. RunGuard is a per-run guard. A fleet of fifty agents each running with a $5 cap can still generate $250 in aggregate (if each agent spends close to its cap on one long legitimate run) without tripping any individual breaker. Use LangSmith’s fleet-level cost alerts alongside RunGuard’s per-run caps. The loop detection fundamentals page explains the difference between run-level detection (RunGuard) and fleet-level monitoring (LangSmith, Grafana) and when each instrument is the right one.
The minimum integration alongside an existing LangSmith setup
One npm i @runguard/sdk (TypeScript) or pip install runguard (Python). One guard() wrap around the function that calls your LLM provider SDK: the same function where you already have the traceable() call for LangSmith, or where the LangChain tracer fires its on_llm_start event. Two new return fields: usd (computed from response.usage and your per-token rate, the same number you already compute for LangSmith cost tracking) and sig (the tool name plus a 64-character slice of the serialized tool input, or "end_turn" if the response was not a tool call). One budget option (maxUsd: 5) and one loop option (repeats: 3, maxCycleLen: 8). That is the entire integration delta. Your existing LangSmith setup (traces, eval datasets, Hub prompts, dashboard alerts) is unchanged. What changes is that the next LLM call after a budget crossing or a loop detection fires a typed exception rather than proceeding. LangSmith records every call up to the trip point; RunGuard prevents the call that would have extended the loop. RunGuard ships as @runguard/sdk on npm and runguard on PyPI. The canonical API surface is documented in llms.txt for LLM-assisted integration. The CrewAI loop detection page, browser-use cost cap page, and AgentKit budget alert page cover the same guard-wrap pattern applied to other agent frameworks. If you use LangSmith with any of those frameworks, the composition is the same: guard(traceable(innerFn)), LangSmith traces what proceeds, RunGuard prevents what shouldn’t.