Traceloop OpenLLMetry vs RunGuard: loop detection for production LLM agents

Traceloop’s OpenLLMetry is a leading open-source project for instrumenting LLM calls with OpenTelemetry. It wraps every major LLM SDK (OpenAI, Anthropic, Cohere, Vertex, Bedrock) and framework (LangChain, LlamaIndex, Haystack) and emits standard OTel spans that route to any OTel-compatible backend: your own Jaeger, Grafana Tempo, Datadog, or Traceloop’s hosted dashboard. It is excellent observability infrastructure, but it cannot stop a looping agent. This page explains precisely why, and shows how to add RunGuard to an OpenLLMetry-instrumented stack so that traces are short by design, not by luck.

OpenLLMetry’s architecture: spans on the write path

OpenLLMetry works via monkey-patching at the SDK level. When you call Traceloop.init(), it patches the openai, anthropic, and other SDK clients so that every call emits an OTel span. The span records the request inputs (prompt, model, parameters), the response (completion text, usage), and the latency. These spans are sent asynchronously to your configured exporter.

This architecture has properties that make it ideal for observability: instrumentation requires no changes at your call sites, every call is captured with its inputs, outputs, and latency, and export is asynchronous.

The last property is the relevant one here. Because the exporter is async, the span is emitted after the call completes. Nothing in the OTel model lets an exporter interrupt the call that generated its span; that would be a circular dependency in the data flow.
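To make the write path concrete, here is a minimal sketch in plain Python. It uses no OTel or Traceloop APIs; `traced`, `SpanRecord`, and the queue-based exporter are hypothetical stand-ins. The wrapper lets the call run, then enqueues a span for a background exporter, so by the time the exporter sees anything, the call has already executed.

```python
import functools
import queue
import threading
import time
from dataclasses import dataclass


@dataclass
class SpanRecord:
    # Minimal stand-in for an OTel span (not the real OTel Span type)
    name: str
    inputs: dict
    output: object
    duration_s: float


# Finished spans are handed off here; a background thread drains the queue
export_queue: "queue.Queue[SpanRecord]" = queue.Queue()
exported: list = []


def exporter_worker() -> None:
    # Runs off the request path and only ever sees calls that already finished
    while True:
        span = export_queue.get()
        exported.append(span)  # a real exporter would ship this to a backend
        export_queue.task_done()


threading.Thread(target=exporter_worker, daemon=True).start()


def traced(fn):
    # Instrumentation wrapper: the call runs first, the span is enqueued afterwards
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)  # by this line the call has executed
        export_queue.put(SpanRecord(fn.__name__, {"args": args, "kwargs": kwargs},
                                    result, time.perf_counter() - start))
        return result  # returns without waiting for the exporter
    return wrapper


@traced
def fake_llm_call(prompt: str) -> str:
    return f"echo: {prompt}"


print(fake_llm_call(prompt="hi"))  # echo: hi
export_queue.join()  # demo only: wait until the background exporter has caught up
print(len(exported), exported[0].name)  # 1 fake_llm_call
```

Nothing in `exporter_worker` can prevent `fake_llm_call` from running; it can only observe it afterwards. A real OTel pipeline (for example one using a batch span processor) differs in detail but has the same direction of data flow.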

Why OTel spans can’t detect loops in real time

Loop detection requires two things that OTel spans structurally cannot provide:

  1. A window of previous call fingerprints, readable before the next call. To know that the current call is the third repeat of the same signature, you need to remember the previous two. OTel spans are write-only from the application’s perspective. The span exporter that receives them may be running in another process, another machine, or another cloud region. There is no standard OTel API for reading recent spans back into the process that emitted them.
  2. The ability to raise an exception before the call executes. Even if you could read the span history synchronously, the OTel instrumentation layer has already wrapped the underlying SDK. There is no hook in the standard OTel spec for an instrumentation layer to veto an outgoing call based on span history.

These are not implementation gaps in Traceloop or OpenLLMetry — they are structural properties of the observability architecture. Observability is read-after-write. Guards are pre-write. They require different abstractions.

RunGuard’s loop detection: how it actually works
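As a hedged illustration of the two requirements listed earlier (a readable window of recent call fingerprints, and the ability to raise before the call executes), here is a minimal in-process sketch of the general technique. It is not RunGuard’s source: `loop_guard`, `fingerprint`, and this `LoopDetectedError` class are hypothetical, and the exact semantics of RunGuard’s `loop_window` and `loop_threshold` may differ. Here `window` is how many recent calls are remembered, `threshold` is how many identical calls inside that window trip the guard, and each decorated function keeps its own history.

```python
import functools
import hashlib
import json
from collections import deque


class LoopDetectedError(RuntimeError):
    # Hypothetical error type for this sketch; not imported from runguard
    def __init__(self, tool_name: str, count: int):
        super().__init__(f"{tool_name} called {count} times with the same arguments")
        self.tool_name = tool_name
        self.count = count


def fingerprint(tool_name: str, args: tuple, kwargs: dict) -> str:
    # Canonical JSON with sorted keys, so keyword order does not change the hash.
    # default=repr is a crude fallback for arguments that are not JSON-serializable.
    payload = json.dumps([tool_name, list(args), kwargs], sort_keys=True, default=repr)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def loop_guard(window: int = 20, threshold: int = 3):
    # window: how many recent fingerprints to remember
    # threshold: how many identical calls inside the window trip the guard
    def decorator(fn):
        recent = deque(maxlen=window)  # in-process history, readable before each call

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            fp = fingerprint(fn.__name__, args, kwargs)
            count = sum(1 for seen in recent if seen == fp) + 1  # +1 for this call
            if count >= threshold:
                # Raised before fn runs, so the repeated call is never sent
                raise LoopDetectedError(fn.__name__, count)
            recent.append(fp)
            return fn(*args, **kwargs)

        return wrapper

    return decorator


@loop_guard(window=20, threshold=3)
def web_search(query: str) -> str:
    return f"results for {query}"


web_search(query="llm loops")
web_search(query="llm loops")
try:
    web_search(query="llm loops")
except LoopDetectedError as e:
    print(e.tool_name, e.count)  # web_search 3
```

The key design choice is that the history lives in the same process as the call and is consulted synchronously, which is exactly what an async span exporter cannot offer. This sketch only catches exact-argument repeats; near-duplicate detection would need a looser fingerprint.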

Adding RunGuard to an OpenLLMetry-instrumented agent

# Python: both libraries installed in the same environment
# pip install runguard traceloop-sdk

import os

from traceloop.sdk import Traceloop
from traceloop.sdk.decorators import task
from runguard import guard, BudgetTracker, LoopDetectedError

# Init OpenLLMetry: patches all supported LLM SDKs automatically
Traceloop.init(
    app_name="my-research-agent",
    api_key=os.environ["TRACELOOP_API_KEY"],
)

# Per-run budget shared by the guarded tools
tracker = BudgetTracker(max_usd=2.50)


class AgentStuckError(RuntimeError):
    """Your own exception type, raised to callers when the agent loops."""


# Decorator order: @task (outer) opens the Traceloop span,
# @guard (inner) runs the loop and budget checks inside that span
@task(name="web_search")
@guard(budget=tracker, loop_window=20, loop_threshold=3)
async def web_search(query: str, max_results: int = 5) -> list:
    # RunGuard checks its window of recent calls before this body runs.
    # search_api is a placeholder for your own search client.
    return await search_api.search(query, n=max_results)


# In your agent entry point, handle the trip gracefully
async def run_agent(goal: str):
    try:
        # agent_loop is a placeholder for your own agent loop
        await agent_loop(goal)
    except LoopDetectedError as e:
        # Traceloop already has the full span tree up to the trip;
        # propagate a structured error for your caller to handle
        raise AgentStuckError(
            f"Agent looped on {e.tool_name}: {e.count} repeats"
        ) from e

Capability comparison: what each tool covers

| Capability | Traceloop OpenLLMetry | RunGuard |
|---|---|---|
| Auto-instrument LLM SDK calls | Yes: zero code change, all major SDKs | No (use Traceloop for this) |
| OTel-compatible spans to any backend | Yes: Jaeger, Datadog, Grafana, etc. | No |
| Association ID grouping across agent steps | Yes: @workflow decorator | No |
| Real-time loop detection (pre-call) | No | Yes: in-process, synchronous |
| Per-run budget cap | No | Yes: BudgetExceededError before next call |
| Context-window proximity alert | No | Yes: ContextOverflowError |
| Framework-agnostic tool wrapping | Partial: per-framework instrumentations | Yes: any callable |
| Works with any OTel backend | Yes | N/A (not an OTel component) |
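The “before next call” budget semantics in the table follow the same pre-call pattern as loop detection: check accumulated spend before starting a call rather than after it. The sketch below is hypothetical (`SimpleBudget`, `BudgetExceeded`, and `charge` are illustrative names); RunGuard’s actual `BudgetTracker` API and how it prices calls are not shown on this page.

```python
class BudgetExceeded(RuntimeError):
    """Hypothetical error for this sketch (not runguard.BudgetExceededError)."""


class SimpleBudget:
    def __init__(self, max_usd: float):
        self.max_usd = max_usd
        self.spent_usd = 0.0

    def check(self) -> None:
        # Called before each LLM/tool call: refuse to start once the cap is reached
        if self.spent_usd >= self.max_usd:
            raise BudgetExceeded(
                f"spent ${self.spent_usd:.2f} of ${self.max_usd:.2f}"
            )

    def charge(self, usd: float) -> None:
        # Called after each call with its actual cost (e.g. derived from token usage)
        self.spent_usd += usd


budget = SimpleBudget(max_usd=2.50)
for cost in [1.00, 1.00, 1.00, 1.00]:
    try:
        budget.check()
    except BudgetExceeded as e:
        print("stopped:", e)  # stopped: spent $3.00 of $2.50
        break
    budget.charge(cost)
```

Note the trade-off: a pre-call check cannot know the next call’s cost, so a run can overshoot the cap by at most one call (here, $3.00 against $2.50). A stricter variant would reserve an estimated cost before each call.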

What the combined stack looks like in production

With both tools running:

Add loop detection to your OpenLLMetry stack

RunGuard installs in one command and wraps any function. If you’re already using Traceloop OpenLLMetry for observability, you can add a circuit breaker in five minutes.

Get started with RunGuard, read about how loop detection works in detail, or compare RunGuard with Langfuse and Arize Phoenix.