LLM agent fault tolerance patterns: retry, circuit breaker, fallback, and graceful degradation

LLM agents fail in ways that traditional web services do not. A web service either returns a response or throws an exception, so its faults are discrete, per-request events. An LLM agent runs over multiple turns, accumulates state, and has failure modes that compound. A 429 rate-limit error on turn 3 triggers a retry loop that burns budget on turns 4 through 8. A context window approaching its limit silently degrades model quality across turns 10 through 15 before a hard truncation error fires on turn 16. A tool that returns an unexpected error type leads the model to try every variant it can think of, and each variant costs tokens and money.

Standard microservice fault tolerance patterns (retry with exponential backoff, circuit breakers, health checks) address transient infrastructure failures. They do not address the failure modes specific to multi-turn, stateful, cost-accumulating systems.

This guide covers four fault tolerance patterns adapted for LLM agents: (1) retry with jitter for transient provider errors, (2) circuit breaking for tool-call loops and retry storms, (3) model fallback for provider outages and budget exhaustion, and (4) graceful degradation when the current session cannot complete its task within constraints.

Pattern 1: retry with jitter for transient LLM API errors
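A minimal sketch of this pattern, under stated assumptions: the `ProviderError` class, the set of retryable status codes, and the `base`, `cap`, and `max_attempts` defaults are all hypothetical and are not taken from any provider SDK or from RunGuard. It uses "full jitter" backoff, where each delay is drawn uniformly between zero and an exponentially growing ceiling, and it re-raises non-retryable errors immediately.

```python
import random
import time

# Assumption: these status codes are treated as transient and worth retrying.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


class ProviderError(Exception):
    """Hypothetical error type carrying the provider's HTTP status code."""

    def __init__(self, status):
        super().__init__(f"provider returned HTTP {status}")
        self.status = status


def backoff_delay(attempt, base=1.0, cap=30.0):
    """Full jitter: a random delay in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


def call_with_retry(fn, max_attempts=4, sleep=time.sleep):
    """Call fn, retrying transient errors up to max_attempts total tries."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ProviderError as err:
            out_of_attempts = attempt == max_attempts - 1
            if err.status not in RETRYABLE_STATUS or out_of_attempts:
                raise  # fail fast: non-retryable error or retry budget spent
            sleep(backoff_delay(attempt))
```

The `sleep` parameter is injectable so the loop can be exercised in tests without waiting. Capping `max_attempts` is what keeps a transient error on one turn from turning into an open-ended retry loop across later turns.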

Pattern 2: circuit breaking for tool-call loops and retry storms
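RunGuard's `LoopDetector` API is not shown in this text, so the following is a standalone stand-in rather than RunGuard code. It sketches one simple way to detect a tool-call loop: count identical (tool name, arguments) pairs and trip once a threshold is exceeded. The names `ToolLoopBreaker`, `LoopDetected`, and `max_repeats` are hypothetical.

```python
import json
from collections import Counter


class LoopDetected(Exception):
    """Raised when the same tool call repeats too many times (hypothetical)."""


class ToolLoopBreaker:
    """Trips when an identical tool call is seen more than max_repeats times."""

    def __init__(self, max_repeats=3):
        self.max_repeats = max_repeats
        self.counts = Counter()

    def record(self, tool_name, args):
        # Serialize with sorted keys so argument order does not hide a repeat.
        key = (tool_name, json.dumps(args, sort_keys=True, default=str))
        self.counts[key] += 1
        if self.counts[key] > self.max_repeats:
            raise LoopDetected(
                f"{tool_name} called {self.counts[key]} times with identical arguments"
            )


# Usage: call record() before executing each tool call in the agent loop.
breaker = ToolLoopBreaker(max_repeats=3)
breaker.record("search", {"q": "weather"})
```

This sketch only catches exact repeats. A model that varies its arguments slightly on each attempt would need a looser match, for example keying on the tool name and error type instead of the full argument set.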

Pattern 3: model fallback for provider outages and cost escalation
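One way to sketch model fallback, assuming the caller keeps an ordered list of models with a rough per-call cost estimate. Everything here is illustrative: the model names, the cost figures in cents, and the `ProviderUnavailable` and `NoModelAvailable` exceptions are hypothetical, and the sketch does not use RunGuard's `BudgetExceededError`.

```python
class ProviderUnavailable(Exception):
    """Hypothetical error raised by the call function during an outage."""


class NoModelAvailable(Exception):
    """Raised when no model fits the remaining budget or all are down."""


def complete_with_fallback(prompt, models, call, spent_cents, budget_cents):
    """Try models in preference order; skip any that are too costly or down.

    models: list of (name, estimated_cost_cents), most preferred first.
    call:   function (model_name, prompt) -> reply, supplied by the caller.
    Returns (model_name, reply, new_spent_cents).
    """
    for name, est_cents in models:
        if spent_cents + est_cents > budget_cents:
            continue  # this model would overshoot the remaining budget
        try:
            reply = call(name, prompt)
        except ProviderUnavailable:
            continue  # outage path: fall through to the next model
        return name, reply, spent_cents + est_cents
    raise NoModelAvailable("no model fits the remaining budget or is reachable")


# Hypothetical model names and cost estimates, in cents per call.
MODELS = [("primary-large", 50), ("fallback-small", 5)]
```

With 80 of 100 cents already spent, the primary model is skipped and the cheaper one handles the remaining work. The same loop covers an outage, because a `ProviderUnavailable` from the primary falls through to the next entry.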

Pattern 4: graceful degradation when the task cannot complete within constraints
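A minimal sketch of a structured partial result, assuming the task can be expressed as an ordered list of named steps. The `PartialResult` fields and the `run_steps` helper are hypothetical names chosen for illustration. The point is that a fault returns what completed and where to resume, instead of surfacing as an unhandled exception.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PartialResult:
    status: str                           # "complete" or "partial"
    completed_steps: list = field(default_factory=list)
    resume_from: Optional[int] = None     # index of the first unfinished step
    reason: Optional[str] = None          # why the run stopped early


def run_steps(steps):
    """Run (name, fn) steps in order; on any fault, return a checkpoint."""
    done = []
    for index, (name, fn) in enumerate(steps):
        try:
            fn()
        except Exception as err:
            # Degrade gracefully: report progress and where to resume.
            return PartialResult("partial", done, resume_from=index,
                                 reason=f"{name}: {err}")
        done.append(name)
    return PartialResult("complete", done)
```

A caller can inspect `resume_from` and rerun `steps[resume_from:]` later, for example after the budget resets or the provider recovers.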

LLM agent fault tolerance pattern comparison

| Pattern | Failure mode addressed | Without pattern | With pattern |
| --- | --- | --- | --- |
| Retry with jitter | Transient 429 / 5xx errors | Task fails on first transient error; no retry budget management | Distributed retry avoids thundering-herd; non-retryable errors fail fast |
| Circuit breaker (loop detection) | Tool-call loops and retry storms | Agent loops indefinitely, billing thousands of tokens per minute | Breaker trips after N repeats; loop detected before the bill lands |
| Model fallback | Budget exhaustion and provider outages | Task hard-fails or overspends when primary model is unavailable or expensive | Cheaper model completes remaining work within budget; outage path has an alternate |
| Graceful degradation | Any fault that prevents task completion | Unhandled exception; caller has no idea what completed and where to resume | Structured partial result; caller can resume from the correct checkpoint |

For the broader cost control architecture these patterns sit inside, see autonomous agent cost control best practices. For retry-storm-specific circuit breaking with RunGuard, see AI agent retry storm prevention. For the graceful degradation model-fallback approach in detail, see AI agent graceful degradation patterns.

Add fault tolerance to your LLM agent in minutes

RunGuard installs in one command: pip install runguard for Python, npm install @runguard/sdk for TypeScript. The circuit breaking and budget tracking patterns above use RunGuard’s LoopDetector and BudgetExceededError directly — no wrappers or config files required. Start with the circuit breaker (Pattern 2) as it catches the highest-cost failure mode: tool loops. Add the budget cap next. Then layer in retry with jitter and graceful degradation as your agent moves toward production.

RunGuard pricing: Solo plan at $19/month for individual developers. Team plan at $79/month adds Slack and PagerDuty webhook alerts, shared dashboards, and audit log. Both plans include a 14-day free trial — no credit card required.
