AI agent graceful degradation patterns: what to do when the circuit breaker trips instead of crashing

A circuit breaker that terminates an agent run and returns an exception to the caller is a cost control mechanism, not a user experience feature. For background workers and batch pipelines, returning an exception is acceptable — the job queue can retry with a flag or escalate to a human reviewer. But for interactive agents where a user is waiting for a response, a bare BudgetExceededError or LoopDetectedError traceback is the worst possible outcome: the user gets nothing, the agent has already spent some of the budget, and the error message gives no indication of what progress was made before the trip. Graceful degradation is the set of patterns that sit between “hard crash on limit” and “no limits at all”. The goal is to return something useful to the user even when the agent cannot complete the full task within its resource constraints. Three patterns cover most cases: return partial results accumulated before the trip, fall back to a cheaper model with a smaller context, and summarize accumulated context then continue on a reduced budget. Each pattern has a different cost-quality tradeoff and suits a different type of agent task.

Pattern 1: return partial results on budget exceeded

The simplest degradation pattern. When the budget cap fires, return whatever the agent has already accumulated rather than discarding it. This works well for research and summarization agents where partial output is genuinely useful — “here are the first three sources I found” is better than nothing. It works poorly for transaction agents where partial output is unusable or dangerous (a code edit that is half-applied, a booking that is half-confirmed).

Pattern 2: model fallback on budget exceeded

When the primary model’s budget cap fires, retry the same task with a cheaper model at a fraction of the cost. Claude Haiku 4.5 costs roughly 10× less per token than Claude Sonnet 4.6; GPT-4o Mini costs roughly 15× less than GPT-4o. For tasks where quality matters but a good-enough answer is better than no answer, switching to a cheaper model on budget exception gives users a usable response while keeping total session cost within bounds.

The key design decision: LoopDetectedError propagates without fallback because a loop on Sonnet will loop on Haiku too. Only BudgetExceededError triggers the model switch, because a cheaper model may complete the same task within its reduced budget. See autonomous agent cost control best practices for a broader treatment of model routing decisions.

Pattern 3: summarize-and-continue on context window or budget limit

For long-running agents where context grows across many turns, the context window fills before the task completes. When context is near the limit or cost per call spikes (because input tokens are large), summarize the accumulated context into a compact state representation, start a fresh context with the summary, and continue on a reduced budget. This pattern keeps the agent running indefinitely on fixed-cost inputs.

The compaction uses Claude Haiku for the summarization step (cost: ~$0.002 for a 512-token summary) rather than Sonnet, so the compaction itself does not materially contribute to session cost. For context window management strategies without compaction, see LLM context window exceeded agent recovery.

Graceful degradation pattern comparison

Pattern Best for User experience on limit Extra cost Complexity
Hard crash (no degradation) Background batch jobs where retry is cheap Exception — nothing returned None None
Return partial results Research, summarization, list-completion tasks Partial answer with explanation of what’s missing None Low — catch exception, return accumulator
Model fallback Interactive Q&A where good-enough beats nothing Full answer at lower quality ~10% of primary model cost for one retry Low — two guard instances, catch and retry
Summarize-and-continue Long-running multi-step agents, autonomous pipelines Seamless continuation (user may not see the compaction) ~$0.002 per compaction for Haiku summarization Medium — compact logic, restart guard after compaction

All three patterns work with RunGuard’s typed exceptions. BudgetExceededError and LoopDetectedError are distinct types, so you can handle them differently: a loop detection always warrants investigation before retry, while budget exceeded is often safe to handle with fallback or partial result. For the full cost control stack these patterns sit within, see autonomous agent cost control best practices.

Add graceful degradation to your AI agent

RunGuard installs in one command: pip install runguard for Python, npm install @runguard/sdk for TypeScript. Wrap your LLM call function with guard() and catch BudgetExceededError and LoopDetectedError separately in your agent loop. Add a partial result accumulator for research agents, a model fallback for interactive agents, or context compaction for long-running pipelines. All three patterns are pure application code — no RunGuard configuration changes required beyond the base guard setup.

RunGuard pricing: Solo plan at $19/month for individual developers. Team plan at $79/month adds Slack and PagerDuty webhook alerts, shared dashboards, and audit log. Both plans include a 14-day free trial — no credit card required.

Start your 14-day free trial — or explore related patterns: autonomous agent cost control best practices, LLM context window exceeded recovery, prevent AI agent runaway cost in real time, set max cost per LLM request, and AI agent retry storm prevention.