AI agent graceful degradation patterns: what to do when the circuit breaker trips instead of crashing
A circuit breaker that terminates an agent run and returns an exception to the caller is a cost control mechanism, not a user experience feature. For background workers and batch pipelines, returning an exception is acceptable: the job queue can retry with a flag or escalate to a human reviewer. For interactive agents where a user is waiting for a response, a bare BudgetExceededError or LoopDetectedError traceback is the worst possible outcome. The user gets nothing, the agent has already spent some of the budget, and the error message gives no indication of what progress was made before the trip.
Graceful degradation is the set of patterns that sit between “hard crash on limit” and “no limits at all”. The goal is to return something useful to the user even when the agent cannot complete the full task within its resource constraints. Three patterns cover most cases: return partial results accumulated before the trip, fall back to a cheaper model with a smaller context, and summarize accumulated context then continue on a reduced budget. Each pattern has a different cost-quality tradeoff and suits a different type of agent task.
Pattern 1: return partial results on budget exceeded
This is the simplest degradation pattern: when the budget cap fires, return whatever the agent has already accumulated rather than discarding it. It works well for research and summarization agents where partial output is genuinely useful (“here are the first three sources I found” is better than nothing). It works poorly for transaction agents where partial output is unusable or dangerous, such as a code edit that is half-applied or a booking that is half-confirmed.
Python: partial result accumulation
    from dataclasses import dataclass, field

    import anthropic
    from runguard import guard, LoopDetectedError, BudgetExceededError

    client = anthropic.Anthropic()


    @dataclass
    class AgentResult:
        completed: bool
        output: str
        partial_work: list[str] = field(default_factory=list)
        halt_reason: str = ""
        cost_usd: float = 0.0


    def call_claude(messages: list) -> dict:
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=2048,
            messages=messages,
        )
        # Rates are USD per million tokens (input, output)
        usd = (response.usage.input_tokens * 3.0 + response.usage.output_tokens * 15.0) / 1_000_000
        return {"response": response, "usd": usd}


    guarded_call = guard(call_claude, budget={"max_usd": 3.0}, loop={"repeats": 3, "max_cycle_len": 5})


    def research_agent(topics: list[str]) -> AgentResult:
        """Research multiple topics, return partial results if budget trips mid-way."""
        accumulated: list[str] = []
        total_cost = 0.0
        for topic in topics:
            try:
                result = guarded_call([
                    {"role": "user", "content": f"Summarize the key points about: {topic}. Be concise (3-4 sentences)."}
                ])
                total_cost += result.get("usd", 0.0)
                response = result["response"]
                text = "\n".join(b.text for b in response.content if hasattr(b, "text"))
                accumulated.append(f"## {topic}\n{text}")
            except BudgetExceededError as e:
                # Return whatever we have so far. Topics are processed in order,
                # so everything past the accumulated count is still outstanding.
                remaining = topics[len(accumulated):]
                return AgentResult(
                    completed=False,
                    output="\n\n".join(accumulated),
                    partial_work=accumulated,
                    halt_reason=f"Budget cap reached after {len(accumulated)}/{len(topics)} topics. Remaining: {remaining}. Error: {e}",
                    cost_usd=total_cost,
                )
            except LoopDetectedError as e:
                return AgentResult(
                    completed=False,
                    output="\n\n".join(accumulated),
                    partial_work=accumulated,
                    halt_reason=f"Loop detected on topic '{topic}': {e}",
                    cost_usd=total_cost,
                )
        return AgentResult(completed=True, output="\n\n".join(accumulated), cost_usd=total_cost)


    # Usage
    result = research_agent(["LangChain", "AutoGen", "CrewAI", "LangGraph", "DSPy"])
    if not result.completed:
        print(f"Partial result ({len(result.partial_work)} topics completed):")
        print(result.output)
        print(f"\nHalt reason: {result.halt_reason}")
    else:
        print(result.output)
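The partial-result path can be exercised offline, without spending API budget, by swapping the guarded call for a stub. The sketch below is a stand-in under stated assumptions: `FakeBudgetExceeded`, `PartialResult`, `make_stub`, and `run_topics` are hypothetical names defined here for illustration and are not part of RunGuard. The stub simply raises after a fixed number of calls.

```python
from dataclasses import dataclass, field


class FakeBudgetExceeded(Exception):
    """Hypothetical stand-in for a budget exception; not a RunGuard class."""


@dataclass
class PartialResult:
    completed: bool
    done: list[str] = field(default_factory=list)
    remaining: list[str] = field(default_factory=list)


def make_stub(max_calls: int):
    """Return a fake LLM call that raises once max_calls have been made."""
    calls = 0

    def stub(topic: str) -> str:
        nonlocal calls
        if calls >= max_calls:
            raise FakeBudgetExceeded(f"cap hit after {calls} calls")
        calls += 1
        return f"summary of {topic}"

    return stub


def run_topics(topics: list[str], call) -> PartialResult:
    """Process topics in order; on a budget trip, keep what is finished."""
    done: list[str] = []
    for i, topic in enumerate(topics):
        try:
            done.append(call(topic))
        except FakeBudgetExceeded:
            # Topics run in order, so everything from index i onward is unfinished.
            return PartialResult(completed=False, done=done, remaining=topics[i:])
    return PartialResult(completed=True, done=done)


result = run_topics(["LangChain", "AutoGen", "CrewAI"], make_stub(max_calls=2))
print(result.completed, len(result.done), result.remaining)
# prints: False 2 ['CrewAI']
```

Keeping the accumulator and the list of outstanding topics as separate fields lets the caller tell the user what is missing without parsing the output text.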
Pattern 2: model fallback on budget exceeded
When the primary model’s budget cap fires, retry the same task with a cheaper model at a fraction of the cost. At list prices, Claude Haiku 4.5 ($1 input / $5 output per million tokens) costs roughly 3× less per token than Claude Sonnet 4.6 at the $3 / $15 rates used in the example below; GPT-4o Mini costs roughly 15× less than GPT-4o. For tasks where quality matters but a good-enough answer is better than no answer, switching to a cheaper model when the budget exception fires gives users a usable response while keeping total session cost within bounds.
Python: primary model with Haiku fallback
    import anthropic
    from runguard import guard, BudgetExceededError, LoopDetectedError

    client = anthropic.Anthropic()


    def call_sonnet(messages: list) -> dict:
        r = client.messages.create(model="claude-sonnet-4-6", max_tokens=2048, messages=messages)
        # Rates are USD per million tokens (input, output)
        usd = (r.usage.input_tokens * 3.0 + r.usage.output_tokens * 15.0) / 1_000_000
        return {"response": r, "usd": usd, "model": "sonnet"}


    def call_haiku(messages: list) -> dict:
        r = client.messages.create(model="claude-haiku-4-5-20251001", max_tokens=2048, messages=messages)
        # Haiku 4.5 list price: $1 input / $5 output per million tokens
        usd = (r.usage.input_tokens * 1.0 + r.usage.output_tokens * 5.0) / 1_000_000
        return {"response": r, "usd": usd, "model": "haiku"}


    # Sonnet guard: $2 cap, trips fast if the task is expensive
    sonnet_guard = guard(call_sonnet, budget={"max_usd": 2.0}, loop={"repeats": 3, "max_cycle_len": 5})
    # Haiku guard: $0.50 cap, cheaper model with a lower cap
    haiku_guard = guard(call_haiku, budget={"max_usd": 0.50}, loop={"repeats": 3, "max_cycle_len": 5})


    def agent_with_model_fallback(messages: list) -> dict:
        """Try Sonnet; if budget exceeded, retry same messages with Haiku."""
        try:
            return sonnet_guard(messages)
        except BudgetExceededError:
            print("[FALLBACK] Sonnet budget exceeded, retrying with Haiku")
            try:
                result = haiku_guard(messages)
                result["degraded"] = True
                result["fallback_reason"] = "sonnet_budget_exceeded"
                return result
            except BudgetExceededError as e2:
                raise RuntimeError(f"Both Sonnet and Haiku budget caps reached: {e2}") from e2
        # LoopDetectedError propagates: it is not retried on the cheaper model
        # (a loop is structural, not cost-related)
The key design decision: LoopDetectedError propagates without fallback because a loop on Sonnet will loop on Haiku too. Only BudgetExceededError triggers the model switch, because a cheaper model may complete the same task within its reduced budget. See autonomous agent cost control best practices for a broader treatment of model routing decisions.
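The bound on total session cost under this pattern is the sum of the two caps, because the fallback only starts after the primary guard has tripped. The following is a minimal sketch of that arithmetic, assuming each guard enforces its cap independently and ignoring any overshoot from the call that trips it. `call_cost_usd` and `worst_case_session_usd` are illustrative helpers, not RunGuard functions, and the per-million-token rates are parameters to fill in from current price lists.

```python
def call_cost_usd(input_tokens: int, output_tokens: int,
                  usd_per_mtok_in: float, usd_per_mtok_out: float) -> float:
    """Cost of one call given per-million-token rates (same formula as the call wrappers)."""
    return (input_tokens * usd_per_mtok_in + output_tokens * usd_per_mtok_out) / 1_000_000


def worst_case_session_usd(*caps_usd: float) -> float:
    """Upper bound when each fallback tier runs only after the previous cap trips."""
    return sum(caps_usd)


# Caps from the example above: $2.00 primary, $0.50 fallback.
print(worst_case_session_usd(2.0, 0.50))        # prints: 2.5

# One call with 10,000 input and 1,000 output tokens at $3 / $15 per million tokens.
print(call_cost_usd(10_000, 1_000, 3.0, 15.0))  # prints: 0.045
```

Sizing the fallback cap as a small fraction of the primary cap keeps the worst case close to the primary budget the user was already prepared to spend.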
Pattern 3: summarize-and-continue on context window or budget limit
For long-running agents where context grows across many turns, the context window can fill before the task completes. When context is near the limit or cost per call spikes (because input tokens are large), summarize the accumulated context into a compact state representation, start a fresh context with the summary, and continue on a reduced budget. This keeps per-call input size roughly bounded no matter how many turns the agent runs, at the cost of losing some detail with each summary.
Python: context compaction with sliding summary
    import anthropic
    from runguard import guard, BudgetExceededError, LoopDetectedError

    client = anthropic.Anthropic()

    MAX_MESSAGES_BEFORE_COMPACT = 12  # compact once the history exceeds N messages
    BUDGET_USD = 10.0
    SYSTEM = "You are a research assistant completing a multi-step research task."


    def call_claude(messages: list) -> dict:
        r = client.messages.create(model="claude-sonnet-4-6", max_tokens=2048, system=SYSTEM, messages=messages)
        usd = (r.usage.input_tokens * 3.0 + r.usage.output_tokens * 15.0) / 1_000_000
        return {"response": r, "usd": usd}


    main_guard = guard(call_claude, budget={"max_usd": BUDGET_USD}, loop={"repeats": 3, "max_cycle_len": 6})


    def compact_context(messages: list, task: str) -> list:
        """Summarize the message history into a single compact user message."""
        # Summarize the whole history; non-string content (tool calls and results) is elided.
        history_text = "\n".join(
            f"{m['role'].upper()}: {m['content'] if isinstance(m['content'], str) else '[tool calls]'}"
            for m in messages
        )
        summary_response = client.messages.create(
            model="claude-haiku-4-5-20251001",  # cheap model for summarization
            max_tokens=512,
            messages=[{
                "role": "user",
                "content": f"Summarize this agent conversation into a compact state (key decisions made, data found, remaining goals). Be specific and brief:\n\n{history_text}"
            }]
        )
        summary = summary_response.content[0].text
        # Restate the original task so it survives repeated compactions.
        return [
            {"role": "user", "content": f"[CONTEXT SUMMARY: continuing from previous session]\nOriginal task: {task}\n\n{summary}\n\nContinue from where we left off."},
        ]


    def long_running_agent(task: str) -> str:
        messages = [{"role": "user", "content": task}]
        total_cost = 0.0
        compactions = 0
        for turn in range(50):  # allow many turns
            # Compact if conversation is long
            if len(messages) > MAX_MESSAGES_BEFORE_COMPACT:
                messages = compact_context(messages, task)
                compactions += 1
                print(f"[COMPACT] Summarized context at turn {turn} (compactions: {compactions})")
            try:
                result = main_guard(messages)
            except BudgetExceededError as e:
                remaining_budget = BUDGET_USD - total_cost
                if remaining_budget < 0.10:
                    # Genuinely out of budget
                    return f"[BUDGET HALT] Completed {turn} turns. Cost: ${total_cost:.4f}. {e}"
                # Compact and retry with a fresh, smaller context
                messages = compact_context(messages, task)
                compactions += 1
                print(f"[BUDGET COMPACT] Mid-budget compaction at turn {turn}")
                continue
            except LoopDetectedError as e:
                return f"[LOOP HALT] Loop at turn {turn}. Cost: ${total_cost:.4f}. {e}"
            total_cost += result.get("usd", 0.0)
            response = result["response"]
            tool_calls = [b for b in response.content if b.type == "tool_use"]
            if not tool_calls:
                return "\n".join(b.text for b in response.content if hasattr(b, "text"))
            messages.append({"role": "assistant", "content": response.content})
            # Placeholder tool results; a real agent would execute each tool here.
            messages.append({"role": "user", "content": [
                {"type": "tool_result", "tool_use_id": tc.id, "content": "done"} for tc in tool_calls
            ]})
        return f"[MAX TURNS] Completed 50 turns across {compactions} compactions. Cost: ${total_cost:.4f}"
The compaction uses Claude Haiku rather than Sonnet for the summarization step, so each compaction typically costs under a cent (a few thousand input tokens plus a 512-token summary) and does not materially contribute to session cost. For context window management strategies without compaction, see LLM context window exceeded agent recovery.
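The example above triggers compaction on message count, which is only a proxy for context size. A trigger can instead use the input token count reported by the previous call, which the Anthropic SDK exposes as `response.usage.input_tokens` (the same field used in the cost calculation). The sketch below shows one way to combine both signals. `should_compact` is a hypothetical helper, and the 200,000-token window and 0.8 fill threshold are assumed values to adjust for the model in use.

```python
def should_compact(last_input_tokens: int,
                   message_count: int,
                   context_window: int = 200_000,  # assumed; check the model's actual limit
                   fill_threshold: float = 0.8,    # assumed safety margin
                   max_messages: int = 12) -> bool:
    """Compact when the last prompt neared the window or the history is long."""
    near_limit = last_input_tokens >= context_window * fill_threshold
    too_long = message_count > max_messages
    return near_limit or too_long


print(should_compact(last_input_tokens=170_000, message_count=4))   # prints: True
print(should_compact(last_input_tokens=10_000, message_count=4))    # prints: False
print(should_compact(last_input_tokens=10_000, message_count=13))   # prints: True
```

A token-based check catches the case where a few very large tool results fill the window long before the message count reaches its threshold.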
Graceful degradation pattern comparison
| Pattern | Best for | User experience on limit | Extra cost | Complexity |
|---|---|---|---|---|
| Hard crash (no degradation) | Background batch jobs where retry is cheap | Exception, nothing returned | None | None |
| Return partial results | Research, summarization, list-completion tasks | Partial answer with explanation of what’s missing | None | Low: catch exception, return accumulator |
| Model fallback | Interactive Q&A where good-enough beats nothing | Full answer at lower quality | Roughly a third of primary model cost for one Haiku retry | Low: two guard instances, catch and retry |
| Summarize-and-continue | Long-running multi-step agents, autonomous pipelines | Seamless continuation (user may not see the compaction) | Typically under a cent per compaction for Haiku summarization | Medium: compact logic, restart guard after compaction |
All three patterns work with RunGuard’s typed exceptions. BudgetExceededError and LoopDetectedError are distinct types, so you can handle them differently: a loop detection always warrants investigation before retry, while budget exceeded is often safe to handle with fallback or partial result. For the full cost control stack these patterns sit within, see autonomous agent cost control best practices.
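Because the two failures are separate types, the handling policy can be written as a small dispatch. The sketch below defines its own stand-in exception classes so that it runs without RunGuard installed; with the library, the imported `BudgetExceededError` and `LoopDetectedError` would take their place. `handle_halt` and its action strings are hypothetical, and the policy shown is one possible choice rather than a prescribed one.

```python
class BudgetExceededError(Exception):
    """Local stand-in for illustration only."""


class LoopDetectedError(Exception):
    """Local stand-in for illustration only."""


def handle_halt(exc: Exception, partial: list[str], interactive: bool) -> dict:
    """Map a guard exception to a degradation action."""
    if isinstance(exc, LoopDetectedError):
        # A loop is structural: stop and flag for investigation, never auto-retry.
        return {"action": "halt_and_investigate", "output": partial}
    if isinstance(exc, BudgetExceededError):
        if partial:
            # Something useful was accumulated before the trip, so return it.
            return {"action": "return_partial", "output": partial}
        # Nothing accumulated yet: a cheaper model may still produce an answer.
        action = "fallback_model" if interactive else "reraise"
        return {"action": action, "output": []}
    raise exc  # unknown errors are not swallowed


print(handle_halt(BudgetExceededError("cap"), ["## Topic A"], interactive=True)["action"])
# prints: return_partial
print(handle_halt(LoopDetectedError("cycle"), [], interactive=True)["action"])
# prints: halt_and_investigate
```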
Add graceful degradation to your AI agent
RunGuard installs in one command: pip install runguard for Python, npm install @runguard/sdk for TypeScript. Wrap your LLM call function with guard() and catch BudgetExceededError and LoopDetectedError separately in your agent loop. Add a partial result accumulator for research agents, a model fallback for interactive agents, or context compaction for long-running pipelines. All three patterns are pure application code — no RunGuard configuration changes required beyond the base guard setup.
RunGuard pricing: Solo plan at $19/month for individual developers. Team plan at $79/month adds Slack and PagerDuty webhook alerts, shared dashboards, and audit log. Both plans include a 14-day free trial — no credit card required.
Start your 14-day free trial — or explore related patterns: autonomous agent cost control best practices, LLM context window exceeded recovery, prevent AI agent runaway cost in real time, set max cost per LLM request, and AI agent retry storm prevention.