AI agent graceful degradation patterns: what to do when the circuit breaker trips instead of crashing

A circuit breaker that terminates an agent run and raises an exception to the caller is a cost-control mechanism, not a user-experience feature. For background workers and batch pipelines, an exception is acceptable: the job queue can retry with a flag or escalate to a human reviewer. For interactive agents where a user is waiting for a response, a bare BudgetExceededError or LoopDetectedError traceback is the worst possible outcome. The user gets nothing, the agent has already spent part of the budget, and the error message gives no indication of what progress was made before the trip.

Graceful degradation is the set of patterns that sit between “hard crash on limit” and “no limits at all”. The goal is to return something useful to the user even when the agent cannot complete the full task within its resource constraints. Three patterns cover most cases: return the partial results accumulated before the trip, fall back to a cheaper model with a smaller context, or summarize the accumulated context and continue on a reduced budget. Each pattern has a different cost-quality tradeoff and suits a different type of agent task.

Pattern 1: return partial results on budget exceeded

This is the simplest degradation pattern: when the budget cap fires, return whatever the agent has already accumulated rather than discarding it. It works well for research and summarization agents where partial output is genuinely useful: “here are the first three sources I found” is better than nothing. It works poorly for transaction agents where partial output is unusable or dangerous (a code edit that is half-applied, a booking that is half-confirmed).

Python: partial result accumulation

from runguard import guard, LoopDetectedError, BudgetExceededError
from dataclasses import dataclass, field
import anthropic

client = anthropic.Anthropic()

@dataclass
class AgentResult:
    completed: bool
    output: str
    partial_work: list[str] = field(default_factory=list)
    halt_reason: str = ""
    cost_usd: float = 0.0

def call_claude(messages: list) -> dict:
    response = client.messages.create(
        model="claude-sonnet-4-6", max_tokens=2048, messages=messages,
    )
    usd = (response.usage.input_tokens * 3.0 + response.usage.output_tokens * 15.0) / 1_000_000
    return {"response": response, "usd": usd}

guarded_call = guard(call_claude, budget={"max_usd": 3.0}, loop={"repeats": 3, "max_cycle_len": 5})

def research_agent(topics: list[str]) -> AgentResult:
    """Research multiple topics, return partial results if budget trips mid-way."""
    accumulated: list[str] = []
    total_cost = 0.0

    for topic in topics:
        try:
            result = guarded_call([
                {"role": "user", "content": f"Summarize the key points about: {topic}. Be concise (3-4 sentences)."}
            ])
            total_cost += result.get("usd", 0.0)
            response = result["response"]
            text = "\n".join(b.text for b in response.content if hasattr(b, "text"))
            accumulated.append(f"## {topic}\n{text}")
        except BudgetExceededError as e:
            # Return whatever we have so far
            partial = "\n\n".join(accumulated)
            # Topics are processed in order, so everything after the completed ones is unfinished
            remaining = topics[len(accumulated):]
            return AgentResult(
                completed=False,
                output=partial,
                partial_work=accumulated,
                halt_reason=f"Budget cap reached after {len(accumulated)}/{len(topics)} topics. Remaining: {remaining}. Error: {e}",
                cost_usd=total_cost,
            )
        except LoopDetectedError as e:
            return AgentResult(
                completed=False,
                output="\n\n".join(accumulated),
                partial_work=accumulated,
                halt_reason=f"Loop detected on topic '{topic}': {e}",
                cost_usd=total_cost,
            )

    return AgentResult(completed=True, output="\n\n".join(accumulated), cost_usd=total_cost)

# Usage
result = research_agent(["LangChain", "AutoGen", "CrewAI", "LangGraph", "DSPy"])
if not result.completed:
    print(f"Partial result ({len(result.partial_work)} topics completed):")
    print(result.output)
    print(f"\nHalt reason: {result.halt_reason}")
else:
    print(result.output)
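
Partial results are only safe to return when each completed unit is usable on its own. For transactional work, one option is to stage changes in memory and apply them only if every step finishes. The sketch below illustrates that idea under stated assumptions: StagedEdits, apply_all_or_nothing, and the RuntimeError standing in for a budget or loop trip are hypothetical names for illustration, not part of RunGuard.

```python
from dataclasses import dataclass, field


@dataclass
class StagedEdits:
    """Collects edits in memory; nothing touches the target until commit()."""
    pending: dict[str, str] = field(default_factory=dict)

    def stage(self, path: str, new_text: str) -> None:
        self.pending[path] = new_text

    def commit(self, target: dict[str, str]) -> None:
        # Apply every staged edit in one step, then clear the staging area.
        target.update(self.pending)
        self.pending.clear()


def apply_all_or_nothing(target: dict[str, str], steps: list) -> bool:
    """Run each step; commit only if all succeed, otherwise discard everything.

    Each step is a callable returning (path, new_text). A step may raise
    RuntimeError, used here as a hypothetical stand-in for a guarded call
    that trips its limit. Returns True if the edits were committed.
    """
    staged = StagedEdits()
    try:
        for step in steps:
            path, text = step()
            staged.stage(path, text)
    except RuntimeError:
        return False  # target untouched: no half-applied edit
    staged.commit(target)
    return True


def tripping_step():
    # Simulates a step that hits the budget cap mid-run.
    raise RuntimeError("budget exceeded")


files = {"a.py": "old_a", "b.py": "old_b"}
committed = apply_all_or_nothing(files, [lambda: ("a.py", "new_a"), tripping_step])
print(committed, files)
```

When a step trips, the function reports failure and leaves the target as it was, so the caller can tell the user that nothing was changed rather than handing back a half-applied edit.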

Pattern 2: model fallback on budget exceeded

When the primary model’s budget cap fires, retry the same task with a cheaper model at a fraction of the cost. At list prices, Claude Haiku 4.5 costs roughly a third as much per token as Claude Sonnet 4.6 ($1/$5 versus $3/$15 per million input/output tokens); GPT-4o Mini costs roughly 15× less than GPT-4o. For tasks where a good-enough answer is better than no answer, switching to a cheaper model on a budget exception gives users a usable response while keeping total session cost within bounds.

Python: primary model with Haiku fallback

from runguard import guard, BudgetExceededError, LoopDetectedError
import anthropic

client = anthropic.Anthropic()

def call_sonnet(messages: list) -> dict:
    r = client.messages.create(model="claude-sonnet-4-6", max_tokens=2048, messages=messages)
    usd = (r.usage.input_tokens * 3.0 + r.usage.output_tokens * 15.0) / 1_000_000
    return {"response": r, "usd": usd, "model": "sonnet"}

def call_haiku(messages: list) -> dict:
    r = client.messages.create(model="claude-haiku-4-5-20251001", max_tokens=2048, messages=messages)
    # Rates are USD per million tokens; check them against current Anthropic pricing for this model.
    usd = (r.usage.input_tokens * 1.0 + r.usage.output_tokens * 5.0) / 1_000_000
    return {"response": r, "usd": usd, "model": "haiku"}

# Sonnet guard: $2 cap — trips fast if task is expensive
sonnet_guard = guard(call_sonnet, budget={"max_usd": 2.0}, loop={"repeats": 3, "max_cycle_len": 5})
# Haiku guard: $0.50 cap — cheaper model, lower cap
haiku_guard  = guard(call_haiku,  budget={"max_usd": 0.50}, loop={"repeats": 3, "max_cycle_len": 5})

def agent_with_model_fallback(messages: list) -> dict:
    """Try Sonnet; if budget exceeded, retry same messages with Haiku."""
    try:
        result = sonnet_guard(messages)
        return result
    except BudgetExceededError:
        print("[FALLBACK] Sonnet budget exceeded — retrying with Haiku")
        try:
            result = haiku_guard(messages)
            result["degraded"] = True
            result["fallback_reason"] = "sonnet_budget_exceeded"
            return result
        except BudgetExceededError as e2:
            raise RuntimeError(f"Both Sonnet and Haiku budget caps reached: {e2}") from e2
    # LoopDetectedError propagates — not retried on cheaper model (loop is structural, not cost-related)

The key design decision: LoopDetectedError propagates without fallback because a loop on Sonnet will most likely loop on Haiku too. Only BudgetExceededError triggers the model switch, because a cheaper model may complete the same task within its reduced budget. See autonomous agent cost control best practices for a broader treatment of model routing decisions.
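
To make the cost side of the fallback concrete, here is a small worked calculation. It is a sketch under stated assumptions: the call size is hypothetical, the primary rates are the ones this article's code uses for Sonnet, the cheaper-model rates are assumed values you should replace with your provider's current prices, and the worst-case bound assumes each guard halts at its cap.

```python
def call_cost_usd(input_tokens: int, output_tokens: int,
                  usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Cost of one call given rates in USD per million tokens."""
    return (input_tokens * usd_per_m_input + output_tokens * usd_per_m_output) / 1_000_000


# Hypothetical call size: 20,000 input tokens and 1,000 output tokens.
primary_usd = call_cost_usd(20_000, 1_000, 3.0, 15.0)   # rates used for Sonnet in this article
fallback_usd = call_cost_usd(20_000, 1_000, 1.0, 5.0)   # assumed rates for a cheaper model

# If the primary guard trips at its $2.00 cap and the fallback then runs up to
# its own $0.50 cap, the session spend is bounded by the sum of the two caps
# (assuming each guard stops at its cap).
worst_case_usd = 2.0 + 0.50

print(f"primary call:  ${primary_usd:.3f}")
print(f"fallback call: ${fallback_usd:.3f}")
print(f"worst-case session: ${worst_case_usd:.2f}")
```

The point of the arithmetic is that the fallback cap is additive: the retry does not refund what the primary model already spent, so the fallback cap should be set with the combined total in mind.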

Pattern 3: summarize-and-continue on context window or budget limit

For long-running agents where context grows across many turns, the context window can fill before the task completes. When context is near the limit, or cost per call spikes because input tokens are large, summarize the accumulated context into a compact state representation, start a fresh context with the summary, and continue on a reduced budget. This keeps the input size of each call roughly bounded, so the agent can keep working until it finishes or hits its turn or budget cap instead of stalling on a full context window.

Python: context compaction with sliding summary

from runguard import guard, BudgetExceededError, LoopDetectedError
import anthropic

client = anthropic.Anthropic()
MAX_MESSAGES_BEFORE_COMPACT = 12  # compact every N turns

def call_claude(messages: list) -> dict:
    r = client.messages.create(model="claude-sonnet-4-6", max_tokens=2048, messages=messages)
    usd = (r.usage.input_tokens * 3.0 + r.usage.output_tokens * 15.0) / 1_000_000
    return {"response": r, "usd": usd}

main_guard = guard(call_claude, budget={"max_usd": 10.0}, loop={"repeats": 3, "max_cycle_len": 6})

def compact_context(messages: list, system: str) -> list:
    """Summarize the message history into a single user message that seeds a fresh context."""
    # Summarize the whole history, not just the tail: slicing off the oldest messages
    # would drop the original task (or the previous summary) from the compacted state.
    history_text = "\n".join(
        f"{m['role'].upper()}: {m['content'] if isinstance(m['content'], str) else '[tool calls]'}"
        for m in messages
    )
    summary_response = client.messages.create(
        model="claude-haiku-4-5-20251001",  # cheap model for summarization
        max_tokens=512,
        messages=[{
            "role": "user",
            "content": (
                f"Agent role: {system}\n\n"
                "Summarize this agent conversation into a compact state "
                "(original task, key decisions made, data found, remaining goals). "
                f"Be specific and brief:\n\n{history_text}"
            ),
        }]
    )
    summary = summary_response.content[0].text
    return [
        {"role": "user", "content": f"[CONTEXT SUMMARY — continuing from previous session]\n{summary}\n\nContinue from where we left off."},
    ]

def long_running_agent(task: str) -> str:
    system = "You are a research assistant completing a multi-step research task."
    messages = [{"role": "user", "content": task}]
    total_cost = 0.0
    compactions = 0

    for turn in range(50):  # allow many turns
        # Compact if conversation is long
        if len(messages) > MAX_MESSAGES_BEFORE_COMPACT:
            messages = compact_context(messages, system)
            compactions += 1
            print(f"[COMPACT] Summarized context at turn {turn} (compactions: {compactions})")

        try:
            result = main_guard(messages)
        except BudgetExceededError as e:
            # 10.0 mirrors max_usd in main_guard; keep the two values in sync
            remaining_budget = 10.0 - total_cost
            if remaining_budget < 0.10:
                # Genuinely out of budget
                return f"[BUDGET HALT] Completed {turn} turns. Cost: ${total_cost:.4f}. {e}"
            # Compact and retry with fresh context on reduced window
            messages = compact_context(messages, system)
            compactions += 1
            print(f"[BUDGET COMPACT] Mid-budget compaction at turn {turn}")
            continue
        except LoopDetectedError as e:
            return f"[LOOP HALT] Loop at turn {turn}. Cost: ${total_cost:.4f}. {e}"

        total_cost += result.get("usd", 0.0)
        response = result["response"]
        tool_calls = [b for b in response.content if b.type == "tool_use"]

        if not tool_calls:
            return "\n".join(b.text for b in response.content if hasattr(b, "text"))

        messages.append({"role": "assistant", "content": response.content})
        # Placeholder tool results: a real agent would execute each tool and return its actual output
        messages.append({"role": "user", "content": [
            {"type": "tool_result", "tool_use_id": tc.id, "content": "done"} for tc in tool_calls
        ]})

    return f"[MAX TURNS] Completed 50 turns across {compactions} compactions. Cost: ${total_cost:.4f}"

The compaction uses Claude Haiku rather than Sonnet for the summarization step. With the summary capped at 512 output tokens, each compaction typically costs under a cent, so it does not materially contribute to session cost. Note that the example does not add the cost of the compaction call to total_cost. For context window management strategies without compaction, see LLM context window exceeded agent recovery.
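
The example above triggers compaction on message count, which ignores how large each message is. A size-based trigger is one alternative. The sketch below is a rough illustration, not a tokenizer: estimate_tokens and should_compact are hypothetical helper names, the four-characters-per-token ratio is a common rule of thumb for English text, and the thresholds are placeholder values. Where available, the input_tokens figure reported in the previous response's usage is a more accurate signal.

```python
def estimate_tokens(messages: list[dict]) -> int:
    """Very rough token estimate: about 4 characters per token (heuristic)."""
    total_chars = 0
    for m in messages:
        content = m["content"]
        # Non-string content (for example tool blocks) is measured via its string form.
        total_chars += len(content) if isinstance(content, str) else len(str(content))
    return total_chars // 4


def should_compact(messages: list[dict],
                   max_messages: int = 12,
                   max_est_tokens: int = 50_000) -> bool:
    """Compact when either the turn count or the estimated size crosses its threshold."""
    return len(messages) > max_messages or estimate_tokens(messages) > max_est_tokens


# One long message can warrant compaction well before the message-count threshold.
history = [{"role": "user", "content": "x" * 400}]
print(estimate_tokens(history), should_compact(history, max_est_tokens=50))
```

In the agent loop, a check like this would replace the plain length comparison, so that a few very large tool results trigger compaction as readily as many small turns.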

Graceful degradation pattern comparison

Pattern	Best for	User experience on limit	Extra cost	Complexity
Hard crash (no degradation)	Background batch jobs where retry is cheap	Exception, nothing returned	None	None
Return partial results	Research, summarization, list-completion tasks	Partial answer with explanation of what’s missing	None	Low: catch exception, return accumulator
Model fallback	Interactive Q&A where good-enough beats nothing	Full answer at lower quality	One retry at the cheaper model's rate (roughly a third of primary model cost for Haiku versus Sonnet)	Low: two guard instances, catch and retry
Summarize-and-continue	Long-running multi-step agents, autonomous pipelines	Seamless continuation (user may not see the compaction)	Typically under a cent per compaction for Haiku summarization	Medium: compact logic, restart guard after compaction

All three patterns work with RunGuard’s typed exceptions. BudgetExceededError and LoopDetectedError are distinct types, so you can handle them differently: a loop detection always warrants investigation before retry, while budget exceeded is often safe to handle with fallback or partial result. For the full cost control stack these patterns sit within, see autonomous agent cost control best practices.
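
As a rough illustration of handling the two exception types differently, the sketch below routes a budget trip to a fallback and a loop detection to a halt. To keep it runnable without RunGuard installed, it defines local stand-in exception classes with the same names; in a real agent you would import them from runguard as in the examples above. run_with_degradation and its return shape are hypothetical.

```python
class BudgetExceededError(Exception):
    """Local stand-in so the sketch runs without RunGuard installed."""


class LoopDetectedError(Exception):
    """Local stand-in so the sketch runs without RunGuard installed."""


def run_with_degradation(primary, fallback) -> dict:
    """Call primary(); on a budget trip use fallback(); on a loop, stop and report."""
    try:
        return {"status": "ok", "output": primary()}
    except BudgetExceededError as e:
        # Budget trips are often safe to degrade: try the cheaper path.
        return {"status": "degraded", "output": fallback(), "reason": str(e)}
    except LoopDetectedError as e:
        # Loops are structural: do not retry, surface the reason for investigation.
        return {"status": "halted", "output": None, "reason": str(e)}


def _raise(exc: Exception):
    # Helper so the demo can raise from inside a lambda.
    raise exc


over_budget = run_with_degradation(
    lambda: _raise(BudgetExceededError("cap reached")),
    lambda: "short answer from fallback",
)
looping = run_with_degradation(
    lambda: _raise(LoopDetectedError("same call repeated")),
    lambda: "unused",
)
print(over_budget["status"], looping["status"])
```

The sketch does not catch exceptions raised by the fallback itself; a production handler would need its own policy for that case, as the model fallback example does when both caps are reached.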

Add graceful degradation to your AI agent

RunGuard installs in one command: pip install runguard for Python, npm install @runguard/sdk for TypeScript. Wrap your LLM call function with guard() and catch BudgetExceededError and LoopDetectedError separately in your agent loop. Add a partial result accumulator for research agents, a model fallback for interactive agents, or context compaction for long-running pipelines. All three patterns are pure application code — no RunGuard configuration changes required beyond the base guard setup.

RunGuard pricing: Solo plan at $19/month for individual developers. Team plan at $79/month adds Slack and PagerDuty webhook alerts, shared dashboards, and audit log. Both plans include a 14-day free trial — no credit card required.
