AI agent idempotency design: safe retries and duplicate tool call prevention
When an AI agent’s tool call fails mid-execution, the agent retries. When the retry fails again, the agent tries a variant. When the variant produces an unexpected result, the agent tries the original approach again. Each of these retries is reasonable from the model’s perspective: it is working through its problem-solving repertoire.

The problem is that tool calls are not always safe to repeat. Sending an email twice, inserting a database row twice, charging a payment twice, or posting a message to Slack twice are all duplicate side effects caused by retrying a non-idempotent operation. Unlike a web API where you control the retry logic and can add idempotency keys at the HTTP layer, an LLM agent generates tool calls autonomously: the model decides when to retry, not your application code. This creates a category of bug that does not exist in traditional software systems: the agent is correct (it detected an error and retried), but the system is in an inconsistent state (the side effect happened twice).

This guide covers three approaches to agent idempotency: (1) designing tool implementations to be inherently idempotent, (2) adding idempotency keys at a gateway layer in front of state-modifying tools, and (3) using RunGuard’s loop detection to prevent the agent from retrying the same operation more than N times regardless of the tool’s idempotency guarantees.
Why agent-level idempotency is different from API idempotency
- The source of the retry is the model, not your code. In a standard REST API, you control when retries happen and can attach idempotency keys in your HTTP client. In an agent, the model decides to retry a tool call based on the error message it receives from the previous call. Your application code sees the tool call request arrive from the model, executes it, returns a result, and the model decides whether to call it again. The idempotency logic must live in the tool implementation itself — or in a gateway layer between the model and the tools — because you cannot intercept the model’s decision to retry at the agent orchestration level.
- Three failure modes that cause duplicate tool calls.
  - Transient execution error: The tool call partially executed (e.g., email sent but delivery confirmation not received) and threw an exception. The model calls the tool again with the same arguments, causing a second send.
  - Confusing error message: The tool succeeded but returned an error-like message (e.g., “0 rows updated” when the record already had the target value). The model interprets this as failure and retries.
  - Tool-call loop: The tool returned a result the model did not know how to handle, causing it to cycle through a set of tool calls repeatedly, re-executing each one every cycle.
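The "confusing error message" case is often the cheapest to fix, because it only requires changing what the tool reports. The sketch below is one possible shape for such a tool, not a prescribed one: the `orders` table, the `set_order_status` function, and the `ok` / `changed` / `retryable` result fields are all hypothetical names chosen for illustration. The idea is that a no-op ("already in the target state") is reported as success, and a genuine failure says explicitly whether retrying could help.

```python
import sqlite3


def set_order_status(conn: sqlite3.Connection, order_id: str, status: str) -> dict:
    """Set an order's status and report the outcome in unambiguous terms.

    Hypothetical tool: the `orders` table and the result fields are
    illustrative assumptions, not part of any real API.
    """
    row = conn.execute(
        "SELECT status FROM orders WHERE order_id = ?", (order_id,)
    ).fetchone()
    if row is None:
        # A genuine failure: say plainly that retrying will not help.
        return {
            "ok": False,
            "retryable": False,
            "message": f"Order {order_id} does not exist.",
        }
    if row[0] == status:
        # No-op: the desired state already holds, so report success
        # instead of something like "0 rows updated".
        return {
            "ok": True,
            "changed": False,
            "message": f"Order {order_id} already has status={status}; nothing to do.",
        }
    conn.execute(
        "UPDATE orders SET status = ? WHERE order_id = ?", (status, order_id)
    )
    conn.commit()
    return {
        "ok": True,
        "changed": True,
        "message": f"Order {order_id} status set to {status}.",
    }
```

Calling this twice with the same arguments yields `changed: True` on the first call and `changed: False` on the second, with `ok: True` both times, so the model has no error-like signal to react to.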
Approach 1: design tools to be inherently idempotent
- Use upsert semantics instead of insert. If a tool call creates a resource (row, record, file), make it an upsert: check if the resource already exists based on a natural key, and either update it or skip it. The tool returns success in both cases. The model sees a consistent “success” response and does not retry.
Python: upsert-style tool implementation

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class ToolResult:
    success: bool
    message: str
    created: bool  # True if new, False if already existed


def upsert_user_record(
    db_path: str,
    user_id: str,
    email: str,
    plan: str = "trial",
) -> ToolResult:
    """
    Idempotent user creation: safe to call multiple times with the same user_id.
    Returns success=True whether the row was created or already existed.
    The model never sees a 'duplicate' error, so it does not retry.
    """
    with sqlite3.connect(db_path) as conn:
        existing = conn.execute(
            "SELECT user_id FROM users WHERE user_id = ?", (user_id,)
        ).fetchone()
        if existing:
            # Already exists: update email/plan if changed, return created=False
            conn.execute(
                "UPDATE users SET email=?, plan=? WHERE user_id=?",
                (email, plan, user_id),
            )
            return ToolResult(
                success=True,
                message=f"User {user_id} already existed; record confirmed.",
                created=False,
            )
        else:
            conn.execute(
                "INSERT INTO users (user_id, email, plan) VALUES (?,?,?)",
                (user_id, email, plan),
            )
            return ToolResult(
                success=True,
                message=f"User {user_id} created with plan={plan}.",
                created=True,
            )
```

- Make notification tools check before sending. For tools that send notifications (email, Slack, webhook), maintain a sent-log keyed by a deterministic ID derived from the content and recipient. If the log shows the notification was already sent, return success without re-sending. The log is the idempotency store; the content hash is the idempotency key.
Python: notification deduplication with sent-log

```python
import hashlib
import json
import sqlite3
from datetime import datetime, timezone


def send_alert_idempotent(
    db_path: str,
    recipient: str,
    subject: str,
    body: str,
    channel: str = "email",
) -> dict:
    """
    Send an alert exactly once. Safe to call multiple times with the same args.
    Returns {sent: True, idempotency_key: '...', duplicate: bool}.
    """
    # Deterministic idempotency key from recipient + subject + channel.
    # The body is not part of the key, so a retry with a reworded body still
    # deduplicates; add it to the key if distinct alerts can share a subject.
    content = json.dumps(
        {"to": recipient, "subject": subject, "channel": channel},
        sort_keys=True,
    )
    idem_key = hashlib.sha256(content.encode()).hexdigest()[:16]

    with sqlite3.connect(db_path) as conn:
        conn.execute("""
            CREATE TABLE IF NOT EXISTS sent_alerts (
                idem_key TEXT PRIMARY KEY,
                sent_at TEXT NOT NULL,
                recipient TEXT,
                subject TEXT
            )
        """)
        existing = conn.execute(
            "SELECT sent_at FROM sent_alerts WHERE idem_key=?", (idem_key,)
        ).fetchone()
        if existing:
            return {
                "sent": True,
                "idempotency_key": idem_key,
                "duplicate": True,
                "originally_sent_at": existing[0],
            }

        # First send
        _do_send(recipient, subject, body, channel)  # replace with real send
        conn.execute(
            "INSERT INTO sent_alerts VALUES (?,?,?,?)",
            # Timezone-aware UTC timestamp (datetime.utcnow() is deprecated)
            (idem_key, datetime.now(timezone.utc).isoformat(), recipient, subject),
        )
        return {
            "sent": True,
            "idempotency_key": idem_key,
            "duplicate": False,
        }


def _do_send(recipient, subject, body, channel):
    """Placeholder: replace with real email/Slack send."""
    print(f"[{channel.upper()}] → {recipient}: {subject}")
```
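A check-then-send log leaves a small window in which two concurrent calls can both pass the check before either records the send. One possible variation, sketched here under stated assumptions, is to claim the key first with an atomic `INSERT OR IGNORE` and release the claim if the send fails. The `send_once` helper and the `send_claims` table are hypothetical names introduced for this illustration.

```python
import sqlite3
from typing import Callable


def send_once(
    conn: sqlite3.Connection,
    idem_key: str,
    send: Callable[[], None],
) -> dict:
    """Claim idem_key atomically, then send; release the claim if the send fails.

    Hypothetical helper: the `send_claims` table and the result fields are
    illustrative assumptions.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS send_claims (idem_key TEXT PRIMARY KEY)"
    )
    # INSERT OR IGNORE is atomic: for a given key, only the first insert
    # reports rowcount == 1; later attempts are ignored and report 0.
    cur = conn.execute(
        "INSERT OR IGNORE INTO send_claims (idem_key) VALUES (?)", (idem_key,)
    )
    conn.commit()
    if cur.rowcount == 0:
        return {"sent": True, "duplicate": True}
    try:
        send()
    except Exception:
        # Release the claim so a later retry can attempt the send again.
        conn.execute("DELETE FROM send_claims WHERE idem_key = ?", (idem_key,))
        conn.commit()
        raise
    return {"sent": True, "duplicate": False}
```

The trade-off is worth stating: claiming first means a crash between the claim and the send leaves the notification unsent, while sending first means a crash between the send and the log entry allows a duplicate. Which failure is more acceptable depends on the tool.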
Approach 2: idempotency keys at the tool gateway layer
- Generating idempotency keys for agent-issued tool calls. Rather than adding idempotency logic to every tool individually, you can add a gateway layer between the model and the tools that generates an idempotency key for each incoming tool call, checks whether the call was already executed, and returns the cached result if so. The key should be derived from the tool name, the arguments, and the agent session ID — not from a random UUID (which would not match on retry).
TypeScript: tool gateway with idempotency cache

```typescript
import crypto from "crypto";

interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

interface CachedResult {
  result: unknown;
  executedAt: string;
}

class IdempotentToolGateway {
  private cache = new Map<string, CachedResult>();
  private readonly sessionId: string;

  constructor(sessionId: string) {
    this.sessionId = sessionId;
  }

  private idempotencyKey(call: ToolCall): string {
    // Drop volatile args that would prevent cache hits on retry
    const { request_id, timestamp, trace_id, ...stable } = call.args;
    const content = JSON.stringify({
      session: this.sessionId,
      tool: call.name,
      args: Object.fromEntries(Object.entries(stable).sort()),
    });
    return crypto.createHash("sha256").update(content).digest("hex").slice(0, 16);
  }

  async execute(call: ToolCall): Promise<{ result: unknown; cached: boolean }> {
    const key = this.idempotencyKey(call);
    const cached = this.cache.get(key);
    if (cached) {
      console.log(`[IDEMPOTENT] Cache hit for ${call.name} (key=${key})`);
      return { result: cached.result, cached: true };
    }

    // First execution
    const result = await this.dispatchTool(call);
    this.cache.set(key, { result, executedAt: new Date().toISOString() });
    return { result, cached: false };
  }

  private async dispatchTool(call: ToolCall): Promise<unknown> {
    // Replace with real tool dispatch
    return `${call.name} executed with args: ${JSON.stringify(call.args)}`;
  }
}

// Usage
const gateway = new IdempotentToolGateway("session-abc123");

async function handleAgentToolCall(call: ToolCall) {
  const { result, cached } = await gateway.execute(call);
  if (cached) {
    return `[ALREADY DONE] ${result}`; // model sees success, stops retrying
  }
  return result;
}
```

- Cache TTL and session scoping. The idempotency cache should be scoped to the agent session: within one session, any identical tool call returns the same cached result. Across sessions (new conversation, new run), the cache resets. This prevents the agent from getting stale results if the underlying resource has changed between sessions. For long-running agents (hours or days), add a TTL to the cache entries: 10 minutes is appropriate for most state-modifying operations.
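The session scoping and TTL described above can be sketched in a few lines of Python. This is a minimal illustration rather than a production cache: the class name `TTLIdempotencyCache`, its method names, and the injectable `clock` parameter are assumptions made for this example, and the 600-second default simply mirrors the 10-minute figure mentioned above.

```python
import hashlib
import json
import time
from typing import Any, Callable


class TTLIdempotencyCache:
    """Session-scoped result cache whose entries expire after ttl_seconds.

    Hypothetical sketch: class, method, and parameter names are illustrative.
    """

    def __init__(
        self,
        session_id: str,
        ttl_seconds: float = 600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.session_id = session_id
        self.ttl_seconds = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: dict[str, tuple[float, Any]] = {}

    def key(self, tool: str, args: dict[str, Any]) -> str:
        # Deterministic: the same session + tool + args always gives the same
        # key, which a random UUID would not.
        payload = json.dumps(
            {"session": self.session_id, "tool": tool, "args": args},
            sort_keys=True,
            default=str,
        )
        return hashlib.sha256(payload.encode()).hexdigest()[:16]

    def execute(
        self, tool: str, args: dict[str, Any], run: Callable[[], Any]
    ) -> tuple[Any, bool]:
        """Return (result, cached). Calls run() only on a miss or expired entry."""
        k = self.key(tool, args)
        now = self._clock()
        hit = self._entries.get(k)
        if hit is not None and now - hit[0] < self.ttl_seconds:
            return hit[1], True
        result = run()  # exceptions propagate, so failures are never cached
        self._entries[k] = (now, result)
        return result, False
```

Creating one instance per agent session gives the reset-between-sessions behavior for free, because the session ID is part of every key and the entries live only as long as the instance.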
Approach 3: RunGuard loop detection as the last line of defense
- Why idempotency does not replace loop detection. Idempotency ensures that duplicate tool calls do not cause duplicate side effects. It does not prevent the model from spending tokens and money on those duplicate calls. An agent that calls send_email 50 times in a loop because the email tool returns a confusing error message will exhaust its budget even if every call after the first is a cache hit. RunGuard’s loop detection trips the circuit breaker on the tool-call pattern itself, preventing the model from continuing to call the tool regardless of whether the tool is idempotent.
Python: combining idempotency with loop detection
```python
from typing import Any, TypedDict

from runguard import LoopDetector, LoopDetectedError

detector = LoopDetector(repeats=3, max_cycle_len=2)


class ToolCall(TypedDict):
    """Shape of a tool call as used below: a tool name plus its arguments."""
    name: str
    args: dict[str, Any]


def execute_tool_safe(
    gateway: "IdempotentToolGateway",
    call: ToolCall,
) -> str:
    """
    Execute a tool call with both idempotency (no duplicate effects)
    and loop detection (no runaway spend on duplicate calls).
    """
    # 1. Loop detection: trips if this pattern repeats 3+ times
    sig = f"{call['name']}:{sorted(call['args'].items())}"
    match = detector.record(sig)
    if match:
        raise LoopDetectedError(
            f"Loop detected for tool '{call['name']}': "
            f"pattern repeated {match.repeats}×; stopping agent.",
            match=match,
        )

    # 2. Idempotency: deduplicate the actual side effect.
    # Assumes a Python port of the Approach 2 gateway that exposes a
    # synchronous execute_sync() returning {"result": ..., "cached": bool}.
    result = gateway.execute_sync(call)
    return result["result"] if not result["cached"] else f"[CACHED] {result['result']}"
```
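To make the `repeats` and `max_cycle_len` parameters concrete, here is a small standalone sketch of how a cycle check over recent tool-call signatures could work. It is not RunGuard's implementation and makes no claim about RunGuard's internals: `SimpleLoopDetector` is a hypothetical class written for this explanation, and it returns the repeating cycle as a plain list rather than a match object.

```python
from collections import deque
from typing import Optional


class SimpleLoopDetector:
    """Flags a cycle of up to max_cycle_len signatures repeated `repeats` times in a row.

    Hypothetical illustration only; not the RunGuard LoopDetector.
    """

    def __init__(self, repeats: int = 3, max_cycle_len: int = 2) -> None:
        self.repeats = repeats
        self.max_cycle_len = max_cycle_len
        # Only the longest window that could still match needs to be kept.
        self._history: deque = deque(maxlen=repeats * max_cycle_len)

    def record(self, signature: str) -> Optional[list]:
        """Record one call; return the repeating cycle if a loop is detected, else None."""
        self._history.append(signature)
        recent = list(self._history)
        for cycle_len in range(1, self.max_cycle_len + 1):
            window = self.repeats * cycle_len
            if len(recent) < window:
                break  # longer cycles need even more history
            tail = recent[-window:]
            cycle = tail[:cycle_len]
            # A loop means the tail is exactly the cycle repeated back to back.
            if tail == cycle * self.repeats:
                return cycle
        return None
```

With `repeats=3, max_cycle_len=2`, three identical calls in a row trip the check, as does an A, B, A, B, A, B alternation, while a three-step rotation such as A, B, C, A, B, C does not, because its cycle is longer than `max_cycle_len`.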
Agent idempotency approach comparison
| Approach | Prevents duplicate side effects | Prevents duplicate token spend | Complexity |
|---|---|---|---|
| Tool-level upsert / check-before-send | Yes — inherently safe to call twice | No — model still makes the call | Low — per-tool change |
| Gateway idempotency cache | Yes — cache hit returns stored result | Partially — cache hit is free, but model still decides to retry | Medium — gateway layer required |
| RunGuard loop detection | Partially — trips breaker, not a dedup cache | Yes — breaker stops the model after N repeats | Low — one detector instance, one record() call |
| All three combined | Yes | Yes | Medium — layered defense-in-depth |
For the retry storm pattern that idempotency prevents, see AI agent retry storm prevention. For the cost implications of non-idempotent loops, see prevent AI agent runaway cost in real time. For production reliability considerations beyond idempotency, see production LLM agent reliability checklist.
Make your agent tool calls safe to retry
RunGuard installs in one command: pip install runguard for Python, npm install @runguard/sdk for TypeScript. The loop detection approach (Approach 3) requires the fewest code changes and directly prevents the highest-cost scenario: an agent that spends $50 in tokens retrying a tool call that was idempotent all along but returned a confusing result. Add LoopDetector(repeats=3, max_cycle_len=2) around your tool dispatch first, then work backward to add idempotency to the most critical tools.
RunGuard pricing: Solo plan at $19/month for individual developers. Team plan at $79/month adds Slack and PagerDuty webhook alerts, shared dashboards, and audit log. Both plans include a 14-day free trial — no credit card required.