AI agent idempotency design: safe retries and duplicate tool call prevention

When an AI agent’s tool call fails mid-execution, the agent retries. When the retry fails, it tries a variant. When the variant produces an unexpected result, it goes back to the original approach. Each of these retries is reasonable from the model’s perspective: it is working through its problem-solving repertoire. The problem is that tool calls are not always safe to repeat. Sending an email twice, inserting a database row twice, charging a payment twice, or posting a message to Slack twice are all duplicate side effects caused by retrying a non-idempotent operation.

With a web API, you control the retry logic and can add idempotency keys at the HTTP layer. An LLM agent generates tool calls autonomously: the model decides when to retry, not your application code. The result is a class of bug that conventional retry handling does not cover: the agent is correct (it detected an error and retried), but the system is in an inconsistent state (the side effect happened twice).

This guide covers three approaches to agent idempotency: (1) designing tool implementations to be inherently idempotent, (2) adding idempotency keys to state-modifying tools, and (3) using RunGuard’s loop detection to prevent the agent from retrying the same operation more than N times regardless of the tool’s idempotency guarantees.

Why agent-level idempotency is different from API idempotency

The source of the retry is the model, not your code. In a standard REST API, you control when retries happen and can attach idempotency keys in your HTTP client. In an agent, the model decides to retry a tool call based on the error message it receives from the previous call. Your application code sees the tool call request arrive from the model, executes it, and returns a result; the model then decides whether to call the tool again. Your code can observe each call, but it cannot stop the model from deciding to retry. The idempotency logic must therefore live in the tool implementation itself, or in a gateway layer between the model and the tools.

Three failure modes cause duplicate tool calls:
- Transient execution error: The tool call partially executed (e.g., email sent but delivery confirmation not received) and threw an exception. The model calls the tool again with the same arguments, causing a second send.
- Confusing error message: The tool succeeded but returned an error-like message (e.g., “0 rows updated” when the record already had the target value). The model interprets this as failure and retries.
- Tool-call loop: The tool returned a result the model did not know how to handle, causing it to cycle through a set of tool calls repeatedly, re-executing each one every cycle.
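
The second failure mode can often be removed at the source by returning an unambiguous result for a no-op. The sketch below is illustrative only: the `tickets` table, the `set_ticket_status` tool, and the result fields are hypothetical names, not part of any library discussed here.

```python
import sqlite3
from typing import TypedDict


class UpdateResult(TypedDict):
    success: bool
    changed: bool
    message: str


def set_ticket_status(conn: sqlite3.Connection, ticket_id: str, status: str) -> UpdateResult:
    """Hypothetical tool: set a ticket's status and report the outcome unambiguously."""
    row = conn.execute(
        "SELECT status FROM tickets WHERE ticket_id = ?", (ticket_id,)
    ).fetchone()
    if row is None:
        # A genuine failure: say so, and say that repeating the call will not help.
        return {"success": False, "changed": False,
                "message": f"Ticket {ticket_id} does not exist. Retrying will not help."}
    if row[0] == status:
        # No-op: the desired state already holds, so report success
        # instead of something like "0 rows updated".
        return {"success": True, "changed": False,
                "message": f"Ticket {ticket_id} already has status '{status}'. No action needed."}
    conn.execute("UPDATE tickets SET status = ? WHERE ticket_id = ?", (status, ticket_id))
    conn.commit()
    return {"success": True, "changed": True,
            "message": f"Ticket {ticket_id} status set to '{status}'."}
```

The message text matters as much as the flags: the model reads the message, so a no-op should read as success.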

Approach 1: design tools to be inherently idempotent

Use upsert semantics instead of insert. If a tool call creates a resource (row, record, file), make it an upsert: check if the resource already exists based on a natural key, and either update it or skip it. The tool returns success in both cases. The model sees a consistent “success” response and does not retry.

Python: upsert-style tool implementation

import sqlite3
from dataclasses import dataclass

@dataclass
class ToolResult:
    success: bool
    message: str
    created: bool  # True if new, False if already existed

def upsert_user_record(
    db_path: str,
    user_id: str,
    email: str,
    plan: str = "trial",
) -> ToolResult:
    """
    Idempotent user creation: safe to call multiple times with the same user_id.
    Returns success=True whether the row was created or already existed.
    The model never sees a 'duplicate' error, so it does not retry.
    """
    with sqlite3.connect(db_path) as conn:
        existing = conn.execute(
            "SELECT user_id FROM users WHERE user_id = ?", (user_id,)
        ).fetchone()

        if existing:
            # Already exists: overwrite email/plan (a no-op if unchanged), return created=False
            conn.execute(
                "UPDATE users SET email=?, plan=? WHERE user_id=?",
                (email, plan, user_id),
            )
            return ToolResult(
                success=True,
                message=f"User {user_id} already existed; email and plan updated.",
                created=False,
            )
        else:
            conn.execute(
                "INSERT INTO users (user_id, email, plan) VALUES (?,?,?)",
                (user_id, email, plan),
            )
            return ToolResult(
                success=True,
                message=f"User {user_id} created with plan={plan}.",
                created=True,
            )
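
The check-then-write version above takes two statements, so two concurrent calls for the same user could both pass the SELECT before either inserts. Where that matters, a single-statement upsert is one alternative. This is a sketch under assumptions: `upsert_user_atomic` is a hypothetical name, `users.user_id` is assumed to be a PRIMARY KEY or UNIQUE column, and the `ON CONFLICT` clause requires SQLite 3.24 or newer.

```python
import sqlite3


def upsert_user_atomic(
    conn: sqlite3.Connection,
    user_id: str,
    email: str,
    plan: str = "trial",
) -> dict:
    """Insert the user, or update email/plan if the user_id already exists.

    Assumes users.user_id has a PRIMARY KEY or UNIQUE constraint;
    without one, ON CONFLICT(user_id) is rejected by SQLite.
    """
    conn.execute(
        """
        INSERT INTO users (user_id, email, plan) VALUES (?, ?, ?)
        ON CONFLICT(user_id) DO UPDATE
            SET email = excluded.email, plan = excluded.plan
        """,
        (user_id, email, plan),
    )
    conn.commit()
    # Same response whether the row was new or already present.
    return {"success": True, "message": f"User {user_id} is stored with plan={plan}."}
```

The trade-off is that this form no longer reports whether the row was newly created; if the agent needs that distinction, the two-step version is easier to adapt.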

Make notification tools check before sending. For tools that send notifications (email, Slack, webhook), maintain a sent-log keyed by a deterministic ID derived from the content and recipient. If the log shows the notification was already sent, return success without re-sending. The log is the idempotency store; the content hash is the idempotency key.

Python: notification deduplication with sent-log

import hashlib
import json
import sqlite3
from datetime import datetime, timezone

def send_alert_idempotent(
    db_path: str,
    recipient: str,
    subject: str,
    body: str,
    channel: str = "email",
) -> dict:
    """
    Send an alert once per unique (recipient, subject, body, channel).
    Safe to call multiple times with the same args.
    Returns {sent: True, idempotency_key: '...', duplicate: bool}.
    Caveat: a crash after the send but before the log write can still resend.
    """
    # Deterministic idempotency key from content + recipient + channel
    content = json.dumps(
        {"to": recipient, "subject": subject, "body": body, "channel": channel},
        sort_keys=True,
    )
    idem_key = hashlib.sha256(content.encode()).hexdigest()[:16]

    with sqlite3.connect(db_path) as conn:
        conn.execute("""
            CREATE TABLE IF NOT EXISTS sent_alerts (
                idem_key TEXT PRIMARY KEY,
                sent_at TEXT NOT NULL,
                recipient TEXT,
                subject TEXT
            )
        """)
        existing = conn.execute(
            "SELECT sent_at FROM sent_alerts WHERE idem_key=?", (idem_key,)
        ).fetchone()

        if existing:
            # Already sent: report success without re-sending
            return {
                "sent": True,
                "idempotency_key": idem_key,
                "duplicate": True,
                "originally_sent_at": existing[0],
            }

        # First send
        _do_send(recipient, subject, body, channel)  # replace with real send
        conn.execute(
            "INSERT INTO sent_alerts VALUES (?,?,?,?)",
            (idem_key, datetime.now(timezone.utc).isoformat(), recipient, subject),
        )
        return {
            "sent": True,
            "idempotency_key": idem_key,
            "duplicate": False,
        }

def _do_send(recipient, subject, body, channel):
    """Placeholder — replace with real email/Slack send."""
    print(f"[{channel.upper()}] → {recipient}: {subject}")
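
A send-then-record log still leaves a gap: if the process fails after the send but before the log write, a retry sends again. One way to close that gap is to claim the key before sending. The sketch below is a variation under assumptions, not the only way to do this; `send_once`, the `alert_claims` table, and the `status` values are hypothetical names.

```python
import sqlite3
from typing import Callable


def send_once(conn: sqlite3.Connection, idem_key: str, send: Callable[[], None]) -> dict:
    """Claim the idempotency key first, then send (at-most-once delivery)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS alert_claims ("
        "idem_key TEXT PRIMARY KEY, status TEXT NOT NULL)"
    )
    try:
        # The PRIMARY KEY makes the claim atomic: only one caller can insert a key.
        conn.execute("INSERT INTO alert_claims VALUES (?, 'pending')", (idem_key,))
        conn.commit()
    except sqlite3.IntegrityError:
        conn.rollback()
        status = conn.execute(
            "SELECT status FROM alert_claims WHERE idem_key = ?", (idem_key,)
        ).fetchone()[0]
        # 'pending' means an earlier attempt never confirmed: outcome unknown.
        return {"sent": status == "sent", "duplicate": True, "status": status}

    # If send() raises, the claim stays 'pending' and later calls will not resend.
    send()
    conn.execute("UPDATE alert_claims SET status = 'sent' WHERE idem_key = ?", (idem_key,))
    conn.commit()
    return {"sent": True, "duplicate": False, "status": "sent"}
```

This trades one risk for another: a failed send is never retried automatically, so `pending` entries need a separate reconciliation step. For payments or customer-facing email, that is usually the safer default.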

Approach 2: idempotency keys at the tool gateway layer

Generating idempotency keys for agent-issued tool calls. Rather than adding idempotency logic to every tool individually, you can add a gateway layer between the model and the tools that generates an idempotency key for each incoming tool call, checks whether the call was already executed, and returns the cached result if so. The key should be derived from the tool name, the arguments, and the agent session ID — not from a random UUID (which would not match on retry).

TypeScript: tool gateway with idempotency cache

import crypto from "crypto";

interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

interface CachedResult {
  result: unknown;
  executedAt: string;
}

class IdempotentToolGateway {
  private cache = new Map<string, CachedResult>();
  private readonly sessionId: string;

  constructor(sessionId: string) {
    this.sessionId = sessionId;
  }

  private idempotencyKey(call: ToolCall): string {
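    // Drop volatile args that would prevent cache hits on retry
    const { request_id, timestamp, trace_id, ...stable } = call.args;
    const content = JSON.stringify({
      session: this.sessionId,
      tool: call.name,
      // Sort top-level keys so argument order does not change the key.
      // Nested objects are not canonicalized here.
      args: Object.fromEntries(
        Object.entries(stable).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0)),
      ),
    });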
    return crypto.createHash("sha256").update(content).digest("hex").slice(0, 16);
  }

  async execute(call: ToolCall): Promise<{ result: unknown; cached: boolean }> {
    const key = this.idempotencyKey(call);
    const cached = this.cache.get(key);

    if (cached) {
      console.log(`[IDEMPOTENT] Cache hit for ${call.name} (key=${key})`);
      return { result: cached.result, cached: true };
    }

    // First execution
    const result = await this.dispatchTool(call);
    this.cache.set(key, { result, executedAt: new Date().toISOString() });
    return { result, cached: false };
  }

  private async dispatchTool(call: ToolCall): Promise<unknown> {
    // Replace with real tool dispatch
    return `${call.name} executed with args: ${JSON.stringify(call.args)}`;
  }
}

// Usage
const gateway = new IdempotentToolGateway("session-abc123");

async function handleAgentToolCall(call: ToolCall) {
  const { result, cached } = await gateway.execute(call);
  if (cached) {
    return `[ALREADY DONE] ${result}`;  // model sees success, stops retrying
  }
  return result;
}

Cache TTL and session scoping. The idempotency cache should be scoped to the agent session: within one session, any identical tool call returns the same cached result. Across sessions (new conversation, new run), the cache resets. This prevents the agent from getting stale results if the underlying resource has changed between sessions. For long-running agents (hours or days), add a TTL to the cache entries; 10 minutes is a reasonable starting point for most state-modifying operations. Once an entry expires, an identical call executes again, so choose a TTL longer than the window in which the agent is likely to retry.
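
The session scoping and TTL described here can be sketched in a few lines. This is a minimal in-memory illustration under assumptions: `TtlIdempotencyCache` and its methods are hypothetical names, the 600-second default mirrors the 10-minute figure above, and the injectable `clock` exists only to make expiry testable.

```python
import hashlib
import json
import time
from typing import Any, Callable


class TtlIdempotencyCache:
    """Session-scoped idempotency cache whose entries expire after ttl_seconds."""

    def __init__(
        self,
        session_id: str,
        ttl_seconds: float = 600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.session_id = session_id
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._entries: dict = {}  # key -> (stored_at, result)

    def key(self, tool: str, args: dict) -> str:
        # Deterministic key: same session + tool + args always hash the same.
        payload = json.dumps(
            {"session": self.session_id, "tool": tool, "args": args},
            sort_keys=True,
            default=str,
        )
        return hashlib.sha256(payload.encode()).hexdigest()

    def run(self, tool: str, args: dict, execute: Callable[[], Any]) -> tuple:
        """Return (result, cached). Executes only on a miss or an expired entry."""
        k = self.key(tool, args)
        now = self._clock()
        entry = self._entries.get(k)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1], True
        result = execute()
        self._entries[k] = (now, result)
        return result, False
```

Because the session ID is part of the key, a new session never hits another session's entries, even if the cache object were shared.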

Approach 3: RunGuard loop detection as the last line of defense

Why idempotency does not replace loop detection. Idempotency ensures that duplicate tool calls do not cause duplicate side effects. It does not prevent the model from spending tokens and money on those duplicate calls. An agent that calls send_email 50 times in a loop because the email tool returns a confusing error message will exhaust its budget even if every call after the first is a cache hit. RunGuard’s loop detection trips the circuit breaker on the tool-call pattern itself, preventing the model from continuing to call the tool regardless of whether the tool is idempotent.

Python: combining idempotency with loop detection

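from typing import Any, TypedDict

from runguard import LoopDetector, LoopDetectedError


class ToolCall(TypedDict):
    name: str
    args: dict[str, Any]


detector = LoopDetector(repeats=3, max_cycle_len=2)

def execute_tool_safe(
    # Assumed: a Python port of the gateway above exposing a synchronous execute_sync()
    gateway: "IdempotentToolGateway",
    call: ToolCall,
) -> str: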
    """
    Execute a tool call with both idempotency (no duplicate effects)
    and loop detection (no runaway spend on duplicate calls).
    """
    # 1. Loop detection: trips if this pattern repeats 3+ times
    sig = f"{call['name']}:{sorted(call['args'].items())}"
    match = detector.record(sig)
    if match:
        raise LoopDetectedError(
            f"Loop detected for tool '{call['name']}': "
            f"pattern repeated {match.repeats}× — stopping agent.",
            match=match,
        )

    # 2. Idempotency: deduplicate the actual side effect
    result = gateway.execute_sync(call)
    return result["result"] if not result["cached"] else f"[CACHED] {result['result']}"
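
To make the idea of tripping on a repeated tool-call pattern concrete, here is a small standalone sketch of one possible detection rule. It is not RunGuard's implementation, and the way it reads `repeats` and `max_cycle_len` is an assumption made for illustration: it flags a loop when the most recent calls consist of one cycle of at most `max_cycle_len` signatures repeated `repeats` times in a row.

```python
from typing import Optional


class SimpleCycleDetector:
    """Illustrative stand-in: detects a short call cycle repeated back to back."""

    def __init__(self, repeats: int = 3, max_cycle_len: int = 2) -> None:
        self.repeats = repeats
        self.max_cycle_len = max_cycle_len
        self.history: list = []

    def record(self, signature: str) -> Optional[int]:
        """Append a call signature; return the cycle length if a loop is found."""
        self.history.append(signature)
        for cycle_len in range(1, self.max_cycle_len + 1):
            window = cycle_len * self.repeats
            if len(self.history) < window:
                continue
            tail = self.history[-window:]
            cycle = tail[:cycle_len]
            # The tail is a loop if it is the same cycle repeated end to end.
            if all(tail[i] == cycle[i % cycle_len] for i in range(window)):
                return cycle_len
        return None
```

With these settings, three identical calls in a row trip the detector, as does an A, B pair repeated three times. A three-step cycle does not, because it is longer than `max_cycle_len`.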

Agent idempotency approach comparison

Approach	Prevents duplicate side effects	Prevents duplicate token spend	Complexity
Tool-level upsert / check-before-send	Yes — inherently safe to call twice	No — model still makes the call	Low — per-tool change
Gateway idempotency cache	Yes — cache hit returns stored result	Partially — cache hit is free, but model still decides to retry	Medium — gateway layer required
RunGuard loop detection	Partially — trips breaker, not a dedup cache	Yes — breaker stops the model after N repeats	Low — one detector instance, one record() call
All three combined	Yes	Yes	Medium — layered defense-in-depth

For the retry storm pattern that idempotency prevents, see AI agent retry storm prevention. For the cost implications of non-idempotent loops, see prevent AI agent runaway cost in real time. For production reliability considerations beyond idempotency, see production LLM agent reliability checklist.

Make your agent tool calls safe to retry

RunGuard installs in one command: pip install runguard for Python, npm install @runguard/sdk for TypeScript. The loop detection approach (Approach 3) requires the fewest code changes and directly prevents the highest-cost scenario: an agent that spends $50 in tokens retrying a tool call that was idempotent all along but returned a confusing result. Add LoopDetector(repeats=3, max_cycle_len=2) around your tool dispatch first, then work backward to add idempotency to the most critical tools.

RunGuard pricing: Solo plan at $19/month for individual developers. Team plan at $79/month adds Slack and PagerDuty webhook alerts, shared dashboards, and audit log. Both plans include a 14-day free trial — no credit card required.
