AI agent idempotency design: safe retries and duplicate tool call prevention

When an AI agent's tool call fails mid-execution, the agent retries. When the retry fails again, the agent tries a variant. When the variant produces an unexpected result, the agent tries the original approach again. Each of these retries is reasonable from the model's perspective: it is working through its problem-solving repertoire.

The problem is that tool calls are not always safe to repeat. Sending an email twice, inserting a database row twice, charging a payment twice, or posting a message to Slack twice are all duplicate side effects caused by retrying a non-idempotent operation. With a web API, you control the retry logic and can add idempotency keys at the HTTP layer. An LLM agent generates tool calls autonomously: the model decides when to retry, not your application code. The result is a kind of bug that is rare in systems where you wrote the retry logic yourself: the agent behaves correctly (it detected an error and retried), yet the system ends up in an inconsistent state (the side effect happened twice).

This guide covers three approaches to agent idempotency: (1) designing tool implementations to be inherently idempotent, (2) adding idempotency keys to state-modifying tools, and (3) using RunGuard's loop detection to stop the agent from retrying the same operation more than N times, regardless of the tool's idempotency guarantees.

Why agent-level idempotency is different from API idempotency

Approach 1: design tools to be inherently idempotent
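
The comparison table in this guide summarizes this approach as "tool-level upsert / check-before-send". A minimal sketch of both patterns follows, using Python's built-in sqlite3 module. The table names, the function names (init_schema, upsert_ticket, send_email_once), and the caller-supplied message_id are all hypothetical and chosen only for illustration; they are not part of RunGuard or any specific agent framework.

```python
import sqlite3
from typing import Callable


def init_schema(conn: sqlite3.Connection) -> None:
    """Create the two illustrative tables used by the tools below."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tickets (ticket_id TEXT PRIMARY KEY, status TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sent_emails (message_id TEXT PRIMARY KEY, recipient TEXT)"
    )
    conn.commit()


def upsert_ticket(conn: sqlite3.Connection, ticket_id: str, status: str) -> None:
    """Upsert: repeating the call with the same arguments leaves exactly one row."""
    conn.execute(
        "INSERT INTO tickets (ticket_id, status) VALUES (?, ?) "
        "ON CONFLICT(ticket_id) DO UPDATE SET status = excluded.status",
        (ticket_id, status),
    )
    conn.commit()


def send_email_once(
    conn: sqlite3.Connection,
    message_id: str,
    recipient: str,
    send: Callable[[str], None],
) -> str:
    """Check-before-send: claim the message_id first, send only if the claim is new.

    Assumption: claiming before sending gives at-most-once delivery. If send()
    raises, the message stays marked as claimed and will not be re-sent.
    """
    cur = conn.execute(
        "INSERT OR IGNORE INTO sent_emails (message_id, recipient) VALUES (?, ?)",
        (message_id, recipient),
    )
    conn.commit()
    if cur.rowcount == 0:
        # The row already existed, so an earlier call already claimed this send.
        return "already_sent"
    send(recipient)
    return "sent"
```

With tools shaped like this, a repeated call from the model changes nothing on the second attempt. Whether to claim before sending (at most once) or after (at least once) depends on which failure is less harmful for the specific tool.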

Approach 2: idempotency keys at the tool gateway layer
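
The comparison table describes this approach as a gateway idempotency cache in which a cache hit returns the stored result. One way this might look is sketched below. The class name IdempotentGateway, the run_id parameter, and the key scheme (a SHA-256 hash of run ID, tool name, and canonicalized arguments) are assumptions made for illustration, and the in-memory dictionary stands in for whatever durable store a real gateway would use.

```python
import hashlib
import json
from typing import Any, Callable, Dict


class IdempotentGateway:
    """Hypothetical tool gateway that deduplicates calls by idempotency key."""

    def __init__(self) -> None:
        # In-memory store for the sketch; a real deployment would need
        # something durable and shared across workers.
        self._results: Dict[str, Any] = {}

    @staticmethod
    def make_key(run_id: str, tool_name: str, args: Dict[str, Any]) -> str:
        """Derive a stable key: same run, tool, and arguments give the same key."""
        payload = json.dumps(
            {"run": run_id, "tool": tool_name, "args": args}, sort_keys=True
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def call(
        self,
        run_id: str,
        tool_name: str,
        args: Dict[str, Any],
        tool: Callable[..., Any],
    ) -> Any:
        key = self.make_key(run_id, tool_name, args)
        if key in self._results:
            # Cache hit: return the stored result without re-running the tool.
            return self._results[key]
        # If the tool raises, nothing is cached, so a later retry runs it again.
        result = tool(**args)
        self._results[key] = result
        return result
```

Because the key is derived from the arguments, the model does not need to supply it. The sketch does not cover a tool that performs its side effect and then fails before returning; handling that case requires the tool itself to be idempotent, as in Approach 1.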

Approach 3: RunGuard loop detection as the last line of defense
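
This guide refers to LoopDetector(repeats=3, max_cycle_len=2) and a record() call but does not show the detector's internals. The sketch below is a stand-in written for illustration only, not RunGuard's implementation or API. The class SimpleLoopDetector, the LoopDetected exception, and the assumed semantics (trip when the most recent calls consist of one cycle of at most max_cycle_len calls repeated `repeats` times in a row) are all hypothetical.

```python
import json
from typing import Any, Dict, List


class LoopDetected(Exception):
    """Raised when the recent tool-call history is a short repeating cycle."""


class SimpleLoopDetector:
    """Illustrative stand-in for a loop detector; not the RunGuard class."""

    def __init__(self, repeats: int = 3, max_cycle_len: int = 2) -> None:
        self.repeats = repeats
        self.max_cycle_len = max_cycle_len
        self._history: List[str] = []

    def record(self, tool_name: str, args: Dict[str, Any]) -> None:
        """Record one tool call and raise LoopDetected if a cycle has repeated."""
        # Canonicalize the call so argument order does not affect comparison.
        self._history.append(json.dumps([tool_name, args], sort_keys=True))
        for cycle_len in range(1, self.max_cycle_len + 1):
            window = cycle_len * self.repeats
            if len(self._history) < window:
                continue
            tail = self._history[-window:]
            cycle = tail[:cycle_len]
            # The tail is a loop if every call matches its position in the cycle.
            if all(tail[i] == cycle[i % cycle_len] for i in range(window)):
                raise LoopDetected(
                    f"cycle of length {cycle_len} repeated {self.repeats} times"
                )
```

In a dispatch loop, record() would be called once before each tool execution, and a LoopDetected exception would end the run instead of letting the model issue the same call again. As the comparison table notes, this trips a breaker rather than deduplicating, so the first N-1 repeats still reach the tool.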

Agent idempotency approach comparison

| Approach | Prevents duplicate side effects | Prevents duplicate token spend | Complexity |
|---|---|---|---|
| Tool-level upsert / check-before-send | Yes: inherently safe to call twice | No: model still makes the call | Low: per-tool change |
| Gateway idempotency cache | Yes: cache hit returns stored result | Partially: cache hit is free, but model still decides to retry | Medium: gateway layer required |
| RunGuard loop detection | Partially: trips breaker, not a dedup cache | Yes: breaker stops the model after N repeats | Low: one detector instance, one record() call |
| All three combined | Yes | Yes | Medium: layered defense-in-depth |

For the retry storm pattern that idempotency prevents, see AI agent retry storm prevention. For the cost implications of non-idempotent loops, see prevent AI agent runaway cost in real time. For production reliability considerations beyond idempotency, see production LLM agent reliability checklist.

Make your agent tool calls safe to retry

RunGuard installs in one command: pip install runguard for Python, npm install @runguard/sdk for TypeScript. The loop detection approach (Approach 3) requires the fewest code changes and directly prevents the highest-cost scenario: an agent that spends $50 in tokens retrying a tool call that was idempotent all along but returned a confusing result. Add LoopDetector(repeats=3, max_cycle_len=2) around your tool dispatch first, then work backward to add idempotency to the most critical tools.

RunGuard pricing: Solo plan at $19/month for individual developers. Team plan at $79/month adds Slack and PagerDuty webhook alerts, shared dashboards, and audit log. Both plans include a 14-day free trial — no credit card required.
