LLM output validation loop prevention: why validate-retry cycles become infinite and how to stop them
A common agent pattern is structured output validation: ask the model to produce JSON (or another structured format), parse the output, validate it against a schema, and, if validation fails, send the error back to the model and ask it to fix the output. This works well when the model occasionally makes minor formatting errors that it can correct in one or two retries. It becomes a runaway cost problem when the model enters a validation failure loop: it keeps producing output that fails the same schema constraint, and every retry is billed. The failure mode is subtle. The model is not “stuck” in the sense of producing identical output; it may produce a different invalid output on each retry, so a loop detector based purely on output deduplication will not catch it. What makes it a loop is the pattern “generate → validate → fail → generate → validate → fail,” regardless of the specific error message. This guide covers how to detect validation loops by their structural pattern rather than their content, how to enforce hard retry limits, and how to use RunGuard’s loop detector with a validation-specific signature scheme to catch these retry storms before they exhaust your API budget.
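To make the cost problem concrete, here is a rough upper-bound estimate. It assumes that each failed output and its validation feedback are appended to the conversation, so every retry re-sends a longer input. The function name, token counts, and per-token prices are illustrative assumptions, not measurements or current list prices.

```python
def worst_case_retry_cost(
    prompt_tokens: int,
    max_output_tokens: int,
    feedback_tokens: int,
    max_retries: int,
    price_in_per_token: float,
    price_out_per_token: float,
) -> float:
    """Upper bound on spend when every attempt fails and uses max_output_tokens."""
    total = 0.0
    input_tokens = prompt_tokens
    for _ in range(max_retries + 1):  # first attempt plus max_retries retries
        total += input_tokens * price_in_per_token
        total += max_output_tokens * price_out_per_token
        # The failed output and the error feedback join the conversation,
        # so the next attempt pays for them again as input.
        input_tokens += max_output_tokens + feedback_tokens
    return total


# Illustrative prices (assumption): $1.00 / $5.00 per million input / output tokens.
PRICE_IN = 1.00 / 1_000_000
PRICE_OUT = 5.00 / 1_000_000

five_attempts = worst_case_retry_cost(400, 512, 60, 4, PRICE_IN, PRICE_OUT)
three_attempts = worst_case_retry_cost(400, 512, 60, 2, PRICE_IN, PRICE_OUT)
print(f"5 attempts: ${five_attempts:.5f}")
print(f"3 attempts: ${three_attempts:.5f}")
```

Because the input grows with every failed attempt, the total rises faster than linearly in the number of retries. Stopping after the third failure instead of the fifth saves more than two fifths of the worst-case spend.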
Why validation loops are hard to catch with standard retry logic
- The model produces different invalid outputs on each retry. A model asked to produce a JSON object with a required `status` field of type `"success" | "failure"` might produce `"status": "ok"` on retry 1, `"status": "completed"` on retry 2, and `"status": "done"` on retry 3. Each is a different string, so output-deduplication-based loop detection reports no loop. But the underlying constraint violation is the same on all three retries: the value is not one of the allowed enum values. The loop is real; the detector missed it because it was looking at values, not constraint violations.
- Validation error messages are not stable identifiers. If you route the validation error back to the model as part of the retry prompt, different invalid outputs produce different error messages (“'ok' is not a valid value for status” vs “'completed' is not a valid value for status”). Hashing the full error message as the loop signature produces a different signature each time, so three retries never accumulate to the threshold needed to trip the breaker.
- The fix: signature by constraint violation category, not by specific value. Rather than using the full validation error message as the loop signature, extract the constraint violation type: `enum_violation:status`, `missing_required:user_id`, `type_error:amount:expected_number`. This signature is stable across retries that fail for the same structural reason, even when the specific invalid values differ. Three retries of `enum_violation:status` trip the loop breaker; three retries of three different error messages do not.
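The difference between the two signature schemes can be shown in a few lines. This is a minimal sketch: the error wording (`... for field '<name>'`) and both helper functions are assumptions made for illustration.

```python
import hashlib
import re
from collections import Counter

# Three retries that fail for the same structural reason (hypothetical messages).
errors = [
    "'ok' is not a valid value for field 'status'",
    "'completed' is not a valid value for field 'status'",
    "'done' is not a valid value for field 'status'",
]


def full_message_signature(error: str) -> str:
    """Naive signature: a hash of the whole error message."""
    return hashlib.sha256(error.encode("utf-8")).hexdigest()[:12]


def category_signature(error: str) -> str:
    """Normalized signature: violation type plus field name, ignoring the value."""
    m = re.search(r"is not a valid value for field '(\w+)'", error)
    return f"enum_violation:{m.group(1)}" if m else "validation_error:other"


naive_counts = Counter(full_message_signature(e) for e in errors)
category_counts = Counter(category_signature(e) for e in errors)

print("naive, highest repeat count:", max(naive_counts.values()))
print("category counts:", dict(category_counts))
```

Every naive signature is seen exactly once, so no repeat threshold is ever reached. The category signature is seen three times.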
Python: validation loop detection with constraint-violation signatures
Python: structured output validator with RunGuard loop detection

```python
import json
import re
from typing import Any

import anthropic
from runguard import LoopDetector, LoopDetectedError, BudgetExceededError, guard

client = anthropic.Anthropic()


# Constraint violation categories: stable loop signatures
def extract_violation_signature(validation_error: str) -> str:
    """
    Convert a validation error message to a stable violation-category signature.

    This prevents the loop detector from being fooled by varying error messages
    that represent the same structural constraint failure.
    """
    error_lower = validation_error.lower()

    # Enum / allowed values violation (the quoted value may contain spaces)
    m = re.search(r"['\"]([^'\"]+)['\"] (?:is not|not) (?:a valid|an? allowed|one of)", error_lower)
    if m:
        # Try to extract the field name from the error
        field = re.search(r"for (?:field|key|property) ['\"]?(\w+)", error_lower)
        field_name = field.group(1) if field else "unknown_field"
        return f"enum_violation:{field_name}"

    # Missing required field
    m = re.search(r"(?:required|missing) (?:field|key|property) ['\"]?(\w+)", error_lower)
    if m:
        return f"missing_required:{m.group(1)}"

    # Type error, e.g. "Field 'amount' expected number, got str"
    m = re.search(r"(?:field|key|property) ['\"]?(\w+)['\"]? (?:expected|should be) (\w+)", error_lower)
    if m:
        return f"type_error:{m.group(1)}:expected_{m.group(2)}"

    # JSON parse failure
    if "json" in error_lower and ("parse" in error_lower or "decode" in error_lower or "invalid" in error_lower):
        return "json_parse_error"

    # Fallback: use first 40 chars of the error, stripped to safe chars
    safe = re.sub(r"[^a-zA-Z0-9_]", "_", error_lower[:40]).strip("_")
    return f"validation_error:{safe}"


def validate_task_output(output: str, schema: dict) -> tuple[bool, str]:
    """
    Validate output against schema. Returns (is_valid, error_message).

    Simple demo validator; replace with jsonschema.validate() in production.
    """
    try:
        data = json.loads(output)
    except json.JSONDecodeError as e:
        return False, f"JSON parse error: {e}"

    # Valid JSON that is not an object (a list, a number) cannot satisfy the schema
    if not isinstance(data, dict):
        return False, f"Field 'root' expected object, got {type(data).__name__}"

    for required_field in schema.get("required", []):
        if required_field not in data:
            return False, f"Missing required field '{required_field}'"

    type_map = {"string": str, "number": (int, float), "integer": int, "boolean": bool}
    for field, rules in schema.get("properties", {}).items():
        if field not in data:
            continue
        value = data[field]
        if "enum" in rules and value not in rules["enum"]:
            return False, f"'{value}' is not a valid value for field '{field}'. Allowed: {rules['enum']}"
        if "type" in rules:
            py_type = type_map.get(rules["type"])
            # bool is a subclass of int, so reject it explicitly for numeric types
            bool_as_number = isinstance(value, bool) and rules["type"] in ("number", "integer")
            if py_type and (bool_as_number or not isinstance(value, py_type)):
                return False, f"Field '{field}' expected {rules['type']}, got {type(value).__name__}"

    return True, ""


def generate_with_validation(
    prompt: str,
    schema: dict,
    max_retries: int = 4,
    budget_usd: float = 0.10,
) -> dict[str, Any]:
    """
    Generate structured output with validation-loop protection.

    Uses RunGuard's LoopDetector with constraint-violation signatures to detect
    systematic validation failures before exhausting the retry budget.
    """
    loop_detector = LoopDetector(max_repeats=2, max_cycle_len=3)
    total_spent = 0.0
    # Per-token prices for the model below (assumed list prices; check current pricing)
    HAIKU_IN = 1.00 / 1_000_000
    HAIKU_OUT = 5.00 / 1_000_000

    messages = [{"role": "user", "content": prompt}]
    schema_description = json.dumps(schema, indent=2)

    for attempt in range(max_retries + 1):
        if total_spent >= budget_usd:
            raise BudgetExceededError(f"Validation budget exhausted: ${total_spent:.5f} >= ${budget_usd}")

        resp = client.messages.create(
            model="claude-haiku-4-5-20251001",
            max_tokens=512,
            system=f"Always respond with valid JSON matching this schema:\n{schema_description}",
            messages=messages,
        )
        output = resp.content[0].text
        total_spent += resp.usage.input_tokens * HAIKU_IN + resp.usage.output_tokens * HAIKU_OUT

        is_valid, error_msg = validate_task_output(output, schema)
        if is_valid:
            return {"output": json.loads(output), "attempts": attempt + 1, "spent_usd": total_spent}

        # Extract violation signature and check for a loop
        violation_sig = extract_violation_signature(error_msg)
        match = loop_detector.record(violation_sig)
        if match:
            raise LoopDetectedError(
                f"Validation loop detected after {attempt + 1} attempts: "
                f"repeated violation '{violation_sig}'; model cannot satisfy constraint. "
                f"Error: {error_msg}"
            )

        if attempt < max_retries:
            # Add error feedback to the conversation for the next attempt
            messages.append({"role": "assistant", "content": output})
            messages.append({
                "role": "user",
                "content": f"The output failed validation: {error_msg}\nPlease fix and return valid JSON.",
            })

    raise RuntimeError(f"Validation failed after {max_retries + 1} attempts. Last error: {error_msg}")


# Example usage
TASK_SCHEMA = {
    "type": "object",
    "required": ["status", "confidence", "reasoning"],
    "properties": {
        "status": {"type": "string", "enum": ["success", "failure", "uncertain"]},
        "confidence": {"type": "number"},
        "reasoning": {"type": "string"},
    },
}

try:
    result = generate_with_validation(
        prompt="Classify this customer support ticket: 'My order hasn't arrived after 3 weeks.'",
        schema=TASK_SCHEMA,
        max_retries=4,
        budget_usd=0.05,
    )
    print(f"Result after {result['attempts']} attempts: {result['output']}")
    print(f"Cost: ${result['spent_usd']:.5f}")
except LoopDetectedError as e:
    print(f"[LOOP] {e}")
except BudgetExceededError as e:
    print(f"[BUDGET] {e}")
except RuntimeError as e:
    # Retry cap reached with varying violations
    print(f"[RETRIES] {e}")
```
The loop detector trips on the constraint violation category, not the specific error text. If the model produces `"status": "ok"`, then `"status": "resolved"`, then `"status": "done"`, all three produce the signature `enum_violation:status`. On the third attempt, `loop_detector.record("enum_violation:status")` returns a match, and the loop is detected and terminated. Without this signature normalization, all three retries produce different signatures and the detector never trips.
Validation retry strategies and their loop risk
| Strategy | Loop detection quality | Cost risk | Recommended cap |
|---|---|---|---|
| Hard retry limit only | Good — terminates after N retries | Low if N is small (3–5) | max_retries=3 for most validation tasks |
| Loop detector with full error message as signature | Poor — same constraint, different messages = missed loop | High — detector never trips on varying errors | Do not use without signature normalization |
| Loop detector with violation-category signature | Excellent — catches same-constraint failures regardless of value | Low — trips after 2–3 retries of same constraint failure | max_repeats=2 trips on 3rd same-constraint failure |
| Structured output APIs (JSON mode, tool use) | Prevention, not detection — reduces validation failure rate | Low — parser enforces schema at API level | Prefer over free-text + validation when schema is fixed |
| Budget cap only | None — spends to cap before stopping | Medium — limits total spend but not loop efficiency | Always pair with loop detector, never use alone |
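For the structured-output row, one way to move from detection to prevention is to force a tool call whose `input_schema` is your schema, so the model returns a parsed object rather than free text. The sketch below only builds the request arguments and reads the result. The tool name `record_classification` and both helper functions are hypothetical, and it is still worth validating the returned input, since a forced tool call reduces schema violations but may not rule them out.

```python
from types import SimpleNamespace
from typing import Any

TASK_SCHEMA: dict[str, Any] = {
    "type": "object",
    "required": ["status", "confidence", "reasoning"],
    "properties": {
        "status": {"type": "string", "enum": ["success", "failure", "uncertain"]},
        "confidence": {"type": "number"},
        "reasoning": {"type": "string"},
    },
}

TOOL_NAME = "record_classification"  # hypothetical tool name


def build_tool_request(prompt: str, schema: dict[str, Any]) -> dict[str, Any]:
    """Build keyword arguments for client.messages.create() that force a tool call."""
    return {
        "model": "claude-haiku-4-5-20251001",
        "max_tokens": 512,
        "tools": [{
            "name": TOOL_NAME,
            "description": "Record the classification of a support ticket.",
            "input_schema": schema,
        }],
        # Require the model to call this specific tool instead of replying in text.
        "tool_choice": {"type": "tool", "name": TOOL_NAME},
        "messages": [{"role": "user", "content": prompt}],
    }


def extract_tool_input(content_blocks: list[Any]) -> dict[str, Any]:
    """Return the input of the first tool_use block in a response."""
    for block in content_blocks:
        if getattr(block, "type", None) == "tool_use":
            return block.input
    raise ValueError("response contained no tool_use block")


request = build_tool_request("Classify this ticket: 'Order is 3 weeks late.'", TASK_SCHEMA)

# With a real client (not executed here):
#   resp = anthropic.Anthropic().messages.create(**request)
#   data = extract_tool_input(resp.content)

# Offline check using a stand-in for a response content block.
fake_blocks = [SimpleNamespace(
    type="tool_use",
    input={"status": "failure", "confidence": 0.9, "reasoning": "late delivery"},
)]
data = extract_tool_input(fake_blocks)
print(data["status"])
```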
For the general retry storm pattern, see AI agent retry storm prevention. For loop detection in tool call chains, see how to detect LLM tool call loops in production.
Stop validation retry storms before they drain your budget
RunGuard’s LoopDetector is the right primitive for validation loop prevention when used with normalized violation-category signatures. Set max_repeats=2 to trip after the third occurrence of the same constraint violation type, and pair it with a hard budget_usd cap via guard(). The combination ensures you never pay for more than a few validation retries regardless of whether the loop is detected structurally (same violation category) or by cost exhaustion (budget cap).
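To illustrate the threshold semantics described above, here is a tiny stand-in detector. It is a hypothetical sketch, not RunGuard's implementation: it assumes `max_repeats` means "report a loop once a signature has been recorded more than `max_repeats` times".

```python
from collections import Counter


class SimpleRepeatDetector:
    """Hypothetical stand-in for a loop detector; not RunGuard's implementation."""

    def __init__(self, max_repeats: int = 2) -> None:
        self.max_repeats = max_repeats
        self._counts = Counter()

    def record(self, signature: str) -> bool:
        """Record one occurrence; return True once the signature exceeds max_repeats."""
        self._counts[signature] += 1
        return self._counts[signature] > self.max_repeats


# Same violation category three times: trips on the third occurrence.
same = SimpleRepeatDetector(max_repeats=2)
same_results = [same.record("enum_violation:status") for _ in range(3)]
print(same_results)  # [False, False, True]

# Three different signatures: never trips, so a retry or budget cap must stop it.
varied = SimpleRepeatDetector(max_repeats=2)
varied_results = [
    varied.record(sig)
    for sig in ("enum_violation:status", "missing_required:confidence", "json_parse_error")
]
print(varied_results)  # [False, False, False]
```

The second case is why the detector is paired with a hard cap: a model that fails for a different reason each time never repeats a signature, and only the retry limit or budget stops it.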
RunGuard pricing: Solo plan at $19/month for individual developers. Team plan at $79/month adds Slack and PagerDuty webhook alerts, shared dashboards, and audit log. Both plans include a 14-day free trial — no credit card required.