AI agent context poisoning: four types of context corruption and how to detect each before it causes a runaway

Context poisoning is the corruption of an AI agent’s conversation history with data that causes the model to reason incorrectly, act on false premises, or enter runaway behavior. It is distinct from a simple tool error: the tool call “succeeds” and a result is appended to the context, but that result is bad data. It may be a hallucinated value from a flaky API, an injected instruction from an adversarial document, redundant error text from a loop, or truncated content from a silent context-window overflow. In each case, the bad entry corrupts the agent’s reasoning in subsequent turns. This page describes the four context poisoning types, how to detect each one, and how to wire a circuit breaker that fires before the corrupted context causes an unrecoverable failure.

Type 1: loop-generated noise pollution

Type 2: prompt injection via tool results

Type 3: silent truncation from context-window overflow

Type 4: hallucinated tool results in agent chains

Wiring context integrity checks with RunGuard

Context poisoning types and detection methods

| Poisoning type | Source | Detection method | Mitigation |
| --- | --- | --- | --- |
| Loop noise pollution | Repeated tool calls | RunGuard loop detector (pattern in signature stream) | Trip at repeat-3, compact noisy entries |
| Prompt injection via tools | External content in tool results | Keyword scan before append; structural isolation | Sanitize / suppress; RunGuard budget caps blast radius |
| Silent truncation | Context overflow + silently truncating host | RunGuard context guard (pre-call projection) | Trip before request sent; compact and retry |
| Hallucinated sub-agent results | Sub-agent LLM hallucination | Structured output confidence + dual-agent verification | Budget cap on sub-agent; confidence gate before appending |
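
The detection methods in the table can be sketched in plain Python. This is a minimal illustration, not RunGuard's API: every name here (ContextPoisoned, LoopDetector, scan_tool_result, check_context_budget, accept_subagent_result) is hypothetical. The sketch assumes that "repeat-3" means three identical call signatures in a row, that token counts can be roughly approximated as four characters per token, and that a sub-agent reports a numeric "confidence" field in its structured output.

```python
import hashlib
import json
import re
from collections import deque


class ContextPoisoned(Exception):
    """Raised when a check trips; the caller should stop, compact, or retry."""


def call_signature(tool_name, args, result):
    """Stable hash of one tool call and its result (hypothetical signature scheme)."""
    payload = json.dumps([tool_name, args, result], sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class LoopDetector:
    """Trips when the same signature is observed `limit` times in a row."""

    def __init__(self, limit=3):
        self.limit = limit
        self.recent = deque(maxlen=limit)  # keeps only the last `limit` signatures

    def observe(self, signature):
        self.recent.append(signature)
        if len(self.recent) == self.limit and len(set(self.recent)) == 1:
            raise ContextPoisoned("loop: identical call repeated %d times" % self.limit)


# Illustrative patterns only; a real keyword scan needs a broader, maintained list.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.I),
    re.compile(r"you are now", re.I),
    re.compile(r"system prompt", re.I),
]


def scan_tool_result(text):
    """Return the patterns that match, so the caller can sanitize or suppress."""
    return [p.pattern for p in INJECTION_PATTERNS if p.search(text)]


def estimate_tokens(text):
    """Crude heuristic (about 4 characters per token); swap in a real tokenizer."""
    return len(text) // 4 + 1


def check_context_budget(messages, window_tokens, reserve_tokens=1024):
    """Project the request size before sending; trip if it would overflow."""
    projected = sum(estimate_tokens(m) for m in messages)
    if projected + reserve_tokens > window_tokens:
        raise ContextPoisoned("overflow: projected %d tokens" % projected)
    return projected


def accept_subagent_result(result, min_confidence=0.8):
    """Gate a structured sub-agent result on its self-reported confidence."""
    return float(result.get("confidence", 0.0)) >= min_confidence
```

In an agent loop, each check would run before a tool result is appended or a request is sent. A self-reported confidence score is a weak signal on its own, which is presumably why the table pairs it with dual-agent verification.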