LLM agent resource cleanup cost patterns: the leaks that silently multiply your bill

Every unclosed HTTP connection, every unbounded in-process cache, every file descriptor still pointing at a vector index the agent stopped using five minutes ago: these are not just operational annoyances, they are direct cost amplifiers. When an embedding cache grows until the OS kills the process, the agent restarts cold, replays its context, and re-sends tokens it had already processed (3,500 per restart in the table below). When a thread pool fills with stalled tool-call futures, the agent retries the whole task from scratch. When a connection pool is exhausted and the agent falls back to opening a raw database connection on every call, latency rises enough to trigger timeout-induced re-runs. This page covers the five resource categories that most often accumulate without explicit cleanup, quantifies the cost impact of each, and shows Python and TypeScript patterns that eliminate them, along with RunGuard hooks that catch cleanup failures before they cascade.
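One way to stop pool exhaustion from turning into raw-connection fallbacks is to bound the pool and fail fast when no connection frees up in time. The caller then sees an explicit error instead of silently opening new connections. The sketch below assumes an asyncio-based agent. BoundedPool, connection(), factory, and acquire_timeout_s are hypothetical names, and production code would normally use the pool that ships with its database driver rather than this illustration.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import Any, AsyncIterator, Callable


class BoundedPool:
    """Fixed-size connection pool that fails fast instead of opening extra connections.

    Hypothetical illustration: `factory` stands in for whatever creates a real
    driver connection. Constructing it inside a running event loop is safest
    on older Python versions.
    """

    def __init__(self, factory: Callable[[], Any], size: int,
                 acquire_timeout_s: float = 0.2) -> None:
        self.acquire_timeout_s = acquire_timeout_s
        self._idle: asyncio.Queue = asyncio.Queue()
        for _ in range(size):
            # Connections are created once and reused; nothing is opened per call.
            self._idle.put_nowait(factory())

    @asynccontextmanager
    async def connection(self) -> AsyncIterator[Any]:
        # Wait briefly for an idle connection. On timeout, asyncio.TimeoutError
        # propagates to the caller instead of triggering a raw-connection fallback.
        conn = await asyncio.wait_for(self._idle.get(), self.acquire_timeout_s)
        try:
            yield conn
        finally:
            # Always return the connection, even if the caller's work raised.
            self._idle.put_nowait(conn)


async def demo() -> None:
    pool = BoundedPool(factory=object, size=2)
    async with pool.connection() as conn:
        print("got connection", id(conn))


if __name__ == "__main__":
    asyncio.run(demo())
```

The `finally` block is what keeps the pool from leaking: a query that raises still returns its connection. The 0.2 s default mirrors the 200 ms pool-wait alert threshold used in the table.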

Five resource categories that accumulate without cleanup

Quantifying the cost impact: from resource leak to dollar figure
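Each dollar figure in the table comes from one multiplication: tokens re-sent per event × number of events × price per token. The minimal Python sketch below encodes that arithmetic. It assumes the flat $3 per million input tokens this page uses throughout. The function names are illustrative, and a real bill also depends on output tokens and on your model's actual rate.

```python
"""Convert leak events into token spend (back-of-envelope)."""

# Assumed flat input price, USD per million tokens (the rate used on this page).
PRICE_PER_MTOK_USD = 3.0


def token_cost_usd(tokens: int, price_per_mtok: float = PRICE_PER_MTOK_USD) -> float:
    """Cost of re-sending `tokens` input tokens once."""
    return tokens * price_per_mtok / 1_000_000


def leak_cost_usd(events: int, tokens_per_event: int,
                  price_per_mtok: float = PRICE_PER_MTOK_USD) -> float:
    """Cost of a leak that fires `events` times and re-sends `tokens_per_event` tokens each time."""
    return events * token_cost_usd(tokens_per_event, price_per_mtok)


if __name__ == "__main__":
    # Embedding-cache OOM: 200 cold restarts/day, each replaying 3,500 tokens.
    per_day = leak_cost_usd(events=200, tokens_per_event=3_500)
    print(f"OOM replay: ${per_day:.2f}/day, ${per_day * 365:.2f}/year")

    # Connection-failure re-plan: 1,000 to 3,500 input tokens per event.
    print(f"Re-plan: ${token_cost_usd(1_000):.4f} to ${token_cost_usd(3_500):.4f} per event")
```

Running it reproduces the table's figures: $2.10/day (about $766/year) for the embedding-cache row, and roughly $0.003 to $0.011 per connection-failure re-plan.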

Python: ResourceScope context manager with RunGuard integration
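Here is a minimal sketch of a ResourceScope for an asyncio agent, built on the standard library's contextlib.AsyncExitStack. Every resource a task opens is registered with the scope and released in reverse order when the task exits, whether it returned or raised. Cleanup that takes longer than a threshold is reported. The class shape, the method names, and the on_slow_cleanup hook are assumptions for illustration. The hook marks where a RunGuard slow-cleanup alert would go, and its signature is not RunGuard's actual API.

```python
import asyncio
import time
from contextlib import AsyncExitStack, asynccontextmanager
from typing import Any, AsyncIterator, Awaitable, Callable, List, Optional

# Hypothetical alert hook (scope name, cleanup seconds); stand-in for a RunGuard call.
SlowCleanupHook = Callable[[str, float], None]


class ResourceScope:
    """Owns the resources one agent task opens and releases them LIFO on exit."""

    def __init__(self, name: str, slow_cleanup_s: float = 0.5,
                 on_slow_cleanup: Optional[SlowCleanupHook] = None) -> None:
        self.name = name
        self.slow_cleanup_s = slow_cleanup_s  # 500 ms default, as in the table
        self.on_slow_cleanup = on_slow_cleanup
        self._stack = AsyncExitStack()

    async def __aenter__(self) -> "ResourceScope":
        return self

    async def enter(self, cm: Any) -> Any:
        """Enter an async context manager (client, pool, index) and tie its exit to this scope."""
        return await self._stack.enter_async_context(cm)

    def on_exit(self, close: Callable[..., Awaitable[Any]], *args: Any) -> None:
        """Register an async close function for resources that are not context managers."""
        self._stack.push_async_callback(close, *args)

    async def __aexit__(self, exc_type, exc, tb) -> bool:
        start = time.perf_counter()
        try:
            # Runs every registered cleanup in reverse order, even if one of them raises.
            return await self._stack.__aexit__(exc_type, exc, tb)
        finally:
            elapsed = time.perf_counter() - start
            if elapsed > self.slow_cleanup_s and self.on_slow_cleanup is not None:
                self.on_slow_cleanup(self.name, elapsed)


@asynccontextmanager
async def fake_client(log: List[str]) -> AsyncIterator[str]:
    """Stand-in for an HTTP client or DB pool that must be closed."""
    log.append("open client")
    try:
        yield "client"
    finally:
        log.append("close client")


async def demo() -> List[str]:
    log: List[str] = []

    async def close_index(label: str) -> None:
        log.append(f"close {label}")

    async with ResourceScope("task-42") as scope:
        await scope.enter(fake_client(log))
        scope.on_exit(close_index, "index")
    return log


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

AsyncExitStack keeps unwinding when one cleanup raises. A failing close on one resource therefore does not leave the others open, and the exception still propagates after every callback has run.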

TypeScript: explicit resource management with using and AsyncResourceManager

Bounded LRU embedding cache with RunGuard eviction monitoring
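A bounded LRU cache caps memory at max_size entries, so the cache can no longer grow until the OS kills the process. The sketch below uses collections.OrderedDict and counts evictions so an eviction rate can be reported. BoundedEmbeddingCache and the on_evict hook are hypothetical names. on_evict marks where a metric such as RunGuard's embedding_cache_eviction could be emitted; RunGuard's real API is not shown here.

```python
from collections import OrderedDict
from typing import Callable, Optional, Sequence

Embedding = Sequence[float]


class BoundedEmbeddingCache:
    """LRU cache for embeddings with a hard entry limit and an eviction hook."""

    def __init__(self, max_size: int, on_evict: Optional[Callable[[str], None]] = None) -> None:
        if max_size <= 0:
            raise ValueError("max_size must be positive")
        self.max_size = max_size
        self.on_evict = on_evict  # hypothetical metric hook, called with the evicted key
        self._data: "OrderedDict[str, Embedding]" = OrderedDict()
        self.hits = 0
        self.misses = 0
        self.evictions = 0

    def get(self, key: str) -> Optional[Embedding]:
        if key not in self._data:
            self.misses += 1
            return None
        self._data.move_to_end(key)  # mark as most recently used
        self.hits += 1
        return self._data[key]

    def put(self, key: str, value: Embedding) -> None:
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        while len(self._data) > self.max_size:
            evicted_key, _ = self._data.popitem(last=False)  # drop least recently used
            self.evictions += 1
            if self.on_evict is not None:
                self.on_evict(evicted_key)

    def eviction_rate(self) -> float:
        """Evictions per lookup; a sustained high value means max_size is below the working set."""
        lookups = self.hits + self.misses
        return self.evictions / lookups if lookups else 0.0

    def __len__(self) -> int:
        return len(self._data)


if __name__ == "__main__":
    evicted = []
    cache = BoundedEmbeddingCache(max_size=2, on_evict=evicted.append)
    cache.put("doc-1", [0.1, 0.2])
    cache.put("doc-2", [0.3, 0.4])
    cache.get("doc-1")              # doc-1 becomes most recently used
    cache.put("doc-3", [0.5, 0.6])  # over capacity: evicts doc-2
    print(evicted, len(cache), cache.eviction_rate())
```

For a pure embedding function, functools.lru_cache(maxsize=...) gives the same bound. However, its cache_info() reports only hits, misses, maxsize, and current size, with no eviction hook, which is why this sketch tracks evictions itself. Beyond roughly 100k entries, the table suggests moving to an external store such as Redis.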

Resource leak types: symptoms, cleanup patterns, RunGuard detection, and cost impact

| Resource type | Leak symptom | Cleanup pattern | RunGuard detection | Cost impact |
|---|---|---|---|---|
| HTTP client / connection pool | `OSError: Cannot assign requested address`; ephemeral port exhaustion on busy hosts | Instantiate once per process; call `aclose()` / `destroy()` in the context manager's exit | `rg.connection_guard(max_open=5)` alerts when a single agent task holds more than `max_open` connections | Each forced re-plan after a connection failure: ~1,000–3,500 input tokens × $3/MTok = $0.003–$0.011 per event |
| In-process embedding cache | OOM kill; container restart; cold-start token replay | Bounded LRU with `max_size`; shared singleton; external Redis beyond ~100k entries | `rg.metric("embedding_cache_eviction")` callback; a high eviction rate triggers an alert | 200 OOM restarts/day × 3,500 replay tokens × $3/MTok = $2.10/day (~$766/year) |
| Vector search index (FAISS, Hnswlib) | `OSError: Too many open files`; file-descriptor limit exhausted | Open the index once at startup; share it across requests; call `close()` at process shutdown | `rg.resource_guard()` cleanup-timing alert on `__aexit__` if close takes >500 ms | FD exhaustion causes tool failures and a retry storm; cost ≈ number of retries × one full-context LLM call each |
| `ThreadPoolExecutor` for sync tool calls | Thread accumulation; memory pressure; eventual OOM or tool-call timeout | One shared executor per process; `executor.shutdown(wait=True)` on app teardown; cancel stalled futures | `rg.alert("slow_cleanup")` when executor shutdown exceeds the 500 ms threshold | Stalled thread → tool timeout → task duplication: 2× the full task's token cost per event |
| Database connection pool | Connection refused; pool wait timeout; fallback to raw connections adds ~100 ms latency per call | A single pool per process; `pool.close()` / `pool.end()` on scope exit; release individual connections in `finally` | `rg.connection_guard()` tracks pool saturation and alerts when pool wait exceeds 200 ms | 100 ms extra latency × 10k calls/day = 1,000 s (~17 min) of added latency per day; each timeout triggers a retry LLM call |
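The sketch below illustrates the ThreadPoolExecutor row for an asyncio agent. It uses one process-wide pool, puts a timeout on every sync tool call, and shuts the pool down with queued calls cancelled at teardown. TOOL_EXECUTOR, call_sync_tool, shutdown_tools, and slow_tool are hypothetical names, and the worker count and timeout are placeholders. One limitation matters here: a timeout frees the awaiting task, but Python cannot interrupt a thread that is already running, so a truly hung tool keeps its worker busy until it returns.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable

# One executor for the whole process, created once and shared by every agent task.
# max_workers=4 is a placeholder; size it to your tools' concurrency.
TOOL_EXECUTOR = ThreadPoolExecutor(max_workers=4, thread_name_prefix="tool")


async def call_sync_tool(fn: Callable[..., Any], *args: Any, timeout_s: float = 5.0) -> Any:
    """Run a blocking tool in the shared executor and stop waiting after timeout_s."""
    loop = asyncio.get_running_loop()
    future = loop.run_in_executor(TOOL_EXECUTOR, fn, *args)
    # On timeout, wait_for cancels the future: a call still queued never starts,
    # but a call already running keeps its worker thread until it returns.
    return await asyncio.wait_for(future, timeout_s)


def shutdown_tools() -> None:
    """Call once at app teardown (cancel_futures requires Python 3.9+)."""
    # cancel_futures=True drops queued calls; wait=True blocks until running calls finish.
    TOOL_EXECUTOR.shutdown(wait=True, cancel_futures=True)


def slow_tool(seconds: float) -> str:
    """Stand-in for a blocking tool call."""
    time.sleep(seconds)
    return "done"


async def demo() -> None:
    print(await call_sync_tool(slow_tool, 0.1))


if __name__ == "__main__":
    asyncio.run(demo())
```

Call shutdown_tools() from the application's shutdown hook, not after each task. Creating and tearing down an executor per task is the accumulation pattern this row warns about.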

Related: AI agent health monitoring cost tradeoffs covers how to instrument agent processes for memory, FD count, and connection health without adding overhead that itself increases cost. See also the production LLM agent reliability checklist for the full set of operational checks that prevent resource-driven failures.

Applying cleanup patterns to the full agent lifecycle
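Across the agent's lifetime these patterns fall into two tiers. Process-wide resources (HTTP client, database pool, vector index) are opened once at startup and closed once at shutdown. Per-task resources must be released when each task ends, including tasks that fail. The sketch below models both tiers with contextlib.AsyncExitStack and a hypothetical TrackedResource stand-in that counts open handles, so a leak would show up as a count that never returns to zero. TrackedResource, process_lifetime, and run_task are illustrative names, not part of any library.

```python
import asyncio
from contextlib import AsyncExitStack, asynccontextmanager
from typing import AsyncIterator, Dict


class TrackedResource:
    """Hypothetical stand-in for a client, pool, or index handle; counts open instances."""

    open_count = 0

    def __init__(self, name: str) -> None:
        self.name = name
        self.closed = False
        TrackedResource.open_count += 1

    async def aclose(self) -> None:
        if not self.closed:  # idempotent, so a double close cannot skew the count
            self.closed = True
            TrackedResource.open_count -= 1


@asynccontextmanager
async def process_lifetime() -> AsyncIterator[Dict[str, TrackedResource]]:
    """Tier 1: open shared resources once at startup, close them once at shutdown."""
    async with AsyncExitStack() as stack:
        shared: Dict[str, TrackedResource] = {}
        for name in ("http_client", "db_pool", "vector_index"):
            resource = TrackedResource(name)
            stack.push_async_callback(resource.aclose)  # closed in reverse order
            shared[name] = resource
        yield shared


async def run_task(shared: Dict[str, TrackedResource], task_id: int, fail: bool = False) -> str:
    """Tier 2: per-task resources are released when the task ends, even if it raises."""
    async with AsyncExitStack() as stack:
        scratch = TrackedResource(f"scratch-{task_id}")
        stack.push_async_callback(scratch.aclose)
        if fail:
            raise RuntimeError(f"tool failed in task {task_id}")
        return f"task {task_id} used {', '.join(sorted(shared))}"


async def main() -> None:
    async with process_lifetime() as shared:
        for task_id in range(3):
            try:
                print(await run_task(shared, task_id, fail=(task_id == 1)))
            except RuntimeError as err:
                print("task failed:", err)  # a real agent would log and decide on retry
            # Between tasks, only the three shared resources should still be open.
            assert TrackedResource.open_count == 3
    # After shutdown nothing should remain open.
    assert TrackedResource.open_count == 0


if __name__ == "__main__":
    asyncio.run(main())
```

In a web-served agent, process_lifetime() would typically be entered from the framework's startup/shutdown hook rather than from a main() function. Each request or job would then run inside its own per-task scope.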

Stop resource leaks from becoming token-spend multipliers

Resource leaks in an LLM agent eventually show up as extra token spend, through cold-start replays, retry storms, or task duplication. The patterns on this page (ResourceScope context managers, bounded LRU caches, shared singleton pools, and deterministic cleanup on task exit) eliminate the five most common leak categories. RunGuard’s resource_guard, connection_guard, and eviction metric callbacks provide the runtime visibility to catch cleanup regressions before they add up to meaningful cost.

RunGuard pricing: the Solo plan costs $19/month and is aimed at individual developers. The Team plan costs $79/month and adds Slack and PagerDuty webhook alerts, shared dashboards, and an audit log. Both plans include a 14-day free trial with no credit card required.
