LLM agent token limit exceeded in TypeScript: detect before the provider throws, recover with partial output

A TypeScript agent running a long research or coding task accumulates tool results in its context window. At some point the projected input tokens for the next call will exceed the model’s limit: 128k for GPT-4o, 200k for Claude Sonnet 4, 131k for Llama 3.1 70B. What happens next depends on which provider you’re using and how you’ve wired the call. OpenAI rejects the request with a 400 whose error code is context_length_exceeded. Anthropic rejects it with a 400 invalid_request_error whose message says the prompt is too long. Some older hosted models silently truncate from the beginning, corrupting earlier context without any error at all. None of these outcomes gives you a graceful recovery path. RunGuard’s context guard projects the token count before each call and trips a recoverable error with the full accumulated context still in memory, so you can compact, checkpoint, or summarize instead of losing the run.
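The pre-call projection described above can be sketched in a few lines. This is a minimal illustration, not RunGuard’s implementation: the `Message` type, `estimateTokens`, `assertFits`, and the `inputBudget` field are hypothetical names invented for the sketch, and the four-characters-per-token estimate is a rough assumption that a real tokenizer would replace. Only the idea of a `ContextOverflowError` carrying `projectedTokens` comes from the comparison later in this article.

```typescript
// Illustrative stand-in for a pre-call context guard. Names and shapes here are
// assumptions for this sketch, not RunGuard's actual API.
type Message = { role: "system" | "user" | "assistant" | "tool"; content: string };

class ContextOverflowError extends Error {
  readonly projectedTokens: number;
  readonly inputBudget: number;

  constructor(projectedTokens: number, inputBudget: number) {
    super(`Projected ${projectedTokens} input tokens exceeds budget of ${inputBudget}`);
    Object.setPrototypeOf(this, new.target.prototype); // keeps instanceof working on ES5 targets
    this.name = "ContextOverflowError";
    this.projectedTokens = projectedTokens;
    this.inputBudget = inputBudget;
  }
}

// Rough heuristic (assumption): about 4 characters per token, plus 4 tokens of
// per-message overhead. Swap in the provider's tokenizer for exact counts.
function estimateTokens(messages: readonly Message[]): number {
  return messages.reduce((sum, m) => sum + Math.ceil(m.content.length / 4) + 4, 0);
}

// Throws before any HTTP request is made if the projected input would not fit
// once room for the completion has been reserved. Returns the projection otherwise.
function assertFits(
  messages: readonly Message[],
  contextLimit: number,
  reservedForOutput: number,
): number {
  const inputBudget = contextLimit - reservedForOutput;
  const projected = estimateTokens(messages);
  if (projected > inputBudget) {
    throw new ContextOverflowError(projected, inputBudget);
  }
  return projected;
}

const transcript: Message[] = [
  { role: "system", content: "You are a research agent." },
  { role: "tool", content: "x".repeat(4000) }, // one large tool result
];

try {
  assertFits(transcript, 1000, 200); // tiny limit so the sketch trips
} catch (err) {
  if (!(err instanceof ContextOverflowError)) throw err;
  // Nothing was sent, so the transcript is still intact and can be compacted.
  console.log(
    `Guard tripped at ${err.projectedTokens} projected tokens; ` +
      `${transcript.length} messages still in memory`,
  );
}
```

Because the check runs before the request is sent, the caller still holds the whole transcript when the error is raised, which is the property that makes recovery possible.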

Why token-limit overruns happen in agent loops

The two failure modes: hard 400 vs. silent truncation

Detecting token overruns before the call: RunGuard context guard in TypeScript

Recovery strategies when the context guard fires

Token-limit handling approaches compared

| Approach | When it fires | Context preserved? | Recovery possible? | Cost impact |
| --- | --- | --- | --- | --- |
| Catch provider 400 | After HTTP request fails | No (only error type) | Manual, state lost | Full prompt tokens billed |
| max_tokens on completion | Never for input overrun | N/A | N/A (wrong problem) | No protection |
| Manual token count before call | Pre-call (if implemented) | Yes | Yes | Full prompt counted, not sent |
| RunGuard context guard | Pre-call, automatic | Yes: ContextOverflowError exposes projectedTokens | Yes: compaction or checkpoint | Zero tokens billed on trip |
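The "compaction" recovery named in the last row could look something like the sketch below. It is an assumption-laden illustration rather than RunGuard’s behavior: `ChatMessage`, `roughTokens`, and `compactHistory` are hypothetical names, the token estimate is the same rough four-characters-per-token heuristic, and the strategy shown is plain drop-the-oldest truncation with a placeholder note, not model-generated summarization.

```typescript
// Hypothetical compaction helper for use after a pre-call guard trips.
// Not part of any real library; every name here is illustrative.
type ChatMessage = { role: "system" | "user" | "assistant" | "tool"; content: string };

// Rough estimate (assumption): ~4 characters per token plus per-message overhead.
const roughTokens = (m: ChatMessage): number => Math.ceil(m.content.length / 4) + 4;

// Keeps every system message, then as many of the most recent messages as fit
// in `budget`, and inserts a short note where older messages were dropped.
// The note itself is not counted, so callers should leave a little headroom.
function compactHistory(messages: readonly ChatMessage[], budget: number): ChatMessage[] {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");

  let used = system.reduce((sum, m) => sum + roughTokens(m), 0);
  const kept: ChatMessage[] = [];

  // Walk newest to oldest so the latest turns survive.
  for (const m of [...rest].reverse()) {
    const cost = roughTokens(m);
    if (used + cost > budget) break;
    used += cost;
    kept.unshift(m);
  }

  const dropped = rest.length - kept.length;
  if (dropped === 0) return [...system, ...kept];

  const note: ChatMessage = {
    role: "user",
    content: `[${dropped} earlier messages omitted to fit the context window]`,
  };
  return [...system, note, ...kept];
}

const longRun: ChatMessage[] = [
  { role: "system", content: "s" },
  { role: "tool", content: "a".repeat(400) },
  { role: "tool", content: "b".repeat(400) },
  { role: "tool", content: "c".repeat(400) },
  { role: "user", content: "next" },
];

const compacted = compactHistory(longRun, 120);
console.log(compacted.map((m) => `${m.role}:${m.content.length}`));
```

Dropping the oldest messages is the simplest option but also the lossiest; a run that depends on early findings would more likely checkpoint the full transcript first and replace the dropped span with a summary. Providers that require tool results to stay paired with the assistant turn that requested them may also need the cut point adjusted.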

Multi-provider token limits in TypeScript agents (2026 reference)