The 30-day log

Every post below is driven by data we can point at on disk — our own dogfood trip log first, then the loops, context overflows, and budget breaches our SDK catches inside customer apps. If a post's data isn't there yet, it stays gated. No vibes; no fabricated stats.

Editorial rule

Cadence is a target, not a contract. If the gate hasn't opened, we slip the day rather than ship filler. The credibility of the 30-day series is the whole point.

Engineering guides

Live · June 13 2026

OpenAI Realtime API Cost Control: Loop Detection and Budget Enforcement for Voice Agents

The Realtime API bills partial audio even on interruption — barge-in amplification loops, server VAD false-positive storms, function call echo chambers, and session transcript accumulation will drain your gpt-4o-realtime-preview budget invisibly. Four Python circuit breakers with complete implementations.

Live · June 13 2026

Ollama and Llama.cpp Agent Cost Control: Loop Detection and Resource Enforcement

Local models skip the billing dashboard — but VRAM OOM crash loops, silent context truncation causing tool-call repetition, cold-start cascades from model-reload thrash, and CPU inference runaway are just as expensive. Four failure modes unique to Ollama and llama.cpp agents with complete Python guard implementations.

Live · June 13 2026

Amazon Bedrock Converse API Cost Control: Loop Detection and Budget Enforcement

The Converse API unifies tool use across every Bedrock model behind a single boto3 call — but you own the messages list, the loop, and the budget. Four failure modes — tool call spirals, conversation history explosion, cross-model retry amplification, and streaming accumulation traps — with a complete Python ConverseBreaker implementation.

Live · June 12 2026

Amazon Bedrock Inline Agents Cost Control: Loop Detection and Budget Enforcement

invoke_inline_agent lets you define an agent’s foundation model, instructions, and action groups dynamically at runtime. Four cost failure modes absent from standard Bedrock Agents (instruction loops, session accumulation, action group thrash, and supervisor cascades), with a complete Python InlineAgentBreaker implementation.

Live · June 14 2026

Anthropic Claude API Cost Control: Loop Detection and Budget Enforcement

Building agentic loops directly on the Anthropic Messages API means no framework guardrails between you and the billing meter. Four failure modes — tool use spirals, context window accumulation, retry cascade multiplication, and budget breach — with complete Python and TypeScript circuit breaker implementations using the Anthropic SDK.

Live · June 11 2026

Microsoft Copilot Studio Cost Control: Loop Detection and Budget Enforcement in Production

Microsoft Copilot Studio has no built-in circuit breaker. Four failure modes — topic redirect cycles, Power Automate retry storms, generative AI knowledge search spirals, and autonomous agent tool call loops — with full guard implementations in Power Fx, Power Automate expressions, and TypeScript custom connectors.

Live · June 11 2026

Salesforce Agentforce Cost Control: Loop Detection and Budget Enforcement in Production

Salesforce Agentforce’s Atlas reasoning engine has no built-in circuit breaker. Four failure modes — action call spiral, write action idempotency failure, Data Cloud retrieval context avalanche, escalation retry deadlock — with full Apex guard implementations for @InvocableMethod actions and Platform Cache session state.

Live · June 11 2026

IBM watsonx.ai Agents Cost Control: Loop Detection and Budget Enforcement in Production

IBM watsonx.ai’s agent framework runs a ReAct loop with no built-in circuit breaker. Four failure modes — tool call invocation spiral, nested agent chaining, RAG retrieval context avalanche, Granite model retry storm — with full Python guard implementations for the watsonx.ai Python SDK.

Live · June 10 2026

Vercel AI SDK Cost Control: Loop Detection and Budget Enforcement in Production

Vercel AI SDK’s maxSteps counts agentic steps but can’t detect tool call invocation spirals, parallel tool call cost amplification, cross-step context window drift, or provider-fallback re-routing loops. Four failure modes with a full TypeScript AISdkBreaker circuit breaker wrapping tool execute functions.

Live · June 10 2026

Spring AI Cost Control: Loop Detection and Budget Enforcement in Production

Spring AI’s maxToolCallsPerRequest counts tool calls but can’t detect function callback invocation spirals, MessageChatMemoryAdvisor token inflation, VectorStore RAG query fixation, or multi-agent task delegation loops. Four failure modes with a full Java SpringAgentBreaker circuit breaker using the CallAroundAdvisor API.

Live · June 10 2026

Bee Agent Framework Cost Control: Loop Detection and Budget Enforcement in Production

IBM’s Bee Agent Framework maxIterations counts turns but can’t detect tool observation fixation spirals, ReAct reasoning echo loops, memory token drift, or nested sub-agent back-delegation cycles. Four failure modes with a full TypeScript BeeAgentBreaker circuit breaker using Bee’s native event emitter API.

Live · June 10 2026

Vertex AI Agent Builder Cost Control: Loop Detection and Budget Enforcement in Production

Vertex AI Agent Builder’s session limits count turns, not patterns. Four failure modes — playbook tool invocation spiral, data store grounding query fixation, multi-playbook escalation loop, session context token drift — with a full Python VertexAgentBreaker circuit breaker wrapping the Dialogflow CX SDK.

Live · June 10 2026

Azure AI Agents Cost Control: Loop Detection and Budget Enforcement in Production

Azure AI Agent Service’s max_completion_tokens caps token spend but can’t detect run-step tool-call spirals, file search query fixation, thread token drift, or connected-agent re-delegation loops. Four failure modes with a full Python AzureAgentBreaker circuit breaker wrapping the azure-ai-projects SDK.

Live · June 5 2026

Haystack Agent Cost Control: Loop Detection and Budget Enforcement in Production

Haystack’s max_agent_steps counts steps but can’t detect pipeline back-edge cycles that never converge. Four failure modes (non-converging iterative refinement loops, tool repetition storms, chat history token inflation, and cross-pipeline delegation depth), with a full circuit breaker implemented as a custom Component wrapper and HALF_OPEN recovery.

Live · June 5 2026

Microsoft Semantic Kernel Cost Control: Loop Detection and Budget Enforcement in Production

SK’s TerminationStrategy evaluates the latest message, not the pattern of messages across turns. Four failure modes (AgentGroupChat selection cycles, plugin re-invocation storms, Process Framework circular transitions, and chat history cost inflation), with a full circuit breaker wrapping AgentGroupChat.invoke() and HALF_OPEN recovery.

Live · June 4 2026

Microsoft AutoGen Cost Control: Loop Detection and Budget Enforcement in Production

AutoGen’s max_consecutive_auto_reply resets every time a different agent speaks, so it can’t see speaker cycles in GroupChat. Four failure modes (speaker cycles, nested conversation cascades, code execution storms, and message history explosion), with a full circuit breaker using register_reply and HALF_OPEN recovery.
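
To make the speaker-cycle idea concrete, here is a minimal, framework-agnostic sketch. It does not use AutoGen's API: it only inspects a plain list of speaker names, and the function name find_speaker_cycle and its parameters are hypothetical. A per-agent consecutive-reply counter sees nothing unusual in a planner/critic ping-pong, while a check on the tail of the speaker history does.

```python
from typing import Optional, Sequence, Tuple


def find_speaker_cycle(
    speakers: Sequence[str],
    min_repeats: int = 3,   # assumption: how many repetitions count as a cycle
    max_period: int = 4,    # assumption: longest cycle length we look for
) -> Optional[Tuple[str, ...]]:
    """Return the repeating speaker pattern at the tail of the history, or None."""
    # Period 1 (one agent talking to itself) is what a consecutive-reply
    # counter already covers, so the search starts at period 2.
    for period in range(2, max_period + 1):
        needed = period * min_repeats
        if len(speakers) < needed:
            continue
        tail = tuple(speakers[-needed:])
        pattern = tail[:period]
        # Require at least two distinct speakers in the pattern.
        if len(set(pattern)) > 1 and tail == pattern * min_repeats:
            return pattern
    return None


history = ["planner", "coder", "critic", "coder", "critic", "coder", "critic"]
cycle = find_speaker_cycle(history)
```

In this example the last six turns alternate between coder and critic, so the function returns that two-speaker pattern. A real guard would likely also compare message content, since some healthy conversations alternate speakers for many turns.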

Week 1 — day-by-day

Publish-ready · fires at launch hour

Day 0 — We shipped RunGuard. The first loop it caught was ours.

The dogfood story: our own launch script looped against a shared upstream infra blocker. We instrumented the detector between failures. When the script retried a seventh time, the SDK opened the breaker before the API call went out.
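
A rough sketch of that behavior, for readers who want the shape of it before the post goes live. This is not RunGuard's API: the class name RepeatFailureBreaker, the signature string, and the threshold of six repeated failures are all assumptions chosen so the seventh attempt is refused before the call is made.

```python
class BreakerOpen(RuntimeError):
    """Raised instead of making the call once the breaker has opened."""


class RepeatFailureBreaker:
    def __init__(self, max_repeats: int = 6):  # assumption: illustrative threshold
        self.max_repeats = max_repeats
        self._last_signature = None
        self._repeats = 0

    def call(self, signature, fn):
        # Refuse up front: the check happens before fn() runs, so no request
        # leaves the process once the same failure has repeated enough times.
        if signature == self._last_signature and self._repeats >= self.max_repeats:
            raise BreakerOpen(f"{signature!r} failed {self._repeats} times in a row")
        try:
            result = fn()
        except Exception:
            if signature == self._last_signature:
                self._repeats += 1
            else:
                self._last_signature = signature
                self._repeats = 1
            raise
        # Any success resets the streak.
        self._last_signature = None
        self._repeats = 0
        return result


calls = []

def blocked_upstream():
    calls.append(1)
    raise ConnectionError("upstream blocked")

breaker = RepeatFailureBreaker(max_repeats=6)
opened_at = None
for attempt in range(1, 11):
    try:
        breaker.call("launch:upstream", blocked_upstream)
    except BreakerOpen:
        opened_at = attempt
        break
    except ConnectionError:
        pass
```

With these settings the first six attempts reach the failing function and the seventh raises BreakerOpen without calling it.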

Gated · T+24h

Day 1 — Launch numbers without the gloss

Signups, installs, referrers, and star counts at the 24-hour mark — with delta columns against the launch hour. Honest about whether the launch sustained or fizzled.

Gated · first non-self trip

Day 2 — The first non-self loop our SDK caught

A customer's agent looped. Our SDK opened the breaker. What the signature looked like, what the breaker defaults were, and what the customer's retry logic did next — anonymized, with permission.

Gated · T+72h + ≥3 signatures

Day 3 — Three loop signatures we hadn't seen before

Pattern-matching across 72 hours of customer trips. Categorized by trigger kind (loop / budget / context) and then by signature shape. One redacted example per category — code blocks, not prose.

Gated · first FP or T+96h

Day 4 — The first false positive (and what we changed)

When the breaker shouldn't have opened. What the user's legitimate workflow looked like, which default exposed the false positive, and whether we're shipping a version bump or a doc clarification.

Gated · both SDKs live 72h

Day 5 — TypeScript or Python? What our install ratio actually says

Five days of npm install @runguard/sdk vs pip install runguard. Two integers, one ratio, and three plausible explanations — not a "the Python community prefers X" from a week of data.

Gated · T+144h + ≥1 $-saved trip

Day 6 — $X in runaway runs we caught this week

The IDENTITY headline — "How we caught $X in runaway agent runs" — with the math shown. Customer-reported dollar figures where shared, token-pricing estimates otherwise, and every line tagged so readers can audit.
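
The arithmetic behind a tagged line could look like the sketch below. Everything here is an assumption for illustration: the Trip fields, the saved_usd helper, and the per-million-token prices are placeholders, not real trip data or any provider's actual pricing.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Trip:
    trip_id: str
    input_tokens: int            # tokens the runaway run would have sent (assumed field)
    output_tokens: int           # tokens it would have generated (assumed field)
    reported_usd: Optional[float] = None  # set only when the customer shared a figure


def saved_usd(trip: Trip, usd_per_mtok_in: float, usd_per_mtok_out: float) -> Tuple[float, str]:
    """Return (dollars, tag) so every line states where its number came from."""
    if trip.reported_usd is not None:
        return trip.reported_usd, "customer-reported"
    estimate = (
        trip.input_tokens * usd_per_mtok_in + trip.output_tokens * usd_per_mtok_out
    ) / 1_000_000
    return estimate, "token-estimate"


# Placeholder inputs and prices, for illustration only.
estimated = saved_usd(Trip("t1", 1_000_000, 500_000), usd_per_mtok_in=2.0, usd_per_mtok_out=8.0)
reported = saved_usd(Trip("t2", 0, 0, reported_usd=12.5), usd_per_mtok_in=2.0, usd_per_mtok_out=8.0)
```

Keeping the tag next to the number means a headline total can be split into its reported and estimated parts when a reader audits it.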

Gated · T+168h

Day 7 — A week-1 retro that names what we got wrong

Three concrete things we'd do differently, with the planned fix for each. One thing we got right and want to keep. Honest about cadence — did the gates hold, did we slip a day, did we publish anyway and now regret it?

Weeks 2–4 — weekly cadence

After day 7, the 30-day promise continues at a weekly cadence. Three stubs are already scaffolded; each gates on real data so the structure matches what we've actually seen rather than what we imagined on day 0.

Gated · T+14d

Week 2 — 14 days of trips, ranked

Day-8 to day-14 in numbers — the second week of catches, framed as its own window rather than a "week-1 vs week-2 growth chart." Trip counts by trigger kind, install velocity, the first new signature week-1 didn't carry.

Gated · T+21d + new signature

Week 3 — A trip pattern we hadn't seen before

A signature that doesn't appear anywhere in the first 14 days of trip rows — shown raw, walked through the detector that caught it, with the customer's surrounding context (anonymized, with consent). One pattern per post.

Gated · T+28d

Week 4 — 30 days in — the kill-criteria check, told straight

The IDENTITY kill-criteria audit, made public. Verdict first; math second; one customer interview as the body; cadence audit of the entire 30-day arc. Publishing the threshold result honestly even if it's "kill" is the trust contract.

30-day soak

First post drops the hour RunGuard's launch channel fires. Stay close — or join the waitlist and we'll email when the SDK ships and the log starts.