Real-time LLM spend alerts setup: thresholds, Slack/PagerDuty integration, and avoiding alert fatigue
The LLM billing dashboard is the slowest possible feedback loop for AI agent cost anomalies. By the time you open it, the expensive session has been over for hours. Real-time LLM spend alerts fire mid-session, while cost is still accumulating, rather than post-session, when the bill has already been generated. That is the difference between catching a runaway agent at $0.80 and finding out about it at $47.00. But most teams that build real-time cost alerting make one of three mistakes: they set thresholds too low and drown in false positives (alert fatigue), set them too high and miss the incidents that matter, or build alerts that fire after session completion instead of during it. This guide walks through the complete setup: threshold calibration, session-level vs. aggregate alerting, Slack and PagerDuty integration, alert routing by severity, and the alert lifecycle practices that keep your alerting system useful rather than ignored.
Understanding alert types: session-level vs. aggregate
- Session-level alerts fire on a single anomalous session, mid-execution. A session-level alert fires when a single session’s running cost exceeds a configured multiple of the expected session cost. The key word is “running cost” — computed during execution, before the session completes. A session-level alert at 3× baseline gives you the opportunity to circuit-break the session before it reaches 10× baseline. Session-level alerts are the most actionable type: they identify a specific session that is actively misbehaving, allow you to intervene (stop the session or redirect it to a cheaper model), and provide a session ID for forensic analysis. They are highest-priority in your alert routing.
- Aggregate alerts fire when total spend rate is anomalous across all sessions. An aggregate alert fires when the total LLM spend rate across all active sessions — measured in cost per minute over a rolling window — exceeds a configured threshold. Aggregate alerts indicate systemic issues: a prompt regression that made all sessions more expensive, a model API change that increased per-token costs, a traffic spike that multiplied session volume. They are lower-urgency than session-level alerts (no single session is necessarily anomalous) but indicate a larger systemic problem that requires investigation. Route aggregate alerts to a dedicated monitoring channel rather than paging the on-call engineer immediately.
- Daily/monthly budget threshold alerts fire on budget progress. A budget threshold alert fires when cumulative LLM spend for a day, week, or month reaches a configured fraction of the budget (e.g., 50%, 80%, 100%). These are not emergency alerts — they are operational awareness alerts. A 50% alert mid-month tells you that you’re tracking to finish the month on budget; an 80% alert at mid-month tells you that you’re tracking to overspend. Route budget threshold alerts to a Slack channel visible to the FinOps owner and engineering lead, but do not page anyone on these alerts. See AI agent cost anomaly detection for the statistical foundations of anomaly-vs-threshold alerting.
- Circuit-breaker trip alerts fire when a budget limit is enforced. A circuit-breaker alert fires when a session is stopped because it exceeded the configured budget ceiling. This is a high-priority alert because it means a real user experienced a degraded or stopped session. Unlike a pure cost anomaly alert, a circuit-breaker trip has a user-experience impact that needs to be tracked. Route circuit-breaker alerts to your customer success and engineering channels. Include the user ID, session ID, and the task type that triggered the trip — a human may need to follow up with the user if the task was business-critical.
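The budget-progress reasoning above (50% at mid-month is on pace, 80% at mid-month is not) comes down to a straight-line projection. The following is a minimal sketch under that assumption; `budgetPace` and `newlyCrossedThresholds` are hypothetical helper names, not part of RunGuard or any billing API, and a real projection may need to account for weekday or seasonal traffic patterns.

```typescript
// Hypothetical helpers for budget-threshold alerts; not part of any SDK.

interface BudgetPace {
  fractionOfBudgetUsed: number;    // 0.8 means 80% of the budget is spent
  fractionOfPeriodElapsed: number; // 0.5 means mid-month
  projectedSpendUsd: number;       // straight-line projection to period end
  onTrack: boolean;                // true if the projection stays within budget
}

function budgetPace(
  spentUsd: number,
  budgetUsd: number,
  daysElapsed: number,
  daysInPeriod: number,
): BudgetPace {
  if (budgetUsd <= 0 || daysElapsed <= 0 || daysElapsed > daysInPeriod) {
    throw new RangeError('budget and elapsed days must be positive and within the period');
  }
  const fractionOfPeriodElapsed = daysElapsed / daysInPeriod;
  // Assumes spend continues at the same average daily rate.
  const projectedSpendUsd = spentUsd / fractionOfPeriodElapsed;
  return {
    fractionOfBudgetUsed: spentUsd / budgetUsd,
    fractionOfPeriodElapsed,
    projectedSpendUsd,
    onTrack: projectedSpendUsd <= budgetUsd,
  };
}

// Returns the budget fractions crossed between two cumulative spend readings,
// so each threshold notifies once rather than on every reading above it.
function newlyCrossedThresholds(
  previousSpendUsd: number,
  currentSpendUsd: number,
  budgetUsd: number,
  thresholds: number[] = [0.5, 0.8, 1.0],
): number[] {
  const before = previousSpendUsd / budgetUsd;
  const after = currentSpendUsd / budgetUsd;
  return thresholds.filter((t) => before < t && after >= t);
}

// 80% of a $1,000 budget spent by day 15 of 30 projects to $1,600.
const midMonth = budgetPace(800, 1000, 15, 30);
console.log(midMonth.onTrack, midMonth.projectedSpendUsd); // false 1600
```

Including the projected figure in the Slack message, rather than just the percentage, lets the FinOps owner see at a glance whether an 80% notification is routine (day 27) or a problem (day 15).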
Threshold calibration: setting thresholds that fire on signal, not noise
- Start with 14 days of production cost data before setting thresholds. Thresholds calibrated without production data will be wrong. Before configuring any alerts, collect 14 days of session cost data in logging mode (no alerts, just recording). Compute the mean, standard deviation, and P95/P99 of session costs for each agent type. Set your session-level anomaly threshold at the P99, which by construction fires on only the most expensive 1% of sessions, rather than at a small fixed cutoff such as 2× mean or mean plus one standard deviation. Session costs are usually right-skewed, so those lower cutoffs fire on a large share of legitimate sessions (even on a normal distribution, about 16% of sessions land above mean plus one standard deviation). On a log-normal session cost distribution, the ratio of P99 to the mean depends on the spread: it is roughly 3–5× the mean when the standard deviation of log-cost is between about 0.55 and 0.85, and higher for more dispersed agents. For most agents, this puts the threshold high enough to avoid false positives from legitimate variation while still catching genuine anomalies.
- Separate thresholds by agent type and task complexity. A single cost threshold across all agents creates systematic false positives for complex-task agents and systematic misses for simple-task agents. A research agent with a $0.80 median session cost has very different anomaly characteristics than a classification agent with a $0.02 median session cost. Configure separate thresholds per agent type: the research agent anomaly threshold might be $4.00 (5× median), while the classification agent threshold might be $0.20 (10× median, reflecting that any significant cost deviation on a simple agent is genuinely anomalous). If you cannot instrument per-agent-type thresholds, segment by task complexity bucket and apply the appropriate threshold.
- Progressive threshold levels reduce alert fatigue. Instead of a single threshold, configure three levels: Warning (3× baseline → Slack notification), Alert (7× baseline → Slack with mention, no page), and Critical (15× baseline → circuit break + PagerDuty page). Most anomalous sessions will fire at Warning or Alert and resolve naturally (the session completes at high cost but within acceptable absolute limits). Only sessions that are genuinely runaway reach Critical and trigger a page. A three-level progressive threshold catches everything while paging only the sessions that require immediate human intervention. Teams that configure a single threshold and set it low will page themselves constantly; teams that set it high will miss real incidents.
- Recalibrate thresholds after product changes. Every significant product change — a new feature launch, a model tier change, a prompt update, a new tool added to an agent — changes the session cost baseline for affected agents. Recalibrate thresholds after every significant change by re-running the P95/P99 analysis on the week of data following the change. Thresholds that are not recalibrated after product changes drift: they either become too sensitive (new feature has higher baseline, old threshold fires constantly) or too permissive (optimization reduced baseline, old threshold allows anomalies through). Schedule threshold review in your post-deployment checklist.
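The calibration steps above can be sketched in a few functions: compute percentiles per agent type from logged session costs, then derive the three progressive levels as multiples of a baseline. This is an illustrative sketch, not RunGuard's implementation. It assumes the median is used as the baseline and uses the nearest-rank percentile method; all function and field names are hypothetical.

```typescript
// Nearest-rank percentile: the smallest sample with at least p% of
// samples at or below it. Other interpolation methods give slightly
// different values on small samples.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new RangeError('no samples');
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

interface SessionThresholds {
  baselineUsd: number; // assumption: median session cost is the baseline
  p95Usd: number;
  p99Usd: number;      // suggested single anomaly threshold
  warningUsd: number;  // 3x baseline
  alertUsd: number;    // 7x baseline
  criticalUsd: number; // 15x baseline
}

function calibrate(sessionCostsUsd: number[]): SessionThresholds {
  const baselineUsd = percentile(sessionCostsUsd, 50);
  return {
    baselineUsd,
    p95Usd: percentile(sessionCostsUsd, 95),
    p99Usd: percentile(sessionCostsUsd, 99),
    warningUsd: 3 * baselineUsd,
    alertUsd: 7 * baselineUsd,
    criticalUsd: 15 * baselineUsd,
  };
}

// One threshold set per agent type, so a research agent and a
// classification agent are never judged against the same numbers.
function calibrateByAgent(
  costsByAgent: Map<string, number[]>,
): Map<string, SessionThresholds> {
  const result = new Map<string, SessionThresholds>();
  for (const [agentType, costs] of costsByAgent) {
    result.set(agentType, calibrate(costs));
  }
  return result;
}

type Severity = 'none' | 'warning' | 'alert' | 'critical';

// Classifies a session's running cost against the progressive levels.
function classify(runningCostUsd: number, t: SessionThresholds): Severity {
  if (runningCostUsd >= t.criticalUsd) return 'critical';
  if (runningCostUsd >= t.alertUsd) return 'alert';
  if (runningCostUsd >= t.warningUsd) return 'warning';
  return 'none';
}
```

Re-running `calibrateByAgent` on the week of data after a deployment is one way to make the recalibration step a routine script rather than a manual analysis.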
Slack integration: alert content that drives action
- Alert message structure for session-level anomalies. A Slack alert that says “Cost anomaly detected” is useless. An actionable alert includes: agent name, session ID (clickable to RunGuard dashboard), user ID or segment, current running cost, baseline expected cost, anomaly multiple, current turn number, current tool call count, and a one-line recommended action (“Consider circuit-breaking this session if turn count continues to increase” or “Large tool result detected — add a tool output size limit”). The alert receiver should be able to triage the situation in 30 seconds without opening any other tool. This level of context in the alert message requires your alerting layer to pull session metadata at alert time, not just the cost number.
- Route alerts to the right Slack channels by severity and type. Configure alert routing so that different alert types go to different channels with different visibility levels. Suggested routing: Warning alerts → #ai-cost-watch (low-traffic monitoring channel, no @mentions), Alert alerts → #ai-cost-incidents (mentions the on-call engineer), Critical alerts + circuit-breaker trips → #incidents (mentions the engineering lead, visible to customer success). Budget threshold alerts (50%/80%) → #finops (visible to FinOps owner and engineering leads). This routing ensures that the right people see the right alerts at the right time, without everyone being notified about every minor cost variation.
- Thread replies for alert resolution tracking. When a session-level anomaly alert fires and an engineer investigates, they should reply to the alert thread in Slack with a resolution note: what the root cause was, what action was taken (session stopped, prompt fixed, threshold adjusted), and whether a ticket was created for a permanent fix. This thread becomes the audit log for the anomaly event. Without this practice, anomaly events disappear into the Slack noise and the same root cause fires alerts repeatedly without being permanently resolved. A 60-day search of the #ai-cost-incidents channel should show every anomaly event and its resolution.
- Deduplicate alerts for sustained anomalies. A single runaway session that takes 20 minutes to complete should fire one alert, not one alert every 30 seconds as the cost accumulates. Implement alert deduplication: once an alert fires for a specific session ID, suppress subsequent alerts for the same session for 5 minutes. If the session cost crosses the next threshold level after 5 minutes (Warning escalates to Alert), send a new alert noting the escalation. Without deduplication, a high-cost session floods the alert channel and the team desensitizes to the alerts — the definition of alert fatigue.
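The deduplication rule above can be sketched as a small in-memory tracker keyed by session ID. This is one possible reading of the rule: the first alert for a session is sent, everything inside the 5-minute window is suppressed, and after the window only a move to a higher level produces a new message. The class and method names are hypothetical, and a multi-process deployment would need shared storage instead of a `Map`.

```typescript
type Level = 'warning' | 'alert' | 'critical';
type Decision = 'send' | 'escalate' | 'suppress';

const LEVEL_RANK: Record<Level, number> = { warning: 1, alert: 2, critical: 3 };

interface SentRecord {
  level: Level;
  sentAtMs: number;
}

class AlertDeduplicator {
  private readonly suppressMs: number;
  private readonly lastSent = new Map<string, SentRecord>();

  constructor(suppressMinutes = 5) {
    this.suppressMs = suppressMinutes * 60_000;
  }

  // Decides what to do with an alert for a session at a given time.
  decide(sessionId: string, level: Level, nowMs: number): Decision {
    const prev = this.lastSent.get(sessionId);
    if (prev === undefined) {
      this.lastSent.set(sessionId, { level, sentAtMs: nowMs });
      return 'send';
    }
    const windowOver = nowMs - prev.sentAtMs >= this.suppressMs;
    const escalated = LEVEL_RANK[level] > LEVEL_RANK[prev.level];
    if (windowOver && escalated) {
      this.lastSent.set(sessionId, { level, sentAtMs: nowMs });
      return 'escalate';
    }
    // Same or lower level, or still inside the suppression window.
    return 'suppress';
  }

  // Call when a session ends so the map does not grow without bound.
  forget(sessionId: string): void {
    this.lastSent.delete(sessionId);
  }
}
```

A team that treats Critical as too urgent to hold back might let that level bypass the window; that is a policy choice layered on top of this sketch, not something the rule above requires.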
PagerDuty integration: when to page and when not to
- Page only on circuit-breaker trips and Critical-level anomalies. PagerDuty pages interrupt real humans at real cost to their attention and well-being. Reserve PagerDuty for two event types: (1) circuit-breaker trips on business-critical agents (the agent that processes customer orders, handles support tickets, or runs production workflows — not experimental or internal tools), and (2) Critical-level anomalies that are still accumulating (session cost is 15× baseline and rising, no circuit breaker has tripped). All other cost events should route to Slack. An on-call rotation that is paged for every Warning-level cost anomaly will develop alert fatigue within two weeks and start ignoring alerts — including the critical ones.
- On-call runbook for cost anomaly response. Every PagerDuty alert should link to a cost anomaly runbook with a defined response procedure: (1) open the session in the RunGuard dashboard, (2) check the session’s tool call count and per-turn cost trend, (3) if the session is still active and cost is rising, trigger manual circuit break via the dashboard, (4) if the session has completed, identify the root cause from the cost trace, (5) file a ticket for a permanent fix if the root cause is a code issue. The runbook converts a stressful 3 AM alert into a structured 10-minute triage process. See AI agent cost incident runbook for a complete runbook template.
- Measure mean time to acknowledge and resolve cost anomalies. Track MTTA (mean time to acknowledge) and MTTR (mean time to resolve) for cost anomaly alerts the same way you track these metrics for availability incidents. Because cost accumulates until someone intervenes, response time translates directly into spend. Compare an anomaly acknowledged in 5 minutes and resolved in 20 with one acknowledged in 20 minutes and resolved in 60: at a steady burn rate, the faster response incurs one-quarter of the cost if the session is stopped at acknowledgment (5 vs. 20 minutes) and one-third if it is only stopped at resolution (20 vs. 60 minutes). If your MTTA for cost alerts is consistently over 15 minutes during business hours, your alert routing needs adjustment: the right person isn't seeing the alert quickly enough.
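The paging rule and the MTTA/MTTR tracking described above can be sketched as two small functions. The channel names come from the routing suggested earlier in this guide; the event shape, the function names, and the choice to measure both metrics from the moment the alert fired are assumptions made for illustration.

```typescript
type CostEventKind =
  | 'warning'
  | 'alert'
  | 'critical'
  | 'circuit_breaker_trip'
  | 'budget_threshold';

interface CostEvent {
  kind: CostEventKind;
  businessCriticalAgent: boolean; // e.g. order processing, support tickets
  stillAccumulating: boolean;     // session is active and cost is rising
}

interface Route {
  slackChannel: string;
  page: boolean; // true means also open a PagerDuty incident
}

function routeCostEvent(e: CostEvent): Route {
  switch (e.kind) {
    case 'warning':
      return { slackChannel: '#ai-cost-watch', page: false };
    case 'alert':
      return { slackChannel: '#ai-cost-incidents', page: false };
    case 'budget_threshold':
      return { slackChannel: '#finops', page: false };
    case 'critical':
      // Page only while the session is still burning money.
      return { slackChannel: '#incidents', page: e.stillAccumulating };
    case 'circuit_breaker_trip':
      // Page only when a business-critical agent was stopped.
      return { slackChannel: '#incidents', page: e.businessCriticalAgent };
  }
}

interface ResolvedIncident {
  firedAtMs: number;
  acknowledgedAtMs: number;
  resolvedAtMs: number;
}

function meanMinutes(durationsMs: number[]): number {
  if (durationsMs.length === 0) throw new RangeError('no incidents');
  const totalMs = durationsMs.reduce((sum, d) => sum + d, 0);
  return totalMs / durationsMs.length / 60_000;
}

// Both metrics are measured from the time the alert fired.
function responseMetrics(incidents: ResolvedIncident[]) {
  return {
    mttaMinutes: meanMinutes(incidents.map((i) => i.acknowledgedAtMs - i.firedAtMs)),
    mttrMinutes: meanMinutes(incidents.map((i) => i.resolvedAtMs - i.firedAtMs)),
  };
}
```

Keeping the page decision in one pure function makes it easy to review in a pull request and to unit-test, which matters because a careless change here is what turns a quiet rotation into a noisy one.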
Setting up real-time spend alerts with RunGuard
- One-line alert configuration for Slack and PagerDuty. RunGuard’s BudgetTracker fires alert callbacks at configurable cost thresholds. Wire the callbacks to your Slack and PagerDuty integrations at initialization time:
```typescript
import { RunGuard } from '@runguard/sdk';
import { WebClient } from '@slack/web-api';
// pdjs exposes the PagerDuty Events API v2 through its `event` helper
import { event } from '@pagerduty/pdjs';

const slack = new WebClient(process.env.SLACK_BOT_TOKEN);

const guard = new RunGuard({
  appName: 'customer-support-agent',
  budget: {
    maxCostPerSessionUsd: 2.00,  // hard ceiling: the circuit breaker trips here
    warningThresholdUsd: 0.60,   // 3x baseline
    alertThresholdUsd: 1.40,     // 7x baseline

    // Warning: low-traffic monitoring channel
    onWarning: async (ctx) => {
      await slack.chat.postMessage({
        channel: '#ai-cost-watch',
        text: `[WARNING] ${ctx.appName} session ${ctx.sessionId} at $${ctx.currentCostUsd} (${ctx.multipleOfBaseline}x baseline), turn ${ctx.turnCount}`
      });
    },

    // Alert: incident channel, includes the session ID for triage
    onAlert: async (ctx) => {
      await slack.chat.postMessage({
        channel: '#ai-cost-incidents',
        text: `[ALERT] ${ctx.appName} session ${ctx.sessionId} at $${ctx.currentCostUsd} (${ctx.multipleOfBaseline}x baseline)`
      });
    },

    // Circuit break: trigger a PagerDuty incident via the Events API v2
    onBudgetExceeded: async (ctx) => {
      await event({
        data: {
          routing_key: process.env.PD_ROUTING_KEY!,
          event_action: 'trigger',
          dedup_key: ctx.sessionId, // one incident per session
          payload: {
            summary: `Circuit break: ${ctx.appName} session ${ctx.sessionId} at $${ctx.currentCostUsd}`,
            severity: 'critical',
            source: 'runguard'
          }
        }
      });
    }
  }
});
```
- Mid-session cost tracking for pre-completion alerts. RunGuard tracks running session cost after every LLM call and evaluates thresholds immediately. This means a Warning alert fires when the session reaches $0.60, not when it completes. The session may still be at turn 8 of 25, with $1.40 of cost still to accumulate before it hits the $2.00 ceiling. Mid-session alerting gives you the window to intervene before the full cost is incurred. By contrast, provider-level billing alerts fire on your monthly accumulated cost, hours or days after the expensive session has already completed.
- Aggregate spend rate monitoring. RunGuard’s aggregate monitoring tracks total cost per minute across all active sessions and fires an aggregate alert when the rate exceeds a configured threshold. This catches systemic issues (all sessions suddenly costing more) that session-level alerts miss (each individual session may be below the anomaly threshold). Configure aggregate alerts with a 5-minute rolling window and a 2× baseline rate threshold. See prevent AI agent runaway cost in real time for the complete real-time monitoring architecture.
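To make the aggregate rule concrete, here is a minimal sketch of a rolling-window spend-rate check with a 5-minute window and a 2× baseline threshold. It is not RunGuard's implementation; `SpendRateMonitor` and its methods are hypothetical names, and the sketch assumes cost samples are recorded in time order within a single process.

```typescript
interface CostSample {
  costUsd: number;
  atMs: number;
}

class SpendRateMonitor {
  private readonly windowMs: number;
  private readonly thresholdUsdPerMin: number;
  private samples: CostSample[] = [];

  constructor(baselineUsdPerMin: number, multiple = 2, windowMinutes = 5) {
    this.windowMs = windowMinutes * 60_000;
    this.thresholdUsdPerMin = baselineUsdPerMin * multiple;
  }

  // Record the cost of one LLM call from any active session.
  record(costUsd: number, atMs: number): void {
    this.samples.push({ costUsd, atMs });
  }

  // Average spend per minute over the trailing window ending at nowMs.
  ratePerMinute(nowMs: number): number {
    const cutoff = nowMs - this.windowMs;
    // Drop samples that have aged out of the window.
    while (this.samples.length > 0 && this.samples[0].atMs <= cutoff) {
      this.samples.shift();
    }
    const totalUsd = this.samples.reduce((sum, s) => sum + s.costUsd, 0);
    return totalUsd / (this.windowMs / 60_000);
  }

  isAnomalous(nowMs: number): boolean {
    return this.ratePerMinute(nowMs) > this.thresholdUsdPerMin;
  }
}
```

Dividing by the full window length (rather than the time since the first sample) means the rate ramps up gradually after a cold start, which avoids a spurious aggregate alert in the first minute after a deploy.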
Alerts that fire after the damage is done aren’t alerts. They’re receipts.
RunGuard fires cost alerts mid-session, while cost is still accumulating, giving you the window to intervene before a runaway agent becomes an expensive lesson. Set up in minutes. Start catching anomalies the same day.