Slack AI & Slack Workflow Builder Cost Control: Event Fan-Out, Bot Self-Loop, Timeout Retry Duplication, and Workflow Storm
Slack's event subscription model delivers one HTTP POST to your endpoint for every matching workspace event. In a workspace with a few hundred members in active channels, that is thousands of events per day. Wire an LLM API call to that event handler without any rate limiting and you have converted your Slack workspace's natural activity into an unthrottled API billing multiplier.
The four failure modes that make Slack particularly dangerous for AI-powered apps share one root: Slack's delivery guarantees are designed around stateless, fast HTTP handlers. The platform retries on timeout, delivers to all matching subscribers, and has no built-in concept of "this event already triggered an AI call." Every one of those assumptions becomes a cost trap when the handler does expensive async work.
Four failure modes make Slack particularly tricky for AI workloads:
- Message event fan-out: a Slack app subscribed to message events receives one event per message across all channels it has access to; in an active workspace with 1,000 messages per day, that is 1,000 potential LLM API calls before any filtering or budgeting is applied.
- Bot self-loop from missing subtype filter: a bot that posts a message to a channel it is subscribed to will receive its own message as a new event; without a subtype === "bot_message" or bot_id check, the bot enters an unbounded loop of reading and responding to its own outputs, making one LLM API call per loop iteration.
- 3-second timeout retry duplication: Slack requires event endpoints to acknowledge with HTTP 200 within 3 seconds; LLM API calls commonly take 4–15 seconds for a substantive response; Slack retries unacknowledged deliveries up to three times, so without idempotency protection a single original event can trigger up to four LLM calls.
- Workflow Builder AI step thundering herd: Slack Workflow Builder executes workflow steps concurrently for simultaneous triggers; when multiple workflow executions all call the same external LLM API endpoint and hit the same rate limit simultaneously, all retry at the configured delay in synchronized waves, multiplying total call volume by the retry count for each concurrent execution.
Failure Mode 1 — Message Event Fan-Out
Slack's Events API delivers a message event to your endpoint every time a message is posted in any channel your app has joined. A Slack app with the channels:history or groups:history scope and a subscription to message events will receive every message in every public or private channel it has access to, not just messages that mention the bot or match a keyword.
Teams building AI-powered Slack bots commonly subscribe to message events intending to process only certain messages — messages with a specific keyword, messages that mention the bot, messages in a specific channel. The filtering logic lives in the event handler. The LLM API call is inside an if block. The pattern feels safe because the filter runs first.
The failure mode is not in the filter — it is in the event volume. A workspace with 200 active members across 30 channels generates 2,000–5,000 messages per day on a typical workday. Even if your filter matches only 10% of messages, that is 200–500 LLM API calls per day, 6,000–15,000 per month, before any runaway scenario occurs. If a high-traffic channel gets added to the app's channel list, or if a bot in another workspace is misconfigured and floods your shared DM channel, the filter match rate can spike suddenly.
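The arithmetic above can be reproduced with a few lines. This is a minimal sketch, assuming a flat filter match rate and a 30-day month; the function name `estimate_llm_calls` is hypothetical and exists only for illustration.

```python
def estimate_llm_calls(messages_per_day: int, filter_match_rate: float,
                       days_per_month: int = 30) -> dict:
    """Estimate LLM call volume implied by message volume and a filter match rate."""
    per_day = round(messages_per_day * filter_match_rate)
    return {"per_day": per_day, "per_month": per_day * days_per_month}


# The two ends of the 2,000-5,000 messages/day range at a 10% match rate
low = estimate_llm_calls(2_000, 0.10)
high = estimate_llm_calls(5_000, 0.10)
print(low)
print(high)
```

Plugging in a match rate measured from your own event logs, rather than the assumed 10%, shows how quickly a change in channel membership moves the monthly figure.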
The fan-out rule: LLM calls from a Slack event subscription scale with workspace message volume, not with your intended usage. Measure actual event delivery rate per channel before assuming your filter is sufficient — and cap LLM calls per channel per time window at the handler level, before the filter, so an event volume spike cannot exhaust your API budget regardless of filter behavior.
The guard pattern for message event fan-out is a per-channel sliding window rate limiter applied at the handler entry point, before any filtering or LLM call. The limiter tracks how many LLM calls have been made for each channel in the current time window and short-circuits with a cached or empty response when the ceiling is reached:
import time
import sqlite3
import threading
from flask import Flask, request, jsonify

app = Flask(__name__)
db_lock = threading.Lock()
DB_PATH = "slack_event_guard.db"

# Ceiling: max LLM calls per channel per sliding window
MAX_CALLS_PER_CHANNEL = 20
WINDOW_SECONDS = 3600  # 1 hour


def init_db():
    with sqlite3.connect(DB_PATH) as conn:
        conn.execute("""
            CREATE TABLE IF NOT EXISTS channel_calls (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                channel_id TEXT NOT NULL,
                event_ts TEXT,
                called_at REAL
            )
        """)
        conn.execute(
            "CREATE INDEX IF NOT EXISTS idx_channel_called "
            "ON channel_calls (channel_id, called_at)"
        )


class SlackEventGuard:
    """
    Sliding window rate limiter for LLM calls triggered by Slack message events.
    Call check() before every LLM invocation inside a Slack event handler.
    Returns allow=True when under ceiling, allow=False when ceiling is reached.
    """

    @staticmethod
    def check(channel_id: str, event_ts: str = "") -> dict:
        now = time.time()
        window_start = now - WINDOW_SECONDS
        with db_lock:
            with sqlite3.connect(DB_PATH) as conn:
                count = conn.execute(
                    "SELECT COUNT(*) FROM channel_calls "
                    "WHERE channel_id = ? AND called_at > ?",
                    (channel_id, window_start)
                ).fetchone()[0]
                if count >= MAX_CALLS_PER_CHANNEL:
                    # Oldest call still inside the window decides when a slot frees up
                    oldest = conn.execute(
                        "SELECT called_at FROM channel_calls "
                        "WHERE channel_id = ? AND called_at > ? "
                        "ORDER BY called_at ASC LIMIT 1",
                        (channel_id, window_start)
                    ).fetchone()
                    retry_after = int(WINDOW_SECONDS - (now - oldest[0])) if oldest else WINDOW_SECONDS
                    return {
                        "allow": False,
                        "reason": "channel_llm_ceiling",
                        "channel_id": channel_id,
                        "count": count,
                        "ceiling": MAX_CALLS_PER_CHANNEL,
                        "retry_after_seconds": retry_after,
                        "message": (
                            f"LLM call ceiling reached for channel {channel_id!r}: "
                            f"{count} calls in the last hour (ceiling: {MAX_CALLS_PER_CHANNEL}). "
                            "Returning empty/cached response. "
                            "Check your event subscription scope: are you subscribed to more "
                            "channels than intended?"
                        ),
                    }
                # Under the ceiling: record this call so it counts toward the window
                conn.execute(
                    "INSERT INTO channel_calls (channel_id, event_ts, called_at) "
                    "VALUES (?, ?, ?)",
                    (channel_id, event_ts, now)
                )
        return {
            "allow": True,
            "channel_id": channel_id,
            "count": count + 1,
            "ceiling": MAX_CALLS_PER_CHANNEL,
        }


@app.route("/slack/events", methods=["POST"])
def slack_events():
    payload = request.get_json(force=True)
    # Slack URL verification challenge
    if payload.get("type") == "url_verification":
        return jsonify({"challenge": payload["challenge"]})
    event = payload.get("event", {})
    channel_id = event.get("channel", "")
    event_ts = event.get("ts", "")
    # Guard: check channel LLM ceiling before any processing
    guard_result = SlackEventGuard.check(channel_id, event_ts)
    if not guard_result["allow"]:
        return jsonify({"ok": True, "guarded": True}), 200
    # Your LLM call goes here
    # response = call_llm(event["text"])
    # post_slack_message(channel_id, response)
    return jsonify({"ok": True}), 200


if __name__ == "__main__":
    init_db()
    app.run(port=8084)
The ceiling of 20 LLM calls per channel per hour is a starting point, not a hard recommendation. Measure your actual intended call rate first: if your bot is designed to respond to every question in a dedicated #ask-ai channel that receives 5 questions per hour, set the ceiling at 10 (2× headroom) rather than 20. If the ceiling is hit on a normal workday, that is a signal that your subscription scope is wider than intended, not that the ceiling is too low.
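One way to derive a ceiling from measurements rather than a round number is to bucket logged events by channel and hour and apply the 2× headroom to the busiest hour. The sketch below assumes you have already logged `(channel_id, unix_timestamp)` pairs for the messages the bot is meant to answer; `suggest_ceilings` is a hypothetical helper, and it uses fixed clock-hour buckets, which only approximates the sliding window used by the guard.

```python
import math
from collections import Counter


def suggest_ceilings(events, headroom: float = 2.0) -> dict:
    """Map each channel to ceil(headroom * busiest-hour count).

    events: iterable of (channel_id, unix_ts) pairs for intended LLM triggers.
    """
    # Count events per (channel, clock hour) bucket
    per_hour = Counter((channel, int(ts // 3600)) for channel, ts in events)
    peak = {}
    for (channel, _hour), n in per_hour.items():
        peak[channel] = max(peak.get(channel, 0), n)
    return {channel: math.ceil(n * headroom) for channel, n in peak.items()}


# Illustrative data: 5 questions in one hour in a dedicated channel
events = [("C_ASK_AI", 3600 * 10 + i * 600) for i in range(5)]
events += [("C_ASK_AI", 3600 * 11 + 60), ("C_RANDOM", 3600 * 10 + 5)]
ceilings = suggest_ceilings(events)
print(ceilings)
```

With five intended calls in the busiest hour, the suggested ceiling for that channel is 10, matching the 2× headroom described above.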
Failure Mode 2 — Bot Self-Loop from Missing Subtype Filter
Slack sends a message event for every message posted in a channel, including messages posted by bots. If your bot is subscribed to message events in a channel and your bot posts a message into that channel, Slack delivers a message event for the bot's own message to the same event endpoint.
The event payload for a bot-posted message carries a bot_id field and, for some integrations, subtype: "bot_message"; messages your app posts with its bot token also carry the bot's user ID in the user field. A handler that checks for these fields and returns early will never process its own messages. A handler that does not check will call the LLM on the bot's response, produce another response, post it to the channel, receive another event, and repeat until the workspace rate limits the bot's posting rate or the LLM API budget is exhausted.
The loop velocity depends on how fast the LLM responds. A short chat completion with GPT-4o mini can return in under 1 second. A bot loop at 1 second per iteration generates 3,600 LLM calls per hour, 86,400 per day. At $0.15 per million input tokens and 500 input tokens per call, that is 43.2 million tokens and about $6.48 per day in input tokens alone from a single unguarded bot in a single channel, before output tokens are counted. With a larger model or a larger context, the cost scales proportionally.
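The loop cost is easy to check by hand or in code. The sketch below counts input tokens only and assumes a constant iteration time; `loop_cost_per_day` is a hypothetical helper, and real loops usually cost more because each reply adds output tokens and, in many bots, a growing conversation context.

```python
def loop_cost_per_day(seconds_per_iteration: float,
                      input_tokens_per_call: int,
                      usd_per_million_input_tokens: float) -> tuple:
    """Return (calls_per_day, input_token_cost_usd) for an unbroken self-loop."""
    calls_per_day = int(86_400 / seconds_per_iteration)
    tokens_per_day = calls_per_day * input_tokens_per_call
    cost = tokens_per_day * usd_per_million_input_tokens / 1_000_000
    return calls_per_day, cost


calls, usd = loop_cost_per_day(1.0, 500, 0.15)
print(calls, round(usd, 2))
```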
The self-loop rule: Every Slack event handler that posts messages must filter out subtype === "bot_message" events and also skip events that carry a bot_id or whose user field matches the app's own bot user ID. Both checks are required: the subtype field is absent from some bot-posted messages, and the bot_id check catches cases where a workspace has multiple bot apps and the handler is only meant to respond to human messages.
The guard combines a stateless message filter with a stateful loop detector that trips when the same bot posts to the same channel more than N times within a short window — catching cases where the subtype filter passes correctly but the bot is still looping due to another trigger condition:
import time
import sqlite3
import threading
import os

db_lock = threading.Lock()
DB_PATH = "slack_bot_loop.db"

# Your bot's user ID from the Slack app settings
BOT_USER_ID = os.environ.get("SLACK_BOT_USER_ID", "")

# Loop detector: block if bot posts more than N times to same channel
# within LOOP_WINDOW_SECONDS, which indicates a self-response loop
LOOP_POST_CEILING = 5
LOOP_WINDOW_SECONDS = 60


def init_db():
    with sqlite3.connect(DB_PATH) as conn:
        conn.execute("""
            CREATE TABLE IF NOT EXISTS bot_posts (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                channel_id TEXT NOT NULL,
                posted_at REAL
            )
        """)


class SlackBotLoopGuard:
    @staticmethod
    def is_self_message(event: dict) -> bool:
        """
        Returns True for any bot-authored message (this bot's own output
        included), so the handler only ever processes human messages.
        """
        if event.get("subtype") == "bot_message":
            return True
        if event.get("bot_id"):
            return True
        # Only compare user IDs when BOT_USER_ID is actually configured
        if BOT_USER_ID and event.get("user") == BOT_USER_ID:
            return True
        return False

    @staticmethod
    def check_post_ceiling(channel_id: str) -> dict:
        """
        Call before posting a bot message. Returns allow=False if the bot
        has already posted LOOP_POST_CEILING times in the last LOOP_WINDOW_SECONDS,
        which indicates a self-response loop rather than legitimate activity.
        """
        now = time.time()
        window_start = now - LOOP_WINDOW_SECONDS
        with db_lock:
            with sqlite3.connect(DB_PATH) as conn:
                count = conn.execute(
                    "SELECT COUNT(*) FROM bot_posts "
                    "WHERE channel_id = ? AND posted_at > ?",
                    (channel_id, window_start)
                ).fetchone()[0]
                if count >= LOOP_POST_CEILING:
                    return {
                        "allow": False,
                        "reason": "bot_post_loop_detected",
                        "channel_id": channel_id,
                        "posts_in_window": count,
                        "ceiling": LOOP_POST_CEILING,
                        "window_seconds": LOOP_WINDOW_SECONDS,
                        "message": (
                            f"Bot loop detected in {channel_id!r}: "
                            f"{count} bot posts in {LOOP_WINDOW_SECONDS}s "
                            f"(ceiling: {LOOP_POST_CEILING}). "
                            "Check is_self_message() filter in your event handler. "
                            "Blocking all bot posts to this channel for the remainder "
                            "of the current window to stop cost accumulation."
                        ),
                    }
                # Under the ceiling: record this post so it counts toward the window
                conn.execute(
                    "INSERT INTO bot_posts (channel_id, posted_at) VALUES (?, ?)",
                    (channel_id, now)
                )
        return {"allow": True, "channel_id": channel_id, "posts_in_window": count + 1}


# Usage in event handler:
#
# event = payload["event"]
# if SlackBotLoopGuard.is_self_message(event):
#     return jsonify({"ok": True}), 200
#
# # ... process and call LLM ...
#
# post_check = SlackBotLoopGuard.check_post_ceiling(channel_id)
# if not post_check["allow"]:
#     logger.error(post_check["message"])
#     return jsonify({"ok": True}), 200
#
# post_to_slack(channel_id, llm_response)
The post ceiling is the belt-and-suspenders layer. The is_self_message() filter should prevent the bot from ever processing its own output. The post ceiling catches cases the filter misses — for example, a slash command handler that triggers on every message and posts a response regardless of who sent the message, creating a loop that does not involve bot_message subtypes at all.
Failure Mode 3 — 3-Second Timeout Retry Duplication
Slack's Events API requires your endpoint to respond with HTTP 200 within 3 seconds of receiving an event. If the endpoint does not respond within 3 seconds, Slack marks the delivery as failed and retries. Slack retries event delivery up to three additional times with backoff: the first retry is sent almost immediately, the second after about 1 minute, and the third after about 5 minutes.
LLM API calls routinely take longer than 3 seconds. A GPT-4o call with a 1,000-token context and 500-token output typically takes 2–5 seconds at low load and 8–20 seconds at high load. A Claude Sonnet call with tool use can take 10–30 seconds. An agentic loop that calls the LLM multiple times before responding can take minutes.
The naive implementation (receive Slack event, call LLM, post response, return 200) will frequently time out under real load. When it times out, Slack retries the same event. If your handler is stateless (no idempotency check on the event ID), each retry fires another LLM call from the same original event. If all three retries arrive, you pay for four LLM calls from one user message, and unless posting is deduplicated as well, the channel may also receive duplicate replies.
At scale, 3× multiplication of LLM costs from retry duplication is often the single largest unexpected line item in a Slack bot's API bill. A bot handling 200 events per day with a 30% timeout rate and Slack's 3-retry policy generates 200 + 200×0.3×3 = 380 LLM calls per day from 200 real user interactions — nearly 2× the expected cost without a single bug in the AI logic.
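The same estimate as a small function, for trying your own timeout rate. This is a worst-case sketch that assumes every timed-out delivery is retried the full number of times and that each retry triggers its own LLM call; `expected_llm_calls` is a hypothetical name.

```python
def expected_llm_calls(events_per_day: int, timeout_rate: float,
                       max_retries: int = 3) -> int:
    """Worst-case daily LLM calls when timed-out events are retried without dedup."""
    return round(events_per_day * (1 + timeout_rate * max_retries))


calls = expected_llm_calls(200, 0.30)
multiplier = calls / 200
print(calls, multiplier)
```

Setting `timeout_rate` to 1.0 gives the upper bound of four calls per event, which is what a handler that always blocks on the LLM should expect.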
The timeout rule: Acknowledge Slack events immediately (within 200ms) and enqueue the AI work asynchronously. Never do blocking LLM work in the synchronous Slack event handler. Use the event's event_id field as an idempotency key — Slack sends the same event_id on every retry of the same delivery, so a deduplication check on event_id at queue enqueue time will filter all retries to a no-op.
The pattern is a two-part handler: a synchronous acknowledgment layer that returns 200 immediately after deduplicating on event_id, and an async worker that does the actual LLM call. The deduplication check is the critical guard — without it, the async worker would receive a job for each retry delivery:
import time
import sqlite3
import threading
from flask import Flask, request, jsonify

app = Flask(__name__)
db_lock = threading.Lock()
DB_PATH = "slack_event_idempotency.db"

# How long to retain event_id records. Slack's retries for one event finish
# within minutes, so 2 hours is a deliberately conservative window.
DEDUP_WINDOW_SECONDS = 7200


def init_db():
    with sqlite3.connect(DB_PATH) as conn:
        conn.execute("""
            CREATE TABLE IF NOT EXISTS processed_events (
                event_id TEXT PRIMARY KEY,
                channel_id TEXT,
                received_at REAL
            )
        """)


class SlackEventIdempotency:
    @staticmethod
    def is_duplicate(event_id: str, channel_id: str = "") -> bool:
        """
        Returns True if this event_id was already processed (or enqueued).
        On first call for a given event_id, records it and returns False.
        All subsequent calls (Slack retries) return True.
        """
        now = time.time()
        window_start = now - DEDUP_WINDOW_SECONDS
        with db_lock:
            with sqlite3.connect(DB_PATH) as conn:
                # Expire old records to prevent unbounded growth
                conn.execute(
                    "DELETE FROM processed_events WHERE received_at < ?",
                    (window_start,)
                )
                existing = conn.execute(
                    "SELECT event_id FROM processed_events WHERE event_id = ?",
                    (event_id,)
                ).fetchone()
                if existing:
                    return True  # duplicate: this is a Slack retry delivery
                conn.execute(
                    "INSERT INTO processed_events (event_id, channel_id, received_at) "
                    "VALUES (?, ?, ?)",
                    (event_id, channel_id, now)
                )
        return False  # first delivery: safe to process


def process_event_async(event: dict, event_id: str):
    """Worker that runs in a background thread so the handler can return 200 right away."""
    channel_id = event.get("channel", "")
    text = event.get("text", "")
    # Your LLM call goes here; it runs outside the request/response cycle
    # response = call_llm(text)
    # post_to_slack(channel_id, response)
    pass


@app.route("/slack/events", methods=["POST"])
def slack_events():
    payload = request.get_json(force=True)
    if payload.get("type") == "url_verification":
        return jsonify({"challenge": payload["challenge"]})
    event_id = payload.get("event_id", "")
    event = payload.get("event", {})
    channel_id = event.get("channel", "")
    # Dedup check: if this is a Slack retry, ack and return immediately
    if SlackEventIdempotency.is_duplicate(event_id, channel_id):
        return jsonify({"ok": True, "duplicate": True}), 200
    # First delivery: hand the LLM work to a background thread so the 200
    # goes out well inside Slack's 3-second window
    t = threading.Thread(
        target=process_event_async,
        args=(event, event_id),
        daemon=True
    )
    t.start()
    # Return 200 without waiting for the LLM call
    return jsonify({"ok": True}), 200


if __name__ == "__main__":
    init_db()
    app.run(port=8085)
In production, replace the threading.Thread with a proper task queue (Celery, RQ, or a simple database-backed queue) so that async workers survive process restarts. The threading approach is sufficient for low-volume bots and makes the idempotency pattern easy to see in isolation. The key invariant is that SlackEventIdempotency.is_duplicate() must run synchronously, before returning 200 and before starting the async worker — otherwise the race condition between the check and the thread start can allow duplicate processing on rapid retry delivery.
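Slack also marks retried deliveries with the `X-Slack-Retry-Num` and `X-Slack-Retry-Reason` request headers, which makes retries easy to spot in logs. The sketch below parses those headers from a plain dict; `classify_delivery` is a hypothetical helper, and it is meant as a logging aid alongside event_id deduplication, not a replacement for it.

```python
def classify_delivery(headers: dict) -> dict:
    """Summarize Slack's retry headers for one Events API request."""
    # Header names are case-insensitive, so normalize before lookup
    lowered = {name.lower(): value for name, value in headers.items()}
    retry_num = lowered.get("x-slack-retry-num")
    return {
        "is_retry": retry_num is not None,
        "retry_num": int(retry_num) if retry_num is not None else 0,
        "retry_reason": lowered.get("x-slack-retry-reason", ""),
    }


first = classify_delivery({"Content-Type": "application/json"})
retry = classify_delivery({
    "X-Slack-Retry-Num": "2",
    "X-Slack-Retry-Reason": "http_timeout",
})
print(first)
print(retry)
```

In a Flask handler, `dict(request.headers)` could be passed in directly. A steady stream of `http_timeout` reasons in the logs is a sign that the handler is still doing slow work before acknowledging.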
Failure Mode 4 — Workflow Builder AI Step Thundering Herd
Slack Workflow Builder (the native automation platform built into Slack) supports custom function steps written in Python or TypeScript using the Slack SDK. These steps can call external LLM APIs — generating text, classifying messages, extracting structured data from Slack messages or channel content. Workflow Builder triggers workflows automatically on Slack events: a new message posted, a reaction added, a shortcut used, a schedule firing.
When multiple events arrive simultaneously (a large message blast to a channel, a scheduled trigger firing for hundreds of users at once, or a reaction added to a popular message that fans out to many subscribers), Workflow Builder starts multiple workflow executions concurrently. Each execution runs the custom AI step, which calls the same LLM API endpoint. The endpoint's per-minute rate limit applies across all concurrent calls in aggregate.
If 30 workflow executions all call the OpenAI API simultaneously and all hit the per-minute token rate limit, all 30 receive a 429 response. Workflow Builder retries failed steps. All 30 retry at the configured delay. At the retry moment, all 30 again call the API simultaneously and again hit the rate limit. The thundering herd pattern means each retry wave is as large as the original burst: N concurrent executions with R retries each produce up to N × (R + 1) total API calls, and at most the few calls that fit under the rate limit in each wave actually succeed.
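A toy simulation makes the wave arithmetic concrete. It is a simplified model under stated assumptions: every pending execution retries at the same instant, and the provider admits a fixed number of calls per wave (`capacity_per_wave`, a made-up parameter standing in for the real rate limit). `simulate_retry_waves` is a hypothetical name.

```python
def simulate_retry_waves(executions: int, retries: int,
                         capacity_per_wave: int) -> dict:
    """Count API calls when all failed executions retry in synchronized waves."""
    pending = executions
    total_calls = 0
    succeeded = 0
    for _wave in range(retries + 1):  # the original burst plus each retry wave
        if pending == 0:
            break
        total_calls += pending                      # every pending execution calls again
        admitted = min(pending, capacity_per_wave)  # only these get past the rate limit
        succeeded += admitted
        pending -= admitted
    return {"total_calls": total_calls, "succeeded": succeeded, "failed": pending}


partial = simulate_retry_waves(executions=30, retries=3, capacity_per_wave=5)
worst = simulate_retry_waves(executions=30, retries=3, capacity_per_wave=0)
print(partial)
print(worst)
```

With no capacity at all, the total reaches the N × (R + 1) upper bound of 120 calls; even with some capacity, most of the call volume comes from retries rather than from useful work.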
The thundering herd rule: Workflow Builder's concurrency is governed by Slack's platform, not your code. You cannot lower the concurrency limit from within the step function. The correct intervention is a circuit breaker webhook that all concurrent step executions check before calling the LLM — when the breaker is open, steps return a degraded result immediately rather than hitting the rate-limited endpoint and triggering another retry wave.
The circuit breaker for Workflow Builder AI steps is an external webhook endpoint that step functions call before invoking the LLM. When consecutive 429 responses exceed the threshold, the breaker opens and all step executions return a cached or empty result for the open window:
import time
import sqlite3
import threading
from flask import Flask, request, jsonify

app = Flask(__name__)
db_lock = threading.Lock()
DB_PATH = "slack_workflow_breaker.db"

BREAKER_THRESHOLD = 3  # consecutive 429s before opening
OPEN_DURATION = 90     # seconds the breaker stays open
# Intended probe count when moving from open to closed. NOTE: not enforced
# below; callers that arrive while the state is half-open are also allowed.
HALF_OPEN_PROBE = 1


def init_db():
    with sqlite3.connect(DB_PATH) as conn:
        conn.execute("""
            CREATE TABLE IF NOT EXISTS rate_limit_events (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                ai_resource TEXT NOT NULL,
                workflow_id TEXT,
                http_status INTEGER,
                recorded_at REAL
            )
        """)
        conn.execute("""
            CREATE TABLE IF NOT EXISTS circuit_state (
                ai_resource TEXT PRIMARY KEY,
                state TEXT DEFAULT 'closed',
                opened_at REAL,
                consecutive_failures INTEGER DEFAULT 0
            )
        """)


class SlackWorkflowCircuitBreaker:
    @staticmethod
    def check(ai_resource: str) -> dict:
        now = time.time()
        with db_lock:
            with sqlite3.connect(DB_PATH) as conn:
                row = conn.execute(
                    "SELECT state, opened_at FROM circuit_state WHERE ai_resource = ?",
                    (ai_resource,)
                ).fetchone()
                if row and row[0] == "open":
                    elapsed = now - row[1]
                    if elapsed < OPEN_DURATION:
                        retry_after = int(OPEN_DURATION - elapsed)
                        return {
                            "allow": False,
                            "state": "open",
                            "ai_resource": ai_resource,
                            "retry_after_seconds": retry_after,
                            "message": (
                                f"Circuit open for {ai_resource!r}: rate limit cascade detected. "
                                f"AI step blocked for {retry_after}s. "
                                "Return a degraded result to Slack; do not retry the LLM call. "
                                "The circuit closes automatically when the block window expires."
                            ),
                        }
                    # Open window expired: move to half-open and let this caller probe
                    conn.execute(
                        "UPDATE circuit_state SET state = 'half-open' "
                        "WHERE ai_resource = ?",
                        (ai_resource,)
                    )
                    # Report the new state rather than the stale 'open' read above
                    return {"allow": True, "state": "half-open", "ai_resource": ai_resource}
        return {"allow": True, "state": row[0] if row else "closed", "ai_resource": ai_resource}

    @staticmethod
    def record_outcome(ai_resource: str, workflow_id: str, http_status: int) -> dict:
        now = time.time()
        with db_lock:
            with sqlite3.connect(DB_PATH) as conn:
                conn.execute(
                    "INSERT INTO rate_limit_events "
                    "(ai_resource, workflow_id, http_status, recorded_at) "
                    "VALUES (?, ?, ?, ?)",
                    (ai_resource, workflow_id, http_status, now)
                )
                # Make sure a state row exists before updating it
                conn.execute(
                    "INSERT OR IGNORE INTO circuit_state (ai_resource) VALUES (?)",
                    (ai_resource,)
                )
                if http_status == 429:
                    conn.execute(
                        "UPDATE circuit_state "
                        "SET consecutive_failures = consecutive_failures + 1 "
                        "WHERE ai_resource = ?",
                        (ai_resource,)
                    )
                    failures = conn.execute(
                        "SELECT consecutive_failures FROM circuit_state "
                        "WHERE ai_resource = ?",
                        (ai_resource,)
                    ).fetchone()[0]
                    if failures >= BREAKER_THRESHOLD:
                        # A failed half-open probe also lands here and re-opens the breaker
                        conn.execute(
                            "UPDATE circuit_state "
                            "SET state = 'open', opened_at = ? "
                            "WHERE ai_resource = ?",
                            (now, ai_resource)
                        )
                        return {
                            "action": "breaker_opened",
                            "ai_resource": ai_resource,
                            "consecutive_failures": failures,
                            "open_duration_seconds": OPEN_DURATION,
                        }
                else:
                    # Any non-429 outcome: reset the counter and close the breaker
                    conn.execute(
                        "UPDATE circuit_state "
                        "SET consecutive_failures = 0, state = 'closed' "
                        "WHERE ai_resource = ?",
                        (ai_resource,)
                    )
        return {"action": "recorded", "http_status": http_status}


@app.route("/circuit/check", methods=["POST"])
def circuit_check():
    data = request.get_json(force=True)
    result = SlackWorkflowCircuitBreaker.check(data.get("ai_resource", ""))
    return jsonify(result), 200 if result["allow"] else 503


@app.route("/circuit/outcome", methods=["POST"])
def circuit_outcome():
    data = request.get_json(force=True)
    result = SlackWorkflowCircuitBreaker.record_outcome(
        ai_resource=data.get("ai_resource", ""),
        workflow_id=data.get("workflow_id", ""),
        http_status=data.get("http_status", 0),
    )
    return jsonify(result), 200


if __name__ == "__main__":
    init_db()
    app.run(port=8086)
In the Workflow Builder step function, call /circuit/check before the LLM API call and call /circuit/outcome after it with the HTTP status code from the LLM provider. When the circuit is open, the step function returns a degraded result — an empty string, a cached previous result, or a message like "AI summary unavailable" — rather than retrying the LLM call. This prevents the retry wave from forming: when all 30 concurrent executions check the circuit and see it is open, none of them calls the LLM, and none of them generates a retry.
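The call order inside a step can be sketched independently of any HTTP client. In the example below the three callables are injected, so the flow can be exercised with stubs; in a real step they would wrap POST requests to /circuit/check and /circuit/outcome and the LLM provider call. `run_ai_step`, `DEGRADED_RESULT`, and the callable signatures are all assumptions made for illustration, not part of any Slack SDK.

```python
from typing import Callable, Tuple

DEGRADED_RESULT = "AI summary unavailable"


def run_ai_step(prompt: str,
                check_circuit: Callable[[], dict],
                call_llm: Callable[[str], Tuple[int, str]],
                report_outcome: Callable[[int], None]) -> str:
    """Check the breaker, call the LLM once, report the status, never retry."""
    decision = check_circuit()
    if not decision.get("allow", False):
        return DEGRADED_RESULT          # breaker open: skip the LLM entirely
    status, text = call_llm(prompt)
    report_outcome(status)              # lets the breaker count consecutive 429s
    if status == 429:
        return DEGRADED_RESULT          # degrade instead of joining a retry wave
    return text


# Stubbed usage: one blocked call, one success, one rate-limited call
reported = []
blocked = run_ai_step("hi", lambda: {"allow": False},
                      lambda p: (200, "unused"), reported.append)
ok = run_ai_step("hi", lambda: {"allow": True},
                 lambda p: (200, "summary text"), reported.append)
limited = run_ai_step("hi", lambda: {"allow": True},
                      lambda p: (429, ""), reported.append)
print(blocked, ok, limited, reported)
```

Treating a missing `allow` key as a block is a deliberate fail-closed choice in this sketch; a step that prefers availability over cost control could default the other way.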
State Table
| Failure mode | Guard class | Ceiling / trigger | What to watch |
|---|---|---|---|
| Message event fan-out: LLM calls scale with workspace message volume | SlackEventGuard | 20 LLM calls / channel / hour | LLM calls per channel per hour; spike = new channel added to subscription scope |
| Bot self-loop: bot processes its own messages in a loop | SlackBotLoopGuard | 5 bot posts / channel / 60s | Bot post rate per channel; loop = post rate rises monotonically between human interactions |
| Timeout retry duplication: Slack retries unacknowledged events up to 3× | SlackEventIdempotency | 1 LLM call per event_id | Duplicate event_id count in processed_events table; non-zero = handler timing out |
| Workflow Builder thundering herd: concurrent workflow steps hit rate limit in waves | SlackWorkflowCircuitBreaker | 3 consecutive 429s → open for 90s | consecutive_failures in circuit_state table; open state duration |
Checklist Before Going Live
- Measure event delivery volume before wiring an LLM call. Subscribe to message events in a test workspace, log every event for 24 hours, and count the total delivery rate. Set your SlackEventGuard ceiling at 2× your expected intentional call rate, not at an arbitrary round number.
- Add is_self_message() as the first line of every event handler that posts messages. No exceptions. A missing subtype filter in a single handler can generate thousands of LLM calls within an hour of going live in an active workspace.
- Return 200 within 200ms for every event, even events you intend to ignore. A handler that takes 100ms to decide it does not need to do anything still has 2,900ms of headroom before Slack's timeout. A handler that takes 4 seconds to call the LLM and then returns 200 has already caused Slack to retry.
- Validate that event_id deduplication is atomic. If you use a database for idempotency, the read-check-insert sequence must be inside a transaction or protected by a mutex. A race condition between two concurrent deliveries of the same event (Slack can deliver in parallel during retry windows) can pass both through the dedup check and fire two LLM calls.
- Scope your event subscription to specific channels, not the whole workspace. The channels:history and groups:history scopes give access to all public and private channels the app can see. Narrow the subscription to specific channel IDs in your event handler's routing logic, or better, add the app only to the channels it needs if your use case is channel-specific.
- Wire the circuit breaker before the first external LLM call in every Workflow Builder step. The thundering herd scenario cannot form if concurrent executions check the circuit state before calling the LLM. A circuit check adds one HTTP call (50–100ms) per step execution, which is negligible compared to the LLM call cost it prevents when the breaker is open.
- Log event_id for every LLM call. When debugging cost spikes, the first question is always "was this one real event or a retry storm?" The event_id in your LLM call logs answers that question directly.
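For the atomicity item in the checklist, one option is to let the database's uniqueness constraint make the decision instead of a separate read followed by a write. The sketch below uses SQLite's `INSERT OR IGNORE` against a primary key and treats "a row was inserted" as "this caller owns the event". `make_dedup_db` and `claim_event` are hypothetical names, and the in-memory database is only for demonstration.

```python
import sqlite3
import time


def make_dedup_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create (or open) a dedup table keyed on event_id."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed_events ("
        "event_id TEXT PRIMARY KEY, received_at REAL)"
    )
    return conn


def claim_event(conn: sqlite3.Connection, event_id: str) -> bool:
    """Return True only for the first caller that records this event_id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO processed_events (event_id, received_at) "
            "VALUES (?, ?)",
            (event_id, time.time()),
        )
    # rowcount is 1 when the row was inserted, 0 when the key already existed
    return cur.rowcount == 1


conn = make_dedup_db()
first = claim_event(conn, "Ev123")
retry = claim_event(conn, "Ev123")
other = claim_event(conn, "Ev456")
print(first, retry, other)
```

Because the check and the insert are a single statement, two workers racing on the same event_id cannot both see "not present"; the same idea carries over to other databases with a unique index and an insert-if-absent statement.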
FAQ
Slack AI (the $10/user/month add-on) vs. building your own AI Slack app — which has worse cost failure modes?
Slack AI (Slack's native AI summaries and search) charges at the per-seat level — you pay for the seat regardless of how many summaries are generated. The cost is predictable but scales with headcount, not with usage. Building your own AI Slack app using the Events API introduces all four failure modes above because your code controls the LLM calls and Slack's delivery model is designed for fast handlers, not LLM latency. For most teams, Slack AI is cheaper for broad workspace summarization use cases; custom AI apps are better for specific, targeted workflows where you control the trigger conditions precisely.
Does Slack's new Workflow Builder platform have its own rate limits that would catch the thundering herd scenario before it hits my LLM provider?
Slack's platform has execution concurrency limits that vary by workspace plan — Enterprise Grid workspaces have higher concurrency allowances than Pro or Business+ workspaces. However, these limits are set at the platform level and are not exposed as configurable settings in your step function. They also do not align with your LLM provider's rate limits, which are typically per-minute token or request ceilings rather than concurrency ceilings. The circuit breaker guard is necessary because the scenario where 30 concurrent executions all call the LLM in the same second is within Slack's platform concurrency limits even though it exceeds most LLM providers' per-minute request ceilings.
We use Slack's Socket Mode instead of the Events API HTTP endpoint — do the same timeout rules apply?
Socket Mode uses a WebSocket connection instead of HTTP delivery, which removes the 3-second HTTP acknowledgment timeout. Slack's Socket Mode protocol still requires your handler to send an envelope_id acknowledgment promptly (within the same WebSocket keep-alive window), but the retry behavior on slow acknowledgment is different from the Events API's HTTP retry model. The event fan-out failure mode and the bot self-loop failure mode apply identically regardless of transport. The idempotency pattern still matters because Slack can re-deliver events after WebSocket reconnections — always deduplicate on event_id.
How do I determine whether my bot's LLM call volume is from real user interactions or from one of these failure modes?
Log the Slack event_id, channel_id, user, and subtype fields alongside every LLM API call your bot makes. A cost spike from event fan-out shows up as a sharp increase in unique channel_id values with no corresponding increase in unique user values. A bot self-loop shows up as repeated calls with the same channel_id and a null or bot user value. A retry duplication spike shows up as repeated event_id values in the log. The event_id field is the single most useful diagnostic data point — without it, all four failure modes look the same in an LLM API cost dashboard.
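A first-pass triage over such a log might look like the following. It is a rough sketch that assumes each log record is a dict with `event_id`, `channel_id`, `user`, and `subtype` keys; the function name `diagnose` and the sample records are invented for illustration.

```python
from collections import Counter


def diagnose(call_log: list) -> dict:
    """Summarize an LLM call log by the signals that separate the failure modes."""
    event_counts = Counter(rec["event_id"] for rec in call_log)
    return {
        "total_calls": len(call_log),
        "unique_events": len(event_counts),
        "unique_channels": len({rec["channel_id"] for rec in call_log}),
        "unique_users": len({rec["user"] for rec in call_log if rec.get("user")}),
        # Same event_id more than once: retry duplication
        "retry_duplicates": {eid: n for eid, n in event_counts.items() if n > 1},
        # No human user or a bot subtype: candidate self-loop traffic
        "bot_authored_calls": sum(
            1 for rec in call_log
            if not rec.get("user") or rec.get("subtype") == "bot_message"
        ),
    }


sample_log = [
    {"event_id": "Ev1", "channel_id": "C1", "user": "U1", "subtype": None},
    {"event_id": "Ev1", "channel_id": "C1", "user": "U1", "subtype": None},
    {"event_id": "Ev2", "channel_id": "C2", "user": None, "subtype": "bot_message"},
    {"event_id": "Ev3", "channel_id": "C1", "user": "U2", "subtype": None},
]
report = diagnose(sample_log)
print(report)
```

Comparing `unique_channels` against `unique_users` over time covers the fan-out signal, while a non-empty `retry_duplicates` or a non-zero `bot_authored_calls` points at the retry and self-loop cases respectively.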
Can I integrate RunGuard's SDK directly into a Slack bot instead of building these guards from scratch?
Yes. RunGuard's Python SDK's LoopDetector covers the bot self-loop and thundering herd scenarios directly — the guard() wrapper trips on repeated identical tool-call patterns, which maps to a bot's repeated self-response loop and to concurrent executions all calling the same LLM endpoint with the same prompt. The BudgetTracker covers the event fan-out scenario by capping total LLM calls per session. The idempotency pattern requires your own database dedup table since it depends on Slack's event_id, which is external state the SDK cannot observe. Install with pip install runguard and wire guard(fn=call_llm, opts={"budget": {"maxCalls": 20}}) around your LLM call site inside the event handler.