June 21, 2026 LiveKit Real-Time AI Cost Control Loop Detection

LiveKit Agents Cost Control: Room Reconnection Loops, VAD Misfire Storms, Job Queue Amplification, and Track Subscription Cascades

LiveKit has become the dominant open-source infrastructure layer for real-time, multimodal AI agents in 2025–2026. Teams building real-time voice assistants, live video analysis pipelines, telehealth bots, customer service voice agents, and AI-enhanced video conferencing products use LiveKit to orchestrate the WebRTC media layer underneath their agents. The LiveKit Agents SDK — a Python framework built on top of the LiveKit Python SDK — provides the VoicePipelineAgent, MultimodalAgent, and AgentSession abstractions that connect WebRTC audio/video tracks to Speech-to-Text, Large Language Model, and Text-to-Speech providers.

LiveKit Cloud charges per SFU (Selective Forwarding Unit) minute — every minute a room session is active costs platform time regardless of whether media is actively flowing. Self-hosted LiveKit deployments shift platform costs to compute and egress bandwidth, but the STT, LLM, and TTS provider costs remain metered per call. The combination creates a cost profile with an important property: any loop or retry pattern that extends the duration of an active room session, or that triggers duplicate audio pipeline starts, multiplies cost across every provider simultaneously. Four structural failure modes in the LiveKit Agents architecture account for the majority of unexpected billing spikes:

Room reconnection loops: the agent's on_disconnected handler triggers a reconnect before the previous room session has fully closed, spinning up a fresh audio pipeline and STT session before the prior one is torn down. Each reconnect extends the SFU minute clock and starts a new billed inference session with the STT and LLM providers.
VAD background-noise misfire storms: Silero VAD or WebRTC VAD fires on ambient noise, keyboard clicks, microphone handling artifacts, or network jitter spikes, routing non-speech audio frames through the full STT-LLM-TTS pipeline. Each misfire that produces an empty or incoherent transcript causes the LLM to generate a filler response ("Sorry, could you repeat that?"), TTS to synthesize and play audio, and the pipeline to reset, all while incrementing the SFU minute counter.
Job dispatch queue amplification: LiveKit's worker pool assigns inbound call jobs to available workers. If the worker fails to signal readiness before the dispatch timeout expires, the room coordinator requeues the job and assigns it to a second worker. Two workers now hold references to the same room. When the original worker eventually signals readiness, both workers attempt to join the room and subscribe to the participant track, producing duplicate STT sessions and LLM context threads for a single user call.
Track subscription cascades: a participant's audio or video track publish/unpublish cycle (caused by network instability, browser tab switching, or mobile app backgrounding) triggers the agent's on_track_subscribed handler each time the track is re-published. The default handler starts a new STT session and initializes a fresh LLM context, discarding in-flight conversation state. On a flaky connection that unpublishes and re-publishes a track eight times in a minute, eight separate STT sessions and LLM context initializations accumulate billing before any of them produce a useful exchange.

Billing model in detail

Understanding why these four patterns are expensive requires understanding how LiveKit Cloud billing accumulates. The SFU minute fee applies from the moment a room is created until its last participant leaves or it is explicitly closed via the API — regardless of whether media is flowing. A room created for a user who never connects still accrues SFU minutes if the agent is connected. A reconnection loop that generates 10 join/disconnect cycles in 60 seconds does not create 10 separate SFU billing events; the room stays active the entire time, but each reconnect adds a new participant session (the agent itself) which may be billed as a participant minute in addition to the SFU room minute, depending on plan tier.

STT providers (Deepgram, AssemblyAI, Google STT) bill per audio-second of submitted audio, or per transcription request depending on the provider. LiveKit's VoicePipelineAgent submits audio frames to STT as a streaming session; a VAD misfire that submits 3 seconds of noise audio costs 3 seconds of STT processing. LLM providers (OpenAI, Anthropic, Google) bill per token. Each spurious STT transcription result — even an empty one — triggers an LLM call to evaluate whether to respond; the LLM call consumes input tokens for the full conversation context accumulated so far, even if the output is one short filler sentence. TTS providers (ElevenLabs, Cartesia, OpenAI TTS) bill per character of text synthesized. A VAD misfire chain that triggers 20 "Sorry, could you repeat that?" responses synthesizes 20 × ~30 characters of TTS — small per event, but on a single call with persistent background noise, this accumulates.
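To make the arithmetic above concrete, the following sketch estimates what a chain of VAD misfires costs on a single call. Every unit price in `UnitPrices` is a placeholder assumption chosen for illustration, not any provider's actual rate, and the function name `misfire_chain_cost` is hypothetical. Substitute the figures from your own contracts.

```python
from dataclasses import dataclass


@dataclass
class UnitPrices:
    # Placeholder prices for illustration only; NOT real provider rates.
    stt_per_audio_second: float = 0.0001
    llm_per_input_token: float = 0.000001
    llm_per_output_token: float = 0.000002
    tts_per_character: float = 0.00002


def misfire_chain_cost(
    misfires: int,
    noise_seconds_each: float,
    context_tokens: int,
    filler_text: str,
    filler_output_tokens: int,
    prices: UnitPrices,
) -> dict:
    """Estimate provider cost for a run of VAD misfires on one call."""
    # STT bills the noise audio that VAD let through.
    stt = misfires * noise_seconds_each * prices.stt_per_audio_second
    # Each misfire re-sends the conversation context as input tokens.
    llm = misfires * (
        context_tokens * prices.llm_per_input_token
        + filler_output_tokens * prices.llm_per_output_token
    )
    # TTS bills per character of the filler response.
    tts = misfires * len(filler_text) * prices.tts_per_character
    return {"stt": stt, "llm": llm, "tts": tts, "total": stt + llm + tts}


estimate = misfire_chain_cost(
    misfires=20,
    noise_seconds_each=3.0,
    context_tokens=2000,
    filler_text="Sorry, could you repeat that?",
    filler_output_tokens=8,
    prices=UnitPrices(),
)
print(estimate)
```

The sketch holds the context size constant for simplicity. In a real call the context grows with every turn, so the LLM term tends to dominate as the misfire chain lengthens.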

Self-hosted LiveKit deployments replace SFU minute fees with egress bandwidth and compute costs, but the STT, LLM, and TTS provider fees are identical. For teams running LiveKit on a single compute instance, a job dispatch amplification event that produces two simultaneous workers for one call doubles the compute load on the instance and may cause both workers to fail under resource contention — triggering further retries.

Failure mode 1: Room reconnection loop

LiveKit's Python SDK provides lifecycle callbacks — on_disconnected, on_reconnecting, on_reconnected — that fire when the agent's WebRTC connection to the server drops. The standard reconnection pattern in LiveKit Agents applications is to call room.connect() from within on_disconnected after a delay. The failure mode appears when the delay is too short, the disconnection trigger persists (e.g., a flapping network interface, a server-side room expiry that fires before the agent closes its subscription, or a Kubernetes pod eviction mid-session), or the reconnection itself raises an exception that the handler catches and retries in a tight loop.

A reconnect loop that fires every 2–5 seconds keeps a room active indefinitely from a billing standpoint. Worse, each successful reconnect restarts the VoicePipelineAgent's internal state: a new STT streaming session opens, the LLM context resets to the system prompt, and TTS output from the previous session may still be playing while the new session begins processing input. The user hears audio artifacts; the agent loses conversation continuity; and the operator pays for two concurrent inference sessions for the overlap window.

The signature: room_reconnect:<room_name> appearing 3 or more times in a 60-second window in the same agent process. A one-off network blip causes at most 1–2 reconnects. Three or more identical reconnect events in a window means the root cause persists and the loop will continue until the agent is restarted or the cause is resolved.

Python

import asyncio
import logging
import time
from collections import deque
from dataclasses import dataclass, field
from runguard import LoopDetector, LoopDetectedError

logger = logging.getLogger(__name__)

@dataclass
class RoomReconnectGuard:
    room_name: str
    max_reconnects_per_window: int = 3
    window_seconds: int = 60
    _timestamps: deque = field(default_factory=deque)
    _detector: LoopDetector = field(
        default_factory=lambda: LoopDetector(repeats=3, max_cycle_len=1)
    )

    def check(self) -> None:
        now = time.monotonic()
        cutoff = now - self.window_seconds
        # Drop reconnects that have aged out of the sliding window
        while self._timestamps and self._timestamps[0] < cutoff:
            self._timestamps.popleft()
        if not self._timestamps:
            # Quiet for a full window: earlier reconnects no longer count
            self._detector.reset()
        self._timestamps.append(now)

        sig = f"room_reconnect:{self.room_name}"
        match = self._detector.push(sig)
        # Trip on either signal: a repeated signature or too many reconnects in the window
        window_exceeded = len(self._timestamps) >= self.max_reconnects_per_window
        if match or window_exceeded:
            raise LoopDetectedError(
                f"Reconnection loop detected for room '{self.room_name}': "
                f"{len(self._timestamps)} reconnects in {self.window_seconds}s. "
                f"Stopping reconnection to prevent further SFU billing."
            )

# Usage in agent entrypoint (room, url, and token come from the surrounding agent code):
reconnect_guard = RoomReconnectGuard(room_name=room.name)

async def on_disconnected():
    try:
        reconnect_guard.check()
    except LoopDetectedError as e:
        logger.error("Reconnect loop: %s", e)
        # Close cleanly rather than looping
        await room.disconnect()
        return
    await asyncio.sleep(2.0)
    await room.connect(url, token)

The guard tracks both a sliding-window count (for rate limiting) and a LoopDetector signal (for cycle detection). The LoopDetector catches a single repeated signature three times in a row — the canonical pattern of a stuck reconnection handler that keeps triggering on the same underlying fault. The sliding window count catches the case where two distinct faults trade off (disconnecting for alternating reasons) but both keep the agent stuck in a reconnection pattern. Either condition triggers the guard. When the guard fires, the agent disconnects cleanly rather than looping: the room billing clock stops, no new inference sessions open, and the failure is surfaced in logs for operator investigation.
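Because the failure mode is most likely when the reconnect delay is too short, one complementary measure is to replace the fixed two-second sleep with a growing, jittered delay. The sketch below is one way to compute it; `reconnect_delay` and its parameters are illustrative names, not part of LiveKit or RunGuard, and the base and cap values are assumptions to tune for your deployment.

```python
import random
from typing import Optional


def reconnect_delay(
    attempt: int,
    base: float = 2.0,
    cap: float = 60.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Exponential backoff with "equal jitter".

    The delay for a given attempt lies between half the exponential
    ceiling and the ceiling itself, so it never collapses to zero.
    """
    source = rng if rng is not None else random
    # Clamp the exponent so very large attempt counts cannot overflow.
    ceiling = min(cap, base * (2 ** min(attempt, 16)))
    half = ceiling / 2.0
    return half + source.uniform(0.0, half)


# Delays for the first few attempts (values vary because of the jitter).
for attempt in range(4):
    print(attempt, round(reconnect_delay(attempt), 2))
```

A handler would call `await asyncio.sleep(reconnect_delay(attempt))` and increment `attempt` on each failure. The guard still provides the hard stop; the backoff only slows the loop down before that stop is reached.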

Failure mode 2: VAD background-noise misfire storm

LiveKit's VoicePipelineAgent uses a Voice Activity Detection module — typically Silero VAD running in-process, or WebRTC's built-in VAD for lower latency — to decide when to route audio frames to the STT provider. VAD operates on short audio windows (typically 10–30ms frames) and emits a speech probability score; frames above a threshold are classified as speech and forwarded. The threshold default is tuned for clear microphone input in a quiet environment. Production scenarios that violate that assumption produce persistent misfire patterns:

Open-office background noise: keyboard clicks, HVAC, passing conversations, and notification sounds produce audio signals with speech-like spectral characteristics that consistently score above the VAD threshold. In a noisy open office, Silero VAD with default settings may classify 20–40% of frames as speech.
Bluetooth microphone latency artifacts: Bluetooth audio codecs introduce packet loss that manifests as short bursts of distorted audio, which VAD may classify as consonant-heavy speech (s, t, k sounds have similar frequency profiles to packet-loss artifacts).
Echo without AEC: when Acoustic Echo Cancellation is not configured or malfunctions, the agent's own TTS output is captured by the user's microphone and returned to the agent. The agent hears itself speaking, VAD classifies its own voice as the user's speech, STT transcribes the agent's output, and the LLM processes the agent's own words as user input. The result is a self-reinforcing loop where every agent utterance triggers another.
Mobile app backgrounding: when a mobile browser or native app is backgrounded, the WebRTC audio track may emit silence frames with occasional noise bursts as the device's microphone pipeline remains active. These bursts can score above the VAD threshold intermittently.
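Short bursts like the ones listed above can often be filtered before they reach STT by requiring a sustained run of speech frames rather than reacting to any single frame over the threshold. The following is a minimal, framework-independent sketch of that debounce idea. `SpeechGate` and its frame counts are hypothetical; many VAD implementations expose equivalent minimum-duration settings, so check your plugin's options before adding a layer like this.

```python
from dataclasses import dataclass


@dataclass
class SpeechGate:
    """Debounce per-frame speech probabilities before opening the STT path."""

    threshold: float = 0.5        # assumed per-frame probability cutoff
    min_speech_frames: int = 8    # consecutive speech frames needed to open
    min_silence_frames: int = 15  # consecutive silence frames needed to close
    is_open: bool = False
    _speech_run: int = 0
    _silence_run: int = 0

    def push(self, speech_prob: float) -> bool:
        """Feed one frame's probability; return whether the gate is open."""
        if speech_prob >= self.threshold:
            self._speech_run += 1
            self._silence_run = 0
        else:
            self._silence_run += 1
            self._speech_run = 0

        if not self.is_open and self._speech_run >= self.min_speech_frames:
            self.is_open = True
        elif self.is_open and self._silence_run >= self.min_silence_frames:
            self.is_open = False
        return self.is_open


gate = SpeechGate()
# A three-frame click followed by silence never opens the gate.
click_states = [gate.push(p) for p in [0.9, 0.9, 0.9, 0.1, 0.1]]
print(any(click_states))
```

The asymmetric counts are a design choice: opening slowly rejects clicks and packet-loss bursts, while closing even more slowly avoids cutting off a speaker who pauses briefly mid-sentence.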

The cost amplification is clearest in the echo-without-AEC case: the agent speaks, the user's microphone returns the agent's voice, VAD triggers, STT transcribes the agent's own sentence, the LLM receives its own prior output as new user input, generates a response (often confused or hallucinated), TTS synthesizes it, the cycle repeats. Within 30 seconds a call can generate 10–20 complete STT-LLM-TTS pipeline cycles, each incrementing all three provider billing meters.
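When echo cancellation cannot be guaranteed on the client, one application-level mitigation is to compare each incoming transcript against the agent's own recent utterances and drop close matches before they reach the LLM. The sketch below uses the standard library's `difflib` for the comparison. `EchoFilter`, its method names, and the 0.8 similarity threshold are all assumptions for illustration; this is a text-level heuristic and not a substitute for real AEC.

```python
from collections import deque
from difflib import SequenceMatcher


class EchoFilter:
    """Flag transcripts that look like the agent's own recent speech."""

    def __init__(self, max_remembered: int = 5, similarity_threshold: float = 0.8):
        self._recent_agent_text = deque(maxlen=max_remembered)
        self.similarity_threshold = similarity_threshold

    @staticmethod
    def _normalize(text: str) -> str:
        # Lowercase, replace punctuation with spaces, collapse whitespace.
        cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
        return " ".join(cleaned.split())

    def remember_agent_utterance(self, text: str) -> None:
        """Call with every sentence sent to TTS."""
        self._recent_agent_text.append(self._normalize(text))

    def is_probable_echo(self, transcript: str) -> bool:
        heard = self._normalize(transcript)
        if not heard:
            return False
        for spoken in self._recent_agent_text:
            # A fragment of three or more words contained in what the agent said.
            if len(heard.split()) >= 3 and heard in spoken:
                return True
            # A near-complete repeat of what the agent said.
            if SequenceMatcher(None, heard, spoken).ratio() >= self.similarity_threshold:
                return True
        return False


echo_filter = EchoFilter()
echo_filter.remember_agent_utterance("Sorry, could you repeat that?")
print(echo_filter.is_probable_echo("sorry could you repeat that"))
```

The three-word minimum on the substring check keeps short, legitimate replies such as "yes" from being discarded just because the agent used the same word.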

The signature: vad_misfire:empty_transcript or vad_misfire:low_confidence appearing in bursts. Any single call that triggers 5 or more consecutive empty-transcript STT results (where the STT provider returns an empty string or a result below a confidence threshold) is exhibiting a VAD misfire pattern, not normal silence handling.

Python

import logging
from dataclasses import dataclass, field
from runguard import LoopDetector, LoopDetectedError

logger = logging.getLogger(__name__)

@dataclass
class VADMisfireGuard:
    consecutive_empty_limit: int = 5
    min_confidence: float = 0.6
    _consecutive_empty: int = 0
    _detector: LoopDetector = field(
        default_factory=lambda: LoopDetector(repeats=5, max_cycle_len=1)
    )

    def on_stt_result(self, transcript: str, confidence: float) -> bool:
        """Returns True if the result is genuine speech; False if it should be suppressed."""
        is_empty = not transcript.strip()
        is_low_confidence = confidence < self.min_confidence

        if is_empty or is_low_confidence:
            self._consecutive_empty += 1
            sig = "vad_misfire:empty_transcript" if is_empty else "vad_misfire:low_confidence"
            match = self._detector.push(sig)
            # Trip on a repeated signature or on the configured limit, so that
            # alternating empty and low-confidence results are still caught
            if match or self._consecutive_empty >= self.consecutive_empty_limit:
                raise LoopDetectedError(
                    f"VAD misfire loop: {self._consecutive_empty} consecutive "
                    f"empty/low-confidence STT results. Suppressing pipeline until "
                    f"genuine speech detected. Check for background noise, AEC config, "
                    f"or echo path."
                )
            return False

        # Genuine speech resets the counter
        self._consecutive_empty = 0
        self._detector.reset()
        return True

    def reset(self) -> None:
        self._consecutive_empty = 0
        self._detector.reset()

# Hook into VoicePipelineAgent's STT result stream (agent comes from the surrounding code):
vad_guard = VADMisfireGuard(consecutive_empty_limit=5, min_confidence=0.55)

async def stt_result_handler(result):
    # Treat a result with no alternatives as an empty transcript
    best = result.alternatives[0] if result.alternatives else None
    text = best.text if best else ""
    confidence = best.confidence if best else 0.0
    try:
        if not vad_guard.on_stt_result(text, confidence):
            return  # Suppress forwarding to LLM
    except LoopDetectedError as e:
        logger.error("VAD misfire loop: %s", e)
        # Raise an alert, increase VAD threshold, or close the session
        await agent.say("I'm having trouble hearing you clearly. Please try reconnecting.")
        raise
    # Forward genuine result to LLM pipeline (placeholder hook: adapt to
    # however your pipeline consumes STT results)
    await agent.pipeline.process_stt_result(result)

The guard intercepts STT results before they reach the LLM. Empty transcripts and low-confidence results are suppressed; five consecutive suppressions trip the breaker and surface the failure for remediation. The reset on genuine speech ensures the guard does not penalize normal silence periods (a user thinking for several seconds before speaking will not trigger the guard, because genuine silence does not produce STT results at all — the VAD has simply not fired).

Failure mode 3: Job dispatch queue amplification

LiveKit Agents use a worker pool model: your server runs one or more Worker processes, each of which registers with LiveKit's room coordinator and receives job dispatch messages when a new inbound call needs an agent. The coordinator assigns the job to one available worker; that worker joins the room and starts the agent session. The failure mode occurs in the handshake between coordinator and worker:

The coordinator sends a job dispatch message and starts a timeout clock — typically 10–30 seconds, configurable via WorkerOptions.job_accept_timeout_ms. If the worker does not signal job acceptance before the timeout fires, the coordinator marks the job as unaccepted and requeues it to the next available worker. The original worker, which may still be initializing (loading a model into GPU memory, establishing STT provider authentication, fetching user context from a database), eventually signals acceptance — but by then the second worker has already joined the room. Both workers now call room.connect() with valid tokens. The LiveKit server accepts both connections (it cannot distinguish between a valid agent and an errant duplicate of the same agent without application-level coordination). The user's audio track is subscribed by both workers, and both begin processing the user's first utterance independently.

The signature: Two workers producing independent LLM responses to the same user utterance, both TTS outputs playing simultaneously. The user hears two voices. The operator is billed for two complete inference cycles per user utterance for the duration of the session — including all SFU minutes, STT audio-seconds, LLM tokens, and TTS characters.

Python

import logging
import os
import time
from dataclasses import dataclass
from typing import Any

from livekit.agents import JobContext
from livekit.agents.pipeline import VoicePipelineAgent
from livekit.plugins import cartesia, deepgram, openai, silero

logger = logging.getLogger(__name__)

@dataclass
class JobDispatchGuard:
    room_name: str
    worker_id: str
    _registry: Any  # Async Redis-style client shared across workers (use Redis or a database in production)

    async def claim(self, timeout_seconds: float = 5.0) -> bool:
        """
        Returns True if this worker successfully claimed the job.
        Returns False if another worker already holds the claim.
        """
        key = f"job_claim:{self.room_name}"
        claim_value = f"{self.worker_id}:{time.time()}"

        # SET NX (set if not exists): atomic claim
        claimed = await self._registry.set(
            key, claim_value,
            nx=True,  # Only set if key does not exist
            ex=int(timeout_seconds * 2)  # Claim TTL = 2× timeout as safety margin
        )
        if claimed:
            return True

        # Key already exists: the claim is ours only if this worker wrote it earlier
        existing = await self._registry.get(key)
        if existing is None:
            return False  # Claim expired between SET and GET; fail closed
        if isinstance(existing, bytes):
            existing = existing.decode()
        # rsplit so that a worker_id containing ":" is still parsed correctly
        existing_worker, _ = existing.rsplit(":", 1)
        return existing_worker == self.worker_id

    async def release(self) -> None:
        key = f"job_claim:{self.room_name}"
        existing = await self._registry.get(key)
        if isinstance(existing, bytes):
            existing = existing.decode()
        # Only delete a claim this worker wrote. The check-then-delete is not
        # atomic; a server-side script would close that gap.
        if existing and existing.rsplit(":", 1)[0] == self.worker_id:
            await self._registry.delete(key)

# In the worker entrypoint (redis_client is created elsewhere in your worker):
async def entrypoint(ctx: JobContext):
    guard = JobDispatchGuard(
        room_name=ctx.room.name,
        worker_id=os.environ["WORKER_ID"],
        _registry=redis_client,
    )

    if not await guard.claim():
        logger.warning(
            "Job for room %s already claimed by another worker. "
            "Exiting to prevent duplicate agent session.",
            ctx.room.name
        )
        return  # Exit without connecting to the room

    try:
        await ctx.connect()
        agent = VoicePipelineAgent(
            vad=silero.VAD.load(),
            stt=deepgram.STT(),
            llm=openai.LLM(model="gpt-4o-mini"),
            tts=cartesia.TTS(),
        )
        agent.start(ctx.room)
        await ctx.wait_for_disconnect()
    finally:
        await guard.release()

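The guard uses a distributed atomic SET NX operation to ensure only one worker can claim a given room job at a time. A worker that loses the race exits cleanly without connecting to the room. The claim TTL (set to twice timeout_seconds, which should reflect the typical job setup time) ensures the claim is automatically released if the claiming worker crashes during initialization without ever calling release(). Because the key expires after that TTL, a session that outlives it is no longer protected against a late duplicate: either refresh the claim periodically while the call is active, or choose a TTL that covers your longest expected call. For teams not using Redis, the same pattern works with PostgreSQL advisory locks or DynamoDB conditional writes.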

The root cause fix — raising WorkerOptions.job_accept_timeout_ms to accommodate your worst-case initialization time — is the right long-term solution. The guard is the defense layer for the interim period and for edge cases where initialization time spikes unexpectedly under load.

Failure mode 4: Track subscription cascade

LiveKit's VoicePipelineAgent subscribes to participant audio tracks via the on_track_subscribed callback. The standard implementation starts a new STT streaming session and attaches it to the subscribed track when this callback fires. The failure mode occurs when a participant's track is published, unpublished, and re-published in rapid succession — a common pattern on mobile connections, browser page visibility changes, or during network handoffs between WiFi and cellular.

Each publish/unpublish cycle fires on_track_unsubscribed followed by on_track_subscribed when the track is re-published. If the on_track_subscribed handler starts a new STT session without checking whether a session for this participant already exists and is still usable, the handler accumulates multiple open STT sessions for the same participant. Deepgram and AssemblyAI streaming sessions have an idle timeout, but they remain billed from open to close. If a participant's connection flaps 8 times in 60 seconds, the agent may have 8 open STT streaming sessions, all receiving audio (the last subscribed track is live) but only the most recently opened session producing usable results.

The LLM context accumulation amplifies this: if the on_track_subscribed handler reinitializes the VoicePipelineAgent or resets its context, the conversation history is discarded. The user's first several sentences must be repeated. If the handler does not reinitialize the context, the LLM may receive transcriptions from multiple STT sessions interleaved — producing incoherent multi-stream input that the LLM interprets as a confused user.

The signature: track_subscribe:<participant_sid> firing more than 2 times in a 30-second window. A clean mobile handoff produces at most 1 unsubscribe/resubscribe cycle. Three or more in 30 seconds indicates a flapping connection; the agent should freeze new STT session creation and reuse the existing session context for the next successful subscription.

Python

import asyncio
import logging
import time
from collections import defaultdict
from dataclasses import dataclass, field
from runguard import LoopDetector, LoopDetectedError

logger = logging.getLogger(__name__)

@dataclass
class TrackSubscriptionGuard:
    window_seconds: int = 30
    max_subscribes_per_window: int = 3
    _subscribe_times: dict = field(default_factory=lambda: defaultdict(list))
    _detectors: dict = field(default_factory=dict)

    def _get_detector(self, participant_sid: str) -> LoopDetector:
        if participant_sid not in self._detectors:
            self._detectors[participant_sid] = LoopDetector(
                repeats=3, max_cycle_len=1
            )
        return self._detectors[participant_sid]

    def on_subscribe(self, participant_sid: str) -> bool:
        """
        Returns True if a new STT session should start.
        Returns False if an existing session should be reused.
        Raises LoopDetectedError if the subscription rate exceeds the threshold.
        """
        now = time.monotonic()
        cutoff = now - self.window_seconds
        recent = [t for t in self._subscribe_times[participant_sid] if t >= cutoff]
        detector = self._get_detector(participant_sid)
        if not recent:
            # No subscriptions in the last window: start counting afresh
            detector.reset()
        recent.append(now)
        self._subscribe_times[participant_sid] = recent

        count = len(recent)
        # Push every event so the detector and the window count stay in step
        match = detector.push(f"track_subscribe:{participant_sid}")
        if match or count >= self.max_subscribes_per_window:
            raise LoopDetectedError(
                f"Track subscription cascade for participant {participant_sid}: "
                f"{count} subscriptions in {self.window_seconds}s. "
                f"Reusing existing STT session; not starting new one."
            )

        # First subscription in the window starts a session; the second reuses it
        return count == 1

    def on_unsubscribe(self, participant_sid: str) -> None:
        pass  # Don't reset the detector on unsubscribe, because that would mask the loop

track_guard = TrackSubscriptionGuard(window_seconds=30)

# stt_provider and active_sessions (a dict keyed by participant SID) come from
# the surrounding agent code; attach() and reattach() stand in for whatever
# your STT session wrapper exposes.
async def on_track_subscribed(track, publication, participant):
    try:
        should_start_new_session = track_guard.on_subscribe(participant.sid)
    except LoopDetectedError as e:
        logger.warning("Track cascade: %s", e)
        # Reuse existing session; do not start a new STT session
        should_start_new_session = False

    if should_start_new_session:
        stt_session = stt_provider.stream()
        active_sessions[participant.sid] = stt_session
        asyncio.create_task(stt_session.attach(track))
    else:
        # Reattach the existing session to the new track
        existing = active_sessions.get(participant.sid)
        if existing:
            asyncio.create_task(existing.reattach(track))

The guard applies at the per-participant level: each participant's subscription events are tracked independently. A participant whose connection is stable will subscribe once and stay subscribed; the guard produces no overhead beyond the first check. A participant whose connection flaps three times in 30 seconds triggers the guard, and subsequent subscription events reuse the existing STT session rather than creating a new one. The conversation context remains intact across the reconnection cycles, and the user can continue speaking without losing the conversation history they've accumulated.

Combining the guards: LiveKit agent entrypoint

In a production LiveKit Agents deployment, all four guards can be applied together in the worker entrypoint and the VoicePipelineAgent configuration. The guards are independent — each addresses a distinct failure surface — but they share the property that they fail closed: when a guard trips, it prevents further resource consumption rather than allowing it to continue.

Python

async def entrypoint(ctx: JobContext):
    # Guard 1: prevent duplicate workers for the same room
    dispatch_guard = JobDispatchGuard(
        room_name=ctx.room.name,
        worker_id=WORKER_ID,
        _registry=redis_client,
    )
    if not await dispatch_guard.claim():
        logger.warning("Room %s already claimed. Exiting.", ctx.room.name)
        return

    # Guard 2: prevent reconnection loops
    reconnect_guard = RoomReconnectGuard(room_name=ctx.room.name)

    # Guard 3: prevent VAD misfire storms
    # (call vad_guard.on_stt_result from your STT result handler)
    vad_guard = VADMisfireGuard(consecutive_empty_limit=5, min_confidence=0.55)

    # Guard 4: prevent track subscription cascades
    # (call track_guard.on_subscribe from your track-subscribed handler)
    track_guard = TrackSubscriptionGuard(window_seconds=30)

    async def reconnect():
        try:
            reconnect_guard.check()
            await asyncio.sleep(2.0)
            await ctx.room.connect(LIVEKIT_URL, ctx.token)
        except LoopDetectedError as e:
            logger.error("Reconnect loop: %s", e)
            # Clean exit: no further reconnection attempt

    def on_disconnected(*_args):
        # Room.on() takes a synchronous callback, so schedule the coroutine as a task
        asyncio.create_task(reconnect())

    ctx.room.on("disconnected", on_disconnected)

    try:
        await ctx.connect()
        agent = VoicePipelineAgent(
            vad=silero.VAD.load(),
            stt=stt_provider,
            llm=llm_provider,
            tts=tts_provider,
        )
        agent.start(ctx.room)
        await ctx.wait_for_disconnect()
    finally:
        await dispatch_guard.release()

Summary of failure modes and guards

Failure mode	Cause	Primary cost driver	Guard	Trip condition
Room reconnection loop	`on_disconnected` re-fires before session closes	SFU minutes, STT session restarts, LLM context resets	`RoomReconnectGuard`	3 reconnects with identical room_name signature
VAD misfire storm	Background noise, AEC failure, echo path	STT audio-seconds, LLM tokens, TTS characters per misfire	`VADMisfireGuard`	5 consecutive empty or low-confidence STT results
Job dispatch queue amplification	Worker timeout → coordinator requeue → duplicate worker	2× all provider costs per user utterance	`JobDispatchGuard`	Atomic SET NX on room claim; loser exits immediately
Track subscription cascade	Flapping connection → repeated on_track_subscribed	Multiple open STT sessions, LLM context loss	`TrackSubscriptionGuard`	3 subscribe events per participant in 30-second window

Frequently asked questions

Does LiveKit's built-in reconnection logic not already handle reconnection loops?

LiveKit's SDK has automatic reconnection with exponential backoff for transient network drops. The loop pattern described here occurs when application-level on_disconnected handlers — not the SDK's built-in reconnection — trigger redundant reconnection attempts on top of the SDK's own retry. The guard should wrap your application-level handler, not the SDK's internal logic. If you are relying solely on the SDK's built-in reconnection and are seeing loops, it likely indicates the room itself has expired or been closed server-side and reconnection will always fail — the guard signals this to stop the attempt cycle.

Can I use LiveKit's built-in VAD threshold tuning instead of an application-level guard?

VAD threshold tuning reduces the misfire rate but cannot eliminate it. Production audio environments are diverse — a threshold that eliminates false positives in a quiet office will suppress legitimate soft-spoken utterances in a noisy environment, and vice versa. The application-level guard acts as a circuit breaker that fires after a sustained pattern of misfires rather than trying to tune a single threshold to fit all environments. Combine both: tune the VAD threshold for your primary user demographic and use the guard as a hard ceiling on consecutive misfires regardless of environment.

The JobDispatchGuard uses Redis — do I need Redis to protect against duplicate workers?

Redis is the cleanest implementation for distributed atomic claims. Alternatives: PostgreSQL advisory locks (pg_try_advisory_lock(hashtext(room_name))), DynamoDB conditional writes (ConditionExpression="attribute_not_exists(room_name)"), or a single-process coordination model where all workers run in one process and use an asyncio.Lock. The single-process model is simplest for low-volume deployments but does not protect against machine-level failures where a worker process crashes mid-session. Redis with a TTL is the production-grade option because it handles worker crashes, network partitions, and multi-machine deployments uniformly.
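For the single-process alternative mentioned above, a claim registry can be as small as a dictionary protected by an asyncio.Lock. The sketch below is one possible shape; `InProcessClaimRegistry` and its method names are hypothetical, and it only coordinates workers that share one Python process and one event loop.

```python
import asyncio
import time
from typing import Dict, Tuple


class InProcessClaimRegistry:
    """Room claims for workers that all run inside one event loop."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()
        # room_name -> (worker_id, monotonic expiry time)
        self._claims: Dict[str, Tuple[str, float]] = {}

    async def try_claim(
        self, room_name: str, worker_id: str, ttl_seconds: float = 30.0
    ) -> bool:
        async with self._lock:
            now = time.monotonic()
            holder = self._claims.get(room_name)
            held_by_other = (
                holder is not None and holder[1] > now and holder[0] != worker_id
            )
            if held_by_other:
                return False
            # Free, expired, or already ours: (re)write the claim.
            self._claims[room_name] = (worker_id, now + ttl_seconds)
            return True

    async def release(self, room_name: str, worker_id: str) -> None:
        async with self._lock:
            holder = self._claims.get(room_name)
            # Only the owner may release, so a late loser cannot free the room.
            if holder is not None and holder[0] == worker_id:
                del self._claims[room_name]


async def demo() -> Tuple[bool, bool]:
    registry = InProcessClaimRegistry()
    first = await registry.try_claim("room-a", "worker-1")
    second = await registry.try_claim("room-a", "worker-2")
    return first, second


print(asyncio.run(demo()))
```

Calling `try_claim` again from the owning worker extends the expiry, which gives a simple way to keep a claim alive for the length of a call.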

How do I detect that a track subscription cascade is happening in practice, before guarding against it?

Log every on_track_subscribed and on_track_unsubscribed call with timestamp and participant SID. If you see participant SIDs appearing in on_track_subscribed more than twice within 30 seconds in your logs, you have a cascade pattern. LiveKit's server-side room service API also exposes participant history: a participant who has joined and left the room more than twice in a short window is exhibiting the same underlying pattern. STT provider billing dashboards are the other signal: if the number of open streaming sessions significantly exceeds your expected concurrent user count, STT sessions are accumulating without being closed.
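Once those log lines exist, a short offline scan can surface the affected participants. The sketch below assumes you have already parsed the logs into `(timestamp_seconds, participant_sid)` pairs, one per on_track_subscribed call; the function name `find_cascades` and the sample SIDs are made up for illustration.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, Tuple


def find_cascades(
    events: Iterable[Tuple[float, str]],
    window_seconds: float = 30.0,
    max_in_window: int = 2,
) -> Dict[str, int]:
    """Return {participant_sid: peak subscribe count in any window}
    for participants whose peak exceeds max_in_window."""
    recent: Dict[str, Deque[float]] = defaultdict(deque)
    peak: Dict[str, int] = defaultdict(int)

    for timestamp, sid in sorted(events):
        window = recent[sid]
        window.append(timestamp)
        # Drop subscribe events older than the sliding window.
        while window and window[0] < timestamp - window_seconds:
            window.popleft()
        peak[sid] = max(peak[sid], len(window))

    return {sid: count for sid, count in peak.items() if count > max_in_window}


sample_events = [
    (0.0, "PA_flaky"), (5.0, "PA_flaky"), (9.0, "PA_flaky"), (12.0, "PA_flaky"),
    (0.0, "PA_stable"), (100.0, "PA_stable"),
]
print(find_cascades(sample_events))
```

The defaults mirror the threshold described above: more than two subscribe events for the same participant inside 30 seconds.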

These guard patterns work for VoicePipelineAgent — do they apply to MultimodalAgent and AgentSession too?

The room reconnection guard and job dispatch guard apply identically to all three agent abstractions — they operate at the session and worker level, independent of which agent class you use. The VAD misfire guard applies to VoicePipelineAgent and any custom agent that uses a VAD + STT pipeline; MultimodalAgent using GPT-4o's native audio endpoint bypasses the STT layer and therefore does not have the same VAD-triggered STT billing, but may have analogous patterns around audio input buffering and model-side turn detection. The track subscription guard applies whenever your on_track_subscribed handler starts a stateful inference session — it is independent of which agent class manages the session.

Stop paying for LiveKit loops before they compound

RunGuard wraps your LiveKit Agents worker with loop detection that trips the breaker at the first sign of reconnection cycling, VAD storms, duplicate job dispatch, or track cascade — before a 60-second bad session turns into a 20-minute billed incident. One import, four guards.
