Dust.tt Cost Control: Data Source Fan-Out, Retrieval Loops, and Chain Assistant Cascades

Dust.tt is an enterprise AI assistant platform where teams build and deploy AI assistants connected to internal knowledge bases — Notion, Slack, Google Drive, Confluence, GitHub, Intercom, and custom data sources. The value proposition is compelling: instead of copy-pasting context into a chat window, an assistant configured with your company's data sources can answer questions about internal processes, past decisions, and live project state by retrieving the right context automatically. Every user message triggers a retrieval pass across the configured data sources before the LLM sees the query. That retrieval-first architecture is what makes Dust.tt powerful — and it is the foundation of every unexpected cost pattern Dust.tt deployments encounter at scale.

Dust.tt sits on top of the model provider you choose: GPT-4o, Claude Sonnet, or Gemini Pro through Dust's model abstraction. LLM costs flow directly from Dust to your Anthropic, OpenAI, or Google API account (or are bundled into Dust's enterprise tier). Beyond the per-token model cost, every Dust query also pays for: embedding model calls to vectorize the user's query, similarity search across indexed document chunks, and context construction to inject retrieved chunks into the model's prompt window. These retrieval costs are typically small per query — but they multiply with the number of configured data sources and compound when retrieval quality is poor enough to trigger reformulation.

Four patterns account for the majority of unexpected Dust.tt costs at the workspace level:

  • Data source fan-out over-retrieval — an assistant configured with many data sources performs a retrieval pass across all of them for every user query, regardless of whether each source is relevant; a "company knowledge" assistant with 20 connected data sources pays for 20 retrieval passes on a question that only involves one of those sources.
  • Retrieval reformulation loop — when the user's question doesn't match indexed content, the assistant may reformulate its search query and retry; topics absent from the corpus trigger multiple reformulation round-trips, each paying embedding plus similarity search, before the assistant acknowledges the knowledge gap.
  • Chain assistant cascade — Dust supports invoking other assistants via @assistant-name references; a workflow where assistant A invokes B invokes C creates a cost tree where any loop at any node propagates to all waiting ancestors; there is no native recursion guard.
  • Multi-action cascade amplification — assistants configured with multiple action types (data source search, web search, Dust App execution) chain actions sequentially when earlier results are insufficient; an under-constrained assistant may execute 5–8 actions to answer a question that a well-scoped retrieval would resolve in one.

Dust.tt's cost model

In Dust.tt's enterprise tier, costs are bundled into a per-seat platform subscription. In workspace plans where you bring your own API keys, every assistant interaction breaks down into three billable layers:

  • Retrieval layer: each configured data source performs an embedding model call to vectorize the user's query (or the assistant's reformulated query), followed by an ANN (approximate nearest-neighbor) similarity search against the indexed document chunks. Embedding calls for text-embedding-3-small (OpenAI's standard embedding model) run at roughly $0.02 per million tokens — individually negligible, but 20 data sources × 1,000 daily queries × a 50-token average query = 1 million tokens in embedding calls per day.
  • Context construction: retrieved chunks from all data sources are ranked and injected into the model's context window before the LLM call. More data sources and broader search queries produce larger injected contexts. An assistant with 10 data sources that retrieves 5 chunks per source injects 50 chunks — potentially 15,000–25,000 tokens — into every LLM call, inflating per-query model costs significantly compared to a focused single-source retrieval.
  • LLM generation: the model receives the full constructed context (instructions + retrieved chunks + conversation history + user message) and generates a response. The input token cost scales linearly with the injected context size. A 20,000-token input on GPT-4o at $0.005/1K input tokens costs $0.10 per query — for a model call that might have cost $0.01 with a targeted 2,000-token context.
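The arithmetic in the three layers above can be checked with a short sketch. The prices are the figures quoted in this section, hard-coded as assumptions rather than live rates, and the function names are illustrative only.

```python
# Prices are the figures quoted in the text, not live rates (assumption).
EMBEDDING_USD_PER_MILLION_TOKENS = 0.02   # text-embedding-3-small figure used above
LLM_INPUT_USD_PER_1K_TOKENS = 0.005       # GPT-4o input figure used above


def daily_embedding_tokens(sources: int, daily_queries: int, tokens_per_query: int) -> int:
    """Each query is embedded once per configured data source."""
    return sources * daily_queries * tokens_per_query


def embedding_cost_usd(tokens: int) -> float:
    """Embedding spend for a given token count."""
    return tokens / 1_000_000 * EMBEDDING_USD_PER_MILLION_TOKENS


def llm_input_cost_usd(input_tokens: int) -> float:
    """Input-side LLM spend for one call with the given context size."""
    return input_tokens / 1_000 * LLM_INPUT_USD_PER_1K_TOKENS


tokens = daily_embedding_tokens(sources=20, daily_queries=1_000, tokens_per_query=50)
print(tokens)                                  # 1000000
print(round(embedding_cost_usd(tokens), 4))    # 0.02
print(round(llm_input_cost_usd(20_000), 4))    # 0.1
print(round(llm_input_cost_usd(2_000), 4))     # 0.01
```

Under these assumed rates the embedding layer stays around two cents a day, while the tenfold gap between the 20,000-token and 2,000-token calls is where over-retrieval shows up on the bill.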

A practical cost baseline: a Dust.tt assistant with 5 data sources answering a well-targeted query costs roughly $0.03–$0.06 per interaction (retrieval plus a moderate-context LLM call). The same assistant with 20 data sources answering a broad question that retrieves 10 chunks per source costs $0.15–$0.40 per interaction, roughly a 5–10× difference for the same user-facing value. At 500 daily workspace queries, that is $15–$30/day versus $75–$200/day in underlying API costs.

Failure mode 1: data source fan-out over-retrieval

Dust.tt's retrieval model queries all configured data sources in parallel for every user message. The design goal is comprehensiveness: the assistant shouldn't miss relevant context because the user didn't specify which data source to check. In practice, most questions have a natural home in one or two data sources. A question about vacation policy lives in the HR Notion workspace. A question about a specific customer issue lives in Intercom or the CRM Slack channel. The assistant doesn't know this routing in advance, so it queries all sources and lets the ranking model sort out relevance.

The fan-out cost scales linearly with the number of configured sources. Each additional data source adds an embedding call, a similarity search, and potentially more injected chunks to the LLM context, regardless of relevance. An assistant configured with "everything" — 15–20 data sources covering every team's Notion space, every Slack channel, Google Drive, GitHub, and a custom knowledge base — is paying for 15–20 retrieval passes per query, most of which return low-relevance chunks that inflate context without adding information value.

The fix is assistant scoping: separate assistants for separate knowledge domains rather than one assistant with all data sources. But for teams that want a general-purpose assistant, a query routing pre-check can classify the question and select a targeted subset of data sources before retrieval.

Python — data source selector to avoid fan-out over-retrieval
import anthropic
from runguard import BudgetTracker, BudgetExceededError

# Map domain labels to Dust data source IDs
DATA_SOURCE_MAP = {
    "hr_policy":     ["notion-hr-workspace", "confluence-hr"],
    "engineering":   ["github-repos", "notion-eng", "slack-engineering"],
    "customer":      ["intercom", "slack-customer-success", "notion-cs-playbooks"],
    "finance":       ["notion-finance", "google-drive-finance"],
    "general":       ["notion-company-wiki"],
}

def classify_query_domain(
    query: str,
    model: str = "claude-haiku-4-5-20251001",
) -> list[str]:
    """
    Uses a fast model to classify which data source domains are relevant
    to the query, returning a targeted list of Dust data source IDs.
    Much cheaper than fanning out across all sources.
    """
    client = anthropic.Anthropic()
    domain_list = ", ".join(DATA_SOURCE_MAP.keys())
    msg = client.messages.create(
        model=model,
        max_tokens=64,
        messages=[{
            "role": "user",
            "content": (
                f"Classify which knowledge domains are relevant to this query. "
                f"Domains: {domain_list}. "
                f"Reply with only a comma-separated list of relevant domain names. "
                f"Query: {query}"
            ),
        }],
    )
    raw = msg.content[0].text.strip().lower()
    selected = [d.strip() for d in raw.split(",") if d.strip() in DATA_SOURCE_MAP]
    if not selected:
        selected = ["general"]

    # Flatten to data source IDs, deduplicate
    source_ids: list[str] = []
    seen: set[str] = set()
    for domain in selected:
        for src in DATA_SOURCE_MAP[domain]:
            if src not in seen:
                source_ids.append(src)
                seen.add(src)
    return source_ids


def dust_query_with_scoped_sources(
    workspace_id: str,
    query: str,
    all_source_ids: list[str],
    session_budget_usd: float = 2.0,
    max_sources: int = 6,
) -> dict:
    """
    Classifies the query domain, selects relevant sources (up to max_sources),
    and builds a Dust API payload with the scoped source list.
    If classification returns no match, classify_query_domain falls back to
    the "general" domain. all_source_ids is accepted but not used here.
    """
    budget = BudgetTracker(cap=session_budget_usd)

    # Classification call: ~$0.0001 on Haiku; budgeted at 2x that as cheap insurance
    try:
        budget.add(0.0002)
    except BudgetExceededError:
        raise RuntimeError("Session budget already exhausted before query.")

    selected_sources = classify_query_domain(query)
    if len(selected_sources) > max_sources:
        selected_sources = selected_sources[:max_sources]

    # Estimate cost: $0.005 per source (embedding + chunk injection overhead)
    estimated_retrieval_cost = len(selected_sources) * 0.005
    try:
        budget.add(estimated_retrieval_cost)
    except BudgetExceededError as e:
        raise RuntimeError(
            f"Query blocked: estimated retrieval cost ${estimated_retrieval_cost:.3f} "
            f"would exceed session budget. Reduce max_sources or increase budget."
        ) from e

    # Build the Dust API payload with scoped data sources.
    # Sending the request is omitted; pass the payload to your Dust client.
    payload = {
        "query": query,
        "topK": 5,
        "fullText": False,
        "dataSources": [
            {"workspaceId": workspace_id, "dataSourceId": src}
            for src in selected_sources
        ],
    }
    return {"selected_sources": selected_sources, "payload": payload}

The classification call uses Claude Haiku 4.5 at sub-cent costs to route the query to 2–4 relevant data sources instead of fanning out across 15–20. For a workspace with 20 data sources and 500 daily queries, this pre-routing eliminates 80–90% of retrieval fan-out. The max_sources=6 cap prevents a broad query like "summarize everything about our Q3 strategy" from routing to all sources simultaneously — it forces prioritization.

The data source map is the key artifact to maintain. It should reflect how your team actually organizes knowledge, not an aspirational taxonomy. If engineers actually use Slack more than Notion for technical decisions, the engineering domain should weight slack-engineering higher. Wrong routing is worse than fan-out if it misses the one source where the answer lives — calibrate the map against 50–100 real queries before deploying.
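Calibrating the map against real queries amounts to measuring routing recall: how often the source that actually holds the answer ends up in the selected set. The sketch below assumes you have hand-labeled (query, expected source) pairs. `routing_recall` and the keyword-based stand-in router are hypothetical helpers for illustration; in practice you would pass the real classifier instead.

```python
from typing import Callable, Iterable


def routing_recall(
    labeled_queries: Iterable[tuple[str, str]],
    router: Callable[[str], list[str]],
) -> float:
    """
    Fraction of queries whose known-correct source appears in the routed set.
    labeled_queries: (query, expected_source_id) pairs collected by hand.
    """
    total = 0
    hits = 0
    for query, expected_source in labeled_queries:
        total += 1
        if expected_source in router(query):
            hits += 1
    return hits / total if total else 0.0


# Stand-in router for illustration only (assumption, not the real classifier).
def keyword_router(query: str) -> list[str]:
    q = query.lower()
    if "vacation" in q or "leave" in q:
        return ["notion-hr-workspace", "confluence-hr"]
    if "deploy" in q or "bug" in q:
        return ["github-repos", "notion-eng", "slack-engineering"]
    return ["notion-company-wiki"]


sample = [
    ("How many vacation days do I get?", "notion-hr-workspace"),
    ("Why did the last deploy fail?", "slack-engineering"),
    ("What is the budget for Q3?", "notion-finance"),
    ("Where is the parental leave policy?", "confluence-hr"),
]
recall = routing_recall(sample, keyword_router)
print(f"routing recall: {recall:.2f}")  # routing recall: 0.75
```

The miss on the finance question is the kind of gap this check is meant to surface before deployment: the router sends it to the general wiki and never reaches the source that holds the answer.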

Failure mode 2: retrieval reformulation loop

When a user's question does not match the indexed content in the configured data sources, Dust's assistant sees low-relevance retrieved chunks. The LLM's response to low-quality retrieval is to reformulate its search internally and try again — refining query terms, changing the search angle, or trying different keywords. Each reformulation triggers another round of embedding calls and similarity searches across the connected sources. For topics genuinely absent from the corpus — a question about a product decision made before documents were indexed, or a question about a domain the workspace hasn't added yet — this produces a reformulation loop that charges for 3–6 retrieval round-trips without improving the result quality.

The reformulation pattern is subtle because it is not visible in Dust's standard usage logs as a distinct event type. What appears in the logs is a single conversation turn with unusually high token consumption — the overhead of multiple retrieval passes is absorbed into the per-message cost without a clear attribution. Teams notice it as "some queries are much more expensive than others" without a clear cause until they instrument the embedding call count per conversation turn.
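Instrumenting the retrieval call count per turn can be as simple as a counter keyed by conversation and turn. This is a minimal sketch, assuming your integration layer has a point where each retrieval pass can be observed. `RetrievalCallCounter` and its methods are hypothetical names, not part of Dust's API.

```python
from collections import Counter


class RetrievalCallCounter:
    """Counts retrieval passes per (conversation, turn) pair."""

    def __init__(self) -> None:
        self._calls: Counter = Counter()

    def record(self, conversation_id: str, turn_index: int) -> None:
        """Call once for every retrieval pass your integration layer observes."""
        self._calls[(conversation_id, turn_index)] += 1

    def suspicious_turns(self, threshold: int = 3) -> list[tuple[str, int, int]]:
        """Turns with at least `threshold` retrieval passes, highest count first."""
        flagged = [
            (conv, turn, n)
            for (conv, turn), n in self._calls.items()
            if n >= threshold
        ]
        return sorted(flagged, key=lambda row: row[2], reverse=True)


counter = RetrievalCallCounter()
for _ in range(5):
    counter.record("conv-a", 0)      # five passes in one turn: likely a loop
counter.record("conv-a", 1)          # one pass: normal
for _ in range(3):
    counter.record("conv-b", 0)

print(counter.suspicious_turns())    # [('conv-a', 0, 5), ('conv-b', 0, 3)]
```

Joining the flagged turns against per-message token consumption is one way to confirm that the expensive turns are the ones with repeated retrieval passes.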

Python — retrieval quality monitor to detect reformulation loops
from runguard import LoopDetector, LoopDetectedError
import time

class DustRetrievalQualityGuard:
    """
    Monitors retrieval similarity scores across reformulation attempts.
    Trips circuit breaker when consecutive low-quality results indicate
    the topic is absent from indexed data sources.
    """

    # Below this similarity score = "not in corpus"
    MIN_USEFUL_SIMILARITY = 0.40

    def __init__(
        self,
        max_low_quality_attempts: int = 3,
        window_seconds: int = 60,
    ):
        self._detector = LoopDetector(
            max_consecutive=max_low_quality_attempts,
            window_seconds=window_seconds,
        )
        self._attempt_count = 0
        self._start_time = time.monotonic()

    def evaluate_retrieval(
        self,
        query: str,
        top_similarity_score: float,
        source_id: str = "",
    ) -> bool:
        """
        Returns True if retrieval quality is acceptable (proceed with LLM call).
        Returns False if low-quality but under the attempt limit (try reformulation).
        Raises RuntimeError if max low-quality attempts are exhausted.
        """
        self._attempt_count += 1

        if top_similarity_score >= self.MIN_USEFUL_SIMILARITY:
            self._detector.record_success()
            return True

        label = (
            f"low_similarity:{top_similarity_score:.3f} "
            f"source={source_id!r} query={query[:60]!r}"
        )
        try:
            self._detector.record_error(label)
        except LoopDetectedError:
            elapsed = time.monotonic() - self._start_time
            raise RuntimeError(
                f"Retrieval circuit breaker tripped after {self._attempt_count} "
                f"low-quality attempts in {elapsed:.1f}s. "
                f"Topic likely absent from indexed data sources. "
                f"Last query: {query!r}. "
                f"Recommend: acknowledge knowledge gap rather than continuing retrieval."
            )

        return False  # permit one reformulation attempt


class DustConversationGuard:
    """
    Per-conversation wrapper that resets the retrieval quality guard
    between conversation turns while preserving turn-level state.
    """

    def __init__(self, max_reformulations_per_turn: int = 3):
        self._max = max_reformulations_per_turn
        self._turn_guard: DustRetrievalQualityGuard | None = None

    def start_turn(self) -> DustRetrievalQualityGuard:
        self._turn_guard = DustRetrievalQualityGuard(
            max_low_quality_attempts=self._max,
        )
        return self._turn_guard

    def end_turn(self) -> None:
        self._turn_guard = None

The guard wraps the retrieval step before the results are passed to the LLM. After three consecutive low-similarity retrievals on the same query topic, it raises rather than permitting another reformulation. The LLM then receives the guard's exception as its "retrieval result" and can formulate a response like "I don't have indexed information about this topic" — which is the correct, zero-waste answer for a knowledge gap, rather than a response assembled from irrelevant chunks after six retrieval attempts.

The MIN_USEFUL_SIMILARITY threshold of 0.40 should be calibrated against your specific embedding model and corpus density. A dense, well-curated knowledge base will have most relevant results returning scores above 0.65; consistently sub-0.40 scores across multiple reformulations reliably indicate a topic gap. Calibrate on 20–30 "known gap" queries from your team (questions you know aren't in your data sources) and set the threshold at the 90th percentile of those scores.
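The calibration step can be sketched as a nearest-rank percentile over the top similarity score of each known-gap query. The scores below are made-up numbers for illustration, and `percentile_nearest_rank` is a hypothetical helper; other percentile definitions (for example, interpolating ones) would give slightly different thresholds.

```python
import math


def percentile_nearest_rank(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: the value at rank ceil(pct% of n) in sorted order."""
    if not values:
        raise ValueError("need at least one score")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Top similarity score for each known-gap query (illustrative values only).
known_gap_scores = [0.21, 0.25, 0.28, 0.30, 0.31, 0.33, 0.35, 0.36, 0.38, 0.44]

threshold = percentile_nearest_rank(known_gap_scores, 90)
print(threshold)  # 0.38
```

With this sample, nine of the ten known-gap queries score at or below the resulting threshold. The one outlier at 0.44 would still pass as a usable result, which is the trade-off of not setting the bar at the maximum.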

Failure mode 3: chain assistant cascade

Dust.tt supports multi-assistant workflows through @assistant-name invocations — one assistant can call another assistant as part of generating its response. This enables composable architectures: a "research coordinator" assistant that invokes a "data analyst" assistant for quantitative questions and a "policy checker" assistant for compliance questions. Each invoked assistant runs a full Dust interaction cycle: retrieval across its configured data sources, context construction, and an LLM generation call. The cost of the root assistant interaction is the sum of all invoked assistant costs plus its own.

The cascade risk emerges from two patterns. First, unbounded depth: if the data analyst assistant is itself configured to call a "data source connector" assistant for raw queries, a three-level invocation tree (coordinator → analyst → connector) pays three full assistant interaction costs per root query. At each level, any retrieval or generation quality issue multiplies. Second, unbounded width: if the coordinator invokes 4 specialist assistants in parallel, and two of those enter retrieval reformulation loops (failure mode 2), the cost of the two looping assistants is added to the coordinator's total response cost while the coordinator waits.
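One way to see the size of an invocation tree before it runs is to walk a hand-maintained map of which assistant references which. The sketch below assumes you keep such a map yourself and that every configured reference is invoked once per call. `count_invocations` is a hypothetical helper, and the assistant names are the examples from this section.

```python
def count_invocations(graph: dict[str, list[str]], root: str) -> int:
    """
    Number of assistant interactions one root query triggers, assuming every
    configured @assistant reference is invoked once per call.
    Raises ValueError if a cycle is reachable from root.
    """
    def visit(node: str, path: tuple[str, ...]) -> int:
        if node in path:
            cycle = " -> ".join(path[path.index(node):] + (node,))
            raise ValueError(f"invocation cycle: {cycle}")
        children = graph.get(node, [])
        return 1 + sum(visit(child, path + (node,)) for child in children)

    return visit(root, ())


chain = {
    "coordinator": ["analyst", "policy-checker"],
    "analyst": ["connector"],
}
print(count_invocations(chain, "coordinator"))  # 4

looped = {"coordinator": ["analyst"], "analyst": ["coordinator"]}
try:
    count_invocations(looped, "coordinator")
except ValueError as err:
    print(err)  # invocation cycle: coordinator -> analyst -> coordinator
```

Multiplying the invocation count by an estimated per-call cost gives a rough ceiling for one root query. A cycle means there is no ceiling at all unless something at runtime stops it.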

Python — assistant invocation depth and cost tracker
from runguard import BudgetTracker, BudgetExceededError
import threading
from typing import Optional

class DustChainGuard:
    """
    Tracks assistant invocation depth and cumulative cost across
    a chain of Dust @assistant references.
    Thread-safe for parallel assistant invocations in the same chain.
    """

    def __init__(
        self,
        max_depth: int = 3,
        max_parallel_invocations: int = 4,
        chain_budget_usd: float = 1.0,
        cost_per_assistant_call_usd: float = 0.08,
    ):
        self.max_depth = max_depth
        self.max_parallel = max_parallel_invocations
        self._budget = BudgetTracker(cap=chain_budget_usd)
        self._cost_per_call = cost_per_assistant_call_usd
        self._active = 0
        self._lock = threading.Lock()

    def request_invocation(
        self,
        caller: str,
        callee: str,
        depth: int,
    ) -> None:
        """
        Called before each @assistant invocation.
        Enforces depth limit, parallel limit, and budget check.
        """
        if depth > self.max_depth:
            raise RuntimeError(
                f"Assistant chain depth {depth} exceeds max_depth={self.max_depth}. "
                f"Blocked: {caller!r} → {callee!r}. "
                f"Refactor the chain to reduce nesting."
            )

        with self._lock:
            if self._active >= self.max_parallel:
                raise RuntimeError(
                    f"Chain blocked: {self._active} assistant invocations already "
                    f"active (max_parallel={self.max_parallel}). "
                    f"Wait for in-flight assistants to complete."
                )
            try:
                self._budget.add(self._cost_per_call)
            except BudgetExceededError as e:
                raise RuntimeError(
                    f"Chain budget exhausted: invoking {callee!r} from {caller!r} "
                    f"at depth {depth} would exceed chain budget. "
                    f"Current spend: ${e.spent:.3f} / ${e.cap:.2f}."
                ) from e
            self._active += 1

    def release_invocation(self) -> None:
        with self._lock:
            self._active = max(0, self._active - 1)


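# Usage: wrap your Dust API invocation layer.
# A module-level guard is shown for brevity; create one guard per root query
# so the chain budget resets between user requests.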
guard = DustChainGuard(max_depth=3, max_parallel_invocations=3, chain_budget_usd=0.75)

def invoke_dust_assistant(
    workspace_id: str,
    assistant_id: str,
    message: str,
    caller: str = "user",
    depth: int = 1,
) -> str:
    guard.request_invocation(caller=caller, callee=assistant_id, depth=depth)
    try:
        # Call Dust API: POST /api/v1/w/{workspace}/assistant/conversations
        # (implementation omitted — wrap your actual Dust client here)
        result = _call_dust_api(workspace_id, assistant_id, message)
        return result
    finally:
        guard.release_invocation()

The guard enforces a depth ceiling of 3 by default, enough for a coordinator → specialist → data-connector pattern while preventing runaway recursion. The parallel ceiling (4 by default, 3 in the usage example) prevents the coordinator from fanning out to an unbounded number of specialists simultaneously. The budget pre-commitment at each invocation ensures the chain stops before a cost spike rather than after.

The cost_per_assistant_call_usd of $0.08 is a conservative estimate for an assistant with 4–6 data sources answering a non-trivial query on GPT-4o — calibrate this against your actual observed per-call costs from the Dust API response headers or your model provider's usage dashboard. The budget check uses this estimate to fail fast when the chain is running deep or wide on expensive models.

Failure mode 4: multi-action cascade amplification

Dust.tt assistants can be configured with multiple action types: data source search, web search (via Dust's integrated search action), and Dust App execution (custom workflows built in Dust's visual editor). The intent is flexibility — a research assistant that can pull from internal knowledge, then supplement with real-time web search, then run a calculation in a Dust App. When the assistant is well-scoped, it picks the right action for the query and executes once. When the assistant is under-constrained — given a broad goal with many tools available and no instruction to prefer targeted over comprehensive — it chains actions sequentially, using each result to justify requesting more information from the next action.

A concrete cascade: a user asks "what does our Q3 pipeline look like compared to last year?" The assistant searches the CRM data source (low match — the CRM isn't indexed with natural language). It searches the finance Notion space (partial match — some numbers but incomplete). It falls back to web search to find comparable industry benchmarks (tangentially relevant). It runs a Dust App to normalize the partial numbers it retrieved (partial output). By this point it has chained 4 actions and injected the output of each into its growing context window, paying 4× the single-action cost for a response that ultimately acknowledges it has incomplete data.

Python — multi-action step counter and budget gate
from runguard import BudgetTracker, BudgetExceededError

class DustMultiActionGuard:
    """
    Tracks the number of sequential actions executed in a single
    Dust assistant turn and gates further actions on budget.
    Prevents under-constrained assistants from chaining actions indefinitely.
    """

    ACTION_COSTS_USD = {
        "data_source_search": 0.015,   # embedding + retrieval + context injection
        "web_search":         0.025,   # external search API + result parsing
        "dust_app":           0.040,   # app execution + result formatting
        "llm_synthesis":      0.060,   # synthesis call over accumulated context
    }

    def __init__(
        self,
        max_actions_per_turn: int = 4,
        turn_budget_usd: float = 0.30,
    ):
        self.max_actions = max_actions_per_turn
        self._budget = BudgetTracker(cap=turn_budget_usd)
        self._action_count = 0
        self._action_log: list[str] = []

    def before_action(self, action_type: str, description: str = "") -> None:
        """
        Called before each Dust action in the turn.
        Raises RuntimeError if action count or budget limit reached.
        """
        self._action_count += 1
        label = f"{action_type}:{description[:40]}" if description else action_type
        self._action_log.append(label)

        if self._action_count > self.max_actions:
            # The action just logged is the one being blocked, so report only
            # the actions that actually ran.
            raise RuntimeError(
                f"Multi-action limit reached: action #{self._action_count} blocked "
                f"(max={self.max_actions}). "
                f"Actions completed: {', '.join(self._action_log[:-1])}. "
                f"Synthesize from available results rather than requesting more actions."
            )

        cost = self.ACTION_COSTS_USD.get(action_type, 0.03)
        try:
            self._budget.add(cost)
        except BudgetExceededError as e:
            raise RuntimeError(
                f"Turn budget exhausted at action #{self._action_count} ({action_type}). "
                f"Spend: ${e.spent:.3f} / ${e.cap:.2f}. "
                f"Actions completed: {', '.join(self._action_log[:-1])}. "
                f"Synthesize from those results now."
            ) from e

    def summary(self) -> dict:
        return {
            "action_count": self._action_count,
            "actions": self._action_log,
            "spend_usd": self._budget.spent,
        }

The guard caps the action chain at 4 steps per turn and at $0.30 in accumulated action costs. Both are configurable: a research-heavy assistant that legitimately needs data source search plus web search plus synthesis can set max_actions_per_turn=5, while a simple FAQ assistant should use max_actions_per_turn=2. The action cost table reflects rough per-action economics. Web search is more expensive than data source search because it typically involves an external API call and a secondary parsing step. Dust App execution is the most expensive of the three action types because it runs a custom workflow that may itself make LLM calls. The synthesis call carries the highest estimate overall because it runs over the accumulated context.

The key architectural fix for this failure mode is assistant scoping at the instruction level, not just the guard level. Dust assistant instructions should include explicit guidance like "if the first data source search does not return relevant results, summarize what you know and ask the user to clarify rather than searching additional sources." The guard is a runtime backstop for under-constrained prompts in production; the instruction is the prevention layer that reduces how often the guard fires.

Putting the guards together

The four Dust.tt failure modes compound. A broad query fans out across 20 data sources (mode 1), most of which return low-similarity results that trigger reformulation attempts (mode 2), one of which causes the assistant to invoke a specialist assistant for a second opinion (mode 3), which itself chains a data source search plus a web search to fill the knowledge gap (mode 4). The query that should have cost $0.05 (a targeted retrieval from one source plus a single LLM call) costs $0.80–$1.50 as the four modes stack. At 200 such queries per day in a mid-size enterprise workspace, that is roughly $150–$290 per day in unnecessary API overhead.

| Failure mode | Detection point | Guard mechanism | Typical cost without guard |
| --- | --- | --- | --- |
| Data source fan-out: all N sources queried for every message | Before retrieval call | Query domain classifier → scoped source list | 5–20× retrieval cost vs. targeted query |
| Retrieval reformulation loop: low-quality results trigger repeated re-queries | After each retrieval result | Similarity threshold + consecutive low-score detector | 3–6 extra retrieval round-trips per topic-absent query |
| Chain assistant cascade: @assistant invocations create recursive cost trees | Before each @assistant invocation | Depth limit + parallel ceiling + pre-committed budget | N× root interaction cost where N = chain length |
| Multi-action amplification: under-constrained assistants chain 5–8 actions | Before each action execution | Action step counter + per-turn budget gate | 3–8× single-action cost per affected turn |

For Dust.tt workspaces with moderate query volume (100–500/day), the highest-priority guard is the data source fan-out fix (mode 1) — it reduces the baseline cost of every query, not just edge cases. The second priority is the multi-action guard (mode 4), which catches the "comprehensive but expensive" assistant behavior pattern that is common in first-generation assistant configurations. The retrieval reformulation guard (mode 2) and chain cascade guard (mode 3) address lower-frequency but higher-severity events — when they fire, they prevent $1–$5 single-query costs that otherwise appear as anomalies in the usage report without clear attribution.

Teams running Dust.tt alongside code-level agent frameworks — LangGraph for structured agent workflows or OpenAI Agents SDK for tool-calling patterns — often connect those agents to Dust assistants via the Dust API, adding a fifth layer of cost exposure where framework-level loops and Dust retrieval costs compound. Instrument the Dust API call boundary in your framework with a budget tracker and treat each Dust API call as a unit cost to be tracked and capped alongside the framework's own LLM calls.
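Treating each Dust API call as a unit cost at the framework boundary can be sketched with a small decorator. This is an illustration under stated assumptions: `CallBudget`, `metered`, and `ask_dust` are hypothetical names, the flat $0.08 per call is the estimate used earlier in this article, and the wrapped function is a placeholder for the real Dust call made from a framework tool or node.

```python
import functools
from typing import Callable, TypeVar

T = TypeVar("T")


class CallBudget:
    """Minimal per-run spend cap; a stand-in for whatever tracker you already use."""

    def __init__(self, cap_usd: float) -> None:
        self.cap_usd = cap_usd
        self.spent_usd = 0.0

    def charge(self, amount_usd: float) -> None:
        projected = self.spent_usd + amount_usd
        if projected > self.cap_usd:
            raise RuntimeError(f"budget exceeded: {projected:.2f} > {self.cap_usd:.2f}")
        self.spent_usd = projected


def metered(
    budget: CallBudget, unit_cost_usd: float
) -> Callable[[Callable[..., T]], Callable[..., T]]:
    """Charge a flat estimated cost before every call to the wrapped function."""
    def decorator(fn: Callable[..., T]) -> Callable[..., T]:
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            budget.charge(unit_cost_usd)  # fail before the call, not after
            return fn(*args, **kwargs)
        return wrapper
    return decorator


run_budget = CallBudget(cap_usd=0.20)


@metered(run_budget, unit_cost_usd=0.08)
def ask_dust(question: str) -> str:
    # Placeholder for the real Dust API call (assumption).
    return f"answer to {question!r}"


ask_dust("q1")
ask_dust("q2")
try:
    ask_dust("q3")  # a third call would bring spend to 0.24, over the 0.20 cap
except RuntimeError as err:
    print(err)      # budget exceeded: 0.24 > 0.20
print(round(run_budget.spent_usd, 2))  # 0.16
```

Sharing one budget object between the framework's own LLM calls and the Dust calls keeps both kinds of spend under a single cap for the run.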

Frequently asked questions

Does Dust.tt provide built-in usage limits per assistant or workspace?

Dust.tt's enterprise tier includes workspace-level usage quotas that can be configured by workspace administrators — limits on message volume per day or per month, and per-assistant rate limits on invocation frequency. These caps prevent runaway usage at the workspace level but do not detect loop patterns within a single conversation turn. A quota that allows 1,000 messages per day does not prevent a single conversation from consuming $5–$10 in a single turn due to a retrieval reformulation loop or a chain assistant cascade. Per-turn guards (modes 2, 3, and 4 above) operate at the interaction level and catch cost spikes that workspace quotas miss.

How do I measure actual per-query costs in Dust.tt when using bring-your-own-key?

When using your own Anthropic or OpenAI API key with Dust.tt, costs appear in the model provider's usage dashboard attributed to your API key, not broken down by Dust assistant or conversation. To measure per-query costs, enable detailed usage logging at the API key level (OpenAI's Usage API or Anthropic's usage events) and correlate timestamps with Dust conversation IDs. After 50–100 queries across your most-used assistants, calculate the p50 and p95 cost per assistant. The p95 cost, driven by the expensive outlier queries, is where retrieval loops and multi-action cascades concentrate. The gap between p50 and p95 shows how much your worst-case failure modes add on top of a typical query.
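Once costs are attributed to assistants, the p50 and p95 calculation is a few lines. The sketch below assumes you have already produced (assistant ID, cost) pairs from the timestamp correlation. The record values are invented for illustration, `cost_profile` is a hypothetical helper, and the percentile uses the nearest-rank definition.

```python
import math
from collections import defaultdict


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def cost_profile(records: list[tuple[str, float]]) -> dict[str, dict[str, float]]:
    """records: (assistant_id, cost_usd) pairs, one per query."""
    by_assistant: dict[str, list[float]] = defaultdict(list)
    for assistant_id, cost_usd in records:
        by_assistant[assistant_id].append(cost_usd)
    return {
        assistant_id: {"p50": percentile(costs, 50), "p95": percentile(costs, 95)}
        for assistant_id, costs in by_assistant.items()
    }


# Illustrative data: 18 ordinary queries and 2 expensive outliers.
records = [("support-bot", 0.04)] * 18 + [("support-bot", 0.90), ("support-bot", 1.20)]

profile = cost_profile(records)
print(profile["support-bot"])  # {'p50': 0.04, 'p95': 0.9}
```

In this made-up sample the p95 is more than twenty times the p50, the kind of gap that points at a loop or cascade, not at ordinary variation in query size.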

Is the query domain classifier approach (failure mode 1 guard) accurate enough to trust for production routing?

The classifier approach works well for workspaces with clear domain boundaries — HR vs. engineering vs. finance — and where the data source taxonomy maps reasonably to how users phrase questions. For workspaces where cross-domain questions are common ("what's our engineering headcount and what's the budget for it?"), a domain-routing classifier will frequently select both the HR and finance domains, which is the correct behavior — the key is that it still excludes the 15 other irrelevant sources. The main risk is false negatives: classifying a cross-domain question as single-domain and missing the second relevant source. Calibrate by running the classifier against 50 real user queries and measuring how often the correct data source is in the selected set. A recall rate above 90% on your real query distribution makes the classifier safe for production use.

How do chain assistant cascades appear in Dust.tt's audit logs?

In Dust.tt's workspace audit log, each assistant interaction appears as a separate event with its own message ID, conversation ID, and assistant ID. When assistant A invokes assistant B via an @mention, B's interaction appears as a separate audit event with a different conversation context — the invocation chain is implicit in the timing and content, not explicit in a parent-child relationship field. This makes cascade cost attribution difficult without instrumenting the Dust API call layer directly. The chain guard approach above (tracking depth and cost at the API call boundary in your integration layer) is more reliable than reconstructing cascades from Dust's audit log after the fact.

What is the difference between Dust.tt's data source fan-out and retrieval performance in other RAG platforms like Glean or Guru?

Glean and Guru are search-first platforms where retrieval is the primary function and LLM generation is secondary. Their cost model is dominated by indexing and search infrastructure costs, not per-query LLM calls. Dust.tt's cost model is LLM-first: retrieval is a preprocessing step for an LLM assistant that generates a full response. This means Dust.tt's per-query costs are more sensitive to retrieval quality (poor retrieval inflates the LLM input context) and to LLM model choice (a frontier model such as GPT-4o and a small model such as Claude Haiku 4.5 can differ severalfold in per-token price). The fan-out pattern exists in all multi-source RAG systems, but its cost consequence is larger in Dust.tt because each additional retrieved chunk adds to the LLM's input token count at model rates instead of only adding search infrastructure overhead.

Stop Dust.tt cost spikes before they hit your API bill

RunGuard's circuit breaker SDK wraps the Dust API call layer to detect retrieval reformulation loops, chain assistant cascades, and multi-action amplification at runtime — so knowledge gaps produce clean "I don't know" responses instead of expensive multi-round retrieval marathons.
