Salesforce Agentforce Cost Control: Loop Detection and Budget Enforcement in Production

Salesforce Agentforce runs on the Atlas reasoning engine — a multi-step planner that selects topics, chooses actions, executes them via the Agentforce Action framework, and iterates until the user's goal is met or a session step limit is reached. The platform ships with configurable per-session step counts and topic-level instruction guardrails. What it doesn't ship with is a real-time circuit breaker: nothing that detects an action call spiral while it's spinning, enforces a per-session token retrieval budget, prevents write actions from executing twice when the model misreads a result, or stops an escalation action from hammering a full Omni-Channel queue.

For teams building production Agentforce deployments in Service Cloud, Sales Cloud, or Slack, that gap translates directly into cost and quality issues. Agentforce sessions bill against your Einstein AI consumption quota. A runaway action spiral that exhausts the session step limit before reaching a resolution burns quota and delivers a dead-end response to your customer. A write action repeated three times because the model didn't recognize the first call's success creates duplicate records that customer ops teams then spend hours cleaning up.

This post covers four failure modes specific to Agentforce's architecture, with Apex implementations of guards for each one. Agentforce actions are implemented as Apex @InvocableMethod classes or declarative Flows. The guards described here slot into the Apex action classes; the only configuration change they need is a session ID passed to each action as an input. If you'd rather not maintain the guard layer yourself, the final section shows how to call RunGuard from any Agentforce action via an Apex HTTP callout.

How Salesforce Agentforce works

Agentforce agents are configured through three primitives:

  1. Topics — knowledge domains with a description, natural-language instructions, and a list of associated actions. The Atlas engine selects a topic based on the user's message and conversation context.
  2. Actions — what the agent can do within a topic: Apex @InvocableMethod classes, declarative Flows, Prompt Template actions, and built-in Salesforce platform actions (search Knowledge, create record, send email, escalate to human).
  3. The Atlas reasoning engine — the underlying LLM-powered planner. On each step, Atlas decides whether to call an action, respond directly to the user, or end the session. The decision is made from the full conversation history, topic instructions, and available action schemas.

Agentforce sessions have a configurable maximum step count (default 50, configurable per agent in Setup). A "step" is one Atlas decision cycle: one action call plus the model reasoning to call it. A session that hits the step limit ends with a generic fallback message. Sessions that loop without hitting the step limit accumulate LLM inference cost proportional to conversation history length — each subsequent step sends the full history to the model, so the cost per step grows as the session lengthens.
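
To make that growth concrete, here is a back-of-the-envelope sketch in plain Java (chosen so it runs outside a Salesforce org). The 1,500-token base prompt and the 400 tokens appended per step are illustrative assumptions, not measured Agentforce figures, and the class and method names are hypothetical.

```java
// Illustrative arithmetic only: baseTokens and tokensPerStep are assumed values.
final class SessionCostSketch {

    /** Input tokens sent to the model on a given step (1-based). */
    static long inputTokensAtStep(int step, long baseTokens, long tokensPerStep) {
        // Each step resends the base prompt plus everything earlier steps appended.
        return baseTokens + (long) (step - 1) * tokensPerStep;
    }

    /** Total input tokens across the first `steps` steps. */
    static long cumulativeInputTokens(int steps, long baseTokens, long tokensPerStep) {
        long total = 0;
        for (int step = 1; step <= steps; step++) {
            total += inputTokensAtStep(step, baseTokens, tokensPerStep);
        }
        return total;
    }

    public static void main(String[] args) {
        for (int steps : new int[] {5, 10, 25, 50}) {
            System.out.println(steps + " steps -> "
                + cumulativeInputTokens(steps, 1500, 400) + " input tokens");
        }
    }
}
```

Under these assumptions a 10-step session sends 33,000 input tokens in total and a 50-step session sends 565,000: about 17 times the cost for 5 times the steps, because the total grows quadratically with session length.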

The gap: Agentforce's step limit is a blunt backstop, not a circuit breaker. It counts steps, not semantic repetition. It fires after the damage is done. There is no built-in mechanism to detect that your agent has called the same action fifteen times with semantically identical parameters, or that a write action succeeded on step 2 but the model called it again on step 4 because the success confirmation was ambiguous.

Failure mode 1: Action call spiral

This is the most common Agentforce failure in knowledge base and case-resolution agents. The Atlas engine calls a retrieval or lookup action, receives a result that partially addresses the user's question, and calls a similar action on the next step: either the same action with a refined query, or a different action from the same topic that retrieves overlapping data. When no action produces a result that lets the model synthesize a final answer (because the data is fragmented or the user's query is underspecified), the engine keeps trying variations until the step limit fires.

In a Service Cloud knowledge base agent, this looks like:

  • Step 1: SearchKnowledge("password reset steps account locked")
  • Step 2: SearchKnowledge("how to reset password locked account")
  • Step 3: SearchKnowledge("account unlock password recovery portal")
  • Step 4: SearchKnowledge("locked user account reset credentials")
  • …(continues until step limit)

Each query is a syntactic variation on the same underlying information need, so exact-string deduplication doesn't catch it. The guard needs to compare each call against a rolling window of recent calls to the same action. In Apex, we approximate semantic similarity with Jaccard similarity over normalized keyword sets (token overlap, not embeddings), storing the rolling window in Platform Cache keyed by session ID and action name:

public class ActionSpiralGuard {

    private static final Integer WINDOW_SIZE = 4;
    private static final Double SIMILARITY_THRESHOLD = 0.72;
    private static final Integer MIN_HIGH_SIM_PAIRS = 2;
    private static final Set<String> STOP_WORDS = new Set<String>{
        'the','a','an','of','in','for','to','and','or','is','are',
        'was','were','be','been','with','how','what','when','where'
    };

    public static void check(String sessionId, String actionName, String queryText) {
        Cache.OrgPartition part = Cache.Org.getPartition('local.agentGuard');
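        // Platform Cache keys must be alphanumeric and at most 50 characters,
        // so derive the key from a SHA-256 digest of the session ID and action name.
        String cacheKey = 'spiral' + EncodingUtil.convertToHex(
            Crypto.generateDigest('SHA-256', Blob.valueOf(sessionId + '|' + actionName))
        ).left(40);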

        List<String> serializedHistory = new List<String>();
        String cached = (String) part.get(cacheKey);
        if (cached != null) {
            serializedHistory = (List<String>) JSON.deserialize(cached, List<String>.class);
        }

        Set<String> incoming = tokenize(queryText);

        if (serializedHistory.size() >= 2) {
            Integer highSimCount = 0;
            for (String pastSerialized : serializedHistory) {
                Set<String> past = new Set<String>(pastSerialized.split(','));
                if (jaccardSimilarity(incoming, past) >= SIMILARITY_THRESHOLD) {
                    highSimCount++;
                }
            }
            if (highSimCount >= MIN_HIGH_SIM_PAIRS) {
                throw new AgentGuardException(
                    'ActionSpiralGuard: action "' + actionName + '" called ' +
                    highSimCount + ' times with semantically similar parameters ' +
                    '(threshold ' + SIMILARITY_THRESHOLD + '). ' +
                    'Session: ' + sessionId
                );
            }
        }

        serializedHistory.add(String.join(new List<String>(incoming), ','));
        // Apex List has no subList(); drop the oldest entries to keep the window bounded.
        while (serializedHistory.size() > WINDOW_SIZE) {
            serializedHistory.remove(0);
        }
        part.put(cacheKey, JSON.serialize(serializedHistory), 3600);
    }

    private static Set<String> tokenize(String text) {
        Set<String> tokens = new Set<String>();
        for (String word : text.toLowerCase().replaceAll('[^a-z0-9 ]', ' ').split('\\s+')) {
            if (word.length() > 2 && !STOP_WORDS.contains(word)) {
                tokens.add(word);
            }
        }
        return tokens;
    }

    private static Double jaccardSimilarity(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) return 1.0;
        Set<String> union = new Set<String>(a);
        union.addAll(b);
        if (union.isEmpty()) return 0.0;
        Set<String> intersection = new Set<String>(a);
        intersection.retainAll(b);
        return (Double) intersection.size() / union.size();
    }

    public class AgentGuardException extends Exception {}
}
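
To see how the tokenizer and threshold treat the four example queries from the Service Cloud walkthrough, the scoring can be reproduced off-platform. The following is a plain-Java re-implementation of the guard's tokenize and jaccardSimilarity logic (Java rather than Apex so that it runs without an org). The class name SpiralScoreSketch is hypothetical, and the sketch is for inspection only, not a replacement for the guard.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Off-platform copy of the guard's tokenizer and Jaccard score, used only to
// inspect how the example queries score against each other.
final class SpiralScoreSketch {

    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
        "the", "a", "an", "of", "in", "for", "to", "and", "or", "is", "are",
        "was", "were", "be", "been", "with", "how", "what", "when", "where"));

    /** Lowercase, strip punctuation, drop stop words and words of 2 characters or fewer. */
    static Set<String> tokenize(String text) {
        Set<String> tokens = new TreeSet<>();
        for (String word : text.toLowerCase().replaceAll("[^a-z0-9 ]", " ").split("\\s+")) {
            if (word.length() > 2 && !STOP_WORDS.contains(word)) {
                tokens.add(word);
            }
        }
        return tokens;
    }

    /** |A intersect B| / |A union B|; two empty sets are treated as identical. */
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) return 1.0;
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        String[] queries = {
            "password reset steps account locked",
            "how to reset password locked account",
            "account unlock password recovery portal",
            "locked user account reset credentials"
        };
        // Print the similarity of every pair of steps.
        for (int i = 0; i < queries.length; i++) {
            for (int j = i + 1; j < queries.length; j++) {
                System.out.printf("step %d vs step %d: %.2f%n", i + 1, j + 1,
                    jaccard(tokenize(queries[i]), tokenize(queries[j])));
            }
        }
    }
}
```

With this tokenizer, step 1 versus step 2 scores 0.80, and every other pair lands between 0.11 and 0.50. Only one pair clears the 0.72 default, so with MIN_HIGH_SIM_PAIRS = 2 this particular sequence would not trip the guard. Token overlap catches reordering and light rewording well, but synonym swaps (unlock for reset, credentials for password) pull the score down quickly, which is worth keeping in mind when choosing the threshold.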

To use this guard inside your knowledge search action:

public class SearchKnowledgeAction {
    @InvocableMethod(label='Search Knowledge Base' description='Search knowledge articles')
    public static List<SearchResult> execute(List<SearchInput> inputs) {
        SearchInput inp = inputs[0];

        // Guard: raises AgentGuardException if spiral detected
        ActionSpiralGuard.check(inp.sessionId, 'SearchKnowledge', inp.query);

        // Normal action logic follows
        List<Knowledge__kav> articles = [
            SELECT Title, Summary__c, Body__c
            FROM Knowledge__kav
            WHERE PublishStatus = 'Online'
            AND Language = 'en_US'
            AND (Title LIKE :('%' + inp.query + '%')
                 OR Keywords__c LIKE :('%' + inp.query + '%'))
            LIMIT 5
        ];
        // ... build and return SearchResult list
    }
}

The sessionId input field must be passed through from the Agentforce session context. In Agentforce, you can make the session ID available to actions by including it as an input parameter in the action's Apex class and mapping it in the topic's action instructions. Set SIMILARITY_THRESHOLD between 0.65 and 0.80. Below 0.65 you'll see false positives on legitimate exploratory searches. Above 0.80 the guard misses spirals where the model varies vocabulary enough to drop similarity while still looping semantically.

Failure mode 2: Write action idempotency failure

Agentforce write actions (create record, update opportunity stage, send email, create case) are the most dangerous actions to repeat. A repeated search wastes quota; a repeated write produces duplicate data: three identical support cases, two copies of the same outbound email, an opportunity moved to "Closed Won" twice and then moved back by a confused subsequent step.

The failure pattern: the action executes successfully, but the Atlas engine reads the confirmation message as ambiguous or as a failure. For example, a create-case action returns "Case 00001234 created". The model may reason: "The case number looks like a placeholder. Let me verify by creating it again." Or the action's return message is too terse ("Done.") and the model calls the action a second time to confirm.

The guard hashes the action name plus canonicalized input parameters. On first call, it stores the hash and the result. On a repeated call with the same hash, it returns the cached result immediately without re-executing the action:

public class WriteIdempotencyGuard {

    public static String checkOrStore(
        String sessionId,
        String actionName,
        Map<String, Object> params,
        String resultIfNew
    ) {
        Cache.OrgPartition part = Cache.Org.getPartition('local.agentGuard');

        // Canonical parameter string: sort keys, then join key=value pairs
        List<String> sortedKeys = new List<String>(params.keySet());
        sortedKeys.sort();
        String canonical = '';
        for (String k : sortedKeys) {
            canonical += k + '=' + String.valueOf(params.get(k)) + ';';
        }

        // Hash with SHA-256 via Crypto.generateDigest rather than the 32-bit
        // String.hashCode(), which can collide. The hex digest also keeps the
        // Platform Cache key alphanumeric and under the 50-character key limit.
        String cacheKey = 'idem' + EncodingUtil.convertToHex(
            Crypto.generateDigest('SHA-256',
                Blob.valueOf(sessionId + '|' + actionName + '|' + canonical))
        ).left(40);

        String cachedResult = (String) part.get(cacheKey);
        if (cachedResult != null) {
            // Already executed — return cached result, skip re-execution
            return '[ALREADY EXECUTED — returning cached result] ' + cachedResult;
        }

        // First execution — store result in cache
        part.put(cacheKey, resultIfNew, 3600);
        return resultIfNew;
    }
}
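
The property that matters in the canonicalization step is that two calls with the same parameters produce the same key regardless of map ordering. A minimal way to check that off-platform is sketched below in plain Java, using a SHA-256 digest of the canonical string as the key. The class and method names are hypothetical and the 40-character truncation is an illustrative choice.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

final class IdempotencyKeySketch {

    /** Builds "k1=v1;k2=v2;" with keys in sorted order so map ordering cannot change the key. */
    static String canonicalize(Map<String, Object> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Object> e : new TreeMap<>(params).entrySet()) {
            sb.append(e.getKey()).append('=').append(String.valueOf(e.getValue())).append(';');
        }
        return sb.toString();
    }

    /** "idem" plus the first 40 hex characters of SHA-256(session | action | canonical params). */
    static String idempotencyKey(String sessionId, String actionName, Map<String, Object> params) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(
                (sessionId + "|" + actionName + "|" + canonicalize(params))
                    .getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return "idem" + hex.substring(0, 40);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every Java runtime", e);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> first = new LinkedHashMap<>();
        first.put("subject", "Login issue");
        first.put("priority", "High");

        Map<String, Object> reordered = new LinkedHashMap<>();
        reordered.put("priority", "High");
        reordered.put("subject", "Login issue");

        // Same parameters in a different insertion order yield the same key.
        System.out.println(idempotencyKey("session-1", "CreateCase", first)
            .equals(idempotencyKey("session-1", "CreateCase", reordered)));
    }
}
```

Any change to a parameter value, the action name, or the session ID produces a different key, so this scheme only deduplicates exact repeats within one session.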

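Usage pattern in a write action. The cache lookup runs before the write: on a repeat call the action returns the cached result without touching the database, and on a first call it executes the write and then stores the result under the same key. Because checkOrStore takes an already-computed result, it cannot skip execution by itself, so the action below inlines the same key derivation and splits the lookup from the store: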

public class CreateCaseAction {
    @InvocableMethod(label='Create Support Case' description='Creates a new support case')
    public static List<CaseResult> execute(List<CaseInput> inputs) {
        CaseInput inp = inputs[0];

        Map<String, Object> params = new Map<String, Object>{
            'subject' => inp.subject,
            'accountId' => inp.accountId,
            'priority' => inp.priority
        };

        // Check cache before executing — if repeat call, return cached result immediately
        Cache.OrgPartition part = Cache.Org.getPartition('local.agentGuard');
        List<String> sortedKeys = new List<String>(params.keySet());
        sortedKeys.sort();
        String canonical = '';
        for (String k : sortedKeys) {
            canonical += k + '=' + String.valueOf(params.get(k)) + ';';
        }
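        // Alphanumeric, fixed-length key: Platform Cache rejects other characters
        // and keys longer than 50 characters.
        String cacheKey = 'idem' + EncodingUtil.convertToHex(
            Crypto.generateDigest('SHA-256',
                Blob.valueOf(inp.sessionId + '|CreateCase|' + canonical))
        ).left(40);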
        String cachedResult = (String) part.get(cacheKey);
        if (cachedResult != null) {
            return new List<CaseResult>{ new CaseResult(cachedResult) };
        }

        // Execute the actual write
        Case newCase = new Case(
            Subject = inp.subject,
            AccountId = inp.accountId,
            Priority = inp.priority,
            Status = 'New',
            Origin = 'Agentforce'
        );
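        insert newCase;

        // CaseNumber is an auto-number field, so it is only populated after a re-query.
        newCase = [SELECT Id, CaseNumber FROM Case WHERE Id = :newCase.Id];
        String result = 'Case ' + newCase.CaseNumber + ' created successfully (Id: ' + newCase.Id + ').';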
        part.put(cacheKey, result, 3600);
        return new List<CaseResult>{ new CaseResult(result) };
    }
}

Hashing the canonicalized parameters only catches exact-parameter repeats. For near-duplicate detection (the model rephrases an email subject slightly), combine this with the spiral guard's Jaccard similarity check on the action parameters. Run the idempotency guard first, since it is a cheaper O(1) cache lookup, and only fall through to the spiral guard when the idempotency lookup finds no cached result.

Failure mode 3: Data Cloud retrieval context avalanche

Agentforce agents integrated with Data Cloud use the Einstein Semantic Search or Data Cloud Query actions to retrieve relevant content from your unified data store. Each retrieval step appends the retrieved content to the agent's context. Because Atlas sends the full conversation context — including all retrieved content from prior steps — to the LLM on every subsequent step, the effective token cost per step grows as retrieved content accumulates. In a long session with a fragmented knowledge base, this leads to the same avalanche pattern described for RAG agents in other frameworks: the model truncates older context, perceives missing information, and calls the retrieval action again.

The Agentforce-specific wrinkle is that Data Cloud retrieval actions can return rich structured records — account data, product specifications, contract terms — each potentially hundreds of tokens. A single DataCloudSearch call returning 5 records at 600 tokens each adds 3,000 tokens to the session context. After three such calls, the model is operating on 9,000 tokens of retrieved content plus conversation history, well past the efficient context range for coherent synthesis.

public class RetrievalBudgetGuard {

    private static final Integer MAX_SESSION_TOKENS = 6000;
    private static final Integer MAX_SINGLE_RESULT_CHARS = 6000; // ~1500 tokens at 4 chars/token
    private static final Double CHARS_PER_TOKEN = 4.0;

    public static String checkAndTruncate(
        String sessionId,
        String actionName,
        String resultText
    ) {
        Cache.OrgPartition part = Cache.Org.getPartition('local.agentGuard');
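        // Platform Cache keys must be alphanumeric and at most 50 characters.
        String budgetKey = 'budget' + EncodingUtil.convertToHex(
            Crypto.generateDigest('SHA-256', Blob.valueOf(sessionId))
        ).left(40);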

        // Truncate single oversized result
        String processedResult = resultText;
        if (resultText.length() > MAX_SINGLE_RESULT_CHARS) {
            processedResult = resultText.left(MAX_SINGLE_RESULT_CHARS) +
                '\n[Content truncated: ' + resultText.length() + ' chars, ' +
                'limit ' + MAX_SINGLE_RESULT_CHARS + ']';
        }

        // Round up, then convert explicitly with intValue() instead of a cast.
        Integer resultTokens = Math.ceil(processedResult.length() / CHARS_PER_TOKEN).intValue();

        // Accumulate session retrieval budget
        Integer sessionTokens = 0;
        String cachedTokens = (String) part.get(budgetKey);
        if (cachedTokens != null) {
            sessionTokens = Integer.valueOf(cachedTokens);
        }
        sessionTokens += resultTokens;
        part.put(budgetKey, String.valueOf(sessionTokens), 3600);

        if (sessionTokens >= MAX_SESSION_TOKENS) {
            throw new AgentGuardException(
                'RetrievalBudgetGuard: session retrieval budget exhausted on action "' +
                actionName + '". Accumulated ~' + sessionTokens +
                ' tokens (limit: ' + MAX_SESSION_TOKENS + '). ' +
                'Session: ' + sessionId
            );
        }

        return processedResult;
    }

    public class AgentGuardException extends Exception {}
}
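
To get a feel for when the budget trips, the accumulation logic can be simulated without an org. The sketch below is plain Java with an in-memory map standing in for Platform Cache. The constants mirror the guard's defaults, the class and method names are hypothetical, and the short truncation notice that the Apex guard appends to oversized results is ignored for simplicity.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for the Platform Cache budget counter.
final class RetrievalBudgetSketch {
    static final int MAX_SESSION_TOKENS = 6000;
    static final int MAX_SINGLE_RESULT_CHARS = 6000;
    static final double CHARS_PER_TOKEN = 4.0;

    private final Map<String, Integer> sessionTokens = new HashMap<>();

    /** Returns the estimated tokens charged for this result, or throws once the budget is spent. */
    int charge(String sessionId, int resultChars) {
        // Per-result truncation: an oversized result is only counted up to the cap.
        int counted = Math.min(resultChars, MAX_SINGLE_RESULT_CHARS);
        int tokens = (int) Math.ceil(counted / CHARS_PER_TOKEN);
        int total = sessionTokens.merge(sessionId, tokens, Integer::sum);
        if (total >= MAX_SESSION_TOKENS) {
            throw new IllegalStateException("retrieval budget exhausted at ~" + total + " tokens");
        }
        return tokens;
    }

    /** 1-based number of the call on which the budget trips, or -1 if it never does within maxCalls. */
    static int firstBlockedCall(int resultChars, int maxCalls) {
        RetrievalBudgetSketch guard = new RetrievalBudgetSketch();
        for (int call = 1; call <= maxCalls; call++) {
            try {
                guard.charge("session-1", resultChars);
            } catch (IllegalStateException e) {
                return call;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // 12,000 characters is roughly the 5 records x 600 tokens example at 4 chars per token.
        System.out.println("12,000-char results blocked on call " + firstBlockedCall(12000, 10));
        System.out.println("1,600-char results blocked on call " + firstBlockedCall(1600, 20));
    }
}
```

With the defaults, 12,000-character results are each capped at about 1,500 tokens and the fourth call trips the limit; 1,600-character results (about 400 tokens) trip it on the fifteenth call. Note that the guard throws after the retrieval has already run, so the result that crosses the limit is fetched and then discarded.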

Usage in a Data Cloud search action:

public class DataCloudSearchAction {
    @InvocableMethod(label='Search Data Cloud' description='Semantic search over Data Cloud knowledge')
    public static List<SearchResult> execute(List<SearchInput> inputs) {
        SearchInput inp = inputs[0];

        // Spiral guard: prevent semantically similar repeated queries
        ActionSpiralGuard.check(inp.sessionId, 'DataCloudSearch', inp.query);

        // Execute Data Cloud search (via ConnectApi or HTTP callout to Data Cloud API)
        String rawResult = executeDataCloudQuery(inp.query, inp.dmOrgId);

        // Budget guard: truncate oversized results and enforce session token limit
        String guardedResult = RetrievalBudgetGuard.checkAndTruncate(
            inp.sessionId, 'DataCloudSearch', rawResult
        );

        return new List<SearchResult>{ new SearchResult(guardedResult) };
    }

    private static String executeDataCloudQuery(String query, String dmOrgId) {
        // Data Cloud query implementation via HTTP callout
        // ...
        return ''; // placeholder
    }
}

Tune MAX_SESSION_TOKENS to 2× your expected normal-case retrieval volume. If a typical successful session retrieves 3 knowledge articles at roughly 400 tokens each (about 1,200 tokens in total), set the limit to 2,400 to 3,000 to allow legitimate multi-step lookups while catching runaway retrieval. The CHARS_PER_TOKEN divisor of 4.0 applies to English prose; use 3.0 for JSON-structured Data Cloud records and 2.5 for dense tabular data like financial records or contract terms.
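
The sizing rule and the effect of the divisor can be checked with a few lines of arithmetic. This is a plain-Java sketch with hypothetical helper names; the workload of 3 articles at about 400 tokens each is the illustrative figure used in the configuration reference, not a measurement.

```java
final class BudgetTuningSketch {

    /** Estimated tokens for a payload of `chars` characters at a given chars-per-token ratio. */
    static int estimateTokens(int chars, double charsPerToken) {
        return (int) Math.ceil(chars / charsPerToken);
    }

    /** Session budget as a multiple (headroom) of the normal-case retrieval volume. */
    static int sessionBudget(int articlesPerSession, int tokensPerArticle, double headroom) {
        return (int) Math.round(articlesPerSession * tokensPerArticle * headroom);
    }

    public static void main(String[] args) {
        System.out.println("budget at 2x headroom: " + sessionBudget(3, 400, 2.0));
        // The same 6,000-character payload under the three suggested divisors.
        for (double ratio : new double[] {4.0, 3.0, 2.5}) {
            System.out.println("6000 chars at " + ratio + " chars/token -> "
                + estimateTokens(6000, ratio) + " tokens");
        }
    }
}
```

The same 6,000-character payload is estimated at 1,500 tokens as prose, 2,000 as JSON, and 2,400 as dense tabular data, so a budget sized for prose is spent up to 60% faster on structured records unless the divisor is adjusted.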

Failure mode 4: Escalation retry deadlock

Agentforce service agents are expected to escalate to a human agent when they can't resolve a case. The escalation action calls Salesforce's Omni-Channel routing API to create a work item and transfer the conversation. The failure mode: if the target queue is at capacity, Omni-Channel may accept the escalation call (HTTP 200) but the work item sits in a "Queued" state indefinitely rather than routing to an available agent. The Atlas engine, inspecting the action's return value, sees an ambiguous state — not a clear success confirmation like "You've been connected to Agent Sarah" — and decides to retry the escalation on the next step.

Each retry creates a new work item in the queue. The customer sees the agent stuck saying "I'm connecting you to a specialist" repeatedly. Ops teams find five identical work items in the queue for the same conversation. On high-traffic days when queues are full, this failure mode produces a feedback loop: more work items → longer queue → more failed escalations → more retry work items.

The guard enforces two constraints: idempotency (one escalation work item per session, no matter how many times the action is called) and backoff with a minimum interval between escalation attempts:

public class EscalationGuard {

    // Public because the escalation action reads it when building its retry message.
    public static final Integer MIN_RETRY_INTERVAL_SECONDS = 30;
    private static final Integer MAX_ESCALATION_ATTEMPTS = 2;

    public static EscalationCheckResult check(String sessionId) {
        Cache.OrgPartition part = Cache.Org.getPartition('local.agentGuard');
        String stateKey = buildStateKey(sessionId);
        EscalationState state = loadState(part, stateKey);

        Long nowMs = System.currentTimeMillis();

        // Hard idempotency: escalation already succeeded, so return the cached result
        if (state.succeeded) {
            return new EscalationCheckResult(
                false,
                '[ESCALATION ALREADY COMPLETED] ' + state.workItemId +
                '. The customer is already in queue. Do not escalate again.'
            );
        }

        // Attempt limit: too many tries, stop
        if (state.attempts >= MAX_ESCALATION_ATTEMPTS) {
            return new EscalationCheckResult(
                false,
                'Escalation attempted ' + state.attempts + ' times without success. ' +
                'All queues may be at capacity. Please inform the customer of the delay.'
            );
        }

        // Backoff: not enough time since last attempt
        if (state.lastAttemptMs != null) {
            Long elapsedSeconds = (nowMs - state.lastAttemptMs) / 1000;
            if (elapsedSeconds < MIN_RETRY_INTERVAL_SECONDS) {
                return new EscalationCheckResult(
                    false,
                    'Escalation attempted ' + elapsedSeconds + 's ago. ' +
                    'Wait at least ' + MIN_RETRY_INTERVAL_SECONDS + 's before retrying.'
                );
            }
        }

        // Allowed: update attempt counter and timestamp
        state.attempts++;
        state.lastAttemptMs = nowMs;
        part.put(stateKey, JSON.serialize(state), 3600);
        return new EscalationCheckResult(true, null);
    }

    public static void recordSuccess(String sessionId, String workItemId) {
        Cache.OrgPartition part = Cache.Org.getPartition('local.agentGuard');
        String stateKey = buildStateKey(sessionId);
        EscalationState state = loadState(part, stateKey);
        state.succeeded = true;
        state.workItemId = workItemId;
        part.put(stateKey, JSON.serialize(state), 3600);
    }

    // Shared by check() and recordSuccess() so both read and write the same entry.
    // Platform Cache keys must be alphanumeric and at most 50 characters.
    private static String buildStateKey(String sessionId) {
        return 'escalation' + EncodingUtil.convertToHex(
            Crypto.generateDigest('SHA-256', Blob.valueOf(sessionId))
        ).left(32);
    }

    // Returns the cached state for this session, or a fresh state on a cache miss.
    private static EscalationState loadState(Cache.OrgPartition part, String key) {
        String cached = (String) part.get(key);
        if (cached == null) {
            return new EscalationState();
        }
        return (EscalationState) JSON.deserialize(cached, EscalationState.class);
    }

    public class EscalationState {
        public Integer attempts = 0;
        public Long lastAttemptMs;
        public Boolean succeeded = false;
        public String workItemId;
    }

    public class EscalationCheckResult {
        public Boolean allowed;
        public String blockedMessage;
        public EscalationCheckResult(Boolean allowed, String msg) {
            this.allowed = allowed;
            this.blockedMessage = msg;
        }
    }
}
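
The order of the three checks determines what Atlas is told on each retry, so it helps to trace a timeline. The sketch below is a plain-Java model of the same decision logic with the clock passed in as a parameter, which is an assumption made for testability (the Apex guard reads System.currentTimeMillis() directly). The class and enum names are hypothetical.

```java
// Plain-Java model of the escalation guard's decision order, with an injected clock.
final class EscalationBackoffSketch {
    static final int MIN_RETRY_INTERVAL_SECONDS = 30;
    static final int MAX_ESCALATION_ATTEMPTS = 2;

    enum Decision { ALLOWED, ALREADY_COMPLETED, ATTEMPT_LIMIT, BACKOFF }

    private int attempts = 0;
    private Long lastAttemptMs = null;
    private boolean succeeded = false;

    /** Same check order as the Apex guard: success, then attempt limit, then backoff. */
    Decision check(long nowMs) {
        if (succeeded) {
            return Decision.ALREADY_COMPLETED;
        }
        if (attempts >= MAX_ESCALATION_ATTEMPTS) {
            return Decision.ATTEMPT_LIMIT;
        }
        if (lastAttemptMs != null
                && (nowMs - lastAttemptMs) / 1000 < MIN_RETRY_INTERVAL_SECONDS) {
            return Decision.BACKOFF;
        }
        // Allowed: count the attempt and remember when it happened.
        attempts++;
        lastAttemptMs = nowMs;
        return Decision.ALLOWED;
    }

    void recordSuccess() {
        succeeded = true;
    }

    public static void main(String[] args) {
        EscalationBackoffSketch guard = new EscalationBackoffSketch();
        long[] timelineMs = {0L, 10_000L, 40_000L, 100_000L};
        for (long t : timelineMs) {
            System.out.println("t=" + (t / 1000) + "s -> " + guard.check(t));
        }
    }
}
```

On this timeline the first call is allowed, the retry at 10 seconds is blocked by backoff, the retry at 40 seconds is allowed as attempt two, and any later call hits the attempt limit. A blocked call does not consume an attempt, so a model that retries immediately cannot burn through the limit.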

Usage in an escalation action:

public class EscalateToAgentAction {
    @InvocableMethod(label='Escalate to Human Agent' description='Routes conversation to Omni-Channel queue')
    public static List<EscalationResult> execute(List<EscalationInput> inputs) {
        EscalationInput inp = inputs[0];

        EscalationGuard.EscalationCheckResult check =
            EscalationGuard.check(inp.sessionId);

        if (!check.allowed) {
            return new List<EscalationResult>{
                new EscalationResult(check.blockedMessage)
            };
        }

        // Proceed with Omni-Channel routing
        try {
            String workItemId = createOmniChannelWorkItem(inp.queueId, inp.conversationId);
            EscalationGuard.recordSuccess(inp.sessionId, workItemId);
            return new List<EscalationResult>{
                new EscalationResult('Escalation successful. Work item ' + workItemId +
                                     ' created. A specialist will join shortly.')
            };
        } catch (Exception e) {
            return new List<EscalationResult>{
                new EscalationResult('Escalation failed: ' + e.getMessage() +
                                     '. Will retry after ' +
                                     EscalationGuard.MIN_RETRY_INTERVAL_SECONDS + 's.')
            };
        }
    }
}

Combining all four guards: a composite Agentforce action wrapper

The four guards operate at different points in the action lifecycle and compose cleanly. Here is a utility class that bundles all four checks into a single wrapper you can call from any action:

public class AgentGuardComposite {

    public enum ActionType { READ, WRITE, RETRIEVAL, ESCALATION }

    public class GuardContext {
        public String sessionId;
        public String actionName;
        public String queryOrDescription;
        public Map<String, Object> writeParams;
        public ActionType actionType;
    }

    /**
     * Call BEFORE executing the action body.
     * Returns a non-null skip message if the action should not execute
     * (idempotent repeat or escalation blocked).
     * Throws ActionSpiralGuard.AgentGuardException if a spiral is detected.
     */
    public static String preCheck(GuardContext ctx) {
        if (ctx.actionType == ActionType.READ || ctx.actionType == ActionType.RETRIEVAL) {
            // Spiral guard: throws if a spiral is detected
            ActionSpiralGuard.check(ctx.sessionId, ctx.actionName, ctx.queryOrDescription);
        }

        if (ctx.actionType == ActionType.WRITE && ctx.writeParams != null) {
            // Idempotency: check cache before executing
            Cache.OrgPartition part = Cache.Org.getPartition('local.agentGuard');
            String cached = (String) part.get(idempotencyKey(ctx));
            if (cached != null) {
                return '[ALREADY EXECUTED: returning cached result] ' + cached;
            }
        }

        if (ctx.actionType == ActionType.ESCALATION) {
            EscalationGuard.EscalationCheckResult check =
                EscalationGuard.check(ctx.sessionId);
            if (!check.allowed) {
                return check.blockedMessage;
            }
        }

        return null; // proceed with action execution
    }

    /**
     * Call AFTER the action returns its result string.
     * Applies retrieval budget guard and write idempotency caching.
     * Returns (possibly modified) result to pass back to Atlas.
     * Throws RetrievalBudgetGuard.AgentGuardException if the session budget is exhausted.
     */
    public static String postCheck(GuardContext ctx, String result) {
        if (ctx.actionType == ActionType.RETRIEVAL) {
            return RetrievalBudgetGuard.checkAndTruncate(
                ctx.sessionId, ctx.actionName, result
            );
        }

        if (ctx.actionType == ActionType.WRITE && ctx.writeParams != null) {
            Cache.OrgPartition part = Cache.Org.getPartition('local.agentGuard');
            part.put(idempotencyKey(ctx), result, 3600);
        }

        return result;
    }

    // Shared by preCheck and postCheck so both derive the identical cache key.
    // The hex SHA-256 digest keeps the key alphanumeric and within the
    // 50-character Platform Cache key limit.
    private static String idempotencyKey(GuardContext ctx) {
        List<String> sortedKeys = new List<String>(ctx.writeParams.keySet());
        sortedKeys.sort();
        String canonical = '';
        for (String k : sortedKeys) {
            canonical += k + '=' + String.valueOf(ctx.writeParams.get(k)) + ';';
        }
        return 'idem' + EncodingUtil.convertToHex(
            Crypto.generateDigest('SHA-256',
                Blob.valueOf(ctx.sessionId + '|' + ctx.actionName + '|' + canonical))
        ).left(40);
    }
}

With this composite, each action class reduces its guard boilerplate to two calls:

@InvocableMethod(label='Search Knowledge' description='Search knowledge articles')
public static List<SearchResult> execute(List<SearchInput> inputs) {
    SearchInput inp = inputs[0];

    AgentGuardComposite.GuardContext ctx = new AgentGuardComposite.GuardContext();
    ctx.sessionId = inp.sessionId;
    ctx.actionName = 'SearchKnowledge';
    ctx.queryOrDescription = inp.query;
    ctx.actionType = AgentGuardComposite.ActionType.RETRIEVAL;

    // Pre-check: throws on spiral, returns skip message on idempotent repeat
    String skipResult = AgentGuardComposite.preCheck(ctx);
    if (skipResult != null) {
        return new List<SearchResult>{ new SearchResult(skipResult) };
    }

    String rawResult = performKnowledgeSearch(inp.query);

    // Post-check: apply budget guard and cache write results
    String guardedResult = AgentGuardComposite.postCheck(ctx, rawResult);
    return new List<SearchResult>{ new SearchResult(guardedResult) };
}

Guard configuration reference

Guard | Parameter | Default | When to adjust
Spiral | WINDOW_SIZE | 4 | Raise to 6 for topic configurations where the agent legitimately calls a search action with varied refinements before synthesizing, such as exploratory research and complex troubleshooting flows. Lower to 3 for narrow-scope service agents where any repetition is a spiral.
Spiral | SIMILARITY_THRESHOLD | 0.72 | Lower to 0.65 for agents operating on highly specialized vocabulary (legal, financial, medical) where legitimate query refinements share fewer common tokens. Raise to 0.80 for consumer service agents with high-overlap query patterns like billing and password reset.
Idempotency | Cache TTL | 3600s | Reduce to 300s for write actions in high-volume orgs where the same session ID could theoretically be reused across short-lived sessions. Extend to 7200s for long-running case resolution sessions that span multiple hours of back-and-forth.
Budget | MAX_SESSION_TOKENS | 6000 | Tune to 2× expected normal-case retrieval volume. A Service Cloud agent that retrieves 3 knowledge articles per session at ~400 tokens each should use 2400–3000. Raise to 10000 for complex case research agents that legitimately read multiple long documents.
Budget | MAX_SINGLE_RESULT_CHARS | 6000 | Raise to 12000 for agents that read full contract or policy documents where a single document is 10,000+ characters. The guard truncates at this limit and continues; it doesn't stop the session on a single large result.
Escalation | MIN_RETRY_INTERVAL_SECONDS | 30 | Raise to 60 for orgs where Omni-Channel queue status is updated on a slow polling interval. Lower to 15 for real-time routing deployments where queue state refreshes quickly. The goal is long enough to avoid thrashing the routing API, short enough that a legitimate retry after a transient failure succeeds promptly.
Escalation | MAX_ESCALATION_ATTEMPTS | 2 | Keep at 2 for most deployments. Raise to 3 only if your Omni-Channel queues are routinely near capacity and a third attempt after two successive backoffs is genuinely useful. Do not raise above 3: beyond 2 to 3 attempts, queue capacity is a systemic issue that requires operational attention, not more retries.

RunGuard integration for Agentforce

If you'd rather not maintain the four guard classes and the Platform Cache configuration yourself, RunGuard provides all four checks as a managed HTTP endpoint. In Agentforce, HTTP callouts from Apex require a named credential or remote site setting. Add api.runguard.dev to your org's Remote Site Settings, then use the following Apex HTTP callout pattern:

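public class RunGuardClient {

    // Thrown when RunGuard blocks an action. A dedicated type keeps a guard trip
    // distinguishable from a network-level CalloutException raised by Http.send().
    public class GuardTripException extends Exception {}

    private static final String RUNGUARD_URL = 'https://api.runguard.dev/v1/check';
    private static final String RUNGUARD_RECORD_URL = 'https://api.runguard.dev/v1/record';

    // Call before the action body runs. Throws GuardTripException on HTTP 409.
    public static void preCheck(
        String sessionId,
        String actionName,
        String queryOrParams,
        String actionType
    ) {
        HttpRequest req = new HttpRequest();
        req.setEndpoint(RUNGUARD_URL);
        req.setMethod('POST');
        req.setHeader('Content-Type', 'application/json');
        req.setHeader('X-RunGuard-Key', getApiKey());
        req.setBody(JSON.serialize(new Map<String, Object>{
            'app_id' => 'agentforce-prod',
            'session_id' => sessionId,
            'action_name' => actionName,
            'query' => queryOrParams,
            'action_type' => actionType
        }));
        req.setTimeout(2000); // milliseconds

        Http http = new Http();
        HttpResponse res;
        try {
            res = http.send(req);
        } catch (CalloutException e) {
            // Fail open (a design choice, matching recordResult): a timeout or an
            // unreachable RunGuard should not block the action. Rethrow here if
            // you would rather fail closed.
            return;
        }

        if (res.getStatusCode() == 409) {
            Map<String, Object> body =
                (Map<String, Object>) JSON.deserializeUntyped(res.getBody());
            throw new GuardTripException(
                '[RunGuard] ' + body.get('reason') + ': ' + body.get('detail')
            );
        }
    }

    // Call after the action body runs. Returns the result as processed by
    // RunGuard, or the unmodified result if RunGuard cannot be reached.
    // Note: Apex rejects callouts made after uncommitted DML in the same
    // transaction, so after a write action this call takes the fallback path
    // unless the DML ran in a separate transaction.
    public static String recordResult(
        String sessionId,
        String actionName,
        String result,
        Boolean success
    ) {
        HttpRequest req = new HttpRequest();
        req.setEndpoint(RUNGUARD_RECORD_URL);
        req.setMethod('POST');
        req.setHeader('Content-Type', 'application/json');
        req.setHeader('X-RunGuard-Key', getApiKey());
        req.setBody(JSON.serialize(new Map<String, Object>{
            'app_id' => 'agentforce-prod',
            'session_id' => sessionId,
            'action_name' => actionName,
            'result' => result,
            'success' => success
        }));
        req.setTimeout(2000); // milliseconds

        Http http = new Http();
        HttpResponse res;
        try {
            res = http.send(req);
        } catch (CalloutException e) {
            return result; // fallback: RunGuard unreachable or callout not permitted
        }
        if (res.getStatusCode() == 200) {
            Map<String, Object> body =
                (Map<String, Object>) JSON.deserializeUntyped(res.getBody());
            return (String) body.get('result');
        }
        return result; // fallback: return unmodified on any non-200 response
    }

    private static String getApiKey() {
        // Store your RunGuard API key in a Custom Setting or Named Credential
        RunGuard_Config__c cfg = RunGuard_Config__c.getOrgDefaults();
        return cfg.API_Key__c;
    }
}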
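
To show where the two calls sit inside an action, here is a minimal sketch of a guarded read action. The class name SearchKnowledgeAction, the action name 'search_knowledge', the action type 'read', and the doSearch stub are all hypothetical placeholders, not part of RunGuard or Agentforce; the only calls taken from the client above are preCheck and recordResult.

```apex
public class SearchKnowledgeAction {

    public class Request {
        @InvocableVariable(label='Session ID' required=true)
        public String sessionId;
        @InvocableVariable(label='Query' required=true)
        public String query;
    }

    public class Response {
        @InvocableVariable(label='Result')
        public String result;
    }

    // callout=true declares that this invocable method makes HTTP callouts.
    @InvocableMethod(label='Search Knowledge (guarded)' callout=true)
    public static List<Response> run(List<Request> requests) {
        List<Response> responses = new List<Response>();
        for (Request r : requests) {
            Response resp = new Response();
            try {
                // 1. Ask RunGuard whether this call may proceed.
                RunGuardClient.preCheck(r.sessionId, 'search_knowledge', r.query, 'read');
                // 2. Run the real action logic (stubbed here).
                String raw = doSearch(r.query);
                // 3. Report the result; RunGuard may return a modified version.
                resp.result = RunGuardClient.recordResult(
                    r.sessionId, 'search_knowledge', raw, true
                );
            } catch (Exception e) {
                // A guard trip surfaces as an exception; hand its message back
                // so the agent can relay it instead of retrying blindly.
                resp.result = e.getMessage();
            }
            responses.add(resp);
        }
        return responses;
    }

    // Hypothetical stand-in for the action's real retrieval logic.
    private static String doSearch(String query) {
        return 'stub result for: ' + query;
    }
}
```

Each request costs two callouts, so a bulk invocation with many requests should be checked against the per-transaction callout limit.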

RunGuard persists all trip events in a dashboard with the full action call history, similarity scores, token budget consumption, and escalation attempt timelines, searchable across all sessions and all deployed agents. Slack alerts are included in all plans; PagerDuty integration is available on the Team plan.

FAQ

Platform Cache requires a cache partition to be configured. What if our org doesn't have one set up?

Platform Cache partitions are configured in Setup → Platform Cache. You need at least one org cache partition named agentGuard (matching the local.agentGuard reference in the guard code) with enough allocated capacity for your session volume. A 1 MB org cache partition supports thousands of concurrent sessions. If Platform Cache is unavailable or unconfigured in your org, you can substitute Custom Object records: create an Agent_Guard_Session__c object with text fields for each guard's state, use the session ID as an external ID, and replace part.get()/part.put() with SOQL selects and upserts. The logic is identical; the storage is a Custom Object instead of the cache. Platform Cache is strongly preferred for production because it avoids DML limits and record storage costs. The trade-off is durability: cached entries can be evicted before their TTL expires, so the guards must treat a cache miss as "no prior state" rather than as an error.
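
The Custom Object substitution described above could look roughly like the following. This is a sketch under assumptions: the field names Session_Id__c (a unique external ID text field) and Spiral_State__c (a text field holding the spiral guard's serialized state) are hypothetical, as is the GuardSessionStore class name. Only the Agent_Guard_Session__c object name comes from the text.

```apex
public class GuardSessionStore {

    // Stands in for part.get(): returns the stored state for this session,
    // or null when no row exists yet (the equivalent of a cache miss).
    public static String getSpiralState(String sessionId) {
        List<Agent_Guard_Session__c> rows = [
            SELECT Spiral_State__c
            FROM Agent_Guard_Session__c
            WHERE Session_Id__c = :sessionId
            LIMIT 1
        ];
        return rows.isEmpty() ? null : rows[0].Spiral_State__c;
    }

    // Stands in for part.put(): inserts the row on first use and updates it
    // afterwards, matching on the external ID field rather than the record Id.
    public static void putSpiralState(String sessionId, String state) {
        Agent_Guard_Session__c row = new Agent_Guard_Session__c(
            Session_Id__c = sessionId,
            Spiral_State__c = state
        );
        upsert row Agent_Guard_Session__c.Fields.Session_Id__c;
    }
}
```

Unlike a cache entry, these rows have no TTL, so this approach also needs a scheduled cleanup of stale sessions. Every read consumes a SOQL query and every write a DML statement from the transaction's governor limits.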

Can I apply these guards to Flow-based actions, or only to Apex @InvocableMethod actions?

You can apply the guards to Flow-based actions, but the integration point is different. Flows can't run Apex inline; they call Apex through an Apex Action element, which can only invoke a method annotated with @InvocableMethod. If ActionSpiralGuard.check() and AgentGuardComposite.preCheck() aren't already invocable, expose them through a thin @InvocableMethod wrapper. Add an Apex Action element at the start of each Flow that calls the pre-check, passing the session ID and a query description from the Flow's input variables. Similarly, add an Apex Action element near the end of the Flow that calls the budget guard or idempotency cache. This means every guarded Flow has two extra Apex Action elements, which count toward your Flow element limits, but for most Flows this is negligible. For write Flows specifically, add the idempotency check as the first element: if the check returns a "skip" signal, use a Decision element to route to the early-return output without executing the DML steps.
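
A Flow can only reach the guard through an invocable method, so the pre-check element needs a wrapper along these lines. This is a sketch, not the guard's actual interface: it assumes ActionSpiralGuard.check(sessionId, query) takes two strings and throws an exception when it trips, which may not match the signature defined earlier in this post. The FlowSpiralPreCheck class and its variable names are hypothetical.

```apex
public class FlowSpiralPreCheck {

    public class Request {
        @InvocableVariable(label='Session ID' required=true)
        public String sessionId;
        @InvocableVariable(label='Query description' required=true)
        public String queryDescription;
    }

    public class Result {
        @InvocableVariable(label='Blocked')
        public Boolean blocked;
        @InvocableVariable(label='Reason')
        public String reason;
    }

    @InvocableMethod(label='Agent Guard: Spiral Pre-Check')
    public static List<Result> check(List<Request> requests) {
        List<Result> results = new List<Result>();
        for (Request req : requests) {
            Result r = new Result();
            r.blocked = false;
            try {
                // Assumed signature; adjust to match your guard class.
                ActionSpiralGuard.check(req.sessionId, req.queryDescription);
            } catch (Exception e) {
                // Convert the trip into data so the Flow can branch on it
                // instead of falling into a fault path.
                r.blocked = true;
                r.reason = e.getMessage();
            }
            results.add(r);
        }
        return results;
    }
}
```

In the Flow, a Decision element placed directly after this Apex Action can test the Blocked output and route to the early-return path when it is true.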

Agentforce has a built-in 50-step session limit. Why do I need a spiral guard on top of that?

The 50-step limit catches infinite loops eventually, but the damage grows with the step count. By step 50, a spiraling knowledge search agent is sending the full conversation history, including the content retrieved in every earlier step, to the LLM on each step. If each step adds a similar amount of retrieved content, the context at step 50 is roughly 10× larger than at step 5, and so is the inference cost of that step. More importantly, the customer waits through all 50 steps only to receive a generic failure message. The spiral guard catches the loop at step 3 or 4, returns a specific error message that the Atlas engine can relay to the customer ("I'm having trouble finding the right information — let me connect you to a specialist"), and prevents the LLM inference cost from growing through the entire session window. The 50-step limit is a safety net; the spiral guard is the circuit breaker.
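
The growth described above is easy to put numbers on. The helper below is a back-of-envelope sketch, not a billing formula: it assumes every step retrieves the same number of tokens and that all earlier retrievals are re-sent on each step, and it ignores the system prompt and the conversation turns themselves. The class name is hypothetical, and the 400-token figure is the per-article estimate from the tuning table.

```apex
public class SpiralCostEstimate {

    // Cumulative input tokens over `steps` steps when each step appends
    // `tokensPerRetrieval` tokens that are re-sent on every later step.
    // Step k carries k retrievals, so the total is t * (1 + 2 + ... + n)
    // = t * n * (n + 1) / 2.
    public static Long cumulativeInputTokens(Integer steps, Integer tokensPerRetrieval) {
        Long n = steps;
        return tokensPerRetrieval * n * (n + 1) / 2;
    }
}
```

At 400 tokens per retrieval, SpiralCostEstimate.cumulativeInputTokens(50, 400) returns 510000, while stopping at step 4 gives cumulativeInputTokens(4, 400) = 4000. Under these assumptions the total grows with the square of the step count, which is why catching the loop early matters more than the per-step cost suggests.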

The idempotency guard uses Apex's String.hashCode() for parameter hashing. Is that collision-safe for production?

Apex's String.hashCode() returns a 32-bit signed integer, giving roughly 4.3 billion possible values. For session-scoped idempotency (the same write action called twice in the same short-lived session), collisions are vanishingly rare in practice. The cache key includes both the session ID and the parameter hash, so a collision can only cause a false block when two different parameter sets hash to the same value within the same session; identical hashes in different sessions land in different cache entries and never interfere. If you're processing extremely high session volumes and want a stronger guarantee, replace hashCode() with a Crypto.generateDigest('SHA-256', Blob.valueOf(actionName + canonical)) call and convert the first 16 bytes to a hex string for the cache key. This is overkill for most orgs, since a 32-bit hash in a session-scoped key is effectively collision-free at Agentforce session volumes, but it's a straightforward upgrade if you want belt-and-suspenders safety.
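
The SHA-256 upgrade can be sketched as a small helper. The class and method names are hypothetical, and the '|' separator between the action name and the canonical parameter string is an addition not in the original expression; it keeps inputs such as ('ab', 'c') and ('a', 'bc') from hashing to the same value.

```apex
public class ParamHasher {

    // Returns a 32-character hex string: the first 16 bytes of the SHA-256
    // digest of the action name and canonicalized parameters.
    public static String stableKey(String actionName, String canonical) {
        Blob digest = Crypto.generateDigest(
            'SHA-256',
            Blob.valueOf(actionName + '|' + canonical)
        );
        // convertToHex yields 64 hex characters for a 32-byte digest;
        // the first 32 characters correspond to the first 16 bytes.
        return EncodingUtil.convertToHex(digest).substring(0, 32);
    }
}
```

The hex output contains only alphanumeric characters, which suits Platform Cache's key rules. If you prepend the session ID, check the combined key against the cache's key length limit.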

We use Agentforce for Sales Cloud, not Service Cloud — there are no escalation actions. Which guards apply?

For Sales Cloud Agentforce agents (opportunity research, account enrichment, meeting prep), the spiral guard and idempotency guard are the most important. The spiral guard catches repeated CRM data lookups, such as querying the same account's opportunity history with slightly different criteria because the initial result didn't fully answer the rep's question. The idempotency guard is critical for any write actions: logging activities, creating follow-up tasks, updating opportunity stages. A Sales Cloud agent that creates two identical follow-up tasks or logs the same activity twice pollutes the CRM data that your forecasting and reporting depend on. The retrieval budget guard is useful if your Sales Cloud agent queries Data Cloud or calls external enrichment APIs (ZoomInfo, LinkedIn Sales Navigator). Those APIs bill per call, and an unguarded spiral can accumulate unexpected API costs. Skip the escalation guard unless your Sales Cloud agent has a "hand off to human" action for qualifying complex opportunities.

Stop runaway Agentforce sessions before the bill lands

RunGuard wraps all four Agentforce agent guards — action spiral detection, write action idempotency, Data Cloud retrieval budget enforcement, and escalation retry prevention — as a managed HTTP endpoint. Two Apex callout methods replace four guard classes and a Platform Cache partition configuration, and you get a persistent 30-day trip dashboard with Slack alerts included.
