June 14, 2026 Temporal Workflow Orchestration Cost Control

Temporal AI Workflow Cost Control: History Bloat, Activity Retry Amplification, and ContinueAsNew

Temporal is architecturally different from every other AI agent orchestration framework in this series. Python frameworks like LangChain, CrewAI, and the OpenAI Agents SDK run in a single process: when the process dies, the agent state dies with it. Temporal workflows are durable: every activity result, timer, signal, and child workflow event is persisted as an immutable event in a workflow history stored in the Temporal server. Workers replay this history to reconstruct execution state when they resume. This durability is Temporal's main feature, and it is also the source of all four AI cost failure modes in this guide.

Temporal Cloud pricing is based on actions: each workflow task, activity task, signal, timer, and child workflow start is a billable action. A research agent that calls 200 LLM activities generates at minimum 400 actions (schedule + completion per activity), plus workflow task actions for the orchestration logic between activities. This is fundamentally different from token-based billing — you pay for orchestration density, not just inference cost. High-frequency LLM tools inside a Temporal workflow compound both.

Four failure modes that are specific to Temporal AI workflows:

Workflow history accumulation — every activity result, including the full LLM response text, is stored as a history event. A research agent processing 500 sources accumulates thousands of events and megabytes of stored LLM output in the workflow history, forcing workers to replay all of it on every resume.
Activity retry amplification — the default Temporal RetryPolicy has MaximumAttempts: 0 (unlimited). LLM activities hitting API rate limits will retry indefinitely, each retry charged as a new activity action at Temporal Cloud pricing.
Child workflow fan-out runaway — fan-out patterns using workflow.ExecuteChildWorkflow are seeded by LLM output (e.g., "generate N sub-queries"). LLMs produce variable-length lists; without a ceiling, an agent that generates 200 sub-queries spawns 200 concurrent child workflows, each with its own history and action meter.
ContinueAsNew neglect — long-running AI agent workflows that never call workflow.ContinueAsNew accumulate history indefinitely. At Temporal Cloud storage pricing, a workflow with 10,000 events storing 40KB average LLM outputs represents 400MB of history. Every time the workflow processes a signal, the worker replays all 400MB to reach the current state.

Temporal Cloud action pricing (as of mid-2026): Actions are priced at approximately $25/million. An AI research agent running 500 LLM activities generates ~1,000 actions minimum — $0.025 for the Temporal overhead on top of LLM inference cost. At 1,000 agent runs/day, Temporal overhead is ~$25/day before any optimization. This is non-trivial when multiplied across multiple agent types.
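The overhead arithmetic can be reproduced with a small estimator. This is a sketch under the assumptions stated in the text (2 actions per activity, about $25 per million actions); the function name and parameters are illustrative, and a real invoice depends on Temporal Cloud's current action definitions and price.

```python
def temporal_overhead_usd(
    activities_per_run: int,
    runs_per_day: int = 1,
    actions_per_activity: int = 2,           # assumption taken from the text
    usd_per_million_actions: float = 25.0,   # approximate price quoted in the text
) -> float:
    """Estimate Temporal Cloud action spend for activity-heavy agent runs."""
    actions = activities_per_run * actions_per_activity * runs_per_day
    return actions * usd_per_million_actions / 1_000_000


per_run = temporal_overhead_usd(500)
per_day = temporal_overhead_usd(500, runs_per_day=1_000)
print(f"per run: ${per_run:.3f}, per day: ${per_day:.2f}")
```

Swapping in your own activity counts per agent type gives a quick first-order budget before any optimization.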

Failure Mode 1: Workflow History Accumulation

Every result returned from a Temporal activity is serialized into the workflow history as an ActivityTaskCompleted event. For LLM activities, this includes the full model response — hundreds to thousands of tokens of text, serialized to JSON and stored in the Temporal server's persistence layer. A research workflow that runs 200 LLM calls doesn't just pay for 200 LLM API calls; it also accumulates 200 ActivityTaskCompleted events, each potentially storing 2–8KB of serialized LLM output.

The replay cost is the hidden problem. When a Temporal worker picks up an existing workflow that it no longer holds in its in-memory cache (after a crash or restart, after a long-sleep timer fires, after a signal arrives for an evicted workflow), it must replay the full history to reconstruct the workflow's Go or Python data structures. A workflow with 5,000 history events that average 4KB each requires deserializing 20MB of data and re-executing the workflow logic up to the current event, all before processing the actual new event that woke the workflow.

The naive pattern stores complete LLM responses as activity return values:

// naive: full LLM response stored in history
func ResearchWorkflow(ctx workflow.Context, query string) (string, error) {
    results := make([]string, 0, 50)

    // 50 search sub-queries, each returning full LLM synthesis text
    for i := 0; i < 50; i++ {
        var synthesis string
        ao := workflow.ActivityOptions{
            StartToCloseTimeout: 60 * time.Second,
        }
        ctx = workflow.WithActivityOptions(ctx, ao)

        // BUG: full LLM response (2-8KB) stored as activity result in history
        err := workflow.ExecuteActivity(ctx, SynthesizeLLMActivity, query, i).Get(ctx, &synthesis)
        if err != nil {
            return "", err
        }
        results = append(results, synthesis)
    }

    // After 50 iterations: ~200KB of LLM text in history events
    // Replay cost: deserialize + re-evaluate all 50 results on every wake
    return strings.Join(results, "\n\n"), nil
}

After 50 activity completions with 4KB average responses, the history contains ~200KB of serialized LLM text in ActivityTaskCompleted payloads, plus the workflow task events, timer events, and the growing result accumulation. A workflow that processes 500 sources would carry ~2MB of LLM text in its history — loaded and deserialized on every signal receipt.

History event counts per activity execution:

Event type	Count per activity	Payload size	Notes
ActivityTaskScheduled	1	~200 bytes	Input args serialized
ActivityTaskStarted	1	~100 bytes	Worker identity
ActivityTaskCompleted	1	2–8KB	Full LLM output serialized here
WorkflowTaskScheduled/Started/Completed	1–2	~300 bytes each	Orchestration overhead between activities
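The per-event figures in the table can be combined into a rough footprint estimate. The sketch below assumes a sequential chain of activities and one workflow task (three events) between activities; that last figure is an assumption for illustration, and all names are hypothetical.

```python
from dataclasses import dataclass

# Approximate per-event sizes in bytes, taken from the table above.
SCHEDULED_BYTES = 200
STARTED_BYTES = 100
WORKFLOW_TASK_EVENT_BYTES = 300


@dataclass
class HistoryEstimate:
    events: int
    payload_bytes: int


def estimate_history(
    activities: int,
    avg_result_bytes: int,
    workflow_task_events_per_activity: int = 3,  # assumed: Scheduled + Started + Completed
) -> HistoryEstimate:
    """Rough history footprint for a sequential chain of activities."""
    events_per_activity = 3 + workflow_task_events_per_activity
    bytes_per_activity = (
        SCHEDULED_BYTES
        + STARTED_BYTES
        + avg_result_bytes
        + workflow_task_events_per_activity * WORKFLOW_TASK_EVENT_BYTES
    )
    return HistoryEstimate(
        events=activities * events_per_activity,
        payload_bytes=activities * bytes_per_activity,
    )


small = estimate_history(50, avg_result_bytes=4_096)
large = estimate_history(500, avg_result_bytes=4_096)
print(small, large)
```

The activity result dominates each activity's share of the history, which is why shrinking it is the first thing to try.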

The fix is to store large results externally and pass only identifiers through the workflow history. Write LLM outputs to an external store (S3, a database, Redis) and return only a reference key as the activity result. The history event payload shrinks from 4KB to 32 bytes:

// Fixed: store LLM output externally, pass only the key
type LLMResultRef struct {
    Key       string    `json:"key"`        // e.g., "research/run-abc/result-007"
    TokenCount int      `json:"token_count"` // for cost tracking
    Truncated  bool     `json:"truncated"`
}

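// resultStore is set once at worker startup (S3, Redis, etc.).
// It cannot be an activity argument: arguments are serialized into history.
var resultStore ResultStore

// Requires "go.temporal.io/sdk/activity" for activity.GetInfo.
func SynthesizeLLMActivityWithRef(
    ctx context.Context,
    query string,
    index int,
) (LLMResultRef, error) {
    resp, err := callLLM(ctx, query)
    if err != nil {
        return LLMResultRef{}, err
    }

    // Key on the workflow run ID so a retried attempt overwrites the same object
    runID := activity.GetInfo(ctx).WorkflowExecution.RunID
    key := fmt.Sprintf("research/%s/result-%03d", runID, index)
    if err := resultStore.Put(ctx, key, resp.Text); err != nil {
        return LLMResultRef{}, fmt.Errorf("result store write failed: %w", err)
    }

    return LLMResultRef{
        Key:        key,
        TokenCount: resp.Usage.TotalTokens,
    }, nil
}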

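// Workflow now stores only 32-byte refs in history, not 4KB LLM texts
func ResearchWorkflow(ctx workflow.Context, query string) (string, error) {
    ao := workflow.ActivityOptions{StartToCloseTimeout: 60 * time.Second}
    actx := workflow.WithActivityOptions(ctx, ao)
    refs := make([]LLMResultRef, 0, 50)

    for i := 0; i < 50; i++ {
        var ref LLMResultRef
        // ExecuteActivity is a package-level function; the options travel on the context
        err := workflow.ExecuteActivity(actx, SynthesizeLLMActivityWithRef, query, i).Get(actx, &ref)
        if err != nil {
            return "", err
        }
        refs = append(refs, ref)
    }

    // Final aggregation reads from the external store, so it must run as an
    // activity: workflow code has to stay deterministic and cannot do I/O.
    // AggregateFromStoreActivity is a hypothetical name for that activity.
    var report string
    err := workflow.ExecuteActivity(actx, AggregateFromStoreActivity, refs).Get(actx, &report)
    return report, err
}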

This pattern reduces history payload from ~200KB to ~1.6KB for a 50-activity workflow (50 refs × 32 bytes), cutting replay overhead by 99%. Apply it to any activity that returns LLM output, search results, or large document content.
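The same reference pattern, reduced to plain Python so the size difference can be measured directly. This is a sketch rather than Temporal SDK code: InMemoryStore is a hypothetical stand-in for S3, Redis, or a database, and the exact reference size depends on the key length and on which fields are serialized.

```python
import json
from dataclasses import asdict, dataclass


class InMemoryStore:
    """Stand-in for S3/Redis/a database; hypothetical, for illustration only."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> str:
        return self._data[key]


@dataclass
class LLMResultRef:
    key: str
    token_count: int
    truncated: bool = False


def store_and_ref(store: InMemoryStore, run_id: str, index: int,
                  llm_text: str, token_count: int) -> LLMResultRef:
    """Write the full text to the store; return only a small reference."""
    key = f"research/{run_id}/result-{index:03d}"
    store.put(key, llm_text)
    return LLMResultRef(key=key, token_count=token_count)


store = InMemoryStore()
llm_text = "Synthesized findings. " * 200  # roughly 4KB of fake LLM output
ref = store_and_ref(store, "run-abc", 7, llm_text, token_count=900)

inline_bytes = len(json.dumps(llm_text).encode())  # what history would hold inline
ref_bytes = len(json.dumps(asdict(ref)).encode())  # what history holds with a ref
print(inline_bytes, ref_bytes)
```

Only the reference would travel through the workflow history; the full text is fetched from the store when an aggregation step actually needs it.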

Failure Mode 2: Activity Retry Amplification

Temporal's default RetryPolicy is generous by design — it assumes transient infrastructure failures, not rate-limited external APIs. The defaults are:

MaximumAttempts: 0 — unlimited retries
InitialInterval: 1s
BackoffCoefficient: 2.0
MaximumInterval: 100s
NonRetryableErrorTypes: [] — all errors are retryable by default

An LLM activity that hits a rate limit (429 Too Many Requests) will retry with exponential backoff: 1s, 2s, 4s, 8s, 16s, 32s, 64s, 100s, 100s, 100s… Each retry is a new activity attempt — a new ActivityTaskScheduled event, a new ActivityTaskStarted event, and eventually either a new ActivityTaskCompleted or ActivityTaskFailed event. At Temporal Cloud action pricing, 50 retries on a single rate-limited activity generates 100+ billable actions before the underlying quota window resets.

The rate limit window is the critical insight. OpenAI's rate limits reset every 60 seconds. If an activity starts retrying at second 0 and the limit resets at second 60, a retry policy with 1s initial interval and 2.0 coefficient fires retries at roughly 1, 3, 7, 15, 31, and 63 seconds (cumulative waits of 1+2+4+8+16+32 = 63 seconds). That is about six retries, the first five of which land inside the exhausted window. Six retries × 2 actions/retry = 12 actions per rate-limited activity call. With 20 concurrent LLM activities all hitting the same rate limit simultaneously, that's 240 extra actions per 60-second window.
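Enumerating the backoff schedule makes the retry count easy to check. The sketch below assumes every attempt fails instantly, so only backoff time accumulates; the function name is illustrative and this is not Temporal SDK code.

```python
def retry_fire_times(initial: float, coefficient: float, maximum: float,
                     horizon: float) -> list[float]:
    """Seconds after the first failure at which retries fire, up to `horizon`."""
    times: list[float] = []
    elapsed, interval = 0.0, initial
    while True:
        elapsed += interval
        if elapsed > horizon:
            return times
        times.append(elapsed)
        # Exponential growth, capped at the policy's maximum interval
        interval = min(interval * coefficient, maximum)


within_window = retry_fire_times(1, 2.0, 100, horizon=60)
within_ten_minutes = retry_fire_times(1, 2.0, 100, horizon=600)
print(within_window)            # [1.0, 3.0, 7.0, 15.0, 31.0]
print(len(within_ten_minutes))  # 11
```

Under the default policy, five retries fire inside the first 60 seconds, and the interval reaches the 100s cap after the seventh retry.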

// EXPENSIVE: default retry policy on an LLM activity
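func researchWorkflow(ctx workflow.Context, queries []string) error {
    // A timeout is mandatory; RetryPolicy is left unset, so the defaults apply
    ctx = workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
        StartToCloseTimeout: 60 * time.Second,
    })
    for _, q := range queries {
        // Default retry policy: unlimited retries on ALL errors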
        err := workflow.ExecuteActivity(ctx,
            callLLMActivity, q,
        ).Get(ctx, nil)
        // If callLLMActivity hits a 429, it retries indefinitely
        // Each retry = 2+ Temporal Cloud actions
        if err != nil {
            return err
        }
    }
    return nil
}

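The correct pattern sets an explicit RetryPolicy that aligns with the LLM provider's rate limit window and caps the number of attempts. When the cap is reached on a rate limit error, the workflow handles the remaining backoff at the orchestration level rather than the activity level. The example assumes callLLMActivity converts a 429 response into an ApplicationError of type "RateLimitExhausted":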

import (
    "errors"
    "fmt"
    "time"

    "go.temporal.io/sdk/temporal"
    "go.temporal.io/sdk/workflow"
)

// LLM-tuned retry policy: cap attempts, match the 60s reset window
var llmRetryPolicy = &temporal.RetryPolicy{
    InitialInterval:    10 * time.Second,  // start after 10s, not 1s
    BackoffCoefficient: 1.5,               // slower growth than default 2.0
    MaximumInterval:    60 * time.Second,  // cap at one rate limit window
    MaximumAttempts:    4,                 // 4 attempts = 3 retries: 10s + 15s + 22.5s = 47.5s of backoff
    NonRetryableErrorTypes: []string{
        "LLMContextLengthExceeded",        // retrying won't help
        "LLMInvalidAPIKey",                // credential error — not transient
        "LLMContentPolicyViolation",       // content error — not transient
    },
}

func researchWorkflowGuarded(ctx workflow.Context, queries []string) error {
    ao := workflow.ActivityOptions{
        StartToCloseTimeout: 90 * time.Second,
        RetryPolicy:         llmRetryPolicy,
    }
    // Options travel on the context; ExecuteActivity is a package-level function
    actx := workflow.WithActivityOptions(ctx, ao)

    for _, q := range queries {
        err := workflow.ExecuteActivity(actx, callLLMActivity, q).Get(actx, nil)

        if err != nil {
            var appErr *temporal.ApplicationError
            if errors.As(err, &appErr) && appErr.Type() == "RateLimitExhausted" {
                // All 4 attempts hit rate limits: back off at workflow level.
                // workflow.Sleep is a single timer, not a series of activity attempts.
                if sleepErr := workflow.Sleep(ctx, 90*time.Second); sleepErr != nil {
                    return sleepErr // workflow was cancelled during the sleep
                }
                // Retry this specific query after the sleep
                err = workflow.ExecuteActivity(actx, callLLMActivity, q).Get(actx, nil)
            }
            if err != nil {
                return fmt.Errorf("query %q failed after retries: %w", q, err)
            }
        }
    }
    return nil
}

The key insight is that workflow.Sleep costs the same no matter how long it lasts, while activity retries cost per attempt. A workflow sleeping for 90 seconds generates one TimerStarted and one TimerFired event: two actions total, regardless of the sleep duration. Four activity retries generate 8+ actions. For a rate-limited activity that needs to wait for a quota reset, sleeping at the workflow level is significantly cheaper than retrying at the activity level.

Python

from datetime import timedelta
from temporalio import workflow, activity
from temporalio.common import RetryPolicy
from temporalio.exceptions import ActivityError, ApplicationError

LLM_RETRY_POLICY = RetryPolicy(
    initial_interval=timedelta(seconds=10),
    backoff_coefficient=1.5,
    maximum_interval=timedelta(seconds=60),
    maximum_attempts=4,
    non_retryable_error_types=[
        "LLMContextLengthExceeded",
        "LLMInvalidAPIKey",
        "LLMContentPolicyViolation",
    ],
)

@workflow.defn
class ResearchWorkflowGuarded:
    @workflow.run
    async def run(self, queries: list[str]) -> str:
        results = []
        for q in queries:
            try:
                result = await workflow.execute_activity(
                    call_llm_activity,
                    q,
                    start_to_close_timeout=timedelta(seconds=90),
                    retry_policy=LLM_RETRY_POLICY,
                )
                results.append(result)
            except ActivityError as e:
                # The activity's own error arrives as the cause of ActivityError;
                # assumes the activity raises ApplicationError(type="RateLimitExhausted")
                cause = e.cause
                if isinstance(cause, ApplicationError) and cause.type == "RateLimitExhausted":
                    # Wait at workflow level — timer is cheap vs activity retries
                    await workflow.sleep(timedelta(seconds=90))
                    result = await workflow.execute_activity(
                        call_llm_activity,
                        q,
                        start_to_close_timeout=timedelta(seconds=90),
                        retry_policy=LLM_RETRY_POLICY,
                    )
                    results.append(result)
                else:
                    raise
        return "\n".join(results)

Failure Mode 3: Child Workflow Fan-Out Runaway

Fan-out in Temporal is implemented with workflow.ExecuteChildWorkflow (Go) or workflow.execute_child_workflow (Python). The pattern is common for parallel research: a parent workflow asks an LLM to decompose a query into N sub-queries, then spawns N child workflows to process them in parallel.

The failure mode is that N is determined by LLM output. LLMs asked to "generate all relevant sub-queries" produce variable-length lists, and prompt wording changes significantly affect the count. "Generate sub-queries for a comprehensive literature review on transformer architectures" might return 12 sub-queries in one run and 87 sub-queries in another, depending on the model's verbosity. Without a ceiling, both are launched as concurrent child workflows:

// DANGEROUS: fan-out count determined by LLM, no ceiling
func ResearchOrchestratorWorkflow(ctx workflow.Context, topic string) error {
    var subQueries []string
    // LLM generates the fan-out width — no ceiling enforced
    err := workflow.ExecuteActivity(ctx,
        generateSubQueriesActivity, topic,
    ).Get(ctx, &subQueries)
    if err != nil {
        return err
    }

    // subQueries could be 5 or 500 — all launched concurrently
    futures := make([]workflow.Future, len(subQueries))
    for i, q := range subQueries {
        cwo := workflow.ChildWorkflowOptions{
            WorkflowID: fmt.Sprintf("research-%s-%d", topic, i),
        }
        futures[i] = workflow.ExecuteChildWorkflow(
            workflow.WithChildOptions(ctx, cwo),
            ResearchSubWorkflow, q,
        )
    }

    // Wait for all — if 500 children launched, 500 workflow starts billed
    for _, f := range futures {
        if err := f.Get(ctx, nil); err != nil {
            return err
        }
    }
    return nil
}

At Temporal Cloud action pricing, each child workflow start is 2 actions (WorkflowExecutionStarted + first WorkflowTaskScheduled). Five hundred child workflows = 1,000 actions for the fan-out alone, before any activity work. If each child workflow runs 10 LLM activities (2 actions per activity, 20 actions per child), the total for one parent run is 1,000 + 500×20 = 11,000 actions, or roughly $0.28 at $25/million, purely in Temporal overhead, for a single orchestration run.
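A short calculation shows how the ceiling changes the bill. It is a sketch using the per-start and per-activity action counts assumed in this article (2 each) and the approximate $25 per million price; the function names are illustrative.

```python
def fan_out_actions(children: int, activities_per_child: int,
                    actions_per_child_start: int = 2,   # assumption from the text
                    actions_per_activity: int = 2) -> int:  # assumption from the text
    """Total actions for one parent run that fans out to `children` workflows."""
    per_child = actions_per_child_start + activities_per_child * actions_per_activity
    return children * per_child


def actions_to_usd(actions: int, usd_per_million: float = 25.0) -> float:
    return actions * usd_per_million / 1_000_000


unbounded = fan_out_actions(children=500, activities_per_child=10)
clamped = fan_out_actions(children=20, activities_per_child=10)
print(unbounded, actions_to_usd(unbounded))
print(clamped, actions_to_usd(clamped))
```

Cost scales linearly with fan-out width, so a ceiling of 20 children cuts this example's overhead by a factor of 25.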

The fix enforces a hard ceiling on fan-out width and uses a semaphore to limit concurrency within that ceiling. Both must be present: the ceiling prevents unbounded launch cost, and the semaphore prevents simultaneous LLM quota exhaustion across all children:

const (
    MaxFanOutWidth    = 20  // hard ceiling regardless of LLM output
    MaxConcurrentLLMs = 5   // semaphore: max simultaneous child workflows
)

func ResearchOrchestratorWorkflowGuarded(ctx workflow.Context, topic string) error {
    var subQueries []string
    err := workflow.ExecuteActivity(ctx,
        generateSubQueriesActivity, topic,
    ).Get(ctx, &subQueries)
    if err != nil {
        return err
    }

    // Enforce ceiling — truncate LLM output to max width
    if len(subQueries) > MaxFanOutWidth {
        workflow.GetLogger(ctx).Warn("fan-out ceiling applied",
            "llm_count", len(subQueries),
            "ceiling", MaxFanOutWidth,
        )
        subQueries = subQueries[:MaxFanOutWidth]
    }

    // Completion channel used as a semaphore. Buffered so that children
    // finishing after the launch loop never block on Send.
    sem := workflow.NewBufferedChannel(ctx, len(subQueries))

    futures := make([]workflow.Future, 0, len(subQueries))
    inFlight := 0

    for i, q := range subQueries {
        // Block if at concurrency ceiling
        for inFlight >= MaxConcurrentLLMs {
            sem.Receive(ctx, nil)
            inFlight--
        }

        q := q // capture
        i := i
        cwo := workflow.ChildWorkflowOptions{
            WorkflowID: fmt.Sprintf("research-%s-%d", topic, i),
        }
        f := workflow.ExecuteChildWorkflow(
            workflow.WithChildOptions(ctx, cwo),
            ResearchSubWorkflow, q,
        )
        futures = append(futures, f)
        inFlight++

        // Signal semaphore when child completes
        workflow.Go(ctx, func(ctx workflow.Context) {
            f.Get(ctx, nil)
            sem.Send(ctx, nil)
        })
    }

    // Drain remaining futures
    for _, f := range futures {
        if err := f.Get(ctx, nil); err != nil {
            return err
        }
    }
    return nil
}
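The same two guards (a ceiling on width, a semaphore on concurrency) can be sketched in Python with plain asyncio. This is not Temporal SDK code: fake_child is a hypothetical stand-in for whatever launches a child workflow, and the constants mirror the Go example.

```python
import asyncio

MAX_FAN_OUT_WIDTH = 20  # hard ceiling regardless of LLM output
MAX_CONCURRENT = 5      # at most this many children in flight


async def run_bounded(sub_queries: list[str], worker) -> list[str]:
    """Clamp the list, then run at most MAX_CONCURRENT workers at a time."""
    clamped = sub_queries[:MAX_FAN_OUT_WIDTH]
    sem = asyncio.Semaphore(MAX_CONCURRENT)

    async def guarded(q: str) -> str:
        async with sem:  # released automatically when the worker finishes
            return await worker(q)

    return await asyncio.gather(*(guarded(q) for q in clamped))


# Demo worker that records peak concurrency
in_flight = 0
peak = 0


async def fake_child(q: str) -> str:
    global in_flight, peak
    in_flight += 1
    peak = max(peak, in_flight)
    await asyncio.sleep(0.01)
    in_flight -= 1
    return f"done:{q}"


results = asyncio.run(run_bounded([f"q{i}" for i in range(87)], fake_child))
print(len(results), peak)
```

Even with 87 LLM-generated sub-queries as input, only 20 are processed and no more than 5 run at once.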

Alternative to child workflows for bounded fan-out: For fan-out widths under 10, consider running sub-tasks as parallel activities (workflow.Go + workflow.ExecuteActivity) rather than child workflows. Activities share the parent workflow's history instead of creating separate workflow histories — cheaper in storage and action count for small fan-out. Reserve child workflows for tasks that need independent durability (long-running, may outlive the parent's reasonable wait window).

Failure Mode 4: ContinueAsNew Neglect

Temporal workflows are designed to run for days, months, or indefinitely — a chat agent that stays alive across user sessions, a monitoring workflow that checks API status every 5 minutes. The mechanism for long-running workflows to stay healthy is workflow.ContinueAsNew: it terminates the current workflow execution and immediately starts a fresh execution with a new empty history, passing any state you choose as the initial input. The workflow appears continuous from the outside, but internally the history is reset.

Without ContinueAsNew, the workflow history grows forever. Temporal enforces a hard limit of roughly 50,000 history events (51,200 by default) and a configurable maximum history size (default 50MB in the server configuration). Hitting either limit terminates the workflow with WORKFLOW_MAX_HISTORY_SIZE_LIMIT_EXCEEDED, an unrecoverable error that drops whatever work was in progress.

For AI agent workflows, the practical problem is replay cost, which degrades long before the hard limit. A support bot workflow that processes 100 customer messages, each triggering 3 LLM activities with 3KB average responses, accumulates:

100 messages × 3 activities × 3 events/activity = 900 activity events
~200 workflow task events for orchestration
300 ActivityTaskCompleted payloads × 3KB = ~0.9MB of serialized LLM text in history
Total: ~1,100 events, ~1MB of history data

Each time this workflow processes a new customer message (woken by a signal), the worker must load and replay all 1,100 events and roughly 1MB of data before doing anything. At 1,000 customer interactions per day across 500 active sessions, this replay overhead accumulates to measurable compute cost at the Temporal worker level, independent of any LLM API cost.

// DANGEROUS: support bot workflow that never calls ContinueAsNew
type SupportBotState struct {
    SessionID   string
    History     []Message
    TotalTokens int
}

func SupportBotWorkflow(ctx workflow.Context, state SupportBotState) error {
    ch := workflow.GetSignalChannel(ctx, "user-message")

    for {  // infinite loop — workflow runs forever without ContinueAsNew
        var msg string
        ch.Receive(ctx, &msg)

        var response string
        err := workflow.ExecuteActivity(ctx,
            callLLMActivity, state.History, msg,
        ).Get(ctx, &response)
        if err != nil {
            return err
        }

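        state.History = append(state.History,
            Message{Role: "user", Content: msg},
            Message{Role: "assistant", Content: response},
        )
        // History grows with every turn: each signal, activity event, and LLM
        // response payload from the whole session stays in the workflow history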
        // Replay cost per signal: O(events) deserialization + re-execution
    }
}

The fix calls ContinueAsNew after a configurable number of turns, carrying forward only the minimal state needed for the next session — typically a compressed summary of the conversation, not the full history:

const ContinueAsNewAfterTurns = 20

func SupportBotWorkflowGuarded(ctx workflow.Context, state SupportBotState) error {
    ch := workflow.GetSignalChannel(ctx, "user-message")
    turnsThisExecution := 0

    for {
        var msg string
        ch.Receive(ctx, &msg)

        var response string
        err := workflow.ExecuteActivity(ctx,
            callLLMActivity, state.History, msg,
        ).Get(ctx, &response)
        if err != nil {
            return err
        }

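        // Record both sides of the turn so the later summary sees the user's messages
        state.History = append(state.History,
            Message{Role: "user", Content: msg},
            Message{Role: "assistant", Content: response},
        )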
        turnsThisExecution++

        if turnsThisExecution >= ContinueAsNewAfterTurns {
            // Summarize history before ContinueAsNew to preserve context
            var summary string
            err := workflow.ExecuteActivity(ctx,
                summarizeHistoryActivity, state.History,
            ).Get(ctx, &summary)
            if err != nil {
                return err
            }

            // Start fresh execution with summarized state
            // History resets to zero — replay cost returns to baseline
            freshState := SupportBotState{
                SessionID: state.SessionID,
                History: []Message{
                    {Role: "system", Content: "Prior conversation summary: " + summary},
                },
                TotalTokens: state.TotalTokens,
            }
            return workflow.NewContinueAsNewError(ctx, SupportBotWorkflowGuarded, freshState)
        }
    }
}

The workflow.NewContinueAsNewError return value is a special error type that Temporal interprets as a clean handoff, not a failure. The workflow ID is preserved; the workflow appears uninterrupted to external observers. Worker replay cost resets to near-zero on the new execution.

Python

from temporalio import workflow
from datetime import timedelta

MAX_TURNS_PER_EXECUTION = 20

@workflow.defn
class SupportBotWorkflowGuarded:
    def __init__(self) -> None:
        self._turns = 0
        self._history: list[dict] = []
        self._pending_messages: list[str] = []  # filled by the user_message signal

    @workflow.run
    async def run(self, state: dict) -> None:
        self._history = state.get("history", [])
        self._session_id = state["session_id"]
        self._turns = 0

        while True:
            # Wait for next user message signal (wait_condition returns None)
            await workflow.wait_condition(
                lambda: len(self._pending_messages) > 0
            )
            user_msg = self._pending_messages.pop(0)

            response = await workflow.execute_activity(
                call_llm_activity,
                args=[self._history, user_msg],
                start_to_close_timeout=timedelta(seconds=60),
            )

            self._history.append({"role": "user", "content": user_msg})
            self._history.append({"role": "assistant", "content": response})
            self._turns += 1

            if self._turns >= MAX_TURNS_PER_EXECUTION:
                # Summarize and ContinueAsNew
                summary = await workflow.execute_activity(
                    summarize_history_activity,
                    self._history,
                    start_to_close_timeout=timedelta(seconds=30),
                )
                fresh_state = {
                    "session_id": self._session_id,
                    "history": [{"role": "system", "content": f"Prior context: {summary}"}],
                }
                workflow.continue_as_new(fresh_state)

    @workflow.signal
    def user_message(self, msg: str) -> None:
        self._pending_messages.append(msg)

Set MAX_TURNS_PER_EXECUTION based on your average activity payload size. The formula is: target history size ÷ (activities per turn × avg payload size). For a workflow with 3 activities per turn at 3KB each, targeting 5MB history maximum: 5,000,000 ÷ (3 × 3,072) ≈ 540 turns. In practice, include a safety margin and target 60–70% of the theoretical maximum (about 325 to 380 turns here); workflows processing user input should never approach the limit during normal operation.
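The sizing formula is simple enough to keep next to the constant it produces. A minimal sketch, with an illustrative function name and a default margin of 65% chosen from the 60–70% range:

```python
def max_turns(target_history_bytes: int, activities_per_turn: int,
              avg_payload_bytes: int, safety_margin: float = 0.65) -> tuple[int, int]:
    """Return (theoretical, recommended) turns before continue-as-new."""
    theoretical = target_history_bytes // (activities_per_turn * avg_payload_bytes)
    return theoretical, int(theoretical * safety_margin)


theoretical, recommended = max_turns(5_000_000, activities_per_turn=3,
                                     avg_payload_bytes=3_072)
print(theoretical, recommended)  # 542 352
```

A value such as 20 turns sits far below this bound, which is a reasonable default when payload sizes are not yet measured.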

Composite Guard: TemporalAgentPolicy

All four failure modes interact. A workflow accumulating history (mode 1) that never calls ContinueAsNew (mode 4) will eventually hit the 50,000-event limit. If it also has unlimited activity retries (mode 2) on LLM calls and spawns unbounded child workflows (mode 3), the costs multiply rather than add, because every child carries its own retries and its own growing history. The following policy struct enforces all four ceilings consistently across workflow implementations:

package workflowguard

import (
    "context"
    "time"

    "go.temporal.io/sdk/temporal"
    "go.temporal.io/sdk/workflow"
)

// TemporalAgentPolicy centralizes all cost-control thresholds for AI agent workflows.
type TemporalAgentPolicy struct {
    // History
    ContinueAsNewAfterTurns int

    // Activity retries
    LLMRetryPolicy *temporal.RetryPolicy

    // Fan-out
    MaxFanOutWidth      int
    MaxConcurrentFanOut int

    // External store for large payloads
    ResultStore ResultStore
}

func DefaultPolicy(store ResultStore) *TemporalAgentPolicy {
    return &TemporalAgentPolicy{
        ContinueAsNewAfterTurns: 20,
        LLMRetryPolicy: &temporal.RetryPolicy{
            InitialInterval:    10 * time.Second,
            BackoffCoefficient: 1.5,
            MaximumInterval:    60 * time.Second,
            MaximumAttempts:    4,
            NonRetryableErrorTypes: []string{
                "LLMContextLengthExceeded",
                "LLMInvalidAPIKey",
                "LLMContentPolicyViolation",
            },
        },
        MaxFanOutWidth:      20,
        MaxConcurrentFanOut: 5,
        ResultStore:         store,
    }
}

// ActivityOptions returns configured options for an LLM activity.
func (p *TemporalAgentPolicy) ActivityOptions(timeout time.Duration) workflow.ActivityOptions {
    return workflow.ActivityOptions{
        StartToCloseTimeout: timeout,
        RetryPolicy:         p.LLMRetryPolicy,
    }
}

// ClampFanOut enforces the fan-out ceiling and logs a warning if clamping occurred.
func (p *TemporalAgentPolicy) ClampFanOut(ctx workflow.Context, items []string) []string {
    if len(items) <= p.MaxFanOutWidth {
        return items
    }
    workflow.GetLogger(ctx).Warn(
        "TemporalAgentPolicy: fan-out ceiling applied",
        "requested", len(items),
        "ceiling", p.MaxFanOutWidth,
    )
    return items[:p.MaxFanOutWidth]
}

// ShouldContinueAsNew returns true if the workflow should reset its history.
func (p *TemporalAgentPolicy) ShouldContinueAsNew(turnCount int) bool {
    return turnCount >= p.ContinueAsNewAfterTurns
}

// ResultStore is implemented with S3, Redis, or a database.
// It is called from activities, so it takes a standard context.Context.
type ResultStore interface {
    Put(ctx context.Context, key, value string) error
    Get(ctx context.Context, key string) (string, error)
}

Temporal vs Other Orchestration Frameworks: Cost Profile Comparison

Dimension	In-process frameworks (LangChain, CrewAI)	Temporal (unguarded)	Temporal (with policy)
Activity result storage	In memory — no persistence cost	Full result serialized to history event	Ref key only — result in external store
Retry cost	No per-retry infrastructure charge	2+ Temporal actions per retry attempt	4-attempt cap; workflow-level Sleep for quota
Fan-out width	Unconstrained (CPU bound)	Unbounded child workflows, each with own history	Ceiling enforced before child launch
Long-running cost	Process restart loses state; not designed for weeks	Growing history → O(n) replay cost per signal	ContinueAsNew resets history on schedule
Durability	None (in-process state)	Full history replay after crash	Same durability; replay cost bounded by history size
Observability	Custom logging only	Temporal UI shows all events and actions	Same + policy violations in structured workflow logs

Production Checklist

Audit every activity return type — search your codebase for activities that return string or large structs. Any return value containing LLM output should be replaced with an external store reference. The history payload size is the most impactful single optimization for Temporal AI workflows.
Set MaximumAttempts on all LLM activities — the default of 0 (unlimited) is never correct for LLM API calls. Start with 4 and measure; only increase if you have evidence that your LLM provider's rate limit window requires more retries to eventually succeed.
Add NonRetryableErrorTypes for semantic failures — context length exceeded, invalid API key, content policy violations are not transient. Every retry on these is guaranteed to fail and costs actions with no benefit. Type your errors in activity implementations and register the error type names in the policy.
Instrument fan-out width in your Temporal metrics — the Temporal Go and Python SDKs support custom metrics via the MetricsHandler. Record LLM-generated list length before clamping; alert when the raw LLM count exceeds your ceiling by more than 2×, which indicates a prompt engineering issue.
Test ContinueAsNew paths in development — set ContinueAsNewAfterTurns = 2 in test environments to force the code path on every test run. ContinueAsNew bugs (dropped signals, state serialization errors) are invisible until a long-running workflow reaches the threshold in production.

Temporal Cloud vs self-hosted cost model: Self-hosted Temporal (OSS server on your own infrastructure) doesn't charge per-action — you pay for the infrastructure running the Temporal server. The history bloat (mode 1) and ContinueAsNew (mode 4) failure modes still apply because they affect worker compute cost and database storage. Action-count optimization (modes 2 and 3) matters most for Temporal Cloud billing specifically; for self-hosted deployments, prioritize history size reduction and ContinueAsNew discipline instead.

FAQ

Does Temporal compress workflow history? Can I reduce storage cost without changing my code?

Temporal supports configurable data converters that can compress history payloads using zlib or zstd before persistence. Enabling compression at the data converter level can reduce history storage by 60–80% for text-heavy LLM outputs. However, compression doesn't reduce action count (billing is on actions, not storage size on Temporal Cloud) and doesn't reduce the deserialization work on replay — the worker still has to decompress and process all historical events. Compression is a useful supplement but not a substitute for storing large payloads externally.
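The effect of payload compression is easy to measure on your own data before wiring it into a data converter. The sketch below uses plain zlib on a JSON payload; it is not the Temporal codec interface, and the repeated sample text compresses far better than real LLM output would.

```python
import json
import zlib


def compress_payload(payload: dict) -> bytes:
    """Serialize to JSON, then compress with zlib."""
    raw = json.dumps(payload).encode("utf-8")
    return zlib.compress(raw, level=6)


def decompress_payload(blob: bytes) -> dict:
    """Inverse of compress_payload: this work is repeated on every replay."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


payload = {
    "synthesis": "Transformer attention scales quadratically with sequence length. " * 60
}
raw_size = len(json.dumps(payload).encode("utf-8"))
blob = compress_payload(payload)
print(raw_size, len(blob))
```

The round trip through decompress_payload is the part that compression cannot remove: stored bytes shrink, but every historical payload still has to be expanded and parsed during replay.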

What happens to in-flight activities when ContinueAsNew fires?

ContinueAsNew is a blocking operation from the workflow's perspective — you call it after all in-flight activities have completed (or been cancelled). The typical pattern is to check the ShouldContinueAsNew condition only at a clean quiescence point (end of a processing turn, after draining a signal queue batch) rather than mid-loop. If you call ContinueAsNew while activities are in flight, those activities complete but their results are discarded — which can cause lost work. Structure your loop to complete the current batch before evaluating whether to continue-as-new.

How do I track total token consumption across ContinueAsNew boundaries?

Pass cumulative token counts in the state struct that ContinueAsNew carries forward. In the example above, the SupportBotState includes a TotalTokens field that accumulates across executions. Each new execution starts from the prior total. Separately, use activity heartbeating with a HeartbeatDetails payload to checkpoint token usage within long activities — this also lets the worker recover the token count if the activity is retried after a worker crash. For Temporal Cloud, correlate your token counts with Temporal Cloud metrics on workflow action counts to build a cost-per-workflow dashboard.

Can I query the current event count from inside a workflow to decide when to ContinueAsNew?

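Yes, in recent SDK versions. Temporal workflows run in a sandboxed environment that limits access to runtime metadata to preserve determinism, but history length is derived from the history itself, so reading it is safe. The Go SDK exposes workflow.GetInfo(ctx).GetCurrentHistoryLength(), GetCurrentHistorySize(), and GetContinueAsNewSuggested(); the Python SDK exposes the equivalents on workflow.info() as get_current_history_length(), get_current_history_size(), and is_continue_as_new_suggested(). On older SDK versions, or when you want a threshold expressed in business terms, count events yourself via turn count or activity execution count, as shown in the examples. As an approximation: 1 activity = ~3 history events + payload; 1 signal received = ~2 events; 1 timer = ~2 events. A turn-based counter with some margin is a reliable complement to the built-in values.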
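The rule-of-thumb event weights can be turned into a small counter that a workflow updates as it goes. This is a sketch with hypothetical names; the weights are the approximations quoted above, and the default budget is an arbitrary illustrative threshold.

```python
EVENTS_PER_ACTIVITY = 3  # approximation from the text
EVENTS_PER_SIGNAL = 2    # approximation from the text
EVENTS_PER_TIMER = 2     # approximation from the text


def estimated_events(activities: int = 0, signals: int = 0, timers: int = 0) -> int:
    """Approximate history event count from operation counts."""
    return (
        activities * EVENTS_PER_ACTIVITY
        + signals * EVENTS_PER_SIGNAL
        + timers * EVENTS_PER_TIMER
    )


def should_continue_as_new(activities: int, signals: int, timers: int,
                           event_budget: int = 10_000) -> bool:
    """True once the estimated event count reaches the chosen budget."""
    return estimated_events(activities, signals, timers) >= event_budget


# Support bot example: 100 messages, each a signal that triggers 3 activities
support_bot = estimated_events(activities=300, signals=100)
print(support_bot, should_continue_as_new(300, 100, 0))
```

For the support bot example this gives about 1,100 events after 100 messages, well under a 10,000-event budget but already enough to make replay noticeable.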

How does this compare to Dapr's durable workflow cost failure modes?

The failure modes are structurally similar — both Temporal and Dapr workflows persist execution history and replay it on resume. The key differences are in pricing model and defaults. Dapr workflows on Azure Container Apps bill by vCPU-second of worker runtime; Temporal Cloud bills per action. Dapr's default retry policy is more conservative (3 attempts) than Temporal's (unlimited). Dapr's ContinueAsNew equivalent (ContinueAsNewAsync in the Dapr workflow authoring SDK) follows the same pattern. For teams choosing between the two for AI agent orchestration, the Dapr AI agent cost control guide covers the Dapr-specific nuances.

Catch Temporal workflow cost spirals before they compound

RunGuard's SDK instruments your workflow activity executions, fan-out widths, and session token totals with automatic circuit breakers. Policy enforcement, retry classification, and ContinueAsNew triggers ship out of the box for both Go and Python Temporal workers.
