June 13, 2026 Dapr Actor Model Cost Control

Dapr AI Agents Cost Control: Loop Detection in Actor Model Orchestration

Every major AI agent framework eventually needs to answer the same question: what happens when an agent gets stuck? LangGraph answers with a recursion limit. CrewAI answers with max_iter. AutoGen answers with consecutive auto-reply caps. These are in-process counters — they work because the framework controls the execution loop.

Dapr answers differently. Dapr is a distributed application runtime, not a framework. When you build AI agents on Dapr, the runtime manages state externally in a pluggable state store (Redis, Cosmos DB, DynamoDB), schedules reminders that survive process crashes, and coordinates multi-step workflows through the Durable Task Framework. The runtime's persistence layer is exactly what makes Dapr AI agents compelling for production — and exactly what creates cost failure modes that in-process guards cannot catch.

This post covers four failure modes that are structurally unique to Dapr-based AI orchestration. Each one emerges from a Dapr primitive (virtual actors, reminders, workflow activities, or state stores) rather than from the AI framework layer. If your agents use dapr-ext-fastapi, dapr-ext-workflow, or the Dapr Python SDK's actor client, these failure modes apply to you directly.

Scope. Code examples target Dapr 1.13+ with the Python SDK (dapr>=1.13). For in-process agent frameworks, see LangGraph Cost Control, CrewAI Flows Cost Control, and AutoGen Cost Control. For the cross-framework pattern reference, see AI Agent Cost Control Pattern Reference.

Why Dapr AI agents fail differently

The critical difference is persistence boundary. In-process frameworks keep all execution state in memory: the message history list, the tool call counter, the recursion depth. When the process exits — whether by crash, scale-down, or OOM kill — that state is gone and the loop stops. A budget guard that lives in the same process will always see the loop while it runs.

Dapr moves the state boundary outside the process:

Actor state is saved to a configured state store after every method call. A new actor instance reads it back on the next activation. The state survives process restarts.
Reminders are persisted by the Dapr runtime rather than held in your application's memory, so (given a persistent backend) they survive both application crashes and sidecar restarts. They fire callbacks into your actor on a schedule you define, regardless of whether the application was up when the scheduled time passed.
Workflow state is checkpointed after every completed activity. A workflow that was mid-execution when the process died resumes from the last checkpoint on restart. The Durable Task engine guarantees at-least-once execution of each activity.
State store keys are external. A blob stored under key agent:user123:history grows forever unless your code explicitly trims it on each write.

Each of these properties is desirable for production resilience. Each also creates a failure mode where a cost loop outlasts the process that started it — sometimes surviving for days before it's noticed in the billing dashboard.
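The persistence boundary can be illustrated without Dapr at all. The sketch below is a simulation under stated assumptions: a plain dict stands in for the external state store, and `InProcessGuard`, `PersistedGuard`, and `calls_allowed_across_restarts` are hypothetical names, not Dapr APIs. It counts how many calls each style of guard lets through when the process restarts three times.

```python
class InProcessGuard:
    """Counter lives in process memory: a restart silently resets it."""

    def __init__(self, limit: int):
        self.limit = limit
        self.count = 0

    def allow(self) -> bool:
        if self.count >= self.limit:
            return False
        self.count += 1
        return True


class PersistedGuard:
    """Counter lives in an external store (a dict here, standing in for Redis etc.)."""

    def __init__(self, store: dict, key: str, limit: int):
        self.store, self.key, self.limit = store, key, limit

    def allow(self) -> bool:
        count = self.store.get(self.key, 0)
        if count >= self.limit:
            return False
        self.store[self.key] = count + 1  # write-through on every call
        return True


def calls_allowed_across_restarts(make_guard, restarts: int, attempts_per_run: int) -> int:
    """Build a fresh guard per simulated process start and count permitted calls."""
    allowed = 0
    for _ in range(restarts):
        guard = make_guard()  # a new process only has what the store remembers
        allowed += sum(guard.allow() for _ in range(attempts_per_run))
    return allowed


external_store: dict = {}
in_process_total = calls_allowed_across_restarts(
    lambda: InProcessGuard(limit=8), restarts=3, attempts_per_run=20)
persisted_total = calls_allowed_across_restarts(
    lambda: PersistedGuard(external_store, "agent:user123:calls", limit=8),
    restarts=3, attempts_per_run=20)
```

With a limit of 8, the in-process guard admits 8 calls per process lifetime, while the persisted guard admits 8 in total. That gap is what the guards in the rest of this post are built to close.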

The four failure modes

1. Actor re-entrancy loop

Dapr virtual actors are single-threaded by default: only one method call executes at a time per actor instance. Re-entrancy, when enabled, relaxes this: an actor method can call back into the same actor (or into a chain of actors that call back into the first) without deadlocking.

Re-entrancy is commonly enabled for AI agent actors because agent tool calls often invoke other actors in the same application. A ResearchAgent actor calls a SearchTool actor, which calls a SummaryAgent actor, which determines more research is needed and calls back into ResearchAgent. With re-entrancy disabled this deadlocks immediately. With re-entrancy enabled it loops.

# config/config.yaml — re-entrancy enabled
apiVersion: dapr.io/v1alpha1
kind: Configuration
metadata:
  name: appconfig
spec:
  features:
    - name: ActorReentrancy
      enabled: true
  actor:
    reentrancy:
      enabled: true
      maxStackDepth: 32   # default 32 — 32 re-entrant LLM calls before hard stop

The built-in maxStackDepth is 32. At GPT-4o pricing of $0.005 per 1K output tokens and an average of 800 tokens per agent step, 32 re-entrant calls cost about $0.13 (32 × 800 = 25,600 tokens). That is acceptable in testing and catastrophic when the loop fires 500 times across concurrent user sessions, roughly $64. maxStackDepth limits the depth of one call chain, not the number of chains.
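The depth-versus-breadth point is easy to check numerically. The following is a small sketch using only the figures quoted in the text; the function name `reentrancy_cost_usd` and the two constants are illustrative, and the price should be replaced with your provider's current rate.

```python
PRICE_PER_1K_OUTPUT_USD = 0.005  # figure quoted in the text; verify against current pricing
TOKENS_PER_STEP = 800            # average tokens per agent step, from the text


def reentrancy_cost_usd(stack_depth: int, chains: int = 1) -> float:
    """Cost of `chains` independent call chains that each reach `stack_depth` LLM calls."""
    tokens = stack_depth * TOKENS_PER_STEP * chains
    return tokens / 1000 * PRICE_PER_1K_OUTPUT_USD


# One chain stopped by the stack depth limit
single_chain = reentrancy_cost_usd(32)
# The same chain triggered 500 times: the depth limit does nothing about this multiplier
many_chains = reentrancy_cost_usd(32, chains=500)
```

The stack depth limit bounds the first argument only. Bounding the second requires a counter that every chain shares, which is what persisted actor state provides.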

The correct guard is an explicit invocation counter stored in actor state, checked at method entry before any LLM call:

import json
from dataclasses import dataclass, asdict, field

from dapr.actor import Actor, ActorInterface, actormethod


class ResearchAgentInterface(ActorInterface):
    @actormethod(name="Research")
    async def research(self, query: str) -> dict: ...

    @actormethod(name="GetStats")
    async def get_stats(self) -> dict: ...


@dataclass
class AgentState:
    invocation_count: int = 0
    total_llm_tokens: int = 0
    # default_factory gives each instance its own list
    history: list = field(default_factory=list)


class ResearchAgent(Actor, ResearchAgentInterface):
    MAX_INVOCATIONS = 8      # per actor instance lifetime
    MAX_LLM_TOKENS  = 80_000 # cumulative token ceiling

    def __init__(self, ctx, actor_id):
        super().__init__(ctx, actor_id)

    async def _on_activate(self):
        # try_get_state returns a (has_value, value) tuple in the Python SDK
        has_value, raw = await self._state_manager.try_get_state("agent_state")
        if has_value:
            self._state = AgentState(**json.loads(raw))
        else:
            self._state = AgentState()

    async def _save_state(self):
        await self._state_manager.set_state(
            "agent_state",
            json.dumps(asdict(self._state))
        )
        await self._state_manager.save_state()

    async def research(self, query: str) -> dict:
        # Guard BEFORE any LLM call — state is authoritative across restarts
        if self._state.invocation_count >= self.MAX_INVOCATIONS:
            return {
                "error": "ActorInvocationLimitExceeded",
                "invocations": self._state.invocation_count,
                "message": f"Actor {self.id} reached max invocations ({self.MAX_INVOCATIONS}). "
                           f"Create a new actor instance to continue."
            }

        if self._state.total_llm_tokens >= self.MAX_LLM_TOKENS:
            return {
                "error": "ActorTokenLimitExceeded",
                "tokens_used": self._state.total_llm_tokens,
                "message": f"Token ceiling ({self.MAX_LLM_TOKENS:,}) reached for actor {self.id}."
            }

        self._state.invocation_count += 1
        await self._save_state()  # persist BEFORE the LLM call

        # --- LLM call ---
        response = await self._call_llm(query, self._state.history)
        tokens_used = response.get("usage", {}).get("total_tokens", 0)

        self._state.total_llm_tokens += tokens_used
        # Bounded history: keep last 20 turns, never grow unbounded
        self._state.history = (self._state.history + [
            {"role": "user",      "content": query},
            {"role": "assistant", "content": response["content"]}
        ])[-20:]
        await self._save_state()

        return {
            "content":     response["content"],
            "invocations": self._state.invocation_count,
            "tokens_used": self._state.total_llm_tokens
        }

    async def get_stats(self) -> dict:
        return asdict(self._state)

    async def _call_llm(self, query: str, history: list) -> dict:
        # Your actual LLM client here — placeholder shows token tracking shape
        raise NotImplementedError("Wire your LLM client")

Saving state before the LLM call is deliberate. If the process crashes mid-call, the counter has already been incremented. On restart the actor reloads the incremented count, preventing the "invocation appears free because the prior attempt crashed before save" pattern.
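The effect of the save ordering can be shown with a small simulation. This is a sketch under assumptions: a dict plays the state store, every call crashes mid-flight, and `run_attempts` and `CrashDuringCall` are hypothetical names introduced only for illustration.

```python
class CrashDuringCall(Exception):
    """Stands in for the process dying while the LLM request is in flight."""


def run_attempts(store: dict, save_before_call: bool, attempts: int, limit: int) -> int:
    """Return how many LLM calls were started when every call crashes mid-flight."""
    started = 0
    for _ in range(attempts):
        count = store.get("invocation_count", 0)  # reloaded on each simulated activation
        if count >= limit:
            break
        try:
            if save_before_call:
                store["invocation_count"] = count + 1  # persisted before the request
            started += 1  # the billable request goes out here
            raise CrashDuringCall()
            # With save-after-call ordering the increment would sit here,
            # and a crash means it is never written.
        except CrashDuringCall:
            continue
    return started


calls_save_first = run_attempts({}, save_before_call=True, attempts=50, limit=8)
calls_save_last = run_attempts({}, save_before_call=False, attempts=50, limit=8)
```

With save-first ordering the limit holds at 8 even though no call ever completes. With save-last ordering all 50 attempts go out, because the counter never advances.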

2. Reminder infinite recursion

Dapr reminders are a persistent scheduling primitive. An actor registers a reminder with a delay and optionally a period, and the Dapr runtime fires the reminder callback on schedule — even if the application was down when the scheduled time passed (the callback fires on next startup). This makes reminders useful for polling agents, retry schedulers, and periodic review tasks.

The failure mode: a fault handler registers a new reminder instead of clearing the existing one. The pattern appears frequently in "retry on failure" logic where the developer intends to back off and try again later, but forgets that the registration lives in the runtime, not in the process. Registering a reminder under an existing name replaces the prior schedule, so nothing visibly accumulates that would expose the mistake. A bug that registers a one-shot reminder (period=timedelta(0)) and then re-registers it on every callback creates an infinite callback chain that survives application restarts.

from datetime import timedelta
from dapr.actor import Actor, ActorInterface, Remindable, actormethod
import json


class ScheduledResearchAgentInterface(ActorInterface):
    @actormethod(name="StartResearch")
    async def start_research(self, config: dict) -> None: ...


# Reminder callbacks come from the Remindable mixin, not from the actor interface
class ScheduledResearchAgent(Actor, ScheduledResearchAgentInterface, Remindable):
    REMINDER_NAME   = "research_poll"
    MAX_REMINDERS   = 5    # how many times this reminder may ever fire
    REMINDER_PERIOD = timedelta(minutes=15)

    async def _on_activate(self):
        # try_get_state returns a (has_value, value) tuple in the Python SDK
        has_value, raw = await self._state_manager.try_get_state("reminder_count")
        self._reminder_count = int(raw) if has_value else 0

    async def start_research(self, config: dict) -> None:
        if self._reminder_count >= self.MAX_REMINDERS:
            # Unregister to be safe — idempotent if already unregistered
            await self.unregister_reminder(self.REMINDER_NAME)
            return
        await self.register_reminder(
            self.REMINDER_NAME,  # first parameter is the reminder name
            state=json.dumps(config).encode(),
            due_time=timedelta(seconds=30),
            period=self.REMINDER_PERIOD,
        )

    async def receive_reminder(self, name: str, state: bytes,
                               due_time: timedelta, period: timedelta,
                               ttl: timedelta | None = None) -> None:
        if name != self.REMINDER_NAME:
            return

        # Check limit FIRST — before any LLM work
        if self._reminder_count >= self.MAX_REMINDERS:
            await self.unregister_reminder(self.REMINDER_NAME)
            return

        self._reminder_count += 1
        await self._state_manager.set_state("reminder_count", str(self._reminder_count))
        await self._state_manager.save_state()

        try:
            config = json.loads(state.decode())
            await self._run_research_cycle(config)
        except Exception as exc:
            # DO NOT re-register the reminder here — let the period handle it
            # Re-registering on error was the original bug that caused infinite recursion
            await self._log_error(exc)

        # Auto-unregister after final invocation
        if self._reminder_count >= self.MAX_REMINDERS:
            await self.unregister_reminder(self.REMINDER_NAME)

    async def _run_research_cycle(self, config: dict) -> None:
        raise NotImplementedError("Wire your LLM client")

    async def _log_error(self, exc: Exception) -> None:
        print(f"[ScheduledResearchAgent] error in reminder: {exc}")

The key discipline: never re-register a reminder inside the reminder callback's error handler. If you need exponential backoff, use a one-shot reminder (set period=timedelta(0)) and register the next one-shot reminder with a doubled delay from the normal path, only after confirming the LLM call at least partially succeeded and the fire counter is still under its limit. The counter persists across restarts: when the process comes back up, _on_activate loads the saved count and the guard applies immediately.
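A bounded backoff schedule for one-shot reminders might be computed as follows. This is a sketch, not SDK code: `next_one_shot_delay` is a hypothetical helper, and the 30-second base and 30-minute cap are arbitrary illustration values. The persisted fire count is the only input that decides whether another reminder may be registered.

```python
from datetime import timedelta
from typing import Optional


def next_one_shot_delay(fires_so_far: int, max_fires: int,
                        base: timedelta = timedelta(seconds=30),
                        cap: timedelta = timedelta(minutes=30)) -> Optional[timedelta]:
    """Delay for the next one-shot reminder, or None when the fire budget is spent."""
    if fires_so_far >= max_fires:
        return None  # caller should unregister and stop
    delay = base * (2 ** fires_so_far)  # doubles with every fire already recorded
    return min(delay, cap)


# Delays for fire counts 0..6 with a budget of five fires
schedule = [next_one_shot_delay(n, max_fires=5) for n in range(7)]
```

In an actor, the returned delay would be passed as the due time of the next reminder, and a `None` result would lead to unregistering. Because the function depends only on the persisted count, a restart cannot reset the schedule.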

3. Workflow activity retry storm

Dapr Workflow (built on the Durable Task Framework) provides orchestration with automatic checkpointing. Each workflow activity can be configured with a RetryPolicy specifying maximum retry attempts, first retry interval, and backoff coefficient. When a Dapr Workflow runs LLM calls as activities, the retry policy applies to every transient failure: rate limits, timeouts, network errors, and model-reported service unavailability.

The failure mode: a RetryPolicy with no maximum retry count, or a very high one, combined with a model that repeatedly returns rate-limit errors. Each retry is an additional LLM call charged to your account. With Dapr Workflow's default exponential backoff, the interval grows between retries — but the total number of calls can still reach hundreds per workflow instance if the model is in a sustained overload period.

from dapr.ext.workflow import (
    DaprWorkflowContext,
    RetryPolicy,
    WorkflowActivityContext,
    WorkflowRuntime,
)
from datetime import timedelta


# Safe retry policy: bounded retries + ceiling before fallback
SAFE_RETRY_POLICY = RetryPolicy(
    first_retry_interval=timedelta(seconds=5),
    max_number_of_attempts=3,      # absolute maximum — not "unlimited"
    backoff_coefficient=2.0,
    max_retry_interval=timedelta(seconds=60),
    retry_timeout=timedelta(minutes=5),
)

# Budget guard for cumulative workflow spending
class WorkflowBudget:
    def __init__(self, max_usd: float = 0.50):
        self.max_usd       = max_usd
        self.spent_usd     = 0.0
        self.activity_calls = 0

    def record_call(self, input_tokens: int, output_tokens: int,
                    cost_per_input_1k: float = 0.005,
                    cost_per_output_1k: float = 0.015) -> None:
        self.spent_usd += (input_tokens / 1000 * cost_per_input_1k +
                           output_tokens / 1000 * cost_per_output_1k)
        self.activity_calls += 1

    def is_over_budget(self) -> bool:
        return self.spent_usd >= self.max_usd

    def summary(self) -> dict:
        return {
            "spent_usd":      round(self.spent_usd, 4),
            "activity_calls": self.activity_calls,
            "max_usd":        self.max_usd,
        }


def research_workflow(ctx: DaprWorkflowContext, wf_input: dict) -> dict:
    budget = WorkflowBudget(max_usd=wf_input.get("budget_usd", 0.50))
    steps  = wf_input.get("research_steps", [])
    results = []

    for step in steps:
        # Check budget before scheduling each activity
        if budget.is_over_budget():
            return {
                "status":  "budget_exhausted",
                "budget":  budget.summary(),
                "results": results,
            }

        # Schedule activity with bounded retry policy
        result = yield ctx.call_activity(
            llm_research_activity,
            input={"step": step, "budget_remaining": budget.max_usd - budget.spent_usd},
            retry_policy=SAFE_RETRY_POLICY,
        )

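        if result.get("error"):
            # Soft failure: stop scheduling further steps and report it,
            # rather than retrying at the workflow level or claiming completion
            return {
                "status":  "stopped_on_error",
                "error":   result["error"],
                "budget":  budget.summary(),
                "results": results,
            }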

        budget.record_call(
            input_tokens=result.get("input_tokens", 0),
            output_tokens=result.get("output_tokens", 0),
        )
        results.append(result)

    return {
        "status":  "completed",
        "budget":  budget.summary(),
        "results": results,
    }


def llm_research_activity(ctx: WorkflowActivityContext, activity_input: dict) -> dict:
    step = activity_input["step"]
    # Wire your LLM client here — return {"content":..., "input_tokens":..., "output_tokens":...}
    # Raise an exception to trigger the RetryPolicy; return {"error":...} for soft failures
    raise NotImplementedError("Wire your LLM client")

Two layers of protection: the RetryPolicy hard-caps how many times a single activity is attempted on transient errors, and the WorkflowBudget guard checks cumulative spend before scheduling each subsequent activity. If the model is rate-limited and all three attempts fail, the activity raises and the failure surfaces in the workflow at the yield; the workflow above does not retry at the workflow level. Wrap the yield in try/except if you would rather return partial results than fail the instance.
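To see what a retry policy commits you to before deploying it, the schedule can be computed by hand. The sketch below mirrors the parameter names of the policy above but is plain Python, not the SDK. It assumes that max_number_of_attempts counts the initial attempt, so N attempts imply N - 1 waits; `retry_delays` and `worst_case_llm_calls` are hypothetical helpers.

```python
from datetime import timedelta
from typing import List, Optional


def retry_delays(first_retry_interval: timedelta, max_number_of_attempts: int,
                 backoff_coefficient: float = 1.0,
                 max_retry_interval: Optional[timedelta] = None) -> List[timedelta]:
    """Waits between attempts, assuming the first attempt counts toward the maximum."""
    delays = []
    for retry_index in range(max(max_number_of_attempts - 1, 0)):
        delay = first_retry_interval * (backoff_coefficient ** retry_index)
        if max_retry_interval is not None:
            delay = min(delay, max_retry_interval)  # ceiling on any single wait
        delays.append(delay)
    return delays


def worst_case_llm_calls(steps: int, max_number_of_attempts: int) -> int:
    """Upper bound on LLM calls for one workflow instance if every attempt fails late."""
    return steps * max_number_of_attempts


delays = retry_delays(timedelta(seconds=5), 3, 2.0, timedelta(seconds=60))
total_wait = sum(delays, timedelta())
calls_for_ten_steps = worst_case_llm_calls(steps=10, max_number_of_attempts=3)
```

The second helper is the number to budget against: a ten-step workflow with three attempts per activity can issue up to 30 calls, and an uncapped policy has no such bound.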

4. State store unbounded accumulation

Dapr's state store API is key-value. You save state with client.save_state(store_name, key, value) and retrieve it with client.get_state(store_name, key). The value is an opaque blob — Dapr applies no schema, size enforcement, or TTL by default.

AI agent applications commonly store conversation history under a per-user or per-session key. Each turn appends to the history list and saves the full updated list back under the key. The key's value blob grows monotonically. After 100 turns with an average of 500 tokens per turn, the history blob is 50,000 tokens — and every subsequent LLM call receives the full 50,000 tokens as context, paying input token cost proportional to total conversation length.

from dapr.clients import DaprClient
import json


STATE_STORE = "statestore"
MAX_HISTORY_TURNS    = 20    # keep last 20 turns (40 messages)
MAX_HISTORY_TOKENS   = 8_000 # estimated token ceiling for history blob
CHARS_PER_TOKEN_EST  = 4.0   # conservative estimate


class BoundedAgentStateStore:
    """
    Wraps Dapr state store with bounded history management.
    Prevents unbounded conversation history growth that inflates LLM input costs.
    """

    def __init__(self, client: DaprClient, actor_id: str):
        self._client   = client
        self._actor_id = actor_id
        self._key      = f"agent:{actor_id}:history"

    def _estimate_tokens(self, messages: list) -> int:
        total_chars = sum(len(str(m)) for m in messages)
        return int(total_chars / CHARS_PER_TOKEN_EST)

    def _trim_history(self, messages: list) -> list:
        """Keep last MAX_HISTORY_TURNS turns; then further trim if still over token ceiling."""
        # Turn = user + assistant pair (2 messages)
        trimmed = messages[-(MAX_HISTORY_TURNS * 2):]

        # Secondary trim: drop oldest pairs until under token ceiling
        while self._estimate_tokens(trimmed) > MAX_HISTORY_TOKENS and len(trimmed) >= 2:
            trimmed = trimmed[2:]  # drop oldest user+assistant pair

        return trimmed

    def load_history(self) -> list:
        result = self._client.get_state(STATE_STORE, self._key)
        if not result.data:
            return []
        try:
            return json.loads(result.data.decode())
        except (json.JSONDecodeError, UnicodeDecodeError):
            return []

    def append_and_save(self, user_message: str, assistant_message: str) -> dict:
        history = self.load_history()
        history.append({"role": "user",      "content": user_message})
        history.append({"role": "assistant", "content": assistant_message})

        trimmed = self._trim_history(history)
        dropped = len(history) - len(trimmed)

        self._client.save_state(
            STATE_STORE,
            self._key,
            json.dumps(trimmed).encode()
        )

        return {
            "history_turns": len(trimmed) // 2,
            "est_tokens":    self._estimate_tokens(trimmed),
            "turns_dropped": dropped // 2,
        }

    def get_context_for_llm(self) -> tuple[list, int]:
        """Return (history, estimated_token_count) for pre-call budget check."""
        history = self.load_history()
        return history, self._estimate_tokens(history)

    def clear(self) -> None:
        self._client.delete_state(STATE_STORE, self._key)


# Usage pattern
def handle_user_message(actor_id: str, user_message: str) -> dict:
    with DaprClient() as client:
        store = BoundedAgentStateStore(client, actor_id)
        history, est_tokens = store.get_context_for_llm()

        # Pre-call budget gate
        if est_tokens > MAX_HISTORY_TOKENS:
            return {
                "error":     "ContextCeilingExceeded",
                "est_tokens": est_tokens,
                "message":   "History exceeds context ceiling. Use store.clear() to reset."
            }

        # --- LLM call with history ---
        response = _call_llm(history + [{"role": "user", "content": user_message}])
        stats = store.append_and_save(user_message, response["content"])

        return {"content": response["content"], "state_stats": stats}


def _call_llm(messages: list) -> dict:
    raise NotImplementedError("Wire your LLM client")

The trim happens at write time, not read time. Reading back a trimmed history is cheap. Allowing unbounded growth and then trying to summarize at read time is expensive — you still pay the full context cost for the summarization call.
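The cost difference between bounded and unbounded history compounds over a session, because every call resends the accumulated context. A rough model, assuming a flat 500 tokens per turn as in the text and treating the new message as one more turn of input (`cumulative_input_tokens` is a hypothetical helper):

```python
from typing import Optional


def cumulative_input_tokens(turns: int, tokens_per_turn: int,
                            max_history_turns: Optional[int]) -> int:
    """Total input tokens billed across a session of `turns` LLM calls."""
    total = 0
    for turn in range(1, turns + 1):
        prior_turns = turn - 1
        if max_history_turns is not None:
            prior_turns = min(prior_turns, max_history_turns)  # trimmed at write time
        total += (prior_turns + 1) * tokens_per_turn  # history plus the new message
    return total


unbounded = cumulative_input_tokens(100, 500, max_history_turns=None)
bounded = cumulative_input_tokens(100, 500, max_history_turns=20)
```

Under these assumptions the unbounded session bills about 2.5 million input tokens over 100 turns against under 1 million for the 20-turn cap, and the gap keeps widening because unbounded growth is quadratic in session length.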

Composite guard: DaprAgentGuard

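The four guards above are independent: each addresses a different Dapr primitive. A DaprAgentGuard class composes them into a single validation layer that any Dapr AI agent application can import. Its counters are plain instance attributes, so load them from and save them to actor state (as in the earlier examples) if they need to survive a restart: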

from dapr.clients import DaprClient
from dapr.ext.workflow import RetryPolicy
from datetime import timedelta
import json
# BoundedAgentStateStore is the class defined in the state store section above


class DaprAgentGuard:
    """
    Composite guard for Dapr AI agent applications.
    Enforces per-actor invocation limits, workflow budget ceilings,
    and bounded state store history.

    Use at every actor method entry and before every workflow activity scheduling.
    """

    DEFAULT_CONFIG = {
        "max_actor_invocations":  8,
        "max_actor_tokens":       80_000,
        "max_workflow_usd":       0.50,
        "max_history_turns":      20,
        "max_history_tokens":     8_000,
        "max_reminder_fires":     5,
        "safe_retry_policy": {
            "first_retry_interval_seconds": 5,
            "max_number_of_attempts":       3,
            "backoff_coefficient":          2.0,
        }
    }

    def __init__(self, actor_id: str, config: dict | None = None):
        self.actor_id = actor_id
        self.config   = {**self.DEFAULT_CONFIG, **(config or {})}
        self._actor_invocations  = 0
        self._actor_tokens       = 0
        self._workflow_spent_usd = 0.0
        self._reminder_fires     = 0

    # ── Actor layer ──────────────────────────────────────────────────────
    def check_actor_entry(self) -> dict | None:
        """Call at the top of every actor method. Returns error dict or None."""
        if self._actor_invocations >= self.config["max_actor_invocations"]:
            return {"error": "ActorInvocationLimitExceeded",
                    "invocations": self._actor_invocations}
        if self._actor_tokens >= self.config["max_actor_tokens"]:
            return {"error": "ActorTokenLimitExceeded",
                    "tokens": self._actor_tokens}
        return None

    def record_actor_call(self, tokens_used: int) -> None:
        self._actor_invocations += 1
        self._actor_tokens      += tokens_used

    # ── Reminder layer ───────────────────────────────────────────────────
    def check_reminder_entry(self) -> bool:
        """Returns True if reminder should proceed, False if it should unregister."""
        return self._reminder_fires < self.config["max_reminder_fires"]

    def record_reminder_fire(self) -> None:
        self._reminder_fires += 1

    # ── Workflow layer ────────────────────────────────────────────────────
    def check_workflow_budget(self) -> dict | None:
        if self._workflow_spent_usd >= self.config["max_workflow_usd"]:
            return {"error": "WorkflowBudgetExceeded",
                    "spent_usd": round(self._workflow_spent_usd, 4)}
        return None

    def record_workflow_activity(self, input_tokens: int, output_tokens: int) -> None:
        self._workflow_spent_usd += (
            input_tokens  / 1000 * 0.005 +
            output_tokens / 1000 * 0.015
        )

    def safe_retry_policy(self) -> RetryPolicy:
        c = self.config["safe_retry_policy"]
        return RetryPolicy(
            first_retry_interval=timedelta(seconds=c["first_retry_interval_seconds"]),
            max_number_of_attempts=c["max_number_of_attempts"],
            backoff_coefficient=c["backoff_coefficient"],
        )

    # ── State store layer ─────────────────────────────────────────────────
    def bounded_state_store(self, client: DaprClient) -> "BoundedAgentStateStore":
        # BoundedAgentStateStore (defined earlier) takes only client and actor_id;
        # its history limits come from MAX_HISTORY_TURNS and MAX_HISTORY_TOKENS.
        return BoundedAgentStateStore(
            client=client,
            actor_id=self.actor_id,
        )

    def summary(self) -> dict:
        return {
            "actor_invocations":  self._actor_invocations,
            "actor_tokens":       self._actor_tokens,
            "reminder_fires":     self._reminder_fires,
            "workflow_spent_usd": round(self._workflow_spent_usd, 4),
        }
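The counters in the class above are ordinary instance attributes, so on their own they reset whenever the actor is reactivated. One way to carry them across activations is to round-trip a snapshot through actor state. The sketch below shows only the serialization half, with a dict standing in for the actor state manager; `GuardCounters` is a hypothetical companion type, not part of the class above or of Dapr.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class GuardCounters:
    """Hypothetical snapshot of the counters a composite guard tracks."""
    actor_invocations: int = 0
    actor_tokens: int = 0
    reminder_fires: int = 0
    workflow_spent_usd: float = 0.0

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: Optional[str]) -> "GuardCounters":
        # A missing value means the actor has never saved counters: start from zero
        return cls(**json.loads(raw)) if raw else cls()


fake_actor_state: dict = {}  # stands in for the actor state manager
before = GuardCounters(actor_invocations=3, actor_tokens=12_000)
fake_actor_state["guard_counters"] = before.to_json()

# Simulated reactivation: a new process only sees what was saved
after = GuardCounters.from_json(fake_actor_state.get("guard_counters"))
fresh = GuardCounters.from_json(None)
```

In an actor, the load would happen in _on_activate and the save after each recorded call, following the same save-before-call ordering used for the invocation counter.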

How Dapr differs from other AI orchestration frameworks

Dimension	Dapr	LangGraph	CrewAI Flows	AutoGen
Loop mechanism	Actor re-entrancy / reminder schedule	Graph cycle / recursion_limit	@listen chain / @router cycle	Auto-reply conversation
State persistence	External (Redis/CosmosDB) — survives restart	In-memory checkpointer (optional SQLite)	In-process FlowState	In-memory message history
Loop survival after crash	Yes — reminder fires on startup	Only if checkpointer used	No	No
Built-in retry	RetryPolicy on activities	User-defined retry nodes	No built-in retry	No built-in retry
Cost multiplier for parallel	Concurrent actor activations	Parallel node branches	Fan-out @listen on same event	GroupChat N-1 broadcast
Guard implementation layer	Actor state + Dapr state store	Checkpointer / recursion limit	Step tracker decorator	max_consecutive_auto_reply

The critical row is "Loop survival after crash." LangGraph, CrewAI Flows, and AutoGen all terminate their loops when the process exits. Dapr reminders do not. A Dapr AI agent that enters a reminder-based loop will restart that loop on every application deployment until the reminder is explicitly unregistered — which means the loop can accumulate charges across multiple pod restarts, auto-scaling events, and even manual deployments.

Cost quantification: what each failure mode costs at scale

Failure mode	Trigger	Cost per occurrence	Cost at 100 concurrent users
Actor re-entrancy loop (maxStackDepth=32)	Tool call cycle detected; re-entrancy configured	32 calls × 800 tokens × $0.005/1K = $0.13	$13 per event, if every user session triggers it
Reminder infinite recursion (no fire limit)	LLM always returns "continue" condition	1 call per 15 min × 24h = 96 calls × $0.01 = $0.96/day per actor	$96/day with 100 actors in a reminder loop
Workflow retry storm (unlimited retries)	Model rate-limited for 30 minutes	~60 retries × $0.01/call = $0.60 per workflow	$60 with 100 workflows in a retry storm
State store accumulation (100-turn history)	Long-running agent session	50K tokens input × $0.005/1K = $0.25/call vs $0.004/call early	62× cost inflation on mature sessions vs fresh sessions

The state store accumulation failure is particularly deceptive because it doesn't look like a runaway cost event — it looks like steadily increasing LLM costs that don't correlate with user count. The agent "just gets more expensive to run" as sessions age.
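The reminder, retry, and state store figures in the table can be reproduced with a few lines of arithmetic. The per-call price of $0.01 and the early-session cost of $0.004 are the table's own assumptions, not measured values, and the variable names are illustrative.

```python
def usd(tokens: int, price_per_1k: float) -> float:
    """Token count to dollars at a flat per-1K price."""
    return tokens / 1000 * price_per_1k


USERS = 100

# Reminder loop: one call every 15 minutes, all day, at an assumed $0.01 per call
reminder_per_actor_day = (24 * 60 // 15) * 0.01
reminder_fleet_day = reminder_per_actor_day * USERS

# Retry storm: roughly 60 retries per workflow at an assumed $0.01 per call
retry_per_workflow = 60 * 0.01
retry_fleet = retry_per_workflow * USERS

# State store accumulation: a 50K-token history versus an early-session call
mature_call = usd(50_000, 0.005)
inflation_factor = mature_call / 0.004
```

The inflation factor is the one to watch in dashboards: it grows with session age rather than with user count, which is why it rarely triggers a volume-based alert.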

Integration with RunGuard

The four guards above are self-contained and require no external service. For production Dapr deployments, connecting them to RunGuard gives you cross-actor visibility: which actors are approaching their invocation ceilings, which workflows are spending above their budgets, and a time-series of reminder fire rates per actor type.

The DaprAgentGuard.summary() output is designed to feed directly into RunGuard's reporting endpoint. In a Dapr sidecar configuration, emit it at the end of each actor method invocation:

import httpx
import os

RUNGUARD_KEY = os.environ.get("RUNGUARD_API_KEY", "")

async def report_to_runguard(guard: DaprAgentGuard) -> None:
    if not RUNGUARD_KEY:
        return
    summary = guard.summary()
    async with httpx.AsyncClient() as client:
        await client.post(
            "https://runguard.dev/api/v1/report",
            headers={"Authorization": f"Bearer {RUNGUARD_KEY}"},
            json={
                "app":    "dapr-ai-agent",
                "actor":  guard.actor_id,
                "stats":  summary,
            },
            timeout=2.0,
        )

Frequently asked questions

Does the actor invocation counter reset when the actor deactivates?

No — if you persist it to the Dapr state store before the LLM call (as shown in the example), the counter survives deactivation and reactivation. Actor instances deactivate after a configurable idle timeout (default 60 minutes in Dapr). On next activation, _on_activate loads the saved state and the counter resumes from where it left off. This is intentional: a user session that pauses for two hours and then resumes should still count against the same lifetime limit for that actor instance. Create a new actor ID (e.g., append a session timestamp) if you want a fresh limit per session.

What's the right max_number_of_attempts for LLM calls in Dapr Workflow?

Three is a safe default for most LLM APIs. Rate limit windows are typically 60 seconds, and with a 5-second first interval and 2× backoff, three attempts (the initial call plus two retries) wait 5 + 10 = 15 seconds in total, which keeps the whole sequence well inside a one-minute window. If you're using a model with longer rate limit windows (some Bedrock models throttle per hour), set max_number_of_attempts=2 and handle retries at the application level with an explicit wait-and-reschedule pattern rather than relying on the Dapr Workflow retry policy to handle multi-minute waits.

Can I use a TTL on the Dapr state store key to automatically clear history?

Yes, if your state store backend supports TTL (Redis, Cosmos DB, and DynamoDB all do via Dapr component configuration). Set metadata.ttlInSeconds on the state item. However, TTL-based clearing is coarse — it deletes the entire history after the timeout regardless of how recently the session was active. The bounded trim approach shown above is better for active sessions because it keeps recent history available while preventing growth. Use TTL as a backstop for abandoned sessions (e.g., 7 days), not as the primary size control.
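As an illustration of the backstop, a state item carrying a seven-day TTL could be built like this. The item shape (key, value, metadata with ttlInSeconds as a string) follows the Dapr state HTTP API; the helper name `state_item_with_ttl` and the key are hypothetical, and the TTL only takes effect if the configured state store supports it.

```python
import json


def state_item_with_ttl(key: str, value: list, ttl_seconds: int) -> dict:
    """One entry for the JSON array posted to the state API, with a TTL attached."""
    return {
        "key": key,
        "value": value,
        "metadata": {"ttlInSeconds": str(ttl_seconds)},  # metadata values are strings
    }


SEVEN_DAYS = 7 * 24 * 60 * 60

# Request body for a save-state call; the history itself is still trimmed on write
body = json.dumps([
    state_item_with_ttl("agent:user123:history", [], SEVEN_DAYS),
])
```

Each save with this metadata sets the expiry again from the time of the write, so an active session keeps its history while an abandoned one expires after a week.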

How do I identify which actors are in a reminder loop in a running Dapr application?

Dapr exposes reminder metadata via the Dapr Dashboard and the Dapr management API at GET http://localhost:3500/v1.0/actors/{actorType}/{actorId}/reminders/{reminderName}. You can also query the state store directly for the reminder count key. For production monitoring, emit the reminder_fires count from DaprAgentGuard.summary() to your observability platform on each reminder callback. A fire rate significantly above 1/period indicates the reminder is being re-registered on every callback — the classic infinite recursion pattern.
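The fire-rate check can be approximated from timestamps you already log. The following is a sketch with hypothetical names (`fire_rate_ratio`, `looks_like_reregistration_loop`) and an arbitrary threshold of 2×; it compares the number of observed fires with the number the period would predict over the same span.

```python
from datetime import datetime, timedelta
from typing import List


def fire_rate_ratio(fire_times: List[datetime], period: timedelta) -> float:
    """Observed fires divided by the fires the period predicts over the same span."""
    if len(fire_times) < 2:
        return 0.0  # not enough data to estimate a rate
    span = max(fire_times) - min(fire_times)
    if span <= timedelta(0):
        return float("inf")  # several fires at the same instant
    expected = span / period + 1  # N intervals contain N + 1 fires
    return len(fire_times) / expected


def looks_like_reregistration_loop(fire_times: List[datetime], period: timedelta,
                                   threshold: float = 2.0) -> bool:
    return fire_rate_ratio(fire_times, period) > threshold


t0 = datetime(2026, 1, 1)
period = timedelta(minutes=15)
healthy = [t0 + timedelta(minutes=15 * i) for i in range(5)]   # on schedule
looping = [t0 + timedelta(seconds=30 * i) for i in range(20)]  # every 30 seconds

healthy_flag = looks_like_reregistration_loop(healthy, period)
looping_flag = looks_like_reregistration_loop(looping, period)
```

A ratio near 1.0 means the reminder is firing on its period. The threshold leaves room for catch-up fires after a restart, which can briefly push the ratio above 1.0 without indicating a loop.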

Does this work with Dapr's multi-app run feature for local development?

Yes. dapr run --app-id research-agent and dapr run -f dapr.yaml (multi-app run) both use the same actor runtime. The guards are pure application-layer code and don't depend on any specific Dapr deployment mode. If you use an in-memory state store component in local development, actor state doesn't persist across dapr stop and dapr run cycles. That's actually useful during development: you can clear reminder fire counts by restarting the sidecar. In staging and production, use a persistent backend and verify that your reminder fire counts accumulate correctly across restarts before shipping.

Stop runaway Dapr AI agent costs before they land on your bill.

RunGuard is a runtime SDK that trips a circuit breaker the moment your AI agent's call pattern shows a loop, context window blow-through, or budget excess — before the charge hits. One-line install for Python and TypeScript applications, including Dapr-based orchestration.
