Devin, built by Cognition, is a fully autonomous AI software engineer. Unlike AI coding assistants that augment a developer sitting at an IDE, Devin operates independently inside a persistent sandboxed Linux environment. It receives a task description, plans the approach, and then executes: reading and writing files through a built-in code editor, running commands in a persistent shell session, searching documentation and verifying behavior through a browser, and iterating on its work until the task is complete or it determines it cannot proceed. The developer reviews completed work rather than supervising each step.
Devin's billing model reflects this autonomous architecture. Usage is measured in ACUs (Agent Compute Units) — a unit representing roughly one minute of Devin's active compute, encompassing both the LLM token cost of reasoning and the infrastructure cost of running the sandboxed environment. A short, clean coding task might consume 5–10 ACUs. A complex debugging session on an unfamiliar codebase might consume 40–80 ACUs. An autonomous session that gets stuck in a correction loop — re-reading the same files, rerunning the same failing tests, and trying incrementally different approaches to the same broken code path — can consume 150+ ACUs without completing the task. Teams running Devin at scale against PR pipelines discover this ACU variance only after the billing cycle closes.
The root cause is the same structural pattern that drives cost overruns in every autonomous agent: Devin's four primary subsystems each accumulate state differently, and the cumulative effect of that accumulation is invisible until the bill arrives. The autonomous ReAct coding loop carries full shell history, file reads, and test outputs in context at every iteration, making stuck agents expensive before any progress is made. Browser-use screenshot context injection adds image tokens for every page Devin visits during research or UI verification phases, bypassing the per-token mental model entirely. Sandbox cold start dependency initialization generates thousands of lines of terminal output before the first line of application code runs, filling a significant fraction of the context window with install noise. Concurrent session fan-out in PR automation workflows multiplies all three effects by the number of parallel Devin instances.
What this post covers: four cost amplification patterns specific to Devin (autonomous coding loop context accumulation, browser screenshot injection, sandbox dependency initialization overhead, and concurrent ACU session fan-out) and a runtime circuit breaker guard for each. The guards operate at the automation orchestration layer, giving you ACU and token spend ceilings without changing Devin's behavior for tasks that fit within budget.
Pattern 1: Autonomous ReAct Coding Loop Context Accumulation
Devin's core execution model is a ReAct loop: reason about the current state of the task, act by choosing a tool (read file, write file, run shell command, browser action), observe the output of that action, then reason again with the updated context. At each reasoning step, Devin's model receives the complete accumulated context: the original task description, every file it has read so far at full content, every shell command it has run and its full output, every browser action and the resulting page content, and the diff of every file change it has applied.
Devin does not compress or summarize prior steps. The context window carries raw history. This is correct behavior when the agent is making progress — the full history lets the model reason accurately about what has and has not been tried. It becomes the failure mode when the agent is stuck. A Devin session debugging a failing TypeScript compilation error — re-reading the same three source files, running npm run build repeatedly, observing the same type error, and trying incremental fixes that don't resolve the root cause — accumulates context at approximately 4,500 tokens per iteration (2,500 from the three source files re-read, 400 from the accumulated error output, 1,600 from prior diffs and intermediate steps). After 20 iterations on the same stuck error, the context window carries 90,000 tokens of history for a task where a human developer would have escalated after three failed attempts.
Step 1: 4,200 input tokens × $0.003/K = $0.0126
Step 10: 46,000 input tokens × $0.003/K = $0.138
Step 20: 91,000 input tokens × $0.003/K = $0.273
Total for 20-step stuck session: ~$2.91 in LLM cost alone, before ACU compute overhead
30 stuck sessions/month at 20 iterations each: 30 × $2.91 ≈ $87/month LLM overrun + ACU compute
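The per-step figures above follow from re-sending a growing context at a flat input price. A minimal sketch of that arithmetic, assuming context grows linearly by about 4,500 tokens per iteration and using the $0.003 per 1K input-token rate from the figures above. The function name and the linear-growth model are illustrative assumptions, not a description of Devin internals.

```python
def stuck_session_cost(
    iterations: int = 20,
    tokens_per_iteration: int = 4_500,   # growth per step, from the estimate above
    price_per_1k_input: float = 0.003,   # USD per 1K input tokens, as in the figures above
) -> tuple[int, float]:
    """Return (total input tokens billed, total USD) for a stuck loop.

    Assumes step n re-sends the whole history, i.e. n * tokens_per_iteration tokens.
    """
    total_tokens = sum(n * tokens_per_iteration for n in range(1, iterations + 1))
    return total_tokens, total_tokens * price_per_1k_input / 1_000


tokens, usd = stuck_session_cost()
print(f"{tokens:,} input tokens billed, about ${usd:.2f}")
```

Because each step re-sends everything before it, billed tokens grow roughly with the square of the iteration count: doubling the iteration ceiling about quadruples the LLM cost of a stuck session. Under this linear model the 20-step total lands within a few percent of the per-session figure quoted above.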
The escalation happens without any visible signal in Devin's interface. The task stays in "working" state. ACUs accumulate. The model continues producing reasoning steps that look plausible but aren't converging. In interactive Devin sessions a developer can intervene; in automated workflows — Devin sessions triggered by CI events or PR creation with no human watching — the accumulation runs until the context window saturates or the task times out.
import hashlib
from dataclasses import dataclass, field


@dataclass
class DevinLoopGuard:
    max_iterations: int = 20
    max_context_tokens: int = 80_000
    consecutive_failure_limit: int = 4
    stall_fingerprint_window: int = 3
    _iteration: int = field(default=0, init=False)
    _total_context_tokens: int = field(default=0, init=False)
    _consecutive_failures: int = field(default=0, init=False)
    _recent_fingerprints: list = field(default_factory=list, init=False)

    def _fingerprint(self, tool_output: str) -> str:
        # Collapse whitespace and hash a bounded prefix of the output
        normalized = " ".join(tool_output.strip().split())[:1024]
        return hashlib.sha256(normalized.encode()).hexdigest()[:16]

    def before_action(
        self,
        context_tokens: int,
        action_succeeded: bool,
        tool_output: str = "",
    ) -> None:
        # context_tokens: tokens added to the context by this iteration;
        # the guard keeps the running total itself.
        self._iteration += 1
        self._total_context_tokens += context_tokens

        if not action_succeeded:
            self._consecutive_failures += 1
            fp = self._fingerprint(tool_output)
            self._recent_fingerprints.append(fp)
            if len(self._recent_fingerprints) > self.stall_fingerprint_window:
                self._recent_fingerprints.pop(0)
            # All recent outputs identical → agent is stuck in a retry loop
            if (len(self._recent_fingerprints) == self.stall_fingerprint_window
                    and len(set(self._recent_fingerprints)) == 1):
                raise RuntimeError(
                    f"DevinLoopGuard: identical tool output across last "
                    f"{self.stall_fingerprint_window} steps at iteration "
                    f"{self._iteration}. Agent is stuck; escalating to human review."
                )
        else:
            # Any success resets both failure trackers
            self._consecutive_failures = 0
            self._recent_fingerprints.clear()

        if self._consecutive_failures >= self.consecutive_failure_limit:
            raise RuntimeError(
                f"DevinLoopGuard: {self._consecutive_failures} consecutive "
                f"action failures at iteration {self._iteration}. "
                "Halting before further context accumulation."
            )

        if self._iteration > self.max_iterations:
            raise RuntimeError(
                f"DevinLoopGuard: iteration ceiling {self.max_iterations} reached. "
                f"Accumulated {self._total_context_tokens:,} context tokens. "
                "Returning partial results; restart with narrower task scope."
            )

        if self._total_context_tokens > self.max_context_tokens:
            raise RuntimeError(
                f"DevinLoopGuard: context token ceiling {self.max_context_tokens:,} "
                f"exceeded at iteration {self._iteration}. "
                "Compact the context or restart with a focused subtask."
            )
DevinLoopGuard trips on four conditions: the iteration ceiling (max_iterations=20), the cumulative context token ceiling (max_context_tokens=80_000), the consecutive failure limit (consecutive_failure_limit=4), and the stall fingerprint detector — which identifies when the last N tool outputs are identical, indicating the agent is retrying the same action with the same result. Call before_action(context_tokens, action_succeeded, tool_output) at the top of each iteration loop. On a trip, catch the RuntimeError, save the current diff state, and surface the partial result with the trip reason rather than letting the loop continue.
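The stall detector is the least obvious of the four conditions, so here is the idea in isolation: a minimal sketch, independent of the class above, that hashes whitespace-normalized tool output and reports a stall when the last few hashes match. The function name is_stalled and the sample compiler output are illustrative assumptions.

```python
import hashlib


def is_stalled(outputs: list[str], window: int = 3) -> bool:
    """True when the last `window` tool outputs are identical after normalization."""
    if len(outputs) < window:
        return False
    fingerprints = {
        hashlib.sha256(" ".join(o.split()).encode()).hexdigest()
        for o in outputs[-window:]
    }
    return len(fingerprints) == 1


# Three build attempts that differ only in whitespace
history = [
    "src/auth.ts(14,7): error TS2322: Type 'string' is not assignable to type 'number'.",
    "src/auth.ts(14,7): error  TS2322: Type 'string' is not assignable to type 'number'.\n",
    "src/auth.ts(14,7): error TS2322: Type 'string' is not assignable to type 'number'.",
]
print(is_stalled(history))      # whitespace differences are ignored
print(is_stalled(history[:2]))  # fewer outputs than the window
```

Exact-match fingerprints miss loops where the error text changes slightly on each attempt (a timestamp, a shifting line number), so this check complements the consecutive-failure limit rather than replacing it.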
Pattern 2: Browser Screenshot Context Injection
Devin includes a browser as a first-class tool. It uses the browser for multiple legitimate purposes: verifying that a deployed UI renders correctly after a frontend change, reading documentation for unfamiliar APIs, checking a live service's response to validate that a backend fix is working, and researching error messages or library behavior. Each browser action — navigate to URL, click element, scroll, take screenshot — produces an observation that gets included in Devin's context. Screenshots are the most expensive observation type.
A screenshot from Devin's headless browser at 1280×800 resolution, encoded as a base64 image for the multimodal model, consumes approximately 1,300–1,700 image tokens per frame (equivalent to a medium-complexity image in Claude's vision pricing). When Devin navigates to verify a React component renders correctly, it typically takes multiple screenshots: before and after the change, at different viewport sizes, after each interaction that should produce a state transition. A UI verification subtask involving a modal, a form submission, and a responsive layout check might take 12–18 screenshots before Devin concludes the fix is correct. That's 16,000–31,000 image tokens for a UI verification step that a developer would complete in a two-second visual scan.
The amplification compounds for research-heavy tasks. A Devin session implementing an integration with an unfamiliar third-party API — navigating the provider's documentation, reading authentication examples, checking rate limit tables, verifying webhook payload schemas — might visit 15–25 documentation pages with multiple screenshots each. Full-page screenshots of documentation pages with dense code examples reach 2,000–2,500 image tokens per frame. A 20-page documentation research subtask with two screenshots per page injects 80,000–100,000 image tokens into the context before Devin writes a single line of integration code.
15 screenshots × 1,500 image tokens = 22,500 image tokens = $0.0675 per UI check
API docs research, 20 pages × 2 screenshots × 2,200 tokens = 88,000 image tokens = $0.264
Combined UI + docs phase on a complex integration task: $0.33 before any code tokens
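40 such tasks/month in an automated PR pipeline: 40 × $0.33 ≈ $13/month in image tokens counted once; screenshots that stay in context are billed again at every later reasoning step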
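The per-frame range quoted above is consistent with the approximation Anthropic publishes for Claude vision input, roughly width × height / 750 tokens for an image that is not downscaled. A minimal sketch of a per-screenshot estimator on that basis. It assumes the model behind the session prices images the same way, which this post does not confirm, and it ignores the downscaling applied to very large images. Function names are illustrative.

```python
def estimate_image_tokens(width_px: int, height_px: int) -> int:
    """Approximate vision tokens for one screenshot: (width * height) / 750.

    Assumption: Claude-style image pricing with no downscaling of the frame.
    """
    return round(width_px * height_px / 750)


def screenshot_phase_cost(
    frames: int,
    width_px: int = 1280,
    height_px: int = 800,
    price_per_1k_input: float = 0.003,  # USD per 1K input tokens, as in the figures above
) -> tuple[int, float]:
    """Return (image tokens, USD) for `frames` screenshots, each counted once."""
    tokens = frames * estimate_image_tokens(width_px, height_px)
    return tokens, tokens * price_per_1k_input / 1_000


print(estimate_image_tokens(1280, 800))
print(screenshot_phase_cost(15))
```

The estimate counts each frame a single time. In a loop that re-sends its full history, every retained screenshot is paid for again on each subsequent step, which is why a hard image-token ceiling matters more than the one-time figure suggests.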
from dataclasses import dataclass, field

SCREENSHOT_TOKENS_PER_FRAME = 1_500  # median 1280x800 screenshot cost


@dataclass
class DevinBrowserGuard:
    max_screenshots: int = 20
    max_browser_pages: int = 15
    max_image_tokens: int = 30_000
    _screenshots_taken: int = field(default=0, init=False)
    _pages_visited: int = field(default=0, init=False)
    _image_tokens: int = field(default=0, init=False)

    def on_navigate(self, url: str) -> None:
        self._pages_visited += 1
        if self._pages_visited > self.max_browser_pages:
            raise RuntimeError(
                f"DevinBrowserGuard: browser page ceiling {self.max_browser_pages} "
                f"reached (visiting {url}). Research phase is exceeding scope; "
                "summarize findings from current pages before continuing."
            )

    def on_screenshot(
        self,
        tokens: int = SCREENSHOT_TOKENS_PER_FRAME,
        label: str = "",
    ) -> None:
        self._screenshots_taken += 1
        self._image_tokens += tokens
        # Optional label (for example the page URL) appended to the trip message
        suffix = f" ({label})" if label else ""

        if self._screenshots_taken > self.max_screenshots:
            raise RuntimeError(
                f"DevinBrowserGuard: screenshot ceiling {self.max_screenshots} "
                f"reached (total image tokens: {self._image_tokens:,}). "
                "Switch to text-only page content extraction."
            )

        if self._image_tokens > self.max_image_tokens:
            raise RuntimeError(
                f"DevinBrowserGuard: image token ceiling {self.max_image_tokens:,} "
                f"exceeded after {self._screenshots_taken} screenshots{suffix}. "
                "Halting browser phase to preserve context budget."
            )

    @property
    def image_tokens(self) -> int:
        return self._image_tokens
DevinBrowserGuard tracks three dimensions: page navigations, screenshot count, and cumulative image tokens. Call on_navigate(url) before each browser navigation and on_screenshot() after each screenshot is taken (passing the actual token count from the vision model's usage response if available, or the default estimate). The image token ceiling is the primary enforcement lever — set it based on the total context budget you want the browser phase to consume, leaving headroom for the code and shell history that accumulates later in the session.
Pattern 3: Sandbox Cold Start Dependency Installation Overhead
Every Devin session begins by initializing a fresh sandboxed Linux environment. Devin clones the target repository, reads the project's configuration files to understand the tech stack, and installs dependencies before writing or running any application code. For a well-scoped project with a small dependency tree, this initialization is fast and token-cheap. For real-world projects — Python machine learning pipelines with 150+ transitive dependencies, Node.js monorepos with workspace-level installs across 8 packages, or Java projects with Gradle dependency resolution — the initialization phase produces thousands of lines of terminal output that fill a significant fraction of Devin's context window.
The token cost is in what Devin reads back from its shell. A pip install -r requirements.txt on a Python data science project (numpy, pandas, scikit-learn, torch, transformers, and their dependencies) produces 2,000–4,000 lines of install output: download progress bars, dependency resolver decisions, build step outputs for compiled packages, and deprecation warnings from transitive dependencies. Devin reads this output as part of the shell observation step. At roughly 3–5 tokens per line, 3,000 lines of install output consume 9,000–15,000 tokens of context before Devin reads a single line of application code. A Node.js monorepo with npm install --workspaces across 8 packages follows the same pattern: 4,000–6,000 lines of resolution and download output is 12,000–30,000 tokens of initialization noise at the same rate.
The failure mode escalates when Devin restarts its session. Devin is designed to recover from context saturation by starting a new session with a fresh sandbox and the task context re-injected. Each restart pays the full initialization cost again: clone the repo, install dependencies, read the 15,000-token install output, then resume the task from the saved checkpoint. A long, complex engineering task that requires three session restarts pays 45,000 tokens in initialization overhead alone — more than the context budget for many straightforward tasks. In automated pipelines where restarts are invisible, this multiplier accumulates silently.
pip install output: 3,200 lines ≈ 12,800 tokens × $0.003/K = $0.038 per cold start
Node.js monorepo npm install: 5,100 lines ≈ 20,400 tokens × $0.003/K = $0.061 per cold start
3 session restarts on a complex task: 3 × $0.061 = $0.184 in install-output tokens
50 tasks/month with avg 1.8 restarts each: 90 restarts × $0.061 ≈ $5.50/month in cold start token overhead
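One way to shrink the install-output share of the context is to filter the log before it is handed to the model. A minimal sketch, assuming your orchestration layer can post-process shell output; whether a given Devin integration exposes such a hook is an assumption here. The function name and the noise patterns (typical pip progress lines) are illustrative and should be tuned per package manager.

```python
import re

# Typical pip lines that carry little diagnostic value once the install succeeds
# (illustrative list, not exhaustive).
NOISE = re.compile(
    r"^\s*(Downloading|Collecting|Using cached|Requirement already satisfied)\b"
)
IMPORTANT = re.compile(r"(?i)\b(error|warning)\b")


def compact_install_output(raw: str, keep_tail: int = 15) -> str:
    """Keep warning/error lines plus the last `keep_tail` lines; drop download noise."""
    if keep_tail < 1:
        raise ValueError("keep_tail must be at least 1")
    lines = raw.splitlines()
    head, tail = lines[:-keep_tail], lines[-keep_tail:]
    kept = [ln for ln in head if IMPORTANT.search(ln)]
    kept_tail = [ln for ln in tail if not NOISE.search(ln)]
    dropped = len(lines) - len(kept) - len(kept_tail)
    return "\n".join(kept + kept_tail + [f"[{dropped} install log lines omitted]"])
```

Where you control the install command itself, reducing output at the source is simpler: pip accepts --quiet and npm accepts a --loglevel setting.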
from dataclasses import dataclass, field


@dataclass
class DevinSandboxGuard:
    max_sessions: int = 3
    max_install_output_tokens: int = 20_000
    max_cold_start_tokens_total: int = 50_000
    warn_threshold: float = 0.75
    _session_count: int = field(default=0, init=False)
    _cold_start_tokens_total: int = field(default=0, init=False)

    def on_session_start(self) -> None:
        self._session_count += 1
        if self._session_count > self.max_sessions:
            raise RuntimeError(
                f"DevinSandboxGuard: session restart ceiling {self.max_sessions} "
                f"reached. Total cold start overhead: "
                f"{self._cold_start_tokens_total:,} tokens. "
                "Task requires decomposition: split into subtasks with narrower scope."
            )

    def on_install_output(self, output_tokens: int, package_manager: str = "") -> None:
        self._cold_start_tokens_total += output_tokens
        label = f" ({package_manager})" if package_manager else ""

        # Per-session ceiling: one install step that is unusually verbose
        if output_tokens > self.max_install_output_tokens:
            raise RuntimeError(
                f"DevinSandboxGuard: install output{label} consumed "
                f"{output_tokens:,} tokens, exceeding per-session ceiling "
                f"{self.max_install_output_tokens:,}. "
                "Suppress verbose install output or use a pre-built dependency cache."
            )

        # Cumulative ceiling across all restarts, with an early warning
        warn_limit = self.max_cold_start_tokens_total * self.warn_threshold
        if self._cold_start_tokens_total >= self.max_cold_start_tokens_total:
            raise RuntimeError(
                f"DevinSandboxGuard: cumulative cold start token budget "
                f"{self.max_cold_start_tokens_total:,} exceeded across "
                f"{self._session_count} sessions. Aborting further restarts."
            )
        elif self._cold_start_tokens_total >= warn_limit:
            print(
                f"[DevinSandboxGuard] Warning: cold start tokens at "
                f"{self._cold_start_tokens_total:,} / {self.max_cold_start_tokens_total:,} "
                f"({self._cold_start_tokens_total/self.max_cold_start_tokens_total:.0%}) "
                f"after {self._session_count} sessions."
            )

    @property
    def sessions(self) -> int:
        return self._session_count
DevinSandboxGuard tracks two dimensions: the session restart count and the cumulative cold start token consumption across all sessions. Call on_session_start() whenever Devin initializes a new sandbox and on_install_output(output_tokens, package_manager) after the dependency installation step completes (pass the token count of the installation output as observed by the model). The per-session install ceiling gives you early warning when a single project's dependency tree is unusually verbose; the cumulative ceiling enforces the overall restart budget for a task.
Pattern 4: Concurrent ACU Session Fan-out in PR Automation
Devin's autonomous architecture makes it compelling for PR automation: trigger a Devin session on every pull request, let it read the diff, understand the change, write tests, fix linting issues, and leave a review. Teams shipping multiple PRs per day find this workflow attractive because it replaces asynchronous code review with immediate automated feedback. The failure mode is not in any individual session — it's in the concurrent session fan-out when a sprint week delivers 20 PRs in a single afternoon.
Each concurrent Devin session is independent. Session A reviewing PR #201 and Session B reviewing PR #202 each clone the full repository, each run the full dependency installation, each initialize the sandbox environment, and each begin reading the codebase for context — simultaneously. At peak PR volume, a team running Devin PR automation might have 12 concurrent sessions each consuming 15,000–20,000 tokens of cold start overhead before reading a single line of the PR diff. The 12 sessions compete for the ACU pool simultaneously, producing a burst of 180,000–240,000 initialization tokens in the first few minutes of PR-event processing. If each session then encounters a failing test in the PR branch and enters a debugging loop, the ACU consumption compounds further.
The billing impact is concentrated at sprint boundaries. Teams that budget for average PR volume discover that sprint-end PR bursts — feature freeze pushes, team synchronization PRs, dependency update batches — produce ACU spikes 3–5× above the baseline. Because ACU billing is measured against the monthly pool, spike events exhaust the pool allocation days before the billing cycle closes, leaving the automation disabled for the remainder of the month.
Cold start per session: ~8 ACUs (clone + install + initial read)
12 concurrent sessions at sprint-end: 12 × 8 = 96 ACUs = $240 in 15 minutes
If 8 sessions enter debugging loops (avg +12 ACUs each): +96 ACUs = $480 total burst
Monthly baseline 3 concurrent sessions avg: $180/month → sprint-end burst adds $480 in one event
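The burst figures above reduce to a two-term sum. A minimal sketch, assuming the $2.50 per ACU rate implied by those figures (96 ACUs = $240); check the rate on your own plan. The function and parameter names are illustrative.

```python
def burst_cost(
    sessions: int,
    cold_start_acus: float,
    looping_sessions: int = 0,
    extra_acus_per_loop: float = 0.0,
    usd_per_acu: float = 2.50,  # rate implied by the figures above (assumption)
) -> tuple[float, float]:
    """Return (ACUs, USD) for one burst of concurrent sessions."""
    acus = sessions * cold_start_acus + looping_sessions * extra_acus_per_loop
    return acus, acus * usd_per_acu


print(burst_cost(12, 8))                                              # cold starts only
print(burst_cost(12, 8, looping_sessions=8, extra_acus_per_loop=12))  # plus debugging loops
```

Lowering the concurrent session count shrinks the first term directly, while a per-session ACU budget bounds the second.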
import asyncio
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DevinConcurrencyGuard:
    max_concurrent_sessions: int = 4
    max_hourly_session_launches: int = 12
    acu_budget_per_session: float = 20.0
    acu_cost_per_unit: float = 2.50
    _active_sessions: int = field(default=0, init=False)
    _launches_this_hour: list = field(default_factory=list, init=False)
    _lock: asyncio.Lock = field(default_factory=asyncio.Lock, init=False)

    def _prune_launch_window(self) -> None:
        # Keep only launch timestamps from the last 3600 seconds
        cutoff = time.time() - 3600
        self._launches_this_hour = [t for t in self._launches_this_hour if t > cutoff]

    async def acquire(self, session_id: str = "") -> None:
        async with self._lock:
            self._prune_launch_window()

            if self._active_sessions >= self.max_concurrent_sessions:
                raise RuntimeError(
                    f"DevinConcurrencyGuard: concurrent session ceiling "
                    f"{self.max_concurrent_sessions} reached "
                    f"(active: {self._active_sessions}). "
                    f"Session{' ' + session_id if session_id else ''} not started; "
                    "resubmit after an active session completes."
                )

            if len(self._launches_this_hour) >= self.max_hourly_session_launches:
                raise RuntimeError(
                    f"DevinConcurrencyGuard: hourly launch ceiling "
                    f"{self.max_hourly_session_launches} reached "
                    f"({len(self._launches_this_hour)} launches in the last hour). "
                    "Throttling PR automation to prevent ACU burst."
                )

            self._active_sessions += 1
            self._launches_this_hour.append(time.time())

    async def release(
        self,
        session_id: str = "",
        acus_consumed: Optional[float] = None,
    ) -> None:
        async with self._lock:
            # The slot is freed before the budget check, so a raise below does not leak it
            self._active_sessions = max(0, self._active_sessions - 1)

            if acus_consumed is not None and acus_consumed > self.acu_budget_per_session:
                cost = acus_consumed * self.acu_cost_per_unit
                budget_cost = self.acu_budget_per_session * self.acu_cost_per_unit
                raise RuntimeError(
                    f"DevinConcurrencyGuard: session{' ' + session_id if session_id else ''} "
                    f"consumed {acus_consumed:.1f} ACUs (${cost:.2f}), "
                    f"exceeding per-session budget of {self.acu_budget_per_session} ACUs "
                    f"(${budget_cost:.2f}). "
                    "Flag for review before scheduling further automation on this PR."
                )

    @property
    def active(self) -> int:
        return self._active_sessions
DevinConcurrencyGuard uses an async lock to enforce two rate dimensions: a concurrent session ceiling that prevents fan-out bursts from consuming the entire ACU pool simultaneously, and an hourly launch ceiling that throttles sprint-end PR spikes. Note that acquire() raises rather than waits when a ceiling is hit, so the caller is responsible for holding rejected sessions and resubmitting them later. Call await guard.acquire(session_id) before launching each Devin session and await guard.release(session_id, acus_consumed) when the session completes. Pass the actual ACU count from Devin's session completion event; the per-session budget check in release() identifies runaway sessions retroactively so they can be reviewed before similar PRs trigger the same pattern again.
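Since acquire() raises when a ceiling is hit, pipelines that would rather wait than reject need a queueing layer of their own. A minimal sketch of one option using asyncio.Semaphore; run_with_ceiling and fake_session are hypothetical names, and the short sleep stands in for a real session.

```python
import asyncio


async def run_with_ceiling(jobs, max_concurrent: int = 4):
    """Run coroutine factories with at most `max_concurrent` in flight; the rest wait."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(job):
        async with semaphore:  # waits here instead of raising when the ceiling is hit
            return await job()

    return await asyncio.gather(*(guarded(job) for job in jobs))


async def demo():
    in_flight = 0
    peak = 0

    async def fake_session():  # stand-in for launching a real session
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)
        in_flight -= 1
        return "done"

    results = await run_with_ceiling([fake_session] * 12, max_concurrent=4)
    return peak, len(results)


if __name__ == "__main__":
    print(asyncio.run(demo()))  # (peak concurrency observed, sessions completed)
```

This covers only the concurrency dimension. The hourly launch ceiling and the per-session ACU check still need the guard described above.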
Composing the Guards in an Automated PR Pipeline
In production PR automation, all four guards operate in concert. The concurrency guard governs session acquisition at the pipeline level. Once a session starts, the sandbox guard tracks initialization cost. As the session runs its ReAct loop, the loop guard monitors iteration count and context accumulation. For any browser action the session takes, the browser guard tracks screenshot and image token cost. When any guard trips, the exception surfaces a structured error that the pipeline can log, route to a human reviewer, or use to inform retry strategy.
import asyncio
from dataclasses import dataclass, field


@dataclass
class DevinPRPipeline:
    concurrency: DevinConcurrencyGuard = field(
        default_factory=lambda: DevinConcurrencyGuard(
            max_concurrent_sessions=4,
            max_hourly_session_launches=12,
            acu_budget_per_session=20.0,
        )
    )

    async def run_pr_session(
        self,
        pr_id: str,
        devin_client,  # your Devin API wrapper
    ) -> dict:
        # Raises if a concurrency or hourly ceiling is hit; the caller decides how to requeue
        await self.concurrency.acquire(session_id=pr_id)

        session_guards = {
            "sandbox": DevinSandboxGuard(max_sessions=2, max_install_output_tokens=18_000),
            "loop": DevinLoopGuard(max_iterations=15, max_context_tokens=60_000),
            "browser": DevinBrowserGuard(max_screenshots=10, max_image_tokens=15_000),
        }
        acus_consumed = 0.0
        trip_reason = None
        session = None  # stays None if create_session itself fails

        try:
            session = await devin_client.create_session(pr_id=pr_id)
            session_guards["sandbox"].on_session_start()

            async for event in session.stream_events():
                if event.type == "install_output":
                    session_guards["sandbox"].on_install_output(
                        event.tokens, event.package_manager
                    )
                elif event.type == "iteration_start":
                    session_guards["loop"].before_action(
                        context_tokens=event.context_tokens,
                        action_succeeded=event.last_action_succeeded,
                        tool_output=event.last_tool_output or "",
                    )
                elif event.type == "browser_navigate":
                    session_guards["browser"].on_navigate(event.url)
                elif event.type == "browser_screenshot":
                    session_guards["browser"].on_screenshot(
                        tokens=event.image_tokens,
                        label=event.url,
                    )
                elif event.type == "session_restart":
                    session_guards["sandbox"].on_session_start()
                elif event.type == "completed":
                    acus_consumed = event.acus_consumed
                    break

        except RuntimeError as e:
            trip_reason = str(e)
            if session is not None:
                await devin_client.halt_session(session.id)

        finally:
            try:
                await self.concurrency.release(
                    session_id=pr_id,
                    acus_consumed=acus_consumed,
                )
            except RuntimeError as e:
                # release() raises on an ACU overrun; record it so the result is still returned
                trip_reason = trip_reason or str(e)

        return {
            "pr_id": pr_id,
            "status": "tripped" if trip_reason else "completed",
            "trip_reason": trip_reason,
            "acus_consumed": acus_consumed,
            "image_tokens": session_guards["browser"].image_tokens,
            "iterations": session_guards["loop"]._iteration,
        }
This composition pattern keeps each guard's concern isolated while giving the pipeline a single structured result per PR session. The trip reason distinguishes between a stuck coding loop, a research phase that overran its browser budget, a dependency installation that consumed more context than expected, and a session that simply exceeded the per-session ACU ceiling. Each trip reason maps to a different remediation: adjust the task scope, switch to text-only documentation access, pre-build a dependency cache, or decompose the PR into smaller subtasks.
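Because every guard prefixes its message with its own class name, the trip reason can be routed mechanically. A minimal sketch of that mapping, following the remediations listed above; the function name and the fallback for unrecognized reasons are illustrative assumptions.

```python
from typing import Optional

# Guard name (the text before the first colon of a trip message) -> remediation
REMEDIATION = {
    "DevinLoopGuard": "adjust the task scope",
    "DevinBrowserGuard": "switch to text-only documentation access",
    "DevinSandboxGuard": "pre-build a dependency cache",
    "DevinConcurrencyGuard": "decompose the PR into smaller subtasks",
}


def remediation_for(trip_reason: Optional[str]) -> str:
    """Map a guard trip message to a suggested next step."""
    if not trip_reason:
        return "none needed"
    guard_name = trip_reason.split(":", 1)[0].strip()
    return REMEDIATION.get(guard_name, "route to a human reviewer")


print(remediation_for("DevinBrowserGuard: screenshot ceiling 10 reached"))
```

Parsing message prefixes is brittle if the messages change. Dedicated exception subclasses per guard would be the sturdier design once the pipeline grows.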
Frequently Asked Questions
How does Devin's ACU billing relate to token costs?
ACUs (Agent Compute Units) are a composite billing metric that includes both the LLM inference cost and the sandbox infrastructure cost for Devin's persistent execution environment. A single ACU represents approximately one minute of Devin's active execution, which includes the reasoning step (LLM tokens), any tool calls, and the sandbox compute. For most tasks, LLM token cost and ACU cost scale together — sessions that consume more tokens (stuck loops, large context accumulations, screenshot-heavy browser phases) also consume more ACUs because the model takes longer to process each reasoning step. The guards in this post target the token dimension directly, which also limits ACU accumulation as a downstream effect.
Does Devin have any built-in cost controls or iteration limits?
Devin provides a session time limit and an ACU budget configuration in its team settings, which sets a hard ceiling on total ACU consumption per organization per billing period. It does not expose per-session iteration limits, per-phase browser screenshot ceilings, or cold start token budgets. The session-level limits protect against runaway sessions that consume the entire monthly pool in a single task, but they do not prevent gradual ACU accumulation across many bounded sessions, each of which stays under the hard ceiling while still collectively exhausting the monthly allocation early. The guards in this post address the per-session structural patterns that produce that gradual exhaustion.
Is Devin's browser-use cost the same as using browser-use or Playwright directly?
The token cost model is similar — screenshots injected as images into a multimodal model's context window cost roughly the same regardless of whether the screenshot comes from Devin's built-in browser, a browser-use Python library session, or a Playwright automation script feeding frames into an agent loop. The difference is transparency: with browser-use or Playwright you typically control the screenshot cadence explicitly and can observe image token counts in the model's usage response. With Devin's built-in browser, screenshots are an implementation detail of Devin's internal observation loop — the total image token count is not surfaced per action. The DevinBrowserGuard tracks at the action event level to reconstruct this cost, using the median per-screenshot estimate when exact token counts are unavailable.
How should tasks be scoped to minimize Devin's cold start overhead?
The most effective scoping strategy is to minimize repository size at the point where Devin's sandbox is initialized. Rather than pointing Devin at a full monorepo, create a task configuration that scopes Devin to the relevant package directory and uses a sparse checkout (git sparse-checkout) to clone only the subdirectory relevant to the task. A sparse checkout of packages/auth/ from a 500K-line monorepo takes 3–5% of the clone time and produces 3–5% of the initial context overhead. Combined with a pre-built dependency cache (Docker layer cache or language-specific caching like pip's wheel cache) mounted into the sandbox, cold start token overhead can be reduced by 80–90% for recurring task types on known codebases.
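For reference, a sparse checkout of a single package comes down to two git commands. A minimal sketch that builds them as argument lists, assuming you control how the repository is cloned into the sandbox (an assumption about your setup). The repository URL is a placeholder and the function name is illustrative.

```python
import shlex


def sparse_clone_commands(repo_url: str, subdir: str, dest: str = "repo") -> list[list[str]]:
    """Build git commands for a blobless, sparse clone limited to `subdir`."""
    return [
        # Partial clone: skip file contents until needed, start with a sparse working tree
        ["git", "clone", "--filter=blob:none", "--sparse", repo_url, dest],
        # Materialize only the directory relevant to the task
        ["git", "-C", dest, "sparse-checkout", "set", subdir],
    ]


for cmd in sparse_clone_commands("https://example.com/org/monorepo.git", "packages/auth"):
    print(shlex.join(cmd))
```

Passing each list to subprocess.run(cmd, check=True) would execute the commands. They are printed here so the sketch has no side effects.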
What's the right concurrent session ceiling for PR automation?
The right ceiling depends on two factors: your monthly ACU budget and your average session ACU cost. A conservative starting point: divide your monthly ACU budget by your average session ACU cost to get your total session budget for the month, then size the ceilings so that a single PR burst event (20 PRs in an afternoon) cannot consume more than about 10% of the monthly pool. For example, a 500 ACU/month budget at 10 ACUs/session = 50 sessions/month = ~2 sessions/day average, and 10% of the pool is 50 ACUs, or 5 sessions. A concurrent ceiling of 4 keeps at most 40 ACUs (8% of the pool) in flight at once at the average session cost. An hourly launch ceiling of 8 caps a single hour at roughly 80 ACUs (16% of the pool); lower it to 5 if the 10% bound should hold per hour rather than per concurrent batch.
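The sizing rule can be written down as a small calculation. A minimal sketch, assuming a flat average session cost and treating the 10% burst share as a tunable parameter; the function name and the returned keys are illustrative.

```python
import math


def session_budget(
    monthly_acus: float,
    avg_acus_per_session: float,
    burst_share: float = 0.10,  # fraction of the monthly pool one burst may consume
) -> dict:
    """Derive monthly session capacity and a per-burst session cap from an ACU budget."""
    sessions_per_month = monthly_acus / avg_acus_per_session
    burst_acus = monthly_acus * burst_share
    # Round before flooring to avoid floating-point artifacts at exact boundaries
    max_sessions = math.floor(round(burst_acus / avg_acus_per_session, 9))
    return {
        "sessions_per_month": sessions_per_month,
        "sessions_per_day": sessions_per_month / 30,
        "burst_acus": burst_acus,
        "max_sessions_per_burst": max_sessions,
    }


print(session_budget(500, 10))
```

Average cost hides the long tail: a single session that enters a debugging loop can cost several times the average, so it is worth leaving margin below the computed cap.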