Cohere Command R agent budget limit: adding a cost cap and loop detector to Command R tool-use agents
Cohere’s Command R and Command R+ models support native tool use via the /v1/chat API. The tool-use interface is well designed: the model returns structured tool_calls objects, you execute the tools and return tool_results, and the model continues reasoning. What Cohere’s API does not provide is a per-run cost cap or a loop detector. An agent built on Command R+ can enter a tool-call loop, calling the same tool repeatedly with the same arguments when the result doesn’t advance its goal, and keep running until your Cohere billing quota is exhausted. At Command R+ pricing, a 500-turn loop costs roughly $2–$10 depending on context size. That is not catastrophic for a single incident, but when it happens at production scale across many concurrent users, costs accumulate faster than you can intervene. This page shows how to add RunGuard’s circuit breaker to a Command R agent in Python.
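The $2–$10 range can be sanity-checked with a few lines of arithmetic. The sketch below uses the Command R+ prices quoted on this page ($3/$15 per 1M input/output tokens). The per-turn token counts are illustrative assumptions, not measurements, and the helper name `loop_cost_usd` is hypothetical.

```python
# Prices quoted on this page for Command R+ (USD per 1M tokens).
INPUT_USD_PER_M = 3.0
OUTPUT_USD_PER_M = 15.0


def loop_cost_usd(turns: int, avg_input_tokens: int, avg_output_tokens: int) -> float:
    """Estimate the cost of a loop in which every turn resends a similar-sized context."""
    input_cost = turns * avg_input_tokens * INPUT_USD_PER_M / 1_000_000
    output_cost = turns * avg_output_tokens * OUTPUT_USD_PER_M / 1_000_000
    return input_cost + output_cost


# Assumed workloads: a small context and a larger one, 100 output tokens per turn.
small = loop_cost_usd(turns=500, avg_input_tokens=1_500, avg_output_tokens=100)
large = loop_cost_usd(turns=500, avg_input_tokens=6_000, avg_output_tokens=100)
print(f"{small:.2f} {large:.2f}")  # 3.00 9.75
```

Input tokens dominate because the whole conversation is resent on every turn, which is why context size drives the spread.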
Cohere Command R tool-use: how agents loop
- Tool-result non-advancement. Command R+’s tool-use model is optimized for retrieval-augmented generation (RAG) workflows where tools are expected to return relevant documents. When a tool returns an empty or low-quality result, the model may call the same tool with slightly modified arguments to try to get better results. If the underlying tool has a systematic issue — the data source is unavailable, the query is malformed, the API is rate-limiting — every iteration produces the same low-quality result, and the model keeps trying. Command R+’s strong RAG optimization means it is particularly persistent in this pattern compared to general-purpose models.
- Grounded generation tool-call duplication. Command R supports a “grounded generation” mode where the model must cite its sources from tool results. If the model is unsatisfied with the citations it can extract from the available tool results, it may call the same tools again to attempt to gather more citable material. This is a well-intentioned behavior (get more sources) that becomes a loop when the tools cannot provide better material.
- Cohere’s 128k context limit and tool-result accumulation. Command R+ has a 128k context window. Repeated tool calls append their results to the conversation, progressively filling the context. When the limit is hit, Cohere returns a `400` error with a message about prompt length. Because Command R+ is designed for long-context tasks, this limit is hit less often than with shorter-context models, but multi-step research agents can still reach it within a long session.
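The first two patterns share a symptom: the arguments may change slightly, but the tool keeps returning the same unhelpful output. One way to catch that, independent of any library, is to fingerprint tool results rather than tool calls. The sketch below is a minimal illustration under that assumption. `ResultStallDetector` is a hypothetical name and is not part of RunGuard or the Cohere SDK.

```python
import hashlib
import json
from collections import deque


class ResultStallDetector:
    """Flags a run when the same tool keeps returning the same output."""

    def __init__(self, max_identical: int = 3):
        self.max_identical = max_identical
        # Only the most recent `max_identical` fingerprints are kept.
        self._recent = deque(maxlen=max_identical)

    def observe(self, tool_name: str, output) -> bool:
        """Record one tool result; return True once the last N results are identical."""
        payload = json.dumps(output, sort_keys=True, default=str)
        digest = hashlib.sha256(f"{tool_name}:{payload}".encode("utf-8")).hexdigest()
        self._recent.append(digest)
        return len(self._recent) == self.max_identical and len(set(self._recent)) == 1


# Three empty search results in a row trip the detector, whatever the queries were.
detector = ResultStallDetector(max_identical=3)
flags = [detector.observe("search_documents", []) for _ in range(3)]
print(flags)  # [False, False, True]
```

Because the fingerprint ignores the call arguments, this catches the "slightly modified query, same empty result" case that an argument-based signature would miss.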
Adding a budget limit and loop detector to a Command R agent
- Cohere’s tool-use API structure. The Cohere Python client uses a chat-based interface where tool calls are returned as `response.tool_calls` and tool results are passed back as a list of `ToolResult` objects. The conversation history is managed by the caller, not the client. This makes RunGuard straightforward to wire in: wrap the `client.chat()` call, extract the tool-call signature for loop detection, and track accumulated spend.
- Python: Command R+ agent with RunGuard budget cap and loop detection.

```python
import asyncio

import cohere
from runguard import guard_async, LoopDetectedError, BudgetExceededError

co = cohere.AsyncClient(api_key="YOUR_COHERE_API_KEY")

TOOLS = [
    {
        "name": "search_documents",
        "description": "Search a document corpus for passages relevant to a query.",
        "parameter_definitions": {
            "query": {
                "description": "The search query string.",
                "type": "str",
                "required": True,
            }
        },
    },
    {
        "name": "get_document_by_id",
        "description": "Retrieve a specific document by its unique identifier.",
        "parameter_definitions": {
            "doc_id": {
                "description": "The document identifier.",
                "type": "str",
                "required": True,
            }
        },
    },
]


async def _call_command_r(chat_history: list, message: str, **kwargs) -> dict:
    response = await co.chat(
        model="command-r-plus",
        message=message,
        chat_history=chat_history,
        tools=TOOLS,
        **kwargs,
    )
    # Command R+ pricing: $3/$15 per 1M input/output tokens
    tokens = response.meta.tokens if response.meta else None
    input_tokens = (tokens.input_tokens or 0) if tokens else 0
    output_tokens = (tokens.output_tokens or 0) if tokens else 0
    usd = (input_tokens * 3.0 + output_tokens * 15.0) / 1_000_000
    # Signature: first tool call name, or "end_turn".
    # Name-only, so repeated calls with tweaked arguments still count as repeats.
    tool_calls = response.tool_calls or []
    sig = tool_calls[0].name if tool_calls else "end_turn"
    return {"response": response, "usd": usd, "sig": sig}


guarded_call = guard_async(
    _call_command_r,
    budget={"max_usd": 1.0},
    loop={"repeats": 3, "max_cycle_len": 4},
)


async def run_command_r_agent(user_query: str) -> str:
    chat_history = []
    current_message = user_query
    while True:
        try:
            result = await guarded_call(chat_history, current_message)
        except LoopDetectedError as e:
            return (
                f"Loop detected (pattern: {e.pattern!r}). "
                f"Aborting after {len(chat_history)} history entries."
            )
        except BudgetExceededError as e:
            return f"Budget cap reached (${e.spent:.4f}). Aborting."

        response = result["response"]

        # Append this turn to history. Continuation calls send an empty
        # message, so only record a USER turn when there was real input.
        if current_message:
            chat_history.append({"role": "USER", "message": current_message})
        chat_history.append({
            "role": "CHATBOT",
            "message": response.text or "",
            "tool_calls": response.tool_calls or [],
        })

        if not response.tool_calls:
            # No more tool calls: the model has produced a final answer
            return response.text or ""

        # Execute tools and build tool results
        tool_results = []
        for tc in response.tool_calls:
            try:
                output = await dispatch_tool(tc.name, tc.parameters)
                outputs = [{"result": str(output)}]
            except Exception as exc:
                # Surface as a structured error, not a plain string
                outputs = [{"error": f"{type(exc).__name__}: {exc}"}]
            tool_results.append(cohere.ToolResult(call=tc, outputs=outputs))

        # Pass the results back as a TOOL entry in chat_history; it pairs with
        # the tool_calls recorded on the preceding CHATBOT turn.
        chat_history.append({"role": "TOOL", "tool_results": tool_results})
        # Continue with an empty message, the Cohere convention for a
        # tool-result continuation
        current_message = ""


async def dispatch_tool(name: str, parameters: dict) -> str:
    if name == "search_documents":
        return await search_documents(parameters["query"])
    elif name == "get_document_by_id":
        return await get_document_by_id(parameters["doc_id"])
    raise ValueError(f"Unknown tool: {name}")


# Stub implementations
async def search_documents(query: str) -> str: ...
async def get_document_by_id(doc_id: str) -> str: ...


if __name__ == "__main__":
    print(asyncio.run(run_command_r_agent("Summarize our refund policy.")))
```

- Handling Cohere’s `TOOL` turn pattern. Cohere’s chat API uses a distinct `"TOOL"` role for tool results, unlike OpenAI’s `"tool"` role with a `tool_call_id`. The guard wraps the outgoing API call, not the tool execution, so the turn structure does not affect RunGuard’s wiring. The signature extraction happens on the response from each `client.chat()` call, regardless of whether the call was initiated by a user message or a tool-result continuation.
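To make the `repeats` and `max_cycle_len` settings concrete, here is a minimal sketch of how a detector with those two parameters could work over a list of per-call signatures. It is an illustration only, not RunGuard’s actual implementation. The function names `call_signature` and `find_repeating_cycle` are hypothetical.

```python
import json


def call_signature(name, parameters, include_args=True):
    """Build a stable string for one tool call; name-only if include_args is False."""
    if not include_args:
        return name
    return f"{name}:{json.dumps(parameters, sort_keys=True, default=str)}"


def find_repeating_cycle(signatures, repeats=3, max_cycle_len=4):
    """Return the cycle that ends the history if it repeats `repeats` times in a row."""
    for cycle_len in range(1, max_cycle_len + 1):
        window = cycle_len * repeats
        if len(signatures) < window:
            break  # longer cycles would need even more history
        tail = signatures[-window:]
        cycle = tail[:cycle_len]
        if all(tail[i] == cycle[i % cycle_len] for i in range(window)):
            return cycle
    return None


# A single tool called three times in a row is a cycle of length 1.
print(find_repeating_cycle(["search_documents"] * 3))  # ['search_documents']
# Two tools alternating three times is a cycle of length 2.
print(find_repeating_cycle(["a", "b", "a", "b", "a", "b"]))  # ['a', 'b']
# No repetition yet.
print(find_repeating_cycle(["a", "b", "c"]))  # None
```

Whether to include arguments in the signature is a trade-off: name-only signatures catch retries with tweaked queries but can also fire on legitimate consecutive searches, while argument-inclusive signatures only fire on exact repeats.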
Budget calibration for Command R agents
- Command R pricing (2026 reference). Command R: $0.15/$0.60 per 1M input/output tokens. Command R+: $3.00/$15.00 per 1M input/output tokens. Command R+ is 20× more expensive on input and 25× more expensive on output. An agent run that costs $0.02 on Command R costs roughly $0.40 to $0.50 on Command R+ for the same conversation history, depending on the input/output mix. Set your `max_usd` cap accordingly: what’s a safe limit for Command R+ may be grossly generous for Command R.
- Calibrating from P95 run costs. Run your agent on a representative sample of production inputs, log the dollar cost per run (computed from the token counts in `response.meta.tokens`), take the 95th percentile, and set `max_usd` to 2.5× that value. This gives a cap that rarely fires on legitimate long runs while still catching runaway loops that cost 10× the P95. See autonomous agent cost control best practices for the full calibration methodology.
- Command R vs. Command R+ for agentic tasks. For most agentic tasks, Command R (not R+) is the cost-effective choice for tool-call loops. Command R+ excels at complex reasoning and RAG tasks where quality matters. If your agent performs many short tool-call iterations (search, extract, validate), Command R’s lower cost makes it more suitable for the loop body, with Command R+ reserved for the final synthesis step. Structure your agent pipeline accordingly and set model-appropriate `max_usd` caps per step.
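The P95 calibration can be sketched in a few lines. The example below assumes you have already logged per-run input and output token counts. The sample costs are made up for illustration, the helper names are hypothetical, and the nearest-rank percentile is one of several reasonable definitions.

```python
import math


def run_cost_usd(input_tokens, output_tokens, input_usd_per_m, output_usd_per_m):
    """Convert one run's token counts into dollars at the given per-1M-token prices."""
    return (input_tokens * input_usd_per_m + output_tokens * output_usd_per_m) / 1_000_000


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def calibrate_max_usd(run_costs, pct=95, multiplier=2.5):
    """Suggest a per-run cap as `multiplier` times the chosen percentile of observed costs."""
    return multiplier * percentile_nearest_rank(run_costs, pct)


# Illustrative sample: 18 typical runs, one long run, one runaway loop.
sample_costs = [0.02] * 18 + [0.05, 0.30]
cap = calibrate_max_usd(sample_costs)
print(f"{cap:.3f}")  # 0.125
```

In this made-up sample the long-but-legitimate $0.05 run stays under the cap, while the $0.30 outlier would have been stopped at $0.125.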
Cohere API defaults vs. RunGuard
| Control | Cohere API default | RunGuard |
|---|---|---|
| Tool-call loop detection | Not supported | loop: repeats=3 fires on 3rd repeat of same pattern |
| Per-run cost cap | Not supported (account-level quota only) | budget: max_usd fires before each chat() call |
| Max turns | Not supported | Implicit via loop + budget caps |
| Context-window guard | 400 after request sent | Pre-call ContextOverflowError before request sent |
| Slack/PagerDuty alert on trip | Not supported | alerts: slack_webhook or pagerduty_key |
| RAG-specific loop detection | Not supported | Same loop detector — RAG repeated-retrieval pattern is a period-1 cycle |