Cohere Command R agent budget limit: adding a cost cap and loop detector to Command R tool-use agents

Cohere’s Command R and Command R+ models support native tool use via the /v1/chat API. The tool-use interface is well-designed: the model returns structured tool_calls objects, you execute the tools and return tool_results, and the model continues reasoning. What Cohere’s API does not provide is a per-run cost cap or a loop detector. An agent built on Command R+ can enter a tool-call loop — calling the same tool repeatedly with the same arguments when the result doesn’t advance its goal — and run until your Cohere billing quota is exhausted. At Command R+ pricing, a 500-turn loop costs roughly $2–$10 depending on context size — not catastrophic for a single incident, but when this happens at production scale across multiple concurrent users, the bill arrives faster than you can intervene. This page shows how to add RunGuard’s circuit breaker to a Command R agent in Python.
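
A back-of-envelope estimate shows where a figure in that range can come from. This is a rough sketch: it uses the Command R+ prices quoted on this page, assumes a constant context size per turn (in a real loop the context grows), and the per-turn token counts are illustrative assumptions, not measurements. The helper name `loop_cost_usd` is hypothetical.

```python
# Rough cost of a stuck loop at the Command R+ prices used on this page.
# Token counts per turn are illustrative assumptions.
INPUT_USD_PER_M = 3.0
OUTPUT_USD_PER_M = 15.0


def loop_cost_usd(turns: int, input_tokens_per_turn: int, output_tokens_per_turn: int) -> float:
    """Total cost of `turns` identical model calls."""
    per_turn = (
        input_tokens_per_turn * INPUT_USD_PER_M
        + output_tokens_per_turn * OUTPUT_USD_PER_M
    ) / 1_000_000
    return turns * per_turn


small_context = loop_cost_usd(500, 1_000, 60)   # short prompts, terse tool calls
large_context = loop_cost_usd(500, 6_000, 100)  # longer accumulated history
print(f"{small_context:.2f} {large_context:.2f}")
```

Under these assumptions the two cases come to roughly $1.95 and $9.75 per stuck run, which is the cost a per-run cap is meant to bound.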

Cohere Command R tool-use: how agents loop

Tool-result non-advancement. Command R+’s tool-use model is optimized for retrieval-augmented generation (RAG) workflows where tools are expected to return relevant documents. When a tool returns an empty or low-quality result, the model may call the same tool with slightly modified arguments to try to get better results. If the underlying tool has a systematic issue (the data source is unavailable, the query is malformed, or the API is rate-limiting), every iteration produces the same low-quality result, and the model keeps trying. Because Command R+ is tuned heavily for RAG, it may be more persistent in this pattern than general-purpose models.
Grounded generation tool-call duplication. Command R supports a “grounded generation” mode where the model must cite its sources from tool results. If the model is unsatisfied with the citations it can extract from the available tool results, it may call the same tools again to attempt to gather more citable material. This is a well-intentioned behavior (get more sources) that becomes a loop when the tools cannot provide better material.
Cohere’s 128k context limit and tool-result accumulation. Command R+ has a 128k context window. Repeated tool calls append their results to the conversation, progressively filling the context. When the limit is hit, Cohere returns a 400 error with a message about prompt length. Because Command R+ is designed for long-context tasks, this limit is hit less often than with shorter-context models, but multi-step research agents can still reach it within a long session.
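
The first two patterns above both appear as a repeating sequence of tool-call signatures. The sketch below shows one way such a repetition could be detected, assuming semantics similar to the `repeats` and `max_cycle_len` settings used later on this page. It is an illustration only, not RunGuard's actual implementation, and `find_repeating_cycle` is a hypothetical name.

```python
from typing import Optional, Sequence


def find_repeating_cycle(
    sigs: Sequence[str], repeats: int = 3, max_cycle_len: int = 4
) -> Optional[tuple]:
    """Return the cycle if the history ends with `repeats` back-to-back copies of it."""
    for cycle_len in range(1, max_cycle_len + 1):
        window = cycle_len * repeats
        if len(sigs) < window:
            break  # longer cycles would need even more history
        tail = list(sigs[-window:])
        cycle = tail[:cycle_len]
        if tail == cycle * repeats:
            return tuple(cycle)
    return None


# Repeated retrieval is a period-1 cycle; alternating tools form a period-2 cycle.
stuck_search = find_repeating_cycle(["search_documents"] * 3)
alternating = find_repeating_cycle(
    ["search_documents", "get_document_by_id"] * 3
)
healthy = find_repeating_cycle(["search_documents", "get_document_by_id", "end_turn"])
```

Here `stuck_search` and `alternating` return the detected cycle, while `healthy` returns `None` because no pattern repeats three times.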

Adding a budget limit and loop detector to a Command R agent

Cohere’s tool-use API structure. The Cohere Python client uses a chat-based interface where tool calls are returned as response.tool_calls and tool results are passed back as a list of ToolResult objects. The conversation history is managed by the caller, not the client. This makes RunGuard straightforward to wire in: wrap the client.chat() call, extract the tool-call signature for loop detection, and track accumulated spend.

Python: Command R+ agent with RunGuard budget cap and loop detection.

import cohere
from runguard import guard_async, LoopDetectedError, BudgetExceededError
import asyncio

co = cohere.AsyncClient(api_key="YOUR_COHERE_API_KEY")

TOOLS = [
    {
        "name": "search_documents",
        "description": "Search a document corpus for passages relevant to a query.",
        "parameter_definitions": {
            "query": {
                "description": "The search query string.",
                "type": "str",
                "required": True,
            }
        },
    },
    {
        "name": "get_document_by_id",
        "description": "Retrieve a specific document by its unique identifier.",
        "parameter_definitions": {
            "doc_id": {
                "description": "The document identifier.",
                "type": "str",
                "required": True,
            }
        },
    },
]


async def _call_command_r(chat_history: list, message: str, **kwargs) -> dict:
    response = await co.chat(
        model="command-r-plus",
        message=message,
        chat_history=chat_history,
        tools=TOOLS,
        **kwargs,
    )

    meta = response.meta
    # Command R+ pricing: $3/$15 per 1M input/output tokens
    # Token counts are optional in the response metadata, so treat missing values as 0
    input_tokens = (meta.tokens.input_tokens or 0) if meta and meta.tokens else 0
    output_tokens = (meta.tokens.output_tokens or 0) if meta and meta.tokens else 0
    usd = (input_tokens * 3.0 + output_tokens * 15.0) / 1_000_000

    # Signature: first tool call name, or "end_turn"
    tool_calls = response.tool_calls or []
    sig = tool_calls[0].name if tool_calls else "end_turn"

    return {"response": response, "usd": usd, "sig": sig}


guarded_call = guard_async(
    _call_command_r,
    budget={"max_usd": 1.0},
    loop={"repeats": 3, "max_cycle_len": 4},
)


async def run_command_r_agent(user_query: str) -> str:
    chat_history = []
    current_message = user_query

    while True:
        try:
            result = await guarded_call(chat_history, current_message)
        except LoopDetectedError as e:
            return f"Loop detected (pattern: {e.pattern!r}). Aborting after {len(chat_history)} history entries."
        except BudgetExceededError as e:
            return f"Budget cap reached (${e.spent:.4f}). Aborting."

        response = result["response"]

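        # Append this turn to history. Tool-result continuations send an empty
        # message, so only record a USER turn when there is real user text.
        if current_message:
            chat_history.append({
                "role": "USER",
                "message": current_message,
            })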
        chat_history.append({
            "role": "CHATBOT",
            "message": response.text or "",
            "tool_calls": response.tool_calls or [],
        })

        if not response.tool_calls:
            # No more tool calls — the model has produced a final answer
            return response.text or ""

        # Execute tools and build tool results
        tool_results = []
        for tc in response.tool_calls:
            try:
                output = await dispatch_tool(tc.name, tc.parameters)
                tool_results.append(
                    cohere.ToolResult(
                        call=tc,
                        outputs=[{"result": str(output)}],
                    )
                )
            except Exception as exc:
                # Surface as structured error, not a plain string
                tool_results.append(
                    cohere.ToolResult(
                        call=tc,
                        outputs=[{"error": f"{type(exc).__name__}: {exc}"}],
                    )
                )

        # In Command R's tool-use pattern, the next API call passes tool_results
        # and uses the previous CHATBOT turn's tool_calls
        chat_history.append({
            "role": "TOOL",
            "tool_results": tool_results,
        })

        # Continue with an empty message — Cohere convention for tool-result continuation
        current_message = ""


async def dispatch_tool(name: str, parameters: dict) -> str:
    if name == "search_documents":
        return await search_documents(parameters["query"])
    elif name == "get_document_by_id":
        return await get_document_by_id(parameters["doc_id"])
    raise ValueError(f"Unknown tool: {name}")


# Stub implementations
async def search_documents(query: str) -> str: ...
async def get_document_by_id(doc_id: str) -> str: ...

Handling Cohere’s TOOL turn pattern. Cohere’s chat API uses a distinct "TOOL" role for tool results, unlike OpenAI’s "tool" role with a tool_call_id. The guard wraps the outgoing API call, not the tool execution, so the turn structure does not affect RunGuard’s wiring. The signature extraction happens on the response from each client.chat() call, regardless of whether the call was initiated by a user message or a tool result continuation.
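
To make the "wrap the outgoing call" idea concrete, here is a minimal stand-in for a budget guard. It is a sketch under stated assumptions, not RunGuard's code: `with_budget`, `BudgetExceeded`, and `fake_model_call` are hypothetical names, and the wrapped function is assumed to return a dict with a `usd` field, as `_call_command_r` does above.

```python
import asyncio
from typing import Any, Awaitable, Callable


class BudgetExceeded(Exception):
    """Hypothetical stand-in for a budget error; carries the spend so far."""

    def __init__(self, spent: float) -> None:
        super().__init__(f"budget exceeded after ${spent:.4f}")
        self.spent = spent


def with_budget(call: Callable[..., Awaitable[dict]], max_usd: float):
    """Wrap an async model call and refuse further calls once the cap is reached."""
    spent = 0.0

    async def guarded(*args: Any, **kwargs: Any) -> dict:
        nonlocal spent
        # Check before sending, so an over-budget run makes no further request.
        if spent >= max_usd:
            raise BudgetExceeded(spent)
        result = await call(*args, **kwargs)
        spent += result["usd"]
        return result

    return guarded


async def fake_model_call(message: str) -> dict:
    """Stands in for the real API call; every call 'costs' $0.40."""
    return {"response": message, "usd": 0.40, "sig": "search_documents"}


async def demo() -> int:
    """Count how many calls complete before the $1.00 cap trips."""
    guarded = with_budget(fake_model_call, max_usd=1.0)
    completed = 0
    try:
        while True:
            await guarded("continue")
            completed += 1
    except BudgetExceeded:
        return completed


completed_calls = asyncio.run(demo())
```

Because the check runs before each call rather than after, a wrapper like this can overshoot the cap by at most the cost of one call. The turn structure never enters the picture: the wrapper only sees the call going out and the cost coming back.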

Budget calibration for Command R agents

Command R pricing (2026 reference). Command R: $0.15/$0.60 per 1M input/output tokens. Command R+: $3.00/$15.00 per 1M input/output tokens. Command R+ is 20× more expensive on input and 25× more expensive on output. An agent run that costs $0.02 on Command R costs roughly $0.40 to $0.50 on Command R+ for the same conversation history, depending on the input/output mix. Set your max_usd cap accordingly: a cap calibrated for Command R+ is grossly generous for Command R, and a cap calibrated for Command R will trip on legitimate Command R+ runs.
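
The price gap can be checked with a few lines of arithmetic. The prices below are the ones quoted above; the token counts are an illustrative assumption for a single run, and `run_cost_usd` is a hypothetical helper.

```python
# (input, output) prices in USD per 1M tokens, as quoted on this page.
PRICES_USD_PER_M = {
    "command-r": (0.15, 0.60),
    "command-r-plus": (3.00, 15.00),
}


def run_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one run given its total token usage."""
    in_price, out_price = PRICES_USD_PER_M[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Illustrative run: 100k input tokens across all turns, 8k output tokens.
cost_r = run_cost_usd("command-r", 100_000, 8_000)
cost_r_plus = run_cost_usd("command-r-plus", 100_000, 8_000)
ratio = cost_r_plus / cost_r
print(f"{cost_r:.4f} {cost_r_plus:.4f} {ratio:.1f}")
```

For this mix the run costs about $0.02 on Command R and about $0.42 on Command R+, a ratio that falls between the 20× input and 25× output multipliers.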
Calibrating from P95 run costs. Run your agent on a representative sample of production inputs, log the dollar cost per run (computed from the token counts in response.meta.tokens), take the 95th percentile, and set max_usd to 2.5× that value. This gives a cap that rarely fires on legitimate long runs while stopping runaway loops well before they reach 10× the P95. See autonomous agent cost control best practices for the full calibration methodology.
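
The calibration step can be sketched as follows. This assumes you already have a list of logged per-run costs; the nearest-rank percentile method and the name `calibrate_max_usd` are choices made for this example, and the sample costs are invented for illustration.

```python
import math
from typing import Sequence


def calibrate_max_usd(
    run_costs: Sequence[float], percentile: float = 95.0, multiplier: float = 2.5
) -> float:
    """Return multiplier × the nearest-rank percentile of the logged run costs."""
    if not run_costs:
        raise ValueError("need at least one logged run cost")
    ordered = sorted(run_costs)
    # Nearest-rank percentile: the smallest value with at least `percentile`% of runs at or below it.
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return multiplier * ordered[rank - 1]


# Illustrative sample: 18 typical runs, one long run, one runaway outlier.
logged_costs = [0.02] * 18 + [0.05, 0.30]
max_usd = calibrate_max_usd(logged_costs)
```

With this sample the 95th percentile is $0.05, so the cap comes out at about $0.125. The $0.30 outlier does not inflate the cap, which is the reason for using a percentile rather than the maximum.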
Command R vs. Command R+ for agentic tasks. For most agentic tasks, Command R (not R+) is the cost-effective choice for tool-call loops. Command R+ excels at complex reasoning and RAG tasks where quality matters. If your agent performs many short tool-call iterations — search, extract, validate — Command R’s lower cost makes it more suitable for the loop body, with Command R+ reserved for the final synthesis step. Structure your agent pipeline accordingly and set model-appropriate max_usd caps per step.

Cohere API defaults vs. RunGuard

Control	Cohere API default	RunGuard
Tool-call loop detection	Not supported	loop: repeats=3 fires on 3rd repeat of same pattern
Per-run cost cap	Not supported (account-level quota only)	budget: max_usd fires before each chat() call
Max turns	Not supported	Implicit via loop + budget caps
Context-window guard	400 after request sent	Pre-call ContextOverflowError before request sent
Slack/PagerDuty alert on trip	Not supported	alerts: slack_webhook or pagerduty_key
RAG-specific loop detection	Not supported	Same loop detector — RAG repeated-retrieval pattern is a period-1 cycle
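
The context-window row describes a check that runs before the request is sent. A rough version of that idea can be sketched without any library, assuming the 128k limit mentioned earlier and a crude four-characters-per-token estimate for English text. The names `estimate_tokens`, `check_context`, and `ContextOverflow` are hypothetical, the heuristic is an assumption, and an exact count would need the model's real tokenizer.

```python
CONTEXT_LIMIT_TOKENS = 128_000
CHARS_PER_TOKEN = 4  # crude heuristic for English text; an assumption, not an exact count


class ContextOverflow(Exception):
    """Hypothetical stand-in for a pre-call context-window error."""


def estimate_tokens(chat_history: list, message: str) -> int:
    """Estimate prompt size from the character length of messages and tool results."""
    chars = len(message)
    for turn in chat_history:
        chars += len(str(turn.get("message", "")))
        chars += len(str(turn.get("tool_results", "")))
    return chars // CHARS_PER_TOKEN


def check_context(chat_history: list, message: str, reserve_tokens: int = 4_000) -> int:
    """Raise before sending if the estimated prompt leaves no room for the reply."""
    estimated = estimate_tokens(chat_history, message)
    if estimated + reserve_tokens > CONTEXT_LIMIT_TOKENS:
        raise ContextOverflow(
            f"estimated {estimated} tokens plus {reserve_tokens} reserved exceeds {CONTEXT_LIMIT_TOKENS}"
        )
    return estimated


history = [{"role": "USER", "message": "x" * 400}]
small_estimate = check_context(history, "abcd")
```

Calling `check_context` at the top of each loop iteration turns a failed request into a local exception that the agent can handle, for example by summarizing or truncating older tool results.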