smolagents `CodeAgent` generates code to call tools. RunGuard catches repeated tool-call loops before the next generation is sent.

HuggingFace’s smolagents library takes a code-first approach to AI agents. Instead of emitting structured tool-call JSON, the CodeAgent class generates executable Python code on each step: the model writes a snippet that calls one or more tool functions and assigns their outputs to variables. The generated code is executed in a sandboxed interpreter, the outputs are fed back into the agent’s context, and the model generates the next code snippet. ToolCallingAgent uses the more conventional JSON tool-call format.

Both agents support a max_steps parameter that limits the number of agent iterations, similar to AutoGen’s MaxMessageTermination. Like all step-count terminators, max_steps fires when the iteration budget is exhausted, not when a repeated pattern is detected: a CodeAgent that generates result = web_search("AI safety 2026") on step 2, then the identical code on step 4, then again on step 6, pays for every generation in full until max_steps fires at step 10 (or wherever you set it). RunGuard wraps the model call that smolagents makes on each step, so the loop detector sees each step’s code generation as a tool-call signature and fires on the third repetition of the pattern, before step 7’s generation goes out and before the LLM cost for that step lands on your invoice.
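To put rough numbers on that difference, the sketch below totals the model cost of a looping run under two stop conditions. Every token count is an illustrative assumption, not a measurement; the per-token prices are the GPT-4o rates used in the OpenAI example later on this page, and run_cost is a hypothetical helper.

```python
# Illustrative cost model; every token count here is an assumption.
PRICE_IN = 2.5 / 1_000_000    # USD per prompt token
PRICE_OUT = 10.0 / 1_000_000  # USD per completion token


def run_cost(steps: int, base_prompt: int = 1_500,
             growth_per_step: int = 400, completion: int = 150) -> float:
    """Total USD for `steps` model calls when the prompt grows on every step."""
    total = 0.0
    for step in range(1, steps + 1):
        # Earlier code and observations are re-sent, so the prompt grows linearly.
        prompt_tokens = base_prompt + growth_per_step * (step - 1)
        total += prompt_tokens * PRICE_IN + completion * PRICE_OUT
    return total


stopped_by_detector = run_cost(6)    # run ends once step 6 has returned
stopped_by_max_steps = run_cost(20)  # run continues until max_steps=20

print(f"stopped after step 6:  ${stopped_by_detector:.4f}")
print(f"stopped after step 20: ${stopped_by_max_steps:.4f}")
```

Because the accumulated history is re-sent on every call, later steps cost more than earlier ones, so the gap widens faster than the step count alone suggests.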

How smolagents works and where loops form

CodeAgent: the model writes Python that calls tools. On each step, CodeAgent sends the task, tool descriptions, and accumulated observation history to the LLM. The model generates a Python code block that invokes one or more of the registered tool functions. The smolagents runtime executes the code in a LocalPythonInterpreter (or a sandboxed E2B executor), captures stdout, return values, and any exceptions, and appends the results as observations to the agent’s history. The model then sees all prior observations plus the new ones and generates the next code block. This code-writing approach gives the model more flexibility than JSON tool-calls (it can chain multiple tool calls in one step, use Python control flow, and assign intermediate results to variables) at the cost of a more complex execution surface.
The loop pattern in CodeAgent: identical code blocks, same tool, same arguments. A CodeAgent loop manifests as the model generating the same (or nearly identical) code block on consecutive steps. The canonical form is: the model calls web_search("query"), gets a result, reasons that the result is insufficient, and on the next step generates web_search("query") again — same function, same arguments, same 200-token result. From the model’s perspective, it is trying to get more information; from the budget’s perspective, it has already paid for this result once and is about to pay again. The loop is detectable by signature: the generated code’s first tool call name plus a truncated snapshot of its arguments forms a repeating pattern in the step history.
ToolCallingAgent: JSON tool-calls, same loop risk. ToolCallingAgent uses the standard tool-calling interface (structured JSON tool calls, not raw code generation), similar to LangChain agents or AutoGen. The loop pattern is identical: the agent calls the same tool with the same arguments on consecutive steps. The detection approach is the same as for CodeAgent: RunGuard wraps the model’s generate call and derives a signature from the first tool call in each response.
The max_steps backstop: useful but not a loop detector. smolagents’ max_steps parameter (default 6 in some versions, configurable at agent construction) stops the agent once it has completed max_steps iterations. In a same-tool loop that repeats on steps 2, 4, and 6, a run stopped by max_steps=6 has paid for all three identical generations, two of them redundant. Because the signature is derived from the model’s response, RunGuard sees the third repetition when step 6’s generation returns and raises before step 7 is sent, so at a low max_steps the two mechanisms stop at about the same point. The difference shows when max_steps is set high (12, 20, 50) to allow genuinely complex multi-step tasks to complete: a loop that keeps repeating until max_steps fires can accumulate significant cost. RunGuard’s repeats: 3 default fires regardless of how high max_steps is set.
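The signature-based detection described above can be sketched as a check on the tail of the step history. This is a minimal illustration and not RunGuard's actual algorithm: find_repeated_cycle is a hypothetical helper, its repeats and max_cycle_len parameters only mirror the option names used on this page, and the sample history assumes the intermediate steps repeat as well, giving a two-step cycle.

```python
from typing import Optional, Sequence


def find_repeated_cycle(history: Sequence[str], repeats: int = 3,
                        max_cycle_len: int = 8) -> Optional[tuple]:
    """Return the cycle ending `history` if it occurs `repeats` times in a row."""
    for cycle_len in range(1, max_cycle_len + 1):
        window = cycle_len * repeats
        if window > len(history):
            break  # not enough history for this or any longer cycle
        tail = history[-window:]
        cycle = tuple(tail[:cycle_len])
        # The tail must consist of `repeats` back-to-back copies of the cycle.
        if all(tuple(tail[i:i + cycle_len]) == cycle
               for i in range(0, window, cycle_len)):
            return cycle
    return None


# Illustrative signatures: the same search recurs on steps 2, 4, and 6.
steps = [
    "visit_webpage:https://example.org",  # step 1
    "web_search:AI safety 2026",          # step 2
    "visit_webpage:https://example.org",  # step 3
    "web_search:AI safety 2026",          # step 4
    "visit_webpage:https://example.org",  # step 5
    "web_search:AI safety 2026",          # step 6
]

detected_at = None
for n in range(1, len(steps) + 1):
    if find_repeated_cycle(steps[:n]) is not None:
        detected_at = n
        break
print(f"loop detected after step {detected_at}")
```

Checking only the tail keeps each step's work bounded by repeats × max_cycle_len comparisons, however long the run gets. A detector that tolerates differing intermediate steps would need a looser matching rule than this exact-cycle check.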

Where to wrap RunGuard in a smolagents stack

smolagents agents use a model object (an HfApiModel, OpenAIServerModel, LiteLLMModel, or a custom model class) that exposes a __call__ method. The agent calls model(messages, stop_sequences=...) on each step, and the model object handles the HTTP call to the LLM provider. The correct place to add RunGuard is as a wrapper around the model object’s __call__ method, before the HTTP call goes out. You subclass or monkey-patch the model to run the guard on each call, extract the usd and sig from the response, and either return the response (if the guard passes) or raise LoopDetectedError / BudgetExceededError (if the guard fires). The exception propagates out of the model call into the agent’s step loop, where you can route it to a graceful error handler. Alternatively, you can subclass CodeAgent or ToolCallingAgent and override the step() method to wrap the model call with guard() inline, but the model-object approach is simpler and works without touching the agent’s internal logic. Two version caveats apply. Recent smolagents releases rename HfApiModel to InferenceClientModel and have the agent call model.generate(...) directly, with __call__ delegating to it; if your installed version does this, override generate instead of __call__. The agent may also wrap exceptions raised during the model call in its own error type (AgentGenerationError), so check how the guard’s exception surfaces in your version before relying on a bare except clause.

Implementation: smolagents with RunGuard budget and loop guard

Python: wrapping an OpenAIServerModel

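import re

from smolagents import CodeAgent, OpenAIServerModel, DuckDuckGoSearchTool
from runguard import guard, LoopDetectedError, BudgetExceededError

# --- Build the guarded model wrapper ---

class GuardedOpenAIModel(OpenAIServerModel):
    """OpenAIServerModel with RunGuard budget + loop detection."""

    def __init__(self, *args, max_usd: float = 5.0, **kwargs):
        super().__init__(*args, **kwargs)
        self._guard = None
        self._max_usd = max_usd

    def _make_guarded_call(self, messages, stop_sequences=None, **kwargs):
        """The actual model call, wrapped by guard()."""
        response = super().__call__(messages, stop_sequences=stop_sequences, **kwargs)

        # Extract the per-call cost from token usage. Attribute names differ
        # across smolagents versions (assumption: `usage` or `token_usage`);
        # usd stays 0.0 if neither is present, which disables the budget cap.
        usage = getattr(response, "usage", None) or getattr(response, "token_usage", None)
        usd = 0.0
        if usage is not None:
            tokens_in = getattr(usage, "prompt_tokens", None) or getattr(usage, "input_tokens", 0)
            tokens_out = getattr(usage, "completion_tokens", None) or getattr(usage, "output_tokens", 0)
            # GPT-4o pricing: $2.50 per 1M input tokens, $10 per 1M output tokens
            usd = (tokens_in * 2.5 + tokens_out * 10) / 1_000_000

        # Build the loop signature: tool name plus a truncated argument snapshot,
        # so repeated calls with different arguments do not count as a loop
        sig = "end_turn"
        content = getattr(response, "content", None)
        tool_calls = getattr(response, "tool_calls", None)
        if tool_calls:
            # ToolCallingAgent: structured tool calls (attribute layout assumed)
            fn = getattr(tool_calls[0], "function", tool_calls[0])
            sig = f"{getattr(fn, 'name', 'unknown')}:{str(getattr(fn, 'arguments', ''))[:64]}"
        elif isinstance(content, str):
            # CodeAgent: raw text, so take the first call in the generated code
            match = re.search(r"\b(\w+)\(([^)]*)", content)
            if match:
                sig = f"{match.group(1)}:{match.group(2)[:64]}"

        return {"response": response, "usd": usd, "sig": sig}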

    def __call__(self, messages, stop_sequences=None, **kwargs):
        if self._guard is None:
            self._guard = guard(
                self._make_guarded_call,
                budget={"max_usd": self._max_usd},
                loop={"repeats": 3, "max_cycle_len": 8},
            )
        result = self._guard(messages, stop_sequences=stop_sequences, **kwargs)
        return result["response"]


# --- Use the guarded model in a CodeAgent ---

model = GuardedOpenAIModel(
    model_id="gpt-4o",
    api_base="https://api.openai.com/v1",
    api_key="YOUR_KEY",
    max_usd=3.0,   # $3 per-run cap
)

agent = CodeAgent(
    tools=[DuckDuckGoSearchTool()],
    model=model,
    max_steps=20,   # outer backstop
)

try:
    result = agent.run("Research and summarize the main AI safety frameworks published in 2026.")
    print(result)
except LoopDetectedError as e:
    print(f"Loop detected: pattern={e.pattern!r}, repeats={e.repeats}")
except BudgetExceededError as e:
    print(f"Budget exceeded: ${e.spent:.4f} spent (cap $3.00)")

Using HfApiModel (Inference API)

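import re

# HfApiModel is named InferenceClientModel in recent smolagents releases
from smolagents import CodeAgent, HfApiModel, DuckDuckGoSearchTool
from runguard import guard, LoopDetectedError, BudgetExceededError

class GuardedHfApiModel(HfApiModel):
    """HfApiModel with RunGuard loop detection (cost tracking omitted; HF API pricing varies)."""

    def __init__(self, *args, max_steps_guard: int = 30, **kwargs):
        super().__init__(*args, **kwargs)
        self._call_count = 0
        self._max_steps_guard = max_steps_guard
        self._loop_guard = None

    def _inner_call(self, messages, stop_sequences=None, **kwargs):
        response = super().__call__(messages, stop_sequences=stop_sequences, **kwargs)
        # No reliable token cost for the HF Inference API, so use call count as a proxy
        self._call_count += 1
        # Report 1.0 per call. This assumes the guard sums usd across calls (as the
        # per-call cost in the OpenAI example implies), so the total equals the call count
        usd = 1.0
        sig = "end_turn"
        # Build the signature: tool name plus a truncated argument snapshot
        content = getattr(response, "content", None)
        tool_calls = getattr(response, "tool_calls", None)
        if tool_calls:
            # Structured tool calls (attribute layout assumed)
            fn = getattr(tool_calls[0], "function", tool_calls[0])
            sig = f"{getattr(fn, 'name', 'unknown')}:{str(getattr(fn, 'arguments', ''))[:64]}"
        elif isinstance(content, str):
            # CodeAgent returns raw text, so take the first call in the generated code
            match = re.search(r"\b(\w+)\(([^)]*)", content)
            if match:
                sig = f"{match.group(1)}:{match.group(2)[:64]}"
        return {"response": response, "usd": usd, "sig": sig}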

    def __call__(self, messages, stop_sequences=None, **kwargs):
        if self._loop_guard is None:
            self._loop_guard = guard(
                self._inner_call,
                # max_usd doubles as max call count when usd=call_count
                budget={"max_usd": float(self._max_steps_guard)},
                loop={"repeats": 3, "max_cycle_len": 6},
            )
        result = self._loop_guard(messages, stop_sequences=stop_sequences, **kwargs)
        return result["response"]

model = GuardedHfApiModel(
    model_id="meta-llama/Llama-3.1-70B-Instruct",
    max_steps_guard=15,  # hard cap at 15 model calls regardless of tool pattern
)

agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=model, max_steps=20)

try:
    result = agent.run("Find and compare the top 5 open-source agent frameworks by GitHub stars.")
    print(result)
except LoopDetectedError as e:
    print(f"Loop detected at call #{e.spent:.0f}: {e.pattern!r}")
except BudgetExceededError as e:
    print(f"Max call count exceeded: {e.spent:.0f} calls")

The code-generation loop: a unique signature challenge

smolagents’ CodeAgent generates free-form Python rather than structured tool-call JSON. This creates a nuance for signature-based loop detection: the “tool call” is implicit in the code (the function name and arguments that appear in the generated Python snippet) rather than explicit in a structured response field. To extract a signature from a code block, you need to parse the generated code and find the first tool function call. For ToolCallingAgent, the response object that smolagents’ model client returns carries a structured tool call, so the tool name can be read directly. For CodeAgent, the response is raw text; in that case, a practical signature is the first function call token in the generated code, extracted with a simple regex: re.search(r"\b(\w+)\(", generated_code). The first function call in the code is usually the tool the agent is trying to invoke, although a helper such as print( can match first, so filtering against the registered tool names is safer. If the agent is calling the same tool with the same arguments on consecutive steps, the first-function-call signature will repeat and the loop detector will fire. For finer-grained detection, append a hash of the first 64 characters of the call’s argument text: first_call_name + ":" + hashlib.md5(first_call_args[:64].encode()).hexdigest(). Whichever form you use, adapt the signature extraction to your specific model client’s response format.
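The regex-plus-hash signature described above might look like the following. It is a sketch under stated assumptions: code_signature and its tool_names parameter are hypothetical names, and the argument text is cut at the first closing parenthesis, so nested calls inside the arguments are only partially captured.

```python
import hashlib
import re

CALL_RE = re.compile(r"\b(\w+)\(")


def code_signature(generated_code: str, tool_names: set) -> str:
    """Return 'name:md5' for the first registered tool call, else 'end_turn'."""
    for match in CALL_RE.finditer(generated_code):
        name = match.group(1)
        if name not in tool_names:
            continue  # skip helpers such as print(...)
        # Argument text runs from the opening parenthesis to the next ')'.
        args = generated_code[match.end():].split(")", 1)[0]
        # Hash only the first 64 characters of the argument text.
        digest = hashlib.md5(args[:64].encode("utf-8")).hexdigest()
        return f"{name}:{digest}"
    return "end_turn"


tools = {"web_search"}
sig_a = code_signature('result = web_search("AI safety 2026")\nprint(result)', tools)
sig_b = code_signature('print(web_search("AI safety 2026"))', tools)
sig_c = code_signature('result = web_search("AI alignment 2026")', tools)

print(sig_a == sig_b)  # same tool and arguments, different surrounding code
print(sig_a == sig_c)  # same tool, different arguments
```

Hashing the arguments keeps signatures short and of fixed length while still separating a genuine retry loop from a series of distinct searches with the same tool.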

What this is not

Not a smolagents plugin or an official HuggingFace integration. RunGuard wraps the model object’s __call__ method, which is a public interface in all smolagents model classes. The wrapper is a subclass, not a monkey-patch of smolagents internals. If smolagents changes the model client interface in a future version, you may need to update the wrapper to match the new signature. RunGuard’s SDK itself will not require updates (it wraps whatever function you give it); only the glue code that extracts usd and sig from the smolagents response format may need updating.
Not a sandboxed code executor or a security guard. smolagents’ LocalPythonInterpreter and E2B executor run the generated Python code. RunGuard fires before the model generates the code, not before the code executes. RunGuard does not inspect the generated code for security issues (arbitrary file access, network calls from the generated code, etc.). For security sandboxing, use smolagents’ built-in E2B integration or configure LocalPythonInterpreter’s authorized imports list. RunGuard is a cost and loop guard at the LLM call layer, not a code safety layer.
Not limited to smolagents: the same pattern applies to any framework with a model-object abstraction. The GuardedModel subclass pattern works for any framework that routes LLM calls through a model object with a single entry-point method (__call__, invoke, create, run, or similar): LangChain’s BaseChatModel, AutoGen’s ChatCompletionClient, DSPy’s LM class, and Haystack’s ChatGenerator all follow the same pattern. Subclass the model, override the call method, add the guard. RunGuard ships as @runguard/sdk on npm and runguard on PyPI. The full API is in llms.txt. The AutoGen loop guard page and CrewAI loop detection page show the same subclass pattern for those frameworks.