Budget alerts and loop detection for Microsoft Semantic Kernel agents

Semantic Kernel is Microsoft’s enterprise AI SDK for orchestrating LLM calls, plugins, memory, and planners in .NET, Python, and Java. Its auto function-calling loop — where SK invokes your plugins in sequence until the model declares the task complete — is powerful and, without guardrails, dangerous. A badly-formed planner, a plugin that always returns a partial result, or a model that never reaches a stopping condition can burn through your Azure OpenAI credits in a single run. This page explains exactly where SK’s cost risk lives and how to add a per-run budget cap and loop detector without touching your SK plugins.

Where Semantic Kernel’s cost risk lives

SK’s primary agentic loop is the auto function-calling (AFC) pattern. You register plugins, pass a FunctionChoiceBehavior.Auto() to your kernel invocation, and SK iterates: call the model, extract tool invocations from the response, execute the plugins, append the results, call the model again. This continues until the model returns a response with no tool calls.

Four patterns produce runaway loops in this architecture:

What SK’s built-in limits miss

SK exposes a few knobs for controlling run length: you can set a maximum number of function invocations via FunctionChoiceBehavior options, and you can configure Polly retry limits on the HTTP client. These are count-based limits on individual operations, not pattern-based loop detectors.

Count-based limits have two failure modes:

  1. Too high — You set MaximumFunctionCallsPerRequest to 25 to handle complex tasks. A looping agent runs to 25 function calls before stopping. On GPT-4.1 Turbo, 25 planning rounds for a mid-complexity task costs ~$8. That’s per run, per bug, per improperly-formed query your users can send.
  2. Too low — You set it to 5 to cap costs. A legitimately complex orchestration task that needs 8 plugin calls hits the ceiling and produces an incomplete result. Your agent is now unreliable on valid inputs.

Pattern-based detection solves this: let the agent run as many steps as it needs for legitimate work, and trip only when the same signature repeats — which is the definition of being stuck.

Wrapping SK plugins with RunGuard in Python

In Python, SK plugins are classes with methods decorated with @kernel_function. You wrap the underlying Python function with guard() before the SK decorator sees it:

from semantic_kernel.functions import kernel_function
from runguard import guard, BudgetTracker, LoopDetectedError

tracker = BudgetTracker(max_usd=3.0)

class SearchPlugin:

    @kernel_function(name="SearchWeb", description="Search the web for information")
    @guard(budget=tracker, loop_window=20, loop_threshold=3)
    async def search_web(self, query: str) -> str:
        # RunGuard checks before this executes
        # SK auto function-calling sees the result as usual
        return await self._search_api.query(query)

# Register plugin and run normally
from semantic_kernel import Kernel

kernel = Kernel()
kernel.add_plugin(SearchPlugin(), plugin_name="Search")

# Handle the trip at the invocation level
try:
    result = await kernel.invoke(
        plugin_name="Search",
        function_name="SearchWeb",
        query="AI regulations 2026"
    )
except LoopDetectedError as e:
    print(f"SK loop detected on plugin {e.tool_name}: {e.count} repeats")
    # Return a structured error to the planning loop
    raise

The @guard decorator runs before @kernel_function in the MRO, so RunGuard sees every invocation that SK’s AFC loop generates, regardless of which planner created the plan.

Wrapping SK plugins with RunGuard in TypeScript

// TypeScript SK — plugins are classes with @kernelFunction decorator
import { KernelFunction, kernelFunction } from "@microsoft/semantic-kernel";
import { guard, BudgetTracker, LoopDetectedError } from "runguard";

const tracker = new BudgetTracker({ maxUsd: 3.0 });

class SearchPlugin {
  private readonly guardedSearch = guard(
    async (query: string): Promise<string> => {
      return await this.searchApi.query(query);
    },
    { budget: tracker, loopWindow: 20, loopThreshold: 3 }
  );

  @kernelFunction({
    description: "Search the web for information",
    parameters: [{ name: "query", description: "The search query" }],
  })
  async searchWeb(query: string): Promise<string> {
    // RunGuard's guard is called inside the SK kernel function
    return await this.guardedSearch(query);
  }
}

Budget alert: what happens when the ceiling is hit

When BudgetTracker.max_usd is reached mid-run, RunGuard raises BudgetExceededError before the next plugin call executes. The exception bubbles up through SK’s AFC loop. If you catch it at the kernel.invoke() level, you can:

The key property: the alert fires before the next call, not after the run completes. You get a real-time notification while you can still act on it — before a bad planner plan or a production edge case turns into a five-figure bill.

Calibrating the budget for SK agents

For Semantic Kernel agents, a practical calibration heuristic:

Add budget guardrails to your Semantic Kernel agents

RunGuard’s Python and TypeScript SDKs wrap any callable. For Semantic Kernel, add @guard to your plugin methods and BudgetTracker to your kernel setup. A five-minute integration replaces a bespoke cost-counting middleware you would otherwise have to build and maintain yourself.

Get started with RunGuard — or see how the same pattern applies to AutoGen, smolagents, and CrewAI.