LLM agent cost per feature tracking: how to know which AI feature is eating your margin
Your LLM API bill shows a total. It does not show you that your AI search feature costs $1,800/month, your AI summarization feature costs $600/month, and your AI drafting feature costs $4,200/month. It also does not show that the drafting feature runs at a cost-to-revenue ratio (CTR) of 31%, roughly three times what’s sustainable. That breakdown is what per-feature cost tracking gives you. Without it, you optimize blindly. You might spend three sprints improving the token efficiency of the search feature ($1,800/month, 8% CTR, fine as-is) while ignoring the drafting feature ($4,200/month, 31% CTR, in urgent need of either optimization or repricing). Every sprint spent in the wrong place is a sprint not spent on the feature that actually needs it. Per-feature LLM cost tracking is not a nice-to-have. It’s the prerequisite for rationally allocating engineering effort on cost optimization, and for making pricing decisions that reflect your actual AI cost structure.
Why aggregate LLM costs are useless for optimization decisions
- Averages hide the distribution that drives decisions. The session-weighted average LLM cost per session across your product might be $0.18. But if your five AI features have session costs of $0.04, $0.09, $0.15, $0.22, and $1.40, that average tells you almost nothing useful. The feature at $1.40/session costs roughly 7.8× the product-wide average, and it almost certainly has a different root cause than the $0.04 feature. Optimizing from the average might lead you to try to shave 10% from the $0.15 feature (saving $0.015 per session). Meanwhile, the $1.40 feature has obvious low-hanging fruit: reducing context window size could cut its cost by 60%, saving $0.84 per session. Per-feature attribution makes this analysis possible; aggregate averages make it invisible.
- Cost and value don’t correlate without per-feature data. High LLM cost per session is not inherently a problem — it is a problem only when the cost is high relative to the value (revenue, engagement, retention) the feature delivers. An AI research feature that costs $0.80/session but drives 95% retention for users who use it weekly is a better investment than an AI autocomplete feature that costs $0.12/session but has 30% adoption and no measurable retention impact. Per-feature cost tracking, combined with per-feature engagement metrics, lets you compute the cost-to-value ratio for each feature and make investment decisions based on leverage rather than absolute cost. See AI agent cost allocation in SaaS for the full cost-to-value analysis framework.
- Feature-level cost trends reveal optimization opportunities and regressions. The week-over-week cost trend for each feature is more informative than the absolute cost level. Consider a feature whose per-session cost has increased 8% week-over-week for four consecutive weeks. It has a compounding growth problem: it is already about 36% above where it started, and if the trend continues, its cost doubles in about nine weeks (1.08^9 ≈ 2.0). That trend might have several causes:
  - users asking more complex queries (input token growth);
  - a prompt update that increased output verbosity (output token growth);
  - a new tool added to the agent that returns large results (tool result token growth);
  - a change in user behavior (more sessions from a higher-cost user cohort).

  None of these root causes is visible from aggregate cost trending. All of them are visible from per-feature cost trending with per-session breakdowns. See LLM structured output cost impact for how output formatting changes affect feature cost trends.
- Optimization prioritization requires knowing the ROI per feature. Engineering optimization capacity is finite. Per-feature cost data lets you prioritize optimization work by expected ROI: for each feature, compute (current monthly cost − optimized cost estimate) as the monthly savings, and (engineering days to implement) as the cost. Features with high savings potential and low implementation cost should be first in the optimization queue. Without per-feature data, prioritization is based on intuition or politics rather than ROI. With it, the optimization backlog becomes a sorted list of investments ranked by return.
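The $0.18 figure in the first bullet of this list only works as a session-weighted average; the unweighted mean of the five per-feature costs is $0.38. The sketch below uses hypothetical session counts (assumed, not real data), chosen so that the weighted average comes out to exactly $0.18. It shows how little of the picture that single number captures. Costs are kept in integer cents so the arithmetic is exact.

```typescript
// Per-feature usage for one period. Session counts are hypothetical,
// chosen so that the session-weighted average is exactly 18 cents.
interface FeatureUsage {
  feature: string;
  costPerSessionCents: number;
  sessions: number;
}

const usage: FeatureUsage[] = [
  { feature: 'feature-1', costPerSessionCents: 4, sessions: 600 },
  { feature: 'feature-2', costPerSessionCents: 9, sessions: 600 },
  { feature: 'feature-3', costPerSessionCents: 15, sessions: 800 },
  { feature: 'feature-4', costPerSessionCents: 22, sessions: 1000 },
  { feature: 'feature-5', costPerSessionCents: 140, sessions: 100 },
];

function summarize(rows: FeatureUsage[]) {
  const totalSessions = rows.reduce((sum, r) => sum + r.sessions, 0);
  const totalCents = rows.reduce((sum, r) => sum + r.costPerSessionCents * r.sessions, 0);
  // Session-weighted average: total spend divided by total sessions.
  const avgCentsPerSession = totalCents / totalSessions;
  const breakdown = rows.map((r) => ({
    feature: r.feature,
    sessionShare: r.sessions / totalSessions,
    spendShare: (r.costPerSessionCents * r.sessions) / totalCents,
    multipleOfAverage: r.costPerSessionCents / avgCentsPerSession,
  }));
  return { totalSessions, totalCents, avgCentsPerSession, breakdown };
}

const summary = summarize(usage);
console.log(`average: ${summary.avgCentsPerSession} cents/session`);
for (const b of summary.breakdown) {
  console.log(
    `${b.feature}: ${(b.sessionShare * 100).toFixed(1)}% of sessions, ` +
      `${(b.spendShare * 100).toFixed(1)}% of spend, ` +
      `${b.multipleOfAverage.toFixed(1)}x the average`,
  );
}
```

With these counts, feature-5 accounts for about 3.2% of sessions but about 25% of spend, at roughly 7.8× the average. None of that is visible in the single 18-cent figure.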
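The doubling claim in the cost-trend bullet follows from compound growth. Solving (1 + r)^n = 2 for n gives n = ln 2 / ln(1 + r). A minimal sketch follows; the function names are illustrative.

```typescript
// Weeks for a quantity growing at a constant weekly rate r to double:
// solve (1 + r)^n = 2, giving n = ln 2 / ln(1 + r).
function doublingTimeWeeks(weeklyGrowthRate: number): number {
  if (weeklyGrowthRate <= 0) {
    throw new Error('doubling time is only defined for positive growth rates');
  }
  return Math.log(2) / Math.log(1 + weeklyGrowthRate);
}

// Cumulative cost multiple after `weeks` of constant compound growth.
function growthMultiple(weeklyGrowthRate: number, weeks: number): number {
  return Math.pow(1 + weeklyGrowthRate, weeks);
}

console.log(doublingTimeWeeks(0.08).toFixed(2)); // 9.01
console.log(growthMultiple(0.08, 4).toFixed(2)); // 1.36
```

After four weeks at 8%, the feature is already about 36% above its starting cost, which is why a streak of modest weekly increases deserves attention early.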
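The ROI ranking in the last bullet of this list reduces to a sort. This sketch uses the three monthly costs from the introduction. The optimized-cost estimates and engineering-day figures are hypothetical placeholders. Ranking by monthly savings per engineering day is one reasonable way to combine the two numbers, not the only one.

```typescript
interface OptimizationCandidate {
  feature: string;
  currentMonthlyCostUsd: number;
  optimizedMonthlyCostUsd: number; // estimate after the proposed optimization
  engineeringDays: number; // estimated implementation effort
}

interface RankedCandidate extends OptimizationCandidate {
  monthlySavingsUsd: number;
  savingsPerEngineeringDay: number;
}

// Rank optimization work by monthly savings per engineering day, highest first.
function rankByRoi(candidates: OptimizationCandidate[]): RankedCandidate[] {
  return candidates
    .map((c) => {
      if (c.engineeringDays <= 0) {
        throw new Error(`engineeringDays must be positive for ${c.feature}`);
      }
      const monthlySavingsUsd = c.currentMonthlyCostUsd - c.optimizedMonthlyCostUsd;
      return { ...c, monthlySavingsUsd, savingsPerEngineeringDay: monthlySavingsUsd / c.engineeringDays };
    })
    .sort((a, b) => b.savingsPerEngineeringDay - a.savingsPerEngineeringDay);
}

// Current costs come from the introduction; estimates and effort are hypothetical.
const backlog = rankByRoi([
  { feature: 'ai-search', currentMonthlyCostUsd: 1800, optimizedMonthlyCostUsd: 1620, engineeringDays: 15 },
  { feature: 'ai-summarize', currentMonthlyCostUsd: 600, optimizedMonthlyCostUsd: 480, engineeringDays: 4 },
  { feature: 'ai-drafting', currentMonthlyCostUsd: 4200, optimizedMonthlyCostUsd: 2100, engineeringDays: 10 },
]);

for (const item of backlog) {
  console.log(`${item.feature}: saves $${item.monthlySavingsUsd}/month, $${item.savingsPerEngineeringDay}/month per engineering day`);
}
```

With these placeholder numbers, the drafting feature lands at the top and search at the bottom. This matches the introduction’s point that three sprints on search would be effort spent in the wrong place.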
Attribution architecture: how to track cost per feature
- Feature tag at the LLM call layer, not the session layer. The most granular attribution tracks cost at the individual LLM call level, tagged with the feature context. This matters when a single user session spans multiple features: a user might ask a question that triggers the AI search feature, then refine it using the AI drafting feature, then summarize the results using the AI summarization feature — all in a single product session. Session-level attribution would assign the entire cost to the first feature touched or spread it undifferentiated across the session. Call-level attribution assigns each LLM call to the specific feature context that triggered it, which is the accurate attribution.
- Feature context propagation through call chains. When an AI feature triggers a multi-step LLM chain (retrieval → summarization → generation), all calls in the chain should be attributed to the initiating feature, not just the first call. Implement feature context as a thread-local or async-context variable that propagates through the call chain automatically. This prevents the common attribution error where only the first call in a chain is tagged with the feature context, making the chain appear much cheaper than it actually is in per-feature dashboards.
- Shared infrastructure cost allocation. Some LLM costs are genuinely shared across features: a central embedding model used by multiple features, a shared retrieval pipeline, a common re-ranking step. These shared costs should be allocated to consuming features using a usage-based proportional model: if the AI search feature makes 60% of all requests to the shared embedding API and the AI Q&A feature makes 40%, allocate 60%/40% of the shared embedding cost accordingly. Document the allocation method and apply it consistently so that per-feature cost figures are comparable across time periods. Undocumented allocation changes make historical cost trends uninterpretable.
- Separate development, staging, and production cost attribution. LLM costs incurred in development and staging environments (testing prompts, running integration tests, developer experimentation) should not be attributed to production features. Implement environment-aware tagging that excludes non-production cost from your per-feature production cost figures. Development LLM costs are legitimate, but they belong in a separate budget line (developer tooling or R&D overhead). Otherwise they inflate the per-feature production cost metrics that drive optimization decisions. Use the `environment` tag on every LLM call and filter it at the attribution aggregation layer.
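To make the call-level versus session-level difference from the first bullet of this list concrete, here is a sketch over a hypothetical log of LLM calls. The log covers one product session that touches three features. The record shape and the costs are assumptions for illustration.

```typescript
// One record per LLM call, tagged with the feature that triggered it.
interface LlmCallRecord {
  sessionId: string;
  feature: string;
  costUsd: number;
}

// Call-level attribution: each call's cost goes to its own feature tag.
function costByCallFeature(calls: LlmCallRecord[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const call of calls) {
    totals.set(call.feature, (totals.get(call.feature) ?? 0) + call.costUsd);
  }
  return totals;
}

// Session-level attribution, for contrast: the whole session's cost goes
// to the first feature seen in that session (calls assumed in time order).
function costByFirstFeature(calls: LlmCallRecord[]): Map<string, number> {
  const firstFeature = new Map<string, string>();
  const totals = new Map<string, number>();
  for (const call of calls) {
    if (!firstFeature.has(call.sessionId)) {
      firstFeature.set(call.sessionId, call.feature);
    }
    const owner = firstFeature.get(call.sessionId)!;
    totals.set(owner, (totals.get(owner) ?? 0) + call.costUsd);
  }
  return totals;
}

// Hypothetical session: a search, two drafting calls, then a summary.
const calls: LlmCallRecord[] = [
  { sessionId: 's1', feature: 'ai-search', costUsd: 0.03 },
  { sessionId: 's1', feature: 'ai-drafting', costUsd: 0.4 },
  { sessionId: 's1', feature: 'ai-drafting', costUsd: 0.35 },
  { sessionId: 's1', feature: 'ai-summarize', costUsd: 0.05 },
];

console.log(costByCallFeature(calls));
console.log(costByFirstFeature(calls));
```

Session-level attribution books the entire $0.83 to search and makes drafting look free. Call-level attribution puts $0.75 of it on drafting, where it belongs.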
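For the context-propagation bullet, here is a Node.js sketch using `AsyncLocalStorage` from `node:async_hooks`, which carries a value across `await` boundaries. `callLlm` is a hypothetical stand-in for your real client wrapper. The ledger is an in-memory array, not a real cost store.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

interface FeatureContext {
  feature: string;
}

interface CallRecord {
  feature: string;
  step: string;
  costUsd: number;
}

// Ambient feature context, propagated automatically through async calls.
const featureContext = new AsyncLocalStorage<FeatureContext>();
// In-memory stand-in for a real cost store.
const ledger: CallRecord[] = [];

// Hypothetical LLM call wrapper: it never receives the feature as an
// argument; it reads it from the ambient context after an await.
async function callLlm(step: string, costUsd: number): Promise<void> {
  await Promise.resolve(); // placeholder for the real network request
  const ctx = featureContext.getStore();
  ledger.push({ feature: ctx?.feature ?? 'untagged', step, costUsd });
}

// A multi-step chain that knows nothing about feature attribution.
async function retrieveSummarizeGenerate(): Promise<void> {
  await callLlm('retrieval', 0.02);
  await callLlm('summarization', 0.05);
  await callLlm('generation', 0.3);
}

// Tag once at the feature's entry point; every call in the chain inherits it.
const tagged = featureContext.run({ feature: 'ai-drafting' }, () => retrieveSummarizeGenerate());

// A call made outside any feature context is recorded as 'untagged'
// rather than being silently attributed to some feature.
const stray = callLlm('stray-call', 0.01);

Promise.all([tagged, stray]).then(() => console.log(ledger));
```

The design choice here is that only the entry point sets the feature. Intermediate chain steps cannot forget to tag their calls, because tagging is no longer their job.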
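The proportional allocation in the shared-infrastructure bullet can be sketched as follows. The $500 monthly shared embedding cost is a hypothetical figure, and the request counts simply reproduce the 60/40 split from the text.

```typescript
// Split a shared cost across features in proportion to their usage of it.
// The usage basis (request counts here) is a choice; document whichever
// basis you use and keep it fixed so cost trends stay comparable.
function allocateSharedCost(
  sharedCostUsd: number,
  usageByFeature: Record<string, number>,
): Record<string, number> {
  const totalUsage = Object.values(usageByFeature).reduce((sum, n) => sum + n, 0);
  if (totalUsage <= 0) {
    throw new Error('cannot allocate a shared cost with zero recorded usage');
  }
  const allocation: Record<string, number> = {};
  for (const [feature, usage] of Object.entries(usageByFeature)) {
    allocation[feature] = (sharedCostUsd * usage) / totalUsage;
  }
  return allocation;
}

// Hypothetical month: $500 of shared embedding spend, 60/40 request split.
const embeddingAllocation = allocateSharedCost(500, { 'ai-search': 6000, 'ai-qa': 4000 });
console.log(embeddingAllocation); // { 'ai-search': 300, 'ai-qa': 200 }
```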
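The environment filter from the last bullet of this list is a simple partition at aggregation time. This sketch assumes every call record carries an `environment` field with one of three values. The field name and values are assumptions, not a specific tool’s schema.

```typescript
type Environment = 'development' | 'staging' | 'production';

interface TaggedCall {
  feature: string;
  environment: Environment;
  costUsd: number;
}

// Split spend into per-feature production cost and a single
// non-production bucket (developer tooling / R&D overhead).
function splitByEnvironment(calls: TaggedCall[]) {
  const productionByFeature = new Map<string, number>();
  let nonProductionUsd = 0;
  for (const call of calls) {
    if (call.environment === 'production') {
      productionByFeature.set(call.feature, (productionByFeature.get(call.feature) ?? 0) + call.costUsd);
    } else {
      nonProductionUsd += call.costUsd;
    }
  }
  return { productionByFeature, nonProductionUsd };
}

// Hypothetical calls; the non-production ones never touch feature totals.
const { productionByFeature, nonProductionUsd } = splitByEnvironment([
  { feature: 'ai-search', environment: 'production', costUsd: 0.25 },
  { feature: 'ai-search', environment: 'staging', costUsd: 1.5 }, // integration test run
  { feature: 'ai-drafting', environment: 'production', costUsd: 0.75 },
  { feature: 'ai-drafting', environment: 'development', costUsd: 2 }, // prompt experiments
]);
console.log(productionByFeature, nonProductionUsd);
```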
Per-feature dashboards: what to measure and display
- Five essential metrics per feature in the weekly dashboard. A per-feature cost dashboard should show five metrics:
  1. total LLM spend for the week;
  2. cost per session (median and P95);
  3. cost-to-revenue ratio (annualized LLM spend as a % of feature-attributable ARR, so both figures cover the same period);
  4. week-over-week cost trend (% change in median session cost vs. the prior week);
  5. anomaly events count (sessions that exceeded the warning threshold).

  These five metrics fit in a single table row per feature, making a multi-feature overview scannable in 30 seconds. Engineers reviewing the dashboard should be able to identify which feature needs attention immediately, without drilling into detailed reports.
- Cost decomposition per feature: input, output, tool calls. For features where the weekly review identifies a concerning cost trend, the next layer of analysis is cost decomposition: what fraction of the feature’s per-session LLM cost comes from input tokens, output tokens, tool call results, and tool invocations? A feature where 70% of cost is in input tokens has a different optimization path (context pruning, prompt compression, retrieval filtering) than a feature where 70% of cost is in output tokens (output length limits, structured output formatting, response caching). See AI agent context pruning strategies and AI agent prompt compression cost savings for the relevant optimization techniques for each cost driver.
- Feature cost lifecycle: launch, growth, maturity, decline. Per-feature cost tracking over time reveals the feature cost lifecycle:
  - A newly launched feature typically has higher per-session costs that decrease as the team optimizes: prompts are not yet optimized, context windows are conservatively large, and the model tier is not right-sized.
  - A feature in growth may see total costs rise as usage grows faster than optimization work can offset.
  - A mature feature should have stable, optimized costs.
  - A feature in decline may show rising costs relative to falling engagement, because the remaining users are a less representative sample.

  Knowing where each feature sits in its cost lifecycle helps prioritize engineering attention. Optimization work has the highest ROI in the launch and growth phases, when changes have the longest time to compound.
- Alert on feature cost regressions, not just anomalies. A session-level cost anomaly is a discrete event. A feature cost regression is a sustained trend: median session cost increasing more than 5% week-over-week for two or more consecutive weeks. Configure an alert that fires when this regression pattern is detected for any feature. Unlike anomaly alerts (which require immediate response), regression alerts trigger a scheduled investigation: what changed in this feature’s prompt, model, or tool set in the last two weeks that could explain the cost increase? A regression caught at +10% above baseline is cheap to fix; a regression caught at +40% above baseline may have already generated hundreds or thousands of dollars in excess cost.
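The five-metric row from the first bullet of this list can be computed from one week of per-session costs. This is a sketch under stated assumptions. The inputs are toy values. Revenue is taken as weekly feature-attributable revenue (for example, ARR / 52) so that spend and revenue cover the same period. P95 uses the nearest-rank method, one of several percentile conventions.

```typescript
interface WeeklyFeatureRow {
  feature: string;
  totalSpendUsd: number;
  medianSessionCostUsd: number;
  p95SessionCostUsd: number;
  costToRevenueRatio: number;
  wowMedianChange: number; // fractional change vs. the prior week's median
  anomalyCount: number;
}

// Median of an ascending-sorted, non-empty array.
function median(sorted: number[]): number {
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Nearest-rank percentile on an ascending-sorted, non-empty array (p in 0..100).
function percentile(sorted: number[], p: number): number {
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.min(sorted.length, Math.max(1, rank)) - 1];
}

function buildRow(
  feature: string,
  sessionCostsUsd: number[],
  priorWeekMedianUsd: number,
  weeklyAttributableRevenueUsd: number,
  warningThresholdUsd: number,
): WeeklyFeatureRow {
  if (sessionCostsUsd.length === 0) throw new Error(`no sessions for ${feature}`);
  const sorted = [...sessionCostsUsd].sort((a, b) => a - b);
  const totalSpendUsd = sorted.reduce((sum, c) => sum + c, 0);
  const medianSessionCostUsd = median(sorted);
  return {
    feature,
    totalSpendUsd,
    medianSessionCostUsd,
    p95SessionCostUsd: percentile(sorted, 95),
    costToRevenueRatio: totalSpendUsd / weeklyAttributableRevenueUsd,
    wowMedianChange: medianSessionCostUsd / priorWeekMedianUsd - 1,
    anomalyCount: sorted.filter((c) => c > warningThresholdUsd).length,
  };
}

// Toy week of ten sessions; every input here is hypothetical.
const row = buildRow('ai-drafting', [0.5, 0.1, 0.3, 2.0, 0.15, 0.25, 0.9, 0.12, 0.4, 0.2], 0.25, 49.2, 1.0);
console.log(row);
```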
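The decomposition bullet maps a dominant cost driver to an optimization path. Here is a sketch with hypothetical per-session costs in cents. The input and output paths are the ones listed in that bullet. The two tool-related entries are placeholders pointing back to the tool-set review, not prescribed fixes.

```typescript
// Per-session cost split for one feature, in cents (hypothetical values).
interface CostComponents {
  inputTokens: number;
  outputTokens: number;
  toolResults: number;
  toolInvocations: number;
}

// Optimization path per cost driver.
const optimizationPaths: Record<keyof CostComponents, string> = {
  inputTokens: 'context pruning, prompt compression, retrieval filtering',
  outputTokens: 'output length limits, structured output formatting, response caching',
  toolResults: 'review which tools return large results',
  toolInvocations: 'review how many tool calls each session makes',
};

// Compute each driver's share of total cost and pick the largest.
function decompose(c: CostComponents) {
  const keys = Object.keys(c) as (keyof CostComponents)[];
  const total = keys.reduce((sum, k) => sum + c[k], 0);
  if (total <= 0) throw new Error('no cost to decompose');
  const shares = {} as Record<keyof CostComponents, number>;
  for (const k of keys) shares[k] = c[k] / total;
  const dominant = keys.reduce((best, k) => (shares[k] > shares[best] ? k : best));
  return { shares, dominant, suggestedPath: optimizationPaths[dominant] };
}

const result = decompose({ inputTokens: 70, outputTokens: 20, toolResults: 8, toolInvocations: 2 });
console.log(result);
```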
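The regression rule in the last bullet of this list can be checked as a trailing streak. The check counts how many of the most recent weeks each beat the previous week by more than 5%, then fires at two or more. The function name and return shape are hypothetical. The sketch also reports the cumulative increase over the week before the streak began.

```typescript
interface RegressionCheck {
  isRegression: boolean;
  consecutiveWeeksUp: number;
  increaseOverBaseline: number; // fractional increase since the streak began
}

// weeklyMedians: median session cost per week, oldest first.
function checkCostRegression(
  weeklyMedians: number[],
  weeklyThreshold = 0.05,
  minConsecutiveWeeks = 2,
): RegressionCheck {
  if (weeklyMedians.length === 0) throw new Error('need at least one week of data');
  let streak = 0;
  // Walk back from the most recent week while each week beats the threshold.
  for (let i = weeklyMedians.length - 1; i > 0; i--) {
    const prev = weeklyMedians[i - 1];
    if (prev <= 0 || weeklyMedians[i] / prev - 1 <= weeklyThreshold) break;
    streak++;
  }
  const latest = weeklyMedians[weeklyMedians.length - 1];
  const baseline = weeklyMedians[weeklyMedians.length - 1 - streak];
  return {
    isRegression: streak >= minConsecutiveWeeks,
    consecutiveWeeksUp: streak,
    increaseOverBaseline: latest / baseline - 1,
  };
}

console.log(checkCostRegression([0.2, 0.2, 0.211, 0.2222])); // two weeks above 5%
console.log(checkCostRegression([0.2, 0.21, 0.2, 0.215])); // only the latest week is above 5%
```

Because the streak must be unbroken and end at the latest week, a single noisy week does not trigger the alert. That suits a scheduled investigation rather than a page.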
RunGuard for per-feature cost tracking
- Feature-tagged session initialization. RunGuard’s `appName` field at initialization maps to a feature in the dashboard. Initialize a separate RunGuard instance per feature (or pass a dynamic `featureId` context field) to separate cost records by feature:

```typescript
// Separate RunGuard context per feature, each with its own per-session budget.
// Assumes RunGuard has already been imported from the RunGuard SDK.
const searchGuard = new RunGuard({
  appName: 'ai-search',
  budget: { maxCostPerSessionUsd: 0.30 },
});
const draftGuard = new RunGuard({
  appName: 'ai-drafting',
  budget: { maxCostPerSessionUsd: 1.50 },
});
const summarizeGuard = new RunGuard({
  appName: 'ai-summarize',
  budget: { maxCostPerSessionUsd: 0.20 },
});
```

Each feature’s costs flow to a separate line in the RunGuard dashboard, making per-feature weekly cost analysis available without any additional infrastructure.
- Per-call cost records for cost decomposition. RunGuard records cost per LLM call within each session, including separate token counts for input and output. This enables the input/output/tool call decomposition analysis without building a custom instrumentation layer. When a feature’s per-session cost is trending up, RunGuard’s per-call breakdown shows where the increase comes from:
  - input tokens (growing context);
  - output tokens (growing response verbosity);
  - turn count (more tool calls).

  Each cost driver has a different optimization path, and identifying the right one saves 60–80% of the time that would be spent investigating the wrong hypothesis.
- Feature cost budget enforcement. Set a daily and per-session budget ceiling per feature in RunGuard. When the AI drafting feature’s daily budget is exhausted, RunGuard fires the budget callback. You can then degrade the feature gracefully without writing custom budget tracking logic: switch to a faster, cheaper model, reduce output length limits, or show a “daily AI capacity reached” message. Per-feature enforcement ensures that one runaway feature cannot consume budget intended for other features, even when no one on the engineering team is watching. See LLM FinOps team practices for how per-feature enforcement fits into an organization-wide cost governance practice.
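This article does not document the signature of RunGuard’s budget callback, so it is not shown here. Instead, here is a hypothetical, library-independent sketch of the degrade-on-exhaustion logic described in the enforcement bullet: a per-feature daily budget tracker. The class, its callback, and the `'degraded'` mode are assumptions for illustration, not RunGuard’s API.

```typescript
type BudgetMode = 'normal' | 'degraded';

// Hypothetical per-feature daily budget tracker (not RunGuard's API).
class FeatureDailyBudget {
  readonly feature: string;
  readonly dailyLimitUsd: number;
  private readonly onExhausted: (feature: string) => void;
  private readonly today: () => string;
  private spentUsd = 0;
  private currentDay: string;

  constructor(
    feature: string,
    dailyLimitUsd: number,
    onExhausted: (feature: string) => void,
    // Injectable clock returning the current day as YYYY-MM-DD (UTC).
    today: () => string = () => new Date().toISOString().slice(0, 10),
  ) {
    this.feature = feature;
    this.dailyLimitUsd = dailyLimitUsd;
    this.onExhausted = onExhausted;
    this.today = today;
    this.currentDay = today();
  }

  // Record one LLM call's cost; fire the callback once, when the limit is first reached.
  record(costUsd: number): void {
    this.rollOverIfNewDay();
    const wasUnderLimit = this.spentUsd < this.dailyLimitUsd;
    this.spentUsd += costUsd;
    if (wasUnderLimit && this.spentUsd >= this.dailyLimitUsd) {
      this.onExhausted(this.feature);
    }
  }

  // Callers check this before each request and take the cheaper path when degraded.
  mode(): BudgetMode {
    this.rollOverIfNewDay();
    return this.spentUsd >= this.dailyLimitUsd ? 'degraded' : 'normal';
  }

  // Reset spend at the start of a new day.
  private rollOverIfNewDay(): void {
    const day = this.today();
    if (day !== this.currentDay) {
      this.currentDay = day;
      this.spentUsd = 0;
    }
  }
}

// Usage with a fake clock so the day boundary is easy to control.
let fakeDay = '2024-01-01';
const exhaustedEvents: string[] = [];
const draftingBudget = new FeatureDailyBudget('ai-drafting', 1.0, (f) => exhaustedEvents.push(f), () => fakeDay);
draftingBudget.record(0.5);
draftingBudget.record(0.6); // crosses the $1.00 limit, so the callback fires
console.log(draftingBudget.mode()); // degraded
```

Keeping the decision (`mode()`) separate from the notification (`onExhausted`) lets the request path stay a cheap synchronous check. Alerting or dashboards can hang off the callback.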
You can’t optimize what you can’t measure per feature.
Per-feature LLM cost tracking transforms cost optimization from guesswork into a ranked list of engineering investments ordered by ROI. RunGuard provides the per-session, per-call cost attribution that makes feature-level analysis possible from day one — no custom instrumentation required.