AI agent cost allocation in SaaS products: per-user LLM cost tracking and the unit economics of AI features
Adding an AI agent to your SaaS product changes the unit economics in ways that traditional software pricing never had to address. The marginal cost of a database query is essentially zero; a database query that triggers an LLM chain can cost $0.05 to $2.00 depending on context window size, model tier, and whether the agent loops. When you have 10,000 active users and each one triggers that chain five times per day, the LLM cost is $2,500 to $100,000 per day, a range too wide to ignore and too volatile to manage without per-user attribution. The SaaS businesses that are profitable on AI features in 2026 are the ones that track LLM cost per user, per pricing tier, and per feature with the same precision they apply to infrastructure cost per user. They know which user cohort is profitable, which pricing tier is underwater, and which feature has a cost-to-revenue ratio that justifies its continued existence. This guide covers how to build that attribution and use it to make pricing, feature, and infrastructure decisions that keep AI-feature unit economics positive.
The unit economics of AI features in SaaS
- Cost-to-revenue ratio as the primary AI feature metric. The fundamental unit economics metric for an AI feature is the cost-to-revenue ratio (CTR): the fraction of the revenue attributable to the feature that is consumed by the LLM cost to deliver it. A feature that drives $50/mo of revenue per user and costs $1.50/mo in LLM cost per user has a 3% CTR, which is healthy. A feature that drives the same $50/mo per user and costs $12/mo in LLM cost per user has a 24% CTR, a business model that stops working at scale. Tracking CTR per feature tells you which AI features are investments (low CTR, so revenue grows much faster than cost) and which are liabilities (LLM cost approaching or exceeding the feature’s revenue contribution, that is, a CTR near or above 100%). See LLM FinOps team practices for how to operationalize CTR in a weekly review rhythm.
- Per-user LLM cost distribution is skewed, not normal. In SaaS products with AI features, per-user LLM cost typically follows a heavy-tailed, roughly Pareto pattern: about 80% of users generate 20% of cost, and the remaining 20% generate 80%. Within the high-cost 20%, there is often a long tail of heavy users who individually generate 10–100× the average LLM cost. Understanding this distribution is critical for pricing and margin decisions: if your top 5% of users by LLM cost each consume $25/mo in compute on a $29/mo plan, only $4/mo is left to cover every other cost, and that cohort is almost certainly unprofitable on a monthly basis. But if those same users have 90% retention and a 70% expansion rate, the LTV calculation changes the picture. Per-user cost attribution lets you make this analysis; aggregate cost averages hide it.
- Identify the cost drivers within features, not just between features. When a feature’s aggregate LLM cost is high, the root cause is usually a specific user behavior pattern or a specific code path, not the feature as a whole. A research agent that costs about $0.39/session on average may have a P99 session cost of $3.50 driven by users who ask extremely long questions with complex document attachments. The average is a weighted mean of two very different populations: the $0.12 sessions (90% of sessions) and the $2.80 sessions (10% of sessions, driven by a specific input pattern), which works out to 0.9 × $0.12 + 0.1 × $2.80 ≈ $0.39. Capping the expensive tail with a context window limit for long documents might reduce average cost by 40% with no impact on the P50 user experience. You cannot make this optimization without per-session cost attribution showing the cost distribution.
- Cost attribution to pricing tiers determines tier profitability. If your SaaS has three pricing tiers (Solo $19/mo, Team $79/mo, Enterprise $299/mo) and all tiers include the AI agent feature, you need LLM cost attributed per tier to know whether each tier is profitable. A Solo user who generates $4.50/mo in LLM cost on a $19/mo plan is contributing $14.50 gross margin before infrastructure costs. A Solo user who generates $18.50/mo in LLM cost on a $19/mo plan is generating $0.50 gross margin — barely profitable and one bad session from negative. If 15% of Solo users are in the second category, you have a cohort profitability problem that requires either usage limits for the Solo tier or a price increase.
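The ratio and margin arithmetic in the bullets above can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation: the function names (`costToRevenueRatio`, `marginBeforeInfra`, `shareBelowMarginFloor`) and the sample per-user costs are hypothetical, and the dollar figures are the illustrative ones from the text.

```typescript
// Hypothetical helper names; dollar figures are the illustrative ones from the text.

/** Cost-to-revenue ratio: fraction of a feature's revenue consumed by LLM cost. */
function costToRevenueRatio(llmCostUsd: number, revenueUsd: number): number {
  if (revenueUsd <= 0) {
    throw new Error("revenueUsd must be positive");
  }
  return llmCostUsd / revenueUsd;
}

/** Per-user gross margin before infrastructure costs: plan price minus LLM cost. */
function marginBeforeInfra(planPriceUsd: number, llmCostUsd: number): number {
  return planPriceUsd - llmCostUsd;
}

/** Share of users in a tier whose margin falls below a chosen floor. */
function shareBelowMarginFloor(
  monthlyCostsUsd: number[],
  planPriceUsd: number,
  marginFloorUsd: number
): number {
  if (monthlyCostsUsd.length === 0) {
    return 0;
  }
  const below = monthlyCostsUsd.filter(
    (cost) => marginBeforeInfra(planPriceUsd, cost) < marginFloorUsd
  );
  return below.length / monthlyCostsUsd.length;
}

const healthyRatio = costToRevenueRatio(1.5, 50); // 3% of revenue
const strainedRatio = costToRevenueRatio(12, 50); // 24% of revenue

// Made-up monthly LLM costs for four Solo users on the $19 plan.
const soloCostsUsd = [4.5, 3.0, 18.5, 5.25];
// One of the four users leaves less than $1.00 of margin.
const atRiskShare = shareBelowMarginFloor(soloCostsUsd, 19, 1.0);

console.log(healthyRatio, strainedRatio, atRiskShare);
```

The margin floor is a policy choice rather than a derived number; running the same check per tier each week shows whether the at-risk share is drifting toward the 15% level described above.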
Implementing per-user LLM cost attribution
- Pass user context to every LLM call. The foundation of per-user cost attribution is tagging every LLM API call with the user’s ID, pricing tier, and feature context. In most LLM SDK implementations, this means adding a metadata object to every call: { user_id: "usr_abc123", tier: "solo", feature: "ai_search", session_id: "ses_xyz" }. These tags flow through to your cost tracking layer (whether that’s RunGuard, a custom middleware, or your LLM provider’s cost export) and allow you to compute cost-per-user, cost-per-tier, and cost-per-feature from the same underlying data. Without these tags, cost attribution requires reverse-engineering from session logs, which is both expensive and inaccurate.
- Compute cost attribution in near real time, not monthly. Monthly cost attribution is too slow for SaaS management decisions. If a cohort of heavy users is consuming 3× their tier’s implied LLM budget, you want to know within a week, not at the end of the billing cycle when you’ve already incurred the cost for 30 days. Implement a daily cost attribution pipeline: aggregate per-user LLM costs from your cost tracking layer, join to user attributes (tier, signup date, engagement metrics), and write to a data warehouse. Run daily aggregate queries and route exceptions (users whose weekly LLM cost exceeds a threshold relative to their tier’s implied budget) to a cost monitoring Slack channel for review.
- Handle shared-session costs in multi-user contexts. Some AI agent sessions are initiated at the account level rather than the user level (a team agent running a nightly research job, a Slack bot responding to messages from multiple team members). In these cases, LLM cost cannot be attributed to a single user — it should be attributed to the account and then either distributed evenly across account members or treated as a flat account-level cost. Define your attribution policy for shared sessions in advance and implement it consistently, or your per-user cost data will have systematic attribution errors that make cohort analysis unreliable.
- Store per-session cost data for retrospective analysis. Per-user LLM cost attribution is not just a real-time monitoring tool — it’s a historical data asset that powers pricing and feature decisions months later. Store per-session cost records (user ID, session ID, feature, start time, total cost, token counts) in a queryable format for at least 12 months. This historical data enables cohort analysis (“do users who joined in Q4 have lower LLM costs because they were onboarded with the new context limits?”), feature impact analysis (“did the prompt compression we shipped in March reduce per-session cost for the AI drafting feature?”), and pricing model validation (“at our current cost structure, is the $19 Solo plan profitable for the median user?”).
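The aggregation step of the pipeline described above might look like the following sketch. Everything here is an assumption made for illustration: the `SessionCostRecord` shape, the function names, and the choice to split shared sessions evenly across account members (one of the two policies mentioned above). A real pipeline would read from your cost tracking layer and write to a warehouse rather than work on in-memory arrays.

```typescript
// Hypothetical record shape; field names are assumptions, not a vendor schema.
interface SessionCostRecord {
  sessionId: string;
  accountId: string;
  userId: string | null; // null marks an account-level (shared) session
  tier: string;
  feature: string;
  costUsd: number;
}

/**
 * Sum cost per user. Shared sessions are split evenly across the account's
 * members. If no members are known, the cost is booked under an
 * "account:<id>" key so it is not silently dropped.
 */
function costPerUser(
  records: SessionCostRecord[],
  membersByAccount: Record<string, string[]>
): Record<string, number> {
  const totals: Record<string, number> = {};
  const add = (key: string, amount: number): void => {
    totals[key] = (totals[key] ?? 0) + amount;
  };
  for (const record of records) {
    if (record.userId !== null) {
      add(record.userId, record.costUsd);
      continue;
    }
    const members = membersByAccount[record.accountId] ?? [];
    if (members.length === 0) {
      add("account:" + record.accountId, record.costUsd);
      continue;
    }
    for (const member of members) {
      add(member, record.costUsd / members.length);
    }
  }
  return totals;
}

/** Users whose period cost exceeds `multiplier` times their tier's implied budget. */
function flagOverBudget(
  totals: Record<string, number>,
  tierByUser: Record<string, string>,
  budgetByTierUsd: Record<string, number>,
  multiplier: number
): string[] {
  return Object.keys(totals).filter((userId) => {
    const tier = tierByUser[userId];
    if (tier === undefined) {
      return false; // unknown user or account-level key: nothing to compare against
    }
    const budget = budgetByTierUsd[tier];
    return budget !== undefined && (totals[userId] ?? 0) > multiplier * budget;
  });
}

const sessionRecords: SessionCostRecord[] = [
  { sessionId: "ses_1", accountId: "acc_1", userId: "usr_a", tier: "solo", feature: "ai_search", costUsd: 0.4 },
  { sessionId: "ses_2", accountId: "acc_1", userId: null, tier: "solo", feature: "nightly_research", costUsd: 1.0 },
];
const weeklyTotals = costPerUser(sessionRecords, { acc_1: ["usr_a", "usr_b"] });
const flaggedUsers = flagOverBudget(weeklyTotals, { usr_a: "solo", usr_b: "solo" }, { solo: 0.6 }, 1);
console.log(weeklyTotals, flaggedUsers);
```

In this sample the shared nightly job adds $0.50 to each of the two members, which pushes usr_a over the made-up $0.60 weekly budget. The list returned by `flagOverBudget` is what would be routed to the cost monitoring channel.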
Tier-based budget enforcement in SaaS products
- Set per-tier implied LLM budget and enforce it technically. Each pricing tier in your SaaS implicitly allocates a certain amount of LLM cost per user per month: the amount at which the tier is profitable given your target gross margin. Make that implicit allocation explicit: compute the maximum LLM cost per user per month that keeps each tier at target margin, and configure a session-level budget ceiling for users in that tier. If the Solo budget works out to $4.50/mo, a Solo user who reaches $4.50 of LLM cost in the first week of the month should have their per-session budget ceiling tightened (or be shown an upgrade prompt) rather than continuing to generate unlimited cost into a plan that won’t support it.
- Graceful degradation vs. hard stop for over-budget users. When a user exceeds their tier’s implied LLM budget, you have two enforcement options: graceful degradation (route their sessions to a cheaper, faster model tier with a lower session budget ceiling) or hard stop (block AI feature usage until the next billing cycle). Graceful degradation is almost always preferable for retention: users who experience reduced AI quality are frustrated but stay; users who are hard-blocked often churn. Implement graceful degradation by routing over-budget users to a cheaper model tier automatically, with an in-app notification that their current plan’s AI capacity is exhausted and an upgrade prompt. See LLM model routing cost optimization for model router implementation patterns.
- Usage limit display as a conversion driver. For freemium and trial users, the moment when they hit their LLM budget ceiling is the highest-intent upgrade moment. A user who has used the AI feature enough to exhaust their free allocation is a user who has demonstrated the feature’s value. Display remaining AI capacity prominently in the UI: “3 AI sessions remaining this month — upgrade for unlimited.” As capacity depletes, the upgrade CTA becomes more prominent and contextual. Teams that track LLM cost per user can engineer this conversion moment precisely; teams that don’t know their per-user cost cannot know when to show the upgrade prompt.
- Power-user identification and proactive outreach. Per-user LLM cost attribution identifies your power users — the users whose LLM usage is 3–10× the median for their tier. These users are either upgrade candidates (they need more capacity and would benefit from a higher tier) or cost risks (they are consuming unprofitable amounts of LLM resources at their current price point). Power-user identification should trigger two workflows: proactive outreach from the account management team (“we noticed you’re using our AI features heavily — let’s talk about a plan that scales with you”) and a technical review to see whether their usage pattern reveals an optimization opportunity (context window trimming, prompt compression, task decomposition that reduces per-session cost). See AI agent token efficiency optimization for cost reduction patterns.
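The enforcement logic in this section reduces to a few small decisions, sketched below. All names and numbers are illustrative assumptions and none of this is a RunGuard API. The 70% target margin and $1.20 of other per-user cost are made up so that the Solo budget lands near the $4.50 figure used above.

```typescript
// Illustrative sketch only; names, margins, and costs are assumptions.

type OverBudgetPolicy = "degrade" | "block";
type EnforcementAction = "allow" | OverBudgetPolicy;

/**
 * Maximum monthly LLM cost per user that keeps a tier at its target gross
 * margin, after subtracting other per-user costs of goods sold.
 */
function impliedMonthlyBudgetUsd(
  planPriceUsd: number,
  targetGrossMargin: number, // for example 0.7 for 70%
  otherCogsUsd: number
): number {
  return Math.max(0, planPriceUsd * (1 - targetGrossMargin) - otherCogsUsd);
}

/** Under budget: allow. At or over budget: apply the tier's policy. */
function enforcementAction(
  monthToDateCostUsd: number,
  budgetUsd: number,
  policy: OverBudgetPolicy
): EnforcementAction {
  return monthToDateCostUsd < budgetUsd ? "allow" : policy;
}

/** Whole sessions left this month, for an "N AI sessions remaining" display. */
function sessionsRemaining(
  monthToDateCostUsd: number,
  budgetUsd: number,
  typicalSessionCostUsd: number
): number {
  if (typicalSessionCostUsd <= 0) {
    throw new Error("typicalSessionCostUsd must be positive");
  }
  const leftUsd = Math.max(0, budgetUsd - monthToDateCostUsd);
  // Small epsilon guards against floating-point undercounting (e.g. 0.3 / 0.1).
  return Math.floor(leftUsd / typicalSessionCostUsd + 1e-9);
}

/** Power user: cost at least `multiple` times the tier median (3 to 10 in the text). */
function isPowerUser(costUsd: number, tierMedianUsd: number, multiple: number): boolean {
  return tierMedianUsd > 0 && costUsd >= multiple * tierMedianUsd;
}

const soloBudgetUsd = impliedMonthlyBudgetUsd(19, 0.7, 1.2);
console.log(soloBudgetUsd.toFixed(2)); // "4.50"
console.log(enforcementAction(4.6, soloBudgetUsd, "degrade")); // "degrade"
console.log(sessionsRemaining(3.75, 4.5, 0.25)); // 3
console.log(isPowerUser(9, 2.5, 3)); // true
```

Keeping the policy ("degrade" or "block") as data per tier makes it straightforward to prefer graceful degradation on paid tiers while applying a hard stop only where that is a deliberate choice, such as a free trial.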
RunGuard for SaaS AI feature cost allocation
- User-tagged session tracking. RunGuard’s BudgetTracker supports passing user context (user ID, tier, feature name) on each session initialization. This context is stored with every cost record, enabling per-user and per-tier cost attribution from RunGuard’s data without needing a separate attribution layer.
    // Per-session ceilings in USD, one per pricing tier
    const tierBudgets = { solo: 0.25, team: 1.00, enterprise: 5.00 };

    const guard = new RunGuard({
      appName: 'ai-drafting-feature',
      context: {
        userId: req.user.id,
        tier: req.user.subscriptionTier, // 'solo' | 'team' | 'enterprise'
        featureId: 'ai-draft'
      },
      budget: {
        // Tier-based session budget ceiling; unknown tiers fall back to the Solo ceiling
        maxCostPerSessionUsd: tierBudgets[req.user.subscriptionTier] ?? 0.25,
        onBudgetExceeded: async (ctx) => {
          await showUpgradePrompt(ctx.userId);
        }
      }
    });

- Tier-specific budget ceiling enforcement. RunGuard’s per-session budget ceiling can be set dynamically based on the user’s tier at session initialization time. This means a Solo user gets a $0.25 session ceiling while a Team user gets a $1.00 ceiling and an Enterprise user gets a $5.00 ceiling, all enforced by the same circuit breaker with different parameters per tier. When a Solo user hits their ceiling, RunGuard fires the onBudgetExceeded callback, which can show an upgrade prompt rather than an error message. Cost enforcement becomes a conversion driver, not just a cost control.
- Session cost export for per-user attribution analysis. Export RunGuard’s session cost records (with user ID, tier, and feature tags) to your data warehouse for cohort profitability analysis. A weekly SQL query that computes median and P95 LLM cost per user, grouped by tier and signup cohort, gives you the per-tier profitability picture that drives pricing and feature decisions. See LLM agent cost per feature tracking for granular attribution at the feature level.
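The weekly median and P95 aggregation would normally run as SQL in the warehouse. The sketch below shows the same logic in TypeScript so the grouping and percentile steps are explicit. The `ExportedSession` shape is an assumed export format, not RunGuard’s actual schema, and the percentile uses the nearest-rank method, which may differ slightly from the interpolating functions many warehouses provide.

```typescript
// Assumed export shape; field names are hypothetical.
interface ExportedSession {
  userId: string;
  tier: string;
  signupCohort: string; // for example "2025-Q4"
  costUsd: number;
}

interface GroupSummary {
  users: number;
  medianUsd: number;
  p95Usd: number;
}

/** Nearest-rank percentile of an ascending-sorted, non-empty array. */
function percentile(sortedAsc: number[], p: number): number {
  if (sortedAsc.length === 0) {
    throw new Error("percentile of empty array");
  }
  const rawRank = Math.ceil((p * sortedAsc.length) / 100);
  const rank = Math.min(sortedAsc.length, Math.max(1, rawRank));
  return sortedAsc[rank - 1] as number;
}

/** Median and P95 of per-user total cost, keyed by "tier|signupCohort". */
function summarizeByTierAndCohort(
  sessions: ExportedSession[]
): Record<string, GroupSummary> {
  // Step 1: total cost per user within each group.
  const perUser: Record<string, Record<string, number>> = {};
  for (const session of sessions) {
    const key = session.tier + "|" + session.signupCohort;
    const group = perUser[key] ?? (perUser[key] = {});
    group[session.userId] = (group[session.userId] ?? 0) + session.costUsd;
  }
  // Step 2: percentiles over the per-user totals in each group.
  const summary: Record<string, GroupSummary> = {};
  for (const key of Object.keys(perUser)) {
    const group = perUser[key] ?? {};
    const costs = Object.keys(group)
      .map((userId) => group[userId] ?? 0)
      .sort((a, b) => a - b);
    summary[key] = {
      users: costs.length,
      medianUsd: percentile(costs, 50),
      p95Usd: percentile(costs, 95),
    };
  }
  return summary;
}

const exportedSessions: ExportedSession[] = [
  { userId: "u1", tier: "solo", signupCohort: "2025-Q4", costUsd: 1 },
  { userId: "u1", tier: "solo", signupCohort: "2025-Q4", costUsd: 2 },
  { userId: "u2", tier: "solo", signupCohort: "2025-Q4", costUsd: 5 },
  { userId: "u3", tier: "solo", signupCohort: "2025-Q4", costUsd: 10 },
  { userId: "u4", tier: "team", signupCohort: "2026-Q1", costUsd: 40 },
];
console.log(summarizeByTierAndCohort(exportedSessions));
```

Summing per user before taking percentiles matters: percentiles over raw sessions describe session cost, while the profitability question is about what each user costs over the period.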
Know your AI feature unit economics before they know you.
SaaS businesses that track LLM cost per user, per tier, and per feature make better pricing decisions, build more profitable AI features, and grow gross margins as they scale. RunGuard gives you the per-session cost data and tier-based enforcement that turns per-user attribution from a reporting exercise into an operational control.