
How we meter AI usage with a credit system, enforce quotas tied to billing plans, track every invocation, and wire it all together in the LLM orchestrator.
Part 1 covered the provider registry and credential routing — how we pick the right API key for every AI call. This post covers what happens around that call: metering, quotas, and usage tracking.
Early on, we considered charging users for exact token costs. The problem: unpredictable bills, complex per-model pricing, and users afraid to experiment. "I don't want to click that button if it costs me $0.50."
Credits solve this. Every subscription plan includes a monthly credit allowance. One credit equals one AI invocation. Users who want unlimited usage bring their own API keys and pay their provider directly.
// packages/lib/src/ai/providers/types.ts
const PLAN_CREDIT_LIMITS = {
starter: 1000, // 1,000 AI calls/month
growth: 5000, // 5,000 AI calls/month
business: 20000, // 20,000 AI calls/month
enterprise: 100000, // 100,000 AI calls/month
} as const
const DEFAULT_QUOTA_LIMITS = {
[ProviderQuotaType.TRIAL]: 50, // 50 calls during trial
[ProviderQuotaType.FREE]: 100, // 100 calls/month on free tier
[ProviderQuotaType.PAID]: 1000, // Starter plan default
} as const
We deliberately didn't do per-token billing.
Yes, a GPT-4o call costs us more than a Haiku call. We absorb that variance at the platform level. The alternative — per-token billing with model-specific rates — adds complexity users don't want.
The QuotaService handles upgrades, downgrades, trials, and quota checks. It's scoped to an organization and operates on SYSTEM provider configuration records.
// packages/lib/src/ai/quota/quota-service.ts
export class QuotaService {
constructor(
private db: Database,
private organizationId: string
) {}
async upgradeToPaid(creditLimit: number): Promise<void> {
const now = new Date()
const periodEnd = new Date(now)
periodEnd.setMonth(periodEnd.getMonth() + 1)
await this.db
.update(schema.ProviderConfiguration)
.set({
quotaType: ProviderQuotaType.PAID,
quotaLimit: creditLimit,
quotaUsed: 0,
quotaPeriodStart: now,
quotaPeriodEnd: periodEnd,
updatedAt: now,
})
.where(
and(
eq(schema.ProviderConfiguration.organizationId, this.organizationId),
eq(schema.ProviderConfiguration.providerType, 'SYSTEM')
)
)
}
async downgradeToFree(): Promise<void> {
// Same pattern — sets quotaType to FREE, limit to 100
}
async setTrialQuota(trialCredits = DEFAULT_QUOTA_LIMITS[ProviderQuotaType.TRIAL]): Promise<void> {
// Same pattern — sets quotaType to TRIAL, limit to 50
}
async resetQuota(): Promise<void> {
// Zeroes quotaUsed, sets new period start/end
}
}
A few things to note:
Quota lives on the ProviderConfiguration row. Not in a separate table. The hot path — "check quota, get credentials" — hits one row. Since quotas only apply to SYSTEM providers, colocating them with the SYSTEM configuration row avoids a JOIN.
quotaLimit: -1 means unlimited. Used for enterprise plans and self-hosted deployments.
No rollover. Unused credits don't carry to the next month. This keeps accounting simple and the upgrade incentive clear.
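One detail in upgradeToPaid is worth guarding against: Date.prototype.setMonth rolls over when the target month is shorter than the current day of month, so a period that starts on January 31 would end in early March. The following is a minimal sketch of a clamped alternative, assuming UTC dates; addOneMonthClamped is a hypothetical helper, not part of the QuotaService shown above.

```typescript
// Hypothetical helper: advance a date by one calendar month, clamping to the
// last day of the target month instead of rolling over into the month after.
function addOneMonthClamped(start: Date): Date {
  const year = start.getUTCFullYear()
  const month = start.getUTCMonth()
  const day = start.getUTCDate()

  // Day 0 of (month + 2) is the last day of (month + 1).
  const lastDayOfTarget = new Date(Date.UTC(year, month + 2, 0)).getUTCDate()

  return new Date(
    Date.UTC(
      year,
      month + 1,
      Math.min(day, lastDayOfTarget),
      start.getUTCHours(),
      start.getUTCMinutes(),
      start.getUTCSeconds(),
      start.getUTCMilliseconds()
    )
  )
}
```

With this helper, a period starting on 2024-01-31 ends on 2024-02-29. If the billing provider already reports the period boundaries, using those directly may be simpler than computing dates locally.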
Self-hosted deployments skip all quota logic:
async getQuotaStatus() {
if (isSelfHosted()) {
return {
quotaType: ProviderQuotaType.PAID,
quotaUsed: 0,
quotaLimit: -1, // Unlimited
quotaPeriodStart: null,
quotaPeriodEnd: null,
percentUsed: 0,
isExceeded: false,
}
}
// ... normal DB query
}
Self-hosted users manage their own provider costs. The system still records usage for analytics, but never enforces limits.
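The percentUsed and isExceeded fields can be derived from quotaUsed and quotaLimit alone. Here is a minimal sketch, assuming -1 is the only "unlimited" sentinel and that percentages are capped at 100; summarizeQuota is a hypothetical name, and the real getQuotaStatus may compute these values differently.

```typescript
interface QuotaSnapshot {
  quotaUsed: number
  quotaLimit: number // -1 means unlimited
}

interface QuotaSummary {
  percentUsed: number
  isExceeded: boolean
}

// Hypothetical helper mirroring the fields returned by getQuotaStatus.
function summarizeQuota({ quotaUsed, quotaLimit }: QuotaSnapshot): QuotaSummary {
  // Unlimited quota is never exceeded and always reports 0% used.
  if (quotaLimit === -1) {
    return { percentUsed: 0, isExceeded: false }
  }
  // A zero limit means nothing is allowed; avoid dividing by zero.
  if (quotaLimit === 0) {
    return { percentUsed: 100, isExceeded: true }
  }
  const percentUsed = Math.min(100, Math.round((quotaUsed * 100) / quotaLimit))
  return { percentUsed, isExceeded: quotaUsed >= quotaLimit }
}
```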
When subscription events fire from Stripe:
customer.subscription.created → quotaService.upgradeToPaid(planCreditLimit)
customer.subscription.updated → quotaService.upgradeToPaid(newPlanLimit)
customer.subscription.deleted → quotaService.downgradeToFree()
invoice.paid → quotaService.resetQuota()
Credit resets align with the Stripe billing cycle. When a new invoice is paid, quotaUsed goes back to 0 and the period starts fresh.
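The event mapping can be sketched as a small dispatcher. This is an illustration under assumptions, not the actual webhook handler: BillingEvent is a hypothetical, simplified shape (real Stripe events carry much more data), and the credit limit is assumed to have been resolved from the plan before dispatch. The event type strings are Stripe's full names for these events.

```typescript
// Minimal stand-in for the QuotaService methods the handler needs.
interface QuotaActions {
  upgradeToPaid(creditLimit: number): Promise<void>
  downgradeToFree(): Promise<void>
  resetQuota(): Promise<void>
}

// Hypothetical, simplified event shape for illustration only.
interface BillingEvent {
  type: string
  planCreditLimit?: number
}

// Returns true when the event was handled, false when it is unrelated to quotas.
async function applyBillingEvent(event: BillingEvent, quota: QuotaActions): Promise<boolean> {
  switch (event.type) {
    case 'customer.subscription.created':
    case 'customer.subscription.updated':
      if (event.planCreditLimit === undefined) {
        throw new Error(`Missing credit limit for ${event.type}`)
      }
      await quota.upgradeToPaid(event.planCreditLimit)
      return true
    case 'customer.subscription.deleted':
      await quota.downgradeToFree()
      return true
    case 'invoice.paid':
      await quota.resetQuota()
      return true
    default:
      return false
  }
}
```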
Every AI call — whether it uses platform credits or the org's own key — gets logged.
// packages/database/src/db/schema/ai-usage.ts
export const AiUsage = pgTable(
'AiUsage',
{
id: text().$defaultFn(() => createId()).primaryKey().notNull(),
createdAt: timestamp({ precision: 3 }).defaultNow().notNull(),
organizationId: text().notNull()
.references(() => Organization.id, { onDelete: 'cascade' }),
userId: text().references(() => User.id, { onDelete: 'set null' }),
// What was called
provider: text().notNull(),
model: text().notNull(),
modelType: text().default('llm').notNull(),
// Token breakdown
inputTokens: integer().default(0).notNull(),
outputTokens: integer().default(0).notNull(),
totalTokens: integer().default(0).notNull(),
cost: doublePrecision(),
// Credit tracking
creditsUsed: integer().default(1),
providerType: providerType('providerType'), // "SYSTEM" | "CUSTOM"
credentialSource: credentialSource('credentialSource'), // detailed source
// Source attribution
source: text(), // "compose" | "workflow" | "dataset" | "chat"
sourceId: text(), // workflow ID, dataset ID, etc.
// Performance
responseTime: integer(),
},
(table) => [
index('AiUsage_organizationId_createdAt_idx').using(
'btree', table.organizationId.asc().nullsLast(), table.createdAt.asc().nullsLast()
),
index('AiUsage_source_idx').using('btree', table.source.asc().nullsLast()),
]
)
Both SYSTEM and CUSTOM invocations are logged. But only SYSTEM invocations deduct credits. CUSTOM usage is recorded for analytics — orgs can see total AI usage regardless of who's paying the provider.
creditsUsed and cost are deliberately different fields. creditsUsed is what the user pays us (always 1 per invocation for SYSTEM, 0 for CUSTOM). cost is what we estimate the provider charges based on token counts and registry pricing.
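To make the distinction concrete, here is a minimal sketch of how the two values could be computed. The ModelPricing shape (USD per one million tokens) is an assumption for illustration; the registry's actual pricing format isn't shown in this post.

```typescript
// Hypothetical pricing entry: USD per one million tokens.
interface ModelPricing {
  inputPerMillion: number
  outputPerMillion: number
}

// Estimated provider charge, derived purely from token counts.
function estimateCost(inputTokens: number, outputTokens: number, pricing: ModelPricing): number {
  return (
    (inputTokens / 1e6) * pricing.inputPerMillion +
    (outputTokens / 1e6) * pricing.outputPerMillion
  )
}

// What the user pays us: one credit per SYSTEM invocation, none for CUSTOM.
function creditsFor(providerType: 'SYSTEM' | 'CUSTOM'): number {
  return providerType === 'SYSTEM' ? 1 : 0
}
```

Two calls with very different estimated costs still deduct the same single credit, which is the variance the platform absorbs.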
Every invocation is tagged with where it came from:
| Source | What triggers it |
|---|---|
| compose | Reply drafting in the ticket view |
| workflow | Workflow automation AI nodes |
| dataset | Training dataset generation |
| chat | Kopilot copilot conversations |
| other | Everything else |
The sourceId points to the specific workflow, dataset, or session. This enables per-feature usage dashboards — "workflows consume 60% of our credits" — which helps orgs decide where to optimize.
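A per-feature share like "workflows consume 60% of our credits" boils down to a group-by over usage rows. Here is a minimal in-memory sketch, assuming rows have already been loaded; UsageRow and creditShareBySource are hypothetical names, and a production dashboard would more likely aggregate in SQL.

```typescript
type UsageSource = 'compose' | 'workflow' | 'dataset' | 'chat' | 'other'

interface UsageRow {
  source: UsageSource | null
  creditsUsed: number
}

// Returns each source's share of total credits as a rounded percentage.
function creditShareBySource(rows: UsageRow[]): Map<UsageSource, number> {
  const totals = new Map<UsageSource, number>()
  let grandTotal = 0

  for (const row of rows) {
    const source: UsageSource = row.source ?? 'other' // untagged rows count as "other"
    totals.set(source, (totals.get(source) ?? 0) + row.creditsUsed)
    grandTotal += row.creditsUsed
  }

  const shares = new Map<UsageSource, number>()
  totals.forEach((credits, source) => {
    shares.set(source, grandTotal === 0 ? 0 : Math.round((credits * 100) / grandTotal))
  })
  return shares
}
```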
Here's what happens when an AI call completes:
// packages/lib/src/ai/usage/usage-tracking-service.ts
async trackUsage(request: UsageTrackingRequest): Promise<void> {
const inputTokens = request.usage.prompt_tokens || 0
const outputTokens = request.usage.completion_tokens || 0
const totalTokens = request.usage.total_tokens || inputTokens + outputTokens
const creditsUsed = request.creditsUsed ?? 1
await this.database.transaction(async (tx) => {
// 1. Insert usage record
await tx.insert(schema.AiUsage).values({
organizationId: request.organizationId,
userId: request.userId,
provider: request.provider,
model: request.model,
modelType: 'llm',
inputTokens,
outputTokens,
totalTokens,
createdAt: request.timestamp || new Date(),
providerType: request.providerType ?? 'CUSTOM',
credentialSource: request.credentialSource ?? 'CUSTOM',
creditsUsed,
source: request.source ?? 'other',
sourceId: request.sourceId ?? null,
})
// 2. Increment quota — SYSTEM only
if (request.providerType === 'SYSTEM') {
await tx
.update(schema.ProviderConfiguration)
.set({
quotaUsed: sql`${schema.ProviderConfiguration.quotaUsed} + ${creditsUsed}`,
})
.where(
and(
eq(schema.ProviderConfiguration.organizationId, request.organizationId),
eq(schema.ProviderConfiguration.provider, request.provider),
eq(schema.ProviderConfiguration.providerType, 'SYSTEM'),
isNotNull(schema.ProviderConfiguration.quotaType)
)
)
}
})
}
Two things worth calling out:
Atomic quota increment. The update computes quotaUsed + creditsUsed in SQL, not as a read-modify-write in application code. Under concurrent AI calls, this prevents the classic "two calls read 99, both write 100" race condition.
Single transaction. The usage record and quota increment are in the same transaction. If the usage insert succeeds but the quota update fails, neither persists. No orphaned records.
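The race is easy to reproduce in memory. The sketch below is a simulation, not database code: Counter stands in for the quota row, and the await between read and write stands in for the round trip between a SELECT and an UPDATE. All names are hypothetical.

```typescript
// In-memory stand-in for the quotaUsed column.
class Counter {
  value = 0
}

const tick = (): Promise<void> => new Promise<void>((resolve) => setTimeout(resolve, 0))

// Read-modify-write: the read and the write are separated by an await,
// so two concurrent callers can both read the same stale value.
async function incrementReadModifyWrite(counter: Counter): Promise<void> {
  const current = counter.value
  await tick()
  counter.value = current + 1
}

// Atomic increment: read and write happen in one step, mirroring
// `SET quotaUsed = quotaUsed + 1` executed inside the database.
async function incrementAtomic(counter: Counter): Promise<void> {
  await tick()
  counter.value += 1
}

// Runs two increments concurrently from a given starting value.
async function runTwoConcurrently(
  increment: (counter: Counter) => Promise<void>,
  start: number
): Promise<number> {
  const counter = new Counter()
  counter.value = start
  await Promise.all([increment(counter), increment(counter)])
  return counter.value
}
```

Starting from 99, the read-modify-write variant loses one update and ends at 100, while the atomic variant ends at 101.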
For workflows that make multiple AI calls in sequence, there's a batch variant that aggregates by provider+model before inserting:
async trackUsageBatch(requests: UsageTrackingRequest[]): Promise<void> {
// Aggregate entries by provider:model into single rows
const grouped = new Map<string, { inputTokens, outputTokens, creditsUsed, ref }>()
for (const req of requests) {
const key = `${req.provider}:${req.model}`
const existing = grouped.get(key)
if (existing) {
existing.inputTokens += req.usage.prompt_tokens || 0
existing.outputTokens += req.usage.completion_tokens || 0
existing.creditsUsed += req.creditsUsed ?? 1
} else {
grouped.set(key, { /* ... */ })
}
}
// Single multi-row INSERT + quota updates in one transaction
await this.database.transaction(async (tx) => {
await tx.insert(schema.AiUsage).values(rows)
// Deduct credits for SYSTEM providers
for (const row of systemRows) {
await tx.update(schema.ProviderConfiguration).set({
quotaUsed: sql`${schema.ProviderConfiguration.quotaUsed} + ${row.creditsUsed}`,
}).where(/* ... */)
}
})
}
A workflow running 5 AI steps against the same model produces 1 aggregated usage row instead of 5. Fewer rows, same credit deduction.
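The aggregation step on its own can be sketched as a pure function. This is a simplified illustration with hypothetical types (UsageRequest, AggregatedUsage), not the code elided from trackUsageBatch, and it leaves out the database writes.

```typescript
interface TokenUsage {
  prompt_tokens?: number
  completion_tokens?: number
}

interface UsageRequest {
  provider: string
  model: string
  usage: TokenUsage
  creditsUsed?: number
}

interface AggregatedUsage {
  provider: string
  model: string
  inputTokens: number
  outputTokens: number
  totalTokens: number
  creditsUsed: number
}

// Collapses requests that share a provider and model into one row each.
function aggregateUsage(requests: UsageRequest[]): AggregatedUsage[] {
  const grouped = new Map<string, AggregatedUsage>()

  for (const req of requests) {
    const key = `${req.provider}:${req.model}`
    const input = req.usage.prompt_tokens ?? 0
    const output = req.usage.completion_tokens ?? 0
    const credits = req.creditsUsed ?? 1

    const existing = grouped.get(key)
    if (existing) {
      existing.inputTokens += input
      existing.outputTokens += output
      existing.totalTokens += input + output
      existing.creditsUsed += credits
    } else {
      grouped.set(key, {
        provider: req.provider,
        model: req.model,
        inputTokens: input,
        outputTokens: output,
        totalTokens: input + output,
        creditsUsed: credits,
      })
    }
  }

  return Array.from(grouped.values())
}
```

If a single batch could mix SYSTEM and CUSTOM credentials for the same model, the grouping key would presumably also need to include providerType so that credit deduction stays correct.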
The LLMOrchestrator is the single entry point for all AI calls in the product. It wires together four concerns:
┌──────────────────────────────────────────┐
│ LLM Orchestrator │
│ │
│ 1. Quota Guard (can we call?) │
│ 2. Credential Routing (which key?) │
│ 3. Provider Client (make the call) │
│ 4. Usage Tracking (record it) │
└──────────────────────────────────────────┘
Here's the actual invoke method, condensed to the important parts:
// packages/lib/src/ai/orchestrator/llm-orchestrator.ts
async invoke(request: LLMInvocationRequest): Promise<LLMInvocationResponse> {
const { model, provider, messages, organizationId, userId, context } = request
// 1. Quota guard — fail fast before making an API call
if (this.config.enableQuotaEnforcement && this.db) {
const guard = await createUsageGuard(this.db)
if (guard) {
const usageResult = await guard.consume(organizationId, 'aiCompletions', { userId })
if (!usageResult.allowed) {
throw new UsageLimitError({
metric: 'aiCompletions',
current: usageResult.current ?? 0,
limit: usageResult.limit ?? 0,
message: 'You have reached your monthly AI usage limit. Upgrade your plan for more AI completions.',
})
}
}
}
// 2. Credential routing — resolve the right key (from Part 1)
const { client: llmClient, providerType, credentialSource } =
await this.getClientWithMetadata(provider, model, organizationId, userId)
// 3. Make the call
const response = await llmClient.invoke(invokeParams)
// 4. Track usage — deduct credits if SYSTEM
if (this.config.enableUsageTracking && this.usageService && response.usage) {
const source = (context?.source as UsageSource) ?? 'other'
const sourceId = context?.workflowId ?? context?.datasetId ?? context?.sessionId
await this.usageService.trackUsage({
organizationId,
userId,
provider,
model,
usage: response.usage,
providerType,
credentialSource,
creditsUsed: providerType === 'SYSTEM' ? 1 : 0, // CUSTOM calls are logged but cost no credits
source,
sourceId,
})
}
return { ...response, provider }
}
Notice how providerType and credentialSource flow through the entire invocation:
async getClientWithMetadata(provider, model, organizationId, userId) {
const providerManager = new ProviderManager(this.db!, organizationId, userId)
// Resolve credentials (this runs the fallback chain from Part 1)
const credentials = await providerManager.getCurrentCredentials(
provider, model, ModelType.LLM, false
)
// Create the provider client
const providerClient = await ProviderRegistry.createClient(provider, organizationId, userId)
const llmClient = providerClient.getClient(ModelType.LLM, credentials.credentials)
return {
client: llmClient,
providerType: credentials.providerType || 'CUSTOM',
credentialSource: credentials.credentialSource || 'CUSTOM',
}
}
The metadata isn't inferred after the fact. It's determined during credential resolution and carried through to usage tracking. The usage record knows exactly which credential type was used because that information traveled with the request.
The streaming variant (streamInvoke) follows the same pattern — quota check before streaming starts, usage tracking after the stream completes:
async *streamInvoke(request: LLMInvocationRequest): AsyncGenerator<LLMStreamChunk, LLMInvocationResponse> {
// Quota guard before streaming
if (this.config.enableQuotaEnforcement && this.db) {
const guard = await createUsageGuard(this.db)
if (guard) {
const usageResult = await guard.consume(organizationId, 'aiCompletions', { userId })
if (!usageResult.allowed) throw new UsageLimitError(/* ... */)
}
}
const { client: llmClient, providerType, credentialSource } =
await this.getClientWithMetadata(provider, model, organizationId, userId)
// Stream chunks to caller
const streamResult = llmClient.streamInvoke(invokeParams)
while (true) {
const { value: chunk, done } = await streamResult.next()
if (done) break
yield chunk
}
// Return final response with metadata
return { ...finalResponse, providerType, credentialSource }
}
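The condensed loop above doesn't show where finalResponse comes from. One common pattern with async generators: the value passed to `return` inside the generator arrives as `value` on the result whose `done` is true, so a manual next() loop can capture it. Here is a minimal sketch, assuming the provider stream returns its final response this way; fakeProviderStream and forwardStream are hypothetical names.

```typescript
interface Chunk {
  text: string
}

interface FinalResponse {
  content: string
  usage: { total_tokens: number }
}

// Hypothetical provider stream: yields chunks, then returns the final response.
async function* fakeProviderStream(): AsyncGenerator<Chunk, FinalResponse> {
  yield { text: 'Hel' }
  yield { text: 'lo' }
  return { content: 'Hello', usage: { total_tokens: 7 } }
}

// Forwards chunks to the caller and passes along the inner generator's return value.
async function* forwardStream(
  inner: AsyncGenerator<Chunk, FinalResponse>
): AsyncGenerator<Chunk, FinalResponse> {
  while (true) {
    const result = await inner.next()
    if (result.done) {
      // This is the point where usage could be tracked, after streaming ends.
      return result.value
    }
    yield result.value
  }
}
```

When nothing needs to happen per chunk, `return yield* inner` achieves the same forwarding in one line.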
Every feature that uses AI — compose, workflows, datasets, copilot chat — goes through this orchestrator. No feature can accidentally bypass quota checks or use the wrong credentials.
Multiple concurrent AI calls could all pass the quota guard before any of them track usage. We accept slight over-usage rather than adding pessimistic locking. The guard is a soft limit — similar to how API rate limiters work. An org at 999/1000 credits that fires 3 concurrent calls will end up at 1002/1000, not get two calls rejected.
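The 999/1000 scenario can be simulated in memory. This sketch is an illustration of check-then-act behaviour, not the real usage guard; InMemoryQuota and guardedCall are hypothetical names.

```typescript
// In-memory stand-in for an org's quota row.
class InMemoryQuota {
  used: number
  readonly limit: number

  constructor(used: number, limit: number) {
    this.used = used
    this.limit = limit
  }

  hasCapacity(): boolean {
    return this.used < this.limit
  }

  track(): void {
    this.used += 1
  }
}

// Soft limit: check first, call the model, record usage afterwards.
// Nothing is reserved between the check and the tracking step.
async function guardedCall(quota: InMemoryQuota, call: () => Promise<void>): Promise<boolean> {
  if (!quota.hasCapacity()) {
    return false
  }
  await call()
  quota.track()
  return true
}
```

Three concurrent calls against a quota at 999/1000 all pass the check before any of them tracks usage, leaving the org at 1002/1000. The next call is rejected.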
If the LLM call fails after passing the quota guard, no credit is deducted. The trackUsage call happens after a successful response. Users don't pay for errors.
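That ordering can be sketched with injected stubs. This is a simplified illustration rather than the orchestrator's code; PipelineDeps and invokeWithTracking are hypothetical names.

```typescript
interface CallResult {
  content: string
  providerType: 'SYSTEM' | 'CUSTOM'
}

interface PipelineDeps {
  guard: () => Promise<boolean>
  call: () => Promise<CallResult>
  track: (result: CallResult) => Promise<void>
}

// guard -> call -> track: tracking only runs after the call has resolved.
async function invokeWithTracking(deps: PipelineDeps): Promise<CallResult> {
  if (!(await deps.guard())) {
    throw new Error('Usage limit reached')
  }
  const result = await deps.call() // if this throws, track() is never reached
  await deps.track(result)
  return result
}
```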
On a mid-cycle plan upgrade, the new quotaLimit takes effect immediately and quotaUsed is preserved. If you've used 800 of 1000 Starter credits and upgrade to Growth (5000), you have 4200 remaining.
A period reset is triggered by the invoice.paid event from Stripe: quotaUsed goes to 0 and new period dates are set.
If an org removes their custom API keys, they switch to SYSTEM with whatever credits remain. Previous CUSTOM usage doesn't count against SYSTEM credits, because the two are tracked with different providerType values.
The AiUsage table powers several analytics views via the getUsageStatsByPeriod method:
async getUsageStatsByPeriod(
organizationId: string,
options: { days?: number; periodStart?: Date; periodEnd?: Date }
): Promise<UsageStatsByPeriodResponse> {
const results = await this.database
.select({
date: sql<string>`DATE(${schema.AiUsage.createdAt})`.as('date'),
provider: schema.AiUsage.provider,
model: schema.AiUsage.model,
modelType: schema.AiUsage.modelType,
source: schema.AiUsage.source,
sourceId: schema.AiUsage.sourceId,
totalTokens: sum(schema.AiUsage.totalTokens).as('totalTokens'),
runCount: count(schema.AiUsage.id).as('runCount'),
})
.from(schema.AiUsage)
.where(/* org + date range */)
.groupBy(
sql`DATE(${schema.AiUsage.createdAt})`,
schema.AiUsage.provider,
schema.AiUsage.model,
schema.AiUsage.modelType,
schema.AiUsage.source,
schema.AiUsage.sourceId
)
.orderBy(sql`DATE(${schema.AiUsage.createdAt})`)
// Transform into { statisticsByDay, totalUsageForPeriod }
}
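This gives orgs a per-day breakdown of token totals and run counts by provider, model, model type, source, and sourceId.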
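The final transform step is only hinted at in the comment. Here is one way it could look, as a sketch under assumptions: the shapes of statisticsByDay and totalUsageForPeriod are guesses, UsageStatRow is a hypothetical type, and totalTokens is parsed with Number() because SUM() results from Postgres typically arrive as strings.

```typescript
// Hypothetical row shape returned by the grouped query.
interface UsageStatRow {
  date: string
  totalTokens: string | number | null
  runCount: number
}

interface DayStats {
  date: string
  totalTokens: number
  runCount: number
}

// Rolls grouped rows up into per-day totals plus one total for the whole period.
function summarizeByDay(rows: UsageStatRow[]) {
  const byDay = new Map<string, DayStats>()
  let totalTokens = 0
  let totalRuns = 0

  for (const row of rows) {
    const tokens = Number(row.totalTokens ?? 0)
    const day = byDay.get(row.date) ?? { date: row.date, totalTokens: 0, runCount: 0 }
    day.totalTokens += tokens
    day.runCount += row.runCount
    byDay.set(row.date, day)

    totalTokens += tokens
    totalRuns += row.runCount
  }

  return {
    statisticsByDay: Array.from(byDay.values()),
    totalUsageForPeriod: { totalTokens, runCount: totalRuns },
  }
}
```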
| Decision | Trade-off | Why we chose it |
|---|---|---|
| 1 credit = 1 invocation | We absorb model cost variance | Predictable billing, simpler UX |
| No credit rollover | Less generous to light users | Simpler accounting, consistent upgrade incentive |
| Track CUSTOM usage too | Storage cost for unmetered calls | Analytics for all orgs, not just credit users |
| Soft quota enforcement | Slight over-usage possible | No pessimistic locking, better latency |
| Atomic SQL increment | Can't do complex deduction | Race-condition-free for 1-credit-per-call |
| Source attribution on every row | More columns, more data | Per-feature dashboards justify the storage |
| Credits deducted post-call | Failed calls don't cost credits | Trust-building — users don't pay for errors |
Here's the full picture across both posts:
Organization decides: platform credits or own API key?
↓
ProviderPreference.preferredType
↓
┌────────────┴────────────┐
↓ ↓
SYSTEM Mode CUSTOM Mode
(Platform Credits) (User's Own Keys)
↓ ↓
Quotas enforced No quota enforcement
Credits tracked Unlimited usage
↓ ↓
One credential Three-tier fallback:
per provider 1. Model-specific key
2. Load-balanced keys
3. Provider-level key
↓ ↓
LLM Orchestrator: guard → route → call → track
↓ ↓
Usage: credits=1 Usage: credits=0
Quota incremented Usage logged (no deduction)
The core insight: by separating what providers exist (registry) from how they're configured (provider configuration) from how they're metered (quota + usage), each concern evolves independently. Adding a new provider doesn't touch billing. Changing credit limits doesn't touch credential routing. And the orchestrator ensures every AI call in the product goes through the same enforcement and tracking path.