Track AI Spending Across Providers: 2026 Guide

If you use one AI provider, billing is annoying. If you use three or four, billing becomes a part-time job. A Claude subscription here, an OpenAI API key there, a Gemini quota tied to a Google Workspace seat, a Cursor plan that bills you separately from the underlying model calls it makes. By the end of the month, no single dashboard tells you what you actually spent on “AI”, and the surprises always land on the wrong side of zero.

This is a practical guide to getting that number under control. Not a vendor pitch, not a theoretical framework: the dashboards that exist today, what they hide, and what you have to log yourself to get a real picture.

Why provider dashboards are not enough

Every major AI provider ships a usage page. Anthropic has one in the Console, OpenAI has one under Settings, Google Cloud has billing reports, Mistral and DeepSeek each expose a usage view. They all work, in isolation. The problem starts the moment you have more than one.

Three structural issues recur across providers:

Subscriptions and API are billed in different places. Claude Pro and Max sit on a Stripe-style consumer billing flow. The Anthropic API sits on the Console and uses prepaid credits or postpaid invoicing. They do not aggregate. The same is true at OpenAI: ChatGPT Plus, Team, and Pro live in one billing portal, while platform.openai.com bills the API in another.
Dashboards lag. Most provider dashboards update spend within minutes, but token and request counts can lag by a full day. If you push out a buggy agent on a Friday afternoon, the per-request detail that explains the damage may not show up until well into the weekend.
Indirect spend is invisible. Cursor, Windsurf, Zed, Continue, Claude Code on a subscription plan, and the dozen other tools that pass your prompts to a model do not all expose token usage in a form you can read. You see a flat monthly fee, not the underlying call volume.

None of this is a conspiracy. It is just how billing systems grow. But it means the answer to “what did we spend on AI this month” is almost never one query.

The four buckets you actually need to track

Before you pick tools, get the categories right. I split AI spend into four buckets and reconcile them once a week.

Bucket 1: Direct API usage. Raw calls to Anthropic, OpenAI, Google AI Studio or Vertex, Mistral, DeepSeek, Groq, Together, Fireworks, Cerebras, and whoever else you have a key with. This is the bucket that can spike from $40 to $4,000 in a weekend if a loop goes wrong, so it deserves the most instrumentation.

Bucket 2: Consumer and team subscriptions. Claude Pro and Max; ChatGPT Plus, Pro, Team, and Enterprise; Gemini Advanced; Perplexity Pro; GitHub Copilot; Cursor; Windsurf; v0; Replit. Predictable, but they accumulate. The risk is not overspend, it is forgetting which ones you still need.

Bucket 3: Cloud-hosted models. Bedrock, Vertex, Azure OpenAI, and the various inference platforms billed through your existing cloud account. This spend hides inside the cloud bill, often under “AI Platform” or “Machine Learning”, and rarely appears in the same report your finance team uses for AI tools.

Bucket 4: Implicit AI inside SaaS. Notion AI, Linear AI, Intercom Fin, Salesforce Einstein, Atlassian Rovo, the AI add-on inside your CRM, the AI tier of your analytics tool. You did not buy “an AI subscription”, you bought a feature, but you are paying for inference somewhere upstream. This bucket is usually the smallest per-seat but the hardest to audit.

If you only track bucket 1, you will undercount by a wide margin. If you only track buckets 1 and 2, you still miss the cloud bill, which for teams running production agents on Bedrock, Vertex, or Azure OpenAI is often the largest line.

What each provider’s dashboard actually gives you

A short tour of what is and is not in the box, as of mid-2026.

Anthropic Console. Per-key usage with daily granularity, broken down by model (Sonnet, Opus, Haiku, and the variants of each). Costs are shown in USD, with input, output, cache write, and cache read tokens separated. You can set monthly spend limits per workspace and per key. Email alerts fire at configurable thresholds. The export is CSV per month, accessed via the billing tab. There is no built-in cross-workspace report, so a parent company with several workspaces has to stitch them together.

OpenAI Platform. Daily usage and cost by model, broken down by project. Projects are the right unit of cost allocation: one per team or product. Hard limits and soft limits are configurable per project. The usage export is JSON or CSV. Image and audio usage are shown separately from text, which is helpful since those are billed on different units. Fine-tuning and batch are also separated.

Google. This is the messiest one. Gemini consumer subscriptions sit in Google One. The Gemini API via AI Studio bills through a Google Cloud project. Vertex AI bills through the same Cloud project but under different SKUs. If your finance team treats the entire Cloud bill as “infra”, AI spend gets buried. Use labels aggressively, and pull a billing export to BigQuery if the number ever gets material.

Mistral, DeepSeek, Groq, Together, Fireworks, Cerebras. All ship a usage page, all expose a usage endpoint or a CSV export, none of them coordinate with each other. Budgets and alerts vary widely. DeepSeek and Groq are the easiest to overspend on because per-token prices are low enough to feel free, yet a high-throughput agent can still rack up a meaningful bill.

Subscription tools. Cursor, Windsurf, Claude Code, Copilot, and similar tools show you a usage meter (requests, premium requests, fast vs slow, however they choose to brand it). What they almost never show is the dollar value of the model calls they are making on your behalf. You pay the subscription, you get the meter, end of story. This is fine as long as the subscription cap holds. The moment you hit usage-based overage, you need to read the fine print on how each “request” maps to dollars.

A minimal tracking setup that scales

You do not need a FinOps platform on day one. You need a spreadsheet, a logger, and a weekly habit. Here is the smallest setup that does not fall apart at month three.

1. One source of truth for subscriptions

A single sheet with one row per subscription: provider, plan, monthly cost, annual cost, renewal date, owner, business purpose, last reviewed. Update it on the first of every month. The point is not the data, the point is the review: if a row has not been touched in three months and nobody remembers why it exists, cancel it.
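
The three-month rule is easy to automate once the sheet is exported as CSV. What follows is a minimal sketch, not a prescribed format: the column names, the placeholder providers, the amounts, and the 90-day cutoff are all assumptions made up for illustration.

```python
import csv
import io
from datetime import date, timedelta

# Hypothetical export of the subscription sheet. Providers and amounts are placeholders.
SHEET_CSV = """provider,plan,monthly_cost,renewal_date,owner,last_reviewed
ProviderA,Team,100.00,2026-07-01,alice,2026-05-02
ProviderB,Pro,60.00,2026-06-15,bob,2026-01-10
ProviderC,Plus,20.00,2026-09-01,,2025-12-01
"""

def stale_subscriptions(csv_text, today, max_age_days=90):
    """Return the rows whose last review is older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    rows = csv.DictReader(io.StringIO(csv_text))
    return [r for r in rows if date.fromisoformat(r["last_reviewed"]) < cutoff]

stale = stale_subscriptions(SHEET_CSV, today=date(2026, 6, 1))
for row in stale:
    owner = row["owner"] or "no owner"
    print(f'{row["provider"]} {row["plan"]}: ${row["monthly_cost"]}/mo, {owner}')
```

With the sample rows, ProviderB and ProviderC come back as candidates for cancellation. A row with no owner is usually the first one to question.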

2. Per-key API logging on your side

Do not trust provider dashboards as your only record. Wrap your API calls (or the SDK initialization) with a thin logger that writes, for every request: timestamp, provider, model, input tokens, output tokens, cached tokens, latency, request ID, and a tag for which project or feature made the call. Append to a local SQLite file or a cloud table. Cost can be computed at query time from a price table you keep updated.

This is fifty lines of code per provider and it pays for itself the first time a dashboard disagrees with reality, or the first time you need to know which feature is driving spend. The provider dashboard tells you that model X cost $812 last week. Your own log tells you that feature Y was responsible for 78% of it.
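
A minimal sketch of such a logger, using only the standard library. The table layout, the function names, the provider and model identifiers, and every price in PRICES are assumptions for illustration; the sketch also assumes input_tokens counts uncached input only, so check how your provider reports the split.

```python
import sqlite3
import time

# Hypothetical price table, USD per million tokens. Replace with real rates.
PRICES = {
    ("provider_a", "model-large"): {"input": 3.00, "output": 15.00, "cached": 0.30},
    ("provider_a", "model-small"): {"input": 0.50, "output": 2.00, "cached": 0.05},
}

def open_log(path=":memory:"):
    """Open (or create) the call log. Pass a file path to persist it."""
    conn = sqlite3.connect(path)
    conn.execute("""CREATE TABLE IF NOT EXISTS calls (
        ts REAL, provider TEXT, model TEXT,
        input_tokens INTEGER, output_tokens INTEGER, cached_tokens INTEGER,
        latency_ms REAL, request_id TEXT, feature TEXT)""")
    return conn

def log_call(conn, provider, model, input_tokens, output_tokens,
             cached_tokens, latency_ms, request_id, feature, ts=None):
    """Append one request. Call this from the wrapper around your SDK client."""
    ts = time.time() if ts is None else ts
    conn.execute("INSERT INTO calls VALUES (?,?,?,?,?,?,?,?,?)",
                 (ts, provider, model, input_tokens, output_tokens,
                  cached_tokens, latency_ms, request_id, feature))
    conn.commit()

def cost_by_feature(conn):
    """Compute cost at query time from PRICES, grouped by feature tag."""
    totals = {}
    rows = conn.execute(
        "SELECT provider, model, input_tokens, output_tokens, cached_tokens, feature "
        "FROM calls")
    for provider, model, inp, out, cached, feature in rows:
        p = PRICES[(provider, model)]
        cost = (inp * p["input"] + out * p["output"] + cached * p["cached"]) / 1_000_000
        totals[feature] = totals.get(feature, 0.0) + cost
    return totals

# Usage with made-up numbers: two features, two models.
conn = open_log()
log_call(conn, "provider_a", "model-large", 1_000_000, 200_000, 0, 850.0, "req-1", "summarizer")
log_call(conn, "provider_a", "model-small", 2_000_000, 0, 0, 120.0, "req-2", "autocomplete")
totals = cost_by_feature(conn)
print(totals)
```

Storing raw token counts rather than dollars is deliberate: when a price changes or turns out to be wrong, you fix one row in the price table and the whole history recomputes.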

3. Cloud spend with explicit labels

If you call models through Bedrock, Vertex, or Azure OpenAI, label every resource with the team and product. Then build one saved billing query that filters to AI-related SKUs and groups by label. Run it weekly. The labels matter more than the query, because untagged spend in a multi-team cloud account is impossible to attribute after the fact.
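
The shape of that saved query, sketched in Python over rows already pulled from a billing export. The field names, the service list, and the sample costs are assumptions; a real export has its own schema, and in practice the same filter and grouping would live in the billing tool or in SQL.

```python
from collections import defaultdict

# Hypothetical rows normalized from one or more cloud billing exports.
BILLING_ROWS = [
    {"service": "Vertex AI", "cost": 420.0, "labels": {"team": "search", "product": "assistant"}},
    {"service": "Compute Engine", "cost": 900.0, "labels": {"team": "search"}},
    {"service": "Vertex AI", "cost": 130.0, "labels": {}},
    {"service": "Bedrock", "cost": 75.5, "labels": {"team": "support", "product": "triage"}},
]

# Services counted as AI spend. Extend this set to match your own bill.
AI_SERVICES = {"Vertex AI", "Bedrock", "Azure OpenAI"}

def ai_spend_by_label(rows, label="team"):
    """Sum AI-related cost per label value; unlabeled spend gets its own bucket."""
    totals = defaultdict(float)
    for row in rows:
        if row["service"] not in AI_SERVICES:
            continue
        totals[row["labels"].get(label, "UNLABELED")] += row["cost"]
    return dict(totals)

by_team = ai_spend_by_label(BILLING_ROWS)
print(by_team)
```

Keeping an explicit UNLABELED bucket matters more than the grouping itself: it puts the unattributable spend on the report every week, which is the pressure that gets resources tagged.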

4. A weekly reconciliation

Once a week, fifteen minutes: open each provider dashboard, write the week-to-date number into a sheet, sum the buckets, and compare against both the previous week and the same week last month. You are not auditing, you are looking for the line that moved. A single number that doubled week-over-week is a question worth asking, even if the absolute amount is small. That is how you catch a regression while it still costs $50 instead of $5,000.
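
The "line that moved" check can be sketched in a few lines. The function name, the 2x ratio, and the sample figures are assumptions for illustration; the two dictionaries stand in for two columns of the reconciliation sheet.

```python
def lines_that_moved(previous, current, ratio=2.0):
    """Return {line: (previous, current)} for lines that grew by at least `ratio`."""
    moved = {}
    for line, curr in current.items():
        prev = previous.get(line, 0.0)
        if prev == 0.0:
            if curr > 0.0:
                moved[line] = (prev, curr)   # brand-new spend is always worth a look
        elif curr / prev >= ratio:
            moved[line] = (prev, curr)
    return moved

# Made-up weekly totals per bucket, in USD.
last_week = {"api": 310.0, "subscriptions": 240.0, "cloud": 1200.0, "saas": 25.0}
this_week = {"api": 335.0, "subscriptions": 240.0, "cloud": 1180.0, "saas": 50.0}

flagged = lines_that_moved(last_week, this_week)
print(flagged)
```

Here the only flagged line is the smallest one, which is the point: the check works on ratios, so a $25 line doubling gets the same attention as a $1,200 line doubling.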

When to graduate to a tool

A spreadsheet plus a logger handles most teams up to roughly $5,000 to $10,000 a month of AI spend. Past that, the failure mode changes. You stop missing money and start missing patterns: which user, which prompt template, which time of day, which downstream feature.

That is where a dedicated tracker earns its keep. It can be an internal dashboard built on top of the logs you are already writing. It can be a third-party tool that ingests provider exports. Or it can be a workspace like tokenkarma, which is built specifically around the case of one user or one team holding several subscriptions and several API keys at once. The goal in any case is the same: one screen that answers “what did we spend, on what, this week” without you opening four dashboards.

A few features to look for, in rough order of usefulness:

Multi-provider ingestion, ideally via official exports rather than scraped HTML.
Per-feature or per-project attribution, not just per-key.
Forecasting that uses your actual run rate, not a flat extrapolation of the current month.
Anomaly alerts on rate of change, not only on absolute thresholds. A 10x jump from $5 to $50 a day is the early warning. A static $500 threshold is the smoke alarm after the fire.
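
To make the last item concrete, here is a rough sketch that runs both kinds of alert over the same daily series. The trailing-mean baseline, the 7-day window, the 3x jump factor, and the spend figures are assumptions chosen for illustration, not a recommended configuration.

```python
from statistics import mean

def alerts(daily_spend, window=7, jump=3.0, static_limit=500.0):
    """Return (day_index, kind) pairs for rate-of-change and static alerts."""
    fired = []
    for i, spend in enumerate(daily_spend):
        history = daily_spend[max(0, i - window):i]   # trailing days, excluding today
        if history:
            baseline = mean(history)
            if baseline > 0 and spend / baseline >= jump:
                fired.append((i, "rate"))
        if spend >= static_limit:
            fired.append((i, "static"))
    return fired

# Made-up daily spend in USD: a quiet baseline, then a runaway agent.
spend = [5, 5, 6, 5, 50, 120, 300, 650]
fired = alerts(spend)
first_rate = next(i for i, kind in fired if kind == "rate")
first_static = next(i for i, kind in fired if kind == "static")
print(first_rate, first_static)
```

On this series the rate-of-change alert fires on day 4, at $50, while the static $500 threshold stays silent until day 7.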

Common mistakes that cost real money

A few patterns I see repeatedly, in no particular order:

Leaving an old key in a forgotten script. A cron job from six months ago, still calling an expensive model, still working perfectly, still billing you. Rotate keys quarterly and pull the list of active keys from each provider. Anything you cannot identify, disable.
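
The audit itself is a set difference between what the provider says is active and what you can account for. A tiny sketch, with hypothetical key names and owners; how you obtain the active list depends on the provider.

```python
# Hypothetical data: key names as listed by a provider, and your own inventory.
active_keys = {"prod-api", "ci-evals", "old-cron-2025"}
inventory = {"prod-api": "alice", "ci-evals": "bob"}   # key name -> owner

# Active at the provider but unknown to you: candidates to disable.
unidentified = sorted(active_keys - set(inventory))
# In your inventory but no longer active: stale inventory rows to clean up.
missing = sorted(set(inventory) - active_keys)

print(unidentified, missing)
```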

Confusing a subscription cap with a hard cap. Most consumer subscriptions cap usage by throttling or by switching you to a smaller model, not by stopping. Most API accounts can be configured to hard-stop at a dollar amount, but the default is often a soft alert. Check the setting, do not assume.
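
If you want a second line of defense in your own code, a client-side guard is a few lines. This is a sketch under assumptions: BudgetGuard and BudgetExceeded are hypothetical names, the per-call cost is made up, and a local counter like this does not replace the provider-side limit, since it only sees calls that go through it.

```python
class BudgetExceeded(RuntimeError):
    """Raised when the local spend counter reaches the configured cap."""

class BudgetGuard:
    def __init__(self, cap_usd):
        self.cap_usd = cap_usd
        self.spent_usd = 0.0

    def check(self):
        """Call before each request; refuses once the cap is reached."""
        if self.spent_usd >= self.cap_usd:
            raise BudgetExceeded(
                f"spent ${self.spent_usd:.2f} of ${self.cap_usd:.2f} cap")

    def record(self, cost_usd):
        """Call after each request with its computed cost."""
        self.spent_usd += cost_usd

guard = BudgetGuard(cap_usd=1.00)
calls_made = 0
stopped = None
try:
    for _ in range(100):      # stand-in for a runaway loop
        guard.check()         # refuse before the call, not after
        calls_made += 1
        guard.record(0.30)    # hypothetical cost of one call
except BudgetExceeded as exc:
    stopped = str(exc)

print(calls_made, stopped)
```

Because the cost is only known after a call returns, the guard can overshoot the cap by at most one call. In this run it stops after four calls instead of a hundred.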

Counting prepaid credits as spent the day you bought them. Prepaid credits are an asset until they are consumed. If your finance team books them as expense at purchase, your AI cost line will swing wildly month to month for no operational reason. Track credit balance and consumption separately.
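
A small worked example of the difference, with made-up amounts: one $3,000 credit purchase in January, consumed over three months. The function and field layout are assumptions for illustration, not accounting advice.

```python
def credit_ledger(months, purchases, consumption):
    """Walk months in order; return (month, expense, closing_balance) rows."""
    balance = 0.0
    rows = []
    for month in months:
        balance += purchases.get(month, 0.0)   # a purchase adds to the asset
        used = consumption.get(month, 0.0)
        balance -= used                        # consumption is the actual expense
        rows.append((month, used, balance))
    return rows

months = ["2026-01", "2026-02", "2026-03"]
purchases = {"2026-01": 3000.0}
consumption = {"2026-01": 800.0, "2026-02": 1100.0, "2026-03": 900.0}

ledger = credit_ledger(months, purchases, consumption)
for month, expense, balance in ledger:
    print(month, expense, balance)
```

Booked at purchase, the cost line reads $3,000, $0, $0. Booked at consumption, it reads $800, $1,100, $900, which is the series that actually tracks what the team did.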

Ignoring cache hits. Anthropic, OpenAI, and Google all bill cached input tokens at a fraction of standard input. If your workload reuses long system prompts, the cache pricing can change the economics of a model choice by 50% or more. Your log needs to capture cached tokens as a separate field, or you cannot reason about it.
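
A back-of-the-envelope sketch of how much a long, reused system prompt can move the per-request cost. All prices are hypothetical placeholders in USD per million tokens, and the sketch deliberately ignores any cache-write surcharge, so treat the result as an upper bound on the saving.

```python
def prompt_cost(system_tokens, user_tokens, output_tokens, price, cached=False):
    """USD per request. `cached` bills the system prompt at the cached-input rate."""
    system_rate = price["cached_input"] if cached else price["input"]
    return (system_tokens * system_rate
            + user_tokens * price["input"]
            + output_tokens * price["output"]) / 1_000_000

# Hypothetical prices, USD per million tokens.
PRICE = {"input": 3.00, "cached_input": 0.30, "output": 15.00}

cold = prompt_cost(20_000, 500, 300, PRICE)               # system prompt billed in full
warm = prompt_cost(20_000, 500, 300, PRICE, cached=True)  # system prompt read from cache
saving = 1 - warm / cold

print(f"cold ${cold:.4f}, warm ${warm:.4f}, saving {saving:.0%}")
```

With these placeholder numbers a 20,000-token system prompt takes a request from about 6.6 cents to about 1.2 cents. None of that is visible unless the log stores cached tokens in their own field.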

Treating the cheapest model as free. Haiku, GPT-mini variants, Gemini Flash, DeepSeek V series: per-token prices are small enough that engineers stop thinking about them. Then a retry loop kicks in, or a multi-agent system fans out to a thousand parallel calls, and the bill is suddenly large. Cheap models reward sloppy throughput patterns. Treat them with the same attention you give to expensive ones.
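
The arithmetic behind that warning, as a short sketch. The prices, token counts, retry count, and schedule are made up for illustration; plug in your own.

```python
def run_cost(calls, input_tokens, output_tokens, in_price, out_price, attempts=1):
    """USD cost of `calls` requests, each tried `attempts` times.

    Prices are USD per million tokens.
    """
    per_call = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
    return calls * attempts * per_call

# One call to a hypothetical cheap model: 4,000 tokens in, 500 out.
single = run_cost(1, 4_000, 500, in_price=0.25, out_price=1.25)

# The same call fanned out to 1,000 workers, each retrying up to 3 times.
fan_out = run_cost(1_000, 4_000, 500, in_price=0.25, out_price=1.25, attempts=3)

# That fan-out triggered every 15 minutes for a day.
per_day = fan_out * 24 * 4

print(f"${single:.6f} per call, ${fan_out:.2f} per fan-out, ${per_day:.2f} per day")
```

A call that costs a sixth of a cent becomes roughly $468 a day under these assumptions, without any single request looking expensive.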

A realistic target

The point of tracking AI spend is not to minimize it. The point is to know what you are getting for it. A team that can say “this feature costs $4,200 a month in inference and produces $90,000 a month of revenue” has won the conversation. A team that says “I think we spend about $5k on AI, maybe more” has not.

Pick the buckets that apply to you. Set up the logger. Hold the weekly reconciliation. Within a month, you will know your number. Within three, you will know where it goes. From there, every other decision (which provider, which plan, which tool) gets easier, because it stops being a guess.

The tooling is the easy part. The habit is the hard part. Start with the habit.