May 25, 2026

The token bill comes due

The headlines this past week have been hard to miss. Microsoft is reportedly cancelling internal Claude Code licences for tens of thousands of engineers and pushing them onto its own GitHub Copilot CLI. Uber is said to have burned through its entire 2026 AI budget in roughly four months. Meta engineers built an internal leaderboard to rank who consumes the most AI. The pattern even has a name now: "tokenmaxxing" — using AI for the sake of using AI, because someone upstairs set a usage target.

It is tempting to read this as an "AI is failing" story. It is not. It is a procurement story, and that distinction matters a great deal if you run a small or mid-sized business. Nothing about the underlying technology stopped working last week. What changed is that a handful of very large companies discovered, in public and at considerable expense, that a powerful tool deployed without cost discipline behaves exactly like any other powerful tool deployed without cost discipline.

What actually broke

The core issue is the gap between two things that look similar but are not: a chatbot query and an agentic task. Asking a model a question costs a fixed, predictable amount — a few cents, settled in one round trip. An AI agent is a different animal entirely. It reads your codebase, plans a multi-step change, runs commands, inspects the output, corrects itself and iterates, sometimes for an hour at a stretch. A single instruction handed to an agent can consume up to a thousand times more tokens than the equivalent question put to a plain LLM, depending on how many steps it takes to reach an answer.
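A rough sketch shows where that multiplier can come from. It assumes, hypothetically, that every agent step resends the whole context accumulated so far and then adds a fixed amount of new output and tool results. The function names and all the numbers are illustrative, not measurements of any real tool.

```python
def chat_tokens(prompt_tokens: int, reply_tokens: int) -> int:
    """Tokens billed for a single question-and-answer round trip."""
    return prompt_tokens + reply_tokens


def agent_tokens(base_context: int, added_per_step: int, steps: int) -> int:
    """Tokens billed for an agent loop that resends its growing context.

    Assumption: each step sends the full context accumulated so far as
    input, then adds `added_per_step` tokens (model output plus tool results).
    """
    total = 0
    context = base_context
    for _ in range(steps):
        total += context + added_per_step  # context resent + new tokens this step
        context += added_per_step          # the context grows for the next step
    return total


# Illustrative, made-up numbers.
question = chat_tokens(prompt_tokens=200, reply_tokens=300)
task = agent_tokens(base_context=5_000, added_per_step=500, steps=30)
print(question, task, round(task / question))  # 500 382500 765
```

Because the context is resent on every step, the total grows faster than the step count: doubling the number of steps in this sketch more than doubles the tokens billed.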

That would be manageable if usage stayed flat. It has not. Training costs are genuinely falling. Per-token prices are genuinely falling. But the number of tokens a single task consumes has climbed faster than the price has dropped. Goldman Sachs has forecast that agentic AI could drive a 24-fold increase in token consumption by 2030. Gartner has been blunter still: it expects the cost of running a large model to fall sharply by 2030, and yet warns that cheaper tokens will not translate into cheaper enterprise AI, because agentic systems simply burn far more of them per task. To put illustrative numbers on it: if the unit price falls tenfold while consumption rises twenty-four-fold, the invoice still comes out 2.4 times larger. It is not complicated arithmetic.
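The arithmetic fits in a few lines. The tenfold price drop is the illustrative figure used in the paragraph above, not a forecast; the 24-fold consumption figure is the Goldman Sachs forecast cited there. The function name is hypothetical.

```python
def bill_multiplier(price_drop_factor: float, consumption_growth_factor: float) -> float:
    """How the total bill changes when price and volume move together.

    bill = price_per_token * tokens, so the new bill relative to the old one is
    consumption_growth_factor / price_drop_factor.
    """
    return consumption_growth_factor / price_drop_factor


print(bill_multiplier(10, 24))  # 2.4: the invoice more than doubles
```

The bill only shrinks when the price falls by a larger factor than consumption grows.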

Economists have a name for this too. The Jevons Paradox describes efficiency gains that increase total consumption rather than reduce it. When steam engines became more fuel-efficient in the nineteenth century, Britain did not burn less coal — it burned far more, because efficiency made the engines worth deploying everywhere. Cheaper, more fuel-efficient aircraft did not reduce aviation fuel use; they made flying affordable enough that demand exploded. AI tokens are on the same trajectory. Each one is getting cheaper, and we are collectively using so many more of them that the total spend keeps rising.

The real failure was management, not technology

Here is the part the "AI cost crisis" framing tends to skip. The companies in trouble are, to a striking degree, the ones that set crude internal usage quotas. When you instruct staff to consume as many tokens as possible, or rank them on a leaderboard by AI usage, you have not built a productivity programme. You have built a metric — and metrics get gamed. This one was gamed immediately and predictably. Employees ran agents on trivial work, kicked off unnecessary autonomous loops, and used AI for tasks that a moment's thought would have finished faster. Set a foolish target and you get foolish results. That is not an AI problem; it is the oldest problem in management, wearing new clothes.

It is worth being precise about this, because the lesson is easy to get wrong. An agentic coding tool used well is one of the best-value tools available to a software business today. The same tool, pointed at make-work and run to satisfy a dashboard, is pure waste. The technology is identical in both cases. The only variable that changed is the incentive sitting on top of it. The giants did not get burned by AI. They got burned by deploying a consumption-priced tool with no cost visibility and an incentive that actively rewarded waste — and then acting surprised when the bill arrived.

There is a deeper irony here. In traditional software development, the more code an engineer writes, the more value the company captures. Under token-based pricing, the more code the AI writes, the faster the bill grows — and that bill is paid to an outside supplier. The economics quietly inverted, and the organisations that kept measuring "activity" as though it were free walked straight into it.

Why this is good news if you are small

Large enterprises have a structural disadvantage in this new environment. They have thousands of seats, weak per-team visibility, procurement processes that buy tools centrally and worry about consumption later, and a culture where no individual feels the cost of their own usage. That combination is almost designed to produce runaway spend.

A small business has the opposite profile, and that is an advantage worth using deliberately. You can see exactly what you are spending. The person running the tool is usually the person who feels the invoice. Decisions are fast and reversible. You are not locked into a twelve-month enterprise agreement negotiated before anyone understood agentic pricing. Where a large company needs a committee and a quarter to change course, you need an afternoon.

At our own desk, agentic tools do real work every day — code refactoring, data import pipelines, database optimisation, the genuinely tedious parts of software delivery. The spend stays modest, and not because we ration access. It stays modest because the work is tied to outcomes rather than to a usage figure on a chart. A few principles travel well to any small or mid-sized business thinking about this:

Measure cost per outcome, not cost per token. "We spent forty dollars and shipped a feature that would otherwise have taken a developer a full day" is a healthy sentence. "AI usage is up three hundred percent this month" is not a sentence about value at all — it tells you nothing about whether anything worth having got built.

Match the model to the task. Not every job needs a frontier reasoning model running an hour-long agent loop. A great deal of routine work can be routed to smaller, cheaper models, with the expensive agents reserved for genuinely hard problems. Most of the available savings live in this single decision.

Watch the meter. Open-source tooling now exists to track agentic spend in real time and to apply aggressive prompt caching so you are not paying repeatedly to resend the same context. Visibility alone changes behaviour — it is very hard to manage a number nobody ever looks at.

Never reward usage for its own sake. If you are introducing AI tools to a team, tie them to delivered work, never to a consumption score. The moment "did you use the AI today" becomes the question being asked, you have faithfully recreated the exact problem currently embarrassing the largest companies in the world.
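The first two principles can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the model tiers, the prices, the task names and the token counts are all made up, and a single blended price per million tokens ignores the separate input, output and cached-token rates that real suppliers charge.

```python
from dataclasses import dataclass

# Hypothetical prices in dollars per million tokens; substitute your supplier's rates.
PRICE_PER_MILLION = {"small": 0.50, "frontier": 15.00}


def pick_model(task_is_hard: bool) -> str:
    """Match the model to the task: reserve the expensive model for hard problems."""
    return "frontier" if task_is_hard else "small"


@dataclass
class TaskRecord:
    name: str
    model: str
    tokens: int
    shipped: bool  # did this task end in delivered work?

    @property
    def cost(self) -> float:
        """Dollar cost of this task at the hypothetical blended rate."""
        return self.tokens / 1_000_000 * PRICE_PER_MILLION[self.model]


def cost_per_outcome(records: list[TaskRecord]) -> float:
    """Total spend divided by the number of tasks that actually shipped."""
    shipped = sum(1 for r in records if r.shipped)
    if shipped == 0:
        raise ValueError("no delivered work to divide the spend by")
    return sum(r.cost for r in records) / shipped


records = [
    TaskRecord("rename config keys", pick_model(task_is_hard=False), 400_000, True),
    TaskRecord("refactor billing module", pick_model(task_is_hard=True), 2_000_000, True),
    TaskRecord("abandoned experiment", pick_model(task_is_hard=True), 1_000_000, False),
]
print(f"total spend: ${sum(r.cost for r in records):.2f}")
print(f"cost per shipped task: ${cost_per_outcome(records):.2f}")
```

Spend on work that never ships still counts in the numerator, which is the point: the figure gets worse when tokens are burned without a result, whereas a raw usage count would simply go up.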

The takeaway

AI agents are not too expensive. Unmanaged AI agents are too expensive — and that is a solvable problem, not a verdict on the technology. The distinction is the whole point. The firms making headlines this month did not discover that agentic AI does not work. They discovered that buying it like a flat-rate subscription, while measuring success by how much of it people consumed, produces a very large bill and very little to show for it.

For a small business willing to be deliberate, this is genuinely an opportunity. The giants are publicly relearning a lesson you can simply start out with: spend on results, not on activity. Used that way, an AI agent is one of the highest-leverage things a small team can put to work. If you would like help working out where agentic AI honestly pays for itself in your operations — and, just as importantly, where it does not — that is exactly the kind of problem we enjoy.
