Earlier
this month, millions of OpenClaw users woke up to a sweeping mandate:
The viral AI agent tool, which this year took the worldwide tech
industry by storm, had been severely restricted by Anthropic.
Anthropic,
like other leading AI labs, was under immense pressure to lessen the
strain on its systems and start turning a profit. So if users wanted
its Claude AI to power their popular agents, they’d have to start
paying handsomely for the privilege.
“Our
subscriptions weren’t built for the usage patterns of these third-party
tools,” wrote Boris Cherny, head of Claude Code, on X.
“We want to be intentional in managing our growth to continue to serve
our customers sustainably long-term. This change is a step toward that.”
The
announcement was a sign of the times. Investors have poured hundreds of
billions of dollars into companies like OpenAI and Anthropic to help
them scale and build out their compute. Now, they’re expecting returns.
After years of offering cheap or totally free access to advanced AI
systems, the bill is starting to come due — and downstream, users are
beginning to feel the pinch.
Over
the past few years, most top AI labs have introduced new subscription
tiers to court power users. OpenAI and Anthropic shifted their pricing
plans for enterprise. OpenAI introduced in-platform advertisements.
Anthropic, of course, restricted third-party tools.
In
some ways, this is a tale as old as time, and particularly, a clear
echo of the tech boom of the ’10s. Venture capitalists helped startups
subsidize fast growth in all kinds of areas: ride-hailing apps,
e-commerce, takeout and grocery delivery. Once companies cemented their
power, they raised prices, added new revenue streams, and delivered a
return to investors. Or they didn’t — and they crashed and burned.
But
AI companies have gone through more investor money at a faster pace
than any other sector in recent history. They’ve broken ground
on data centers around the world, dedicating billions of dollars with
promises of better models, lower costs, and AI for everyone. Even
stemming the flow of losses will be difficult — let alone making the
kind of money investors are hoping for. “When you sink trillions of
dollars into data centers, you’re going to expect a return,” said Will
Sommer, a senior director analyst at Gartner, who specializes in
economic forecasting and quantitative modeling.
“Is
the era of basically free or close-to-free AI kind of coming to an end
here?” said Mark Riedl, a professor in the Georgia Tech School of
Interactive Computing. “It’s too soon to say for certain, but there are
some signs.”
Gartner’s
Sommer studies long-term economic market trends related to generative
AI, including calculating just how much money is at stake. Between 2024
and 2029, he said, Gartner estimates that capital investment in AI data
centers will reach about $6.3 trillion — a “massive amount of money.”
To
avoid a write-down of these assets, major AI model providers would
ideally generate a return on invested capital (ROIC) of about 25
percent, Sommer said. (That’s about what Amazon, Microsoft, and Google
tend to earn on their overall capital investments.) On the other hand,
if the returns fall below 12 percent, institutional capital loses
interest — there’s better money elsewhere, Sommer said. Below 7 percent,
you’re in write-down territory, which is “an unmitigated disaster for
all of the investors in this technology,” Sommer said.
To
reach that bare minimum of 7 percent, Gartner forecasts that large AI
companies would need to earn cumulatively close to $7 trillion in
AI-driven revenue through 2029, which is close to $2 trillion per year
by the end of the period. In order to achieve “historic returns,” the
providers would need to earn nearly $8.2 trillion in the same period.
OpenAI has already made $600 billion in spending commitments through 2030, the company said in February,
which Sommer says is a “massive step down” from the $1.4
trillion it had planned before. Based on OpenAI’s revenue forecasts and
potential compound annual growth, Sommer predicts that even in the
best-case scenario, the lab would only hit a fraction of the
overall revenue required to reach that 7 percent ROIC.
How
do model providers like OpenAI make this money? By selling access to
what are known as tokens. A token is essentially a unit of data
that an AI model takes in or produces — it could be text, images,
audio, or something else. One token is generally worth about four
characters in the English language — the word “bathroom,” for instance,
would likely be processed as two tokens. One paragraph in English is
generally about 100 tokens, and a 1,500-word essay may be about 2,050
tokens, per an OpenAI estimate.
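The rule of thumb above can be sketched as a quick back-of-the-envelope estimator. This is a heuristic only (the four-characters-per-token ratio is OpenAI’s rough guidance, and the characters-per-word figure is an assumption), not a real tokenizer:

```python
# Rough token estimator using the ~4 characters-per-token rule of thumb.
# A heuristic sketch, not a real tokenizer such as tiktoken.

def estimate_tokens(text: str) -> int:
    """Approximate token count for English text at ~4 characters per token."""
    return max(1, round(len(text) / 4))

print(estimate_tokens("bathroom"))  # 8 characters -> about 2 tokens

# A 1,500-word essay, assuming ~5.5 characters per word including spaces,
# lands near OpenAI's ~2,050-token estimate.
essay_chars = round(1500 * 5.5)
print(estimate_tokens("x" * essay_chars))  # roughly 2,060 tokens
```

Real tokenizers split on learned subword units rather than fixed character counts, so actual totals vary by model and by text.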
To hit investors’ revenue expectations, providers would need to process a “mind-bending” number of tokens, Sommer said.
By
most measures, companies’ numbers are already pretty big. In October,
Google announced it was processing 1.3 quadrillion tokens a month, for
instance. If you add all the providers’ estimates up, Sommer said, you
get 100 to 200 quadrillion tokens a year. But to achieve the $2
trillion in annual revenue Gartner calculated, providers would need to be
generating, by conservative estimates, a combined 10 sextillion tokens
per year. (To make that slightly less abstract, a quadrillion has 15
zeros, and a sextillion has 21.) Even assuming a very generous profit
margin of 10 percent per token, that would mean that token consumption
between now and 2030 would need to grow by 50,000–100,000x.
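That growth multiple follows directly from the two figures above. A quick sanity check, using the article’s round numbers rather than any independent estimate:

```python
# Back-of-the-envelope check of the token-growth math cited above.
# Figures are the article's round numbers, not independent estimates.

tokens_needed_per_year = 10e21   # ~10 sextillion tokens/year by 2030
tokens_today_low = 100e15        # ~100 quadrillion tokens/year today (low end)
tokens_today_high = 200e15      # ~200 quadrillion tokens/year today (high end)

growth_max = tokens_needed_per_year / tokens_today_low
growth_min = tokens_needed_per_year / tokens_today_high

# Matches the 50,000x-100,000x range quoted in the article.
print(f"required growth: {growth_min:,.0f}x to {growth_max:,.0f}x")
```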
Right
now, constantly seeking more data centers and strapped for compute,
companies aren’t capable of processing this many tokens. Even if they
could, they’d face a problem: they’re likely taking a loss on them.
Sommer estimates that if you only account for the direct cost of
infrastructure and electricity, “every company is making very reasonable
margins on every token.” But that margin is probably tighter or
nonexistent with newer, more token-hungry models. And it’s eaten up
completely by indirect operation costs, like building out more compute
and the “ungodly” expense of constantly training the next big model.
“As
soon as you then add all of the infrastructure that needs to be built
for the next generation of model, and you look at how these models are
going to scale, it becomes increasingly untenable,” Sommer said.
Sommer
predicts that many companies “won’t be able to sustain their burn
rate,” and says market consolidation is virtually inevitable — in his
eyes, no more than two large language model providers in any regional
market will survive. And the era where nearly every service has a fairly
generous unpaid tier probably isn’t going to last.
“For
the [labs] that have a lot of users that were free, I think the
question was never really if you’d monetize the free tier but it was
when, and how badly do you do it,” Jay Madheswaran, cofounder of legal
AI startup Eve, which is a client of both OpenAI and Anthropic, told The Verge.
Even
if you do find a way to square the math, building customer loyalty can
be just as complicated. Top labs are constantly leapfrogging each other
on model debuts, feature releases, strategy shifts, hiring
announcements, and more. It can be tough to stay on top long enough to
corner any part of the market — engineers and developers are famous for
switching which model they’re using on any given day, and it’s easy to
do so.
So
labs are increasingly emphasizing the importance of locking users into
their platforms and tools. Anthropic, which primarily builds for
enterprise clients, has been going all in on its coding efforts,
and OpenAI has recently pledged to mirror Anthropic’s focus on coding
and enterprise, ahead of both companies reportedly racing each other to
IPO by the end of 2026.
For
now, that competition is benefiting end users. “It’s an arms race where
you cannot let up at all because the switching cost is zero,” said
Soham Mazumdar, cofounder and CEO of Wisdom AI, adding, “As a common
man, I’m going to be the winner longer-term.”
In
the early days of AI, the bulk of compute costs went to training
initial models, while inference (or performing tasks) was cheaper. As
models have advanced and systems have added features, however, inference
has gotten far more resource-intensive. AI agents, or tools that
ideally can complete complex, multistep tasks on your behalf without
constant hand-holding, now use vastly more tokens than the basic chatbot
models did a few years back.
Reasoning
models, which increasingly power AI agents, are notoriously expensive
on the inference side as well, said Georgia Tech’s Riedl. These agents —
such as popular open-source platform OpenClaw — are typically more
efficient and effective than ones without reasoning, but they also
expend far more tokens doing behind-the-scenes work the end user may not
see. That may look like “thinking through” a lot of different potential
paths, launching sub-agents to do portions of a task, or verifying the
accuracy of different steps of the process.
“You
put in your one-sentence prompt… and it’ll talk out loud to itself for
thousands and thousands of tokens, thousands and thousands of words,
maybe even tens of thousands when you get into coding,” Riedl said,
adding, “If you have thousands or millions of people using these things
every single day, the inference costs of just the users generating tons
and tons of tokens all the time really outweighs the training side of
things.” If model providers were making a straightforward profit on all
these tokens and had the compute to handle them easily, that wouldn’t be
a problem for them — but as things stand, it’s a strain.
“Anybody
who was building agents in the past couple of years sort of saw this
coming,” said Aaron Levie, CEO of Box, adding, “The use cases have
exploded, and we’re out of capacity.”
Top AI labs have recently changed their policies on API usage and third-party tools — like Anthropic essentially banning
the use of OpenClaw unless subscribers pay extra — due to the added
strain. “You’ve got these tools that are basically just sitting as
background processors on everyone’s laptops and desktops, just
continuously waking themselves up, generating some tokens, doing some
stuff, and putting themselves back to sleep,” said Riedl.
And
no matter what you’re doing with a reasoning-model-powered AI agent,
there are likely going to be wasted tokens — meaning times when an AI
model goes down an unproductive path and then backtracks, or checks on how
something is going but doesn’t change anything, or even pauses to write
itself a poem. In an era where labs are likely losing money on some
tokens and companies are strapped for compute, the industry is trying to
reduce wasted tokens and build more focused and targeted models.
Although
it may be good for paying customers and AI labs alike to make
models use fewer tokens, it ironically works against the mission of
massively increasing token usage. As Gartner’s Sommer puts it, pricing
models may change significantly down the line, but right now, there’s a
“narrow space on the treadmill” between short- and long-term goals.
Add
this all up, and big AI companies are at a transition point: they’ve
attracted huge numbers of users by offering free access, and now they
need to keep those users while charging a lot more. “On one hand, they
want to see more tokens being generated but they have to either suck up
the costs, which they can sort of do as long as venture capital is
flowing, or pass the costs back on to [customers],” Riedl said. “Maybe
the economics are a little upside down right now.”
These days, OpenAI and Anthropic are often weighing
the advantages of older flat-rate subscription plans and ones with
metered fees. Both companies’ enterprise plans are now token-based,
since usage is “uneven,” as Andrew Filev, founder of Zencoder, put
it — one person may use it once or twice a week for a few minutes,
while another is running five agents in the background around the clock.
In
consumer chatbots, some model makers are trying to mitigate this with
advertising. OpenAI recently introduced ads within ChatGPT, which show
up as a separate sidebar, and it’s reportedly working on a tool to track how well those ads work. (Anthropic famously decried the move in its 2026 Super Bowl ads.)
But
for companies that build tools on top of models like GPT-5 or Claude
Opus, the price of tokens is going up, and the extra cost is largely
trickling down to their customers. Multiple tech companies The Verge
spoke with said they, or their customers, are changing strategies to
offset the new pricing. Some are considering moving fully or partially
to open-source models, and some are spending considerable time and
resources evaluating how well expensive high-end models perform on certain
tasks compared to cheaper alternatives.
David
DeSanto, CEO of software company Anaconda, recently returned from a
five-week trip around the world speaking to customers. He said that many
were moving to self-host AI models — deploying their own within Amazon
Bedrock or Google’s Vertex AI to have more control over the supply chain
— or changing to open-source or open-weight models for a lot of their
needs, since many such models have significantly improved on benchmarks
as of late. Some companies also worry about the security of sending IP
to a commercial frontier lab, so they only use ChatGPT or Claude models
for “mission-critical applications,” he said.
“Everyone
I spoke to had some version of this problem — their token usage has
gone up, so their usage-based billing cost has gone up, or the tier they
were on no longer has the same cap, and now they’re having to go to a
more expensive tier to try to keep the same amount of usage per month as
part of their flat rate,” DeSanto said.
Eve,
a company that sells software to plaintiff lawyers, is constantly
balancing quality and token costs, Madheswaran said — especially since
Eve’s token usage has gone up 100x year-over-year to date. So it’s
always switching between open-source models and varying ones from
Anthropic and OpenAI.
But
even a 1 percent regression in quality of output negatively impacts
Eve’s customers “quite significantly,” Madheswaran said, which is why
Eve spends a lot of internal resources tracking model quality. The
company typically finds itself using the newer, more expensive reasoning
models about 25 to 30 percent of the time, splitting the rest of its usage
between Eve’s own open-source variants and smaller, cheaper models from
leading labs. Madheswaran said the company has found that some cheap
models are just as accurate as expensive ones, depending on the query.
“What
open source is really doing is it’s putting pressure on these companies
to make their cheaper models cheaper because their profit margins there
are much, much better,” Madheswaran said.
Wisdom
AI, which provides AI-powered data analysis, hasn’t had to pass on cost
increases yet. The team is testing out how different models perform on
different types of tasks, and then budgeting accordingly. Mazumdar said
it’s lately been trying out Cerebras, an inference provider popular for
serving open-weight models, “in anticipation of how expensive things will get” from the
premier labs like OpenAI and Anthropic. “[Big AI companies] have been
giving this away for free,” Mazumdar said. “What they’re trying to do
is, the moment they sense there’s an enterprise at play, or there’s
propensity to pay, they absolutely jack up the prices drastically.”
But
he said there’s always a cost, especially on the coding front. “The
reality is this: If you’re doing coding of any kind, then the
open-source models simply don’t come close, and that’s the unfortunate
reality of where we are today,” he said.
Box’s
Levie believes the changes will play out over the next 24 months. He
said the VC-subsidized era of AI was likely necessary for growth — after
all, if two companies with largely equal products are competing for the
same customers, and one is offering a (subsidized) product at a lower
price, the cheaper one will obviously win out, at least in the short term.
But now it’s time to build more efficiency into the system, and not
everyone is going to survive it.
“The
size of the market is so large that I think it actually will sort of
all work out,” Levie said. “At an individual company level, you have to
decide: Can you keep up with this flywheel, or are you going to be
priced out based on an inability to raise capital or an inability to
make the model more efficient for your tasks?”
Eve’s
Madheswaran thinks the industry will soon move from focusing on the
so-called “best” model to what works the best for a business’s
personalized, niche use cases. “That’s my guess, and obviously I’m
betting our entire company on it.”
Gartner’s
Sommer likens the whole scenario to what he called the “stegosaurus
paradox.” When scientists first discovered the stegosaurus fossil, he
said, they didn’t understand how a large body could be supported by such
a small head with a tiny mouth — and the theory they developed was that
the stegosaurus would need to constantly be eating, and eating a highly
nutritious diet.
“We
see AI as kind of being the same deal,” Sommer said — for the
stegosaurus (AI labs) to survive, providers need to find more food
for it (the entire global economy, not just the tech market) and it has
to be highly nutritious, too (i.e., providers need to be able to earn a
margin from it and stop subsidizing). If the stegosaurus paradox isn’t
resolved, and the mouth is “too small for the body,” he said, it will
lead to write-downs, falling valuations, dried-up financing, and a broad
resetting of expectations for AI worldwide. Therefore, Sommer said, a
sustainable business model “would require that genAI be infused in
everything from billboards to checkout kiosks,” with providers taking a
cut of all of those transactions.
“The
free era was really a land grab — it’s a common strategy used by
startups,” said Eve’s Madheswaran. “That’s just not a business model.
You can’t do that for too long.”