Token Diseconomy: The Cost Challenges of AI Agents Come to Light
Microsoft withdraws its internal Claude Code licenses amid skyrocketing token costs and underwhelming output: a wake-up call for corporate AI cost management.
According to media reports, Microsoft has decided to withdraw Claude Code licenses for internal use. Claude Code, Anthropic's AI programming tool, spread rapidly inside Microsoft and became one of the company's most widely used development tools within just six months. That popularity, however, brought a sharp rise in token consumption and soaring costs, while output quality fell short of expectations. After weighing the trade-offs, Microsoft hit the brakes and steered employees toward its in-house Copilot CLI.
The problem is not limited to Microsoft. An analytical article titled “Token Diseconomy,” published by Tencent Research Institute, reports three further cases. Uber exhausted its entire 2026 budget for AI programming tools within four months. Amazon employees were found to be consuming tokens unnecessarily. Meta scrapped its internal “Tokenmaxxing” employee leaderboard to discourage unproductive token usage. Businesses are embracing AI, but the right way to do so remains elusive: companies champion AI-native strategies yet fail to realize tangible benefits. The mounting invoices point to what can aptly be termed “Token Diseconomy.”
Token Diseconomy stems from several factors. These include weak internal management controls, diminishing returns on token usage, and inefficient agent architecture design (e.g., redundant skill calls, friction within long-running tasks, and high coordination costs in multi-agent systems). The factors reinforce one another and compound the problem.
Rising Token Market Prices
Token prices are currently trending upward across the market. High-end prices are holding firm, mid-range tiers are rising in both volume and price, and economy tiers are following suit. Anthropic has used its strength in coding to build the strongest pricing power in the industry. Between late 2024 and May 2026, its annual recurring revenue (ARR) surged from approximately $1 billion to $45 billion.
Since launching the Claude 3 series in early 2024, Anthropic has pursued a multi-tiered strategy of flagship, mid-range, and lightweight models, which enables tiered pricing. The Opus series, priced at $15/$75 (input/output, per million tokens), anchors the high end. The Sonnet series ($3/$15) offers strong cost-performance for everyday coding and office tasks, and the Haiku series ($1/$5) targets lightweight, low-latency interactions. Mythos Preview ($25/$125) later added an ultra-high-end tier, followed by Fable 5 ($10/$50), aimed at a broader market. The result is a three-dimensional pricing strategy based on capability, risk, and cost.
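The sketch below makes the tier spread concrete by pricing one workload at each tier, using the input/output rates quoted above. It assumes simple linear billing: input and output tokens are each multiplied by their own per-million rate. It ignores caching, batch discounts, and long-context surcharges. The workload of 2M input tokens and 200k output tokens is hypothetical, not a measured session.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tier:
    name: str
    input_usd_per_mtok: float   # USD per million input tokens
    output_usd_per_mtok: float  # USD per million output tokens

# Prices as quoted in the text (input/output, USD per million tokens).
TIERS = [
    Tier("Mythos Preview", 25.0, 125.0),
    Tier("Opus", 15.0, 75.0),
    Tier("Fable 5", 10.0, 50.0),
    Tier("Sonnet", 3.0, 15.0),
    Tier("Haiku", 1.0, 5.0),
]

def job_cost(tier: Tier, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one job, assuming each side is billed linearly at its own rate."""
    return (input_tokens * tier.input_usd_per_mtok
            + output_tokens * tier.output_usd_per_mtok) / 1_000_000

if __name__ == "__main__":
    # Hypothetical coding session: 2M input tokens (mostly context), 200k output tokens.
    for tier in TIERS:
        print(f"{tier.name:15s} ${job_cost(tier, 2_000_000, 200_000):8.2f}")
```

Every tier quoted here keeps a 1:5 input/output price ratio. As a result, the relative spread is the same for any workload. For identical token counts, Opus costs 15 times as much as Haiku and 5 times as much as Sonnet; this session comes to $45.00 on Opus and $3.00 on Haiku.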
Even in the economy tier, prices are quietly rising. Haiku 4.5 is priced 20% higher than Haiku 3.5, and the per-million-token output price of Gemini 2.5 Flash is more than six times that of Gemini 2.0 Flash. Open-source models have also generally seen price increases. The core reason is explosive growth in economy-tier token consumption, which has shifted competition away from headline price and toward cost-performance.
Structural Inefficiencies in Agent Architecture
Token consumption is inflated by several hidden technical inefficiencies, and their losses compound one another.
- Context Trap: Agents repeatedly re-send historical information. In the code review phase of the ChatDev framework, approximately 40% of tokens are spent retransmitting existing information rather than generating new content.
- Tokenizer Black Box: The tokenizers of closed-source models are opaque and often inflate token counts. The new tokenizer in Anthropic's Opus 4.7 increased average token counts by 47% for technical documents and by a staggering 201% for high-resolution images.
- Redundant Skill Calls: Around 79.6% of public software engineering skills fail to improve efficiency. Token costs rise by up to 451%, while average efficiency gains are only 1.2 percentage points. Invoking skills indiscriminately only adds cost.
- Multi-Agent Communication and Entropy Tax: More than half of all tokens are consumed by internal coordination and self-correction, and this entropy cost rises rapidly as systems grow more complex.
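The context trap is easy to underestimate because the re-sent history grows with every turn. The toy model below assumes a fixed, hypothetical number of fresh tokens per turn and an agent that re-sends its full history on each call. Under those assumptions, total input grows quadratically with the number of turns. The optional `window` parameter keeps only the last few turns. It is a crude stand-in for context compression, not the semantic compression the article refers to. The model is also not calibrated to the ChatDev measurement.

```python
from typing import Optional, Tuple

def context_trap(turns: int, new_tokens: int, window: Optional[int] = None) -> Tuple[int, float]:
    """Count input tokens for an agent that re-sends earlier turns on every call.

    turns:      number of agent turns
    new_tokens: fresh tokens added per turn (a hypothetical constant)
    window:     if set, only the last `window` earlier turns are re-sent
                (a crude stand-in for context compression); None keeps full history
    Returns (total input tokens, fraction of input that was re-sent history).
    """
    total = 0
    resent = 0
    for turn in range(turns):
        earlier = turn if window is None else min(turn, window)
        history = earlier * new_tokens  # tokens the model has already seen, sent again
        total += history + new_tokens
        resent += history
    return total, (resent / total if total else 0.0)

if __name__ == "__main__":
    print(context_trap(10, 1_000))            # full history every turn
    print(context_trap(10, 1_000, window=2))  # keep only the last two earlier turns
```

Take 10 turns of 1,000 fresh tokens each. The full-history agent sends 55,000 input tokens, about 82% of which is re-sent history. Keeping only the last two earlier turns cuts the total to 27,000.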
Limitations on Effective Token Applications
Today's pure language-model paradigm has structural gaps when applied to the real world, which confines productive token use to a few highly digitized scenarios. Programming is the notable exception. Automated feedback loops such as compilers and test suites allow rapid iteration, a condition most other domains lack. Legal AI, for example, can only perform initial reviews. Veteran lawyers must then re-verify the findings almost from scratch, so feedback costs far exceed those in programming.
Expanding into the physical world brings further challenges, notably asymmetric verification costs. Embodied AI such as humanoid robots remains constrained by the gap between simulated and real environments. OpenAI's early Dactyl project lacked robustness in the real world despite extensive simulation training, and verification there cost several orders of magnitude more than in simulation. Unless the range of effective token applications expands, Token Diseconomy may persist for a long time.
Risk Ripple Effects and Solutions
Risk is unevenly distributed along the AI industry chain. Upstream hardware companies reap enormous profits, midstream model makers run deficits, and downstream users are starting to control costs, so risk concentrates in the middle of the chain. Some midstream companies engage in circular financing with upstream partners. This makes risk in private credit markets opaque and raises the danger of stocks and bonds crashing together if the bubble bursts. In addition, growing compute demand consumes large amounts of water and electricity. This raises power costs for residents near compute hubs and crowds out other civilian needs.
Solutions must address both supply and demand. On the supply side, precision techniques can reduce the token cost of each task. Examples include semantic context compression, skill streamlining, adaptive model routing, and budget-constrained host architectures. On the demand side, companies must strengthen AI cost management and identify intermediate scenarios in traditional industries where verification costs are reasonable. Ultimately, the industry must shift from showcasing technology to refining it and make ROI the gold standard for token spending. The goal is to raise net revenue by completing tasks with as few tokens as possible.
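Adaptive model routing and a budget cap can be combined in a few lines. The sketch below is a simple rule-based router, one possible form of the idea rather than any vendor's product. Each task carries a hypothetical difficulty rating and each model a hypothetical capability level. The router picks the cheapest capable model and refuses to exceed a hard dollar budget. Prices reuse the tiers quoted earlier; the capability levels and task figures are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass(frozen=True)
class Model:
    name: str
    in_price: float   # USD per million input tokens (tier prices quoted earlier)
    out_price: float  # USD per million output tokens
    capability: int   # hypothetical 1-3 skill level, not a published benchmark

@dataclass(frozen=True)
class Task:
    name: str
    difficulty: int   # hypothetical 1-3 rating supplied by the caller
    in_tokens: int
    out_tokens: int

MODELS = [
    Model("Haiku", 1.0, 5.0, 1),
    Model("Sonnet", 3.0, 15.0, 2),
    Model("Opus", 15.0, 75.0, 3),
]

def cost(model: Model, task: Task) -> float:
    """Dollar cost of running one task on one model (linear per-token billing)."""
    return (task.in_tokens * model.in_price + task.out_tokens * model.out_price) / 1_000_000

def route(tasks: List[Task], budget_usd: float) -> List[Tuple[str, Optional[str], float]]:
    """Assign each task to the cheapest capable model without exceeding the budget.

    Returns (task name, model name or None if deferred, cost charged) per task.
    """
    plan = []
    remaining = budget_usd
    for task in tasks:
        capable = [m for m in MODELS if m.capability >= task.difficulty]
        choice = min(capable, key=lambda m: cost(m, task), default=None)
        if choice is None or cost(choice, task) > remaining:
            plan.append((task.name, None, 0.0))  # deferred: nothing capable fits the budget
            continue
        charged = cost(choice, task)
        remaining -= charged
        plan.append((task.name, choice.name, charged))
    return plan

if __name__ == "__main__":
    tasks = [
        Task("rename variable", 1, 20_000, 2_000),
        Task("write unit tests", 2, 100_000, 20_000),
        Task("design refactor", 3, 400_000, 40_000),
    ]
    for row in route(tasks, budget_usd=5.0):
        print(row)
```

With a $5 budget, the rename goes to Haiku ($0.03) and the test-writing task to Sonnet ($0.60). The $9 refactor is deferred because no capable model fits in the remaining $4.37. A production router would estimate difficulty automatically and might escalate to a stronger model after a failed attempt; both are left out here.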
Editorial Opinion
In the short term, Microsoft’s decision to withdraw the Claude Code license is likely to influence other major companies. Organizations are expected to adopt stricter evaluations of token consumption and ROI for AI programming tools. Questions remain about the sustainability of Anthropic’s aggressive pricing strategy and the potential risks of customer attrition. Concurrently, model vendors will need to introduce features promoting efficient token usage.
In the long term, Token Diseconomy is a warning for the healthy development of the AI ecosystem. In the current structure, upstream semiconductor companies enjoy massive profits while midstream model makers and downstream users bear the brunt of the costs. This structure will inevitably have to change. For AI applications to expand beyond programming into other sectors, finding intermediate domains with workable feedback loops will be critical. As an editorial team, we urge corporate executives to assess this issue carefully.
References
- Huxiu: “Token Diseconomy,” published June 29, 2026