Token Factory Sees 20-Fold Growth in Six Months Amid AI Inference Boom
Spurred by surging demand for AI inference, Chinese startup Infinigence AI has achieved over 20-fold growth in its Token Factory business within six months. Its neutral infrastructure model, which sits between chips and models and optimizes how the two are matched, has created a virtuous cycle.
The narrative of the AI industry has shifted dramatically from training to inference. By 2026, global enterprise investments in inference infrastructure are projected to reach $68 billion, surpassing the $45 billion expected for training infrastructure, according to international data agencies. This pivotal moment, where the inference market outpaces the training market, is reshaping the structure of the industry itself.
According to reports from Huxiu, amidst this explosive demand, Chinese startup Infinigence AI (known as 無問芯穹, Wu Wen Xin Qiong in Chinese) has experienced rapid growth with its “Token Factory” business model. The company’s Agentic MaaS (Model as a Service) business saw a more than 20-fold increase in token calls between December 2025 and April 2026, with over 95% of this demand stemming from agents.
The Turning Point: Inference Surpassing Training
Until now, the primary focus of the AI industry has been on training. Those with more GPUs and the capacity to train larger models dominated the market. Since the end of 2025, however, the situation has shifted dramatically. As AI evolves beyond simple chatbot interactions to perform complex tasks such as coding, contract review, and project follow-up, token consumption has risen by a factor of tens to hundreds.
Infinigence AI’s data vividly illustrates this shift. More than 95% of tokens called on its platform originate from agent-related use cases. The explosion of inference demand has shifted the value center of the AI industry downstream, transforming the infrastructure layer—once considered just a pipeline—into the “hub of token production.”
A Neutral Player Bridging Chips and Models
Infinigence AI’s business model is distinctive. The company does not develop its own large language models (LLMs), manufacture chips, or provide consumer-facing applications. Instead, it operates as a neutral player between chips and models, scheduling, matching, and optimizing computational resources, which remain in short supply relative to demand, so that they are converted into tokens efficiently. In essence, it serves as a hub for token production within the AI industry chain.
Co-founder and CEO Xia Lixue stated in the Huxiu interview, “Those who are truly profiting from AI never flinch at prices; they simply quietly seek ways to reduce costs.” Leveraging its neutrality, the company has positioned itself at the center of collaboration among chip manufacturers, model developers, and application companies.
The Virtuous Cycle of the Value Chain
Infinigence AI operates based on a core formula: “AI productivity = scale of intelligence × token production efficiency × token value conversion.” To date, the company has achieved a five- to ten-fold improvement in cost performance in scenarios involving LLMs with 1 trillion parameters.
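The multiplicative structure of this formula can be illustrated with a minimal sketch. The function name, the unitless inputs, and the decision to treat the reported five- to ten-fold cost-performance gain as a change in the token production efficiency factor are all assumptions made for illustration; the article does not define units for any of the three terms.

```python
def ai_productivity(intelligence_scale: float,
                    token_efficiency: float,
                    value_conversion: float) -> float:
    """Product of the three factors quoted in the article (illustrative units)."""
    return intelligence_scale * token_efficiency * value_conversion


# Hypothetical baseline: every factor normalized to 1.0.
baseline = ai_productivity(1.0, 1.0, 1.0)

# Assumption: the 5x-10x cost-performance gain acts on the efficiency factor only.
for gain in (5.0, 10.0):
    improved = ai_productivity(1.0, gain, 1.0)
    print(f"{gain:.0f}x efficiency -> {improved / baseline:.0f}x productivity")
```

Because the factors multiply, a gain in any one of them scales the whole product by the same ratio as long as the other two are held fixed.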
The greater and more diverse the demand, the more room there is to optimize chip and model combinations. This results in the production of more stable, high-cost-performance tokens, which in turn drives further demand. This virtuous cycle—or flywheel effect—is now in motion.
The current pricing model is a pay-per-token system, akin to the cost-per-thousand-impressions (CPM) model in the advertising industry. Costs reduced through technological optimization are converted directly into gross profit and reinvested into research and development. Users pay for a clearly understood unit of value without needing to concern themselves with the underlying hardware. This structure explains the seemingly paradoxical phenomenon where token production costs continue to decrease while selling prices rise.
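As a rough sketch of how a cost reduction flows into gross profit under pay-per-token pricing, consider the arithmetic below. All prices, costs, and volumes are hypothetical placeholders chosen for easy calculation; none of them come from Infinigence AI or the Huxiu report.

```python
def gross_profit(tokens_sold: int,
                 price_per_million: float,
                 cost_per_million: float) -> float:
    """Gross profit when both price and cost are quoted per million tokens."""
    millions = tokens_sold / 1_000_000
    return millions * (price_per_million - cost_per_million)


# Hypothetical figures: 10 billion tokens sold at a fixed price of 2.0 per million.
tokens = 10_000_000_000
before = gross_profit(tokens, price_per_million=2.0, cost_per_million=1.5)
after = gross_profit(tokens, price_per_million=2.0, cost_per_million=0.5)

print(f"gross profit before optimization: {before:,.0f}")
print(f"gross profit after optimization:  {after:,.0f}")
```

With the selling price held constant, every unit of production cost removed by optimization shows up as gross profit, which is the mechanism the article describes.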
Opportunities for Domestic Chips in Prefill and Decode Separation
LLM inference can be divided into two stages: the computation-intensive prefill stage and the memory-access-intensive decode stage. Each stage has distinct requirements for the chips involved.
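One common way to see why the two stages stress hardware differently is to estimate arithmetic intensity, meaning floating-point operations per byte of model weights read from memory. The sketch below uses a deliberately simplified model that is not from the article: it assumes a dense model, roughly 2 FLOPs per parameter per token, 16-bit weights read once per forward pass, and it ignores attention and KV-cache traffic. The function name and the 2,048-token prompt length are illustrative.

```python
def arithmetic_intensity(tokens_per_pass: int,
                         bytes_per_param: float = 2.0) -> float:
    """Approximate FLOPs per byte of weights read in one forward pass.

    Simplifying assumptions: ~2 FLOPs per parameter per token, weights read
    once per pass, attention and KV-cache memory traffic ignored.
    """
    flops_per_param = 2.0 * tokens_per_pass
    return flops_per_param / bytes_per_param


# Prefill: the whole prompt (hypothetically 2,048 tokens) goes through in one pass.
prefill = arithmetic_intensity(tokens_per_pass=2048)

# Decode: one new token per pass for a single, unbatched request.
decode = arithmetic_intensity(tokens_per_pass=1)

print(f"prefill: ~{prefill:.0f} FLOPs per weight byte")
print(f"decode:  ~{decode:.0f} FLOPs per weight byte")
```

Under these assumptions prefill performs far more computation per byte fetched, so it tends to be limited by compute throughput, while unbatched decode tends to be limited by memory bandwidth. That difference is why the two stages can be served by chips with different strengths.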
Domestic chips have already moved past the question of whether they are usable at all. The prefill stage in particular aligns well with the capabilities of current domestically produced chips, and this pairing has already been deployed in practice. According to Huxiu’s report, even setting aside the narrative of domestic chip development, the global supply of AI computational resources is insufficient. Infinigence AI’s core business revolves around addressing this shortage, so its growth logic does not depend on that narrative.
Lower Token Costs to Enable Small AI-Native Organizations
Xia Lixue predicts that token costs have the potential to decrease by an additional one to two orders of magnitude (10 to 100 times). As these costs drop, we may witness the emergence of numerous small-scale AI-native organizations with just 10 to 20 members. These organizations would integrate humans and AI deeply, achieving productivity levels that far exceed those of traditional teams of similar size.
Such organizations are likely to emerge sooner in industries that have already undergone digital transformation (DX). AI demand continues to outpace supply, so the market is far from a zero-sum game. In this scenario, stakeholders across the industry chain can all benefit from creating value, and the paradigm shift should only open up new avenues for growth.
Editorial Opinion
In the short term, competition within the Token Factory model is likely to intensify. If neutral infrastructure players like Infinigence AI continue to succeed, more startups could adopt similar models, and cloud giants might accelerate their efforts to capture the market. In the next three to six months, the Japanese market may also see the emergence of companies specializing in inference infrastructure optimization, as well as revised token pricing models from existing cloud providers.
From a long-term perspective, this business model has the potential to further evolve the division of labor within the AI industry. As the layers of chips, models, infrastructure, and applications become increasingly optimized independently, the barriers to AI adoption will be significantly lowered. In the Asia-Pacific region, particularly, the cost-sensitive engineering culture is expected to drive the development of Token Factories. Within one to three years, a highly liquid market for token-based resource transactions could become commonplace, distinct from the traditional GPU leasing market.
The editorial team is particularly interested in the effectiveness of the technological barriers that neutral infrastructure players like Infinigence AI are establishing. While the optimization of chip and model combinations may appear to be a software engineering challenge, its value will fluctuate in response to changing demand patterns.
References
- Huxiu — Published on 2026-06-19
Frequently Asked Questions
- What is a Token Factory?
- In AI inference, a Token Factory is a business model that acts as a neutral infrastructure layer between chips and models, scheduling and optimizing computational resources to efficiently convert them into tokens. A Token Factory operator does not develop its own chips or models but instead connects the various players in the ecosystem.
- What market changes does Infinigence AI's growth indicate?
- It highlights the shift in the AI industry's value from training to inference. The proliferation of agents has increased token consumption by tens to hundreds of times, signaling an era where investments in inference infrastructure surpass those in training infrastructure.
- What happens if token costs drop further?
- A significant decrease in token costs could lead to the emergence of numerous small-scale, AI-native organizations with 10–20 members. These teams, deeply integrated with AI, are expected to achieve productivity levels far beyond those of traditional teams, potentially transforming industry structures.