Alibaba Cloud Completely Redesigns Its Cloud Infrastructure for the Age of AI Agents

Alibaba Cloud integrates chip, cloud, and model technology to prepare for the era of AI agents. A closer look at Agentic Cloud and the latest roadmap for its proprietary chips.

8 min read Reviewed & edited by the SINGULISM Editorial Team

The Era of AI Agents as the “Clients” of Cloud Computing

At global tech conferences like Google I/O and AWS re:Invent, the dominant theme today is “AI agents.” Intelligent agents are no longer limited to being conversational assistants; they are now evolving into entities capable of autonomously performing tasks around the clock, integrated into search engines, browsers, smartphones, and even smart glasses. This trend raises a fundamental question for cloud providers: traditional cloud infrastructure was designed for human engineers to operate. But in a future where agents become the primary users, will these assumptions remain valid?

At the Alibaba Cloud Summit held on May 20, 2026, Alibaba Cloud provided a definitive answer to this question. Its bold strategy involves completely redesigning its cloud infrastructure as a unified package, seamlessly integrating cloud, chips, and AI models. Liu Weiguang, Senior Vice President of Alibaba Cloud, emphasized the inevitability of this transformation, stating, “Once agents reach critical mass, they can work tirelessly around the clock, leading to an infinite demand for AI and cloud services.”

Traditional Cloud vs. Cloud for the Agent Era: A Fundamental Shift

The core of this redesign lies in recognizing that the workload characteristics of AI agents are fundamentally different from those of traditional cloud computing.

In traditional cloud computing, workloads are relatively stable. Companies purchase ECS (Elastic Compute Service) to run websites or databases with predictable traffic and long-term resource utilization. The business model of cloud providers has been centered around leasing resources, with three pillars: computing, storage, and networking.

However, the operating mode of AI agents is entirely different. When executing tasks, agents may make dozens of model calls within milliseconds of one another, discard the environment immediately after completing a task, and might not initiate another task for seconds or minutes. The result is a highly irregular, bursty workload with a short life cycle: a massive load appears instantaneously and disappears just as quickly. Although agents appear only to call AI models, they actually require the full stack of AI infrastructure, including sandbox environments to run code, databases to store intermediate states, and networks to access external tools. Each agent task execution triggers the coordinated scheduling of multiple resources: computing, storage, networking, and model inference. The complexity of cloud computing in the agent era is on a completely different scale than before.

Liu Weiguang highlighted a fascinating case involving the company's intelligent agent product launched after the Lunar New Year. In the past, setting up cloud resources required manual intervention from human engineers logging into a console. Now, agents can autonomously activate cloud computing resources directly in the background. “Agents can now complete the setup of cloud computing resources in minutes, a process that used to take humans days,” Liu said. This shift underscores how the primary users of cloud infrastructure are transitioning from human engineers to AI agents.
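To make the short-lived, bursty task pattern described above concrete, here is a minimal Python sketch. It is an illustration only, not Alibaba Cloud's implementation: `ephemeral_sandbox`, `fake_model_call`, and `run_agent_task` are hypothetical names, the "sandbox" is just a temporary directory, and the model call is a stub.

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def ephemeral_sandbox():
    """Hypothetical sandbox: a throwaway directory removed when the task ends."""
    with tempfile.TemporaryDirectory(prefix="agent-task-") as path:
        yield Path(path)


def fake_model_call(step: int) -> str:
    # Stand-in for a model inference request; a real agent would call a model API.
    return f"result-{step}"


def run_agent_task(num_calls: int) -> dict:
    """Run one task: a burst of model calls, intermediate state, then teardown.

    Assumes num_calls >= 1.
    """
    with ephemeral_sandbox() as workdir:
        state_file = workdir / "state.txt"
        results = []
        for step in range(num_calls):
            results.append(fake_model_call(step))
            # Persist intermediate state inside the sandbox after every call.
            state_file.write_text("\n".join(results))
        summary = {
            "calls": len(results),
            "last": results[-1],
            "sandbox": str(workdir),
        }
    # Leaving the `with` block discards the environment immediately.
    summary["sandbox_exists_after"] = Path(summary["sandbox"]).exists()
    return summary


if __name__ == "__main__":
    print(run_agent_task(12))
```

Even in this toy version, a single task touches compute (the loop), storage (the state file), and model inference (the stubbed call), and nothing survives once the task finishes.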

What Is Agentic Cloud?

To enable agents to fully utilize the cloud, Alibaba Cloud has reengineered its product offerings along three dimensions: “skillification,” “MCP-ization,” and “CLI-ization.” This initiative transforms cloud products into standardized functional modules that agents can use as easily as calling functions. While traditional cloud product consoles are user-friendly for humans, they are meaningless to agents. What agents need are structured capability descriptions and clear calling protocols.

Alibaba Cloud has named this framework “Agentic Cloud.” Unlike the “AI Native Cloud,” which focuses on providing elastic and efficient computing resources for training and iterating large-scale AI models, Agentic Cloud delivers a suite of capabilities tailored to the runtime needs of intelligent agents. These include sandbox environments, AI gateways, memory management, security measures, and orchestration governance. In the past, the main goal of cloud providers when adopting AI was to sell computing resources to companies for training and inference. Now, however, Alibaba Cloud aims to transform the very cloud itself into an operating system for intelligent agents, a bold reimagining of the essence of cloud business.
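What a “structured capability description” plus a “clear calling protocol” might look like can be sketched in a few lines of Python. This is a hedged illustration, not Alibaba Cloud's actual interface: the `Capability` class, the `create_sandbox` operation, and its parameters are all hypothetical, and the layout is only loosely inspired by the name/description/parameter style of tool definitions used in agent tooling.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class Capability:
    """Hypothetical machine-readable description of one cloud operation."""
    name: str
    description: str
    required_params: Dict[str, type]
    handler: Callable[..., Dict[str, Any]]


def create_sandbox(image: str, ttl_seconds: int) -> Dict[str, Any]:
    # Hypothetical handler; a real one would provision an actual environment.
    return {"status": "created", "image": image, "ttl_seconds": ttl_seconds}


REGISTRY: Dict[str, Capability] = {
    "create_sandbox": Capability(
        name="create_sandbox",
        description="Start a short-lived sandbox for running code.",
        required_params={"image": str, "ttl_seconds": int},
        handler=create_sandbox,
    )
}


def describe(registry: Dict[str, Capability]) -> List[Dict[str, Any]]:
    """Return the catalogue an agent would read before deciding what to call."""
    return [
        {
            "name": cap.name,
            "description": cap.description,
            "params": {k: t.__name__ for k, t in cap.required_params.items()},
        }
        for cap in registry.values()
    ]


def call(registry: Dict[str, Capability], name: str,
         arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Validate a structured request and dispatch it to the handler."""
    cap = registry.get(name)
    if cap is None:
        return {"error": f"unknown capability: {name}"}
    for param, expected in cap.required_params.items():
        if param not in arguments:
            return {"error": f"missing parameter: {param}"}
        if not isinstance(arguments[param], expected):
            return {"error": f"parameter {param} must be {expected.__name__}"}
    # Pass only the declared parameters through to the handler.
    return cap.handler(**{k: arguments[k] for k in cap.required_params})


if __name__ == "__main__":
    print(describe(REGISTRY))
    print(call(REGISTRY, "create_sandbox",
               {"image": "python:3.12", "ttl_seconds": 60}))
```

The point of the design is that an agent never needs a console: it reads the output of `describe`, builds a dictionary of arguments, and receives either a structured result or a structured error it can act on.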

Solidifying the Physical Foundation with In-House Chips

If Agentic Cloud represents the architectural vision, then Alibaba Cloud’s in-house chips provide the physical foundation. At the summit, Alibaba Cloud unveiled its roadmap for proprietary chip development. T-Head, Alibaba’s semiconductor subsidiary, announced its next-generation AI chip, the Hanguang M890. With 144GB of memory and an inter-chip bandwidth of 800GB/s, it boasts three times the performance of its predecessor, the Hanguang 810E. The newly introduced ICN Switch 1.0 connectivity chip can link 128 AI chips into a single super-node server, with P2P latency of less than 150 nanoseconds. T-Head also plans to release the more powerful Hanguang V900 and Hanguang J900 chips within the next two years, advancing at a pace that matches the iterative development of large-scale models.

The Hanguang series has already shipped 560,000 units, serving over 400 clients across more than 20 industries, including telecommunications, automotive, and finance. Combined with Alibaba’s other in-house developments, such as the Yitian CPU, Pangu intelligent NICs, and Zhenyue storage controller chips, Alibaba’s chip portfolio has evolved from single-point breakthroughs to comprehensive coverage. Liu Weiguang reiterated the importance of integrating chips, cloud, models, and inference into a unified framework. “The ultimate value delivered to customers is the synergistic effect of seamlessly combined model capabilities, chip capabilities, and cloud capabilities,” Liu said.
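The quoted figures invite some back-of-the-envelope arithmetic. The sketch below uses only the numbers stated above (128 chips per super node, 144GB per chip, 800GB/s between chips). The derived values are simple products and ratios, not vendor specifications: whether the memory is addressable as a single pool and how much of the bandwidth is achievable in practice are assumptions the article does not settle.

```python
# Figures quoted in the article.
CHIPS_PER_SUPERNODE = 128
MEMORY_PER_CHIP_GB = 144
INTERCHIP_BANDWIDTH_GB_S = 800


def aggregate_memory_gb(chips: int, memory_per_chip_gb: int) -> int:
    """Total accelerator memory across a super node (a plain product)."""
    return chips * memory_per_chip_gb


def ideal_transfer_time_ms(payload_gb: float, bandwidth_gb_s: float) -> float:
    """Idealized time to move a payload between two chips.

    Ignores latency, protocol overhead, and contention, so it is a lower bound.
    """
    return payload_gb * 1000 / bandwidth_gb_s


if __name__ == "__main__":
    total = aggregate_memory_gb(CHIPS_PER_SUPERNODE, MEMORY_PER_CHIP_GB)
    print(f"Aggregate memory per super node: {total} GB")
    t = ideal_transfer_time_ms(1.0, INTERCHIP_BANDWIDTH_GB_S)
    print(f"Ideal time to move 1 GB between chips: {t:.2f} ms")
```

Under these assumptions a full super node carries 18,432GB of accelerator memory, and moving 1GB between two chips takes at least about 1.25 milliseconds.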

The Bailian Inference Platform as a “Production Workshop”

Between the chips and AI models lies the Bailian inference platform, which acts as a “production workshop.” Alibaba Cloud has built a massive GPU resource cluster on Bailian, tailored for agent-specific technological demands. Key features include pooled scheduling to optimize GPU utilization, context caching to eliminate redundant computation in multi-turn conversations and complex tasks, and elastic throughput scheduling to handle peaks and troughs in agent requests. This ensures stability during traffic spikes and prevents resource wastage during downtimes.

Another standout feature is the Agentic RL mechanism, which uses reinforcement learning based on real-world execution feedback to continuously refine model performance. This creates a sustainable loop of iterative improvement. Bailian also incorporates robust security governance capabilities, an essential feature in the context of autonomous agents. Without boundary constraints, agents running tasks 24/7 could become uncontrollable. Bailian’s security mechanisms ensure that agents always operate within pre-set authority boundaries.
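The idea behind context caching can be shown with a toy example. The sketch below is not Bailian's mechanism, which the article does not describe in detail; it is a generic prefix cache in Python, with a hypothetical `PrefixCache` class and a word count standing in for the expensive per-turn computation. When a conversation grows by one turn, only the new turn is processed.

```python
from typing import Dict, List, Tuple


class PrefixCache:
    """Toy prefix cache: reuse work already done for earlier conversation turns."""

    def __init__(self) -> None:
        # Maps a tuple of turns (a conversation prefix) to its computed state.
        self._store: Dict[Tuple[str, ...], int] = {}
        self.turns_computed = 0  # counts how much "expensive" work was done

    def _longest_cached_prefix(self, turns: List[str]) -> Tuple[int, int]:
        """Return (number of turns covered, cached state) for the best match."""
        for end in range(len(turns), 0, -1):
            key = tuple(turns[:end])
            if key in self._store:
                return end, self._store[key]
        return 0, 0

    def process(self, turns: List[str]) -> int:
        """Process a conversation, computing only turns not seen before."""
        start, state = self._longest_cached_prefix(turns)
        for i in range(start, len(turns)):
            # Stand-in for expensive per-turn computation: count the words.
            state += len(turns[i].split())
            self.turns_computed += 1
            self._store[tuple(turns[: i + 1])] = state
        return state


if __name__ == "__main__":
    cache = PrefixCache()
    cache.process(["list my instances", "stop the idle ones"])
    cache.process(["list my instances", "stop the idle ones", "report cost"])
    print(cache.turns_computed)
```

Without the cache, the two requests would process five turns in total; with it, `turns_computed` ends at three, because the second request reuses the state stored for the first two turns.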

Qwen3.7-Max Demonstrates 35 Hours of Autonomous Execution

On the model front, Alibaba’s latest release, Qwen3.7-Max, ranked first among domestic models in Arena’s global large-scale model blind test, rivaling top-tier models like GPT, Claude, and Gemini. Even more compelling is a real-world case study: running on the newly launched Hanguang M890 chip, Qwen3.7-Max autonomously worked for 35 hours based solely on task instructions. It independently developed and optimized production-grade AI computing kernels from scratch, achieving a tenfold performance improvement over the official version. There was no human intervention or intermediate guidance, just 35 hours from zero to production-level output. This case exemplifies the synergy between Alibaba’s proprietary chips and its AI models.

In just three months, Qwen has released three flagship versions (3.5, 3.6, and 3.7), demonstrating Alibaba’s deliberate acceleration of model evolution to meet the exponential growth in demand for agent-era capabilities. However, this rapid iteration is ultimately constrained by the availability of computational power, underscoring the importance of the integrated chip-cloud-model strategy.

How Alibaba Cloud’s Strategy Differs from Google’s

When compared to Google, Alibaba Cloud’s strategy stands out as unique. Google’s integration of its TPU chips with its Gemini models achieves unparalleled cost-performance efficiency within its proprietary deep learning framework, earning accolades both technically and in financial markets. Similarly, Alibaba’s approach of running in-house models on in-house chips leverages deep software-hardware integration to maximize each chip’s computational potential. However, Alibaba goes a step further by transforming its entire cloud business into a runtime environment for AI agents. While companies like AWS and Microsoft Azure are also reimagining their businesses and infrastructure for the agent era, few cloud providers globally are pursuing Alibaba Cloud’s comprehensive approach of developing and integrating chips, models, and cloud services into a unified offering.

A Need for a New Evaluation Framework

Over the past year, Alibaba Cloud has faced scrutiny for its unprecedented investment in AI infrastructure, with critics questioning whether such massive spending is justified or merely a ploy to boost stock prices through AI hype.

However, the comprehensive vision outlined at this summit suggests that such criticisms may be rooted in an outdated framework. Traditional metrics like market share, growth rates, or comparisons to AWS and Azure fail to capture the transformative changes occurring in cloud computing. If we accept the premise that agents will become the primary users of cloud services, then the very design philosophy of cloud infrastructure must evolve. Addressing agents’ irregular and bursty workload characteristics requires a full-stack redesign, encompassing chips, networks, storage, inference platforms, and security systems. Alibaba Cloud has demonstrated not only the ambition to lead this transformation but also a clear roadmap for doing so. While the ultimate shape of cloud computing in the agent era remains to be seen, the blueprint laid out by Alibaba Cloud could define the next decade of the cloud industry.

Frequently Asked Questions

How does Alibaba Cloud's "Agentic Cloud" differ from traditional cloud services?
Traditional cloud services are designed for human engineers to operate via consoles, whereas Agentic Cloud enables AI agents to directly call cloud resources. This involves transforming cloud products into standardized functional modules through "skillification," "MCP-ization," and "CLI-ization." It also includes features like sandboxes, AI gateways, memory management, security, and governance tailored for agents.
What is the role of Alibaba Cloud's proprietary "Hanguang" chip series?
Developed by Alibaba's semiconductor subsidiary T-Head, the Hanguang series are integrated AI chips for both training and inference. The latest Hanguang M890 offers triple the performance of its predecessor, with plans to release even more powerful versions in the next two years. These chips, alongside Alibaba's CPUs, NICs, and storage controller chips, form a unique, comprehensive data center chip matrix.
How does Qwen3.7-Max compare to other large-scale models?
Qwen3.7-Max ranked first among domestic models in a global blind test and comes close to matching leading models like GPT, Claude, and Gemini. Remarkably, it autonomously completed a 35-hour task on the Hanguang M890 chip, creating and optimizing production-level AI computing kernels with a tenfold performance improvement, without any human intervention.
Source: 钛媒体
