Qualcomm Introduces AI Accelerator "HBC" with Compute Units Under DRAM

Qualcomm unveils its new "High-Bandwidth Compute (HBC)" AI accelerator architecture for data centers, promising 768GB of memory and 133TB/s bandwidth.

Reviewed & edited by the SINGULISM Editorial Team

Qualcomm has unveiled a new architecture as part of its efforts to regain ground in the data center AI infrastructure sector. During an investor presentation held on June 30, 2026, the company announced its near-memory computing technology “High-Bandwidth Compute (HBC),” which places compute units beneath DRAM layers. The Register reported on the development.

The technology is expected to be integrated into the AI250 series Dragonfly rack systems. It is seen as Qualcomm's strategic bid to enter the data center AI accelerator market, which has been dominated by NVIDIA and AMD.

Overview of the HBC Architecture

The core of HBC is the placement of part of the XPU (accelerator processor) compute circuitry directly beneath the DRAM stack. According to Qualcomm, this approach retains the low-latency performance characteristic of SRAM while providing the high capacity typical of High Bandwidth Memory (HBM).

Tony Pialis, Executive Vice President of Data Center at Qualcomm, stated during the investor presentation, “We provide all the performance benefits of SRAM while achieving the density and memory capacity of HBM stacks.”

This architecture forms a single integrated compute-memory module. Unlike traditional GPU architectures, where compute chips and HBM are packaged separately, HBC physically integrates memory and compute functions. This integration aims to ease the “memory wall” bottleneck associated with data transfer.

Performance Metrics of the AI250

According to Qualcomm, the AI250 accelerator equipped with HBC offers 768GB of memory per card and an effective memory bandwidth of 133TB/s. By comparison, NVIDIA’s Groq 3 LPU offers 500MB of SRAM and 150TB/s of bandwidth. The comparison shows Qualcomm prioritizing memory capacity and accepting somewhat lower bandwidth as the trade-off.
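To put the capacity-versus-bandwidth trade-off in concrete terms, the Python sketch below computes how much larger the AI250's memory is and how long a single full pass over each memory would take at the quoted bandwidth. It assumes decimal units (1GB = 10^9 bytes, 1TB = 10^12 bytes) and treats the quoted bandwidths as sustained rates, which neither vendor has confirmed. The function and variable names are illustrative only.

```python
# Rough comparison based only on the figures quoted in this article.
# Assumptions: decimal units, and quoted bandwidth treated as a sustained rate.

GB = 1e9   # bytes
MB = 1e6   # bytes
TB = 1e12  # bytes


def full_pass_seconds(capacity_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Time to read the entire memory once at the given bandwidth."""
    return capacity_bytes / bandwidth_bytes_per_s


ai250_capacity = 768 * GB
ai250_bandwidth = 133 * TB   # "effective" bandwidth claimed by Qualcomm

lpu_capacity = 500 * MB
lpu_bandwidth = 150 * TB

capacity_ratio = ai250_capacity / lpu_capacity
ai250_pass = full_pass_seconds(ai250_capacity, ai250_bandwidth)
lpu_pass = full_pass_seconds(lpu_capacity, lpu_bandwidth)

print(f"AI250 capacity is {capacity_ratio:.0f}x the LPU's SRAM")
print(f"AI250 full-memory pass: {ai250_pass * 1e3:.2f} ms")
print(f"LPU full-memory pass:   {lpu_pass * 1e6:.2f} microseconds")
```

The full-pass time is only a crude proxy: it says nothing about latency, access patterns, or how much of the quoted bandwidth a real workload can actually use.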

However, Qualcomm’s claims hinge on the qualifier “effective” memory bandwidth. For the Dragonfly systems based on the AI200 architecture, Qualcomm has similarly claimed a total effective memory bandwidth of 414TB/s across 56 chips. In response to inquiries from The Register, Qualcomm explained that this figure represents the “pure physical bandwidth of the LPDDR interface.” Achieving that bandwidth with LPDDR5x at 8800MT/s would require a memory bus roughly 6,720 bits wide on each chip, yet the company has not disclosed the specifics of this implementation.
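The bus-width figure can be checked with simple arithmetic. The Python sketch below divides the claimed 414TB/s across 56 chips and then by the per-pin data rate of an 8800MT/s interface. It assumes decimal terabytes, one bit per pin per transfer, and an even split of bandwidth across chips; the function name is illustrative, not a Qualcomm term.

```python
# Back-of-the-envelope check of the bus width implied by the AI200 figures.
# Assumptions: 1 TB = 1e12 bytes, one bit per pin per transfer,
# and bandwidth divided evenly across all chips.


def required_bus_width_bits(total_tb_per_s: float, chips: int, mt_per_s: float) -> float:
    """Bus width (in bits) each chip needs to reach its share of the total."""
    per_chip_bytes_per_s = total_tb_per_s * 1e12 / chips
    bytes_per_s_per_pin = mt_per_s * 1e6 / 8  # 8 bits per byte
    return per_chip_bytes_per_s / bytes_per_s_per_pin


per_chip_tb_per_s = 414 / 56
width_bits = required_bus_width_bits(414, 56, 8800)

print(f"Per-chip bandwidth: {per_chip_tb_per_s:.2f} TB/s")
print(f"Required bus width: about {width_bits:.0f} bits per chip")
```

The result lands within a bit of the 6,720 figure cited above, which suggests that number was derived on a per-chip basis.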

Qualcomm also states that the effective bandwidth of the AI250 will be 18 times that of the AI200, and its next-generation AI300 is expected to achieve 54 times the bandwidth. These figures, according to Qualcomm, are due to the unique bandwidth amplification effects of the HBC architecture.
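These multipliers can be cross-checked against the other numbers in this article. The Python sketch below assumes that the multipliers apply to the AI200's per-chip share of the 414TB/s total (414TB/s divided by 56 chips), which is an inference from the figures here and not something Qualcomm has stated. The AI300 value it prints is therefore derived arithmetic, not an announced specification.

```python
# Cross-check of the generational bandwidth multipliers.
# Assumption (not confirmed by Qualcomm): the multipliers are relative to
# the AI200's per-chip share of the 414 TB/s, 56-chip Dragonfly total.

AI200_TOTAL_TB_PER_S = 414
AI200_CHIPS = 56

ai200_per_chip = AI200_TOTAL_TB_PER_S / AI200_CHIPS


def scaled_bandwidth(multiplier: float) -> float:
    """Per-device bandwidth in TB/s implied by a multiplier over the AI200."""
    return ai200_per_chip * multiplier


ai250_implied = scaled_bandwidth(18)
ai300_implied = scaled_bandwidth(54)

print(f"AI200 per chip: {ai200_per_chip:.2f} TB/s")
print(f"AI250 implied (18x): {ai250_implied:.1f} TB/s")
print(f"AI300 implied (54x): {ai300_implied:.1f} TB/s")
```

Under that assumption the 18x multiplier reproduces the 133TB/s quoted for the AI250, so the two claims are at least internally consistent.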

A Turning Point in Data Center Strategy

Qualcomm has long been involved in AI processing through its Snapdragon processors for smartphones, which come equipped with NPUs. In the data center, however, the company has kept a low profile compared with giants like NVIDIA and AMD and startups like Cerebras. The announcement of HBC signals Qualcomm’s commitment to making a significant impact in the AI infrastructure market.

Following the launch of the AI200-based Dragonfly system in 2026, Qualcomm plans to roll out the AI250 series in 2027. With its near-memory computing technology, Qualcomm aims to disrupt the existing GPU-centric market structure.

However, the true value of HBC depends on real-world workload performance and power efficiency. As the term “effective bandwidth” suggests, Qualcomm’s claims may be based on theoretical peak values. Until practical inference performance and power consumption metrics for the entire system are made public, the market’s evaluation will likely remain cautious.

Editorial Opinion

In the short term, Qualcomm’s HBC announcement has the potential to introduce a new option in a data center AI accelerator market dominated by NVIDIA and AMD. Particularly for inference phases of large language models (LLMs), the large memory capacity could offer a significant advantage. However, by the time Qualcomm launches its product in 2027, NVIDIA is likely to have introduced its post-Blackwell architecture, intensifying competition further.

To compete on equal footing with GPU vendors, Qualcomm will need to demonstrate not only superior memory capacity but also advantages in software ecosystems and total cost of ownership (TCO). At this stage, the vague definition of “effective bandwidth” calls for independent benchmarking to substantiate Qualcomm’s claims. From the editorial team’s perspective, the first critical test of this architecture’s reliability will be whether Qualcomm can provide third-party testing environments early on.

In the long term, approaches like HBC’s near-memory computing could become a practical solution to improve overall system performance, especially as the limits of semiconductor process miniaturization become more apparent.

Frequently Asked Questions

What is the standout feature of the HBC architecture?
HBC employs a near-memory computing approach by placing part of the accelerator's compute circuits beneath the DRAM stack. This design is said to combine SRAM-level low latency with HBM-like high memory capacity.

How realistic is the AI250's claimed effective bandwidth of 133TB/s?
The figure carries the qualifier "effective," and Qualcomm has not disclosed how it is achieved. For the AI200-based Dragonfly's 414TB/s figure, Qualcomm told The Register that it represents the "pure physical bandwidth of the LPDDR interface," which would imply an extremely wide memory bus. For now, the AI250 figure appears to be theoretical and requires real-world validation.

How does the AI250 compare to competing products from NVIDIA and AMD?
While NVIDIA's Groq 3 LPU offers 500MB of SRAM and 150TB/s bandwidth, the AI250 emphasizes its advantage in memory capacity with 768GB and claims 133TB/s bandwidth. Although it lags behind in bandwidth, its large memory capacity could be advantageous for inference workloads that demand more memory.

Source: The Register
