MAX Models Now Support Apple Silicon GPUs, Covering All Generations Through M5
Modular expands MAX framework support for Apple silicon GPUs in its latest release. Text and image generation models now run on every generation from M1 to M5.
Modular announced on June 27, 2026 (local time) that its AI/ML framework "MAX" can now run models on Apple silicon GPUs. Starting with version 26.4, text-generation LLMs, vision models, and image diffusion models run on Apple silicon GPUs of every generation from M1 to M5. Previously, MAX's Apple silicon GPU support was limited to writing Mojo code or executing basic MAX graphs; this update lets many practical models run natively.
Supported Hardware and Performance Characteristics
MAX supports GPUs across every generation of Apple silicon, from M1 to M5. Because the GPU architecture differs between generations, however, not every model performs equally well on every chip. Modular's engineering team has tested primarily on M3 to M5 systems, so some model and hardware combinations on M1 and M2 have not been thoroughly verified. The company asks users to report any issues on its GitHub issue tracker.
M5 is a notable step forward: it adds new Neural Accelerators with dedicated matrix-multiplication units. Modular engineers Preston and Fabio have been developing kernels optimized for these units and report fast execution in internal testing. The company has not yet published direct benchmark comparisons with MLX or other frameworks, but says the speeds it has reached are practical for real use.
Getting Started and Limitations
To try out MAX’s Apple silicon GPU support, users can set up MAX from Modular’s repository and run models via the command line. For instance, a small LLM like Qwen/Qwen3.5-0.8B can generate text directly with the following command:
max generate --model-path=Qwen/Qwen3.5-0.8B --device-memory-utilization 0.5 --max-batch-size 1 --prompt "The sky is blue because"
To expose the model through an HTTP endpoint instead, use the max serve command. Note the two memory-related flags in the command above, --device-memory-utilization and --max-batch-size. Because Apple silicon's CPU and GPU share a single pool of unified memory, these flags keep MAX from reserving more memory than it needs, so the rest of the system stays stable and responsive.
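As a rough illustration of why the utilization flag matters on a shared-memory machine, here is a back-of-the-envelope sketch. It assumes, as a simplification, that --device-memory-utilization is a fraction of the Mac's total unified memory; MAX's actual accounting may differ (for example, it may apply the fraction to currently free memory). The helper name mem_budget_gb is hypothetical and not part of the MAX CLI.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate how much unified memory MAX could claim.
# ASSUMPTION: --device-memory-utilization is treated here as a fraction of
# total unified memory; MAX's real accounting may be different.
# Usage: mem_budget_gb <total_unified_memory_gb> <utilization_fraction>
mem_budget_gb() {
  awk -v total="$1" -v frac="$2" 'BEGIN { printf "%.1f\n", total * frac }'
}

# Under this assumption, a 16 GB Mac with --device-memory-utilization 0.5
# leaves roughly half of unified memory for macOS and other applications.
mem_budget_gb 16 0.5   # prints 8.0
```

Choosing 0.5 rather than a value close to 1.0 trades some KV-cache or batch capacity for headroom, which matters more on a Mac than on a dedicated GPU server because the same memory also backs the operating system.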
Systems with more than 15GB of free RAM can also run FLUX.2 [klein], a 4-billion-parameter image generation model, locally. Modular's simple_offline_generation sample generates 256x256-pixel images, and the model can also be served through an Open Responses endpoint.
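To put the 15GB figure in context, the weights alone of a 4-billion-parameter model can be estimated as parameters times bytes per parameter. This sketch assumes 16-bit (bf16 or fp16) weights, or 2 bytes per parameter; Modular has not published a memory breakdown for FLUX.2 [klein], and the helper name weights_gb is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: weight-only memory footprint of a model.
# ASSUMPTION: caller supplies bytes per parameter (2 for bf16/fp16).
# Activations, other pipeline components, and framework overhead are NOT included.
# Usage: weights_gb <params_in_billions> <bytes_per_param>
weights_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN {
    bytes = p * 1e9 * b
    # Report both decimal gigabytes and binary gibibytes.
    printf "%.1f GB (%.2f GiB)\n", bytes / 1e9, bytes / (1024 ^ 3)
  }'
}

weights_gb 4 2   # prints 8.0 GB (7.45 GiB)
```

Under this assumption, roughly 8GB of the requirement is weights; the remainder would be headroom for the rest of the generation pipeline, intermediate activations, and macOS itself.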
Future Optimizations and Development Roadmap
Modular continues to expand model coverage and improve performance. It cautions that nightly builds may temporarily regress in performance as kernels are tuned and model support changes. Much optimization work remains, particularly for generations before M5, and improvements are expected to roll out incrementally in future updates.
The company has encouraged users to “follow the latest improvements in nightly builds,” demonstrating its commitment to continuous development.
Editorial Opinion
This development marks a significant milestone in positioning Apple silicon-equipped Macs as practical platforms for AI/ML development. The existence of M5’s Neural Accelerator-specific kernels showcases a convergence of Apple’s vision for on-chip AI processing and Modular’s technical strategy.
In the short term, this broadens the environment for Mac users to experiment with LLMs and image generation models locally, reducing dependence on cloud GPUs and lowering the barrier for prototyping. This is particularly appealing for privacy-conscious organizations that prioritize local inference.
From a long-term perspective, competition with frameworks such as MLX and PyTorch's MPS backend is likely to intensify. Modular's key differentiators are its integration with the Mojo language and its combination of graph optimization with hardware-specific kernels. However, the unified memory capacity of Apple silicon (up to 512GB on the M3 Ultra) remains a bottleneck for running the largest models. Overcoming this limitation will be critical for establishing Macs as fully fledged local AI workstations.
References
- "MAX models can now run on Apple silicon GPUs," Modular Forum, June 27, 2026
Frequently Asked Questions
- What is MAX?
- MAX is an AI/ML framework developed by Modular, closely integrated with the company's Mojo programming language. It features a graph-based execution model and supports multiple hardware backends.
- Can MAX run on Apple silicon chips prior to M5?
- Yes, it can run on older generations from M1 to M4, but testing has primarily focused on M3 to M5. Some compatibility issues may arise with models on M1 and M2, and Modular encourages users to report any problems.
- How does MAX differ from MLX?
- MLX is an open-source machine learning framework from Apple, optimized for Apple silicon. MAX, by contrast, builds on its integration with the Mojo language and emphasizes graph optimization. Direct benchmark comparisons between the two frameworks have not yet been published.