FEX 2607: Major Enhancements for x86 Emulation on ARM
Monthly update for the FEX emulator, version 2607, introduces optimizations for unreleased 256-bit SVE2 hardware, CUDA thunking support, and Unixlib for Proton/WINE.
FEX, the Valve-supported x86/x86_64 emulator for ARM, has released its monthly feature update, FEX 2607. According to Phoronix, the release adds optimizations for ARM processors implementing 256-bit SVE2, which are not yet on the market, experimental CUDA thunking intended to let x86_64 NVIDIA CUDA applications run transparently, and the start of a Unixlib implementation for Proton/WINE.
Early Support for Unreleased CPUs
The most prominent feature of FEX 2607 is its optimization for ARM processors with 256-bit SVE2, which are not yet available on the market. SVE2 (Scalable Vector Extension 2) is a vector instruction set introduced with the Armv9-A architecture. Its vector length is not fixed: each implementation chooses a width between 128 and 2048 bits, in 128-bit steps.
The FEX development team has worked on implementing x86’s Advanced Vector Extensions (AVX) instructions using 256-bit SVE2. AVX is a cornerstone of SIMD (Single Instruction Multiple Data) processing on x86 processors and is used extensively in video processing, scientific computing, and machine learning. Because AVX registers are 256 bits wide, a host with 256-bit SVE2 can hold each one in a single vector register instead of splitting it across two 128-bit registers, which is where FEX expects the gain in AVX emulation performance to come from.
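To make the width argument concrete, here is a minimal conceptual sketch in Python. It is not FEX code: the function name `emulate_avx_add` and the list-of-lanes register model are assumptions made for illustration. It only counts how many host-sized operations are needed to cover one 256-bit AVX addition of eight 32-bit lanes.

```python
# Conceptual sketch only: vector registers are modeled as lists of 32-bit lanes.
# Nothing here reflects FEX's actual code generator.
from __future__ import annotations

LANE_BITS = 32  # each lane models one packed 32-bit integer


def emulate_avx_add(a: list[int], b: list[int], host_vector_bits: int) -> tuple[list[int], int]:
    """Add two 8-lane (256-bit) vectors in host-sized chunks.

    Returns the result lanes and the number of host vector operations used.
    """
    lanes_per_op = host_vector_bits // LANE_BITS
    result: list[int] = []
    ops = 0
    for start in range(0, len(a), lanes_per_op):
        chunk_a = a[start:start + lanes_per_op]
        chunk_b = b[start:start + lanes_per_op]
        # One modeled host instruction handles one chunk; lanes wrap at 32 bits.
        result.extend((x + y) & 0xFFFFFFFF for x, y in zip(chunk_a, chunk_b))
        ops += 1
    return result, ops


a = list(range(8))
b = [10] * 8
print(emulate_avx_add(a, b, 128)[1])  # host operations with 128-bit vectors
print(emulate_avx_add(a, b, 256)[1])  # host operations with 256-bit vectors
```

In this model the 128-bit host needs two operations per AVX instruction and the 256-bit host needs one. Real gains depend on far more than instruction count, so the sketch shows only the direction of the effect.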
Although no ARM processors with 256-bit SVE2 have been released yet, the optimization is expected to benefit future ARM-based Linux devices, a category Valve has entered with its Steam Frame headset. The work is seen as a strategic move to maximize emulator performance as soon as compatible hardware becomes available.
Improvements to the JIT Compiler
FEX 2607 also introduces several updates and enhancements to its JIT (Just-In-Time) compiler. The JIT compiler is a core component of emulation, responsible for dynamically converting x86 instructions into ARM instructions at runtime. The efficiency and accuracy of this conversion process are directly tied to the overall performance and compatibility of the emulator.
FEX employs a JIT method that caches and reuses converted code to reduce conversion costs. In this release, optimizations have been made to the code generation paths to better handle complex x86 instruction patterns, and the precision of memory access pattern detection has been improved. These advancements are expected to enhance the stability of the emulator, particularly for resource-intensive gaming applications.
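The caching idea described above can be sketched as follows. This is a simplified model, not FEX's implementation: `TranslationCache`, `fake_translate`, and the use of a guest program counter as the cache key are all assumptions chosen for illustration.

```python
# Simplified model of a JIT translation cache; all names are hypothetical.
from typing import Callable, Dict

HostBlock = Callable[[], str]  # stands in for a block of generated host code


class TranslationCache:
    """Translate each guest block once, then reuse the result on later visits."""

    def __init__(self, translate: Callable[[int], HostBlock]) -> None:
        self._translate = translate
        self._blocks: Dict[int, HostBlock] = {}
        self.translations = 0  # counts how often the expensive path runs

    def lookup(self, guest_pc: int) -> HostBlock:
        block = self._blocks.get(guest_pc)
        if block is None:
            # Cache miss: pay the translation cost once and remember the result.
            block = self._translate(guest_pc)
            self._blocks[guest_pc] = block
            self.translations += 1
        return block

    def invalidate(self, guest_pc: int) -> None:
        # Needed when the guest overwrites code that was already translated.
        self._blocks.pop(guest_pc, None)


def fake_translate(guest_pc: int) -> HostBlock:
    # Placeholder for real code generation.
    return lambda: f"host code for guest block at {guest_pc:#x}"


cache = TranslationCache(fake_translate)
for _ in range(3):
    cache.lookup(0x1000)()  # a hot loop re-enters the same block
print(cache.translations)   # translation ran once despite three executions
```

A dictionary keyed by guest address keeps the sketch short. A production JIT would also have to deal with block chaining, memory limits, and thread safety, none of which are modeled here.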
First Steps Toward CUDA Thunking
FEX 2607 begins work on CUDA thunking, a feature intended to let x86_64 NVIDIA CUDA applications run on ARM systems. The work is at an early stage, but it could already allow certain dynamically linked x86_64 CUDA software to start running.
Thunking means forwarding library calls made by emulated code to a native library on the host instead of emulating the library itself. In FEX’s case, the mechanism replaces x86_64 CUDA runtime calls with their ARM-native counterparts, enabling x86_64-compiled CUDA applications to use GPUs on ARM systems.
The feature has drawn attention because of NVIDIA’s DGX Spark, whose GB10 Grace Blackwell superchip pairs ARM-based CPU cores with an NVIDIA GPU. Should FEX fully support CUDA thunking, it may enable x86_64 CUDA applications to run on such systems with near-native performance, opening the door to broader use in AI and machine learning workflows.
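The dispatch decision behind thunking can be sketched in a few lines. This is an illustration under stated assumptions, not FEX's thunk mechanism: `ThunkTable`, the library name `libexample.so`, and the symbol `vector_sum` are hypothetical, and argument marshalling between guest and host ABIs is left out entirely.

```python
# Illustrative thunk dispatch; every name here is hypothetical.
from typing import Any, Callable, Dict, Tuple


class ThunkTable:
    """Route a guest library call to a native host function when one is registered."""

    def __init__(self) -> None:
        self._host_impls: Dict[Tuple[str, str], Callable[..., Any]] = {}

    def register(self, library: str, symbol: str, host_fn: Callable[..., Any]) -> None:
        self._host_impls[(library, symbol)] = host_fn

    def call(self, library: str, symbol: str, *args: Any,
             emulate: Callable[..., Any]) -> Tuple[str, Any]:
        host_fn = self._host_impls.get((library, symbol))
        if host_fn is not None:
            # Thunked path: the call leaves the emulator and runs natively.
            return ("native", host_fn(*args))
        # No thunk available: fall back to running the guest library under emulation.
        return ("emulated", emulate(*args))


def slow_emulated_sum(values):
    # Stand-in for executing the x86_64 library code through the JIT.
    total = 0
    for v in values:
        total += v
    return total


table = ThunkTable()
table.register("libexample.so", "vector_sum", sum)  # native replacement

print(table.call("libexample.so", "vector_sum", [1, 2, 3], emulate=slow_emulated_sum))
print(table.call("libexample.so", "other_fn", [1, 2, 3], emulate=slow_emulated_sum))
```

Both paths return the same value. The difference is where the work happens, which is why thunking only applies to calls that cross a dynamic library boundary the emulator can intercept.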
Revolutionizing Architecture with Unixlib
To improve performance within Proton/WINE environments, FEX 2607 has also begun implementing a new library system known as “Unixlib.” Currently, FEX functions are implemented as DLL files loaded by WINE. However, this design imposes several engineering constraints.
By isolating Unixlib as a shared object (SO) file, FEX is expected to overcome architectural limitations and allow for more dynamic expansion. The SO file approach leverages the standard dynamic linking mechanism in Linux, enabling more efficient low-level operations such as memory management and multi-threading.
This change matters most for the WINE/Proton environments used for gaming. Many Windows games demand advanced memory management and multi-threaded processing, so the flexibility of the emulation framework directly affects compatibility and performance.
Relationship with Steam Frame
Valve, which supports the development of FEX, has announced the Steam Frame, an ARM-based standalone VR headset. The company’s Steam Deck uses an x86-based AMD APU, so the Steam Frame depends on translation to run the existing x86 Steam library.
FEX is positioned as a foundational technology to enable x86/x86_64 Windows games and Linux applications to run on ARM hardware. Through its integration with Steam Play (Proton), FEX could make it possible to play Windows games on ARM devices.
The Steam Frame’s announced processor, Qualcomm’s Snapdragon 8 Gen 3, is built from Arm cores with 128-bit vector units, so FEX’s 256-bit SVE2 work looks ahead to later hardware rather than to this device.
Comparison with Competing Emulators
Other emulators with similar goals include QEMU’s user-mode emulation, Box86/Box64, and Apple’s Rosetta 2.
While QEMU offers general-purpose emulation, it has significant overhead, particularly for gaming applications. Box86 and Box64 are lightweight but limited in coverage and optimization. Apple’s Rosetta 2 delivers high performance but is exclusive to Apple Silicon, making it unavailable on general ARM devices.
FEX differentiates itself with an efficient JIT compiler and optimizations tailored for gaming. Its deep integration with graphics APIs and Proton, supported by Valve, is a significant advantage.
However, unlike Rosetta 2, FEX lacks deep OS-level integration, which may lead to higher system call translation overhead. The introduction of Unixlib is expected to address some of these limitations in the future.
Impact on the Ecosystem
The maturation of FEX is vital for the broader ARM ecosystem. One of the biggest challenges for the proliferation of ARM-based devices is compatibility with the vast software library developed for x86.
In the gaming sector, many titles are optimized for x86, making the transition to ARM challenging. High-performance emulators like FEX can bridge this gap, enabling existing game libraries to function on ARM devices.
Additionally, if FEX facilitates the execution of x86 CUDA applications on ARM-based AI workstations such as the GB10-powered DGX Spark, it could significantly lower the barriers for developers transitioning to ARM environments.
Editorial Opinion
In the short term, the experimental support for CUDA thunking and 256-bit SVE2 optimization in FEX 2607 has the potential to impact NVIDIA DGX Spark users and engineers exploring ARM-based development platforms directly. Coupled with improvements to the JIT compiler, this release represents a notable leap in the practicality of emulation. If CUDA thunking achieves full functionality, it could revolutionize AI/ML workflows on ARM environments.
From a long-term perspective, the evolution of FEX will accelerate ARM’s penetration into both gaming and professional sectors. With Valve’s support and the anticipated launch of the Steam Frame, ARM could become a viable alternative to x86 in gaming. However, maintaining compatibility with existing ARM hardware that lacks 256-bit SVE2 will likely remain a challenge.
The editorial team is keenly observing the performance and compatibility of FEX’s CUDA thunking implementation, particularly its ability to handle both dynamically and statically linked applications.
References
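- [FEX 2607 Optimizing For Yet-To-Be-Released ARM 256-bit SVE2 Hardware - Phoronix](https://www.phoronix.com/news/FEX-2607-Emulator), published on 2026-07-04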
Frequently Asked Questions
- What is the FEX emulator?
- FEX is a user-mode emulator that enables Linux x86/x86_64 binaries to run on ARM64 (AArch64) systems. It uses JIT compilation to dynamically translate instructions and works with WINE/Proton to allow Windows games to run on ARM. It is developed with support from Valve.
- Why is support for 256-bit SVE2 important?
- SVE2 is a vector instruction set introduced with the Armv9-A architecture, and its vector length varies by implementation. AVX registers on x86 are 256 bits wide, so on an ARM processor with 256-bit SVE2, FEX can map each AVX register onto a single host vector, which should speed up AVX emulation once such processors ship.
- What is CUDA thunking, and why is it significant?
- CUDA thunking is a technology that forwards CUDA calls from x86_64 applications to ARM-native CUDA libraries. It could enable x86_64 CUDA software to run on ARM-based NVIDIA platforms such as the GB10-powered DGX Spark, facilitating AI/ML development on ARM systems.