AVX-512 Boosts Linux RAID Performance by Up to 41%

Google Linux kernel expert Eric Biggers has implemented an AVX-512-optimized xor_gen() function for software RAID, achieving up to a 41% performance improvement on the AMD Ryzen 9 9950X.

Reviewed & edited by the SINGULISM Editorial Team

Eric Biggers, a Google specialist in the Linux kernel crypto subsystem, has published an optimization patch for software RAID leveraging the AVX-512 instruction set. The patch targets the kernel’s xor_gen() function and has demonstrated processing speed improvements of 19% to 41% on the AMD Ryzen 9 9950X.

Optimization Target and Mechanism

The xor_gen() function is used for generating and verifying parity blocks in RAID5 and RAID6. In software RAID, the CPU handles parity calculations, so the efficiency of this function directly impacts storage write performance.
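
To make the role of XOR parity concrete, here is a minimal Python sketch of the idea, not the kernel's code. The helper name xor_blocks and the sample data are hypothetical. It shows that XORing all data blocks yields a parity block, and that XORing the surviving blocks with the parity rebuilds a lost block.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of equally sized byte blocks (hypothetical helper)."""
    if not blocks:
        raise ValueError("need at least one block")
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) walks the blocks column by column, one byte position at a time.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# Three tiny "data blocks" standing in for stripes on three disks.
data = [b"\x0f\xf0\xaa", b"\x33\x55\x01", b"\xff\x00\x10"]
parity = xor_blocks(data)

# Pretend the second block was lost: XOR the survivors with the parity.
rebuilt = xor_blocks([data[0], data[2], parity])
```

The same property is what makes verification cheap: XORing every data block together with the parity block gives all zeros when the stripe is consistent.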

The AVX-512 version of xor_gen() implemented by Biggers uses 512-bit vector registers (ZMM registers). It also employs the vpternlogq instruction, which can evaluate any three-input bitwise function, including a three-input XOR, in a single instruction. Combining three inputs at once takes fewer instructions than a chain of two-input XORs, which gives higher throughput than previous implementations.
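
The three-input XOR can be modeled in a few lines of Python. This is an illustrative sketch of the instruction's truth-table semantics on one 64-bit lane (a ZMM register holds eight such lanes), not the patch itself. The names ternlog64, XOR3 and xor_fold are hypothetical, and the folding order in xor_fold is an assumption about how such an instruction could be used, not a description of what the patch does.

```python
MASK64 = (1 << 64) - 1
XOR3 = 0x96  # truth table with bits 1, 2, 4 and 7 set: an odd number of inputs is 1


def ternlog64(a, b, c, imm8):
    """Model a ternary-logic operation on one 64-bit lane.

    For every bit position, the three input bits form an index 0..7 and the
    result bit is bit number `index` of imm8.
    """
    result = 0
    for index in range(8):
        if not (imm8 >> index) & 1:
            continue
        # Select the bit positions whose (a, b, c) pattern equals `index`.
        sel_a = a if index & 4 else ~a
        sel_b = b if index & 2 else ~b
        sel_c = c if index & 1 else ~c
        result |= sel_a & sel_b & sel_c
    return result & MASK64


def xor_fold(sources):
    """XOR 64-bit values, consuming up to two new sources per three-input op.

    Returns the XOR of all sources and the number of three-input ops used.
    """
    acc, ops, i = sources[0], 0, 1
    while i < len(sources):
        if i + 1 < len(sources):
            acc = ternlog64(acc, sources[i], sources[i + 1], XOR3)
            i += 2
        else:
            acc = ternlog64(acc, sources[i], 0, XOR3)
            i += 1
        ops += 1
    return acc, ops
```

In this model, XORing five sources takes two three-input operations, where a chain of two-input XORs would take four.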

This optimization targets the following CPU generations:

  • AMD Zen 4 and later (client/server)
  • Intel Sapphire Rapids and later (server)
  • Intel Rocket Lake (client)
  • Intel Nova Lake and later (client)

However, Intel Skylake Server and Ice Lake generation AVX-512 implementations are excluded. These CPUs are known to experience excessive downclocking when using ZMM registers, and the same policy has already been adopted for cryptographic and CRC code. The patch uses the !PREFER_YMM condition to exclude these older implementations.
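
The selection rule described above can be summarized as a small decision function. This is a Python model for illustration only. The function name, the feature strings and the returned labels are hypothetical, and the real patch may check additional AVX-512 sub-features that the article does not list.

```python
def select_xor_impl(cpu_features):
    """Pick an implementation label from a set of CPU feature names (hypothetical model)."""
    has_avx512 = "avx512f" in cpu_features    # assumption: stands in for the patch's AVX-512 check
    prefers_ymm = "prefer_ymm" in cpu_features  # assumption: stands in for PREFER_YMM
    if has_avx512 and not prefers_ymm:
        return "avx512"
    # CPUs without AVX-512, or flagged as preferring YMM, keep the existing code path.
    return "existing"


zen5_like = {"avx2", "avx512f"}
skylake_server_like = {"avx2", "avx512f", "prefer_ymm"}
no_avx512 = {"avx2"}
```

Under these assumptions, only the first feature set selects the AVX-512 path. The other two fall back to the existing implementation.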

Benchmark Results

In tests conducted by Biggers on an AMD Ryzen 9 9950X (Zen 5) desktop processor, performance improvements of 19% to 41% were observed compared to the previous implementation. This represents one of the largest gains among several AVX-512 optimizations he has carried out.
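
As a rough guide to what these percentages mean, the sketch below converts a throughput gain into time saved per block. It assumes the 19% and 41% figures are throughput gains, which the article does not state explicitly. The function name time_saved is hypothetical.

```python
def time_saved(throughput_gain):
    """Fraction of processing time saved for a fractional throughput gain."""
    # If throughput rises by a factor of (1 + g), time per block falls to 1 / (1 + g).
    return 1.0 - 1.0 / (1.0 + throughput_gain)


low = time_saved(0.19)
high = time_saved(0.41)
print(f"{low:.1%} to {high:.1%} less time per block")
```

Under that assumption, the XOR step itself would take roughly 16% to 29% less time. End-to-end RAID write speed would change by less, since XOR is only one part of the write path.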

Specific workloads include RAID5 parity generation and RAID6 dual parity computation. These operations can become bottlenecks in software RAID environments, where the CPU does the work that a hardware RAID controller would otherwise handle. The optimization is expected to matter most on database servers and file servers that demand high write throughput.

Outlook for Kernel Integration

Biggers has a track record of providing numerous AVX-512 optimizations for Linux kernel crypto code. The patch has been posted on the mailing list and, after review, is expected to be merged into the mainline kernel.

In the Linux kernel maintenance process, patches with clear performance gains and low risk tend to be incorporated relatively quickly. This optimization replaces an existing code path and is not expected to introduce functional regressions. Since CPU feature checks ensure that CPUs lacking AVX-512 support continue to use the existing implementation, compatibility should not be affected.

Editorial Opinion

Short-Term Impact

If this patch is merged into the mainline kernel, systems that use Linux software RAID on a supported AVX-512 CPU stand to benefit without any configuration changes. NAS or home servers equipped with Zen 5 CPUs such as the AMD Ryzen 9 9950X may see improved write performance to RAID arrays, although the benchmark figures measure the XOR routine itself rather than end-to-end array throughput. Considering the kernel update cycles of Linux distributions, integration into major distributions could occur as early as around kernel 6.13. For server operators, the practical value lies in performance gains without hardware replacement.

Long-Term Perspective

This optimization serves as another demonstration of AVX-512’s utility. For a long time, AVX-512 was shunned because of the downclocking issues in Intel’s early implementations, but efficient execution on AMD Zen 4 and later, along with improvements on Intel Sapphire Rapids and later, is fueling a reassessment. If AVX-512 adoption expands in areas beyond RAID, such as compression, cryptography, and machine learning, it could boost overall x86_64 platform performance. On the other hand, support remains uneven across generations (on the Intel client side, for example, Rocket Lake is the only currently shipping generation in the list above), and that remains a factor for users when choosing CPUs.

Editor’s Query

The decision to exclude Skylake Server and Ice Lake from this patch is reasonable, but existing servers with those CPUs will miss out on software RAID performance gains. Is there a possibility that Intel could improve the AVX-512 downclocking issue at the firmware level in the future? Additionally, given the strong AVX-512 performance of AMD Zen 4/5, will Intel reconsider incorporating AVX-512 in future client CPUs? Furthermore, it will be interesting to see whether this optimization prompts similar implementations for RISC-V or ARM architectures, and industry discussion on that front is welcome.

Frequently Asked Questions

Which Linux kernel version will this AVX-512 optimization be available from?

The patch has been posted publicly but has not yet been merged; its inclusion in the mainline kernel depends on upcoming reviews. If everything goes smoothly, it could be integrated as early as around kernel 6.13. It may take a few more months for it to appear in major distribution kernel updates.

Can it be manually enabled on excluded CPUs like Intel Skylake Server and Ice Lake?

The patch checks `!PREFER_YMM`, so the AVX-512 path is not selected on these CPUs. Forcing it would require modifying and rebuilding the kernel, and because downclocking on these CPUs can degrade performance, it is not recommended.

Can workloads other than software RAID benefit from this optimization?

This optimization is specific to the xor_gen() function. However, the same AVX-512 implementation pattern has already been adopted in the Linux kernel's crypto and CRC code, and those areas are also being improved by Biggers.

Source: Phoronix
