Linux RAID: AVX-512 xor_gen() v2 Achieves Up to 43% Speedup

Google's Eric Biggers has updated the Linux kernel's AVX-512 xor_gen() to v2, achieving up to 43% faster RAID5/RAID6 parity calculation and benefiting Btrfs and other users.

Reviewed & edited by the SINGULISM Editorial Team

According to Phoronix, the AVX-512 implementation of the Linux kernel's xor_gen() function, written by Google’s Eric Biggers, has been updated to a second version (v2) and posted to the Linux Kernel Mailing List (LKML). The function generates and verifies parity blocks for RAID5/RAID6 and is also called directly by filesystems such as Btrfs.

Evolution from Initial Version to v2

The initial AVX-512 implementation that Biggers posted earlier achieved up to a 41% performance improvement over the existing implementation. According to Phoronix’s Michael Larabel, v2 improves on this, with gains of up to 43% reported. While the initial version showed notable gains at specific src counts (the number of source buffers XORed together), v2 has been optimized to deliver benefits across a wider range of src counts.

The xor_gen() function is at the core of Linux’s software RAID stack, so faster parity calculation reduces the CPU time spent on parity writes, scrubs, and rebuilds. This has practical implications, especially for large RAID arrays and heavily loaded storage servers.
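
The parity operation described above can be sketched in a few lines of Python. This is only an illustration of XOR parity, not the kernel's code: the helper name `xor_parity` and the sample blocks are hypothetical.

```python
def xor_parity(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together (hypothetical helper, not kernel code)."""
    if not blocks:
        raise ValueError("need at least one block")
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("all blocks must have the same length")
    # Treat each block as one big integer so a single ^ covers every byte.
    acc = 0
    for b in blocks:
        acc ^= int.from_bytes(b, "little")
    return acc.to_bytes(size, "little")

# Three small "data disks" (made-up contents for illustration).
data = [bytes([value] * 8) for value in (0x11, 0x22, 0x44)]

# Generate: the parity block is the XOR of all data blocks.
parity = xor_parity(data)

# Verify: XORing the data blocks with the parity yields all zeros.
stripe_ok = xor_parity(data + [parity]) == bytes(8)

# Reconstruct: a lost block is the XOR of the survivors and the parity.
recovered = xor_parity([data[0], data[2], parity])
```

Because XOR is its own inverse, the same routine serves generation, verification, and reconstruction, which is why a single fast XOR primitive helps all three.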

What AVX-512 Brings

AVX-512 is a SIMD (Single Instruction, Multiple Data) instruction set extension developed by Intel that operates on 512-bit registers. A single instruction can process sixteen 32-bit values or sixty-four 8-bit values at once, which makes it well suited to bulk bitwise XOR such as parity calculation. Biggers’ implementation uses this width to accelerate the XOR parity generation used by RAID5 and RAID6. RAID6's second syndrome additionally relies on Galois-field (Reed-Solomon) arithmetic, which goes beyond a plain XOR.
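
To make the register width concrete, the following sketch XORs one buffer into another in 64-byte steps, with each step standing in for one 512-bit vector XOR. It is a plain Python model, not SIMD code; `xor_into` and `VECTOR_BYTES` are names invented for this example.

```python
VECTOR_BYTES = 512 // 8  # one 512-bit register holds 64 bytes

def xor_into(dest: bytearray, src: bytes) -> int:
    """XOR src into dest 64 bytes at a time; return the number of 64-byte steps."""
    if len(dest) != len(src) or len(dest) % VECTOR_BYTES:
        raise ValueError("buffers must match and be a multiple of 64 bytes")
    steps = 0
    for off in range(0, len(dest), VECTOR_BYTES):
        # Each iteration models a single 512-bit XOR on one chunk.
        a = int.from_bytes(dest[off:off + VECTOR_BYTES], "little")
        b = int.from_bytes(src[off:off + VECTOR_BYTES], "little")
        dest[off:off + VECTOR_BYTES] = (a ^ b).to_bytes(VECTOR_BYTES, "little")
        steps += 1
    return steps

page = bytearray(b"\xaa" * 4096)          # a 4 KiB buffer
steps = xor_into(page, b"\x0f" * 4096)    # 4096 / 64 = 64 steps
```

Under this model a 4 KiB buffer needs 64 wide XOR steps, compared with 128 steps at a 256-bit width.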

The use of AVX-512 in the Linux kernel is not limited to xor_gen(). AVX-512 optimizations have previously been pursued for code paths such as encryption and checksum calculation. This work is part of a broader effort to use SIMD within the storage subsystem.

Impact Scope and Evaluation

The improvement applies to Linux systems that compute RAID parity in software on AVX-512-capable CPUs. Specifically, this includes md RAID5/RAID6 arrays managed with mdadm, the RAID5/RAID6 modes in Btrfs, and filesystems such as XFS or ext4 placed on top of a software RAID array. Environments using hardware RAID controllers see no direct benefit, but the optimization matters for software-defined storage in cloud and data center environments.

The degree of improvement depends on the src count. v2 is designed to deliver gains across more src counts, but actual numbers vary by hardware and workload. Once review on LKML is complete, the patch is likely to be merged into a future Linux kernel release.
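
One way to see why the src count matters is a simplified operation count per 64-byte chunk. The model below is an assumption made for illustration (it treats the destination as write-only and ignores caching and instruction scheduling); it does not describe how the actual patch behaves.

```python
def per_chunk_ops(src_count: int) -> dict[str, int]:
    """Operations for one 64-byte chunk of parity (simplified, assumed model)."""
    if src_count < 1:
        raise ValueError("src_count must be at least 1")
    return {
        "loads": src_count,      # one 64-byte load per source buffer
        "xors": src_count - 1,   # combining n values takes n - 1 XORs
        "stores": 1,             # one 64-byte store to the destination
    }

for n in (2, 4, 8):
    ops = per_chunk_ops(n)
    print(f"src_count={n}: {ops}")
```

As the src count grows, loads increasingly dominate the single store, so the balance between memory traffic and XOR work shifts, and a routine tuned for one src count is not automatically the best for another.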

Editorial Opinion

This development marks a milestone in the use of AVX-512 within the Linux kernel. AVX-512 initially saw limited adoption because of concerns about power consumption and clock-frequency throttling. However, improvements in processors starting with the 4th Gen Xeon Scalable Processor (Sapphire Rapids) have renewed interest in using it in the kernel.

In the short term, environments running software RAID on AVX-512-capable hardware can expect performance gains once the change is merged and reaches their distribution's kernel. The impact is especially significant for workloads where parity calculation is a bottleneck, such as large backup servers or video surveillance storage.

In the medium to long term, the scope of AVX-512 utilization may expand to fields beyond storage. There are numerous kernel tasks suitable for SIMD parallel processing, such as filesystem metadata handling and hash computation for data deduplication. Biggers’ work is expected to help establish best practices for SIMD usage across the kernel.

On the other hand, AVX-512 is available only on some Intel processors and on AMD processors from Zen 4 onward. Generic distribution kernels must therefore carry both the AVX-512 code path and fallbacks for CPUs without it, selecting between them at run time, which increases binary size and maintenance costs. How the kernel community manages this trade-off will be a key point to watch.
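
The idea of keeping several code paths and choosing one at run time can be illustrated from user space. The sketch below parses `/proc/cpuinfo`-style text and picks a path by feature flag. The path names are hypothetical, and the kernel itself checks CPU feature bits internally rather than reading `/proc/cpuinfo`; the real implementation may also require feature bits beyond `avx512f`.

```python
def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Collect feature flags from /proc/cpuinfo-style text (x86 'flags' line)."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()

def pick_xor_path(flags: set[str]) -> str:
    """Return the widest XOR path the CPU supports (path names are hypothetical)."""
    if "avx512f" in flags:
        return "avx512"
    if "avx2" in flags:
        return "avx2"
    return "generic"

# Made-up sample input; on a real x86 Linux system, read /proc/cpuinfo instead.
sample = "processor\t: 0\nflags\t\t: fpu sse2 avx avx2 avx512f avx512bw\n"
chosen = pick_xor_path(cpu_flags(sample))
fallback = pick_xor_path(cpu_flags("flags\t: fpu sse2\n"))
```

Every path in such a scheme has to be compiled, tested, and maintained, which is the cost the paragraph above refers to.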

Frequently Asked Questions

What specific processing does the xor_gen() function handle?
xor_gen() is a Linux kernel function that XORs multiple source buffers together to generate and verify parity data. In RAID5/RAID6, parity blocks must be calculated from multiple data disks, so a faster xor_gen() shortens the parity step of storage I/O.

What improvements does v2 offer?
Compared to the initial version, v2 has been optimized to deliver performance improvements across a wider range of src counts. The maximum performance gain has improved from 41% to 43%, with specific src counts showing gains exceeding those of the initial version. The exact improvements are under review on LKML.

Which Linux kernel version will include this optimization?
At this point the patch has been posted to LKML, and the kernel version into which it will be merged has not been determined. Typically, after review, such a patch lands in an upcoming mainline release; performance optimizations are generally not backported to stable kernels. Distributions such as Ubuntu and RHEL will make it available once they ship a kernel that includes it.

Source: Phoronix
