Linux Kernel Optimization: CRC32C Gets a Tenfold Size Reduction and Significant Speedup
A patch proposed for the upcoming Linux 6.13 kernel reworks the implementation of the CRC32C checksum. The new code is more than ten times smaller, shrinking from 4546 bytes to 418 bytes. Fewer instructions and simpler loop logic also yield a measurable speedup even when retpoline protection against Spectre attacks is disabled: 11.8% on AMD Zen 2, 6.4% on Intel Emerald Rapids, and 4.8% on Intel Haswell.
With retpoline protection enabled, the optimization’s impact is even more pronounced: performance on Intel Emerald Rapids increases by 66.8%, on Intel Haswell by 35.0%, and on AMD Zen 2 by 29.5%.
Previously, the CRC32C code used a loop unrolled 128 times, which significantly inflated the code size. Because modern processors execute instructions out of order, such aggressive unrolling brings little benefit, and the extra jump instructions it required became an obstacle to performance. The new implementation unrolls the loop only four times, which greatly reduces code size while also improving speed.
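To make the idea of a modest unroll factor concrete, here is a minimal sketch in portable C. It is not the kernel's code: it is a plain bitwise software CRC32C, and the function names (crc32c_byte, crc32c_simple, crc32c_unrolled4, crc32c_selfcheck) are hypothetical, chosen only for illustration. It shows a loop unrolled four times with a short tail loop for leftover bytes, next to the straightforward one-byte-per-iteration version that produces the same result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One byte step of bitwise CRC32C (reflected Castagnoli polynomial). */
static uint32_t crc32c_byte(uint32_t crc, uint8_t byte)
{
    crc ^= byte;
    for (int bit = 0; bit < 8; bit++) {
        /* Mask is all ones when the low bit is set, zero otherwise. */
        uint32_t mask = 0u - (crc & 1u);
        crc = (crc >> 1) ^ (0x82F63B78u & mask);
    }
    return crc;
}

/* Reference version: one byte per loop iteration. */
uint32_t crc32c_simple(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++)
        crc = crc32c_byte(crc, data[i]);
    return ~crc;
}

/* Same computation with the main loop unrolled four times.
 * Loop bookkeeping (compare, increment, branch) runs once per 4 bytes,
 * while the code stays small compared with a very large unroll factor. */
uint32_t crc32c_unrolled4(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    size_t i = 0;

    for (; i + 4 <= len; i += 4) {
        crc = crc32c_byte(crc, data[i]);
        crc = crc32c_byte(crc, data[i + 1]);
        crc = crc32c_byte(crc, data[i + 2]);
        crc = crc32c_byte(crc, data[i + 3]);
    }
    /* Tail: at most 3 remaining bytes. */
    for (; i < len; i++)
        crc = crc32c_byte(crc, data[i]);
    return ~crc;
}

/* Checks both versions against the standard CRC32C check value
 * for the ASCII string "123456789". */
void crc32c_selfcheck(void)
{
    const uint8_t *msg = (const uint8_t *)"123456789";
    assert(crc32c_simple(msg, 9) == 0xE3069283u);
    assert(crc32c_unrolled4(msg, 9) == 0xE3069283u);
}
```

In this sketch the unrolled loop covers the first eight bytes of the nine-byte test string in two iterations and the tail loop handles the last one. How much such unrolling helps in practice depends on the processor and on how each step is implemented, so the figures quoted above apply only to the kernel patch itself.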