Cross Reference: /freebsd-current/sys/libkern/x86/crc32

History log of /freebsd-current/sys/libkern/x86/crc32_sse42.c
Revision	Date	Author	Comments
# 685dc743	16-Aug-2023	Warner Losh <imp@FreeBSD.org>	sys: Remove $FreeBSD$: one-line .c pattern Remove /^[\s]__FBSDID$"\$FreeBSD\$"$;?\s*\n/
# f89d2072	17-Jun-2019	Xin LI <delphij@FreeBSD.org>	Separate kernel crc32() implementation to its own header (gsb_crc32.h) and rename the source to gsb_crc32.c. This is a prerequisite of unifying kernel zlib instances. PR: 229763 Submitted by: Yoshihiro Ota <ota at j.email.ne.jp> Differential Revision: https://reviews.freebsd.org/D20193
# 9de5f67d	11-Aug-2017	Ryan Libby <rlibby@FreeBSD.org>	x86/crc32_sse42.c: quiet unused function warning Reviewed by: cem Approved by: markj (mentor) Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D11980
# 864c28cf	26-Mar-2017	Bruce Evans <bde@FreeBSD.org>	Use inline asm instead of unportable intrinsics for the SSE4 crc32 optimization. This fixes building with gcc-4.2.1 (it doesn't support SSE4). gas-2.17.50 [FreeBSD] supports SSE4 instructions, so this doesn't need using .byte directives. This fixes depending on host user headers in the kernel. Fix user includes (don't depend on namespace pollution in <nmmintrin.h> that is not included now). The instrinsics had no advantages except to sometimes avoid compiler pessimixations. clang understands them a bit better than inline asm, and generates better looking code which also runs better for cem, but for me it just at the same speed or slower by doing excessive unrollowing in all the wrong places. gcc-4.2.1 also doesn't understand what it is doing with unrolling, but with -O3 somehow it does more unrolling that helps. Reduce 1 of the the compiler pessimizations (copying a variable which already satisfies an "rm" constraint in a good way by being in memory and not used again, to different memory and accessing it there. Force copying it to a register instead). Try to optimize the inner loops significantly, so as to run at full speed on smaller inputs. The algorithm is already very MD, and was tuned for the throughput of 3 crc32 instructions per cycle found on at least Sandybridge through Haswell. Now it is even more tuned for this, so depends more on the compiler not rearranging or unrolling things too much. The main inner loop for should have no difficulty runing at full speed on these CPUs unless the compiler unrolls it too much. However, the main inner loop wasn't even used for buffers smaller than 24K. Now it is used for buffers larger than 384 bytes. Now it is not so long, and the main outer loop is used more. The new optimization is to try to arrange that the outer loop runs in parallel with the next inner loop except for the final iteration; then reduce the loop sizes significantly to take advantage of this. Approved by: cem Not tested in production by: bde
# 6be2ff7d	30-Jan-2017	Conrad Meyer <cem@FreeBSD.org>	calculate_crc32c: Add SSE4.2 implementation on x86 Derived from an implementation by Mark Adler. The fast loop performs three simultaneous CRCs over subsets of the data before composing them. This takes advantage of certain properties of the CRC32 implementation in Intel hardware. (The CRC instruction takes 1 cycle but has 2-3 cycles of latency.) The CRC32 instruction does not manipulate FPU state. i386 does not have the crc32q instruction, so avoid it there. Otherwise the implementation is identical to amd64. Add basic userland tests to verify correctness on a variety of inputs. PR: 216467 Reported by: Ben RUBSON <ben.rubson at gmail.com> Reviewed by: kib@, markj@ (earlier version) Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D9342