README.686 revision 205194
This is a patched version of zlib, modified to use
Pentium-Pro-optimized assembly code in the deflation algorithm. The
files changed/added by this patch are:

README.686
match.S

The speedup that this patch provides varies, depending on whether the
compiler used to build the original version of zlib falls afoul of the
PPro's speed traps. My own tests show a speedup of around 10-20% at
the default compression level, and 20-30% using -9, against a version
compiled using gcc 2.7.2.3. Your mileage may vary.

Note that this code has been tailored for the PPro/PII in particular,
and will not perform particularly well on a Pentium.

If you are using an assembler other than GNU as, you will have to
translate match.S to use your assembler's syntax. (Have fun.)

Brian Raiter
breadbox@muppetlabs.com
April, 1998


Added for zlib 1.1.3:

The patches come from
http://www.muppetlabs.com/~breadbox/software/assembly.html

To compile zlib with this asm file, copy match.S to the zlib directory
then do:

CFLAGS="-O3 -DASMV" ./configure
make OBJA=match.o


Update:

I've been ignoring these assembly routines for years, believing that
gcc's generated code had caught up with it sometime around gcc 2.95
and the major rearchitecting of the Pentium 4.
However, I recently learned that, despite what I believed, this code
still has some life in it. On the Pentium 4 and AMD64 chips, it
continues to run about 8% faster than the code produced by gcc 4.1.

In acknowledgement of its continuing usefulness, I've altered the
license to match that of the rest of zlib. Share and Enjoy!

Brian Raiter
breadbox@muppetlabs.com
April, 2007