This is a patched version of zlib modified to use
Pentium-optimized assembly code in the deflation algorithm. The files
changed/added by this patch are:

README.586
match.S

The effectiveness of these modifications is a bit marginal, as the
program's bottleneck seems to be mostly L1-cache contention, which
cannot really be worked around without rewriting the basic algorithm.
The average speedup is around 5-10%, which is generally less than the
run-to-run variance between successive executions. However, at
compression level 9 the cache contention can drop enough for the
assembly version to achieve a 10-20% speedup (sometimes more,
depending on the overall redundancy in the files). Even then, cache
contention can still be the limiting factor, depending on the nature
of the program using the zlib library. This may also mean that larger
improvements will be seen on a Pentium with MMX, which suffers much
less from L1-cache contention, but I have not yet verified this.

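Whether a 5-10% speedup rises above run-to-run variance can be checked by timing repeated runs of the same workload with a plain build and an -DASMV build. The C sketch below is a minimal illustration that assumes you have already collected per-run times, for example by compressing the same files at level 9 several times with each build. The helper names (bench_mean, bench_variance, relative_speedup, speedup_within_noise) are hypothetical and are not part of zlib. The "within noise" rule is a crude assumption: the mean difference is compared with the larger of the two builds' run-to-run standard deviations.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n timing samples (e.g. seconds per run). */
double bench_mean(const double *x, size_t n)
{
    double sum = 0.0;
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        sum += x[i];
    return sum / (double)n;
}

/* Sample variance (n - 1 denominator) of n timing samples; 0 if n < 2. */
double bench_variance(const double *x, size_t n)
{
    if (n < 2)
        return 0.0;
    double m = bench_mean(x, n);
    double ss = 0.0;
    for (size_t i = 0; i < n; i++)
        ss += (x[i] - m) * (x[i] - m);
    return ss / (double)(n - 1);
}

/* Relative speedup of the candidate (e.g. -DASMV) build over the baseline:
 * 0.07 means the candidate's mean run time is 7% lower. */
double relative_speedup(const double *base, size_t nb,
                        const double *cand, size_t nc)
{
    double mb = bench_mean(base, nb);
    return (mb - bench_mean(cand, nc)) / mb;
}

/* Hypothetical rule of thumb: returns 1 if the difference in mean run time
 * is no larger than one run-to-run standard deviation (the larger of the
 * two builds'), i.e. the apparent speedup is within ordinary timing noise;
 * returns 0 otherwise. Squared quantities are compared, so no sqrt() is
 * needed. */
int speedup_within_noise(const double *base, size_t nb,
                         const double *cand, size_t nc)
{
    double diff = bench_mean(base, nb) - bench_mean(cand, nc);
    double vb = bench_variance(base, nb);
    double vc = bench_variance(cand, nc);
    double v = vb > vc ? vb : vc;
    return diff * diff <= v;
}
```

For example, baseline times of 1.00, 1.02 and 0.98 s against ASMV times of 0.93, 0.95 and 0.91 s give a relative speedup of 0.07, well outside the 0.02 s standard deviation, so speedup_within_noise returns 0. A one-standard-deviation threshold is deliberately simple; a proper significance test (such as Welch's t-test) would be more rigorous.
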
Note that this code has been tailored for the Pentium in particular,
and will not perform well on the Pentium Pro (due to the use of a
partial register in the inner loop).

If you are using an assembler other than GNU as, you will have to
translate match.S to use your assembler's syntax. (Have fun.)

Brian Raiter
[email protected]
April, 1998


Added for zlib 1.1.3:

The patches come from
http://www.muppetlabs.com/~breadbox/software/assembly.html

To compile zlib with this asm file, copy match.S to the zlib directory,
then run:

CFLAGS="-O3 -DASMV" ./configure
make OBJA=match.o