ziplimit.txt

Zip 3 and UnZip 6 now support many of the extended limits of Zip64.

A) Hard limits of the Zip archive format:

   Number of entries in Zip archive:            64 k (2^16 - 1 entries)
   Compressed size of archive entry:            4 GByte (2^32 - 1 Bytes)
   Uncompressed size of entry:                  4 GByte (2^32 - 1 Bytes)
   Size of single-volume Zip archive:           4 GByte (2^32 - 1 Bytes)
   Per-volume size of multi-volume archives:    4 GByte (2^32 - 1 Bytes)
   Number of parts for multi-volume archives:   64 k (2^16 - 1 parts)
   Total size of multi-volume archive:          256 TByte (4G * 64k)

   The number of archive entries and of multi-volume parts is limited by
   the structure of the "end-of-central-directory" record, where these
   numbers are stored in 2-Byte fields.
   Some Zip and/or UnZip implementations (for example Info-ZIP's) allow
   handling of archives with more than 64k entries.  (The "number of
   entries" field in the "end-of-central-directory" record is not really
   necessary to retrieve the contents of a Zip archive; it should rather
   be used for consistency checks.)
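
   For illustration, a minimal C sketch of reading the fixed 22-byte part
   of the "end-of-central-directory" record.  The field layout follows
   PKWARE's APPNOTE; the names struct eocd, parse_eocd(), get16() and
   get32() are hypothetical and not taken from the Info-ZIP sources.  It
   shows that both entry counters are plain 16-bit fields, so they cannot
   express more than 65535 entries.

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed part of the "end-of-central-directory" record (22 bytes). */
struct eocd {
    uint16_t this_disk;        /* number of this disk (volume)               */
    uint16_t cd_start_disk;    /* disk where the central directory starts    */
    uint16_t entries_on_disk;  /* central directory entries on this disk     */
    uint16_t entries_total;    /* total number of central directory entries  */
    uint32_t cd_size;          /* size of the central directory in bytes     */
    uint32_t cd_offset;        /* offset of the central directory            */
    uint16_t comment_len;      /* length of the archive comment that follows */
};

/* Zip fields are little-endian regardless of the host byte order. */
static uint16_t get16(const unsigned char *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t get32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if buf does not hold an EOCD record. */
int parse_eocd(const unsigned char *buf, size_t len, struct eocd *out) {
    if (len < 22 || get32(buf) != 0x06054b50UL)   /* "PK\5\6" signature */
        return -1;
    out->this_disk       = get16(buf + 4);
    out->cd_start_disk   = get16(buf + 6);
    out->entries_on_disk = get16(buf + 8);
    out->entries_total   = get16(buf + 10);
    out->cd_size         = get32(buf + 12);
    out->cd_offset       = get32(buf + 16);
    out->comment_len     = get16(buf + 20);
    return 0;
}
```

   A real reader locates this record by scanning backwards from the end of
   the file for the signature, because the variable-length archive comment
   (up to 64 kByte) follows it.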

   Length of an archive entry name:             64 kByte (2^16 - 1 Bytes)
   Length of archive member comment:            64 kByte (2^16 - 1 Bytes)
   Total length of "extra field":               64 kByte (2^16 - 1 Bytes)
   Length of a single extra field block:        64 kByte (2^16 - 1 Bytes)
   Length of archive comment:                   64 kByte (2^16 - 1 Bytes)

   Additional limitation claimed by PKWARE:
     Size of local-header structure (fixed fields of 30 Bytes + filename
      + local extra field):                   < 64 kByte
     Size of central-directory structure (46 Bytes + filename +
      central extra field + member comment):  < 64 kByte
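
   The PKWARE limits above amount to simple sums over the 2-byte length
   fields.  A hedged C sketch, assuming that "< 64 kByte" means a total of
   at most 65535 bytes; the function and macro names are invented for this
   example.

```c
/* Fixed portions of the two header records, in bytes. */
#define LOCAL_HDR_FIXED    30UL
#define CENTRAL_HDR_FIXED  46UL
#define FIELD16_MAX        0xFFFFUL   /* largest value of a 2-byte field */

/* Local header: 30 Bytes + filename + local extra field < 64 kByte. */
int local_header_ok(unsigned long name_len, unsigned long extra_len) {
    if (name_len > FIELD16_MAX || extra_len > FIELD16_MAX)
        return 0;                     /* does not fit its own length field */
    return LOCAL_HDR_FIXED + name_len + extra_len <= FIELD16_MAX;
}

/* Central record: 46 Bytes + filename + extra field + comment < 64 kByte. */
int central_header_ok(unsigned long name_len, unsigned long extra_len,
                      unsigned long comment_len) {
    if (name_len > FIELD16_MAX || extra_len > FIELD16_MAX ||
        comment_len > FIELD16_MAX)
        return 0;
    return CENTRAL_HDR_FIXED + name_len + extra_len + comment_len
           <= FIELD16_MAX;
}
```

   For example, a 65505-byte filename with no extra field just fits into a
   local header (30 + 65505 = 65535), while one more byte does not.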

   Note:
   In 2001, PKWARE published version 4.5 of the Zip format specification
   (together with the release of PKZIP for Windows 4.5).  This specification
   defines new extra field blocks that make it possible to break the size
   limits of the standard zipfile structures.  In this extended Zip format,
   the size limits of zip entries (and of the complete zip archive) have
   been extended to (2^64 - 1) Bytes and the maximum number of archive
   entries to (2^32 - 1).  Zip 3.0 supports these Zip64 extensions and
   should be released shortly.  UnZip 6.0 should support these standards.
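
   In practice, when a size, offset or count does not fit into its classic
   field, that field is set to all ones (0xFFFF or 0xFFFFFFFF) and the real
   value is stored in a Zip64 structure: the Zip64 extended information
   extra field (header ID 0x0001) for per-entry values, or the Zip64
   end-of-central-directory record for entry counts.  A minimal C sketch of
   that decision, based on PKWARE's APPNOTE; the helper names are
   hypothetical.

```c
#include <stdint.h>

#define ZIP64_EXTRA_ID 0x0001u      /* header ID of the Zip64 extra field   */
#define MARK32 0xFFFFFFFFu          /* "see Zip64 record", 4-byte fields    */
#define MARK16 0xFFFFu              /* "see Zip64 record", 2-byte fields    */

/* Value for a classic 4-byte size or offset field.  If the real value does
   not fit, the field gets the marker and *need_zip64 is set, so the caller
   must also write the Zip64 extra field (ID ZIP64_EXTRA_ID).  The marker
   value itself is treated as "too large" to stay unambiguous. */
uint32_t classic32(uint64_t value, int *need_zip64) {
    if (value >= MARK32) {
        *need_zip64 = 1;
        return MARK32;
    }
    return (uint32_t)value;
}

/* Same for 2-byte fields such as the entry counts of the EOCD record; the
   real count then goes into the Zip64 end-of-central-directory record. */
uint16_t classic16(uint64_t value, int *need_zip64) {
    if (value >= MARK16) {
        *need_zip64 = 1;
        return MARK16;
    }
    return (uint16_t)value;
}
```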

B) Implementation limits of UnZip:

   Note:
   This section should be updated when UnZip 6.0 is near release.

 1. Size limits caused by file I/O and decompression handling:
   Size of Zip archive:                 2 GByte (2^31 - 1 Bytes)
   Compressed size of archive entry:    2 GByte (2^31 - 1 Bytes)

   Note: On some systems, UnZip may support archive sizes up to 4 GByte.
         To get this support, the target environment has to meet the following
         requirements:
         a) The compiler's intrinsic "long" data types must be able to hold
            the integer value 2^32.  In other words, the standard intrinsic
            integer types "long" and "unsigned long" have to be wider than
            32 bits.
         b) The system has to supply a C runtime library that is compatible
            with the more-than-32-bit-wide "long int" type of condition a).
         c) The standard file positioning functions fseek(), ftell() (and/or
            the Unix style lseek() and tell() functions) must be able to
            move to absolute file offsets of up to 4 GByte from the file
            start.
         On 32-bit CPU hardware, you generally cannot expect that a C compiler
         provides a "long int" type that is wider than 32 bits.  So, many of
         the most popular systems (i386, PowerPC, 680x0, et al.) are out of
         luck.  You may find environments that meet all requirements on
         systems with 64-bit CPU hardware.  Examples might be Cray number
         crunchers or Compaq (formerly DEC) Alpha AXP machines.
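
   Condition a) can be tested directly with the <limits.h> constants.  A
   minimal C sketch (the function name is hypothetical); conditions b) and
   c) concern the runtime library and cannot be checked this simply.

```c
#include <limits.h>

/* Condition a): can "long" hold 2^32, i.e. are "long" and "unsigned long"
   wider than 32 bits with this compiler?  Done at run time for simplicity;
   a build system could make the same test with #if. */
int long_holds_2pow32(void) {
    return LONG_MAX >= 4294967296LL && ULONG_MAX > 4294967295ULL;
}
```

   On typical 64-bit Unix systems (LP64 data model) this returns 1.  On
   64-bit Windows, where "long" stays 32 bits wide (LLP64), it returns 0,
   so the CPU width alone does not decide condition a).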

   The number of Zip archive entries is unlimited.  The "number-of-entries"
   field of the "end-of-central-dir" record is checked against the "number
   of entries found in the central directory" modulo 64k (2^16).
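
   The modulo comparison can be sketched in C as follows; the function
   name is hypothetical, and "found" stands for the number of
   central-directory headers actually read.

```c
#include <stdint.h>

/* Only the low 16 bits of the real entry count can be compared with the
   2-byte "number of entries" field, because that field wraps at 64k. */
int entry_count_consistent(uint16_t eocd_entries, unsigned long found) {
    return (found % 65536UL) == eocd_entries;
}
```

   For example, an archive with 70000 entries stores 70000 mod 65536 = 4464
   in that field, and the check still passes.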

   Multi-volume archive extraction is not supported.

   Memory requirements are mostly independent of the archive size
   and archive contents.
   In general, UnZip needs a fixed amount of internal buffer space
   plus enough space to hold the complete information of the currently
   processed entry's local header.  One exception: a large extra field
   (up to 64 kByte) may exceed the available memory for MSDOS 16-bit
   executables (when they were compiled in the small or medium memory
   model, with a fixed 64 kByte limit on data space).

   The other exception where memory requirements scale with "larger"
   archives is the "restore directory attributes" feature.  Here, the
   directory attributes info for each restored directory has to be held
   in memory until the whole archive has been processed.  So, the amount
   of memory needed to keep this info scales with the number of restored
   directories and may cause memory problems when a lot of directories
   are restored in a single run.
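
   A hedged C sketch of why this list grows with the number of directories:
   each restored directory adds one node that lives until the end of the
   run.  The struct layout, the function names and the choice of storing
   only a modification time are assumptions for illustration.  One
   plausible reason for deferring the work is that extracting files into a
   directory would otherwise overwrite its freshly restored timestamp.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical record for one restored directory whose attributes are
   applied only after the whole archive has been processed. */
struct dir_attr {
    struct dir_attr *next;
    time_t mtime;                 /* attribute to restore later */
    char path[];                  /* C99 flexible array member  */
};

/* Remember one directory; memory use grows by one node per directory. */
int defer_dir_attr(struct dir_attr **list, const char *path, time_t mtime) {
    size_t len = strlen(path) + 1;
    struct dir_attr *d = malloc(sizeof *d + len);
    if (d == NULL)
        return -1;                /* the "memory problems" case */
    memcpy(d->path, path, len);
    d->mtime = mtime;
    d->next = *list;              /* newest first: a subdirectory restored
                                     after its parent is handled before it */
    *list = d;
    return 0;
}

/* At the end of the run: apply (if a callback is given) and free every
   node.  Returns the number of directories processed. */
size_t apply_dir_attrs(struct dir_attr **list,
                       void (*apply)(const char *path, time_t mtime)) {
    size_t count = 0;
    while (*list != NULL) {
        struct dir_attr *d = *list;
        *list = d->next;
        if (apply != NULL)
            apply(d->path, d->mtime);
        free(d);
        count++;
    }
    return count;
}
```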

C) Implementation limits of the Zip executables:

   Note:
   This section has been updated to reflect Zip 3.0.

 1. Size limits caused by file I/O and compression handling:
   Without Zip64 extensions:
    Size of Zip archive:                 2 GByte (2^31 - 1 Bytes)
    Compressed size of archive entry:    2 GByte (2^31 - 1 Bytes)
    Uncompressed size of entry:          2 GByte (2^31 - 1 Bytes),
                                         (could/should be 4 GBytes...)
   Using Zip64 extensions:
    Size of Zip archive:                 2^63 - 1 Bytes
    Compressed size of archive entry:    2^63 - 1 Bytes
    Uncompressed size of entry:          2^63 - 1 Bytes

   Multi-volume archive creation is now supported in the form of split
   archives.  Currently, up to 99,999 splits are supported.

 2. Limits caused by handling of archive contents lists

 2.1. Number of archive entries (freshen, update, delete)
     a) 16-bit executable:              64k (2^16 - 1) or 32k (2^15 - 1),
                                        (unsigned vs. signed type of size_t)
     a1) 16-bit executable:             <16k ((2^16)/4)
         (The smaller limit a1) results from the array size limit of
         the "qsort()" function.)

         32-bit executables:            <1G ((2^32)/4)
         (usual system limit of the "qsort()" function on 32-bit systems)

     b) stack space needed by qsort to sort list of archive entries

     NOTE: In the current executables, overflows of limits a) and b) are NOT
           checked!

     c) amount of free memory to hold "central directory information" of
        all archive entries; one entry needs:
        96 bytes (32-bit) or 80 bytes (16-bit)
        + 3 * length of entry name
        + length of zip entry comment (when present)
        + length of extra field(s) (when present, e.g.: UT needs 9 bytes)
        + some bytes for book-keeping of memory allocation
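
     The per-entry formula of c) can be written as a small C helper.  This
     is a sketch using the figures above; the function name is hypothetical,
     and the allocator overhead is an assumed, caller-supplied value because
     it differs between systems.

```c
#include <stddef.h>

/* Per-entry memory estimate for the in-memory central directory list. */
size_t cd_entry_bytes(int is_16bit, size_t name_len, size_t comment_len,
                      size_t extra_len, size_t alloc_overhead) {
    size_t base = is_16bit ? 80 : 96;   /* fixed part of one entry */
    return base + 3 * name_len + comment_len + extra_len + alloc_overhead;
}
```

     For example, a 16-bit build with 20-character names, a 9-byte UT extra
     field and an assumed 4 bytes of allocator overhead needs
     80 + 60 + 9 + 4 = 153 bytes per entry, so 100,000 bytes of free memory
     hold about 653 entries (100000 / 153).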

   Conclusion:
     For systems with limited memory space (MSDOS, small AMIGAs, other
     environments without virtual memory), the number of archive entries
     is most often limited by condition c).
     For example, with approx. 100 kBytes of free memory after loading and
     initializing the program, a 16-bit DOS Zip cannot process more than 600
     to 1000 (+) archive entries.  (For the 16-bit Windows DLL or the 16-bit
     OS/2 port, limit c) is less important because Windows or OS/2 executables
     are not restricted to the 1024k area of real mode memory.  These 16-bit
     ports are limited by conditions a1) and b), that is, to at most approx.
     16000 entries!)
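
     The interplay of limits a1) and c) can be worked through numerically.
     A hedged C sketch: the "/4" in a1) is read here as 4-byte array
     elements (e.g. far pointers in 16-bit code), the 153-byte per-entry
     figure in the example is an assumed value, and the stack limit b) is
     not modeled.

```c
/* Upper bound on the number of entries, taking the smaller of
   - the sort-array limit a1): array_limit bytes / bytes per element
   - the memory limit c): free memory / memory needed per entry      */
unsigned long long entry_limit(unsigned long long array_limit,
                               unsigned long long elem_size,
                               unsigned long long free_mem,
                               unsigned long long per_entry) {
    unsigned long long by_array = array_limit / elem_size;
    unsigned long long by_memory = free_mem / per_entry;
    return by_array < by_memory ? by_array : by_memory;
}
```

     entry_limit(65536, 4, 100000, 153) yields 653 (memory-bound, as for
     DOS), while the same call with 16 MByte (16777216 bytes) of free memory
     yields 16384 (array-bound, as for the 16-bit Windows and OS/2 ports).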

 2.2. Number of "new" entries (add operation)
     In addition to the restrictions above (2.1.), the following limits
     caused by the handling of the "new files" list apply:

     a) 16-bit executable:              <16k ((2^16)/4)

     b) stack size required for "qsort" operation on "new entries" list.

     NOTE: In the current executables, the overflow checks for these limits
           are missing!

     c) amount of free memory to hold the directory info list for new entries;
        one entry needs:
        24 bytes (32-bit) or 22 bytes (16-bit)
        + 3 * length of filename
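
     The same kind of estimate applies to the "new files" list.  A minimal
     C sketch using the figures of c); the function name is hypothetical,
     an average filename length is assumed, and allocator overhead is
     ignored.

```c
#include <stddef.h>

/* Rough memory estimate for the "new files" list of an add operation. */
size_t new_list_bytes(int is_16bit, size_t n_files, size_t avg_name_len) {
    size_t per_entry = (is_16bit ? 22 : 24) + 3 * avg_name_len;
    return n_files * per_entry;
}
```

     Adding 1000 files with 20-character names thus needs about 84,000
     bytes for this list on a 32-bit build, on top of the central directory
     memory from 2.1 c).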

     NOTE: For larger systems, the practical limits are more likely set by
     performance (how long you are willing to wait) than by available memory
     and other resources.

D) Some technical remarks:

 1. For executables compiled without LARGE_FILE_SUPPORT and ZIP64_SUPPORT
    enabled, the 2GByte size limit on archive files is a consequence of
    the portable C implementation of the Info-ZIP programs.  Zip archive
    processing requires random access to the archive file for jumping
    between different parts of the archive's structure.  In standard C,
    this is done via the stdio functions fseek()/ftell() or the unix-io
    functions lseek()/tell().  In many (most?) C implementations, these
    functions use "signed long" variables to hold offset pointers into
    sequential files.  In most cases, this is a signed 32-bit number, which
    is limited to 2^31 - 1 (about 2.1E+09).  There may be specific C runtime
    library implementations that interpret the offset numbers as unsigned,
    but for us, this is not reliable in the context of portable programming.
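
    A minimal C sketch of the consequence: with plain fseek(), whose offset
    argument is a "long", an offset beyond LONG_MAX simply cannot be
    expressed.  The wrapper name is hypothetical; it rejects such offsets
    instead of letting them turn into wrong (possibly negative) values.

```c
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

/* Seek to an absolute offset with standard C fseek().  Offsets that do not
   fit into "long" (>= 2 GByte where long is 32 bits) are refused. */
int seek_abs_long(FILE *fp, uint64_t offset) {
    if (offset > (uint64_t)LONG_MAX)
        return -1;                      /* not representable as "long" */
    return fseek(fp, (long)offset, SEEK_SET);
}
```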

    If LARGE_FILE_SUPPORT and ZIP64_SUPPORT are defined and supported by
    the system, 64-bit off_t file offsets are used and the larger limits
    listed above apply.  As off_t is signed, the maximum offset is usually
    limited to 2^63 - 1.
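
    On POSIX systems, the off_t-based counterparts fseeko()/ftello() are one
    common way to get such offsets from stdio.  A hedged sketch, assuming a
    POSIX environment where defining _FILE_OFFSET_BITS to 64 before any
    system header selects a 64-bit off_t (as with glibc); it also assumes
    off_t is either 32 or 64 bits wide.  This is not how Zip's
    LARGE_FILE_SUPPORT is necessarily implemented.

```c
#ifndef _FILE_OFFSET_BITS
#define _FILE_OFFSET_BITS 64        /* request a 64-bit off_t */
#endif
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L     /* declare fseeko()/ftello() */
#endif
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

/* Seek with POSIX fseeko(), which takes an off_t.  Since off_t is signed,
   2^63 - 1 is the largest offset even when it is 64 bits wide. */
int seek_abs_off(FILE *fp, uint64_t offset) {
    uint64_t max_off = sizeof(off_t) >= 8 ? (uint64_t)INT64_MAX
                                          : (uint64_t)INT32_MAX;
    if (offset > max_off)
        return -1;                  /* beyond the largest off_t value */
    return fseeko(fp, (off_t)offset, SEEK_SET);
}
```

    Seeking past the end of a file without writing does not allocate any
    data, so a 3 GByte offset can be tested on an empty temporary file.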

 2. The 2GByte limit on the size of a single compressed archive member
    is again a consequence of the implementation in C.
    The variables used internally to count the size of the compressed
    data stream are of type "long", which is guaranteed to be at least
    32 bits wide on all supported environments.

    But why do we use "signed" long and not "unsigned long"?

    Throughout the I/O handling of the compressed data stream, the
    sign bit of the "long" numbers is (mis-)used as a kind of overflow
    detection.  Ultimately, this is because standard C lacks any overflow
    checking on integer arithmetic and does not support access to the
    underlying hardware's overflow detection (the status bits, especially
    "carry" and "overflow" of the CPU's flags register) in a
    system-independent manner.

    So, we "misuse" the most-significant bit of the compressed data
    size counters as a carry bit for efficient overflow/underflow detection.
    We could change the code to a different method of overflow detection,
    by using a bunch of "sanity" comparisons (kind of "is the calculated
    result plausible when compared with the operands").  But this would
    "blow up" the code of the "inner loop", with a noticeable loss of
    processing speed.  Alternatively, we could reduce the consistency checks
    of the compressed data (e.g. detection of premature end of stream) to
    an absolute minimum, at the cost of the programs' stability when
    processing corrupted data.
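
    The sign-bit technique and the alternative "sanity comparison" can both
    be sketched in portable C.  Because overflowing a signed integer is
    undefined behaviour in standard C, this hypothetical sketch keeps the
    counter in an unsigned 32-bit type and tests bit 31 explicitly; it
    illustrates the idea only and is not taken from the Zip/UnZip sources.

```c
#include <stdint.h>

#define SIGN_BIT 0x80000000UL

/* Sign-bit method: *total must start below 2^31 (e.g. 0).  Adding two
   values that are each below 2^31 cannot wrap a 32-bit unsigned counter,
   so a set bit 31 afterwards means the 2^31 - 1 limit was exceeded.
   Returns 0 on success, -1 on overflow (total left unchanged). */
int add_size_signbit(uint32_t *total, uint32_t chunk) {
    uint32_t sum;
    if (chunk & SIGN_BIT)
        return -1;                 /* chunk alone is already too large */
    sum = *total + chunk;          /* cannot wrap: both operands < 2^31 */
    if (sum & SIGN_BIT)
        return -1;                 /* "carry" into bit 31 */
    *total = sum;
    return 0;
}

/* Sanity-comparison method: full 32-bit range, overflow detected by
   checking whether the result is smaller than an operand. */
int add_size_unsigned(uint32_t *total, uint32_t chunk) {
    uint32_t sum = *total + chunk; /* unsigned arithmetic wraps mod 2^32 */
    if (sum < *total)
        return -1;                 /* wrapped around: overflow */
    *total = sum;
    return 0;
}
```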

    Summary: Changing the compression/decompression core routines to
    be "unsigned safe" would require excessive recoding, with little
    gain on maximum processable uncompressed size (a gain can only be
    expected for nearly incompressible data), but at severe costs on
    performance, stability and maintainability.  Therefore, it is
    quite unlikely that this will ever happen for Zip/UnZip.

    With LARGE_FILE_SUPPORT and ZIP64_SUPPORT enabled and supported,
    the above arguments still apply, but the limits are based on 64 bits
    instead of 32 and should allow most large files and archives to be
    processed.

    Anyway, the Zip archive format is increasingly showing its age...
    The effort to lift the 2GByte limits would be better invested in
    creating a successor for the Zip archive format and tools.  But given
    the latest improvements to the format and the wide acceptance of zip
    files, the format will probably be around for a while longer.

Please report any problems using the web contact form at:  www.Info-ZIP.org

Last updated:  26 January 2002, Christian Spieler
               25 May 2008, Ed Gordon
