ziplimit.txt

Zip 3 and UnZip 6 now support many of the extended limits of Zip64.

A) Hard limits of the Zip archive format:

   Number of entries in Zip archive:            64 k (2^16 - 1 entries)
   Compressed size of archive entry:            4 GByte (2^32 - 1 Bytes)
   Uncompressed size of entry:                  4 GByte (2^32 - 1 Bytes)
   Size of single-volume Zip archive:           4 GByte (2^32 - 1 Bytes)
   Per-volume size of multi-volume archives:    4 GByte (2^32 - 1 Bytes)
   Number of parts for multi-volume archives:   64 k (2^16 - 1 parts)
   Total size of multi-volume archive:          256 TByte (4G * 64k)

   The numbers of archive entries and of multi-volume parts are limited by
   the structure of the "end-of-central-directory" record, where these
   numbers are stored in 2-Byte fields.
   Some Zip and/or UnZip implementations (for example Info-ZIP's) allow
   handling of archives with more than 64k entries. (The information
   from the "number of entries" field in the "end-of-central-directory"
   record is not really necessary to retrieve the contents of a Zip
   archive; it should rather be used for consistency checks.)

   Length of an archive entry name:             64 kByte (2^16 - 1)
   Length of archive member comment:            64 kByte (2^16 - 1)
   Total length of "extra field":               64 kByte (2^16 - 1)
   Length of a single e.f. block:               64 kByte (2^16 - 1)
   Length of archive comment:                   64 kByte (2^16 - 1)

   Additional limitation claimed by PKWARE:
     Size of local-header structure (fixed fields of 30 Bytes + filename
        + local extra field):                            < 64 kByte
     Size of central-directory structure (46 Bytes + filename
        + central extra field + member comment):         < 64 kByte

   Note:
   In 2001, PKWARE published version 4.5 of the Zip format specification
   (together with the release of PKZIP for Windows 4.5). This specification
   defines new extra field blocks that allow the size limits of the
   standard zipfile structures to be exceeded.
   In this extended Zip format, the size limits of zip entries (and of the
   complete zip archive) have been extended to (2^64 - 1) Bytes and the
   maximum number of archive entries to (2^32 - 1).
   Zip 3.0 supports these Zip64 extensions and should be released shortly.
   UnZip 6.0 should support these standards.

B) Implementation limits of UnZip:

   Note:
   This section should be updated when UnZip 6.0 is near release.

 1. Size limits caused by file I/O and decompression handling:
    Size of Zip archive:                  2 GByte (2^31 - 1 Bytes)
    Compressed size of archive entry:     2 GByte (2^31 - 1 Bytes)

    Note: On some systems, UnZip may support archive sizes up to 4 GByte.
    To get this support, the target environment has to meet the following
    requirements:
    a) The compiler's intrinsic "long" data types must be able to hold
       the integer value 2^32. In other words, the standard intrinsic
       integer types "long" and "unsigned long" have to be wider than
       32 bits.
    b) The system has to supply a C runtime library that is compatible
       with the more-than-32-bit-wide "long int" type of condition a).
    c) The standard file positioning functions fseek(), ftell() (and/or
       the Unix style lseek() and tell() functions) must be able to move
       to absolute file offsets of up to 4 GByte from the file start.
    On 32-bit CPU hardware, you generally cannot expect that a C compiler
    provides a "long int" type that is wider than 32 bits. So, many of the
    most popular systems (i386, PowerPC, 680x0, et al.) are out of luck.
    You may find environments that meet all of these requirements on
    systems with 64-bit CPU hardware. Examples might be Cray number
    crunchers or Compaq (formerly DEC) Alpha AXP machines.

    The number of Zip archive entries is unlimited. The "number-of-entries"
    field of the "end-of-central-dir" record is checked against the "number
    of entries found in the central directory" modulo 64k (2^16).
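
    The modulo-64k consistency check described above can be sketched in C
    as follows. This is a minimal illustration, not code from the UnZip
    sources: it assumes the end-of-central-directory layout from PKWARE's
    APPNOTE (22 fixed bytes, little-endian integers, signature 0x06054b50,
    2-byte total-entries field at byte offset 10), and all function and
    macro names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout (PKWARE APPNOTE): the fixed part of the
 * end-of-central-directory record is 22 bytes, little-endian. */
#define EOCD_FIXED_SIZE      22
#define EOCD_SIGNATURE       0x06054b50UL
#define EOCD_TOTAL_ENTRIES   10   /* offset of the 2-byte total-entries field */

/* Hypothetical helpers: decode little-endian integers from a byte buffer. */
static uint16_t read_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical check: compare the stored 2-byte entry count with the
 * number of entries actually found in the central directory, modulo 64k.
 * Returns 1 if consistent, 0 if not, -1 if rec is not an EOCD record. */
static int eocd_entries_consistent(const unsigned char *rec, size_t len,
                                   unsigned long entries_found)
{
    uint16_t stored;

    assert(rec != NULL);
    if (len < EOCD_FIXED_SIZE || read_le32(rec) != EOCD_SIGNATURE)
        return -1;
    stored = read_le16(rec + EOCD_TOTAL_ENTRIES);
    /* Only the low 16 bits of the real count fit in the 2-byte field. */
    return stored == (uint16_t)(entries_found % 65536UL);
}
```

    For example, a record whose total-entries field holds 10 is accepted
    both for a central directory with 10 entries and for one with 65,546
    entries (65,546 mod 65,536 = 10); this is how the entry count can
    exceed what the 2-byte field is able to store.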

    Multi-volume archive extraction is not supported.

    Memory requirements are mostly independent of the archive size
    and archive contents.
    In general, UnZip needs a fixed amount of internal buffer space
    plus enough space to hold the complete local header of the entry
    currently being processed. Here, a large extra field (up to
    64 kByte) may exceed the available memory for 16-bit MSDOS
    executables compiled in the small or medium memory model, which
    have a fixed 64 kByte limit on data space.

    The other exception where memory requirements scale with "larger"
    archives is the "restore directory attributes" feature. Here, the
    directory attributes info for each restored directory has to be held
    in memory until the whole archive has been processed. So, the amount
    of memory needed to keep this info scales with the number of restored
    directories and may cause memory problems when a lot of directories
    are restored in a single run.

C) Implementation limits of the Zip executables:

   Note:
   This section has been updated to reflect Zip 3.0.

 1. Size limits caused by file I/O and compression handling:
    Without Zip64 extensions:
      Size of Zip archive:                2 GByte (2^31 - 1 Bytes)
      Compressed size of archive entry:   2 GByte (2^31 - 1 Bytes)
      Uncompressed size of entry:         2 GByte (2^31 - 1 Bytes)
                                          (could/should be 4 GBytes...)
    Using Zip64 extensions:
      Size of Zip archive:                2^63 - 1 Bytes
      Compressed size of archive entry:   2^63 - 1 Bytes
      Uncompressed size of entry:         2^63 - 1 Bytes

    Multi-volume archive creation is now supported in the form of split
    archives. Currently up to 99,999 splits are supported.

 2. Limits caused by handling of archive contents lists

 2.1. Number of archive entries (freshen, update, delete)
    a) 16-bit executable: 64k (2^16 - 1) or 32k (2^15 - 1),
       (unsigned vs. signed type of size_t)
    a1) 16-bit executable: <16k ((2^16)/4)
       (The smaller limit a1) results from the array size limit of
       the "qsort()" function.)

       32-bit executables: <1G ((2^32)/4)
       (usual system limit of the "qsort()" function on 32-bit systems)

    b) stack space needed by qsort to sort the list of archive entries

    NOTE: In the current executables, overflows of limits a) and b) are NOT
          checked!

    c) amount of free memory to hold "central directory information" of
       all archive entries; one entry needs:
       96 bytes (32-bit) or 80 bytes (16-bit)
       + 3 * length of entry name
       + length of zip entry comment (when present)
       + length of extra field(s) (when present, e.g.: UT needs 9 bytes)
       + some bytes for book-keeping of memory allocation

    Conclusion:
    For systems with limited memory space (MSDOS, small AMIGAs, other
    environments without virtual memory), the number of archive entries
    is most often limited by condition c).
    For example, with approx. 100 kBytes of free memory after loading and
    initializing the program, a 16-bit DOS Zip cannot process more than 600
    to 1000 (+) archive entries. (For the 16-bit Windows DLL or the 16-bit
    OS/2 port, limit c) is less important because Windows or OS/2 executables
    are not restricted to the 1024k area of real mode memory. These 16-bit
    ports are limited by conditions a1) and b) to roughly 16000 entries
    at most!)


 2.2. Number of "new" entries (add operation)
    In addition to the restrictions above (2.1.), the following limits
    caused by the handling of the "new files" list apply:

    a) 16-bit executable: <16k ((2^16)/4)

    b) stack size required for "qsort" operation on "new entries" list.

    NOTE: In the current executables, the overflow checks for these limits
          are missing!

    c) amount of free memory to hold the directory info list for new entries;
       one entry needs:
       24 bytes (32-bit) or 22 bytes (16-bit)
       + 3 * length of filename

    NOTE: On larger systems, the practical limits are more likely to be set
    by performance (how long you are willing to wait) than by available
    memory and other resources.

D) Some technical remarks:

 1. For executables compiled without LARGE_FILE_SUPPORT and ZIP64_SUPPORT
    enabled, the 2GByte size limit on archive files is a consequence of
    the portable C implementation of the Info-ZIP programs. Zip archive
    processing requires random access to the archive file for jumping
    between different parts of the archive's structure. In standard C,
    this is done via the stdio functions fseek()/ftell() or the unix-io
    functions lseek()/tell(). In many (most?) C implementations, these
    functions use "signed long" variables to hold offset pointers into
    sequential files. In most cases, this is a signed 32-bit number, which
    is limited to about 2E+09. There may be specific C runtime library
    implementations that interpret the offset numbers as unsigned, but for
    us, this is not reliable in the context of portable programming.

    If LARGE_FILE_SUPPORT and ZIP64_SUPPORT are defined and supported by
    the system, 64-bit off_t file offsets are used and the larger limits
    listed above apply. As off_t is signed, the maximum offset is usually
    limited to 2^63 - 1.

 2. The 2GByte limit on the size of a single compressed archive member
    is again a consequence of the implementation in C.
    The variables used internally to count the size of the compressed
    data stream are of type "long", which is guaranteed to be at least
    32 bits wide on all supported environments.

    But, why do we use "signed" long and not "unsigned long"?

    Throughout the I/O handling of the compressed data stream, the
    sign bit of the "long" numbers is (mis-)used as a kind of overflow
    detection. Ultimately, this is because standard C lacks any overflow
    checking on integer arithmetic and does not provide access to the
    underlying hardware's overflow detection (the status bits, especially
    "carry" and "overflow" of the CPU's flags register) in a
    system-independent manner.

    So, we "misuse" the most-significant bit of the compressed data
    size counters as a carry bit for efficient overflow/underflow detection.
    We could change the code to a different method of overflow detection,
    by using a bunch of "sanity" comparisons (kind of "is the calculated
    result plausible when compared with the operands"). But, this would
    "blow up" the code of the "inner loop", with a noticeable loss of
    processing speed. Or, we could reduce the number of consistency checks
    of the compressed data (e.g. detection of premature end of stream) to
    an absolute minimum, at the cost of the programs' stability when
    processing corrupted data.

    Summary: Changing the compression/decompression core routines to
    be "unsigned safe" would require excessive recoding, for little
    gain in the maximum processable uncompressed size (a gain can only be
    expected for poorly compressible data), but at severe cost to
    performance, stability and maintainability. Therefore, it is
    quite unlikely that this will ever happen for Zip/UnZip.

    With LARGE_FILE_SUPPORT and ZIP64_SUPPORT enabled and supported,
    the above arguments still apply, but the limits are based on 64 bits
    instead of 32 and should allow most large files and archives to be
    processed.

    Anyway, the Zip archive format is more and more showing its age...
    The effort to lift the 2GByte limits would be better invested in
    creating a successor for the Zip archive format and tools.
    But given the latest improvements to the format and the wide acceptance
    of zip files, the format will probably be around for a while longer.

Please report any problems using the web contact form at: www.Info-ZIP.org

Last updated: 26 January 2002, Christian Spieler
              25 May 2008, Ed Gordon