The .lzma File Format
=====================

        0. Preface
          0.1. Notices and Acknowledgements
          0.2. Changes
        1. File Format
          1.1. Header
            1.1.1. Properties
            1.1.2. Dictionary Size
            1.1.3. Uncompressed Size
          1.2. LZMA Compressed Data
        2. References


0. Preface

        This document describes the .lzma file format, which is
        sometimes also called the LZMA_Alone format. It is a legacy
        file format, which is being or has been replaced by the .xz
        format. The MIME type of the .lzma format is
        `application/x-lzma'.

        The most commonly used tools for handling .lzma files are
        LZMA SDK, LZMA Utils, 7-Zip, and XZ Utils. This document
        describes some of the differences between these
        implementations and gives hints about which subset of the
        .lzma format is the most portable.


0.1. Notices and Acknowledgements

        This file format was designed by Igor Pavlov for use in
        LZMA SDK. This document was written by Lasse Collin
        <lasse.collin@tukaani.org> using the documentation found
        in the LZMA SDK.

        This document has been put into the public domain.


0.2. Changes

        Last modified: 2011-04-12 11:55+0300


1. File Format

        +-+-+-+-+-+-+-+-+-+-+-+-+-+==========================+
        |         Header          |   LZMA Compressed Data   |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+==========================+

        A .lzma file consists of a 13-byte Header followed by
        the LZMA Compressed Data.

        Unlike the .gz, .bz2, and .xz formats, it is not possible to
        concatenate multiple .lzma files as is and expect the
        decompression tool to decode the resulting file as if it were
        a single .lzma file.

        For example, the command line tools from LZMA Utils and
        LZMA SDK silently ignore all the data after the first .lzma
        stream. In contrast, the command line tool from XZ Utils
        considers the .lzma file to be corrupt if there is data after
        the first .lzma stream.


1.1. Header

        +------------+----+----+----+----+--+--+--+--+--+--+--+--+
        | Properties |  Dictionary Size  |   Uncompressed Size   |
        +------------+----+----+----+----+--+--+--+--+--+--+--+--+


1.1.1. Properties

        The Properties field contains three properties. An
        abbreviation is given in parentheses, followed by the value
        range of the property. The field consists of

            1) the number of literal context bits (lc, [0, 8]);
            2) the number of literal position bits (lp, [0, 4]); and
            3) the number of position bits (pb, [0, 4]).

        The properties are encoded using the following formula:

            Properties = (pb * 5 + lp) * 9 + lc

        The following C code illustrates a straightforward way to
        decode the Properties field:

            uint8_t lc, lp, pb;
            uint8_t prop = get_lzma_properties();

            /* Largest valid value: pb = 4, lp = 4, lc = 8. */
            if (prop > (4 * 5 + 4) * 9 + 8)
                    return LZMA_PROPERTIES_ERROR;

            /* Undo the encoding formula from the outside in. */
            pb = prop / (9 * 5);
            prop -= pb * 9 * 5;
            lp = prop / 9;
            lc = prop - lp * 9;

        XZ Utils has an additional requirement: lc + lp <= 4. Files
        which don't follow this requirement cannot be decompressed
        with XZ Utils. Usually this isn't a problem since the most
        common lc/lp/pb values are 3/0/2. It is the only lc/lp/pb
        combination that the files created by LZMA Utils can have,
        but LZMA Utils can decompress files with any lc/lp/pb.


1.1.2. Dictionary Size

        Dictionary Size is stored as an unsigned 32-bit little endian
        integer. Any 32-bit value is possible, but for maximum
        portability, only sizes of 2^n and 2^n + 2^(n-1) should be
        used.

        LZMA Utils creates only files with dictionary size 2^n,
        16 <= n <= 25. LZMA Utils can decompress files with any
        dictionary size.

        XZ Utils creates and decompresses .lzma files only with
        dictionary sizes 2^n and 2^n + 2^(n-1). If some other
        dictionary size is specified when compressing, the value
        stored in the Dictionary Size field is rounded up, but the
        specified value is still used in the actual compression code.


1.1.3. Uncompressed Size

        Uncompressed Size is stored as an unsigned 64-bit little
        endian integer. A special value of 0xFFFF_FFFF_FFFF_FFFF
        indicates that Uncompressed Size is unknown. End of Payload
        Marker (*) is used if and only if Uncompressed Size is
        unknown.

        XZ Utils rejects files whose Uncompressed Size field specifies
        a known size that is 256 GiB or more. This is to reject false
        positives when trying to guess if the input file is in the
        .lzma format. When Uncompressed Size is unknown, there is no
        limit for the uncompressed size of the file.

        (*) Some tools use the term End of Stream (EOS) marker
            instead of End of Payload Marker.


1.2. LZMA Compressed Data

        A detailed description of the format of this field is outside
        the scope of this document.


2. References

        LZMA SDK - The original LZMA implementation
        http://7-zip.org/sdk.html

        7-Zip
        http://7-zip.org/

        LZMA Utils - LZMA adapted to POSIX-like systems
        http://tukaani.org/lzma/

        XZ Utils - The next generation of LZMA Utils
        http://tukaani.org/xz/

        The .xz file format - The successor of the .lzma format
        http://tukaani.org/xz/xz-file-format.txt
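
        The following C sketch is not part of the format definition.
        It illustrates one possible way to read the 13-byte Header
        described in Section 1.1 and to check the portability hint
        from Section 1.1.2. The struct lzma_alone_header and the
        functions parse_lzma_alone_header() and
        is_portable_dict_size() are hypothetical names chosen for
        this example; they do not come from LZMA SDK, LZMA Utils,
        or XZ Utils.

```c
#include <stddef.h>
#include <stdint.h>

#define LZMA_ALONE_HEADER_SIZE 13

/* Hypothetical container for the decoded Header fields. */
struct lzma_alone_header {
        uint8_t lc, lp, pb;
        uint32_t dict_size;
        uint64_t uncompressed_size; /* UINT64_MAX means "unknown" */
};

/*
 * Decode the 13-byte Header from buf.
 * Returns 0 on success, or -1 if fewer than 13 bytes are available
 * or the Properties byte is out of range.
 */
static int
parse_lzma_alone_header(const uint8_t *buf, size_t len,
                struct lzma_alone_header *h)
{
        if (len < LZMA_ALONE_HEADER_SIZE)
                return -1;

        /* Byte 0: Properties = (pb * 5 + lp) * 9 + lc */
        uint8_t prop = buf[0];
        if (prop > (4 * 5 + 4) * 9 + 8)
                return -1;

        h->pb = prop / (9 * 5);
        prop -= h->pb * 9 * 5;
        h->lp = prop / 9;
        h->lc = prop - h->lp * 9;

        /* Bytes 1-4: Dictionary Size, 32-bit little endian */
        h->dict_size = 0;
        for (int i = 0; i < 4; i++)
                h->dict_size |= (uint32_t)buf[1 + i] << (8 * i);

        /* Bytes 5-12: Uncompressed Size, 64-bit little endian */
        h->uncompressed_size = 0;
        for (int i = 0; i < 8; i++)
                h->uncompressed_size |= (uint64_t)buf[5 + i] << (8 * i);

        return 0;
}

/*
 * Return 1 if d has the form 2^n or 2^n + 2^(n-1), else 0.
 * 2^n + 2^(n-1) equals three times the lowest set bit of d.
 */
static int
is_portable_dict_size(uint32_t d)
{
        if (d == 0)
                return 0;

        uint64_t low = d & (~d + 1); /* lowest set bit of d */
        return d == low || d == low * 3;
}
```

        With these definitions, a Header whose first byte is 0x5D
        decodes to lc/lp/pb = 3/0/2, since (2 * 5 + 0) * 9 + 3 = 93.
        A caller that wants to follow the stricter XZ Utils rule
        could additionally check lc + lp <= 4 after parsing.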