XZ Utils FAQ
============

Q:  What do the letters XZ mean?

A:  Nothing. They are just two letters, which come from the file format
    suffix .xz. The .xz suffix was selected because it seemed to be
    pretty much unused. It has no deeper meaning.


Q:  What are LZMA and LZMA2?

A:  LZMA stands for Lempel-Ziv-Markov chain algorithm. It is the name
    of the compression algorithm designed by Igor Pavlov for 7-Zip.
    LZMA is based on LZ77 and range encoding.

    LZMA2 is an updated version of the original LZMA that fixes a couple
    of practical issues. In the context of XZ Utils, LZMA is called LZMA1
    to emphasize that LZMA is not the same thing as LZMA2. LZMA2 is the
    primary compression algorithm in the .xz file format.
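
    The difference can be seen with Python's standard lzma module, which
    wraps liblzma and is used here only as a convenient stand-in for the
    C API. This is a minimal sketch. LZMA1 goes into the legacy .lzma
    container (FORMAT_ALONE), and LZMA2 goes into .xz.

```python
import lzma

data = b"hello xz " * 1000

# LZMA1 in the legacy .lzma container; FORMAT_ALONE accepts only LZMA1.
legacy = lzma.compress(data, format=lzma.FORMAT_ALONE,
                       filters=[{"id": lzma.FILTER_LZMA1, "preset": 6}])

# LZMA2 in the .xz container (also what FORMAT_XZ uses by default).
modern = lzma.compress(data, format=lzma.FORMAT_XZ,
                       filters=[{"id": lzma.FILTER_LZMA2, "preset": 6}])

# A .lzma file begins with the LZMA1 properties byte: 0x5D encodes
# lc=3, lp=0, pb=2, the values used by preset 6. An .xz file begins
# with the fixed magic bytes FD 37 7A 58 5A 00.
print(hex(legacy[0]), modern[:6].hex())  # prints: 0x5d fd377a585a00

assert lzma.decompress(legacy, format=lzma.FORMAT_ALONE) == data
assert lzma.decompress(modern, format=lzma.FORMAT_XZ) == data
```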


Q:  There are many LZMA-related projects. How does XZ Utils relate to them?

A:  7-Zip and LZMA SDK are the original projects. LZMA SDK is roughly
    a subset of the 7-Zip source tree.

    p7zip is a port of 7-Zip's command line tools to POSIX-like systems.

    LZMA Utils provide a gzip-like lzma tool for POSIX-like systems.
    LZMA Utils are based on LZMA SDK. XZ Utils are the successor to
    LZMA Utils.

    There are several other projects using LZMA. Most are more or less
    based on LZMA SDK. See <http://7-zip.org/links.html>.


Q:  Why is liblzma named liblzma if its primary file format is .xz?
    Shouldn't it be e.g. libxz?

A:  When the design of the .xz format began, the idea was to replace
    the .lzma format and use the same .lzma suffix. It would have been
    quite OK to reuse the suffix when there were very few .lzma files
    around. However, the old .lzma format became popular before the
    new format was finished. The new format was renamed to .xz, but the
    name of liblzma wasn't changed.


Q:  Do XZ Utils support the .7z format?

A:  No. Use 7-Zip (Windows) or p7zip (POSIX-like systems) to handle .7z
    files.


Q:  I have many .tar.7z files. Can I convert them to .tar.xz without
    spending hours recompressing the data?

A:  In the "extra" directory, there is a script named 7z2lzma.bash which
    is able to convert some .7z files to the .lzma format (not .xz). It
    needs the 7za (or 7z) command from p7zip. The script may silently
    produce corrupt output if certain assumptions are not met, so
    decompress the resulting .lzma file and compare it against the
    original before deleting the original file!
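
    The comparison can be scripted. The following is a minimal Python
    sketch that uses only the standard library. It assumes you have
    already extracted the data from the original .7z file (for example
    with 7za) into a reference file. The function names are
    hypothetical. Because the .lzma format stores no integrity check,
    this comparison is what catches silent corruption.

```python
import hashlib
import lzma


def _sha256(stream, chunk_size=1 << 16):
    """Hash a file-like object in fixed-size chunks to bound memory use."""
    digest = hashlib.sha256()
    for block in iter(lambda: stream.read(chunk_size), b""):
        digest.update(block)
    return digest.digest()


def lzma_matches_reference(lzma_path, reference_path):
    """Return True if lzma_path decompresses to exactly reference_path.

    Hypothetical helper. A truncated or malformed .lzma file typically
    makes decompression raise lzma.LZMAError or EOFError instead.
    """
    with lzma.open(lzma_path, "rb", format=lzma.FORMAT_ALONE) as f:
        produced = _sha256(f)
    with open(reference_path, "rb") as f:
        expected = _sha256(f)
    return produced == expected
```

    Delete the .7z file only after this returns True.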


Q:  I have many .lzma files. Can I quickly convert them to the .xz format?

A:  For now, no. Since XZ Utils supports the .lzma format, it's usually
    not too bad to keep the old files in the old format. If you want to
    do the conversion anyway, you need to decompress the .lzma files and
    then recompress them to the .xz format.

    Technically, there is a way to make the conversion relatively fast
    (roughly twice the time that normal decompression takes). Writing
    such a tool would take quite a bit of time, though, and it would
    probably be useful to only a few people. If you really want such a
    conversion tool, contact Lasse Collin and offer some money.
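
    The decompress-and-recompress route can be done in one streaming
    pass. This is a minimal sketch with Python's standard lzma module,
    and the function name is hypothetical. It takes the normal, slow
    path rather than the faster direct conversion. It streams the data
    in chunks instead of loading whole files into memory.

```python
import lzma
import shutil


def lzma_to_xz(src_path, dst_path, preset=6):
    """Hypothetical helper: full decompression of .lzma, recompression to .xz."""
    with lzma.open(src_path, "rb", format=lzma.FORMAT_ALONE) as src, \
            lzma.open(dst_path, "wb", format=lzma.FORMAT_XZ,
                      preset=preset) as dst:
        # copyfileobj moves the data in fixed-size chunks.
        shutil.copyfileobj(src, dst)
```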


Q:  I have installed xz, but my tar doesn't recognize .tar.xz files.
    How can I extract .tar.xz files?

A:  xz -dc foo.tar.xz | tar xf -
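
    If Python 3 is available, its standard tarfile module can also read
    .tar.xz archives directly, without depending on tar's own .xz
    support. Here is a minimal sketch (the function name is
    hypothetical):

```python
import tarfile


def extract_tar_xz(archive_path, dest_dir="."):
    """Hypothetical helper: extract a .tar.xz archive into dest_dir."""
    # "r:xz" makes tarfile decompress through Python's lzma module.
    with tarfile.open(archive_path, "r:xz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons offer extraction filters that reject unsafe
            # members such as absolute paths or "../" components.
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
```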


Q:  Can I recover parts of a broken .xz file (e.g. corrupted CD-R)?

A:  It may be possible if the file consists of multiple blocks, which
    is typically not the case if the file was created in single-threaded
    mode. There is no recovery program yet.


Q:  Is (some part of) XZ Utils patented?

A:  Lasse Collin is not aware of any patents that could affect XZ Utils.
    However, due to the nature of software patents, it's not possible to
    guarantee that XZ Utils isn't affected by any third party patent(s).


Q:  Where can I find documentation about the file format and algorithms?

A:  The .xz format is documented in xz-file-format.txt. It is a container
    format only and doesn't include descriptions of any non-trivial
    filters.
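
    As a small illustration of the container format, the sketch below
    parses the 12-byte .xz Stream Header. The header consists of a 6-byte
    magic, two Stream Flags bytes (the low four bits of the second byte
    hold the check type), and a CRC32 of those two flag bytes. The field
    layout follows xz-file-format.txt. The function name is hypothetical,
    and Python's lzma module is used only to produce sample input.

```python
import lzma
import struct
import zlib

XZ_MAGIC = b"\xfd7zXZ\x00"
CHECK_NAMES = {0x00: "None", 0x01: "CRC32", 0x04: "CRC64", 0x0A: "SHA-256"}


def read_check_type(data: bytes) -> str:
    """Hypothetical helper: validate a Stream Header, return its check type."""
    if len(data) < 12 or data[:6] != XZ_MAGIC:
        raise ValueError("not an .xz stream")
    flags = data[6:8]
    (stored_crc,) = struct.unpack("<I", data[8:12])
    # The header CRC32 covers only the two Stream Flags bytes.
    if zlib.crc32(flags) != stored_crc:
        raise ValueError("corrupt Stream Header (CRC32 mismatch)")
    # The first flags byte and the high nibble of the second are reserved.
    if flags[0] != 0 or flags[1] & 0xF0:
        raise ValueError("unsupported Stream Flags")
    check_id = flags[1] & 0x0F
    return CHECK_NAMES.get(check_id, "reserved check ID %d" % check_id)


print(read_check_type(lzma.compress(b"example", check=lzma.CHECK_CRC32)))
```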

    Documenting LZMA and LZMA2 is planned, but for now, there is no
    documentation other than the source code. Before you begin, you
    should know the basics of the LZ77 and range coding algorithms.
    LZMA is based on LZ77, but it is a lot more complex. Range coding
    is used to compress the final bitstream, much like Huffman coding
    is used in Deflate.
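
    For readers new to LZ77, here is a deliberately naive greedy LZ77
    tokenizer and decoder. It only illustrates the literal/match idea
    that LZMA builds on. It is not LZMA's match finder, and it has no
    range coder. The window size and minimum match length are arbitrary
    choices. The maximum length of 273 mirrors LZMA's longest match.

```python
def lz77_tokens(data: bytes, window: int = 4096, min_len: int = 3,
                max_len: int = 273):
    """Greedy LZ77: emit ("lit", byte) or ("match", distance, length)."""
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        for j in range(max(0, i - window), i):
            length = 0
            # A match may overlap the bytes it produces (distance < length).
            while (i + length < len(data) and length < max_len
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_len:
            tokens.append(("match", best_dist, best_len))
            i += best_len
        else:
            tokens.append(("lit", data[i]))
            i += 1
    return tokens


def lz77_decode(tokens) -> bytes:
    out = bytearray()
    for token in tokens:
        if token[0] == "lit":
            out.append(token[1])
        else:
            _, distance, length = token
            # Copy byte by byte so that overlapping matches work.
            for _ in range(length):
                out.append(out[-distance])
    return bytes(out)


print(lz77_tokens(b"abcabcabcabc"))
# [('lit', 97), ('lit', 98), ('lit', 99), ('match', 3, 9)]
```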


Q:  I cannot find BCJ and BCJ2 filters. Don't they exist in liblzma?

A:  The BCJ filter is called "x86" in liblzma. BCJ2 is not included
    because it requires using more than one encoded output stream.
    A streamable version of BCJ2-style filtering is planned.
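
    In Python's standard lzma module, which wraps liblzma, the same
    filter is exposed as lzma.FILTER_X86. It goes in front of LZMA2 in
    the filter chain. This is a minimal sketch that assumes the input is
    x86 machine code, since the filter is unlikely to help on other
    data. The helper name is hypothetical.

```python
import lzma

# The BCJ ("x86") filter converts relative branch targets to absolute
# ones so that LZMA2 sees more repetition; it must come before LZMA2.
X86_CHAIN = [
    {"id": lzma.FILTER_X86},
    {"id": lzma.FILTER_LZMA2, "preset": 6},
]


def compress_x86_binary(data: bytes) -> bytes:
    """Hypothetical helper: compress x86 code as .xz with BCJ + LZMA2."""
    return lzma.compress(data, format=lzma.FORMAT_XZ, filters=X86_CHAIN)


# The filter chain is recorded in the .xz headers, so plain
# decompression needs no extra arguments.
blob = bytes(range(256)) * 64
assert lzma.decompress(compress_x86_binary(blob)) == blob
```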


Q:  I need to use a script that runs "xz -9". On a system with 256 MiB
    of RAM, xz says that it cannot allocate memory. Can I make the
    script work without modifying it?

A:  Set a default memory usage limit for compression. You can do it e.g.
    in a shell initialization script such as ~/.bashrc or /etc/profile:

        XZ_DEFAULTS=--memlimit-compress=150MiB
        export XZ_DEFAULTS

    xz will then scale the compression settings down so that the given
    memory usage limit is not exceeded. This way xz shouldn't run out
    of memory.

    Also check that memory-related resource limits are high enough.
    On most systems, "ulimit -a" will show the current resource limits.
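
    If you would rather not change a shell startup file, you can set the
    variable only for the one script you run. In this minimal Python
    sketch (the function and script names are hypothetical), the current
    environment is copied, XZ_DEFAULTS is added, and the unmodified
    command is started with that environment.

```python
import os
import subprocess


def run_with_xz_defaults(cmd, xz_defaults="--memlimit-compress=150MiB",
                         **kwargs):
    """Hypothetical helper: run cmd with XZ_DEFAULTS set for it only."""
    env = dict(os.environ, XZ_DEFAULTS=xz_defaults)
    # check=True raises CalledProcessError if the command fails.
    return subprocess.run(cmd, env=env, check=True, **kwargs)


# Example with a hypothetical script name:
# run_with_xz_defaults(["./make-release.sh"])
```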


Q:  How do I create files that can be decompressed with XZ Embedded?

A:  See the documentation in XZ Embedded. In short, something like
    this is a good start:

        xz --check=crc32 --lzma2=preset=6e,dict=64KiB

    Or, if a BCJ filter is needed too, e.g. when compressing
    a kernel image for PowerPC:

        xz --check=crc32 --powerpc --lzma2=preset=6e,dict=64KiB

    Adjust the dictionary size to get a good compromise between
    compression ratio and decompressor memory usage. Note that
    in the single-call decompression mode of XZ Embedded, a big
    dictionary doesn't increase memory usage.
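
    The same settings can be expressed as a liblzma filter chain. The
    sketch below uses Python's standard lzma module, assumed here as a
    convenient front end to liblzma. In it, 6 | PRESET_EXTREME
    corresponds to preset 6e, and the 64 KiB dictionary matches the
    commands above. The helper name is hypothetical.

```python
import lzma

# Larger usually compresses better but needs more decoder memory
# (except in XZ Embedded's single-call mode).
DICT_SIZE = 64 * 1024


def compress_for_xz_embedded(data: bytes, powerpc: bool = False) -> bytes:
    """Roughly: xz --check=crc32 [--powerpc] --lzma2=preset=6e,dict=64KiB"""
    filters = []
    if powerpc:
        filters.append({"id": lzma.FILTER_POWERPC})  # BCJ filter for PowerPC
    filters.append({
        "id": lzma.FILTER_LZMA2,
        "preset": 6 | lzma.PRESET_EXTREME,
        "dict_size": DICT_SIZE,
    })
    return lzma.compress(data, format=lzma.FORMAT_XZ,
                         check=lzma.CHECK_CRC32, filters=filters)
```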


Q:  Will xz support threaded compression?

A:  It is planned and has been taken into account when designing
    the .xz file format. Eventually there will probably be three types
    of threading, each with its own advantages and disadvantages.

    The simplest method is splitting the uncompressed data into blocks
    and compressing them in parallel, independently of each other.
    Since the blocks are compressed independently, they can also be
    decompressed independently. Together with the index feature in .xz,
    this allows using threads to create .xz files for random-access
    reading. This also makes threaded decompression possible, although
    it is not clear if threaded decompression will ever be implemented.

    The independent blocks method has a couple of disadvantages too. It
    will compress worse than a single-block method. Often the difference
    is not too big (maybe 1-2 %), but sometimes it can be much larger.
    Also, the memory usage of the compressor increases linearly when
    adding threads.
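
    Here is a rough Python sketch of the independent-blocks idea, with
    one substitution to keep it short. Each chunk becomes a separate,
    complete .xz stream rather than a block inside one stream. The .xz
    format also allows this, since streams may be concatenated. The
    chunk size, worker count, and helper name are assumptions. CPython's
    lzma module releases the GIL while compressing, so the threads
    should be able to use separate cores. Each worker has its own
    encoder, so memory use grows with the number of threads.

```python
import lzma
from concurrent.futures import ThreadPoolExecutor


def compress_independent(data: bytes, chunk_size: int = 1 << 20,
                         workers: int = 4, preset: int = 6) -> bytes:
    """Hypothetical helper: compress chunks in parallel, one .xz stream each."""
    chunks = [data[i:i + chunk_size]
              for i in range(0, len(data), chunk_size)] or [b""]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each call creates its own encoder; map() keeps input order.
        streams = list(pool.map(
            lambda chunk: lzma.compress(chunk, preset=preset), chunks))
    return b"".join(streams)


data = b"".join(b"line %d\n" % i for i in range(50000))
# A low preset keeps this demonstration light on memory.
packed = compress_independent(data, chunk_size=64 * 1024, workers=2, preset=1)
single = lzma.compress(data, preset=1)
# lzma.decompress() decodes concatenated .xz streams one after another.
assert lzma.decompress(packed) == data
print(len(packed), len(single))
```

    Comparing the two printed sizes shows the compression ratio cost of
    compressing the chunks independently.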

    Match finder parallelization is another threading method. It has
    been in 7-Zip for ages. It doesn't affect compression ratio or
    memory usage significantly. Among the three threading methods, only
    this one is useful when compressing small files (files that are not
    significantly bigger than the dictionary). Unfortunately, this method
    scales only to about two CPU cores.

    The third method is pigz-style threading (I use that name because
    pigz <http://www.zlib.net/pigz/> uses that method). It doesn't
    affect compression ratio significantly and scales to many cores.
    The memory usage scales linearly when threads are added. This isn't
    significant with pigz, because Deflate uses only a 32 KiB dictionary,
    but with LZMA2 the memory usage will increase dramatically, just as
    with the independent blocks method. There is also a constant
    computational overhead, which may make the pigz-style method a bit
    less attractive on dual-core systems than the parallel match finder
    method, but with more cores the overhead is no longer a big deal.

    Combining the threading methods will be possible and also useful.
    For example, combining match finder parallelization with pigz-style
    threading can cut the memory usage by 50 %.

    It is possible that the single-threaded method will be modified to
    create files identical to those of the pigz-style method. We'll see
    once pigz-style threading has been implemented in liblzma.


Q:  How do I build a program that needs liblzmadec (lzmadec.h)?

A:  liblzmadec is part of LZMA Utils. XZ Utils has liblzma, but no
    liblzmadec. The code using liblzmadec should be ported to use
    liblzma instead. If you cannot or don't want to do that, download
    LZMA Utils from <http://tukaani.org/lzma/>.


Q:  The default build of liblzma is too big. How can I make it smaller?

A:  Give --enable-small to the configure script. Also use the appropriate
    --enable or --disable options to include only the filter encoders
    and decoders and the integrity checks that you actually need. Use
    CFLAGS=-Os (with GCC) or equivalent to tell your compiler to optimize
    for size. See INSTALL for information about configure options.

    If the result is still too big, take a look at XZ Embedded. It is
    a separate project, which provides a limited but significantly
    smaller XZ decoder implementation than XZ Utils. You can find it
    at <http://tukaani.org/xz/embedded.html>.
225