XZ(1)                             XZ Utils                             XZ(1)

NAME
       xz, unxz, xzcat, lzma, unlzma, lzcat - Compress or decompress .xz and
       .lzma files

SYNOPSIS
       xz [option]... [file]...

       unxz is equivalent to xz --decompress.
       xzcat is equivalent to xz --decompress --stdout.
       lzma is equivalent to xz --format=lzma.
       unlzma is equivalent to xz --format=lzma --decompress.
       lzcat is equivalent to xz --format=lzma --decompress --stdout.

       When writing scripts that need to decompress files, it is recommended
       to always use the name xz with appropriate arguments (xz -d or xz -dc)
       instead of the names unxz and xzcat.

DESCRIPTION
       xz is a general-purpose data compression tool with command line syntax
       similar to gzip(1) and bzip2(1). The native file format is the .xz
       format, but the legacy .lzma format used by LZMA Utils and raw
       compressed streams with no container format headers are also
       supported.

       xz compresses or decompresses each file according to the selected
       operation mode. If no files are given or file is -, xz reads from
       standard input and writes the processed data to standard output. xz
       will refuse (display an error and skip the file) to write compressed
       data to standard output if it is a terminal. Similarly, xz will refuse
       to read compressed data from standard input if it is a terminal.

       Unless --stdout is specified, files other than - are written to a new
       file whose name is derived from the source file name:

       o  When compressing, the suffix of the target file format (.xz or
          .lzma) is appended to the source filename to get the target
          filename.

       o  When decompressing, the .xz or .lzma suffix is removed from the
          filename to get the target filename. xz also recognizes the
          suffixes .txz and .tlz, and replaces them with the .tar suffix.

       If the target file already exists, an error is displayed and the file
       is skipped.
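The naming rules above can be sketched in a few lines of Python. This is only an illustration of the rules as stated here, not the code xz itself uses; the function name target_name and its parameters are hypothetical, and the sketch ignores --suffix and the cases in which xz skips a file.

```python
def target_name(source: str, decompress: bool, fmt: str = "xz") -> str:
    """Derive the target filename from the source filename.

    fmt is the target format when compressing ("xz" or "lzma").
    Hypothetical helper for illustration only.
    """
    if not decompress:
        # Compressing: append the suffix of the target file format.
        return source + "." + fmt
    # Decompressing: .txz and .tlz become .tar; .xz and .lzma are removed.
    rules = ((".txz", ".tar"), (".tlz", ".tar"), (".xz", ""), (".lzma", ""))
    for suffix, replacement in rules:
        if source.endswith(suffix):
            return source[: -len(suffix)] + replacement
    raise ValueError(source + ": filename has an unknown suffix")


print(target_name("notes.txt", decompress=False))   # notes.txt.xz
print(target_name("backup.txz", decompress=True))   # backup.tar
```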

       Unless writing to standard output, xz will display a warning and skip
       the file if any of the following applies:

       o  File is not a regular file. Symbolic links are not followed, and
          thus they are not considered to be regular files.

       o  File has more than one hard link.

       o  File has setuid, setgid, or sticky bit set.

       o  The operation mode is set to compress and the file already has a
          suffix of the target file format (.xz or .txz when compressing to
          the .xz format, and .lzma or .tlz when compressing to the .lzma
          format).

       o  The operation mode is set to decompress and the file doesn't have a
          suffix of any of the supported file formats (.xz, .txz, .lzma, or
          .tlz).

       After successfully compressing or decompressing the file, xz copies
       the owner, group, permissions, access time, and modification time from
       the source file to the target file. If copying the group fails, the
       permissions are modified so that the target file doesn't become
       accessible to users who didn't have permission to access the source
       file. xz doesn't support copying other metadata like access control
       lists or extended attributes yet.

       Once the target file has been successfully closed, the source file is
       removed unless --keep was specified. The source file is never removed
       if the output is written to standard output.

       Sending SIGINFO or SIGUSR1 to the xz process makes it print progress
       information to standard error. This has only limited use since when
       standard error is a terminal, using --verbose will display an
       automatically updating progress indicator.

   Memory usage
       The memory usage of xz varies from a few hundred kilobytes to several
       gigabytes depending on the compression settings. The settings used
       when compressing a file determine the memory requirements of the
       decompressor.
       Typically the decompressor needs 5 % to 20 % of the amount of memory
       that the compressor needed when creating the file. For example,
       decompressing a file created with xz -9 currently requires 65 MiB of
       memory. Still, it is possible to have .xz files that require several
       gigabytes of memory to decompress.

       Users of older systems in particular may find the possibility of very
       large memory usage annoying. To prevent uncomfortable surprises, xz
       has a built-in memory usage limiter, which is disabled by default.
       While some operating systems provide ways to limit the memory usage of
       processes, relying on them wasn't deemed to be flexible enough (e.g.
       using ulimit(1) to limit virtual memory tends to cripple mmap(2)).

       The memory usage limiter can be enabled with the command line option
       --memlimit=limit. Often it is more convenient to enable the limiter by
       default by setting the environment variable XZ_DEFAULTS, e.g.
       XZ_DEFAULTS=--memlimit=150MiB. It is possible to set the limits
       separately for compression and decompression by using
       --memlimit-compress=limit and --memlimit-decompress=limit. Using these
       two options outside XZ_DEFAULTS is rarely useful because a single run
       of xz cannot do both compression and decompression and
       --memlimit=limit (or -M limit) is shorter to type on the command line.

       If the specified memory usage limit is exceeded when decompressing, xz
       will display an error and decompressing the file will fail. If the
       limit is exceeded when compressing, xz will try to scale the settings
       down so that the limit is no longer exceeded (except when using
       --format=raw or --no-adjust). This way the operation won't fail unless
       the limit is very small. The scaling of the settings is done in steps
       that don't match the compression level presets, e.g.
       if the limit is only slightly less than the amount required for
       xz -9, the settings will be scaled down only a little, not all the
       way down to xz -8.

   Concatenation and padding with .xz files
       It is possible to concatenate .xz files as is. xz will decompress such
       files as if they were a single .xz file.

       It is possible to insert padding between the concatenated parts or
       after the last part. The padding must consist of null bytes and the
       size of the padding must be a multiple of four bytes. This can be
       useful e.g. if the .xz file is stored on a medium that measures file
       sizes in 512-byte blocks.

       Concatenation and padding are not allowed with .lzma files or raw
       streams.

OPTIONS
   Integer suffixes and special values
       In most places where an integer argument is expected, an optional
       suffix is supported to easily indicate large integers. There must be
       no space between the integer and the suffix.

       KiB    Multiply the integer by 1,024 (2^10). Ki, k, kB, K, and KB are
              accepted as synonyms for KiB.

       MiB    Multiply the integer by 1,048,576 (2^20). Mi, m, M, and MB are
              accepted as synonyms for MiB.

       GiB    Multiply the integer by 1,073,741,824 (2^30). Gi, g, G, and GB
              are accepted as synonyms for GiB.

       The special value max can be used to indicate the maximum integer
       value supported by the option.

   Operation mode
       If multiple operation mode options are given, the last one takes
       effect.

       -z, --compress
              Compress. This is the default operation mode when no operation
              mode option is specified and no other operation mode is implied
              from the command name (for example, unxz implies --decompress).

       -d, --decompress, --uncompress
              Decompress.

       -t, --test
              Test the integrity of compressed files.
              This option is equivalent to --decompress --stdout except that
              the decompressed data is discarded instead of being written to
              standard output. No files are created or removed.

       -l, --list
              Print information about compressed files. No uncompressed
              output is produced, and no files are created or removed. In
              list mode, the program cannot read the compressed data from
              standard input or from other unseekable sources.

              The default listing shows basic information about files, one
              file per line. To get more detailed information, use also the
              --verbose option. For even more information, use --verbose
              twice, but note that this may be slow, because getting all the
              extra information requires many seeks. The width of verbose
              output exceeds 80 characters, so piping the output to e.g.
              less -S may be convenient if the terminal isn't wide enough.

              The exact output may vary between xz versions and different
              locales. For machine-readable output, --robot --list should be
              used.

   Operation modifiers
       -k, --keep
              Don't delete the input files.

       -f, --force
              This option has several effects:

              o  If the target file already exists, delete it before
                 compressing or decompressing.

              o  Compress or decompress even if the input is a symbolic link
                 to a regular file, has more than one hard link, or has the
                 setuid, setgid, or sticky bit set. The setuid, setgid, and
                 sticky bits are not copied to the target file.

              o  When used with --decompress --stdout and xz cannot recognize
                 the type of the source file, copy the source file as is to
                 standard output. This allows xzcat --force to be used like
                 cat(1) for files that have not been compressed with xz. Note
                 that in future, xz might support new compressed file
                 formats, which may make xz decompress more types of files
                 instead of copying them as is to standard output.
                 --format=format can be used to restrict xz to decompress
                 only a single file format.

       -c, --stdout, --to-stdout
              Write the compressed or decompressed data to standard output
              instead of a file. This implies --keep.

       --no-sparse
              Disable creation of sparse files. By default, if decompressing
              into a regular file, xz tries to make the file sparse if the
              decompressed data contains long sequences of binary zeros. It
              also works when writing to standard output as long as standard
              output is connected to a regular file and certain additional
              conditions are met to make it safe. Creating sparse files may
              save disk space and speed up the decompression by reducing the
              amount of disk I/O.

       -S .suf, --suffix=.suf
              When compressing, use .suf as the suffix for the target file
              instead of .xz or .lzma. If not writing to standard output and
              the source file already has the suffix .suf, a warning is
              displayed and the file is skipped.

              When decompressing, recognize files with the suffix .suf in
              addition to files with the .xz, .txz, .lzma, or .tlz suffix. If
              the source file has the suffix .suf, the suffix is removed to
              get the target filename.

              When compressing or decompressing raw streams (--format=raw),
              the suffix must always be specified unless writing to standard
              output, because there is no default suffix for raw streams.

       --files[=file]
              Read the filenames to process from file; if file is omitted,
              filenames are read from standard input. Filenames must be
              terminated with the newline character. A dash (-) is taken as a
              regular filename; it doesn't mean standard input. If filenames
              are given also as command line arguments, they are processed
              before the filenames read from file.

       --files0[=file]
              This is identical to --files[=file] except that each filename
              must be terminated with the null character.
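As a rough illustration of the list format that --files0 expects, the Python sketch below serializes filenames with a terminating null byte after each one. The helper name to_files0 is hypothetical. Null termination matters when a filename itself contains a newline, which the newline-terminated --files format cannot represent.

```python
import os


def to_files0(names):
    """Serialize filenames so that each one is terminated by a null byte.

    Hypothetical helper; the result is the kind of data --files0 reads.
    """
    return b"".join(os.fsencode(name) + b"\0" for name in names)


# The second name contains a newline, so it would be split in two by
# a newline-terminated list but survives intact here.
blob = to_files0(["a b.txt", "line\nbreak.log"])
print(blob)
```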

   Basic file format and compression options
       -F format, --format=format
              Specify the file format to compress or decompress:

              auto   This is the default. When compressing, auto is
                     equivalent to xz. When decompressing, the format of the
                     input file is automatically detected. Note that raw
                     streams (created with --format=raw) cannot be
                     auto-detected.

              xz     Compress to the .xz file format, or accept only .xz
                     files when decompressing.

              lzma, alone
                     Compress to the legacy .lzma file format, or accept only
                     .lzma files when decompressing. The alternative name
                     alone is provided for backwards compatibility with LZMA
                     Utils.

              raw    Compress or uncompress a raw stream (no headers). This
                     is meant for advanced users only. To decode raw streams,
                     you need to use --format=raw and explicitly specify the
                     filter chain, which normally would have been stored in
                     the container headers.

       -C check, --check=check
              Specify the type of the integrity check. The check is
              calculated from the uncompressed data and stored in the .xz
              file. This option has an effect only when compressing into the
              .xz format; the .lzma format doesn't support integrity checks.
              The integrity check (if any) is verified when the .xz file is
              decompressed.

              Supported check types:

              none   Don't calculate an integrity check at all. This is
                     usually a bad idea. This can be useful when integrity of
                     the data is verified by other means anyway.

              crc32  Calculate CRC32 using the polynomial from IEEE-802.3
                     (Ethernet).

              crc64  Calculate CRC64 using the polynomial from ECMA-182. This
                     is the default, since it is slightly better than CRC32
                     at detecting damaged files and the speed difference is
                     negligible.

              sha256 Calculate SHA-256. This is somewhat slower than CRC32
                     and CRC64.

              Integrity of the .xz headers is always verified with CRC32.
              It is not possible to change or disable it.

       -0 ... -9
              Select a compression preset level. The default is -6. If
              multiple preset levels are specified, the last one takes
              effect. If a custom filter chain was already specified, setting
              a compression preset level clears the custom filter chain.

              The differences between the presets are more significant than
              with gzip(1) and bzip2(1). The selected compression settings
              determine the memory requirements of the decompressor, thus
              using a too high preset level might make it painful to
              decompress the file on an old system with little RAM.
              Specifically, it's not a good idea to blindly use -9 for
              everything like it often is with gzip(1) and bzip2(1).

              -0 ... -3
                     These are somewhat fast presets. -0 is sometimes faster
                     than gzip -9 while compressing much better. The higher
                     ones often have speed comparable to bzip2(1) with
                     comparable or better compression ratio, although the
                     results depend a lot on the type of data being
                     compressed.

              -4 ... -6
                     Good to very good compression while keeping decompressor
                     memory usage reasonable even for old systems. -6 is the
                     default, which is usually a good choice e.g. for
                     distributing files that need to be decompressible even
                     on systems with only 16 MiB RAM. (-5e or -6e may be
                     worth considering too. See --extreme.)

              -7 ... -9
                     These are like -6 but with higher compressor and
                     decompressor memory requirements. These are useful only
                     when compressing files bigger than 8 MiB, 16 MiB, and
                     32 MiB, respectively.

              On the same hardware, the decompression speed is approximately
              a constant number of bytes of compressed data per second. In
              other words, the better the compression, the faster the
              decompression will usually be. This also means that the amount
              of uncompressed output produced per second can vary a lot.
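The relationship in the last paragraph can be made concrete with a little arithmetic. The sketch below assumes a decoder that consumes a fixed amount of compressed data per second; the rate of 20 MiB/s and the file sizes are made-up numbers chosen only for illustration, and the function name output_rate is hypothetical.

```python
def output_rate(compressed_rate, compressed_size, uncompressed_size):
    """Uncompressed output per second for a fixed compressed input rate.

    All three arguments use the same unit (MiB here); the numbers
    passed below are hypothetical.
    """
    return compressed_rate * uncompressed_size / compressed_size


# The same 100 MiB of data, compressed to 50 MiB and to 25 MiB, read
# by a decoder that consumes 20 MiB of compressed data per second.
print(output_rate(20, 50, 100))   # 40.0 MiB/s of uncompressed output
print(output_rate(20, 25, 100))   # 80.0 MiB/s of uncompressed output
```

Under this assumption the better-compressed file finishes decompressing in half the time, since only half as much compressed data has to be read.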

              The following table summarises the features of the presets:

                     Preset   DictSize   CompCPU   CompMem   DecMem
                       -0     256 KiB       0        3 MiB    1 MiB
                       -1       1 MiB       1        9 MiB    2 MiB
                       -2       2 MiB       2       17 MiB    3 MiB
                       -3       4 MiB       3       32 MiB    5 MiB
                       -4       4 MiB       4       48 MiB    5 MiB
                       -5       8 MiB       5       94 MiB    9 MiB
                       -6       8 MiB       6       94 MiB    9 MiB
                       -7      16 MiB       6      186 MiB   17 MiB
                       -8      32 MiB       6      370 MiB   33 MiB
                       -9      64 MiB       6      674 MiB   65 MiB

              Column descriptions:

              o  DictSize is the LZMA2 dictionary size. It is a waste of
                 memory to use a dictionary bigger than the size of the
                 uncompressed file. This is why it is good to avoid using the
                 presets -7 ... -9 when there's no real need for them. At -6
                 and lower, the amount of memory wasted is usually low enough
                 to not matter.

              o  CompCPU is a simplified representation of the LZMA2 settings
                 that affect compression speed. The dictionary size affects
                 speed too, so while CompCPU is the same for levels -6 ...
                 -9, higher levels still tend to be a little slower. To get
                 even slower and thus possibly better compression, see
                 --extreme.

              o  CompMem contains the compressor memory requirements in the
                 single-threaded mode. It may vary slightly between xz
                 versions. Memory requirements of some of the future
                 multithreaded modes may be dramatically higher than that of
                 the single-threaded mode.

              o  DecMem contains the decompressor memory requirements. That
                 is, the compression settings determine the memory
                 requirements of the decompressor. The exact decompressor
                 memory usage is slightly more than the LZMA2 dictionary
                 size, but the values in the table have been rounded up to
                 the next full MiB.

       -e, --extreme
              Use a slower variant of the selected compression preset level
              (-0 ... -9) to hopefully get a little bit better compression
              ratio, but with bad luck this can also make it worse.
              Decompressor memory usage is not affected, but compressor
              memory usage increases a little at preset levels -0 ... -3.

              Since there are two presets with dictionary sizes 4 MiB and
              8 MiB, the presets -3e and -5e use slightly faster settings
              (lower CompCPU) than -4e and -6e, respectively. That way no two
              presets are identical.

                     Preset   DictSize   CompCPU   CompMem   DecMem
                      -0e     256 KiB       8        4 MiB    1 MiB
                      -1e       1 MiB       8       13 MiB    2 MiB
                      -2e       2 MiB       8       25 MiB    3 MiB
                      -3e       4 MiB       7       48 MiB    5 MiB
                      -4e       4 MiB       8       48 MiB    5 MiB
                      -5e       8 MiB       7       94 MiB    9 MiB
                      -6e       8 MiB       8       94 MiB    9 MiB
                      -7e      16 MiB       8      186 MiB   17 MiB
                      -8e      32 MiB       8      370 MiB   33 MiB
                      -9e      64 MiB       8      674 MiB   65 MiB

              For example, there are a total of four presets that use 8 MiB
              dictionary, whose order from the fastest to the slowest is -5,
              -6, -5e, and -6e.

       --fast
       --best These are somewhat misleading aliases for -0 and -9,
              respectively. These are provided only for backwards
              compatibility with LZMA Utils. Avoid using these options.

       --memlimit-compress=limit
              Set a memory usage limit for compression. If this option is
              specified multiple times, the last one takes effect.

              If the compression settings exceed the limit, xz will adjust
              the settings downwards so that the limit is no longer exceeded
              and display a notice that automatic adjustment was done. Such
              adjustments are not made when compressing with --format=raw or
              if --no-adjust has been specified. In those cases, an error is
              displayed and xz will exit with exit status 1.

              The limit can be specified in multiple ways:

              o  The limit can be an absolute value in bytes. Using an
                 integer suffix like MiB can be useful. Example:
                 --memlimit-compress=80MiB

              o  The limit can be specified as a percentage of total physical
                 memory (RAM).
                 This can be useful especially when setting the XZ_DEFAULTS
                 environment variable in a shell initialization script that
                 is shared between different computers. That way the limit is
                 automatically bigger on systems with more memory. Example:
                 --memlimit-compress=70%

              o  The limit can be reset back to its default value by setting
                 it to 0. This is currently equivalent to setting the limit
                 to max (no memory usage limit). Once multithreading support
                 has been implemented, there may be a difference between 0
                 and max for the multithreaded case, so it is recommended to
                 use 0 instead of max until the details have been decided.

              See also the section Memory usage.

       --memlimit-decompress=limit
              Set a memory usage limit for decompression. This also affects
              the --list mode. If the operation is not possible without
              exceeding the limit, xz will display an error and decompressing
              the file will fail. See --memlimit-compress=limit for possible
              ways to specify the limit.

       -M limit, --memlimit=limit, --memory=limit
              This is equivalent to specifying --memlimit-compress=limit
              --memlimit-decompress=limit.

       --no-adjust
              Display an error and exit if the compression settings exceed
              the memory usage limit. The default is to adjust the settings
              downwards so that the memory usage limit is not exceeded.
              Automatic adjusting is always disabled when creating raw
              streams (--format=raw).

       -T threads, --threads=threads
              Specify the number of worker threads to use. The actual number
              of threads can be less than threads if using more threads would
              exceed the memory usage limit.

              Multithreaded compression and decompression are not implemented
              yet, so this option has no effect for now.

              As of writing (2010-09-27), it hasn't been decided if threads
              will be used by default on multicore systems once support for
              threading has been implemented.
              Comments are welcome. The complicating factor is that using
              many threads will increase the memory usage dramatically. Note
              that if multithreading becomes the default, it will probably be
              done so that single-threaded and multithreaded modes produce
              the same output, so compression ratio won't be significantly
              affected if threading is enabled by default.

   Custom compressor filter chains
       A custom filter chain allows specifying the compression settings in
       detail instead of relying on the settings associated with the preset
       levels. When a custom filter chain is specified, the compression
       preset level options (-0 ... -9 and --extreme) are silently ignored.

       A filter chain is comparable to piping on the command line. When
       compressing, the uncompressed input goes to the first filter, whose
       output goes to the next filter (if any). The output of the last filter
       gets written to the compressed file. The maximum number of filters in
       the chain is four, but typically a filter chain has only one or two
       filters.

       Many filters have limitations on where they can be in the filter
       chain: some filters can work only as the last filter in the chain,
       some only as a non-last filter, and some work in any position in the
       chain. Depending on the filter, this limitation is either inherent to
       the filter design or exists to prevent security issues.

       A custom filter chain is specified by using one or more filter options
       in the order they are wanted in the filter chain. That is, the order
       of filter options is significant! When decoding raw streams
       (--format=raw), the filter chain is specified in the same order as it
       was specified when compressing.

       Filters take filter-specific options as a comma-separated list. Extra
       commas in options are ignored. Every option has a default value, so
       you need to specify only those you want to change.
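The ordering rule can be tried out without the xz command itself. The sketch below uses Python's standard lzma module, which wraps liblzma, rather than the command line tool, so the option names (dict_size, dist) follow the Python API and not the xz syntax. It builds a two-filter chain, writes a raw stream, and decodes it by supplying the same chain in the same order. On the command line, a chain like this would presumably be written as --delta=dist=2 --lzma2=dict=1MiB together with --format=raw.

```python
import lzma

# Filters are listed in the order they are applied when compressing:
# Delta first, LZMA2 last.
chain = [
    {"id": lzma.FILTER_DELTA, "dist": 2},
    {"id": lzma.FILTER_LZMA2, "dict_size": 1 << 20},
]

data = bytes(range(256)) * 64

# A raw stream has no container headers, so the chain is not stored in
# the compressed output.
raw = lzma.compress(data, format=lzma.FORMAT_RAW, filters=chain)

# The decoder therefore has to be told the same chain, in the same order.
restored = lzma.decompress(raw, format=lzma.FORMAT_RAW, filters=chain)
print(restored == data, len(data), len(raw))
```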

       --lzma1[=options]
       --lzma2[=options]
              Add LZMA1 or LZMA2 filter to the filter chain. These filters
              can be used only as the last filter in the chain.

              LZMA1 is a legacy filter, which is supported almost solely due
              to the legacy .lzma file format, which supports only LZMA1.
              LZMA2 is an updated version of LZMA1 to fix some practical
              issues of LZMA1. The .xz format uses LZMA2 and doesn't support
              LZMA1 at all. Compression speed and ratios of LZMA1 and LZMA2
              are practically the same.

              LZMA1 and LZMA2 share the same set of options:

              preset=preset
                     Reset all LZMA1 or LZMA2 options to preset. A preset
                     consists of an integer, which may be followed by
                     single-letter preset modifiers. The integer can be from
                     0 to 9, matching the command line options -0 ... -9. The
                     only supported modifier is currently e, which matches
                     --extreme. The default preset is 6, from which the
                     default values for the rest of the LZMA1 or LZMA2
                     options are taken.

              dict=size
                     Dictionary (history buffer) size indicates how many
                     bytes of the recently processed uncompressed data is
                     kept in memory. The algorithm tries to find repeating
                     byte sequences (matches) in the uncompressed data, and
                     replace them with references to the data currently in
                     the dictionary. The bigger the dictionary, the higher is
                     the chance to find a match. Thus, increasing dictionary
                     size usually improves compression ratio, but a
                     dictionary bigger than the uncompressed file is a waste
                     of memory.

                     Typical dictionary size is from 64 KiB to 64 MiB. The
                     minimum is 4 KiB. The maximum for compression is
                     currently 1.5 GiB (1536 MiB). The decompressor already
                     supports dictionaries up to one byte less than 4 GiB,
                     which is the maximum for the LZMA1 and LZMA2 stream
                     formats.

                     Dictionary size and match finder (mf) together determine
                     the memory usage of the LZMA1 or LZMA2 encoder.
                     The same (or bigger) dictionary size is required for
                     decompressing that was used when compressing, thus the
                     memory usage of the decoder is determined by the
                     dictionary size used when compressing. The .xz headers
                     store the dictionary size either as 2^n or
                     2^n + 2^(n-1), so these sizes are somewhat preferred for
                     compression. Other sizes will get rounded up when stored
                     in the .xz headers.

              lc=lc  Specify the number of literal context bits. The minimum
                     is 0 and the maximum is 4; the default is 3. In
                     addition, the sum of lc and lp must not exceed 4.

                     All bytes that cannot be encoded as matches are encoded
                     as literals. That is, literals are simply 8-bit bytes
                     that are encoded one at a time.

                     The literal coding makes an assumption that the highest
                     lc bits of the previous uncompressed byte correlate with
                     the next byte. E.g. in typical English text, an
                     upper-case letter is often followed by a lower-case
                     letter, and a lower-case letter is usually followed by
                     another lower-case letter. In the US-ASCII character
                     set, the highest three bits are 010 for upper-case
                     letters and 011 for lower-case letters. When lc is at
                     least 3, the literal coding can take advantage of this
                     property in the uncompressed data.

                     The default value (3) is usually good. If you want
                     maximum compression, test lc=4. Sometimes it helps a
                     little, and sometimes it makes compression worse. If it
                     makes it worse, test e.g. lc=2 too.

              lp=lp  Specify the number of literal position bits. The minimum
                     is 0 and the maximum is 4; the default is 0.

                     Lp affects what kind of alignment in the uncompressed
                     data is assumed when encoding literals. See pb below for
                     more information about alignment.

              pb=pb  Specify the number of position bits. The minimum is 0
                     and the maximum is 4; the default is 2.

                     Pb affects what kind of alignment in the uncompressed
                     data is assumed in general. The default means four-byte
                     alignment (2^pb=2^2=4), which is often a good choice
                     when there's no better guess.

                     When the alignment is known, setting pb accordingly may
                     reduce the file size a little. E.g. with text files
                     having one-byte alignment (US-ASCII, ISO-8859-*, UTF-8),
                     setting pb=0 can improve compression slightly. For
                     UTF-16 text, pb=1 is a good choice. If the alignment is
                     an odd number like 3 bytes, pb=0 might be the best
                     choice.

                     Even though the assumed alignment can be adjusted with
                     pb and lp, LZMA1 and LZMA2 still slightly favor 16-byte
                     alignment. It might be worth taking into account when
                     designing file formats that are likely to be often
                     compressed with LZMA1 or LZMA2.

              mf=mf  Match finder has a major effect on encoder speed, memory
                     usage, and compression ratio. Usually Hash Chain match
                     finders are faster than Binary Tree match finders. The
                     default depends on the preset: 0 uses hc3, 1-3 use hc4,
                     and the rest use bt4.

                     The following match finders are supported. The memory
                     usage formulas below are rough approximations, which are
                     closest to the reality when dict is a power of two.

                     hc3    Hash Chain with 2- and 3-byte hashing
                            Minimum value for nice: 3
                            Memory usage:
                            dict * 7.5 (if dict <= 16 MiB);
                            dict * 5.5 + 64 MiB (if dict > 16 MiB)

                     hc4    Hash Chain with 2-, 3-, and 4-byte hashing
                            Minimum value for nice: 4
                            Memory usage:
                            dict * 7.5 (if dict <= 32 MiB);
                            dict * 6.5 (if dict > 32 MiB)

                     bt2    Binary Tree with 2-byte hashing
                            Minimum value for nice: 2
                            Memory usage: dict * 9.5

                     bt3    Binary Tree with 2- and 3-byte hashing
                            Minimum value for nice: 3
                            Memory usage:
                            dict * 11.5 (if dict <= 16 MiB);
                            dict * 9.5 + 64 MiB (if dict > 16 MiB)

                     bt4    Binary Tree with 2-, 3-, and 4-byte hashing
                            Minimum value for nice: 4
                            Memory usage:
                            dict * 11.5 (if dict <= 32 MiB);
                            dict * 10.5 (if dict > 32 MiB)

              mode=mode
                     Compression mode specifies the method to analyze the
                     data produced by the match finder. Supported modes are
                     fast and normal. The default is fast for presets 0-3 and
                     normal for presets 4-9.

                     Usually fast is used with Hash Chain match finders and
                     normal with Binary Tree match finders. This is also what
                     the presets do.

              nice=nice
                     Specify what is considered to be a nice length for a
                     match. Once a match of at least nice bytes is found, the
                     algorithm stops looking for possibly better matches.

                     Nice can be 2-273 bytes. Higher values tend to give
                     better compression ratio at the expense of speed. The
                     default depends on the preset.

              depth=depth
                     Specify the maximum search depth in the match finder.
                     The default is the special value of 0, which makes the
                     compressor determine a reasonable depth from mf and
                     nice.

                     Reasonable depth for Hash Chains is 4-100 and 16-1000
                     for Binary Trees. Using very high values for depth can
                     make the encoder extremely slow with some files.
                     Avoid setting the depth over 1000 unless you are
                     prepared to interrupt the compression in case it is
                     taking far too long.

              When decoding raw streams (--format=raw), LZMA2 needs only the
              dictionary size. LZMA1 needs also lc, lp, and pb.

       --x86[=options]
       --powerpc[=options]
       --ia64[=options]
       --arm[=options]
       --armthumb[=options]
       --sparc[=options]
              Add a branch/call/jump (BCJ) filter to the filter chain. These
              filters can be used only as a non-last filter in the filter
              chain.

              A BCJ filter converts relative addresses in the machine code to
              their absolute counterparts. This doesn't change the size of
              the data, but it increases redundancy, which can help LZMA2 to
              produce a 0-15 % smaller .xz file. The BCJ filters are always
              reversible, so using a BCJ filter for the wrong type of data
              doesn't cause any data loss, although it may make the
              compression ratio slightly worse.

              It is fine to apply a BCJ filter on a whole executable; there's
              no need to apply it only on the executable section. Applying a
              BCJ filter on an archive that contains both executable and
              non-executable files may or may not give good results, so it
              generally isn't good to blindly apply a BCJ filter when
              compressing binary packages for distribution.

              These BCJ filters are very fast and use an insignificant amount
              of memory. If a BCJ filter improves compression ratio of a
              file, it can improve decompression speed at the same time. This
              is because, on the same hardware, the decompression speed of
              LZMA2 is roughly a fixed number of bytes of compressed data per
              second.

              These BCJ filters have known problems related to the
              compression ratio:

              o  Some types of files containing executable code (e.g. object
                 files, static libraries, and Linux kernel modules) have the
                 addresses in the instructions filled with filler values.
                 These BCJ filters will still do the address conversion,
                 which will make the compression worse with these files.

              o  Applying a BCJ filter on an archive containing multiple
                 similar executables can make the compression ratio worse
                 than not using a BCJ filter.  This is because the BCJ
                 filter doesn't detect the boundaries of the executable
                 files, and doesn't reset the address conversion counter
                 for each executable.

              Both of the above problems will be fixed in the future in a
              new filter.  The old BCJ filters will still be useful in
              embedded systems, because the decoder of the new filter will
              be bigger and use more memory.

              Different instruction sets have different alignment:

                     Filter      Alignment   Notes
                     x86             1       32-bit or 64-bit x86
                     PowerPC         4       Big endian only
                     ARM             4       Little endian only
                     ARM-Thumb       2       Little endian only
                     IA-64          16       Big or little endian
                     SPARC           4       Big or little endian

              Since the BCJ-filtered data is usually compressed with LZMA2,
              the compression ratio may be improved slightly if the LZMA2
              options are set to match the alignment of the selected BCJ
              filter.  For example, with the IA-64 filter, it's good to set
              pb=4 with LZMA2 (2^4=16).  The x86 filter is an exception;
              it's usually good to stick to LZMA2's default four-byte
              alignment when compressing x86 executables.

              All BCJ filters support the same options:

              start=offset
                     Specify the start offset that is used when converting
                     between relative and absolute addresses.  The offset
                     must be a multiple of the alignment of the filter (see
                     the table above).  The default is zero.  In practice,
                     the default is good; specifying a custom offset is
                     almost never useful.

       --delta[=options]
              Add the Delta filter to the filter chain.  The Delta filter
              can be used only as a non-last filter in the filter chain.
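
              As a rough illustration of what a byte-wise delta filter
              does, the sh(1) sketch below subtracts from each input byte
              the byte located dist positions earlier (modulo 256) and
              leaves the first dist bytes unchanged.  The function name
              delta_encode and its hex-token interface are hypothetical and
              exist only for this illustration; xz itself operates on raw
              bytes.

```shell
# Illustration only: delta_encode is a hypothetical helper, not part of xz.
# Usage: delta_encode DIST BYTE...   (each BYTE is two hex digits)
delta_encode() {
    dist=$1
    shift
    n=$#
    i=1
    out=
    while [ "$i" -le "$n" ]; do
        # Fetch the i-th positional parameter (the current byte).
        eval "cur=\${$i}"
        if [ "$i" -gt "$dist" ]; then
            # Byte located dist positions earlier in the input.
            eval "prev=\${$((i - dist))}"
        else
            # No earlier byte yet: treat it as zero.
            prev=00
        fi
        # Difference modulo 256, printed as two uppercase hex digits.
        out="$out$(printf '%02X' "$(( (0x$cur - 0x$prev) & 0xFF ))") "
        i=$((i + 1))
    done
    printf '%s\n' "${out% }"
}

delta_encode 2 A1 B1 A2 B3 A3 B5 A4 B7
```

              With a distance of 2 and the input A1 B1 A2 B3 A3 B5 A4 B7,
              the call prints A1 B1 01 02 01 02 01 02: the long runs of
              repeated differences are what LZMA2 can then compress well.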

              Currently only simple byte-wise delta calculation is
              supported.  It can be useful when compressing e.g.
              uncompressed bitmap images or uncompressed PCM audio.
              However, special purpose algorithms may give significantly
              better results than Delta + LZMA2.  This is true especially
              with audio, which compresses faster and better e.g. with
              flac(1).

              Supported options:

              dist=distance
                     Specify the distance of the delta calculation in
                     bytes.  distance must be 1-256.  The default is 1.

                     For example, with dist=2 and eight-byte input A1 B1 A2
                     B3 A3 B5 A4 B7, the output will be A1 B1 01 02 01 02 01
                     02.

   Other options
       -q, --quiet
              Suppress warnings and notices.  Specify this twice to
              suppress errors too.  This option has no effect on the exit
              status.  That is, even if a warning was suppressed, the exit
              status to indicate a warning is still used.

       -v, --verbose
              Be verbose.  If standard error is connected to a terminal, xz
              will display a progress indicator.  Specifying --verbose
              twice will give even more verbose output.

              The progress indicator shows the following information:

              o  Completion percentage is shown if the size of the input
                 file is known.  That is, the percentage cannot be shown in
                 pipes.

              o  Amount of compressed data produced (compressing) or
                 consumed (decompressing).

              o  Amount of uncompressed data consumed (compressing) or
                 produced (decompressing).

              o  Compression ratio, which is calculated by dividing the
                 amount of compressed data processed so far by the amount
                 of uncompressed data processed so far.

              o  Compression or decompression speed.  This is measured as
                 the amount of uncompressed data consumed (compression) or
                 produced (decompression) per second.  It is shown after a
                 few seconds have passed since xz started processing the
                 file.

              o  Elapsed time in the format M:SS or H:MM:SS.

              o  Estimated remaining time is shown only when the size of
                 the input file is known and a couple of seconds have
                 already passed since xz started processing the file.  The
                 time is shown in a less precise format which never has any
                 colons, e.g. 2 min 30 s.

              When standard error is not a terminal, --verbose will make xz
              print the filename, compressed size, uncompressed size,
              compression ratio, and possibly also the speed and elapsed
              time on a single line to standard error after compressing or
              decompressing the file.  The speed and elapsed time are
              included only when the operation took at least a few seconds.
              If the operation didn't finish, e.g. due to user
              interruption, the completion percentage is also printed if
              the size of the input file is known.

       -Q, --no-warn
              Don't set the exit status to 2 even if a condition worth a
              warning was detected.  This option doesn't affect the
              verbosity level, thus both --quiet and --no-warn have to be
              used to not display warnings and to not alter the exit
              status.

       --robot
              Print messages in a machine-parsable format.  This is
              intended to ease writing frontends that want to use xz
              instead of liblzma, which may be the case with various
              scripts.  The output with this option enabled is meant to be
              stable across xz releases.  See the section ROBOT MODE for
              details.

       --info-memory
              Display, in human-readable format, how much physical memory
              (RAM) xz thinks the system has and the memory usage limits
              for compression and decompression, and exit successfully.

       -h, --help
              Display a help message describing the most commonly used
              options, and exit successfully.

       -H, --long-help
              Display a help message describing all features of xz, and
              exit successfully.

       -V, --version
              Display the version number of xz and liblzma in human
              readable format.
              To get machine-parsable output, specify --robot before
              --version.

ROBOT MODE
       The robot mode is activated with the --robot option.  It makes the
       output of xz easier to parse by other programs.  Currently --robot
       is supported only together with --version, --info-memory, and
       --list.  It will be supported for normal compression and
       decompression in the future.

   Version
       xz --robot --version will print the version number of xz and liblzma
       in the following format:

       XZ_VERSION=XYYYZZZS
       LIBLZMA_VERSION=XYYYZZZS

       X      Major version.

       YYY    Minor version.  Even numbers are stable.  Odd numbers are
              alpha or beta versions.

       ZZZ    Patch level for stable releases or just a counter for
              development releases.

       S      Stability.  0 is alpha, 1 is beta, and 2 is stable.  S should
              always be 2 when YYY is even.

       XYYYZZZS are the same on both lines if xz and liblzma are from the
       same XZ Utils release.

       Examples: 4.999.9beta is 49990091 and 5.0.0 is 50000002.

   Memory limit information
       xz --robot --info-memory prints a single line with three
       tab-separated columns:

       1.  Total amount of physical memory (RAM) in bytes

       2.  Memory usage limit for compression in bytes.  A special value of
           zero indicates the default setting, which for single-threaded
           mode is the same as no limit.

       3.  Memory usage limit for decompression in bytes.  A special value
           of zero indicates the default setting, which for single-threaded
           mode is the same as no limit.

       In the future, the output of xz --robot --info-memory may have more
       columns, but never more than a single line.

   List mode
       xz --robot --list uses tab-separated output.  The first column of
       every line has a string that indicates the type of the information
       found on that line:

       name   This is always the first line when starting to list a file.
              The second column on the line is the filename.

       file   This line contains overall information about the .xz file.
              This line is always printed after the name line.

       stream This line type is used only when --verbose was specified.
              There are as many stream lines as there are streams in the
              .xz file.

       block  This line type is used only when --verbose was specified.
              There are as many block lines as there are blocks in the .xz
              file.  The block lines are shown after all the stream lines;
              different line types are not interleaved.

       summary
              This line type is used only when --verbose was specified
              twice.  This line is printed after all block lines.  Like the
              file line, the summary line contains overall information
              about the .xz file.

       totals This line is always the very last line of the list output.
              It shows the total counts and sizes.

       The columns of the file lines:
              2.  Number of streams in the file
              3.  Total number of blocks in the stream(s)
              4.  Compressed size of the file
              5.  Uncompressed size of the file
              6.  Compression ratio, for example 0.123.  If ratio is over
                  9.999, three dashes (---) are displayed instead of the
                  ratio.
              7.  Comma-separated list of integrity check names.  The
                  following strings are used for the known check types:
                  None, CRC32, CRC64, and SHA-256.  For unknown check
                  types, Unknown-N is used, where N is the Check ID as a
                  decimal number (one or two digits).
              8.  Total size of stream padding in the file

       The columns of the stream lines:
              2.  Stream number (the first stream is 1)
              3.  Number of blocks in the stream
              4.  Compressed start offset
              5.  Uncompressed start offset
              6.  Compressed size (does not include stream padding)
              7.  Uncompressed size
              8.  Compression ratio
              9.  Name of the integrity check
              10. Size of stream padding

       The columns of the block lines:
              2.
                  Number of the stream containing this block
              3.  Block number relative to the beginning of the stream (the
                  first block is 1)
              4.  Block number relative to the beginning of the file
              5.  Compressed start offset relative to the beginning of the
                  file
              6.  Uncompressed start offset relative to the beginning of
                  the file
              7.  Total compressed size of the block (includes headers)
              8.  Uncompressed size
              9.  Compression ratio
              10. Name of the integrity check

       If --verbose was specified twice, additional columns are included on
       the block lines.  These are not displayed with a single --verbose,
       because getting this information requires many seeks and can thus be
       slow:
              11. Value of the integrity check in hexadecimal
              12. Block header size
              13. Block flags: c indicates that compressed size is present,
                  and u indicates that uncompressed size is present.  If
                  the flag is not set, a dash (-) is shown instead to keep
                  the string length fixed.  New flags may be added to the
                  end of the string in the future.
              14. Size of the actual compressed data in the block (this
                  excludes the block header, block padding, and check
                  fields)
              15. Amount of memory (in bytes) required to decompress this
                  block with this xz version
              16. Filter chain.  Note that most of the options used at
                  compression time cannot be known, because only the
                  options that are needed for decompression are stored in
                  the .xz headers.

       The columns of the totals line:
              2.  Number of streams
              3.  Number of blocks
              4.  Compressed size
              5.  Uncompressed size
              6.  Average compression ratio
              7.  Comma-separated list of integrity check names that were
                  present in the files
              8.  Stream padding size
              9.  Number of files.  This is here to keep the order of the
                  earlier columns the same as on file lines.
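
       The totals columns listed above can be consumed with awk(1).  The
       following is a minimal sketch, not an official interface: the
       function name summarize_totals and its output format are
       hypothetical, and it relies only on the column positions documented
       above (column 1 is the line type, 4 and 5 are the sizes, 9 is the
       number of files).

```shell
# Sketch: summarize the "totals" line of `xz --robot --list` output that
# is read from standard input. summarize_totals is a hypothetical name.
summarize_totals() {
    awk -F '\t' '
        $1 == "totals" {
            # Column 5 is the uncompressed size, column 4 the compressed size.
            saved = $5 - $4
            printf "files=%d streams=%d blocks=%d saved=%d\n", $9, $2, $3, saved
        }'
}
```

       A typical call would be xz --robot --list *.xz | summarize_totals.
       Splitting explicitly on tabs keeps the column numbers aligned with
       the list above.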

       If --verbose was specified twice, additional columns are included on
       the totals line:
              10. Maximum amount of memory (in bytes) required to
                  decompress the files with this xz version
              11. yes or no indicating if all block headers have both
                  compressed size and uncompressed size stored in them

       Future versions may add new line types and new columns can be added
       to the existing line types, but the existing columns won't be
       changed.

EXIT STATUS
       0      All is good.

       1      An error occurred.

       2      Something worth a warning occurred, but no actual errors
              occurred.

       Notices (not warnings or errors) printed on standard error don't
       affect the exit status.

ENVIRONMENT
       xz parses space-separated lists of options from the environment
       variables XZ_DEFAULTS and XZ_OPT, in this order, before parsing the
       options from the command line.  Note that only options are parsed
       from the environment variables; all non-options are silently
       ignored.  Parsing is done with getopt_long(3) which is used also for
       the command line arguments.

       XZ_DEFAULTS
              User-specific or system-wide default options.  Typically this
              is set in a shell initialization script to enable xz's memory
              usage limiter by default.  Excluding shell initialization
              scripts and similar special cases, scripts must never set or
              unset XZ_DEFAULTS.

       XZ_OPT This is for passing options to xz when it is not possible to
              set the options directly on the xz command line.  This is the
              case e.g. when xz is run by a script or tool, e.g. GNU tar(1):

                     XZ_OPT=-2v tar caf foo.tar.xz foo

              Scripts may use XZ_OPT e.g. to set script-specific default
              compression options.  It is still recommended to allow users
              to override XZ_OPT if that is reasonable, e.g.
              in sh(1) scripts one may use something like this:

                     XZ_OPT=${XZ_OPT-"-7e"}
                     export XZ_OPT

LZMA UTILS COMPATIBILITY
       The command line syntax of xz is practically a superset of lzma,
       unlzma, and lzcat as found from LZMA Utils 4.32.x.  In most cases,
       it is possible to replace LZMA Utils with XZ Utils without breaking
       existing scripts.  There are some incompatibilities though, which
       may sometimes cause problems.

   Compression preset levels
       The numbering of the compression level presets is not identical in
       xz and LZMA Utils.  The most important difference is how dictionary
       sizes are mapped to different presets.  Dictionary size is roughly
       equal to the decompressor memory usage.

              Level      xz       LZMA Utils
               -0      256 KiB       N/A
               -1        1 MiB      64 KiB
               -2        2 MiB       1 MiB
               -3        4 MiB     512 KiB
               -4        4 MiB       1 MiB
               -5        8 MiB       2 MiB
               -6        8 MiB       4 MiB
               -7       16 MiB       8 MiB
               -8       32 MiB      16 MiB
               -9       64 MiB      32 MiB

       The dictionary size differences affect the compressor memory usage
       too, but there are some other differences between LZMA Utils and XZ
       Utils, which make the difference even bigger:

              Level      xz       LZMA Utils 4.32.x
               -0        3 MiB          N/A
               -1        9 MiB          2 MiB
               -2       17 MiB         12 MiB
               -3       32 MiB         12 MiB
               -4       48 MiB         16 MiB
               -5       94 MiB         26 MiB
               -6       94 MiB         45 MiB
               -7      186 MiB         83 MiB
               -8      370 MiB        159 MiB
               -9      674 MiB        311 MiB

       The default preset level in LZMA Utils is -7 while in XZ Utils it is
       -6, so both use an 8 MiB dictionary by default.

   Streamed vs. non-streamed .lzma files
       The uncompressed size of the file can be stored in the .lzma header.
       LZMA Utils does that when compressing regular files.  The
       alternative is to mark that uncompressed size is unknown and use
       end-of-payload marker to indicate where the decompressor should
       stop.
       LZMA Utils uses this method when uncompressed size isn't known,
       which is the case for example in pipes.

       xz supports decompressing .lzma files with or without end-of-payload
       marker, but all .lzma files created by xz will use end-of-payload
       marker and have uncompressed size marked as unknown in the .lzma
       header.  This may be a problem in some uncommon situations.  For
       example, a .lzma decompressor in an embedded device might work only
       with files that have known uncompressed size.  If you hit this
       problem, you need to use LZMA Utils or LZMA SDK to create .lzma
       files with known uncompressed size.

   Unsupported .lzma files
       The .lzma format allows lc values up to 8, and lp values up to 4.
       LZMA Utils can decompress files with any lc and lp, but always
       creates files with lc=3 and lp=0.  Creating files with other lc and
       lp is possible with xz and with LZMA SDK.

       The implementation of the LZMA1 filter in liblzma requires that the
       sum of lc and lp must not exceed 4.  Thus, .lzma files, which exceed
       this limitation, cannot be decompressed with xz.

       LZMA Utils creates only .lzma files which have a dictionary size of
       2^n (a power of 2) but accepts files with any dictionary size.
       liblzma accepts only .lzma files which have a dictionary size of 2^n
       or 2^n + 2^(n-1).  This is to decrease false positives when
       detecting .lzma files.

       These limitations shouldn't be a problem in practice, since
       practically all .lzma files have been compressed with settings that
       liblzma will accept.

   Trailing garbage
       When decompressing, LZMA Utils silently ignore everything after the
       first .lzma stream.  In most situations, this is a bug.  This also
       means that LZMA Utils don't support decompressing concatenated .lzma
       files.

       If there is data left after the first .lzma stream, xz considers the
       file to be corrupt.
       This may break obscure scripts which have assumed that trailing
       garbage is ignored.

NOTES
   Compressed output may vary
       The exact compressed output produced from the same uncompressed
       input file may vary between XZ Utils versions even if compression
       options are identical.  This is because the encoder can be improved
       (faster or better compression) without affecting the file format.
       The output can vary even between different builds of the same XZ
       Utils version, if different build options are used.

       The above means that implementing --rsyncable to create rsyncable
       .xz files is not going to happen without freezing a part of the
       encoder implementation, which can then be used with --rsyncable.

   Embedded .xz decompressors
       Embedded .xz decompressor implementations like XZ Embedded don't
       necessarily support files created with integrity check types other
       than none and crc32.  Since the default is --check=crc64, you must
       use --check=none or --check=crc32 when creating files for embedded
       systems.

       Outside embedded systems, all .xz format decompressors support all
       the check types, or at least are able to decompress the file without
       verifying the integrity check if the particular check is not
       supported.

       XZ Embedded supports BCJ filters, but only with the default start
       offset.

EXAMPLES
   Basics
       Compress the file foo into foo.xz using the default compression
       level (-6), and remove foo if compression is successful:

              xz foo

       Decompress bar.xz into bar and don't remove bar.xz even if
       decompression is successful:

              xz -dk bar.xz

       Create baz.tar.xz with the preset -4e (-4 --extreme), which is
       slower than e.g.
       the default -6, but needs less memory for compression and
       decompression (48 MiB and 5 MiB, respectively):

              tar cf - baz | xz -4e > baz.tar.xz

       A mix of compressed and uncompressed files can be decompressed to
       standard output with a single command:

              xz -dcf a.txt b.txt.xz c.txt d.txt.lzma > abcd.txt

   Parallel compression of many files
       On GNU and *BSD, find(1) and xargs(1) can be used to parallelize
       compression of many files:

              find . -type f \! -name '*.xz' -print0 \
                  | xargs -0r -P4 -n16 xz -T1

       The -P option to xargs(1) sets the number of parallel xz processes.
       The best value for the -n option depends on how many files there are
       to be compressed.  If there are only a couple of files, the value
       should probably be 1; with tens of thousands of files, 100 or even
       more may be appropriate to reduce the number of xz processes that
       xargs(1) will eventually create.

       The option -T1 for xz is there to force it to single-threaded mode,
       because xargs(1) is used to control the amount of parallelization.

   Robot mode
       Calculate how many bytes have been saved in total after compressing
       multiple files:

              xz --robot --list *.xz | awk '/^totals/{print $5-$4}'

       A script may want to know that it is using a new enough xz.  The
       following sh(1) script checks that the version number of the xz tool
       is at least 5.0.0.  This method is compatible with old beta
       versions, which didn't support the --robot option:

              if ! eval "$(xz --robot --version 2> /dev/null)" ||
                      [ "$XZ_VERSION" -lt 50000002 ]; then
                  echo "Your xz is too old."
              fi
              unset XZ_VERSION LIBLZMA_VERSION

       Set a memory usage limit for decompression using XZ_OPT, but if a
       limit has already been set, don't increase it:

              NEWLIM=$((123 << 20))  # 123 MiB
              OLDLIM=$(xz --robot --info-memory | cut -f3)
              if [ "$OLDLIM" -eq 0 ] || [ "$OLDLIM" -gt "$NEWLIM" ]; then
                  XZ_OPT="$XZ_OPT --memlimit-decompress=$NEWLIM"
                  export XZ_OPT
              fi

   Custom compressor filter chains
       The simplest use for custom filter chains is customizing an LZMA2
       preset.  This can be useful, because the presets cover only a subset
       of the potentially useful combinations of compression settings.

       The CompCPU columns of the tables from the descriptions of the
       options -0 ... -9 and --extreme are useful when customizing LZMA2
       presets.  Here are the relevant parts collected from those two
       tables:

              Preset   CompCPU
               -0         0
               -1         1
               -2         2
               -3         3
               -4         4
               -5         5
               -6         6
               -5e        7
               -6e        8

       If you know that a file requires a somewhat big dictionary (e.g. 32
       MiB) to compress well, but you want to compress it quicker than xz
       -8 would do, a preset with a low CompCPU value (e.g. 1) can be
       modified to use a bigger dictionary:

              xz --lzma2=preset=1,dict=32MiB foo.tar

       With certain files, the above command may be faster than xz -6 while
       compressing significantly better.  However, it must be emphasized
       that only some files benefit from a big dictionary while keeping the
       CompCPU value low.  The most obvious situation, where a big
       dictionary can help a lot, is an archive containing very similar
       files of at least a few megabytes each.  The dictionary size has to
       be significantly bigger than any individual file to allow LZMA2 to
       take full advantage of the similarities between consecutive files.
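
       One way to turn the rule of thumb above into a starting value is
       sketched below.  It is only a heuristic under stated assumptions:
       the helper name suggest_dict_mib is hypothetical, "significantly
       bigger" is taken to mean at least twice the largest file, the
       result is rounded up to a power of two, and the 1024 MiB cap is an
       arbitrary choice for this sketch.

```shell
# Heuristic sketch: suggest_dict_mib is a hypothetical helper, not part of xz.
# Prints a dictionary size in MiB for the regular files under directory $1.
suggest_dict_mib() {
    # Size in bytes of the largest regular file below the directory.
    largest=$(find "$1" -type f -exec wc -c {} \; |
        awk '$1 > max { max = $1 } END { print max + 0 }')
    # Assumption: "significantly bigger" means twice the largest file.
    need=$((largest * 2))
    mib=1
    # Round up to a power of two, capped at 1024 MiB (arbitrary cap).
    while [ $((mib * 1048576)) -lt "$need" ] && [ "$mib" -lt 1024 ]; do
        mib=$((mib * 2))
    done
    echo "$mib"
}
```

       The result could then be passed to xz, for example with
       xz --lzma2=preset=1,dict="$(suggest_dict_mib foo)MiB" foo.tar.  The
       memory usage reported by -vv is worth checking before settling on a
       value.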

       If very high compressor and decompressor memory usage is fine, and
       the file being compressed is at least several hundred megabytes, it
       may be useful to use an even bigger dictionary than the 64 MiB that
       xz -9 would use:

              xz -vv --lzma2=dict=192MiB big_foo.tar

       Using -vv (--verbose --verbose) like in the above example can be
       useful to see the memory requirements of the compressor and
       decompressor.  Remember that using a dictionary bigger than the size
       of the uncompressed file is a waste of memory, so the above command
       isn't useful for small files.

       Sometimes the compression time doesn't matter, but the decompressor
       memory usage has to be kept low e.g. to make it possible to
       decompress the file on an embedded system.  The following command
       uses -6e (-6 --extreme) as a base and sets the dictionary to only 64
       KiB.  The resulting file can be decompressed with XZ Embedded
       (that's why there is --check=crc32) using about 100 KiB of memory.

              xz --check=crc32 --lzma2=preset=6e,dict=64KiB foo

       If you want to squeeze out as many bytes as possible, adjusting the
       number of literal context bits (lc) and number of position bits (pb)
       can sometimes help.  Adjusting the number of literal position bits
       (lp) might help too, but usually lc and pb are more important.  E.g.
       a source code archive contains mostly US-ASCII text, so something
       like the following might give a slightly (like 0.1 %) smaller file
       than xz -6e (try also without lc=4):

              xz --lzma2=preset=6e,pb=0,lc=4 source_code.tar

       Using another filter together with LZMA2 can improve compression
       with certain file types.  E.g. to compress an x86-32 or x86-64
       shared library using the x86 BCJ filter:

              xz --x86 --lzma2 libfoo.so

       Note that the order of the filter options is significant.
       If --x86 is specified after --lzma2, xz will give an error, because
       there cannot be any filter after LZMA2, and also because the x86 BCJ
       filter cannot be used as the last filter in the chain.

       The Delta filter together with LZMA2 can give good results with
       bitmap images.  It should usually beat PNG, which has a few more
       advanced filters than simple delta but uses Deflate for the actual
       compression.

       The image has to be saved in uncompressed format, e.g. as
       uncompressed TIFF.  The distance parameter of the Delta filter is
       set to match the number of bytes per pixel in the image.  E.g. a
       24-bit RGB bitmap needs dist=3, and it is also good to pass pb=0 to
       LZMA2 to accommodate the three-byte alignment:

              xz --delta=dist=3 --lzma2=pb=0 foo.tiff

       If multiple images have been put into a single archive (e.g. .tar),
       the Delta filter will work on that too as long as all images have
       the same number of bytes per pixel.

SEE ALSO
       xzdec(1), xzdiff(1), xzgrep(1), xzless(1), xzmore(1), gzip(1),
       bzip2(1), 7z(1)

       XZ Utils: <http://tukaani.org/xz/>
       XZ Embedded: <http://tukaani.org/xz/embedded.html>
       LZMA SDK: <http://7-zip.org/sdk.html>

Tukaani                           2010-10-04                          XZ(1)