README.md revision 305420
# Welcome to libarchive!

The libarchive project develops a portable, efficient C library that
can read and write streaming archives in a variety of formats.  It
also includes implementations of the common `tar`, `cpio`, and `zcat`
command-line tools that use the libarchive library.

## Questions?  Issues?

* http://www.libarchive.org is the home for ongoing
  libarchive development, including documentation,
  and links to the libarchive mailing lists.
* To report an issue, use the issue tracker at
  https://github.com/libarchive/libarchive/issues
* To submit an enhancement to libarchive, please
  submit a pull request via GitHub: https://github.com/libarchive/libarchive/pulls

## Contents of the Distribution

This distribution bundle includes the following major components:

* **libarchive**: a library for reading and writing streaming archives
* **tar**: the 'bsdtar' program is a full-featured 'tar' implementation built on libarchive
* **cpio**: the 'bsdcpio' program is a different interface to essentially the same functionality
* **cat**: the 'bsdcat' program is a simple replacement tool for zcat, bzcat, xzcat, and such
* **examples**: Some small example programs that you may find useful.
* **examples/minitar**: a compact sample demonstrating use of libarchive.
* **contrib**: Various items sent to me by third parties; please contact the authors with any questions.

The top-level directory contains the following information files:

* **NEWS** - highlights of recent changes
* **COPYING** - what you can do with this
* **INSTALL** - installation instructions
* **README** - this file
* **CMakeLists.txt** - input for "cmake" build tool, see INSTALL
* **configure** - configuration script, see INSTALL for details.  If your copy of the source lacks a `configure` script, you can try to construct it by running the script in `build/autogen.sh` (or use `cmake`).

The following files in the top-level directory are used by the 'configure' script:
* `Makefile.am`, `aclocal.m4`, `configure.ac` - used to build this distribution, only needed by maintainers
* `Makefile.in`, `config.h.in` - templates used by configure script

## Documentation

In addition to the informational articles and documentation
in the online [libarchive Wiki](https://github.com/libarchive/libarchive/wiki),
the distribution also includes a number of manual pages:

 * bsdtar.1 explains the use of the bsdtar program
 * bsdcpio.1 explains the use of the bsdcpio program
 * bsdcat.1 explains the use of the bsdcat program
 * libarchive.3 gives an overview of the library as a whole
 * archive_read.3, archive_write.3, archive_write_disk.3, and
   archive_read_disk.3 provide detailed calling sequences for the read
   and write APIs
 * archive_entry.3 details the "struct archive_entry" utility class
 * archive_internals.3 provides some insight into libarchive's
   internal structure and operation.
 * libarchive-formats.5 documents the file formats supported by the library
 * cpio.5, mtree.5, and tar.5 provide detailed information about these
   popular archive formats, including hard-to-find details about
   modern cpio and tar variants.

The manual pages above are provided in the 'doc' directory in
a number of different formats.

You should also read the copious comments in `archive.h` and the
source code for the sample programs for more details.  Please let us
know about any errors or omissions you find.

## Supported Formats

Currently, the library automatically detects and reads the following formats:
  * Old V7 tar archives
  * POSIX ustar
  * GNU tar format (including GNU long filenames, long link names, and sparse files)
  * Solaris 9 extended tar format (including ACLs)
  * POSIX pax interchange format
  * POSIX octet-oriented cpio
  * SVR4 ASCII cpio
  * Binary cpio (big-endian or little-endian)
  * ISO9660 CD-ROM images (with optional Rockridge or Joliet extensions)
  * ZIP archives (with uncompressed or "deflate" compressed entries, including support for encrypted Zip archives)
  * GNU and BSD 'ar' archives
  * 'mtree' format
  * 7-Zip archives
  * Microsoft CAB format
  * LHA and LZH archives
  * RAR archives (with some limitations due to RAR's proprietary status)
  * XAR archives

The library also detects and handles any of the following before evaluating the archive:
  * uuencoded files
  * files with RPM wrapper
  * gzip compression
  * bzip2 compression
  * compress/LZW compression
  * lzma, lzip,
    and xz compression
  * lz4 compression
  * lzop compression

The library can create archives in any of the following formats:
  * POSIX ustar
  * POSIX pax interchange format
  * "restricted" pax format, which will create ustar archives except for
    entries that require pax extensions (for long filenames, ACLs, etc).
  * Old GNU tar format
  * Old V7 tar format
  * POSIX octet-oriented cpio
  * SVR4 "newc" cpio
  * shar archives
  * ZIP archives (with uncompressed or "deflate" compressed entries)
  * GNU and BSD 'ar' archives
  * 'mtree' format
  * ISO9660 format
  * 7-Zip archives
  * XAR archives

When creating archives, the result can be filtered with any of the following:
  * uuencode
  * gzip compression
  * bzip2 compression
  * compress/LZW compression
  * lzma, lzip, and xz compression
  * lz4 compression
  * lzop compression

## Notes about the Library Design

The following notes address many of the most common
questions we are asked about libarchive:

* This is a heavily stream-oriented system.  That means that
  it is optimized to read or write the archive in a single
  pass from beginning to end.  For example, this allows
  libarchive to process archives too large to store on disk
  by processing them on-the-fly as they are read from or
  written to a network or tape drive.  This also makes
  libarchive useful for tools that need to produce
  archives on-the-fly (such as webservers that provide
  archived contents of a user's account).

* In-place modification and random access to the contents
  of an archive are not directly supported.  For some formats,
  this is not an issue: For example, tar.gz archives are not
  designed for random access.  In some other cases, libarchive
  can re-open an archive and scan it from the beginning quickly
  enough to provide the needed abilities even without true
  random access.  Of course, some applications do require true
  random access; those applications should consider alternatives
  to libarchive.

* The library is designed to be extended with new compression and
  archive formats.  The only requirement is that the format be
  readable or writable as a stream and that each archive entry be
  independent.  There are articles on the libarchive Wiki explaining
  how to extend libarchive.

* On read, compression and format are always detected automatically.

* The same API is used for all formats; in particular, it's very
  easy for software using libarchive to transparently handle
  any of libarchive's archiving formats.

* Libarchive's automatic support for decompression can be used
  without archiving by explicitly selecting the "raw" and "empty"
  formats.

* I've attempted to minimize static link pollution.  If you don't
  explicitly invoke a particular feature (such as support for a
  particular compression or format), it won't get pulled in to
  statically-linked programs.  In particular, if you don't explicitly
  enable a particular compression or decompression support, you won't
  need to link against the corresponding compression or decompression
  libraries.
  This also reduces the size of statically-linked
  binaries in environments where that matters.

* The library is generally _thread safe_ depending on the platform:
  it does not define any global variables of its own.  However, some
  platforms do not provide fully thread-safe versions of key C library
  functions.  On those platforms, libarchive will use the non-thread-safe
  functions.  Patches to improve this are of great interest to us.

* In particular, libarchive's modules to read or write a directory
  tree do use `chdir()` to optimize the directory traversals.  This
  can cause problems for programs that expect to do disk access from
  multiple threads.  Of course, those modules are completely
  optional and you can use the rest of libarchive without them.

* The library is _not_ thread aware, however.  It does no locking
  or thread management of any kind.  If you create a libarchive
  object and need to access it from multiple threads, you will
  need to provide your own locking.

* On read, the library accepts whatever blocks you hand it.
  Your read callback is free to pass the library a byte at a time
  or mmap the entire archive and give it to the library at once.
  On write, the library always produces correctly-blocked output.

* The object-style approach allows you to have multiple archive streams
  open at once.  bsdtar uses this in its "@archive" extension.

* The archive itself is read/written using callback functions.
  You can read an archive directly from an in-memory buffer or
  write it to a socket, if you wish.  There are some utility
  functions to provide easy-to-use "open file," etc, capabilities.

* The read/write APIs are designed to allow individual entries
  to be read or written to any data source:  You can create
  a block of data in memory and add it to a tar archive without
  first writing a temporary file.  You can also read an entry from
  an archive and write the data directly to a socket.  If you want
  to read/write entries to disk, there are convenience functions to
  make this especially easy.

* Note: The "pax interchange format" is a POSIX standard extended tar
  format that should be used when the older _ustar_ format is not
  appropriate.  It has many advantages over other tar formats
  (including the legacy GNU tar format) and is widely supported by
  current tar implementations.