# Welcome to libarchive!

The libarchive project develops a portable, efficient C library that
can read and write streaming archives in a variety of formats. It
also includes implementations of the common `tar`, `cpio`, and `zcat`
command-line tools that use the libarchive library.

## Questions? Issues?

* http://www.libarchive.org is the home for ongoing
  libarchive development, including documentation,
  and links to the libarchive mailing lists.
* To report an issue, use the issue tracker at
  https://github.com/libarchive/libarchive/issues
* To submit an enhancement to libarchive, please
  submit a pull request via GitHub: https://github.com/libarchive/libarchive/pulls

## Contents of the Distribution

This distribution bundle includes the following major components:

* **libarchive**: a library for reading and writing streaming archives
* **tar**: the 'bsdtar' program is a full-featured 'tar' implementation built on libarchive
* **cpio**: the 'bsdcpio' program is a different interface to essentially the same functionality
* **cat**: the 'bsdcat' program is a simple replacement tool for zcat, bzcat, xzcat, and such
* **examples**: Some small example programs that you may find useful.
* **examples/minitar**: a compact sample demonstrating use of libarchive.
* **contrib**: Various items sent to me by third parties; please contact the authors with any questions.

The top-level directory contains the following information files:

* **NEWS** - highlights of recent changes
* **COPYING** - what you can do with this
* **INSTALL** - installation instructions
* **README** - this file
* **CMakeLists.txt** - input for "cmake" build tool, see INSTALL
* **configure** - configuration script, see INSTALL for details. If your copy of the source lacks a `configure` script, you can try to construct it by running the script in `build/autogen.sh` (or use `cmake`).

The following files in the top-level directory are used by the 'configure' script:
* `Makefile.am`, `aclocal.m4`, `configure.ac` - used to build this distribution, only needed by maintainers
* `Makefile.in`, `config.h.in` - templates used by configure script

## Documentation

In addition to the informational articles and documentation
in the online [libarchive Wiki](https://github.com/libarchive/libarchive/wiki),
the distribution also includes a number of manual pages:

 * bsdtar.1 explains the use of the bsdtar program
 * bsdcpio.1 explains the use of the bsdcpio program
 * bsdcat.1 explains the use of the bsdcat program
 * libarchive.3 gives an overview of the library as a whole
 * archive_read.3, archive_write.3, archive_write_disk.3, and
   archive_read_disk.3 provide detailed calling sequences for the read
   and write APIs
 * archive_entry.3 details the "struct archive_entry" utility class
 * archive_internals.3 provides some insight into libarchive's
   internal structure and operation.
 * libarchive-formats.5 documents the file formats supported by the library
 * cpio.5, mtree.5, and tar.5 provide detailed information about these
   popular archive formats, including hard-to-find details about
   modern cpio and tar variants.

The manual pages above are provided in the 'doc' directory in
a number of different formats.

You should also read the copious comments in `archive.h` and the
source code for the sample programs for more details. Please let us
know about any errors or omissions you find.

## Supported Formats

Currently, the library automatically detects and reads the following formats:
 * Old V7 tar archives
 * POSIX ustar
 * GNU tar format (including GNU long filenames, long link names, and sparse files)
 * Solaris 9 extended tar format (including ACLs)
 * POSIX pax interchange format
 * POSIX octet-oriented cpio
 * SVR4 ASCII cpio
 * Binary cpio (big-endian or little-endian)
 * PWB binary cpio
 * ISO9660 CD-ROM images (with optional Rockridge or Joliet extensions)
 * ZIP archives (with uncompressed or "deflate" compressed entries, including support for encrypted Zip archives)
 * ZIPX archives (with support for bzip2, ppmd8, lzma and xz compressed entries)
 * GNU and BSD 'ar' archives
 * 'mtree' format
 * 7-Zip archives
 * Microsoft CAB format
 * LHA and LZH archives
 * RAR and RAR 5.0 archives (with some limitations due to RAR's proprietary status)
 * XAR archives

The library also detects and handles any of the following before evaluating the archive:
 * uuencoded files
 * files with RPM wrapper
 * gzip compression
 * bzip2 compression
 * compress/LZW compression
 * lzma, lzip, and xz compression
 * lz4 compression
 * lzop compression
 * zstandard compression

The library can create archives in any of the following formats:
 * POSIX ustar
 * POSIX pax interchange format
 * "restricted" pax format, which will create ustar archives except for
   entries that require pax extensions (for long filenames, ACLs, etc).
 * Old GNU tar format
 * Old V7 tar format
 * POSIX octet-oriented cpio
 * SVR4 "newc" cpio
 * Binary cpio (little-endian)
 * PWB binary cpio
 * shar archives
 * ZIP archives (with uncompressed or "deflate" compressed entries)
 * GNU and BSD 'ar' archives
 * 'mtree' format
 * ISO9660 format
 * 7-Zip archives
 * XAR archives

When creating archives, the result can be filtered with any of the following:
 * uuencode
 * gzip compression
 * bzip2 compression
 * compress/LZW compression
 * lzma, lzip, and xz compression
 * lz4 compression
 * lzop compression
 * zstandard compression

## Notes about the Library Design

The following notes address many of the most common
questions we are asked about libarchive:

* This is a heavily stream-oriented system. That means that
  it is optimized to read or write the archive in a single
  pass from beginning to end.
  For example, this allows
  libarchive to process archives too large to store on disk
  by processing them on-the-fly as they are read from or
  written to a network or tape drive. This also makes
  libarchive useful for tools that need to produce
  archives on-the-fly (such as webservers that provide
  archived contents of a user's account).

* In-place modification and random access to the contents
  of an archive are not directly supported. For some formats,
  this is not an issue: For example, tar.gz archives are not
  designed for random access. In some other cases, libarchive
  can re-open an archive and scan it from the beginning quickly
  enough to provide the needed abilities even without true
  random access. Of course, some applications do require true
  random access; those applications should consider alternatives
  to libarchive.

* The library is designed to be extended with new compression and
  archive formats. The only requirement is that the format be
  readable or writable as a stream and that each archive entry be
  independent. There are articles on the libarchive Wiki explaining
  how to extend libarchive.

* On read, compression and format are always detected automatically.

* The same API is used for all formats; it should be very
  easy for software using libarchive to transparently handle
  any of libarchive's archiving formats.

* Libarchive's automatic support for decompression can be used
  without archiving by explicitly selecting the "raw" and "empty"
  formats.

* I've attempted to minimize static link pollution.
  If you don't
  explicitly invoke a particular feature (such as support for a
  particular compression or format), it won't get pulled in to
  statically-linked programs. In particular, if you don't explicitly
  enable a particular compression or decompression support, you won't
  need to link against the corresponding compression or decompression
  libraries. This also reduces the size of statically-linked
  binaries in environments where that matters.

* The library is generally _thread safe_ depending on the platform:
  it does not define any global variables of its own. However, some
  platforms do not provide fully thread-safe versions of key C library
  functions. On those platforms, libarchive will use the non-thread-safe
  functions. Patches to improve this are of great interest to us.

* In particular, libarchive's modules to read or write a directory
  tree do use `chdir()` to optimize the directory traversals. This
  can cause problems for programs that expect to do disk access from
  multiple threads. Of course, those modules are completely
  optional and you can use the rest of libarchive without them.

* The library is _not_ thread aware, however. It does no locking
  or thread management of any kind. If you create a libarchive
  object and need to access it from multiple threads, you will
  need to provide your own locking.

* On read, the library accepts whatever blocks you hand it.
  Your read callback is free to pass the library a byte at a time
  or mmap the entire archive and give it to the library at once.
  On write, the library always produces correctly-blocked output.

* The object-style approach allows you to have multiple archive streams
  open at once. bsdtar uses this in its "@archive" extension.

* The archive itself is read/written using callback functions.
  You can read an archive directly from an in-memory buffer or
  write it to a socket, if you wish. There are some utility
  functions to provide easy-to-use "open file," etc, capabilities.

* The read/write APIs are designed to allow individual entries
  to be read or written to any data source: You can create
  a block of data in memory and add it to a tar archive without
  first writing a temporary file. You can also read an entry from
  an archive and write the data directly to a socket. If you want
  to read/write entries to disk, there are convenience functions to
  make this especially easy.

* Note: The "pax interchange format" is a POSIX standard extended tar
  format that should be used when the older _ustar_ format is not
  appropriate. It has many advantages over other tar formats
  (including the legacy GNU tar format) and is widely supported by
  current tar implementations.