# Welcome to libarchive!

The libarchive project develops a portable, efficient C library that
can read and write streaming archives in a variety of formats.  It
also includes implementations of the common `tar`, `cpio`, and `zcat`
command-line tools that use the libarchive library.

## Questions?  Issues?

* http://www.libarchive.org is the home for ongoing
  libarchive development, including documentation,
  and links to the libarchive mailing lists.
* To report an issue, use the issue tracker at
  https://github.com/libarchive/libarchive/issues
* To submit an enhancement to libarchive, please
  submit a pull request via GitHub: https://github.com/libarchive/libarchive/pulls

## Contents of the Distribution

This distribution bundle includes the following major components:

* **libarchive**: a library for reading and writing streaming archives
* **tar**: the 'bsdtar' program is a full-featured 'tar' implementation built on libarchive
* **cpio**: the 'bsdcpio' program is a different interface to essentially the same functionality
* **cat**: the 'bsdcat' program is a simple replacement tool for zcat, bzcat, xzcat, and such
* **examples**: Some small example programs that you may find useful.
* **examples/minitar**: a compact sample demonstrating use of libarchive.
* **contrib**:  Various items sent to me by third parties; please contact the authors with any questions.

The top-level directory contains the following information files:

* **NEWS** - highlights of recent changes
* **COPYING** - what you can do with this
* **INSTALL** - installation instructions
* **README** - this file
* **CMakeLists.txt** - input for "cmake" build tool, see INSTALL
* **configure** - configuration script, see INSTALL for details.  If your copy of the source lacks a `configure` script, you can try to construct it by running the script in `build/autogen.sh` (or use `cmake`).

The following files in the top-level directory are used by the 'configure' script:

* `Makefile.am`, `aclocal.m4`, `configure.ac` - used to build this distribution, only needed by maintainers
* `Makefile.in`, `config.h.in` - templates used by configure script

## Documentation

In addition to the informational articles and documentation
in the online [libarchive Wiki](https://github.com/libarchive/libarchive/wiki),
the distribution also includes a number of manual pages:

 * bsdtar.1 explains the use of the bsdtar program
 * bsdcpio.1 explains the use of the bsdcpio program
 * bsdcat.1 explains the use of the bsdcat program
 * libarchive.3 gives an overview of the library as a whole
 * archive_read.3, archive_write.3, archive_write_disk.3, and
   archive_read_disk.3 provide detailed calling sequences for the read
   and write APIs
 * archive_entry.3 details the "struct archive_entry" utility class
 * archive_internals.3 provides some insight into libarchive's
   internal structure and operation.
 * libarchive-formats.5 documents the file formats supported by the library
 * cpio.5, mtree.5, and tar.5 provide detailed information about these
   popular archive formats, including hard-to-find details about
   modern cpio and tar variants.

The manual pages above are provided in the 'doc' directory in
a number of different formats.

You should also read the copious comments in `archive.h` and the
source code for the sample programs for more details.  Please let us
know about any errors or omissions you find.

## Supported Formats

Currently, the library automatically detects and reads the following formats:
  * Old V7 tar archives
  * POSIX ustar
  * GNU tar format (including GNU long filenames, long link names, and sparse files)
  * Solaris 9 extended tar format (including ACLs)
  * POSIX pax interchange format
  * POSIX octet-oriented cpio
  * SVR4 ASCII cpio
  * Binary cpio (big-endian or little-endian)
  * PWB binary cpio
  * ISO9660 CD-ROM images (with optional Rockridge or Joliet extensions)
  * ZIP archives (with uncompressed or "deflate" compressed entries, including support for encrypted Zip archives)
  * ZIPX archives (with support for bzip2, ppmd8, lzma and xz compressed entries)
  * GNU and BSD 'ar' archives
  * 'mtree' format
  * 7-Zip archives
  * Microsoft CAB format
  * LHA and LZH archives
  * RAR and RAR 5.0 archives (with some limitations due to RAR's proprietary status)
  * XAR archives

The library also detects and handles any of the following before evaluating the archive:
  * uuencoded files
  * files with RPM wrapper
  * gzip compression
  * bzip2 compression
  * compress/LZW compression
  * lzma, lzip, and xz compression
  * lz4 compression
  * lzop compression
  * zstandard compression

The library can create archives in any of the following formats:
  * POSIX ustar
  * POSIX pax interchange format
  * "restricted" pax format, which will create ustar archives except for
    entries that require pax extensions (for long filenames, ACLs, etc.).
  * Old GNU tar format
  * Old V7 tar format
  * POSIX octet-oriented cpio
  * SVR4 "newc" cpio
  * Binary cpio (little-endian)
  * PWB binary cpio
  * shar archives
  * ZIP archives (with uncompressed or "deflate" compressed entries)
  * GNU and BSD 'ar' archives
  * 'mtree' format
  * ISO9660 format
  * 7-Zip archives
  * XAR archives

When creating archives, the result can be filtered with any of the following:
  * uuencode
  * gzip compression
  * bzip2 compression
  * compress/LZW compression
  * lzma, lzip, and xz compression
  * lz4 compression
  * lzop compression
  * zstandard compression

## Notes about the Library Design

The following notes address many of the most common
questions we are asked about libarchive:

* This is a heavily stream-oriented system.  That means that
  it is optimized to read or write the archive in a single
  pass from beginning to end.  For example, this allows
  libarchive to process archives too large to store on disk
  by processing them on-the-fly as they are read from or
  written to a network or tape drive.  This also makes
  libarchive useful for tools that need to produce
  archives on-the-fly (such as webservers that provide
  archived contents of a user's account).

* In-place modification and random access to the contents
  of an archive are not directly supported.  For some formats,
  this is not an issue: For example, tar.gz archives are not
  designed for random access.  In some other cases, libarchive
  can re-open an archive and scan it from the beginning quickly
  enough to provide the needed abilities even without true
  random access.  Of course, some applications do require true
  random access; those applications should consider alternatives
  to libarchive.

* The library is designed to be extended with new compression and
  archive formats.  The only requirement is that the format be
  readable or writable as a stream and that each archive entry be
  independent.  There are articles on the libarchive Wiki explaining
  how to extend libarchive.

* On read, compression and format are always detected automatically.

* The same API is used for all formats; it should be very
  easy for software using libarchive to transparently handle
  any of libarchive's archiving formats.

* Libarchive's automatic support for decompression can be used
  without archiving by explicitly selecting the "raw" and "empty"
  formats.

* I've attempted to minimize static link pollution.  If you don't
  explicitly invoke a particular feature (such as support for a
  particular compression or format), it won't get pulled into
  statically-linked programs.  In particular, if you don't explicitly
  enable a particular compression or decompression support, you won't
  need to link against the corresponding compression or decompression
  libraries.  This also reduces the size of statically-linked
  binaries in environments where that matters.

* The library is generally _thread-safe_, depending on the platform:
  it does not define any global variables of its own.  However, some
  platforms do not provide fully thread-safe versions of key C library
  functions.  On those platforms, libarchive will use the non-thread-safe
  functions.  Patches to improve this are of great interest to us.

* In particular, libarchive's modules to read or write a directory
  tree do use `chdir()` to optimize the directory traversals.  This
  can cause problems for programs that expect to do disk access from
  multiple threads.  Of course, those modules are completely
  optional and you can use the rest of libarchive without them.

* The library is _not_ thread-aware, however.  It does no locking
  or thread management of any kind.  If you create a libarchive
  object and need to access it from multiple threads, you will
  need to provide your own locking.

* On read, the library accepts whatever blocks you hand it.
  Your read callback is free to pass the library a byte at a time
  or mmap the entire archive and give it to the library at once.
  On write, the library always produces correctly-blocked output.

* The object-style approach allows you to have multiple archive streams
  open at once.  bsdtar uses this in its "@archive" extension.

* The archive itself is read/written using callback functions.
  You can read an archive directly from an in-memory buffer or
  write it to a socket, if you wish.  There are some utility
  functions to provide easy-to-use "open file," etc., capabilities.

* The read/write APIs are designed to allow individual entries
  to be read or written to any data source:  You can create
  a block of data in memory and add it to a tar archive without
  first writing a temporary file.  You can also read an entry from
  an archive and write the data directly to a socket.  If you want
  to read/write entries to disk, there are convenience functions to
  make this especially easy.

* Note: The "pax interchange format" is a POSIX standard extended tar
  format that should be used when the older _ustar_ format is not
  appropriate.  It has many advantages over other tar formats
  (including the legacy GNU tar format) and is widely supported by
  current tar implementations.