# Welcome to libarchive!

The libarchive project develops a portable, efficient C library that
can read and write streaming archives in a variety of formats.  It
also includes implementations of the common `tar`, `cpio`, and `zcat`
command-line tools that use the libarchive library.

## Questions?  Issues?

* http://www.libarchive.org is the home for ongoing
  libarchive development, including documentation,
  and links to the libarchive mailing lists.
* To report an issue, use the issue tracker at
  https://github.com/libarchive/libarchive/issues
* To submit an enhancement to libarchive, please
  submit a pull request via GitHub: https://github.com/libarchive/libarchive/pulls

## Contents of the Distribution

This distribution bundle includes the following major components:

* **libarchive**: a library for reading and writing streaming archives
* **tar**: the 'bsdtar' program is a full-featured 'tar' implementation built on libarchive
* **cpio**: the 'bsdcpio' program is a different interface to essentially the same functionality
* **cat**: the 'bsdcat' program is a simple replacement tool for zcat, bzcat, xzcat, and such
* **examples**: some small example programs that you may find useful
* **examples/minitar**: a compact sample demonstrating use of libarchive
* **contrib**: various items sent to me by third parties; please contact the authors with any questions

The top-level directory contains the following information files:

* **NEWS** - highlights of recent changes
* **COPYING** - what you can do with this
* **INSTALL** - installation instructions
* **README.md** - this file
* **CMakeLists.txt** - input for the "cmake" build tool, see INSTALL
* **configure** - configuration script, see INSTALL for details.  If your copy of the source lacks a `configure` script, you can try to construct it by running the script in `build/autogen.sh` (or use `cmake`).

The following files in the top-level directory are used by the 'configure' script:

* `Makefile.am`, `aclocal.m4`, `configure.ac` - used to build this distribution, only needed by maintainers
* `Makefile.in`, `config.h.in` - templates used by the configure script

## Documentation

In addition to the informational articles and documentation
in the online [libarchive Wiki](https://github.com/libarchive/libarchive/wiki),
the distribution also includes a number of manual pages:

 * bsdtar.1 explains the use of the bsdtar program
 * bsdcpio.1 explains the use of the bsdcpio program
 * bsdcat.1 explains the use of the bsdcat program
 * libarchive.3 gives an overview of the library as a whole
 * archive_read.3, archive_write.3, archive_write_disk.3, and
   archive_read_disk.3 provide detailed calling sequences for the read
   and write APIs
 * archive_entry.3 details the "struct archive_entry" utility class
 * archive_internals.3 provides some insight into libarchive's
   internal structure and operation.
 * libarchive-formats.5 documents the file formats supported by the library
 * cpio.5, mtree.5, and tar.5 provide detailed information about these
   popular archive formats, including hard-to-find details about
   modern cpio and tar variants.

The manual pages above are provided in the 'doc' directory in
a number of different formats.

You should also read the copious comments in `archive.h` and the
source code for the sample programs for more details.  Please let us
know about any errors or omissions you find.

## Supported Formats

Currently, the library automatically detects and reads the following formats:
  * Old V7 tar archives
  * POSIX ustar
  * GNU tar format (including GNU long filenames, long link names, and sparse files)
  * Solaris 9 extended tar format (including ACLs)
  * POSIX pax interchange format
  * POSIX octet-oriented cpio
  * SVR4 ASCII cpio
  * Binary cpio (big-endian or little-endian)
  * ISO9660 CD-ROM images (with optional Rock Ridge or Joliet extensions)
  * ZIP archives (with uncompressed or "deflate" compressed entries, including support for encrypted ZIP archives)
  * GNU and BSD 'ar' archives
  * 'mtree' format
  * 7-Zip archives
  * Microsoft CAB format
  * LHA and LZH archives
  * RAR archives (with some limitations due to RAR's proprietary status)
  * XAR archives

The library also detects and handles any of the following before evaluating the archive:
  * uuencoded files
  * files with an RPM wrapper
  * gzip compression
  * bzip2 compression
  * compress/LZW compression
  * lzma, lzip, and xz compression
  * lz4 compression
  * lzop compression

The library can create archives in any of the following formats:
  * POSIX ustar
  * POSIX pax interchange format
  * "restricted" pax format, which will create ustar archives except for
    entries that require pax extensions (for long filenames, ACLs, etc.).
  * Old GNU tar format
  * Old V7 tar format
  * POSIX octet-oriented cpio
  * SVR4 "newc" cpio
  * shar archives
  * ZIP archives (with uncompressed or "deflate" compressed entries)
  * GNU and BSD 'ar' archives
  * 'mtree' format
  * ISO9660 format
  * 7-Zip archives
  * XAR archives

When creating archives, the result can be filtered with any of the following:
  * uuencode
  * gzip compression
  * bzip2 compression
  * compress/LZW compression
  * lzma, lzip, and xz compression
  * lz4 compression
  * lzop compression

## Notes about the Library Design

The following notes address many of the most common
questions we are asked about libarchive:

* This is a heavily stream-oriented system.  That means that
  it is optimized to read or write the archive in a single
  pass from beginning to end.  For example, this allows
  libarchive to process archives too large to store on disk
  by processing them on-the-fly as they are read from or
  written to a network or tape drive.  This also makes
  libarchive useful for tools that need to produce
  archives on-the-fly (such as web servers that provide
  archived contents of a user's account).

* In-place modification and random access to the contents
  of an archive are not directly supported.  For some formats,
  this is not an issue: for example, tar.gz archives are not
  designed for random access.  In some other cases, libarchive
  can re-open an archive and scan it from the beginning quickly
  enough to provide the needed abilities even without true
  random access.  Of course, some applications do require true
  random access; those applications should consider alternatives
  to libarchive.

* The library is designed to be extended with new compression and
  archive formats.  The only requirement is that the format be
  readable or writable as a stream and that each archive entry be
  independent.  There are articles on the libarchive Wiki explaining
  how to extend libarchive.

* On read, compression and format are always detected automatically
  from among the filters and formats you have enabled.

* The same API is used for all formats; in particular, it's very
  easy for software using libarchive to transparently handle
  any of libarchive's archiving formats.

* Libarchive's automatic support for decompression can be used
  without archiving by explicitly selecting the "raw" and "empty"
  formats.

* I've attempted to minimize static link pollution.  If you don't
  explicitly invoke a particular feature (such as support for a
  particular compression or format), it won't get pulled into
  statically-linked programs.  In particular, if you don't explicitly
  enable a particular compression or decompression support, you won't
  need to link against the corresponding compression or decompression
  libraries.  This also reduces the size of statically-linked
  binaries in environments where that matters.

* The library is generally _thread safe_ depending on the platform:
  it does not define any global variables of its own.  However, some
  platforms do not provide fully thread-safe versions of key C library
  functions.  On those platforms, libarchive will use the non-thread-safe
  functions.  Patches to improve this are of great interest to us.

* In particular, libarchive's modules to read or write a directory
  tree do use `chdir()` to optimize the directory traversals.  This
  can cause problems for programs that expect to do disk access from
  multiple threads.  Of course, those modules are completely
  optional and you can use the rest of libarchive without them.

* The library is _not_ thread aware, however.  It does no locking
  or thread management of any kind.  If you create a libarchive
  object and need to access it from multiple threads, you will
  need to provide your own locking.

* On read, the library accepts whatever blocks you hand it.
  Your read callback is free to pass the library a byte at a time
  or mmap the entire archive and give it to the library at once.
  On write, the library always produces correctly-blocked output.

* The object-style approach allows you to have multiple archive streams
  open at once.  bsdtar uses this in its "@archive" extension.

* The archive itself is read/written using callback functions.
  You can read an archive directly from an in-memory buffer or
  write it to a socket, if you wish.  There are some utility
  functions to provide easy-to-use "open file," etc., capabilities.

* The read/write APIs are designed to allow individual entries
  to be read from or written to any data source: you can create
  a block of data in memory and add it to a tar archive without
  first writing a temporary file.  You can also read an entry from
  an archive and write the data directly to a socket.  If you want
  to read/write entries to disk, there are convenience functions to
  make this especially easy.

* Note: The "pax interchange format" is a POSIX standard extended tar
  format that should be used when the older _ustar_ format is not
  appropriate.  It has many advantages over other tar formats
  (including the legacy GNU tar format) and is widely supported by
  current tar implementations.