@c This is part of the paxutils manual.
@c Copyright (C) 2006 Free Software Foundation, Inc.
@c This file is distributed under GFDL 1.1 or any later version
@c published by the Free Software Foundation.

@cindex sparse formats
@cindex sparse versions
The notion of a sparse file, and the ways of handling it from the point
of view of a @GNUTAR{} user, have been described in detail in
@ref{sparse}. This chapter describes the internal format @GNUTAR{}
uses to store such files.

The support for sparse files in @GNUTAR{} has a long history. The
earliest version featuring this support that I was able to find was 1.09,
released in November, 1990. The format introduced back then is called
the @dfn{old GNU} sparse format, and in spite of the fact that its design
contained many flaws, it was the only format @GNUTAR{} supported
until version 1.14 (May, 2004), which introduced initial support for
sparse files in @acronym{PAX} archives (@pxref{posix}). This
format was not free from design flaws either, and it was subsequently
improved in versions 1.15.2 (November, 2005) and 1.15.92 (June,
2006).

In addition to the GNU sparse formats, @GNUTAR{} is able to read and
extract sparse files archived by @command{star}.

The following subsections describe each format in detail.

@menu
* Old GNU Format::
* PAX 0::                PAX Format, Versions 0.0 and 0.1
* PAX 1::                PAX Format, Version 1.0
@end menu

@node Old GNU Format
@appendixsubsec Old GNU Format

@cindex sparse formats, Old GNU
@cindex Old GNU sparse format
This format was introduced some time around 1990 (v. 1.09). It was
designed on top of standard @code{ustar} headers in such an
unfortunate way that some of its fields overwrote fields required by
POSIX.

An old GNU sparse header is designated by type @samp{S}
(@code{GNUTYPE_SPARSE}) and has the following layout:

@multitable @columnfractions 0.10 0.10 0.20 0.20 0.40
@headitem Offset @tab Size @tab Name @tab Data type @tab Contents
@item 0 @tab 345 @tab @tab N/A @tab Not used.
@item 345 @tab 12 @tab atime @tab Number @tab @code{atime} of the file.
@item 357 @tab 12 @tab ctime @tab Number @tab @code{ctime} of the file.
@item 369 @tab 12 @tab offset @tab Number @tab For
multivolume archives: the offset of the start of this volume.
@item 381 @tab 4 @tab @tab N/A @tab Not used.
@item 385 @tab 1 @tab @tab N/A @tab Not used.
@item 386 @tab 96 @tab sp @tab @code{sparse_header} @tab (4 entries) File map.
@item 482 @tab 1 @tab isextended @tab Bool @tab @code{1} if an
extension sparse header follows, @code{0} otherwise.
@item 483 @tab 12 @tab realsize @tab Number @tab Real size of the file.
@end multitable

Each @code{sparse_header} object at offset 386 describes a single
data chunk. It has the following structure:

@multitable @columnfractions 0.10 0.10 0.20 0.60
@headitem Offset @tab Size @tab Data type @tab Contents
@item 0 @tab 12 @tab Number @tab Offset of the
beginning of the chunk.
@item 12 @tab 12 @tab Number @tab Size of the chunk.
@end multitable

If the member contains more than four chunks, the @code{isextended}
field of the header has the value @code{1} and the main header is
followed by one or more @dfn{extension headers}. Each such header has
the following structure:

@multitable @columnfractions 0.10 0.10 0.20 0.20 0.40
@headitem Offset @tab Size @tab Name @tab Data type @tab Contents
@item 0 @tab 504 @tab sp @tab @code{sparse_header} @tab
(21 entries) File map.
@item 504 @tab 1 @tab isextended @tab Bool @tab @code{1} if an
extension sparse header follows, or @code{0} otherwise.
@end multitable

A header with @code{isextended=0} ends the map.
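
The layout described in this subsection can be decoded mechanically.
The following Python sketch is illustrative only and is not taken from
the @GNUTAR{} sources: all function names are hypothetical, and it
assumes that every @samp{Number} field holds octal ASCII digits padded
with spaces or nulls, that unused map slots are empty, and that
@code{isextended} is set when its byte is either binary 1 or the
character @samp{1}.

@example
```python
ENTRY_SIZE = 24  # a 12-byte offset followed by a 12-byte size


def parse_number(field):
    """Decode an octal ASCII field padded with NULs or spaces (assumed encoding)."""
    text = field.strip(b"\x00 ")
    return int(text, 8) if text else None


def parse_entries(block, start, count):
    """Return (offset, size) pairs from `count` map slots, skipping empty slots."""
    entries = []
    for i in range(count):
        pos = start + i * ENTRY_SIZE
        offset = parse_number(block[pos:pos + 12])
        size = parse_number(block[pos + 12:pos + 24])
        if offset is not None and size is not None:
            entries.append((offset, size))
    return entries


def is_extended(flag):
    """Assumption: accept both binary 1 and ASCII '1' as a true flag."""
    return flag in (b"\x01", b"1")


def parse_main_header(block):
    """Parse the sparse fields of a 512-byte old GNU header block."""
    entries = parse_entries(block, 386, 4)    # sp: 4 entries at offset 386
    extended = is_extended(block[482:483])    # isextended at offset 482
    realsize = parse_number(block[483:495])   # realsize at offset 483
    return entries, extended, realsize


def parse_extension_header(block):
    """Parse a 512-byte extension header: 21 entries, then the flag at 504."""
    return parse_entries(block, 0, 21), is_extended(block[504:505])
```
@end example

A reader would call @code{parse_main_header} on the header block of the
member and then, for as long as the returned flag is true, call
@code{parse_extension_header} on each following 512-byte block.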

@node PAX 0
@appendixsubsec PAX Format, Versions 0.0 and 0.1

@cindex sparse formats, v.0.0
There are two formats available in this branch. Version @code{0.0}
is the initial version of the sparse format, used by @command{tar}
versions 1.14--1.15.1. The sparse file map is kept in extended
(@code{x}) PAX header variables:

@table @code
@item GNU.sparse.size
@vrindex GNU.sparse.size, extended header variable
Real size of the stored file.

@item GNU.sparse.numblocks
@vrindex GNU.sparse.numblocks, extended header variable
Number of blocks in the sparse map.

@item GNU.sparse.offset
@vrindex GNU.sparse.offset, extended header variable
Offset of the data block.

@item GNU.sparse.numbytes
@vrindex GNU.sparse.numbytes, extended header variable
Size of the data block.
@end table

The latter two variables repeat for each data block, so the overall
structure is like this:

@smallexample
@group
GNU.sparse.size=@var{size}
GNU.sparse.numblocks=@var{numblocks}
repeat @var{numblocks} times
  GNU.sparse.offset=@var{offset}
  GNU.sparse.numbytes=@var{numbytes}
end repeat
@end group
@end smallexample

This format presented the following two problems:

@enumerate 1
@item
Whereas the POSIX specification allows a variable to appear multiple
times in a header, it requires that only the last occurrence be
meaningful. Thus, multiple occurrences of @code{GNU.sparse.offset} and
@code{GNU.sparse.numbytes} conflict with the POSIX specification.

@item
Attempting to extract such archives using third-party @command{tar}s
results in extraction of sparse files in @emph{condensed form}. If
the @command{tar} implementation in question does not support the POSIX
format, it will also extract a file containing extension header
attributes. This file can be used to expand the file to its original
state.
However, POSIX-aware @command{tar}s will usually ignore the
unknown variables, which makes restoring the file more
difficult. @xref{extracting sparse v.0.x, Extraction of sparse
members in v.0.0 format}, for a detailed description of how to
restore such members using non-GNU @command{tar}s.
@end enumerate

@cindex sparse formats, v.0.1
@GNUTAR{} 1.15.2 introduced sparse format version @code{0.1}, which
attempted to solve these problems. Like its predecessor, this format
stores the sparse map in the extended POSIX header. It retains the
@code{GNU.sparse.size} and @code{GNU.sparse.numblocks} variables, but
instead of @code{GNU.sparse.offset}/@code{GNU.sparse.numbytes} pairs
it uses a single variable:

@table @code
@item GNU.sparse.map
@vrindex GNU.sparse.map, extended header variable
Map of non-null data chunks. It is a string consisting of
comma-separated values
@samp{@var{offset},@var{size}[,@var{offset-1},@var{size-1}...]}.
@end table

To address the second problem, the @code{name} field in the @code{ustar}
header is replaced with a special name, constructed using the following
pattern:

@smallexample
%d/GNUSparseFile.%p/%f
@end smallexample

@vrindex GNU.sparse.name, extended header variable
The real name of the sparse file is stored in the variable
@code{GNU.sparse.name}. Thus, those @command{tar} implementations
that are not aware of GNU extensions will at least extract the files
into separate directories, giving the user the possibility to expand them
afterwards. @xref{extracting sparse v.0.x, Extraction of sparse
members in v.0.1 format}, for a detailed description of how to
restore such members using non-GNU @command{tar}s.

The resulting @code{GNU.sparse.map} string can be @emph{very} long.
Although POSIX does not impose any limit on the length of an @code{x}
header variable, such a long value may confuse some @command{tar}
implementations.
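
To illustrate the version @code{0.1} map, the Python sketch below turns
a @code{GNU.sparse.map} value into a list of chunks. It is not part of
@GNUTAR{}: the function name @code{parse_sparse_map} is hypothetical,
and the sketch assumes that offsets and sizes are written as decimal
numbers. The optional second argument stands for the value of
@code{GNU.sparse.numblocks} and is used only as a consistency check.

@example
```python
def parse_sparse_map(value, numblocks=None):
    """Split a GNU.sparse.map value into a list of (offset, size) pairs."""
    numbers = [int(item) for item in value.split(",")]  # assumed decimal
    if len(numbers) % 2 != 0:
        raise ValueError("map must consist of offset,size pairs")
    pairs = list(zip(numbers[0::2], numbers[1::2]))
    if numblocks is not None and len(pairs) != numblocks:
        raise ValueError("map length disagrees with GNU.sparse.numblocks")
    return pairs
```
@end example

For example, the value @samp{0,512,4096,100} describes two chunks: 512
bytes at offset 0 and 100 bytes at offset 4096.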

@node PAX 1
@appendixsubsec PAX Format, Version 1.0

@cindex sparse formats, v.1.0
Version @code{1.0} of the sparse format was introduced with @GNUTAR{}
1.15.92. Its main objective was to make the resulting file
extractable with little effort even by non-POSIX-aware @command{tar}
implementations. Starting from this version, the extended header
preceding a sparse member always contains the following variables that
identify the format being used:

@table @code
@item GNU.sparse.major
@vrindex GNU.sparse.major, extended header variable
Major version.

@item GNU.sparse.minor
@vrindex GNU.sparse.minor, extended header variable
Minor version.
@end table

The @code{name} field in the @code{ustar} header contains a special name,
constructed using the following pattern:

@smallexample
%d/GNUSparseFile.%p/%f
@end smallexample

@vrindex GNU.sparse.name, extended header variable, in v.1.0
@vrindex GNU.sparse.realsize, extended header variable
The real name of the sparse file is stored in the variable
@code{GNU.sparse.name}. The real size of the file is stored in the
variable @code{GNU.sparse.realsize}.

The sparse map itself is stored in the file data block, preceding the actual
file data. It consists of a series of decimal numbers of arbitrary length,
delimited by newlines. The map is padded with nulls to the nearest block
boundary.

The first number gives the number of entries in the map. Following are map
entries, each one consisting of two numbers giving the offset and size of the
data block it describes.

The format is designed in such a way that non-POSIX-aware @command{tar}s and
@command{tar}s not supporting @code{GNU.sparse.*} keywords will extract each
sparse file in its condensed form with the file map prepended and will place it
into a separate directory.
Then, using a simple program, it would be
possible to expand the file to its original form even without @GNUTAR{}.
@xref{Sparse Recovery}, for detailed information on how to extract
sparse members without @GNUTAR{}.

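
As a rough illustration of such a program, the Python sketch below
splits the condensed data of a version @code{1.0} member into its map
and payload and then rebuilds the original contents in memory. It is
not the recovery tool referred to above. The names @code{split_map} and
@code{expand} are hypothetical, the numeric base is a parameter of the
sketch (decimal by default), a block size of 512 bytes is assumed, and
the data chunks are assumed to follow the padded map back to back in
map order. The real size would come from @code{GNU.sparse.realsize}.

@example
```python
BLOCK = 512  # assumed tar block size


def split_map(data, base=10):
    """Return (entries, payload) from the data area of a v.1.0 sparse member."""
    pos = 0

    def next_number():
        nonlocal pos
        end = data.index(b"\n", pos)
        value = int(data[pos:end], base)
        pos = end + 1
        return value

    count = next_number()
    entries = []
    for _ in range(count):
        offset = next_number()
        size = next_number()
        entries.append((offset, size))
    # The map is NUL-padded up to the next block boundary.
    payload_start = -(-pos // BLOCK) * BLOCK
    return entries, data[payload_start:]


def expand(entries, payload, realsize):
    """Rebuild the file contents in memory; holes are filled with zero bytes."""
    out = bytearray(realsize)
    src = 0
    for offset, size in entries:
        chunk = payload[src:src + size]
        if len(chunk) != size:
            raise ValueError("payload is shorter than the map requires")
        out[offset:offset + size] = chunk
        src += size
    return bytes(out)
```
@end example

Because @code{expand} materializes the whole file in memory, it is
suitable only for small files. A practical tool would instead seek in
the output file and write each chunk in place.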