tar(5)                    FreeBSD File Formats Manual                   tar(5)

NAME
     tar -- format of tape archive files

DESCRIPTION
     The tar archive format collects any number of files, directories, and
     other file system objects (symbolic links, device nodes, etc.) into a
     single stream of bytes. The format was originally designed to be used
     with tape drives that operate with fixed-size blocks, but is widely used
     as a general packaging mechanism.

   General Format
     A tar archive consists of a series of 512-byte records. Each file system
     object requires a header record which stores basic metadata (pathname,
     owner, permissions, etc.) and zero or more records containing any file
     data. The end of the archive is indicated by two records consisting
     entirely of zero bytes.

     For compatibility with tape drives that use fixed block sizes, programs
     that read or write tar files always read or write a fixed number of
     records with each I/O operation. These ``blocks'' are always a multiple
     of the record size. The maximum block size supported by early
     implementations was 10240 bytes or 20 records. This is still the default
     for most implementations although block sizes of 1MiB (2048 records) or
     larger are commonly used with modern high-speed tape drives. (Note: the
     terms ``block'' and ``record'' here are not entirely standard; this
     document follows the convention established by John Gilmore in
     documenting pdtar.)

   Old-Style Archive Format
     The original tar archive format has been extended many times to include
     additional information that various implementors found necessary. This
     section describes the variant implemented by the tar command included in
     Version 7 AT&T UNIX, which seems to be the earliest widely-used version
     of the tar program.
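
     As a rough illustration of the record arithmetic described under General
     Format, the following C sketch computes how many 512-byte records a
     file's data occupies and tests whether a record is all zero bytes, as
     the end-of-archive records are. The names TAR_RECORD_SIZE,
     tar_data_records, and tar_record_is_zero are hypothetical helpers
     introduced for this example; they are not taken from any tar
     implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Record size used throughout the format; the macro name is an assumption. */
#define TAR_RECORD_SIZE 512u

/*
 * Number of data records that follow a header for file_size bytes of data.
 * A final partial record still occupies a full 512 bytes in the archive.
 */
uint64_t
tar_data_records(uint64_t file_size)
{
        return file_size / TAR_RECORD_SIZE +
            (file_size % TAR_RECORD_SIZE != 0 ? 1 : 0);
}

/* Return 1 if the record consists entirely of zero bytes, 0 otherwise. */
int
tar_record_is_zero(const unsigned char *rec)
{
        size_t i;

        assert(rec != NULL);
        for (i = 0; i < TAR_RECORD_SIZE; i++) {
                if (rec[i] != 0)
                        return 0;
        }
        return 1;
}
```

     A reader built along these lines would treat two consecutive records
     for which tar_record_is_zero() returns 1 as the end of the archive.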

     The header record for an old-style tar archive consists of the following:

           struct header_old_tar {
                   char name[100];
                   char mode[8];
                   char uid[8];
                   char gid[8];
                   char size[12];
                   char mtime[12];
                   char checksum[8];
                   char linkflag[1];
                   char linkname[100];
                   char pad[255];
           };
     All unused bytes in the header record are filled with nulls.

     name    Pathname, stored as a null-terminated string. Early tar
             implementations only stored regular files (including hardlinks
             to those files). One common early convention used a trailing "/"
             character to indicate a directory name, allowing directory
             permissions and owner information to be archived and restored.

     mode    File mode, stored as an octal number in ASCII.

     uid, gid
             User id and group id of owner, as octal numbers in ASCII.

     size    Size of file, as octal number in ASCII. For regular files only,
             this indicates the amount of data that follows the header. In
             particular, this field was ignored by early tar implementations
             when extracting hardlinks. Modern writers should always store a
             zero length for hardlink entries.

     mtime   Modification time of file, as an octal number in ASCII. This
             indicates the number of seconds since the start of the epoch,
             00:00:00 UTC January 1, 1970. Note that negative values should
             be avoided here, as they are handled inconsistently.

     checksum
             Header checksum, stored as an octal number in ASCII. To compute
             the checksum, set the checksum field to all spaces, then sum all
             bytes in the header using unsigned arithmetic. This field should
             be stored as six octal digits followed by a null and a space
             character. Note that many early implementations of tar used
             signed arithmetic for the checksum field, which can cause
             interoperability problems when transferring archives between
             systems. Modern robust readers compute the checksum both ways
             and accept the header if either computation matches.

     linkflag, linkname
             In order to preserve hardlinks and conserve tape, a file with
             multiple links is only written to the archive the first time it
             is encountered. The next time it is encountered, the linkflag is
             set to an ASCII `1' and the linkname field holds the first name
             under which this file appears. (Note that regular files have a
             null value in the linkflag field.)

     Early tar implementations varied in how they terminated these fields.
     The tar command in Version 7 AT&T UNIX used the following conventions
     (this is also documented in early BSD manpages): the pathname must be
     null-terminated; the mode, uid, and gid fields must end in a space and a
     null byte; the size and mtime fields must end in a space; the checksum is
     terminated by a null and a space. Early implementations filled the
     numeric fields with leading spaces. This seems to have been common
     practice until the IEEE Std 1003.1-1988 (``POSIX.1'') standard was
     released. For best portability, modern implementations should fill the
     numeric fields with leading zeros.

   Pre-POSIX Archives
     An early draft of IEEE Std 1003.1-1988 (``POSIX.1'') served as the basis
     for John Gilmore's pdtar program and many system implementations from the
     late 1980s and early 1990s. These archives generally follow the POSIX
     ustar format described below with the following variations:
     o   The magic value is ``ustar '' (note the following space). The
         version field contains a space character followed by a null.
     o   The numeric fields are generally filled with leading spaces (not
         leading zeros as recommended in the final standard).
     o   The prefix field is often not used, limiting pathnames to the 100
         characters of old-style archives.

   POSIX ustar Archives
     IEEE Std 1003.1-1988 (``POSIX.1'') defined a standard tar file format to
     be read and written by compliant implementations of tar(1).
     This format is often called the ``ustar'' format, after the magic value
     used in the header. (The name is an acronym for ``Unix Standard TAR''.)
     It extends the historic format with new fields:

           struct header_posix_ustar {
                   char name[100];
                   char mode[8];
                   char uid[8];
                   char gid[8];
                   char size[12];
                   char mtime[12];
                   char checksum[8];
                   char typeflag[1];
                   char linkname[100];
                   char magic[6];
                   char version[2];
                   char uname[32];
                   char gname[32];
                   char devmajor[8];
                   char devminor[8];
                   char prefix[155];
                   char pad[12];
           };

     typeflag
             Type of entry. POSIX extended the earlier linkflag field with
             several new type values:
             ``0''   Regular file. NUL should be treated as a synonym, for
                     compatibility purposes.
             ``1''   Hard link.
             ``2''   Symbolic link.
             ``3''   Character device node.
             ``4''   Block device node.
             ``5''   Directory.
             ``6''   FIFO node.
             ``7''   Reserved.
             Other   A POSIX-compliant implementation must treat any
                     unrecognized typeflag value as a regular file. In
                     particular, writers should ensure that all entries have
                     a valid filename so that they can be restored by readers
                     that do not support the corresponding extension.
                     Uppercase letters "A" through "Z" are reserved for
                     custom extensions. Note that sockets and whiteout
                     entries are not archivable.
             It is worth noting that the size field, in particular, has
             different meanings depending on the type. For regular files, of
             course, it indicates the amount of data following the header.
             For directories, it may be used to indicate the total size of all
             files in the directory, for use by operating systems that
             pre-allocate directory space. For all other types, it should be
             set to zero by writers and ignored by readers.

     magic   Contains the magic value ``ustar'' followed by a NUL byte to
             indicate that this is a POSIX standard archive.
             Full compliance requires the uname and gname fields be properly
             set.

     version
             Version. This should be ``00'' (two copies of the ASCII digit
             zero) for POSIX standard archives.

     uname, gname
             User and group names, as null-terminated ASCII strings. These
             should be used in preference to the uid/gid values when they are
             set and the corresponding names exist on the system.

     devmajor, devminor
             Major and minor numbers for character device or block device
             entry.

     name, prefix
             If the pathname is too long to fit in the 100 bytes provided by
             the standard format, it can be split at any / character with the
             first portion going into the prefix field. If the prefix field
             is not empty, the reader will prepend the prefix value and a /
             character to the regular name field to obtain the full pathname.
             The standard does not require a trailing / character on directory
             names, though most implementations still include this for
             compatibility reasons.

     Note that all unused bytes must be set to NUL.

     Field termination is specified slightly differently by POSIX than by
     previous implementations. The magic, uname, and gname fields must have a
     trailing NUL. The pathname, linkname, and prefix fields must have a
     trailing NUL unless they fill the entire field. (In particular, it is
     possible to store a 256-character pathname if it happens to have a / as
     the 156th character.) POSIX requires numeric fields to be zero-padded in
     the front, and requires them to be terminated with either space or NUL
     characters.

     Currently, most tar implementations comply with the ustar format,
     occasionally extending it by adding new fields to the blank area at the
     end of the header record.

   Pax Interchange Format
     There are many attributes that cannot be portably stored in a POSIX ustar
     archive.
     IEEE Std 1003.1-2001 (``POSIX.1'') defined a ``pax interchange
     format'' that uses two new types of entries to hold text-formatted
     metadata that applies to following entries. Note that a pax interchange
     format archive is a ustar archive in every respect. The new data is
     stored in ustar-compatible archive entries that use the ``x'' or ``g''
     typeflag. In particular, older implementations that do not fully support
     these extensions will extract the metadata into regular files, where the
     metadata can be examined as necessary.

     An entry in a pax interchange format archive consists of one or two
     standard ustar entries, each with its own header and data. The first
     optional entry stores the extended attributes for the following entry.
     This optional first entry has an "x" typeflag and a size field that
     indicates the total size of the extended attributes. The extended
     attributes themselves are stored as a series of text-format lines encoded
     in the portable UTF-8 encoding. Each line consists of a decimal number, a
     space, a key string, an equals sign, a value string, and a newline. The
     decimal number indicates the length of the entire line, including the
     initial length field and the trailing newline. An example of such a
     field is:
           25 ctime=1084839148.1212\n
     Keys in all lowercase are standard keys. Vendors can add their own keys
     by prefixing them with an all uppercase vendor name and a period. Note
     that, unlike the historic header, numeric values are stored using
     decimal, not octal. A description of some common keys follows:

     atime, ctime, mtime
             File access, inode change, and modification times. These fields
             can be negative or include a decimal point and a fractional
             value.

     uname, uid, gname, gid
             User name, group name, and numeric UID and GID values.
             The user name and group name stored here are encoded in UTF-8
             and can thus include non-ASCII characters. The UID and GID
             fields can be of arbitrary length.

     linkpath
             The full path of the linked-to file. Note that this is encoded
             in UTF-8 and can thus include non-ASCII characters.

     path    The full pathname of the entry. Note that this is encoded in
             UTF-8 and can thus include non-ASCII characters.

     realtime.*, security.*
             These keys are reserved and may be used for future
             standardization.

     size    The size of the file. Note that there is no length limit on this
             field, allowing conforming archives to store files much larger
             than the historic 8GB limit.

     SCHILY.*
             Vendor-specific attributes used by Joerg Schilling's star
             implementation.

     SCHILY.acl.access, SCHILY.acl.default
             Stores the access and default ACLs as textual strings in a format
             that is an extension of the format specified by POSIX.1e draft
             17. In particular, each user or group access specification can
             include a fourth colon-separated field with the numeric UID or
             GID. This allows ACLs to be restored on systems that may not
             have complete user or group information available (such as when
             NIS/YP or LDAP services are temporarily unavailable).

     SCHILY.devminor, SCHILY.devmajor
             The full minor and major numbers for device nodes.

     SCHILY.fflags
             The file flags.

     SCHILY.realsize
             The full size of the file on disk. XXX explain? XXX

     SCHILY.dev, SCHILY.ino, SCHILY.nlinks
             The device number, inode number, and link count for the entry.
             In particular, note that a pax interchange format archive using
             Joerg Schilling's SCHILY.* extensions can store all of the data
             from struct stat.

     LIBARCHIVE.xattr.namespace.key
             Libarchive stores POSIX.1e-style extended attributes using keys
             of this form.
             The key value is URL-encoded: All non-ASCII characters and the
             two special characters ``='' and ``%'' are encoded as ``%''
             followed by two uppercase hexadecimal digits. The value of this
             key is the extended attribute value encoded in base 64.
             XXX Detail the base-64 format here XXX

     VENDOR.*
             XXX document other vendor-specific extensions XXX

     Any values stored in an extended attribute override the corresponding
     values in the regular tar header. Note that compliant readers should
     ignore the regular fields when they are overridden. This is important,
     as existing archivers are known to store non-compliant values in the
     standard header fields in this situation. There are no limits on length
     for any of these fields. In particular, numeric fields can be
     arbitrarily large. All text fields are encoded in UTF-8. Compliant
     writers should store only portable 7-bit ASCII characters in the standard
     ustar header and use extended attributes whenever a text value contains
     non-ASCII characters.

     In addition to the x entry described above, the pax interchange format
     also supports a g entry. The g entry is identical in format, but
     specifies attributes that serve as defaults for all subsequent archive
     entries. The g entry is not widely used.

     Besides the new x and g entries, the pax interchange format has a few
     other minor variations from the earlier ustar format. The most troubling
     one is that hardlinks are permitted to have data following them. This
     allows readers to restore any hardlink to a file without having to rewind
     the archive to find an earlier entry. However, it creates complications
     for robust readers, as it is no longer clear whether or not they should
     ignore the size field for hardlink entries.
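
     The extended attribute line format described in this section (a decimal
     length, a space, a key, an equals sign, a value, and a newline, with the
     length covering the whole line) can be sketched in C as follows. This
     is only a minimal illustration under stated assumptions: struct
     pax_record and pax_parse_record are hypothetical names invented for this
     example, the key is taken to end at the first equals sign, and an empty
     key is rejected as malformed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result type; the pointers refer into the caller's buffer. */
struct pax_record {
        const char *key;
        size_t key_len;
        const char *value;
        size_t value_len;
        size_t total_len;       /* whole line: length field through newline */
};

/*
 * Parse one "LEN KEY=VALUE\n" line from buf, which holds avail bytes.
 * Returns 0 on success and -1 if the line is malformed or truncated.
 */
int
pax_parse_record(const char *buf, size_t avail, struct pax_record *out)
{
        size_t i = 0, len = 0;
        const char *eq;

        assert(buf != NULL && out != NULL);

        /* Leading decimal length (pax uses decimal, not octal). */
        while (i < avail && buf[i] >= '0' && buf[i] <= '9') {
                if (len > (SIZE_MAX - 9) / 10)
                        return -1;      /* length would overflow size_t */
                len = len * 10 + (size_t)(buf[i] - '0');
                i++;
        }
        if (i == 0 || i >= avail || buf[i] != ' ')
                return -1;
        i++;                            /* skip the separating space */

        /* The length counts the entire line, including the final newline. */
        if (len > avail || len <= i || buf[len - 1] != '\n')
                return -1;

        /* Assumption: the key ends at the first '=' in the line. */
        eq = memchr(buf + i, '=', len - 1 - i);
        if (eq == NULL || eq == buf + i)
                return -1;

        out->key = buf + i;
        out->key_len = (size_t)(eq - (buf + i));
        out->value = eq + 1;
        out->value_len = (size_t)((buf + len - 1) - (eq + 1));
        out->total_len = len;
        return 0;
}
```

     Applied to the example line shown earlier in this section, this sketch
     would yield the key ``ctime'', a 15-byte value, and a total length of
     25. A reader could call it repeatedly, advancing by total_len each
     time, until the size given in the x entry's header is consumed.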

   GNU Tar Archives
     The GNU tar program started with a pre-POSIX format similar to that
     described earlier and has extended it using several different mechanisms:
     It added new fields to the empty space in the header (some of which was
     later used by POSIX for conflicting purposes); it allowed the header to
     be continued over multiple records; and it defined new entries that
     modify following entries (similar in principle to the x entry described
     above, but each GNU special entry is single-purpose, unlike the
     general-purpose x entry). As a result, GNU tar archives are not POSIX
     compatible, although more lenient POSIX-compliant readers can
     successfully extract most GNU tar archives.

           struct header_gnu_tar {
                   char name[100];
                   char mode[8];
                   char uid[8];
                   char gid[8];
                   char size[12];
                   char mtime[12];
                   char checksum[8];
                   char typeflag[1];
                   char linkname[100];
                   char magic[6];
                   char version[2];
                   char uname[32];
                   char gname[32];
                   char devmajor[8];
                   char devminor[8];
                   char atime[12];
                   char ctime[12];
                   char offset[12];
                   char longnames[4];
                   char unused[1];
                   struct {
                           char offset[12];
                           char numbytes[12];
                   } sparse[4];
                   char isextended[1];
                   char realsize[12];
                   char pad[17];
           };

     typeflag
             GNU tar uses the following special entry types, in addition to
             those defined by POSIX:

             7       GNU tar treats type "7" records identically to type "0"
                     records, except on one obscure RTOS where they are used
                     to indicate the pre-allocation of a contiguous file on
                     disk.

             D       This indicates a directory entry. Unlike the
                     POSIX-standard "5" typeflag, the header is followed by
                     data records listing the names of files in this
                     directory. Each name is preceded by an ASCII "Y" if the
                     file is stored in this archive or "N" if the file is not
                     stored in this archive.
                     Each name is terminated with a null, and an extra null
                     marks the end of the name list. The purpose of this
                     entry is to support incremental backups; a program
                     restoring from such an archive may wish to delete files
                     on disk that did not exist in the directory when the
                     archive was made.

                     Note that the "D" typeflag specifically violates POSIX,
                     which requires that unrecognized typeflags be restored as
                     normal files. In this case, restoring the "D" entry as a
                     file could interfere with subsequent creation of the
                     like-named directory.

             K       The data for this entry is a long linkname for the
                     following regular entry.

             L       The data for this entry is a long pathname for the
                     following regular entry.

             M       This is a continuation of the last file on the previous
                     volume. GNU multi-volume archives guarantee that each
                     volume begins with a valid entry header. To ensure this,
                     a file may be split, with part stored at the end of one
                     volume, and part stored at the beginning of the next
                     volume. The "M" typeflag indicates that this entry
                     continues an existing file. Such entries can only occur
                     as the first or second entry in an archive (the latter
                     only if the first entry is a volume label). The size
                     field specifies the size of this entry. The offset field
                     at bytes 369-380 specifies the offset where this file
                     fragment begins. The realsize field specifies the total
                     size of the file (which must equal size plus offset).
                     When extracting, GNU tar checks that the header file
                     name is the one it is expecting, that the header offset
                     is in the correct sequence, and that the sum of offset
                     and size is equal to realsize.

             N       Type "N" records are no longer generated by GNU tar.
                     They contained a list of files to be renamed or symlinked
                     after extraction; this was originally used to support
                     long names.
                     The contents of this record are a text description of
                     the operations to be done, in the form
                     ``Rename %s to %s\n'' or ``Symlink %s to %s\n''; in
                     either case, both filenames are escaped using K&R C
                     syntax. Due to security concerns, "N" records are now
                     generally ignored when reading archives.

             S       This is a ``sparse'' regular file. Sparse files are
                     stored as a series of fragments. The header contains a
                     list of fragment offset/length pairs. If more than four
                     such entries are required, the header is extended as
                     necessary with ``extra'' header extensions (an older
                     format that is no longer used), or ``sparse'' extensions.

             V       The name field should be interpreted as a tape/volume
                     header name. This entry should generally be ignored on
                     extraction.

     magic   The magic field holds the five characters ``ustar'' followed by a
             space. Note that POSIX ustar archives have a trailing null.

     version
             The version field holds a space character followed by a null.
             Note that POSIX ustar archives use two copies of the ASCII digit
             ``0''.

     atime, ctime
             The time the file was last accessed and the time of last change
             of file information, stored in octal as with mtime.

     longnames
             This field is apparently no longer used.

     Sparse offset / numbytes
             Each such structure specifies a single fragment of a sparse file.
             The two fields store values as octal numbers. The fragments are
             each padded to a multiple of 512 bytes in the archive. On
             extraction, the list of fragments is collected from the header
             (including any extension headers), and the data is then read and
             written to the file at appropriate offsets.

     isextended
             If this is set to non-zero, the header will be followed by
             additional ``sparse header'' records.
             Each such record contains information about as many as 21
             additional sparse blocks as shown here:

                   struct gnu_sparse_header {
                           struct {
                                   char offset[12];
                                   char numbytes[12];
                           } sparse[21];
                           char isextended[1];
                           char padding[7];
                   };

     realsize
             A binary representation of the file's complete size, with a much
             larger range than the POSIX file size. In particular, with M
             type files, the current entry is only a portion of the file. In
             that case, the POSIX size field will indicate the size of this
             entry; the realsize field will indicate the total size of the
             file.

   GNU tar pax archives
     GNU tar 1.14 (XXX check this XXX) and later will write pax interchange
     format archives when you specify the --posix flag. This format uses
     custom keywords to store sparse file information. There have been three
     iterations of this support, referred to as ``0.0'', ``0.1'', and ``1.0''.

     GNU.sparse.numblocks, GNU.sparse.offset, GNU.sparse.numbytes,
             GNU.sparse.size
             The ``0.0'' format used an initial GNU.sparse.numblocks attribute
             to indicate the number of blocks in the file, a pair of
             GNU.sparse.offset and GNU.sparse.numbytes to indicate the offset
             and size of each block, and a single GNU.sparse.size to indicate
             the full size of the file. This is not the same as the size in
             the tar header because the latter value does not include the size
             of any holes. This format required that the order of attributes
             be preserved and relied on readers accepting multiple appearances
             of the same attribute names, which is not officially permitted by
             the standards.

     GNU.sparse.map
             The ``0.1'' format used a single attribute that stored a
             comma-separated list of decimal numbers. Each pair of numbers
             indicated the offset and size, respectively, of a block of data.
             This does not work well if the archive is extracted by an
             archiver that does not recognize this extension, since many pax
             implementations simply discard unrecognized attributes.

     GNU.sparse.major, GNU.sparse.minor, GNU.sparse.name, GNU.sparse.realsize
             The ``1.0'' format stores the sparse block map in one or more
             512-byte blocks prepended to the file data in the entry body.
             The pax attributes indicate the existence of this map (via the
             GNU.sparse.major and GNU.sparse.minor fields) and the full size
             of the file. The GNU.sparse.name holds the true name of the
             file. To avoid confusion, the name stored in the regular tar
             header is a modified name so that extraction errors will be
             apparent to users.

   Solaris Tar
     XXX More Details Needed XXX

     Solaris tar (beginning with SunOS XXX 5.7 ?? XXX) supports an
     ``extended'' format that is fundamentally similar to pax interchange
     format, with the following differences:
     o   Extended attributes are stored in an entry whose type is X, not
         x, as used by pax interchange format. The detailed format of
         this entry appears to be the same as detailed above for the x
         entry.
     o   An additional A entry is used to store an ACL for the following
         regular entry. The body of this entry contains a seven-digit
         octal number followed by a zero byte, followed by the textual ACL
         description. The octal value is the number of ACL entries plus a
         constant that indicates the ACL type: 01000000 for POSIX.1e ACLs
         and 03000000 for NFSv4 ACLs.

   AIX Tar
     XXX More details needed XXX

   Mac OS X Tar
     The tar distributed with Apple's Mac OS X stores most regular files as
     two separate entries in the tar archive. The two entries have the same
     name except that the first one has ``._'' added to the beginning of the
     name. This first entry stores the ``resource fork'' with additional
     attributes for the file.
     The Mac OS X CopyFile() API is used to separate a file on disk into
     separate resource and data streams and to reassemble those separate
     streams when the file is restored to disk.

   Other Extensions
     One obvious extension to increase the size of files is to eliminate the
     terminating characters from the various numeric fields. For example, the
     standard only allows the size field to contain 11 octal digits, reserving
     the twelfth byte for a trailing NUL character. Allowing 12 octal digits
     allows file sizes up to 64 GB.

     Another extension, utilized by GNU tar, star, and other newer tar
     implementations, permits binary numbers in the standard numeric fields.
     This is flagged by setting the high bit of the first byte. This permits
     95-bit values for the length and time fields and 63-bit values for the
     uid, gid, and device numbers. GNU tar supports this extension for the
     length, mtime, ctime, and atime fields. Joerg Schilling's star program
     supports this extension for all numeric fields. Note that this extension
     is largely obsoleted by the extended attribute record provided by the pax
     interchange format.

     Another early GNU extension allowed base-64 values rather than octal.
     This extension was short-lived and is no longer supported by any
     implementation.

SEE ALSO
     ar(1), pax(1), tar(1)

STANDARDS
     The tar utility is no longer a part of POSIX or the Single UNIX
     Specification. It last appeared in Version 2 of the Single UNIX
     Specification (``SUSv2''). It has been supplanted in subsequent
     standards by pax(1). The ustar format is currently part of the
     specification for the pax(1) utility. The pax interchange file format
     is new with IEEE Std 1003.1-2001 (``POSIX.1'').

HISTORY
     A tar command appeared in Seventh Edition Unix, which was released in
     January 1979.
     It replaced the tp program from Fourth Edition Unix, which in turn
     replaced the tap program from First Edition Unix. John Gilmore's
     pdtar public-domain implementation (circa 1987) was highly influential
     and formed the basis of GNU tar (circa 1988). Joerg Schilling's star
     archiver is another open-source (GPL) archiver (originally developed
     circa 1985) which features complete support for pax interchange format.

     This documentation was written as part of the libarchive and bsdtar
     project by Tim Kientzle <kientzle@FreeBSD.org>.

FreeBSD 9.0                    December 27, 2009                   FreeBSD 9.0