.br
.B file
.B -C
[
.B \-m
magicfile ]
.SH DESCRIPTION
This manual page documents version __VERSION__ of the
.B file
command.
.PP
.B File
tests each argument in an attempt to classify it.
There are three sets of tests, performed in this order:
filesystem tests, magic number tests, and language tests.
The
.I first
test that succeeds causes the file type to be printed.
.PP
The type printed will usually contain one of the words
.B text
(the file contains only
printing characters and a few common control
characters and is probably safe to read on an
.SM ASCII
terminal),
.B executable
(the file contains the result of compiling a program
in a form understandable to some \s-1UNIX\s0 kernel or another),
or
.B data
meaning anything else (data is usually `binary' or non-printable).
Exceptions are well-known file formats (core files, tar archives)
that are known to contain binary data.
When modifying the file
.I __MAGIC__
or the program itself,
.B "preserve these keywords" .
People depend on knowing that all the readable files in a directory
have the word ``text'' printed.
Don't do as Berkeley did and change ``shell commands text''
to ``shell script''.
Note that the file
.I __MAGIC__
is built mechanically from a large number of small files in
the subdirectory
.I Magdir
in the source distribution of this program.
.PP
The filesystem tests are based on examining the return from a
.BR stat (2)
system call.
The program checks to see if the file is empty,
or if it's some sort of special file.
Any known file types appropriate to the system you are running on
(sockets, symbolic links, or named pipes (FIFOs) on those systems that
implement them)
are intuited if they are defined in
the system header file
.IR <sys/stat.h> .
.PP
The magic number tests are used to check for files with data in
particular fixed formats.
The canonical example of this is a binary executable (compiled program)
.I a.out
file, whose format is defined in
.I a.out.h
and possibly
.I exec.h
in the standard include directory.
These files have a `magic number' stored in a particular place
near the beginning of the file that tells the \s-1UNIX\s0 operating system
that the file is a binary executable, and which of several types thereof.
The concept of `magic number' has been applied by extension to data files.
Any file with some invariant identifier at a small fixed
offset into the file can usually be described in this way.
The information identifying these files is read from the compiled
magic file
.IR __MAGIC__.mgc ,
or
.I __MAGIC__
if the compiled file does not exist.
.PP
If a file does not match any of the entries in the magic file,
it is examined to see if it seems to be a text file.
ASCII, ISO-8859-x, non-ISO 8-bit extended-ASCII character sets
(such as those used on Macintosh and IBM PC systems),
UTF-8-encoded Unicode, UTF-16-encoded Unicode, and EBCDIC
character sets can be distinguished by the different
ranges and sequences of bytes that constitute printable text
in each set.
If a file passes any of these tests, its character set is reported.
ASCII, ISO-8859-x, UTF-8, and extended-ASCII files are identified
as ``text'' because they will be mostly readable on nearly any terminal;
UTF-16 and EBCDIC are only ``character data'' because, while
they contain text, it is text that will require translation
before it can be read.
In addition,
.B file
will attempt to determine other characteristics of text-type files.
If the lines of a file are terminated by CR, CRLF, or NEL, instead
of the Unix-standard LF, this will be reported.
Files that contain embedded escape sequences or overstriking
will also be identified.
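.PP
The order of the magic number and text tests can be illustrated with a
short sketch.
It is written in Python for brevity and is not the code used by
.BR file ;
the signature table, the set of permitted control characters, and the
function name are assumptions made for this example, and only plain
ASCII text is recognised.
.PP
.nf
```python
# Hypothetical signature table: (offset, magic bytes, description).
# These entries are illustrative and are not read from the magic file.
SIGNATURES = [
    (0, bytes([0x7F]) + b"ELF", "ELF executable"),
    (0, bytes([0x1F, 0x8B]), "gzip compressed data"),
]

# Printing ASCII plus a few common control characters
# (BEL, BS, TAB, LF, FF, CR, ESC); an assumed set for this sketch.
TEXT_BYTES = set(range(0x20, 0x7F)) | {0x07, 0x08, 0x09, 0x0A, 0x0C, 0x0D, 0x1B}


def classify(data):
    """Classify a bytes object: empty, magic match, ASCII text, or data."""
    if not data:
        return "empty"
    # Magic number test: an invariant identifier at a small fixed offset.
    for offset, magic, description in SIGNATURES:
        if data[offset:offset + len(magic)] == magic:
            return description
    # Text test: every byte must belong to the permitted set.
    if all(byte in TEXT_BYTES for byte in data):
        if bytes([0x0D, 0x0A]) in data:
            return "ASCII text, with CRLF line terminators"
        return "ASCII text"
    # Nothing matched, so the contents are simply called data.
    return "data"
```
.fi
.PP
The first test that succeeds decides the answer, so a file that carries
a known signature is never examined as text.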
.PP
Once
.B file
has determined the character set used in a text-type file,
it will
attempt to determine in what language the file is written.
The language tests look for particular strings (cf
.IR names.h )
that can appear anywhere in the first few blocks of a file.
For example, the keyword
.B .br
indicates that the file is most likely a
.BR troff (1)
input file, just as the keyword
.B struct
indicates a C program.
These tests are less reliable than the previous
two groups, so they are performed last.
The language test routines also test for some miscellany
(such as
.BR tar (1)
archives).
.PP
Any file that cannot be identified as having been written
in any of the character sets listed above is simply said to be ``data''.
.SH OPTIONS
.TP 8
.B \-b
Do not prepend filenames to output lines (brief mode).
.TP 8
.B \-c
Cause a checking printout of the parsed form of the magic file.
This is usually used in conjunction with
|
Specify an alternate list of files containing magic numbers.
This can be a single file, or a colon-separated list of files.
.TP 8
.B \-n
Force stdout to be flushed after checking each file.
This is only useful if checking a list of files.
It is intended to be used by programs that want
filetype output from a pipe.
.TP 8
.B \-v
Print the version of the program and exit.
.TP 8
.B \-z
Try to look inside compressed files.
.TP 8
.B \-L
Follow symbolic links, as the like-named option in
.BR ls (1)
does (on systems that support symbolic links).
.TP 8
.B \-s
Normally,
.B file
only attempts to read and determine the type of argument files which
.BR stat (2)
reports are ordinary files.
This prevents problems, because reading special files may have peculiar
consequences.
Specifying the
.B \-s
option causes
.B file
to also read argument files which are block or character special files.
This is useful for determining the filesystem types of the data in raw
disk partitions, which are block special files.
This option also causes
.B file
to disregard the file size as reported by
.BR stat (2)
since on some systems it reports a zero size for raw disk partitions.
.SH FILES
.I __MAGIC__.mgc
\- default compiled list of magic numbers
.PP
.I __MAGIC__
\- default list of magic numbers
.PP
.I __MAGIC__.mime
\- default list of magic numbers, used to output mime types when the \-i option
is specified.
.SH ENVIRONMENT
The environment variable
.B MAGIC
can be used to set the default magic number files.
.SH SEE ALSO
.BR magic (__FSECTION__)
\- description of magic file format.
.br
.BR strings (1), " od" (1), " hexdump(1)"
\- tools for examining non-textfiles.
.SH STANDARDS CONFORMANCE
This program is believed to exceed the System V Interface Definition
of FILE(CMD), as near as one can determine from the vague language
contained therein.
Its behaviour is mostly compatible with the System V program of the same name.
This version knows more magic, however, so it will produce
different (albeit more accurate) output in many cases.
.PP
The one significant difference
between this version and System V
is that this version treats any white space
as a delimiter, so that spaces in pattern strings must be escaped.
For example,
.br
>10 string language impress\ (imPRESS data)
.br
in an existing magic file would have to be changed to
.br
>10 string language\e impress (imPRESS data)
.br
In addition, in this version, if a pattern string contains a backslash,
it must be escaped.
For example
.br
0 string \ebegindata Andrew Toolkit document
.br
in an existing magic file would have to be changed to
.br
0 string \e\ebegindata Andrew Toolkit document
.br
.PP
SunOS releases 3.2 and later from Sun Microsystems include a
.BR file (1)
command derived from the System V one, but with some extensions.
My version differs from Sun's only in minor ways.
It includes the extension of the `&' operator, used as,
for example,
.br
>16 long&0x7fffffff >0 not stripped
.SH MAGIC DIRECTORY
The magic file entries have been collected from various sources,
mainly USENET, and contributed by various authors.
Christos Zoulas (address below) will collect additional
or corrected magic file entries.
A consolidation of magic file entries
will be distributed periodically.
.PP
The order of entries in the magic file is significant.
Depending on what system you are using, the order that
they are put together may be incorrect.
If your old
.B file
command uses a magic file,
keep the old magic file around for comparison purposes
(rename it to
.IR __MAGIC__.orig ).
.SH EXAMPLES
.nf
$ file file.c file /dev/hda
file.c:   C program text
file:     ELF 32-bit LSB executable, Intel 80386, version 1,
          dynamically linked, not stripped
/dev/hda: block special

$ file -s /dev/hda{,1,2,3,4,5,6,7,8,9,10}
/dev/hda:   x86 boot sector
/dev/hda1:  Linux/i386 ext2 filesystem
/dev/hda2:  x86 boot sector
/dev/hda3:  x86 boot sector, extended partition table
/dev/hda4:  Linux/i386 ext2 filesystem
/dev/hda5:  Linux/i386 swap file
/dev/hda6:  Linux/i386 swap file
/dev/hda7:  Linux/i386 swap file
/dev/hda8:  Linux/i386 swap file
/dev/hda9:  empty
/dev/hda10: empty

$ file -i file.c file /dev/hda
file.c:   text/x-c
file:     application/x-executable, dynamically linked (uses shared libs), not stripped
/dev/hda: application/x-not-regular-file

.fi
.SH HISTORY
There has been a
.B file
command in every \s-1UNIX\s0 since at least Research Version 6
(man page dated January 16, 1975).
The System V version introduced one significant major change:
the external list of magic number types.
This slowed the program down slightly but made it a lot more flexible.
.PP
This program, based on the System V version,
was written by Ian Darwin <ian@darwinsys.com>
without looking at anybody else's source code.
.PP
John Gilmore revised the code extensively, making it better than
the first version.
Geoff Collyer found several inadequacies
and provided some magic file entries.
Contribution of the `&' operator by Rob McMahon, cudcv@warwick.ac.uk, 1989.
.PP
Guy Harris, guy@netapp.com, made many changes from 1993 to the present.
.PP
Primary development and maintenance from 1990 to the present by
Christos Zoulas (christos@astron.com).
.PP
Altered by Chris Lowth, chris@lowth.com, 2000:
Handle the ``-i'' option to output mime type strings, using an alternative
magic file and internal logic.
.PP
Altered by Eric Fischer (enf@pobox.com), July, 2000,
to identify character codes and attempt to identify the languages
of non-ASCII files.
.PP
The list of contributors to the "Magdir" directory (source for the
/etc/magic
file) is too long to include here.
You know who you are; thank you.
.SH LEGAL NOTICE
Copyright (c) Ian F. Darwin, Toronto, Canada, 1986-1999.
Covered by the standard Berkeley Software Distribution copyright; see the file
LEGAL.NOTICE in the source distribution.
.PP
The files
.I tar.h
and
.I is_tar.c
were written by John Gilmore from his public-domain
.B tar
program, and are not covered by the above license.
.SH BUGS
There must be a better way to automate the construction of the Magic
file from all the glop in Magdir.
What is it?
Better yet, the magic file should be compiled into binary (say,
.BR ndbm (3)
or, better yet, fixed-length
.SM ASCII
strings for use in heterogeneous network environments) for faster startup.
Then the program would run as fast as the Version 7 program of the same name,
with the flexibility of the System V version.
.PP
.B File
uses several algorithms that favor speed over accuracy,
thus it can be misled about the contents of
text
files.
.PP
The support for
text
files (primarily for programming languages)
is simplistic, inefficient and requires recompilation to update.
.PP
There should be an ``else'' clause to follow a series of continuation lines.
.PP
The magic file and keywords should have regular expression support.
Their use of
.SM "ASCII TAB"
as a field delimiter is ugly and makes
it hard to edit the files, but is entrenched.
.PP
It might be advisable to allow upper-case letters in keywords
for, e.g.,
.BR troff (1)
commands vs man page macros.
Regular expression support would make this easy.
.PP
The program doesn't grok \s-2FORTRAN\s0.
It should be able to figure \s-2FORTRAN\s0 by seeing some keywords which
appear indented at the start of line.
Regular expression support would make this easy.
.PP
The list of keywords in
.I ascmagic
probably belongs in the Magic file.
This could be done by using some keyword like `*' for the offset value.
.PP
Another optimisation would be to sort
the magic file so that we can just run down all the
tests for the first byte, first word, first long, etc., once we
have fetched it.
Complain about conflicts in the magic file entries.
Make a rule that the magic entries sort based on file offset rather
than position within the magic file?
.PP
The program should provide a way to give an estimate
of ``how good'' a guess is.
We end up removing guesses (e.g. ``From '' as first 5 chars of file) because
they are not as good as other guesses (e.g. ``Newsgroups:'' versus
``Return-Path:'').
Still, if the others don't pan out, it should be
possible to use the first guess.
.PP
This program is slower than some vendors' file commands.
The new support for multiple character codes makes it even slower.
.PP
This manual page, and particularly this section, is too long.
.SH AVAILABILITY
You can obtain the original author's latest version by anonymous FTP
on
.B ftp.astron.com
in the directory
.I /pub/file/file-X.YY.tar.gz
|