.TH FILE __CSECTION__ "Copyright but distributable"
.\" $Id: file.man,v 1.42 2002/07/03 18:26:37 christos Exp $
.SH NAME
file
\- determine file type
.SH SYNOPSIS
.B file
[
.B \-bciknsvzL
]
[
.B \-f
.I namefile
]
[
.B \-m
.I magicfiles
]
.I file
\&...
.br
.B file
.B \-C
[
.B \-m
.I magicfile
]
.SH DESCRIPTION
This manual page documents version __VERSION__ of the
.B file
command.
.PP
.B File
tests each argument in an attempt to classify it.
There are three sets of tests, performed in this order:
filesystem tests, magic number tests, and language tests.
The
.I first
test that succeeds causes the file type to be printed.
.PP
The type printed will usually contain one of the words
.B text
(the file contains only
printing characters and a few common control
characters and is probably safe to read on an
.SM ASCII
terminal),
.B executable
(the file contains the result of compiling a program
in a form understandable to some \s-1UNIX\s0 kernel or another),
or
.B data
meaning anything else (data is usually `binary' or non-printable).
Exceptions are well-known file formats (core files, tar archives)
that are known to contain binary data.
When modifying the file
.I __MAGIC__
or the program itself,
.B "preserve these keywords" .
People depend on knowing that all the readable files in a directory
have the word ``text'' printed.
Don't do as Berkeley did and change ``shell commands text''
to ``shell script''.
Note that the file
.I __MAGIC__
is built mechanically from a large number of small files in
the subdirectory
.I Magdir
in the source distribution of this program.
.PP
The filesystem tests are based on examining the return from a
.BR stat (2)
system call.
The program checks to see if the file is empty,
or if it's some sort of special file.
Any known file types appropriate to the system you are running on
(sockets, symbolic links, or named pipes (FIFOs) on those systems that
implement them)
are intuited if they are defined in
the system header file
.IR \*[Lt]sys/stat.h\*[Gt] .
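.PP
The filesystem tests described above can be sketched roughly as follows.
This is an illustration only, not the program's actual code: it is written in
Python, uses os.lstat as a stand-in for the system call, and the helper name
filesystem_test and the result strings are assumptions made for the example.
.nf
```python
import os
import stat


def filesystem_test(path):
    """Describe empty or special files; return None for ordinary files.

    Hypothetical helper: the returned strings are illustrative and are
    not claimed to match the program's real output.
    """
    st = os.lstat(path)  # lstat examines a symbolic link itself
    mode = st.st_mode
    if stat.S_ISLNK(mode):
        return "symbolic link"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISFIFO(mode):
        return "fifo (named pipe)"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISBLK(mode):
        return "block special"
    if stat.S_ISCHR(mode):
        return "character special"
    if stat.S_ISREG(mode) and st.st_size == 0:
        return "empty"
    # Ordinary, non-empty file: leave it to the later groups of tests.
    return None
```
.fi
.PP
Returning None for an ordinary non-empty file mirrors the idea that the
next group of tests only runs when this one does not succeed.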
.PP
The magic number tests are used to check for files with data in
particular fixed formats.
The canonical example of this is a binary executable (compiled program)
.I a.out
file, whose format is defined in
.I a.out.h
and possibly
.I exec.h
in the standard include directory.
These files have a `magic number' stored in a particular place
near the beginning of the file that tells the \s-1UNIX\s0 operating system
that the file is a binary executable, and which of several types thereof.
The concept of `magic number' has been applied by extension to data files.
Any file with some invariant identifier at a small fixed
offset into the file can usually be described in this way.
The information identifying these files is read from the compiled
magic file
.I __MAGIC__.mgc ,
or
.I __MAGIC__
if the compiled file does not exist.
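.PP
As a rough illustration of matching an invariant identifier at a fixed offset,
the Python sketch below checks a buffer against a small table.
The table MAGIC_TABLE and the helper magic_test are assumptions made for this
example; the real entries come from the magic file, and its format is richer
than a list of byte strings.
.nf
```python
# Each entry is (offset, expected bytes, description).
# Illustrative table only; real entries are read from the magic file.
MAGIC_TABLE = [
    (0, bytes.fromhex("7f454c46"), "ELF"),  # 0x7f followed by 'E' 'L' 'F'
    (0, bytes.fromhex("1f8b"), "gzip compressed data"),
    (257, b"ustar", "tar archive"),
]


def magic_test(data):
    """Return the description of the first matching entry, or None."""
    for offset, expected, description in MAGIC_TABLE:
        if data[offset:offset + len(expected)] == expected:
            return description
    return None
```
.fi
.PP
Entries are tried in table order and the first match is returned, so the
order of the table matters in this sketch.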
.PP
If a file does not match any of the entries in the magic file,
it is examined to see if it seems to be a text file.
ASCII, ISO-8859-x, non-ISO 8-bit extended-ASCII character sets
(such as those used on Macintosh and IBM PC systems),
UTF-8-encoded Unicode, UTF-16-encoded Unicode, and EBCDIC
character sets can be distinguished by the different
ranges and sequences of bytes that constitute printable text
in each set.
If a file passes any of these tests, its character set is reported.
ASCII, ISO-8859-x, UTF-8, and extended-ASCII files are identified
as ``text'' because they will be mostly readable on nearly any terminal;
UTF-16 and EBCDIC are only ``character data'' because, while
they contain text, it is text that will require translation
before it can be read.
In addition,
.B file
will attempt to determine other characteristics of text-type files.
If the lines of a file are terminated by CR, CRLF, or NEL, instead
of the Unix-standard LF, this will be reported.
Files that contain embedded escape sequences or overstriking
will also be identified.
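.PP
The Python sketch below gives a much simplified picture of two of these checks:
guessing between ASCII and UTF-8, and noting which line terminators occur.
It is not the program's algorithm; the helper names guess_charset and
line_terminators are assumptions, the other character sets listed above and
NEL are not handled, and control characters are not examined.
.nf
```python
def guess_charset(data):
    """Return "ASCII", "UTF-8", or None for a bytes object (simplified)."""
    if all(byte < 0x80 for byte in data):
        return "ASCII"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return None  # some other character set, or not text at all
    return "UTF-8"


def line_terminators(data):
    """List the line terminators seen: "CRLF", bare "CR", bare "LF"."""
    crlf = data.count(b"\r\n")
    cr = data.count(b"\r") - crlf  # CR not followed by LF
    lf = data.count(b"\n") - crlf  # LF not preceded by CR
    found = []
    if crlf:
        found.append("CRLF")
    if cr:
        found.append("CR")
    if lf:
        found.append("LF")
    return found
```
.fi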
.PP
Once
.B file
has determined the character set used in a text-type file,
it will
attempt to determine in what language the file is written.
The language tests look for particular strings (cf
.IR names.h )
that can appear anywhere in the first few blocks of a file.
For example, the keyword
.B .br
indicates that the file is most likely a
.BR troff (1)
input file, just as the keyword
.B struct
indicates a C program.
These tests are less reliable than the previous
two groups, so they are performed last.
The language test routines also test for some miscellany
(such as
.BR tar (1)
archives).
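.PP
A keyword search of this kind might look like the Python sketch below.
Everything in it is an assumption made for illustration: the two-entry
KEYWORDS table stands in for the real list, the block size and block count
are guesses at what ``the first few blocks'' means, and matching on
whitespace-separated tokens is a simplification.
.nf
```python
# Hypothetical keyword table; the real list is the one in names.h.
KEYWORDS = [
    (".br", "troff input"),
    ("struct", "C program text"),
]
BLOCK_SIZE = 512  # assumed block size
BLOCKS = 8        # assumed meaning of "the first few blocks"


def language_test(text):
    """Return a language guess from keywords near the start of text."""
    head = text[:BLOCK_SIZE * BLOCKS]
    tokens = head.split()  # whole tokens, so "structure" is not "struct"
    for keyword, description in KEYWORDS:
        if keyword in tokens:
            return description
    return None
```
.fi
.PP
A single keyword is weak evidence, which is consistent with these tests
being the least reliable group and running last.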
.PP
Any file that cannot be identified as having been written
in any of the character sets listed above is simply said to be ``data''.
.SH OPTIONS
.TP 8
.B \-b
Do not prepend filenames to output lines (brief mode).
.TP 8
.B \-c
Cause a checking printout of the parsed form of the magic file.
This is usually used in conjunction with
.B \-m
to debug a new magic file before installing it.
.TP 8
.B \-C
Write a magic.mgc output file that contains a pre-parsed version of
the magic file.
.TP 8
.BI \-f " namefile"
Read the names of the files to be examined from
.I namefile
(one per line)
before the argument list.
Either
.I namefile
or at least one filename argument must be present;
to test the standard input, use ``\-'' as a filename argument.
.TP 8
.B \-i
Causes the file command to output mime type strings rather than the more
traditional human-readable ones. Thus it may say
``text/plain; charset=us-ascii''
rather
than ``ASCII text''. In order for this option to work, file changes the way
it handles files recognised by the command itself (such as many of the
text file types, directories, etc.), and makes use of an alternative
``magic'' file.
(See the ``FILES'' section, below.)
.TP 8
.B \-k
Don't stop at the first match, keep going.
.TP 8
.BI \-m " list"
Specify an alternate list of files containing magic numbers.
This can be a single file, or a colon-separated list of files.
.TP 8
.B \-n
Force stdout to be flushed after checking each file. This is only useful if
checking a list of files. It is intended to be used by programs that want
filetype output from a pipe.
.TP 8
.B \-v
Print the version of the program and exit.
.TP 8
.B \-z
Try to look inside compressed files.
.TP 8
.B \-L
Follow symbolic links, as the like-named option in
.BR ls (1)
does (on systems that support symbolic links).
.TP 8
.B \-s
Normally,
.B file
only attempts to read and determine the type of argument files which
.BR stat (2)
reports are ordinary files.
This prevents problems, because reading special files may have peculiar
consequences.
Specifying the
.B \-s
option causes
.B file
to also read argument files which are block or character special files.
This is useful for determining the filesystem types of the data in raw
disk partitions, which are block special files.
This option also causes
.B file
to disregard the file size as reported by
.BR stat (2)
since on some systems it reports a zero size for raw disk partitions.
.SH FILES
.I __MAGIC__.mgc
\- default compiled list of magic numbers
.PP
.I __MAGIC__
\- default list of magic numbers
.PP
.I __MAGIC__.mime
\- default list of magic numbers, used to output mime types when the \-i option
is specified.
.SH ENVIRONMENT
The environment variable
.B MAGIC
can be used to set the default magic number files.
.SH SEE ALSO
.BR magic (__FSECTION__)
\- description of magic file format.
.br
.BR strings (1), " od" (1), " hexdump" (1)
\- tools for examining non-textfiles.
.SH STANDARDS CONFORMANCE
This program is believed to exceed the System V Interface Definition
of FILE(CMD), as near as one can determine from the vague language
contained therein.
Its behaviour is mostly compatible with the System V program of the same name.
This version knows more magic, however, so it will produce
different (albeit more accurate) output in many cases.
.PP
The one significant difference
between this version and System V
is that this version treats any white space
as a delimiter, so that spaces in pattern strings must be escaped.
For example,
.br
\*[Gt]10 string language impress\ (imPRESS data)
.br
in an existing magic file would have to be changed to
.br
\*[Gt]10 string language\e impress (imPRESS data)
.br
In addition, in this version, if a pattern string contains a backslash,
it must be escaped. For example
.br
0 string \ebegindata Andrew Toolkit document
.br
in an existing magic file would have to be changed to
.br
0 string \e\ebegindata Andrew Toolkit document
.br
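.PP
When converting an existing magic file, the two escaping rules above could be
applied to a pattern string with a small helper such as the Python sketch
below.
The function name escape_pattern is an assumption, and the sketch handles only
the space character and the backslash; other white space is left untouched.
.nf
```python
def escape_pattern(pattern):
    """Escape backslashes, then spaces, in a magic file pattern string.

    Backslashes are doubled first so that the backslashes added in
    front of spaces are not themselves doubled.
    """
    return pattern.replace("\\", "\\\\").replace(" ", "\\ ")
```
.fi
.PP
Applied to the pattern strings of the two examples above, this turns
``language impress'' into the escaped form with a backslash before the space,
and doubles the leading backslash of the Andrew Toolkit pattern.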
.PP
SunOS releases 3.2 and later from Sun Microsystems include a
.BR file (1)
command derived from the System V one, but with some extensions.
My version differs from Sun's only in minor ways.
It includes the extension of the `\*[Am]' operator, used as,
for example,
.br
\*[Gt]16 long\*[Am]0x7fffffff \*[Gt]0 not stripped
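.PP
One way to read the example entry above is: take the long at offset 16, AND it
with 0x7fffffff, and test whether the result is greater than zero.
The Python sketch below shows that reading; the helper name masked_long_test,
the 4-byte size, and the use of native byte order are assumptions made for the
example.
.nf
```python
import struct


def masked_long_test(data, offset=16, mask=0x7FFFFFFF):
    """Return True if (long at offset) AND mask is greater than zero.

    Assumes a 4-byte value in native byte order.
    """
    (value,) = struct.unpack_from("=L", data, offset)
    return (value & mask) > 0
```
.fi
.PP
With this mask the top bit of the value is ignored, so a value whose only
set bit is the top one does not pass the test.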
.SH MAGIC DIRECTORY
The magic file entries have been collected from various sources,
mainly USENET, and contributed by various authors.
Christos Zoulas (address below) will collect additional
or corrected magic file entries.
A consolidation of magic file entries
will be distributed periodically.
.PP
The order of entries in the magic file is significant.
Depending on what system you are using, the order in which
they are put together may be incorrect.
If your old
.B file
command uses a magic file,
keep the old magic file around for comparison purposes
(rename it to
.IR __MAGIC__.orig ).
.SH EXAMPLES
.nf
$ file file.c file /dev/{wd0a,hda}
file.c: C program text
file: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV),
 dynamically linked (uses shared libs), stripped
/dev/wd0a: block special (0/0)
/dev/hda: block special (3/0)
$ file -s /dev/wd0{b,d}
/dev/wd0b: data
/dev/wd0d: x86 boot sector
$ file -s /dev/hda{,1,2,3,4,5,6,7,8,9,10}
/dev/hda: x86 boot sector
/dev/hda1: Linux/i386 ext2 filesystem
/dev/hda2: x86 boot sector
/dev/hda3: x86 boot sector, extended partition table
/dev/hda4: Linux/i386 ext2 filesystem
/dev/hda5: Linux/i386 swap file
/dev/hda6: Linux/i386 swap file
/dev/hda7: Linux/i386 swap file
/dev/hda8: Linux/i386 swap file
/dev/hda9: empty
/dev/hda10: empty

$ file -i file.c file /dev/{wd0a,hda}
file.c: text/x-c
file: application/x-executable, dynamically linked (uses shared libs),
 not stripped
/dev/hda: application/x-not-regular-file
/dev/wd0a: application/x-not-regular-file

.fi
.SH HISTORY
There has been a
.B file
command in every \s-1UNIX\s0 since at least Research Version 4
(man page dated November, 1973).
The System V version introduced one significant change:
the external list of magic number types.
This slowed the program down slightly but made it a lot more flexible.
.PP
This program, based on the System V version,
was written by Ian Darwin \*[Lt]ian@darwinsys.com\*[Gt]
without looking at anybody else's source code.
.PP
John Gilmore revised the code extensively, making it better than
the first version.
Geoff Collyer found several inadequacies
and provided some magic file entries.
Contribution of the `\*[Am]' operator by Rob McMahon, cudcv@warwick.ac.uk, 1989.
.PP
Guy Harris, guy@netapp.com, made many changes from 1993 to the present.
.PP
Primary development and maintenance from 1990 to the present by
Christos Zoulas (christos@astron.com).
.PP
Altered by Chris Lowth, chris@lowth.com, 2000:
Handle the ``-i'' option to output mime type strings and using an alternative
magic file and internal logic.
.PP
Altered by Eric Fischer (enf@pobox.com), July, 2000,
to identify character codes and attempt to identify the languages
of non-ASCII files.
.PP
The list of contributors to the "Magdir" directory (source for the
/etc/magic
file) is too long to include here. You know who you are; thank you.
.SH LEGAL NOTICE
Copyright (c) Ian F. Darwin, Toronto, Canada, 1986-1999.
Covered by the standard Berkeley Software Distribution copyright; see the file
LEGAL.NOTICE in the source distribution.
.PP
The files
.I tar.h
and
.I is_tar.c
were written by John Gilmore from his public-domain
.B tar
program, and are not covered by the above license.
.SH BUGS
There must be a better way to automate the construction of the Magic
file from all the glop in Magdir. What is it?
Better yet, the magic file should be compiled into binary (say,
.BR ndbm (3)
or, better yet, fixed-length
.SM ASCII
strings for use in heterogeneous network environments) for faster startup.
Then the program would run as fast as the Version 7 program of the same name,
with the flexibility of the System V version.
.PP
.B File
uses several algorithms that favor speed over accuracy,
so it can be misled about the contents of
text
files.
.PP
The support for
text
files (primarily for programming languages)
is simplistic, inefficient and requires recompilation to update.
.PP
There should be an ``else'' clause to follow a series of continuation lines.
.PP
The magic file and keywords should have regular expression support.
Their use of
.SM "ASCII TAB"
as a field delimiter is ugly and makes
it hard to edit the files, but is entrenched.
.PP
It might be advisable to allow upper-case letters in keywords
for, e.g.,
.BR troff (1)
commands vs man page macros.
Regular expression support would make this easy.
.PP
The program doesn't grok \s-2FORTRAN\s0.
It should be able to figure \s-2FORTRAN\s0 by seeing some keywords which
appear indented at the start of line.
Regular expression support would make this easy.
.PP
The list of keywords in
.I ascmagic
probably belongs in the Magic file.
This could be done by using some keyword like `*' for the offset value.
.PP
Another optimisation would be to sort
the magic file so that we can just run down all the
tests for the first byte, first word, first long, etc, once we
have fetched it. Complain about conflicts in the magic file entries.
Make a rule that the magic entries sort based on file offset rather
than position within the magic file?
.PP
The program should provide a way to give an estimate
of ``how good'' a guess is.
We end up removing guesses (e.g. ``From '' as first 5 chars of file) because
they are not as good as other guesses (e.g. ``Newsgroups:'' versus
``Return-Path:''). Still, if the others don't pan out, it should be
possible to use the first guess.
.PP
This program is slower than some vendors' file commands.
The new support for multiple character codes makes it even slower.
.PP
This manual page, and particularly this section, is too long.
.SH AVAILABILITY
You can obtain the original author's latest version by anonymous FTP
on
.B ftp.astron.com
in the directory
.I /pub/file/file-X.YY.tar.gz