.TH FILE __CSECTION__ "Copyright but distributable"
.\" $Id: file.man,v 1.39 2001/04/27 22:48:33 christos Exp $
.SH NAME
file
\- determine file type
.SH SYNOPSIS
.B file
[
.B \-bciknsvzL
]
[
.B \-f
.I namefile
]
[
.B \-m
.I magicfiles
]
.I file
\&...
.br
.B file
.B \-C
[
.B \-m
magicfile ]
.SH DESCRIPTION
This manual page documents version __VERSION__ of the
.B file
command.
.PP
.B File
tests each argument in an attempt to classify it.
There are three sets of tests, performed in this order:
filesystem tests, magic number tests, and language tests.
The
.I first
test that succeeds causes the file type to be printed.
.PP
The type printed will usually contain one of the words
.B text
(the file contains only
printing characters and a few common control
characters and is probably safe to read on an
.SM ASCII
terminal),
.B executable
(the file contains the result of compiling a program
in a form understandable to some \s-1UNIX\s0 kernel or another),
or
.B data
meaning anything else (data is usually `binary' or non-printable).
Exceptions are well-known file formats (core files, tar archives)
that are known to contain binary data.
When modifying the file
.I __MAGIC__
or the program itself,
.B "preserve these keywords" .
People depend on knowing that all the readable files in a directory
have the word ``text'' printed.
Don't do as Berkeley did and change ``shell commands text''
to ``shell script''.
Note that the file
.I __MAGIC__
is built mechanically from a large number of small files in
the subdirectory
.I Magdir
in the source distribution of this program.
.PP
The filesystem tests are based on examining the return from a
.BR stat (2)
system call.
The program checks to see if the file is empty,
or if it's some sort of special file.
Any known file types appropriate to the system you are running on
(sockets, symbolic links, or named pipes (FIFOs) on those systems that
implement them)
are intuited if they are defined in
the system header file
.IR <sys/stat.h> .
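.PP
As an illustration only, and not the code that
.B file
itself uses, the filesystem tests could be sketched in Python roughly as follows.
The function name
.I classify_by_stat
and the labels it returns are hypothetical, and the sketch calls
.BR lstat (2)
rather than
.BR stat (2)
so that symbolic links are reported instead of followed.
.PP
.nf
import os
import stat

def classify_by_stat(path):
    """Return a label from lstat(2) data alone, or None for an ordinary file."""
    st = os.lstat(path)          # assumption: lstat, so links are not followed
    mode = st.st_mode
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISLNK(mode):
        return "symbolic link"
    if stat.S_ISFIFO(mode):
        return "fifo (named pipe)"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISBLK(mode):
        return "block special"
    if stat.S_ISCHR(mode):
        return "character special"
    if stat.S_ISREG(mode) and st.st_size == 0:
        return "empty"
    return None                  # non-empty ordinary file: try the content tests
.fi
.PP
A result of None in this sketch stands for the case in which the
magic number and language tests would be tried next.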
.PP
The magic number tests are used to check for files with data in
particular fixed formats.
The canonical example of this is a binary executable (compiled program)
.I a.out
file, whose format is defined in
.I a.out.h
and possibly
.I exec.h
in the standard include directory.
These files have a `magic number' stored in a particular place
near the beginning of the file that tells the \s-1UNIX\s0 operating system
that the file is a binary executable, and which of several types thereof.
The concept of `magic number' has been applied by extension to data files.
Any file with some invariant identifier at a small fixed
offset into the file can usually be described in this way.
The information identifying these files is read from the compiled
magic file
.I __MAGIC__.mgc ,
or
.I __MAGIC__
if the compiled file does not exist.
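.PP
The idea of an invariant identifier at a fixed offset can be sketched in
Python as below.
This is a simplified illustration, not the parser or matcher that
.B file
implements: the table layout, the name
.I match_magic
and the description strings are hypothetical.
The three signatures themselves (ELF, gzip, and the ``ustar'' marker at
offset 257 of a POSIX tar header) are well-known values.
.PP
.nf
# (offset, bytes expected at that offset, description); order matters,
# because the first entry that matches wins.
MAGIC_TABLE = [
    (0, bytes([0x7F, 0x45, 0x4C, 0x46]), "ELF"),
    (0, bytes([0x1F, 0x8B]), "gzip compressed data"),
    (257, b"ustar", "tar archive"),
]

def match_magic(data, table=MAGIC_TABLE):
    """Return the description of the first matching entry, or None."""
    for offset, pattern, description in table:
        if data[offset:offset + len(pattern)] == pattern:
            return description
    return None
.fi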
.PP
If a file does not match any of the entries in the magic file,
it is examined to see if it seems to be a text file.
ASCII, ISO-8859-x, non-ISO 8-bit extended-ASCII character sets
(such as those used on Macintosh and IBM PC systems),
UTF-8-encoded Unicode, UTF-16-encoded Unicode, and EBCDIC
character sets can be distinguished by the different
ranges and sequences of bytes that constitute printable text
in each set.
If a file passes any of these tests, its character set is reported.
ASCII, ISO-8859-x, UTF-8, and extended-ASCII files are identified
as ``text'' because they will be mostly readable on nearly any terminal;
UTF-16 and EBCDIC are only ``character data'' because, while
they contain text, it is text that will require translation
before it can be read.
In addition,
.B file
will attempt to determine other characteristics of text-type files.
If the lines of a file are terminated by CR, CRLF, or NEL, instead
of the Unix-standard LF, this will be reported.
Files that contain embedded escape sequences or overstriking
will also be identified.
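.PP
A much reduced Python sketch of two of these checks follows.
It is an assumption-laden illustration rather than the algorithm in
.BR file :
it only tries ASCII and UTF-8, it does not reject unusual control
characters, it ignores NEL, and the names
.I guess_charset
and
.I line_terminators
are hypothetical.
.PP
.nf
CR = bytes([13])
LF = bytes([10])

def guess_charset(data):
    """Return "ASCII", "UTF-8 Unicode", or None if neither decoding works."""
    for codec, label in (("ascii", "ASCII"), ("utf-8", "UTF-8 Unicode")):
        try:
            data.decode(codec)
            return label
        except UnicodeDecodeError:
            continue
    return None

def line_terminators(data):
    """List the terminator styles present, counting CRLF pairs only once."""
    crlf = data.count(CR + LF)
    bare_cr = data.count(CR) - crlf
    bare_lf = data.count(LF) - crlf
    found = []
    if crlf:
        found.append("CRLF")
    if bare_cr:
        found.append("CR")
    if bare_lf:
        found.append("LF")
    return found
.fi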
.PP
Once
.B file
has determined the character set used in a text-type file,
it will
attempt to determine in what language the file is written.
The language tests look for particular strings (cf
.IR names.h )
that can appear anywhere in the first few blocks of a file.
For example, the keyword
.B .br
indicates that the file is most likely a
.BR troff (1)
input file, just as the keyword
.B struct
indicates a C program.
These tests are less reliable than the previous
two groups, so they are performed last.
The language test routines also test for some miscellany
(such as
.BR tar (1)
archives).
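.PP
The keyword lookup can be pictured with the following Python sketch.
It is only a guess at the general shape of such a test: the table
.IR KEYWORDS ,
the function
.IR guess_language ,
the 4096-character limit standing in for ``the first few blocks'',
and the description strings are all hypothetical and are not taken from
.IR names.h .
.PP
.nf
# (keyword, description); only the two keywords named above are listed.
KEYWORDS = [
    (".br", "troff input text"),
    ("struct", "C program text"),
]

def guess_language(text, limit=4096):
    """Return a description if a known keyword appears as a whole token."""
    tokens = set(text[:limit].split())
    for keyword, description in KEYWORDS:
        if keyword in tokens:
            return description
    return None
.fi
.PP
Matching on single tokens like this is cheap but easily fooled, which is
consistent with these tests being performed last.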
.PP
Any file that cannot be identified as having been written
in any of the character sets listed above is simply said to be ``data''.
.SH OPTIONS
.TP 8
.B \-b
Do not prepend filenames to output lines (brief mode).
.TP 8
.B \-c
Cause a checking printout of the parsed form of the magic file.
This is usually used in conjunction with
.B \-m
to debug a new magic file before installing it.
.TP 8
.B \-C
Write a magic.mgc output file that contains a pre-parsed version of
the magic file.
.TP 8
.BI \-f " namefile"
Read the names of the files to be examined from
.I namefile
(one per line)
before the argument list.
Either
.I namefile
or at least one filename argument must be present;
to test the standard input, use ``\-'' as a filename argument.
.TP 8
.B \-i
Causes the file command to output MIME type strings rather than the more
traditional human-readable ones. Thus it may say
``text/plain; charset=us-ascii''
rather
than ``ASCII text''. In order for this option to work, file changes the way
it handles files recognised by the command itself (such as many of the
text file types, directories etc), and makes use of an alternative
``magic'' file.
(See the ``FILES'' section, below.)
.TP 8
.B \-k
Don't stop at the first match, keep going.
.TP 8
.BI \-m " list"
Specify an alternate list of files containing magic numbers.
This can be a single file, or a colon-separated list of files.
.TP 8
.B \-n
Force stdout to be flushed after checking each file. This is only useful if
checking a list of files. It is intended to be used by programs that want
filetype output from a pipe.
.TP 8
.B \-v
Print the version of the program and exit.
.TP 8
.B \-z
Try to look inside compressed files.
.TP 8
.B \-L
Follow symbolic links, as the like-named option in
.BR ls (1)
does (on systems that support symbolic links).
.TP 8
.B \-s
Normally,
.B file
only attempts to read and determine the type of argument files which
.BR stat (2)
reports are ordinary files.
This prevents problems, because reading special files may have peculiar
consequences.
Specifying the
.B \-s
option causes
.B file
to also read argument files which are block or character special files.
This is useful for determining the filesystem types of the data in raw
disk partitions, which are block special files.
This option also causes
.B file
to disregard the file size as reported by
.BR stat (2)
since on some systems it reports a zero size for raw disk partitions.
.SH FILES
.I __MAGIC__.mgc
\- default compiled list of magic numbers
.PP
.I __MAGIC__
\- default list of magic numbers
.PP
.I __MAGIC__.mime
\- default list of magic numbers, used to output MIME types when the \-i option
is specified.
.SH ENVIRONMENT
The environment variable
.B MAGIC
can be used to set the default magic number files.
.SH SEE ALSO
.BR magic (__FSECTION__)
\- description of magic file format.
.br
.BR strings (1), " od" (1), " hexdump" (1)
\- tools for examining non-text files.
.SH STANDARDS CONFORMANCE
This program is believed to exceed the System V Interface Definition
of FILE(CMD), as near as one can determine from the vague language
contained therein.
Its behaviour is mostly compatible with the System V program of the same name.
This version knows more magic, however, so it will produce
different (albeit more accurate) output in many cases.
.PP
The one significant difference
between this version and System V
is that this version treats any white space
as a delimiter, so that spaces in pattern strings must be escaped.
For example,
.br
>10 string language impress\ (imPRESS data)
.br
in an existing magic file would have to be changed to
.br
>10 string language\e impress (imPRESS data)
.br
In addition, in this version, if a pattern string contains a backslash,
it must be escaped. For example
.br
0 string \ebegindata Andrew Toolkit document
.br
in an existing magic file would have to be changed to
.br
0 string \e\ebegindata Andrew Toolkit document
.br
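.PP
The two escaping rules can be summarised by a short Python sketch for
converting a pattern string taken from an older magic file.
The helper name
.I escape_pattern
is hypothetical, the sketch handles only spaces and backslashes as
described above, and the backslash is built with chr(92) so that the
listing itself contains none.
.PP
.nf
BACKSLASH = chr(92)

def escape_pattern(pattern):
    """Escape backslashes first, then spaces, in one pattern string."""
    doubled = pattern.replace(BACKSLASH, BACKSLASH + BACKSLASH)
    return doubled.replace(" ", BACKSLASH + " ")
.fi
.PP
Backslashes are doubled before spaces are escaped; doing it in the other
order would double the backslashes that the space escaping had just added.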
.PP
SunOS releases 3.2 and later from Sun Microsystems include a
.BR file (1)
command derived from the System V one, but with some extensions.
My version differs from Sun's only in minor ways.
It includes the extension of the `&' operator, used as,
for example,
.br
>16 long&0x7fffffff >0 not stripped
.SH MAGIC DIRECTORY
The magic file entries have been collected from various sources,
mainly USENET, and contributed by various authors.
Christos Zoulas (address below) will collect additional
or corrected magic file entries.
A consolidation of magic file entries
will be distributed periodically.
.PP
The order of entries in the magic file is significant.
Depending on what system you are using, the order in which
they are put together may be incorrect.
If your old
.B file
command uses a magic file,
keep the old magic file around for comparison purposes
(rename it to
.IR __MAGIC__.orig ).
.SH EXAMPLES
.nf
$ file file.c file /dev/hda
file.c: C program text
file: ELF 32-bit LSB executable, Intel 80386, version 1,
 dynamically linked, not stripped
/dev/hda: block special

$ file -s /dev/hda{,1,2,3,4,5,6,7,8,9,10}
/dev/hda: x86 boot sector
/dev/hda1: Linux/i386 ext2 filesystem
/dev/hda2: x86 boot sector
/dev/hda3: x86 boot sector, extended partition table
/dev/hda4: Linux/i386 ext2 filesystem
/dev/hda5: Linux/i386 swap file
/dev/hda6: Linux/i386 swap file
/dev/hda7: Linux/i386 swap file
/dev/hda8: Linux/i386 swap file
/dev/hda9: empty
/dev/hda10: empty

$ file -i file.c file /dev/hda
file.c: text/x-c
file: application/x-executable, dynamically linked (uses shared libs), not stripped
/dev/hda: application/x-not-regular-file

.fi
.SH HISTORY
There has been a
.B file
command in every \s-1UNIX\s0 since at least Research Version 6
(man page dated January 16, 1975).
The System V version introduced one significant change:
the external list of magic number types.
This slowed the program down slightly but made it a lot more flexible.
.PP
This program, based on the System V version,
was written by Ian Darwin <ian@darwinsys.com>
without looking at anybody else's source code.
.PP
John Gilmore revised the code extensively, making it better than
the first version.
Geoff Collyer found several inadequacies
and provided some magic file entries.
Contribution of the `&' operator by Rob McMahon, cudcv@warwick.ac.uk, 1989.
.PP
Guy Harris, guy@netapp.com, made many changes from 1993 to the present.
.PP
Primary development and maintenance from 1990 to the present by
Christos Zoulas (christos@astron.com).
.PP
Altered by Chris Lowth, chris@lowth.com, 2000:
handle the ``\-i'' option to output MIME type strings, using an alternative
magic file and internal logic.
.PP
Altered by Eric Fischer (enf@pobox.com), July, 2000,
to identify character codes and attempt to identify the languages
of non-ASCII files.
.PP
The list of contributors to the "Magdir" directory (source for the
/etc/magic
file) is too long to include here. You know who you are; thank you.
.SH LEGAL NOTICE
Copyright (c) Ian F. Darwin, Toronto, Canada, 1986-1999.
Covered by the standard Berkeley Software Distribution copyright; see the file
LEGAL.NOTICE in the source distribution.
.PP
The files
.I tar.h
and
.I is_tar.c
were written by John Gilmore from his public-domain
.B tar
program, and are not covered by the above license.
.SH BUGS
There must be a better way to automate the construction of the Magic
file from all the glop in Magdir. What is it?
Better yet, the magic file should be compiled into binary (say,
.BR ndbm (3)
or, better yet, fixed-length
.SM ASCII
strings for use in heterogeneous network environments) for faster startup.
Then the program would run as fast as the Version 7 program of the same name,
with the flexibility of the System V version.
.PP
.B File
uses several algorithms that favor speed over accuracy,
thus it can be misled about the contents of
text
files.
.PP
The support for
text
files (primarily for programming languages)
is simplistic, inefficient and requires recompilation to update.
.PP
There should be an ``else'' clause to follow a series of continuation lines.
.PP
The magic file and keywords should have regular expression support.
Their use of
.SM "ASCII TAB"
as a field delimiter is ugly and makes
it hard to edit the files, but is entrenched.
.PP
It might be advisable to allow upper-case letters in keywords
for, e.g.,
.BR troff (1)
commands vs man page macros.
Regular expression support would make this easy.
.PP
The program doesn't grok \s-2FORTRAN\s0.
It should be able to figure \s-2FORTRAN\s0 by seeing some keywords which
appear indented at the start of line.
Regular expression support would make this easy.
.PP
The list of keywords in
.I ascmagic
probably belongs in the Magic file.
This could be done by using some keyword like `*' for the offset value.
.PP
Another optimisation would be to sort
the magic file so that we can just run down all the
tests for the first byte, first word, first long, etc, once we
have fetched it. Complain about conflicts in the magic file entries.
Make a rule that the magic entries sort based on file offset rather
than position within the magic file?
.PP
The program should provide a way to give an estimate
of ``how good'' a guess is.
We end up removing guesses (e.g. ``From '' as first 5 chars of file) because
they are not as good as other guesses (e.g. ``Newsgroups:'' versus
``Return-Path:''). Still, if the others don't pan out, it should be
possible to use the first guess.
.PP
This program is slower than some vendors' file commands.
The new support for multiple character codes makes it even slower.
.PP
This manual page, and particularly this section, is too long.
.SH AVAILABILITY
You can obtain the original author's latest version by anonymous FTP
on
.B ftp.astron.com
in the directory
.I /pub/file/file-X.YY.tar.gz