1.TH FILE __CSECTION__ "Copyright but distributable"
2.\" $Id: file.man,v 1.42 2002/07/03 18:26:37 christos Exp $
2.\" $Id: file.man,v 1.43 2003/02/08 18:33:53 christos Exp $
3.SH NAME
4file
5\- determine file type
6.SH SYNOPSIS
7.B file
8[
9.B \-bciknNsvzL
10]
11[
12.B \-f
13.I namefile
14]
15[
16.B \-F
17separator ]
18[
19.B \-m
20.I magicfiles
21]
22.I file
23\*[Am]...
24.br
25.B file
26.B -C
27[
28.B \-m
29magicfile ]
30.SH DESCRIPTION
31This manual page documents version __VERSION__ of the
32.B file
33command.
34.PP
35.B File
36tests each argument in an attempt to classify it.
37There are three sets of tests, performed in this order:
38filesystem tests, magic number tests, and language tests.
39The
40.I first
41test that succeeds causes the file type to be printed.
42.PP
43The type printed will usually contain one of the words
44.B text
45(the file contains only
46printing characters and a few common control
47characters and is probably safe to read on an
48.SM ASCII
49terminal),
50.B executable
51(the file contains the result of compiling a program
52in a form understandable to some \s-1UNIX\s0 kernel or another),
53or
54.B data
55meaning anything else (data is usually `binary' or non-printable).
56Exceptions are well-known file formats (core files, tar archives)
57that are known to contain binary data.
58When modifying the file
59.I __MAGIC__
60or the program itself,
61.B "preserve these keywords" .
62People depend on knowing that all the readable files in a directory
63have the word ``text'' printed.
64Don't do as Berkeley did and change ``shell commands text''
65to ``shell script''.
66Note that the file
67.I __MAGIC__
68is built mechanically from a large number of small files in
69the subdirectory
70.I Magdir
71in the source distribution of this program.
72.PP
73The filesystem tests are based on examining the return from a
74.BR stat (2)
75system call.
76The program checks to see if the file is empty,
77or if it's some sort of special file.
78Any known file types appropriate to the system you are running on
79(sockets, symbolic links, or named pipes (FIFOs) on those systems that
80implement them)
81are intuited if they are defined in
82the system header file
83.IR \*[Lt]sys/stat.h\*[Gt] .
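The filesystem tests described above can be illustrated with a minimal Python sketch. It is not taken from the program's source: the function name `filesystem_test` and the returned strings are assumptions made for illustration, and only the idea of classifying a file from its stat data comes from the text.

```python
import os
import stat


def filesystem_test(path):
    """Classify path from lstat data alone; return None if content tests are needed."""
    st = os.lstat(path)  # lstat, so a symbolic link is reported rather than followed
    mode = st.st_mode
    if stat.S_ISLNK(mode):
        return "symbolic link"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISFIFO(mode):
        return "fifo (named pipe)"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISBLK(mode):
        return "block special"
    if stat.S_ISCHR(mode):
        return "character special"
    if stat.S_ISREG(mode) and st.st_size == 0:
        return "empty"
    return None  # non-empty regular file: fall through to magic and language tests
```

Returning None for a non-empty regular file mirrors the ordering in the text: the later test groups run only when the filesystem tests do not settle the answer.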
84.PP
85The magic number tests are used to check for files with data in
86particular fixed formats.
87The canonical example of this is a binary executable (compiled program)
88.I a.out
89file, whose format is defined in
90.I a.out.h
91and possibly
92.I exec.h
93in the standard include directory.
94These files have a `magic number' stored in a particular place
95near the beginning of the file that tells the \s-1UNIX\s0 operating system
96that the file is a binary executable, and which of several types thereof.
97The concept of `magic number' has been applied by extension to data files.
98Any file with some invariant identifier at a small fixed
99offset into the file can usually be described in this way.
100The information identifying these files is read from the compiled
101magic file
102.I __MAGIC__.mgc ,
103or
104.I __MAGIC__
105if the compiled file does not exist.
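As a rough illustration of a magic number test, the Python sketch below checks for an invariant byte string at a fixed offset. The table is hand-written and hypothetical; it does not reproduce the magic file format or the program's matching code. The three signatures shown (ELF, gzip, and the POSIX tar `ustar` marker at offset 257) are well-known values chosen only as examples.

```python
# Hypothetical table for illustration: (offset, expected bytes, description).
MAGIC_TABLE = [
    (0, b"\x7fELF", "ELF executable or object"),
    (0, b"\x1f\x8b", "gzip compressed data"),
    (257, b"ustar", "POSIX tar archive"),
]


def magic_test(data):
    """Return the description of the first table entry that matches data, else None."""
    for offset, expected, description in MAGIC_TABLE:
        # Slicing past the end of data yields a short or empty result, so no match.
        if data[offset:offset + len(expected)] == expected:
            return description
    return None
```

Because the first matching entry wins, the order of the table matters in this sketch.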
106.PP
107If a file does not match any of the entries in the magic file,
108it is examined to see if it seems to be a text file.
109ASCII, ISO-8859-x, non-ISO 8-bit extended-ASCII character sets
110(such as those used on Macintosh and IBM PC systems),
111UTF-8-encoded Unicode, UTF-16-encoded Unicode, and EBCDIC
112character sets can be distinguished by the different
113ranges and sequences of bytes that constitute printable text
114in each set.
115If a file passes any of these tests, its character set is reported.
116ASCII, ISO-8859-x, UTF-8, and extended-ASCII files are identified
117as ``text'' because they will be mostly readable on nearly any terminal;
118UTF-16 and EBCDIC are only ``character data'' because, while
119they contain text, it is text that will require translation
120before it can be read.
121In addition,
122.B file
123will attempt to determine other characteristics of text-type files.
124If the lines of a file are terminated by CR, CRLF, or NEL, instead
125of the Unix-standard LF, this will be reported.
126Files that contain embedded escape sequences or overstriking
127will also be identified.
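The line-terminator check mentioned above might look something like the following Python sketch. It is an assumption-laden simplification: the function name `line_terminators` is hypothetical, it works on raw bytes, and it covers only CR, CRLF, and LF (NEL is left out because its byte value depends on the character set).

```python
def line_terminators(data):
    """Return the set of line terminator styles found in a bytes object."""
    found = set()
    i = 0
    while i < len(data):
        byte = data[i]
        if byte == 0x0D:  # CR
            if i + 1 < len(data) and data[i + 1] == 0x0A:
                found.add("CRLF")
                i += 1  # skip the LF that belongs to this CRLF pair
            else:
                found.add("CR")
        elif byte == 0x0A:  # bare LF
            found.add("LF")
        i += 1
    return found
```

A result other than a set containing only "LF" (or an empty set) is the kind of condition the text says would be reported.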
128.PP
129Once
130.B file
131has determined the character set used in a text-type file,
132it will
133attempt to determine in what language the file is written.
134The language tests look for particular strings (cf.
135.IR names.h )
136that can appear anywhere in the first few blocks of a file.
137For example, the keyword
138.B .br
139indicates that the file is most likely a
140.BR troff (1)
141input file, just as the keyword
142.B struct
143indicates a C program.
144These tests are less reliable than the previous
145two groups, so they are performed last.
146The language test routines also test for some miscellany
147(such as
148.BR tar (1)
149archives).
150.PP
151Any file that cannot be identified as having been written
152in any of the character sets listed above is simply said to be ``data''.
153.SH OPTIONS
154.TP 8
155.B \-b
156Do not prepend filenames to output lines (brief mode).
157.TP 8
158.B \-c
159Cause a checking printout of the parsed form of the magic file.
160This is usually used in conjunction with
161.B \-m
162to debug a new magic file before installing it.
163.TP 8
164.B \-C
165Write a magic.mgc output file that contains a pre-parsed version of
166the magic file.
167.TP 8
168.BI \-f " namefile"
169Read the names of the files to be examined from
170.I namefile
171(one per line)
172before the argument list.
173Either
174.I namefile
175or at least one filename argument must be present;
176to test the standard input, use ``\-'' as a filename argument.
177.TP 8
178.BI \-F " separator"
179Use the specified separator character instead of ``:''.
180.TP 8
181.B \-i
182Causes the file command to output mime type strings rather than the more
183traditional human readable ones. Thus it may say
184``text/plain; charset=us-ascii''
185rather
186than ``ASCII text''. In order for this option to work, file changes the way
187it handles files recognised by the command itself (such as many of the
188text file types, directories, etc.), and makes use of an alternative
189``magic'' file.
190(See the ``FILES'' section, below.)
191.TP 8
192.B \-k
193Don't stop at the first match, keep going.
194.TP 8
195.BI \-m " list"
196Specify an alternate list of files containing magic numbers.
197This can be a single file, or a colon-separated list of files.
198.TP 8
199.B \-n
200Force stdout to be flushed after checking each file. This is only useful if
201checking a list of files. It is intended to be used by programs that want
202filetype output from a pipe.
203.TP 8
204.B \-N
205Don't pad output to align filenames nicely.
206.TP 8
207.B \-v
208Print the version of the program and exit.
209.TP 8
210.B \-z
211Try to look inside compressed files.
212.TP 8
213.B \-L
214Cause symlinks to be followed, as with the like-named option in
215.BR ls (1)
216(on systems that support symbolic links).
217.TP 8
218.B \-s
219Normally,
220.B file
221only attempts to read and determine the type of argument files which
222.BR stat (2)
223reports are ordinary files.
224This prevents problems, because reading special files may have peculiar
225consequences.
226Specifying the
227.BR \-s
228option causes
229.B file
230to also read argument files which are block or character special files.
231This is useful for determining the filesystem types of the data in raw
232disk partitions, which are block special files.
233This option also causes
234.B file
235to disregard the file size as reported by
236.BR stat (2)
237since on some systems it reports a zero size for raw disk partitions.
238.SH FILES
239.I __MAGIC__.mgc
240\- default compiled list of magic numbers
241.PP
242.I __MAGIC__
243\- default list of magic numbers
244.PP
245.I __MAGIC__.mime
246\- default list of magic numbers, used to output mime types when the -i option
247is specified.
248
249.SH ENVIRONMENT
250The environment variable
251.B MAGIC
252can be used to set the default magic number files.
253.SH SEE ALSO
254.BR magic (__FSECTION__)
255\- description of magic file format.
256.br
257.BR strings (1), " od" (1), " hexdump" (1)
258\- tools for examining non-text files.
259.SH STANDARDS CONFORMANCE
260This program is believed to exceed the System V Interface Definition
261of FILE(CMD), as near as one can determine from the vague language
262contained therein.
263Its behaviour is mostly compatible with the System V program of the same name.
264This version knows more magic, however, so it will produce
265different (albeit more accurate) output in many cases.
266.PP
267The one significant difference
268between this version and System V
269is that this version treats any white space
270as a delimiter, so that spaces in pattern strings must be escaped.
271For example,
272.br
273\*[Gt]10 string language impress\ (imPRESS data)
274.br
275in an existing magic file would have to be changed to
276.br
277\*[Gt]10 string language\e impress (imPRESS data)
278.br
279In addition, in this version, if a pattern string contains a backslash,
280it must be escaped. For example
281.br
2820 string \ebegindata Andrew Toolkit document
283.br
284in an existing magic file would have to be changed to
285.br
2860 string \e\ebegindata Andrew Toolkit document
287.br
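The two escaping rules above (backslashes, then spaces) can be sketched in Python as follows. The helper name `escape_pattern` is hypothetical and is not part of the program; it handles only spaces and backslashes, not tabs or other white space.

```python
def escape_pattern(pattern):
    """Escape a pattern string for a magic file entry.

    Backslashes are doubled first, so that the backslashes added
    in front of spaces are not themselves doubled.
    """
    return pattern.replace("\\", "\\\\").replace(" ", "\\ ")
```

Applied to the examples above, the string `language impress` gains a backslash before the space, and `\begindata` gains a second leading backslash.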
288.PP
289SunOS releases 3.2 and later from Sun Microsystems include a
290.BR file (1)
291command derived from the System V one, but with some extensions.
292My version differs from Sun's only in minor ways.
293It includes the extension of the `\*[Am]' operator, used as,
294for example,
295.br
296\*[Gt]16 long\*[Am]0x7fffffff \*[Gt]0 not stripped
297.SH MAGIC DIRECTORY
298The magic file entries have been collected from various sources,
299mainly USENET, and contributed by various authors.
300Christos Zoulas (address below) will collect additional
301or corrected magic file entries.
302A consolidation of magic file entries
303will be distributed periodically.
304.PP
305The order of entries in the magic file is significant.
306Depending on what system you are using, the order that
307they are put together may be incorrect.
308If your old
309.B file
310command uses a magic file,
311keep the old magic file around for comparison purposes
312(rename it to
313.IR __MAGIC__.orig ).
314.SH EXAMPLES
315.nf
316$ file file.c file /dev/{wd0a,hda}
317file.c: C program text
318file: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV),
319 dynamically linked (uses shared libs), stripped
320/dev/wd0a: block special (0/0)
321/dev/hda: block special (3/0)
322$ file -s /dev/wd0{b,d}
323/dev/wd0b: data
324/dev/wd0d: x86 boot sector
325$ file -s /dev/hda{,1,2,3,4,5,6,7,8,9,10}
326/dev/hda: x86 boot sector
327/dev/hda1: Linux/i386 ext2 filesystem
328/dev/hda2: x86 boot sector
329/dev/hda3: x86 boot sector, extended partition table
330/dev/hda4: Linux/i386 ext2 filesystem
331/dev/hda5: Linux/i386 swap file
332/dev/hda6: Linux/i386 swap file
333/dev/hda7: Linux/i386 swap file
334/dev/hda8: Linux/i386 swap file
335/dev/hda9: empty
336/dev/hda10: empty
337
338$ file -i file.c file /dev/{wd0a,hda}
339file.c: text/x-c
340file: application/x-executable, dynamically linked (uses shared libs),
341not stripped
342/dev/hda: application/x-not-regular-file
343/dev/wd0a: application/x-not-regular-file
344
345.fi
346.SH HISTORY
347There has been a
348.B file
349command in every \s-1UNIX\s0 since at least Research Version 4
350(man page dated November, 1973).
351The System V version introduced one significant change:
352the external list of magic number types.
353This slowed the program down slightly but made it a lot more flexible.
354.PP
355This program, based on the System V version,
356was written by Ian Darwin \*[Lt]ian@darwinsys.com\*[Gt]
357without looking at anybody else's source code.
358.PP
359John Gilmore revised the code extensively, making it better than
360the first version.
361Geoff Collyer found several inadequacies
362and provided some magic file entries.
363Contribution of the `\*[Am]' operator by Rob McMahon, cudcv@warwick.ac.uk, 1989.
364.PP
365Guy Harris, guy@netapp.com, made many changes from 1993 to the present.
366.PP
367Primary development and maintenance from 1990 to the present by
368Christos Zoulas (christos@astron.com).
369.PP
370Altered by Chris Lowth, chris@lowth.com, 2000:
371Handle the ``-i'' option to output mime type strings, using an alternative
372magic file and internal logic.
373.PP
374Altered by Eric Fischer (enf@pobox.com), July, 2000,
375to identify character codes and attempt to identify the languages
376of non-ASCII files.
377.PP
378The list of contributors to the "Magdir" directory (source for the
379/etc/magic
380file) is too long to include here. You know who you are; thank you.
381.SH LEGAL NOTICE
382Copyright (c) Ian F. Darwin, Toronto, Canada, 1986-1999.
383Covered by the standard Berkeley Software Distribution copyright; see the file
384LEGAL.NOTICE in the source distribution.
385.PP
386The files
387.I tar.h
388and
389.I is_tar.c
390were written by John Gilmore from his public-domain
391.B tar
392program, and are not covered by the above license.
393.SH BUGS
394There must be a better way to automate the construction of the Magic
395file from all the glop in Magdir. What is it?
396Better yet, the magic file should be compiled into binary (say,
397.BR ndbm (3)
398or, better yet, fixed-length
399.SM ASCII
400strings for use in heterogeneous network environments) for faster startup.
401Then the program would run as fast as the Version 7 program of the same name,
402with the flexibility of the System V version.
403.PP
404.B File
405uses several algorithms that favor speed over accuracy;
406thus it can be misled about the contents of
407text
408files.
409.PP
410The support for
411text
412files (primarily for programming languages)
413is simplistic, inefficient and requires recompilation to update.
414.PP
415There should be an ``else'' clause to follow a series of continuation lines.
416.PP
417The magic file and keywords should have regular expression support.
418Their use of
419.SM "ASCII TAB"
420as a field delimiter is ugly and makes
421it hard to edit the files, but is entrenched.
422.PP
423It might be advisable to allow upper-case letters in keywords
424for, e.g.,
425.BR troff (1)
426commands vs man page macros.
427Regular expression support would make this easy.
428.PP
429The program doesn't grok \s-2FORTRAN\s0.
430It should be able to figure out \s-2FORTRAN\s0 by seeing some keywords which
431appear indented at the start of a line.
432Regular expression support would make this easy.
433.PP
434The list of keywords in
435.I ascmagic
436probably belongs in the Magic file.
437This could be done by using some keyword like `*' for the offset value.
438.PP
439Another optimisation would be to sort
440the magic file so that we can just run down all the
441tests for the first byte, first word, first long, etc, once we
442have fetched it. Complain about conflicts in the magic file entries.
443Make a rule that the magic entries sort based on file offset rather
444than position within the magic file?
445.PP
446The program should provide a way to give an estimate
447of ``how good'' a guess is.
448We end up removing guesses (e.g. ``From '' as first 5 chars of file) because
449they are not as good as other guesses (e.g. ``Newsgroups:'' versus
450``Return-Path:''). Still, if the others don't pan out, it should be
451possible to use the first guess.
452.PP
453This program is slower than some vendors' file commands.
454The new support for multiple character codes makes it even slower.
455.PP
456This manual page, and particularly this section, is too long.
457.SH AVAILABILITY
458You can obtain the original author's latest version by anonymous FTP
459on
460.B ftp.astron.com
461in the directory
462.I /pub/file/file-X.YY.tar.gz