xmlwf.1 (104349) xmlwf.1 (178848)
1.\" This manpage has been automatically generated by docbook2man
2.\" from a DocBook document. This tool can be found at:
3.\" <http://shell.ipoline.com/~elmert/comp/docbook2X/>
4.\" Please send any bug reports, improvements, comments, patches,
5.\" etc. to Steve Cheng <steve@ggi-project.org>.
1.\" This manpage has been automatically generated by docbook2man
2.\" from a DocBook document. This tool can be found at:
3.\" <http://shell.ipoline.com/~elmert/comp/docbook2X/>
4.\" Please send any bug reports, improvements, comments, patches,
5.\" etc. to Steve Cheng <steve@ggi-project.org>.
6.TH "XMLWF" "1" "22 April 2002" "" ""
6.TH "XMLWF" "1" "24 January 2003" "" ""
7.SH NAME
8xmlwf \- Determines if an XML document is well-formed
9.SH SYNOPSIS
10
11\fBxmlwf\fR [ \fB-s\fR] [ \fB-n\fR] [ \fB-p\fR] [ \fB-x\fR] [ \fB-e \fIencoding\fB\fR] [ \fB-w\fR] [ \fB-d \fIoutput-dir\fB\fR] [ \fB-c\fR] [ \fB-m\fR] [ \fB-r\fR] [ \fB-t\fR] [ \fB-v\fR] [ \fBfile ...\fR]
12
13.SH "DESCRIPTION"
14.PP
7.SH NAME
8xmlwf \- Determines if an XML document is well-formed
9.SH SYNOPSIS
10
11\fBxmlwf\fR [ \fB-s\fR] [ \fB-n\fR] [ \fB-p\fR] [ \fB-x\fR] [ \fB-e \fIencoding\fB\fR] [ \fB-w\fR] [ \fB-d \fIoutput-dir\fB\fR] [ \fB-c\fR] [ \fB-m\fR] [ \fB-r\fR] [ \fB-t\fR] [ \fB-v\fR] [ \fBfile ...\fR]
12
13.SH "DESCRIPTION"
14.PP
15\fBxmlwf\fR uses the Expat library to determine
16if an XML document is well-formed. It is non-validating.
15\fBxmlwf\fR uses the Expat library to
16determine if an XML document is well-formed. It is
17non-validating.
17.PP
18.PP
18If you do not specify any files on the command-line,
19and you have a recent version of xmlwf, the input
20file will be read from stdin.
19If you do not specify any files on the command-line, and you
20have a recent version of \fBxmlwf\fR, the
21input file will be read from standard input.
21.SH "WELL-FORMED DOCUMENTS"
22.PP
23A well-formed document must adhere to the
24following rules:
25.TP 0.2i
26\(bu
27The file begins with an XML declaration. For instance,
28<?xml version="1.0" standalone="yes"?>.
22.SH "WELL-FORMED DOCUMENTS"
23.PP
24A well-formed document must adhere to the
25following rules:
26.TP 0.2i
27\(bu
28The file begins with an XML declaration. For instance,
29<?xml version="1.0" standalone="yes"?>.
29\fBNOTE:\fR xmlwf does not currently
30\fBNOTE:\fR
31\fBxmlwf\fR does not currently
30check for a valid XML declaration.
31.TP 0.2i
32\(bu
33Every start tag is either empty (<tag/>)
34or has a corresponding end tag.
35.TP 0.2i
36\(bu
37There is exactly one root element. This element must contain

--- 5 unchanged lines hidden ---

43All elements nest properly.
44.TP 0.2i
45\(bu
46All attribute values are enclosed in quotes (either single
47or double).
48.PP
49If the document has a DTD, and it strictly complies with that
50DTD, then the document is also considered \fBvalid\fR.
32check for a valid XML declaration.
33.TP 0.2i
34\(bu
35Every start tag is either empty (<tag/>)
36or has a corresponding end tag.
37.TP 0.2i
38\(bu
39There is exactly one root element. This element must contain

--- 5 unchanged lines hidden ---

45All elements nest properly.
46.TP 0.2i
47\(bu
48All attribute values are enclosed in quotes (either single
49or double).
50.PP
51If the document has a DTD, and it strictly complies with that
52DTD, then the document is also considered \fBvalid\fR.
51xmlwf is a non-validating parser -- it does not check the DTD.
52However, it does support external entities (see the -x option).
53\fBxmlwf\fR is a non-validating parser --
54it does not check the DTD. However, it does support
55external entities (see the \fB-x\fR option).
53.SH "OPTIONS"
54.PP
55When an option includes an argument, you may specify the argument either
56.SH "OPTIONS"
57.PP
58When an option includes an argument, you may specify the argument either
56separate ("d output") or mashed ("-doutput"). xmlwf supports both.
59separately ("\fB-d\fR output") or concatenated with the
60option ("\fB-d\fRoutput"). \fBxmlwf\fR
61supports both.
57.TP
58\fB-c\fR
62.TP
63\fB-c\fR
59If the input file is well-formed and xmlwf doesn't
60encounter any errors, the input file is simply copied to
64If the input file is well-formed and \fBxmlwf\fR
65doesn't encounter any errors, the input file is simply copied to
61the output directory unchanged.
66the output directory unchanged.
62This implies no namespaces (turns off -n) and
63requires -d to specify an output file.
67This implies no namespaces (turns off \fB-n\fR) and
68requires \fB-d\fR to specify an output file.
64.TP
65\fB-d output-dir\fR
66Specifies a directory to contain transformed
67representations of the input files.
69.TP
70\fB-d output-dir\fR
71Specifies a directory to contain transformed
72representations of the input files.
68By default, -d outputs a canonical representation
73By default, \fB-d\fR outputs a canonical representation
69(described below).
74(described below).
70You can select different output formats using -c and -m.
75You can select different output formats using \fB-c\fR
76and \fB-m\fR.
71
72The output filenames will
73be exactly the same as the input filenames or "STDIN" if the input is
77
78The output filenames will
79be exactly the same as the input filenames or "STDIN" if the input is
74coming from STDIN. Therefore, you must be careful that the
80coming from standard input. Therefore, you must be careful that the
75output file does not go into the same directory as the input
81output file does not go into the same directory as the input
76file. Otherwise, xmlwf will delete the input file before
77it generates the output file (just like running
82file. Otherwise, \fBxmlwf\fR will delete the
83input file before it generates the output file (just like running
78cat < file > file in most shells).
79
80Two structurally equivalent XML documents have a byte-for-byte
81identical canonical XML representation.
82Note that ignorable white space is considered significant and
83is treated equivalently to data.
84More on canonical XML can be found at
85http://www.jclark.com/xml/canonxml.html .
86.TP
87\fB-e encoding\fR
88Specifies the character encoding for the document, overriding
84cat < file > file in most shells).
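
The warning above (output files take the same names as the input files, so the output directory must not be the input file's directory) can be checked before xmlwf is invoked. The following is a minimal Python sketch of such a guard, not anything shipped with xmlwf or Expat; the helper names output_dir_is_safe and build_command are hypothetical, and it assumes an xmlwf binary is available on PATH.

```python
import os


def output_dir_is_safe(input_path: str, output_dir: str) -> bool:
    """Return True if output_dir is a different directory from the one
    holding input_path, so that -d cannot overwrite the input file."""
    input_dir = os.path.dirname(os.path.abspath(input_path))
    return os.path.realpath(input_dir) != os.path.realpath(output_dir)


def build_command(input_path: str, output_dir: str) -> list:
    """Build an argument list for 'xmlwf -d', refusing an output
    directory that would clobber the input file."""
    if not output_dir_is_safe(input_path, output_dir):
        raise ValueError(
            "output directory must differ from the input file's directory"
        )
    # "-d" and its argument are passed separately; the manual page says
    # the concatenated form ("-doutput") is accepted as well.
    return ["xmlwf", "-d", output_dir, input_path]
```

The returned list is meant to be handed to something like subprocess.run; the comparison uses resolved paths so that a symlinked or relative output directory is still recognized as the same place.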
85
86Two structurally equivalent XML documents have a byte-for-byte
87identical canonical XML representation.
88Note that ignorable white space is considered significant and
89is treated equivalently to data.
90More on canonical XML can be found at
91http://www.jclark.com/xml/canonxml.html .
92.TP
93\fB-e encoding\fR
94Specifies the character encoding for the document, overriding
89any document encoding declaration. xmlwf
90has four built-in encodings:
95any document encoding declaration. \fBxmlwf\fR
96supports four built-in encodings:
91US-ASCII,
92UTF-8,
93UTF-16, and
94ISO-8859-1.
97US-ASCII,
98UTF-8,
99UTF-16, and
100ISO-8859-1.
95Also see the -w option.
101Also see the \fB-w\fR option.
96.TP
97\fB-m\fR
98Outputs some strange sort of XML file that completely
102.TP
103\fB-m\fR
104Outputs some strange sort of XML file that completely
99describes the the input file, including character postitions.
100Requires -d to specify an output file.
105describes the input file, including character positions.
106Requires \fB-d\fR to specify an output file.
101.TP
102\fB-n\fR
103Turns on namespace processing. (describe namespaces)
107.TP
108\fB-n\fR
109Turns on namespace processing. (describe namespaces)
104-c disables namespaces.
110\fB-c\fR disables namespaces.
105.TP
106\fB-p\fR
107Tells xmlwf to process external DTDs and parameter
108entities.
109
111.TP
112\fB-p\fR
113Tells xmlwf to process external DTDs and parameter
114entities.
115
110Normally xmlwf never parses parameter entities.
111-p tells it to always parse them.
112-p implies -x.
116Normally \fBxmlwf\fR never parses parameter
117entities. \fB-p\fR tells it to always parse them.
118\fB-p\fR implies \fB-x\fR.
113.TP
114\fB-r\fR
119.TP
120\fB-r\fR
115Normally xmlwf memory-maps the XML file before parsing.
116-r turns off memory-mapping and uses normal file IO calls instead.
121Normally \fBxmlwf\fR memory-maps the XML file
122before parsing; this can result in faster parsing on many
123platforms.
124\fB-r\fR turns off memory-mapping and uses normal file
125IO calls instead.
117Of course, memory-mapping is automatically turned off
126Of course, memory-mapping is automatically turned off
118when reading from STDIN.
127when reading from standard input.
128
129Use of memory-mapping can cause some platforms to report
130substantially higher memory usage for
131\fBxmlwf\fR, but this appears to be a matter of
132the operating system reporting memory in a strange way; there is
133not a leak in \fBxmlwf\fR.
119.TP
120\fB-s\fR
121Prints an error if the document is not standalone.
122A document is standalone if it has no external subset and no
123references to parameter entities.
124.TP
125\fB-t\fR
126Turns on timings. This tells Expat to parse the entire file,
127but not perform any processing.
128This gives a fairly accurate idea of the raw speed of Expat itself
129without client overhead.
134.TP
135\fB-s\fR
136Prints an error if the document is not standalone.
137A document is standalone if it has no external subset and no
138references to parameter entities.
139.TP
140\fB-t\fR
141Turns on timings. This tells Expat to parse the entire file,
142but not perform any processing.
143This gives a fairly accurate idea of the raw speed of Expat itself
144without client overhead.
130-t turns off most of the output options (-d, -m -c, ...).
145\fB-t\fR turns off most of the output options
146(\fB-d\fR, \fB-m\fR, \fB-c\fR,
147\&...).
131.TP
132\fB-v\fR
148.TP
149\fB-v\fR
133Prints the version of the Expat library being used, and then exits.
150Prints the version of the Expat library being used, including some
151information on the compile-time configuration of the library, and
152then exits.
134.TP
135\fB-w\fR
153.TP
154\fB-w\fR
136Enables Windows code pages.
137Normally, xmlwf will throw an error if it runs across
138an encoding that it is not equipped to handle itself. With
139-w, xmlwf will try to use a Windows code page. See
140also -e.
155Enables support for Windows code pages.
156Normally, \fBxmlwf\fR will throw an error if it
157runs across an encoding that it is not equipped to handle itself. With
158\fB-w\fR, xmlwf will try to use a Windows code
159page. See also \fB-e\fR.
141.TP
142\fB-x\fR
143Turns on parsing external entities.
144
145Non-validating parsers are not required to resolve external
146entities, or even expand entities at all.
147Expat always expands internal entities (?),
148but external entity parsing must be enabled explicitly.

--- 10 unchanged lines hidden ---

159And here are some examples of external entities:
160
161.nf
162<!ENTITY header SYSTEM "header-&vers;.xml"> (parsed)
163<!ENTITY logo SYSTEM "logo.png" PNG> (unparsed)
164.fi
165.TP
166\fB--\fR
160.TP
161\fB-x\fR
162Turns on parsing external entities.
163
164Non-validating parsers are not required to resolve external
165entities, or even expand entities at all.
166Expat always expands internal entities (?),
167but external entity parsing must be enabled explicitly.

--- 10 unchanged lines hidden ---

178And here are some examples of external entities:
179
180.nf
181<!ENTITY header SYSTEM "header-&vers;.xml"> (parsed)
182<!ENTITY logo SYSTEM "logo.png" PNG> (unparsed)
183.fi
184.TP
185\fB--\fR
167For some reason, xmlwf specifically ignores "--"
168anywhere it appears on the command line.
186(Two hyphens.)
187Terminates the list of options. This is only needed if a filename
188starts with a hyphen. For example:
189
190.nf
191xmlwf -- -myfile.xml
192.fi
193
194will run \fBxmlwf\fR on the file
195\fI-myfile.xml\fR.
169.PP
196.PP
170Older versions of xmlwf do not support reading from STDIN.
197Older versions of \fBxmlwf\fR do not support
198reading from standard input.
171.SH "OUTPUT"
172.PP
199.SH "OUTPUT"
200.PP
173If an input file is not well-formed, xmlwf outputs
174a single line describing the problem to STDOUT.
175If a file is well formed, xmlwf outputs nothing.
201If an input file is not well-formed,
202\fBxmlwf\fR prints a single line describing
203the problem to standard output. If a file is well formed,
204\fBxmlwf\fR outputs nothing.
176Note that the result code is \fBnot\fR set.
177.SH "BUGS"
178.PP
179According to the W3C standard, an XML file without a
180declaration at the beginning is not considered well-formed.
205Note that the result code is \fBnot\fR set.
206.SH "BUGS"
207.PP
208According to the W3C standard, an XML file without a
209declaration at the beginning is not considered well-formed.
181However, xmlwf allows this to pass.
210However, \fBxmlwf\fR allows this to pass.
182.PP
211.PP
183xmlwf returns a 0 - noerr result, even if the file is
184not well-formed. There is no good way for a program to use
185xmlwf to quickly check a file -- it must parse xmlwf's STDOUT.
212\fBxmlwf\fR returns a 0 - noerr result,
213even if the file is not well-formed. There is no good way for
214a program to use \fBxmlwf\fR to quickly
215check a file -- it must parse \fBxmlwf\fR's
216standard output.
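
Because the result code is described above as always 0, a calling program has to inspect what xmlwf prints. A minimal Python sketch of that workaround follows, assuming the behavior documented in the OUTPUT section (nothing printed for a well-formed file, one line on standard output otherwise). The helper names is_well_formed_output and check_file are hypothetical, and the sketch assumes an xmlwf binary on PATH.

```python
import os
import subprocess


def is_well_formed_output(stdout_text: str) -> bool:
    """Interpret xmlwf's standard output: nothing printed means the file
    was well-formed, any text means a problem was reported."""
    return stdout_text.strip() == ""


def check_file(path: str, xmlwf: str = "xmlwf") -> bool:
    """Run xmlwf on one file and report whether it looked well-formed."""
    # A missing file would also produce empty standard output, so rule
    # that case out before trusting the "no output" signal.
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    result = subprocess.run(
        [xmlwf, "--", path],  # "--" guards file names starting with "-"
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    # The exit status is deliberately ignored: per the text above it is
    # not set when the document is malformed.
    return is_well_formed_output(result.stdout)
```

A caller would use it as check_file("doc.xml"). If a later xmlwf release sets a meaningful exit status or moves errors to standard error, checking result.returncode would be the simpler test.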
186.PP
217.PP
187The errors should go to STDERR, not stdout.
218The errors should go to standard error, not standard output.
188.PP
219.PP
189There should be a way to get -d to send its output to STDOUT
190rather than forcing the user to send it to a file.
220There should be a way to get \fB-d\fR to send its
221output to standard output rather than forcing the user to send
222it to a file.
191.PP
223.PP
192I have no idea why anyone would want to use the -d, -c
193and -m options. If someone could explain it to me, I'd
194like to add this information to this manpage.
224I have no idea why anyone would want to use the
225\fB-d\fR, \fB-c\fR, and
226\fB-m\fR options. If someone could explain it to
227me, I'd like to add this information to this manpage.
195.SH "ALTERNATIVES"
196.PP
197Here are some XML validators on the web:
198
199.nf
200http://www.hcrc.ed.ac.uk/~richard/xml-check.html
201http://www.stg.brown.edu/service/xmlvalid/
202http://www.scripting.com/frontier5/xml/code/xmlValidator.html
203http://www.xml.com/pub/a/tools/ruwf/check.html
228.SH "ALTERNATIVES"
229.PP
230Here are some XML validators on the web:
231
232.nf
233http://www.hcrc.ed.ac.uk/~richard/xml-check.html
234http://www.stg.brown.edu/service/xmlvalid/
235http://www.scripting.com/frontier5/xml/code/xmlValidator.html
236http://www.xml.com/pub/a/tools/ruwf/check.html
237.fi
238.SH "SEE ALSO"
239.PP
240
241.nf
242The Expat home page: http://www.libexpat.org/
243The W3 XML specification: http://www.w3.org/TR/REC-xml
244.fi
245.SH "AUTHOR"
246.PP
247This manual page was written by Scott Bronson <bronson@rinspin.com> for
248the Debian GNU/Linux system (but may be used by others). Permission is
249granted to copy, distribute and/or modify this document under
250the terms of the GNU Free Documentation
251License, Version 1.1.