1<?xml version="1.0" encoding="iso-8859-1"?>
2<!DOCTYPE spec PUBLIC "-//W3C//DTD Specification V2.1//EN"
3 "xmlspec-v21.dtd" [
4<!--ArborText, Inc., 1988-2000, v.4002-->
5<!ENTITY http-ident "http://www.w3.org/TR/2000/REC-xml">
6<!ENTITY draft.month "October">
7<!ENTITY draft.day "6">
8<!ENTITY iso6.doc.date "20001006">
9<!ENTITY draft.year "2000">
10<!ENTITY versionOfXML "1.0">
11<!ENTITY pio "'&lt;?xml'">
12<!ENTITY doc.date "10 February 1998">
13<!ENTITY w3c.doc.date "02-Feb-1998">
14<!ENTITY WebSGML "WebSGML Adaptations Annex to ISO 8879">
15<!ENTITY pic "'?>'">
16<!ENTITY br "\n">
17<!ENTITY cellback "#c0d9c0">
18<!ENTITY mdash "--">
19<!ENTITY com "--">
20<!ENTITY como "--">
21<!ENTITY comc "--">
22<!ENTITY hcro "&amp;#x">
23<!ENTITY nbsp "&#160;">
24<!ENTITY magicents "<code>amp</code>,
25<code>lt</code>,
26<code>gt</code>,
27<code>apos</code>,
28<code>quot</code>">
29<!ENTITY doc.audience "public review and discussion">
30<!ENTITY doc.distribution "may be distributed freely, as long as
31all text and legal notices remain intact">
32]>
33<spec w3c-doctype="rec">
34<!--
35Notes on preparation of the Second Edition:
36
37- Worked from http://www.w3.org/XML/xml-19980210-errata.
38- Changed DTD reference to point to V2.1 of XMLspec.
39- Moved version number from <title> to <version> element and
40  added "second edition" wording.  Mentioned edition information
41  in status.
42- Removed bgcolor="&cellback;" attributes from all <td>
43  elements because that attribute is not in the current table model.
44- Reversed status and abstract, so that abstract is first, according
45  to W3C guidelines.
46- Changed some <emph>s to <titleref>s in bibliography.
47- Changed some <code>s to <at> etc. throughout; where used <attval>,
48  removed existing <quote>s because the stylesheet produces them.
49- Removed some spurious spaces.
50- Added affiliation markup to the original member list.
51- Added commas between individual <thisver> elements, because
52  whitespace is now significant there.
53- Moved <eg>s, <scrap>s, and lists outside of <p>s for cleaner HTML
54  conversion.
55- Revised Status section to reflect new status.
56- Fixed all titleref hrefs so they get transformed properly; at
57  next revision, these all probably need to be changed to some
58  other markup.
59- Incorporated all errata (barring obsoleted and invalid ones);
60  added links to the errata document with <loc role="erratumref">
61  elements; used diff="{add|chg|del}" attribute.  This version 
62  expects that the official HTML output will have diff="del" 
63  elements suppressed.
64-->
65<header>
66<title>Extensible Markup Language (XML)</title>
67<version>1.0 (Second Edition)</version>
68<w3c-designation>REC-xml-&iso6.doc.date;</w3c-designation>
69<w3c-doctype>W3C Recommendation</w3c-doctype>
70<pubdate><day>&draft.day;</day><month>&draft.month;</month><year>&draft.year;</year>
71</pubdate>
72<publoc><loc href="&http-ident;-&iso6.doc.date;">&http-ident;-&iso6.doc.date;</loc>
73(<loc href="&http-ident;-&iso6.doc.date;.html">XHTML</loc>, <loc href="&http-ident;-&iso6.doc.date;.xml">XML</loc>, <loc
74href="&http-ident;-&iso6.doc.date;.pdf">PDF</loc>, <loc href="&http-ident;-&iso6.doc.date;-review.html">XHTML
75review version</loc> with color-coded revision indicators)</publoc>
76<latestloc><loc href="http://www.w3.org/TR/REC-xml">http://www.w3.org/TR/REC-xml</loc></latestloc>
77<prevlocs><loc href="http://www.w3.org/TR/2000/WD-xml-2e-20000814"> http://www.w3.org/TR/2000/WD-xml-2e-20000814</loc>
78<loc href="http://www.w3.org/TR/1998/REC-xml-19980210"> http://www.w3.org/TR/1998/REC-xml-19980210</loc><!--
79<loc href='http://www.w3.org/TR/PR-xml-971208'>
80http://www.w3.org/TR/PR-xml-971208</loc>
81<loc href='http://www.w3.org/TR/WD-xml-961114'>
82http://www.w3.org/TR/WD-xml-961114</loc>
83<loc href='http://www.w3.org/TR/WD-xml-lang-970331'>
84http://www.w3.org/TR/WD-xml-lang-970331</loc>
85<loc href='http://www.w3.org/TR/WD-xml-lang-970630'>
86http://www.w3.org/TR/WD-xml-lang-970630</loc>
87<loc href='http://www.w3.org/TR/WD-xml-970807'>
88http://www.w3.org/TR/WD-xml-970807</loc>
89<loc href='http://www.w3.org/TR/WD-xml-971117'>
90http://www.w3.org/TR/WD-xml-971117</loc>--> </prevlocs>
91<authlist>
92<author role="1e"><name>Tim Bray</name><affiliation>Textuality and Netscape</affiliation>
93<email href="mailto:tbray@textuality.com">tbray@textuality.com</email></author>
94<author role="1e"><name>Jean Paoli</name><affiliation>Microsoft</affiliation>
95<email href="mailto:jeanpa@microsoft.com">jeanpa@microsoft.com</email></author>
96<author role="1e" diff="chg"><name>C. M. Sperberg-McQueen</name><affiliation>University
97of Illinois at Chicago and Text Encoding Initiative</affiliation><email href="mailto:cmsmcq@uic.edu">cmsmcq@uic.edu</email>
98</author>
99<author role="2e" diff="add"><name>Eve Maler</name><affiliation>Sun Microsystems,
100Inc.</affiliation><email href="mailto:elm@east.sun.com">eve.maler@east.sun.com</email>
101</author>
102</authlist>
103<abstract>
104<p>The Extensible Markup Language (XML) is a subset of SGML that is completely
105described in this document. Its goal is to enable generic SGML to be served,
106received, and processed on the Web in the way that is now possible with HTML.
107XML has been designed for ease of implementation and for interoperability
108with both SGML and HTML.</p>
109</abstract>
110<status>
111<p>This document has been reviewed by W3C Members and other interested parties
112and has been endorsed by the Director as a W3C Recommendation. It is a stable
113document and may be used as reference material or cited as a normative reference
114from another document. W3C's role in making the Recommendation is to draw
115attention to the specification and to promote its widespread deployment. This
116enhances the functionality and interoperability of the Web.</p>
117<p>This document specifies a syntax created by subsetting an existing, widely
118used international text processing standard (Standard Generalized Markup Language,
119ISO 8879:1986(E) as amended and corrected) for use on the World Wide Web.
120It is a product of the W3C XML Activity, details of which can be found at <loc
121href="http://www.w3.org/XML/">http://www.w3.org/XML</loc>. <phrase diff="add"><loc
122role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E100">[E100]</loc>
123The English version of this specification is the only normative version. However,
124for translations of this document, see <loc href="http://www.w3.org/XML/#trans">http://www.w3.org/XML/#trans</loc>. </phrase>A
125list of current W3C Recommendations and other technical documents can be found
126at <loc href="http://www.w3.org/TR/">http://www.w3.org/TR</loc>.</p>
127<p diff="del"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E66">[E66]</loc>This
128specification uses the term URI, which is defined by <bibref ref="Berners-Lee"/>,
129a work in progress expected to update <bibref ref="RFC1738"/> and <bibref
130ref="RFC1808"/>.</p>
131<p diff="add">This second edition is <emph>not</emph> a new version of XML (first published 10 February 1998);
132it merely incorporates the changes dictated by the first-edition errata (available
133at <loc href="http://www.w3.org/XML/xml-19980210-errata">http://www.w3.org/XML/xml-19980210-errata</loc>)
134as a convenience to readers. The errata list for this second edition is available
135at <loc href="http://www.w3.org/XML/xml-V10-2e-errata">http://www.w3.org/XML/xml-V10-2e-errata</loc>.</p>
136<p>Please report errors in this document to <loc href="mailto:xml-editor@w3.org">xml-editor@w3.org</loc><phrase
137diff="add"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E101">[E101]</loc>; <loc
138href="http://lists.w3.org/Archives/Public/xml-editor">archives</loc> are available</phrase>.</p>
139<note diff="add">
140<p>C. M. Sperberg-McQueen's affiliation has changed since the publication
141of the first edition. He is now at the World Wide Web Consortium, and can
142be contacted at <loc href="mailto:cmsmcq@w3.org">cmsmcq@w3.org</loc>.</p>
143</note>
144</status>
145<pubstmt>
146<p>Chicago, Vancouver, Mountain View, et al.: World-Wide Web Consortium, XML
147Working Group, 1996, 1997, 2000.</p>
148</pubstmt>
149<sourcedesc>
150<p>Created in electronic form.</p>
151</sourcedesc>
152<langusage>
153<language id="EN">English</language>
154<language id="ebnf">Extended Backus-Naur Form (formal grammar)</language>
155</langusage>
156<revisiondesc>
157<slist>
158<sitem>1997-12-03 : CMSMcQ : yet further changes</sitem>
159<sitem>1997-12-02 : TB : further changes (see TB to XML WG, 2 December 1997)</sitem>
160<sitem>1997-12-02 : CMSMcQ : deal with as many corrections and comments from
161the proofreaders as possible: entify hard-coded document date in pubdate element,
162change expansion of entity WebSGML, update status description as per Dan Connolly
163(am not sure about refernece to Berners-Lee et al.), add 'The' to abstract
164as per WG decision, move Relationship to Existing Standards to back matter
165and combine with References, re-order back matter so normative appendices
166come first, re-tag back matter so informative appendices are tagged informdiv1,
167remove XXX XXX from list of 'normative' specs in prose, move some references
168from Other References to Normative References, add RFC 1738, 1808, and 2141
169to Other References (they are not normative since we do not require the processor
170to enforce any rules based on them), add reference to 'Fielding draft' (Berners-Lee
171et al.), move notation section to end of body, drop URIchar non-terminal and
172use SkipLit instead, lose stray reference to defunct nonterminal 'markupdecls',
173move reference to Aho et al. into appendix (Tim's right), add prose note saying
174that hash marks and fragment identifiers are NOT part of the URI formally
175speaking, and are NOT legal in system identifiers (processor 'may' signal
176an error). Work through: Tim Bray reacting to James Clark, Tim Bray on his
177own, Eve Maler, NOT DONE YET: change binary / text to unparsed / parsed. handle
178James's suggestion about &lt; in attriubte values uppercase hex characters,
179namechar list, </sitem>
180<sitem>1997-12-01 : JB : add some column-width parameters</sitem>
181<sitem>1997-12-01 : CMSMcQ : begin round of changes to incorporate recent
182WG decisions and other corrections: binding sources of character encoding
183info (27 Aug / 3 Sept), correct wording of Faust quotation (restore dropped
184line), drop SDD from EncodingDecl, change text at version number 1.0, drop
185misleading (wrong!) sentence about ignorables and extenders, modify definition
186of PCData to make bar on msc grammatical, change grammar's handling of internal
187subset (drop non-terminal markupdecls), change definition of includeSect to
188allow conditional sections, add integral-declaration constraint on internal
189subset, drop misleading / dangerous sentence about relationship of entities
190with system storage objects, change table body tag to htbody as per EM change
191to DTD, add rule about space normalization in public identifiers, add description
192of how to generate our name-space rules from Unicode character database (needs
193further work!). </sitem>
194<sitem>1997-10-08 : TB : Removed %-constructs again, new rules for PE appearance.</sitem>
195<sitem>1997-10-01 : TB : Case-sensitive markup; cleaned up element-type defs,
196lotsa little edits for style</sitem>
197<sitem>1997-09-25 : TB : Change to elm's new DTD, with substantial detail
198cleanup as a side-effect</sitem>
199<sitem>1997-07-24 : CMSMcQ : correct error (lost *) in definition of ignoreSectContents
200(thanks to Makoto Murata)</sitem>
201<sitem>Allow all empty elements to have end-tags, consistent with SGML TC
202(as per JJC).</sitem>
203<sitem>1997-07-23 : CMSMcQ : pre-emptive strike on pending corrections: introduce
204the term 'empty-element tag', note that all empty elements may use it, and
205elements declared EMPTY must use it. Add WFC requiring encoding decl to come
206first in an entity. Redefine notations to point to PIs as well as binary entities.
207Change autodetection table by removing bytes 3 and 4 from examples with Byte
208Order Mark. Add content model as a term and clarify that it applies to both
209mixed and element content. </sitem>
210<sitem>1997-06-30 : CMSMcQ : change date, some cosmetic changes, changes to
211productions for choice, seq, Mixed, NotationType, Enumeration. Follow James
212Clark's suggestion and prohibit conditional sections in internal subset. TO
213DO: simplify production for ignored sections as a result, since we don't need
214to worry about parsers which don't expand PErefs finding a conditional section.</sitem>
215<sitem>1997-06-29 : TB : various edits</sitem>
216<sitem>1997-06-29 : CMSMcQ : further changes: Suppress old FINAL EDIT comments
217and some dead material. Revise occurrences of % in grammar to exploit Henry
218Thompson's pun, especially markupdecl and attdef. Remove RMD requirement relating
219to element content (?). </sitem>
220<sitem>1997-06-28 : CMSMcQ : Various changes for 1 July draft: Add text for
221draconian error handling (introduce the term Fatal Error). RE deleta est (changing
222wording from original announcement to restrict the requirement to validating
223parsers). Tag definition of validating processor and link to it. Add colon
224as name character. Change def of %operator. Change standard definitions of
225lt, gt, amp. Strip leading zeros from #x00nn forms.</sitem>
226<sitem>1997-04-02 : CMSMcQ : final corrections of editorial errors found in
227last night's proofreading. Reverse course once more on well-formed: Webster's
228Second hyphenates it, and that's enough for me.</sitem>
229<sitem>1997-04-01 : CMSMcQ : corrections from JJC, EM, HT, and self</sitem>
230<sitem>1997-03-31 : Tim Bray : many changes</sitem>
231<sitem>1997-03-29 : CMSMcQ : some Henry Thompson (on entity handling), some
232Charles Goldfarb, some ERB decisions (PE handling in miscellaneous declarations.
233Changed Ident element to accept def attribute. Allow normalization of Unicode
234characters. move def of systemliteral into section on literals.</sitem>
235<sitem>1997-03-28 : CMSMcQ : make as many corrections as possible, from Terry
236Allen, Norbert Mikula, James Clark, Jon Bosak, Henry Thompson, Paul Grosso,
237and self. Among other things: give in on "well formed" (Terry is right), tentatively
238rename QuotedCData as AttValue and Literal as EntityValue to be more informative,
239since attribute values are the <emph>only</emph> place QuotedCData was used,
240and vice versa for entity text and Literal. (I'd call it Entity Text, but
2418879 uses that name for both internal and external entities.)</sitem>
242<sitem>1997-03-26 : CMSMcQ : resynch the two forks of this draft, reapply
243my changes dated 03-20 and 03-21. Normalize old 'may not' to 'must not' except
244in the one case where it meant 'may or may not'.</sitem>
245<sitem>1997-03-21 : TB : massive changes on plane flight from Chicago to Vancouver</sitem>
246<sitem>1997-03-21 : CMSMcQ : correct as many reported errors as possible. </sitem>
247<sitem>1997-03-20 : CMSMcQ : correct typos listed in CMSMcQ hand copy of spec.</sitem>
248<sitem>1997-03-20 : CMSMcQ : cosmetic changes preparatory to revision for
249WWW conference April 1997: restore some of the internal entity references
250(e.g. to docdate, etc.), change character xA0 to &amp;nbsp; and define nbsp
251as &amp;#160;, and refill a lot of paragraphs for legibility.</sitem>
252<sitem>1996-11-12 : CMSMcQ : revise using Tim's edits: Add list type of NUMBERED
253and change most lists either to BULLETS or to NUMBERED. Suppress QuotedNames,
254Names (not used). Correct trivial-grammar doc type decl. Rename 'marked section'
255as 'CDATA section' passim. Also edits from James Clark: Define the set of
256characters from which [^abc] subtracts. Charref should use just [0-9] not
257Digit. Location info needs cleaner treatment: remove? (ERB question). One
258example of a PI has wrong pic. Clarify discussion of encoding names. Encoding
259failure should lead to unspecified results; don't prescribe error recovery.
260Don't require exposure of entity boundaries. Ignore white space in element
261content. Reserve entity names of the form u-NNNN. Clarify relative URLs. And
262some of my own: Correct productions for content model: model cannot consist
263of a name, so "elements ::= cp" is no good. </sitem>
264<sitem>1996-11-11 : CMSMcQ : revise for style. Add new rhs to entity declaration,
265for parameter entities.</sitem>
266<sitem>1996-11-10 : CMSMcQ : revise for style. Fix / complete section on names,
267characters. Add sections on parameter entities, conditional sections. Still
268to do: Add compatibility note on deterministic content models. Finish stylistic
269revision.</sitem>
270<sitem>1996-10-31 : TB : Add Entity Handling section</sitem>
271<sitem>1996-10-30 : TB : Clean up term &amp; termdef. Slip in ERB decision
272re EMPTY.</sitem>
273<sitem>1996-10-28 : TB : Change DTD. Implement some of Michael's suggestions.
274Change comments back to //. Introduce language for XML namespace reservation.
275Add section on white-space handling. Lots more cleanup.</sitem>
276<sitem>1996-10-24 : CMSMcQ : quick tweaks, implement some ERB decisions. Characters
277are not integers. Comments are /* */ not //. Add bibliographic refs to 10646,
278HyTime, Unicode. Rename old Cdata as MsData since it's <emph>only</emph> seen
279in marked sections. Call them attribute-value pairs not name-value pairs,
280except once. Internal subset is optional, needs '?'. Implied attributes should
281be signaled to the app, not have values supplied by processor.</sitem>
282<sitem>1996-10-16 : TB : track down &amp; excise all DSD references; introduce
283some EBNF for entity declarations.</sitem>
284<sitem>1996-10-?? : TB : consistency check, fix up scraps so they all parse,
285get formatter working, correct a few productions.</sitem>
286<sitem>1996-10-10/11 : CMSMcQ : various maintenance, stylistic, and organizational
287changes: Replace a few literals with xmlpio and pic entities, to make them
288consistent and ensure we can change pic reliably when the ERB votes. Drop
289paragraph on recognizers from notation section. Add match, exact match to
290terminology. Move old 2.2 XML Processors and Apps into intro. Mention comments,
291PIs, and marked sections in discussion of delimiter escaping. Streamline discussion
292of doctype decl syntax. Drop old section of 'PI syntax' for doctype decl,
293and add section on partial-DTD summary PIs to end of Logical Structures section.
294Revise DSD syntax section to use Tim's subset-in-a-PI mechanism.</sitem>
295<sitem>1996-10-10 : TB : eliminate name recognizers (and more?)</sitem>
296<sitem>1996-10-09 : CMSMcQ : revise for style, consistency through 2.3 (Characters)</sitem>
297<sitem>1996-10-09 : CMSMcQ : re-unite everything for convenience, at least
298temporarily, and revise quickly</sitem>
299<sitem>1996-10-08 : TB : first major homogenization pass</sitem>
300<sitem>1996-10-08 : TB : turn "current" attribute on div type into CDATA</sitem>
301<sitem>1996-10-02 : TB : remould into skeleton + entities</sitem>
302<sitem>1996-09-30 : CMSMcQ : add a few more sections prior to exchange with
303Tim.</sitem>
304<sitem>1996-09-20 : CMSMcQ : finish transcribing notes.</sitem>
305<sitem>1996-09-19 : CMSMcQ : begin transcribing notes for draft.</sitem>
306<sitem>1996-09-13 : CMSMcQ : made outline from notes of 09-06, do some housekeeping</sitem>
307</slist>
308</revisiondesc>
309</header>
310<body>
311<div1 id="sec-intro">
312<head>Introduction</head>
313<p>Extensible Markup Language, abbreviated XML, describes a class of data
314objects called <termref def="dt-xml-doc">XML documents</termref> and partially
315describes the behavior of computer programs which process them. XML is an
316application profile or restricted form of SGML, the Standard Generalized Markup
317Language <bibref ref="ISO8879"/>. By construction, XML documents are conforming
318SGML documents.</p>
319<p>XML documents are made up of storage units called <termref def="dt-entity">entities</termref>,
320which contain either parsed or unparsed data. Parsed data is made up of <termref
321def="dt-character">characters</termref>, some of which form <termref def="dt-chardata">character
322data</termref>, and some of which form <termref def="dt-markup">markup</termref>.
323Markup encodes a description of the document's storage layout and logical
324structure. XML provides a mechanism to impose constraints on the storage layout
325and logical structure.</p>
326<p><termdef id="dt-xml-proc" term="XML Processor">A software module called
327an <term>XML processor</term> is used to read XML documents and provide access
328to their content and structure.</termdef> <termdef id="dt-app" term="Application">It
329is assumed that an XML processor is doing its work on behalf of another module,
330called the <term>application</term>.</termdef> This specification describes
331the required behavior of an XML processor in terms of how it must read XML
332data and the information it must provide to the application.</p>
333<div2 id="sec-origin-goals">
334<head>Origin and Goals</head>
335<p>XML was developed by an XML Working Group (originally known as the SGML
336Editorial Review Board) formed under the auspices of the World Wide Web Consortium
337(W3C) in 1996. It was chaired by Jon Bosak of Sun Microsystems with the active
338participation of an XML Special Interest Group (previously known as the SGML
339Working Group) also organized by the W3C. The membership of the XML Working
340Group is given in an appendix. Dan Connolly served as the WG's contact with
341the W3C.</p>
342<p>The design goals for XML are:</p>
343<olist>
344<item><p>XML shall be straightforwardly usable over the Internet.</p></item>
345<item><p>XML shall support a wide variety of applications.</p></item>
346<item><p>XML shall be compatible with SGML.</p></item>
347<item><p>It shall be easy to write programs which process XML documents.</p>
348</item>
349<item><p>The number of optional features in XML is to be kept to the absolute
350minimum, ideally zero.</p></item>
351<item><p>XML documents should be human-legible and reasonably clear.</p></item>
352<item><p>The XML design should be prepared quickly.</p></item>
353<item><p>The design of XML shall be formal and concise.</p></item>
354<item><p>XML documents shall be easy to create.</p></item>
355<item><p>Terseness in XML markup is of minimal importance.</p></item>
356</olist>
357<p>This specification, together with associated standards (Unicode and ISO/IEC
35810646 for characters, Internet RFC 1766 for language identification tags,
359ISO 639 for language name codes, and ISO 3166 for country name codes), provides
360all the information necessary to understand XML Version &versionOfXML; and
361construct computer programs to process it.</p>
362<p>This version of the XML specification <!-- is for &doc.audience;.--> &doc.distribution;.</p>
363</div2>
364<div2 id="sec-terminology">
365<head>Terminology</head>
366<p>The terminology used to describe XML documents is defined in the body of
367this specification. The terms defined in the following list are used in building
368those definitions and in describing the actions of an XML processor: <glist>
369<gitem><label>may</label>
370<def>
371<p><termdef id="dt-may" term="May">Conforming documents and XML processors
372are permitted to but need not behave as described.</termdef></p>
373</def></gitem>
374<gitem><label>must</label>
375<def>
376<p><termdef id="dt-must" term="Must">Conforming documents and XML processors
377are required to behave as described; otherwise they are in error. <!-- do NOT change this! this is what defines a violation of
378a 'must' clause as 'an error'. -MSM --></termdef></p>
379</def></gitem>
380<gitem><label>error</label>
381<def>
382<p><termdef id="dt-error" term="Error">A violation of the rules of this specification;
383results are undefined. Conforming software may detect and report an error
384and may recover from it.</termdef></p>
385</def></gitem>
386<gitem><label>fatal error</label>
387<def>
388<p><termdef id="dt-fatal" term="Fatal Error">An error which a conforming <termref
389def="dt-xml-proc">XML processor</termref> must detect and report to the application.
390After encountering a fatal error, the processor may continue processing the
391data to search for further errors and may report such errors to the application.
392In order to support correction of errors, the processor may make unprocessed
393data from the document (with intermingled character data and markup) available
394to the application. Once a fatal error is detected, however, the processor
395must not continue normal processing (i.e., it must not continue to pass character
396data and information about the document's logical structure to the application
397in the normal way).</termdef></p>
398</def></gitem>
399<gitem><label>at user option</label>
400<def>
401<p><termdef id="dt-atuseroption" term="At user option">Conforming software
402may or must (depending on the modal verb in the sentence) behave as described;
403if it does, it must provide users a means to enable or disable the behavior
404described.</termdef></p>
405</def></gitem>
406<gitem><label>validity constraint</label>
407<def>
408<p><termdef id="dt-vc" term="Validity constraint">A rule which applies to
409all <termref def="dt-valid">valid</termref> XML documents. Violations of validity
410constraints are errors; they must, at user option, be reported by <termref
411def="dt-validating">validating XML processors</termref>.</termdef></p>
412</def></gitem>
413<gitem><label>well-formedness constraint</label>
414<def>
415<p><termdef id="dt-wfc" term="Well-formedness constraint">A rule which applies
416to all <termref def="dt-wellformed">well-formed</termref> XML documents. Violations
417of well-formedness constraints are <termref def="dt-fatal">fatal errors</termref>.</termdef></p>
418</def></gitem>
419<gitem><label>match</label>
420<def>
421<p><termdef id="dt-match" term="match">(Of strings or names:) Two strings
422or names being compared must be identical. Characters with multiple possible
423representations in ISO/IEC 10646 (e.g. characters with both precomposed and
424base+diacritic forms) match only if they have the same representation in both
425strings. <phrase diff="del"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E85">[E85]</loc>At
426user option, processors may normalize such characters to some canonical form. </phrase>No
427case folding is performed. (Of strings and rules in the grammar:) A string
428matches a grammatical production if it belongs to the language generated by
429that production. (Of content and content models:) An element matches its declaration
430when it conforms in the fashion described in the constraint <specref ref="elementvalid"/>.</termdef></p>
431</def></gitem>
432<gitem><label>for compatibility</label>
433<def>
434<p><termdef id="dt-compat" term="For Compatibility"><phrase diff="add"><loc
435role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E87">[E87]</loc>Marks
436a sentence describing</phrase> a feature of XML included solely to ensure
437that XML remains compatible with SGML.</termdef></p>
438</def></gitem>
439<gitem><label>for interoperability</label>
440<def>
441<p><termdef id="dt-interop" term="For interoperability"><phrase diff="add"><loc
442role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E87">[E87]</loc>Marks
443a sentence describing</phrase> a non-binding recommendation included to increase
444the chances that XML documents can be processed by the existing installed
445base of SGML processors which predate the &WebSGML;.</termdef></p>
446</def></gitem>
447</glist></p>
448</div2>
449</div1>
450<!-- &Docs; -->
451<div1 id="sec-documents">
452<head>Documents</head>
453<p><termdef id="dt-xml-doc" term="XML Document"> A data object is an <term>XML
454document</term> if it is <termref def="dt-wellformed">well-formed</termref>,
455as defined in this specification. A well-formed XML document may in addition
456be <termref def="dt-valid">valid</termref> if it meets certain further constraints.</termdef></p>
457<p>Each XML document has both a logical and a physical structure. Physically,
458the document is composed of units called <termref def="dt-entity">entities</termref>.
459An entity may <termref def="dt-entref">refer</termref> to other entities to
460cause their inclusion in the document. A document begins in a <quote>root</quote>
461or <termref def="dt-docent">document entity</termref>. Logically, the document
462is composed of declarations, elements, comments, character references, and
463processing instructions, all of which are indicated in the document by explicit
464markup. The logical and physical structures must nest properly, as described
465in <specref ref="wf-entities"/>.</p>
466<div2 id="sec-well-formed">
467<head>Well-Formed XML Documents</head>
468<p><termdef id="dt-wellformed" term="Well-Formed"> A textual object is a <term>well-formed</term>
469XML document if:</termdef></p>
470<olist>
471<item><p>Taken as a whole, it matches the production labeled <nt def="NT-document">document</nt>.</p>
472</item>
473<item><p>It meets all the well-formedness constraints given in this specification.</p>
474</item>
475<item><p>Each of the <termref def="dt-parsedent">parsed entities</termref>
476which is referenced directly or indirectly within the document is <termref
477def="dt-wellformed">well-formed</termref>.</p></item>
478</olist>
479<scrap id="document" lang="ebnf">
480<head>Document</head>
481<prod id="NT-document">
482<lhs>document</lhs><rhs><nt def="NT-prolog">prolog</nt> <nt def="NT-element">element</nt> <nt
483def="NT-Misc">Misc</nt>*</rhs>
484</prod>
485</scrap>
486<p>Matching the <nt def="NT-document">document</nt> production implies that:</p>
487<olist>
488<item><p>It contains one or more <termref def="dt-element">elements</termref>.</p>
489</item>
490<!--* N.B. some readers (notably JC) find the following
491paragraph awkward and redundant. I agree it's logically redundant:
492it *says* it is summarizing the logical implications of
493matching the grammar, and that means by definition it's
494logically redundant. I don't think it's rhetorically
495redundant or unnecessary, though, so I'm keeping it. It
496could however use some recasting when the editors are feeling
497stronger. -MSM *-->
498<item><p><termdef id="dt-root" term="Root Element">There is exactly one element,
499called the <term>root</term>, or document element, no part of which appears
500in the <termref def="dt-content">content</termref> of any other element.</termdef> <phrase
501diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E17">[E17]</loc>For
502all other elements, if the <termref def="dt-stag">start-tag</termref> is in
503the content of another element, the <termref def="dt-etag">end-tag</termref>
504is in the content of the same element.</phrase> More simply stated, the elements,
505delimited by start- and end-tags, nest properly within each other.</p></item>
506</olist>
507<p><termdef id="dt-parentchild" term="Parent/Child">As a consequence of this,
508for each non-root element <el>C</el> in the document, there is one other element <el>P</el>
509in the document such that <el>C</el> is in the content of <el>P</el>, but
510is not in the content of any other element that is in the content of <el>P</el>. <el>P</el>
511is referred to as the <term>parent</term> of <el>C</el>, and <el>C</el> as
512a <term>child</term> of <el>P</el>.</termdef></p>
513</div2>
514<div2 id="charsets">
515<head>Characters</head>
516<p><termdef id="dt-text" term="Text">A parsed entity contains <term>text</term>,
517a sequence of <termref def="dt-character">characters</termref>, which may
518represent markup or character data.</termdef> <termdef id="dt-character" term="Character">A <term>character</term>
519is an atomic unit of text as specified by ISO/IEC 10646 <bibref ref="ISO10646"/> <phrase
520diff="add"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E67">[E67]</loc>(see
521also <bibref ref="ISO10646-2000"/>)</phrase>. Legal characters are tab, carriage
522return, line feed, and the legal <phrase diff="del"><loc role="erratumref"
523href="http://www.w3.org/XML/xml-19980210-errata#E35">[E35]</loc>graphic </phrase>characters
524of Unicode and ISO/IEC 10646. <phrase diff="add"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E69">[E69]</loc>The
525versions of these standards cited in <specref ref="sec-existing-stds"/> were
526current at the time this document was prepared. New characters may be added
527to these standards by amendments or new editions. Consequently, XML processors
528must accept any character in the range specified for <nt def="NT-Char">Char</nt>.</phrase>
529The use of <quote>compatibility characters</quote>, as defined in section
5306.8 of <bibref ref="Unicode"/> <phrase diff="add"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E67">[E67]</loc>(see
531also D21 in section 3.6 of <bibref ref="Unicode3"/>)</phrase>, is discouraged.</termdef></p>
532<scrap id="char32" lang="ebnf">
533<head>Character Range</head>
534<prodgroup pcw2="4" pcw4="17.5" pcw5="11">
535<prod id="NT-Char">
536<lhs>Char</lhs><rhs>#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]</rhs>
537<com>any Unicode character, excluding the surrogate blocks, FFFE, and FFFF.</com>
538</prod>
539</prodgroup></scrap>
540<p>The mechanism for encoding character code points into bit patterns may
541vary from entity to entity. All XML processors must accept the UTF-8 and UTF-16
542encodings of 10646; the mechanisms for signaling which of the two is in use,
543or for bringing other encodings into play, are discussed later, in <specref
544ref="charencoding"/>.</p>
545<!--
546<p>Regardless of the specific encoding used, any character in
547the ISO/IEC 10646 character set may be referred to by the decimal
548or hexadecimal equivalent of its UCS-4 code value.
549</p>-->
550</div2>
551<div2 id="sec-common-syn">
552<head>Common Syntactic Constructs</head>
553<p>This section defines some symbols used widely in the grammar.</p>
554<p><nt def="NT-S">S</nt> (white space) consists of one or more space (#x20)
555characters, carriage returns, line feeds, or tabs.</p>
556<scrap id="white" lang="ebnf">
557<head>White Space</head>
558<prodgroup pcw2="4" pcw4="17.5" pcw5="11">
559<prod id="NT-S">
560<lhs>S</lhs><rhs>(#x20 | #x9 | #xD | #xA)+</rhs>
561</prod>
562</prodgroup></scrap>
563<p>Characters are classified for convenience as letters, digits, or other
564characters. <phrase diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E30">[E30]</loc>A
565letter consists of an alphabetic or syllabic base character or an ideographic
566character.</phrase> Full definitions of the specific characters in each class
567are given in <specref ref="CharClasses"/>.</p>
568<p><termdef id="dt-name" term="Name">A <term>Name</term> is a token beginning
569with a letter or one of a few punctuation characters, and continuing with
570letters, digits, hyphens, underscores, colons, or full stops, together known
571as name characters.</termdef> Names beginning with the string <quote><code>xml</code></quote>,
572or any string which would match <code>(('X'|'x') ('M'|'m') ('L'|'l'))</code>,
573are reserved for standardization in this or future versions of this specification.</p>
574<note>
575<p diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E98">[E98]</loc>The
576Namespaces in XML Recommendation <bibref ref="xml-names"/> assigns a meaning
577to names containing colon characters. Therefore, authors should not use the
578colon in XML names except for namespace purposes, but XML processors must
579accept the colon as a name character.</p>
580</note>
581<p>An <nt def="NT-Nmtoken">Nmtoken</nt> (name token) is any mixture of name
582characters.</p>
583<scrap lang="ebnf">
584<head>Names and Tokens</head>
585<prod id="NT-NameChar">
586<lhs>NameChar</lhs><rhs><nt def="NT-Letter">Letter</nt> | <nt def="NT-Digit">Digit</nt>
587| '.' | '-' | '_' | ':' | <nt def="NT-CombiningChar">CombiningChar</nt> | <nt
588def="NT-Extender">Extender</nt></rhs>
589</prod>
590<prod id="NT-Name">
591<lhs>Name</lhs><rhs>(<nt def="NT-Letter">Letter</nt> | '_' | ':') (<nt def="NT-NameChar">NameChar</nt>)*</rhs>
592</prod>
593<prod id="NT-Names">
594<lhs>Names</lhs><rhs><nt def="NT-Name">Name</nt> (<nt def="NT-S">S</nt> <nt
595def="NT-Name">Name</nt>)*</rhs>
596</prod>
597<prod id="NT-Nmtoken">
598<lhs>Nmtoken</lhs><rhs>(<nt def="NT-NameChar">NameChar</nt>)+</rhs>
599</prod>
600<prod id="NT-Nmtokens">
601<lhs>Nmtokens</lhs><rhs><nt def="NT-Nmtoken">Nmtoken</nt> (<nt def="NT-S">S</nt> <nt
602def="NT-Nmtoken">Nmtoken</nt>)*</rhs>
603</prod>
604</scrap>
605<p>Literal data is any quoted string not containing the quotation mark used
606as a delimiter for that string. Literals are used for specifying the content
607of internal entities (<nt def="NT-EntityValue">EntityValue</nt>), the values
608of attributes (<nt def="NT-AttValue">AttValue</nt>), and external identifiers
609(<nt def="NT-SystemLiteral">SystemLiteral</nt>). Note that a <nt def="NT-SystemLiteral">SystemLiteral</nt>
610can be parsed without scanning for markup.</p>
611<scrap lang="ebnf">
612<head>Literals</head>
613<prod id="NT-EntityValue">
614<lhs>EntityValue</lhs><rhs>'"' ([^%&amp;"] | <nt def="NT-PEReference">PEReference</nt>
615| <nt def="NT-Reference">Reference</nt>)* '"' </rhs>
616<rhs>|&nbsp; "'" ([^%&amp;'] | <nt def="NT-PEReference">PEReference</nt> | <nt
617def="NT-Reference">Reference</nt>)* "'"</rhs>
618</prod>
619<prod id="NT-AttValue">
620<lhs>AttValue</lhs><rhs>'"' ([^&lt;&amp;"] | <nt def="NT-Reference">Reference</nt>)*
621'"' </rhs>
622<rhs>|&nbsp; "'" ([^&lt;&amp;'] | <nt def="NT-Reference">Reference</nt>)*
623"'"</rhs>
624</prod>
625<prod id="NT-SystemLiteral">
626<lhs>SystemLiteral</lhs><rhs>('"' [^"]* '"') |&nbsp;("'" [^']* "'") </rhs>
627</prod>
628<prod id="NT-PubidLiteral">
629<lhs>PubidLiteral</lhs><rhs>'"' <nt def="NT-PubidChar">PubidChar</nt>* '"'
630| "'" (<nt def="NT-PubidChar">PubidChar</nt> - "'")* "'"</rhs>
631</prod>
632<prod id="NT-PubidChar">
633<lhs>PubidChar</lhs><rhs>#x20 | #xD | #xA |&nbsp;[a-zA-Z0-9] |&nbsp;[-'()+,./:=?;!*#@$_%]</rhs>
634</prod>
635</scrap>
636<note diff="add">
637<p><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E72">[E72]</loc>Although
638the <nt def="NT-EntityValue">EntityValue</nt> production allows the definition
639of an entity consisting of a single explicit <code>&lt;</code> in the literal
640(e.g., <code>&lt;!ENTITY mylt "&lt;"></code>), it is strongly advised to avoid
641this practice since any reference to that entity will cause a well-formedness
642error.</p>
643</note>
644</div2>
645<div2 id="syntax">
646<head>Character Data and Markup</head>
647<p><termref def="dt-text">Text</termref> consists of intermingled <termref
648def="dt-chardata">character data</termref> and markup. <termdef id="dt-markup"
649term="Markup"><term>Markup</term> takes the form of <termref def="dt-stag">start-tags</termref>, <termref
650def="dt-etag">end-tags</termref>, <termref def="dt-empty">empty-element tags</termref>, <termref
651def="dt-entref">entity references</termref>, <termref def="dt-charref">character
652references</termref>, <termref def="dt-comment">comments</termref>, <termref
653def="dt-cdsection">CDATA section</termref> delimiters, <termref def="dt-doctype">document
654type declarations</termref>, <termref def="dt-pi">processing instructions</termref>, <phrase
655diff="add"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E89">[E89]</loc><nt
656def="NT-XMLDecl">XML declarations</nt>, <nt def="NT-TextDecl">text declarations</nt>,
657and any white space that is at the top level of the document entity (that
658is, outside the document element and not inside any other markup).</phrase></termdef></p>
659<p><termdef id="dt-chardata" term="Character Data">All text that is not markup
660constitutes the <term>character data</term> of the document.</termdef></p>
661<p>The ampersand character (&amp;) and the left angle bracket (&lt;) may appear
662in their literal form <emph>only</emph> when used as markup delimiters, or
663within a <termref def="dt-comment">comment</termref>, a <termref def="dt-pi">processing
664instruction</termref>, or a <termref def="dt-cdsection">CDATA section</termref>.<phrase
665diff="del"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E18">[E18]</loc>They
666are also legal within the <termref def="dt-litentval">literal entity value</termref>
667of an internal entity declaration; see <specref ref="wf-entities"/>.</phrase> <!-- FINAL EDIT: restore internal entity decl or leave it out. -->
668If they are needed elsewhere, they must be <termref def="dt-escape">escaped</termref>
669using either <termref def="dt-charref">numeric character references</termref>
670or the strings <quote><code>&amp;amp;</code></quote> and <quote><code>&amp;lt;</code></quote>
671respectively. The right angle bracket (>) may be represented using the string <quote><code>&amp;gt;</code></quote>,
672and must, <termref def="dt-compat">for compatibility</termref>, be escaped
673using <quote><code>&amp;gt;</code></quote> or a character reference when it
674appears in the string <quote><code>]]&gt;</code></quote> in content, when
675that string is not marking the end of a <termref def="dt-cdsection">CDATA
676section</termref>.</p>
677<p>In the content of elements, character data is any string of characters
678which does not contain the start-delimiter of any markup. In a CDATA section,
679character data is any string of characters not including the CDATA-section-close
680delimiter, <quote><code>]]&gt;</code></quote>.</p>
681<p>To allow attribute values to contain both single and double quotes, the
682apostrophe or single-quote character (') may be represented as <quote><code>&amp;apos;</code></quote>,
683and the double-quote character (") as <quote><code>&amp;quot;</code></quote>.</p>
684<scrap lang="ebnf">
685<head>Character Data</head>
686<prod id="NT-CharData">
687<lhs>CharData</lhs><rhs>[^&lt;&amp;]* - ([^&lt;&amp;]* ']]&gt;' [^&lt;&amp;]*)</rhs>
688</prod>
689</scrap>
690</div2>
691<div2 id="sec-comments">
692<head>Comments</head>
693<p><termdef id="dt-comment" term="Comment"><term>Comments</term> may appear
694anywhere in a document outside other <termref def="dt-markup">markup</termref>;
695in addition, they may appear within the document type declaration at places
696allowed by the grammar. They are not part of the document's <termref def="dt-chardata">character
697data</termref>; an XML processor may, but need not, make it possible for an
698application to retrieve the text of comments. <termref def="dt-compat">For
699compatibility</termref>, the string <quote><code>--</code></quote> (double-hyphen)
700must not occur within comments.</termdef> <phrase diff="add"><loc role="erratumref"
701href="http://www.w3.org/XML/xml-19980210-errata#E63">[E63]</loc>Parameter
702entity references are not recognized within comments.</phrase></p>
703<scrap lang="ebnf">
704<head>Comments</head>
705<prod id="NT-Comment">
706<lhs>Comment</lhs><rhs>'&lt;!--' ((<nt def="NT-Char">Char</nt> - '-') | ('-'
707(<nt def="NT-Char">Char</nt> - '-')))* '-->'</rhs>
708</prod>
709</scrap>
710<p>An example of a comment:</p>
711<eg>&lt;!&como; declarations for &lt;head> &amp; &lt;body> &comc;></eg>
712<p diff="add"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E27">[E27]</loc>Note
713that the grammar does not allow a comment ending in <code>---></code>. The
714following example is <emph>not</emph> well-formed.</p>
715<eg diff="add">&lt;!-- B+, B, or B---></eg>
716</div2>
717<div2 id="sec-pi">
718<head>Processing Instructions</head>
719<p><termdef id="dt-pi" term="Processing instruction"><term>Processing instructions</term>
720(PIs) allow documents to contain instructions for applications.</termdef></p>
721<scrap lang="ebnf">
722<head>Processing Instructions</head>
723<prod id="NT-PI">
724<lhs>PI</lhs><rhs>'&lt;?' <nt def="NT-PITarget">PITarget</nt> (<nt def="NT-S">S</nt>
725(<nt def="NT-Char">Char</nt>* - (<nt def="NT-Char">Char</nt>* &pic; <nt def="NT-Char">Char</nt>*)))? &pic;</rhs>
726</prod>
727<prod id="NT-PITarget">
728<lhs>PITarget</lhs><rhs><nt def="NT-Name">Name</nt> - (('X' | 'x') ('M' |
729'm') ('L' | 'l'))</rhs>
730</prod>
731</scrap>
732<p>PIs are not part of the document's <termref def="dt-chardata">character
733data</termref>, but must be passed through to the application. The PI begins
734with a target (<nt def="NT-PITarget">PITarget</nt>) used to identify the application
735to which the instruction is directed. The target names <quote><code>XML</code></quote>, <quote><code>xml</code></quote>,
736and so on are reserved for standardization in this or future versions of this
737specification. The XML <termref def="dt-notation">Notation</termref> mechanism
738may be used for formal declaration of PI targets. <phrase diff="add"><loc
739role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E63">[E63]</loc>Parameter
740entity references are not recognized within processing instructions.</phrase></p>
741</div2>
742<div2 id="sec-cdata-sect">
743<head>CDATA Sections</head>
744<p><termdef id="dt-cdsection" term="CDATA Section"><term>CDATA sections</term>
745may occur anywhere character data may occur; they are used to escape blocks
746of text containing characters which would otherwise be recognized as markup.
747CDATA sections begin with the string <quote><code>&lt;![CDATA[</code></quote>
748and end with the string <quote><code>]]&gt;</code></quote>:</termdef></p>
749<scrap lang="ebnf">
750<head>CDATA Sections</head>
751<prod id="NT-CDSect">
752<lhs>CDSect</lhs><rhs><nt def="NT-CDStart">CDStart</nt> <nt def="NT-CData">CData</nt> <nt
753def="NT-CDEnd">CDEnd</nt></rhs>
754</prod>
755<prod id="NT-CDStart">
756<lhs>CDStart</lhs><rhs>'&lt;![CDATA['</rhs>
757</prod>
758<prod id="NT-CData">
759<lhs>CData</lhs><rhs>(<nt def="NT-Char">Char</nt>* - (<nt def="NT-Char">Char</nt>*
760']]&gt;' <nt def="NT-Char">Char</nt>*)) </rhs>
761</prod>
762<prod id="NT-CDEnd">
763<lhs>CDEnd</lhs><rhs>']]&gt;'</rhs>
764</prod>
765</scrap>
766<p>Within a CDATA section, only the <nt def="NT-CDEnd">CDEnd</nt> string is
767recognized as markup, so that left angle brackets and ampersands may occur
768in their literal form; they need not (and cannot) be escaped using <quote><code>&amp;lt;</code></quote>
769and <quote><code>&amp;amp;</code></quote>. CDATA sections cannot nest.</p>
770<p>An example of a CDATA section, in which <quote><code>&lt;greeting></code></quote>
771and <quote><code>&lt;/greeting></code></quote> are recognized as <termref
772def="dt-chardata">character data</termref>, not <termref def="dt-markup">markup</termref>:</p>
773<eg>&lt;![CDATA[&lt;greeting>Hello, world!&lt;/greeting>]]&gt; </eg>
774</div2>
775<div2 id="sec-prolog-dtd">
776<head>Prolog and Document Type Declaration</head>
777<p><termdef id="dt-xmldecl" term="XML Declaration">XML documents <phrase diff="chg"><loc
778role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E107">[E107]</loc>should</phrase>
779begin with an <term>XML declaration</term> which specifies the version of
780XML being used.</termdef> For example, the following is a complete XML document, <termref
781def="dt-wellformed">well-formed</termref> but not <termref def="dt-valid">valid</termref>:</p>
782<eg><![CDATA[<?xml version="1.0"?> <greeting>Hello, world!</greeting> ]]></eg>
783<p>and so is this:</p>
784<eg><![CDATA[<greeting>Hello, world!</greeting>]]></eg>
785<p>The version number <quote><code>1.0</code></quote> should be used to indicate
786conformance to this version of this specification; it is an error for a document
787to use the value <quote><code>1.0</code></quote> if it does not conform to
788this version of this specification. It is the intent of the XML working group
789to give later versions of this specification numbers other than <quote><code>1.0</code></quote>,
790but this intent does not indicate a commitment to produce any future versions
791of XML, nor if any are produced, to use any particular numbering scheme. Since
792future versions are not ruled out, this construct is provided as a means to
793allow the possibility of automatic version recognition, should it become necessary.
794Processors may signal an error if they receive documents labeled with versions
795they do not support.</p>
796<p>The function of the markup in an XML document is to describe its storage
797and logical structure and to associate attribute-value pairs with its logical
798structures. XML provides a mechanism, the <termref def="dt-doctype">document
799type declaration</termref>, to define constraints on the logical structure
800and to support the use of predefined storage units. <termdef id="dt-valid"
801term="Validity">An XML document is <term>valid</term> if it has an associated
802document type declaration and if the document complies with the constraints
803expressed in it.</termdef></p>
804<p>The document type declaration must appear before the first <termref def="dt-element">element</termref>
805in the document.</p>
806<scrap id="xmldoc" lang="ebnf">
807<head>Prolog</head>
808<prodgroup pcw2="6" pcw4="17.5" pcw5="9">
809<prod id="NT-prolog">
810<lhs>prolog</lhs><rhs><nt def="NT-XMLDecl">XMLDecl</nt>? <nt def="NT-Misc">Misc</nt>*
811(<nt def="NT-doctypedecl">doctypedecl</nt> <nt def="NT-Misc">Misc</nt>*)?</rhs>
812</prod>
813<prod id="NT-XMLDecl">
814<lhs>XMLDecl</lhs><rhs>&pio; <nt def="NT-VersionInfo">VersionInfo</nt> <nt
815def="NT-EncodingDecl">EncodingDecl</nt>? <nt def="NT-SDDecl">SDDecl</nt>? <nt
816def="NT-S">S</nt>? &pic;</rhs>
817</prod>
818<prod id="NT-VersionInfo" diff="chg">
819<lhs>VersionInfo</lhs><rhs><nt def="NT-S">S</nt> 'version' <nt def="NT-Eq">Eq</nt>
820("'" <nt def="NT-VersionNum">VersionNum</nt> "'" | '"' <nt def="NT-VersionNum">VersionNum</nt>
821'"')<com><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E15">[E15]</loc></com></rhs>
822</prod>
823<prod id="NT-Eq">
824<lhs>Eq</lhs><rhs><nt def="NT-S">S</nt>? '=' <nt def="NT-S">S</nt>?</rhs>
825</prod>
826<prod id="NT-VersionNum">
827<lhs>VersionNum</lhs><rhs>([a-zA-Z0-9_.:] | '-')+</rhs>
828</prod>
829<prod id="NT-Misc">
830<lhs>Misc</lhs><rhs><nt def="NT-Comment">Comment</nt> | <nt def="NT-PI">PI</nt>
831| <nt def="NT-S">S</nt></rhs>
832</prod>
833</prodgroup></scrap>
834<p><termdef id="dt-doctype" term="Document Type Declaration">The XML <term>document
835type declaration</term> contains or points to <termref def="dt-markupdecl">markup
836declarations</termref> that provide a grammar for a class of documents. This
837grammar is known as a document type definition, or <term>DTD</term>. The document
838type declaration can point to an external subset (a special kind of <termref
839def="dt-extent">external entity</termref>) containing markup declarations,
840or can contain the markup declarations directly in an internal subset, or
841can do both. The DTD for a document consists of both subsets taken together.</termdef></p>
842<p><termdef id="dt-markupdecl" term="markup declaration"> A <term>markup declaration</term>
843is an <termref def="dt-eldecl">element type declaration</termref>, an <termref
844def="dt-attdecl">attribute-list declaration</termref>, an <termref def="dt-entdecl">entity
845declaration</termref>, or a <termref def="dt-notdecl">notation declaration</termref>.</termdef>
846These declarations may be contained in whole or in part within <termref def="dt-PE">parameter
847entities</termref>, as described in the well-formedness and validity constraints
848below. For <phrase diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E14">[E14]</loc>further</phrase>
849information, see <specref ref="sec-physical-struct"/>.</p>
850<scrap id="dtd" lang="ebnf">
851<head>Document Type Definition</head>
852<prodgroup pcw2="6" pcw4="17.5" pcw5="9">
853<prod id="NT-doctypedecl" diff="chg">
854<lhs>doctypedecl</lhs><rhs>'&lt;!DOCTYPE' <nt def="NT-S">S</nt> <nt def="NT-Name">Name</nt>
855(<nt def="NT-S">S</nt> <nt def="NT-ExternalID">ExternalID</nt>)? <nt def="NT-S">S</nt>?
856('[' (<nt def="NT-markupdecl">markupdecl</nt> | <nt diff="chg" def="NT-DeclSep">DeclSep</nt>)*
857']' <nt def="NT-S">S</nt>?)? '>'</rhs><vc def="vc-roottype"/><wfc def="ExtSubset"
858diff="add"/><com><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E109">[E109]</loc></com>
859</prod>
860<prod id="NT-DeclSep" diff="add">
861<lhs>DeclSep</lhs><rhs><nt def="NT-PEReference">PEReference</nt> | <nt def="NT-S">S</nt></rhs>
862<wfc def="PE-between-Decls" diff="add"/><com><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E109">[E109]</loc></com>
863</prod>
864<prod id="NT-markupdecl">
865<lhs>markupdecl</lhs><rhs><nt def="NT-elementdecl">elementdecl</nt> | <nt
866def="NT-AttlistDecl">AttlistDecl</nt> | <nt def="NT-EntityDecl">EntityDecl</nt>
867| <nt def="NT-NotationDecl">NotationDecl</nt> | <nt def="NT-PI">PI</nt> | <nt
868def="NT-Comment">Comment</nt> </rhs><vc def="vc-PEinMarkupDecl"/><wfc def="wfc-PEinInternalSubset"/>
869</prod>
870</prodgroup></scrap>
871<p diff="add"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E82">[E82]</loc>Note
872that it is possible to construct a well-formed document containing a <nt def="NT-doctypedecl">doctypedecl</nt>
873that neither points to an external subset nor contains an internal subset.</p>
874<p>The markup declarations may be made up in whole or in part of the <termref
875def="dt-repltext">replacement text</termref> of <termref def="dt-PE">parameter
876entities</termref>. The productions later in this specification for individual
877nonterminals (<nt def="NT-elementdecl">elementdecl</nt>, <nt def="NT-AttlistDecl">AttlistDecl</nt>,
878and so on) describe the declarations <emph>after</emph> all the parameter
879entities have been <termref def="dt-include">included</termref>.</p>
880<p diff="add"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E75">[E75]</loc>Parameter
881entity references are recognized anywhere in the DTD (internal and external
882subsets and external parameter entities), except in literals, processing instructions,
883comments, and the contents of ignored conditional sections (see <specref ref="sec-condition-sect"/>).
884They are also recognized in entity value literals. The use of parameter entities
885in the internal subset is restricted as described below.</p>
886<vcnote id="vc-roottype"><head>Root Element Type</head><p>The <nt def="NT-Name">Name</nt>
887in the document type declaration must match the element type of the <termref
888def="dt-root">root element</termref>.</p>
889</vcnote>
890<vcnote id="vc-PEinMarkupDecl"><head>Proper Declaration/PE Nesting</head>
891<p>Parameter-entity <termref def="dt-repltext">replacement text</termref>
892must be properly nested with markup declarations. That is to say, if either
893the first character or the last character of a markup declaration (<nt def="NT-markupdecl">markupdecl</nt>
894above) is contained in the replacement text for a <termref def="dt-PERef">parameter-entity
895reference</termref>, both must be contained in the same replacement text.</p>
896</vcnote>
897<wfcnote id="wfc-PEinInternalSubset"><head>PEs in Internal Subset</head><p>In
898the internal DTD subset, <termref def="dt-PERef">parameter-entity references</termref>
899can occur only where markup declarations can occur, not within markup declarations.
900(This does not apply to references that occur in external parameter entities
901or to the external subset.)</p>
902</wfcnote>
903<wfcnote id="ExtSubset" diff="add"><head><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E109">[E109]</loc>External
904Subset</head><p>The external subset, if any, must match the production for <nt
905def="NT-extSubset">extSubset</nt>.</p>
906</wfcnote>
907<wfcnote id="PE-between-Decls" diff="add"><head><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E109">[E109]</loc>PE
908Between Declarations</head><p>The replacement text of a parameter entity reference
909in a <nt def="NT-DeclSep">DeclSep</nt> must match the production <nt def="NT-extSubsetDecl">extSubsetDecl</nt>.</p>
910</wfcnote>
911<p>Like the internal subset, the external subset and any external parameter
912entities <phrase diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E109">[E109]</loc>referenced
913in a <nt def="NT-DeclSep">DeclSep</nt></phrase> must consist of a series of
914complete markup declarations of the types allowed by the non-terminal symbol <nt
915def="NT-markupdecl">markupdecl</nt>, interspersed with white space or <termref
916def="dt-PERef">parameter-entity references</termref>. However, portions of
917the contents of the external subset or of <phrase diff="add"><loc role="erratumref"
918href="http://www.w3.org/XML/xml-19980210-errata#E109">[E109]</loc>these </phrase>
919external parameter entities may conditionally be ignored by using the <termref
920def="dt-cond-section">conditional section</termref> construct; this is not
921allowed in the internal subset.</p>
922<scrap id="ext-Subset">
923<head>External Subset</head>
924<prodgroup pcw2="6" pcw4="17.5" pcw5="9">
925<prod id="NT-extSubset">
926<lhs>extSubset</lhs><rhs><nt def="NT-TextDecl">TextDecl</nt>? <nt def="NT-extSubsetDecl">extSubsetDecl</nt></rhs>
927</prod>
928<prod id="NT-extSubsetDecl" diff="chg">
929<lhs>extSubsetDecl</lhs><rhs>( <nt def="NT-markupdecl">markupdecl</nt> | <nt
930def="NT-conditionalSect">conditionalSect</nt> | <nt diff="chg" def="NT-DeclSep">DeclSep</nt>)*</rhs>
931<com><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E109">[E109]</loc></com>
932</prod>
933</prodgroup></scrap>
934<p>The external subset and external parameter entities also differ from the
935internal subset in that in them, <termref def="dt-PERef">parameter-entity
936references</termref> are permitted <emph>within</emph> markup declarations,
937not only <emph>between</emph> markup declarations.</p>
938<p>An example of an XML document with a document type declaration:</p>
939<eg><![CDATA[<?xml version="1.0"?> <!DOCTYPE greeting SYSTEM "hello.dtd"> <greeting>Hello, world!</greeting> ]]></eg>
940<p>The <termref def="dt-sysid">system identifier</termref> <quote><code>hello.dtd</code></quote>
941gives the <phrase diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E78">[E78]</loc>address
942(a URI reference)</phrase> of a DTD for the document.</p>
943<p>The declarations can also be given locally, as in this example:</p>
944<eg><![CDATA[<?xml version="1.0" encoding="UTF-8" ?>
945<!DOCTYPE greeting [
946  <!ELEMENT greeting (#PCDATA)>
947]>
948<greeting>Hello, world!</greeting>]]></eg>
949<p>If both the external and internal subsets are used, the internal subset
950is considered to occur before the external subset. <!-- 'is considered to'? boo. whazzat mean? -->
951This has the effect that entity and attribute-list declarations in the internal
952subset take precedence over those in the external subset.</p>
953</div2>
954<div2 id="sec-rmd">
955<head>Standalone Document Declaration</head>
956<p>Markup declarations can affect the content of the document, as passed from
957an <termref def="dt-xml-proc">XML processor</termref> to an application; examples
958are attribute defaults and entity declarations. The standalone document declaration,
959which may appear as a component of the XML declaration, signals whether or
960not there are such declarations which appear external to the <termref def="dt-docent">document
961entity</termref><phrase diff="add"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E64">[E64]</loc>
962or in parameter entities. <termdef id="dt-extmkpdecl" term="External Markup Declaration">An <term>external
963markup declaration</term> is defined as a markup declaration occurring in
964the external subset or in a parameter entity (external or internal, the latter
965being included because non-validating processors are not required to read
966them).</termdef></phrase></p>
967<scrap id="fulldtd" lang="ebnf">
968<head>Standalone Document Declaration</head>
969<prodgroup pcw2="4" pcw4="19.5" pcw5="9">
970<prod id="NT-SDDecl">
971<lhs>SDDecl</lhs><rhs> <nt def="NT-S">S</nt> 'standalone' <nt def="NT-Eq">Eq</nt>
972(("'" ('yes' | 'no') "'") | ('"' ('yes' | 'no') '"')) </rhs><vc def="vc-check-rmd"/>
973</prod>
974</prodgroup></scrap>
975<p>In a standalone document declaration, the value <attval>yes</attval> indicates
976that there are no <phrase diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E64">[E64]</loc><termref
977def="dt-extmkpdecl">external markup declarations</termref></phrase> which
978affect the information passed from the XML processor to the application. The
979value <attval>no</attval> indicates that there are or may be such external
980markup declarations. Note that the standalone document declaration only denotes
981the presence of external <emph>declarations</emph>; the presence, in a document,
982of references to external <emph>entities</emph>, when those entities are internally
983declared, does not change its standalone status.</p>
984<p>If there are no external markup declarations, the standalone document declaration
985has no meaning. If there are external markup declarations but there is no
986standalone document declaration, the value <attval>no</attval> is assumed.</p>
987<p>Any XML document for which <code>standalone="no"</code> holds can be converted
988algorithmically to a standalone document, which may be desirable for some
989network delivery applications.</p>
990<vcnote id="vc-check-rmd"><head>Standalone Document Declaration</head><p>The
991standalone document declaration must have the value <attval>no</attval> if
992any external markup declarations contain declarations of:</p>
993<ulist>
994<item><p>attributes with <termref def="dt-default">default</termref> values,
995if elements to which these attributes apply appear in the document without
996specifications of values for these attributes, or</p></item>
997<item><p>entities (other than &magicents;), if <termref def="dt-entref">references</termref>
998to those entities appear in the document, or</p></item>
999<item><p>attributes with values subject to <titleref href="#AVNormalize">normalization</titleref>,
1000where the attribute appears in the document with a value which will change
1001as a result of normalization, or</p></item>
1002<item><p>element types with <termref def="dt-elemcontent">element content</termref>,
1003if white space occurs directly within any instance of those types.</p></item>
1004</ulist>
1005</vcnote>
1006<p>An example XML declaration with a standalone document declaration:</p>
1007<eg>&lt;?xml version="&versionOfXML;" standalone='yes'?></eg>
1008</div2>
1009<div2 id="sec-white-space">
1010<head>White Space Handling</head>
1011<p>In editing XML documents, it is often convenient to use <quote>white space</quote>
1012(spaces, tabs, and blank lines<phrase diff="del"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E39">[E39]</loc>,
1013denoted by the nonterminal <nt def="NT-S">S</nt> in this specification</phrase>)
1014to set apart the markup for greater readability. Such white space is typically
1015not intended for inclusion in the delivered version of the document. On the
1016other hand, <quote>significant</quote> white space that should be preserved
1017in the delivered version is common, for example in poetry and source code.</p>
1018<p>An <termref def="dt-xml-proc">XML processor</termref> must always pass
1019all characters in a document that are not markup through to the application.
1020A <termref def="dt-validating"> validating XML processor</termref> must also
1021inform the application which of these characters constitute white space appearing
1022in <termref def="dt-elemcontent">element content</termref>.</p>
1023<p>A special <termref def="dt-attr">attribute</termref> named <att>xml:space</att>
1024may be attached to an element to signal an intention that in that element,
1025white space should be preserved by applications. In valid documents, this
1026attribute, like any other, must be <termref def="dt-attdecl">declared</termref>
1027if it is used. When declared, it must be given as an <termref def="dt-enumerated">enumerated
1028type</termref> whose <phrase diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E81">[E81]</loc>values
1029are one or both of</phrase> <attval>default</attval> and <attval>preserve</attval>.
1030For example:</p>
1031<eg diff="chg"><![CDATA[<!ATTLIST poem  xml:space (default|preserve) 'preserve'>]]>
1032
1033&lt;!-- <loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E81">[E81]</loc>-->
1034&lt;!ATTLIST pre xml:space (preserve) #FIXED 'preserve'></eg>
1035<p>The value <attval>default</attval> signals that applications' default white-space
1036processing modes are acceptable for this element; the value <attval>preserve</attval>
1037indicates the intent that applications preserve all the white space. This
1038declared intent is considered to apply to all elements within the content
1039of the element where it is specified, unless overriden with another instance
1040of the <att>xml:space</att> attribute.</p>
1041<p>The <termref def="dt-root">root element</termref> of any document is considered
1042to have signaled no intentions as regards application space handling, unless
1043it provides a value for this attribute or the attribute is declared with a
1044default value.</p>
1045</div2>
1046<div2 id="sec-line-ends">
1047<head>End-of-Line Handling</head>
1048<p>XML <termref def="dt-parsedent">parsed entities</termref> are often stored
1049in computer files which, for editing convenience, are organized into lines.
1050These lines are typically separated by some combination of the characters
1051carriage-return (#xD) and line-feed (#xA).</p>
1052<p diff="del">To simplify the tasks of <termref def="dt-app">applications</termref>,
1053wherever an external parsed entity or the literal entity value of an internal
1054parsed entity contains either the literal two-character sequence <quote>#xD#xA</quote>
1055or a standalone literal #xD, an <termref def="dt-xml-proc">XML processor</termref>
1056must pass to the application the single character #xA. (This behavior can
1057conveniently be produced by normalizing all line breaks to #xA on input, before
1058parsing.)</p>
1059<p diff="add"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E104">[E104]</loc>To
1060simplify the tasks of <termref def="dt-app">applications</termref>, the characters
1061passed to an application by the <termref def="dt-xml-proc">XML processor</termref>
1062must be as if the XML processor normalized all line breaks in external parsed
1063entities (including the document entity) on input, before parsing, by translating
1064both the two-character sequence #xD #xA and any #xD that is not followed by
1065#xA to a single #xA character.</p>
1066</div2>
1067<div2 id="sec-lang-tag">
1068<head>Language Identification</head>
1069<p>In document processing, it is often useful to identify the natural or formal
1070language in which the content is written. A special <termref def="dt-attr">attribute</termref>
1071named <att>xml:lang</att> may be inserted in documents to specify the language
1072used in the contents and attribute values of any element in an XML document.
1073In valid documents, this attribute, like any other, must be <termref def="dt-attdecl">declared</termref>
1074if it is used. <phrase diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E73">[E73]</loc>The
1075values of the attribute are language identifiers as defined by <bibref ref="RFC1766"/>, <titleref>Tags
1076for the Identification of Languages</titleref>, or its successor on the IETF
1077Standards Track.</phrase></p>
1078<note diff="add">
1079<p><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E73">[E73]</loc><bibref
1080ref="RFC1766"/> tags are constructed from two-letter language codes as defined
1081by <bibref ref="ISO639"/>, from two-letter country codes as defined by <bibref
1082ref="ISO3166"/>, or from language identifiers registered with the Internet
1083Assigned Numbers Authority <phrase diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E58">[E58]</loc><bibref
1084diff="chg" ref="IANA-LANGCODES"/></phrase>. It is expected that the successor
1085to <bibref ref="RFC1766"/> will introduce three-letter language codes for
1086languages not presently covered by <bibref ref="ISO639"/>.</p>
1087</note>
1088<p diff="add"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E73">[E73]</loc>(Productions
108933 through 38 have been removed.)</p>
1090<scrap diff="del" lang="ebnf">
1091<head>Language Identification</head>
1092<prod id="NT-LanguageID">
1093<lhs>LanguageID</lhs><rhs><nt def="NT-Langcode">Langcode</nt> ('-' <nt def="NT-Subcode">Subcode</nt>)*</rhs>
1094</prod>
1095<prod id="NT-Langcode">
1096<lhs>Langcode</lhs><rhs><nt def="NT-ISO639Code">ISO639Code</nt> | <nt def="NT-IanaCode">IanaCode</nt>
1097| <nt def="NT-UserCode">UserCode</nt></rhs>
1098</prod>
1099<prod id="NT-ISO639Code">
1100<lhs>ISO639Code</lhs><rhs>([a-z] | [A-Z]) ([a-z] | [A-Z])</rhs>
1101</prod>
1102<prod id="NT-IanaCode">
1103<lhs>IanaCode</lhs><rhs>('i' | 'I') '-' ([a-z] | [A-Z])+</rhs>
1104</prod>
1105<prod id="NT-UserCode">
1106<lhs>UserCode</lhs><rhs>('x' | 'X') '-' ([a-z] | [A-Z])+</rhs>
1107</prod>
1108<prod id="NT-Subcode">
1109<lhs>Subcode</lhs><rhs>([a-z] | [A-Z])+</rhs>
1110</prod>
1111</scrap>
1112<p diff="del">The <nt def="NT-Langcode">Langcode</nt> may be any of the following:</p>
1113<ulist diff="del">
1114<item><p>a two-letter language code as defined by <bibref ref="ISO639"/>, <titleref>Codes
1115for the representation of names of languages</titleref></p></item>
1116<item><p>a language identifier registered with the Internet Assigned Numbers
1117Authority <bibref diff="chg" ref="IANA-LANGCODES"/>; these begin with the
1118prefix <quote><code>i-</code></quote> (or <quote><code>I-</code></quote>)</p>
1119</item>
1120<item><p>a language identifier assigned by the user, or agreed on between
1121parties in private use; these must begin with the prefix <quote><code>x-</code></quote>
1122or <quote><code>X-</code></quote> in order to ensure that they do not conflict
1123with names later standardized or registered with IANA</p></item>
1124</ulist>
1125<p diff="del">There may be any number of <nt def="NT-Subcode">Subcode</nt>
1126segments; if the first subcode segment exists and the Subcode consists of
1127two letters, then it must be a country code from <bibref ref="ISO3166"/>,
1128"Codes for the representation of names of countries." If the first subcode
1129consists of more than two letters, it must be a subcode for the language in
1130question registered with IANA, unless the <nt def="NT-Langcode">Langcode</nt>
1131begins with the prefix "<code>x-</code>" or "<code>X-</code>". </p>
1132<p diff="del">It is customary to give the language code in lower case, and
1133the country code (if any) in upper case. Note that these values, unlike other
1134names in XML documents, are case insensitive.</p>
1135<p>For example:</p>
1136<eg><![CDATA[<p xml:lang="en">The quick brown fox jumps over the lazy dog.</p>
1137<p xml:lang="en-GB">What colour is it?</p>
1138<p xml:lang="en-US">What color is it?</p>
1139<sp who="Faust" desc='leise' xml:lang="de">
1140  <l>Habe nun, ach! Philosophie,</l>
1141  <l>Juristerei, und Medizin</l>
1142  <l>und leider auch Theologie</l>
1143  <l>durchaus studiert mit hei�em Bem�h'n.</l>
1144</sp>]]></eg>
1145<!--<p>The xml:lang value is considered to apply both to the contents of an
1146element and
1147(unless otherwise via attribute default values) to the
1148values of all of its attributes with free-text (CDATA) values. -->
1149<p>The intent declared with <att>xml:lang</att> is considered to apply to
1150all attributes and content of the element where it is specified, unless overridden
1151with an instance of <att>xml:lang</att> on another element within that content.</p>
1152<!--
1153If no
1154value is specified for xml:lang on an element, and no default value is
1155defined for it in the DTD, then the xml:lang attribute of any element
1156takes the same value it has in the parent element, if any. The two
1157technical terms in the following example both have the same effective
1158value for xml:lang:
1159
1160  <p xml:lang="en">Here the keywords are
1161  <term xml:lang="en">shift</term> and
1162  <term>reduce</term>. ...</p>
1163
1164The application, not the XML processor, is responsible for this '
1165inheritance' of attribute values.
1166-->
1167<p>A simple declaration for <att>xml:lang</att> might take the form</p>
1168<eg>xml:lang NMTOKEN #IMPLIED</eg>
1169<p>but specific default values may also be given, if appropriate. In a collection
1170of French poems for English students, with glosses and notes in English, the <att>xml:lang</att>
1171attribute might be declared this way:</p>
1172<eg><![CDATA[<!ATTLIST poem   xml:lang NMTOKEN 'fr'>
1173<!ATTLIST gloss  xml:lang NMTOKEN 'en'>
1174<!ATTLIST note   xml:lang NMTOKEN 'en'>]]></eg>
1175</div2>
1176</div1>
1177<!-- &Elements; -->
1178<div1 id="sec-logical-struct">
1179<head>Logical Structures</head>
1180<p><termdef id="dt-element" term="Element">Each <termref def="dt-xml-doc">XML
1181document</termref> contains one or more <term>elements</term>, the boundaries
1182of which are either delimited by <termref def="dt-stag">start-tags</termref>
1183and <termref def="dt-etag">end-tags</termref>, or, for <termref def="dt-empty">empty</termref>
1184elements, by an <termref def="dt-eetag">empty-element tag</termref>. Each
1185element has a type, identified by name, sometimes called its <quote>generic
1186identifier</quote> (GI), and may have a set of attribute specifications.</termdef>
1187Each attribute specification has a <termref def="dt-attrname">name</termref>
1188and a <termref def="dt-attrval">value</termref>.</p>
1189<scrap lang="ebnf">
1190<head>Element</head>
1191<prod id="NT-element">
1192<lhs>element</lhs><rhs><nt def="NT-EmptyElemTag">EmptyElemTag</nt></rhs>
1193<rhs>| <nt def="NT-STag">STag</nt> <nt def="NT-content">content</nt> <nt def="NT-ETag">ETag</nt></rhs>
1194<wfc def="GIMatch"/><vc def="elementvalid"/>
1195</prod>
1196</scrap>
1197<p>This specification does not constrain the semantics, use, or (beyond syntax)
1198names of the element types and attributes, except that names beginning with
1199a match to <code>(('X'|'x')('M'|'m')('L'|'l'))</code> are reserved for standardization
1200in this or future versions of this specification.</p>
1201<wfcnote id="GIMatch"><head>Element Type Match</head><p>The <nt def="NT-Name">Name</nt>
1202in an element's end-tag must match the element type in the start-tag.</p>
1203</wfcnote>
1204<vcnote id="elementvalid"><head>Element Valid</head><p>An element is valid
1205if there is a declaration matching <nt def="NT-elementdecl">elementdecl</nt>
1206where the <nt def="NT-Name">Name</nt> matches the element type, and one of
1207the following holds:</p>
1208<olist>
1209<item><p>The declaration matches <kw>EMPTY</kw> and the element has no <termref
1210def="dt-content">content</termref>.</p></item>
1211<item><p>The declaration matches <nt def="NT-children">children</nt> and the
1212sequence of <termref def="dt-parentchild">child elements</termref> belongs
1213to the language generated by the regular expression in the content model,
1214with optional white space (characters matching the nonterminal <nt def="NT-S">S</nt>)
1215between <phrase diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E59">[E59]</loc>the
1216start-tag and the first child element, between child elements, or between
1217the last child element and the end-tag. Note that a CDATA section containing
1218only white space does not match the nonterminal <nt def="NT-S">S</nt>, and
1219hence cannot appear in these positions.</phrase></p></item>
1220<item><p>The declaration matches <nt def="NT-Mixed">Mixed</nt> and the content
1221consists of <termref def="dt-chardata">character data</termref> and <termref
1222def="dt-parentchild">child elements</termref> whose types match names in the
1223content model.</p></item>
1224<item><p>The declaration matches <kw>ANY</kw>, and the types of any <termref
1225def="dt-parentchild">child elements</termref> have been declared.</p></item>
1226</olist>
1227</vcnote>
1228<div2 id="sec-starttags">
1229<head>Start-Tags, End-Tags, and Empty-Element Tags</head>
1230<p><termdef id="dt-stag" term="Start-Tag">The beginning of every non-empty
1231XML element is marked by a <term>start-tag</term>.</termdef></p>
1232<scrap lang="ebnf">
1233<head>Start-tag</head>
1234<prodgroup pcw2="6" pcw4="15" pcw5="11.5">
1235<prod id="NT-STag">
1236<lhs>STag</lhs><rhs>'&lt;' <nt def="NT-Name">Name</nt> (<nt def="NT-S">S</nt> <nt
1237def="NT-Attribute">Attribute</nt>)* <nt def="NT-S">S</nt>? '>'</rhs><wfc def="uniqattspec"/>
1238</prod>
1239<prod id="NT-Attribute">
1240<lhs>Attribute</lhs><rhs><nt def="NT-Name">Name</nt> <nt def="NT-Eq">Eq</nt> <nt
1241def="NT-AttValue">AttValue</nt></rhs><vc def="ValueType"/><wfc def="NoExternalRefs"/>
1242<wfc def="CleanAttrVals"/>
1243</prod>
1244</prodgroup></scrap>
1245<p>The <nt def="NT-Name">Name</nt> in the start- and end-tags gives the element's <term>type</term>. <termdef
1246id="dt-attr" term="Attribute"> The <nt def="NT-Name">Name</nt>-<nt def="NT-AttValue">AttValue</nt>
1247pairs are referred to as the <term>attribute specifications</term> of the
1248element</termdef>, <termdef id="dt-attrname" term="Attribute Name">with the <nt
1249def="NT-Name">Name</nt> in each pair referred to as the <term>attribute name</term></termdef>
1250and <termdef id="dt-attrval" term="Attribute Value">the content of the <nt
1251def="NT-AttValue">AttValue</nt> (the text between the <code>'</code> or <code>"</code>
1252delimiters) as the <term>attribute value</term>.</termdef><phrase diff="add"><loc
1253role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E46">[E46]</loc>Note
1254that the order of attribute specifications in a start-tag or empty-element
1255tag is not significant.</phrase></p>
1256<wfcnote id="uniqattspec"><head>Unique Att Spec</head><p>No attribute name
1257may appear more than once in the same start-tag or empty-element tag.</p>
1258</wfcnote>
1259<vcnote id="ValueType"><head>Attribute Value Type</head><p>The attribute must
1260have been declared; the value must be of the type declared for it. (For attribute
1261types, see <specref ref="attdecls"/>.)</p>
1262</vcnote>
1263<wfcnote id="NoExternalRefs"><head>No External Entity References</head><p>Attribute
1264values cannot contain direct or indirect entity references to external entities.</p>
1265</wfcnote>
1266<wfcnote id="CleanAttrVals"><head>No <code>&lt;</code> in Attribute Values</head>
1267<p>The <termref def="dt-repltext">replacement text</termref> of any entity
1268referred to directly or indirectly in an attribute value <phrase diff="del"><loc
1269role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E83">[E83]</loc>(other
1270than <quote><code>&amp;lt;</code></quote>) </phrase>must not contain a <code>&lt;</code>.</p>
1271</wfcnote>
1272<p>An example of a start-tag:</p>
1273<eg>&lt;termdef id="dt-dog" term="dog"></eg>
1274<p><termdef id="dt-etag" term="End Tag">The end of every element that begins
1275with a start-tag must be marked by an <term>end-tag</term> containing a name
1276that echoes the element's type as given in the start-tag:</termdef></p>
1277<scrap lang="ebnf">
1278<head>End-tag</head>
1279<prodgroup pcw2="6" pcw4="15" pcw5="11.5">
1280<prod id="NT-ETag">
1281<lhs>ETag</lhs><rhs>'&lt;/' <nt def="NT-Name">Name</nt> <nt def="NT-S">S</nt>?
1282'>'</rhs>
1283</prod>
1284</prodgroup></scrap>
1285<p>An example of an end-tag:</p>
1286<eg>&lt;/termdef></eg>
1287<p><termdef id="dt-content" term="Content">The <termref def="dt-text">text</termref>
1288between the start-tag and end-tag is called the element's <term>content</term>:</termdef></p>
1289<scrap lang="ebnf">
1290<head>Content of Elements</head>
1291<prodgroup pcw2="6" pcw4="15" pcw5="11.5">
1292<prod id="NT-content" diff="chg">
1293<lhs>content</lhs><rhs><nt def="NT-CharData">CharData</nt>? ((<nt def="NT-element">element</nt>
1294| <nt def="NT-Reference">Reference</nt> | <nt def="NT-CDSect">CDSect</nt>
1295| <nt def="NT-PI">PI</nt> | <nt def="NT-Comment">Comment</nt>) <nt def="NT-CharData">CharData</nt>?)*</rhs>
1296<com><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E71">[E71]</loc></com>
1297</prod>
1298</prodgroup></scrap>
1299<p><phrase diff="chg"><termdef id="dt-empty" term="Empty"><loc role="erratumref"
1300href="http://www.w3.org/XML/xml-19980210-errata#E97">[E97]</loc>An element
1301with no content is said to be <term>empty</term>.</termdef> The representation
1302of an empty element is either a start-tag immediately followed by an end-tag,
1303or an empty-element tag.</phrase> <termdef id="dt-eetag" term="empty-element tag">An <term>empty-element
1304tag</term> takes a special form:</termdef></p>
1305<scrap lang="ebnf">
1306<head>Tags for Empty Elements</head>
1307<prodgroup pcw2="6" pcw4="15" pcw5="11.5">
1308<prod id="NT-EmptyElemTag">
1309<lhs>EmptyElemTag</lhs><rhs>'&lt;' <nt def="NT-Name">Name</nt> (<nt def="NT-S">S</nt> <nt
1310def="NT-Attribute">Attribute</nt>)* <nt def="NT-S">S</nt>? '/>'</rhs><wfc
1311def="uniqattspec"/>
1312</prod>
1313</prodgroup></scrap>
1314<p>Empty-element tags may be used for any element which has no content, whether
1315or not it is declared using the keyword <kw>EMPTY</kw>. <termref def="dt-interop">For
1316interoperability</termref>, the empty-element tag <phrase diff="chg"><loc
1317role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E45">[E45]</loc>should
1318be used, and should only be used,</phrase> for elements which are declared
1319EMPTY.</p>
1320<p>Examples of empty elements:</p>
1321<eg>&lt;IMG align="left"
1322 src="http://www.w3.org/Icons/WWW/w3c_home" />
1323&lt;br>&lt;/br>
1324&lt;br/></eg>
1325</div2>
1326<div2 id="elemdecls">
1327<head>Element Type Declarations</head>
1328<p>The <termref def="dt-element">element</termref> structure of an <termref
1329def="dt-xml-doc">XML document</termref> may, for <termref def="dt-valid">validation</termref>
1330purposes, be constrained using element type and attribute-list declarations.
1331An element type declaration constrains the element's <termref def="dt-content">content</termref>.</p>
1332<p>Element type declarations often constrain which element types can appear
1333as <termref def="dt-parentchild">children</termref> of the element. At user
1334option, an XML processor may issue a warning when a declaration mentions an
1335element type for which no declaration is provided, but this is not an error.</p>
1336<p><termdef id="dt-eldecl" term="Element Type declaration">An <term>element
1337type declaration</term> takes the form:</termdef></p>
1338<scrap lang="ebnf">
1339<head>Element Type Declaration</head>
1340<prodgroup pcw2="5.5" pcw4="18" pcw5="9">
1341<prod id="NT-elementdecl">
1342<lhs>elementdecl</lhs><rhs>'&lt;!ELEMENT' <nt def="NT-S">S</nt> <nt def="NT-Name">Name</nt> <nt
1343def="NT-S">S</nt> <nt def="NT-contentspec">contentspec</nt> <nt def="NT-S">S</nt>?
1344'>'</rhs><vc def="EDUnique"/>
1345</prod>
1346<prod id="NT-contentspec">
1347<lhs>contentspec</lhs><rhs>'EMPTY' | 'ANY' | <nt def="NT-Mixed">Mixed</nt>
1348| <nt def="NT-children">children</nt> </rhs>
1349</prod>
1350</prodgroup></scrap>
1351<p>where the <nt def="NT-Name">Name</nt> gives the element type being declared.</p>
1352<vcnote id="EDUnique"><head>Unique Element Type Declaration</head><p>No element
1353type may be declared more than once.</p>
1354</vcnote>
1355<p>Examples of element type declarations:</p>
1356<eg>&lt;!ELEMENT br EMPTY>
1357&lt;!ELEMENT p (#PCDATA|emph)* >
1358&lt;!ELEMENT %name.para; %content.para; >
1359&lt;!ELEMENT container ANY></eg>
1360<div3 id="sec-element-content">
1361<head>Element Content</head>
1362<p><termdef id="dt-elemcontent" term="Element content">An element <termref
1363def="dt-stag">type</termref> has <term>element content</term> when elements
1364of that type must contain only <termref def="dt-parentchild">child</termref>
1365elements (no character data), optionally separated by white space (characters
1366matching the nonterminal <nt def="NT-S">S</nt>).</termdef><termdef id="dt-content-model"
1367term="Content model">In this case, the constraint includes a <phrase diff="chg"><loc
1368role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E55">[E55]</loc><term>content
1369model</term></phrase>, a simple grammar governing the allowed types of the
1370child elements and the order in which they are allowed to appear.</termdef>
1371The grammar is built on content particles (<nt def="NT-cp">cp</nt>s), which
1372consist of names, choice lists of content particles, or sequence lists of
1373content particles:</p>
1374<scrap lang="ebnf">
1375<head>Element-content Models</head>
1376<prodgroup pcw2="5.5" pcw4="16" pcw5="11">
1377<prod id="NT-children">
1378<lhs>children</lhs><rhs>(<nt def="NT-choice">choice</nt> | <nt def="NT-seq">seq</nt>)
1379('?' | '*' | '+')?</rhs>
1380</prod>
1381<prod id="NT-cp">
1382<lhs>cp</lhs><rhs>(<nt def="NT-Name">Name</nt> | <nt def="NT-choice">choice</nt>
1383| <nt def="NT-seq">seq</nt>) ('?' | '*' | '+')?</rhs>
1384</prod>
1385<prod id="NT-choice" diff="chg">
1386<lhs>choice</lhs><rhs>'(' <nt def="NT-S">S</nt>? <nt def="NT-cp">cp</nt> ( <nt
1387def="NT-S">S</nt>? '|' <nt def="NT-S">S</nt>? <nt def="NT-cp">cp</nt> )+ <nt
1388def="NT-S">S</nt>? ')'</rhs><com><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E50">[E50]</loc></com>
1389<com><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E52">[E52]</loc></com>
1390<vc def="vc-PEinGroup"/>
1391</prod>
1392<prod id="NT-seq" diff="chg">
1393<lhs>seq</lhs><rhs>'(' <nt def="NT-S">S</nt>? <nt def="NT-cp">cp</nt> ( <nt
1394def="NT-S">S</nt>? ',' <nt def="NT-S">S</nt>? <nt def="NT-cp">cp</nt> )* <nt
1395def="NT-S">S</nt>? ')'</rhs><com><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E52">[E52]</loc></com>
1396<vc def="vc-PEinGroup"/>
1397</prod>
1398</prodgroup></scrap>
1399<p>where each <nt def="NT-Name">Name</nt> is the type of an element which
1400may appear as a <termref def="dt-parentchild">child</termref>. Any content
1401particle in a choice list may appear in the <termref def="dt-elemcontent">element
1402content</termref> at the location where the choice list appears in the grammar;
1403content particles occurring in a sequence list must each appear in the <termref
1404def="dt-elemcontent">element content</termref> in the order given in the list.
1405The optional character following a name or list governs whether the element
1406or the content particles in the list may occur one or more (<code>+</code>),
1407zero or more (<code>*</code>), or zero or one times (<code>?</code>). The
1408absence of such an operator means that the element or content particle must
1409appear exactly once. This syntax and meaning are identical to those used in
1410the productions in this specification.</p>
1411<p>The content of an element matches a content model if and only if it is
1412possible to trace out a path through the content model, obeying the sequence,
1413choice, and repetition operators and matching each element in the content
1414against an element type in the content model. <termref def="dt-compat">For
1415compatibility</termref>, it is an error if an element in the document can
1416match more than one occurrence of an element type in the content model. For
1417more information, see <specref ref="determinism"/>.</p>
1418<!--appendix <specref ref="determinism"/>.-->
1419<!-- appendix on deterministic content models. -->
1420<vcnote id="vc-PEinGroup"><head>Proper Group/PE Nesting</head><p>Parameter-entity <termref
1421def="dt-repltext">replacement text</termref> must be properly nested with <phrase
1422diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E11">[E11]</loc>parenthesized</phrase>
1423groups. That is to say, if either of the opening or closing parentheses in
1424a <nt def="NT-choice">choice</nt>, <nt def="NT-seq">seq</nt>, or <nt def="NT-Mixed">Mixed</nt>
1425construct is contained in the replacement text for a <termref def="dt-PERef">parameter
1426entity</termref>, both must be contained in the same replacement text.</p>
1427<p diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E19">[E19]</loc><termref
1428def="dt-interop">For interoperability</termref>, if a parameter-entity reference
1429appears in a <nt def="NT-choice">choice</nt>, <nt def="NT-seq">seq</nt>, or <nt
1430def="NT-Mixed">Mixed</nt> construct, its replacement text should contain at
1431least one non-blank character, and neither the first nor last non-blank character
1432of the replacement text should be a connector (<code>|</code> or <code>,</code>).</p>
1433</vcnote>
1434<p>Examples of element-content models:</p>
1435<eg>&lt;!ELEMENT spec (front, body, back?)>
1436&lt;!ELEMENT div1 (head, (p | list | note)*, div2*)>
1437&lt;!ELEMENT dictionary-body (%div.mix; | %dict.mix;)*></eg>
1438</div3>
1439<div3 id="sec-mixed-content">
1440<head>Mixed Content</head>
1441<p><termdef id="dt-mixed" term="Mixed Content">An element <termref def="dt-stag">type</termref>
1442has <term>mixed content</term> when elements of that type may contain character
1443data, optionally interspersed with <termref def="dt-parentchild">child</termref>
1444elements.</termdef> In this case, the types of the child elements may be constrained,
1445but not their order or their number of occurrences:</p>
1446<scrap lang="ebnf">
1447<head>Mixed-content Declaration</head>
1448<prodgroup pcw2="5.5" pcw4="16" pcw5="11">
1449<prod id="NT-Mixed">
1450<lhs>Mixed</lhs><rhs>'(' <nt def="NT-S">S</nt>? '#PCDATA' (<nt def="NT-S">S</nt>?
1451'|' <nt def="NT-S">S</nt>? <nt def="NT-Name">Name</nt>)* <nt def="NT-S">S</nt>?
1452')*' </rhs>
1453<rhs>| '(' <nt def="NT-S">S</nt>? '#PCDATA' <nt def="NT-S">S</nt>? ')' </rhs>
1454<vc def="vc-PEinGroup"/><vc def="vc-MixedChildrenUnique"/>
1455</prod>
1456</prodgroup></scrap>
1457<p>where the <nt def="NT-Name">Name</nt>s give the types of elements that
1458may appear as children. <phrase diff="add"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E10">[E10]</loc>The
1459keyword <kw>#PCDATA</kw> derives historically from the term <quote>parsed
1460character data.</quote></phrase></p>
1461<vcnote id="vc-MixedChildrenUnique"><head>No Duplicate Types</head><p>The
1462same name must not appear more than once in a single mixed-content declaration.</p>
1463</vcnote>
1464<p>Examples of mixed content declarations:</p>
1465<eg>&lt;!ELEMENT p (#PCDATA|a|ul|b|i|em)*>
1466&lt;!ELEMENT p (#PCDATA | %font; | %phrase; | %special; | %form;)* >
1467&lt;!ELEMENT b (#PCDATA)></eg>
1468</div3>
1469</div2>
1470<div2 id="attdecls">
1471<head>Attribute-List Declarations</head>
1472<p><termref def="dt-attr">Attributes</termref> are used to associate name-value
1473pairs with <termref def="dt-element">elements</termref>. Attribute specifications
1474may appear only within <termref def="dt-stag">start-tags</termref> and <termref
1475def="dt-eetag">empty-element tags</termref>; thus, the productions used to
1476recognize them appear in <specref ref="sec-starttags"/>. Attribute-list declarations
1477may be used:</p>
1478<ulist>
1479<item><p>To define the set of attributes pertaining to a given element type.</p>
1480</item>
1481<item><p>To establish type constraints for these attributes.</p></item>
1482<item><p>To provide <termref def="dt-default">default values</termref> for
1483attributes.</p></item>
1484</ulist>
1485<p><termdef id="dt-attdecl" term="Attribute-List Declaration"> <term>Attribute-list
1486declarations</term> specify the name, data type, and default value (if any)
1487of each attribute associated with a given element type:</termdef></p>
1488<scrap lang="ebnf">
1489<head>Attribute-list Declaration</head>
1490<prod id="NT-AttlistDecl">
1491<lhs>AttlistDecl</lhs><rhs>'&lt;!ATTLIST' <nt def="NT-S">S</nt> <nt def="NT-Name">Name</nt> <nt
1492def="NT-AttDef">AttDef</nt>* <nt def="NT-S">S</nt>? '>'</rhs>
1493</prod>
1494<prod id="NT-AttDef">
1495<lhs>AttDef</lhs><rhs><nt def="NT-S">S</nt> <nt def="NT-Name">Name</nt> <nt
1496def="NT-S">S</nt> <nt def="NT-AttType">AttType</nt> <nt def="NT-S">S</nt> <nt
1497def="NT-DefaultDecl">DefaultDecl</nt></rhs>
1498</prod>
1499</scrap>
1500<p>The <nt def="NT-Name">Name</nt> in the <nt def="NT-AttlistDecl">AttlistDecl</nt>
1501rule is the type of an element. At user option, an XML processor may issue
1502a warning if attributes are declared for an element type not itself declared,
1503but this is not an error. The <nt def="NT-Name">Name</nt> in the <nt def="NT-AttDef">AttDef</nt>
1504rule is the name of the attribute.</p>
1505<p>When more than one <nt def="NT-AttlistDecl">AttlistDecl</nt> is provided
1506for a given element type, the contents of all those provided are merged. When
1507more than one definition is provided for the same attribute of a given element
1508type, the first declaration is binding and later declarations are ignored. <phrase
1509diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E9">[E9]</loc><termref
1510def="dt-interop">For interoperability,</termref> writers of DTDs may choose
1511to provide at most one attribute-list declaration for a given element type,
1512at most one attribute definition for a given attribute name in an attribute-list
1513declaration, and at least one attribute definition in each attribute-list
1514declaration.</phrase> For interoperability, an XML processor may at user option
1515issue a warning when more than one attribute-list declaration is provided
1516for a given element type, or more than one attribute definition is provided
1517for a given attribute, but this is not an error.</p>
1518<div3 id="sec-attribute-types">
1519<head>Attribute Types</head>
1520<p>XML attribute types are of three kinds: a string type, a set of tokenized
1521types, and enumerated types. The string type may take any literal string as
1522a value; the tokenized types have varying lexical and semantic constraints<phrase
1523diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E8">[E8]</loc>.
1524The validity constraints noted in the grammar are applied after the attribute
1525value has been normalized as described in <specref ref="attdecls"/>.</phrase></p>
1526<scrap lang="ebnf">
1527<head>Attribute Types</head>
1528<prodgroup pcw4="14" pcw5="11.5">
1529<prod id="NT-AttType">
1530<lhs>AttType</lhs><rhs><nt def="NT-StringType">StringType</nt> | <nt def="NT-TokenizedType">TokenizedType</nt>
1531| <nt def="NT-EnumeratedType">EnumeratedType</nt> </rhs>
1532</prod>
1533<prod id="NT-StringType">
1534<lhs>StringType</lhs><rhs>'CDATA'</rhs>
1535</prod>
1536<prod id="NT-TokenizedType">
1537<lhs>TokenizedType</lhs><rhs>'ID'</rhs><vc def="id"/><vc def="one-id-per-el"/>
1538<vc def="id-default"/>
1539<rhs>| 'IDREF'</rhs><vc def="idref"/>
1540<rhs>| 'IDREFS'</rhs><vc def="idref"/>
1541<rhs>| 'ENTITY'</rhs><vc def="entname"/>
1542<rhs>| 'ENTITIES'</rhs><vc def="entname"/>
1543<rhs>| 'NMTOKEN'</rhs><vc def="nmtok"/>
1544<rhs>| 'NMTOKENS'</rhs><vc def="nmtok"/>
1545</prod>
1546</prodgroup></scrap>
1547<vcnote id="id"><head>ID</head><p>Values of type <kw>ID</kw> must match the <nt
1548def="NT-Name">Name</nt> production. A name must not appear more than once
1549in an XML document as a value of this type; i.e., ID values must uniquely
1550identify the elements which bear them.</p>
1551</vcnote>
1552<vcnote id="one-id-per-el"><head>One ID per Element Type</head><p>No element
1553type may have more than one ID attribute specified.</p>
1554</vcnote>
1555<vcnote id="id-default"><head>ID Attribute Default</head><p>An ID attribute
1556must have a declared default of <kw>#IMPLIED</kw> or <kw>#REQUIRED</kw>.</p>
1557</vcnote>
1558<vcnote id="idref"><head>IDREF</head><p>Values of type <kw>IDREF</kw> must
1559match the <nt def="NT-Name">Name</nt> production, and values of type <kw>IDREFS</kw>
1560must match <nt def="NT-Names">Names</nt>; each <nt def="NT-Name">Name</nt>
1561must match the value of an ID attribute on some element in the XML document;
1562i.e. <kw>IDREF</kw> values must match the value of some ID attribute.</p>
1563</vcnote>
1564<vcnote id="entname"><head>Entity Name</head><p>Values of type <kw>ENTITY</kw>
1565must match the <nt def="NT-Name">Name</nt> production, values of type <kw>ENTITIES</kw>
1566must match <nt def="NT-Names">Names</nt>; each <nt def="NT-Name">Name</nt>
1567must match the name of an <termref def="dt-unparsed">unparsed entity</termref>
1568declared in the <termref def="dt-doctype">DTD</termref>.</p>
1569</vcnote>
1570<vcnote id="nmtok"><head>Name Token</head><p>Values of type <kw>NMTOKEN</kw>
1571must match the <nt def="NT-Nmtoken">Nmtoken</nt> production; values of type <kw>NMTOKENS</kw>
1572must match <termref def="NT-Nmtokens">Nmtokens</termref>.</p>
1573</vcnote>
1574<!-- why?
1575<p>The XML processor must normalize attribute values before
1576passing them to the application, as described in
1577<specref ref="AVNormalize"/>.</p>-->
1578<p><termdef id="dt-enumerated" term="Enumerated Attribute
1579Values"><term>Enumerated attributes</term> can take one of a list of values
1580provided in the declaration</termdef>. There are two kinds of enumerated types:</p>
1581<scrap lang="ebnf">
1582<head>Enumerated Attribute Types</head>
1583<prod id="NT-EnumeratedType">
1584<lhs>EnumeratedType</lhs><rhs><nt def="NT-NotationType">NotationType</nt>
1585| <nt def="NT-Enumeration">Enumeration</nt> </rhs>
1586</prod>
1587<prod id="NT-NotationType">
1588<lhs>NotationType</lhs><rhs>'NOTATION' <nt def="NT-S">S</nt> '(' <nt def="NT-S">S</nt>? <nt
1589def="NT-Name">Name</nt> (<nt def="NT-S">S</nt>? '|' <nt def="NT-S">S</nt>? <nt
1590def="NT-Name">Name</nt>)* <nt def="NT-S">S</nt>? ')' </rhs><vc def="notatn"/>
1591<vc def="OneNotationPer" diff="add"/><vc def="NoNotationEmpty" diff="add"/>
1592</prod>
1593<prod id="NT-Enumeration">
1594<lhs>Enumeration</lhs><rhs>'(' <nt def="NT-S">S</nt>? <nt def="NT-Nmtoken">Nmtoken</nt>
1595(<nt def="NT-S">S</nt>? '|' <nt def="NT-S">S</nt>? <nt def="NT-Nmtoken">Nmtoken</nt>)* <nt
1596def="NT-S">S</nt>? ')'</rhs><vc def="enum"/>
1597</prod>
1598</scrap>
1599<p>A <kw>NOTATION</kw> attribute identifies a <termref def="dt-notation">notation</termref>,
1600declared in the DTD with associated system and/or public identifiers, to be
1601used in interpreting the element to which the attribute is attached.</p>
1602<vcnote id="notatn"><head>Notation Attributes</head><p>Values of this type
1603must match one of the <titleref href="#Notations">notation</titleref> names
1604included in the declaration; all notation names in the declaration must be
1605declared.</p>
1606</vcnote>
1607<vcnote id="OneNotationPer" diff="add"><head><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E7">[E7]</loc>One
1608Notation Per Element Type</head><p>No element type may have more than one <kw>NOTATION</kw>
1609attribute specified.</p>
1610</vcnote>
1611<vcnote id="NoNotationEmpty" diff="add"><head><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E68">[E68]</loc>No
1612Notation on Empty Element</head><p><termref def="dt-compat">For compatibility</termref>,
1613an attribute of type <kw>NOTATION</kw> must not be declared on an element
1614declared <kw>EMPTY</kw>.</p>
1615</vcnote>
1616<vcnote id="enum"><head>Enumeration</head><p>Values of this type must match
1617one of the <nt def="NT-Nmtoken">Nmtoken</nt> tokens in the declaration.</p>
1618</vcnote>
1619<p><termref def="dt-interop">For interoperability,</termref> the same <nt
1620def="NT-Nmtoken">Nmtoken</nt> should not occur more than once in the enumerated
1621attribute types of a single element type.</p>
1622</div3>
1623<div3 id="sec-attr-defaults">
1624<head>Attribute Defaults</head>
1625<p>An <termref def="dt-attdecl">attribute declaration</termref> provides information
1626on whether the attribute's presence is required, and if not, how an XML processor
1627should react if a declared attribute is absent in a document.</p>
1628<scrap lang="ebnf">
1629<head>Attribute Defaults</head>
1630<prodgroup pcw4="14" pcw5="11.5">
1631<prod id="NT-DefaultDecl">
1632<lhs>DefaultDecl</lhs><rhs>'#REQUIRED' |&nbsp;'#IMPLIED' </rhs>
1633<rhs>| (('#FIXED' S)? <nt def="NT-AttValue">AttValue</nt>)</rhs><vc def="RequiredAttr"/>
1634<vc def="defattrvalid"/><wfc def="CleanAttrVals"/><vc def="FixedAttr"/>
1635</prod>
1636</prodgroup></scrap>
1637<p>In an attribute declaration, <kw>#REQUIRED</kw> means that the attribute
1638must always be provided, <kw>#IMPLIED</kw> that no default value is provided. <!-- not any more!!
1639<kw>#IMPLIED</kw> means that if the attribute is omitted
1640from an element of this type,
1641the XML processor must inform the application
1642that no value was specified; no constraint is placed on the behavior
1643of the application. --> <termdef id="dt-default" term="Attribute Default">If
1644the declaration is neither <kw>#REQUIRED</kw> nor <kw>#IMPLIED</kw>, then
1645the <nt def="NT-AttValue">AttValue</nt> value contains the declared <term>default</term>
1646value; the <kw>#FIXED</kw> keyword states that the attribute must always have
1647the default value. If a default value is declared, when an XML processor encounters
1648an omitted attribute, it is to behave as though the attribute were present
1649with the declared default value.</termdef></p>
1650<vcnote id="RequiredAttr"><head>Required Attribute</head><p>If the default
1651declaration is the keyword <kw>#REQUIRED</kw>, then the attribute must be
1652specified for all elements of the type in the attribute-list declaration.</p>
1653</vcnote>
1654<vcnote id="defattrvalid"><head>Attribute Default Legal</head><p>The declared
1655default value must meet the lexical constraints of the declared attribute
1656type.</p>
1657</vcnote>
1658<vcnote id="FixedAttr"><head>Fixed Attribute Default</head><p>If an attribute
1659has a default value declared with the <kw>#FIXED</kw> keyword, instances of
1660that attribute must match the default value.</p>
1661</vcnote>
1662<p>Examples of attribute-list declarations:</p>
1663<eg>&lt;!ATTLIST termdef
1664          id      ID      #REQUIRED
1665          name    CDATA   #IMPLIED>
1666&lt;!ATTLIST list
1667          type    (bullets|ordered|glossary)  "ordered">
1668&lt;!ATTLIST form
1669          method  CDATA   #FIXED "POST"></eg>
1670</div3>
1671<div3 id="AVNormalize" diff="chg">
1672<head><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E70">[E70]</loc>Attribute-Value
1673Normalization</head>
1674<p>Before the value of an attribute is passed to the application or checked
1675for validity, the XML processor must normalize the attribute value by applying
1676the algorithm below, or by using some other method such that the value passed
1677to the application is the same as that produced by the algorithm.</p>
1678<olist>
1679<item><p>All line breaks must have been normalized on input to #xA as described
1680in <specref ref="sec-line-ends"/>, so the rest of this algorithm operates
1681on text normalized in this way.</p></item>
1682<item><p>Begin with a normalized value consisting of the empty string.</p>
1683</item>
1684<item><p>For each character, entity reference, or character reference in the
1685unnormalized attribute value, beginning with the first and continuing to the
1686last, do the following:</p>
1687<ulist>
1688<item><p>For a character reference, append the referenced character to the
1689normalized value.</p></item>
1690<item><p>For an entity reference, recursively apply step 3 of this algorithm
1691to the replacement text of the entity.</p></item>
1692<item><p>For a white space character (#x20, #xD, #xA, #x9), append a space
1693character (#x20) to the normalized value.</p></item>
1694<item><p>For another character, append the character to the normalized value.</p>
1695</item>
1696</ulist>
1697</item>
1698</olist>
1699<p>If the attribute type is not CDATA, then the XML processor must further
1700process the normalized attribute value by discarding any leading and trailing
1701space (#x20) characters, and by replacing sequences of space (#x20) characters
1702by a single space (#x20) character.</p>
1703<p>Note that if the unnormalized attribute value contains a character reference
1704to a white space character other than space (#x20), the normalized value contains
1705the referenced character itself (#xD, #xA or #x9). This contrasts with the
1706case where the unnormalized value contains a white space character (not a
1707reference), which is replaced with a space character (#x20) in the normalized
1708value and also contrasts with the case where the unnormalized value contains
1709an entity reference whose replacement text contains a white space character;
1710being recursively processed, the white space character is replaced with a
1711space character (#x20) in the normalized value.</p>
1712<p>All attributes for which no declaration has been read should be treated
1713by a non-validating <phrase diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E95">[E95]</loc>processor</phrase>
1714as if declared <kw>CDATA</kw>.</p>
1715<p>Following are examples of attribute normalization. Given the following
1716declarations:</p>
1717<eg>&lt;!ENTITY d "&amp;#xD;">
1718&lt;!ENTITY a "&amp;#xA;">
1719&lt;!ENTITY da "&amp;#xD;&amp;#xA;"></eg>
1720<p>the attribute specifications in the left column below would be normalized
1721to the character sequences of the middle column if the attribute <att>a</att>
1722is declared <kw>NMTOKENS</kw> and to those of the right columns if <att>a</att>
1723is declared <kw>CDATA</kw>.</p>
1724<table border="1" frame="border"><thead><tr><th>Attribute specification</th>
1725<th>a is NMTOKENS</th><th>a is CDATA</th></tr></thead><tbody><tr><td><eg>a="
1726
1727xyz"</eg></td><td><code>x y z</code></td><td><code>#x20 #x20 x y z</code></td>
1728</tr><tr><td><eg>a="&amp;d;&amp;d;A&amp;a;&amp;a;B&amp;da;"</eg></td><td><code>A
1729#x20 B</code></td><td><code>#x20 #x20 A #x20 #x20 B #x20 #x20</code></td>
1730</tr><tr><td><eg>a=
1731"&amp;#xd;&amp;#xd;A&amp;#xa;&amp;#xa;B&amp;#xd;&amp;#xa;"</eg></td><td><code>#xD
1732#xD A #xA #xA B #xD #xA</code></td><td><code>#xD #xD A #xA #xA B #xD #xD</code></td>
1733</tr></tbody></table>
1734<p>Note that the last example is invalid (but well-formed) if <att>a</att>
1735is declared to be of type <kw>NMTOKENS</kw>.</p>
1736</div3>
1737</div2>
1738<div2 id="sec-condition-sect">
1739<head>Conditional Sections</head>
1740<p><termdef id="dt-cond-section" term="conditional section"> <term>Conditional
1741sections</term> are portions of the <termref def="dt-doctype">document type
1742declaration external subset</termref> which are included in, or excluded from,
1743the logical structure of the DTD based on the keyword which governs them.</termdef></p>
1744<scrap lang="ebnf">
1745<head>Conditional Section</head>
1746<prodgroup pcw2="9" pcw4="14.5">
1747<prod id="NT-conditionalSect">
1748<lhs>conditionalSect</lhs><rhs><nt def="NT-includeSect">includeSect</nt> | <nt
1749def="NT-ignoreSect">ignoreSect</nt> </rhs>
1750</prod>
1751<prod id="NT-includeSect">
1752<lhs>includeSect</lhs><rhs>'&lt;![' S? 'INCLUDE' S? '[' <nt def="NT-extSubsetDecl">extSubsetDecl</nt>
1753']]&gt;' </rhs><com><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E90">[E90]</loc></com>
1754<vc def="condsec-nesting" diff="add"/>
1755</prod>
1756<prod id="NT-ignoreSect">
1757<lhs>ignoreSect</lhs><rhs>'&lt;![' S? 'IGNORE' S? '[' <nt def="NT-ignoreSectContents">ignoreSectContents</nt>*
1758']]&gt;'</rhs><com><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E90">[E90]</loc></com>
1759<vc def="condsec-nesting" diff="add"/>
1760</prod>
1761<prod id="NT-ignoreSectContents">
1762<lhs>ignoreSectContents</lhs><rhs><nt def="NT-Ignore">Ignore</nt> ('&lt;![' <nt
1763def="NT-ignoreSectContents">ignoreSectContents</nt> ']]&gt;' <nt def="NT-Ignore">Ignore</nt>)*</rhs>
1764</prod>
1765<prod id="NT-Ignore">
1766<lhs>Ignore</lhs><rhs><nt def="NT-Char">Char</nt>* - (<nt def="NT-Char">Char</nt>*
1767('&lt;![' | ']]&gt;') <nt def="NT-Char">Char</nt>*) </rhs>
1768</prod>
1769</prodgroup></scrap>
1770<vcnote id="condsec-nesting" diff="add"><head><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E90">[E90]</loc>Proper
1771Conditional Section/PE Nesting</head><p>If any of the "<code>&lt;![</code>",
1772"<code>[</code>", or "<code>]]&gt;</code>" of a conditional section is contained
1773in the replacement text for a parameter-entity reference, all of them must
1774be contained in the same replacement text.</p>
1775</vcnote>
1776<p>Like the internal and external DTD subsets, a conditional section may contain
1777one or more complete declarations, comments, processing instructions, or nested
1778conditional sections, intermingled with white space.</p>
1779<p>If the keyword of the conditional section is <kw>INCLUDE</kw>, then the
1780contents of the conditional section are part of the DTD. If the keyword of
1781the conditional section is <kw>IGNORE</kw>, then the contents of the conditional
1782section are not logically part of the DTD. <phrase diff="del"><loc role="erratumref"
1783href="http://www.w3.org/XML/xml-19980210-errata#E90">[E90]</loc>Note that
1784for reliable parsing, the contents of even ignored conditional sections must
1785be read in order to detect nested conditional sections and ensure that the
1786end of the outermost (ignored) conditional section is properly detected.</phrase>
1787If a conditional section with a keyword of <kw>INCLUDE</kw> occurs within
1788a larger conditional section with a keyword of <kw>IGNORE</kw>, both the outer
1789and the inner conditional sections are ignored.<phrase diff="add"> <loc role="erratumref"
1790href="http://www.w3.org/XML/xml-19980210-errata#E90">[E90]</loc>The contents
1791of an ignored conditional section are parsed by ignoring all characters after
1792the "<code>[</code>" following the keyword, except conditional section starts
1793"<code>&lt;![</code>" and ends "<code>]]&gt;</code>", until the matching conditional
1794section end is found. Parameter entity references are not recognized in this
1795process.</phrase></p>
1796<p>If the keyword of the conditional section is a parameter-entity reference,
1797the parameter entity must be replaced by its content before the processor
1798decides whether to include or ignore the conditional section.</p>
1799<p>An example:</p>
1800<eg>&lt;!ENTITY % draft 'INCLUDE' >
1801&lt;!ENTITY % final 'IGNORE' >
1802
1803&lt;![%draft;[
1804&lt;!ELEMENT book (comments*, title, body, supplements?)>
1805]]&gt;
1806&lt;![%final;[
1807&lt;!ELEMENT book (title, body, supplements?)>
1808]]&gt;</eg>
1809</div2>
1810<!--
1811<div2 id='sec-pass-to-app'>
1812<head>XML Processor Treatment of Logical Structure</head>
1813<p>When an XML processor encounters a start-tag, it must make
1814at least the following information available to the application:
1815<ulist>
1816<item>
1817<p>the element type's generic identifier</p>
1818</item>
1819<item>
1820<p>the names of attributes known to apply to this element type
1821(validating processors must make available names of all attributes
1822declared for the element type; non-validating processors must
1823make available at least the names of the attributes for which
1824values are specified.
1825</p>
1826</item>
1827</ulist>
1828</p>
1829</div2>
1830-->
1831</div1>
1832<!-- &Entities; -->
1833<div1 id="sec-physical-struct">
1834<head>Physical Structures</head>
1835<p><termdef id="dt-entity" term="Entity">An XML document may consist of one
1836or many storage units. <phrase diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E6">[E6]</loc>These
1837are called <term>entities</term>; they all have <term>content</term> and are
1838all (except for the <termref def="dt-docent">document entity</termref> and
1839the <termref def="dt-doctype">external DTD subset</termref>) identified by
1840entity <term>name</term></phrase>.</termdef> Each XML document has one entity
1841called the <termref def="dt-docent">document entity</termref>, which serves
1842as the starting point for the <termref def="dt-xml-proc">XML processor</termref>
1843and may contain the whole document.</p>
1844<p>Entities may be either parsed or unparsed. <termdef id="dt-parsedent" term="Text Entity">A <term>parsed
1845entity's</term> contents are referred to as its <termref def="dt-repltext">replacement
1846text</termref>; this <termref def="dt-text">text</termref> is considered an
1847integral part of the document.</termdef></p>
1848<p><termdef id="dt-unparsed" term="Unparsed Entity">An <term>unparsed entity</term>
1849is a resource whose contents may or may not be <termref def="dt-text">text</termref>,
1850and if text, <phrase diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E25">[E25]</loc>may
1851be other than</phrase> XML. Each unparsed entity has an associated <termref
1852def="dt-notation">notation</termref>, identified by name. Beyond a requirement
1853that an XML processor make the identifiers for the entity and notation available
1854to the application, XML places no constraints on the contents of unparsed
1855entities.</termdef></p>
1856<p>Parsed entities are invoked by name using entity references; unparsed entities
1857by name, given in the value of <kw>ENTITY</kw> or <kw>ENTITIES</kw> attributes.</p>
1858<p><termdef id="gen-entity" term="general entity"><term>General entities</term>
1859are entities for use within the document content. In this specification, general
1860entities are sometimes referred to with the unqualified term <emph>entity</emph>
1861when this leads to no ambiguity.</termdef> <termdef id="dt-PE" term="Parameter entity"><phrase
1862diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E53">[E53]</loc><term>Parameter
1863entities</term></phrase> are parsed entities for use within the DTD.</termdef>
1864These two types of entities use different forms of reference and are recognized
1865in different contexts. Furthermore, they occupy different namespaces; a parameter
1866entity and a general entity with the same name are two distinct entities.</p>
1867<div2 id="sec-references">
1868<head>Character and Entity References</head>
1869<p><termdef id="dt-charref" term="Character Reference"> A <term>character
1870reference</term> refers to a specific character in the ISO/IEC 10646 character
1871set, for example one not directly accessible from available input devices.</termdef></p>
1872<scrap lang="ebnf">
1873<head>Character Reference</head>
1874<prod id="NT-CharRef">
1875<lhs>CharRef</lhs><rhs>'&amp;#' [0-9]+ ';' </rhs>
1876<rhs>| '&hcro;' [0-9a-fA-F]+ ';'</rhs><wfc def="wf-Legalchar"/>
1877</prod>
1878</scrap>
1879<wfcnote id="wf-Legalchar"><head>Legal Character</head><p>Characters referred
1880to using character references must match the production for <termref def="NT-Char">Char</termref>.</p>
1881</wfcnote>
1882<p>If the character reference begins with <quote><code>&amp;#x</code></quote>,
1883the digits and letters up to the terminating <code>;</code> provide a hexadecimal
1884representation of the character's code point in ISO/IEC 10646. If it begins
1885just with <quote><code>&amp;#</code></quote>, the digits up to the terminating <code>;</code>
1886provide a decimal representation of the character's code point.</p>
1887<p><termdef id="dt-entref" term="Entity Reference">An <term>entity reference</term>
1888refers to the content of a named entity.</termdef> <termdef id="dt-GERef"
1889term="General Entity Reference">References to parsed general entities use
1890ampersand (<code>&amp;</code>) and semicolon (<code>;</code>) as delimiters.</termdef> <termdef
1891id="dt-PERef" term="Parameter-entity reference"> <term>Parameter-entity references</term>
1892use percent-sign (<code>%</code>) and semicolon (<code>;</code>) as delimiters.</termdef></p>
1893<scrap lang="ebnf">
1894<head>Entity Reference</head>
1895<prod id="NT-Reference">
1896<lhs>Reference</lhs><rhs><nt def="NT-EntityRef">EntityRef</nt> | <nt def="NT-CharRef">CharRef</nt></rhs>
1897</prod>
1898<prod id="NT-EntityRef">
1899<lhs>EntityRef</lhs><rhs>'&amp;' <nt def="NT-Name">Name</nt> ';'</rhs><wfc
1900def="wf-entdeclared"/><vc def="vc-entdeclared"/><wfc def="textent"/><wfc def="norecursion"/>
1901</prod>
1902<prod id="NT-PEReference">
1903<lhs>PEReference</lhs><rhs>'%' <nt def="NT-Name">Name</nt> ';'</rhs><vc def="vc-entdeclared"/>
1904<wfc def="norecursion"/><wfc def="indtd"/>
1905</prod>
1906</scrap>
1907<wfcnote id="wf-entdeclared"><head>Entity Declared</head><p>In a document
1908without any DTD, a document with only an internal DTD subset which contains
1909no parameter entity references, or a document with <quote><code>standalone='yes'</code></quote>, <phrase
1910diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E34">[E34]</loc>for
1911an entity reference that does not occur within the external subset or a parameter
1912entity, the <nt def="NT-Name">Name</nt> given in the entity reference must <termref
1913def="dt-match">match</termref> that in an <titleref href="#sec-entity-decl">entity
1914declaration</titleref> that does not occur within the external subset or a
1915parameter entity</phrase>, except that well-formed documents need not declare
1916any of the following entities: &magicents;. <phrase diff="del"><loc role="erratumref"
1917href="http://www.w3.org/XML/xml-19980210-errata#E29">[E29]</loc>The declaration
1918of a parameter entity must precede any reference to it. Similarly, </phrase>The
1919declaration of a general entity must precede any reference to it which appears
1920in a default value in an attribute-list declaration.</p>
1921<p>Note that if entities are declared in the external subset or in external
1922parameter entities, a non-validating processor is <titleref href="#include-if-valid">not
1923obligated to</titleref> read and process their declarations; for such documents,
1924the rule that an entity must be declared is a well-formedness constraint only
1925if <titleref href="#sec-rmd">standalone='yes'</titleref>.</p>
1926</wfcnote>
1927<vcnote id="vc-entdeclared"><head>Entity Declared</head><p>In a document with
1928an external subset or external parameter entities with <quote><code>standalone='no'</code></quote>,
1929the <nt def="NT-Name">Name</nt> given in the entity reference must <termref
1930def="dt-match">match</termref> that in an <titleref href="#sec-entity-decl">entity
1931declaration</titleref>. For interoperability, valid documents should declare
1932the entities &magicents;, in the form specified in <specref ref="sec-predefined-ent"/>.
1933The declaration of a parameter entity must precede any reference to it. Similarly,
1934the declaration of a general entity must precede any <phrase diff="chg"><loc
1935role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E92">[E92]</loc>attribute-list
1936declaration containing a default value with a direct or indirect reference
1937to that general entity.</phrase></p>
1938</vcnote>
1939<!-- FINAL EDIT: is this duplication too clumsy? -->
1940<wfcnote id="textent"><head>Parsed Entity</head><p>An entity reference must
1941not contain the name of an <termref def="dt-unparsed">unparsed entity</termref>.
1942Unparsed entities may be referred to only in <termref def="dt-attrval">attribute
1943values</termref> declared to be of type <kw>ENTITY</kw> or <kw>ENTITIES</kw>.</p>
1944</wfcnote>
1945<wfcnote id="norecursion"><head>No Recursion</head><p>A parsed entity must
1946not contain a recursive reference to itself, either directly or indirectly.</p>
1947</wfcnote>
1948<wfcnote id="indtd"><head>In DTD</head><p>Parameter-entity references may
1949only appear in the <termref def="dt-doctype">DTD</termref>.</p>
1950</wfcnote>
1951<p>Examples of character and entity references:</p>
1952<eg>Type &lt;key>less-than&lt;/key> (&hcro;3C;) to save options.
1953This document was prepared on &amp;docdate; and
1954is classified &amp;security-level;.</eg>
1955<p>Example of a parameter-entity reference:</p>
1956<eg><![CDATA[<!-- declare the parameter entity "ISOLat2"... -->
1957<!ENTITY % ISOLat2
1958         SYSTEM "http://www.xml.com/iso/isolat2-xml.entities" >
1959<!-- ... now reference it. -->
1960%ISOLat2;]]></eg>
1961</div2>
1962<div2 id="sec-entity-decl">
1963<head>Entity Declarations</head>
1964<p><termdef id="dt-entdecl" term="entity declaration"> Entities are declared
1965thus:</termdef></p>
1966<scrap lang="ebnf">
1967<head>Entity Declaration</head>
1968<prodgroup pcw2="5" pcw4="18.5">
1969<prod id="NT-EntityDecl">
1970<lhs>EntityDecl</lhs><rhs><nt def="NT-GEDecl">GEDecl</nt><!--</rhs><com>General entities</com>
1971<rhs>--> | <nt def="NT-PEDecl">PEDecl</nt></rhs>
1972<!--<com>Parameter entities</com>-->
1973</prod>
1974<prod id="NT-GEDecl">
1975<lhs>GEDecl</lhs><rhs>'&lt;!ENTITY' <nt def="NT-S">S</nt> <nt def="NT-Name">Name</nt> <nt
1976def="NT-S">S</nt> <nt def="NT-EntityDef">EntityDef</nt> <nt def="NT-S">S</nt>?
1977'>'</rhs>
1978</prod>
1979<prod id="NT-PEDecl">
1980<lhs>PEDecl</lhs><rhs>'&lt;!ENTITY' <nt def="NT-S">S</nt> '%' <nt def="NT-S">S</nt> <nt
1981def="NT-Name">Name</nt> <nt def="NT-S">S</nt> <nt def="NT-PEDef">PEDef</nt> <nt
1982def="NT-S">S</nt>? '>'</rhs>
1983<!--<com>Parameter entities</com>-->
1984</prod>
1985<prod id="NT-EntityDef">
1986<lhs>EntityDef</lhs><rhs><nt def="NT-EntityValue">EntityValue</nt> <!--</rhs>
1987<rhs>-->| (<nt def="NT-ExternalID">ExternalID</nt> <nt def="NT-NDataDecl">NDataDecl</nt>?)</rhs>
1988<!-- <nt def='NT-ExternalDef'>ExternalDef</nt></rhs> -->
1989</prod>
1990<!-- FINAL EDIT: what happened to WFs here? -->
1991<prod id="NT-PEDef">
1992<lhs>PEDef</lhs><rhs><nt def="NT-EntityValue">EntityValue</nt> | <nt def="NT-ExternalID">ExternalID</nt></rhs>
1993</prod>
1994</prodgroup></scrap>
1995<p>The <nt def="NT-Name">Name</nt> identifies the entity in an <termref def="dt-entref">entity
1996reference</termref> or, in the case of an unparsed entity, in the value of
1997an <kw>ENTITY</kw> or <kw>ENTITIES</kw> attribute. If the same entity is declared
1998more than once, the first declaration encountered is binding; at user option,
1999an XML processor may issue a warning if entities are declared multiple times.</p>
2000<div3 id="sec-internal-ent">
2001<head>Internal Entities</head>
2002<p><termdef id="dt-internent" term="Internal Entity Replacement Text">If the
2003entity definition is an <nt def="NT-EntityValue">EntityValue</nt>, the defined
2004entity is called an <term>internal entity</term>. There is no separate physical
2005storage object, and the content of the entity is given in the declaration.</termdef>
2006Note that some processing of entity and character references in the <termref
2007def="dt-litentval">literal entity value</termref> may be required to produce
2008the correct <termref def="dt-repltext">replacement text</termref>: see <specref
2009ref="intern-replacement"/>.</p>
2010<p>An internal entity is a <termref def="dt-parsedent">parsed entity</termref>.</p>
2011<p>Example of an internal entity declaration:</p>
2012<eg>&lt;!ENTITY Pub-Status "This is a pre-release of the
2013 specification."></eg>
2014</div3>
2015<div3 id="sec-external-ent">
2016<head>External Entities</head>
2017<p><termdef id="dt-extent" term="External Entity">If the entity is not internal,
2018it is an <term>external entity</term>, declared as follows:</termdef></p>
2019<scrap lang="ebnf">
2020<head>External Entity Declaration</head>
2021<!--
2022<prod id='NT-ExternalDef'><lhs>ExternalDef</lhs>
2023<rhs></prod> -->
2024<prod id="NT-ExternalID">
2025<lhs>ExternalID</lhs><rhs>'SYSTEM' <nt def="NT-S">S</nt> <nt def="NT-SystemLiteral">SystemLiteral</nt></rhs>
2026<rhs>| 'PUBLIC' <nt def="NT-S">S</nt> <nt def="NT-PubidLiteral">PubidLiteral</nt> <nt
2027def="NT-S">S</nt> <nt def="NT-SystemLiteral">SystemLiteral</nt> </rhs>
2028</prod>
2029<prod id="NT-NDataDecl">
2030<lhs>NDataDecl</lhs><rhs><nt def="NT-S">S</nt> 'NDATA' <nt def="NT-S">S</nt> <nt
2031def="NT-Name">Name</nt></rhs><vc def="not-declared"/>
2032</prod>
2033</scrap>
2034<p>If the <nt def="NT-NDataDecl">NDataDecl</nt> is present, this is a general <termref
2035def="dt-unparsed">unparsed entity</termref>; otherwise it is a parsed entity.</p>
2036<vcnote id="not-declared"><head>Notation Declared</head><p>The <nt def="NT-Name">Name</nt>
2037must match the declared name of a <termref def="dt-notation">notation</termref>.</p>
2038</vcnote>
2039<p><phrase diff="chg"><termdef id="dt-sysid" term="System Identifier">The <nt
2040def="NT-SystemLiteral">SystemLiteral</nt> is called the entity's <term>system
2041identifier</term>. It is a <phrase diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E88">[E88]</loc>URI
2042reference</phrase><phrase diff="add"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E66">[E66]</loc>
2043(as defined in <bibref ref="rfc2396"/>, updated by <bibref ref="rfc2732"/>)</phrase>, <loc
2044role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E76">[E76]</loc>meant
2045to be dereferenced to obtain input for the XML processor to construct the
2046entity's replacement text.</termdef> It is an error for a fragment identifier
2047(beginning with a <code>#</code> character) to be part of a system identifier.</phrase>
2048Unless otherwise provided by information outside the scope of this specification
2049(e.g. a special XML element type defined by a particular DTD, or a processing
2050instruction defined by a particular application specification), relative URIs
2051are relative to the location of the resource within which the entity declaration
2052occurs. A URI might thus be relative to the <termref def="dt-docent">document
2053entity</termref>, to the entity containing the <termref def="dt-doctype">external
2054DTD subset</termref>, or to some other <termref def="dt-extent">external parameter
2055entity</termref>.</p>
2056<p diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E78">[E78]</loc>URI
2057references require encoding and escaping of certain characters. The disallowed
2058characters include all non-ASCII characters, plus the excluded characters
2059listed in Section 2.4 of <bibref ref="rfc2396"/>, except for the number sign
2060(<code>#</code>) and percent sign (<code>%</code>) characters and the square
2061bracket characters re-allowed in <bibref ref="rfc2732"/>. Disallowed characters
2062must be escaped as follows:</p>
2063<olist diff="add">
2064<item><p>Each disallowed character is converted to UTF-8 <bibref ref="rfc2279"/>
2065as one or more bytes.</p></item>
2066<item><p>Any octets corresponding to a disallowed character are escaped with
2067the URI escaping mechanism (that is, converted to <code>%</code><var>HH</var>,
2068where HH is the hexadecimal notation of the byte value).</p></item>
2069<item><p>The original character is replaced by the resulting character sequence.</p>
2070</item>
2071</olist>
2072<p><termdef id="dt-pubid" term="Public identifier"> In addition to a system
2073identifier, an external identifier may include a <term>public identifier</term>.</termdef>
2074An XML processor attempting to retrieve the entity's content may use the public
2075identifier to try to generate an alternative <phrase diff="chg"><loc role="erratumref"
2076href="http://www.w3.org/XML/xml-19980210-errata#E88">[E88]</loc>URI reference</phrase>.
2077If the processor is unable to do so, it must use the <phrase diff="chg"><loc
2078role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E88">[E88]</loc>URI
2079reference</phrase> specified in the system literal. Before a match is attempted,
2080all strings of white space in the public identifier must be normalized to
2081single space characters (#x20), and leading and trailing white space must
2082be removed.</p>
2083<p>Examples of external entity declarations:</p>
2084<eg>&lt;!ENTITY open-hatch
2085         SYSTEM "http://www.textuality.com/boilerplate/OpenHatch.xml">
2086&lt;!ENTITY open-hatch
2087         PUBLIC "-//Textuality//TEXT Standard open-hatch boilerplate//EN"
2088         "http://www.textuality.com/boilerplate/OpenHatch.xml">
2089&lt;!ENTITY hatch-pic
2090         SYSTEM "/grafix/OpenHatch.gif"
2091         NDATA gif ></eg>
2092</div3>
2093</div2>
2094<div2 id="TextEntities">
2095<head>Parsed Entities</head>
2096<div3 id="sec-TextDecl">
2097<head>The Text Declaration</head>
2098<p>External parsed entities <phrase diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E107">[E107]</loc>should</phrase
2099> each begin with a <term>text declaration</term>.</p>
2100<scrap lang="ebnf">
2101<head>Text Declaration</head>
2102<prodgroup pcw4="12.5" pcw5="13">
2103<prod id="NT-TextDecl">
2104<lhs>TextDecl</lhs><rhs>&pio; <nt def="NT-VersionInfo">VersionInfo</nt>? <nt
2105def="NT-EncodingDecl">EncodingDecl</nt> <nt def="NT-S">S</nt>? &pic;</rhs>
2106</prod>
2107</prodgroup></scrap>
2108<p>The text declaration must be provided literally, not by reference to a
2109parsed entity. No text declaration may appear at any position other than the
2110beginning of an external parsed entity. <phrase diff="add"><loc role="erratumref"
2111href="http://www.w3.org/XML/xml-19980210-errata#E94">[E94]</loc>The text declaration
2112in an external parsed entity is not considered part of its <termref def="dt-repltext">replacement
2113text</termref>.</phrase></p>
2114</div3>
2115<div3 id="wf-entities">
2116<head>Well-Formed Parsed Entities</head>
2117<p>The document entity is well-formed if it matches the production labeled <nt
2118def="NT-document">document</nt>. An external general parsed entity is well-formed
2119if it matches the production labeled <nt def="NT-extParsedEnt">extParsedEnt</nt>. <phrase
2120diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E109">[E109]</loc>All
2121external parameter entities are well-formed by definition.</phrase></p>
2122<scrap lang="ebnf">
2123<head>Well-Formed External Parsed Entity</head>
2124<prod id="NT-extParsedEnt">
2125<lhs>extParsedEnt</lhs><rhs><nt def="NT-TextDecl">TextDecl</nt>? <nt def="NT-content">content</nt></rhs>
2126</prod>
2127<prod id="NT-extPE" diff="del">
2128<lhs>extPE</lhs><rhs><nt def="NT-TextDecl">TextDecl</nt>? <nt def="NT-extSubsetDecl">extSubsetDecl</nt></rhs>
2129<com><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E109">[E109]</loc></com>
2130</prod>
2131</scrap>
2132<p>An internal general parsed entity is well-formed if its replacement text
2133matches the production labeled <nt def="NT-content">content</nt>. All internal
2134parameter entities are well-formed by definition.</p>
2135<p>A consequence of well-formedness in entities is that the logical and physical
2136structures in an XML document are properly nested; no <termref def="dt-stag">start-tag</termref>, <termref
2137def="dt-etag">end-tag</termref>, <termref def="dt-empty">empty-element tag</termref>, <termref
2138def="dt-element">element</termref>, <termref def="dt-comment">comment</termref>, <termref
2139def="dt-pi">processing instruction</termref>, <termref def="dt-charref">character
2140reference</termref>, or <termref def="dt-entref">entity reference</termref>
2141can begin in one entity and end in another.</p>
2142</div3>
2143<div3 id="charencoding">
2144<head>Character Encoding in Entities</head>
2145<p>Each external parsed entity in an XML document may use a different encoding
2146for its characters. All XML processors must be able to read entities in <phrase
2147diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E56">[E56]</loc>both
2148the UTF-8 and UTF-16 encodings.</phrase> <phrase diff="add"><loc role="erratumref"
2149href="http://www.w3.org/XML/xml-19980210-errata#E77">[E77]</loc>The terms <quote>UTF-8</quote>
2150and <quote>UTF-16</quote> in this specification do not apply to character
2151encodings with any other labels, even if the encodings or labels are very
2152similar to UTF-8 or UTF-16.</phrase></p>
2153<p>Entities encoded in UTF-16 must begin with the Byte Order Mark described
2154by <phrase diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E67">[E67]</loc>Annex
2155F of <bibref ref="ISO10646"/>, Annex H of <bibref ref="ISO10646-2000"/>, section
21562.4 of <bibref ref="Unicode"/>, and section 2.7 of <bibref ref="Unicode3"/></phrase>
2157(the ZERO WIDTH NO-BREAK SPACE character, #xFEFF). This is an encoding signature,
2158not part of either the markup or the character data of the XML document. XML
2159processors must be able to use this character to differentiate between UTF-8
2160and UTF-16 encoded documents.</p>
2161<p>Although an XML processor is required to read only entities in the UTF-8
2162and UTF-16 encodings, it is recognized that other encodings are used around
2163the world, and it may be desired for XML processors to read entities that
2164use them. <phrase diff="add"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E47">[E47]</loc>In
2165the absence of external character encoding information (such as MIME headers),</phrase>
2166parsed entities which are stored in an encoding other than UTF-8 or UTF-16
2167must begin with a text declaration <phrase diff="add">(see <specref ref="sec-TextDecl"/>) </phrase>containing
2168an encoding declaration:</p>
2169<scrap lang="ebnf">
2170<head>Encoding Declaration</head>
2171<prod id="NT-EncodingDecl">
2172<lhs>EncodingDecl</lhs><rhs><nt def="NT-S">S</nt> 'encoding' <nt def="NT-Eq">Eq</nt>
2173('"' <nt def="NT-EncName">EncName</nt> '"' | "'" <nt def="NT-EncName">EncName</nt>
2174"'" ) </rhs>
2175</prod>
2176<prod id="NT-EncName">
2177<lhs>EncName</lhs><rhs>[A-Za-z] ([A-Za-z0-9._] | '-')*</rhs><com>Encoding
2178name contains only Latin characters</com>
2179</prod>
2180</scrap>
2181<p>In the <termref def="dt-docent">document entity</termref>, the encoding
2182declaration is part of the <termref def="dt-xmldecl">XML declaration</termref>.
2183The <nt def="NT-EncName">EncName</nt> is the name of the encoding used.</p>
2184<!-- FINAL EDIT: check name of IANA and charset names -->
2185<p>In an encoding declaration, the values <quote><code>UTF-8</code></quote>, <quote><code>UTF-16</code></quote>, <quote><code>ISO-10646-UCS-2</code
2186></quote>, and <quote><code>ISO-10646-UCS-4</code></quote> should be used
2187for the various encodings and transformations of Unicode / ISO/IEC 10646,
2188the values <quote><code>ISO-8859-1</code></quote>, <quote><code>ISO-8859-2</code></quote>,
2189... <loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E106">[E106]</loc><phrase
2190diff="chg"><quote><code>ISO-8859-</code><var>n</var></quote> (where <var>n</var>
2191is the part number)</phrase> should be used for the parts of ISO 8859, and
2192the values <quote><code>ISO-2022-JP</code></quote>, <quote><code>Shift_JIS</code></quote>,
2193and <quote><code>EUC-JP</code></quote> should be used for the various encoded
2194forms of JIS X-0208-1997. <phrase diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E57">[E57]</loc>It
2195is recommended that character encodings registered (as <emph>charset</emph>s)
2196with the Internet Assigned Numbers Authority <phrase diff="chg"><loc role="erratumref"
2197href="http://www.w3.org/XML/xml-19980210-errata#E58">[E58]</loc><bibref ref="IANA"/></phrase>,
2198other than those just listed, be referred to using their registered names;
2199other encodings should use names starting with an <quote>x-</quote> prefix.
2200XML processors should match character encoding names in a case-insensitive
2201way and should either interpret an IANA-registered name as the encoding registered
2202at IANA for that name or treat it as unknown (processors are, of course, not
2203required to support all IANA-registered encodings).</phrase></p>
2204<p>In the absence of information provided by an external transport protocol
2205(e.g. HTTP or MIME), it is an <termref def="dt-error">error</termref> for
2206an entity including an encoding declaration to be presented to the XML processor
2207in an encoding other than that named in the declaration, <phrase diff="del"><loc
2208role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E5">[E5]</loc>for
2209an encoding declaration to occur other than at the beginning of an external
2210entity, </phrase>or for an entity which begins with neither a Byte Order Mark
2211nor an encoding declaration to use an encoding other than UTF-8. Note that
2212since ASCII is a subset of UTF-8, ordinary ASCII entities do not strictly
2213need an encoding declaration.</p>
2214<p diff="add"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E5">[E5]</loc>It
2215is <phrase diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E36">[E36]</loc>a
2216fatal</phrase> error for a <nt def="NT-TextDecl">TextDecl</nt> to occur other
2217than at the beginning of an external entity.</p>
2218<p>It is a <termref def="dt-fatal">fatal error</termref> when an XML processor
2219encounters an entity with an encoding that it is unable to process. <phrase
2220diff="add"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E79">[E79]</loc>It
2221is a fatal error if an XML entity is determined (via default, encoding declaration,
2222or higher-level protocol) to be in a certain encoding but contains octet sequences
2223that are not legal in that encoding. It is also a fatal error if an XML entity
2224contains no encoding declaration and its content is not legal UTF-8 or UTF-16.</phrase></p>
2225<p>Examples of <phrase diff="add"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E23">[E23]</loc>text
2226declarations containing </phrase>encoding declarations:</p>
2227<eg>&lt;?xml encoding='UTF-8'?>
2228&lt;?xml encoding='EUC-JP'?></eg>
2229</div3>
2230</div2>
2231<div2 id="entproc">
2232<head>XML Processor Treatment of Entities and References</head>
2233<p>The table below summarizes the contexts in which character references,
2234entity references, and invocations of unparsed entities might appear and the
2235required behavior of an <termref def="dt-xml-proc">XML processor</termref>
2236in each case. The labels in the leftmost column describe the recognition context: <glist>
2237<gitem><label>Reference in Content</label>
2238<def>
2239<p>as a reference anywhere after the <termref def="dt-stag">start-tag</termref>
2240and before the <termref def="dt-etag">end-tag</termref> of an element; corresponds
2241to the nonterminal <nt def="NT-content">content</nt>.</p>
2242</def></gitem>
2243<gitem><label>Reference in Attribute Value</label>
2244<def>
2245<p>as a reference within either the value of an attribute in a <termref def="dt-stag">start-tag</termref>,
2246or a default value in an <termref def="dt-attdecl">attribute declaration</termref>;
2247corresponds to the nonterminal <nt def="NT-AttValue">AttValue</nt>.</p>
2248</def></gitem>
2249<gitem><label>Occurs as Attribute Value</label>
2250<def>
2251<p>as a <nt def="NT-Name">Name</nt>, not a reference, appearing either as
2252the value of an attribute which has been declared as type <kw>ENTITY</kw>,
2253or as one of the space-separated tokens in the value of an attribute which
2254has been declared as type <kw>ENTITIES</kw>.</p>
2255</def></gitem>
2256<gitem><label>Reference in Entity Value</label>
2257<def>
2258<p>as a reference within a parameter or internal entity's <termref def="dt-litentval">literal
2259entity value</termref> in the entity's declaration; corresponds to the nonterminal <nt
2260def="NT-EntityValue">EntityValue</nt>.</p>
2261</def></gitem>
2262<gitem><label>Reference in DTD</label>
2263<def>
2264<p diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E90">[E90]</loc>as
2265a reference within either the internal or external subsets of the <termref
2266def="dt-doctype">DTD</termref>, but outside of an <nt def="NT-EntityValue">EntityValue</nt>, <nt
2267def="NT-AttValue">AttValue</nt>, <nt def="NT-PI">PI</nt>, <nt def="NT-Comment">Comment</nt>, <nt
2268def="NT-SystemLiteral">SystemLiteral</nt>, <nt def="NT-PubidLiteral">PubidLiteral</nt>,
2269or the contents of an ignored conditional section (see <specref ref="sec-condition-sect"/>).</p>
2270<p>.</p>
2271</def></gitem>
2272</glist></p>
2273<table border="1" frame="border" cellpadding="7"><tbody align="center"><tr>
2274<td rowspan="2" colspan="1"></td><td colspan="4" align="center" valign="bottom">Entity
2275Type</td><td rowspan="2" align="center">Character</td></tr><tr align="center"
2276valign="bottom"><td>Parameter</td><td>Internal General</td><td>External Parsed
2277General</td><td>Unparsed</td></tr><tr align="center" valign="middle"><td align="right">Reference
2278in Content</td><td><titleref href="#not-recognized">Not recognized</titleref></td>
2279<td><titleref href="#included">Included</titleref></td><td><titleref href="#include-if-valid">Included
2280if validating</titleref></td><td><titleref href="#forbidden">Forbidden</titleref></td>
2281<td><titleref href="#included">Included</titleref></td></tr><tr align="center"
2282valign="middle"><td align="right">Reference in Attribute Value</td><td><titleref
2283href="#not-recognized">Not recognized</titleref></td><td><titleref href="#inliteral">Included
2284in literal</titleref></td><td><titleref href="#forbidden">Forbidden</titleref></td>
2285<td><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E51">[E51]</loc><titleref
2286diff="chg" href="#forbidden">Forbidden</titleref></td><td><titleref href="#included">Included</titleref></td>
2287</tr><tr align="center" valign="middle"><td align="right">Occurs as Attribute
2288Value</td><td><titleref href="#not-recognized">Not recognized</titleref></td>
2289<td><titleref href="#forbidden">Forbidden</titleref></td><td><loc role="erratumref"
2290href="http://www.w3.org/XML/xml-19980210-errata#E51">[E51]</loc><titleref
2291diff="chg" href="#forbidden">Forbidden</titleref></td><td><titleref href="#notify">Notify</titleref></td>
2292<td><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E51">[E51]</loc><titleref
2293diff="chg" href="#not-recognized">Not recognized</titleref></td></tr><tr align="center"
2294valign="middle"><td align="right">Reference in EntityValue</td><td><titleref
2295href="#inliteral">Included in literal</titleref></td><td><titleref href="#bypass">Bypassed</titleref></td>
2296<td><titleref href="#bypass">Bypassed</titleref></td><td><titleref href="#forbidden">Forbidden</titleref></td>
2297<td><titleref href="#included">Included</titleref></td></tr><tr align="center"
2298valign="middle"><td align="right">Reference in DTD</td><td><titleref href="#as-PE">Included
2299as PE</titleref></td><td><titleref href="#forbidden">Forbidden</titleref></td>
2300<td><titleref href="#forbidden">Forbidden</titleref></td><td><titleref href="#forbidden">Forbidden</titleref></td>
2301<td><titleref href="#forbidden">Forbidden</titleref></td></tr></tbody></table>
2302<div3 id="not-recognized">
2303<head>Not Recognized</head>
2304<p>Outside the DTD, the <code>%</code> character has no special significance;
2305thus, what would be parameter entity references in the DTD are not recognized
2306as markup in <nt def="NT-content">content</nt>. Similarly, the names of unparsed
2307entities are not recognized except when they appear in the value of an appropriately
2308declared attribute.</p>
2309</div3>
2310<div3 id="included">
2311<head>Included</head>
2312<p><termdef id="dt-include" term="Include">An entity is <term>included</term>
2313when its <termref def="dt-repltext">replacement text</termref> is retrieved
2314and processed, in place of the reference itself, as though it were part of
2315the document at the location the reference was recognized.</termdef> The replacement
2316text may contain both <termref def="dt-chardata">character data</termref>
2317and (except for parameter entities) <termref def="dt-markup">markup</termref>,
2318which must be recognized in the usual way<phrase diff="del"><loc role="erratumref"
2319href="http://www.w3.org/XML/xml-19980210-errata#E65">[E65]</loc>, except that
2320the replacement text of entities used to escape markup delimiters (the entities &magicents;)
2321is always treated as data</phrase>. (The string <quote><code>AT&amp;amp;T;</code></quote>
2322expands to <quote><code>AT&amp;T;</code></quote> and the remaining ampersand
2323is not recognized as an entity-reference delimiter.) A character reference
2324is <term>included</term> when the indicated character is processed in place
2325of the reference itself. </p>
2326</div3>
2327<div3 id="include-if-valid">
2328<head>Included If Validating</head>
2329<p>When an XML processor recognizes a reference to a parsed entity, in order
2330to <termref def="dt-valid">validate</termref> the document, the processor
2331must <termref def="dt-include">include</termref> its replacement text. If
2332the entity is external, and the processor is not attempting to validate the
2333XML document, the processor <termref def="dt-may">may</termref>, but need
2334not, include the entity's replacement text. If a non-validating <phrase diff="chg"><loc
2335role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E95">[E95]</loc>processor</phrase>
2336does not include the replacement text, it must inform the application that
2337it recognized, but did not read, the entity.</p>
2338<p>This rule is based on the recognition that the automatic inclusion provided
2339by the SGML and XML entity mechanism, primarily designed to support modularity
2340in authoring, is not necessarily appropriate for other applications, in particular
2341document browsing. Browsers, for example, when encountering an external parsed
2342entity reference, might choose to provide a visual indication of the entity's
2343presence and retrieve it for display only on demand.</p>
2344</div3>
2345<div3 id="forbidden">
2346<head>Forbidden</head>
2347<p>The following are forbidden, and constitute <termref def="dt-fatal">fatal</termref>
2348errors:</p>
2349<ulist>
2350<item><p>the appearance of a reference to an <termref def="dt-unparsed">unparsed
2351entity</termref>.</p></item>
2352<item><p>the appearance of any character or general-entity reference in the
2353DTD except within an <nt def="NT-EntityValue">EntityValue</nt> or <nt def="NT-AttValue">AttValue</nt>.</p>
2354</item>
2355<item><p>a reference to an external entity in an attribute value.</p></item>
2356</ulist>
2357</div3>
2358<div3 id="inliteral">
2359<head>Included in Literal</head>
2360<p>When an <termref def="dt-entref">entity reference</termref> appears in
2361an attribute value, or a parameter entity reference appears in a literal entity
2362value, its <termref def="dt-repltext">replacement text</termref> is processed
2363in place of the reference itself as though it were part of the document at
2364the location the reference was recognized, except that a single or double
2365quote character in the replacement text is always treated as a normal data
2366character and will not terminate the literal. For example, this is well-formed:</p>
2367<eg diff="chg">&lt;!-- <loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E4">[E4]</loc> -->
2368<![CDATA[<!ENTITY % YN '"Yes"' >
2369<!ENTITY WhatHeSaid "He said %YN;" >]]></eg>
2370<p>while this is not:</p>
2371<eg>&lt;!ENTITY EndAttr "27'" >
2372&lt;element attribute='a-&amp;EndAttr;></eg>
2373</div3>
2374<div3 id="notify">
2375<head>Notify</head>
2376<p>When the name of an <termref def="dt-unparsed">unparsed entity</termref>
2377appears as a token in the value of an attribute of declared type <kw>ENTITY</kw>
2378or <kw>ENTITIES</kw>, a validating processor must inform the application of
2379the <termref def="dt-sysid">system</termref> and <termref def="dt-pubid">public</termref>
2380(if any) identifiers for both the entity and its associated <termref def="dt-notation">notation</termref>.</p>
2381</div3>
2382<div3 id="bypass">
2383<head>Bypassed</head>
2384<p>When a general entity reference appears in the <nt def="NT-EntityValue">EntityValue</nt>
2385in an entity declaration, it is bypassed and left as is.</p>
2386</div3>
2387<div3 id="as-PE">
2388<head>Included as PE</head>
2389<p>Just as with external parsed entities, parameter entities need only be <titleref
2390href="#include-if-valid">included if validating</titleref>. When a parameter-entity
2391reference is recognized in the DTD and included, its <termref def="dt-repltext">replacement
2392text</termref> is enlarged by the attachment of one leading and one following
2393space (#x20) character; the intent is to constrain the replacement text of
2394parameter entities to contain an integral number of grammatical tokens in
2395the DTD. <phrase diff="add"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E96">[E96]</loc>This
2396behavior does not apply to parameter entity references within entity values;
2397these are described in <specref ref="inliteral"/>.</phrase></p>
2398</div3>
2399</div2>
2400<div2 id="intern-replacement">
2401<head>Construction of Internal Entity Replacement Text</head>
2402<p>In discussing the treatment of internal entities, it is useful to distinguish
2403two forms of the entity's value. <termdef id="dt-litentval" term="Literal Entity Value">The <term>literal
2404entity value</term> is the quoted string actually present in the entity declaration,
2405corresponding to the non-terminal <nt def="NT-EntityValue">EntityValue</nt>.</termdef> <termdef
2406id="dt-repltext" term="Replacement Text">The <term>replacement text</term>
2407is the content of the entity, after replacement of character references and
2408parameter-entity references.</termdef></p>
2409<p>The literal entity value as given in an internal entity declaration (<nt
2410def="NT-EntityValue">EntityValue</nt>) may contain character, parameter-entity,
2411and general-entity references. Such references must be contained entirely
2412within the literal entity value. The actual replacement text that is <termref
2413def="dt-include">included</termref> as described above must contain the <emph>replacement
2414text</emph> of any parameter entities referred to, and must contain the character
2415referred to, in place of any character references in the literal entity value;
2416however, general-entity references must be left as-is, unexpanded. For example,
2417given the following declarations:</p>
2418<eg><![CDATA[<!ENTITY % pub    "&#xc9;ditions Gallimard" >
2419<!ENTITY   rights "All rights reserved" >
2420<!ENTITY   book   "La Peste: Albert Camus,
2421&#xA9; 1947 %pub;. &rights;" >]]></eg>
2422<p>then the replacement text for the entity <quote><code>book</code></quote>
2423is:</p>
2424<eg>La Peste: Albert Camus,
2425� 1947 �ditions Gallimard. &amp;rights;</eg>
2426<p>The general-entity reference <quote><code>&amp;rights;</code></quote> would
2427be expanded should the reference <quote><code>&amp;book;</code></quote> appear
2428in the document's content or an attribute value.</p>
2429<p>These simple rules may have complex interactions; for a detailed discussion
2430of a difficult example, see <specref ref="sec-entexpand"/>.</p>
2431</div2>
2432<div2 id="sec-predefined-ent">
2433<head>Predefined Entities</head>
2434<p><termdef id="dt-escape" term="escape">Entity and character references can
2435both be used to <term>escape</term> the left angle bracket, ampersand, and
2436other delimiters. A set of general entities (&magicents;) is specified for
2437this purpose. Numeric character references may also be used; they are expanded
2438immediately when recognized and must be treated as character data, so the
2439numeric character references <quote><code>&amp;#60;</code></quote> and <quote><code>&amp;#38;</code></quote>
2440may be used to escape <code>&lt;</code> and <code>&amp;</code> when they occur
2441in character data.</termdef></p>
2442<p>All XML processors must recognize these entities whether they are declared
2443or not. <termref def="dt-interop">For interoperability</termref>, valid XML
2444documents should declare these entities, like any others, before using them. <phrase
2445diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E80">[E80]</loc>If
2446the entities <code>lt</code> or <code>amp</code> are declared, they must be
2447declared as internal entities whose replacement text is a character reference
2448to the <phrase diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E103">[E103]</loc>respective
2449character (less-than sign or ampersand)</phrase> being escaped; the double
2450escaping is required for these entities so that references to them produce
2451a well-formed result. If the entities <code>gt</code>, <code>apos</code>,
2452or <code>quot</code> are declared, they must be declared as internal entities
2453whose replacement text is the single character being escaped (or a character
2454reference to that character; the double escaping here is unnecessary but harmless).
2455For example:</phrase></p>
2456<eg><![CDATA[<!ENTITY lt     "&#38;#60;">
2457<!ENTITY gt     "&#62;">
2458<!ENTITY amp    "&#38;#38;">
2459<!ENTITY apos   "&#39;">
2460<!ENTITY quot   "&#34;">]]></eg>
2461<p diff="del">Note that the <code>&lt;</code> and <code>&amp;</code> characters
2462in the declarations of <quote><code>lt</code></quote> and <quote><code>amp</code></quote>
2463are doubly escaped to meet the requirement that entity replacement be well-formed.</p>
2464</div2>
2465<div2 id="Notations">
2466<head>Notation Declarations</head>
2467<p><termdef id="dt-notation" term="Notation"><term>Notations</term> identify
2468by name the format of <termref def="dt-extent">unparsed entities</termref>,
2469the format of elements which bear a notation attribute, or the application
2470to which a <termref def="dt-pi">processing instruction</termref> is addressed.</termdef></p>
2471<p><termdef id="dt-notdecl" term="Notation Declaration"> <term>Notation declarations</term>
2472provide a name for the notation, for use in entity and attribute-list declarations
2473and in attribute specifications, and an external identifier for the notation
2474which may allow an XML processor or its client application to locate a helper
2475application capable of processing data in the given notation.</termdef></p>
2476<scrap lang="ebnf">
2477<head>Notation Declarations</head>
2478<prod id="NT-NotationDecl">
2479<lhs>NotationDecl</lhs><rhs>'&lt;!NOTATION' <nt def="NT-S">S</nt> <nt def="NT-Name">Name</nt> <nt
2480def="NT-S">S</nt> (<nt def="NT-ExternalID">ExternalID</nt> | <nt def="NT-PublicID">PublicID</nt>) <nt
2481def="NT-S">S</nt>? '>'</rhs><vc def="UniqueNotationName" diff="add"/>
2482</prod>
2483<prod id="NT-PublicID">
2484<lhs>PublicID</lhs><rhs>'PUBLIC' <nt def="NT-S">S</nt> <nt def="NT-PubidLiteral">PubidLiteral</nt> </rhs>
2485</prod>
2486</scrap>
2487<vcnote id="UniqueNotationName" diff="add"><head><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E22">[E22]</loc>Unique
2488Notation Name</head><p>Only one notation declaration can declare a given <nt
2489def="NT-Name">Name</nt>.</p>
2490</vcnote>
2491<p>XML processors must provide applications with the name and external identifier(s)
2492of any notation declared and referred to in an attribute value, attribute
2493definition, or entity declaration. They may additionally resolve the external
2494identifier into the <termref def="dt-sysid">system identifier</termref>, file
2495name, or other information needed to allow the application to call a processor
2496for data in the notation described. (It is not an error, however, for XML
2497documents to declare and refer to notations for which notation-specific applications
2498are not available on the system where the XML processor or application is
2499running.)</p>
2500</div2>
2501<div2 id="sec-doc-entity">
2502<head>Document Entity</head>
2503<p><termdef id="dt-docent" term="Document Entity">The <term>document entity</term>
2504serves as the root of the entity tree and a starting-point for an <termref
2505def="dt-xml-proc">XML processor</termref>.</termdef> This specification does
2506not specify how the document entity is to be located by an XML processor;
2507unlike other entities, the document entity has no name and might well appear
2508on a processor input stream without any identification at all.</p>
2509</div2>
2510</div1>
2511<!-- &Conformance; -->
2512<div1 id="sec-conformance">
2513<head>Conformance</head>
2514<div2 id="proc-types">
2515<head>Validating and Non-Validating Processors</head>
2516<p>Conforming <termref def="dt-xml-proc">XML processors</termref> fall into
2517two classes: validating and non-validating.</p>
2518<p>Validating and non-validating processors alike must report violations of
2519this specification's well-formedness constraints in the content of the <termref
2520def="dt-docent">document entity</termref> and any other <termref def="dt-parsedent">parsed
2521entities</termref> that they read.</p>
2522<p><termdef id="dt-validating" term="Validating Processor"><term>Validating
2523processors</term> must<phrase diff="add"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E21">[E21]</loc>,
2524at user option,</phrase> report violations of the constraints expressed by
2525the declarations in the <termref def="dt-doctype">DTD</termref>, and failures
2526to fulfill the validity constraints given in this specification.</termdef>
2527To accomplish this, validating XML processors must read and process the entire
2528DTD and all external parsed entities referenced in the document.</p>
2529<p>Non-validating processors are required to check only the <termref def="dt-docent">document
2530entity</termref>, including the entire internal DTD subset, for well-formedness. <termdef
2531id="dt-use-mdecl" term="Process Declarations"> While they are not required
2532to check the document for validity, they are required to <term>process</term>
2533all the declarations they read in the internal DTD subset and in any parameter
2534entity that they read, up to the first reference to a parameter entity that
2535they do <emph>not</emph> read; that is to say, they must use the information
2536in those declarations to <titleref href="#AVNormalize">normalize</titleref>
2537attribute values, <titleref href="#included">include</titleref> the replacement
2538text of internal entities, and supply <titleref href="#sec-attr-defaults">default
2539attribute values</titleref>.</termdef> <phrase diff="add"><loc role="erratumref"
2540href="http://www.w3.org/XML/xml-19980210-errata#E33">[E33]</loc>Except when <code>standalone="yes"</code>, </phrase>they
2541must not <termref def="dt-use-mdecl">process</termref> <termref def="dt-entdecl">entity
2542declarations</termref> or <termref def="dt-attdecl">attribute-list declarations</termref>
2543encountered after a reference to a parameter entity that is not read, since
2544the entity may have contained overriding declarations.</p>
2545</div2>
2546<div2 id="safe-behavior">
2547<head>Using XML Processors</head>
2548<p>The behavior of a validating XML processor is highly predictable; it must
2549read every piece of a document and report all well-formedness and validity
2550violations. Less is required of a non-validating processor; it need not read
2551any part of the document other than the document entity. This has two effects
2552that may be important to users of XML processors:</p>
2553<ulist>
2554<item><p>Certain well-formedness errors, specifically those that require reading
2555external entities, may not be detected by a non-validating processor. Examples
2556include the constraints entitled <titleref href="#wf-entdeclared">Entity Declared</titleref>, <titleref
2557href="#textent">Parsed Entity</titleref>, and <titleref href="#norecursion">No
2558Recursion</titleref>, as well as some of the cases described as <titleref
2559href="#forbidden">forbidden</titleref> in <specref ref="entproc"/>.</p></item>
2560<item><p>The information passed from the processor to the application may
2561vary, depending on whether the processor reads parameter and external entities.
2562For example, a non-validating processor may not <titleref href="#AVNormalize">normalize</titleref>
2563attribute values, <titleref href="#included">include</titleref> the replacement
2564text of internal entities, or supply <titleref href="#sec-attr-defaults">default
2565attribute values</titleref>, where doing so depends on having read declarations
2566in external or parameter entities.</p></item>
2567</ulist>
2568<p>For maximum reliability in interoperating between different XML processors,
2569applications which use non-validating processors should not rely on any behaviors
2570not required of such processors. Applications which require facilities such
2571as the use of default attributes or internal entities which are declared in
2572external entities should use validating XML processors.</p>
2573</div2>
2574</div1>
2575<div1 id="sec-notation">
2576<head>Notation</head>
2577<p>The formal grammar of XML is given in this specification using a simple
2578Extended Backus-Naur Form (EBNF) notation. Each rule in the grammar defines
2579one symbol, in the form</p>
2580<eg>symbol ::= expression</eg>
2581<p>Symbols are written with an initial capital letter if they are <phrase
2582diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E42">[E42]</loc>the
2583start symbol of a regular language,</phrase> otherwise with an initial lower
2584case letter. Literal strings are quoted.</p>
2585<p>Within the expression on the right-hand side of a rule, the following expressions
2586are used to match strings of one or more characters: <glist>
2587<gitem><label><code>#xN</code></label>
2588<def>
2589<p>where <code>N</code> is a hexadecimal integer, the expression matches the
2590character in ISO/IEC 10646 whose canonical (UCS-4) code value, when interpreted
2591as an unsigned binary number, has the value indicated. The number of leading
2592zeros in the <code>#xN</code> form is insignificant; the number of leading
2593zeros in the corresponding code value is governed by the character encoding
2594in use and is not significant for XML.</p>
2595</def></gitem>
2596<gitem><label><code>[a-zA-Z]</code>, <code>[#xN-#xN]</code></label>
2597<def>
2598<p>matches any <phrase diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E93">[E93]</loc><nt
2599def="NT-Char">Char</nt></phrase> with a value in the range(s) indicated (inclusive).</p>
2600</def></gitem>
2601<gitem diff="add"><label><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E3">[E3]</loc><code>[abc]</code>, <code>[#xN#xN#xN]</code
2602></label>
2603<def>
2604<p>matches any <nt def="NT-Char">Char</nt> with a value among the characters
2605enumerated. Enumerations and ranges can be mixed in one set of brackets.</p>
2606</def></gitem>
2607<gitem><label><code>[^a-z]</code>, <code>[^#xN-#xN]</code></label>
2608<def>
2609<p>matches any <phrase diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E93">[E93]</loc><nt
2610def="NT-Char">Char</nt></phrase> with a value <emph>outside</emph> the range
2611indicated.</p>
2612</def></gitem>
2613<gitem><label><code>[^abc]</code>, <code>[^#xN#xN#xN]</code></label>
2614<def>
2615<p>matches any <phrase diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E93">[E93]</loc><nt
2616def="NT-Char">Char</nt></phrase> with a value not among the characters given. <phrase
2617diff="add"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E3">[E3]</loc>Enumerations
2618and ranges of forbidden values can be mixed in one set of brackets.</phrase></p>
2619</def></gitem>
2620<gitem><label><code>"string"</code></label>
2621<def>
2622<p>matches a literal string <termref def="dt-match">matching</termref> that
2623given inside the double quotes.</p>
2624</def></gitem>
2625<gitem><label><code>'string'</code></label>
2626<def>
2627<p>matches a literal string <termref def="dt-match">matching</termref> that
2628given inside the single quotes.</p>
2629</def></gitem>
2630</glist> These symbols may be combined to match more complex patterns as follows,
2631where <code>A</code> and <code>B</code> represent simple expressions: <glist>
2632<gitem><label>(<code>expression</code>)</label>
2633<def>
2634<p><code>expression</code> is treated as a unit and may be combined as described
2635in this list.</p>
2636</def></gitem>
2637<gitem><label><code>A?</code></label>
2638<def>
2639<p>matches <code>A</code> or nothing; optional <code>A</code>.</p>
2640</def></gitem>
2641<gitem><label><code>A B</code></label>
2642<def>
2643<p>matches <code>A</code> followed by <code>B</code>. <phrase diff="add"><loc
2644role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E20">[E20]</loc>This
2645operator has higher precedence than alternation; thus <code>A B | C D</code>
2646is identical to <code>(A B) | (C D)</code>.</phrase></p>
2647</def></gitem>
2648<gitem><label><code>A | B</code></label>
2649<def>
2650<p>matches <code>A</code> or <code>B</code> but not both.</p>
2651</def></gitem>
2652<gitem><label><code>A - B</code></label>
2653<def>
2654<p>matches any string that matches <code>A</code> but does not match <code>B</code>.</p>
2655</def></gitem>
2656<gitem><label><code>A+</code></label>
2657<def>
2658<p>matches one or more occurrences of <code>A</code>.<phrase diff="add"><loc
2659role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E20">[E20]</loc>Concatenation
2660has higher precedence than alternation; thus <code>A+ | B+</code> is identical
2661to <code>(A+) | (B+)</code>.</phrase></p>
2662</def></gitem>
2663<gitem><label><code>A*</code></label>
2664<def>
2665<p>matches zero or more occurrences of <code>A</code>. <phrase diff="add"><loc
2666role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E20">[E20]</loc>Concatenation
2667has higher precedence than alternation; thus <code>A* | B*</code> is identical
2668to <code>(A*) | (B*)</code>.</phrase></p>
2669</def></gitem>
2670</glist> Other notations used in the productions are: <glist>
2671<gitem><label><code>/* ... */</code></label>
2672<def>
2673<p>comment.</p>
2674</def></gitem>
2675<gitem><label><code>[ wfc: ... ]</code></label>
2676<def>
2677<p>well-formedness constraint; this identifies by name a constraint on <termref
2678def="dt-wellformed">well-formed</termref> documents associated with a production.</p>
2679</def></gitem>
2680<gitem><label><code>[ vc: ... ]</code></label>
2681<def>
2682<p>validity constraint; this identifies by name a constraint on <termref def="dt-valid">valid</termref>
2683documents associated with a production.</p>
2684</def></gitem>
2685</glist></p>
2686</div1>
2687</body><back>
2688<!-- &SGML; -->
2689<!-- &Biblio; -->
2690<div1 id="sec-bibliography">
2691<head>References</head>
2692<div2 id="sec-existing-stds">
2693<head>Normative References</head>
2694<blist>
2695<bibl id="IANA" diff="chg" key="IANA-CHARSETS"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E58">[E58]</loc>(Internet
2696Assigned Numbers Authority) <titleref>Official Names for Character Sets</titleref>,
2697ed. Keld Simonsen et al. See <loc href="ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets">ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets</loc
2698>. </bibl>
2699<bibl id="RFC1766" href="http://www.ietf.org/rfc/rfc1766.txt" key="IETF RFC 1766">IETF
2700(Internet Engineering Task Force). <titleref>RFC 1766: Tags for the Identification
2701of Languages</titleref>, ed. H. Alvestrand. 1995.</bibl>
2702<bibl id="ISO639-old" diff="del" key="ISO 639"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E38">[E38]</loc>
2703(International Organization for Standardization). <titleref>ISO 639:1988 (E).
2704Code for the representation of names of languages.</titleref> [Geneva]: International
2705Organization for Standardization, 1988.</bibl>
2706<bibl id="ISO3166-old" diff="del" key="ISO 3166"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E38">[E38]</loc>
2707(International Organization for Standardization). <titleref>ISO 3166-1:1997
2708(E). Codes for the representation of names of countries and their subdivisions &mdash;
2709Part 1: Country codes</titleref> [Geneva]: International Organization for
2710Standardization, 1997.</bibl>
2711<bibl id="ISO10646" key="ISO/IEC 10646">ISO (International Organization for
2712Standardization). <titleref>ISO/IEC 10646-1993 (E). Information technology &mdash;
2713Universal Multiple-Octet Coded Character Set (UCS) &mdash; Part 1: Architecture
2714and Basic Multilingual Plane.</titleref> [Geneva]: International Organization
2715for Standardization, 1993 (plus amendments AM 1 through AM 7).</bibl>
2716<bibl id="ISO10646-2000" diff="add" key="ISO/IEC 10646-2000"><loc role="erratumref"
2717href="http://www.w3.org/XML/xml-19980210-errata#E67">[E67]</loc> ISO (International
2718Organization for Standardization). <titleref>ISO/IEC 10646-1:2000. Information
2719technology &mdash; Universal Multiple-Octet Coded Character Set (UCS) &mdash;
2720Part 1: Architecture and Basic Multilingual Plane.</titleref> [Geneva]: International
2721Organization for Standardization, 2000.</bibl>
2722<bibl id="Unicode" key="Unicode">The Unicode Consortium. <emph>The Unicode
2723Standard, Version 2.0.</emph> Reading, Mass.: Addison-Wesley Developers Press,
27241996.</bibl>
2725<bibl id="Unicode3" diff="add" key="Unicode3"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E67">[E67]</loc>
2726The Unicode Consortium. <emph>The Unicode Standard, Version 3.0.</emph> Reading,
2727Mass.: Addison-Wesley Developers Press, 2000. ISBN 0-201-61633-5.</bibl>
2728</blist></div2>
2729<div2 id="null">
2730<!--
2731ID made "null" to match its previous value in the First
2732Edition; it's odd, but if there's no set value, the stylesheet 
2733currently generates an odd string that would be backwards
2734incompatible with any references anyone might have made before.
2735-->
2736<head>Other References</head>
2737<blist>
2738<bibl id="Aho" key="Aho/Ullman">Aho, Alfred V., Ravi Sethi, and Jeffrey D.
2739Ullman. <titleref>Compilers: Principles, Techniques, and Tools</titleref>.
2740Reading: Addison-Wesley, 1986, rpt. corr. 1988.</bibl>
2741<bibl id="Berners-Lee" key="Berners-Lee et al."> Berners-Lee, T., R. Fielding,
2742and L. Masinter. <titleref>Uniform Resource Identifiers (URI): Generic Syntax
2743and Semantics</titleref>. 1997. (Work in progress; see updates to RFC1738.)</bibl>
2744<bibl id="ABK" diff="chg" key="Br�ggemann-Klein"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E2">[E2]</loc>Br�ggemann-Klein,
2745Anne. Formal Models in Document Processing. Habilitationsschrift. Faculty
2746of Mathematics at the University of Freiburg, 1993. (See <loc href="ftp://ftp.informatik.uni-freiburg.de/documents/papers/brueggem/habil.ps">ftp://ftp.informatik.uni-freiburg.de/documents/papers/brueggem/habil.ps</loc
2747>.)</bibl>
2748<bibl id="ABKDW" diff="chg" key="Br�ggemann-Klein and Wood"><loc role="erratumref"
2749href="http://www.w3.org/XML/xml-19980210-errata#E2">[E2]</loc>Br�ggemann-Klein,
2750Anne, and Derick Wood. <titleref>Deterministic Regular Languages</titleref>.
2751Universit�t Freiburg, Institut f�r Informatik, Bericht 38, Oktober 1991. Extended
2752abstract in A. Finkel, M. Jantzen, Hrsg., STACS 1992, S. 173-184. Springer-Verlag,
2753Berlin 1992. Lecture Notes in Computer Science 577. Full version titled <titleref>One-Unambiguous
2754Regular Languages</titleref> in Information and Computation 140 (2): 229-253,
2755February 1998.</bibl>
2756<bibl id="Clark" key="Clark">James Clark. Comparison of SGML and XML. See <loc
2757href="http://www.w3.org/TR/NOTE-sgml-xml-971215">http://www.w3.org/TR/NOTE-sgml-xml-971215</loc>. </bibl>
2758<bibl id="IANA-LANGCODES" diff="add" href="http://www.isi.edu/in-notes/iana/assignments/languages/"
2759key="IANA-LANGCODES"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E58">[E58]</loc>(Internet
2760Assigned Numbers Authority) <titleref>Registry of Language Tags</titleref>,
2761ed. Keld Simonsen et al.</bibl>
2762<bibl id="RFC1738" diff="del" href="http://www.ietf.org/rfc/rfc1738.txt" key="IETF RFC1738">IETF
2763(Internet Engineering Task Force). <titleref>RFC 1738: Uniform Resource Locators
2764(URL)</titleref>, ed. T. Berners-Lee, L. Masinter, M. McCahill. 1994. </bibl>
2765<bibl id="RFC1808" diff="del" href="http://www.ietf.org/rfc/rfc1808.txt" key="IETF RFC1808">IETF
2766(Internet Engineering Task Force). <titleref>RFC 1808: Relative Uniform Resource
2767Locators</titleref>, ed. R. Fielding. 1995. </bibl>
2768<bibl id="RFC2141" href="http://www.ietf.org/rfc/rfc2141.txt" key="IETF RFC2141">IETF
2769(Internet Engineering Task Force). <emph>RFC 2141: URN Syntax</emph>, ed.
2770R. Moats. 1997. </bibl>
2771<bibl id="rfc2279" diff="add" href="http://www.ietf.org/rfc/rfc2279.txt" key="IETF RFC 2279"><loc
2772role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E78">[E78]</loc>IETF
2773(Internet Engineering Task Force). <titleref>RFC 2279: UTF-8, a transformation
2774format of ISO 10646</titleref>, <phrase diff="add">ed. F. Yergeau, </phrase>1998.</bibl>
2775<bibl id="rfc2376" diff="add" href="http://www.ietf.org/rfc/rfc2376.txt" key="IETF RFC 2376"><loc
2776role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E48">[E48]</loc>IETF
2777(Internet Engineering Task Force). <titleref>RFC 2376: XML Media Types</titleref>.
2778ed. E. Whitehead, M. Murata. 1998.</bibl>
2779<bibl id="rfc2396" diff="add" href="http://www.ietf.org/rfc/rfc2396.txt" key="IETF RFC 2396"><loc
2780role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E66">[E66]</loc>IETF
2781(Internet Engineering Task Force). <titleref>RFC 2396: Uniform Resource Identifiers
2782(URI): Generic Syntax</titleref>. T. Berners-Lee, R. Fielding, L. Masinter.
27831998.</bibl>
2784<bibl id="rfc2732" diff="add" href="http://www.ietf.org/rfc/rfc2732.txt" key="IETF RFC 2732"><loc
2785role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E66">[E66]</loc>IETF
2786(Internet Engineering Task Force). <titleref>RFC 2732: Format for Literal
2787IPv6 Addresses in URL's</titleref>. R. Hinden, B. Carpenter, L. Masinter.
27881999.</bibl>
2789<bibl id="rfc2781" diff="add" href="http://www.ietf.org/rfc/rfc2781.txt" key="IETF RFC 2781"><loc
2790role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E77">[E77]</loc>
2791IETF (Internet Engineering Task Force). <emph>RFC 2781: UTF-16, an encoding
2792of ISO 10646</emph>, ed. P. Hoffman, F. Yergeau. 2000.</bibl>
2793<bibl id="ISO639" diff="add" key="ISO 639"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E38">[E38]</loc>
2794(International Organization for Standardization). <titleref>ISO 639:1988 (E).
2795Code for the representation of names of languages.</titleref> [Geneva]: International
2796Organization for Standardization, 1988.</bibl>
2797<bibl id="ISO3166" diff="add" key="ISO 3166"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E38">[E38]</loc>
2798(International Organization for Standardization). <titleref>ISO 3166-1:1997
2799(E). Codes for the representation of names of countries and their subdivisions &mdash;
2800Part 1: Country codes</titleref> [Geneva]: International Organization for
2801Standardization, 1997.</bibl>
2802<bibl id="ISO8879" key="ISO 8879">ISO (International Organization for Standardization). <titleref>ISO
28038879:1986(E). Information processing &mdash; Text and Office Systems &mdash;
2804Standard Generalized Markup Language (SGML).</titleref> First edition &mdash;
28051986-10-15. [Geneva]: International Organization for Standardization, 1986. </bibl>
2806<bibl id="ISO10744" key="ISO/IEC 10744">ISO (International Organization for
2807Standardization). <titleref>ISO/IEC 10744-1992 (E). Information technology &mdash;
2808Hypermedia/Time-based Structuring Language (HyTime). </titleref> [Geneva]:
2809International Organization for Standardization, 1992. <emph>Extended Facilities
2810Annexe.</emph> [Geneva]: International Organization for Standardization, 1996. </bibl>
2811<bibl id="websgml" diff="add" href="http://www.sgmlsource.com/8879rev/n0029.htm"
2812key="WEBSGML"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E43">[E43]</loc>ISO
2813(International Organization for Standardization). <titleref>ISO 8879:1986
2814TC2. Information technology &mdash; Document Description and Processing Languages. </titleref>
2815[Geneva]: International Organization for Standardization, 1998.</bibl>
2816<bibl id="xml-names" diff="add" xmlns:xlink="http://www.w3.org/TR/WD-xlink"
2817href="http://www.w3.org/TR/REC-xml-names/" key="XML Names"><loc role="erratumref"
2818href="http://www.w3.org/XML/xml-19980210-errata#E98">[E98]</loc>Tim Bray,
2819Dave Hollander, and Andrew Layman, editors. <titleref>Namespaces in XML</titleref>.
2820Textuality, Hewlett-Packard, and Microsoft. World Wide Web Consortium, 1999.</bibl>
2821</blist></div2>
2822</div1>
2823<div1 id="CharClasses">
2824<head>Character Classes</head>
2825<p>Following the characteristics defined in the Unicode standard, characters
2826are classed as base characters (among others, these contain the alphabetic
2827characters of the Latin alphabet<phrase diff="del"><loc role="erratumref"
2828href="http://www.w3.org/XML/xml-19980210-errata#E84">[E84]</loc>, without
2829diacritics</phrase>), ideographic characters, and combining characters (among
2830others, this class contains most diacritics)<phrase diff="del"><loc role="erratumref"
2831href="http://www.w3.org/XML/xml-19980210-errata#E30">[E30]</loc>; these classes
2832combine to form the class of letters.</phrase> Digits and extenders are also
2833distinguished.</p>
2834<scrap id="CHARACTERS" lang="ebnf">
2835<head>Characters</head>
2836<prodgroup pcw3="3" pcw4="15">
2837<prod id="NT-Letter">
2838<lhs>Letter</lhs><rhs><nt def="NT-BaseChar">BaseChar</nt> | <nt def="NT-Ideographic">Ideographic</nt></rhs>
2839</prod>
2840<prod id="NT-BaseChar">
2841<lhs>BaseChar</lhs><rhs>[#x0041-#x005A] |&nbsp;[#x0061-#x007A] |&nbsp;[#x00C0-#x00D6]
2842|&nbsp;[#x00D8-#x00F6] |&nbsp;[#x00F8-#x00FF] |&nbsp;[#x0100-#x0131] |&nbsp;[#x0134-#x013E]
2843|&nbsp;[#x0141-#x0148] |&nbsp;[#x014A-#x017E] |&nbsp;[#x0180-#x01C3] |&nbsp;[#x01CD-#x01F0]
2844|&nbsp;[#x01F4-#x01F5] |&nbsp;[#x01FA-#x0217] |&nbsp;[#x0250-#x02A8] |&nbsp;[#x02BB-#x02C1]
2845|&nbsp;#x0386 |&nbsp;[#x0388-#x038A] |&nbsp;#x038C |&nbsp;[#x038E-#x03A1]
2846|&nbsp;[#x03A3-#x03CE] |&nbsp;[#x03D0-#x03D6] |&nbsp;#x03DA |&nbsp;#x03DC
2847|&nbsp;#x03DE |&nbsp;#x03E0 |&nbsp;[#x03E2-#x03F3] |&nbsp;[#x0401-#x040C]
2848|&nbsp;[#x040E-#x044F] |&nbsp;[#x0451-#x045C] |&nbsp;[#x045E-#x0481] |&nbsp;[#x0490-#x04C4]
2849|&nbsp;[#x04C7-#x04C8] |&nbsp;[#x04CB-#x04CC] |&nbsp;[#x04D0-#x04EB] |&nbsp;[#x04EE-#x04F5]
2850|&nbsp;[#x04F8-#x04F9] |&nbsp;[#x0531-#x0556] |&nbsp;#x0559 |&nbsp;[#x0561-#x0586]
2851|&nbsp;[#x05D0-#x05EA] |&nbsp;[#x05F0-#x05F2] |&nbsp;[#x0621-#x063A] |&nbsp;[#x0641-#x064A]
2852|&nbsp;[#x0671-#x06B7] |&nbsp;[#x06BA-#x06BE] |&nbsp;[#x06C0-#x06CE] |&nbsp;[#x06D0-#x06D3]
2853|&nbsp;#x06D5 |&nbsp;[#x06E5-#x06E6] |&nbsp;[#x0905-#x0939] |&nbsp;#x093D
2854|&nbsp;[#x0958-#x0961] |&nbsp;[#x0985-#x098C] |&nbsp;[#x098F-#x0990] |&nbsp;[#x0993-#x09A8]
2855|&nbsp;[#x09AA-#x09B0] |&nbsp;#x09B2 |&nbsp;[#x09B6-#x09B9] |&nbsp;[#x09DC-#x09DD]
2856|&nbsp;[#x09DF-#x09E1] |&nbsp;[#x09F0-#x09F1] |&nbsp;[#x0A05-#x0A0A] |&nbsp;[#x0A0F-#x0A10]
2857|&nbsp;[#x0A13-#x0A28] |&nbsp;[#x0A2A-#x0A30] |&nbsp;[#x0A32-#x0A33] |&nbsp;[#x0A35-#x0A36]
2858|&nbsp;[#x0A38-#x0A39] |&nbsp;[#x0A59-#x0A5C] |&nbsp;#x0A5E |&nbsp;[#x0A72-#x0A74]
2859|&nbsp;[#x0A85-#x0A8B] |&nbsp;#x0A8D |&nbsp;[#x0A8F-#x0A91] |&nbsp;[#x0A93-#x0AA8]
2860|&nbsp;[#x0AAA-#x0AB0] |&nbsp;[#x0AB2-#x0AB3] |&nbsp;[#x0AB5-#x0AB9] |&nbsp;#x0ABD
2861|&nbsp;#x0AE0 |&nbsp;[#x0B05-#x0B0C] |&nbsp;[#x0B0F-#x0B10] |&nbsp;[#x0B13-#x0B28]
2862|&nbsp;[#x0B2A-#x0B30] |&nbsp;[#x0B32-#x0B33] |&nbsp;[#x0B36-#x0B39] |&nbsp;#x0B3D
2863|&nbsp;[#x0B5C-#x0B5D] |&nbsp;[#x0B5F-#x0B61] |&nbsp;[#x0B85-#x0B8A] |&nbsp;[#x0B8E-#x0B90]
2864|&nbsp;[#x0B92-#x0B95] |&nbsp;[#x0B99-#x0B9A] |&nbsp;#x0B9C |&nbsp;[#x0B9E-#x0B9F]
2865|&nbsp;[#x0BA3-#x0BA4] |&nbsp;[#x0BA8-#x0BAA] |&nbsp;[#x0BAE-#x0BB5] |&nbsp;[#x0BB7-#x0BB9]
2866|&nbsp;[#x0C05-#x0C0C] |&nbsp;[#x0C0E-#x0C10] |&nbsp;[#x0C12-#x0C28] |&nbsp;[#x0C2A-#x0C33]
2867|&nbsp;[#x0C35-#x0C39] |&nbsp;[#x0C60-#x0C61] |&nbsp;[#x0C85-#x0C8C] |&nbsp;[#x0C8E-#x0C90]
2868|&nbsp;[#x0C92-#x0CA8] |&nbsp;[#x0CAA-#x0CB3] |&nbsp;[#x0CB5-#x0CB9] |&nbsp;#x0CDE
2869|&nbsp;[#x0CE0-#x0CE1] |&nbsp;[#x0D05-#x0D0C] |&nbsp;[#x0D0E-#x0D10] |&nbsp;[#x0D12-#x0D28]
2870|&nbsp;[#x0D2A-#x0D39] |&nbsp;[#x0D60-#x0D61] |&nbsp;[#x0E01-#x0E2E] |&nbsp;#x0E30
2871|&nbsp;[#x0E32-#x0E33] |&nbsp;[#x0E40-#x0E45] |&nbsp;[#x0E81-#x0E82] |&nbsp;#x0E84
2872|&nbsp;[#x0E87-#x0E88] |&nbsp;#x0E8A |&nbsp;#x0E8D |&nbsp;[#x0E94-#x0E97]
2873|&nbsp;[#x0E99-#x0E9F] |&nbsp;[#x0EA1-#x0EA3] |&nbsp;#x0EA5 |&nbsp;#x0EA7
2874|&nbsp;[#x0EAA-#x0EAB] |&nbsp;[#x0EAD-#x0EAE] |&nbsp;#x0EB0 |&nbsp;[#x0EB2-#x0EB3]
2875|&nbsp;#x0EBD |&nbsp;[#x0EC0-#x0EC4] |&nbsp;[#x0F40-#x0F47] |&nbsp;[#x0F49-#x0F69]
2876|&nbsp;[#x10A0-#x10C5] |&nbsp;[#x10D0-#x10F6] |&nbsp;#x1100 |&nbsp;[#x1102-#x1103]
2877|&nbsp;[#x1105-#x1107] |&nbsp;#x1109 |&nbsp;[#x110B-#x110C] |&nbsp;[#x110E-#x1112]
2878|&nbsp;#x113C |&nbsp;#x113E |&nbsp;#x1140 |&nbsp;#x114C |&nbsp;#x114E |&nbsp;#x1150
2879|&nbsp;[#x1154-#x1155] |&nbsp;#x1159 |&nbsp;[#x115F-#x1161] |&nbsp;#x1163
2880|&nbsp;#x1165 |&nbsp;#x1167 |&nbsp;#x1169 |&nbsp;[#x116D-#x116E] |&nbsp;[#x1172-#x1173]
2881|&nbsp;#x1175 |&nbsp;#x119E |&nbsp;#x11A8 |&nbsp;#x11AB |&nbsp;[#x11AE-#x11AF]
2882|&nbsp;[#x11B7-#x11B8] |&nbsp;#x11BA |&nbsp;[#x11BC-#x11C2] |&nbsp;#x11EB
2883|&nbsp;#x11F0 |&nbsp;#x11F9 |&nbsp;[#x1E00-#x1E9B] |&nbsp;[#x1EA0-#x1EF9]
2884|&nbsp;[#x1F00-#x1F15] |&nbsp;[#x1F18-#x1F1D] |&nbsp;[#x1F20-#x1F45] |&nbsp;[#x1F48-#x1F4D]
2885|&nbsp;[#x1F50-#x1F57] |&nbsp;#x1F59 |&nbsp;#x1F5B |&nbsp;#x1F5D |&nbsp;[#x1F5F-#x1F7D]
2886|&nbsp;[#x1F80-#x1FB4] |&nbsp;[#x1FB6-#x1FBC] |&nbsp;#x1FBE |&nbsp;[#x1FC2-#x1FC4]
2887|&nbsp;[#x1FC6-#x1FCC] |&nbsp;[#x1FD0-#x1FD3] |&nbsp;[#x1FD6-#x1FDB] |&nbsp;[#x1FE0-#x1FEC]
2888|&nbsp;[#x1FF2-#x1FF4] |&nbsp;[#x1FF6-#x1FFC] |&nbsp;#x2126 |&nbsp;[#x212A-#x212B]
2889|&nbsp;#x212E |&nbsp;[#x2180-#x2182] |&nbsp;[#x3041-#x3094] |&nbsp;[#x30A1-#x30FA]
2890|&nbsp;[#x3105-#x312C] |&nbsp;[#xAC00-#xD7A3] </rhs>
2891</prod>
2892<prod id="NT-Ideographic">
2893<lhs>Ideographic</lhs><rhs>[#x4E00-#x9FA5] |&nbsp;#x3007 |&nbsp;[#x3021-#x3029] </rhs>
2894</prod>
2895<prod id="NT-CombiningChar">
2896<lhs>CombiningChar</lhs><rhs>[#x0300-#x0345] |&nbsp;[#x0360-#x0361] |&nbsp;[#x0483-#x0486]
2897|&nbsp;[#x0591-#x05A1] |&nbsp;[#x05A3-#x05B9] |&nbsp;[#x05BB-#x05BD] |&nbsp;#x05BF
2898|&nbsp;[#x05C1-#x05C2] |&nbsp;#x05C4 |&nbsp;[#x064B-#x0652] |&nbsp;#x0670
2899|&nbsp;[#x06D6-#x06DC] |&nbsp;[#x06DD-#x06DF] |&nbsp;[#x06E0-#x06E4] |&nbsp;[#x06E7-#x06E8]
2900|&nbsp;[#x06EA-#x06ED] |&nbsp;[#x0901-#x0903] |&nbsp;#x093C |&nbsp;[#x093E-#x094C]
2901|&nbsp;#x094D |&nbsp;[#x0951-#x0954] |&nbsp;[#x0962-#x0963] |&nbsp;[#x0981-#x0983]
2902|&nbsp;#x09BC |&nbsp;#x09BE |&nbsp;#x09BF |&nbsp;[#x09C0-#x09C4] |&nbsp;[#x09C7-#x09C8]
2903|&nbsp;[#x09CB-#x09CD] |&nbsp;#x09D7 |&nbsp;[#x09E2-#x09E3] |&nbsp;#x0A02
2904|&nbsp;#x0A3C |&nbsp;#x0A3E |&nbsp;#x0A3F |&nbsp;[#x0A40-#x0A42] |&nbsp;[#x0A47-#x0A48]
2905|&nbsp;[#x0A4B-#x0A4D] |&nbsp;[#x0A70-#x0A71] |&nbsp;[#x0A81-#x0A83] |&nbsp;#x0ABC
2906|&nbsp;[#x0ABE-#x0AC5] |&nbsp;[#x0AC7-#x0AC9] |&nbsp;[#x0ACB-#x0ACD] |&nbsp;[#x0B01-#x0B03]
2907|&nbsp;#x0B3C |&nbsp;[#x0B3E-#x0B43] |&nbsp;[#x0B47-#x0B48] |&nbsp;[#x0B4B-#x0B4D]
2908|&nbsp;[#x0B56-#x0B57] |&nbsp;[#x0B82-#x0B83] |&nbsp;[#x0BBE-#x0BC2] |&nbsp;[#x0BC6-#x0BC8]
2909|&nbsp;[#x0BCA-#x0BCD] |&nbsp;#x0BD7 |&nbsp;[#x0C01-#x0C03] |&nbsp;[#x0C3E-#x0C44]
2910|&nbsp;[#x0C46-#x0C48] |&nbsp;[#x0C4A-#x0C4D] |&nbsp;[#x0C55-#x0C56] |&nbsp;[#x0C82-#x0C83]
2911|&nbsp;[#x0CBE-#x0CC4] |&nbsp;[#x0CC6-#x0CC8] |&nbsp;[#x0CCA-#x0CCD] |&nbsp;[#x0CD5-#x0CD6]
2912|&nbsp;[#x0D02-#x0D03] |&nbsp;[#x0D3E-#x0D43] |&nbsp;[#x0D46-#x0D48] |&nbsp;[#x0D4A-#x0D4D]
2913|&nbsp;#x0D57 |&nbsp;#x0E31 |&nbsp;[#x0E34-#x0E3A] |&nbsp;[#x0E47-#x0E4E]
2914|&nbsp;#x0EB1 |&nbsp;[#x0EB4-#x0EB9] |&nbsp;[#x0EBB-#x0EBC] |&nbsp;[#x0EC8-#x0ECD]
2915|&nbsp;[#x0F18-#x0F19] |&nbsp;#x0F35 |&nbsp;#x0F37 |&nbsp;#x0F39 |&nbsp;#x0F3E
2916|&nbsp;#x0F3F |&nbsp;[#x0F71-#x0F84] |&nbsp;[#x0F86-#x0F8B] |&nbsp;[#x0F90-#x0F95]
2917|&nbsp;#x0F97 |&nbsp;[#x0F99-#x0FAD] |&nbsp;[#x0FB1-#x0FB7] |&nbsp;#x0FB9
2918|&nbsp;[#x20D0-#x20DC] |&nbsp;#x20E1 |&nbsp;[#x302A-#x302F] |&nbsp;#x3099
2919|&nbsp;#x309A </rhs>
2920</prod>
2921<prod id="NT-Digit">
2922<lhs>Digit</lhs><rhs>[#x0030-#x0039] |&nbsp;[#x0660-#x0669] |&nbsp;[#x06F0-#x06F9]
2923|&nbsp;[#x0966-#x096F] |&nbsp;[#x09E6-#x09EF] |&nbsp;[#x0A66-#x0A6F] |&nbsp;[#x0AE6-#x0AEF]
2924|&nbsp;[#x0B66-#x0B6F] |&nbsp;[#x0BE7-#x0BEF] |&nbsp;[#x0C66-#x0C6F] |&nbsp;[#x0CE6-#x0CEF]
2925|&nbsp;[#x0D66-#x0D6F] |&nbsp;[#x0E50-#x0E59] |&nbsp;[#x0ED0-#x0ED9] |&nbsp;[#x0F20-#x0F29] </rhs>
2926</prod>
2927<prod id="NT-Extender">
2928<lhs>Extender</lhs><rhs>#x00B7 |&nbsp;#x02D0 |&nbsp;#x02D1 |&nbsp;#x0387 |&nbsp;#x0640
2929|&nbsp;#x0E46 |&nbsp;#x0EC6 |&nbsp;#x3005 |&nbsp;[#x3031-#x3035] |&nbsp;[#x309D-#x309E]
2930|&nbsp;[#x30FC-#x30FE] </rhs>
2931</prod>
2932</prodgroup></scrap>
2933<p>The character classes defined here can be derived from the Unicode <phrase
2934diff="add"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E67">[E67]</loc>2.0</phrase>
2935character database as follows:</p>
2936<ulist>
2937<item><p>Name start characters must have one of the categories Ll, Lu, Lo,
2938Lt, Nl.</p></item>
2939<item><p>Name characters other than Name-start characters must have one of
2940the categories Mc, Me, Mn, Lm, or Nd.</p></item>
2941<item><p>Characters in the compatibility area (i.e. with character code greater
2942than #xF900 and less than #xFFFE) are not allowed in XML names.</p></item>
2943<item><p>Characters which have a font or compatibility decomposition (i.e.
2944those with a <quote>compatibility formatting tag</quote> in field 5 of the
2945database -- marked by field 5 beginning with a <quote>&lt;</quote>) are not
2946allowed.</p></item>
2947<item><p>The following characters are treated as name-start characters rather
2948than name characters, because the property file classifies them as Alphabetic:
2949[#x02BB-#x02C1], #x0559, #x06E5, #x06E6.</p></item>
2950<item><p>Characters #x20DD-#x20E0 are excluded (in accordance with Unicode <phrase
2951diff="add"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E67">[E67]</loc>2.0</phrase>,
2952section 5.14).</p></item>
2953<item><p>Character #x00B7 is classified as an extender, because the property
2954list so identifies it.</p></item>
2955<item><p>Character #x0387 is added as a name character, because #x00B7 is
2956its canonical equivalent.</p></item>
2957<item><p>Characters ':' and '_' are allowed as name-start characters.</p>
2958</item>
2959<item><p>Characters '-' and '.' are allowed as name characters.</p></item>
2960</ulist>
2961</div1>
2962<inform-div1 id="sec-xml-and-sgml">
2963<head>XML and SGML</head>
2964<p><phrase diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E43">[E43]</loc>XML
2965is designed to be a subset of SGML, in that every XML document should also
2966be a conforming SGML document.</phrase> For a detailed comparison of the additional
2967restrictions that XML places on documents beyond those of SGML, see <bibref
2968ref="Clark"/>.</p>
2969</inform-div1>
2970<inform-div1 id="sec-entexpand">
2971<head>Expansion of Entity and Character References</head>
2972<p>This appendix contains some examples illustrating the sequence of entity-
2973and character-reference recognition and expansion, as specified in <specref
2974ref="entproc"/>.</p>
2975<p>If the DTD contains the declaration</p>
2976<eg><![CDATA[<!ENTITY example "<p>An ampersand (&#38;#38;) may be escaped
2977numerically (&#38;#38;#38;) or with a general entity
2978(&amp;amp;).</p>" >]]></eg>
2979<p>then the XML processor will recognize the character references when it
2980parses the entity declaration, and resolve them before storing the following
2981string as the value of the entity <quote><code>example</code></quote>:</p>
2982<eg><![CDATA[<p>An ampersand (&#38;) may be escaped
2983numerically (&#38;#38;) or with a general entity
2984(&amp;amp;).</p>]]></eg>
2985<p>A reference in the document to <quote><code>&amp;example;</code></quote>
2986will cause the text to be reparsed, at which time the start- and end-tags
2987of the <el>p</el> element will be recognized and the three references will
2988be recognized and expanded, resulting in a <el>p</el> element with the following
2989content (all data, no delimiters or markup):</p>
2990<eg><![CDATA[An ampersand (&) may be escaped
2991numerically (&#38;) or with a general entity
2992(&amp;).]]></eg>
2993<p>A more complex example will illustrate the rules and their effects fully.
2994In the following example, the line numbers are solely for reference.</p>
2995<eg><![CDATA[1 <?xml version='1.0'?>
29962 <!DOCTYPE test [
29973 <!ELEMENT test (#PCDATA) >
29984 <!ENTITY % xx '&#37;zz;'>
29995 <!ENTITY % zz '&#60;!ENTITY tricky "error-prone" >' >
30006 %xx;
30017 ]>
30028 <test>This sample shows a &tricky; method.</test>]]></eg>
3003<p>This produces the following:</p>
3004<ulist spacing="compact">
3005<item><p>in line 4, the reference to character 37 is expanded immediately,
3006and the parameter entity <quote><code>xx</code></quote> is stored in the symbol
3007table with the value <quote><code>%zz;</code></quote>. Since the replacement
3008text is not rescanned, the reference to parameter entity <quote><code>zz</code></quote>
3009is not recognized. (And it would be an error if it were, since <quote><code>zz</code></quote>
3010is not yet declared.)</p></item>
3011<item><p>in line 5, the character reference <quote><code>&amp;#60;</code></quote>
3012is expanded immediately and the parameter entity <quote><code>zz</code></quote>
3013is stored with the replacement text <quote><code>&lt;!ENTITY tricky "error-prone"
3014></code></quote>, which is a well-formed entity declaration.</p></item>
3015<item><p>in line 6, the reference to <quote><code>xx</code></quote> is recognized,
3016and the replacement text of <quote><code>xx</code></quote> (namely <quote><code>%zz;</code></quote>)
3017is parsed. The reference to <quote><code>zz</code></quote> is recognized in
3018its turn, and its replacement text (<quote><code>&lt;!ENTITY tricky "error-prone"
3019></code></quote>) is parsed. The general entity <quote><code>tricky</code></quote>
3020has now been declared, with the replacement text <quote><code>error-prone</code></quote>.</p>
3021</item>
3022<item><p>in line 8, the reference to the general entity <quote><code>tricky</code></quote>
3023is recognized, and it is expanded, so the full content of the <el>test</el>
3024element is the self-describing (and ungrammatical) string <emph>This sample
3025shows a error-prone method.</emph></p></item>
3026</ulist>
3027</inform-div1>
3028<inform-div1 id="determinism">
3029<head>Deterministic Content Models</head>
3030<p><phrase diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E102">[E102]</loc>As
3031noted in <specref ref="sec-element-content"/>, it is required that content
3032models in element type declarations be deterministic. This requirement is <termref
3033def="dt-compat">for compatibility</termref> with SGML (which calls deterministic
3034content models <quote>unambiguous</quote>);</phrase> XML processors built
3035using SGML systems may flag non-deterministic content models as errors.</p>
3036<p>For example, the content model <code>((b, c) | (b, d))</code> is non-deterministic,
3037because given an initial <el>b</el> the <phrase diff="chg"><loc role="erratumref"
3038href="http://www.w3.org/XML/xml-19980210-errata#E95">[E95]</loc>XML processor</phrase>
3039cannot know which <el>b</el> in the model is being matched without looking
3040ahead to see which element follows the <el>b</el>. In this case, the two references
3041to <el>b</el> can be collapsed into a single reference, making the model read <code>(b,
3042(c | d))</code>. An initial <el>b</el> now clearly matches only a single name
3043in the content model. The <phrase diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E95">[E95]</loc>processor</phrase
3044> doesn't need to look ahead to see what follows; either <el>c</el> or <el>d</el>
3045would be accepted.</p>
3046<p>More formally: a finite state automaton may be constructed from the content
3047model using the standard algorithms, e.g. algorithm 3.5 in section 3.9 of
3048Aho, Sethi, and Ullman <bibref ref="Aho"/>. In many such algorithms, a follow
3049set is constructed for each position in the regular expression (i.e., each
3050leaf node in the syntax tree for the regular expression); if any position
3051has a follow set in which more than one following position is labeled with
3052the same element type name, then the content model is in error and may be
3053reported as an error.</p>
3054<p>Algorithms exist which allow many but not all non-deterministic content
3055models to be reduced automatically to equivalent deterministic models; see
3056Br�ggemann-Klein 1991 <bibref ref="ABK"/>.</p>
3057</inform-div1>
3058<inform-div1 id="sec-guessing">
3059<head><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E105">[E105]</loc><loc
3060role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E48">[E48]</loc>Autodetection
3061of Character Encodings</head>
3062<p>The XML encoding declaration functions as an internal label on each entity,
3063indicating which character encoding is in use. Before an XML processor can
3064read the internal label, however, it apparently has to know what character
3065encoding is in use&mdash;which is what the internal label is trying to indicate.
3066In the general case, this is a hopeless situation. It is not entirely hopeless
3067in XML, however, because XML limits the general case in two ways: each implementation
3068is assumed to support only a finite set of character encodings, and the XML
3069encoding declaration is restricted in position and content in order to make
3070it feasible to autodetect the character encoding in use in each entity in
3071normal cases. Also, in many cases other sources of information are available
3072in addition to the XML data stream itself. Two cases may be distinguished,
3073depending on whether the XML entity is presented to the processor without,
3074or with, any accompanying (external) information. We consider the first case
3075first.</p>
3076<div2 id="sec-guessing-no-ext-info">
3077<head diff="add">Detection Without External Encoding Information</head>
3078<p>Because each XML entity <phrase diff="add">not accompanied by external
3079encoding information and </phrase>not in UTF-8 or UTF-16 <phrase diff="chg">encoding</phrase> <emph>must</emph>
3080begin with an XML encoding declaration, in which the first characters must
3081be '<code>&lt;?xml</code>', any conforming processor can detect, after two
3082to four octets of input, which of the following cases apply. In reading this
3083list, it may help to know that in UCS-4, '&lt;' is <quote><code>#x0000003C</code></quote>
3084and '?' is <quote><code>#x0000003F</code></quote>, and the Byte Order Mark
3085required of UTF-16 data streams is <quote><code>#xFEFF</code></quote>. <phrase
3086diff="add">The notation <var>##</var> is used to denote any byte value except <phrase
3087diff="chg">that two consecutive <var>##</var>s cannot be both 00</phrase>.</phrase></p>
3088<p diff="add">With a Byte Order Mark:</p>
3089<table diff="add" border="1" frame="border"><tbody><tr><td><code>00 00 FE
3090FF</code></td><td>UCS-4, big-endian machine (1234 order)</td></tr><tr><td><code>FF
3091FE 00 00</code></td><td>UCS-4, little-endian machine (4321 order)</td></tr>
3092<tr><td><code>00 00 FF FE</code></td><td>UCS-4, unusual octet order (2143)</td>
3093</tr><tr><td><code>FE FF 00 00</code></td><td>UCS-4, unusual octet order (3412)</td>
3094</tr><tr><td><code>FE FF ## ##</code></td><td>UTF-16, big-endian</td></tr>
3095<tr><td><code>FF FE ## ##</code></td><td>UTF-16, little-endian</td></tr><tr>
3096<td><code>EF BB BF</code></td><td>UTF-8</td></tr></tbody></table>
3097<p diff="add">Without a Byte Order Mark:</p>
3098<table diff="add" border="1" frame="border"><tbody><tr><td><code>00&nbsp;00&nbsp;00&nbsp;3C</code></td>
3099<td rowspan="4">UCS-4 or other encoding with a 32-bit code unit and ASCII
3100characters encoded as ASCII values, in respectively big-endian (1234), little-endian
3101(4321) and two unusual byte orders (2143 and 3412). The encoding declaration
3102must be read to determine which of UCS-4 or other supported 32-bit encodings
3103applies.</td></tr><tr><td><code>3C 00 00 00</code></td>
3104<!--<td>UCS-4, little-endian machine (4321 order)</td>-->
3105</tr><tr><td><code>00 00 3C 00</code></td>
3106<!--<td>UCS-4, unusual octet order (2143)</td>-->
3107</tr><tr><td><code>00 3C 00 00</code></td>
3108<!--<td>UCS-4, unusual octet order (3412)</td>-->
3109</tr><tr><td><code>00 3C 00 3F</code></td><td>UTF-16BE or big-endian ISO-10646-UCS-2
3110or other encoding with a 16-bit code unit in big-endian order and ASCII characters
3111encoded as ASCII values (the encoding declaration must be read to determine
3112which)</td></tr><tr><td><code>3C 00 3F 00</code></td><td>UTF-16LE or little-endian
3113ISO-10646-UCS-2 or other encoding with a 16-bit code unit in little-endian
3114order and ASCII characters encoded as ASCII values (the encoding declaration
3115must be read to determine which)</td></tr><tr><td><code>3C 3F 78 6D</code></td>
3116<td>UTF-8, ISO 646, ASCII, some part of ISO 8859, Shift-JIS, EUC, or any other
31177-bit, 8-bit, or mixed-width encoding which ensures that the characters of
3118ASCII have their normal positions, width, and values; the actual encoding
3119declaration must be read to detect which of these applies, but since all of
3120these encodings use the same bit patterns for the relevant ASCII characters,
3121the encoding declaration itself may be read reliably</td></tr><tr><td><code>4C
31226F A7 94</code></td><td>EBCDIC (in some flavor; the full encoding declaration
3123must be read to tell which code page is in use)</td></tr><tr><td>Other</td>
3124<td>UTF-8 without an encoding declaration, or else the data stream is mislabeled
3125(lacking a required encoding declaration), corrupt, fragmentary, or enclosed
3126in a wrapper of some kind</td></tr></tbody></table>
3127<note diff="add">
3128<p>In cases above which do not require reading the encoding declaration to
3129determine the encoding, section 4.3.3 still requires that the encoding declaration,
3130if present, be read and that the encoding name be checked to match the actual
3131encoding of the entity. Also, it is possible that new character encodings
3132will be invented that will make it necessary to use the encoding declaration
3133to determine the encoding, in cases where this is not required at present.</p>
3134</note>
3135<p>This level of autodetection is enough to read the XML encoding declaration
3136and parse the character-encoding identifier, which is still necessary to distinguish
3137the individual members of each family of encodings (e.g. to tell UTF-8 from
31388859, and the parts of 8859 from each other, or to distinguish the specific
3139EBCDIC code page in use, and so on).</p>
3140<p>Because the contents of the encoding declaration are restricted to <phrase
3141diff="chg">characters from the ASCII repertoire (however encoded)</phrase>,
3142a processor can reliably read the entire encoding declaration as soon as it
3143has detected which family of encodings is in use. Since in practice, all widely
3144used character encodings fall into one of the categories above, the XML encoding
3145declaration allows reasonably reliable in-band labeling of character encodings,
3146even when external sources of information at the operating-system or transport-protocol
3147level are unreliable. <phrase diff="del">Note that since external parsed entities
3148in UTF-16 may begin with any character, this autodetection does not always
3149work. Also, </phrase><phrase diff="add">Character encodings such as UTF-7
3150that make overloaded usage of ASCII-valued bytes may fail to be reliably detected.</phrase></p>
3151<p>Once the processor has detected the character encoding in use, it can act
3152appropriately, whether by invoking a separate input routine for each case,
3153or by calling the proper conversion function on each character of input.</p>
3154<p>Like any self-labeling system, the XML encoding declaration will not work
3155if any software changes the entity's character set or encoding without updating
3156the encoding declaration. Implementors of character-encoding routines should
3157be careful to ensure the accuracy of the internal and external information
3158used to label the entity.</p>
3159</div2>
3160<div2 id="sec-guessing-with-ext-info">
3161<head diff="add">Priorities in the Presence of External Encoding Information</head>
3162<p>The second possible case occurs when the XML entity is accompanied by encoding
3163information, as in some file systems and some network protocols. When multiple
3164sources of information are available, their relative priority and the preferred
3165method of handling conflict should be specified as part of the higher-level
3166protocol used to deliver XML. <phrase diff="chg">In particular, please refer
3167to <bibref ref="rfc2376"/> or its successor, which defines the <code>text/xml</code>
3168and <code>application/xml</code> MIME types and provides some useful guidance.
3169In the interests of interoperability, however, the following rule is recommended.</phrase></p>
3170<ulist>
3171<item><p>If an XML entity is in a file, the Byte-Order Mark and encoding declaration <phrase
3172diff="del">PI </phrase>are used (if present) to determine the character encoding.<phrase
3173diff="del"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E74">[E74]</loc>
3174All other heuristics and sources of information are solely for error recovery.</phrase></p>
3175</item>
3176</ulist>
3177<ulist diff="del">
3178<item><p>If an XML entity is delivered with a MIME type of text/xml, then
3179the <code>charset</code> parameter on the MIME type determines the character
3180encoding method; all other heuristics and sources of information are solely
3181for error recovery.</p></item>
3182<item><p>If an XML entity is delivered with a MIME type of application/xml,
3183then the Byte-Order Mark and encoding-declaration PI are used (if present)
3184to determine the character encoding. All other heuristics and sources of information
3185are solely for error recovery.</p></item>
3186</ulist>
3187<p diff="del">These rules apply only in the absence of protocol-level documentation;
3188in particular, when the MIME types text/xml and application/xml are defined,
3189the recommendations of the relevant RFC will supersede these rules.</p>
3190</div2>
3191</inform-div1>
3192<inform-div1 id="sec-xml-wg">
3193<head>W3C XML Working Group</head>
3194<p>This specification was prepared and approved for publication by the W3C
3195XML Working Group (WG). WG approval of this specification does not necessarily
3196imply that all WG members voted for its approval. The current and former members
3197of the XML WG are:</p>
3198<orglist>
3199<member><name>Jon Bosak</name><affiliation>Sun</affiliation><role>Chair</role>
3200</member>
3201<member><name>James Clark</name><role>Technical Lead</role></member>
3202<member><name>Tim Bray</name><affiliation>Textuality and Netscape</affiliation>
3203<role>XML Co-editor</role></member>
3204<member><name>Jean Paoli</name><affiliation>Microsoft</affiliation><role>XML
3205Co-editor</role></member>
3206<member><name>C. M. Sperberg-McQueen</name><affiliation>U. of Ill.</affiliation>
3207<role>XML Co-editor</role></member>
3208<member><name>Dan Connolly</name><affiliation>W3C</affiliation><role>W3C Liaison</role>
3209</member>
3210<member><name>Paula Angerstein</name><affiliation>Texcel</affiliation></member>
3211<member><name>Steve DeRose</name><affiliation>INSO</affiliation></member>
3212<member><name>Dave Hollander</name><affiliation>HP</affiliation></member>
3213<member><name>Eliot Kimber</name><affiliation>ISOGEN</affiliation></member>
3214<member><name>Eve Maler</name><affiliation>ArborText</affiliation></member>
3215<member><name>Tom Magliery</name><affiliation>NCSA</affiliation></member>
3216<member><name>Murray Maloney</name><affiliation diff="chg">SoftQuad, Grif
3217SA, Muzmo and Veo Systems</affiliation></member>
3218<member><name diff="chg">MURATA Makoto (FAMILY Given)</name><affiliation>Fuji
3219Xerox Information Systems</affiliation></member>
3220<member><name>Joel Nava</name><affiliation>Adobe</affiliation></member>
3221<member><name>Conleth O'Connell</name><affiliation>Vignette</affiliation>
3222</member>
3223<member><name>Peter Sharpe</name><affiliation>SoftQuad</affiliation></member>
3224<member><name>John Tigue</name><affiliation>DataChannel</affiliation></member>
3225</orglist>
3226</inform-div1>
3227<inform-div1 id="sec-core-wg" diff="add">
3228<head>W3C XML Core Group</head>
3229<p>The second edition of this specification was prepared by the W3C XML Core
3230Working Group (WG). The members of the WG at the time of publication of this
3231edition were:</p>
3232<orglist>
3233<member><name>Paula Angerstein</name><affiliation>Vignette</affiliation></member>
3234<member><name>Daniel Austin</name><affiliation>Ask Jeeves</affiliation></member>
3235<member><name>Tim Boland</name></member>
3236<member><name>Allen Brown</name><affiliation>Microsoft</affiliation></member>
3237<member><name>Dan Connolly</name><affiliation>W3C</affiliation><role>Staff
3238Contact</role></member>
3239<member><name>John Cowan</name><affiliation>Reuters Limited</affiliation>
3240</member>
3241<member><name>John Evdemon</name><affiliation>XMLSolutions Corporation</affiliation>
3242</member>
3243<member><name>Paul Grosso</name><affiliation>Arbortext</affiliation><role>Co-Chair</role>
3244</member>
3245<member><name>Arnaud Le Hors</name><affiliation>IBM</affiliation><role>Co-Chair</role>
3246</member>
3247<member><name>Eve Maler</name><affiliation>Sun Microsystems</affiliation>
3248<role>Second Edition Editor</role></member>
3249<member><name>Jonathan Marsh</name><affiliation>Microsoft</affiliation></member>
3250<member><name>MURATA Makoto (FAMILY Given)</name><affiliation>IBM</affiliation>
3251</member>
3252<member><name>Mark Needleman</name><affiliation>Data Research Associates</affiliation>
3253</member>
3254<member><name>David Orchard</name><affiliation>Jamcracker</affiliation></member>
3255<member><name>Lew Shannon</name><affiliation>NCR</affiliation></member>
3256<member><name>Richard Tobin</name><affiliation>University of Edinburgh</affiliation>
3257</member>
3258<member><name>Daniel Veillard</name><affiliation>W3C</affiliation></member>
3259<member><name>Dan Vint</name><affiliation>Lexica</affiliation></member>
3260<member><name>Norman Walsh</name><affiliation>Sun Microsystems</affiliation>
3261</member>
3262<member><name>Fran�ois Yergeau</name><affiliation>Alis Technologies</affiliation>
3263<role>Errata List Editor</role></member>
3264<member><name>Kongyi Zhou</name><affiliation>Oracle</affiliation></member>
3265</orglist>
3266</inform-div1>
3267<inform-div1 diff="add">
3268<head>Production Notes</head>
3269<p>This Second Edition was encoded in the <loc href="http://www.w3.org/XML/1998/06/xmlspec-v21.dtd">XMLspec
3270DTD</loc> (which has <loc href="http://www.w3.org/XML/1998/06/xmlspec-report-v21.htm">documentation</loc>
3271available). The HTML versions were produced with a combination of the <loc
3272href="http://www.w3.org/XML/1998/06/xmlspec.xsl">xmlspec.xsl</loc>, <loc href="http://www.w3.org/XML/1998/06/diffspec.xsl">diffspec.xsl</loc>,
3273and <loc href="http://www.w3.org/XML/1998/06/REC-xml-2e.xsl">REC-xml-2e.xsl</loc>
3274XSLT stylesheets.  The PDF version was produced with the <loc href="http://www.tdb.uu.se/~jan/html2ps.html">html2ps</loc>
3275facility and a distiller program.</p>
3276</inform-div1>
3277</back></spec>
3278