<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE spec PUBLIC "-//W3C//DTD Specification V2.1//EN"
 "xmlspec-v21.dtd" [
<!--ArborText, Inc., 1988-2000, v.4002-->
<!ENTITY http-ident "http://www.w3.org/TR/2000/REC-xml">
<!ENTITY draft.month "October">
<!ENTITY draft.day "6">
<!ENTITY iso6.doc.date "20001006">
<!ENTITY draft.year "2000">
<!ENTITY versionOfXML "1.0">
<!ENTITY pio "'&lt;?xml'">
<!ENTITY doc.date "10 February 1998">
<!ENTITY w3c.doc.date "02-Feb-1998">
<!ENTITY WebSGML "WebSGML Adaptations Annex to ISO 8879">
<!ENTITY pic "'?>'">
<!ENTITY br "\n">
<!ENTITY cellback "#c0d9c0">
<!ENTITY mdash "--">
<!ENTITY com "--">
<!ENTITY como "--">
<!ENTITY comc "--">
<!ENTITY hcro "&amp;#x">
<!ENTITY nbsp "&#160;">
<!ENTITY magicents "<code>amp</code>,
<code>lt</code>,
<code>gt</code>,
<code>apos</code>,
<code>quot</code>">
<!ENTITY doc.audience "public review and discussion">
<!ENTITY doc.distribution "may be distributed freely, as long as
all text and legal notices remain intact">
]>
<spec w3c-doctype="rec">
<!--
Notes on preparation of the Second Edition:

- Worked from http://www.w3.org/XML/xml-19980210-errata.
- Changed DTD reference to point to V2.1 of XMLspec.
- Moved version number from <title> to <version> element and
  added "second edition" wording. Mentioned edition information
  in status.
- Removed bgcolor="&cellback;" attributes from all <td>
  elements because that attribute is not in the current table model.
- Reversed status and abstract, so that abstract is first, according
  to W3C guidelines.
- Changed some <emph>s to <titleref>s in bibliography.
- Changed some <code>s to <at> etc. throughout; where used <attval>,
  removed existing <quote>s because the stylesheet produces them.
- Removed some spurious spaces.
- Added affiliation markup to the original member list.
- Added commas between individual <thisver> elements, because
  whitespace is now significant there.
- Moved <eg>s, <scrap>s, and lists outside of <p>s for cleaner HTML
  conversion.
- Revised Status section to reflect new status.
- Fixed all titleref hrefs so they get transformed properly; at
  next revision, these all probably need to be changed to some
  other markup.
- Incorporated all errata (barring obsoleted and invalid ones);
  added links to the errata document with <loc role="erratumref">
  elements; used diff="{add|chg|del}" attribute. This version
  expects that the official HTML output will have diff="del"
  elements suppressed.
-->
<header>
<title>Extensible Markup Language (XML)</title>
<version>1.0 (Second Edition)</version>
<w3c-designation>REC-xml-&iso6.doc.date;</w3c-designation>
<w3c-doctype>W3C Recommendation</w3c-doctype>
<pubdate><day>&draft.day;</day><month>&draft.month;</month><year>&draft.year;</year>
</pubdate>
<publoc><loc href="&http-ident;-&iso6.doc.date;">&http-ident;-&iso6.doc.date;</loc>
(<loc href="&http-ident;-&iso6.doc.date;.html">XHTML</loc>, <loc href="&http-ident;-&iso6.doc.date;.xml">XML</loc>, <loc
href="&http-ident;-&iso6.doc.date;.pdf">PDF</loc>, <loc href="&http-ident;-&iso6.doc.date;-review.html">XHTML
review version</loc> with color-coded revision indicators)</publoc>
<latestloc><loc href="http://www.w3.org/TR/REC-xml">http://www.w3.org/TR/REC-xml</loc></latestloc>
<prevlocs><loc href="http://www.w3.org/TR/2000/WD-xml-2e-20000814"> http://www.w3.org/TR/2000/WD-xml-2e-20000814</loc>
<loc href="http://www.w3.org/TR/1998/REC-xml-19980210"> http://www.w3.org/TR/1998/REC-xml-19980210</loc><!--
<loc href='http://www.w3.org/TR/PR-xml-971208'>
http://www.w3.org/TR/PR-xml-971208</loc>
<loc href='http://www.w3.org/TR/WD-xml-961114'>
http://www.w3.org/TR/WD-xml-961114</loc>
<loc href='http://www.w3.org/TR/WD-xml-lang-970331'>
http://www.w3.org/TR/WD-xml-lang-970331</loc>
<loc href='http://www.w3.org/TR/WD-xml-lang-970630'>
http://www.w3.org/TR/WD-xml-lang-970630</loc>
<loc href='http://www.w3.org/TR/WD-xml-970807'>
http://www.w3.org/TR/WD-xml-970807</loc>
<loc href='http://www.w3.org/TR/WD-xml-971117'>
http://www.w3.org/TR/WD-xml-971117</loc>--> </prevlocs>
<authlist>
<author role="1e"><name>Tim Bray</name><affiliation>Textuality and Netscape</affiliation>
<email href="mailto:tbray@textuality.com">tbray@textuality.com</email></author>
<author role="1e"><name>Jean Paoli</name><affiliation>Microsoft</affiliation>
<email href="mailto:jeanpa@microsoft.com">jeanpa@microsoft.com</email></author>
<author role="1e" diff="chg"><name>C. M. Sperberg-McQueen</name><affiliation>University
of Illinois at Chicago and Text Encoding Initiative</affiliation><email href="mailto:cmsmcq@uic.edu">cmsmcq@uic.edu</email>
</author>
<author role="2e" diff="add"><name>Eve Maler</name><affiliation>Sun Microsystems,
Inc.</affiliation><email href="mailto:elm@east.sun.com">eve.maler@east.sun.com</email>
</author>
</authlist>
<abstract>
<p>The Extensible Markup Language (XML) is a subset of SGML that is completely
described in this document. Its goal is to enable generic SGML to be served,
received, and processed on the Web in the way that is now possible with HTML.
XML has been designed for ease of implementation and for interoperability
with both SGML and HTML.</p>
</abstract>
<status>
<p>This document has been reviewed by W3C Members and other interested parties
and has been endorsed by the Director as a W3C Recommendation. It is a stable
document and may be used as reference material or cited as a normative reference
from another document. W3C's role in making the Recommendation is to draw
attention to the specification and to promote its widespread deployment.
This enhances the functionality and interoperability of the Web.</p>
<p>This document specifies a syntax created by subsetting an existing, widely
used international text processing standard (Standard Generalized Markup Language,
ISO 8879:1986(E) as amended and corrected) for use on the World Wide Web.
It is a product of the W3C XML Activity, details of which can be found at <loc
href="http://www.w3.org/XML/">http://www.w3.org/XML</loc>. <phrase diff="add"><loc
role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E100">[E100]</loc>
The English version of this specification is the only normative version. However,
for translations of this document, see <loc href="http://www.w3.org/XML/#trans">http://www.w3.org/XML/#trans</loc>. </phrase>A
list of current W3C Recommendations and other technical documents can be found
at <loc href="http://www.w3.org/TR/">http://www.w3.org/TR</loc>.</p>
<p diff="del"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E66">[E66]</loc>This
specification uses the term URI, which is defined by <bibref ref="Berners-Lee"/>,
a work in progress expected to update <bibref ref="RFC1738"/> and <bibref
ref="RFC1808"/>.</p>
<p diff="add">This second edition is <emph>not</emph> a new version of XML (first published 10 February 1998);
it merely incorporates the changes dictated by the first-edition errata (available
at <loc href="http://www.w3.org/XML/xml-19980210-errata">http://www.w3.org/XML/xml-19980210-errata</loc>)
as a convenience to readers.
The errata list for this second edition is available
at <loc href="http://www.w3.org/XML/xml-V10-2e-errata">http://www.w3.org/XML/xml-V10-2e-errata</loc>.</p>
<p>Please report errors in this document to <loc href="mailto:xml-editor@w3.org">xml-editor@w3.org</loc><phrase
diff="add"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E101">[E101]</loc>; <loc
href="http://lists.w3.org/Archives/Public/xml-editor">archives</loc> are available</phrase>.</p>
<note diff="add">
<p>C. M. Sperberg-McQueen's affiliation has changed since the publication
of the first edition. He is now at the World Wide Web Consortium, and can
be contacted at <loc href="mailto:cmsmcq@w3.org">cmsmcq@w3.org</loc>.</p>
</note>
</status>
<pubstmt>
<p>Chicago, Vancouver, Mountain View, et al.: World-Wide Web Consortium, XML
Working Group, 1996, 1997, 2000.</p>
</pubstmt>
<sourcedesc>
<p>Created in electronic form.</p>
</sourcedesc>
<langusage>
<language id="EN">English</language>
<language id="ebnf">Extended Backus-Naur Form (formal grammar)</language>
</langusage>
<revisiondesc>
<slist>
<sitem>1997-12-03 : CMSMcQ : yet further changes</sitem>
<sitem>1997-12-02 : TB : further changes (see TB to XML WG, 2 December 1997)</sitem>
<sitem>1997-12-02 : CMSMcQ : deal with as many corrections and comments from
the proofreaders as possible: entify hard-coded document date in pubdate element,
change expansion of entity WebSGML, update status description as per Dan Connolly
(am not sure about reference to Berners-Lee et al.), add 'The' to abstract
as per WG decision, move Relationship to Existing Standards to back matter
and combine with References, re-order back matter so normative appendices
come first, re-tag back matter so informative appendices are tagged informdiv1,
remove XXX XXX from list of 'normative' specs in prose, move some references
from Other References to Normative
References, add RFC 1738, 1808, and 2141
to Other References (they are not normative since we do not require the processor
to enforce any rules based on them), add reference to 'Fielding draft' (Berners-Lee
et al.), move notation section to end of body, drop URIchar non-terminal and
use SkipLit instead, lose stray reference to defunct nonterminal 'markupdecls',
move reference to Aho et al. into appendix (Tim's right), add prose note saying
that hash marks and fragment identifiers are NOT part of the URI formally
speaking, and are NOT legal in system identifiers (processor 'may' signal
an error). Work through: Tim Bray reacting to James Clark, Tim Bray on his
own, Eve Maler, NOT DONE YET: change binary / text to unparsed / parsed. handle
James's suggestion about &lt; in attribute values uppercase hex characters,
namechar list, </sitem>
<sitem>1997-12-01 : JB : add some column-width parameters</sitem>
<sitem>1997-12-01 : CMSMcQ : begin round of changes to incorporate recent
WG decisions and other corrections: binding sources of character encoding
info (27 Aug / 3 Sept), correct wording of Faust quotation (restore dropped
line), drop SDD from EncodingDecl, change text at version number 1.0, drop
misleading (wrong!) sentence about ignorables and extenders, modify definition
of PCData to make bar on msc grammatical, change grammar's handling of internal
subset (drop non-terminal markupdecls), change definition of includeSect to
allow conditional sections, add integral-declaration constraint on internal
subset, drop misleading / dangerous sentence about relationship of entities
with system storage objects, change table body tag to htbody as per EM change
to DTD, add rule about space normalization in public identifiers, add description
of how to generate our name-space rules from Unicode character database (needs
further work!).
</sitem>
<sitem>1997-10-08 : TB : Removed %-constructs again, new rules for PE appearance.</sitem>
<sitem>1997-10-01 : TB : Case-sensitive markup; cleaned up element-type defs,
lotsa little edits for style</sitem>
<sitem>1997-09-25 : TB : Change to elm's new DTD, with substantial detail
cleanup as a side-effect</sitem>
<sitem>1997-07-24 : CMSMcQ : correct error (lost *) in definition of ignoreSectContents
(thanks to Makoto Murata)</sitem>
<sitem>Allow all empty elements to have end-tags, consistent with SGML TC
(as per JJC).</sitem>
<sitem>1997-07-23 : CMSMcQ : pre-emptive strike on pending corrections: introduce
the term 'empty-element tag', note that all empty elements may use it, and
elements declared EMPTY must use it. Add WFC requiring encoding decl to come
first in an entity. Redefine notations to point to PIs as well as binary entities.
Change autodetection table by removing bytes 3 and 4 from examples with Byte
Order Mark. Add content model as a term and clarify that it applies to both
mixed and element content. </sitem>
<sitem>1997-06-30 : CMSMcQ : change date, some cosmetic changes, changes to
productions for choice, seq, Mixed, NotationType, Enumeration. Follow James
Clark's suggestion and prohibit conditional sections in internal subset. TO
DO: simplify production for ignored sections as a result, since we don't need
to worry about parsers which don't expand PErefs finding a conditional section.</sitem>
<sitem>1997-06-29 : TB : various edits</sitem>
<sitem>1997-06-29 : CMSMcQ : further changes: Suppress old FINAL EDIT comments
and some dead material. Revise occurrences of % in grammar to exploit Henry
Thompson's pun, especially markupdecl and attdef. Remove RMD requirement relating
to element content (?). </sitem>
<sitem>1997-06-28 : CMSMcQ : Various changes for 1 July draft: Add text for
draconian error handling (introduce the term Fatal Error).
RE deleta est (changing
wording from original announcement to restrict the requirement to validating
parsers). Tag definition of validating processor and link to it. Add colon
as name character. Change def of %operator. Change standard definitions of
lt, gt, amp. Strip leading zeros from #x00nn forms.</sitem>
<sitem>1997-04-02 : CMSMcQ : final corrections of editorial errors found in
last night's proofreading. Reverse course once more on well-formed: Webster's
Second hyphenates it, and that's enough for me.</sitem>
<sitem>1997-04-01 : CMSMcQ : corrections from JJC, EM, HT, and self</sitem>
<sitem>1997-03-31 : Tim Bray : many changes</sitem>
<sitem>1997-03-29 : CMSMcQ : some Henry Thompson (on entity handling), some
Charles Goldfarb, some ERB decisions (PE handling in miscellaneous declarations).
Changed Ident element to accept def attribute. Allow normalization of Unicode
characters. move def of systemliteral into section on literals.</sitem>
<sitem>1997-03-28 : CMSMcQ : make as many corrections as possible, from Terry
Allen, Norbert Mikula, James Clark, Jon Bosak, Henry Thompson, Paul Grosso,
and self. Among other things: give in on "well formed" (Terry is right), tentatively
rename QuotedCData as AttValue and Literal as EntityValue to be more informative,
since attribute values are the <emph>only</emph> place QuotedCData was used,
and vice versa for entity text and Literal. (I'd call it Entity Text, but
8879 uses that name for both internal and external entities.)</sitem>
<sitem>1997-03-26 : CMSMcQ : resynch the two forks of this draft, reapply
my changes dated 03-20 and 03-21. Normalize old 'may not' to 'must not' except
in the one case where it meant 'may or may not'.</sitem>
<sitem>1997-03-21 : TB : massive changes on plane flight from Chicago to Vancouver</sitem>
<sitem>1997-03-21 : CMSMcQ : correct as many reported errors as possible.
</sitem>
<sitem>1997-03-20 : CMSMcQ : correct typos listed in CMSMcQ hand copy of spec.</sitem>
<sitem>1997-03-20 : CMSMcQ : cosmetic changes preparatory to revision for
WWW conference April 1997: restore some of the internal entity references
(e.g. to docdate, etc.), change character xA0 to &amp;nbsp; and define nbsp
as &amp;#160;, and refill a lot of paragraphs for legibility.</sitem>
<sitem>1996-11-12 : CMSMcQ : revise using Tim's edits: Add list type of NUMBERED
and change most lists either to BULLETS or to NUMBERED. Suppress QuotedNames,
Names (not used). Correct trivial-grammar doc type decl. Rename 'marked section'
as 'CDATA section' passim. Also edits from James Clark: Define the set of
characters from which [^abc] subtracts. Charref should use just [0-9] not
Digit. Location info needs cleaner treatment: remove? (ERB question). One
example of a PI has wrong pic. Clarify discussion of encoding names. Encoding
failure should lead to unspecified results; don't prescribe error recovery.
Don't require exposure of entity boundaries. Ignore white space in element
content. Reserve entity names of the form u-NNNN. Clarify relative URLs. And
some of my own: Correct productions for content model: model cannot consist
of a name, so "elements ::= cp" is no good. </sitem>
<sitem>1996-11-11 : CMSMcQ : revise for style. Add new rhs to entity declaration,
for parameter entities.</sitem>
<sitem>1996-11-10 : CMSMcQ : revise for style. Fix / complete section on names,
characters. Add sections on parameter entities, conditional sections. Still
to do: Add compatibility note on deterministic content models. Finish stylistic
revision.</sitem>
<sitem>1996-10-31 : TB : Add Entity Handling section</sitem>
<sitem>1996-10-30 : TB : Clean up term &amp; termdef. Slip in ERB decision
re EMPTY.</sitem>
<sitem>1996-10-28 : TB : Change DTD. Implement some of Michael's suggestions.
Change comments back to //.
Introduce language for XML namespace reservation.
Add section on white-space handling. Lots more cleanup.</sitem>
<sitem>1996-10-24 : CMSMcQ : quick tweaks, implement some ERB decisions. Characters
are not integers. Comments are /* */ not //. Add bibliographic refs to 10646,
HyTime, Unicode. Rename old Cdata as MsData since it's <emph>only</emph> seen
in marked sections. Call them attribute-value pairs not name-value pairs,
except once. Internal subset is optional, needs '?'. Implied attributes should
be signaled to the app, not have values supplied by processor.</sitem>
<sitem>1996-10-16 : TB : track down &amp; excise all DSD references; introduce
some EBNF for entity declarations.</sitem>
<sitem>1996-10-?? : TB : consistency check, fix up scraps so they all parse,
get formatter working, correct a few productions.</sitem>
<sitem>1996-10-10/11 : CMSMcQ : various maintenance, stylistic, and organizational
changes: Replace a few literals with xmlpio and pic entities, to make them
consistent and ensure we can change pic reliably when the ERB votes. Drop
paragraph on recognizers from notation section. Add match, exact match to
terminology. Move old 2.2 XML Processors and Apps into intro. Mention comments,
PIs, and marked sections in discussion of delimiter escaping. Streamline discussion
of doctype decl syntax. Drop old section of 'PI syntax' for doctype decl,
and add section on partial-DTD summary PIs to end of Logical Structures section.
Revise DSD syntax section to use Tim's subset-in-a-PI mechanism.</sitem>
<sitem>1996-10-10 : TB : eliminate name recognizers (and more?)</sitem>
<sitem>1996-10-09 : CMSMcQ : revise for style, consistency through 2.3 (Characters)</sitem>
<sitem>1996-10-09 : CMSMcQ : re-unite everything for convenience, at least
temporarily, and revise quickly</sitem>
<sitem>1996-10-08 : TB : first major homogenization pass</sitem>
<sitem>1996-10-08 : TB : turn "current" attribute on div type into CDATA</sitem>
<sitem>1996-10-02 : TB : remould into skeleton + entities</sitem>
<sitem>1996-09-30 : CMSMcQ : add a few more sections prior to exchange with
Tim.</sitem>
<sitem>1996-09-20 : CMSMcQ : finish transcribing notes.</sitem>
<sitem>1996-09-19 : CMSMcQ : begin transcribing notes for draft.</sitem>
<sitem>1996-09-13 : CMSMcQ : made outline from notes of 09-06, do some housekeeping</sitem>
</slist>
</revisiondesc>
</header>
<body>
<div1 id="sec-intro">
<head>Introduction</head>
<p>Extensible Markup Language, abbreviated XML, describes a class of data
objects called <termref def="dt-xml-doc">XML documents</termref> and partially
describes the behavior of computer programs which process them. XML is an
application profile or restricted form of SGML, the Standard Generalized Markup
Language <bibref ref="ISO8879"/>. By construction, XML documents are conforming
SGML documents.</p>
<p>XML documents are made up of storage units called <termref def="dt-entity">entities</termref>,
which contain either parsed or unparsed data. Parsed data is made up of <termref
def="dt-character">characters</termref>, some of which form <termref def="dt-chardata">character
data</termref>, and some of which form <termref def="dt-markup">markup</termref>.
Markup encodes a description of the document's storage layout and logical
structure.
XML provides a mechanism to impose constraints on the storage layout
and logical structure.</p>
<p><termdef id="dt-xml-proc" term="XML Processor">A software module called
an <term>XML processor</term> is used to read XML documents and provide access
to their content and structure.</termdef> <termdef id="dt-app" term="Application">It
is assumed that an XML processor is doing its work on behalf of another module,
called the <term>application</term>.</termdef> This specification describes
the required behavior of an XML processor in terms of how it must read XML
data and the information it must provide to the application.</p>
<div2 id="sec-origin-goals">
<head>Origin and Goals</head>
<p>XML was developed by an XML Working Group (originally known as the SGML
Editorial Review Board) formed under the auspices of the World Wide Web Consortium
(W3C) in 1996. It was chaired by Jon Bosak of Sun Microsystems with the active
participation of an XML Special Interest Group (previously known as the SGML
Working Group) also organized by the W3C. The membership of the XML Working
Group is given in an appendix.
Dan Connolly served as the WG's contact with
the W3C.</p>
<p>The design goals for XML are:</p>
<olist>
<item><p>XML shall be straightforwardly usable over the Internet.</p></item>
<item><p>XML shall support a wide variety of applications.</p></item>
<item><p>XML shall be compatible with SGML.</p></item>
<item><p>It shall be easy to write programs which process XML documents.</p>
</item>
<item><p>The number of optional features in XML is to be kept to the absolute
minimum, ideally zero.</p></item>
<item><p>XML documents should be human-legible and reasonably clear.</p></item>
<item><p>The XML design should be prepared quickly.</p></item>
<item><p>The design of XML shall be formal and concise.</p></item>
<item><p>XML documents shall be easy to create.</p></item>
<item><p>Terseness in XML markup is of minimal importance.</p></item>
</olist>
<p>This specification, together with associated standards (Unicode and ISO/IEC
10646 for characters, Internet RFC 1766 for language identification tags,
ISO 639 for language name codes, and ISO 3166 for country name codes), provides
all the information necessary to understand XML Version &versionOfXML; and
construct computer programs to process it.</p>
<p>This version of the XML specification <!-- is for &doc.audience;.--> &doc.distribution;.</p>
</div2>
<div2 id="sec-terminology">
<head>Terminology</head>
<p>The terminology used to describe XML documents is defined in the body of
this specification.
The terms defined in the following list are used in building
those definitions and in describing the actions of an XML processor: <glist>
<gitem><label>may</label>
<def>
<p><termdef id="dt-may" term="May">Conforming documents and XML processors
are permitted to but need not behave as described.</termdef></p>
</def></gitem>
<gitem><label>must</label>
<def>
<p><termdef id="dt-must" term="Must">Conforming documents and XML processors
are required to behave as described; otherwise they are in error. <!-- do NOT change this! this is what defines a violation of
a 'must' clause as 'an error'. -MSM --></termdef></p>
</def></gitem>
<gitem><label>error</label>
<def>
<p><termdef id="dt-error" term="Error">A violation of the rules of this specification;
results are undefined. Conforming software may detect and report an error
and may recover from it.</termdef></p>
</def></gitem>
<gitem><label>fatal error</label>
<def>
<p><termdef id="dt-fatal" term="Fatal Error">An error which a conforming <termref
def="dt-xml-proc">XML processor</termref> must detect and report to the application.
After encountering a fatal error, the processor may continue processing the
data to search for further errors and may report such errors to the application.
In order to support correction of errors, the processor may make unprocessed
data from the document (with intermingled character data and markup) available
to the application.
Once a fatal error is detected, however, the processor
must not continue normal processing (i.e., it must not continue to pass character
data and information about the document's logical structure to the application
in the normal way).</termdef></p>
</def></gitem>
<gitem><label>at user option</label>
<def>
<p><termdef id="dt-atuseroption" term="At user option">Conforming software
may or must (depending on the modal verb in the sentence) behave as described;
if it does, it must provide users a means to enable or disable the behavior
described.</termdef></p>
</def></gitem>
<gitem><label>validity constraint</label>
<def>
<p><termdef id="dt-vc" term="Validity constraint">A rule which applies to
all <termref def="dt-valid">valid</termref> XML documents. Violations of validity
constraints are errors; they must, at user option, be reported by <termref
def="dt-validating">validating XML processors</termref>.</termdef></p>
</def></gitem>
<gitem><label>well-formedness constraint</label>
<def>
<p><termdef id="dt-wfc" term="Well-formedness constraint">A rule which applies
to all <termref def="dt-wellformed">well-formed</termref> XML documents. Violations
of well-formedness constraints are <termref def="dt-fatal">fatal errors</termref>.</termdef></p>
</def></gitem>
<gitem><label>match</label>
<def>
<p><termdef id="dt-match" term="match">(Of strings or names:) Two strings
or names being compared must be identical. Characters with multiple possible
representations in ISO/IEC 10646 (e.g. characters with both precomposed and
base+diacritic forms) match only if they have the same representation in both
strings. <phrase diff="del"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E85">[E85]</loc>At
user option, processors may normalize such characters to some canonical form. </phrase>No
case folding is performed.
(Of strings and rules in the grammar:) A string
matches a grammatical production if it belongs to the language generated by
that production. (Of content and content models:) An element matches its declaration
when it conforms in the fashion described in the constraint <specref ref="elementvalid"/>.</termdef></p>
</def></gitem>
<gitem><label>for compatibility</label>
<def>
<p><termdef id="dt-compat" term="For Compatibility"><phrase diff="add"><loc
role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E87">[E87]</loc>Marks
a sentence describing</phrase> a feature of XML included solely to ensure
that XML remains compatible with SGML.</termdef></p>
</def></gitem>
<gitem><label>for interoperability</label>
<def>
<p><termdef id="dt-interop" term="For interoperability"><phrase diff="add"><loc
role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E87">[E87]</loc>Marks
a sentence describing</phrase> a non-binding recommendation included to increase
the chances that XML documents can be processed by the existing installed
base of SGML processors which predate the &WebSGML;.</termdef></p>
</def></gitem>
</glist></p>
</div2>
</div1>
<!-- &Docs; -->
<div1 id="sec-documents">
<head>Documents</head>
<p><termdef id="dt-xml-doc" term="XML Document"> A data object is an <term>XML
document</term> if it is <termref def="dt-wellformed">well-formed</termref>,
as defined in this specification. A well-formed XML document may in addition
be <termref def="dt-valid">valid</termref> if it meets certain further constraints.</termdef></p>
<p>Each XML document has both a logical and a physical structure. Physically,
the document is composed of units called <termref def="dt-entity">entities</termref>.
An entity may <termref def="dt-entref">refer</termref> to other entities to
cause their inclusion in the document.
A document begins in a <quote>root</quote>
or <termref def="dt-docent">document entity</termref>. Logically, the document
is composed of declarations, elements, comments, character references, and
processing instructions, all of which are indicated in the document by explicit
markup. The logical and physical structures must nest properly, as described
in <specref ref="wf-entities"/>.</p>
<div2 id="sec-well-formed">
<head>Well-Formed XML Documents</head>
<p><termdef id="dt-wellformed" term="Well-Formed"> A textual object is a <term>well-formed</term>
XML document if:</termdef></p>
<olist>
<item><p>Taken as a whole, it matches the production labeled <nt def="NT-document">document</nt>.</p>
</item>
<item><p>It meets all the well-formedness constraints given in this specification.</p>
</item>
<item><p>Each of the <termref def="dt-parsedent">parsed entities</termref>
which is referenced directly or indirectly within the document is <termref
def="dt-wellformed">well-formed</termref>.</p></item>
</olist>
<scrap id="document" lang="ebnf">
<head>Document</head>
<prod id="NT-document">
<lhs>document</lhs><rhs><nt def="NT-prolog">prolog</nt> <nt def="NT-element">element</nt> <nt
def="NT-Misc">Misc</nt>*</rhs>
</prod>
</scrap>
<p>Matching the <nt def="NT-document">document</nt> production implies that:</p>
<olist>
<item><p>It contains one or more <termref def="dt-element">elements</termref>.</p>
</item>
<!--* N.B. some readers (notably JC) find the following
paragraph awkward and redundant. I agree it's logically redundant:
it *says* it is summarizing the logical implications of
matching the grammar, and that means by definition it's
logically redundant. I don't think it's rhetorically
redundant or unnecessary, though, so I'm keeping it. It
could however use some recasting when the editors are feeling
stronger.
-MSM *-->
<item><p><termdef id="dt-root" term="Root Element">There is exactly one element,
called the <term>root</term>, or document element, no part of which appears
in the <termref def="dt-content">content</termref> of any other element.</termdef> <phrase
diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E17">[E17]</loc>For
all other elements, if the <termref def="dt-stag">start-tag</termref> is in
the content of another element, the <termref def="dt-etag">end-tag</termref>
is in the content of the same element.</phrase> More simply stated, the elements,
delimited by start- and end-tags, nest properly within each other.</p></item>
</olist>
<p><termdef id="dt-parentchild" term="Parent/Child">As a consequence of this,
for each non-root element <el>C</el> in the document, there is one other element <el>P</el>
in the document such that <el>C</el> is in the content of <el>P</el>, but
is not in the content of any other element that is in the content of <el>P</el>. <el>P</el>
is referred to as the <term>parent</term> of <el>C</el>, and <el>C</el> as
a <term>child</term> of <el>P</el>.</termdef></p>
</div2>
<div2 id="charsets">
<head>Characters</head>
<p><termdef id="dt-text" term="Text">A parsed entity contains <term>text</term>,
a sequence of <termref def="dt-character">characters</termref>, which may
represent markup or character data.</termdef> <termdef id="dt-character" term="Character">A <term>character</term>
is an atomic unit of text as specified by ISO/IEC 10646 <bibref ref="ISO10646"/> <phrase
diff="add"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E67">[E67]</loc>(see
also <bibref ref="ISO10646-2000"/>)</phrase>.
Legal characters are tab, carriage
return, line feed, and the legal <phrase diff="del"><loc role="erratumref"
href="http://www.w3.org/XML/xml-19980210-errata#E35">[E35]</loc>graphic </phrase>characters
of Unicode and ISO/IEC 10646. <phrase diff="add"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E69">[E69]</loc>The
versions of these standards cited in <specref ref="sec-existing-stds"/> were
current at the time this document was prepared. New characters may be added
to these standards by amendments or new editions. Consequently, XML processors
must accept any character in the range specified for <nt def="NT-Char">Char</nt>.</phrase>
The use of <quote>compatibility characters</quote>, as defined in section
6.8 of <bibref ref="Unicode"/> <phrase diff="add"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E67">[E67]</loc>(see
also D21 in section 3.6 of <bibref ref="Unicode3"/>)</phrase>, is discouraged.</termdef></p>
<scrap id="char32" lang="ebnf">
<head>Character Range</head>
<prodgroup pcw2="4" pcw4="17.5" pcw5="11">
<prod id="NT-Char">
<lhs>Char</lhs><rhs>#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]</rhs>
<com>any Unicode character, excluding the surrogate blocks, FFFE, and FFFF.</com>
</prod>
</prodgroup></scrap>
<p>The mechanism for encoding character code points into bit patterns may
vary from entity to entity. All XML processors must accept the UTF-8 and UTF-16
encodings of 10646; the mechanisms for signaling which of the two is in use,
or for bringing other encodings into play, are discussed later, in <specref
ref="charencoding"/>.</p>
<!--
<p>Regardless of the specific encoding used, any character in
the ISO/IEC 10646 character set may be referred to by the decimal
or hexadecimal equivalent of its UCS-4 code value.
</p>-->
</div2>
<div2 id="sec-common-syn">
<head>Common Syntactic Constructs</head>
<p>This section defines some symbols used widely in the grammar.</p>
<p><nt def="NT-S">S</nt> (white space) consists of one or more space (#x20)
characters, carriage returns, line feeds, or tabs.</p>
<scrap id="white" lang="ebnf">
<head>White Space</head>
<prodgroup pcw2="4" pcw4="17.5" pcw5="11">
<prod id="NT-S">
<lhs>S</lhs><rhs>(#x20 | #x9 | #xD | #xA)+</rhs>
</prod>
</prodgroup></scrap>
<p>Characters are classified for convenience as letters, digits, or other
characters. <phrase diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E30">[E30]</loc>A
letter consists of an alphabetic or syllabic base character or an ideographic
character.</phrase> Full definitions of the specific characters in each class
are given in <specref ref="CharClasses"/>.</p>
<p><termdef id="dt-name" term="Name">A <term>Name</term> is a token beginning
with a letter or one of a few punctuation characters, and continuing with
letters, digits, hyphens, underscores, colons, or full stops, together known
as name characters.</termdef> Names beginning with the string <quote><code>xml</code></quote>,
or any string which would match <code>(('X'|'x') ('M'|'m') ('L'|'l'))</code>,
are reserved for standardization in this or future versions of this specification.</p>
<note>
<p diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E98">[E98]</loc>The
Namespaces in XML Recommendation <bibref ref="xml-names"/> assigns a meaning
to names containing colon characters.
Therefore, authors should not use the
colon in XML names except for namespace purposes, but XML processors must
accept the colon as a name character.</p>
</note>
<p>An <nt def="NT-Nmtoken">Nmtoken</nt> (name token) is any mixture of name
characters.</p>
<scrap lang="ebnf">
<head>Names and Tokens</head>
<prod id="NT-NameChar">
<lhs>NameChar</lhs><rhs><nt def="NT-Letter">Letter</nt> | <nt def="NT-Digit">Digit</nt>
| '.' | '-' | '_' | ':' | <nt def="NT-CombiningChar">CombiningChar</nt> | <nt
def="NT-Extender">Extender</nt></rhs>
</prod>
<prod id="NT-Name">
<lhs>Name</lhs><rhs>(<nt def="NT-Letter">Letter</nt> | '_' | ':') (<nt def="NT-NameChar">NameChar</nt>)*</rhs>
</prod>
<prod id="NT-Names">
<lhs>Names</lhs><rhs><nt def="NT-Name">Name</nt> (<nt def="NT-S">S</nt> <nt
def="NT-Name">Name</nt>)*</rhs>
</prod>
<prod id="NT-Nmtoken">
<lhs>Nmtoken</lhs><rhs>(<nt def="NT-NameChar">NameChar</nt>)+</rhs>
</prod>
<prod id="NT-Nmtokens">
<lhs>Nmtokens</lhs><rhs><nt def="NT-Nmtoken">Nmtoken</nt> (<nt def="NT-S">S</nt> <nt
def="NT-Nmtoken">Nmtoken</nt>)*</rhs>
</prod>
</scrap>
<p>Literal data is any quoted string not containing the quotation mark used
as a delimiter for that string. Literals are used for specifying the content
of internal entities (<nt def="NT-EntityValue">EntityValue</nt>), the values
of attributes (<nt def="NT-AttValue">AttValue</nt>), and external identifiers
(<nt def="NT-SystemLiteral">SystemLiteral</nt>).
Note that a <nt def="NT-SystemLiteral">SystemLiteral</nt>
can be parsed without scanning for markup.</p>
<scrap lang="ebnf">
<head>Literals</head>
<prod id="NT-EntityValue">
<lhs>EntityValue</lhs><rhs>'"' ([^%&"] | <nt def="NT-PEReference">PEReference</nt>
| <nt def="NT-Reference">Reference</nt>)* '"' </rhs>
<rhs>| "'" ([^%&'] | <nt def="NT-PEReference">PEReference</nt> | <nt
def="NT-Reference">Reference</nt>)* "'"</rhs>
</prod>
<prod id="NT-AttValue">
<lhs>AttValue</lhs><rhs>'"' ([^<&"] | <nt def="NT-Reference">Reference</nt>)*
'"' </rhs>
<rhs>| "'" ([^<&'] | <nt def="NT-Reference">Reference</nt>)*
"'"</rhs>
</prod>
<prod id="NT-SystemLiteral">
<lhs>SystemLiteral</lhs><rhs>('"' [^"]* '"') | ("'" [^']* "'") </rhs>
</prod>
<prod id="NT-PubidLiteral">
<lhs>PubidLiteral</lhs><rhs>'"' <nt def="NT-PubidChar">PubidChar</nt>* '"'
| "'" (<nt def="NT-PubidChar">PubidChar</nt> - "'")* "'"</rhs>
</prod>
<prod id="NT-PubidChar">
<lhs>PubidChar</lhs><rhs>#x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]</rhs>
</prod>
</scrap>
<note diff="add">
<p><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E72">[E72]</loc>Although
the <nt def="NT-EntityValue">EntityValue</nt> production allows the definition
of an entity consisting of a single explicit <code><</code> in the literal
(e.g., <code><!ENTITY mylt "<"></code>), it is strongly advised to avoid
this practice since any reference to that entity will cause a well-formedness
error.</p>
</note>
</div2>
<div2 id="syntax">
<head>Character Data and Markup</head>
<p><termref def="dt-text">Text</termref> consists of intermingled <termref
def="dt-chardata">character data</termref> and markup.
<termdef id="dt-markup"
term="Markup"><term>Markup</term> takes the form of <termref def="dt-stag">start-tags</termref>, <termref
def="dt-etag">end-tags</termref>, <termref def="dt-empty">empty-element tags</termref>, <termref
def="dt-entref">entity references</termref>, <termref def="dt-charref">character
references</termref>, <termref def="dt-comment">comments</termref>, <termref
def="dt-cdsection">CDATA section</termref> delimiters, <termref def="dt-doctype">document
type declarations</termref>, <termref def="dt-pi">processing instructions</termref>, <phrase
diff="add"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E89">[E89]</loc><nt
def="NT-XMLDecl">XML declarations</nt>, <nt def="NT-TextDecl">text declarations</nt>,
and any white space that is at the top level of the document entity (that
is, outside the document element and not inside any other markup).</phrase></termdef></p>
<p><termdef id="dt-chardata" term="Character Data">All text that is not markup
constitutes the <term>character data</term> of the document.</termdef></p>
<p>The ampersand character (&) and the left angle bracket (<) may appear
in their literal form <emph>only</emph> when used as markup delimiters, or
within a <termref def="dt-comment">comment</termref>, a <termref def="dt-pi">processing
instruction</termref>, or a <termref def="dt-cdsection">CDATA section</termref>.<phrase
diff="del"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E18">[E18]</loc>They
are also legal within the <termref def="dt-litentval">literal entity value</termref>
of an internal entity declaration; see <specref ref="wf-entities"/>.</phrase> <!-- FINAL EDIT: restore internal entity decl or leave it out.
-->
If they are needed elsewhere, they must be <termref def="dt-escape">escaped</termref>
using either <termref def="dt-charref">numeric character references</termref>
or the strings <quote><code>&amp;</code></quote> and <quote><code>&lt;</code></quote>
respectively. The right angle bracket (>) may be represented using the string <quote><code>&gt;</code></quote>,
and must, <termref def="dt-compat">for compatibility</termref>, be escaped
using <quote><code>&gt;</code></quote> or a character reference when it
appears in the string <quote><code>]]></code></quote> in content, when
that string is not marking the end of a <termref def="dt-cdsection">CDATA
section</termref>.</p>
<p>In the content of elements, character data is any string of characters
which does not contain the start-delimiter of any markup. In a CDATA section,
character data is any string of characters not including the CDATA-section-close
delimiter, <quote><code>]]></code></quote>.</p>
<p>To allow attribute values to contain both single and double quotes, the
apostrophe or single-quote character (') may be represented as <quote><code>&apos;</code></quote>,
and the double-quote character (") as <quote><code>&quot;</code></quote>.</p>
<scrap lang="ebnf">
<head>Character Data</head>
<prod id="NT-CharData">
<lhs>CharData</lhs><rhs>[^<&]* - ([^<&]* ']]>' [^<&]*)</rhs>
</prod>
</scrap>
</div2>
<div2 id="sec-comments">
<head>Comments</head>
<p><termdef id="dt-comment" term="Comment"><term>Comments</term> may appear
anywhere in a document outside other <termref def="dt-markup">markup</termref>;
in addition, they may appear within the document type declaration at places
allowed by the grammar. They are not part of the document's <termref def="dt-chardata">character
data</termref>; an XML processor may, but need not, make it possible for an
application to retrieve the text of comments.
<termref def="dt-compat">For
compatibility</termref>, the string <quote><code>--</code></quote> (double-hyphen)
must not occur within comments.</termdef> <phrase diff="add"><loc role="erratumref"
href="http://www.w3.org/XML/xml-19980210-errata#E63">[E63]</loc>Parameter
entity references are not recognized within comments.</phrase></p>
<scrap lang="ebnf">
<head>Comments</head>
<prod id="NT-Comment">
<lhs>Comment</lhs><rhs>'<!--' ((<nt def="NT-Char">Char</nt> - '-') | ('-'
(<nt def="NT-Char">Char</nt> - '-')))* '-->'</rhs>
</prod>
</scrap>
<p>An example of a comment:</p>
<eg><!&como; declarations for <head> & <body> &comc;></eg>
<p diff="add"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E27">[E27]</loc>Note
that the grammar does not allow a comment ending in <code>---></code>. The
following example is <emph>not</emph> well-formed.</p>
<eg diff="add"><!-- B+, B, or B---></eg>
</div2>
<div2 id="sec-pi">
<head>Processing Instructions</head>
<p><termdef id="dt-pi" term="Processing instruction"><term>Processing instructions</term>
(PIs) allow documents to contain instructions for applications.</termdef></p>
<scrap lang="ebnf">
<head>Processing Instructions</head>
<prod id="NT-PI">
<lhs>PI</lhs><rhs>'<?' <nt def="NT-PITarget">PITarget</nt> (<nt def="NT-S">S</nt>
(<nt def="NT-Char">Char</nt>* - (<nt def="NT-Char">Char</nt>* &pic; <nt def="NT-Char">Char</nt>*)))? &pic;</rhs>
</prod>
<prod id="NT-PITarget">
<lhs>PITarget</lhs><rhs><nt def="NT-Name">Name</nt> - (('X' | 'x') ('M' |
'm') ('L' | 'l'))</rhs>
</prod>
</scrap>
<p>PIs are not part of the document's <termref def="dt-chardata">character
data</termref>, but must be passed through to the application. The PI begins
with a target (<nt def="NT-PITarget">PITarget</nt>) used to identify the application
to which the instruction is directed.
The target names <quote><code>XML</code></quote>, <quote><code>xml</code></quote>,
and so on are reserved for standardization in this or future versions of this
specification. The XML <termref def="dt-notation">Notation</termref> mechanism
may be used for formal declaration of PI targets. <phrase diff="add"><loc
role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E63">[E63]</loc>Parameter
entity references are not recognized within processing instructions.</phrase></p>
</div2>
<div2 id="sec-cdata-sect">
<head>CDATA Sections</head>
<p><termdef id="dt-cdsection" term="CDATA Section"><term>CDATA sections</term>
may occur anywhere character data may occur; they are used to escape blocks
of text containing characters which would otherwise be recognized as markup.
CDATA sections begin with the string <quote><code><![CDATA[</code></quote>
and end with the string <quote><code>]]></code></quote>:</termdef></p>
<scrap lang="ebnf">
<head>CDATA Sections</head>
<prod id="NT-CDSect">
<lhs>CDSect</lhs><rhs><nt def="NT-CDStart">CDStart</nt> <nt def="NT-CData">CData</nt> <nt
def="NT-CDEnd">CDEnd</nt></rhs>
</prod>
<prod id="NT-CDStart">
<lhs>CDStart</lhs><rhs>'<![CDATA['</rhs>
</prod>
<prod id="NT-CData">
<lhs>CData</lhs><rhs>(<nt def="NT-Char">Char</nt>* - (<nt def="NT-Char">Char</nt>*
']]>' <nt def="NT-Char">Char</nt>*)) </rhs>
</prod>
<prod id="NT-CDEnd">
<lhs>CDEnd</lhs><rhs>']]>'</rhs>
</prod>
</scrap>
<p>Within a CDATA section, only the <nt def="NT-CDEnd">CDEnd</nt> string is
recognized as markup, so that left angle brackets and ampersands may occur
in their literal form; they need not (and cannot) be escaped using <quote><code>&lt;</code></quote>
and <quote><code>&amp;</code></quote>.
CDATA sections cannot nest.</p>
<p>An example of a CDATA section, in which <quote><code><greeting></code></quote>
and <quote><code></greeting></code></quote> are recognized as <termref
def="dt-chardata">character data</termref>, not <termref def="dt-markup">markup</termref>:</p>
<eg><![CDATA[<greeting>Hello, world!</greeting>]]> </eg>
</div2>
<div2 id="sec-prolog-dtd">
<head>Prolog and Document Type Declaration</head>
<p><termdef id="dt-xmldecl" term="XML Declaration">XML documents <phrase diff="chg"><loc
role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E107">[E107]</loc>should</phrase>
begin with an <term>XML declaration</term> which specifies the version of
XML being used.</termdef> For example, the following is a complete XML document, <termref
def="dt-wellformed">well-formed</termref> but not <termref def="dt-valid">valid</termref>:</p>
<eg><![CDATA[<?xml version="1.0"?>
<greeting>Hello, world!</greeting> ]]></eg>
<p>and so is this:</p>
<eg><![CDATA[<greeting>Hello, world!</greeting>]]></eg>
<p>The version number <quote><code>1.0</code></quote> should be used to indicate
conformance to this version of this specification; it is an error for a document
to use the value <quote><code>1.0</code></quote> if it does not conform to
this version of this specification. It is the intent of the XML working group
to give later versions of this specification numbers other than <quote><code>1.0</code></quote>,
but this intent does not indicate a commitment to produce any future versions
of XML, nor if any are produced, to use any particular numbering scheme. Since
future versions are not ruled out, this construct is provided as a means to
allow the possibility of automatic version recognition, should it become necessary.
Processors may signal an error if they receive documents labeled with versions
they do not support.</p>
<p>The function of the markup in an XML document is to describe its storage
and logical structure and to associate attribute-value pairs with its logical
structures. XML provides a mechanism, the <termref def="dt-doctype">document
type declaration</termref>, to define constraints on the logical structure
and to support the use of predefined storage units. <termdef id="dt-valid"
term="Validity">An XML document is <term>valid</term> if it has an associated
document type declaration and if the document complies with the constraints
expressed in it.</termdef></p>
<p>The document type declaration must appear before the first <termref def="dt-element">element</termref>
in the document.</p>
<scrap id="xmldoc" lang="ebnf">
<head>Prolog</head>
<prodgroup pcw2="6" pcw4="17.5" pcw5="9">
<prod id="NT-prolog">
<lhs>prolog</lhs><rhs><nt def="NT-XMLDecl">XMLDecl</nt>? <nt def="NT-Misc">Misc</nt>*
(<nt def="NT-doctypedecl">doctypedecl</nt> <nt def="NT-Misc">Misc</nt>*)?</rhs>
</prod>
<prod id="NT-XMLDecl">
<lhs>XMLDecl</lhs><rhs>&pio; <nt def="NT-VersionInfo">VersionInfo</nt> <nt
def="NT-EncodingDecl">EncodingDecl</nt>? <nt def="NT-SDDecl">SDDecl</nt>? <nt
def="NT-S">S</nt>? &pic;</rhs>
</prod>
<prod id="NT-VersionInfo" diff="chg">
<lhs>VersionInfo</lhs><rhs><nt def="NT-S">S</nt> 'version' <nt def="NT-Eq">Eq</nt>
("'" <nt def="NT-VersionNum">VersionNum</nt> "'" | '"' <nt def="NT-VersionNum">VersionNum</nt>
'"')<com><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E15">[E15]</loc></com></rhs>
</prod>
<prod id="NT-Eq">
<lhs>Eq</lhs><rhs><nt def="NT-S">S</nt>?
'=' <nt def="NT-S">S</nt>?</rhs>
</prod>
<prod id="NT-VersionNum">
<lhs>VersionNum</lhs><rhs>([a-zA-Z0-9_.:] | '-')+</rhs>
</prod>
<prod id="NT-Misc">
<lhs>Misc</lhs><rhs><nt def="NT-Comment">Comment</nt> | <nt def="NT-PI">PI</nt>
| <nt def="NT-S">S</nt></rhs>
</prod>
</prodgroup></scrap>
<p><termdef id="dt-doctype" term="Document Type Declaration">The XML <term>document
type declaration</term> contains or points to <termref def="dt-markupdecl">markup
declarations</termref> that provide a grammar for a class of documents. This
grammar is known as a document type definition, or <term>DTD</term>. The document
type declaration can point to an external subset (a special kind of <termref
def="dt-extent">external entity</termref>) containing markup declarations,
or can contain the markup declarations directly in an internal subset, or
can do both. The DTD for a document consists of both subsets taken together.</termdef></p>
<p><termdef id="dt-markupdecl" term="markup declaration"> A <term>markup declaration</term>
is an <termref def="dt-eldecl">element type declaration</termref>, an <termref
def="dt-attdecl">attribute-list declaration</termref>, an <termref def="dt-entdecl">entity
declaration</termref>, or a <termref def="dt-notdecl">notation declaration</termref>.</termdef>
These declarations may be contained in whole or in part within <termref def="dt-PE">parameter
entities</termref>, as described in the well-formedness and validity constraints
below.
For <phrase diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E14">[E14]</loc>further</phrase>
information, see <specref ref="sec-physical-struct"/>.</p>
<scrap id="dtd" lang="ebnf">
<head>Document Type Definition</head>
<prodgroup pcw2="6" pcw4="17.5" pcw5="9">
<prod id="NT-doctypedecl" diff="chg">
<lhs>doctypedecl</lhs><rhs>'<!DOCTYPE' <nt def="NT-S">S</nt> <nt def="NT-Name">Name</nt>
(<nt def="NT-S">S</nt> <nt def="NT-ExternalID">ExternalID</nt>)? <nt def="NT-S">S</nt>?
('[' (<nt def="NT-markupdecl">markupdecl</nt> | <nt diff="chg" def="NT-DeclSep">DeclSep</nt>)*
']' <nt def="NT-S">S</nt>?)? '>'</rhs><vc def="vc-roottype"/><wfc def="ExtSubset"
diff="add"/><com><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E109">[E109]</loc></com>
</prod>
<prod id="NT-DeclSep" diff="add">
<lhs>DeclSep</lhs><rhs><nt def="NT-PEReference">PEReference</nt> | <nt def="NT-S">S</nt></rhs>
<wfc def="PE-between-Decls" diff="add"/><com><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E109">[E109]</loc></com>
</prod>
<prod id="NT-markupdecl">
<lhs>markupdecl</lhs><rhs><nt def="NT-elementdecl">elementdecl</nt> | <nt
def="NT-AttlistDecl">AttlistDecl</nt> | <nt def="NT-EntityDecl">EntityDecl</nt>
| <nt def="NT-NotationDecl">NotationDecl</nt> | <nt def="NT-PI">PI</nt> | <nt
def="NT-Comment">Comment</nt> </rhs><vc def="vc-PEinMarkupDecl"/><wfc def="wfc-PEinInternalSubset"/>
</prod>
</prodgroup></scrap>
<p diff="add"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E82">[E82]</loc>Note
that it is possible to construct a well-formed document containing a <nt def="NT-doctypedecl">doctypedecl</nt>
that neither points to an external subset nor contains an internal subset.</p>
<p>The markup declarations may be made up in whole or in part of the <termref
def="dt-repltext">replacement text</termref> of <termref
def="dt-PE">parameter
entities</termref>. The productions later in this specification for individual
nonterminals (<nt def="NT-elementdecl">elementdecl</nt>, <nt def="NT-AttlistDecl">AttlistDecl</nt>,
and so on) describe the declarations <emph>after</emph> all the parameter
entities have been <termref def="dt-include">included</termref>.</p>
<p diff="add"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E75">[E75]</loc>Parameter
entity references are recognized anywhere in the DTD (internal and external
subsets and external parameter entities), except in literals, processing instructions,
comments, and the contents of ignored conditional sections (see <specref ref="sec-condition-sect"/>).
They are also recognized in entity value literals. The use of parameter entities
in the internal subset is restricted as described below.</p>
<vcnote id="vc-roottype"><head>Root Element Type</head><p>The <nt def="NT-Name">Name</nt>
in the document type declaration must match the element type of the <termref
def="dt-root">root element</termref>.</p>
</vcnote>
<vcnote id="vc-PEinMarkupDecl"><head>Proper Declaration/PE Nesting</head>
<p>Parameter-entity <termref def="dt-repltext">replacement text</termref>
must be properly nested with markup declarations. That is to say, if either
the first character or the last character of a markup declaration (<nt def="NT-markupdecl">markupdecl</nt>
above) is contained in the replacement text for a <termref def="dt-PERef">parameter-entity
reference</termref>, both must be contained in the same replacement text.</p>
</vcnote>
<wfcnote id="wfc-PEinInternalSubset"><head>PEs in Internal Subset</head><p>In
the internal DTD subset, <termref def="dt-PERef">parameter-entity references</termref>
can occur only where markup declarations can occur, not within markup declarations.
(This does not apply to references that occur in external parameter entities
or to the external subset.)</p>
</wfcnote>
<wfcnote id="ExtSubset" diff="add"><head><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E109">[E109]</loc>External
Subset</head><p>The external subset, if any, must match the production for <nt
def="NT-extSubset">extSubset</nt>.</p>
</wfcnote>
<wfcnote id="PE-between-Decls" diff="add"><head><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E109">[E109]</loc>PE
Between Declarations</head><p>The replacement text of a parameter entity reference
in a <nt def="NT-DeclSep">DeclSep</nt> must match the production <nt def="NT-extSubsetDecl">extSubsetDecl</nt>.</p>
</wfcnote>
<p>Like the internal subset, the external subset and any external parameter
entities <phrase diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E109">[E109]</loc>referenced
in a <nt def="NT-DeclSep">DeclSep</nt></phrase> must consist of a series of
complete markup declarations of the types allowed by the non-terminal symbol <nt
def="NT-markupdecl">markupdecl</nt>, interspersed with white space or <termref
def="dt-PERef">parameter-entity references</termref>. However, portions of
the contents of the external subset or of <phrase diff="add"><loc role="erratumref"
href="http://www.w3.org/XML/xml-19980210-errata#E109">[E109]</loc>these </phrase>
external parameter entities may conditionally be ignored by using the <termref
def="dt-cond-section">conditional section</termref> construct; this is not
allowed in the internal subset.</p>
<scrap id="ext-Subset">
<head>External Subset</head>
<prodgroup pcw2="6" pcw4="17.5" pcw5="9">
<prod id="NT-extSubset">
<lhs>extSubset</lhs><rhs><nt def="NT-TextDecl">TextDecl</nt>?
<nt def="NT-extSubsetDecl">extSubsetDecl</nt></rhs>
</prod>
<prod id="NT-extSubsetDecl" diff="chg">
<lhs>extSubsetDecl</lhs><rhs>( <nt def="NT-markupdecl">markupdecl</nt> | <nt
def="NT-conditionalSect">conditionalSect</nt> | <nt diff="chg" def="NT-DeclSep">DeclSep</nt>)*</rhs>
<com><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E109">[E109]</loc></com>
</prod>
</prodgroup></scrap>
<p>The external subset and external parameter entities also differ from the
internal subset in that in them, <termref def="dt-PERef">parameter-entity
references</termref> are permitted <emph>within</emph> markup declarations,
not only <emph>between</emph> markup declarations.</p>
<p>An example of an XML document with a document type declaration:</p>
<eg><![CDATA[<?xml version="1.0"?>
<!DOCTYPE greeting SYSTEM "hello.dtd">
<greeting>Hello, world!</greeting> ]]></eg>
<p>The <termref def="dt-sysid">system identifier</termref> <quote><code>hello.dtd</code></quote>
gives the <phrase diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E78">[E78]</loc>address
(a URI reference)</phrase> of a DTD for the document.</p>
<p>The declarations can also be given locally, as in this example:</p>
<eg><![CDATA[<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE greeting [
  <!ELEMENT greeting (#PCDATA)>
]>
<greeting>Hello, world!</greeting>]]></eg>
<p>If both the external and internal subsets are used, the internal subset
is considered to occur before the external subset. <!-- 'is considered to'? boo. whazzat mean?
-->
This has the effect that entity and attribute-list declarations in the internal
subset take precedence over those in the external subset.</p>
</div2>
<div2 id="sec-rmd">
<head>Standalone Document Declaration</head>
<p>Markup declarations can affect the content of the document, as passed from
an <termref def="dt-xml-proc">XML processor</termref> to an application; examples
are attribute defaults and entity declarations. The standalone document declaration,
which may appear as a component of the XML declaration, signals whether or
not there are such declarations which appear external to the <termref def="dt-docent">document
entity</termref><phrase diff="add"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E64">[E64]</loc>
or in parameter entities. <termdef id="dt-extmkpdecl" term="External Markup Declaration">An <term>external
markup declaration</term> is defined as a markup declaration occurring in
the external subset or in a parameter entity (external or internal, the latter
being included because non-validating processors are not required to read
them).</termdef></phrase></p>
<scrap id="fulldtd" lang="ebnf">
<head>Standalone Document Declaration</head>
<prodgroup pcw2="4" pcw4="19.5" pcw5="9">
<prod id="NT-SDDecl">
<lhs>SDDecl</lhs><rhs> <nt def="NT-S">S</nt> 'standalone' <nt def="NT-Eq">Eq</nt>
(("'" ('yes' | 'no') "'") | ('"' ('yes' | 'no') '"')) </rhs><vc def="vc-check-rmd"/>
</prod>
</prodgroup></scrap>
<p>In a standalone document declaration, the value <attval>yes</attval> indicates
that there are no <phrase diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E64">[E64]</loc><termref
def="dt-extmkpdecl">external markup declarations</termref></phrase> which
affect the information passed from the XML processor to the application.
The
value <attval>no</attval> indicates that there are or may be such external
markup declarations. Note that the standalone document declaration only denotes
the presence of external <emph>declarations</emph>; the presence, in a document,
of references to external <emph>entities</emph>, when those entities are internally
declared, does not change its standalone status.</p>
<p>If there are no external markup declarations, the standalone document declaration
has no meaning. If there are external markup declarations but there is no
standalone document declaration, the value <attval>no</attval> is assumed.</p>
<p>Any XML document for which <code>standalone="no"</code> holds can be converted
algorithmically to a standalone document, which may be desirable for some
network delivery applications.</p>
<vcnote id="vc-check-rmd"><head>Standalone Document Declaration</head><p>The
standalone document declaration must have the value <attval>no</attval> if
any external markup declarations contain declarations of:</p>
<ulist>
<item><p>attributes with <termref def="dt-default">default</termref> values,
if elements to which these attributes apply appear in the document without
specifications of values for these attributes, or</p></item>
<item><p>entities (other than &magicents;), if <termref def="dt-entref">references</termref>
to those entities appear in the document, or</p></item>
<item><p>attributes with values subject to <titleref href="#AVNormalize">normalization</titleref>,
where the attribute appears in the document with a value which will change
as a result of normalization, or</p></item>
<item><p>element types with <termref def="dt-elemcontent">element content</termref>,
if white space occurs directly within any instance of those types.</p></item>
</ulist>
</vcnote>
<p>An example XML declaration with a standalone document declaration:</p>
<eg><?xml version="&versionOfXML;"
standalone='yes'?></eg>
</div2>
<div2 id="sec-white-space">
<head>White Space Handling</head>
<p>In editing XML documents, it is often convenient to use <quote>white space</quote>
(spaces, tabs, and blank lines<phrase diff="del"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E39">[E39]</loc>,
denoted by the nonterminal <nt def="NT-S">S</nt> in this specification</phrase>)
to set apart the markup for greater readability. Such white space is typically
not intended for inclusion in the delivered version of the document. On the
other hand, <quote>significant</quote> white space that should be preserved
in the delivered version is common, for example in poetry and source code.</p>
<p>An <termref def="dt-xml-proc">XML processor</termref> must always pass
all characters in a document that are not markup through to the application.
A <termref def="dt-validating"> validating XML processor</termref> must also
inform the application which of these characters constitute white space appearing
in <termref def="dt-elemcontent">element content</termref>.</p>
<p>A special <termref def="dt-attr">attribute</termref> named <att>xml:space</att>
may be attached to an element to signal an intention that in that element,
white space should be preserved by applications. In valid documents, this
attribute, like any other, must be <termref def="dt-attdecl">declared</termref>
if it is used. When declared, it must be given as an <termref def="dt-enumerated">enumerated
type</termref> whose <phrase diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E81">[E81]</loc>values
are one or both of</phrase> <attval>default</attval> and <attval>preserve</attval>.
For example:</p>
<eg diff="chg"><![CDATA[<!ATTLIST poem xml:space (default|preserve) 'preserve'>]]>

<!-- <loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E81">[E81]</loc>-->
<!ATTLIST pre xml:space (preserve) #FIXED 'preserve'></eg>
<p>The value <attval>default</attval> signals that applications' default white-space
processing modes are acceptable for this element; the value <attval>preserve</attval>
indicates the intent that applications preserve all the white space. This
declared intent is considered to apply to all elements within the content
of the element where it is specified, unless overridden with another instance
of the <att>xml:space</att> attribute.</p>
<p>The <termref def="dt-root">root element</termref> of any document is considered
to have signaled no intentions as regards application space handling, unless
it provides a value for this attribute or the attribute is declared with a
default value.</p>
</div2>
<div2 id="sec-line-ends">
<head>End-of-Line Handling</head>
<p>XML <termref def="dt-parsedent">parsed entities</termref> are often stored
in computer files which, for editing convenience, are organized into lines.
These lines are typically separated by some combination of the characters
carriage-return (#xD) and line-feed (#xA).</p>
<p diff="del">To simplify the tasks of <termref def="dt-app">applications</termref>,
wherever an external parsed entity or the literal entity value of an internal
parsed entity contains either the literal two-character sequence <quote>#xD#xA</quote>
or a standalone literal #xD, an <termref def="dt-xml-proc">XML processor</termref>
must pass to the application the single character #xA.
(This behavior can
conveniently be produced by normalizing all line breaks to #xA on input, before
parsing.)</p>
<p diff="add"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E104">[E104]</loc>To
simplify the tasks of <termref def="dt-app">applications</termref>, the characters
passed to an application by the <termref def="dt-xml-proc">XML processor</termref>
must be as if the XML processor normalized all line breaks in external parsed
entities (including the document entity) on input, before parsing, by translating
both the two-character sequence #xD #xA and any #xD that is not followed by
#xA to a single #xA character.</p>
</div2>
<div2 id="sec-lang-tag">
<head>Language Identification</head>
<p>In document processing, it is often useful to identify the natural or formal
language in which the content is written. A special <termref def="dt-attr">attribute</termref>
named <att>xml:lang</att> may be inserted in documents to specify the language
used in the contents and attribute values of any element in an XML document.
In valid documents, this attribute, like any other, must be <termref def="dt-attdecl">declared</termref>
if it is used.
<phrase diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E73">[E73]</loc>The
values of the attribute are language identifiers as defined by <bibref ref="RFC1766"/>, <titleref>Tags
for the Identification of Languages</titleref>, or its successor on the IETF
Standards Track.</phrase></p>
<note diff="add">
<p><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E73">[E73]</loc><bibref
ref="RFC1766"/> tags are constructed from two-letter language codes as defined
by <bibref ref="ISO639"/>, from two-letter country codes as defined by <bibref
ref="ISO3166"/>, or from language identifiers registered with the Internet
Assigned Numbers Authority <phrase diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E58">[E58]</loc><bibref
diff="chg" ref="IANA-LANGCODES"/></phrase>. It is expected that the successor
to <bibref ref="RFC1766"/> will introduce three-letter language codes for
languages not presently covered by <bibref ref="ISO639"/>.</p>
</note>
<p diff="add"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E73">[E73]</loc>(Productions
33 through 38 have been removed.)</p>
<scrap diff="del" lang="ebnf">
<head>Language Identification</head>
<prod id="NT-LanguageID">
<lhs>LanguageID</lhs><rhs><nt def="NT-Langcode">Langcode</nt> ('-' <nt def="NT-Subcode">Subcode</nt>)*</rhs>
</prod>
<prod id="NT-Langcode">
<lhs>Langcode</lhs><rhs><nt def="NT-ISO639Code">ISO639Code</nt> | <nt def="NT-IanaCode">IanaCode</nt>
| <nt def="NT-UserCode">UserCode</nt></rhs>
</prod>
<prod id="NT-ISO639Code">
<lhs>ISO639Code</lhs><rhs>([a-z] | [A-Z]) ([a-z] | [A-Z])</rhs>
</prod>
<prod id="NT-IanaCode">
<lhs>IanaCode</lhs><rhs>('i' | 'I') '-' ([a-z] | [A-Z])+</rhs>
</prod>
<prod id="NT-UserCode">
<lhs>UserCode</lhs><rhs>('x' | 'X') '-' ([a-z] | [A-Z])+</rhs>
</prod>
<prod id="NT-Subcode">
<lhs>Subcode</lhs><rhs>([a-z] | [A-Z])+</rhs>
</prod>
</scrap>
<p diff="del">The <nt def="NT-Langcode">Langcode</nt> may be any of the following:</p>
<ulist diff="del">
<item><p>a two-letter language code as defined by <bibref ref="ISO639"/>, <titleref>Codes
for the representation of names of languages</titleref></p></item>
<item><p>a language identifier registered with the Internet Assigned Numbers
Authority <bibref diff="chg" ref="IANA-LANGCODES"/>; these begin with the
prefix <quote><code>i-</code></quote> (or <quote><code>I-</code></quote>)</p>
</item>
<item><p>a language identifier assigned by the user, or agreed on between
parties in private use; these must begin with the prefix <quote><code>x-</code></quote>
or <quote><code>X-</code></quote> in order to ensure that they do not conflict
with names later standardized or registered with IANA</p></item>
</ulist>
<p diff="del">There may be any number of <nt def="NT-Subcode">Subcode</nt>
segments; if the first subcode segment exists and the Subcode consists of
two letters, then it must be a country code from <bibref ref="ISO3166"/>,
"Codes for the representation of names of countries." If the first subcode
consists of more than two letters, it must be a subcode for the language in
question registered with IANA, unless the <nt def="NT-Langcode">Langcode</nt>
begins with the prefix "<code>x-</code>" or "<code>X-</code>". </p>
<p diff="del">It is customary to give the language code in lower case, and
the country code (if any) in upper case.
Note that these values, unlike other
names in XML documents, are case insensitive.</p>
<p>For example:</p>
<eg><![CDATA[<p xml:lang="en">The quick brown fox jumps over the lazy dog.</p>
<p xml:lang="en-GB">What colour is it?</p>
<p xml:lang="en-US">What color is it?</p>
<sp who="Faust" desc='leise' xml:lang="de">
 <l>Habe nun, ach! Philosophie,</l>
 <l>Juristerei, und Medizin</l>
 <l>und leider auch Theologie</l>
 <l>durchaus studiert mit heißem Bemüh'n.</l>
</sp>]]></eg>
<!--<p>The xml:lang value is considered to apply both to the contents of an
element and
(unless otherwise via attribute default values) to the
values of all of its attributes with free-text (CDATA) values. -->
<p>The intent declared with <att>xml:lang</att> is considered to apply to
all attributes and content of the element where it is specified, unless overridden
with an instance of <att>xml:lang</att> on another element within that content.</p>
<!--
If no
value is specified for xml:lang on an element, and no default value is
defined for it in the DTD, then the xml:lang attribute of any element
takes the same value it has in the parent element, if any. The two
technical terms in the following example both have the same effective
value for xml:lang:

 <p xml:lang="en">Here the keywords are
 <term xml:lang="en">shift</term> and
 <term>reduce</term>. ...</p>

The application, not the XML processor, is responsible for this '
inheritance' of attribute values.
-->
<p>A simple declaration for <att>xml:lang</att> might take the form</p>
<eg>xml:lang NMTOKEN #IMPLIED</eg>
<p>but specific default values may also be given, if appropriate.
In a collection
of French poems for English students, with glosses and notes in English, the <att>xml:lang</att>
attribute might be declared this way:</p>
<eg><![CDATA[<!ATTLIST poem xml:lang NMTOKEN 'fr'>
<!ATTLIST gloss xml:lang NMTOKEN 'en'>
<!ATTLIST note xml:lang NMTOKEN 'en'>]]></eg>
</div2>
</div1>
<!-- &Elements; -->
<div1 id="sec-logical-struct">
<head>Logical Structures</head>
<p><termdef id="dt-element" term="Element">Each <termref def="dt-xml-doc">XML
document</termref> contains one or more <term>elements</term>, the boundaries
of which are either delimited by <termref def="dt-stag">start-tags</termref>
and <termref def="dt-etag">end-tags</termref>, or, for <termref def="dt-empty">empty</termref>
elements, by an <termref def="dt-eetag">empty-element tag</termref>. Each
element has a type, identified by name, sometimes called its <quote>generic
identifier</quote> (GI), and may have a set of attribute specifications.</termdef>
Each attribute specification has a <termref def="dt-attrname">name</termref>
and a <termref def="dt-attrval">value</termref>.</p>
<scrap lang="ebnf">
<head>Element</head>
<prod id="NT-element">
<lhs>element</lhs><rhs><nt def="NT-EmptyElemTag">EmptyElemTag</nt></rhs>
<rhs>| <nt def="NT-STag">STag</nt> <nt def="NT-content">content</nt> <nt def="NT-ETag">ETag</nt></rhs>
<wfc def="GIMatch"/><vc def="elementvalid"/>
</prod>
</scrap>
<p>This specification does not constrain the semantics, use, or (beyond syntax)
names of the element types and attributes, except that names beginning with
a match to <code>(('X'|'x')('M'|'m')('L'|'l'))</code> are reserved for standardization
in this or future versions of this specification.</p>
<wfcnote id="GIMatch"><head>Element Type Match</head><p>The <nt def="NT-Name">Name</nt>
in an element's end-tag must match the element type in the start-tag.</p>
</wfcnote>
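<p>As an informal, non-normative illustration of the Element Type Match constraint
(the element names <code>a</code>, <code>b</code>, and <code>p</code> below are
hypothetical and chosen only for this sketch), the first fragment satisfies the
constraint, while the second and third do not: in each of them an end-tag carries
a name that differs from the element type given in the corresponding start-tag.
The third fragment assumes the usual case-sensitive comparison of names.</p>
<eg><![CDATA[<a><b>text</b></a>
<a><b>text</a></b>
<p>text</P>]]></eg>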
<vcnote id="elementvalid"><head>Element Valid</head><p>An element is valid
if there is a declaration matching <nt def="NT-elementdecl">elementdecl</nt>
where the <nt def="NT-Name">Name</nt> matches the element type, and one of
the following holds:</p>
<olist>
<item><p>The declaration matches <kw>EMPTY</kw> and the element has no <termref
def="dt-content">content</termref>.</p></item>
<item><p>The declaration matches <nt def="NT-children">children</nt> and the
sequence of <termref def="dt-parentchild">child elements</termref> belongs
to the language generated by the regular expression in the content model,
with optional white space (characters matching the nonterminal <nt def="NT-S">S</nt>)
between <phrase diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E59">[E59]</loc>the
start-tag and the first child element, between child elements, or between
the last child element and the end-tag. Note that a CDATA section containing
only white space does not match the nonterminal <nt def="NT-S">S</nt>, and
hence cannot appear in these positions.</phrase></p></item>
<item><p>The declaration matches <nt def="NT-Mixed">Mixed</nt> and the content
consists of <termref def="dt-chardata">character data</termref> and <termref
def="dt-parentchild">child elements</termref> whose types match names in the
content model.</p></item>
<item><p>The declaration matches <kw>ANY</kw>, and the types of any <termref
def="dt-parentchild">child elements</termref> have been declared.</p></item>
</olist>
</vcnote>
<div2 id="sec-starttags">
<head>Start-Tags, End-Tags, and Empty-Element Tags</head>
<p><termdef id="dt-stag" term="Start-Tag">The beginning of every non-empty
XML element is marked by a <term>start-tag</term>.</termdef></p>
<scrap lang="ebnf">
<head>Start-tag</head>
<prodgroup pcw2="6" pcw4="15" pcw5="11.5">
<prod id="NT-STag">
<lhs>STag</lhs><rhs>'<' <nt def="NT-Name">Name</nt> (<nt def="NT-S">S</nt> <nt
def="NT-Attribute">Attribute</nt>)* <nt def="NT-S">S</nt>? '>'</rhs><wfc def="uniqattspec"/>
</prod>
<prod id="NT-Attribute">
<lhs>Attribute</lhs><rhs><nt def="NT-Name">Name</nt> <nt def="NT-Eq">Eq</nt> <nt
def="NT-AttValue">AttValue</nt></rhs><vc def="ValueType"/><wfc def="NoExternalRefs"/>
<wfc def="CleanAttrVals"/>
</prod>
</prodgroup></scrap>
<p>The <nt def="NT-Name">Name</nt> in the start- and end-tags gives the element's <term>type</term>. <termdef
id="dt-attr" term="Attribute"> The <nt def="NT-Name">Name</nt>-<nt def="NT-AttValue">AttValue</nt>
pairs are referred to as the <term>attribute specifications</term> of the
element</termdef>, <termdef id="dt-attrname" term="Attribute Name">with the <nt
def="NT-Name">Name</nt> in each pair referred to as the <term>attribute name</term></termdef>
and <termdef id="dt-attrval" term="Attribute Value">the content of the <nt
def="NT-AttValue">AttValue</nt> (the text between the <code>'</code> or <code>"</code>
delimiters) as the <term>attribute value</term>.</termdef><phrase diff="add"><loc
role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E46">[E46]</loc>Note
that the order of attribute specifications in a start-tag or empty-element
tag is not significant.</phrase></p>
<wfcnote id="uniqattspec"><head>Unique Att Spec</head><p>No attribute name
may appear more than once in the same start-tag or empty-element tag.</p>
</wfcnote>
<vcnote id="ValueType"><head>Attribute Value Type</head><p>The attribute must
have been declared; the value must be of the type declared for it.
(For attribute
types, see <specref ref="attdecls"/>.)</p>
</vcnote>
<wfcnote id="NoExternalRefs"><head>No External Entity References</head><p>Attribute
values cannot contain direct or indirect entity references to external entities.</p>
</wfcnote>
<wfcnote id="CleanAttrVals"><head>No <code><</code> in Attribute Values</head>
<p>The <termref def="dt-repltext">replacement text</termref> of any entity
referred to directly or indirectly in an attribute value <phrase diff="del"><loc
role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E83">[E83]</loc>(other
than <quote><code>&lt;</code></quote>) </phrase>must not contain a <code><</code>.</p>
</wfcnote>
<p>An example of a start-tag:</p>
<eg><termdef id="dt-dog" term="dog"></eg>
<p><termdef id="dt-etag" term="End Tag">The end of every element that begins
with a start-tag must be marked by an <term>end-tag</term> containing a name
that echoes the element's type as given in the start-tag:</termdef></p>
<scrap lang="ebnf">
<head>End-tag</head>
<prodgroup pcw2="6" pcw4="15" pcw5="11.5">
<prod id="NT-ETag">
<lhs>ETag</lhs><rhs>'</' <nt def="NT-Name">Name</nt> <nt def="NT-S">S</nt>?
'>'</rhs>
</prod>
</prodgroup></scrap>
<p>An example of an end-tag:</p>
<eg></termdef></eg>
<p><termdef id="dt-content" term="Content">The <termref def="dt-text">text</termref>
between the start-tag and end-tag is called the element's <term>content</term>:</termdef></p>
<scrap lang="ebnf">
<head>Content of Elements</head>
<prodgroup pcw2="6" pcw4="15" pcw5="11.5">
<prod id="NT-content" diff="chg">
<lhs>content</lhs><rhs><nt def="NT-CharData">CharData</nt>?
((<nt def="NT-element">element</nt>
| <nt def="NT-Reference">Reference</nt> | <nt def="NT-CDSect">CDSect</nt>
| <nt def="NT-PI">PI</nt> | <nt def="NT-Comment">Comment</nt>) <nt def="NT-CharData">CharData</nt>?)*</rhs>
<com><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E71">[E71]</loc></com>
</prod>
</prodgroup></scrap>
<p><phrase diff="chg"><termdef id="dt-empty" term="Empty"><loc role="erratumref"
href="http://www.w3.org/XML/xml-19980210-errata#E97">[E97]</loc>An element
with no content is said to be <term>empty</term>.</termdef> The representation
of an empty element is either a start-tag immediately followed by an end-tag,
or an empty-element tag.</phrase> <termdef id="dt-eetag" term="empty-element tag">An <term>empty-element
tag</term> takes a special form:</termdef></p>
<scrap lang="ebnf">
<head>Tags for Empty Elements</head>
<prodgroup pcw2="6" pcw4="15" pcw5="11.5">
<prod id="NT-EmptyElemTag">
<lhs>EmptyElemTag</lhs><rhs>'<' <nt def="NT-Name">Name</nt> (<nt def="NT-S">S</nt> <nt
def="NT-Attribute">Attribute</nt>)* <nt def="NT-S">S</nt>? '/>'</rhs><wfc
def="uniqattspec"/>
</prod>
</prodgroup></scrap>
<p>Empty-element tags may be used for any element which has no content, whether
or not it is declared using the keyword <kw>EMPTY</kw>.
<termref def="dt-interop">For
interoperability</termref>, the empty-element tag <phrase diff="chg"><loc
role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E45">[E45]</loc>should
be used, and should only be used,</phrase> for elements which are declared
EMPTY.</p>
<p>Examples of empty elements:</p>
<eg><IMG align="left"
 src="http://www.w3.org/Icons/WWW/w3c_home" />
<br></br>
<br/></eg>
</div2>
<div2 id="elemdecls">
<head>Element Type Declarations</head>
<p>The <termref def="dt-element">element</termref> structure of an <termref
def="dt-xml-doc">XML document</termref> may, for <termref def="dt-valid">validation</termref>
purposes, be constrained using element type and attribute-list declarations.
An element type declaration constrains the element's <termref def="dt-content">content</termref>.</p>
<p>Element type declarations often constrain which element types can appear
as <termref def="dt-parentchild">children</termref> of the element. At user
option, an XML processor may issue a warning when a declaration mentions an
element type for which no declaration is provided, but this is not an error.</p>
<p><termdef id="dt-eldecl" term="Element Type declaration">An <term>element
type declaration</term> takes the form:</termdef></p>
<scrap lang="ebnf">
<head>Element Type Declaration</head>
<prodgroup pcw2="5.5" pcw4="18" pcw5="9">
<prod id="NT-elementdecl">
<lhs>elementdecl</lhs><rhs>'<!ELEMENT' <nt def="NT-S">S</nt> <nt def="NT-Name">Name</nt> <nt
def="NT-S">S</nt> <nt def="NT-contentspec">contentspec</nt> <nt def="NT-S">S</nt>?
'>'</rhs><vc def="EDUnique"/>
</prod>
<prod id="NT-contentspec">
<lhs>contentspec</lhs><rhs>'EMPTY' | 'ANY' | <nt def="NT-Mixed">Mixed</nt>
| <nt def="NT-children">children</nt> </rhs>
</prod>
</prodgroup></scrap>
<p>where the <nt def="NT-Name">Name</nt> gives the element type being declared.</p>
<vcnote id="EDUnique"><head>Unique Element Type Declaration</head><p>No element
type may be declared more than once.</p>
</vcnote>
<p>Examples of element type declarations:</p>
<eg><!ELEMENT br EMPTY>
<!ELEMENT p (#PCDATA|emph)* >
<!ELEMENT %name.para; %content.para; >
<!ELEMENT container ANY></eg>
<div3 id="sec-element-content">
<head>Element Content</head>
<p><termdef id="dt-elemcontent" term="Element content">An element <termref
def="dt-stag">type</termref> has <term>element content</term> when elements
of that type must contain only <termref def="dt-parentchild">child</termref>
elements (no character data), optionally separated by white space (characters
matching the nonterminal <nt def="NT-S">S</nt>).</termdef><termdef id="dt-content-model"
term="Content model">In this case, the constraint includes a <phrase diff="chg"><loc
role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E55">[E55]</loc><term>content
model</term></phrase>, a simple grammar governing the allowed types of the
child elements and the order in which they are allowed to appear.</termdef>
The grammar is built on content particles (<nt def="NT-cp">cp</nt>s), which
consist of names, choice lists of content particles, or sequence lists of
content particles:</p>
<scrap lang="ebnf">
<head>Element-content Models</head>
<prodgroup pcw2="5.5" pcw4="16" pcw5="11">
<prod id="NT-children">
<lhs>children</lhs><rhs>(<nt def="NT-choice">choice</nt> | <nt def="NT-seq">seq</nt>)
('?'
| '*' | '+')?</rhs>
</prod>
<prod id="NT-cp">
<lhs>cp</lhs><rhs>(<nt def="NT-Name">Name</nt> | <nt def="NT-choice">choice</nt>
| <nt def="NT-seq">seq</nt>) ('?' | '*' | '+')?</rhs>
</prod>
<prod id="NT-choice" diff="chg">
<lhs>choice</lhs><rhs>'(' <nt def="NT-S">S</nt>? <nt def="NT-cp">cp</nt> ( <nt
def="NT-S">S</nt>? '|' <nt def="NT-S">S</nt>? <nt def="NT-cp">cp</nt> )+ <nt
def="NT-S">S</nt>? ')'</rhs><com><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E50">[E50]</loc></com>
<com><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E52">[E52]</loc></com>
<vc def="vc-PEinGroup"/>
</prod>
<prod id="NT-seq" diff="chg">
<lhs>seq</lhs><rhs>'(' <nt def="NT-S">S</nt>? <nt def="NT-cp">cp</nt> ( <nt
def="NT-S">S</nt>? ',' <nt def="NT-S">S</nt>? <nt def="NT-cp">cp</nt> )* <nt
def="NT-S">S</nt>? ')'</rhs><com><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E52">[E52]</loc></com>
<vc def="vc-PEinGroup"/>
</prod>
</prodgroup></scrap>
<p>where each <nt def="NT-Name">Name</nt> is the type of an element which
may appear as a <termref def="dt-parentchild">child</termref>. Any content
particle in a choice list may appear in the <termref def="dt-elemcontent">element
content</termref> at the location where the choice list appears in the grammar;
content particles occurring in a sequence list must each appear in the <termref
def="dt-elemcontent">element content</termref> in the order given in the list.
The optional character following a name or list governs whether the element
or the content particles in the list may occur one or more (<code>+</code>),
zero or more (<code>*</code>), or zero or one times (<code>?</code>). The
absence of such an operator means that the element or content particle must
appear exactly once.
This syntax and meaning are identical to those used in
the productions in this specification.</p>
<p>The content of an element matches a content model if and only if it is
possible to trace out a path through the content model, obeying the sequence,
choice, and repetition operators and matching each element in the content
against an element type in the content model. <termref def="dt-compat">For
compatibility</termref>, it is an error if an element in the document can
match more than one occurrence of an element type in the content model. For
more information, see <specref ref="determinism"/>.</p>
<!--appendix <specref ref="determinism"/>.-->
<!-- appendix on deterministic content models. -->
<vcnote id="vc-PEinGroup"><head>Proper Group/PE Nesting</head><p>Parameter-entity <termref
def="dt-repltext">replacement text</termref> must be properly nested with <phrase
diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E11">[E11]</loc>parenthesized</phrase>
groups.
That is to say, if either of the opening or closing parentheses in
a <nt def="NT-choice">choice</nt>, <nt def="NT-seq">seq</nt>, or <nt def="NT-Mixed">Mixed</nt>
construct is contained in the replacement text for a <termref def="dt-PERef">parameter
entity</termref>, both must be contained in the same replacement text.</p>
<p diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E19">[E19]</loc><termref
def="dt-interop">For interoperability</termref>, if a parameter-entity reference
appears in a <nt def="NT-choice">choice</nt>, <nt def="NT-seq">seq</nt>, or <nt
def="NT-Mixed">Mixed</nt> construct, its replacement text should contain at
least one non-blank character, and neither the first nor last non-blank character
of the replacement text should be a connector (<code>|</code> or <code>,</code>).</p>
</vcnote>
<p>Examples of element-content models:</p>
<eg><!ELEMENT spec (front, body, back?)>
<!ELEMENT div1 (head, (p | list | note)*, div2*)>
<!ELEMENT dictionary-body (%div.mix; | %dict.mix;)*></eg>
</div3>
<div3 id="sec-mixed-content">
<head>Mixed Content</head>
<p><termdef id="dt-mixed" term="Mixed Content">An element <termref def="dt-stag">type</termref>
has <term>mixed content</term> when elements of that type may contain character
data, optionally interspersed with <termref def="dt-parentchild">child</termref>
elements.</termdef> In this case, the types of the child elements may be constrained,
but not their order or their number of occurrences:</p>
<scrap lang="ebnf">
<head>Mixed-content Declaration</head>
<prodgroup pcw2="5.5" pcw4="16" pcw5="11">
<prod id="NT-Mixed">
<lhs>Mixed</lhs><rhs>'(' <nt def="NT-S">S</nt>? '#PCDATA' (<nt def="NT-S">S</nt>?
'|' <nt def="NT-S">S</nt>? <nt def="NT-Name">Name</nt>)* <nt def="NT-S">S</nt>?
')*' </rhs>
<rhs>| '(' <nt def="NT-S">S</nt>? '#PCDATA' <nt def="NT-S">S</nt>?
')' </rhs>
<vc def="vc-PEinGroup"/><vc def="vc-MixedChildrenUnique"/>
</prod>
</prodgroup></scrap>
<p>where the <nt def="NT-Name">Name</nt>s give the types of elements that
may appear as children. <phrase diff="add"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E10">[E10]</loc>The
keyword <kw>#PCDATA</kw> derives historically from the term <quote>parsed
character data.</quote></phrase></p>
<vcnote id="vc-MixedChildrenUnique"><head>No Duplicate Types</head><p>The
same name must not appear more than once in a single mixed-content declaration.</p>
</vcnote>
<p>Examples of mixed content declarations:</p>
<eg><!ELEMENT p (#PCDATA|a|ul|b|i|em)*>
<!ELEMENT p (#PCDATA | %font; | %phrase; | %special; | %form;)* >
<!ELEMENT b (#PCDATA)></eg>
</div3>
</div2>
<div2 id="attdecls">
<head>Attribute-List Declarations</head>
<p><termref def="dt-attr">Attributes</termref> are used to associate name-value
pairs with <termref def="dt-element">elements</termref>. Attribute specifications
may appear only within <termref def="dt-stag">start-tags</termref> and <termref
def="dt-eetag">empty-element tags</termref>; thus, the productions used to
recognize them appear in <specref ref="sec-starttags"/>.
Attribute-list declarations
may be used:</p>
<ulist>
<item><p>To define the set of attributes pertaining to a given element type.</p>
</item>
<item><p>To establish type constraints for these attributes.</p></item>
<item><p>To provide <termref def="dt-default">default values</termref> for
attributes.</p></item>
</ulist>
<p><termdef id="dt-attdecl" term="Attribute-List Declaration"> <term>Attribute-list
declarations</term> specify the name, data type, and default value (if any)
of each attribute associated with a given element type:</termdef></p>
<scrap lang="ebnf">
<head>Attribute-list Declaration</head>
<prod id="NT-AttlistDecl">
<lhs>AttlistDecl</lhs><rhs>'<!ATTLIST' <nt def="NT-S">S</nt> <nt def="NT-Name">Name</nt> <nt
def="NT-AttDef">AttDef</nt>* <nt def="NT-S">S</nt>? '>'</rhs>
</prod>
<prod id="NT-AttDef">
<lhs>AttDef</lhs><rhs><nt def="NT-S">S</nt> <nt def="NT-Name">Name</nt> <nt
def="NT-S">S</nt> <nt def="NT-AttType">AttType</nt> <nt def="NT-S">S</nt> <nt
def="NT-DefaultDecl">DefaultDecl</nt></rhs>
</prod>
</scrap>
<p>The <nt def="NT-Name">Name</nt> in the <nt def="NT-AttlistDecl">AttlistDecl</nt>
rule is the type of an element. At user option, an XML processor may issue
a warning if attributes are declared for an element type not itself declared,
but this is not an error. The <nt def="NT-Name">Name</nt> in the <nt def="NT-AttDef">AttDef</nt>
rule is the name of the attribute.</p>
<p>When more than one <nt def="NT-AttlistDecl">AttlistDecl</nt> is provided
for a given element type, the contents of all those provided are merged. When
more than one definition is provided for the same attribute of a given element
type, the first declaration is binding and later declarations are ignored.
<phrase
diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E9">[E9]</loc><termref
def="dt-interop">For interoperability,</termref> writers of DTDs may choose
to provide at most one attribute-list declaration for a given element type,
at most one attribute definition for a given attribute name in an attribute-list
declaration, and at least one attribute definition in each attribute-list
declaration.</phrase> For interoperability, an XML processor may at user option
issue a warning when more than one attribute-list declaration is provided
for a given element type, or more than one attribute definition is provided
for a given attribute, but this is not an error.</p>
<div3 id="sec-attribute-types">
<head>Attribute Types</head>
<p>XML attribute types are of three kinds: a string type, a set of tokenized
types, and enumerated types. The string type may take any literal string as
a value; the tokenized types have varying lexical and semantic constraints<phrase
diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E8">[E8]</loc>.
The validity constraints noted in the grammar are applied after the attribute
value has been normalized as described in <specref ref="attdecls"/>.</phrase></p>
<scrap lang="ebnf">
<head>Attribute Types</head>
<prodgroup pcw4="14" pcw5="11.5">
<prod id="NT-AttType">
<lhs>AttType</lhs><rhs><nt def="NT-StringType">StringType</nt> | <nt def="NT-TokenizedType">TokenizedType</nt>
| <nt def="NT-EnumeratedType">EnumeratedType</nt> </rhs>
</prod>
<prod id="NT-StringType">
<lhs>StringType</lhs><rhs>'CDATA'</rhs>
</prod>
<prod id="NT-TokenizedType">
<lhs>TokenizedType</lhs><rhs>'ID'</rhs><vc def="id"/><vc def="one-id-per-el"/>
<vc def="id-default"/>
<rhs>| 'IDREF'</rhs><vc def="idref"/>
<rhs>| 'IDREFS'</rhs><vc def="idref"/>
<rhs>| 'ENTITY'</rhs><vc def="entname"/>
<rhs>| 'ENTITIES'</rhs><vc def="entname"/>
<rhs>| 'NMTOKEN'</rhs><vc def="nmtok"/>
<rhs>| 'NMTOKENS'</rhs><vc def="nmtok"/>
</prod>
</prodgroup></scrap>
<vcnote id="id"><head>ID</head><p>Values of type <kw>ID</kw> must match the <nt
def="NT-Name">Name</nt> production. A name must not appear more than once
in an XML document as a value of this type; i.e., ID values must uniquely
identify the elements which bear them.</p>
</vcnote>
<vcnote id="one-id-per-el"><head>One ID per Element Type</head><p>No element
type may have more than one ID attribute specified.</p>
</vcnote>
<vcnote id="id-default"><head>ID Attribute Default</head><p>An ID attribute
must have a declared default of <kw>#IMPLIED</kw> or <kw>#REQUIRED</kw>.</p>
</vcnote>
<vcnote id="idref"><head>IDREF</head><p>Values of type <kw>IDREF</kw> must
match the <nt def="NT-Name">Name</nt> production, and values of type <kw>IDREFS</kw>
must match <nt def="NT-Names">Names</nt>; each <nt def="NT-Name">Name</nt>
must match the value of an ID attribute on some element in the XML document;
i.e.,
<kw>IDREF</kw> values must match the value of some ID attribute.</p>
</vcnote>
<vcnote id="entname"><head>Entity Name</head><p>Values of type <kw>ENTITY</kw>
must match the <nt def="NT-Name">Name</nt> production, and values of type <kw>ENTITIES</kw>
must match <nt def="NT-Names">Names</nt>; each <nt def="NT-Name">Name</nt>
must match the name of an <termref def="dt-unparsed">unparsed entity</termref>
declared in the <termref def="dt-doctype">DTD</termref>.</p>
</vcnote>
<vcnote id="nmtok"><head>Name Token</head><p>Values of type <kw>NMTOKEN</kw>
must match the <nt def="NT-Nmtoken">Nmtoken</nt> production; values of type <kw>NMTOKENS</kw>
must match <nt def="NT-Nmtokens">Nmtokens</nt>.</p>
</vcnote>
<!-- why?
<p>The XML processor must normalize attribute values before
passing them to the application, as described in
<specref ref="AVNormalize"/>.</p>-->
<p><termdef id="dt-enumerated" term="Enumerated Attribute
Values"><term>Enumerated attributes</term> can take one of a list of values
provided in the declaration</termdef>. There are two kinds of enumerated types:</p>
<scrap lang="ebnf">
<head>Enumerated Attribute Types</head>
<prod id="NT-EnumeratedType">
<lhs>EnumeratedType</lhs><rhs><nt def="NT-NotationType">NotationType</nt>
| <nt def="NT-Enumeration">Enumeration</nt> </rhs>
</prod>
<prod id="NT-NotationType">
<lhs>NotationType</lhs><rhs>'NOTATION' <nt def="NT-S">S</nt> '(' <nt def="NT-S">S</nt>? <nt
def="NT-Name">Name</nt> (<nt def="NT-S">S</nt>? '|' <nt def="NT-S">S</nt>? <nt
def="NT-Name">Name</nt>)* <nt def="NT-S">S</nt>? ')' </rhs><vc def="notatn"/>
<vc def="OneNotationPer" diff="add"/><vc def="NoNotationEmpty" diff="add"/>
</prod>
<prod id="NT-Enumeration">
<lhs>Enumeration</lhs><rhs>'(' <nt def="NT-S">S</nt>? <nt def="NT-Nmtoken">Nmtoken</nt>
(<nt def="NT-S">S</nt>? '|' <nt def="NT-S">S</nt>?
<nt def="NT-Nmtoken">Nmtoken</nt>)* <nt 1596def="NT-S">S</nt>? ')'</rhs><vc def="enum"/> 1597</prod> 1598</scrap> 1599<p>A <kw>NOTATION</kw> attribute identifies a <termref def="dt-notation">notation</termref>, 1600declared in the DTD with associated system and/or public identifiers, to be 1601used in interpreting the element to which the attribute is attached.</p> 1602<vcnote id="notatn"><head>Notation Attributes</head><p>Values of this type 1603must match one of the <titleref href="#Notations">notation</titleref> names 1604included in the declaration; all notation names in the declaration must be 1605declared.</p> 1606</vcnote> 1607<vcnote id="OneNotationPer" diff="add"><head><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E7">[E7]</loc>One 1608Notation Per Element Type</head><p>No element type may have more than one <kw>NOTATION</kw> 1609attribute specified.</p> 1610</vcnote> 1611<vcnote id="NoNotationEmpty" diff="add"><head><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E68">[E68]</loc>No 1612Notation on Empty Element</head><p><termref def="dt-compat">For compatibility</termref>, 1613an attribute of type <kw>NOTATION</kw> must not be declared on an element 1614declared <kw>EMPTY</kw>.</p> 1615</vcnote> 1616<vcnote id="enum"><head>Enumeration</head><p>Values of this type must match 1617one of the <nt def="NT-Nmtoken">Nmtoken</nt> tokens in the declaration.</p> 1618</vcnote> 1619<p><termref def="dt-interop">For interoperability,</termref> the same <nt 1620def="NT-Nmtoken">Nmtoken</nt> should not occur more than once in the enumerated 1621attribute types of a single element type.</p> 1622</div3> 1623<div3 id="sec-attr-defaults"> 1624<head>Attribute Defaults</head> 1625<p>An <termref def="dt-attdecl">attribute declaration</termref> provides information 1626on whether the attribute's presence is required, and if not, how an XML processor 1627should react if a declared attribute is absent in a document.</p> 1628<scrap 
lang="ebnf"> 1629<head>Attribute Defaults</head> 1630<prodgroup pcw4="14" pcw5="11.5"> 1631<prod id="NT-DefaultDecl"> 1632<lhs>DefaultDecl</lhs><rhs>'#REQUIRED' | '#IMPLIED' </rhs> 1633<rhs>| (('#FIXED' S)? <nt def="NT-AttValue">AttValue</nt>)</rhs><vc def="RequiredAttr"/> 1634<vc def="defattrvalid"/><wfc def="CleanAttrVals"/><vc def="FixedAttr"/> 1635</prod> 1636</prodgroup></scrap> 1637<p>In an attribute declaration, <kw>#REQUIRED</kw> means that the attribute 1638must always be provided, <kw>#IMPLIED</kw> that no default value is provided. <!-- not any more!! 1639<kw>#IMPLIED</kw> means that if the attribute is omitted 1640from an element of this type, 1641the XML processor must inform the application 1642that no value was specified; no constraint is placed on the behavior 1643of the application. --> <termdef id="dt-default" term="Attribute Default">If 1644the declaration is neither <kw>#REQUIRED</kw> nor <kw>#IMPLIED</kw>, then 1645the <nt def="NT-AttValue">AttValue</nt> value contains the declared <term>default</term> 1646value; the <kw>#FIXED</kw> keyword states that the attribute must always have 1647the default value. 
If a default value is declared, when an XML processor encounters 1648an omitted attribute, it is to behave as though the attribute were present 1649with the declared default value.</termdef></p> 1650<vcnote id="RequiredAttr"><head>Required Attribute</head><p>If the default 1651declaration is the keyword <kw>#REQUIRED</kw>, then the attribute must be 1652specified for all elements of the type in the attribute-list declaration.</p> 1653</vcnote> 1654<vcnote id="defattrvalid"><head>Attribute Default Legal</head><p>The declared 1655default value must meet the lexical constraints of the declared attribute 1656type.</p> 1657</vcnote> 1658<vcnote id="FixedAttr"><head>Fixed Attribute Default</head><p>If an attribute 1659has a default value declared with the <kw>#FIXED</kw> keyword, instances of 1660that attribute must match the default value.</p> 1661</vcnote> 1662<p>Examples of attribute-list declarations:</p> 1663<eg><!ATTLIST termdef 1664 id ID #REQUIRED 1665 name CDATA #IMPLIED> 1666<!ATTLIST list 1667 type (bullets|ordered|glossary) "ordered"> 1668<!ATTLIST form 1669 method CDATA #FIXED "POST"></eg> 1670</div3> 1671<div3 id="AVNormalize" diff="chg"> 1672<head><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E70">[E70]</loc>Attribute-Value 1673Normalization</head> 1674<p>Before the value of an attribute is passed to the application or checked 1675for validity, the XML processor must normalize the attribute value by applying 1676the algorithm below, or by using some other method such that the value passed 1677to the application is the same as that produced by the algorithm.</p> 1678<olist> 1679<item><p>All line breaks must have been normalized on input to #xA as described 1680in <specref ref="sec-line-ends"/>, so the rest of this algorithm operates 1681on text normalized in this way.</p></item> 1682<item><p>Begin with a normalized value consisting of the empty string.</p> 1683</item> 1684<item><p>For each character, entity reference, or character 
reference in the
unnormalized attribute value, beginning with the first and continuing to the
last, do the following:</p>
<ulist>
<item><p>For a character reference, append the referenced character to the
normalized value.</p></item>
<item><p>For an entity reference, recursively apply step 3 of this algorithm
to the replacement text of the entity.</p></item>
<item><p>For a white space character (#x20, #xD, #xA, #x9), append a space
character (#x20) to the normalized value.</p></item>
<item><p>For any other character, append the character to the normalized value.</p>
</item>
</ulist>
</item>
</olist>
<p>If the attribute type is not CDATA, then the XML processor must further
process the normalized attribute value by discarding any leading and trailing
space (#x20) characters, and by replacing sequences of space (#x20) characters
by a single space (#x20) character.</p>
<p>Note that if the unnormalized attribute value contains a character reference
to a white space character other than space (#x20), the normalized value contains
the referenced character itself (#xD, #xA or #x9). This contrasts with the
case where the unnormalized value contains a white space character (not a
reference), which is replaced with a space character (#x20) in the normalized
value, and also contrasts with the case where the unnormalized value contains
an entity reference whose replacement text contains a white space character;
being recursively processed, the white space character is replaced with a
space character (#x20) in the normalized value.</p>
<p>All attributes for which no declaration has been read should be treated
by a non-validating <phrase diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E95">[E95]</loc>processor</phrase>
as if declared <kw>CDATA</kw>.</p>
<p>Following are examples of attribute normalization.
Given the following
declarations:</p>
<eg><!ENTITY d "&#xD;">
<!ENTITY a "&#xA;">
<!ENTITY da "&#xD;&#xA;"></eg>
<p>the attribute specifications in the left column below would be normalized
to the character sequences of the middle column if the attribute <att>a</att>
is declared <kw>NMTOKENS</kw> and to those of the right column if <att>a</att>
is declared <kw>CDATA</kw>.</p>
<table border="1" frame="border"><thead><tr><th>Attribute specification</th>
<th>a is NMTOKENS</th><th>a is CDATA</th></tr></thead><tbody><tr><td><eg>a="

xyz"</eg></td><td><code>x y z</code></td><td><code>#x20 #x20 x y z</code></td>
</tr><tr><td><eg>a="&d;&d;A&a;&a;B&da;"</eg></td><td><code>A #x20 B</code></td><td><code>#x20 #x20 A #x20 #x20 B #x20 #x20</code></td>
</tr><tr><td><eg>a=
"&#xd;&#xd;A&#xa;&#xa;B&#xd;&#xa;"</eg></td><td><code>#xD #xD A #xA #xA B #xD #xA</code></td><td><code>#xD #xD A #xA #xA B #xD #xA</code></td>
</tr></tbody></table>
<p>Note that the last example is invalid (but well-formed) if <att>a</att>
is declared to be of type <kw>NMTOKENS</kw>.</p>
</div3>
</div2>
<div2 id="sec-condition-sect">
<head>Conditional Sections</head>
<p><termdef id="dt-cond-section" term="conditional section"> <term>Conditional
sections</term> are portions of the <termref def="dt-doctype">document type
declaration external subset</termref> which are included in, or excluded from,
the logical structure of the DTD based on the keyword which governs them.</termdef></p>
<scrap lang="ebnf">
<head>Conditional Section</head>
<prodgroup pcw2="9" pcw4="14.5">
<prod id="NT-conditionalSect">
<lhs>conditionalSect</lhs><rhs><nt def="NT-includeSect">includeSect</nt> | <nt
def="NT-ignoreSect">ignoreSect</nt> </rhs>
</prod>
<prod id="NT-includeSect">
<lhs>includeSect</lhs><rhs>'<![' S? 'INCLUDE' S?
'[' <nt def="NT-extSubsetDecl">extSubsetDecl</nt> 1753']]>' </rhs><com><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E90">[E90]</loc></com> 1754<vc def="condsec-nesting" diff="add"/> 1755</prod> 1756<prod id="NT-ignoreSect"> 1757<lhs>ignoreSect</lhs><rhs>'<![' S? 'IGNORE' S? '[' <nt def="NT-ignoreSectContents">ignoreSectContents</nt>* 1758']]>'</rhs><com><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E90">[E90]</loc></com> 1759<vc def="condsec-nesting" diff="add"/> 1760</prod> 1761<prod id="NT-ignoreSectContents"> 1762<lhs>ignoreSectContents</lhs><rhs><nt def="NT-Ignore">Ignore</nt> ('<![' <nt 1763def="NT-ignoreSectContents">ignoreSectContents</nt> ']]>' <nt def="NT-Ignore">Ignore</nt>)*</rhs> 1764</prod> 1765<prod id="NT-Ignore"> 1766<lhs>Ignore</lhs><rhs><nt def="NT-Char">Char</nt>* - (<nt def="NT-Char">Char</nt>* 1767('<![' | ']]>') <nt def="NT-Char">Char</nt>*) </rhs> 1768</prod> 1769</prodgroup></scrap> 1770<vcnote id="condsec-nesting" diff="add"><head><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E90">[E90]</loc>Proper 1771Conditional Section/PE Nesting</head><p>If any of the "<code><![</code>", 1772"<code>[</code>", or "<code>]]></code>" of a conditional section is contained 1773in the replacement text for a parameter-entity reference, all of them must 1774be contained in the same replacement text.</p> 1775</vcnote> 1776<p>Like the internal and external DTD subsets, a conditional section may contain 1777one or more complete declarations, comments, processing instructions, or nested 1778conditional sections, intermingled with white space.</p> 1779<p>If the keyword of the conditional section is <kw>INCLUDE</kw>, then the 1780contents of the conditional section are part of the DTD. If the keyword of 1781the conditional section is <kw>IGNORE</kw>, then the contents of the conditional 1782section are not logically part of the DTD. 
<phrase diff="del"><loc role="erratumref" 1783href="http://www.w3.org/XML/xml-19980210-errata#E90">[E90]</loc>Note that 1784for reliable parsing, the contents of even ignored conditional sections must 1785be read in order to detect nested conditional sections and ensure that the 1786end of the outermost (ignored) conditional section is properly detected.</phrase> 1787If a conditional section with a keyword of <kw>INCLUDE</kw> occurs within 1788a larger conditional section with a keyword of <kw>IGNORE</kw>, both the outer 1789and the inner conditional sections are ignored.<phrase diff="add"> <loc role="erratumref" 1790href="http://www.w3.org/XML/xml-19980210-errata#E90">[E90]</loc>The contents 1791of an ignored conditional section are parsed by ignoring all characters after 1792the "<code>[</code>" following the keyword, except conditional section starts 1793"<code><![</code>" and ends "<code>]]></code>", until the matching conditional 1794section end is found. Parameter entity references are not recognized in this 1795process.</phrase></p> 1796<p>If the keyword of the conditional section is a parameter-entity reference, 1797the parameter entity must be replaced by its content before the processor 1798decides whether to include or ignore the conditional section.</p> 1799<p>An example:</p> 1800<eg><!ENTITY % draft 'INCLUDE' > 1801<!ENTITY % final 'IGNORE' > 1802 1803<![%draft;[ 1804<!ELEMENT book (comments*, title, body, supplements?)> 1805]]> 1806<![%final;[ 1807<!ELEMENT book (title, body, supplements?)> 1808]]></eg> 1809</div2> 1810<!-- 1811<div2 id='sec-pass-to-app'> 1812<head>XML Processor Treatment of Logical Structure</head> 1813<p>When an XML processor encounters a start-tag, it must make 1814at least the following information available to the application: 1815<ulist> 1816<item> 1817<p>the element type's generic identifier</p> 1818</item> 1819<item> 1820<p>the names of attributes known to apply to this element type 1821(validating processors must make 
available names of all attributes 1822declared for the element type; non-validating processors must 1823make available at least the names of the attributes for which 1824values are specified. 1825</p> 1826</item> 1827</ulist> 1828</p> 1829</div2> 1830--> 1831</div1> 1832<!-- &Entities; --> 1833<div1 id="sec-physical-struct"> 1834<head>Physical Structures</head> 1835<p><termdef id="dt-entity" term="Entity">An XML document may consist of one 1836or many storage units. <phrase diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E6">[E6]</loc>These 1837are called <term>entities</term>; they all have <term>content</term> and are 1838all (except for the <termref def="dt-docent">document entity</termref> and 1839the <termref def="dt-doctype">external DTD subset</termref>) identified by 1840entity <term>name</term></phrase>.</termdef> Each XML document has one entity 1841called the <termref def="dt-docent">document entity</termref>, which serves 1842as the starting point for the <termref def="dt-xml-proc">XML processor</termref> 1843and may contain the whole document.</p> 1844<p>Entities may be either parsed or unparsed. <termdef id="dt-parsedent" term="Text Entity">A <term>parsed 1845entity's</term> contents are referred to as its <termref def="dt-repltext">replacement 1846text</termref>; this <termref def="dt-text">text</termref> is considered an 1847integral part of the document.</termdef></p> 1848<p><termdef id="dt-unparsed" term="Unparsed Entity">An <term>unparsed entity</term> 1849is a resource whose contents may or may not be <termref def="dt-text">text</termref>, 1850and if text, <phrase diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E25">[E25]</loc>may 1851be other than</phrase> XML. Each unparsed entity has an associated <termref 1852def="dt-notation">notation</termref>, identified by name. 
Beyond a requirement 1853that an XML processor make the identifiers for the entity and notation available 1854to the application, XML places no constraints on the contents of unparsed 1855entities.</termdef></p> 1856<p>Parsed entities are invoked by name using entity references; unparsed entities 1857by name, given in the value of <kw>ENTITY</kw> or <kw>ENTITIES</kw> attributes.</p> 1858<p><termdef id="gen-entity" term="general entity"><term>General entities</term> 1859are entities for use within the document content. In this specification, general 1860entities are sometimes referred to with the unqualified term <emph>entity</emph> 1861when this leads to no ambiguity.</termdef> <termdef id="dt-PE" term="Parameter entity"><phrase 1862diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E53">[E53]</loc><term>Parameter 1863entities</term></phrase> are parsed entities for use within the DTD.</termdef> 1864These two types of entities use different forms of reference and are recognized 1865in different contexts. 
Furthermore, they occupy different namespaces; a parameter
entity and a general entity with the same name are two distinct entities.</p>
<div2 id="sec-references">
<head>Character and Entity References</head>
<p><termdef id="dt-charref" term="Character Reference"> A <term>character
reference</term> refers to a specific character in the ISO/IEC 10646 character
set, for example, one not directly accessible from available input devices.</termdef></p>
<scrap lang="ebnf">
<head>Character Reference</head>
<prod id="NT-CharRef">
<lhs>CharRef</lhs><rhs>'&#' [0-9]+ ';' </rhs>
<rhs>| '&hcro;' [0-9a-fA-F]+ ';'</rhs><wfc def="wf-Legalchar"/>
</prod>
</scrap>
<wfcnote id="wf-Legalchar"><head>Legal Character</head><p>Characters referred
to using character references must match the production for <nt def="NT-Char">Char</nt>.</p>
</wfcnote>
<p>If the character reference begins with <quote><code>&#x</code></quote>,
the digits and letters up to the terminating <code>;</code> provide a hexadecimal
representation of the character's code point in ISO/IEC 10646.
If it begins 1885just with <quote><code>&#</code></quote>, the digits up to the terminating <code>;</code> 1886provide a decimal representation of the character's code point.</p> 1887<p><termdef id="dt-entref" term="Entity Reference">An <term>entity reference</term> 1888refers to the content of a named entity.</termdef> <termdef id="dt-GERef" 1889term="General Entity Reference">References to parsed general entities use 1890ampersand (<code>&</code>) and semicolon (<code>;</code>) as delimiters.</termdef> <termdef 1891id="dt-PERef" term="Parameter-entity reference"> <term>Parameter-entity references</term> 1892use percent-sign (<code>%</code>) and semicolon (<code>;</code>) as delimiters.</termdef></p> 1893<scrap lang="ebnf"> 1894<head>Entity Reference</head> 1895<prod id="NT-Reference"> 1896<lhs>Reference</lhs><rhs><nt def="NT-EntityRef">EntityRef</nt> | <nt def="NT-CharRef">CharRef</nt></rhs> 1897</prod> 1898<prod id="NT-EntityRef"> 1899<lhs>EntityRef</lhs><rhs>'&' <nt def="NT-Name">Name</nt> ';'</rhs><wfc 1900def="wf-entdeclared"/><vc def="vc-entdeclared"/><wfc def="textent"/><wfc def="norecursion"/> 1901</prod> 1902<prod id="NT-PEReference"> 1903<lhs>PEReference</lhs><rhs>'%' <nt def="NT-Name">Name</nt> ';'</rhs><vc def="vc-entdeclared"/> 1904<wfc def="norecursion"/><wfc def="indtd"/> 1905</prod> 1906</scrap> 1907<wfcnote id="wf-entdeclared"><head>Entity Declared</head><p>In a document 1908without any DTD, a document with only an internal DTD subset which contains 1909no parameter entity references, or a document with <quote><code>standalone='yes'</code></quote>, <phrase 1910diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E34">[E34]</loc>for 1911an entity reference that does not occur within the external subset or a parameter 1912entity, the <nt def="NT-Name">Name</nt> given in the entity reference must <termref 1913def="dt-match">match</termref> that in an <titleref href="#sec-entity-decl">entity 1914declaration</titleref> that 
does not occur within the external subset or a 1915parameter entity</phrase>, except that well-formed documents need not declare 1916any of the following entities: &magicents;. <phrase diff="del"><loc role="erratumref" 1917href="http://www.w3.org/XML/xml-19980210-errata#E29">[E29]</loc>The declaration 1918of a parameter entity must precede any reference to it. Similarly, </phrase>The 1919declaration of a general entity must precede any reference to it which appears 1920in a default value in an attribute-list declaration.</p> 1921<p>Note that if entities are declared in the external subset or in external 1922parameter entities, a non-validating processor is <titleref href="#include-if-valid">not 1923obligated to</titleref> read and process their declarations; for such documents, 1924the rule that an entity must be declared is a well-formedness constraint only 1925if <titleref href="#sec-rmd">standalone='yes'</titleref>.</p> 1926</wfcnote> 1927<vcnote id="vc-entdeclared"><head>Entity Declared</head><p>In a document with 1928an external subset or external parameter entities with <quote><code>standalone='no'</code></quote>, 1929the <nt def="NT-Name">Name</nt> given in the entity reference must <termref 1930def="dt-match">match</termref> that in an <titleref href="#sec-entity-decl">entity 1931declaration</titleref>. For interoperability, valid documents should declare 1932the entities &magicents;, in the form specified in <specref ref="sec-predefined-ent"/>. 1933The declaration of a parameter entity must precede any reference to it. Similarly, 1934the declaration of a general entity must precede any <phrase diff="chg"><loc 1935role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E92">[E92]</loc>attribute-list 1936declaration containing a default value with a direct or indirect reference 1937to that general entity.</phrase></p> 1938</vcnote> 1939<!-- FINAL EDIT: is this duplication too clumsy? 
--> 1940<wfcnote id="textent"><head>Parsed Entity</head><p>An entity reference must 1941not contain the name of an <termref def="dt-unparsed">unparsed entity</termref>. 1942Unparsed entities may be referred to only in <termref def="dt-attrval">attribute 1943values</termref> declared to be of type <kw>ENTITY</kw> or <kw>ENTITIES</kw>.</p> 1944</wfcnote> 1945<wfcnote id="norecursion"><head>No Recursion</head><p>A parsed entity must 1946not contain a recursive reference to itself, either directly or indirectly.</p> 1947</wfcnote> 1948<wfcnote id="indtd"><head>In DTD</head><p>Parameter-entity references may 1949only appear in the <termref def="dt-doctype">DTD</termref>.</p> 1950</wfcnote> 1951<p>Examples of character and entity references:</p> 1952<eg>Type <key>less-than</key> (&hcro;3C;) to save options. 1953This document was prepared on &docdate; and 1954is classified &security-level;.</eg> 1955<p>Example of a parameter-entity reference:</p> 1956<eg><![CDATA[<!-- declare the parameter entity "ISOLat2"... --> 1957<!ENTITY % ISOLat2 1958 SYSTEM "http://www.xml.com/iso/isolat2-xml.entities" > 1959<!-- ... now reference it. --> 1960%ISOLat2;]]></eg> 1961</div2> 1962<div2 id="sec-entity-decl"> 1963<head>Entity Declarations</head> 1964<p><termdef id="dt-entdecl" term="entity declaration"> Entities are declared 1965thus:</termdef></p> 1966<scrap lang="ebnf"> 1967<head>Entity Declaration</head> 1968<prodgroup pcw2="5" pcw4="18.5"> 1969<prod id="NT-EntityDecl"> 1970<lhs>EntityDecl</lhs><rhs><nt def="NT-GEDecl">GEDecl</nt><!--</rhs><com>General entities</com> 1971<rhs>--> | <nt def="NT-PEDecl">PEDecl</nt></rhs> 1972<!--<com>Parameter entities</com>--> 1973</prod> 1974<prod id="NT-GEDecl"> 1975<lhs>GEDecl</lhs><rhs>'<!ENTITY' <nt def="NT-S">S</nt> <nt def="NT-Name">Name</nt> <nt 1976def="NT-S">S</nt> <nt def="NT-EntityDef">EntityDef</nt> <nt def="NT-S">S</nt>? 
1977'>'</rhs> 1978</prod> 1979<prod id="NT-PEDecl"> 1980<lhs>PEDecl</lhs><rhs>'<!ENTITY' <nt def="NT-S">S</nt> '%' <nt def="NT-S">S</nt> <nt 1981def="NT-Name">Name</nt> <nt def="NT-S">S</nt> <nt def="NT-PEDef">PEDef</nt> <nt 1982def="NT-S">S</nt>? '>'</rhs> 1983<!--<com>Parameter entities</com>--> 1984</prod> 1985<prod id="NT-EntityDef"> 1986<lhs>EntityDef</lhs><rhs><nt def="NT-EntityValue">EntityValue</nt> <!--</rhs> 1987<rhs>-->| (<nt def="NT-ExternalID">ExternalID</nt> <nt def="NT-NDataDecl">NDataDecl</nt>?)</rhs> 1988<!-- <nt def='NT-ExternalDef'>ExternalDef</nt></rhs> --> 1989</prod> 1990<!-- FINAL EDIT: what happened to WFs here? --> 1991<prod id="NT-PEDef"> 1992<lhs>PEDef</lhs><rhs><nt def="NT-EntityValue">EntityValue</nt> | <nt def="NT-ExternalID">ExternalID</nt></rhs> 1993</prod> 1994</prodgroup></scrap> 1995<p>The <nt def="NT-Name">Name</nt> identifies the entity in an <termref def="dt-entref">entity 1996reference</termref> or, in the case of an unparsed entity, in the value of 1997an <kw>ENTITY</kw> or <kw>ENTITIES</kw> attribute. If the same entity is declared 1998more than once, the first declaration encountered is binding; at user option, 1999an XML processor may issue a warning if entities are declared multiple times.</p> 2000<div3 id="sec-internal-ent"> 2001<head>Internal Entities</head> 2002<p><termdef id="dt-internent" term="Internal Entity Replacement Text">If the 2003entity definition is an <nt def="NT-EntityValue">EntityValue</nt>, the defined 2004entity is called an <term>internal entity</term>. 
There is no separate physical 2005storage object, and the content of the entity is given in the declaration.</termdef> 2006Note that some processing of entity and character references in the <termref 2007def="dt-litentval">literal entity value</termref> may be required to produce 2008the correct <termref def="dt-repltext">replacement text</termref>: see <specref 2009ref="intern-replacement"/>.</p> 2010<p>An internal entity is a <termref def="dt-parsedent">parsed entity</termref>.</p> 2011<p>Example of an internal entity declaration:</p> 2012<eg><!ENTITY Pub-Status "This is a pre-release of the 2013 specification."></eg> 2014</div3> 2015<div3 id="sec-external-ent"> 2016<head>External Entities</head> 2017<p><termdef id="dt-extent" term="External Entity">If the entity is not internal, 2018it is an <term>external entity</term>, declared as follows:</termdef></p> 2019<scrap lang="ebnf"> 2020<head>External Entity Declaration</head> 2021<!-- 2022<prod id='NT-ExternalDef'><lhs>ExternalDef</lhs> 2023<rhs></prod> --> 2024<prod id="NT-ExternalID"> 2025<lhs>ExternalID</lhs><rhs>'SYSTEM' <nt def="NT-S">S</nt> <nt def="NT-SystemLiteral">SystemLiteral</nt></rhs> 2026<rhs>| 'PUBLIC' <nt def="NT-S">S</nt> <nt def="NT-PubidLiteral">PubidLiteral</nt> <nt 2027def="NT-S">S</nt> <nt def="NT-SystemLiteral">SystemLiteral</nt> </rhs> 2028</prod> 2029<prod id="NT-NDataDecl"> 2030<lhs>NDataDecl</lhs><rhs><nt def="NT-S">S</nt> 'NDATA' <nt def="NT-S">S</nt> <nt 2031def="NT-Name">Name</nt></rhs><vc def="not-declared"/> 2032</prod> 2033</scrap> 2034<p>If the <nt def="NT-NDataDecl">NDataDecl</nt> is present, this is a general <termref 2035def="dt-unparsed">unparsed entity</termref>; otherwise it is a parsed entity.</p> 2036<vcnote id="not-declared"><head>Notation Declared</head><p>The <nt def="NT-Name">Name</nt> 2037must match the declared name of a <termref def="dt-notation">notation</termref>.</p> 2038</vcnote> 2039<p><phrase diff="chg"><termdef id="dt-sysid" term="System Identifier">The <nt 
2040def="NT-SystemLiteral">SystemLiteral</nt> is called the entity's <term>system 2041identifier</term>. It is a <phrase diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E88">[E88]</loc>URI 2042reference</phrase><phrase diff="add"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E66">[E66]</loc> 2043(as defined in <bibref ref="rfc2396"/>, updated by <bibref ref="rfc2732"/>)</phrase>, <loc 2044role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E76">[E76]</loc>meant 2045to be dereferenced to obtain input for the XML processor to construct the 2046entity's replacement text.</termdef> It is an error for a fragment identifier 2047(beginning with a <code>#</code> character) to be part of a system identifier.</phrase> 2048Unless otherwise provided by information outside the scope of this specification 2049(e.g. a special XML element type defined by a particular DTD, or a processing 2050instruction defined by a particular application specification), relative URIs 2051are relative to the location of the resource within which the entity declaration 2052occurs. A URI might thus be relative to the <termref def="dt-docent">document 2053entity</termref>, to the entity containing the <termref def="dt-doctype">external 2054DTD subset</termref>, or to some other <termref def="dt-extent">external parameter 2055entity</termref>.</p> 2056<p diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E78">[E78]</loc>URI 2057references require encoding and escaping of certain characters. The disallowed 2058characters include all non-ASCII characters, plus the excluded characters 2059listed in Section 2.4 of <bibref ref="rfc2396"/>, except for the number sign 2060(<code>#</code>) and percent sign (<code>%</code>) characters and the square 2061bracket characters re-allowed in <bibref ref="rfc2732"/>. 
Disallowed characters 2062must be escaped as follows:</p> 2063<olist diff="add"> 2064<item><p>Each disallowed character is converted to UTF-8 <bibref ref="rfc2279"/> 2065as one or more bytes.</p></item> 2066<item><p>Any octets corresponding to a disallowed character are escaped with 2067the URI escaping mechanism (that is, converted to <code>%</code><var>HH</var>, 2068where HH is the hexadecimal notation of the byte value).</p></item> 2069<item><p>The original character is replaced by the resulting character sequence.</p> 2070</item> 2071</olist> 2072<p><termdef id="dt-pubid" term="Public identifier"> In addition to a system 2073identifier, an external identifier may include a <term>public identifier</term>.</termdef> 2074An XML processor attempting to retrieve the entity's content may use the public 2075identifier to try to generate an alternative <phrase diff="chg"><loc role="erratumref" 2076href="http://www.w3.org/XML/xml-19980210-errata#E88">[E88]</loc>URI reference</phrase>. 2077If the processor is unable to do so, it must use the <phrase diff="chg"><loc 2078role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E88">[E88]</loc>URI 2079reference</phrase> specified in the system literal. 
Before a match is attempted, 2080all strings of white space in the public identifier must be normalized to 2081single space characters (#x20), and leading and trailing white space must 2082be removed.</p> 2083<p>Examples of external entity declarations:</p> 2084<eg><!ENTITY open-hatch 2085 SYSTEM "http://www.textuality.com/boilerplate/OpenHatch.xml"> 2086<!ENTITY open-hatch 2087 PUBLIC "-//Textuality//TEXT Standard open-hatch boilerplate//EN" 2088 "http://www.textuality.com/boilerplate/OpenHatch.xml"> 2089<!ENTITY hatch-pic 2090 SYSTEM "/grafix/OpenHatch.gif" 2091 NDATA gif ></eg> 2092</div3> 2093</div2> 2094<div2 id="TextEntities"> 2095<head>Parsed Entities</head> 2096<div3 id="sec-TextDecl"> 2097<head>The Text Declaration</head> 2098<p>External parsed entities <phrase diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E107">[E107]</loc>should</phrase 2099> each begin with a <term>text declaration</term>.</p> 2100<scrap lang="ebnf"> 2101<head>Text Declaration</head> 2102<prodgroup pcw4="12.5" pcw5="13"> 2103<prod id="NT-TextDecl"> 2104<lhs>TextDecl</lhs><rhs>&pio; <nt def="NT-VersionInfo">VersionInfo</nt>? <nt 2105def="NT-EncodingDecl">EncodingDecl</nt> <nt def="NT-S">S</nt>? &pic;</rhs> 2106</prod> 2107</prodgroup></scrap> 2108<p>The text declaration must be provided literally, not by reference to a 2109parsed entity. No text declaration may appear at any position other than the 2110beginning of an external parsed entity. <phrase diff="add"><loc role="erratumref" 2111href="http://www.w3.org/XML/xml-19980210-errata#E94">[E94]</loc>The text declaration 2112in an external parsed entity is not considered part of its <termref def="dt-repltext">replacement 2113text</termref>.</phrase></p> 2114</div3> 2115<div3 id="wf-entities"> 2116<head>Well-Formed Parsed Entities</head> 2117<p>The document entity is well-formed if it matches the production labeled <nt 2118def="NT-document">document</nt>. 
An external general parsed entity is well-formed 2119if it matches the production labeled <nt def="NT-extParsedEnt">extParsedEnt</nt>. <phrase 2120diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E109">[E109]</loc>All 2121external parameter entities are well-formed by definition.</phrase></p> 2122<scrap lang="ebnf"> 2123<head>Well-Formed External Parsed Entity</head> 2124<prod id="NT-extParsedEnt"> 2125<lhs>extParsedEnt</lhs><rhs><nt def="NT-TextDecl">TextDecl</nt>? <nt def="NT-content">content</nt></rhs> 2126</prod> 2127<prod id="NT-extPE" diff="del"> 2128<lhs>extPE</lhs><rhs><nt def="NT-TextDecl">TextDecl</nt>? <nt def="NT-extSubsetDecl">extSubsetDecl</nt></rhs> 2129<com><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E109">[E109]</loc></com> 2130</prod> 2131</scrap> 2132<p>An internal general parsed entity is well-formed if its replacement text 2133matches the production labeled <nt def="NT-content">content</nt>. All internal 2134parameter entities are well-formed by definition.</p> 2135<p>A consequence of well-formedness in entities is that the logical and physical 2136structures in an XML document are properly nested; no <termref def="dt-stag">start-tag</termref>, <termref 2137def="dt-etag">end-tag</termref>, <termref def="dt-empty">empty-element tag</termref>, <termref 2138def="dt-element">element</termref>, <termref def="dt-comment">comment</termref>, <termref 2139def="dt-pi">processing instruction</termref>, <termref def="dt-charref">character 2140reference</termref>, or <termref def="dt-entref">entity reference</termref> 2141can begin in one entity and end in another.</p> 2142</div3> 2143<div3 id="charencoding"> 2144<head>Character Encoding in Entities</head> 2145<p>Each external parsed entity in an XML document may use a different encoding 2146for its characters. 
All XML processors must be able to read entities in <phrase 2147diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E56">[E56]</loc>both 2148the UTF-8 and UTF-16 encodings.</phrase> <phrase diff="add"><loc role="erratumref" 2149href="http://www.w3.org/XML/xml-19980210-errata#E77">[E77]</loc>The terms <quote>UTF-8</quote> 2150and <quote>UTF-16</quote> in this specification do not apply to character 2151encodings with any other labels, even if the encodings or labels are very 2152similar to UTF-8 or UTF-16.</phrase></p> 2153<p>Entities encoded in UTF-16 must begin with the Byte Order Mark described 2154by <phrase diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E67">[E67]</loc>Annex 2155F of <bibref ref="ISO10646"/>, Annex H of <bibref ref="ISO10646-2000"/>, section 21562.4 of <bibref ref="Unicode"/>, and section 2.7 of <bibref ref="Unicode3"/></phrase> 2157(the ZERO WIDTH NO-BREAK SPACE character, #xFEFF). This is an encoding signature, 2158not part of either the markup or the character data of the XML document. XML 2159processors must be able to use this character to differentiate between UTF-8 2160and UTF-16 encoded documents.</p> 2161<p>Although an XML processor is required to read only entities in the UTF-8 2162and UTF-16 encodings, it is recognized that other encodings are used around 2163the world, and it may be desired for XML processors to read entities that 2164use them. 
<phrase diff="add"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E47">[E47]</loc>In 2165the absence of external character encoding information (such as MIME headers),</phrase> 2166parsed entities which are stored in an encoding other than UTF-8 or UTF-16 2167must begin with a text declaration <phrase diff="add">(see <specref ref="sec-TextDecl"/>) </phrase>containing 2168an encoding declaration:</p> 2169<scrap lang="ebnf"> 2170<head>Encoding Declaration</head> 2171<prod id="NT-EncodingDecl"> 2172<lhs>EncodingDecl</lhs><rhs><nt def="NT-S">S</nt> 'encoding' <nt def="NT-Eq">Eq</nt> 2173('"' <nt def="NT-EncName">EncName</nt> '"' | "'" <nt def="NT-EncName">EncName</nt> 2174"'" ) </rhs> 2175</prod> 2176<prod id="NT-EncName"> 2177<lhs>EncName</lhs><rhs>[A-Za-z] ([A-Za-z0-9._] | '-')*</rhs><com>Encoding 2178name contains only Latin characters</com> 2179</prod> 2180</scrap> 2181<p>In the <termref def="dt-docent">document entity</termref>, the encoding 2182declaration is part of the <termref def="dt-xmldecl">XML declaration</termref>. 2183The <nt def="NT-EncName">EncName</nt> is the name of the encoding used.</p> 2184<!-- FINAL EDIT: check name of IANA and charset names --> 2185<p>In an encoding declaration, the values <quote><code>UTF-8</code></quote>, <quote><code>UTF-16</code></quote>, <quote><code>ISO-10646-UCS-2</code 2186></quote>, and <quote><code>ISO-10646-UCS-4</code></quote> should be used 2187for the various encodings and transformations of Unicode / ISO/IEC 10646, 2188the values <quote><code>ISO-8859-1</code></quote>, <quote><code>ISO-8859-2</code></quote>, 2189... 
<loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E106">[E106]</loc><phrase 2190diff="chg"><quote><code>ISO-8859-</code><var>n</var></quote> (where <var>n</var> 2191is the part number)</phrase> should be used for the parts of ISO 8859, and 2192the values <quote><code>ISO-2022-JP</code></quote>, <quote><code>Shift_JIS</code></quote>, 2193and <quote><code>EUC-JP</code></quote> should be used for the various encoded 2194forms of JIS X-0208-1997. <phrase diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E57">[E57]</loc>It 2195is recommended that character encodings registered (as <emph>charset</emph>s) 2196with the Internet Assigned Numbers Authority <phrase diff="chg"><loc role="erratumref" 2197href="http://www.w3.org/XML/xml-19980210-errata#E58">[E58]</loc><bibref ref="IANA"/></phrase>, 2198other than those just listed, be referred to using their registered names; 2199other encodings should use names starting with an <quote>x-</quote> prefix. 2200XML processors should match character encoding names in a case-insensitive 2201way and should either interpret an IANA-registered name as the encoding registered 2202at IANA for that name or treat it as unknown (processors are, of course, not 2203required to support all IANA-registered encodings).</phrase></p> 2204<p>In the absence of information provided by an external transport protocol 2205(e.g. HTTP or MIME), it is an <termref def="dt-error">error</termref> for 2206an entity including an encoding declaration to be presented to the XML processor 2207in an encoding other than that named in the declaration, <phrase diff="del"><loc 2208role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E5">[E5]</loc>for 2209an encoding declaration to occur other than at the beginning of an external 2210entity, </phrase>or for an entity which begins with neither a Byte Order Mark 2211nor an encoding declaration to use an encoding other than UTF-8. 
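</p>
<p>As an illustration of the last case, consider a hypothetical external parsed
entity that is stored in ISO-8859-1 and delivered without any external encoding
information. Since it begins with no Byte Order Mark, it needs a text declaration
such as the one sketched below; the element name <code>publisher</code> and its
content are assumptions made for this example only.</p>
<eg><![CDATA[<?xml encoding="ISO-8859-1"?>
<publisher>Éditions Gallimard</publisher>]]></eg>
<p>Without the first line, the entity would be taken to be UTF-8, yet the single
byte #xC9 that encodes <quote>É</quote> in ISO-8859-1 is followed by the byte
for <quote>d</quote> (#x64), and that pair is not a legal UTF-8 sequence.</p>
<p>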
Note that 2212since ASCII is a subset of UTF-8, ordinary ASCII entities do not strictly 2213need an encoding declaration.</p> 2214<p diff="add"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E5">[E5]</loc>It 2215is <phrase diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E36">[E36]</loc>a 2216fatal</phrase> error for a <nt def="NT-TextDecl">TextDecl</nt> to occur other 2217than at the beginning of an external entity.</p> 2218<p>It is a <termref def="dt-fatal">fatal error</termref> when an XML processor 2219encounters an entity with an encoding that it is unable to process. <phrase 2220diff="add"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E79">[E79]</loc>It 2221is a fatal error if an XML entity is determined (via default, encoding declaration, 2222or higher-level protocol) to be in a certain encoding but contains octet sequences 2223that are not legal in that encoding. It is also a fatal error if an XML entity 2224contains no encoding declaration and its content is not legal UTF-8 or UTF-16.</phrase></p> 2225<p>Examples of <phrase diff="add"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E23">[E23]</loc>text 2226declarations containing </phrase>encoding declarations:</p> 2227<eg><?xml encoding='UTF-8'?> 2228<?xml encoding='EUC-JP'?></eg> 2229</div3> 2230</div2> 2231<div2 id="entproc"> 2232<head>XML Processor Treatment of Entities and References</head> 2233<p>The table below summarizes the contexts in which character references, 2234entity references, and invocations of unparsed entities might appear and the 2235required behavior of an <termref def="dt-xml-proc">XML processor</termref> 2236in each case. 
The labels in the leftmost column describe the recognition context: <glist> 2237<gitem><label>Reference in Content</label> 2238<def> 2239<p>as a reference anywhere after the <termref def="dt-stag">start-tag</termref> 2240and before the <termref def="dt-etag">end-tag</termref> of an element; corresponds 2241to the nonterminal <nt def="NT-content">content</nt>.</p> 2242</def></gitem> 2243<gitem><label>Reference in Attribute Value</label> 2244<def> 2245<p>as a reference within either the value of an attribute in a <termref def="dt-stag">start-tag</termref>, 2246or a default value in an <termref def="dt-attdecl">attribute declaration</termref>; 2247corresponds to the nonterminal <nt def="NT-AttValue">AttValue</nt>.</p> 2248</def></gitem> 2249<gitem><label>Occurs as Attribute Value</label> 2250<def> 2251<p>as a <nt def="NT-Name">Name</nt>, not a reference, appearing either as 2252the value of an attribute which has been declared as type <kw>ENTITY</kw>, 2253or as one of the space-separated tokens in the value of an attribute which 2254has been declared as type <kw>ENTITIES</kw>.</p> 2255</def></gitem> 2256<gitem><label>Reference in Entity Value</label> 2257<def> 2258<p>as a reference within a parameter or internal entity's <termref def="dt-litentval">literal 2259entity value</termref> in the entity's declaration; corresponds to the nonterminal <nt 2260def="NT-EntityValue">EntityValue</nt>.</p> 2261</def></gitem> 2262<gitem><label>Reference in DTD</label> 2263<def> 2264<p diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E90">[E90]</loc>as 2265a reference within either the internal or external subsets of the <termref 2266def="dt-doctype">DTD</termref>, but outside of an <nt def="NT-EntityValue">EntityValue</nt>, <nt 2267def="NT-AttValue">AttValue</nt>, <nt def="NT-PI">PI</nt>, <nt def="NT-Comment">Comment</nt>, <nt 2268def="NT-SystemLiteral">SystemLiteral</nt>, <nt def="NT-PubidLiteral">PubidLiteral</nt>, 2269or the contents of an ignored 
conditional section (see <specref ref="sec-condition-sect"/>).</p>
</def></gitem>
</glist></p>
<table border="1" frame="border" cellpadding="7"><tbody align="center"><tr>
<td rowspan="2" colspan="1"></td><td colspan="4" align="center" valign="bottom">Entity
Type</td><td rowspan="2" align="center">Character</td></tr><tr align="center"
valign="bottom"><td>Parameter</td><td>Internal General</td><td>External Parsed
General</td><td>Unparsed</td></tr><tr align="center" valign="middle"><td align="right">Reference
in Content</td><td><titleref href="#not-recognized">Not recognized</titleref></td>
<td><titleref href="#included">Included</titleref></td><td><titleref href="#include-if-valid">Included
if validating</titleref></td><td><titleref href="#forbidden">Forbidden</titleref></td>
<td><titleref href="#included">Included</titleref></td></tr><tr align="center"
valign="middle"><td align="right">Reference in Attribute Value</td><td><titleref
href="#not-recognized">Not recognized</titleref></td><td><titleref href="#inliteral">Included
in literal</titleref></td><td><titleref href="#forbidden">Forbidden</titleref></td>
<td><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E51">[E51]</loc><titleref
diff="chg" href="#forbidden">Forbidden</titleref></td><td><titleref href="#included">Included</titleref></td>
</tr><tr align="center" valign="middle"><td align="right">Occurs as Attribute
Value</td><td><titleref href="#not-recognized">Not recognized</titleref></td>
<td><titleref href="#forbidden">Forbidden</titleref></td><td><loc role="erratumref"
href="http://www.w3.org/XML/xml-19980210-errata#E51">[E51]</loc><titleref
diff="chg" href="#forbidden">Forbidden</titleref></td><td><titleref href="#notify">Notify</titleref></td>
<td><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E51">[E51]</loc><titleref
diff="chg" href="#not-recognized">Not
recognized</titleref></td></tr><tr align="center" 2294valign="middle"><td align="right">Reference in EntityValue</td><td><titleref 2295href="#inliteral">Included in literal</titleref></td><td><titleref href="#bypass">Bypassed</titleref></td> 2296<td><titleref href="#bypass">Bypassed</titleref></td><td><titleref href="#forbidden">Forbidden</titleref></td> 2297<td><titleref href="#included">Included</titleref></td></tr><tr align="center" 2298valign="middle"><td align="right">Reference in DTD</td><td><titleref href="#as-PE">Included 2299as PE</titleref></td><td><titleref href="#forbidden">Forbidden</titleref></td> 2300<td><titleref href="#forbidden">Forbidden</titleref></td><td><titleref href="#forbidden">Forbidden</titleref></td> 2301<td><titleref href="#forbidden">Forbidden</titleref></td></tr></tbody></table> 2302<div3 id="not-recognized"> 2303<head>Not Recognized</head> 2304<p>Outside the DTD, the <code>%</code> character has no special significance; 2305thus, what would be parameter entity references in the DTD are not recognized 2306as markup in <nt def="NT-content">content</nt>. 
Similarly, the names of unparsed 2307entities are not recognized except when they appear in the value of an appropriately 2308declared attribute.</p> 2309</div3> 2310<div3 id="included"> 2311<head>Included</head> 2312<p><termdef id="dt-include" term="Include">An entity is <term>included</term> 2313when its <termref def="dt-repltext">replacement text</termref> is retrieved 2314and processed, in place of the reference itself, as though it were part of 2315the document at the location the reference was recognized.</termdef> The replacement 2316text may contain both <termref def="dt-chardata">character data</termref> 2317and (except for parameter entities) <termref def="dt-markup">markup</termref>, 2318which must be recognized in the usual way<phrase diff="del"><loc role="erratumref" 2319href="http://www.w3.org/XML/xml-19980210-errata#E65">[E65]</loc>, except that 2320the replacement text of entities used to escape markup delimiters (the entities &magicents;) 2321is always treated as data</phrase>. (The string <quote><code>AT&amp;T;</code></quote> 2322expands to <quote><code>AT&T;</code></quote> and the remaining ampersand 2323is not recognized as an entity-reference delimiter.) A character reference 2324is <term>included</term> when the indicated character is processed in place 2325of the reference itself. </p> 2326</div3> 2327<div3 id="include-if-valid"> 2328<head>Included If Validating</head> 2329<p>When an XML processor recognizes a reference to a parsed entity, in order 2330to <termref def="dt-valid">validate</termref> the document, the processor 2331must <termref def="dt-include">include</termref> its replacement text. If 2332the entity is external, and the processor is not attempting to validate the 2333XML document, the processor <termref def="dt-may">may</termref>, but need 2334not, include the entity's replacement text. 
If a non-validating <phrase diff="chg"><loc 2335role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E95">[E95]</loc>processor</phrase> 2336does not include the replacement text, it must inform the application that 2337it recognized, but did not read, the entity.</p> 2338<p>This rule is based on the recognition that the automatic inclusion provided 2339by the SGML and XML entity mechanism, primarily designed to support modularity 2340in authoring, is not necessarily appropriate for other applications, in particular 2341document browsing. Browsers, for example, when encountering an external parsed 2342entity reference, might choose to provide a visual indication of the entity's 2343presence and retrieve it for display only on demand.</p> 2344</div3> 2345<div3 id="forbidden"> 2346<head>Forbidden</head> 2347<p>The following are forbidden, and constitute <termref def="dt-fatal">fatal</termref> 2348errors:</p> 2349<ulist> 2350<item><p>the appearance of a reference to an <termref def="dt-unparsed">unparsed 2351entity</termref>.</p></item> 2352<item><p>the appearance of any character or general-entity reference in the 2353DTD except within an <nt def="NT-EntityValue">EntityValue</nt> or <nt def="NT-AttValue">AttValue</nt>.</p> 2354</item> 2355<item><p>a reference to an external entity in an attribute value.</p></item> 2356</ulist> 2357</div3> 2358<div3 id="inliteral"> 2359<head>Included in Literal</head> 2360<p>When an <termref def="dt-entref">entity reference</termref> appears in 2361an attribute value, or a parameter entity reference appears in a literal entity 2362value, its <termref def="dt-repltext">replacement text</termref> is processed 2363in place of the reference itself as though it were part of the document at 2364the location the reference was recognized, except that a single or double 2365quote character in the replacement text is always treated as a normal data 2366character and will not terminate the literal. 
For example, this is well-formed:</p> 2367<eg diff="chg"><!-- <loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E4">[E4]</loc> --> 2368<![CDATA[<!ENTITY % YN '"Yes"' > 2369<!ENTITY WhatHeSaid "He said %YN;" >]]></eg> 2370<p>while this is not:</p> 2371<eg><!ENTITY EndAttr "27'" > 2372<element attribute='a-&EndAttr;></eg> 2373</div3> 2374<div3 id="notify"> 2375<head>Notify</head> 2376<p>When the name of an <termref def="dt-unparsed">unparsed entity</termref> 2377appears as a token in the value of an attribute of declared type <kw>ENTITY</kw> 2378or <kw>ENTITIES</kw>, a validating processor must inform the application of 2379the <termref def="dt-sysid">system</termref> and <termref def="dt-pubid">public</termref> 2380(if any) identifiers for both the entity and its associated <termref def="dt-notation">notation</termref>.</p> 2381</div3> 2382<div3 id="bypass"> 2383<head>Bypassed</head> 2384<p>When a general entity reference appears in the <nt def="NT-EntityValue">EntityValue</nt> 2385in an entity declaration, it is bypassed and left as is.</p> 2386</div3> 2387<div3 id="as-PE"> 2388<head>Included as PE</head> 2389<p>Just as with external parsed entities, parameter entities need only be <titleref 2390href="#include-if-valid">included if validating</titleref>. When a parameter-entity 2391reference is recognized in the DTD and included, its <termref def="dt-repltext">replacement 2392text</termref> is enlarged by the attachment of one leading and one following 2393space (#x20) character; the intent is to constrain the replacement text of 2394parameter entities to contain an integral number of grammatical tokens in 2395the DTD. 
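</p>
<p>The effect of the added spaces can be seen in the following sketch. It assumes
that the declarations appear in the external subset, where parameter-entity
references may occur inside markup declarations; the entity and element names
are hypothetical.</p>
<eg><![CDATA[<!ENTITY % name "title">
<!-- Processed as: <!ELEMENT  title  (#PCDATA)> -->
<!ELEMENT %name; (#PCDATA)>

<!ENTITY % prefix "doc">
<!-- Processed as: <!ELEMENT  doc title (#PCDATA)>, which is not an element declaration -->
<!ELEMENT %prefix;title (#PCDATA)>]]></eg>
<p>The second element declaration cannot yield the single name <code>doctitle</code>,
because the space attached after the replacement text separates <code>doc</code>
from <code>title</code>.</p>
<p>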
<phrase diff="add"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E96">[E96]</loc>This
behavior does not apply to parameter entity references within entity values;
these are described in <specref ref="inliteral"/>.</phrase></p>
</div3>
</div2>
<div2 id="intern-replacement">
<head>Construction of Internal Entity Replacement Text</head>
<p>In discussing the treatment of internal entities, it is useful to distinguish
two forms of the entity's value. <termdef id="dt-litentval" term="Literal Entity Value">The <term>literal
entity value</term> is the quoted string actually present in the entity declaration,
corresponding to the non-terminal <nt def="NT-EntityValue">EntityValue</nt>.</termdef> <termdef
id="dt-repltext" term="Replacement Text">The <term>replacement text</term>
is the content of the entity, after replacement of character references and
parameter-entity references.</termdef></p>
<p>The literal entity value as given in an internal entity declaration (<nt
def="NT-EntityValue">EntityValue</nt>) may contain character, parameter-entity,
and general-entity references. Such references must be contained entirely
within the literal entity value. The actual replacement text that is <termref
def="dt-include">included</termref> as described above must contain the <emph>replacement
text</emph> of any parameter entities referred to, and must contain the character
referred to, in place of any character references in the literal entity value;
however, general-entity references must be left as-is, unexpanded. For example,
given the following declarations:</p>
<eg><![CDATA[<!ENTITY % pub    "&#xc9;ditions Gallimard" >
<!ENTITY   rights "All rights reserved" >
<!ENTITY   book   "La Peste: Albert Camus,
&#xA9; 1947 %pub;.
&rights;" >]]></eg>
<p>then the replacement text for the entity <quote><code>book</code></quote>
is:</p>
<eg>La Peste: Albert Camus,
© 1947 Éditions Gallimard. &rights;</eg>
<p>The general-entity reference <quote><code>&rights;</code></quote> would
be expanded should the reference <quote><code>&book;</code></quote> appear
in the document's content or an attribute value.</p>
<p>These simple rules may have complex interactions; for a detailed discussion
of a difficult example, see <specref ref="sec-entexpand"/>.</p>
</div2>
<div2 id="sec-predefined-ent">
<head>Predefined Entities</head>
<p><termdef id="dt-escape" term="escape">Entity and character references can
both be used to <term>escape</term> the left angle bracket, ampersand, and
other delimiters. A set of general entities (&magicents;) is specified for
this purpose. Numeric character references may also be used; they are expanded
immediately when recognized and must be treated as character data, so the
numeric character references <quote><code>&#60;</code></quote> and <quote><code>&#38;</code></quote>
may be used to escape <code><</code> and <code>&</code> when they occur
in character data.</termdef></p>
<p>All XML processors must recognize these entities whether they are declared
or not. <termref def="dt-interop">For interoperability</termref>, valid XML
documents should declare these entities, like any others, before using them.
<phrase
diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E80">[E80]</loc>If
the entities <code>lt</code> or <code>amp</code> are declared, they must be
declared as internal entities whose replacement text is a character reference
to the <phrase diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E103">[E103]</loc>respective
character (less-than sign or ampersand)</phrase> being escaped; the double
escaping is required for these entities so that references to them produce
a well-formed result. If the entities <code>gt</code>, <code>apos</code>,
or <code>quot</code> are declared, they must be declared as internal entities
whose replacement text is the single character being escaped (or a character
reference to that character; the double escaping here is unnecessary but harmless).
For example:</phrase></p>
<eg><![CDATA[<!ENTITY lt     "&#38;#60;">
<!ENTITY gt     "&#62;">
<!ENTITY amp    "&#38;#38;">
<!ENTITY apos   "&#39;">
<!ENTITY quot   "&#34;">]]></eg>
<p diff="del">Note that the <code><</code> and <code>&</code> characters
in the declarations of <quote><code>lt</code></quote> and <quote><code>amp</code></quote>
are doubly escaped to meet the requirement that entity replacement be well-formed.</p>
</div2>
<div2 id="Notations">
<head>Notation Declarations</head>
<p><termdef id="dt-notation" term="Notation"><term>Notations</term> identify
by name the format of <termref def="dt-extent">unparsed entities</termref>,
the format of elements which bear a notation attribute, or the application
to which a <termref def="dt-pi">processing instruction</termref> is addressed.</termdef></p>
<p><termdef id="dt-notdecl" term="Notation Declaration"> <term>Notation declarations</term>
provide a name for the notation, for use in entity and attribute-list declarations
and in attribute specifications, and an external identifier for the
notation 2474which may allow an XML processor or its client application to locate a helper 2475application capable of processing data in the given notation.</termdef></p> 2476<scrap lang="ebnf"> 2477<head>Notation Declarations</head> 2478<prod id="NT-NotationDecl"> 2479<lhs>NotationDecl</lhs><rhs>'<!NOTATION' <nt def="NT-S">S</nt> <nt def="NT-Name">Name</nt> <nt 2480def="NT-S">S</nt> (<nt def="NT-ExternalID">ExternalID</nt> | <nt def="NT-PublicID">PublicID</nt>) <nt 2481def="NT-S">S</nt>? '>'</rhs><vc def="UniqueNotationName" diff="add"/> 2482</prod> 2483<prod id="NT-PublicID"> 2484<lhs>PublicID</lhs><rhs>'PUBLIC' <nt def="NT-S">S</nt> <nt def="NT-PubidLiteral">PubidLiteral</nt> </rhs> 2485</prod> 2486</scrap> 2487<vcnote id="UniqueNotationName" diff="add"><head><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E22">[E22]</loc>Unique 2488Notation Name</head><p>Only one notation declaration can declare a given <nt 2489def="NT-Name">Name</nt>.</p> 2490</vcnote> 2491<p>XML processors must provide applications with the name and external identifier(s) 2492of any notation declared and referred to in an attribute value, attribute 2493definition, or entity declaration. They may additionally resolve the external 2494identifier into the <termref def="dt-sysid">system identifier</termref>, file 2495name, or other information needed to allow the application to call a processor 2496for data in the notation described. 
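</p>
<p>A minimal sketch of how these pieces fit together is given below. The notation
name <code>gif</code> and the entity <code>hatch-pic</code> follow the earlier
external entity example; the public identifier, the helper-application path <code>/usr/local/bin/gifviewer</code>,
and the <code>figure</code> element are assumptions made for illustration only.</p>
<eg><![CDATA[<!DOCTYPE figure [
<!NOTATION gif PUBLIC "-//Example//NOTATION Graphics Interchange Format//EN"
                      "/usr/local/bin/gifviewer">
<!ENTITY hatch-pic SYSTEM "/grafix/OpenHatch.gif" NDATA gif>
<!ELEMENT figure EMPTY>
<!ATTLIST figure src ENTITY #REQUIRED>
]>
<figure src="hatch-pic"/>]]></eg>
<p>Here a validating processor would report the identifiers of <code>hatch-pic</code>
and of the <code>gif</code> notation to the application, which decides what,
if anything, to do with them.</p>
<p>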
(It is not an error, however, for XML 2497documents to declare and refer to notations for which notation-specific applications 2498are not available on the system where the XML processor or application is 2499running.)</p> 2500</div2> 2501<div2 id="sec-doc-entity"> 2502<head>Document Entity</head> 2503<p><termdef id="dt-docent" term="Document Entity">The <term>document entity</term> 2504serves as the root of the entity tree and a starting-point for an <termref 2505def="dt-xml-proc">XML processor</termref>.</termdef> This specification does 2506not specify how the document entity is to be located by an XML processor; 2507unlike other entities, the document entity has no name and might well appear 2508on a processor input stream without any identification at all.</p> 2509</div2> 2510</div1> 2511<!-- &Conformance; --> 2512<div1 id="sec-conformance"> 2513<head>Conformance</head> 2514<div2 id="proc-types"> 2515<head>Validating and Non-Validating Processors</head> 2516<p>Conforming <termref def="dt-xml-proc">XML processors</termref> fall into 2517two classes: validating and non-validating.</p> 2518<p>Validating and non-validating processors alike must report violations of 2519this specification's well-formedness constraints in the content of the <termref 2520def="dt-docent">document entity</termref> and any other <termref def="dt-parsedent">parsed 2521entities</termref> that they read.</p> 2522<p><termdef id="dt-validating" term="Validating Processor"><term>Validating 2523processors</term> must<phrase diff="add"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E21">[E21]</loc>, 2524at user option,</phrase> report violations of the constraints expressed by 2525the declarations in the <termref def="dt-doctype">DTD</termref>, and failures 2526to fulfill the validity constraints given in this specification.</termdef> 2527To accomplish this, validating XML processors must read and process the entire 2528DTD and all external parsed entities referenced in 
the document.</p> 2529<p>Non-validating processors are required to check only the <termref def="dt-docent">document 2530entity</termref>, including the entire internal DTD subset, for well-formedness. <termdef 2531id="dt-use-mdecl" term="Process Declarations"> While they are not required 2532to check the document for validity, they are required to <term>process</term> 2533all the declarations they read in the internal DTD subset and in any parameter 2534entity that they read, up to the first reference to a parameter entity that 2535they do <emph>not</emph> read; that is to say, they must use the information 2536in those declarations to <titleref href="#AVNormalize">normalize</titleref> 2537attribute values, <titleref href="#included">include</titleref> the replacement 2538text of internal entities, and supply <titleref href="#sec-attr-defaults">default 2539attribute values</titleref>.</termdef> <phrase diff="add"><loc role="erratumref" 2540href="http://www.w3.org/XML/xml-19980210-errata#E33">[E33]</loc>Except when <code>standalone="yes"</code>, </phrase>they 2541must not <termref def="dt-use-mdecl">process</termref> <termref def="dt-entdecl">entity 2542declarations</termref> or <termref def="dt-attdecl">attribute-list declarations</termref> 2543encountered after a reference to a parameter entity that is not read, since 2544the entity may have contained overriding declarations.</p> 2545</div2> 2546<div2 id="safe-behavior"> 2547<head>Using XML Processors</head> 2548<p>The behavior of a validating XML processor is highly predictable; it must 2549read every piece of a document and report all well-formedness and validity 2550violations. Less is required of a non-validating processor; it need not read 2551any part of the document other than the document entity. 
This has two effects 2552that may be important to users of XML processors:</p> 2553<ulist> 2554<item><p>Certain well-formedness errors, specifically those that require reading 2555external entities, may not be detected by a non-validating processor. Examples 2556include the constraints entitled <titleref href="#wf-entdeclared">Entity Declared</titleref>, <titleref 2557href="#textent">Parsed Entity</titleref>, and <titleref href="#norecursion">No 2558Recursion</titleref>, as well as some of the cases described as <titleref 2559href="#forbidden">forbidden</titleref> in <specref ref="entproc"/>.</p></item> 2560<item><p>The information passed from the processor to the application may 2561vary, depending on whether the processor reads parameter and external entities. 2562For example, a non-validating processor may not <titleref href="#AVNormalize">normalize</titleref> 2563attribute values, <titleref href="#included">include</titleref> the replacement 2564text of internal entities, or supply <titleref href="#sec-attr-defaults">default 2565attribute values</titleref>, where doing so depends on having read declarations 2566in external or parameter entities.</p></item> 2567</ulist> 2568<p>For maximum reliability in interoperating between different XML processors, 2569applications which use non-validating processors should not rely on any behaviors 2570not required of such processors. Applications which require facilities such 2571as the use of default attributes or internal entities which are declared in 2572external entities should use validating XML processors.</p> 2573</div2> 2574</div1> 2575<div1 id="sec-notation"> 2576<head>Notation</head> 2577<p>The formal grammar of XML is given in this specification using a simple 2578Extended Backus-Naur Form (EBNF) notation. 
Each rule in the grammar defines 2579one symbol, in the form</p> 2580<eg>symbol ::= expression</eg> 2581<p>Symbols are written with an initial capital letter if they are <phrase 2582diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E42">[E42]</loc>the 2583start symbol of a regular language,</phrase> otherwise with an initial lower 2584case letter. Literal strings are quoted.</p> 2585<p>Within the expression on the right-hand side of a rule, the following expressions 2586are used to match strings of one or more characters: <glist> 2587<gitem><label><code>#xN</code></label> 2588<def> 2589<p>where <code>N</code> is a hexadecimal integer, the expression matches the 2590character in ISO/IEC 10646 whose canonical (UCS-4) code value, when interpreted 2591as an unsigned binary number, has the value indicated. The number of leading 2592zeros in the <code>#xN</code> form is insignificant; the number of leading 2593zeros in the corresponding code value is governed by the character encoding 2594in use and is not significant for XML.</p> 2595</def></gitem> 2596<gitem><label><code>[a-zA-Z]</code>, <code>[#xN-#xN]</code></label> 2597<def> 2598<p>matches any <phrase diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E93">[E93]</loc><nt 2599def="NT-Char">Char</nt></phrase> with a value in the range(s) indicated (inclusive).</p> 2600</def></gitem> 2601<gitem diff="add"><label><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E3">[E3]</loc><code>[abc]</code>, <code>[#xN#xN#xN]</code 2602></label> 2603<def> 2604<p>matches any <nt def="NT-Char">Char</nt> with a value among the characters 2605enumerated. 
Enumerations and ranges can be mixed in one set of brackets.</p>
</def></gitem>
<gitem><label><code>[^a-z]</code>, <code>[^#xN-#xN]</code></label>
<def>
<p>matches any <phrase diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E93">[E93]</loc><nt
def="NT-Char">Char</nt></phrase> with a value <emph>outside</emph> the range
indicated.</p>
</def></gitem>
<gitem><label><code>[^abc]</code>, <code>[^#xN#xN#xN]</code></label>
<def>
<p>matches any <phrase diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E93">[E93]</loc><nt
def="NT-Char">Char</nt></phrase> with a value not among the characters given. <phrase
diff="add"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E3">[E3]</loc>Enumerations
and ranges of forbidden values can be mixed in one set of brackets.</phrase></p>
</def></gitem>
<gitem><label><code>"string"</code></label>
<def>
<p>matches a literal string <termref def="dt-match">matching</termref> that
given inside the double quotes.</p>
</def></gitem>
<gitem><label><code>'string'</code></label>
<def>
<p>matches a literal string <termref def="dt-match">matching</termref> that
given inside the single quotes.</p>
</def></gitem>
</glist> These symbols may be combined to match more complex patterns as follows,
where <code>A</code> and <code>B</code> represent simple expressions: <glist>
<gitem><label>(<code>expression</code>)</label>
<def>
<p><code>expression</code> is treated as a unit and may be combined as described
in this list.</p>
</def></gitem>
<gitem><label><code>A?</code></label>
<def>
<p>matches <code>A</code> or nothing; optional <code>A</code>.</p>
</def></gitem>
<gitem><label><code>A B</code></label>
<def>
<p>matches <code>A</code> followed by <code>B</code>.
<phrase diff="add"><loc
role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E20">[E20]</loc>This
operator has higher precedence than alternation; thus <code>A B | C D</code>
is identical to <code>(A B) | (C D)</code>.</phrase></p>
</def></gitem>
<gitem><label><code>A | B</code></label>
<def>
<p>matches <code>A</code> or <code>B</code> but not both.</p>
</def></gitem>
<gitem><label><code>A - B</code></label>
<def>
<p>matches any string that matches <code>A</code> but does not match <code>B</code>.</p>
</def></gitem>
<gitem><label><code>A+</code></label>
<def>
<p>matches one or more occurrences of <code>A</code>. <phrase diff="add"><loc
role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E20">[E20]</loc>Concatenation
has higher precedence than alternation; thus <code>A+ | B+</code> is identical
to <code>(A+) | (B+)</code>.</phrase></p>
</def></gitem>
<gitem><label><code>A*</code></label>
<def>
<p>matches zero or more occurrences of <code>A</code>. <phrase diff="add"><loc
role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E20">[E20]</loc>Concatenation
has higher precedence than alternation; thus <code>A* | B*</code> is identical
to <code>(A*) | (B*)</code>.</phrase></p>
</def></gitem>
</glist> Other notations used in the productions are: <glist>
<gitem><label><code>/* ... */</code></label>
<def>
<p>comment.</p>
</def></gitem>
<gitem><label><code>[ wfc: ... ]</code></label>
<def>
<p>well-formedness constraint; this identifies by name a constraint on <termref
def="dt-wellformed">well-formed</termref> documents associated with a production.</p>
</def></gitem>
<gitem><label><code>[ vc: ...
]</code></label>
<def>
<p>validity constraint; this identifies by name a constraint on <termref def="dt-valid">valid</termref>
documents associated with a production.</p>
</def></gitem>
</glist></p>
</div1>
</body><back>
<!-- &SGML; -->
<!-- &Biblio; -->
<div1 id="sec-bibliography">
<head>References</head>
<div2 id="sec-existing-stds">
<head>Normative References</head>
<blist>
<bibl id="IANA" diff="chg" key="IANA-CHARSETS"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E58">[E58]</loc>(Internet
Assigned Numbers Authority) <titleref>Official Names for Character Sets</titleref>,
ed. Keld Simonsen et al. See <loc href="ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets">ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets</loc
>. </bibl>
<bibl id="RFC1766" href="http://www.ietf.org/rfc/rfc1766.txt" key="IETF RFC 1766">IETF
(Internet Engineering Task Force). <titleref>RFC 1766: Tags for the Identification
of Languages</titleref>, ed. H. Alvestrand. 1995.</bibl>
<bibl id="ISO639-old" diff="del" key="ISO 639"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E38">[E38]</loc>
(International Organization for Standardization). <titleref>ISO 639:1988 (E).
Code for the representation of names of languages.</titleref> [Geneva]: International
Organization for Standardization, 1988.</bibl>
<bibl id="ISO3166-old" diff="del" key="ISO 3166"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E38">[E38]</loc>
(International Organization for Standardization). <titleref>ISO 3166-1:1997
(E). Codes for the representation of names of countries and their subdivisions &mdash;
Part 1: Country codes</titleref> [Geneva]: International Organization for
Standardization, 1997.</bibl>
<bibl id="ISO10646" key="ISO/IEC 10646">ISO (International Organization for
Standardization).
<titleref>ISO/IEC 10646-1993 (E). Information technology &mdash;
Universal Multiple-Octet Coded Character Set (UCS) &mdash; Part 1: Architecture
and Basic Multilingual Plane.</titleref> [Geneva]: International Organization
for Standardization, 1993 (plus amendments AM 1 through AM 7).</bibl>
<bibl id="ISO10646-2000" diff="add" key="ISO/IEC 10646-2000"><loc role="erratumref"
href="http://www.w3.org/XML/xml-19980210-errata#E67">[E67]</loc> ISO (International
Organization for Standardization). <titleref>ISO/IEC 10646-1:2000. Information
technology &mdash; Universal Multiple-Octet Coded Character Set (UCS) &mdash;
Part 1: Architecture and Basic Multilingual Plane.</titleref> [Geneva]: International
Organization for Standardization, 2000.</bibl>
<bibl id="Unicode" key="Unicode">The Unicode Consortium. <emph>The Unicode
Standard, Version 2.0.</emph> Reading, Mass.: Addison-Wesley Developers Press,
1996.</bibl>
<bibl id="Unicode3" diff="add" key="Unicode3"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E67">[E67]</loc>
The Unicode Consortium. <emph>The Unicode Standard, Version 3.0.</emph> Reading,
Mass.: Addison-Wesley Developers Press, 2000. ISBN 0-201-61633-5.</bibl>
</blist></div2>
<div2 id="null">
<!--
ID made "null" to match its previous value in the First
Edition; it's odd, but if there's no set value, the stylesheet
currently generates an odd string that would be backwards
incompatible with any references anyone might have made before.
-->
<head>Other References</head>
<blist>
<bibl id="Aho" key="Aho/Ullman">Aho, Alfred V., Ravi Sethi, and Jeffrey D.
Ullman. <titleref>Compilers: Principles, Techniques, and Tools</titleref>.
Reading: Addison-Wesley, 1986, rpt. corr. 1988.</bibl>
<bibl id="Berners-Lee" key="Berners-Lee et al."> Berners-Lee, T., R. Fielding,
and L. Masinter.
<titleref>Uniform Resource Identifiers (URI): Generic Syntax
and Semantics</titleref>. 1997. (Work in progress; see updates to RFC1738.)</bibl>
<bibl id="ABK" diff="chg" key="Brüggemann-Klein"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E2">[E2]</loc>Brüggemann-Klein,
Anne. Formal Models in Document Processing. Habilitationsschrift. Faculty
of Mathematics at the University of Freiburg, 1993. (See <loc href="ftp://ftp.informatik.uni-freiburg.de/documents/papers/brueggem/habil.ps">ftp://ftp.informatik.uni-freiburg.de/documents/papers/brueggem/habil.ps</loc
>.)</bibl>
<bibl id="ABKDW" diff="chg" key="Brüggemann-Klein and Wood"><loc role="erratumref"
href="http://www.w3.org/XML/xml-19980210-errata#E2">[E2]</loc>Brüggemann-Klein,
Anne, and Derick Wood. <titleref>Deterministic Regular Languages</titleref>.
Universität Freiburg, Institut für Informatik, Bericht 38, Oktober 1991. Extended
abstract in A. Finkel, M. Jantzen, Hrsg., STACS 1992, S. 173-184. Springer-Verlag,
Berlin 1992. Lecture Notes in Computer Science 577. Full version titled <titleref>One-Unambiguous
Regular Languages</titleref> in Information and Computation 140 (2): 229-253,
February 1998.</bibl>
<bibl id="Clark" key="Clark">James Clark. Comparison of SGML and XML. See <loc
href="http://www.w3.org/TR/NOTE-sgml-xml-971215">http://www.w3.org/TR/NOTE-sgml-xml-971215</loc>. </bibl>
<bibl id="IANA-LANGCODES" diff="add" href="http://www.isi.edu/in-notes/iana/assignments/languages/"
key="IANA-LANGCODES"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E58">[E58]</loc>(Internet
Assigned Numbers Authority) <titleref>Registry of Language Tags</titleref>,
ed. Keld Simonsen et al.</bibl>
<bibl id="RFC1738" diff="del" href="http://www.ietf.org/rfc/rfc1738.txt" key="IETF RFC1738">IETF
(Internet Engineering Task Force).
<titleref>RFC 1738: Uniform Resource Locators
(URL)</titleref>, ed. T. Berners-Lee, L. Masinter, M. McCahill. 1994. </bibl>
<bibl id="RFC1808" diff="del" href="http://www.ietf.org/rfc/rfc1808.txt" key="IETF RFC1808">IETF
(Internet Engineering Task Force). <titleref>RFC 1808: Relative Uniform Resource
Locators</titleref>, ed. R. Fielding. 1995. </bibl>
<bibl id="RFC2141" href="http://www.ietf.org/rfc/rfc2141.txt" key="IETF RFC2141">IETF
(Internet Engineering Task Force). <emph>RFC 2141: URN Syntax</emph>, ed.
R. Moats. 1997. </bibl>
<bibl id="rfc2279" diff="add" href="http://www.ietf.org/rfc/rfc2279.txt" key="IETF RFC 2279"><loc
role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E78">[E78]</loc>IETF
(Internet Engineering Task Force). <titleref>RFC 2279: UTF-8, a transformation
format of ISO 10646</titleref>, <phrase diff="add">ed. F. Yergeau, </phrase>1998.</bibl>
<bibl id="rfc2376" diff="add" href="http://www.ietf.org/rfc/rfc2376.txt" key="IETF RFC 2376"><loc
role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E48">[E48]</loc>IETF
(Internet Engineering Task Force). <titleref>RFC 2376: XML Media Types</titleref>.
ed. E. Whitehead, M. Murata. 1998.</bibl>
<bibl id="rfc2396" diff="add" href="http://www.ietf.org/rfc/rfc2396.txt" key="IETF RFC 2396"><loc
role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E66">[E66]</loc>IETF
(Internet Engineering Task Force). <titleref>RFC 2396: Uniform Resource Identifiers
(URI): Generic Syntax</titleref>. T. Berners-Lee, R. Fielding, L. Masinter.
1998.</bibl>
<bibl id="rfc2732" diff="add" href="http://www.ietf.org/rfc/rfc2732.txt" key="IETF RFC 2732"><loc
role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E66">[E66]</loc>IETF
(Internet Engineering Task Force). <titleref>RFC 2732: Format for Literal
IPv6 Addresses in URL's</titleref>. R. Hinden, B. Carpenter, L. Masinter.
1999.</bibl>
<bibl id="rfc2781" diff="add" href="http://www.ietf.org/rfc/rfc2781.txt" key="IETF RFC 2781"><loc
role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E77">[E77]</loc>
IETF (Internet Engineering Task Force). <emph>RFC 2781: UTF-16, an encoding
of ISO 10646</emph>, ed. P. Hoffman, F. Yergeau. 2000.</bibl>
<bibl id="ISO639" diff="add" key="ISO 639"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E38">[E38]</loc>
(International Organization for Standardization). <titleref>ISO 639:1988 (E).
Code for the representation of names of languages.</titleref> [Geneva]: International
Organization for Standardization, 1988.</bibl>
<bibl id="ISO3166" diff="add" key="ISO 3166"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E38">[E38]</loc>
(International Organization for Standardization). <titleref>ISO 3166-1:1997
(E). Codes for the representation of names of countries and their subdivisions &mdash;
Part 1: Country codes</titleref> [Geneva]: International Organization for
Standardization, 1997.</bibl>
<bibl id="ISO8879" key="ISO 8879">ISO (International Organization for Standardization). <titleref>ISO
8879:1986(E). Information processing &mdash; Text and Office Systems &mdash;
Standard Generalized Markup Language (SGML).</titleref> First edition &mdash;
1986-10-15. [Geneva]: International Organization for Standardization, 1986. </bibl>
<bibl id="ISO10744" key="ISO/IEC 10744">ISO (International Organization for
Standardization). <titleref>ISO/IEC 10744-1992 (E). Information technology &mdash;
Hypermedia/Time-based Structuring Language (HyTime). </titleref> [Geneva]:
International Organization for Standardization, 1992. <emph>Extended Facilities
Annexe.</emph> [Geneva]: International Organization for Standardization, 1996.
</bibl>
<bibl id="websgml" diff="add" href="http://www.sgmlsource.com/8879rev/n0029.htm"
key="WEBSGML"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E43">[E43]</loc>ISO
(International Organization for Standardization). <titleref>ISO 8879:1986
TC2. Information technology &mdash; Document Description and Processing Languages. </titleref>
[Geneva]: International Organization for Standardization, 1998.</bibl>
<bibl id="xml-names" diff="add" xmlns:xlink="http://www.w3.org/TR/WD-xlink"
href="http://www.w3.org/TR/REC-xml-names/" key="XML Names"><loc role="erratumref"
href="http://www.w3.org/XML/xml-19980210-errata#E98">[E98]</loc>Tim Bray,
Dave Hollander, and Andrew Layman, editors. <titleref>Namespaces in XML</titleref>.
Textuality, Hewlett-Packard, and Microsoft. World Wide Web Consortium, 1999.</bibl>
</blist></div2>
</div1>
<div1 id="CharClasses">
<head>Character Classes</head>
<p>Following the characteristics defined in the Unicode standard, characters
are classed as base characters (among others, these contain the alphabetic
characters of the Latin alphabet<phrase diff="del"><loc role="erratumref"
href="http://www.w3.org/XML/xml-19980210-errata#E84">[E84]</loc>, without
diacritics</phrase>), ideographic characters, and combining characters (among
others, this class contains most diacritics)<phrase diff="del"><loc role="erratumref"
href="http://www.w3.org/XML/xml-19980210-errata#E30">[E30]</loc>; these classes
combine to form the class of letters.</phrase> Digits and extenders are also
distinguished.</p>
<scrap id="CHARACTERS" lang="ebnf">
<head>Characters</head>
<prodgroup pcw3="3" pcw4="15">
<prod id="NT-Letter">
<lhs>Letter</lhs><rhs><nt def="NT-BaseChar">BaseChar</nt> | <nt def="NT-Ideographic">Ideographic</nt></rhs>
</prod>
<prod id="NT-BaseChar">
<lhs>BaseChar</lhs><rhs>[#x0041-#x005A] | [#x0061-#x007A] | [#x00C0-#x00D6]
| [#x00D8-#x00F6] | [#x00F8-#x00FF] | [#x0100-#x0131] | [#x0134-#x013E]
| [#x0141-#x0148] | [#x014A-#x017E] | [#x0180-#x01C3] | [#x01CD-#x01F0]
| [#x01F4-#x01F5] | [#x01FA-#x0217] | [#x0250-#x02A8] | [#x02BB-#x02C1]
| #x0386 | [#x0388-#x038A] | #x038C | [#x038E-#x03A1]
| [#x03A3-#x03CE] | [#x03D0-#x03D6] | #x03DA | #x03DC
| #x03DE | #x03E0 | [#x03E2-#x03F3] | [#x0401-#x040C]
| [#x040E-#x044F] | [#x0451-#x045C] | [#x045E-#x0481] | [#x0490-#x04C4]
| [#x04C7-#x04C8] | [#x04CB-#x04CC] | [#x04D0-#x04EB] | [#x04EE-#x04F5]
| [#x04F8-#x04F9] | [#x0531-#x0556] | #x0559 | [#x0561-#x0586]
| [#x05D0-#x05EA] | [#x05F0-#x05F2] | [#x0621-#x063A] | [#x0641-#x064A]
| [#x0671-#x06B7] | [#x06BA-#x06BE] | [#x06C0-#x06CE] | [#x06D0-#x06D3]
| #x06D5 | [#x06E5-#x06E6] | [#x0905-#x0939] | #x093D
| [#x0958-#x0961] | [#x0985-#x098C] | [#x098F-#x0990] | [#x0993-#x09A8]
| [#x09AA-#x09B0] | #x09B2 | [#x09B6-#x09B9] | [#x09DC-#x09DD]
| [#x09DF-#x09E1] | [#x09F0-#x09F1] | [#x0A05-#x0A0A] | [#x0A0F-#x0A10]
| [#x0A13-#x0A28] | [#x0A2A-#x0A30] | [#x0A32-#x0A33] | [#x0A35-#x0A36]
| [#x0A38-#x0A39] | [#x0A59-#x0A5C] | #x0A5E | [#x0A72-#x0A74]
| [#x0A85-#x0A8B] | #x0A8D | [#x0A8F-#x0A91] | [#x0A93-#x0AA8]
| [#x0AAA-#x0AB0] | [#x0AB2-#x0AB3] | [#x0AB5-#x0AB9] | #x0ABD
| #x0AE0 | [#x0B05-#x0B0C] | [#x0B0F-#x0B10] | [#x0B13-#x0B28]
| [#x0B2A-#x0B30] | [#x0B32-#x0B33] | [#x0B36-#x0B39] | #x0B3D
| [#x0B5C-#x0B5D] | [#x0B5F-#x0B61] | [#x0B85-#x0B8A] | [#x0B8E-#x0B90]
| [#x0B92-#x0B95] | [#x0B99-#x0B9A] | #x0B9C | [#x0B9E-#x0B9F]
| [#x0BA3-#x0BA4] | [#x0BA8-#x0BAA] | [#x0BAE-#x0BB5] | [#x0BB7-#x0BB9]
| [#x0C05-#x0C0C] | [#x0C0E-#x0C10] | [#x0C12-#x0C28] | [#x0C2A-#x0C33]
| [#x0C35-#x0C39] | [#x0C60-#x0C61] | [#x0C85-#x0C8C] | [#x0C8E-#x0C90]
| [#x0C92-#x0CA8] | [#x0CAA-#x0CB3] | [#x0CB5-#x0CB9] | #x0CDE
| [#x0CE0-#x0CE1] | [#x0D05-#x0D0C] | [#x0D0E-#x0D10] | [#x0D12-#x0D28]
| [#x0D2A-#x0D39] |
[#x0D60-#x0D61] | [#x0E01-#x0E2E] | #x0E30
| [#x0E32-#x0E33] | [#x0E40-#x0E45] | [#x0E81-#x0E82] | #x0E84
| [#x0E87-#x0E88] | #x0E8A | #x0E8D | [#x0E94-#x0E97]
| [#x0E99-#x0E9F] | [#x0EA1-#x0EA3] | #x0EA5 | #x0EA7
| [#x0EAA-#x0EAB] | [#x0EAD-#x0EAE] | #x0EB0 | [#x0EB2-#x0EB3]
| #x0EBD | [#x0EC0-#x0EC4] | [#x0F40-#x0F47] | [#x0F49-#x0F69]
| [#x10A0-#x10C5] | [#x10D0-#x10F6] | #x1100 | [#x1102-#x1103]
| [#x1105-#x1107] | #x1109 | [#x110B-#x110C] | [#x110E-#x1112]
| #x113C | #x113E | #x1140 | #x114C | #x114E | #x1150
| [#x1154-#x1155] | #x1159 | [#x115F-#x1161] | #x1163
| #x1165 | #x1167 | #x1169 | [#x116D-#x116E] | [#x1172-#x1173]
| #x1175 | #x119E | #x11A8 | #x11AB | [#x11AE-#x11AF]
| [#x11B7-#x11B8] | #x11BA | [#x11BC-#x11C2] | #x11EB
| #x11F0 | #x11F9 | [#x1E00-#x1E9B] | [#x1EA0-#x1EF9]
| [#x1F00-#x1F15] | [#x1F18-#x1F1D] | [#x1F20-#x1F45] | [#x1F48-#x1F4D]
| [#x1F50-#x1F57] | #x1F59 | #x1F5B | #x1F5D | [#x1F5F-#x1F7D]
| [#x1F80-#x1FB4] | [#x1FB6-#x1FBC] | #x1FBE | [#x1FC2-#x1FC4]
| [#x1FC6-#x1FCC] | [#x1FD0-#x1FD3] | [#x1FD6-#x1FDB] | [#x1FE0-#x1FEC]
| [#x1FF2-#x1FF4] | [#x1FF6-#x1FFC] | #x2126 | [#x212A-#x212B]
| #x212E | [#x2180-#x2182] | [#x3041-#x3094] | [#x30A1-#x30FA]
| [#x3105-#x312C] | [#xAC00-#xD7A3] </rhs>
</prod>
<prod id="NT-Ideographic">
<lhs>Ideographic</lhs><rhs>[#x4E00-#x9FA5] | #x3007 | [#x3021-#x3029] </rhs>
</prod>
<prod id="NT-CombiningChar">
<lhs>CombiningChar</lhs><rhs>[#x0300-#x0345] | [#x0360-#x0361] | [#x0483-#x0486]
| [#x0591-#x05A1] | [#x05A3-#x05B9] | [#x05BB-#x05BD] | #x05BF
| [#x05C1-#x05C2] | #x05C4 | [#x064B-#x0652] | #x0670
| [#x06D6-#x06DC] | [#x06DD-#x06DF] | [#x06E0-#x06E4] | [#x06E7-#x06E8]
| [#x06EA-#x06ED] | [#x0901-#x0903] | #x093C | [#x093E-#x094C]
| #x094D | [#x0951-#x0954] | [#x0962-#x0963] | [#x0981-#x0983]
| #x09BC | #x09BE | #x09BF | [#x09C0-#x09C4] | [#x09C7-#x09C8]
| [#x09CB-#x09CD] |
#x09D7 | [#x09E2-#x09E3] | #x0A02
| #x0A3C | #x0A3E | #x0A3F | [#x0A40-#x0A42] | [#x0A47-#x0A48]
| [#x0A4B-#x0A4D] | [#x0A70-#x0A71] | [#x0A81-#x0A83] | #x0ABC
| [#x0ABE-#x0AC5] | [#x0AC7-#x0AC9] | [#x0ACB-#x0ACD] | [#x0B01-#x0B03]
| #x0B3C | [#x0B3E-#x0B43] | [#x0B47-#x0B48] | [#x0B4B-#x0B4D]
| [#x0B56-#x0B57] | [#x0B82-#x0B83] | [#x0BBE-#x0BC2] | [#x0BC6-#x0BC8]
| [#x0BCA-#x0BCD] | #x0BD7 | [#x0C01-#x0C03] | [#x0C3E-#x0C44]
| [#x0C46-#x0C48] | [#x0C4A-#x0C4D] | [#x0C55-#x0C56] | [#x0C82-#x0C83]
| [#x0CBE-#x0CC4] | [#x0CC6-#x0CC8] | [#x0CCA-#x0CCD] | [#x0CD5-#x0CD6]
| [#x0D02-#x0D03] | [#x0D3E-#x0D43] | [#x0D46-#x0D48] | [#x0D4A-#x0D4D]
| #x0D57 | #x0E31 | [#x0E34-#x0E3A] | [#x0E47-#x0E4E]
| #x0EB1 | [#x0EB4-#x0EB9] | [#x0EBB-#x0EBC] | [#x0EC8-#x0ECD]
| [#x0F18-#x0F19] | #x0F35 | #x0F37 | #x0F39 | #x0F3E
| #x0F3F | [#x0F71-#x0F84] | [#x0F86-#x0F8B] | [#x0F90-#x0F95]
| #x0F97 | [#x0F99-#x0FAD] | [#x0FB1-#x0FB7] | #x0FB9
| [#x20D0-#x20DC] | #x20E1 | [#x302A-#x302F] | #x3099
| #x309A </rhs>
</prod>
<prod id="NT-Digit">
<lhs>Digit</lhs><rhs>[#x0030-#x0039] | [#x0660-#x0669] | [#x06F0-#x06F9]
| [#x0966-#x096F] | [#x09E6-#x09EF] | [#x0A66-#x0A6F] | [#x0AE6-#x0AEF]
| [#x0B66-#x0B6F] | [#x0BE7-#x0BEF] | [#x0C66-#x0C6F] | [#x0CE6-#x0CEF]
| [#x0D66-#x0D6F] | [#x0E50-#x0E59] | [#x0ED0-#x0ED9] | [#x0F20-#x0F29] </rhs>
</prod>
<prod id="NT-Extender">
<lhs>Extender</lhs><rhs>#x00B7 | #x02D0 | #x02D1 | #x0387 | #x0640
| #x0E46 | #x0EC6 | #x3005 | [#x3031-#x3035] | [#x309D-#x309E]
| [#x30FC-#x30FE] </rhs>
</prod>
</prodgroup></scrap>
<p>The character classes defined here can be derived from the Unicode <phrase
diff="add"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E67">[E67]</loc>2.0</phrase>
character database as follows:</p>
<ulist>
<item><p>Name start characters must have one of the categories Ll, Lu, Lo,
Lt,
Nl.</p></item>
<item><p>Name characters other than Name-start characters must have one of
the categories Mc, Me, Mn, Lm, or Nd.</p></item>
<item><p>Characters in the compatibility area (i.e. with character code greater
than #xF900 and less than #xFFFE) are not allowed in XML names.</p></item>
<item><p>Characters which have a font or compatibility decomposition (i.e.
those with a <quote>compatibility formatting tag</quote> in field 5 of the
database -- marked by field 5 beginning with a <quote>&lt;</quote>) are not
allowed.</p></item>
<item><p>The following characters are treated as name-start characters rather
than name characters, because the property file classifies them as Alphabetic:
[#x02BB-#x02C1], #x0559, #x06E5, #x06E6.</p></item>
<item><p>Characters #x20DD-#x20E0 are excluded (in accordance with Unicode <phrase
diff="add"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E67">[E67]</loc>2.0</phrase>,
section 5.14).</p></item>
<item><p>Character #x00B7 is classified as an extender, because the property
list so identifies it.</p></item>
<item><p>Character #x0387 is added as a name character, because #x00B7 is
its canonical equivalent.</p></item>
<item><p>Characters ':' and '_' are allowed as name-start characters.</p>
</item>
<item><p>Characters '-' and '.'
are allowed as name characters.</p></item>
</ulist>
</div1>
<inform-div1 id="sec-xml-and-sgml">
<head>XML and SGML</head>
<p><phrase diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E43">[E43]</loc>XML
is designed to be a subset of SGML, in that every XML document should also
be a conforming SGML document.</phrase> For a detailed comparison of the additional
restrictions that XML places on documents beyond those of SGML, see <bibref
ref="Clark"/>.</p>
</inform-div1>
<inform-div1 id="sec-entexpand">
<head>Expansion of Entity and Character References</head>
<p>This appendix contains some examples illustrating the sequence of entity-
and character-reference recognition and expansion, as specified in <specref
ref="entproc"/>.</p>
<p>If the DTD contains the declaration</p>
<eg><![CDATA[<!ENTITY example "<p>An ampersand (&#38;#38;) may be escaped
numerically (&#38;#38;#38;) or with a general entity
(&amp;amp;).</p>" >]]></eg>
<p>then the XML processor will recognize the character references when it
parses the entity declaration, and resolve them before storing the following
string as the value of the entity <quote><code>example</code></quote>:</p>
<eg><![CDATA[<p>An ampersand (&#38;) may be escaped
numerically (&#38;#38;) or with a general entity
(&amp;amp;).</p>]]></eg>
<p>A reference in the document to <quote><code>&amp;example;</code></quote>
will cause the text to be reparsed, at which time the start- and end-tags
of the <el>p</el> element will be recognized and the three references will
be recognized and expanded, resulting in a <el>p</el> element with the following
content (all data, no delimiters or markup):</p>
<eg><![CDATA[An ampersand (&) may be escaped
numerically (&#38;) or with a general entity
(&amp;).]]></eg>
<p>A more complex example will illustrate the rules and their effects fully.
In the following example, the line numbers are solely for reference.</p>
<eg><![CDATA[1 <?xml version='1.0'?>
2 <!DOCTYPE test [
3 <!ELEMENT test (#PCDATA) >
4 <!ENTITY % xx '&#37;zz;'>
5 <!ENTITY % zz '&#60;!ENTITY tricky "error-prone" >' >
6 %xx;
7 ]>
8 <test>This sample shows a &tricky; method.</test>]]></eg>
<p>This produces the following:</p>
<ulist spacing="compact">
<item><p>in line 4, the reference to character 37 is expanded immediately,
and the parameter entity <quote><code>xx</code></quote> is stored in the symbol
table with the value <quote><code>%zz;</code></quote>. Since the replacement
text is not rescanned, the reference to parameter entity <quote><code>zz</code></quote>
is not recognized. (And it would be an error if it were, since <quote><code>zz</code></quote>
is not yet declared.)</p></item>
<item><p>in line 5, the character reference <quote><code>&amp;#60;</code></quote>
is expanded immediately and the parameter entity <quote><code>zz</code></quote>
is stored with the replacement text <quote><code>&lt;!ENTITY tricky "error-prone"
></code></quote>, which is a well-formed entity declaration.</p></item>
<item><p>in line 6, the reference to <quote><code>xx</code></quote> is recognized,
and the replacement text of <quote><code>xx</code></quote> (namely <quote><code>%zz;</code></quote>)
is parsed. The reference to <quote><code>zz</code></quote> is recognized in
its turn, and its replacement text (<quote><code>&lt;!ENTITY tricky "error-prone"
></code></quote>) is parsed.
The general entity <quote><code>tricky</code></quote>
has now been declared, with the replacement text <quote><code>error-prone</code></quote>.</p>
</item>
<item><p>in line 8, the reference to the general entity <quote><code>tricky</code></quote>
is recognized, and it is expanded, so the full content of the <el>test</el>
element is the self-describing (and ungrammatical) string <emph>This sample
shows a error-prone method.</emph></p></item>
</ulist>
</inform-div1>
<inform-div1 id="determinism">
<head>Deterministic Content Models</head>
<p><phrase diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E102">[E102]</loc>As
noted in <specref ref="sec-element-content"/>, it is required that content
models in element type declarations be deterministic. This requirement is <termref
def="dt-compat">for compatibility</termref> with SGML (which calls deterministic
content models <quote>unambiguous</quote>);</phrase> XML processors built
using SGML systems may flag non-deterministic content models as errors.</p>
<p>For example, the content model <code>((b, c) | (b, d))</code> is non-deterministic,
because given an initial <el>b</el> the <phrase diff="chg"><loc role="erratumref"
href="http://www.w3.org/XML/xml-19980210-errata#E95">[E95]</loc>XML processor</phrase>
cannot know which <el>b</el> in the model is being matched without looking
ahead to see which element follows the <el>b</el>. In this case, the two references
to <el>b</el> can be collapsed into a single reference, making the model read <code>(b,
(c | d))</code>. An initial <el>b</el> now clearly matches only a single name
in the content model.
The <phrase diff="chg"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E95">[E95]</loc>processor</phrase
> doesn't need to look ahead to see what follows; either <el>c</el> or <el>d</el>
would be accepted.</p>
<p>More formally: a finite state automaton may be constructed from the content
model using the standard algorithms, e.g. algorithm 3.5 in section 3.9 of
Aho, Sethi, and Ullman <bibref ref="Aho"/>. In many such algorithms, a follow
set is constructed for each position in the regular expression (i.e., each
leaf node in the syntax tree for the regular expression); if any position
has a follow set in which more than one following position is labeled with
the same element type name, then the content model is in error and may be
reported as an error.</p>
<p>Algorithms exist which allow many but not all non-deterministic content
models to be reduced automatically to equivalent deterministic models; see
Brüggemann-Klein 1991 <bibref ref="ABK"/>.</p>
</inform-div1>
<inform-div1 id="sec-guessing">
<head><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E105">[E105]</loc><loc
role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E48">[E48]</loc>Autodetection
of Character Encodings</head>
<p>The XML encoding declaration functions as an internal label on each entity,
indicating which character encoding is in use. Before an XML processor can
read the internal label, however, it apparently has to know what character
encoding is in use&mdash;which is what the internal label is trying to indicate.
In the general case, this is a hopeless situation.
It is not entirely hopeless
in XML, however, because XML limits the general case in two ways: each implementation
is assumed to support only a finite set of character encodings, and the XML
encoding declaration is restricted in position and content in order to make
it feasible to autodetect the character encoding in use in each entity in
normal cases. Also, in many cases other sources of information are available
in addition to the XML data stream itself. Two cases may be distinguished,
depending on whether the XML entity is presented to the processor without,
or with, any accompanying (external) information. We consider the first case
first.</p>
<div2 id="sec-guessing-no-ext-info">
<head diff="add">Detection Without External Encoding Information</head>
<p>Because each XML entity <phrase diff="add">not accompanied by external
encoding information and </phrase>not in UTF-8 or UTF-16 <phrase diff="chg">encoding</phrase> <emph>must</emph>
begin with an XML encoding declaration, in which the first characters must
be '<code>&lt;?xml</code>', any conforming processor can detect, after two
to four octets of input, which of the following cases apply. In reading this
list, it may help to know that in UCS-4, '&lt;' is <quote><code>#x0000003C</code></quote>
and '?' is <quote><code>#x0000003F</code></quote>, and the Byte Order Mark
required of UTF-16 data streams is <quote><code>#xFEFF</code></quote>.
<phrase
diff="add">The notation <var>##</var> is used to denote any byte value except <phrase
diff="chg">that two consecutive <var>##</var>s cannot be both 00</phrase>.</phrase></p>
<p diff="add">With a Byte Order Mark:</p>
<table diff="add" border="1" frame="border"><tbody><tr><td><code>00 00 FE
FF</code></td><td>UCS-4, big-endian machine (1234 order)</td></tr><tr><td><code>FF
FE 00 00</code></td><td>UCS-4, little-endian machine (4321 order)</td></tr>
<tr><td><code>00 00 FF FE</code></td><td>UCS-4, unusual octet order (2143)</td>
</tr><tr><td><code>FE FF 00 00</code></td><td>UCS-4, unusual octet order (3412)</td>
</tr><tr><td><code>FE FF ## ##</code></td><td>UTF-16, big-endian</td></tr>
<tr><td><code>FF FE ## ##</code></td><td>UTF-16, little-endian</td></tr><tr>
<td><code>EF BB BF</code></td><td>UTF-8</td></tr></tbody></table>
<p diff="add">Without a Byte Order Mark:</p>
<table diff="add" border="1" frame="border"><tbody><tr><td><code>00 00 00 3C</code></td>
<td rowspan="4">UCS-4 or other encoding with a 32-bit code unit and ASCII
characters encoded as ASCII values, in respectively big-endian (1234), little-endian
(4321) and two unusual byte orders (2143 and 3412).
The encoding declaration
must be read to determine which of UCS-4 or other supported 32-bit encodings
applies.</td></tr>
<tr><td><code>3C 00 00 00</code></td>
<!--<td>UCS-4, little-endian machine (4321 order)</td>-->
</tr>
<tr><td><code>00 00 3C 00</code></td>
<!--<td>UCS-4, unusual octet order (2143)</td>-->
</tr>
<tr><td><code>00 3C 00 00</code></td>
<!--<td>UCS-4, unusual octet order (3412)</td>-->
</tr>
<tr><td><code>00 3C 00 3F</code></td><td>UTF-16BE or big-endian ISO-10646-UCS-2
or other encoding with a 16-bit code unit in big-endian order and ASCII characters
encoded as ASCII values (the encoding declaration must be read to determine
which)</td></tr>
<tr><td><code>3C 00 3F 00</code></td><td>UTF-16LE or little-endian
ISO-10646-UCS-2 or other encoding with a 16-bit code unit in little-endian
order and ASCII characters encoded as ASCII values (the encoding declaration
must be read to determine which)</td></tr>
<tr><td><code>3C 3F 78 6D</code></td>
<td>UTF-8, ISO 646, ASCII, some part of ISO 8859, Shift-JIS, EUC, or any other
7-bit, 8-bit, or mixed-width encoding which ensures that the characters of
ASCII have their normal positions, width, and values; the actual encoding
declaration must be read to detect which of these applies, but since all of
these encodings use the same bit patterns for the relevant ASCII characters,
the encoding declaration itself may be read reliably</td></tr>
<tr><td><code>4C 6F A7 94</code></td><td>EBCDIC (in some flavor; the full encoding declaration
must be read to tell which code page is in use)</td></tr>
<tr><td>Other</td>
<td>UTF-8 without an encoding declaration, or else the data stream is mislabeled
(lacking a required encoding declaration), corrupt, fragmentary, or enclosed
in a wrapper of some kind</td></tr>
</tbody></table>
<note diff="add">
<p>In cases above which do not require reading the encoding declaration to
determine
the encoding, section 4.3.3 still requires that the encoding declaration,
if present, be read and that the encoding name be checked to match the actual
encoding of the entity. Also, it is possible that new character encodings
will be invented that will make it necessary to use the encoding declaration
to determine the encoding, in cases where this is not required at present.</p>
</note>
<p>This level of autodetection is enough to read the XML encoding declaration
and parse the character-encoding identifier, which is still necessary to distinguish
the individual members of each family of encodings (e.g. to tell UTF-8 from
8859, and the parts of 8859 from each other, or to distinguish the specific
EBCDIC code page in use, and so on).</p>
<p>Because the contents of the encoding declaration are restricted to <phrase
diff="chg">characters from the ASCII repertoire (however encoded)</phrase>,
a processor can reliably read the entire encoding declaration as soon as it
has detected which family of encodings is in use. Since, in practice, all widely
used character encodings fall into one of the categories above, the XML encoding
declaration allows reasonably reliable in-band labeling of character encodings,
even when external sources of information at the operating-system or transport-protocol
level are unreliable. <phrase diff="del">Note that since external parsed entities
in UTF-16 may begin with any character, this autodetection does not always
work.
Also, </phrase><phrase diff="add">Character encodings such as UTF-7
that make overloaded usage of ASCII-valued bytes may fail to be reliably detected.</phrase></p>
<p>Once the processor has detected the character encoding in use, it can act
appropriately, whether by invoking a separate input routine for each case,
or by calling the proper conversion function on each character of input.</p>
<p>Like any self-labeling system, the XML encoding declaration will not work
if any software changes the entity's character set or encoding without updating
the encoding declaration. Implementors of character-encoding routines should
be careful to ensure the accuracy of the internal and external information
used to label the entity.</p>
</div2>
<div2 id="sec-guessing-with-ext-info">
<head diff="add">Priorities in the Presence of External Encoding Information</head>
<p>The second possible case occurs when the XML entity is accompanied by encoding
information, as in some file systems and some network protocols. When multiple
sources of information are available, their relative priority and the preferred
method of handling conflict should be specified as part of the higher-level
protocol used to deliver XML. <phrase diff="chg">In particular, please refer
to <bibref ref="rfc2376"/> or its successor, which defines the <code>text/xml</code>
and <code>application/xml</code> MIME types and provides some useful guidance.
In the interests of interoperability, however, the following rule is recommended.</phrase></p>
<ulist>
<item><p>If an XML entity is in a file, the Byte-Order Mark and encoding declaration <phrase
diff="del">PI </phrase>are used (if present) to determine the character encoding.<phrase
diff="del"><loc role="erratumref" href="http://www.w3.org/XML/xml-19980210-errata#E74">[E74]</loc>
All other heuristics and sources of information are solely for error recovery.</phrase></p>
</item>
</ulist>
<ulist diff="del">
<item><p>If an XML entity is delivered with a MIME type of text/xml, then
the <code>charset</code> parameter on the MIME type determines the character
encoding method; all other heuristics and sources of information are solely
for error recovery.</p></item>
<item><p>If an XML entity is delivered with a MIME type of application/xml,
then the Byte-Order Mark and encoding-declaration PI are used (if present)
to determine the character encoding. All other heuristics and sources of information
are solely for error recovery.</p></item>
</ulist>
<p diff="del">These rules apply only in the absence of protocol-level documentation;
in particular, when the MIME types text/xml and application/xml are defined,
the recommendations of the relevant RFC will supersede these rules.</p>
</div2>
</inform-div1>
<inform-div1 id="sec-xml-wg">
<head>W3C XML Working Group</head>
<p>This specification was prepared and approved for publication by the W3C
XML Working Group (WG). WG approval of this specification does not necessarily
imply that all WG members voted for its approval.
The current and former members
of the XML WG are:</p>
<orglist>
<member><name>Jon Bosak</name><affiliation>Sun</affiliation><role>Chair</role>
</member>
<member><name>James Clark</name><role>Technical Lead</role></member>
<member><name>Tim Bray</name><affiliation>Textuality and Netscape</affiliation>
<role>XML Co-editor</role></member>
<member><name>Jean Paoli</name><affiliation>Microsoft</affiliation><role>XML
Co-editor</role></member>
<member><name>C. M. Sperberg-McQueen</name><affiliation>U. of Ill.</affiliation>
<role>XML Co-editor</role></member>
<member><name>Dan Connolly</name><affiliation>W3C</affiliation><role>W3C Liaison</role>
</member>
<member><name>Paula Angerstein</name><affiliation>Texcel</affiliation></member>
<member><name>Steve DeRose</name><affiliation>INSO</affiliation></member>
<member><name>Dave Hollander</name><affiliation>HP</affiliation></member>
<member><name>Eliot Kimber</name><affiliation>ISOGEN</affiliation></member>
<member><name>Eve Maler</name><affiliation>ArborText</affiliation></member>
<member><name>Tom Magliery</name><affiliation>NCSA</affiliation></member>
<member><name>Murray Maloney</name><affiliation diff="chg">SoftQuad, Grif
SA, Muzmo and Veo Systems</affiliation></member>
<member><name diff="chg">MURATA Makoto (FAMILY Given)</name><affiliation>Fuji
Xerox Information Systems</affiliation></member>
<member><name>Joel Nava</name><affiliation>Adobe</affiliation></member>
<member><name>Conleth O'Connell</name><affiliation>Vignette</affiliation>
</member>
<member><name>Peter Sharpe</name><affiliation>SoftQuad</affiliation></member>
<member><name>John Tigue</name><affiliation>DataChannel</affiliation></member>
</orglist>
</inform-div1>
<inform-div1 id="sec-core-wg" diff="add">
<head>W3C XML Core Group</head>
<p>The second edition of this specification was prepared by the W3C XML Core
Working Group
(WG). The members of the WG at the time of publication of this
edition were:</p>
<orglist>
<member><name>Paula Angerstein</name><affiliation>Vignette</affiliation></member>
<member><name>Daniel Austin</name><affiliation>Ask Jeeves</affiliation></member>
<member><name>Tim Boland</name></member>
<member><name>Allen Brown</name><affiliation>Microsoft</affiliation></member>
<member><name>Dan Connolly</name><affiliation>W3C</affiliation><role>Staff
Contact</role></member>
<member><name>John Cowan</name><affiliation>Reuters Limited</affiliation>
</member>
<member><name>John Evdemon</name><affiliation>XMLSolutions Corporation</affiliation>
</member>
<member><name>Paul Grosso</name><affiliation>Arbortext</affiliation><role>Co-Chair</role>
</member>
<member><name>Arnaud Le Hors</name><affiliation>IBM</affiliation><role>Co-Chair</role>
</member>
<member><name>Eve Maler</name><affiliation>Sun Microsystems</affiliation>
<role>Second Edition Editor</role></member>
<member><name>Jonathan Marsh</name><affiliation>Microsoft</affiliation></member>
<member><name>MURATA Makoto (FAMILY Given)</name><affiliation>IBM</affiliation>
</member>
<member><name>Mark Needleman</name><affiliation>Data Research Associates</affiliation>
</member>
<member><name>David Orchard</name><affiliation>Jamcracker</affiliation></member>
<member><name>Lew Shannon</name><affiliation>NCR</affiliation></member>
<member><name>Richard Tobin</name><affiliation>University of Edinburgh</affiliation>
</member>
<member><name>Daniel Veillard</name><affiliation>W3C</affiliation></member>
<member><name>Dan Vint</name><affiliation>Lexica</affiliation></member>
<member><name>Norman Walsh</name><affiliation>Sun Microsystems</affiliation>
</member>
<member><name>François Yergeau</name><affiliation>Alis Technologies</affiliation>
<role>Errata List Editor</role></member>
<member><name>Kongyi
Zhou</name><affiliation>Oracle</affiliation></member>
</orglist>
</inform-div1>
<inform-div1 diff="add">
<head>Production Notes</head>
<p>This Second Edition was encoded in the <loc href="http://www.w3.org/XML/1998/06/xmlspec-v21.dtd">XMLspec
DTD</loc> (which has <loc href="http://www.w3.org/XML/1998/06/xmlspec-report-v21.htm">documentation</loc>
available). The HTML versions were produced with a combination of the <loc
href="http://www.w3.org/XML/1998/06/xmlspec.xsl">xmlspec.xsl</loc>, <loc href="http://www.w3.org/XML/1998/06/diffspec.xsl">diffspec.xsl</loc>,
and <loc href="http://www.w3.org/XML/1998/06/REC-xml-2e.xsl">REC-xml-2e.xsl</loc>
XSLT stylesheets. The PDF version was produced with the <loc href="http://www.tdb.uu.se/~jan/html2ps.html">html2ps</loc>
facility and a distiller program.</p>
</inform-div1>
</back></spec>