Network Working Group                                       P. Faltstrom
Request for Comments: 3490                                         Cisco
Category: Standards Track                                     P. Hoffman
                                                              IMC & VPNC
                                                             A. Costello
                                                             UC Berkeley
                                                              March 2003


         Internationalizing Domain Names in Applications (IDNA)

Status of this Memo

   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements. Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol. Distribution of this memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (2003). All Rights Reserved.

Abstract

   Until now, there has been no standard method for domain names to use
   characters outside the ASCII repertoire. This document defines
   internationalized domain names (IDNs) and a mechanism called
   Internationalizing Domain Names in Applications (IDNA) for handling
   them in a standard fashion. IDNs use characters drawn from a large
   repertoire (Unicode), but IDNA allows the non-ASCII characters to be
   represented using only the ASCII characters already allowed in so-
   called host names today. This backward-compatible representation is
   required in existing protocols like DNS, so that IDNs can be
   introduced with no changes to the existing infrastructure. IDNA is
   only meant for processing domain names, not free text.

Table of Contents

   1. Introduction
      1.1 Problem Statement
      1.2 Limitations of IDNA
      1.3 Brief overview for application developers
   2. Terminology
   3. Requirements and applicability
      3.1 Requirements
      3.2 Applicability
         3.2.1. DNS resource records
         3.2.2. Non-domain-name data types stored in domain names
   4. Conversion operations
      4.1 ToASCII
      4.2 ToUnicode
   5. ACE prefix
   6. Implications for typical applications using DNS
      6.1 Entry and display in applications
      6.2 Applications and resolver libraries
      6.3 DNS servers
      6.4 Avoiding exposing users to the raw ACE encoding
      6.5 DNSSEC authentication of IDN domain names
   7. Name server considerations
   8. Root server considerations
   9. References
      9.1 Normative References
      9.2 Informative References
   10. Security Considerations
   11. IANA Considerations
   12. Authors' Addresses
   13. Full Copyright Statement

1. Introduction

   IDNA works by allowing applications to use certain ASCII name labels
   (beginning with a special prefix) to represent non-ASCII name labels.
   Lower-layer protocols need not be aware of this; therefore IDNA does
   not depend on changes to any infrastructure. In particular, IDNA
   does not depend on any changes to DNS servers, resolvers, or protocol
   elements, because the ASCII name service provided by the existing DNS
   is entirely sufficient for IDNA.

   This document does not require any applications to conform to IDNA,
   but applications can elect to use IDNA in order to support IDN while
   maintaining interoperability with existing infrastructure. If an
   application wants to use non-ASCII characters in domain names, IDNA
   is the only currently-defined option. Adding IDNA support to an
   existing application entails changes to the application only, and
   leaves room for flexibility in the user interface.

   A great deal of the discussion of IDN solutions has focused on
   transition issues and how IDN will work in a world where not all of
   the components have been updated.
   Proposals that were not chosen by
   the IDN Working Group would depend on user applications, resolvers,
   and DNS servers being updated in order for a user to use an
   internationalized domain name. Rather than rely on widespread
   updating of all components, IDNA depends on updates to user
   applications only; no changes are needed to the DNS protocol or any
   DNS servers or the resolvers on users' computers.

1.1 Problem Statement

   The IDNA specification solves the problem of extending the repertoire
   of characters that can be used in domain names to include the Unicode
   repertoire (with some restrictions).

   IDNA does not extend the service offered by DNS to the applications.
   Instead, the applications (and, by implication, the users) continue
   to see an exact-match lookup service. Either there is a single
   exactly-matching name or there is no match. This model has served
   the existing applications well, but it requires, with or without
   internationalized domain names, that users know the exact spelling of
   the domain names that the users type into applications such as web
   browsers and mail user agents. The introduction of the larger
   repertoire of characters potentially makes the set of misspellings
   larger, especially given that in some cases the same appearance, for
   example on a business card, might visually match several Unicode code
   points or several sequences of code points.
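
   As an illustration of the last point, the following minimal Python
   sketch prints the code points behind strings that typically render
   alike. The sample characters and the helper name code_points are
   choices made for this example only; NFKC is used purely as an
   illustrative normalization and the sketch makes no claim about what
   IDNA itself does with these strings.

```python
import unicodedata

# Two spellings that usually render as the same glyph (example data).
composed = "\u00E9"       # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"    # "e" followed by COMBINING ACUTE ACCENT

# Two letters from different scripts that usually look identical.
latin_a = "\u0061"        # LATIN SMALL LETTER A
cyrillic_a = "\u0430"     # CYRILLIC SMALL LETTER A


def code_points(text):
    """Return the U+XXXX notation for every code point in text."""
    return ["U+%04X" % ord(ch) for ch in text]


print(code_points(composed), code_points(decomposed))
print(code_points(latin_a), code_points(cyrillic_a))

# Normalization unifies the first pair but not the second one.
same_after_nfkc = unicodedata.normalize("NFKC", decomposed) == composed
lookalikes_equal = unicodedata.normalize("NFKC", cyrillic_a) == latin_a
print(same_after_nfkc, lookalikes_equal)
```

   An exact-match lookup treats every distinct code point sequence as a
   distinct name, which is why such lookalikes enlarge the set of
   possible misspellings.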

   IDNA allows the graceful introduction of IDNs not only by avoiding
   upgrades to existing infrastructure (such as DNS servers and mail
   transport agents), but also by allowing some rudimentary use of IDNs
   in applications by using the ASCII representation of the non-ASCII
   name labels. While such names are very user-unfriendly to read and
   type, and hence are not suitable for user input, they allow (for
   instance) replying to email and clicking on URLs even though the
   domain name displayed is incomprehensible to the user. In order to
   allow user-friendly input and output of the IDNs, the applications
   need to be modified to conform to this specification.

   IDNA uses the Unicode character repertoire, which avoids the
   significant delays that would be inherent in waiting for a different
   and specific character set to be defined for IDN purposes by some
   other standards developing organization.

1.2 Limitations of IDNA

   The IDNA protocol does not solve all linguistic issues with users
   inputting names in different scripts. Many important language-based
   and script-based mappings are not covered in IDNA and need to be
   handled outside the protocol. For example, names that are entered in
   a mix of traditional and simplified Chinese characters will not be
   mapped to a single canonical name. Another example is Scandinavian
   names that are entered with U+00F6 (LATIN SMALL LETTER O WITH
   DIAERESIS) will not be mapped to U+00F8 (LATIN SMALL LETTER O WITH
   STROKE).

   An example of an important issue that is not considered in detail in
   IDNA is how to provide a high probability that a user who is entering
   a domain name based on visual information (such as from a business
   card or billboard) or aural information (such as from a telephone or
   radio) would correctly enter the IDN. Similar issues exist for ASCII
   domain names, for example the possible visual confusion between the
   letter 'O' and the digit zero, but the introduction of the larger
   repertoire of characters creates more opportunities of similar
   looking and similar sounding names. Note that this is a complex
   issue relating to languages, input methods on computers, and so on.
   Furthermore, the kind of matching and searching necessary for a high
   probability of success would not fit the role of the DNS and its
   exact matching function.

1.3 Brief overview for application developers

   Applications can use IDNA to support internationalized domain names
   anywhere that ASCII domain names are already supported, including DNS
   master files and resolver interfaces. (Applications can also define
   protocols and interfaces that support IDNs directly using non-ASCII
   representations. IDNA does not prescribe any particular
   representation for new protocols, but it still defines which names
   are valid and how they are compared.)

   The IDNA protocol is contained completely within applications. It is
   not a client-server or peer-to-peer protocol: everything is done
   inside the application itself.
   When used with a DNS resolver
   library, IDNA is inserted as a "shim" between the application and the
   resolver library. When used for writing names into a DNS zone, IDNA
   is used just before the name is committed to the zone.

   There are two operations described in section 4 of this document:

   - The ToASCII operation is used before sending an IDN to something
     that expects ASCII names (such as a resolver) or writing an IDN
     into a place that expects ASCII names (such as a DNS master file).

   - The ToUnicode operation is used when displaying names to users,
     for example names obtained from a DNS zone.

   It is important to note that the ToASCII operation can fail. If it
   fails when processing a domain name, that domain name cannot be used
   as an internationalized domain name and the application has to have
   some method of dealing with this failure.

   IDNA requires that implementations process input strings with
   Nameprep [NAMEPREP], which is a profile of Stringprep [STRINGPREP],
   and then with Punycode [PUNYCODE]. Implementations of IDNA MUST
   fully implement Nameprep and Punycode; neither Nameprep nor Punycode
   is optional.

2. Terminology

   The key words "MUST", "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED",
   and "MAY" in this document are to be interpreted as described in BCP
   14, RFC 2119 [RFC2119].

   A code point is an integer value associated with a character in a
   coded character set.

   Unicode [UNICODE] is a coded character set containing tens of
   thousands of characters. A single Unicode code point is denoted by
   "U+" followed by four to six hexadecimal digits, while a range of
   Unicode code points is denoted by two hexadecimal numbers separated
   by "..", with no prefixes.

   ASCII means US-ASCII [USASCII], a coded character set containing 128
   characters associated with code points in the range 0..7F. Unicode
   is an extension of ASCII: it includes all the ASCII characters and
   associates them with the same code points.

   The term "LDH code points" is defined in this document to mean the
   code points associated with ASCII letters, digits, and the hyphen-
   minus; that is, U+002D, 30..39, 41..5A, and 61..7A. "LDH" is an
   abbreviation for "letters, digits, hyphen".

   [STD13] talks about "domain names" and "host names", but many people
   use the terms interchangeably. Further, because [STD13] was not
   terribly clear, many people who are sure they know the exact
   definitions of each of these terms disagree on the definitions. In
   this document the term "domain name" is used in general. This
   document explicitly cites [STD3] whenever referring to the host name
   syntax restrictions defined therein.

   A label is an individual part of a domain name. Labels are usually
   shown separated by dots; for example, the domain name
   "www.example.com" is composed of three labels: "www", "example", and
   "com".
   (The zero-length root label described in [STD13], which can
   be explicit as in "www.example.com." or implicit as in
   "www.example.com", is not considered a label in this specification.)
   IDNA extends the set of usable characters in labels that are text.
   For the rest of this document, the term "label" is shorthand for
   "text label", and "every label" means "every text label".

   An "internationalized label" is a label to which the ToASCII
   operation (see section 4) can be applied without failing (with the
   UseSTD3ASCIIRules flag unset). This implies that every ASCII label
   that satisfies the [STD13] length restriction is an internationalized
   label. Therefore the term "internationalized label" is a
   generalization, embracing both old ASCII labels and new non-ASCII
   labels. Although most Unicode characters can appear in
   internationalized labels, ToASCII will fail for some input strings,
   and such strings are not valid internationalized labels.

   An "internationalized domain name" (IDN) is a domain name in which
   every label is an internationalized label. This implies that every
   ASCII domain name is an IDN (which implies that it is possible for a
   name to be an IDN without it containing any non-ASCII characters).
   This document does not attempt to define an "internationalized host
   name".
   Just as has been the case with ASCII names, some DNS zone
   administrators may impose restrictions, beyond those imposed by DNS
   or IDNA, on the characters or strings that may be registered as
   labels in their zones. Such restrictions have no impact on the
   syntax or semantics of DNS protocol messages; a query for a name that
   matches no records will yield the same response regardless of the
   reason why it is not in the zone. Clients issuing queries or
   interpreting responses cannot be assumed to have any knowledge of
   zone-specific restrictions or conventions.

   In IDNA, equivalence of labels is defined in terms of the ToASCII
   operation, which constructs an ASCII form for a given label, whether
   or not the label was already an ASCII label. Labels are defined to
   be equivalent if and only if their ASCII forms produced by ToASCII
   match using a case-insensitive ASCII comparison. ASCII labels
   already have a notion of equivalence: upper case and lower case are
   considered equivalent. The IDNA notion of equivalence is an
   extension of that older notion. Equivalent labels in IDNA are
   treated as alternate forms of the same label, just as "foo" and "Foo"
   are treated as alternate forms of the same label.

   To allow internationalized labels to be handled by existing
   applications, IDNA uses an "ACE label" (ACE stands for ASCII
   Compatible Encoding). An ACE label is an internationalized label
   that can be rendered in ASCII and is equivalent to an
   internationalized label that cannot be rendered in ASCII.
   Given any
   internationalized label that cannot be rendered in ASCII, the ToASCII
   operation will convert it to an equivalent ACE label (whereas an
   ASCII label will be left unaltered by ToASCII). ACE labels are
   unsuitable for display to users. The ToUnicode operation will
   convert any label to an equivalent non-ACE label. In fact, an ACE
   label is formally defined to be any label that the ToUnicode
   operation would alter (whereas non-ACE labels are left unaltered by
   ToUnicode). Every ACE label begins with the ACE prefix specified in
   section 5. The ToASCII and ToUnicode operations are specified in
   section 4.

   The "ACE prefix" is defined in this document to be a string of ASCII
   characters that appears at the beginning of every ACE label. It is
   specified in section 5.

   A "domain name slot" is defined in this document to be a protocol
   element or a function argument or a return value (and so on)
   explicitly designated for carrying a domain name. Examples of domain
   name slots include: the QNAME field of a DNS query; the name argument
   of the gethostbyname() library function; the part of an email address
   following the at-sign (@) in the From: field of an email message
   header; and the host portion of the URI in the src attribute of an
   HTML <IMG> tag.
   General text that just happens to contain a domain
   name is not a domain name slot; for example, a domain name appearing
   in the plain text body of an email message is not occupying a domain
   name slot.

   An "IDN-aware domain name slot" is defined in this document to be a
   domain name slot explicitly designated for carrying an
   internationalized domain name as defined in this document. The
   designation may be static (for example, in the specification of the
   protocol or interface) or dynamic (for example, as a result of
   negotiation in an interactive session).

   An "IDN-unaware domain name slot" is defined in this document to be
   any domain name slot that is not an IDN-aware domain name slot.
   Obviously, this includes any domain name slot whose specification
   predates IDNA.

3. Requirements and applicability

3.1 Requirements

   IDNA conformance means adherence to the following four requirements:

   1) Whenever dots are used as label separators, the following
      characters MUST be recognized as dots: U+002E (full stop), U+3002
      (ideographic full stop), U+FF0E (fullwidth full stop), U+FF61
      (halfwidth ideographic full stop).

   2) Whenever a domain name is put into an IDN-unaware domain name slot
      (see section 2), it MUST contain only ASCII characters. Given an
      internationalized domain name (IDN), an equivalent domain name
      satisfying this requirement can be obtained by applying the
      ToASCII operation (see section 4) to each label and, if dots are
      used as label separators, changing all the label separators to
      U+002E.

   3) ACE labels obtained from domain name slots SHOULD be hidden from
      users when it is known that the environment can handle the non-ACE
      form, except when the ACE form is explicitly requested. When it
      is not known whether or not the environment can handle the non-ACE
      form, the application MAY use the non-ACE form (which might fail,
      such as by not being displayed properly), or it MAY use the ACE
      form (which will look unintelligible to the user). Given an
      internationalized domain name, an equivalent domain name
      containing no ACE labels can be obtained by applying the ToUnicode
      operation (see section 4) to each label. When requirements 2 and
      3 both apply, requirement 2 takes precedence.

   4) Whenever two labels are compared, they MUST be considered to match
      if and only if they are equivalent, that is, their ASCII forms
      (obtained by applying ToASCII) match using a case-insensitive
      ASCII comparison. Whenever two names are compared, they MUST be
      considered to match if and only if their corresponding labels
      match, regardless of whether the names use the same forms of label
      separators.

3.2 Applicability

   IDNA is applicable to all domain names in all domain name slots
   except where it is explicitly excluded.

   This implies that IDNA is applicable to many protocols that predate
   IDNA.
   Note that IDNs occupying domain name slots in those protocols
   MUST be in ASCII form (see section 3.1, requirement 2).

3.2.1. DNS resource records

   IDNA does not apply to domain names in the NAME and RDATA fields of
   DNS resource records whose CLASS is not IN. This exclusion applies
   to every non-IN class, present and future, except where future
   standards override this exclusion by explicitly inviting the use of
   IDNA.

   There are currently no other exclusions on the applicability of IDNA
   to DNS resource records; it depends entirely on the CLASS, and not on
   the TYPE. This will remain true, even as new types are defined,
   unless there is a compelling reason for a new type to complicate
   matters by imposing type-specific rules.

3.2.2. Non-domain-name data types stored in domain names

   Although IDNA enables the representation of non-ASCII characters in
   domain names, that does not imply that IDNA enables the
   representation of non-ASCII characters in other data types that are
   stored in domain names. For example, an email address local part is
   sometimes stored in a domain label (hostmaster@example.com would be
   represented as hostmaster.example.com in the RDATA field of an SOA
   record). IDNA does not update the existing email standards, which
   allow only ASCII characters in local parts.
   Therefore, unless the
   email standards are revised to invite the use of IDNA for local
   parts, a domain label that holds the local part of an email address
   SHOULD NOT begin with the ACE prefix, and even if it does, it is to
   be interpreted literally as a local part that happens to begin with
   the ACE prefix.

4. Conversion operations

   An application converts a domain name put into an IDN-unaware slot or
   displayed to a user. This section specifies the steps to perform in
   the conversion, and the ToASCII and ToUnicode operations.

   The input to ToASCII or ToUnicode is a single label that is a
   sequence of Unicode code points (remember that all ASCII code points
   are also Unicode code points). If a domain name is represented using
   a character set other than Unicode or US-ASCII, it will first need to
   be transcoded to Unicode.

   Starting from a whole domain name, the steps that an application
   takes to do the conversions are:

   1) Decide whether the domain name is a "stored string" or a "query
      string" as described in [STRINGPREP]. If this conversion follows
      the "queries" rule from [STRINGPREP], set the flag called
      "AllowUnassigned".

   2) Split the domain name into individual labels as described in
      section 3.1. The labels do not include the separator.

   3) For each label, decide whether or not to enforce the restrictions
      on ASCII characters in host names [STD3].
      (Applications already
      faced this choice before the introduction of IDNA, and can
      continue to make the decision the same way they always have; IDNA
      makes no new recommendations regarding this choice.) If the
      restrictions are to be enforced, set the flag called
      "UseSTD3ASCIIRules" for that label.

   4) Process each label with either the ToASCII or the ToUnicode
      operation as appropriate. Typically, you use the ToASCII
      operation if you are about to put the name into an IDN-unaware
      slot, and you use the ToUnicode operation if you are displaying
      the name to a user; section 3.1 gives greater detail on the
      applicable requirements.

   5) If ToASCII was applied in step 4 and dots are used as label
      separators, change all the label separators to U+002E (full stop).

   The following two subsections define the ToASCII and ToUnicode
   operations that are used in step 4.

   This description of the protocol uses specific procedure names, names
   of flags, and so on, in order to facilitate the specification of the
   protocol. These names, as well as the actual steps of the
   procedures, are not required of an implementation. In fact, any
   implementation which has the same external behavior as specified in
   this document conforms to this specification.
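
   Steps 2, 4, and 5 of the whole-name procedure above can be sketched
   in Python roughly as follows. This is a minimal illustration, not a
   reference implementation: the names DOTS, split_labels, and
   domain_to_ascii are invented for this example, the per-label
   operation is passed in as a parameter rather than implemented here,
   and the flag decisions of steps 1 and 3 are left to the caller. The
   handling of a trailing empty label (the explicit root) is likewise
   an assumption of the sketch.

```python
import re

# The four code points that section 3.1, requirement 1, treats as dots.
DOTS = re.compile("[\u002E\u3002\uFF0E\uFF61]")


def split_labels(domain_name):
    """Step 2: split a name into labels; separators are not included."""
    return DOTS.split(domain_name)


def domain_to_ascii(domain_name, label_to_ascii):
    """Apply label_to_ascii to each label and rejoin with U+002E.

    label_to_ascii is a caller-supplied stand-in for the ToASCII
    operation of section 4.1 and may raise to signal failure.
    """
    labels = split_labels(domain_name)
    # A trailing empty label marks an explicit root ("example.com.").
    # The root is not a label in this specification, so it is set aside.
    has_root = len(labels) > 1 and labels[-1] == ""
    if has_root:
        labels = labels[:-1]
    converted = [label_to_ascii(label) for label in labels]  # step 4
    result = "\u002E".join(converted)                         # step 5
    return result + "\u002E" if has_root else result


# Usage with an identity function standing in for ToASCII.
print(domain_to_ascii("www\u3002example\uFF61com", lambda label: label))
```

   Passing the label operation as an argument keeps the splitting logic
   independent of how ToASCII is implemented, in keeping with the note
   above that only external behavior matters.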

4.1 ToASCII

   The ToASCII operation takes a sequence of Unicode code points that
   make up one label and transforms it into a sequence of code points in
   the ASCII range (0..7F).  If ToASCII succeeds, the original sequence
   and the resulting sequence are equivalent labels.

   It is important to note that the ToASCII operation can fail.  ToASCII
   fails if any step of it fails.  If any step of the ToASCII operation
   fails on any label in a domain name, that domain name MUST NOT be
   used as an internationalized domain name.  The method for dealing
   with this failure is application-specific.

   The inputs to ToASCII are a sequence of code points, the
   AllowUnassigned flag, and the UseSTD3ASCIIRules flag.  The output of
   ToASCII is either a sequence of ASCII code points or a failure
   condition.

   ToASCII never alters a sequence of code points that are all in the
   ASCII range to begin with (although it could fail).  Applying the
   ToASCII operation multiple times has exactly the same effect as
   applying it just once.

   ToASCII consists of the following steps:

   1. If the sequence contains any code points outside the ASCII range
      (0..7F) then proceed to step 2, otherwise skip to step 3.

   2. Perform the steps specified in [NAMEPREP] and fail if there is an
      error.  The AllowUnassigned flag is used in [NAMEPREP].

   3. If the UseSTD3ASCIIRules flag is set, then perform these checks:

     (a) Verify the absence of non-LDH ASCII code points; that is, the
         absence of 0..2C, 2E..2F, 3A..40, 5B..60, and 7B..7F.

     (b) Verify the absence of leading and trailing hyphen-minus; that
         is, the absence of U+002D at the beginning and end of the
         sequence.

   4. If the sequence contains any code points outside the ASCII range
      (0..7F) then proceed to step 5, otherwise skip to step 8.

   5. Verify that the sequence does NOT begin with the ACE prefix.

   6. Encode the sequence using the encoding algorithm in [PUNYCODE] and
      fail if there is an error.

   7. Prepend the ACE prefix.

   8. Verify that the number of code points is in the range 1 to 63
      inclusive.

4.2 ToUnicode

   The ToUnicode operation takes a sequence of Unicode code points that
   make up one label and returns a sequence of Unicode code points.  If
   the input sequence is a label in ACE form, then the result is an
   equivalent internationalized label that is not in ACE form, otherwise
   the original sequence is returned unaltered.

   ToUnicode never fails.  If any step fails, then the original input
   sequence is returned immediately in that step.

   The ToUnicode output never contains more code points than its input.
   Note that the number of octets needed to represent a sequence of code
   points depends on the particular character encoding used.
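
   The eight ToASCII steps of section 4.1, together with the rule above
   that ToUnicode returns its input whenever a step fails, can be
   sketched as follows.  This is an illustrative sketch under stated
   assumptions, not a reference implementation: it relies on Python's
   standard encodings.idna.nameprep function and "punycode" codec for
   [NAMEPREP] and [PUNYCODE], it does not model the AllowUnassigned flag
   (the Python nameprep helper behaves as if the flag were set), and the
   names to_ascii, to_unicode and ToASCIIError are hypothetical.  The
   to_unicode function follows the ToUnicode steps listed later in this
   section.

```python
from encodings.idna import nameprep  # stdlib Nameprep helper

ACE_PREFIX = "xn--"


class ToASCIIError(ValueError):
    """Signals the ToASCII failure condition (hypothetical name)."""


def to_ascii(label: str, use_std3_ascii_rules: bool = False) -> str:
    try:
        # Steps 1-2: run Nameprep only if a non-ASCII code point is present.
        if not label.isascii():
            label = nameprep(label)
        # Step 3: optional host-name checks on the ASCII code points.
        if use_std3_ascii_rules:
            if any(c.isascii() and not (c.isalnum() or c == "-") for c in label):
                raise ToASCIIError("non-LDH ASCII code point")
            if label.startswith("-") or label.endswith("-"):
                raise ToASCIIError("leading or trailing hyphen-minus")
        # Steps 4-7: Punycode-encode and prepend the ACE prefix.
        if not label.isascii():
            if label.lower().startswith(ACE_PREFIX):
                raise ToASCIIError("label already begins with the ACE prefix")
            label = ACE_PREFIX + label.encode("punycode").decode("ascii")
    except UnicodeError as exc:
        raise ToASCIIError(str(exc)) from exc
    # Step 8: the result must be 1 to 63 code points long.
    if not 1 <= len(label) <= 63:
        raise ToASCIIError("label length outside 1..63")
    return label


def to_unicode(label: str, use_std3_ascii_rules: bool = False) -> str:
    original = label
    try:
        # Steps 1-2: Nameprep only when non-ASCII code points are present.
        if not label.isascii():
            label = nameprep(label)
        # Step 3: the prefix is recognized case-insensitively.
        if not label.lower().startswith(ACE_PREFIX):
            return original
        # Steps 4-5: strip the prefix and Punycode-decode the remainder.
        decoded = label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
        # Steps 6-7: re-encode and compare, ignoring ASCII case.
        if to_ascii(decoded, use_std3_ascii_rules).lower() != label.lower():
            return original
        return decoded  # step 8
    except ValueError:
        # Covers UnicodeError and ToASCIIError: ToUnicode never fails.
        return original
```

   Raising an exception is only one way to report the ToASCII failure
   condition; how the failure is handled remains application-specific.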

   The inputs to ToUnicode are a sequence of code points, the
   AllowUnassigned flag, and the UseSTD3ASCIIRules flag.  The output of
   ToUnicode is always a sequence of Unicode code points.

   1. If all code points in the sequence are in the ASCII range (0..7F)
      then skip to step 3.

   2. Perform the steps specified in [NAMEPREP] and fail if there is an
      error.  (If step 3 of ToASCII is also performed here, it will not
      affect the overall behavior of ToUnicode, but it is not
      necessary.)  The AllowUnassigned flag is used in [NAMEPREP].

   3. Verify that the sequence begins with the ACE prefix, and save a
      copy of the sequence.

   4. Remove the ACE prefix.

   5. Decode the sequence using the decoding algorithm in [PUNYCODE] and
      fail if there is an error.  Save a copy of the result of this
      step.

   6. Apply ToASCII.

   7. Verify that the result of step 6 matches the saved copy from step
      3, using a case-insensitive ASCII comparison.

   8. Return the saved copy from step 5.

5. ACE prefix

   The ACE prefix, used in the conversion operations (section 4), is two
   alphanumeric ASCII characters followed by two hyphen-minuses.  It
   cannot be any of the prefixes already used in earlier documents,
   which includes the following: "bl--", "bq--", "dq--", "lq--", "mq--",
   "ra--", "wq--" and "zq--".  The ToASCII and ToUnicode operations MUST
   recognize the ACE prefix in a case-insensitive manner.

   The ACE prefix for IDNA is "xn--" or any capitalization thereof.

   This means that an ACE label might be "xn--de-jg4avhby1noc0d", where
   "de-jg4avhby1noc0d" is the part of the ACE label that is generated by
   the encoding steps in [PUNYCODE].

   While all ACE labels begin with the ACE prefix, not all labels
   beginning with the ACE prefix are necessarily ACE labels.  Non-ACE
   labels that begin with the ACE prefix will confuse users and SHOULD
   NOT be allowed in DNS zones.

6. Implications for typical applications using DNS

   In IDNA, applications perform the processing needed to input
   internationalized domain names from users, display internationalized
   domain names to users, and process the inputs and outputs from DNS
   and other protocols that carry domain names.

   The components and interfaces between them can be represented
   pictorially as:

                    +------+
                    | User |
                    +------+
                       ^
                       | Input and display: local interface methods
                       | (pen, keyboard, glowing phosphorus, ...)
   +-------------------|-------------------------------+
   |                   v                               |
   |          +-----------------------------+          |
   |          |        Application          |          |
   |          |   (ToASCII and ToUnicode    |          |
   |          |      operations may be      |          |
   |          |        called here)         |          |
   |          +-----------------------------+          |
   |                   ^        ^                      | End system
   |                   |        |                      |
   | Call to resolver: |        | Application-specific |
   |              ACE  |        | protocol:            |
   |                   v        | ACE unless the       |
   |           +----------+     | protocol is updated  |
   |           | Resolver |     | to handle other      |
   |           +----------+     | encodings            |
   |                 ^          |                      |
   +-----------------|----------|----------------------+
       DNS protocol: |          |
                 ACE |          |
                     v          v
          +-------------+    +---------------------+
          | DNS servers |    | Application servers |
          +-------------+    +---------------------+

   The box labeled "Application" is where the application splits a
   domain name into labels, sets the appropriate flags, and performs the
   ToASCII and ToUnicode operations.  This is described in section 4.

6.1 Entry and display in applications

   Applications can accept domain names using any character set or sets
   desired by the application developer, and can display domain names in
   any charset.  That is, the IDNA protocol does not affect the
   interface between users and applications.

   An IDNA-aware application can accept and display internationalized
   domain names in two formats: the internationalized character set(s)
   supported by the application, and as an ACE label.  ACE labels that
   are displayed or input MUST always include the ACE prefix.
   Applications MAY allow input and display of ACE labels, but are not
   encouraged to do so except as an interface for special purposes,
   possibly for debugging, or to cope with display limitations as
   described in section 6.4.  ACE encoding is opaque and ugly, and
   should thus only be exposed to users who absolutely need it.  Because
   name labels encoded as ACE name labels can be rendered either as the
   encoded ASCII characters or the proper decoded characters, the
   application MAY have an option for the user to select the preferred
   method of display; if it does, rendering the ACE SHOULD NOT be the
   default.

   Domain names are often stored and transported in many places.  For
   example, they are part of documents such as mail messages and web
   pages.  They are transported in many parts of many protocols, such as
   both the control commands and the RFC 2822 body parts of SMTP, and
   the headers and the body content in HTTP.  It is important to
   remember that domain names appear both in domain name slots and in
   the content that is passed over protocols.

   In protocols and document formats that define how to handle
   specification or negotiation of charsets, labels can be encoded in
   any charset allowed by the protocol or document format.  If a
   protocol or document format only allows one charset, the labels MUST
   be given in that charset.

   In any place where a protocol or document format allows transmission
   of the characters in internationalized labels, internationalized
   labels SHOULD be transmitted using whatever character encoding and
   escape mechanism that the protocol or document format uses at that
   place.

   All protocols that use domain name slots already have the capacity
   for handling domain names in the ASCII charset.  Thus, ACE labels
   (internationalized labels that have been processed with the ToASCII
   operation) can inherently be handled by those protocols.

6.2 Applications and resolver libraries

   Applications normally use functions in the operating system when they
   resolve DNS queries.  Those functions in the operating system are
   often called "the resolver library", and the applications communicate
   with the resolver libraries through a programming interface (API).

   Because these resolver libraries today expect only domain names in
   ASCII, applications MUST prepare labels that are passed to the
   resolver library using the ToASCII operation.  Labels received from
   the resolver library contain only ASCII characters; internationalized
   labels that cannot be represented directly in ASCII use the ACE form.
   ACE labels always include the ACE prefix.
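
   The preparation step described above might look like the following
   sketch.  It assumes Python's standard "idna" codec, which applies
   ToASCII to each label of a name, and the standard
   socket.getaddrinfo function as the resolver API.  The helper names
   prepare_for_resolver and resolve are hypothetical.

```python
import socket


def prepare_for_resolver(name: str) -> str:
    """Return the ASCII (ACE) form of a name for an IDN-unaware API."""
    # The stdlib "idna" codec applies ToASCII to every label and raises
    # UnicodeError if the conversion fails for any of them.
    return name.encode("idna").decode("ascii")


def resolve(name: str, port: int = 80):
    """Look up a possibly internationalized name (performs network I/O)."""
    return socket.getaddrinfo(prepare_for_resolver(name), port)
```

   Keeping the conversion in its own function lets an application apply
   the same preparation to every IDN-unaware slot, not only to the
   resolver call.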

   An operating system might have a set of libraries for performing the
   ToASCII operation.  The input to such a library might be in one or
   more charsets that are used in applications (UTF-8 and UTF-16 are
   likely candidates for almost any operating system, and
   script-specific charsets are likely for localized operating systems).

   IDNA-aware applications MUST be able to work with both
   non-internationalized labels (those that conform to [STD13] and
   [STD3]) and internationalized labels.

   It is expected that new versions of the resolver libraries in the
   future will be able to accept domain names in other charsets than
   ASCII, and application developers might one day pass not only domain
   names in Unicode, but also in local script to a new API for the
   resolver libraries in the operating system.  Thus the ToASCII and
   ToUnicode operations might be performed inside these new versions of
   the resolver libraries.

   Domain names passed to resolvers or put into the question section of
   DNS requests follow the rules for "queries" from [STRINGPREP].

6.3 DNS servers

   Domain names stored in zones follow the rules for "stored strings"
   from [STRINGPREP].

   For internationalized labels that cannot be represented directly in
   ASCII, DNS servers MUST use the ACE form produced by the ToASCII
   operation.  All IDNs served by DNS servers MUST contain only ASCII
   characters.
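
   One possible pre-check for labels about to be placed in a zone is
   sketched below.  It is an assumption-laden illustration rather than
   a procedure defined by this document: it accepts only ASCII labels
   of 1 to 63 code points and, for labels that begin with the ACE
   prefix, additionally requires that Python's standard
   encodings.idna.ToUnicode function can decode them, so that non-ACE
   labels carrying the prefix are rejected (see section 5).  The helper
   name acceptable_zone_label is hypothetical.

```python
from encodings import idna  # Python stdlib implementation of RFC 3490

ACE_PREFIX = "xn--"


def acceptable_zone_label(label: str) -> bool:
    """Return True if the label is ASCII and, if prefixed, a real ACE label."""
    if not label.isascii() or not 1 <= len(label) <= 63:
        return False
    if not label.lower().startswith(ACE_PREFIX):
        return True  # ordinary ASCII label
    try:
        # The stdlib ToUnicode raises UnicodeError when decoding or the
        # round trip through ToASCII fails, i.e. for a non-ACE label.
        idna.ToUnicode(label)
    except UnicodeError:
        return False
    return True
```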

   If a signaling system which makes negotiation possible between old
   and new DNS clients and servers is standardized in the future, the
   encoding of the query in the DNS protocol itself can be changed from
   ACE to something else, such as UTF-8.  The question whether or not
   this should be used is, however, a separate problem and is not
   discussed in this memo.

6.4 Avoiding exposing users to the raw ACE encoding

   Any application that might show the user a domain name obtained from
   a domain name slot, such as from gethostbyaddr or part of a mail
   header, will need to be updated if it is to prevent users from seeing
   the ACE.

   If an application decodes an ACE name using ToUnicode but cannot show
   all of the characters in the decoded name, such as if the name
   contains characters that the output system cannot display, the
   application SHOULD show the name in ACE format (which always includes
   the ACE prefix) instead of displaying the name with the replacement
   character (U+FFFD).  This is to make it easier for the user to
   transfer the name correctly to other programs.  Programs that by
   default show the ACE form when they cannot show all the characters in
   a name label SHOULD also have a mechanism to show the name that is
   produced by the ToUnicode operation with as many characters as
   possible and replacement characters in the positions where characters
   cannot be displayed.
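
   The display rule above can be sketched as follows.  The sketch
   assumes that "cannot display" can be approximated by "cannot be
   encoded in the output charset", which is a simplification, since
   font coverage is not considered.  It uses Python's standard
   encodings.idna.ToUnicode function, wrapped so that a failure returns
   the input unchanged.  The helper names to_unicode, display_form and
   lossy_form are hypothetical.

```python
from encodings import idna  # Python stdlib implementation of RFC 3490


def to_unicode(label: str) -> str:
    """Return the input unchanged if any ToUnicode step fails."""
    try:
        return idna.ToUnicode(label)
    except UnicodeError:
        return label


def display_form(ace_label: str, output_charset: str) -> str:
    """Default rendering: decoded label, or the ACE form if not showable."""
    decoded = to_unicode(ace_label)
    try:
        decoded.encode(output_charset)
    except UnicodeEncodeError:
        return ace_label
    return decoded


def lossy_form(ace_label: str, output_charset: str) -> str:
    """Optional rendering: U+FFFD replaces each character that cannot be shown."""
    def shown(ch: str) -> str:
        try:
            ch.encode(output_charset)
            return ch
        except UnicodeEncodeError:
            return "\ufffd"
    return "".join(shown(ch) for ch in to_unicode(ace_label))
```

   An application following the text above would use display_form by
   default and offer lossy_form only as an additional view.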

   The ToUnicode operation does not alter labels that are not valid ACE
   labels, even if they begin with the ACE prefix.  After ToUnicode has
   been applied, if a label still begins with the ACE prefix, then it is
   not a valid ACE label, and is not equivalent to any of the
   intermediate Unicode strings constructed by ToUnicode.

6.5 DNSSEC authentication of IDN domain names

   DNS Security [RFC2535] is a method for supplying cryptographic
   verification information along with DNS messages.  Public Key
   Cryptography is used in conjunction with digital signatures to
   provide a means for a requester of domain information to authenticate
   the source of the data.  This ensures that it can be traced back to a
   trusted source, either directly, or via a chain of trust linking the
   source of the information to the top of the DNS hierarchy.

   IDNA specifies that all internationalized domain names served by DNS
   servers that cannot be represented directly in ASCII must use the ACE
   form produced by the ToASCII operation.  This operation must be
   performed prior to a zone being signed by the private key for that
   zone.  Because of this ordering, it is important to recognize that
   DNSSEC authenticates the ASCII domain name, not the Unicode form or
   the mapping between the Unicode form and the ASCII form.  In the
   presence of DNSSEC, this is the name that MUST be signed in the zone
   and MUST be validated against.

   One consequence of this for sites deploying IDNA in the presence of
   DNSSEC is that any special purpose proxies or forwarders used to
   transform user input into IDNs must be earlier in the resolution flow
   than DNSSEC authenticating nameservers for DNSSEC to work.

7. Name server considerations

   Existing DNS servers do not know the IDNA rules for handling
   non-ASCII forms of IDNs, and therefore need to be shielded from them.
   All existing channels through which names can enter a DNS server
   database (for example, master files [STD13] and DNS update messages
   [RFC2136]) are IDN-unaware because they predate IDNA, and therefore
   requirement 2 of section 3.1 of this document provides the needed
   shielding, by ensuring that internationalized domain names entering
   DNS server databases through such channels have already been
   converted to their equivalent ASCII forms.

   It is imperative that there be only one ASCII encoding for a
   particular domain name.  Because of the design of the ToASCII and
   ToUnicode operations, there are no ACE labels that decode to ASCII
   labels, and therefore name servers cannot contain multiple ASCII
   encodings of the same domain name.

   [RFC2181] explicitly allows domain labels to contain octets beyond
   the ASCII range (0..7F), and this document does not change that.
   Note, however, that there is no defined interpretation of octets
   80..FF as characters.  If labels containing these octets are returned
   to applications, unpredictable behavior could result.  The ASCII form
   defined by ToASCII is the only standard representation for
   internationalized labels in the current DNS protocol.

8. Root server considerations

   IDNs are likely to be somewhat longer than current domain names, so
   the bandwidth needed by the root servers is likely to go up by a
   small amount.  Also, queries and responses for IDNs will probably be
   somewhat longer than typical queries today, so more queries and
   responses may be forced to go to TCP instead of UDP.

9. References

9.1 Normative References

   [RFC2119]    Bradner, S., "Key words for use in RFCs to Indicate
                Requirement Levels", BCP 14, RFC 2119, March 1997.

   [STRINGPREP] Hoffman, P. and M. Blanchet, "Preparation of
                Internationalized Strings ("stringprep")", RFC 3454,
                December 2002.

   [NAMEPREP]   Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
                Profile for Internationalized Domain Names (IDN)", RFC
                3491, March 2003.

   [PUNYCODE]   Costello, A., "Punycode: A Bootstring encoding of
                Unicode for use with Internationalized Domain Names in
                Applications (IDNA)", RFC 3492, March 2003.

   [STD3]       Braden, R., "Requirements for Internet Hosts --
                Communication Layers", STD 3, RFC 1122, and
                "Requirements for Internet Hosts -- Application and
                Support", STD 3, RFC 1123, October 1989.

   [STD13]      Mockapetris, P., "Domain names - concepts and
                facilities", STD 13, RFC 1034 and "Domain names -
                implementation and specification", STD 13, RFC 1035,
                November 1987.

9.2 Informative References

   [RFC2535]    Eastlake, D., "Domain Name System Security Extensions",
                RFC 2535, March 1999.

   [RFC2181]    Elz, R. and R. Bush, "Clarifications to the DNS
                Specification", RFC 2181, July 1997.

   [UAX9]       Unicode Standard Annex #9, The Bidirectional Algorithm,
                <http://www.unicode.org/unicode/reports/tr9/>.

   [UNICODE]    The Unicode Consortium.  The Unicode Standard, Version
                3.2.0 is defined by The Unicode Standard, Version 3.0
                (Reading, MA, Addison-Wesley, 2000.  ISBN 0-201-61633-5),
                as amended by the Unicode Standard Annex #27: Unicode
                3.1 (http://www.unicode.org/reports/tr27/) and by the
                Unicode Standard Annex #28: Unicode 3.2
                (http://www.unicode.org/reports/tr28/).

   [USASCII]    Cerf, V., "ASCII format for Network Interchange", RFC
                20, October 1969.

10. Security Considerations

   Security on the Internet partly relies on the DNS.  Thus, any change
   to the characteristics of the DNS can change the security of much of
   the Internet.

   This memo describes an algorithm which encodes characters that are
   not valid according to STD3 and STD13 into octet values that are
   valid.  No security issues such as string length increases or new
   allowed values are introduced by the encoding process or the use of
   these encoded values, apart from those introduced by the ACE encoding
   itself.

   Domain names are used by users to identify and connect to Internet
   servers.  The security of the Internet is compromised if a user
   entering a single internationalized name is connected to different
   servers based on different interpretations of the internationalized
   domain name.

   When systems use local character sets other than ASCII and Unicode,
   this specification leaves the problem of transcoding between the
   local character set and Unicode up to the application.  If different
   applications (or different versions of one application) implement
   different transcoding rules, they could interpret the same name
   differently and contact different servers.  This problem is not
   solved by security protocols like TLS that do not take local
   character sets into account.

   Because this document normatively refers to [NAMEPREP], [PUNYCODE],
   and [STRINGPREP], it includes the security considerations from those
   documents as well.
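
   The transcoding hazard described earlier in this section can be
   illustrated with a short sketch.  The same octets are transcoded
   under two different assumptions about the local character set and
   then converted with Python's standard "idna" codec.  The sample
   octets, the two charsets chosen, and the helper name
   ace_from_local_bytes are illustrative assumptions only.

```python
def ace_from_local_bytes(raw: bytes, local_charset: str) -> str:
    """Transcode locally encoded input to Unicode, then convert to ACE."""
    return raw.decode(local_charset).encode("idna").decode("ascii")


# One sequence of octets as it might arrive from a local input method.
typed = b"b\xfccher.example"

# Octet 0xFC is U+00FC (u with diaeresis) in ISO-8859-1 ...
as_latin1 = ace_from_local_bytes(typed, "iso-8859-1")
# ... but a Cyrillic letter in ISO-8859-5, giving a different name.
as_cyrillic = ace_from_local_bytes(typed, "iso-8859-5")

names_agree = as_latin1 == as_cyrillic  # False: two different ACE names
```

   Two applications that disagree in this way would query different
   names for what the user regards as a single name.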

   If or when this specification is updated to use a more recent Unicode
   normalization table, the new normalization table will need to be
   compared with the old to spot backwards incompatible changes.  If
   there are such changes, they will need to be handled somehow, or
   there will be security as well as operational implications.  Methods
   to handle the conflicts could include keeping the old normalization,
   or taking care of the conflicting characters by operational means, or
   some other method.

   Implementations MUST NOT use more recent normalization tables than
   the one referenced from this document, even though more recent tables
   may be provided by operating systems.  If an application is unsure of
   which version of the normalization tables is in the operating
   system, the application needs to include the normalization tables
   itself.  Using normalization tables other than the one referenced
   from this specification could have security and operational
   implications.

   To help prevent confusion between characters that are visually
   similar, it is suggested that implementations provide visual
   indications where a domain name contains multiple scripts.  Such
   mechanisms can also be used to show when a name contains a mixture of
   simplified and traditional Chinese characters, or to distinguish zero
   and one from O and l.  DNS zone administrators may impose
   restrictions (subject to the limitations in section 2) that try to
   minimize homographs.

   Domain names (or portions of them) are sometimes compared against a
   set of privileged or anti-privileged domains.  In such situations it
   is especially important that the comparisons be done properly, as
   specified in section 3.1 requirement 4.  For labels already in ASCII
   form, the proper comparison reduces to the same case-insensitive
   ASCII comparison that has always been used for ASCII labels.

   The introduction of IDNA means that any existing labels that start
   with the ACE prefix and would be altered by ToUnicode will
   automatically be ACE labels, and will be considered equivalent to
   non-ASCII labels, whether or not that was the intent of the zone
   administrator or registrant.

11. IANA Considerations

   IANA has assigned the ACE prefix in consultation with the IESG.

12. Authors' Addresses

   Patrik Faltstrom
   Cisco Systems
   Arstaangsvagen 31 J
   S-117 43 Stockholm  Sweden

   EMail: paf@cisco.com


   Paul Hoffman
   Internet Mail Consortium and VPN Consortium
   127 Segre Place
   Santa Cruz, CA  95060  USA

   EMail: phoffman@imc.org


   Adam M. Costello
   University of California, Berkeley

   URL: http://www.nicemice.net/amc/

13. Full Copyright Statement

   Copyright (C) The Internet Society (2003).  All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works.  However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

   Funding for the RFC Editor function is currently provided by the
   Internet Society.