Network Working Group                                       P. Faltstrom
Request for Comments: 3490                                         Cisco
Category: Standards Track                                     P. Hoffman
                                                              IMC & VPNC
                                                             A. Costello
                                                             UC Berkeley
                                                              March 2003


         Internationalizing Domain Names in Applications (IDNA)

Status of this Memo

   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements.  Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol.  Distribution of this memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (2003).  All Rights Reserved.

Abstract

   Until now, there has been no standard method for domain names to use
   characters outside the ASCII repertoire.  This document defines
   internationalized domain names (IDNs) and a mechanism called
   Internationalizing Domain Names in Applications (IDNA) for handling
   them in a standard fashion.  IDNs use characters drawn from a large
   repertoire (Unicode), but IDNA allows the non-ASCII characters to be
   represented using only the ASCII characters already allowed in
   so-called host names today.  This backward-compatible representation
   is required in existing protocols like DNS, so that IDNs can be
   introduced with no changes to the existing infrastructure.  IDNA is
   only meant for processing domain names, not free text.

Table of Contents

   1. Introduction
      1.1 Problem Statement
      1.2 Limitations of IDNA
      1.3 Brief overview for application developers
   2. Terminology
   3. Requirements and applicability
      3.1 Requirements
      3.2 Applicability
         3.2.1. DNS resource records
         3.2.2. Non-domain-name data types stored in domain names
   4. Conversion operations
      4.1 ToASCII
      4.2 ToUnicode
   5. ACE prefix
   6. Implications for typical applications using DNS
      6.1 Entry and display in applications
      6.2 Applications and resolver libraries
      6.3 DNS servers
      6.4 Avoiding exposing users to the raw ACE encoding
      6.5 DNSSEC authentication of IDN domain names
   7. Name server considerations
   8. Root server considerations
   9. References
      9.1 Normative References
      9.2 Informative References
   10. Security Considerations
   11. IANA Considerations
   12. Authors' Addresses
   13. Full Copyright Statement

1. Introduction

   IDNA works by allowing applications to use certain ASCII name labels
   (beginning with a special prefix) to represent non-ASCII name labels.
   Lower-layer protocols need not be aware of this; therefore IDNA does
   not depend on changes to any infrastructure.  In particular, IDNA
   does not depend on any changes to DNS servers, resolvers, or protocol
   elements, because the ASCII name service provided by the existing DNS
   is entirely sufficient for IDNA.
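
   As a non-normative illustration, the following Python sketch shows a
   non-ASCII label next to its ASCII form.  It assumes the "idna" codec
   from the Python standard library, which implements the conversions
   of this specification.  The name used is an arbitrary example, and
   "xn--" is the special prefix specified in section 5.

```python
# Non-normative sketch using the Python standard library "idna" codec.
# "b\u00fccher" is "bucher" with U+00FC (LATIN SMALL LETTER U WITH
# DIAERESIS) as its second character; the name is only an example.
name = "b\u00fccher.example"

# ASCII form: this is all that existing DNS software ever sees.
ascii_form = name.encode("idna")
print(ascii_form)                    # b'xn--bcher-kva.example'

# An IDNA-aware application can recover the non-ASCII form for display.
round_trip = ascii_form.decode("idna")
print(round_trip == name)            # True
```

   Only the application performs these conversions; the resolver and
   the DNS servers handle the ASCII form like any other name.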

   This document does not require any applications to conform to IDNA,
   but applications can elect to use IDNA in order to support IDN while
   maintaining interoperability with existing infrastructure.  If an
   application wants to use non-ASCII characters in domain names, IDNA
   is the only currently-defined option.  Adding IDNA support to an
   existing application entails changes to the application only, and
   leaves room for flexibility in the user interface.

   A great deal of the discussion of IDN solutions has focused on
   transition issues and how IDN will work in a world where not all of
   the components have been updated.  Proposals that were not chosen by
   the IDN Working Group would depend on user applications, resolvers,
   and DNS servers being updated in order for a user to use an
   internationalized domain name.  Rather than rely on widespread
   updating of all components, IDNA depends on updates to user
   applications only; no changes are needed to the DNS protocol or any
   DNS servers or the resolvers on users' computers.

1.1 Problem Statement

   The IDNA specification solves the problem of extending the repertoire
   of characters that can be used in domain names to include the Unicode
   repertoire (with some restrictions).

   IDNA does not extend the service offered by DNS to the applications.
   Instead, the applications (and, by implication, the users) continue
   to see an exact-match lookup service.  Either there is a single
   exactly-matching name or there is no match.  This model has served
   the existing applications well, but it requires, with or without
   internationalized domain names, that users know the exact spelling of
   the domain names that the users type into applications such as web
   browsers and mail user agents.  The introduction of the larger
   repertoire of characters potentially makes the set of misspellings
   larger, especially given that in some cases the same appearance, for
   example on a business card, might visually match several Unicode code
   points or several sequences of code points.

   IDNA allows the graceful introduction of IDNs not only by avoiding
   upgrades to existing infrastructure (such as DNS servers and mail
   transport agents), but also by allowing some rudimentary use of IDNs
   in applications by using the ASCII representation of the non-ASCII
   name labels.  While such names are very user-unfriendly to read and
   type, and hence are not suitable for user input, they allow (for
   instance) replying to email and clicking on URLs even though the
   domain name displayed is incomprehensible to the user.  In order to
   allow user-friendly input and output of the IDNs, the applications
   need to be modified to conform to this specification.

   IDNA uses the Unicode character repertoire, which avoids the
   significant delays that would be inherent in waiting for a different
   and specific character set to be defined for IDN purposes by some
   other standards developing organization.

1.2 Limitations of IDNA

   The IDNA protocol does not solve all linguistic issues with users
   inputting names in different scripts.  Many important language-based
   and script-based mappings are not covered in IDNA and need to be
   handled outside the protocol.  For example, names that are entered in
   a mix of traditional and simplified Chinese characters will not be
   mapped to a single canonical name.  Another example is that
   Scandinavian names entered with U+00F6 (LATIN SMALL LETTER O WITH
   DIAERESIS) will not be mapped to U+00F8 (LATIN SMALL LETTER O WITH
   STROKE).
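
   The second example can be checked with a short, non-normative Python
   sketch.  It assumes the ToASCII function of the standard library
   module encodings.idna; the two labels are hypothetical and differ
   only in their last character.

```python
from encodings.idna import ToASCII

# Hypothetical labels that differ only in U+00F6 versus U+00F8.
with_diaeresis = "malm\u00f6"   # LATIN SMALL LETTER O WITH DIAERESIS
with_stroke = "malm\u00f8"      # LATIN SMALL LETTER O WITH STROKE

# Each label gets its own ASCII form; no mapping unifies the two.
print(ToASCII(with_diaeresis))
print(ToASCII(with_stroke))

same = ToASCII(with_diaeresis).lower() == ToASCII(with_stroke).lower()
print(same)                     # False: the labels are not equivalent
```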

   An example of an important issue that is not considered in detail in
   IDNA is how to provide a high probability that a user who is entering
   a domain name based on visual information (such as from a business
   card or billboard) or aural information (such as from a telephone or
   radio) would correctly enter the IDN.  Similar issues exist for ASCII
   domain names, for example the possible visual confusion between the
   letter 'O' and the digit zero, but the introduction of the larger
   repertoire of characters creates more opportunities for
   similar-looking and similar-sounding names.  Note that this is a
   complex issue relating to languages, input methods on computers, and
   so on.  Furthermore, the kind of matching and searching necessary for
   a high probability of success would not fit the role of the DNS and
   its exact matching function.

1.3 Brief overview for application developers

   Applications can use IDNA to support internationalized domain names
   anywhere that ASCII domain names are already supported, including DNS
   master files and resolver interfaces.  (Applications can also define
   protocols and interfaces that support IDNs directly using non-ASCII
   representations.  IDNA does not prescribe any particular
   representation for new protocols, but it still defines which names
   are valid and how they are compared.)

   The IDNA protocol is contained completely within applications.  It is
   not a client-server or peer-to-peer protocol: everything is done
   inside the application itself.  When used with a DNS resolver
   library, IDNA is inserted as a "shim" between the application and the
   resolver library.  When used for writing names into a DNS zone, IDNA
   is used just before the name is committed to the zone.

   There are two operations described in section 4 of this document:

   -  The ToASCII operation is used before sending an IDN to something
      that expects ASCII names (such as a resolver) or writing an IDN
      into a place that expects ASCII names (such as a DNS master file).

   -  The ToUnicode operation is used when displaying names to users,
      for example names obtained from a DNS zone.

   It is important to note that the ToASCII operation can fail.  If it
   fails when processing a domain name, that domain name cannot be used
   as an internationalized domain name and the application has to have
   some method of dealing with this failure.
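
   The two operations and the failure case can be sketched as follows.
   This is a non-normative example that assumes the ToASCII and
   ToUnicode functions of the Python standard library module
   encodings.idna; the wrapper names label_for_resolver and
   label_for_display are hypothetical, and returning None is only one
   possible way for an application to report a failure.

```python
from encodings.idna import ToASCII, ToUnicode

def label_for_resolver(label):
    """Return the ASCII form of one label, or None if ToASCII fails."""
    try:
        return ToASCII(label).decode("ascii")
    except UnicodeError:
        # ToASCII failed: the label cannot be used in an IDN, and the
        # application decides how to report that.
        return None

def label_for_display(ascii_label):
    """Return the form of an ASCII label that is shown to the user."""
    try:
        return ToUnicode(ascii_label)
    except UnicodeError:
        # The library function raises on a malformed ACE label; this
        # sketch falls back to showing the label unchanged.
        return ascii_label

print(label_for_resolver("b\u00fccher"))                    # xn--bcher-kva
print(label_for_resolver("a" * 64))                         # None
print(label_for_display("xn--bcher-kva") == "b\u00fccher")  # True
```

   The second call fails because the label is longer than the 63 octets
   that a DNS label can hold.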

   IDNA requires that implementations process input strings with
   Nameprep [NAMEPREP], which is a profile of Stringprep [STRINGPREP],
   and then with Punycode [PUNYCODE].  Implementations of IDNA MUST
   fully implement Nameprep and Punycode; neither Nameprep nor Punycode
   is optional.
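
   The order of the two stages can be made visible with a non-normative
   Python sketch.  It assumes the nameprep function of the standard
   library module encodings.idna and the standard "punycode" codec; the
   input label is an arbitrary example, and the full ToASCII operation
   in section 4.1 performs additional checks that are omitted here.

```python
from encodings.idna import nameprep

label = "B\u00fccher"                   # mixed-case, non-ASCII input

# Stage 1, Nameprep: case folding, normalization, prohibited-character
# checks.
prepared = nameprep(label)

# Stage 2, Punycode: encode the prepared label using ASCII only.
encoded = prepared.encode("punycode")

# The ACE prefix from section 5 marks the result as an encoded label.
ace_label = b"xn--" + encoded

print(prepared == "b\u00fccher")        # True
print(ace_label)                        # b'xn--bcher-kva'
```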

2. Terminology

   The key words "MUST", "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED",
   and "MAY" in this document are to be interpreted as described in BCP
   14, RFC 2119 [RFC2119].

   A code point is an integer value associated with a character in a
   coded character set.

   Unicode [UNICODE] is a coded character set containing tens of
   thousands of characters.  A single Unicode code point is denoted by
   "U+" followed by four to six hexadecimal digits, while a range of
   Unicode code points is denoted by two hexadecimal numbers separated
   by "..", with no prefixes.

   ASCII means US-ASCII [USASCII], a coded character set containing 128
   characters associated with code points in the range 0..7F.  Unicode
   is an extension of ASCII: it includes all the ASCII characters and
   associates them with the same code points.

   The term "LDH code points" is defined in this document to mean the
   code points associated with ASCII letters, digits, and the
   hyphen-minus; that is, U+002D, 30..39, 41..5A, and 61..7A.  "LDH" is
   an abbreviation for "letters, digits, hyphen".
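
   The LDH ranges translate directly into code.  The following Python
   sketch is non-normative, and the function names is_ldh_code_point
   and is_ldh_label are hypothetical.

```python
def is_ldh_code_point(cp):
    """Return True if the integer cp is an LDH code point."""
    return (
        cp == 0x2D                  # hyphen-minus
        or 0x30 <= cp <= 0x39       # digits 0..9
        or 0x41 <= cp <= 0x5A       # letters A..Z
        or 0x61 <= cp <= 0x7A       # letters a..z
    )

def is_ldh_label(label):
    """Return True if every code point of the string is an LDH one."""
    return all(is_ldh_code_point(ord(ch)) for ch in label)

print(is_ldh_label("xn--bcher-kva"))    # True
print(is_ldh_label("under_score"))      # False: U+005F is not LDH
```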

   [STD13] talks about "domain names" and "host names", but many people
   use the terms interchangeably.  Further, because [STD13] was not
   terribly clear, many people who are sure they know the exact
   definitions of each of these terms disagree on the definitions.  In
   this document the term "domain name" is used in general.  This
   document explicitly cites [STD3] whenever referring to the host name
   syntax restrictions defined therein.

   A label is an individual part of a domain name.  Labels are usually
   shown separated by dots; for example, the domain name
   "www.example.com" is composed of three labels: "www", "example", and
   "com".  (The zero-length root label described in [STD13], which can
   be explicit as in "www.example.com." or implicit as in
   "www.example.com", is not considered a label in this specification.)
   IDNA extends the set of usable characters in labels that are text.
   For the rest of this document, the term "label" is shorthand for
   "text label", and "every label" means "every text label".

   An "internationalized label" is a label to which the ToASCII
   operation (see section 4) can be applied without failing (with the
   UseSTD3ASCIIRules flag unset).  This implies that every ASCII label
   that satisfies the [STD13] length restriction is an internationalized
   label.  Therefore the term "internationalized label" is a
   generalization, embracing both old ASCII labels and new non-ASCII
   labels.  Although most Unicode characters can appear in
   internationalized labels, ToASCII will fail for some input strings,
   and such strings are not valid internationalized labels.

   An "internationalized domain name" (IDN) is a domain name in which
   every label is an internationalized label.  This implies that every
   ASCII domain name is an IDN (which implies that it is possible for a
   name to be an IDN without it containing any non-ASCII characters).
   This document does not attempt to define an "internationalized host
   name".  Just as has been the case with ASCII names, some DNS zone
   administrators may impose restrictions, beyond those imposed by DNS
   or IDNA, on the characters or strings that may be registered as
   labels in their zones.  Such restrictions have no impact on the
   syntax or semantics of DNS protocol messages; a query for a name that
   matches no records will yield the same response regardless of the
   reason why it is not in the zone.  Clients issuing queries or
   interpreting responses cannot be assumed to have any knowledge of
   zone-specific restrictions or conventions.

   In IDNA, equivalence of labels is defined in terms of the ToASCII
   operation, which constructs an ASCII form for a given label, whether
   or not the label was already an ASCII label.  Labels are defined to
   be equivalent if and only if their ASCII forms produced by ToASCII
   match using a case-insensitive ASCII comparison.  ASCII labels
   already have a notion of equivalence: upper case and lower case are
   considered equivalent.  The IDNA notion of equivalence is an
   extension of that older notion.  Equivalent labels in IDNA are
   treated as alternate forms of the same label, just as "foo" and "Foo"
   are treated as alternate forms of the same label.
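
   This definition of equivalence can be sketched in a few lines of
   Python.  The example is non-normative and assumes the ToASCII
   function of the standard library module encodings.idna.  The name
   labels_equivalent is hypothetical, and treating a label for which
   ToASCII fails as matching nothing is a choice of this sketch.

```python
from encodings.idna import ToASCII

def labels_equivalent(a, b):
    """Compare two labels by their ASCII forms, ignoring ASCII case."""
    try:
        return ToASCII(a).lower() == ToASCII(b).lower()
    except UnicodeError:
        # ToASCII failed, so at least one input is not an
        # internationalized label.
        return False

print(labels_equivalent("foo", "Foo"))                    # True
print(labels_equivalent("B\u00dcCHER", "b\u00fccher"))    # True
print(labels_equivalent("xn--bcher-kva", "b\u00fccher"))  # True
print(labels_equivalent("bucher", "b\u00fccher"))         # False
```

   The last line shows that equivalence is not a "looks similar" test:
   U+0075 and U+00FC yield different ASCII forms.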

   To allow internationalized labels to be handled by existing
   applications, IDNA uses an "ACE label" (ACE stands for ASCII
   Compatible Encoding).  An ACE label is an internationalized label
   that can be rendered in ASCII and is equivalent to an
   internationalized label that cannot be rendered in ASCII.  Given any
   internationalized label that cannot be rendered in ASCII, the ToASCII
   operation will convert it to an equivalent ACE label (whereas an
   ASCII label will be left unaltered by ToASCII).  ACE labels are
   unsuitable for display to users.  The ToUnicode operation will
   convert any label to an equivalent non-ACE label.  In fact, an ACE
   label is formally defined to be any label that the ToUnicode
   operation would alter (whereas non-ACE labels are left unaltered by
   ToUnicode).  Every ACE label begins with the ACE prefix specified in
   section 5.  The ToASCII and ToUnicode operations are specified in
   section 4.

   The "ACE prefix" is defined in this document to be a string of ASCII
   characters that appears at the beginning of every ACE label.  It is
   specified in section 5.
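
   The formal definition of an ACE label can be turned into a test, as
   in the following non-normative Python sketch.  It assumes the
   ToUnicode function of the standard library module encodings.idna,
   which raises an exception where section 4.2 of this document returns
   the input unchanged; the wrapper to_unicode restores that behavior.
   The names to_unicode and is_ace_label are hypothetical.

```python
from encodings.idna import ToUnicode

def to_unicode(label):
    """ToUnicode with this document's rule: on failure, return the input."""
    try:
        return ToUnicode(label)
    except UnicodeError:
        return label

def is_ace_label(label):
    """An ACE label is any label that ToUnicode would alter."""
    return to_unicode(label) != label

print(is_ace_label("xn--bcher-kva"))    # True
print(is_ace_label("example"))          # False: plain ASCII label
print(is_ace_label("b\u00fccher"))      # False: already non-ACE
```

   Checking only for the prefix would not be enough under this
   definition, because a label can start with the prefix and still fail
   to decode, in which case ToUnicode leaves it unaltered.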

   A "domain name slot" is defined in this document to be a protocol
   element or a function argument or a return value (and so on)
   explicitly designated for carrying a domain name.  Examples of domain
   name slots include: the QNAME field of a DNS query; the name argument
   of the gethostbyname() library function; the part of an email address
   following the at-sign (@) in the From: field of an email message
   header; and the host portion of the URI in the src attribute of an
   HTML <IMG> tag.  General text that just happens to contain a domain
   name is not a domain name slot; for example, a domain name appearing
   in the plain text body of an email message is not occupying a domain
   name slot.

   An "IDN-aware domain name slot" is defined in this document to be a
   domain name slot explicitly designated for carrying an
   internationalized domain name as defined in this document.  The
   designation may be static (for example, in the specification of the
   protocol or interface) or dynamic (for example, as a result of
   negotiation in an interactive session).

   An "IDN-unaware domain name slot" is defined in this document to be
   any domain name slot that is not an IDN-aware domain name slot.
   Obviously, this includes any domain name slot whose specification
   predates IDNA.

3. Requirements and applicability

3.1 Requirements

   IDNA conformance means adherence to the following four requirements:

   1) Whenever dots are used as label separators, the following
      characters MUST be recognized as dots: U+002E (full stop), U+3002
      (ideographic full stop), U+FF0E (fullwidth full stop), U+FF61
      (halfwidth ideographic full stop).

   2) Whenever a domain name is put into an IDN-unaware domain name slot
      (see section 2), it MUST contain only ASCII characters.  Given an
      internationalized domain name (IDN), an equivalent domain name
      satisfying this requirement can be obtained by applying the
      ToASCII operation (see section 4) to each label and, if dots are
      used as label separators, changing all the label separators to
      U+002E.

   3) ACE labels obtained from domain name slots SHOULD be hidden from
      users when it is known that the environment can handle the non-ACE
      form, except when the ACE form is explicitly requested.  When it
      is not known whether or not the environment can handle the non-ACE
      form, the application MAY use the non-ACE form (which might fail,
      such as by not being displayed properly), or it MAY use the ACE
      form (which will look unintelligible to the user).  Given an
      internationalized domain name, an equivalent domain name
      containing no ACE labels can be obtained by applying the ToUnicode
      operation (see section 4) to each label.  When requirements 2 and
      3 both apply, requirement 2 takes precedence.

   4) Whenever two labels are compared, they MUST be considered to match
      if and only if they are equivalent, that is, their ASCII forms
      (obtained by applying ToASCII) match using a case-insensitive
      ASCII comparison.  Whenever two names are compared, they MUST be
      considered to match if and only if their corresponding labels
      match, regardless of whether the names use the same forms of label
      separators.
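
   Requirements 1 and 4 can be sketched together.  The following Python
   example is non-normative and assumes the ToASCII function of the
   standard library module encodings.idna.  The names DOTS,
   split_labels, and names_match are hypothetical.  Dropping the empty
   string after a final dot, and treating a label for which ToASCII
   fails as matching nothing, are assumptions of this sketch.

```python
import re
from encodings.idna import ToASCII

# The four code points that requirement 1 recognizes as dots.
DOTS = re.compile("[\u002e\u3002\uff0e\uff61]")

def split_labels(name):
    """Split a domain name into labels on any of the four dots."""
    labels = DOTS.split(name)
    if labels and labels[-1] == "":
        labels.pop()        # the root after a final dot is not a label
    return labels

def names_match(a, b):
    """Requirement 4: corresponding labels must all be equivalent."""
    la, lb = split_labels(a), split_labels(b)
    if len(la) != len(lb):
        return False
    try:
        return all(ToASCII(x).lower() == ToASCII(y).lower()
                   for x, y in zip(la, lb))
    except UnicodeError:
        return False        # ToASCII failed on some label

print(split_labels("www\u3002example\uff0ecom"))
print(names_match("B\u00fccher\u3002Example", "xn--bcher-kva.example"))
```

   The second call returns True although the two names use different
   separators, different case, and different forms of the first label.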

3.2 Applicability

   IDNA is applicable to all domain names in all domain name slots
   except where it is explicitly excluded.

   This implies that IDNA is applicable to many protocols that predate
   IDNA.  Note that IDNs occupying domain name slots in those protocols
   MUST be in ASCII form (see section 3.1, requirement 2).

3.2.1. DNS resource records

   IDNA does not apply to domain names in the NAME and RDATA fields of
   DNS resource records whose CLASS is not IN.  This exclusion applies
   to every non-IN class, present and future, except where future
   standards override this exclusion by explicitly inviting the use of
   IDNA.

   There are currently no other exclusions on the applicability of IDNA
   to DNS resource records; it depends entirely on the CLASS, and not on
   the TYPE.  This will remain true, even as new types are defined,
   unless there is a compelling reason for a new type to complicate
   matters by imposing type-specific rules.

3.2.2. Non-domain-name data types stored in domain names

   Although IDNA enables the representation of non-ASCII characters in
   domain names, that does not imply that IDNA enables the
   representation of non-ASCII characters in other data types that are
   stored in domain names.  For example, an email address local part is
   sometimes stored in a domain label (hostmaster@example.com would be
   represented as hostmaster.example.com in the RDATA field of an SOA
   record).  IDNA does not update the existing email standards, which
   allow only ASCII characters in local parts.  Therefore, unless the
   email standards are revised to invite the use of IDNA for local
   parts, a domain label that holds the local part of an email address
   SHOULD NOT begin with the ACE prefix, and even if it does, it is to
   be interpreted literally as a local part that happens to begin with
   the ACE prefix.

4. Conversion operations

   An application converts a domain name when the name is put into an
   IDN-unaware slot or displayed to a user.  This section specifies the
   steps to perform in the conversion, and the ToASCII and ToUnicode
   operations.

   The input to ToASCII or ToUnicode is a single label that is a
   sequence of Unicode code points (remember that all ASCII code points
   are also Unicode code points).  If a domain name is represented using
   a character set other than Unicode or US-ASCII, it will first need to
   be transcoded to Unicode.

   Starting from a whole domain name, the steps that an application
   takes to do the conversions are:

   1) Decide whether the domain name is a "stored string" or a "query
      string" as described in [STRINGPREP].  If this conversion follows
      the "queries" rule from [STRINGPREP], set the flag called
      "AllowUnassigned".

   2) Split the domain name into individual labels as described in
      section 3.1.  The labels do not include the separator.

   3) For each label, decide whether or not to enforce the restrictions
      on ASCII characters in host names [STD3].  (Applications already
      faced this choice before the introduction of IDNA, and can
      continue to make the decision the same way they always have; IDNA
      makes no new recommendations regarding this choice.)  If the
      restrictions are to be enforced, set the flag called
      "UseSTD3ASCIIRules" for that label.

   4) Process each label with either the ToASCII or the ToUnicode
      operation as appropriate.  Typically, you use the ToASCII
      operation if you are about to put the name into an IDN-unaware
      slot, and you use the ToUnicode operation if you are displaying
      the name to a user; section 3.1 gives greater detail on the
      applicable requirements.

   5) If ToASCII was applied in step 4 and dots are used as label
      separators, change all the label separators to U+002E (full stop).

   The following two subsections define the ToASCII and ToUnicode
   operations that are used in step 4.
523226031Sstas
524226031Sstas   This description of the protocol uses specific procedure names, names
525226031Sstas   of flags, and so on, in order to facilitate the specification of the
526226031Sstas   protocol.  These names, as well as the actual steps of the
527226031Sstas   procedures, are not required of an implementation.  In fact, any
528226031Sstas   implementation which has the same external behavior as specified in
529226031Sstas   this document conforms to this specification.
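
   As a non-normative illustration, steps 2, 4, and 5 above can be
   sketched in Python.  The function name domain_to_ascii is
   hypothetical.  The sketch relies on the ToASCII helper in Python's
   standard encodings.idna module, which treats names as query strings
   (AllowUnassigned set) and does not apply UseSTD3ASCIIRules, so steps
   1 and 3 are fixed by that library rather than chosen here.  Passing a
   single trailing empty label through unchanged is an assumption of
   this sketch, not a rule of this document.

```python
import re
from encodings import idna

# Label separators recognized by IDNA (section 3.1):
# U+002E, U+3002, U+FF0E, and U+FF61.
_DOTS = re.compile("[\u002E\u3002\uFF0E\uFF61]")


def domain_to_ascii(name):
    """Hypothetical helper: convert a whole domain name label by label."""
    # Step 2: split into labels; the separators are not part of the labels.
    labels = _DOTS.split(name)

    # Assumption of this sketch: one trailing empty label (a name written
    # with a final dot) is passed through rather than given to ToASCII.
    trailing_root = len(labels) > 1 and labels[-1] == ""
    if trailing_root:
        labels = labels[:-1]

    # Step 4: apply ToASCII to each label.  The stdlib helper returns bytes
    # and raises UnicodeError if any step of ToASCII fails.
    ascii_labels = [idna.ToASCII(label).decode("ascii") for label in labels]

    # Step 5: every separator becomes U+002E (full stop).
    result = ".".join(ascii_labels)
    return result + "." if trailing_root else result
```

   For example, domain_to_ascii("bücher\u3002example") yields
   "xn--bcher-kva.example", while a name that is already ASCII is
   returned unchanged.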

4.1 ToASCII

   The ToASCII operation takes a sequence of Unicode code points that
   make up one label and transforms it into a sequence of code points in
   the ASCII range (0..7F).  If ToASCII succeeds, the original sequence
   and the resulting sequence are equivalent labels.

   It is important to note that the ToASCII operation can fail.  ToASCII
   fails if any step of it fails.  If any step of the ToASCII operation
   fails on any label in a domain name, that domain name MUST NOT be
   used as an internationalized domain name.  The method for dealing
   with this failure is application-specific.

   The inputs to ToASCII are a sequence of code points, the
   AllowUnassigned flag, and the UseSTD3ASCIIRules flag.  The output of
   ToASCII is either a sequence of ASCII code points or a failure
   condition.

   ToASCII never alters a sequence of code points that are all in the
   ASCII range to begin with (although it could fail).  Applying the
   ToASCII operation multiple times has exactly the same effect as
   applying it just once.

   ToASCII consists of the following steps:

   1. If the sequence contains any code points outside the ASCII range
      (0..7F) then proceed to step 2, otherwise skip to step 3.

   2. Perform the steps specified in [NAMEPREP] and fail if there is an
      error.  The AllowUnassigned flag is used in [NAMEPREP].

   3. If the UseSTD3ASCIIRules flag is set, then perform these checks:

     (a) Verify the absence of non-LDH ASCII code points; that is, the
         absence of 0..2C, 2E..2F, 3A..40, 5B..60, and 7B..7F.

     (b) Verify the absence of leading and trailing hyphen-minus; that
         is, the absence of U+002D at the beginning and end of the
         sequence.

   4. If the sequence contains any code points outside the ASCII range
      (0..7F) then proceed to step 5, otherwise skip to step 8.

   5. Verify that the sequence does NOT begin with the ACE prefix.

   6. Encode the sequence using the encoding algorithm in [PUNYCODE] and
      fail if there is an error.

   7. Prepend the ACE prefix.

   8. Verify that the number of code points is in the range 1 to 63
      inclusive.
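
   The eight steps can be followed one by one in a short Python sketch.
   This is an illustration under stated assumptions, not a reference
   implementation: the names to_ascii and ToASCIIError are hypothetical,
   Nameprep is delegated to encodings.idna.nameprep from the Python
   standard library, and the Punycode step uses Python's "punycode"
   codec.  Because that Nameprep helper assumes query strings, the
   sketch adds its own check against stringprep table A.1 when
   AllowUnassigned is not set; placing that check after Nameprep is a
   simplification.

```python
import string
import stringprep
from encodings.idna import nameprep

ACE_PREFIX = "xn--"
LDH = set(string.ascii_letters + string.digits + "-")


class ToASCIIError(ValueError):
    """Hypothetical exception: some step of the ToASCII sketch failed."""


def to_ascii(label, allow_unassigned=False, use_std3_ascii_rules=False):
    # Steps 1-2: run Nameprep only if a non-ASCII code point is present.
    if any(ord(c) > 0x7F for c in label):
        try:
            label = nameprep(label)
        except UnicodeError as exc:
            raise ToASCIIError("Nameprep failed") from exc
        # The stdlib helper behaves as if AllowUnassigned were set, so the
        # stored-string check for unassigned code points is added here.
        if not allow_unassigned and any(
            stringprep.in_table_a1(c) for c in label
        ):
            raise ToASCIIError("unassigned code point")

    # Step 3: optional host name restrictions on the ASCII code points.
    if use_std3_ascii_rules:
        if any(ord(c) <= 0x7F and c not in LDH for c in label):
            raise ToASCIIError("non-LDH ASCII code point")
        if label.startswith("-") or label.endswith("-"):
            raise ToASCIIError("leading or trailing hyphen-minus")

    # Steps 4-7: encode only if non-ASCII code points remain.
    if any(ord(c) > 0x7F for c in label):
        if label[:4].lower() == ACE_PREFIX:
            raise ToASCIIError("label already begins with the ACE prefix")
        try:
            label = ACE_PREFIX + label.encode("punycode").decode("ascii")
        except UnicodeError as exc:
            raise ToASCIIError("Punycode encoding failed") from exc

    # Step 8: the result must be 1 to 63 code points long.
    if not 1 <= len(label) <= 63:
        raise ToASCIIError("length out of range")
    return label
```

   With this sketch, to_ascii("bücher") returns "xn--bcher-kva", and
   applying to_ascii to that result again leaves it unchanged, in line
   with the statement that repeated application has the same effect as a
   single one.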

4.2 ToUnicode

   The ToUnicode operation takes a sequence of Unicode code points that
   make up one label and returns a sequence of Unicode code points.  If
   the input sequence is a label in ACE form, then the result is an
   equivalent internationalized label that is not in ACE form, otherwise
   the original sequence is returned unaltered.

   ToUnicode never fails.  If any step fails, then the original input
   sequence is returned immediately in that step.

   The ToUnicode output never contains more code points than its input.
   Note that the number of octets needed to represent a sequence of code
   points depends on the particular character encoding used.

   The inputs to ToUnicode are a sequence of code points, the
   AllowUnassigned flag, and the UseSTD3ASCIIRules flag.  The output of
   ToUnicode is always a sequence of Unicode code points.

   ToUnicode consists of the following steps:

   1. If all code points in the sequence are in the ASCII range (0..7F)
      then skip to step 3.

   2. Perform the steps specified in [NAMEPREP] and fail if there is an
      error.  (If step 3 of ToASCII is also performed here, it will not
      affect the overall behavior of ToUnicode, but it is not
      necessary.)  The AllowUnassigned flag is used in [NAMEPREP].

   3. Verify that the sequence begins with the ACE prefix, and save a
      copy of the sequence.

   4. Remove the ACE prefix.

   5. Decode the sequence using the decoding algorithm in [PUNYCODE] and
      fail if there is an error.  Save a copy of the result of this
      step.

   6. Apply ToASCII.

   7. Verify that the result of step 6 matches the saved copy from step
      3, using a case-insensitive ASCII comparison.

   8. Return the saved copy from step 5.
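
   A Python sketch of these steps follows.  The function name to_unicode
   is hypothetical.  Nameprep and the ToASCII call in step 6 are
   delegated to the nameprep and ToASCII helpers of Python's standard
   encodings.idna module, which fix both flags (query strings assumed,
   no STD3 checks), so the flags do not appear as parameters.  Every
   failure is turned into "return the original input", which is how the
   sketch models the rule that ToUnicode never fails.

```python
from encodings.idna import ToASCII, nameprep

ACE_PREFIX = "xn--"


def to_unicode(label):
    """Hypothetical helper following the ToUnicode steps for one label."""
    original = label
    try:
        # Steps 1-2: Nameprep is needed only for non-ASCII input.
        if any(ord(c) > 0x7F for c in label):
            label = nameprep(label)

        # Step 3: the sequence must begin with the ACE prefix
        # (recognized case-insensitively); keep a copy for step 7.
        if label[:4].lower() != ACE_PREFIX:
            return original
        saved_ace = label

        # Steps 4-5: drop the prefix and decode the rest with Punycode.
        # Non-ASCII input makes encode("ascii") raise, which counts as
        # a failed step.
        decoded = label[4:].encode("ascii").decode("punycode")

        # Steps 6-7: re-encode and compare with the saved copy,
        # ignoring ASCII case.
        if ToASCII(decoded).decode("ascii").lower() != saved_ace.lower():
            return original
    except UnicodeError:
        # Any failing step returns the input unaltered.
        return original

    # Step 8: return the decoded sequence saved in step 5.
    return decoded
```

   For instance, to_unicode("xn--bcher-kva") returns "bücher", while
   "xn--abc-" is returned unchanged: it decodes to the ASCII label
   "abc", whose ToASCII form does not match the saved copy in step 7.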

5. ACE prefix

   The ACE prefix, used in the conversion operations (section 4), is two
   alphanumeric ASCII characters followed by two hyphen-minuses.  It
   cannot be any of the prefixes already used in earlier documents,
   which includes the following: "bl--", "bq--", "dq--", "lq--", "mq--",
   "ra--", "wq--" and "zq--".  The ToASCII and ToUnicode operations MUST
   recognize the ACE prefix in a case-insensitive manner.

   The ACE prefix for IDNA is "xn--" or any capitalization thereof.

   This means that an ACE label might be "xn--de-jg4avhby1noc0d", where
   "de-jg4avhby1noc0d" is the part of the ACE label that is generated by
   the encoding steps in [PUNYCODE].

   While all ACE labels begin with the ACE prefix, not all labels
   beginning with the ACE prefix are necessarily ACE labels.  Non-ACE
   labels that begin with the ACE prefix will confuse users and SHOULD
   NOT be allowed in DNS zones.
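
   The difference between "begins with the ACE prefix" and "is an ACE
   label" can be checked mechanically, for example by a zone
   administrator screening candidate labels.  The sketch below is one
   possible approach; the function name is_ace_label is hypothetical.
   It uses the ToUnicode helper from Python's standard encodings.idna
   module, which, unlike the operation defined in section 4.2, raises
   UnicodeError instead of returning its input when a step fails.

```python
from encodings.idna import ToUnicode

ACE_PREFIX = "xn--"


def is_ace_label(label):
    """Hypothetical check: True only for labels that are valid ACE labels."""
    # The prefix is recognized in a case-insensitive manner.
    if label[:4].lower() != ACE_PREFIX:
        return False
    try:
        # The stdlib helper signals a failed step (bad Punycode, or a
        # result that does not round-trip through ToASCII) by raising.
        ToUnicode(label)
    except UnicodeError:
        return False
    return True
```

   Under this sketch "xn--bcher-kva" is an ACE label, whereas "xn--abc-"
   merely begins with the prefix and is the kind of label that should
   not be allowed in a zone.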

6. Implications for typical applications using DNS

   In IDNA, applications perform the processing needed to input
   internationalized domain names from users, display internationalized
   domain names to users, and process the inputs and outputs from DNS
   and other protocols that carry domain names.

   The components and interfaces between them can be represented
   pictorially as:

                    +------+
                    | User |
                    +------+
                       ^
                       | Input and display: local interface methods
                       | (pen, keyboard, glowing phosphorus, ...)
   +-------------------|-------------------------------+
   |                   v                               |
   |          +-----------------------------+          |
   |          |        Application          |          |
   |          |   (ToASCII and ToUnicode    |          |
   |          |      operations may be      |          |
   |          |        called here)         |          |
   |          +-----------------------------+          |
   |                   ^        ^                      | End system
   |                   |        |                      |
   | Call to resolver: |        | Application-specific |
   |              ACE  |        | protocol:            |
   |                   v        | ACE unless the       |
   |           +----------+     | protocol is updated  |
   |           | Resolver |     | to handle other      |
   |           +----------+     | encodings            |
   |                 ^          |                      |
   +-----------------|----------|----------------------+
       DNS protocol: |          |
                 ACE |          |
                     v          v
          +-------------+    +---------------------+
          | DNS servers |    | Application servers |
          +-------------+    +---------------------+

   The box labeled "Application" is where the application splits a
   domain name into labels, sets the appropriate flags, and performs the
   ToASCII and ToUnicode operations.  This is described in section 4.

6.1 Entry and display in applications

   Applications can accept domain names using any character set or sets
   desired by the application developer, and can display domain names in
   any charset.  That is, the IDNA protocol does not affect the
   interface between users and applications.

   An IDNA-aware application can accept and display internationalized
   domain names in two formats: the internationalized character set(s)
   supported by the application, and as an ACE label.  ACE labels that
   are displayed or input MUST always include the ACE prefix.
   Applications MAY allow input and display of ACE labels, but are not
   encouraged to do so except as an interface for special purposes,
   possibly for debugging, or to cope with display limitations as
   described in section 6.4.  ACE encoding is opaque and ugly, and
   should thus only be exposed to users who absolutely need it.  Because
   name labels encoded as ACE name labels can be rendered either as the
   encoded ASCII characters or the proper decoded characters, the
   application MAY have an option for the user to select the preferred
   method of display; if it does, rendering the ACE SHOULD NOT be the
   default.

   Domain names are often stored and transported in many places.  For
   example, they are part of documents such as mail messages and web
   pages.  They are transported in many parts of many protocols, such as
   both the control commands and the RFC 2822 body parts of SMTP, and
   the headers and the body content in HTTP.  It is important to
   remember that domain names appear both in domain name slots and in
   the content that is passed over protocols.

   In protocols and document formats that define how to handle
   specification or negotiation of charsets, labels can be encoded in
   any charset allowed by the protocol or document format.  If a
   protocol or document format only allows one charset, the labels MUST
   be given in that charset.

   In any place where a protocol or document format allows transmission
   of the characters in internationalized labels, internationalized
   labels SHOULD be transmitted using whatever character encoding and
   escape mechanism the protocol or document format uses at that
   place.

   All protocols that use domain name slots already have the capacity
   for handling domain names in the ASCII charset.  Thus, ACE labels
   (internationalized labels that have been processed with the ToASCII
   operation) can inherently be handled by those protocols.

6.2 Applications and resolver libraries

   Applications normally use functions in the operating system when they
   resolve DNS queries.  Those functions in the operating system are
   often called "the resolver library", and the applications communicate
   with the resolver libraries through a programming interface (API).

   Because these resolver libraries today expect only domain names in
   ASCII, applications MUST prepare labels that are passed to the
   resolver library using the ToASCII operation.  Labels received from
   the resolver library contain only ASCII characters; internationalized
   labels that cannot be represented directly in ASCII use the ACE form.
   ACE labels always include the ACE prefix.
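
   As an illustration of preparing a name before it reaches an
   ASCII-only resolver API, the Python sketch below uses the standard
   "idna" codec, which applies ToASCII to each label of a name.  The
   function names name_for_resolver and resolve are hypothetical, and
   resolve is shown only to indicate where the prepared name would be
   handed to the resolver library (here socket.getaddrinfo).

```python
import socket


def name_for_resolver(name):
    """Hypothetical helper: all-ASCII form of a name for an ASCII-only API."""
    # The "idna" codec splits the name into labels, applies ToASCII to
    # each, and joins them with U+002E.  It raises UnicodeError when
    # ToASCII fails for a label.
    return name.encode("idna").decode("ascii")


def resolve(name, port):
    """Hypothetical wrapper: pass only the prepared name to the resolver."""
    return socket.getaddrinfo(name_for_resolver(name), port)
```

   In this sketch name_for_resolver("bücher.example") returns
   "xn--bcher-kva.example", and a name that is already ASCII is passed
   through unchanged.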

   An operating system might have a set of libraries for performing the
   ToASCII operation.  The input to such a library might be in one or
   more charsets that are used in applications (UTF-8 and UTF-16 are
   likely candidates for almost any operating system, and script-
   specific charsets are likely for localized operating systems).

   IDNA-aware applications MUST be able to work with both non-
   internationalized labels (those that conform to [STD13] and [STD3])
   and internationalized labels.

   It is expected that new versions of the resolver libraries in the
   future will be able to accept domain names in charsets other than
   ASCII, and application developers might one day pass not only domain
   names in Unicode, but also in local script to a new API for the
   resolver libraries in the operating system.  Thus the ToASCII and
   ToUnicode operations might be performed inside these new versions of
   the resolver libraries.

   Domain names passed to resolvers or put into the question section of
   DNS requests follow the rules for "queries" from [STRINGPREP].

6.3 DNS servers

   Domain names stored in zones follow the rules for "stored strings"
   from [STRINGPREP].

   For internationalized labels that cannot be represented directly in
   ASCII, DNS servers MUST use the ACE form produced by the ToASCII
   operation.  All IDNs served by DNS servers MUST contain only ASCII
   characters.

   If a signaling system which makes negotiation possible between old
   and new DNS clients and servers is standardized in the future, the
   encoding of the query in the DNS protocol itself can be changed from
   ACE to something else, such as UTF-8.  The question whether or not
   this should be used is, however, a separate problem and is not
   discussed in this memo.

6.4 Avoiding exposing users to the raw ACE encoding

   Any application that might show the user a domain name obtained from
   a domain name slot, such as from gethostbyaddr or part of a mail
   header, will need to be updated if it is to prevent users from seeing
   the ACE.

   If an application decodes an ACE name using ToUnicode but cannot show
   all of the characters in the decoded name, such as if the name
   contains characters that the output system cannot display, the
   application SHOULD show the name in ACE format (which always includes
   the ACE prefix) instead of displaying the name with the replacement
   character (U+FFFD).  This is to make it easier for the user to
   transfer the name correctly to other programs.  Programs that by
   default show the ACE form when they cannot show all the characters in
   a name label SHOULD also have a mechanism to show the name that is
   produced by the ToUnicode operation with as many characters as
   possible and replacement characters in the positions where characters
   cannot be displayed.
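
   The display rule above can be sketched in Python as follows.  The
   function names display_label and display_label_lossy are
   hypothetical, and "can be displayed" is approximated by "can be
   encoded in the output charset", which is an assumption of this sketch
   (a real application would ask its rendering system).  Decoding uses
   Python's standard "idna" codec.

```python
def display_label(ace_label, output_charset):
    """Hypothetical helper: decoded form if fully displayable, else ACE."""
    try:
        decoded = ace_label.encode("ascii").decode("idna")
        # Assumption: a character is displayable if the output charset
        # can encode it.
        decoded.encode(output_charset)
    except UnicodeError:
        # Not decodable or not fully displayable: show the ACE form.
        return ace_label
    return decoded


def display_label_lossy(ace_label, output_charset):
    """Hypothetical secondary view: U+FFFD where a character cannot show."""
    try:
        decoded = ace_label.encode("ascii").decode("idna")
    except UnicodeError:
        return ace_label
    shown = []
    for ch in decoded:
        try:
            ch.encode(output_charset)
            shown.append(ch)
        except UnicodeEncodeError:
            shown.append("\ufffd")
    return "".join(shown)
```

   With an output charset of "latin-1", display_label("xn--bcher-kva",
   "latin-1") gives "bücher"; with "ascii" it falls back to
   "xn--bcher-kva", and display_label_lossy gives "b\ufffdcher" for the
   optional secondary view.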

   The ToUnicode operation does not alter labels that are not valid ACE
   labels, even if they begin with the ACE prefix.  After ToUnicode has
   been applied, if a label still begins with the ACE prefix, then it is
   not a valid ACE label, and is not equivalent to any of the
   intermediate Unicode strings constructed by ToUnicode.

6.5 DNSSEC authentication of IDN domain names

   DNS Security [RFC2535] is a method for supplying cryptographic
   verification information along with DNS messages.  Public Key
   Cryptography is used in conjunction with digital signatures to
   provide a means for a requester of domain information to authenticate
   the source of the data.  This ensures that it can be traced back to a
   trusted source, either directly, or via a chain of trust linking the
   source of the information to the top of the DNS hierarchy.

   IDNA specifies that all internationalized domain names served by DNS
   servers that cannot be represented directly in ASCII must use the ACE
   form produced by the ToASCII operation.  This operation must be
   performed prior to a zone being signed by the private key for that
   zone.  Because of this ordering, it is important to recognize that
   DNSSEC authenticates the ASCII domain name, not the Unicode form or
   the mapping between the Unicode form and the ASCII form.  In the
   presence of DNSSEC, this is the name that MUST be signed in the zone
   and MUST be validated against.

   One consequence of this for sites deploying IDNA in the presence of
   DNSSEC is that any special purpose proxies or forwarders used to
   transform user input into IDNs must be earlier in the resolution flow
   than DNSSEC authenticating nameservers for DNSSEC to work.

7. Name server considerations

   Existing DNS servers do not know the IDNA rules for handling non-
   ASCII forms of IDNs, and therefore need to be shielded from them.
   All existing channels through which names can enter a DNS server
   database (for example, master files [STD13] and DNS update messages
   [RFC2136]) are IDN-unaware because they predate IDNA, and therefore
   requirement 2 of section 3.1 of this document provides the needed
   shielding, by ensuring that internationalized domain names entering
   DNS server databases through such channels have already been
   converted to their equivalent ASCII forms.

   It is imperative that there be only one ASCII encoding for a
   particular domain name.  Because of the design of the ToASCII and
   ToUnicode operations, there are no ACE labels that decode to ASCII
   labels, and therefore name servers cannot contain multiple ASCII
   encodings of the same domain name.

   [RFC2181] explicitly allows domain labels to contain octets beyond
   the ASCII range (0..7F), and this document does not change that.
   Note, however, that there is no defined interpretation of octets
   80..FF as characters.  If labels containing these octets are returned
   to applications, unpredictable behavior could result.  The ASCII form
   defined by ToASCII is the only standard representation for
   internationalized labels in the current DNS protocol.

8. Root server considerations

   IDNs are likely to be somewhat longer than current domain names, so
   the bandwidth needed by the root servers is likely to go up by a
   small amount.  Also, queries and responses for IDNs will probably be
   somewhat longer than typical queries today, so more queries and
   responses may be forced to go to TCP instead of UDP.

9. References

9.1 Normative References

   [RFC2119]    Bradner, S., "Key words for use in RFCs to Indicate
                Requirement Levels", BCP 14, RFC 2119, March 1997.

   [STRINGPREP] Hoffman, P. and M. Blanchet, "Preparation of
                Internationalized Strings ("stringprep")", RFC 3454,
                December 2002.

   [NAMEPREP]   Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
                Profile for Internationalized Domain Names (IDN)", RFC
                3491, March 2003.

   [PUNYCODE]   Costello, A., "Punycode: A Bootstring encoding of
                Unicode for use with Internationalized Domain Names in
                Applications (IDNA)", RFC 3492, March 2003.

   [STD3]       Braden, R., "Requirements for Internet Hosts --
                Communication Layers", STD 3, RFC 1122, and
                "Requirements for Internet Hosts -- Application and
                Support", STD 3, RFC 1123, October 1989.

   [STD13]      Mockapetris, P., "Domain names - concepts and
                facilities", STD 13, RFC 1034 and "Domain names -
                implementation and specification", STD 13, RFC 1035,
                November 1987.

9.2 Informative References

   [RFC2535]    Eastlake, D., "Domain Name System Security Extensions",
                RFC 2535, March 1999.

   [RFC2181]    Elz, R. and R. Bush, "Clarifications to the DNS
                Specification", RFC 2181, July 1997.

   [UAX9]       Unicode Standard Annex #9, The Bidirectional Algorithm,
                <http://www.unicode.org/unicode/reports/tr9/>.

   [UNICODE]    The Unicode Consortium. The Unicode Standard, Version
                3.2.0 is defined by The Unicode Standard, Version 3.0
                (Reading, MA, Addison-Wesley, 2000. ISBN 0-201-61633-5),
                as amended by the Unicode Standard Annex #27: Unicode
                3.1 (http://www.unicode.org/reports/tr27/) and by the
                Unicode Standard Annex #28: Unicode 3.2
                (http://www.unicode.org/reports/tr28/).

   [USASCII]    Cerf, V., "ASCII format for Network Interchange", RFC
                20, October 1969.

10. Security Considerations

   Security on the Internet partly relies on the DNS.  Thus, any change
   to the characteristics of the DNS can change the security of much of
   the Internet.

   This memo describes an algorithm which encodes characters that are
   not valid according to STD3 and STD13 into octet values that are
   valid.  No security issues such as string length increases or new
   allowed values are introduced by the encoding process or the use of
   these encoded values, apart from those introduced by the ACE encoding
   itself.

   Domain names are used by users to identify and connect to Internet
   servers.  The security of the Internet is compromised if a user
   entering a single internationalized name is connected to different
   servers based on different interpretations of the internationalized
   domain name.

   When systems use local character sets other than ASCII and Unicode,
   this specification leaves the problem of transcoding between the
   local character set and Unicode up to the application.  If different
   applications (or different versions of one application) implement
   different transcoding rules, they could interpret the same name
   differently and contact different servers.  This problem is not
   solved by security protocols like TLS that do not take local
   character sets into account.

   Because this document normatively refers to [NAMEPREP], [PUNYCODE],
   and [STRINGPREP], it includes the security considerations from those
   documents as well.

   If or when this specification is updated to use a more recent Unicode
   normalization table, the new normalization table will need to be
   compared with the old to spot backwards incompatible changes.  If
   there are such changes, they will need to be handled somehow, or
   there will be security as well as operational implications.  Methods
   to handle the conflicts could include keeping the old normalization,
   or taking care of the conflicting characters by operational means, or
   some other method.

   Implementations MUST NOT use more recent normalization tables than
   the one referenced from this document, even though more recent tables
   may be provided by operating systems.  If an application is unsure of
   which version of the normalization tables is in the operating
   system, the application needs to include the normalization tables
   itself.  Using normalization tables other than the one referenced
   from this specification could have security and operational
   implications.

   To help prevent confusion between characters that are visually
   similar, it is suggested that implementations provide visual
   indications where a domain name contains multiple scripts.  Such
   mechanisms can also be used to show when a name contains a mixture of
   simplified and traditional Chinese characters, or to distinguish zero
   and one from O and l.  DNS zone administrators may impose restrictions
   (subject to the limitations in section 2) that try to minimize
   homographs.
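
   One rough way to decide when such a visual indication is warranted is
   sketched below in Python.  This is a heuristic under loud
   assumptions: the Python standard library does not expose the Unicode
   Script property, so the first word of each letter's Unicode character
   name (for example "LATIN" or "CYRILLIC") is used as a stand-in for
   its script.  The function names rough_scripts and looks_mixed_script
   are hypothetical.

```python
import unicodedata


def rough_scripts(label):
    """Hypothetical heuristic: script-like tags for the letters in a label."""
    tags = set()
    for ch in label:
        if not ch.isalpha():
            # Digits and hyphens are shared by many scripts; skip them.
            continue
        name = unicodedata.name(ch, "")
        if name:
            # Approximation: the first word of the character name,
            # e.g. "LATIN SMALL LETTER A" -> "LATIN".
            tags.add(name.split()[0])
    return tags


def looks_mixed_script(label):
    """True if the letters of the label carry more than one tag."""
    return len(rough_scripts(label)) > 1
```

   For the label "p\u0430ypal", whose second letter is CYRILLIC SMALL
   LETTER A, rough_scripts reports both "LATIN" and "CYRILLIC".  The
   heuristic also flags legitimate mixtures (for example names combining
   several scripts by convention), so it is suited to prompting a visual
   hint rather than to rejecting names.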

   Domain names (or portions of them) are sometimes compared against a
   set of privileged or anti-privileged domains.  In such situations it
   is especially important that the comparisons be done properly, as
   specified in section 3.1 requirement 4.  For labels already in ASCII
   form, the proper comparison reduces to the same case-insensitive
   ASCII comparison that has always been used for ASCII labels.
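
   The comparison described above can be sketched in Python, whose
   standard "idna" codec converts each label of a name to its ASCII
   form.  The function names comparison_key and is_listed are
   hypothetical, and the sketch assumes that converting both names and
   then comparing the results without regard to ASCII case is what
   the requirement asks for.

```python
def comparison_key(domain: str) -> bytes:
    """ASCII form of every label, lowercased for case-insensitive matching."""
    # The codec leaves labels that are already ASCII untouched, so the
    # result is lowercased explicitly.
    return domain.encode("idna").lower()


def is_listed(domain: str, listed_domains) -> bool:
    """Check a name against a privileged (or anti-privileged) set."""
    keys = {comparison_key(d) for d in listed_domains}
    return comparison_key(domain) in keys


if __name__ == "__main__":
    listed = ["b\u00fccher.example"]
    print(is_listed("XN--BCHER-KVA.Example", listed))
    print(is_listed("bucher.example", listed))
```

   The codec raises UnicodeError for a label it cannot convert; a
   caller would normally treat such a name as not matching.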

   The introduction of IDNA means that any existing labels that start
   with the ACE prefix and would be altered by ToUnicode will
   automatically be ACE labels, and will be considered equivalent to
   non-ASCII labels, whether or not that was the intent of the zone
   administrator or registrant.
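
   A zone administrator could look for such labels with a check like
   the Python sketch below.  It assumes the ACE prefix is "xn--" and
   uses the standard "idna" codec as a stand-in for ToUnicode.  The
   codec may raise UnicodeError where ToUnicode would return its input
   unchanged, so the sketch treats an exception as "not altered".  The
   function name decodes_as_ace is hypothetical.

```python
ACE_PREFIX = "xn--"  # assumption: the ACE prefix used by IDNA


def decodes_as_ace(label: str) -> bool:
    """True if the label carries the ACE prefix and decoding alters it."""
    if not label.lower().startswith(ACE_PREFIX):
        return False
    try:
        decoded = label.encode("ascii").decode("idna")
    except UnicodeError:
        # Non-ASCII input, or the codec rejected the label: not altered.
        return False
    return decoded != label


if __name__ == "__main__":
    for candidate in ("xn--bcher-kva", "example"):
        print(candidate, decodes_as_ace(candidate))
```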

11. IANA Considerations

   IANA has assigned the ACE prefix in consultation with the IESG.

12. Authors' Addresses

   Patrik Faltstrom
   Cisco Systems
   Arstaangsvagen 31 J
   S-117 43 Stockholm  Sweden

   EMail: paf@cisco.com


   Paul Hoffman
   Internet Mail Consortium and VPN Consortium
   127 Segre Place
   Santa Cruz, CA  95060  USA

   EMail: phoffman@imc.org


   Adam M. Costello
   University of California, Berkeley

   URL: http://www.nicemice.net/amc/

13. Full Copyright Statement

   Copyright (C) The Internet Society (2003).  All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works.  However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

   Funding for the RFC Editor function is currently provided by the
   Internet Society.