Network Working Group                                        K. Zeilenga
Request for Comments: 4518                           OpenLDAP Foundation
Category: Standards Track                                      June 2006


             Lightweight Directory Access Protocol (LDAP):
                  Internationalized String Preparation

Status of This Memo

   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements.  Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol.  Distribution of this memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (2006).

Abstract

   The previous Lightweight Directory Access Protocol (LDAP) technical
   specifications did not precisely define how character string matching
   is to be performed.  This led to a number of usability and
   interoperability problems.  This document defines string preparation
   algorithms for character-based matching rules defined for use in
   LDAP.

1.  Introduction

1.1.  Background

   A Lightweight Directory Access Protocol (LDAP) [RFC4510] matching
   rule [RFC4517] defines an algorithm for determining whether a
   presented value matches an attribute value in accordance with the
   criteria defined for the rule.  The proposition may be evaluated to
   True, False, or Undefined.

      True      - the attribute contains a matching value,

      False     - the attribute contains no matching value,

      Undefined - it cannot be determined whether the attribute contains
                  a matching value.

Zeilenga                    Standards Track                     [Page 1]

RFC 4518       LDAP: Internationalized String Preparation      June 2006

   For instance, the caseIgnoreMatch matching rule may be used to
   compare whether the commonName attribute contains a particular value
   without regard for case and insignificant spaces.

1.2.  X.500 String Matching Rules

   "X.520: Selected attribute types" [X.520] provides (among other
   things) value syntaxes and matching rules for comparing values
   commonly used in the directory [X.500].  These specifications are
   inadequate for strings composed of Unicode [Unicode] characters.

   The caseIgnoreMatch matching rule [X.520], for example, is simply
   defined as being a case-insensitive comparison where insignificant
   spaces are ignored.  For printableString, there is only one space
   character and case mapping is bijective, hence this definition is
   sufficient.  However, for Unicode string types such as
   universalString, this is not sufficient.  For example, a case-
   insensitive matching implementation that folded lowercase characters
   to uppercase would yield different results than an implementation
   that used uppercase to lowercase folding.  Or one implementation may
   view space as referring to only SPACE (U+0020), a second
   implementation may view any character with the space separator (Zs)
   property as a space, and another implementation may view any
   character with the whitespace (WS) category as a space.

   The lack of precise specification for character string matching has
   led to significant interoperability problems.  When used in
   certificate chain validation, security vulnerabilities can arise.  To
   address these problems, this document defines precise algorithms for
   preparing character strings for matching.
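
   As an informal illustration (not part of the specification), the
   following Python sketch shows both ambiguities.  It relies on the
   Python standard library's Unicode tables, which track a newer
   Unicode version than the repertoire used in this document; the
   sample strings and function names are chosen for the example only.

```python
import unicodedata


def equal_fold_upper(a: str, b: str) -> bool:
    """Compare after folding both strings to uppercase."""
    return a.upper() == b.upper()


def equal_fold_lower(a: str, b: str) -> bool:
    """Compare after folding both strings to lowercase."""
    return a.lower() == b.lower()


# LATIN SMALL LETTER SHARP S (U+00DF) uppercases to "SS" but
# lowercases to itself, so the two folding directions disagree.
with_sharp_s = "stra\u00dfe"
without_sharp_s = "strasse"
print(equal_fold_upper(with_sharp_s, without_sharp_s))
print(equal_fold_lower(with_sharp_s, without_sharp_s))

# Three plausible definitions of "space" applied to three code points.
candidates = {
    "SPACE": "\u0020",
    "NO-BREAK SPACE": "\u00a0",
    "CHARACTER TABULATION": "\u0009",
}
for name, ch in candidates.items():
    only_u0020 = ch == " "
    has_zs = unicodedata.category(ch) == "Zs"
    is_whitespace = ch.isspace()
    print(name, only_u0020, has_zs, is_whitespace)
```

   Two implementations that each pick a different row or folding
   direction would disagree on whether the same pair of values match.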

1.3.  Relationship to "stringprep"

   The character string preparation algorithms described in this
   document are based upon the "stringprep" approach [RFC3454].  In
   "stringprep", presented and stored values are first prepared for
   comparison so that a character-by-character comparison yields the
   "correct" result.

   The approach used here is a refinement of the "stringprep" [RFC3454]
   approach.  Each algorithm involves two additional preparation steps.

   a) Prior to applying the Unicode string preparation steps outlined in
      "stringprep", the string is transcoded to Unicode.

   b) After applying the Unicode string preparation steps outlined in
      "stringprep", the string is modified to appropriately handle
      characters insignificant to the matching rule.

   Hence, preparation of character strings for X.500 [X.500] matching
   [X.501] involves the following steps:

      1) Transcode
      2) Map
      3) Normalize
      4) Prohibit
      5) Check Bidi (Bidirectional)
      6) Insignificant Character Handling

   These steps are described in Section 2.

   It is noted that while various tables of Unicode characters included
   or referenced by this specification are derived from Unicode
   [Unicode] data, these tables are to be considered definitive for the
   purpose of implementing this specification.

1.4.  Relationship to the LDAP Technical Specification

   This document is an integral part of the LDAP technical specification
   [RFC4510], which obsoletes the previously defined LDAP technical
   specification [RFC3377] in its entirety.

   This document details new LDAP internationalized character string
   preparation algorithms used by [RFC4517] and possibly other technical
   specifications defining LDAP syntaxes and/or matching rules.

1.5.  Relationship to X.500

   LDAP is defined [RFC4510] in X.500 terms as an X.500 access
   mechanism.  As such, there is a strong desire for alignment between
   LDAP and X.500 syntax and semantics.  The character string
   preparation algorithms described in this document are based upon the
   "Internationalized String Matching Rules for X.500" [XMATCH] proposal
   to ITU/ISO Joint Study Group 2.

1.6.  Conventions and Terms

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in BCP 14 [RFC2119].

   Character names in this document use the notation for code points and
   names from the Unicode Standard [Unicode].  For example, the letter
   "a" may be represented as either <U+0061> or <LATIN SMALL LETTER A>.
   In the lists of mappings and the prohibited characters, the "U+" is
   left off to make the lists easier to read.  The comments for
   character ranges are shown in square brackets (such as "[CONTROL
   CHARACTERS]") and do not come from the standard.

   Note: a glossary of terms used in Unicode can be found in [Glossary].
   Information on the Unicode character encoding model can be found in
   [CharModel].

   The term "combining mark", as used in this specification, refers to
   any Unicode [Unicode] code point that has a mark property (Mn, Mc,
   Me).  Appendix A provides a definitive list of combining marks.

2.  String Preparation

   The following six-step process SHALL be applied to each presented and
   attribute value in preparation for character string matching rule
   evaluation.

      1) Transcode
      2) Map
      3) Normalize
      4) Prohibit
      5) Check bidi
      6) Insignificant Character Handling

   Failure in any step causes the assertion to evaluate to Undefined.

   The character repertoire of this process is Unicode 3.2 [Unicode].

   Note that this six-step process specification is intended to describe
   expected matching behavior.  Implementations are free to use
   alternative processes so long as the matching rule evaluation
   behavior provided is consistent with the behavior described by this
   specification.
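
   The control flow described above might be sketched in Python as
   follows.  This is a minimal illustration, not a definitive
   implementation: the names PreparationFailure, prepare, and
   evaluate_equality are hypothetical, None stands in for Undefined,
   and the two toy steps merely stand in for the real steps of
   Sections 2.1 through 2.6.

```python
from typing import Callable, Iterable, Optional, Sequence


class PreparationFailure(Exception):
    """Hypothetical exception raised by a step that fails."""


Step = Callable[[str], str]


def prepare(value: str, steps: Sequence[Step]) -> str:
    """Run a value through each preparation step in order."""
    for step in steps:
        value = step(value)
    return value


def evaluate_equality(
    assertion: str, attribute_values: Iterable[str], steps: Sequence[Step]
) -> Optional[bool]:
    """Return True, False, or None (standing in for Undefined)."""
    try:
        wanted = prepare(assertion, steps)
        stored = [prepare(v, steps) for v in attribute_values]
    except PreparationFailure:
        # Failure in any step makes the assertion Undefined.
        return None
    return wanted in stored


def reject_replacement_character(value: str) -> str:
    """Toy stand-in for the Prohibit step."""
    if "\ufffd" in value:
        raise PreparationFailure("U+FFFD is prohibited")
    return value


# Toy pipeline: str.casefold stands in for the Map step.
STEPS = (str.casefold, reject_replacement_character)

print(evaluate_equality("Foo", ["foo", "bar"], STEPS))
print(evaluate_equality("Foo", ["bar"], STEPS))
print(evaluate_equality("Fo\ufffd", ["foo"], STEPS))
```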

2.1.  Transcode

   Each non-Unicode string value is transcoded to Unicode.

   PrintableString [X.680] values are transcoded directly to Unicode.

   UniversalString, UTF8String, and bmpString [X.680] values need not be
   transcoded as they are Unicode-based strings (in the case of
   bmpString, a subset of Unicode).

   TeletexString [X.680] values are transcoded to Unicode.  As there is
   no standard for mapping TeletexString values to Unicode, the mapping
   is left a local matter.

   For these and other reasons, use of TeletexString is NOT RECOMMENDED.

   The output is the transcoded string.
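
   A minimal Python sketch of this step is shown below.  It assumes
   that values arrive as raw octets in the usual ASN.1 encodings
   (ASCII-compatible octets for PrintableString, UTF-8 for UTF8String,
   big-endian two-octet units for BMPString, and big-endian four-octet
   units for UniversalString).  Those encodings and the function name
   transcode are assumptions of the example and are not stated in this
   document.

```python
def transcode(raw: bytes, asn1_type: str) -> str:
    """Turn the octets of a string value into Unicode code points.

    Hypothetical helper: the encoding chosen for each type is an
    assumption of this sketch.
    """
    if asn1_type == "PrintableString":
        # PrintableString characters are a subset of ASCII.  This does
        # not check the narrower PrintableString repertoire.
        return raw.decode("ascii")
    if asn1_type == "UTF8String":
        return raw.decode("utf-8")
    if asn1_type == "BMPString":
        # Python's decoder would also accept surrogate pairs here,
        # which a strict BMPString check would reject.
        return raw.decode("utf-16-be")
    if asn1_type == "UniversalString":
        return raw.decode("utf-32-be")
    if asn1_type == "TeletexString":
        # The mapping is a local matter, so no default is offered.
        raise NotImplementedError("TeletexString mapping is a local matter")
    raise ValueError(f"unsupported string type: {asn1_type}")


print(transcode(b"abc", "PrintableString"))
print(transcode(b"\x00a\x00b", "BMPString"))
```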

2.2.  Map

   SOFT HYPHEN (U+00AD) and MONGOLIAN TODO SOFT HYPHEN (U+1806) code
   points are mapped to nothing.  COMBINING GRAPHEME JOINER (U+034F) and
   VARIATION SELECTORs (U+180B-180D, FE00-FE0F) code points are also
   mapped to nothing.  The OBJECT REPLACEMENT CHARACTER (U+FFFC) is
   mapped to nothing.

   CHARACTER TABULATION (U+0009), LINE FEED (LF) (U+000A), LINE
   TABULATION (U+000B), FORM FEED (FF) (U+000C), CARRIAGE RETURN (CR)
   (U+000D), and NEXT LINE (NEL) (U+0085) are mapped to SPACE (U+0020).

   All other control code points (e.g., Cc) or code points with a
   control function (e.g., Cf) are mapped to nothing.  The following is
   a complete list of these code points: U+0000-0008, 000E-001F, 007F-
   0084, 0086-009F, 06DD, 070F, 180E, 200C-200F, 202A-202E, 2060-2063,
   206A-206F, FEFF, FFF9-FFFB, 1D173-1D17A, E0001, E0020-E007F.

   ZERO WIDTH SPACE (U+200B) is mapped to nothing.  All other code
   points with Separator (space, line, or paragraph) property (e.g., Zs,
   Zl, or Zp) are mapped to SPACE (U+0020).  The following is a complete
   list of these code points: U+0020, 00A0, 1680, 2000-200A, 2028-2029,
   202F, 205F, 3000.

   For case ignore, numeric, and stored prefix string matching rules,
   characters are case folded per B.2 of [RFC3454].

   The output is the mapped string.
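
   The mappings above can be sketched in Python as follows.  This is an
   illustrative sketch under stated assumptions: the names map_step,
   MAP_TO_NOTHING, and MAP_TO_SPACE are hypothetical, the variation
   selector range is taken as FE00-FE0F (the range listed in
   Appendix A), and case folding uses map_table_b2 from the Python
   standard library's stringprep module, which implements Table B.2 of
   [RFC3454].

```python
import stringprep


def _expand(ranges):
    """Expand inclusive (low, high) pairs into a set of code points."""
    points = set()
    for low, high in ranges:
        points.update(range(low, high + 1))
    return points


MAP_TO_NOTHING = _expand([
    (0x00AD, 0x00AD), (0x1806, 0x1806), (0x034F, 0x034F),
    (0x180B, 0x180D), (0xFE00, 0xFE0F), (0xFFFC, 0xFFFC),
    # Remaining control code points and control functions.
    (0x0000, 0x0008), (0x000E, 0x001F), (0x007F, 0x0084),
    (0x0086, 0x009F), (0x06DD, 0x06DD), (0x070F, 0x070F),
    (0x180E, 0x180E), (0x200C, 0x200F), (0x202A, 0x202E),
    (0x2060, 0x2063), (0x206A, 0x206F), (0xFEFF, 0xFEFF),
    (0xFFF9, 0xFFFB), (0x1D173, 0x1D17A), (0xE0001, 0xE0001),
    (0xE0020, 0xE007F),
    # ZERO WIDTH SPACE.
    (0x200B, 0x200B),
])

MAP_TO_SPACE = _expand([
    # Tab, LF, line tabulation, FF, CR, and NEL.
    (0x0009, 0x000D), (0x0085, 0x0085),
    # Code points with a Separator property.
    (0x0020, 0x0020), (0x00A0, 0x00A0), (0x1680, 0x1680),
    (0x2000, 0x200A), (0x2028, 0x2029), (0x202F, 0x202F),
    (0x205F, 0x205F), (0x3000, 0x3000),
])


def map_step(value: str, case_fold: bool = False) -> str:
    """Apply the Map step; case_fold selects Table B.2 folding."""
    out = []
    for ch in value:
        code_point = ord(ch)
        if code_point in MAP_TO_NOTHING:
            continue
        if code_point in MAP_TO_SPACE:
            out.append(" ")
            continue
        out.append(stringprep.map_table_b2(ch) if case_fold else ch)
    return "".join(out)


print(repr(map_step("a\u00adb\tc\u00a0d")))
print(repr(map_step("ABC", case_fold=True)))
```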

2.3.  Normalize

   The input string is to be normalized to Unicode Form KC
   (compatibility composed) as described in [UAX15].  The output is the
   normalized string.
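
   In Python, one way to approximate this step is the standard
   library's unicodedata.ucd_3_2_0 object, which exposes the Unicode
   3.2.0 database.  The sketch below is illustrative only; the function
   name normalize_step is hypothetical.

```python
import unicodedata


def normalize_step(value: str) -> str:
    """Normalize to Form KC using the Unicode 3.2.0 database."""
    return unicodedata.ucd_3_2_0.normalize("NFKC", value)


# LATIN SMALL LIGATURE FI decomposes to "fi" under compatibility
# normalization, and "e" + COMBINING ACUTE ACCENT composes to U+00E9.
print(normalize_step("\ufb01le"))
print(normalize_step("e\u0301") == "\u00e9")
```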

2.4.  Prohibit

   All Unassigned code points are prohibited.  Unassigned code points
   are listed in Table A.1 of [RFC3454].

   Characters that, per Section 5.8 of [RFC3454], change display
   properties or are deprecated are prohibited.  These characters are
   listed in Table C.8 of [RFC3454].

   Private Use code points are prohibited.  These characters are listed
   in Table C.3 of [RFC3454].

   All non-character code points are prohibited.  These code points are
   listed in Table C.4 of [RFC3454].

   Surrogate codes are prohibited.  These characters are listed in Table
   C.5 of [RFC3454].

   The REPLACEMENT CHARACTER (U+FFFD) code point is prohibited.

   The step fails if the input string contains any prohibited code
   point.  Otherwise, the output is the input string.
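
   The checks above might be sketched in Python with the standard
   library's stringprep module, whose in_table_* functions implement
   the [RFC3454] tables.  The names ProhibitedCodePoint and prohibit
   are hypothetical and the sketch is not a definitive implementation.

```python
import stringprep


class ProhibitedCodePoint(ValueError):
    """Hypothetical exception signalling failure of the Prohibit step."""


# Each entry pairs a label with the table lookup from [RFC3454].
_CHECKS = (
    ("unassigned (Table A.1)", stringprep.in_table_a1),
    ("display-changing or deprecated (Table C.8)", stringprep.in_table_c8),
    ("private use (Table C.3)", stringprep.in_table_c3),
    ("a non-character (Table C.4)", stringprep.in_table_c4),
    ("a surrogate (Table C.5)", stringprep.in_table_c5),
)


def prohibit(value: str) -> str:
    """Return the input unchanged, or raise if a code point is prohibited."""
    for ch in value:
        if ch == "\ufffd":
            raise ProhibitedCodePoint("U+FFFD REPLACEMENT CHARACTER")
        for label, in_table in _CHECKS:
            if in_table(ch):
                raise ProhibitedCodePoint(f"U+{ord(ch):04X} is {label}")
    return value


print(prohibit("abc"))
try:
    prohibit("a\ue000b")
except ProhibitedCodePoint as exc:
    print("step failed:", exc)
```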

2.5.  Check bidi

   Bidirectional characters are ignored.

2.6.  Insignificant Character Handling

   In this step, the string is modified to ensure proper handling of
   characters insignificant to the matching rule.  This modification
   differs from matching rule to matching rule.

   Section 2.6.1 applies to case ignore and exact string matching.
   Section 2.6.2 applies to numericString matching.
   Section 2.6.3 applies to telephoneNumber matching.

2.6.1.  Insignificant Space Handling

   For the purposes of this section, a space is defined to be the SPACE
   (U+0020) code point followed by no combining marks.

       NOTE - The previous steps ensure that the string cannot contain
              any code points in the separator class, other than SPACE
              (U+0020).

   For input strings that are attribute values or non-substring
   assertion values:  If the input string contains no non-space
   character, then the output is exactly two SPACEs.  Otherwise (the
   input string contains at least one non-space character), the string
   is modified such that the string starts with exactly one space
   character, ends with exactly one SPACE character, and any inner
   (non-empty) sequence of space characters is replaced with exactly two
   SPACE characters.  For instance, the input string
   "foo<SPACE>bar<SPACE><SPACE>" results in the output
   "<SPACE>foo<SPACE><SPACE>bar<SPACE>".
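
   The rule for attribute values and non-substring assertion values
   can be sketched in Python as follows.  The function names are
   hypothetical, and combining marks are detected with the standard
   library's unicodedata categories rather than the definitive table
   in Appendix A, which is an assumption of this sketch.

```python
import unicodedata


def is_space(value: str, index: int) -> bool:
    """True if value[index] is SPACE followed by no combining mark."""
    if value[index] != " ":
        return False
    following = value[index + 1 : index + 2]
    return not (following and unicodedata.category(following).startswith("M"))


def handle_insignificant_spaces(value: str) -> str:
    """Prepare an attribute value or non-substring assertion value."""
    segments, current = [], []
    for index, ch in enumerate(value):
        if is_space(value, index):
            # A run of spaces ends the current non-space segment.
            if current:
                segments.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        segments.append("".join(current))
    if not segments:
        # No non-space character: exactly two SPACEs.
        return "  "
    # One leading SPACE, one trailing SPACE, two between segments.
    return " " + "  ".join(segments) + " "


print(repr(handle_insignificant_spaces("foo bar  ")))
```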

   For input strings that are substring assertion values: If the string
   being prepared contains no non-space characters, then the output
   string is exactly one SPACE.  Otherwise, the following steps are
   taken:

   -  If the input string is an initial substring, it is modified to
      start with exactly one SPACE character;

   -  If the input string is an initial or an any substring that ends in
      one or more space characters, it is modified to end with exactly
      one SPACE character;

   -  If the input string is an any or a final substring that starts in
      one or more space characters, it is modified to start with exactly
      one SPACE character; and

   -  If the input string is a final substring, it is modified to end
      with exactly one SPACE character.

   For instance, for the input string "foo<SPACE>bar<SPACE><SPACE>" as
   an initial substring, the output would be
   "<SPACE>foo<SPACE><SPACE>bar<SPACE>".  As an any or final substring,
   the same input would result in "foo<SPACE>bar<SPACE>".

   Appendix B discusses the rationale for the behavior.

2.6.2.  numericString Insignificant Character Handling

   For the purposes of this section, a space is defined to be the SPACE
   (U+0020) code point followed by no combining marks.

   All spaces are regarded as insignificant and are to be removed.

   For example, removal of spaces from the Form KC string:
       "<SPACE><SPACE>123<SPACE><SPACE>456<SPACE><SPACE>"
   would result in the output string:
       "123456"
   and the Form KC string:
       "<SPACE><SPACE><SPACE>"
   would result in the output string:
       "" (an empty string).
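
   A minimal Python sketch of this removal follows.  The function name
   remove_spaces is hypothetical, and combining marks are detected
   with the standard library's unicodedata categories rather than the
   table in Appendix A, which is an assumption of the example.

```python
import unicodedata


def remove_spaces(value: str) -> str:
    """Drop every SPACE that is not followed by a combining mark."""
    kept = []
    for index, ch in enumerate(value):
        following = value[index + 1 : index + 2]
        before_mark = bool(following) and unicodedata.category(
            following
        ).startswith("M")
        if ch == " " and not before_mark:
            continue
        kept.append(ch)
    return "".join(kept)


print(repr(remove_spaces("  123  456  ")))
print(repr(remove_spaces("   ")))
```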

2.6.3.  telephoneNumber Insignificant Character Handling

   For the purposes of this section, a hyphen is defined to be a
   HYPHEN-MINUS (U+002D), ARMENIAN HYPHEN (U+058A), HYPHEN (U+2010),
   NON-BREAKING HYPHEN (U+2011), MINUS SIGN (U+2212), SMALL HYPHEN-MINUS
   (U+FE63), or FULLWIDTH HYPHEN-MINUS (U+FF0D) code point followed by
   no combining marks and a space is defined to be the SPACE (U+0020)
   code point followed by no combining marks.

   All hyphens and spaces are considered insignificant and are to be
   removed.

   For example, removal of hyphens and spaces from the Form KC string:
       "<SPACE><HYPHEN>123<SPACE><SPACE>456<SPACE><HYPHEN>"
   would result in the output string:
       "123456"
   and the Form KC string:
       "<HYPHEN><HYPHEN><HYPHEN>"
   would result in the (empty) output string:
       "".
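
   The same idea extends to hyphens, as in the Python sketch below.
   The names INSIGNIFICANT and remove_hyphens_and_spaces are
   hypothetical, the example input uses HYPHEN (U+2010) for <HYPHEN>,
   and combining marks are again detected with unicodedata categories
   rather than the table in Appendix A.

```python
import unicodedata

# SPACE plus the seven hyphen code points listed in this section.
INSIGNIFICANT = frozenset(
    "\u0020\u002d\u058a\u2010\u2011\u2212\ufe63\uff0d"
)


def remove_hyphens_and_spaces(value: str) -> str:
    """Drop hyphens and spaces that are not followed by a combining mark."""
    kept = []
    for index, ch in enumerate(value):
        following = value[index + 1 : index + 2]
        before_mark = bool(following) and unicodedata.category(
            following
        ).startswith("M")
        if ch in INSIGNIFICANT and not before_mark:
            continue
        kept.append(ch)
    return "".join(kept)


print(repr(remove_hyphens_and_spaces(" \u2010123  456 \u2010")))
print(repr(remove_hyphens_and_spaces("\u2010\u2010\u2010")))
```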

3.  Security Considerations

   "Preparation of Internationalized Strings ("stringprep")" [RFC3454]
   security considerations generally apply to the algorithms described
   here.

4.  Acknowledgements

   The approach used in this document is based upon design principles
   and algorithms described in "Preparation of Internationalized Strings
   ('stringprep')" [RFC3454] by Paul Hoffman and Marc Blanchet.  Some
   additional guidance was drawn from Unicode Technical Standards,
   Technical Reports, and Notes.

   This document is a product of the IETF LDAP Revision (LDAPBIS)
   Working Group.

5.  References

5.1.  Normative References

   [RFC2119]     Bradner, S., "Key words for use in RFCs to Indicate
                 Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3454]     Hoffman, P. and M. Blanchet, "Preparation of
                 Internationalized Strings ("stringprep")", RFC 3454,
                 December 2002.

   [RFC4510]     Zeilenga, K., "Lightweight Directory Access Protocol
                 (LDAP): Technical Specification Road Map", RFC 4510,
                 June 2006.

   [RFC4517]     Legg, S., Ed., "Lightweight Directory Access Protocol
                 (LDAP): Syntaxes and Matching Rules", RFC 4517, June
                 2006.

   [Unicode]     The Unicode Consortium, "The Unicode Standard, Version
                 3.2.0" is defined by "The Unicode Standard, Version
                 3.0" (Reading, MA, Addison-Wesley, 2000.  ISBN 0-201-
                 61633-5), as amended by the "Unicode Standard Annex
                 #27: Unicode 3.1"
                 (http://www.unicode.org/reports/tr27/) and by the
                 "Unicode Standard Annex #28: Unicode 3.2"
                 (http://www.unicode.org/reports/tr28/).

   [UAX15]       Davis, M. and M. Duerst, "Unicode Standard Annex #15:
                 Unicode Normalization Forms, Version 3.2.0".
                 <http://www.unicode.org/unicode/reports/tr15/tr15-
                 22.html>, March 2002.

   [X.680]       International Telecommunication Union -
                 Telecommunication Standardization Sector, "Abstract
                 Syntax Notation One (ASN.1) - Specification of Basic
                 Notation", X.680(2002) (also ISO/IEC 8824-1:2002).

5.2.  Informative References

   [X.500]       International Telecommunication Union -
                 Telecommunication Standardization Sector, "The
                 Directory -- Overview of concepts, models and
                 services," X.500(1993) (also ISO/IEC 9594-1:1994).

   [X.501]       International Telecommunication Union -
                 Telecommunication Standardization Sector, "The
                 Directory -- Models," X.501(1993) (also ISO/IEC 9594-
                 2:1994).

   [X.520]       International Telecommunication Union -
                 Telecommunication Standardization Sector, "The
                 Directory: Selected Attribute Types", X.520(1993) (also
                 ISO/IEC 9594-6:1994).

   [Glossary]    The Unicode Consortium, "Unicode Glossary",
                 <http://www.unicode.org/glossary/>.

   [CharModel]   Whistler, K. and M. Davis, "Unicode Technical Report
                 #17, Character Encoding Model", UTR17,
                 <http://www.unicode.org/unicode/reports/tr17/>, August
                 2000.

   [RFC3377]     Hodges, J. and R. Morgan, "Lightweight Directory Access
                 Protocol (v3): Technical Specification", RFC 3377,
                 September 2002.

   [RFC4515]     Smith, M., Ed. and T. Howes, "Lightweight Directory
                 Access Protocol (LDAP): String Representation of Search
                 Filters", RFC 4515, June 2006.

   [XMATCH]      Zeilenga, K., "Internationalized String Matching Rules
                 for X.500", Work in Progress.

Appendix A.  Combining Marks

   This appendix is normative.

   This table was derived from Unicode [Unicode] data files; it lists
   all code points with the Mn, Mc, or Me properties.  This table is to
   be considered definitive for the purposes of implementation of this
   specification.

         0300-034F 0360-036F 0483-0486 0488-0489 0591-05A1
         05A3-05B9 05BB-05BC 05BF 05C1-05C2 05C4 064B-0655 0670
         06D6-06DC 06DE-06E4 06E7-06E8 06EA-06ED 0711 0730-074A
         07A6-07B0 0901-0903 093C 093E-094F 0951-0954 0962-0963
         0981-0983 09BC 09BE-09C4 09C7-09C8 09CB-09CD 09D7
         09E2-09E3 0A02 0A3C 0A3E-0A42 0A47-0A48 0A4B-0A4D
         0A70-0A71 0A81-0A83 0ABC 0ABE-0AC5 0AC7-0AC9 0ACB-0ACD
         0B01-0B03 0B3C 0B3E-0B43 0B47-0B48 0B4B-0B4D 0B56-0B57
         0B82 0BBE-0BC2 0BC6-0BC8 0BCA-0BCD 0BD7 0C01-0C03
         0C3E-0C44 0C46-0C48 0C4A-0C4D 0C55-0C56 0C82-0C83
         0CBE-0CC4 0CC6-0CC8 0CCA-0CCD 0CD5-0CD6 0D02-0D03
         0D3E-0D43 0D46-0D48 0D4A-0D4D 0D57 0D82-0D83 0DCA
         0DCF-0DD4 0DD6 0DD8-0DDF 0DF2-0DF3 0E31 0E34-0E3A
         0E47-0E4E 0EB1 0EB4-0EB9 0EBB-0EBC 0EC8-0ECD 0F18-0F19
         0F35 0F37 0F39 0F3E-0F3F 0F71-0F84 0F86-0F87 0F90-0F97
         0F99-0FBC 0FC6 102C-1032 1036-1039 1056-1059 1712-1714
         1732-1734 1752-1753 1772-1773 17B4-17D3 180B-180D 18A9
         20D0-20EA 302A-302F 3099-309A FB1E FE00-FE0F FE20-FE23
         1D165-1D169 1D16D-1D172 1D17B-1D182 1D185-1D18B
         1D1AA-1D1AD
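
   An implementation might load this table as shown in the following
   Python sketch.  To keep the example short, EXCERPT holds only the
   first row of the table; a real implementation would load every row.
   The names EXCERPT, parse_ranges, and is_combining_mark are
   hypothetical.

```python
from typing import List, Tuple

# First row of the table above; the remaining rows are omitted here.
EXCERPT = "0300-034F 0360-036F 0483-0486 0488-0489 0591-05A1"


def parse_ranges(text: str) -> List[Tuple[int, int]]:
    """Parse entries such as "0300-034F" or "05BF" into (low, high) pairs."""
    ranges = []
    for token in text.split():
        low, _, high = token.partition("-")
        # A single code point has no "-", so high falls back to low.
        ranges.append((int(low, 16), int(high or low, 16)))
    return ranges


def is_combining_mark(ch: str, ranges: List[Tuple[int, int]]) -> bool:
    """True if the code point of ch falls within any listed range."""
    code_point = ord(ch)
    return any(low <= code_point <= high for low, high in ranges)


RANGES = parse_ranges(EXCERPT)
print(is_combining_mark("\u0301", RANGES))
print(is_combining_mark("a", RANGES))
```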

Appendix B.  Substrings Matching

   This appendix is non-normative.

   In the absence of substrings matching, the insignificant space
   handling for case ignore/exact matching could be simplified.
   Specifically, the handling could be to require that all sequences of
   one or more spaces be replaced with one space and, if the string
   contains non-space characters, removal of all leading spaces and
   trailing spaces.

   In the presence of substrings matching, this simplified space
   handling would lead to unexpected and undesirable matching behavior.
   For instance:

   1) (CN=foo\20*\20bar) would match the CN value "foobar";

   2) (CN=*\20foobar\20*) would match "foobar", but
      (CN=*\20*foobar*\20*) would not.

   Note to readers not familiar with LDAP substrings matching: the LDAP
   filter [RFC4515] assertion (CN=A*B*C) says to "match any value (of
   the attribute CN) that begins with A, contains B after A, ends with C
   where C is also after B."

   The first case illustrates that this simplified space handling would
   cause leading and trailing spaces in substrings of the string to be
   regarded as insignificant.  However, only leading and trailing spaces
   (as well as multiple consecutive spaces) of the string (as a whole)
   are insignificant.

   The second case illustrates that this simplified space handling would
   cause sub-partitioning failures.  That is, if a prepared any
   substring matches a partition of the attribute value, then an
   assertion constructed by subdividing that substring into multiple
   substrings should also match.

   In designing an appropriate approach for space handling for
   substrings matching, one must study key aspects of X.500 case
   exact/ignore matching.  X.520 [X.520] says:

      The [substrings] rule returns TRUE if there is a partitioning of
      the attribute value (into portions) such that:

         -  the specified substrings (initial, any, final) match
            different portions of the value in the order of the strings
            sequence;

         -  initial, if present, matches the first portion of the value;

         -  final, if present, matches the last portion of the value;

         -  any, if present, matches some arbitrary portion of the
            value.

   That is, the substrings assertion (CN=foo\20*\20bar) matches the
   attribute value "foo<SPACE><SPACE>bar" as the value can be
   partitioned into the portions "foo<SPACE>" and "<SPACE>bar" meeting
   the above requirements.
226031Sstas
226031Sstas
226031Sstas
226031Sstas
226031Sstas
226031Sstas
226031Sstas
226031Sstas
226031Sstas
226031SstasZeilenga                    Standards Track                    [Page 12]
226031Sstas
226031SstasRFC 4518       LDAP: Internationalized String Preparation      June 2006
226031Sstas
226031Sstas
226031Sstas   X.520 also says:
226031Sstas
226031Sstas      [T]he following spaces are regarded as not significant:
226031Sstas
226031Sstas         -  leading spaces (i.e., those preceding the first character
226031Sstas            that is not a space);
226031Sstas
226031Sstas         -  trailing spaces (i.e., those following the last character
226031Sstas            that is not a space);
226031Sstas
226031Sstas         -  multiple consecutive spaces (these are taken as equivalent
226031Sstas            to a single space character).
226031Sstas
226031Sstas   This statement applies to the assertion values and attribute values
226031Sstas   as whole strings, and not individually to substrings of an assertion
226031Sstas   value.  In particular, the statement should be taken to mean that if
226031Sstas   an assertion value and attribute value match without any
226031Sstas   consideration of insignificant characters, then that assertion value
226031Sstas   should also match any attribute value that differs only by the
226031Sstas   inclusion or removal of insignificant characters.
226031Sstas
226031Sstas   Hence the assertion (CN=foo\20*\20bar) matches
226031Sstas   "foo<SPACE><SPACE><SPACE>bar" and "foo<SPACE>bar" as these values
226031Sstas   only differ from "foo<SPACE><SPACE>bar" by the inclusion or removal
226031Sstas   of insignificant spaces.
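
   The X.520 rule for insignificant spaces, applied to whole strings,
   can be illustrated as follows.  This Python sketch is only an
   illustration (the helper name significant_form is hypothetical).  It
   shows that the three values discussed above differ solely by
   insignificant spaces; it is not the preparation algorithm defined by
   this document.

```python
import re


def significant_form(s):
    """Drop leading and trailing spaces; collapse each run of spaces to one."""
    return re.sub(r" {2,}", " ", s.strip(" "))


values = ["foo  bar", "foo   bar", "foo bar"]

# All three values reduce to the same string, so an assertion that
# matches one of them is expected to match the others as well.
forms = {significant_form(v) for v in values}
print(forms)
```

   Because the rule applies to whole strings, collapsing the spaces of
   each substring independently would not be a faithful reading of it.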
226031Sstas
226031Sstas   Astute readers of this text will also note that there are special
226031Sstas   cases where the specified space handling does not ignore spaces that
226031Sstas   could be considered insignificant.  For instance, the assertion
226031Sstas   (CN=\20*\20*\20) does not match "<SPACE><SPACE><SPACE>"
226031Sstas   (insignificant spaces present in value) or "<SPACE>" (insignificant
226031Sstas   spaces not present in value).  These cases have no practical
226031Sstas   application that cannot be met by a simple assertion, e.g.,
226031Sstas   (cn=\20).  Moreover, this minor anomaly can only be fully addressed
226031Sstas   by a preparation algorithm to be used in conjunction with
226031Sstas   character-by-character partitioning and matching.  The anomaly is
226031Sstas   therefore considered acceptable.
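
   The behavior described in this appendix can be reproduced with a
   small experiment.  The sketch below is an illustration under stated
   assumptions: all function names are hypothetical, and the space
   handling is a paraphrase of the insignificant space handling step
   specified in the body of this document (a value gets one leading
   space, one trailing space, and two spaces for each inner run of
   spaces; a substring is adjusted according to whether it is initial,
   any, or final).  The normative text, not this sketch, defines the
   exact rules.

```python
import re

SPACE = " "


def prepare_value(s):
    """Attribute value or non-substring assertion value (assumed rules)."""
    core = s.strip(SPACE)
    if not core:
        return SPACE * 2
    return SPACE + re.sub(r" +", SPACE * 2, core) + SPACE


def prepare_substring(s, kind):
    """Substring assertion value; kind is 'initial', 'any', or 'final'."""
    core = s.strip(SPACE)
    if not core:
        return SPACE
    out = re.sub(r" +", SPACE * 2, core)
    lead = kind == "initial" or s.startswith(SPACE)
    trail = kind == "final" or s.endswith(SPACE)
    return (SPACE if lead else "") + out + (SPACE if trail else "")


def substrings_match(value, initial=None, any_=(), final=None):
    """Prepare both sides, then partition character by character."""
    v = prepare_value(value)
    pos, end = 0, len(v)
    if initial is not None:
        p = prepare_substring(initial, "initial")
        if not v.startswith(p):
            return False
        pos = len(p)
    if final is not None:
        p = prepare_substring(final, "final")
        if end - pos < len(p) or not v.endswith(p):
            return False
        end -= len(p)
    for part in any_:
        p = prepare_substring(part, "any")
        idx = v.find(p, pos, end)
        if idx < 0:
            return False
        pos = idx + len(p)
    return True


# (CN=foo\20*\20bar) against values differing only in insignificant spaces.
for v in ("foo  bar", "foo   bar", "foo bar"):
    print(repr(v), substrings_match(v, initial="foo ", final=" bar"))

# The anomaly: (CN=\20*\20*\20) against all-space values.
for v in ("   ", " "):
    print(repr(v), substrings_match(v, initial=" ", any_=[" "], final=" "))

# The simple equality assertion (cn=\20) still matches such values.
print(prepare_value(" ") == prepare_value("   "))
```

   In this sketch an all-space value is prepared as two spaces, while
   the three all-space substrings need three distinct portions of one
   space each, which is why the substrings assertion fails where the
   equality assertion succeeds.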
226031Sstas
226031SstasAuthor's Address
226031Sstas
226031Sstas   Kurt D. Zeilenga
226031Sstas   OpenLDAP Foundation
226031Sstas
226031Sstas   EMail: Kurt@OpenLDAP.org
226031Sstas
226031SstasFull Copyright Statement
226031Sstas
226031Sstas   Copyright (C) The Internet Society (2006).
226031Sstas
226031Sstas   This document is subject to the rights, licenses and restrictions
226031Sstas   contained in BCP 78, and except as set forth therein, the authors
226031Sstas   retain all their rights.
226031Sstas
226031Sstas   This document and the information contained herein are provided on an
226031Sstas   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
226031Sstas   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
226031Sstas   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
226031Sstas   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
226031Sstas   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
226031Sstas   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
226031Sstas
226031SstasIntellectual Property
226031Sstas
226031Sstas   The IETF takes no position regarding the validity or scope of any
226031Sstas   Intellectual Property Rights or other rights that might be claimed to
226031Sstas   pertain to the implementation or use of the technology described in
226031Sstas   this document or the extent to which any license under such rights
226031Sstas   might or might not be available; nor does it represent that it has
226031Sstas   made any independent effort to identify any such rights.  Information
226031Sstas   on the procedures with respect to rights in RFC documents can be
226031Sstas   found in BCP 78 and BCP 79.
226031Sstas
226031Sstas   Copies of IPR disclosures made to the IETF Secretariat and any
226031Sstas   assurances of licenses to be made available, or the result of an
226031Sstas   attempt made to obtain a general license or permission for the use of
226031Sstas   such proprietary rights by implementers or users of this
226031Sstas   specification can be obtained from the IETF on-line IPR repository at
226031Sstas   http://www.ietf.org/ipr.
226031Sstas
226031Sstas   The IETF invites any interested party to bring to its attention any
226031Sstas   copyrights, patents or patent applications, or other proprietary
226031Sstas   rights that may cover technology that may be required to implement
226031Sstas   this standard.  Please address the information to the IETF at
226031Sstas   ietf-ipr@ietf.org.
226031Sstas
226031SstasAcknowledgement
226031Sstas
226031Sstas   Funding for the RFC Editor function is provided by the IETF
226031Sstas   Administrative Support Activity (IASA).
226031Sstas