Network Working Group                                        K. Zeilenga
Request for Comments: 4518                           OpenLDAP Foundation
Category: Standards Track                                      June 2006


             Lightweight Directory Access Protocol (LDAP):
                  Internationalized String Preparation

Status of This Memo

   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements. Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol. Distribution of this memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (2006).

Abstract

   The previous Lightweight Directory Access Protocol (LDAP) technical
   specifications did not precisely define how character string matching
   is to be performed. This led to a number of usability and
   interoperability problems. This document defines string preparation
   algorithms for character-based matching rules defined for use in
   LDAP.

1. Introduction

1.1. Background

   A Lightweight Directory Access Protocol (LDAP) [RFC4510] matching
   rule [RFC4517] defines an algorithm for determining whether a
   presented value matches an attribute value in accordance with the
   criteria defined for the rule. The proposition may be evaluated to
   True, False, or Undefined.

      True      - the attribute contains a matching value,

      False     - the attribute contains no matching value,

      Undefined - it cannot be determined whether the attribute contains
                  a matching value.

   For instance, the caseIgnoreMatch matching rule may be used to
   compare whether the commonName attribute contains a particular value
   without regard for case and insignificant spaces.

1.2. X.500 String Matching Rules

   "X.520: Selected attribute types" [X.520] provides (among other
   things) value syntaxes and matching rules for comparing values
   commonly used in the directory [X.500]. These specifications are
   inadequate for strings composed of Unicode [Unicode] characters.

   The caseIgnoreMatch matching rule [X.520], for example, is simply
   defined as being a case-insensitive comparison where insignificant
   spaces are ignored. For printableString, there is only one space
   character and case mapping is bijective, hence this definition is
   sufficient. However, for Unicode string types such as
   universalString, this is not sufficient. For example, a
   case-insensitive matching implementation that folded lowercase
   characters to uppercase would yield different results than an
   implementation that used uppercase to lowercase folding.
   Or one implementation may view space as referring to only SPACE
   (U+0020), a second implementation may view any character with the
   space separator (Zs) property as a space, and another implementation
   may view any character with the whitespace (WS) category as a space.

   The lack of precise specification for character string matching has
   led to significant interoperability problems. When used in
   certificate chain validation, security vulnerabilities can arise. To
   address these problems, this document defines precise algorithms for
   preparing character strings for matching.

1.3. Relationship to "stringprep"

   The character string preparation algorithms described in this
   document are based upon the "stringprep" approach [RFC3454]. In
   "stringprep", presented and stored values are first prepared for
   comparison so that a character-by-character comparison yields the
   "correct" result.

   The approach used here is a refinement of the "stringprep" [RFC3454]
   approach. Each algorithm involves two additional preparation steps.

   a) Prior to applying the Unicode string preparation steps outlined in
      "stringprep", the string is transcoded to Unicode.

   b) After applying the Unicode string preparation steps outlined in
      "stringprep", the string is modified to appropriately handle
      characters insignificant to the matching rule.
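
   As an illustration only, the way steps a) and b) bracket the
   "stringprep" steps can be sketched in Python. This is a minimal
   sketch under stated assumptions, not a reference implementation: it
   assumes UTF-8 encoded input, uses the stringprep tables and the
   Unicode 3.2 database from the Python standard library in place of
   the tables referenced in Section 2, and takes the variation selector
   range as FE00-FE0F, the range Appendix A lists. The names prepare,
   map_step, prohibit_step, and PrepError are hypothetical, and None
   stands for the Undefined outcome.

```python
import stringprep
import unicodedata
from typing import Callable, Optional

# Unicode 3.2 database shipped with Python (the repertoire of Section 2).
UCD = unicodedata.ucd_3_2_0


def _expand(spec: str) -> frozenset:
    """Expand 'XXXX' and 'XXXX-YYYY' hex items into a set of code points."""
    points = set()
    for item in spec.split():
        lo, _, hi = item.partition("-")
        points.update(range(int(lo, 16), int(hi or lo, 16) + 1))
    return frozenset(points)


# Code point lists from Section 2.2 (variation selectors as FE00-FE0F,
# which is an assumption based on Appendix A).
TO_NOTHING = _expand(
    "00AD 1806 034F 180B-180D FE00-FE0F FFFC 200B "
    "0000-0008 000E-001F 007F-0084 0086-009F 06DD 070F 180E 200C-200F "
    "202A-202E 2060-2063 206A-206F FEFF FFF9-FFFB 1D173-1D17A E0001 "
    "E0020-E007F"
)
TO_SPACE = _expand(
    "0009-000D 0085 0020 00A0 1680 2000-200A 2028-2029 202F 205F 3000"
)


class PrepError(Exception):
    """Hypothetical exception raised when a preparation step fails."""


def map_step(s: str, fold_case: bool) -> str:
    """Map step: drop, replace with SPACE, or case fold each code point."""
    out = []
    for ch in s:
        cp = ord(ch)
        if cp in TO_NOTHING:
            continue
        if cp in TO_SPACE:
            out.append(" ")
        elif fold_case:
            out.append(stringprep.map_table_b2(ch))  # RFC 3454, B.2
        else:
            out.append(ch)
    return "".join(out)


def prohibit_step(s: str) -> str:
    """Prohibit step: fail on Tables A.1, C.3, C.4, C.5, C.8 and U+FFFD."""
    for ch in s:
        if (ch == "\ufffd" or stringprep.in_table_a1(ch)
                or stringprep.in_table_c3(ch) or stringprep.in_table_c4(ch)
                or stringprep.in_table_c5(ch) or stringprep.in_table_c8(ch)):
            raise PrepError(f"prohibited code point U+{ord(ch):04X}")
    return s


def prepare(raw: bytes, insignificant: Callable[[str], str],
            fold_case: bool = False) -> Optional[str]:
    """Run the steps in order; any failure yields None (Undefined)."""
    try:
        s = raw.decode("utf-8")           # a) transcode (UTF-8 assumed)
        s = map_step(s, fold_case)        # "stringprep" steps
        s = UCD.normalize("NFKC", s)
        s = prohibit_step(s)              # Check bidi needs no action
        return insignificant(s)           # b) rule-specific handling
    except (UnicodeDecodeError, PrepError):
        return None
```

   The rule-specific handling is passed in as a function because, as
   step b) says, it depends on the matching rule. A caller that only
   wants the common steps can pass an identity function.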

   Hence, preparation of character strings for X.500 [X.500] matching
   [X.501] involves the following steps:

      1) Transcode
      2) Map
      3) Normalize
      4) Prohibit
      5) Check Bidi (Bidirectional)
      6) Insignificant Character Handling

   These steps are described in Section 2.

   It is noted that while various tables of Unicode characters included
   or referenced by this specification are derived from Unicode
   [Unicode] data, these tables are to be considered definitive for the
   purpose of implementing this specification.

1.4. Relationship to the LDAP Technical Specification

   This document is an integral part of the LDAP technical specification
   [RFC4510], which obsoletes the previously defined LDAP technical
   specification [RFC3377] in its entirety.

   This document details new LDAP internationalized character string
   preparation algorithms used by [RFC4517] and possibly other technical
   specifications defining LDAP syntaxes and/or matching rules.

1.5. Relationship to X.500

   LDAP is defined [RFC4510] in X.500 terms as an X.500 access
   mechanism. As such, there is a strong desire for alignment between
   LDAP and X.500 syntax and semantics.
   The character string preparation algorithms described in this
   document are based upon the "Internationalized String Matching Rules
   for X.500" [XMATCH] proposal to ITU/ISO Joint Study Group 2.

1.6. Conventions and Terms

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in BCP 14 [RFC2119].

   Character names in this document use the notation for code points and
   names from the Unicode Standard [Unicode]. For example, the letter
   "a" may be represented as either <U+0061> or <LATIN SMALL LETTER A>.
   In the lists of mappings and the prohibited characters, the "U+" is
   left off to make the lists easier to read. The comments for
   character ranges are shown in square brackets (such as "[CONTROL
   CHARACTERS]") and do not come from the standard.

   Note: a glossary of terms used in Unicode can be found in [Glossary].
   Information on the Unicode character encoding model can be found in
   [CharModel].

   The term "combining mark", as used in this specification, refers to
   any Unicode [Unicode] code point that has a mark property (Mn, Mc,
   Me). Appendix A provides a definitive list of combining marks.

2. String Preparation

   The following six-step process SHALL be applied to each presented and
   attribute value in preparation for character string matching rule
   evaluation.

      1) Transcode
      2) Map
      3) Normalize
      4) Prohibit
      5) Check bidi
      6) Insignificant Character Handling

   Failure in any step causes the assertion to evaluate to Undefined.

   The character repertoire of this process is Unicode 3.2 [Unicode].

   Note that this six-step process specification is intended to describe
   expected matching behavior. Implementations are free to use
   alternative processes so long as the matching rule evaluation
   behavior provided is consistent with the behavior described by this
   specification.

2.1. Transcode

   Each non-Unicode string value is transcoded to Unicode.

   PrintableString [X.680] values are transcoded directly to Unicode.

   UniversalString, UTF8String, and bmpString [X.680] values need not be
   transcoded as they are Unicode-based strings (in the case of
   bmpString, a subset of Unicode).

   TeletexString [X.680] values are transcoded to Unicode. As there is
   no standard for mapping TeletexString values to Unicode, the mapping
   is left as a local matter.

   For these and other reasons, use of TeletexString is NOT RECOMMENDED.

   The output is the transcoded string.

2.2. Map

   SOFT HYPHEN (U+00AD) and MONGOLIAN TODO SOFT HYPHEN (U+1806) code
   points are mapped to nothing. COMBINING GRAPHEME JOINER (U+034F) and
   VARIATION SELECTORs (U+180B-180D, FE00-FE0F) code points are also
   mapped to nothing. The OBJECT REPLACEMENT CHARACTER (U+FFFC) is
   mapped to nothing.

   CHARACTER TABULATION (U+0009), LINE FEED (LF) (U+000A), LINE
   TABULATION (U+000B), FORM FEED (FF) (U+000C), CARRIAGE RETURN (CR)
   (U+000D), and NEXT LINE (NEL) (U+0085) are mapped to SPACE (U+0020).

   All other control code points (e.g., Cc) or code points with a
   control function (e.g., Cf) are mapped to nothing. The following is
   a complete list of these code points: U+0000-0008, 000E-001F,
   007F-0084, 0086-009F, 06DD, 070F, 180E, 200C-200F, 202A-202E,
   2060-2063, 206A-206F, FEFF, FFF9-FFFB, 1D173-1D17A, E0001,
   E0020-E007F.

   ZERO WIDTH SPACE (U+200B) is mapped to nothing. All other code
   points with Separator (space, line, or paragraph) property (e.g., Zs,
   Zl, or Zp) are mapped to SPACE (U+0020). The following is a complete
   list of these code points: U+0020, 00A0, 1680, 2000-200A, 2028-2029,
   202F, 205F, 3000.

   For case ignore, numeric, and stored prefix string matching rules,
   characters are case folded per B.2 of [RFC3454].

   The output is the mapped string.

2.3. Normalize

   The input string is to be normalized to Unicode Form KC
   (compatibility composed) as described in [UAX15].
   The output is the normalized string.

2.4. Prohibit

   All Unassigned code points are prohibited. Unassigned code points
   are listed in Table A.1 of [RFC3454].

   Characters that, per Section 5.8 of [RFC3454], change display
   properties or are deprecated are prohibited. These characters are
   listed in Table C.8 of [RFC3454].

   Private Use code points are prohibited. These characters are listed
   in Table C.3 of [RFC3454].

   All non-character code points are prohibited. These code points are
   listed in Table C.4 of [RFC3454].

   Surrogate codes are prohibited. These characters are listed in Table
   C.5 of [RFC3454].

   The REPLACEMENT CHARACTER (U+FFFD) code point is prohibited.

   The step fails if the input string contains any prohibited code
   point. Otherwise, the output is the input string.

2.5. Check bidi

   Bidirectional characters are ignored.

2.6. Insignificant Character Handling

   In this step, the string is modified to ensure proper handling of
   characters insignificant to the matching rule. This modification
   differs from matching rule to matching rule.

   Section 2.6.1 applies to case ignore and exact string matching.
   Section 2.6.2 applies to numericString matching.
   Section 2.6.3 applies to telephoneNumber matching.
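
   The two simpler variants, defined in Sections 2.6.2 and 2.6.3, both
   amount to removing a fixed set of code points unless the code point
   is followed by a combining mark. A minimal Python sketch follows.
   It assumes the input has already passed the earlier steps, and it
   approximates the combining mark list of Appendix A with the Mn, Mc,
   and Me categories of the Unicode 3.2 database in the Python standard
   library. The function names are hypothetical.

```python
import unicodedata

# Unicode 3.2 database shipped with Python; used here as a stand-in
# for the combining mark table of Appendix A (an assumption).
UCD = unicodedata.ucd_3_2_0

# Hyphen code points named in Section 2.6.3.
HYPHENS = frozenset("\u002d\u058a\u2010\u2011\u2212\ufe63\uff0d")


def _is_combining(ch: str) -> bool:
    """True if ch has a mark property (Mn, Mc, or Me)."""
    return UCD.category(ch) in ("Mn", "Mc", "Me")


def strip_insignificant(s: str, insignificant: frozenset) -> str:
    """Remove each insignificant code point not followed by a mark."""
    out = []
    for i, ch in enumerate(s):
        followed_by_mark = i + 1 < len(s) and _is_combining(s[i + 1])
        if ch in insignificant and not followed_by_mark:
            continue
        out.append(ch)
    return "".join(out)


def numeric_string(s: str) -> str:
    """Section 2.6.2: all spaces are removed."""
    return strip_insignificant(s, frozenset(" "))


def telephone_number(s: str) -> str:
    """Section 2.6.3: all hyphens and spaces are removed."""
    return strip_insignificant(s, HYPHENS | {" "})
```

   A SPACE that carries a combining mark is kept because, under the
   definitions in those sections, it does not count as a space.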

2.6.1. Insignificant Space Handling

   For the purposes of this section, a space is defined to be the SPACE
   (U+0020) code point followed by no combining marks.

      NOTE - The previous steps ensure that the string cannot contain
             any code points in the separator class, other than SPACE
             (U+0020).

   For input strings that are attribute values or non-substring
   assertion values: If the input string contains no non-space
   character, then the output is exactly two SPACEs. Otherwise (the
   input string contains at least one non-space character), the string
   is modified such that the string starts with exactly one space
   character, ends with exactly one SPACE character, and any inner
   (non-empty) sequence of space characters is replaced with exactly two
   SPACE characters. For instance, the input string
   "foo<SPACE>bar<SPACE><SPACE>" results in the output
   "<SPACE>foo<SPACE><SPACE>bar<SPACE>".

   For input strings that are substring assertion values: If the string
   being prepared contains no non-space characters, then the output
   string is exactly one SPACE.
   Otherwise, the following steps are taken:

   - If the input string is an initial substring, it is modified to
     start with exactly one SPACE character;

   - If the input string is an initial or an any substring that ends in
     one or more space characters, it is modified to end with exactly
     one SPACE character;

   - If the input string is an any or a final substring that starts with
     one or more space characters, it is modified to start with exactly
     one SPACE character; and

   - If the input string is a final substring, it is modified to end
     with exactly one SPACE character.

   For instance, for the input string "foo<SPACE>bar<SPACE><SPACE>" as
   an initial substring, the output would be
   "<SPACE>foo<SPACE><SPACE>bar<SPACE>". As an any or final substring,
   the same input would result in "foo<SPACE>bar<SPACE>".

   Appendix B discusses the rationale for the behavior.

2.6.2. numericString Insignificant Character Handling

   For the purposes of this section, a space is defined to be the SPACE
   (U+0020) code point followed by no combining marks.

   All spaces are regarded as insignificant and are to be removed.

   For example, removal of spaces from the Form KC string:
      "<SPACE><SPACE>123<SPACE><SPACE>456<SPACE><SPACE>"
   would result in the output string:
      "123456"
   and the Form KC string:
      "<SPACE><SPACE><SPACE>"
   would result in the output string:
      "" (an empty string).

2.6.3. telephoneNumber Insignificant Character Handling

   For the purposes of this section, a hyphen is defined to be a
   HYPHEN-MINUS (U+002D), ARMENIAN HYPHEN (U+058A), HYPHEN (U+2010),
   NON-BREAKING HYPHEN (U+2011), MINUS SIGN (U+2212), SMALL HYPHEN-MINUS
   (U+FE63), or FULLWIDTH HYPHEN-MINUS (U+FF0D) code point followed by
   no combining marks and a space is defined to be the SPACE (U+0020)
   code point followed by no combining marks.

   All hyphens and spaces are considered insignificant and are to be
   removed.

   For example, removal of hyphens and spaces from the Form KC string:
      "<SPACE><HYPHEN>123<SPACE><SPACE>456<SPACE><HYPHEN>"
   would result in the output string:
      "123456"
   and the Form KC string:
      "<HYPHEN><HYPHEN><HYPHEN>"
   would result in the (empty) output string:
      "".

3. Security Considerations

   "Preparation of Internationalized Strings ("stringprep")" [RFC3454]
   security considerations generally apply to the algorithms described
   here.

4. Acknowledgements

   The approach used in this document is based upon design principles
   and algorithms described in "Preparation of Internationalized Strings
   ('stringprep')" [RFC3454] by Paul Hoffman and Marc Blanchet. Some
   additional guidance was drawn from Unicode Technical Standards,
   Technical Reports, and Notes.

   This document is a product of the IETF LDAP Revision (LDAPBIS)
   Working Group.

5. References

5.1. Normative References

   [RFC2119]     Bradner, S., "Key words for use in RFCs to Indicate
                 Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3454]     Hoffman, P. and M. Blanchet, "Preparation of
                 Internationalized Strings ("stringprep")", RFC 3454,
                 December 2002.

   [RFC4510]     Zeilenga, K., "Lightweight Directory Access Protocol
                 (LDAP): Technical Specification Road Map", RFC 4510,
                 June 2006.

   [RFC4517]     Legg, S., Ed., "Lightweight Directory Access Protocol
                 (LDAP): Syntaxes and Matching Rules", RFC 4517, June
                 2006.

   [Unicode]     The Unicode Consortium, "The Unicode Standard, Version
                 3.2.0" is defined by "The Unicode Standard, Version
                 3.0" (Reading, MA, Addison-Wesley, 2000. ISBN
                 0-201-61633-5), as amended by the "Unicode Standard
                 Annex #27: Unicode 3.1"
                 (http://www.unicode.org/reports/tr27/) and by the
                 "Unicode Standard Annex #28: Unicode 3.2"
                 (http://www.unicode.org/reports/tr28/).

   [UAX15]       Davis, M. and M. Duerst, "Unicode Standard Annex #15:
                 Unicode Normalization Forms, Version 3.2.0".
                 <http://www.unicode.org/unicode/reports/tr15/tr15-22.html>,
                 March 2002.

   [X.680]       International Telecommunication Union -
                 Telecommunication Standardization Sector, "Abstract
                 Syntax Notation One (ASN.1) - Specification of Basic
                 Notation", X.680(2002) (also ISO/IEC 8824-1:2002).

5.2. Informative References

   [X.500]       International Telecommunication Union -
                 Telecommunication Standardization Sector, "The
                 Directory -- Overview of concepts, models and
                 services," X.500(1993) (also ISO/IEC 9594-1:1994).

   [X.501]       International Telecommunication Union -
                 Telecommunication Standardization Sector, "The
                 Directory -- Models," X.501(1993) (also ISO/IEC
                 9594-2:1994).

   [X.520]       International Telecommunication Union -
                 Telecommunication Standardization Sector, "The
                 Directory: Selected Attribute Types", X.520(1993) (also
                 ISO/IEC 9594-6:1994).

   [Glossary]    The Unicode Consortium, "Unicode Glossary",
                 <http://www.unicode.org/glossary/>.

   [CharModel]   Whistler, K. and M. Davis, "Unicode Technical Report
                 #17, Character Encoding Model", UTR17,
                 <http://www.unicode.org/unicode/reports/tr17/>, August
                 2000.

   [RFC3377]     Hodges, J. and R. Morgan, "Lightweight Directory Access
                 Protocol (v3): Technical Specification", RFC 3377,
                 September 2002.

   [RFC4515]     Smith, M., Ed. and T. Howes, "Lightweight Directory
                 Access Protocol (LDAP): String Representation of Search
                 Filters", RFC 4515, June 2006.

   [XMATCH]      Zeilenga, K., "Internationalized String Matching Rules
                 for X.500", Work in Progress.

Appendix A. Combining Marks

   This appendix is normative.

   This table was derived from Unicode [Unicode] data files; it lists
   all code points with the Mn, Mc, or Me properties. This table is to
   be considered definitive for the purposes of implementation of this
   specification.

      0300-034F 0360-036F 0483-0486 0488-0489 0591-05A1
      05A3-05B9 05BB-05BC 05BF 05C1-05C2 05C4 064B-0655 0670
      06D6-06DC 06DE-06E4 06E7-06E8 06EA-06ED 0711 0730-074A
      07A6-07B0 0901-0903 093C 093E-094F 0951-0954 0962-0963
      0981-0983 09BC 09BE-09C4 09C7-09C8 09CB-09CD 09D7
      09E2-09E3 0A02 0A3C 0A3E-0A42 0A47-0A48 0A4B-0A4D
      0A70-0A71 0A81-0A83 0ABC 0ABE-0AC5 0AC7-0AC9 0ACB-0ACD
      0B01-0B03 0B3C 0B3E-0B43 0B47-0B48 0B4B-0B4D 0B56-0B57
      0B82 0BBE-0BC2 0BC6-0BC8 0BCA-0BCD 0BD7 0C01-0C03
      0C3E-0C44 0C46-0C48 0C4A-0C4D 0C55-0C56 0C82-0C83
      0CBE-0CC4 0CC6-0CC8 0CCA-0CCD 0CD5-0CD6 0D02-0D03
      0D3E-0D43 0D46-0D48 0D4A-0D4D 0D57 0D82-0D83 0DCA
      0DCF-0DD4 0DD6 0DD8-0DDF 0DF2-0DF3 0E31 0E34-0E3A
      0E47-0E4E 0EB1 0EB4-0EB9 0EBB-0EBC 0EC8-0ECD 0F18-0F19
      0F35 0F37 0F39 0F3E-0F3F 0F71-0F84 0F86-0F87 0F90-0F97
      0F99-0FBC 0FC6 102C-1032 1036-1039 1056-1059 1712-1714
      1732-1734 1752-1753 1772-1773 17B4-17D3 180B-180D 18A9
      20D0-20EA 302A-302F 3099-309A FB1E FE00-FE0F FE20-FE23
      1D165-1D169 1D16D-1D172 1D17B-1D182 1D185-1D18B
      1D1AA-1D1AD

Appendix B. Substrings Matching

   This appendix is non-normative.

   In the absence of substrings matching, the insignificant space
   handling for case ignore/exact matching could be simplified.
   Specifically, the handling could be to require that all sequences of
   one or more spaces be replaced with one space and, if the string
   contains non-space characters, removal of all leading spaces and
   trailing spaces.
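
   To make the contrast concrete, the simplified handling described
   above can be set next to the Section 2.6.1 handling for attribute
   values and non-substring assertion values. The following Python
   sketch is illustrative only. It assumes the earlier preparation
   steps have already run and, as a simplification, treats every SPACE
   (U+0020) as a space without checking for a following combining mark.
   The function names are hypothetical.

```python
import re

# One or more consecutive SPACE (U+0020) code points.
_SPACES = re.compile(" +")


def simplified_handling(s: str) -> str:
    """Simplified handling: collapse runs of spaces, then trim both
    ends if the string contains any non-space character."""
    collapsed = _SPACES.sub(" ", s)
    trimmed = collapsed.strip(" ")
    return trimmed if trimmed else collapsed


def section_2_6_1_handling(s: str) -> str:
    """Section 2.6.1 handling for attribute values and non-substring
    assertion values."""
    core = s.strip(" ")
    if not core:
        # No non-space character: the output is exactly two SPACEs.
        return "  "
    # One leading SPACE, one trailing SPACE, two SPACEs for each
    # inner run of spaces.
    return " " + _SPACES.sub("  ", core) + " "
```

   For the input "foo<SPACE>bar<SPACE><SPACE>", the simplified form
   gives "foo<SPACE>bar", while the Section 2.6.1 form gives
   "<SPACE>foo<SPACE><SPACE>bar<SPACE>". Applied to a lone substring
   such as "foo<SPACE>", the simplified form drops the space entirely.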

   In the presence of substrings matching, this simplified space
   handling would lead to unexpected and undesirable matching behavior.
   For instance:

      1) (CN=foo\20*\20bar) would match the CN value "foobar";

      2) (CN=*\20foobar\20*) would match "foobar", but
         (CN=*\20*foobar*\20*) would not.

   Note to readers not familiar with LDAP substrings matching: the LDAP
   filter [RFC4515] assertion (CN=A*B*C) says to "match any value (of
   the attribute CN) that begins with A, contains B after A, ends with C
   where C is also after B."

   The first case illustrates that this simplified space handling would
   cause leading and trailing spaces in substrings of the string to be
   regarded as insignificant. However, only leading and trailing spaces
   (as well as multiple consecutive spaces) of the string (as a whole)
   are insignificant.

   The second case illustrates that this simplified space handling would
   cause sub-partitioning failures. That is, if a prepared any
   substring matches a partition of the attribute value, then an
   assertion constructed by subdividing that substring into multiple
   substrings should also match.

   In designing an appropriate approach for space handling for
   substrings matching, one must study key aspects of X.500 case
   exact/ignore matching.
   X.520 [X.520] says:

      The [substrings] rule returns TRUE if there is a partitioning of
      the attribute value (into portions) such that:

         - the specified substrings (initial, any, final) match
           different portions of the value in the order of the strings
           sequence;

         - initial, if present, matches the first portion of the value;

         - final, if present, matches the last portion of the value;

         - any, if present, matches some arbitrary portion of the
           value.

   That is, the substrings assertion (CN=foo\20*\20bar) matches the
   attribute value "foo<SPACE><SPACE>bar" as the value can be
   partitioned into the portions "foo<SPACE>" and "<SPACE>bar" meeting
   the above requirements.

   X.520 also says:

      [T]he following spaces are regarded as not significant:

         - leading spaces (i.e., those preceding the first character
           that is not a space);

         - trailing spaces (i.e., those following the last character
           that is not a space);

         - multiple consecutive spaces (these are taken as equivalent
           to a single space character).

   This statement applies to the assertion values and attribute values
   as whole strings, and not individually to substrings of an assertion
   value.
   In particular, the statements should be taken to mean that if
   an assertion value and attribute value match without any
   consideration to insignificant characters, then that assertion value
   should also match any attribute value that differs only by inclusion
   or removal of insignificant characters.

   Hence the assertion (CN=foo\20*\20bar) matches
   "foo<SPACE><SPACE><SPACE>bar" and "foo<SPACE>bar" as these values
   only differ from "foo<SPACE><SPACE>bar" by the inclusion or removal
   of insignificant spaces.

   Astute readers of this text will also note that there are special
   cases where the specified space handling does not ignore spaces that
   could be considered insignificant. For instance, the assertion
   (CN=\20*\20*\20) does not match "<SPACE><SPACE><SPACE>"
   (insignificant spaces present in value) or " " (insignificant spaces
   not present in value). However, as these cases have no practical
   application that cannot be met by simple assertions, e.g., (cn=\20),
   and this minor anomaly can only be fully addressed by a preparation
   algorithm to be used in conjunction with character-by-character
   partitioning and matching, the anomaly is considered acceptable.

Author's Address

   Kurt D. Zeilenga
   OpenLDAP Foundation

   EMail: Kurt@OpenLDAP.org

Full Copyright Statement

   Copyright (C) The Internet Society (2006).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights. Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.

Acknowledgement

   Funding for the RFC Editor function is provided by the IETF
   Administrative Support Activity (IASA).