* Copyright (C) 2004-2012, International Business Machines
* Corporation and others. All Rights Reserved.
*
* file name: changes.txt
* encoding: US-ASCII
* tab size: 8 (not used)
* indentation:4
*
* created on: 2004may06
* created by: Markus W. Scherer
*
* change log for Unicode updates

---------------------------------------------------------------------------- ***

Unicode 6.2 update

http://www.unicode.org/review/pri230/
http://www.unicode.org/versions/beta-6.2.0.html
http://www.unicode.org/reports/tr44/tr44-9.html#Unicode_6.2.0
http://www.unicode.org/review/pri227/ Changes to Script Extensions Property Values
http://www.unicode.org/review/pri228/ Changing some common characters from Punctuation to Symbol
http://www.unicode.org/review/pri229/ Linebreaking Changes for Pictographic Symbols
http://www.unicode.org/reports/tr46/tr46-8.html IDNA
http://unicode.org/Public/idna/6.2.0/

*** ICU Trac

- ticket 9515: Unicode 6.2: final ICU update

- ticket 9514: UCA 6.2: fix UCARules.txt

- ticket 9437: update ICU to Unicode 6.2
- C++ branches/markus/uni62 at r32050 from trunk at r32041
- Java branches/markus/uni62 at r32068 from trunk at r32066

*** Unicode version numbers
- makedata.mak
- uchar.h
  (configure.in & configure: have been modified to extract the version from uchar.h)
- com.ibm.icu.util.VersionInfo
- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_

*** data files & enums & parser code

* file preparation

- download UCD, UCA & IDNA files
- make sure that the Unicode data folder passed into preparseucd.py
  includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
- modify preparseucd.py: NamesList.txt is now in UTF-8
- ~/svn.icu/tools/trunk/src/unicode$ py/preparseucd.py ~/uni62/20120816 ~/svn.icu/uni62/src ~/svn.icu/tools/trunk/src
- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
- Check test file diffs for previously commented-out, known-failing data lines;
  probably need to keep those commented out.

* PropertyValueAliases.txt changes
- 1 new Line_Break (lb) value:
    lb ; RI ; Regional_Indicator
  -> uchar.h & UCharacter.LineBreak
- 1 new Word_Break (WB) value:
    WB ; RI ; Regional_Indicator
  -> uchar.h & UCharacter.WordBreak
- 1 new Grapheme_Cluster_Break (GCB) value:
    GCB; RI ; Regional_Indicator
  -> uchar.h & UCharacter.GraphemeClusterBreak

* 3 new numeric values
  The new value -1, which was really supposed to be NaN but that would have required
  new UnicodeData.txt syntax, can already be represented as a "fraction" of -1/1,
  but encodeNumericValue() in corepropsbuilder.cpp had to be fixed.
    cp;12456;na=CUNEIFORM NUMERIC SIGN NIGIDAMIN;nv=-1
    cp;12457;na=CUNEIFORM NUMERIC SIGN NIGIDAESH;nv=-1
  The two new values 216000 and 432000 require an addition to the encoding of numeric values.
    cp;12432;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS DISH;nv=216000
    cp;12433;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS MIN;nv=432000
  -> uprops.h, uchar.c & UCharacterProperty.java
  -> cucdtst.c & UCharacterTest.java

* generate normalization data files
- ~/svn.icu/uni62/dbg$ export LD_LIBRARY_PATH=~/svn.icu/uni62/dbg/lib
- ~/svn.icu/uni62/dbg$ SRC_DATA_IN=~/svn.icu/uni62/src/source/data/in
- ~/svn.icu/uni62/dbg$ UNIDATA=~/svn.icu/uni62/src/source/data/unidata
- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt

* build ICU (make install)
  so that the tools build can pick up the new definitions from the installed header files.
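
  Note on the two large numeric values listed under "3 new numeric values" above:
  both are small multiples of 60^3, which is one way to see why they fit a
  sexagesimal numbering system and why a compact encoding is possible.
  The following is only an illustrative sketch; split_base60() is a hypothetical
  helper, and the actual bit layout used in uprops.h is not reproduced here.

```python
def split_base60(value):
    """Return (mantissa, exponent) such that value == mantissa * 60**exponent.

    Hypothetical helper for illustration only; it does not mirror
    encodeNumericValue() in corepropsbuilder.cpp.
    """
    if value <= 0:
        raise ValueError("expects a positive integer")
    exponent = 0
    # Strip factors of 60 until the remainder is no longer divisible.
    while value % 60 == 0:
        value //= 60
        exponent += 1
    return value, exponent


print(split_base60(216000))  # (1, 3)
print(split_base60(432000))  # (2, 3)
```
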
* build Unicode tools using CMake+make

* generate core properties data files
- ~/svn.icu/tools/trunk/dbg/unicode$ c/genprops/genprops ~/svn.icu/uni62/src
- in initial bootstrapping, change the UCA version
  in source/data/unidata/FractionalUCA.txt to match the new Unicode version
- ~/svn.icu/tools/trunk/dbg/unicode$ c/genuca/genuca -i ~/svn.icu/uni62/dbg/data/out/build/icudt50l ~/svn.icu/uni62/src
- rebuild ICU (make install) & tools
  + if genrb fails to build coll/root.res with a U_INVALID_FORMAT_ERROR,
    check if the UCA version in FractionalUCA.txt matches the new Unicode version
    (see step above)
- run genuca again (see step above) so that it picks up the new case mappings and nfc.nrm
- rebuild ICU (make install) & tools

* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
- Unicode 6.0..6.2: U+2260, U+226E, U+226F
- nothing new in 6.2, no test file to update

* update Java data files
- refresh just the UCD-related files, just to be safe
- see (ICU4C)/source/data/icu4j-readme.txt
- mkdir /tmp/icu4j
- ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
  output:
    ...
    Unicode .icu files built to ./out/build/icudt50l
    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt50b
    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt50b
    echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt50l.dat ./out/icu4j/icudt50b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt50l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt50b
    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt50b"
    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt50b/
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt50b/
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
    make[1]: Leaving directory `/home/mscherer/svn.icu/uni62/dbg/data'
- copy the big-endian Unicode data files to another location,
  separate from the other data files
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/brkitr
    ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt50b
    ~/svn.icu/uni62/dbg/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/cnvalias.icu
    ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt50b
    ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
    ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/brkitr
- refresh ICU4J
    ~/svn.icu/uni62/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt50b

* refresh Java test .txt files
- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode

* UCA

- get output from Mark's tools; look in http://www.unicode.org/Public/UCA/<beta version>/
- CLDR root files for ICU are in CollationAuxiliary.zip; unpack that
- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
  (note removing the underscore before "Rules")
- update (ICU4C)/source/test/testdata/CollationTest_*.txt
  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
  with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
- check test file diffs for previously commented-out, known-failing data lines;
  probably need to keep those commented out
- check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
- run genuca, see command line above
- rebuild ICU4C
- refresh ICU4J collation data:
  (subset of instructions above for properties data refresh, except copies all coll/*)
    ~/svn.icu/uni62/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
    ~/svn.icu/uni62/bld$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
    ~/svn.icu/uni62/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
    ~/svn.icu/uni62/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt50b
- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
- note on intltest: if collate/UCAConformanceTest fails, then
  utility/MultithreadTest/TestCollators will fail as well;
  fix the conformance test before looking into the multi-thread test

* test ICU, fix test code where necessary

* When refreshing all of ICU4J data from ICU4C
- ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
or
- ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install

*** LayoutEngine script information
- skipped for Unicode 6.2: no new scripts

*** merge the Unicode update branches back onto the trunk
- do not merge the icudata.jar and testdata.jar,
  instead rebuild them from merged & tested ICU4C

---------------------------------------------------------------------------- ***

Future Unicode update

Tools simplified since the Unicode 6.1 update. See
- http://site.icu-project.org/design/props/ppucd
- http://bugs.icu-project.org/trac/wiki/Markus/ReviewTicket8972

* Unicode version numbers
- icutools/unicode/makedefs.sh was deleted, so one fewer place for version & path updates

* file preparation
- ucdcopy.py, idna2nrm.py and genpname/preparse.pl replaced by preparseucd.py:
- ~/svn.icu/tools/trunk/src/unicode$ py/preparseucd.py ~/uni61/20120118 ~/svn.icu/trunk/src ~/svn.icu/tools/trunk/src
- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
- Check test file diffs for previously commented-out, known-failing data lines;
  probably need to keep those commented out.

* PropertyValueAliases.txt changes
- Script codes that are in ISO 15924 but not in Unicode are now listed in
  preparseucd.py, in the _scripts_only_in_iso15924 variable.
  If there are new ISO codes, then add them.
  If Unicode adds some of them, then remove them from the .py variable.
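
  The add/remove rule for ISO-only script codes described above can be sketched
  as a set operation. This is a minimal illustration, assuming the codes are kept
  as a plain set of four-letter strings; the real shape of
  _scripts_only_in_iso15924 in preparseucd.py is not shown in this log, and
  refresh_iso_only() is a hypothetical helper name.

```python
def refresh_iso_only(iso_only, new_iso_codes, unicode_scripts):
    """Return the updated set of script codes that exist only in ISO 15924.

    iso_only:        codes currently tracked as ISO-only (assumed to be a set)
    new_iso_codes:   codes newly registered in ISO 15924
    unicode_scripts: codes that Unicode now lists in PropertyValueAliases.txt
    """
    # Add the new ISO codes, then drop everything Unicode has adopted.
    return (set(iso_only) | set(new_iso_codes)) - set(unicode_scripts)


# Example data taken from the Unicode 6.1 notes in this log:
# Hluw was added to ISO 15924, while Cakm became a Unicode script.
updated = refresh_iso_only({"Khoj", "Tirh", "Cakm"}, {"Hluw"}, {"Cakm"})
print(sorted(updated))  # ['Hluw', 'Khoj', 'Tirh']
```
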

* UnicodeData.txt changes
- No more manual changes for CJK ranges for algorithmic names;
  those are now written to ppucd.txt and genprops reads them from there.

* generate core properties data files (makeprops.sh was deleted)
- ~/svn.icu/tools/trunk/dbg/unicode$ c/genprops/genprops ~/svn.icu/trunk/src

* no more manual updates of source/data/unidata/norm2/nfkc_cf.txt
- it is now generated by preparseucd.py

* no more separate idna2nrm.py run and manual copying to generate source/data/unidata/norm2/uts46.txt
- it is now generated by preparseucd.py
- make sure that the Unicode data folder passed into preparseucd.py
  includes a copy of http://www.unicode.org/Public/idna/6.1.0/IdnaMappingTable.txt
  (can be in some subfolder)

* generate normalization data files
- ~/svn.icu/trunk/dbg$ export LD_LIBRARY_PATH=~/svn.icu/trunk/dbg/lib
- ~/svn.icu/trunk/dbg$ SRC_DATA_IN=~/svn.icu/trunk/src/source/data/in
- ~/svn.icu/trunk/dbg$ UNIDATA=~/svn.icu/trunk/src/source/data/unidata
- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt

* build ICU (make install)
* build Unicode tools using CMake+make

* new way to call genuca (makeuca.sh was deleted)
- ~/svn.icu/tools/trunk/dbg/unicode$ c/genuca/genuca -i ~/svn.icu/trunk/dbg/data/out/build/icudt49l ~/svn.icu/trunk/src

---------------------------------------------------------------------------- ***

Unicode 6.1 update

*** ICU Trac

- ticket 8995 final update to Unicode 6.1
- ticket 8994 regenerate source/layout/CanonData.cpp

- ticket 8961 support Unicode "Age" value *names*
- ticket 8963 support multiple character name aliases & types

- ticket 8827 "update ICU to Unicode 6.1"
- C++ branches/markus/uni61 at r30864 from trunk at r30843
- Java branches/markus/uni61 at r30865 from trunk at r30863

*** Unicode version numbers
- makedata.mak
- uchar.h
  (configure.in & configure: have been modified to extract the version from uchar.h)
- com.ibm.icu.util.VersionInfo
- icutools/unicode/makedefs.sh
  + also review & update other definitions in that file,
    e.g. the ICU version in this path: BLD_DATA_FILES=$ICU_BLD/data/out/build/icudt49l

*** data files & enums & parser code

* file preparation

~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni61/20111205/ucd ~/uni61/processed
- This prepares both unidata and testdata files in respective output subfolders.
- Check test file diffs for previously commented-out, known-failing data lines;
  probably need to keep those commented out.

* PropertyValueAliases.txt changes
- 11 new block names:
    Arabic_Extended_A
    Arabic_Mathematical_Alphabetic_Symbols
    Chakma
    Meetei_Mayek_Extensions
    Meroitic_Cursive
    Meroitic_Hieroglyphs
    Miao
    Sharada
    Sora_Sompeng
    Sundanese_Supplement
    Takri
  -> add to uchar.h
  -> add to UCharacter.UnicodeBlock IDs
     Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)
     replace public static final int \1_ID = \2; \3
  -> add to UCharacter.UnicodeBlock objects
     Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
     replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
- 1 new Joining_Group (jg) value:
    Rohingya_Yeh
  -> uchar.h & UCharacter.JoiningGroup
- 2 new Line_Break (lb) values:
    CJ=Conditional_Japanese_Starter
    HL=Hebrew_Letter
  -> uchar.h & UCharacter.LineBreak
- 7 new scripts:
    sc ; Cakm ; Chakma
    sc ; Merc ; Meroitic_Cursive
    sc ; Mero ; Meroitic_Hieroglyphs
    sc ; Plrd ; Miao
    sc ; Shrd ; Sharada
    sc ; Sora ; Sora_Sompeng
    sc ; Takr ; Takri
  -> remove these from SyntheticPropertyValueAliases.txt
  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
     and in com.ibm.icu.dev.test.lang.TestUScript.java
- 2 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
  (added 2011-06-21)
    Khoj 322 Khojki
    Tirh 326 Tirhuta
  and another one added 2011-12-09
    Hluw 080 Anatolian Hieroglyphs (Luwian Hieroglyphs, Hittite Hieroglyphs)
  -> uscript.h
  -> com.ibm.icu.lang.UScript
     find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
     replace public static final int \1 = \2;\3
  -> SyntheticPropertyValueAliases.txt
  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
     and in com.ibm.icu.dev.test.lang.TestUScript.java

* UnicodeData.txt changes
- the last Unihan code point changes from U+9FCB to U+9FCC
  search for both 9FCB (end) and 9FCC (limit) (regex 9FC[BC], case-insensitive)
  + do change gennames.c
  + do change swapCJK() in ucol.cpp & ImplicitCEGenerator.java

* DerivedBidiClass.txt changes
- 2 new default-AL blocks:
# Arabic Extended-A: U+08A0 - U+08FF (was default-R)
# Arabic Mathematical Alphabetic Symbols:
# U+1EE00 - U+1EEFF (was default-R)
- 2 new default-R blocks:
# Meroitic Hieroglyphs:
# U+10980 - U+1099F
# Meroitic Cursive: U+109A0 - U+109FF
  -> should be picked up by the explicit data in the file

* NameAliases.txt changes
- from
    # Each line has two fields
    # First field: Code point
    # Second field: Alias
- to
    # Each line has three fields, as described here:
    #
    # First field: Code point
    # Second field: Alias
    # Third field: Type
- Also, the file previously allowed multiple aliases but only now does it
  actually provide multiple, even multiple of the same type. For example,
    FEFF;BYTE ORDER MARK;alternate
    FEFF;BOM;abbreviation
    FEFF;ZWNBSP;abbreviation
- This breaks our gennames parser, unames.icu data structure, and API.
  Fix gennames to only pick up "correction" aliases.
  New ticket #8963 for further changes.

* run genpname/preparse.pl (on Linux)
  + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
  + make sure that data.h is writable
  + perl preparse.pl ~/svn.icu/trunk/src > out.txt
  + preparse.pl shows no errors, out.txt Info and Warning lines look ok

* build ICU (make install)
  so that the tools build can pick up the new definitions from the installed header files.
* build Unicode tools (at least genpname) using CMake+make

* run genpname
  (builds both pnames.icu and propname_data.h)
- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --csource

* build ICU (make install)
* build Unicode tools using CMake+make

* update source/data/unidata/norm2/nfkc_cf.txt
- follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt

* update source/data/unidata/norm2/uts46.txt
- download http://www.unicode.org/Public/idna/6.1.0/IdnaMappingTable.txt
  to ~/svn.icu/tools/trunk/src/unicode/py
- adjust idna2nrm.py to remove "; NV8": For UTS #46, we do not care about "not valid in IDNA2008".
- ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py
- ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2

* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
- Unicode 6.0..6.1: U+2260, U+226E, U+226F
- nothing new in 6.1, no test file to update

* generate core properties data files
- in initial bootstrapping, change the UCA version
  in source/data/unidata/FractionalUCA.txt to match the new Unicode version
- ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
- rebuild ICU & tools
  + if genrb fails to build coll/root.res with a U_INVALID_FORMAT_ERROR,
    check if the UCA version in FractionalUCA.txt matches the new Unicode version
    (see step above)
- run makeuca.sh so that genuca picks up the new case mappings and nfc.nrm:
    ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
- rebuild ICU & tools

* update Java data files
- refresh just the UCD-related files, just to be safe
- see (ICU4C)/source/data/icu4j-readme.txt
- mkdir /tmp/icu4j
- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
  output:
    ...
    Unicode .icu files built to ./out/build/icudt49l
    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt49b
    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt49b
    echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt49l.dat ./out/icu4j/icudt49b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt49l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt49b
    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt49b"
    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt49b/
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt49b/
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
    make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/bld/data'
- copy the big-endian Unicode data files to another location,
  separate from the other data files
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/brkitr
    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt49b
    ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/cnvalias.icu
    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt49b
    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/brkitr
- refresh ICU4J
    ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt49b

* refresh Java test .txt files
- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode

* test ICU so far, fix test code where necessary
- temporarily ignore collation issues that look like UCA/UCD mismatches,
  until UCA data is updated

* UCA

- get output from Mark's tools; look in
  http://www.unicode.org/Public/UCA/6.1.0/CollationAuxiliary-<dev. version>.txt
- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
  (note removing the underscore before "Rules")
- update (ICU)/source/test/testdata/CollationTest_*.txt
  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
  with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
- check test file diffs for previously commented-out, known-failing data lines;
  probably need to keep those commented out
- check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
- run makeuca.sh:
    ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
- rebuild ICU4C
- refresh ICU4J collation data:
  (subset of instructions above for properties data refresh, except copies all coll/*)
    ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
    ~/svn.icu/trunk/bld$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
    ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt49b
- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
- note on intltest: if collate/UCAConformanceTest fails, then
  utility/MultithreadTest/TestCollators will fail as well;
  fix the conformance test before looking into the multi-thread test

* When refreshing all of ICU4J data from ICU4C
- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
or
- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install

*** LayoutEngine script information

(For details see the Unicode 5.2 change log below.)

* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
  This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
  in the working directory.
  (It also generates ScriptRunData.cpp, which is no longer needed.)

  The generated files have a current copyright date and "@draft" statement.

- diff current <icu>/source/layout files vs. generated ones
    ~/svn.icu4j/trunk/src$ kdiff3 ~/svn.icu/trunk/src/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
  review and manually merge desired changes;
  fix gratuitous changes, incorrect @draft and missing aliases;
  Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
- if you just copy the above files, then
  fix mixed line endings, review the diffs as above and restore changes to API tags etc.;
  manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h

*** merge the Unicode update branches back onto the trunk
- do not merge the icudata.jar and testdata.jar,
  instead rebuild them from merged & tested ICU4C

---------------------------------------------------------------------------- ***

ICU 4.8 (no Unicode update, just new script codes)

* 9 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
  (added 2010-12-21)
    Afak 439 Afaka
    Jurc 510 Jurchen
    Mroo 199 Mro, Mru
    Nshu 499 Nüshu
    Shrd 319 Sharada, Śāradā
    Sora 398 Sora Sompeng
    Takr 321 Takri, Ṭākrī, Ṭāṅkrī
    Tang 520 Tangut
    Wole 480 Woleai
  -> uscript.h
  -> com.ibm.icu.lang.UScript
     find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
     replace public static final int \1 = \2;\3
  -> genpname/SyntheticPropertyValueAliases.txt
  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
     and in com.ibm.icu.dev.test.lang.TestUScript.java

* run genpname/preparse.pl (on Linux)
  + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
  + make sure that data.h is writable
  + perl preparse.pl ~/svn.icu/trunk/src > out.txt
  + preparse.pl shows no errors, out.txt Info and Warning lines look ok

* rebuild Unicode tools (at least genpname) using make
- You might first need to "make install" ICU so that the tools build can pick
  up the new definitions from the installed header files.
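
  The find/replace pair used above to turn uscript.h enum lines into
  com.ibm.icu.lang.UScript constants can also be applied outside an editor.
  A minimal sketch with Python's re module follows; the pattern and replacement
  are copied from the notes, while c_enum_to_java() and the sample input line
  (including its numeric value) are made up for illustration.

```python
import re

# Pattern and replacement as recorded in the change log.
PATTERN = re.compile(r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)")
REPLACEMENT = r"public static final int \1 = \2;\3"


def c_enum_to_java(line):
    """Rewrite one uscript.h enum line as a Java int constant declaration."""
    return PATTERN.sub(REPLACEMENT, line)


# Hypothetical input line; the name and value are not real ICU constants.
sample = "USCRIPT_EXAMPLE = 999,/* Zzzz */"
print(c_enum_to_java(sample))
# public static final int EXAMPLE = 999;/* Zzzz */
```

  Lines that do not match the pattern pass through unchanged, so the function
  can be mapped over a whole copied enum block.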

* run genpname
  (builds both pnames.icu and propname_data.h)
- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --csource
- rebuild ICU & tools

* run genprops
- ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/data/in -s ~/svn.icu/trunk/src/source/data/unidata -i ~/svn.icu/trunk/dbg/data/out/build/icudt48l -u 6.0
- ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/common --csource -s ~/svn.icu/trunk/src/source/data/unidata -i ~/svn.icu/trunk/dbg/data/out/build/icudt48l -u 6.0
- rebuild ICU & tools

* update Java data files
- refresh just the UCD-related files, just to be safe
- see (ICU4C)/source/data/icu4j-readme.txt
- mkdir /tmp/icu4j
- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
- copy the big-endian Unicode data files to another location,
  separate from the other data files
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
    ~/svn.icu/trunk/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt48b/pnames.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
    ~/svn.icu/trunk/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt48b/uprops.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
- refresh ICU4J
    ~/svn.icu/trunk/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt48b

* should have updated the layout engine script codes but forgot

---------------------------------------------------------------------------- ***

Unicode 6.0 update

*** related ICU Trac tickets

7264 Unicode 6.0 Update

*** Unicode version numbers
- makedata.mak
- uchar.h
  (configure.in & configure: have been modified to extract the version from uchar.h)
- com.ibm.icu.util.VersionInfo

*** data files & enums & parser code

* file preparation

~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni60/20100720/ucd ~/uni60/processed
- This now prepares both unidata and testdata files in respective output subfolders.

* PropertyAliases.txt changes
- new Script_Extensions property defined in the new ScriptExtensions.txt file
  but not listed in PropertyAliases.txt; reported to unicode.org;
  -> added to tools/trunk/src/unicode/c/genpname/SyntheticPropertyAliases.txt
     scx; Script_Extensions
  -> uchar.h with new UProperty section
  -> com.ibm.icu.lang.UProperty, parallel with uchar.h

* PropertyValueAliases.txt changes
- 12 new block names:
    Alchemical_Symbols
    Bamum_Supplement
    Batak
    Brahmi
    CJK_Unified_Ideographs_Extension_D
    Emoticons
    Ethiopic_Extended_A
    Kana_Supplement
    Mandaic
    Miscellaneous_Symbols_And_Pictographs
    Playing_Cards
    Transport_And_Map_Symbols
  -> add to uchar.h
  -> add to UCharacter.UnicodeBlock
     Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
     replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
- Joining_Group (jg) values:
    Teh_Marbuta_Goal becomes the new canonical value for the old Hamza_On_Heh_Goal which becomes an alias
  -> uchar.h & UCharacter.JoiningGroup
- 3 new scripts:
    sc ; Batk ; Batak
    sc ; Brah ; Brahmi
    sc ; Mand ; Mandaic
  -> remove these from SyntheticPropertyValueAliases.txt
  -> add alias USCRIPT_MANDAIC to USCRIPT_MANDAEAN
  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
     and in com.ibm.icu.dev.test.lang.TestUScript.java
- 13 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
  (added 2009-11-11..2010-07-18)
    Bass 259 Bassa Vah
    Dupl 755 Duployan shorthand
    Elba 226 Elbasan
    Gran 343 Grantha
    Kpel 436 Kpelle
    Loma 437 Loma
    Mend 438 Mende
    Merc 101 Meroitic Cursive
    Narb 106 Old North Arabian
    Nbat 159 Nabataean
    Palm 126 Palmyrene
    Sind 318 Sindhi
    Wara 262 Warang Citi
  -> uscript.h
  -> com.ibm.icu.lang.UScript
     find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
     replace public static final int \1 = \2;\3
  -> SyntheticPropertyValueAliases.txt
  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
     and in com.ibm.icu.dev.test.lang.TestUScript.java
- ISO 15924 name change
    Mero 100 Meroitic Hieroglyphs (was Meroitic)
  -> add new alias USCRIPT_MEROITIC_HIEROGLYPHS to USCRIPT_MEROITIC
- property value alias added for Cham, was already moved out of SyntheticPropertyValueAliases.txt

* UnicodeData.txt changes
- new CJK block:
    2B740;<CJK Ideograph Extension D, First>;Lo;0;L;;;;;N;;;;;
    2B81D;<CJK Ideograph Extension D, Last>;Lo;0;L;;;;;N;;;;;
  -> add to tools/trunk/src/unicode/c/gennames/gennames.c, with new ucdVersion

* build Unicode tools using CMake+make

* run genpname/preparse.pl (on Linux)
  + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
  + make sure that data.h is writable
  + perl preparse.pl ~/svn.icu/trunk/src > out.txt
  + preparse.pl shows no errors, out.txt Info and Warning lines look ok

* rebuild Unicode tools (at least genpname) using make
- You might first need to "make install" ICU so that the tools build can pick
  up the new definitions from the installed header files.
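
  The "First"/"Last" line pairs in UnicodeData.txt, such as the new CJK
  Extension D entries quoted above, delimit a whole range of code points.
  The sketch below shows one way to collect such ranges when checking which
  ones need to be added by hand; find_ranges() is a hypothetical helper and is
  not how gennames.c handles them.

```python
def find_ranges(lines):
    """Collect (name, first, last) tuples from <..., First>/<..., Last> pairs.

    Each input line is a semicolon-separated UnicodeData.txt record whose
    first field is the code point in hex and whose second field is the name.
    """
    ranges = []
    pending = None  # (range name, first code point) while waiting for "Last"
    for line in lines:
        fields = line.split(";")
        code_point, name = int(fields[0], 16), fields[1]
        if name.endswith(", First>"):
            pending = (name[1:-len(", First>")], code_point)
        elif name.endswith(", Last>") and pending is not None:
            ranges.append((pending[0], pending[1], code_point))
            pending = None
    return ranges


sample = [
    "2B740;<CJK Ideograph Extension D, First>;Lo;0;L;;;;;N;;;;;",
    "2B81D;<CJK Ideograph Extension D, Last>;Lo;0;L;;;;;N;;;;;",
]
for name, first, last in find_ranges(sample):
    # Prints the range name, its bounds, and the number of code points (222).
    print(name, "U+%04X..U+%04X" % (first, last), last - first + 1)
```
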

* run genpname
- ~/svn.icu/tools/trunk/bld/unicode$ c/genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
- rebuild ICU & tools

* update source/data/unidata/norm2/nfkc_cf.txt
- follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt

* update source/data/unidata/norm2/uts46.txt
- download http://www.unicode.org/Public/idna/6.0.0/IdnaMappingTable.txt
  to ~/svn.icu/tools/trunk/src/unicode/py
- adjust idna2nrm.py to handle new disallowed_STD3_valid and disallowed_STD3_mapped values
- ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py
- ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2

* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
- Unicode 6.0: U+2260, U+226E, U+226F

* generate core properties data files
- ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
- rebuild ICU & tools
- run makeuca.sh so that genuca picks up the new nfc.nrm:
  ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
- rebuild ICU & tools

* implement new Script_Extensions property (provisional)
- parser & generator: genprops & uprops.icu
- uscript.h, uprops.h, uchar.c, uniset_props.cpp and others, plus cintltst/cucdapi.c & intltest/usettest.cpp
- UScript.java, UCharacterProperty.java, UnicodeSet.java, TestUScript.java, UnicodeSetTest.java

* switch ubidi.icu, ucase.icu and uprops.icu from UTrie to UTrie2
- (one-time change)
- genbidi/gencase/genprops tools changes
- re-run makeprops.sh (see above)
- UCharacterProperty.java, UCharacterTypeIterator.java,
  UBiDiProps.java, UCaseProps.java, and several others with minor changes;
  UCharacterPropertyReader.java deleted and its code folded into UCharacterProperty.java

* update Java data files
- refresh just the UCD-related files, just to be safe
- see (ICU4C)/source/data/icu4j-readme.txt
- mkdir /tmp/icu4j
- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
  output:
    ...
    Unicode .icu files built to ./out/build/icudt45l
    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt45b
    echo ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt45l.dat ./out/icu4j/icudt45b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt45l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt45b
    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt45b
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
- copy the big-endian Unicode data files to another location,
  separate from the other data files
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
    ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/cnvalias.icu
    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
- refresh ICU4J
    ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b

* refresh Java test .txt files
- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode

* un-hardcode normalization skippable (NF*_Inert) test data
- removes one manual step from the Unicode upgrade, and removes dependency on one of Mark's tools

* copy updated break iterator test files
- now handled by early ucdcopy.py and
  copying the uni60/processed/testdata files to ~/svn.icu/trunk/src/source/test/testdata
  (old instructions:
   copy from (Unicode 6.0)/ucd/auxiliary/*BreakTest-6....txt
   to ~/svn.icu/trunk/src/source/test/testdata)
- they are not used in ICU4J

* UCA

- get output from Mark's tools; look in
    http://www.unicode.org/~book/incoming/mark/uca6.0.0/
    http://www.macchiato.com/unicode/utc/additional-uca-files
    http://www.unicode.org/Public/UCA/6.0.0/
    http://www.unicode.org/~mdavis/uca/
- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
- update Han-implicit ranges for new CJK extensions:
  swapCJK() in ucol.cpp & ImplicitCEGenerator.java
- genuca: allow bytes 02 for U+FFFE, new merge-sort character;
  do not add it into invuca so that tailoring primary-after an ignorable works
- genuca: permit space between [variable top] bytes
- ucol.cpp: treat noncharacters like unassigned rather than ignorable
- run makeuca.sh:
  ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
- rebuild ICU4C
- refresh ICU4J collation data:
  (subset of instructions above for properties data refresh, except copies all coll/*)
    ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
    ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b
- update (ICU)/source/test/testdata/CollationTest_*.txt
  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
  with output from Mark's Unicode tools
- run all tests with the *_SHORT.txt or the full files (the full ones have comments)
- note on intltest: if collate/UCAConformanceTest fails, then
  utility/MultithreadTest/TestCollators will fail as well;
  fix the conformance test before looking into the multi-thread test

* When refreshing all of ICU4J data from ICU4C
- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
or
- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install

*** LayoutEngine script information

(For details see the Unicode 5.2 change log below.)

* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
ScriptRunData.cpp, which is no longer needed.)

The generated files have a current copyright date and "@draft" statement.

* copy the above files into <icu>/source/layout, replacing the old files.
* fix mixed line endings
* review the diffs and fix incorrect @draft and missing aliases;
  Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
* manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h

---------------------------------------------------------------------------- ***

Unicode 5.2 update

*** related ICU Trac tickets

7084 Unicode 5.2

7167 verify collation bytes
7235 Java test NAME_ALIAS
7236 Java DerivedCoreProperties.txt test
7237 Java BidiTest.txt
7238 UTrie2 in core unidata
7239 test for tailoring gaps
7240 Java fix CollationMiscTest
7243 update layout engine for Unicode 5.2

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in & configure
- update ucdVersion in gennames.c if an algorithmic range changes

*** data files & enums & parser code

* file preparation

python source\tools\genprops\misc\ucdcopy.py "C:\Documents and Settings\mscherer\My Documents\unicode\ucd\5.2.0" C:\svn\icuproj\icu\trunk\source\data\unidata
- includes finding files regardless of version numbers,
  copying them, and performing the equivalent processing of the
  ucdstrip and ucdmerge tools on the desired set of files

* notes on changes
- PropertyAliases.txt
  moved from numeric to enumerated:
    ccc ; Canonical_Combining_Class
  new string properties:
    NFKC_CF ; NFKC_Casefold
    Name_Alias; Name_Alias
  new binary properties:
    Cased ; Cased
    CI ; Case_Ignorable
    CWCF ; Changes_When_Casefolded
    CWCM ; Changes_When_Casemapped
    CWKCF ; Changes_When_NFKC_Casefolded
    CWL ; Changes_When_Lowercased
    CWT ; Changes_When_Titlecased
    CWU ; Changes_When_Uppercased
  new CJK Unihan properties (not supported by ICU)
- PropertyValueAliases.txt
  new block names
  new scripts
  one script code change:
    sc ; Qaai ; Inherited
    ->
    sc ; Zinh ; Inherited ; Qaai
  new Line_Break (lb) value:
    lb ; CP ; Close_Parenthesis
  new Joining_Group (jg) values: Farsi_Yeh, Nya
  other new values:
    ccc; 214; ATA ; Attached_Above
- DerivedBidiClass.txt
  new default-R range: U+1E800 - U+1EFFF
- UnicodeData.txt
  all of the ISO comments are gone
  new CJK block end:
    9FC3;<CJK Ideograph, Last> -> 9FCB;<CJK Ideograph, Last>
  new CJK block:
    2A700;<CJK Ideograph Extension C, First>;Lo;0;L;;;;;N;;;;;
    2B734;<CJK Ideograph Extension C, Last>;Lo;0;L;;;;;N;;;;;

* genpname
- run preparse.pl
  + cd \svn\icuproj\icu\trunk\source\tools\genpname
  + make sure that data.h is writable
  + perl preparse.pl \svn\icuproj\icu\trunk > out.txt
  + preparse.pl complains with errors like the following:
    Error: sc:Egyp already set to Egyptian_Hieroglyphs, cannot set to Egyp at preparse.pl line 1322, <GEN6> line 34.
    This is because ICU 4.0 had scripts from ISO 15924 which are now
    added to Unicode 5.2, and the Perl script shows a conflict between SyntheticPropertyValueAliases.txt
    and PropertyValueAliases.txt.
    -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
       Egyp, Java, Lana, Mtei, Orkh, Armi, Avst, Kthi, Phli, Prti, Samr, Tavt
  + preparse.pl complains with errors about block names missing from uchar.h; add them

* uchar.h & uscript.h & uprops.h & uprops.c & genprops
- new block & script values
  + 26 new blocks
    copy new blocks from Blocks.txt
    MS VC++ 2008 regular expression:
    find "^{[0-9A-F]+}\.\.{[0-9A-F]+}; {[A-Z].+}$"
    replace with " UBLOCK_\3 = 172, /*[\1]*/"
  + several new script values already added in ICU 4.0 for ISO 15924 coverage
    (removed from SyntheticPropertyValueAliases.txt, see genpname notes above)
  + 3 new script values added for ISO 15924 and Unicode 5.2 coverage
  + 1 new script value added for ISO 15924 coverage (not in Unicode 5.2)
    (added to SyntheticPropertyValueAliases.txt)
- new Joining Group (JG) values: Farsi_Yeh, Nya
- new Line_Break (lb) value:
    lb ; CP ; Close_Parenthesis

* hardcoded Unihan range end/limit
- Unihan range end moves from 9FC3 to 9FCB
  search for both 9FC3 (end) and 9FC4 (limit) (regex 9FC[34], case-insensitive)
  + do change gennames.c

* Compare definitions of new binary properties with what we used to use
  in algorithms, to see if the definitions changed.
- Verified that definitions for Cased and Case_Ignorable are unchanged.
  The gencase tool now parses the newly public Case_Ignorable values
  in case the definition changes in the future.

* uchar.c & uprops.h & uprops.c & genprops
- new numeric values that didn't exist in Unicode data before:
  1/7, 1/9, 1/10, 3/10, 1/16, 3/16
  the ones with denominators >9 cannot be supported by uprops.icu formatVersion 5,
  therefore redesign the encoding of numeric types and values for formatVersion 6;
  design for simple numbers up to at least 144 ("one gross"),
  large values up to at least 10^20,
  and fractions with numerators -1..17 and denominators 1..16
  to cover current and expected future values
  (e.g., more Han numeric values, Meroitic twelfths)

* reimplement Hangul_Syllable_Type for new Jamo characters
- the old code assumed that all Jamo characters are in the 11xx block
- Unicode 5.2 fills holes there and adds new Jamo characters in
    A960..A97F; Hangul Jamo Extended-A
  and in
    D7B0..D7FF; Hangul Jamo Extended-B
- Hangul_Syllable_Type can be trivially derived from a subset of
  Grapheme_Cluster_Break values

* build Unicode data source code for hardcoding core data
C:\svn\icuproj\icu\trunk\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\trunk\source\data\ CFG=x86\release uni-core-data

ICU data make path is \svn\icuproj\icu\trunk\source\data\
ICU root path is \svn\icuproj\icu\trunk
Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
Information: cannot find "spreplocal.mk". Not building user-additional stringprep files.
Creating data file for Unicode Property Names
Creating data file for Unicode Character Properties
Creating data file for Unicode Case Mapping Properties
Creating data file for Unicode BiDi/Shaping Properties
Creating data file for Unicode Normalization
Unicode .icu files built to "\svn\icuproj\icu\trunk\source\data\out\build\icudt43l"
Unicode .c source files built to "\svn\icuproj\icu\trunk\source\data\out\tmp"

- copy the .c source files to C:\svn\icuproj\icu\trunk\source\common
  and rebuild the common library

*** UCA

- update FractionalUCA.txt with new canonical closure (output from Mark's Unicode tools)
- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt from Mark's Unicode tools
- update source/test/testdata/CollationTest_*.txt with output from Mark's Unicode tools
[ Begin obsolete instructions:
  Starting with UCA 5.2, we use the CollationTest_*_SHORT.txt files not the *_STUB.txt files.
  - generate the source/test/testdata/CollationTest_*_STUB.txt files via source/tools/genuca/genteststub.py
    on Windows:
    python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_NON_IGNORABLE_SHORT.txt CollationTest_NON_IGNORABLE_STUB.txt
    python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_SHIFTED_SHORT.txt CollationTest_SHIFTED_STUB.txt
  End obsolete instructions]
- run all tests with the *_SHORT.txt or the full files (the full ones have comments)
  not just the *_STUB.txt files
- note on intltest: if collate/UCAConformanceTest fails, then
  utility/MultithreadTest/TestCollators will fail as well;
  fix the conformance test before looking into the multi-thread test

*** Implement Cased & Case_Ignorable properties
- via UProperty; call ucase.h functions ucase_getType() and ucase_getTypeOrIgnorable()
- Problem: These properties should be disjoint, but aren't
- UTC 2009nov decision: skip all Case_Ignorable regardless of whether they are Cased or not
- change ucase.icu to be able to store any combination of Cased and Case_Ignorable

*** Implement Changes_When_Xyz properties
- without stored data

*** Implement Name_Alias property
- add it as another name field in unames.icu
- make it available via u_charName() and UCharNameChoice and
- consider it in u_charFromName()

*** Break iterators

* Update break iterator rules to new UAX versions and new property values
* Update source/test/testdata/<boundary>Test.txt files from <unicode.org ucd>/ucd/auxiliary

*** new BidiTest file
- review format and data
- copy BidiTest.txt to source/test/testdata
- write test code using this data
- fix ICU code where it fails the conformance test

*** Java
- generally, find and update code corresponding to C/C++
- UCharacter.UnicodeBlock constants:
  a) add an _ID integer per new block, update COUNT
  b) add a class instance per new block
     Visual Studio regex:
     find UBLOCK_{[^ ]+} = [0-9]+, {/.+}
     replace with public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
- CHAR_NAME_ALIAS -> UCharacter.getNameAlias() and getCharFromNameAlias()

- port test changes to Java

*** LayoutEngine script information

(For comparison, see the Unicode 5.1 update: http://bugs.icu-project.org/trac/changeset/23833)

* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
ScriptRunData.cpp, which is no longer needed.)

The generated files have a current copyright date and "@draft" statement.

-> Eric Mader wrote in email on 20090930:
   "I think the tool has been modified to update @draft to @stable for
   older scripts and to add @draft for new scripts.
   (I worked with an intern on this last year.)
   You should check the output after you run it."

* copy the above files into <icu>/source/layout, replacing the old files.
* fix mixed line endings
* review the diffs and fix incorrect @draft and missing aliases
* manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h

Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)

-> Eric Mader wrote in email on 20090930:
   "This is just a matter of making sure that all the per-script tables have
   entries for any new scripts that were added.
   If any new Indic characters were added, then the class tables in
   IndicClassTables.cpp should be updated to reflect this.
   John Emmons should know how to do this if it's required."

* rebuild the layout and layoutex libraries.
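
* sketch: one way to do the "fix mixed line endings" step
- Illustrative only; these notes do not say which tool was actually used. The function
  name is hypothetical, and the choice of LF as the default target is an assumption:
  pick whatever convention the repository expects for source/layout.

```python
def normalize_line_endings(data: bytes, newline: bytes = b"\n") -> bytes:
    """Return data with every CRLF, lone CR, or LF replaced by `newline`."""
    # Collapse CRLF first so that its CR is not counted as a second line break.
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    if newline == b"\n":
        return unified
    return unified.replace(b"\n", newline)


if __name__ == "__main__":
    mixed = b"LEScripts.h line 1\r\nline 2\nline 3\rline 4"
    print(normalize_line_endings(mixed))
```

- Working on bytes avoids any encoding guesses for the generated headers; read each file
  in binary mode, normalize, and write it back only if the content changed.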

*** Documentation
- Update User Guide
  + Jamo_Short_Name, sfc->scf, binary property value aliases

---------------------------------------------------------------------------- ***

Unicode 5.1 update

*** related ICU Trac tickets

5696 Update to Unicode 5.1

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in & configure
- update ucdVersion in gennames.c if an algorithmic range changes

*** data files & enums & parser code

* file preparation
- ucdstrip:
    DerivedCoreProperties.txt
    DerivedNormalizationProps.txt
    NormalizationTest.txt
    PropList.txt
    Scripts.txt
    GraphemeBreakProperty.txt
    SentenceBreakProperty.txt
    WordBreakProperty.txt
- ucdstrip and ucdmerge:
    EastAsianWidth.txt
    LineBreak.txt

* my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
copy 5.1.0\ucd\BidiMirroring.txt ..\unidata\
copy 5.1.0\ucd\Blocks.txt ..\unidata\
copy 5.1.0\ucd\CaseFolding.txt ..\unidata\
copy 5.1.0\ucd\DerivedAge.txt ..\unidata\
copy 5.1.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
copy 5.1.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
copy 5.1.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
copy 5.1.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
copy 5.1.0\ucd\NormalizationCorrections.txt ..\unidata\
copy 5.1.0\ucd\PropertyAliases.txt ..\unidata\
copy 5.1.0\ucd\PropertyValueAliases.txt ..\unidata\
copy 5.1.0\ucd\SpecialCasing.txt ..\unidata\
copy 5.1.0\ucd\UnicodeData.txt ..\unidata\

ucdstrip < 5.1.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
ucdstrip < 5.1.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
ucdstrip < 5.1.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
ucdstrip < 5.1.0\ucd\PropList.txt > ..\unidata\PropList.txt
ucdstrip < 5.1.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
ucdstrip < 5.1.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
ucdstrip < 5.1.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
ucdstrip < 5.1.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
ucdstrip < 5.1.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
ucdstrip < 5.1.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt

* genpname
- run preparse.pl
  + cd \svn\icuproj\icu\uni51\source\tools\genpname
  + make sure that data.h is writable
  + perl preparse.pl \svn\icuproj\icu\uni51 > out.txt
  + preparse.pl complains with errors like the following:
    Error: sc:Cari already set to Carian, cannot set to Cari at preparse.pl line 1308, <GEN6> line 30.
    This is because ICU 3.8 had scripts from ISO 15924 which are now
    added to Unicode 5.1, and the script shows a conflict between SyntheticPropertyValueAliases.txt
    and PropertyValueAliases.txt.
    -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
       Cari, Cham, Kali, Lepc, Lyci, Lydi, Olck, Rjng, Saur, Sund, Vaii
  + PropertyValueAliases.txt now explicitly contains values for boolean properties:
    N/Y, No/Yes, F/T, False/True
    -> Added N/No and Y/Yes to preparse.pl function read_PropertyValueAliases.
       It will use further values from the file if present.

* uchar.h & uscript.h & uprops.h & uprops.c & genprops
- new block & script values
  + 17 new blocks
  + 11 new script values already added in ICU 3.8 for ISO 15924 coverage
    (removed from SyntheticPropertyValueAliases.txt)
  + 14 new script values added for ISO 15924 coverage (not in Unicode 5.1)
    (added to SyntheticPropertyValueAliases.txt)
- uprops.icu (uprops.h) only provides 7 bits for script codes.
  In ICU 4.0 there are USCRIPT_CODE_LIMIT=130 script codes now.
  There is none above 127 yet which is the script code for an
  assigned Unicode character, so ICU 4.0 uprops.icu does not store any
  script code values greater than 127.
  However, it does need to store the maximum script value=USCRIPT_CODE_LIMIT-1=129
  in a parallel bit field, and that overflows now.
  Also, future values >=128 would be incompatible anyway.
  uprops.h is modified to move around several of the bit fields
  in the properties vector words, and now uses 8 bits for the script code.
  Two other bit fields also grow to accommodate future growth:
  Block (current count: 172) grows from 8 to 9 bits,
  and Word_Break grows from 4 to 5 bits.
- renamed property Simple_Case_Folding (sfc->scf)
  + nothing to be done: handled as normal alias
- new property JSN Jamo_Short_Name
  + no new API: only contributes to the Name property
- new Grapheme_Cluster_Break (GCB) value: SM=SpacingMark
- new Joining Group (JG) value: Burushashki_Yeh_Barree
- new Sentence_Break (SB) values:
    SB ; CR ; CR
    SB ; EX ; Extend
    SB ; LF ; LF
    SB ; SC ; SContinue
- new Word_Break (WB) values:
    WB ; CR ; CR
    WB ; Extend ; Extend
    WB ; LF ; LF
    WB ; MB ; MidNumLet

* Further changes in the 2008-02-29 update:
- Default_Ignorable_Code_Point: The new file removes Cc, Cs, noncharacters from DICP
  because they should not normally be invisible.
- new Joining Group (JG) value Burushashki_Yeh_Barree was renamed to Burushaski_Yeh_Barree (one 'h' removed)
- new Grapheme_Cluster_Break (GCB) value: PP=Prepend
- new Word_Break (WB) value: NL=Newline

* hardcoded Unihan range end/limit (see Unicode 4.1 update for comparison)
- Unihan range end moves from 9FBB to 9FC3
  search for both 9FBB (end) and 9FBC (limit) (regex 9FB[BC], case-insensitive)
  + do change gennames.c

* build Unicode data source code for hardcoding core data
C:\svn\icuproj\icu\uni51\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\uni51\source\data\ CFG=debug uni-core-data

ICU data make path is \svn\icuproj\icu\uni51\source\data\
ICU root path is \svn\icuproj\icu\uni51
Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
Creating data file for Unicode Character Properties
Creating data file for Unicode Case Mapping Properties
Creating data file for Unicode BiDi/Shaping Properties
Creating data file for Unicode Normalization
Unicode .icu files built to "\svn\icuproj\icu\uni51\source\data\out\build\icudt39l"
Unicode .c source files built to "\svn\icuproj\icu\uni51\source\data\out\tmp"

- copy the .c source files to C:\svn\icuproj\icu\uni51\source\common
  and rebuild the common library

*** Break iterators

* Update break iterator rules to new UAX versions and new property values

*** UCA

* update FractionalUCA.txt and UCARules.txt with new canonical closure

*** Test suites
- Test that APIs using Unicode property value aliases (like UnicodeSet)
  support all of the boolean values N/Y, No/Yes, F/T, False/True
  -> TestBinaryValues() tests in both cintltst and intltest

*** LayoutEngine script information
* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguage.h,
ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (it also generates
ScriptRunData.cpp, which is no longer needed.)

The generated files have a current copyright date and "@draft" statement.

* copy the above files into <icu>/source/layout, replacing the old files.

Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)

* rebuild the layout and layoutex libraries.
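
* sketch: the bit-width arithmetic behind the uprops.h bit field changes in this update
- Illustrative only; this is not code from genprops or uprops.h. It re-derives the field
  widths quoted in the "uchar.h & uscript.h & uprops.h & uprops.c & genprops" notes of the
  Unicode 5.1 update (USCRIPT_CODE_LIMIT=130, Block count 172). The function name is hypothetical.

```python
def bits_needed(max_value: int) -> int:
    """Smallest number of bits that can hold every value in 0..max_value."""
    if max_value < 0:
        raise ValueError("max_value must be non-negative")
    return max(1, max_value.bit_length())


if __name__ == "__main__":
    uscript_code_limit = 130  # from the notes: script codes 0..129
    block_count = 172         # from the notes: block values 0..171

    # 129 does not fit into the old 7-bit field (maximum 127).
    print("script code bits:", bits_needed(uscript_code_limit - 1))
    # 171 still fits into 8 bits; the move to 9 bits is headroom for growth.
    print("block bits:", bits_needed(block_count - 1))
```

- The script field had to grow because its largest stored value exceeded 7 bits; the Block
  and Word_Break fields were widened ahead of need, as the notes say.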

*** Documentation
- Update User Guide
  + Jamo_Short_Name, sfc->scf, binary property value aliases

---------------------------------------------------------------------------- ***

Unicode 5.0 update

*** related Jitterbugs

5084 RFE: Update to Unicode 5.0

*** data files & enums & parser code

* file preparation
- ucdstrip:
    DerivedCoreProperties.txt
    DerivedNormalizationProps.txt
    NormalizationTest.txt
    PropList.txt
    Scripts.txt
    GraphemeBreakProperty.txt
    SentenceBreakProperty.txt
    WordBreakProperty.txt
- ucdstrip and ucdmerge:
    EastAsianWidth.txt
    LineBreak.txt

* my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
copy 5.0.0\ucd\BidiMirroring.txt ..\unidata\
copy 5.0.0\ucd\Blocks.txt ..\unidata\
copy 5.0.0\ucd\CaseFolding.txt ..\unidata\
copy 5.0.0\ucd\DerivedAge.txt ..\unidata\
copy 5.0.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
copy 5.0.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
copy 5.0.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
copy 5.0.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
copy 5.0.0\ucd\NormalizationCorrections.txt ..\unidata\
copy 5.0.0\ucd\PropertyAliases.txt ..\unidata\
copy 5.0.0\ucd\PropertyValueAliases.txt ..\unidata\
copy 5.0.0\ucd\SpecialCasing.txt ..\unidata\
copy 5.0.0\ucd\UnicodeData.txt ..\unidata\

ucdstrip < 5.0.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
ucdstrip < 5.0.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
ucdstrip < 5.0.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
ucdstrip < 5.0.0\ucd\PropList.txt > ..\unidata\PropList.txt
ucdstrip < 5.0.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
ucdstrip < 5.0.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
ucdstrip < 5.0.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
ucdstrip < 5.0.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
ucdstrip < 5.0.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
ucdstrip < 5.0.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt

* update FractionalUCA.txt and UCARules.txt with new canonical closure

* genpname
- run preparse.pl
  + make sure that data.h is writable
  + perl preparse.pl \cvs\oss\icu > out.txt

* uchar.h & uscript.h & uprops.h & uprops.c & genprops
- new block & script values
  + script values already added in ICU 3.6 because all of ISO 15924 is now covered

* build Unicode data source code for hardcoding core data
C:\cvs\oss\icu\source\data>NMAKE /f makedata.mak ICUMAKE=\cvs\oss\icu\source\data\ CFG=debug uni-core-data

ICU data make path is \cvs\oss\icu\source\data\
ICU root path is \cvs\oss\icu
Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
[etc.]
Creating data file for Unicode Character Properties
Creating data file for Unicode Case Mapping Properties
Creating data file for Unicode BiDi/Shaping Properties
Creating data file for Unicode Normalization
Unicode .icu files built to "\cvs\oss\icu\source\data\out\build\icudt35l"
Unicode .c source files built to "\cvs\oss\icu\source\data\out\tmp"

- copy the .c source files to C:\cvs\oss\icu\source\common
  and rebuild the common library

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in

*** LayoutEngine script information
* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguage.h,
ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (it also generates
ScriptRunData.cpp, which is no longer needed.)

The generated files have a current copyright date and "@draft" statement.

* copy the above files into <icu>/source/layout, replacing the old files.

Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)

* rebuild the layout and layoutex libraries.

---------------------------------------------------------------------------- ***

Unicode 4.1 update

*** related Jitterbugs

4332 RFE: Update to Unicode 4.1
4157 RBBI, TR29 4.1 updates

*** data files & enums & parser code

* file preparation
- ucdstrip:
    DerivedCoreProperties.txt
    DerivedNormalizationProps.txt
    NormalizationTest.txt
    GraphemeBreakProperty.txt
    SentenceBreakProperty.txt
    WordBreakProperty.txt
- ucdstrip and ucdmerge:
    EastAsianWidth.txt
    LineBreak.txt

* add new files to the repository
    GraphemeBreakProperty.txt
    SentenceBreakProperty.txt
    WordBreakProperty.txt

* update FractionalUCA.txt and UCARules.txt with new canonical closure

* genpname
- handle new enumerated properties in sub read_uchar
- run preparse.pl

* uchar.h & uscript.h & uprops.h & uprops.c & genprops
- new binary properties
  + Pattern_Syntax
  + Pattern_White_Space
- new enumerated properties
  + Grapheme_Cluster_Break
  + Sentence_Break
  + Word_Break
- new block & script & line break values

* gencase
- case-ignorable changes
  see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
  now: (D47a) Word_Break=MidLetter or Mn, Me, Cf, Lm, Sk

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in

*** tests
- verify that u_charMirror() round-trips
- test all new properties and some new values of old properties
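
* sketch: what "u_charMirror() round-trips" means, checked on the data level
- Illustrative only; the real test calls the ICU C API and is not reproduced here. This
  sketch assumes BidiMirroring.txt lines of the form "code; mirrored # comment" and uses a
  four-line hypothetical sample instead of the full file. All function names are hypothetical.

```python
SAMPLE_BIDI_MIRRORING = """\
0028; 0029 # LEFT PARENTHESIS
0029; 0028 # RIGHT PARENTHESIS
003C; 003E # LESS-THAN SIGN
003E; 003C # GREATER-THAN SIGN
"""


def parse_mirroring(text):
    """Parse 'code; mirrored # comment' lines into a dict of code points."""
    mapping = {}
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()
        if not data:
            continue  # blank line or comment-only line
        source, target = (int(field.strip(), 16) for field in data.split(";"))
        mapping[source] = target
    return mapping


def mirror(mapping, code_point):
    """Return the mirrored code point, or the code point itself if it has none."""
    return mapping.get(code_point, code_point)


def round_trip_failures(mapping):
    """List code points c for which mirror(mirror(c)) is not c."""
    return sorted(c for c in mapping if mirror(mapping, mirror(mapping, c)) != c)


if __name__ == "__main__":
    print(round_trip_failures(parse_mirroring(SAMPLE_BIDI_MIRRORING)))
```

- An empty result means every listed character maps back to itself after two steps.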

*** other code

* hardcoded Unihan range end/limit
- Unihan range end moves from 9FA5 to 9FBB
  search for both 9FA5 (end) and 9FA6 (limit) (regex 9FA[56], case-insensitive)
    + do not modify BOCU/BOCSU code because that would change the encoding
      and break binary compatibility!
    + similarly, do not change the GB 18030 range data (ucnvmbcs.c),
      NamePrepProfile.txt
    + ignore trietest.c: test data is arbitrary
    + ignore tstnorm.cpp: test optimization, not important
    + ignore collation: 9FA[56] only appears in comments; swapCJK() uses the whole block up to 9FFF
    + do change line_th.txt and word_th.txt
      by replacing hardcoded ranges with the new property values
    + do change gennames.c

source\data\brkitr\line_th.txt(229): \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
source\data\brkitr\word_th.txt(23): \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
source\tools\gennames\gennames.c(971): 0x4e00, 0x9fa5,

* case mappings
- compare new special casing context conditions with previous ones
  see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods

* genpname
- consider storing only the short name if it is the same as the long name

*** other reviews
- UAX #29 changes (grapheme/word/sentence breaks)
- UAX #14 changes (line breaks)
- Pattern_Syntax & Pattern_White_Space

---------------------------------------------------------------------------- ***

Unicode 4.0.1 update

*** related Jitterbugs

3170 RFE: Update to Unicode 4.0.1
3171 Add new Unicode 4.0.1 properties
3520 use Unicode 4.0.1 updates for break iteration

*** data files & enums & parser code

* file preparation
- ucdstrip: DerivedNormalizationProps.txt, NormalizationTest.txt, DerivedCoreProperties.txt
- ucdstrip and ucdmerge: EastAsianWidth.txt, LineBreak.txt

* file fixes
- fix UnicodeData.txt general categories of Ethiopic digits Nd->No
  according to PRI #26
  http://www.unicode.org/review/resolved-pri.html#pri26
- undone again because no corrigendum in sight;
  instead modified tests to not check consistency on this for Unicode 4.0.1

* ucdterms.txt
- update from http://www.unicode.org/copyright.html
  formatted for plain text

* uchar.h & uprops.h & uprops.c & genprops
- add UBLOCK_CYRILLIC_SUPPLEMENT because the block is renamed
- add U_LB_INSEPARABLE due to a spelling fix
    + put short name comment only on line with new constant
      for genpname perl script parser
- new binary properties
    + STerm
    + Variation_Selector

* genpname
- fix genpname perl script so that it doesn't choke on more than 2 names per property value
- perl script: correctly calculate the maximum number of fields per row

* uscript.h
- new script code Hrkt=Katakana_Or_Hiragana

* gennorm.c track changes in DerivedNormalizationProps.txt
- "FNC" -> "FC_NFKC"
- single field "NFD_NO" -> two fields "NFD_QC; N" etc.
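
The DerivedNormalizationProps.txt format change tracked in gennorm.c can be illustrated
with a small parser that accepts both layouts and normalizes them to the newer one.
This is a rough sketch in Python, not the gennorm.c code. The function name and the
sample lines in the usage part are made up for illustration, and the OLD_QC_NAMES
entries other than NFD_NO are an assumption about what "etc." covers.

```python
# Old single-field names mapped to the newer (property, value) pairs.
# Only "NFD_NO" is named in the notes above; the other entries are assumed.
OLD_QC_NAMES = {
    "NFD_NO": ("NFD_QC", "N"),
    "NFC_NO": ("NFC_QC", "N"),
    "NFC_MAYBE": ("NFC_QC", "M"),
    "NFKD_NO": ("NFKD_QC", "N"),
    "NFKC_NO": ("NFKC_QC", "N"),
    "NFKC_MAYBE": ("NFKC_QC", "M"),
}


def parse_norm_prop_line(line):
    """Parse one data line into (start, end, property, value).

    Returns None for blank lines and comment-only lines.
    """
    data = line.split("#", 1)[0].strip()  # drop trailing comment
    if not data:
        return None
    fields = [f.strip() for f in data.split(";")]
    first, _, last = fields[0].partition("..")
    start = int(first, 16)
    end = int(last, 16) if last else start  # single code point: end == start
    prop = fields[1]
    value = fields[2] if len(fields) > 2 else None
    # Old layout: one field such as "NFD_NO" instead of "NFD_QC; N".
    if value is None and prop in OLD_QC_NAMES:
        prop, value = OLD_QC_NAMES[prop]
    # Old property name "FNC" became "FC_NFKC".
    if prop == "FNC":
        prop = "FC_NFKC"
    return start, end, prop, value


# Illustrative sample lines in the old and the new layout.
old_line = "00C0..00C5    ; NFD_NO # illustrative"
new_line = "00C0..00C5    ; NFD_QC; N # illustrative"
print(parse_norm_prop_line(old_line) == parse_norm_prop_line(new_line))
```

Normalizing at parse time keeps the rest of a tool independent of which file
version it was given, at the cost of carrying the old-name table around.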

* genprops/props2.c track changes in DerivedNumericValues.txt
- changed from 3 columns to 2, dropping the numeric type
    + assume that the type is always numeric for Han characters,
      and that only those are added in addition to what UnicodeData.txt lists

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in

*** tests
- update test of default bidi classes according to PRI #28
  /tsutil/cucdtst/TestUnicodeData
  http://www.unicode.org/review/resolved-pri.html#pri28
- bidi tests: change exemplar character for ES depending on Unicode version
- change hardcoded expected property values where they change

*** other code

* name matching
- read UCD.html

* scripts
- use new Hrkt=Katakana_Or_Hiragana

* ZWJ & ZWNJ
- are now part of combining character sequences
- break iteration used to assume that LB classes did not overlap; now they do for ZWJ & ZWNJ