#
331722 |
|
29-Mar-2018 |
eadler |
Revert r330897:
This was intended to be a non-functional change. It wasn't. The commit message was thus wrong. In addition it broke arm, and merged crypto related code.
Revert with prejudice.
This revert skips files touched in r316370 because that commit has since been MFCed. It also skips files that require $FreeBSD$ property changes.
Thank you to those who helped me get out of this mess including but not limited to gonzo, kevans, rgrimes.
Requested by: gjb (re)
|
#
330897 |
|
14-Mar-2018 |
eadler |
Partial merge of the SPDX changes
These changes are incomplete but are making it difficult to determine what other changes can/should be merged.
No objections from: pfg
|
#
302408 |
|
07-Jul-2016 |
gjb |
Copy head@r302406 to stable/11 as part of the 11.0-RELEASE cycle. Prune svn:mergeinfo from the new branch, as nothing has been merged here.
Additional commits post-branch will follow.
Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation
|
#
290494 |
|
07-Nov-2015 |
bapt |
Improve collation string and locales support
Merge collation support from Illumos and DragonflyBSD.
Locales are now generated with the new localedef(1) tool from CLDR POSIX files. The generated files are now identified as "BSD 1.0" format.
The libc now only reads "BSD 1.0" locale definitions; any other version is set to "C".
- The localedef(1) tool has been imported from Illumos and modified to use tree(3) instead of the CDDL avl(3).
- A set of tools created by edwin@ and extended by marino@ for DragonFly has been added to generate the locales and the Makefiles from the vanilla CLDR Unicode databases plus a universal UTF-8 charmap (by marino@).
- Update the locales to Unicode v27.
- Given that our regex(3) does not support multibyte (yet), it has been forced to always use locale C.
- Remove the now unused colldef(1) and mklocale(1).
- Finish implementing the numeric BSD extension for ctypes.
- The number of supported locales has grown from 175 to 250. Among the new locales: 6 Arabic locales (AE EG JO MA QA SA) and different variations of Spanish locales.
- Added new 3-component locales for mn_Cyrl_MN, sr_Cyrl_RS, sr_Latn_RS, zh_Hans_CN, zh_Hant_HK and zh_Hant_TW.
- Some aliases have been added for the 2-component versions when possible.
Thanks: Garrett D'Amore (Illumos), who made sure all his work was done under a BSD license; Edwin Groothuis (edwin@), for his work on the tools that generate locale definitions usable in the FreeBSD sources from vanilla CLDR definitions; John Marino (DragonflyBSD), who first merged the Illumos work into Dragonfly and spent hours tracking down bugs.
|
#
287125 |
|
25-Aug-2015 |
ed |
Make UTF-8 parsing and generation more strict.
- In mbrtowc() we need to disallow codepoints above 0x10ffff.
- In wcrtomb() we need to disallow codepoints between 0xd800 and 0xdfff.
Reviewed by: bapt
Differential Revision: https://reviews.freebsd.org/D3399
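The encoder-side check described in this commit can be illustrated with a small sketch. This is not the libc code: `sketch_utf8_encode` is a hypothetical name, and the function only assumes the standard UTF-8 bit layout for 1 to 4 bytes. It refuses surrogates (0xd800 to 0xdfff) and anything above 0x10ffff by returning 0.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical encoder sketch (not the libc wcrtomb()).
 * Writes the UTF-8 form of cp into out and returns the byte count (1..4),
 * or 0 if cp is a UTF-16 surrogate or lies above 0x10ffff.
 */
size_t
sketch_utf8_encode(uint32_t cp, unsigned char out[4])
{
	if (cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff))
		return (0);
	if (cp < 0x80) {
		out[0] = (unsigned char)cp;
		return (1);
	}
	if (cp < 0x800) {
		out[0] = (unsigned char)(0xc0 | (cp >> 6));
		out[1] = (unsigned char)(0x80 | (cp & 0x3f));
		return (2);
	}
	if (cp < 0x10000) {
		out[0] = (unsigned char)(0xe0 | (cp >> 12));
		out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3f));
		out[2] = (unsigned char)(0x80 | (cp & 0x3f));
		return (3);
	}
	out[0] = (unsigned char)(0xf0 | (cp >> 18));
	out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3f));
	out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3f));
	out[3] = (unsigned char)(0x80 | (cp & 0x3f));
	return (4);
}
```

A real wcrtomb() reports the failure through its return value and errno; the 0 return here is a simplification.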
|
#
286491 |
|
08-Aug-2015 |
bapt |
Remove 5- and 6-byte sequences, which are illegal in UTF-8. (part 2)
Per RFC 3629, values greater than 0x10ffff should be rejected.
Suggested by: jilles
|
#
286490 |
|
08-Aug-2015 |
bapt |
Remove 5- and 6-byte sequences, which are illegal in UTF-8.
Per RFC 3629, values greater than 0x10ffff should be rejected.
Suggested by: jilles
|
#
268272 |
|
04-Jul-2014 |
pfg |
minor perf enhancement for UTF-8
Reduce some duplicate code.
Reference: https://www.illumos.org/issues/628
Obtained from: Illumos
MFC after: 1 week
|
#
265167 |
|
30-Apr-2014 |
pfg |
citrus: Avoid invalid code points.
From the OpenBSD log: The UTF-8 decoder should not accept byte sequences which decode to unicode code positions U+D800 to U+DFFF (UTF-16 surrogates), U+FFFE, and U+FFFF.
http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
http://unicode.org/faq/utf_bom.html#utf8-4
Reported by: Stefan Sperling
Obtained from: OpenBSD
MFC after: 5 days
|
#
265095 |
|
29-Apr-2014 |
pfg |
citrus: Avoid invalid code points.
From the OpenBSD log: The UTF-8 decoder should not accept byte sequences which decode to unicode code positions U+D800 to U+DFFF (UTF-16 surrogates), U+FFFE, and U+FFFF.
http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
http://unicode.org/faq/utf_bom.html#utf8-4
Reported by: Stefan Sperling
Obtained from: OpenBSD
MFC after: 5 days
|
#
227753 |
|
20-Nov-2011 |
theraven |
Implement xlocale APIs from Darwin, mainly for use by libc++. This adds a load of _l suffixed versions of various standard library functions that use the global locale, making them take an explicit locale parameter. Also adds support for per-thread locales. This work was funded by the FreeBSD Foundation.
Please test any code you have that uses the C standard locale functions!
Reviewed by: das (gdtoa changes)
Approved by: dim (mentor)
|
#
172661 |
|
15-Oct-2007 |
ache |
Add comment explaining __mb_sb_limit trick here.
|
#
172619 |
|
13-Oct-2007 |
ache |
The problem is: currently our single byte ctype(3) functions are broken for wide character locales in the argument range >= 0x80; they may return false positives.
Example 1: for a UTF-8 locale we currently have iswspace(0xA0)==1 and isspace(0xA0)==1 (because iswspace() and isspace() are the same code), but we must have iswspace(0xA0)==1 and isspace(0xA0)==0. There is no such single-byte character in the UTF-8 locale, nor any other in the range 0x80..0xff: the locale keeps only ASCII in the single byte range because our internal wchar_t representation for UTF-8 is UCS-4.
Example 2: for all wide character locales, isalpha(arg) may return false positives when arg > 0xFF (it must return 0), because iswalpha() and isalpha() are the same code.
This change addresses the issue by separating the single byte and wide ctype functions, and also fixes iswascii() (currently iswascii() is broken for arguments > 0xFF). This change is 100% binary compatible with old binaries.
Reviewed by: i18n@
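The separation described above can be modeled with a toy sketch. Everything here is hypothetical: `sketch_sb_limit` is loosely inspired by the __mb_sb_limit name mentioned in r172661 (how libc really uses that variable is an assumption), and `sketch_space_table()` stands in for the locale's rune table, treating 0xA0 as whitespace to match the commit's example.

```c
#include <stdint.h>

/* Hypothetical stand-in for the locale's whitespace table. */
static int
sketch_space_table(uint32_t cp)
{
	return (cp == 0x20 || (cp >= 0x09 && cp <= 0x0d) || cp == 0xa0);
}

/*
 * Hypothetical single-byte limit: 128 models a UTF-8 locale, where only
 * ASCII is representable in one byte; 256 would model an 8-bit locale.
 */
static int sketch_sb_limit = 128;

/* Wide variant: consults the table for any code point. */
int
sketch_iswspace(uint32_t wc)
{
	return (sketch_space_table(wc));
}

/* Single-byte variant: anything outside [0, limit) is never a match. */
int
sketch_isspace(int c)
{
	if (c < 0 || c >= sketch_sb_limit)
		return (0);
	return (sketch_space_table((uint32_t)c));
}
```

With the limit at 128, `sketch_iswspace(0xA0)` is 1 while `sketch_isspace(0xA0)` is 0, which is the behavior the commit message asks for.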
|
#
157289 |
|
30-Mar-2006 |
trhodes |
Fix a bug where, for 6-byte sequences, the top 6 bits were compared to 111111 rather than the top 7 bits being compared to 1111110, causing the illegal bytes fe and ff to be treated the same as the legal bytes fc and fd.
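The mask difference can be shown with two one-line predicates. The function names are made up for illustration, and the exact expressions in the original source are an assumption; only the bit patterns come from the commit message.

```c
#include <stdbool.h>

/* Buggy test: keeps the top 6 bits, so fc, fd, fe and ff all match. */
bool
sketch_lead6_buggy(unsigned char ch)
{
	return ((ch & 0xfc) == 0xfc);
}

/* Fixed test: keeps the top 7 bits, so only fc and fd (1111110x) match. */
bool
sketch_lead6_fixed(unsigned char ch)
{
	return ((ch & 0xfe) == 0xfc);
}
```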
|
#
142654 |
|
27-Feb-2005 |
phantom |
- Static'ize functions exported via function reference variables only.
- Replace inclusion of sys/param.h with sys/cdefs.h and sys/types.h where appropriate.
- Move _*_init() prototypes to mblocal.h, and remove these prototypes from .c files.
- Use _none_init() in __setrunelocale() instead of duplicating code.
- Move __mb* variables from table.c to none.c, allowing us not to export _none_*() externs, and remove them from mblocal.h accordingly.
Ok'ed by: tjr
|
#
141716 |
|
12-Feb-2005 |
stefanf |
Fix comparisons that test if an unsigned value is < 0.
Reviewed by: tjr
|
#
132687 |
|
27-Jul-2004 |
tjr |
Add UTF-8-specific implementations of mbsnrtowcs() and wcsnrtombs(). These convert plain ASCII characters in-line, making them only slightly slower than the single-byte ("NONE" encoding) version when processing ASCII strings.
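A rough idea of what converting ASCII "in-line" means, as a sketch only: `sketch_ascii_run_to_wide` is a hypothetical helper, not a libc function. It copies leading bytes below 0x80 straight into the wide buffer and stops at the first byte that would need the general multibyte decoder.

```c
#include <stddef.h>
#include <wchar.h>

/*
 * Hypothetical fast path: widen the leading run of ASCII bytes in src
 * (at most len bytes) into dst and return how many were converted.
 * The caller would hand the remainder to a full UTF-8 decoder.
 */
size_t
sketch_ascii_run_to_wide(wchar_t *dst, const char *src, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		unsigned char b = (unsigned char)src[i];

		if (b >= 0x80)
			break;		/* non-ASCII: leave it for the slow path */
		dst[i] = (wchar_t)b;
	}
	return (i);
}
```

The gain comes from skipping per-character state handling for the common case; the commit's own claim is that this makes the UTF-8 version only slightly slower than the single-byte one on ASCII input.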
|
#
131881 |
|
09-Jul-2004 |
tjr |
Add fast paths for conversion of plain ASCII characters.
|
#
129336 |
|
17-May-2004 |
tjr |
Use conversion state objects to store the accumulated wide character, low bound, and the number of bytes remaining instead of storing the raw byte sequence and deriving them every time mbrtowc() is called. This is much faster -- about twice as fast in some crude benchmarks.
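The three fields named in this commit suggest a state object along the following lines. This is a sketch under assumptions: the struct layout, the names, and the return conventions are hypothetical, and the validity limits follow the later, stricter rules (at most 4 bytes, no surrogates) rather than the 2004 code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state layout; the real libc structure may differ. */
struct sketch_utf8_state {
	uint32_t ch;		/* code point bits accumulated so far */
	int	 want;		/* continuation bytes still expected */
	uint32_t lobound;	/* smallest value valid for this length */
};

#define SKETCH_ERROR		((size_t)-1)
#define SKETCH_INCOMPLETE	((size_t)-2)

/*
 * Feed up to n bytes. Returns the bytes consumed by this call once a code
 * point completes (stored in *out), SKETCH_INCOMPLETE if more input is
 * needed, or SKETCH_ERROR on malformed input.
 */
size_t
sketch_utf8_feed(struct sketch_utf8_state *st, const unsigned char *s,
    size_t n, uint32_t *out)
{
	size_t i = 0;

	if (n == 0)
		return (SKETCH_INCOMPLETE);
	if (st->want == 0) {
		unsigned char b = s[i++];

		if (b < 0x80) {
			*out = b;
			return (1);
		} else if ((b & 0xe0) == 0xc0) {
			st->ch = b & 0x1f; st->want = 1; st->lobound = 0x80;
		} else if ((b & 0xf0) == 0xe0) {
			st->ch = b & 0x0f; st->want = 2; st->lobound = 0x800;
		} else if ((b & 0xf8) == 0xf0) {
			st->ch = b & 0x07; st->want = 3; st->lobound = 0x10000;
		} else
			return (SKETCH_ERROR);
	}
	while (st->want > 0 && i < n) {
		unsigned char b = s[i++];

		if ((b & 0xc0) != 0x80) {
			st->want = 0;
			return (SKETCH_ERROR);
		}
		st->ch = (st->ch << 6) | (b & 0x3f);
		st->want--;
	}
	if (st->want > 0)
		return (SKETCH_INCOMPLETE);	/* state carries over */
	if (st->ch < st->lobound || st->ch > 0x10ffff ||
	    (st->ch >= 0xd800 && st->ch <= 0xdfff))
		return (SKETCH_ERROR);		/* overlong or out of range */
	*out = st->ch;
	return (i);
}
```

Because the partial value lives in the state, a caller can pass a sequence one byte at a time and nothing is re-derived on each call, which is the saving this commit describes.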
|
#
129153 |
|
12-May-2004 |
tjr |
Move prototypes of various encoding-related functions into a new header file to avoid extern'ing them all over the place.
|
#
128155 |
|
12-Apr-2004 |
tjr |
Perform some basic validation of multibyte conversion state objects.
|
#
128081 |
|
09-Apr-2004 |
tjr |
Don't cast away const qualifiers.
Spotted by: bde
|
#
128004 |
|
07-Apr-2004 |
tjr |
Allow partial multibyte characters to accumulate in conversion state objects passed to mbrtowc(), mbsrtowcs(), and mbrlen(), as required by C99.
|
#
122467 |
|
11-Nov-2003 |
tjr |
Fix a typo that caused mbrtowc() to always return 0.
|
#
121893 |
|
02-Nov-2003 |
tjr |
Convert the Big5, EUC, MSKanji and UTF-8 encoding methods to implement mbrtowc() and wcrtomb() directly. GB18030, GBK and UTF2 are left unconverted; GB18030 will be done eventually, but GBK and UTF2 may just be removed, as they are subsets of GB18030 and UTF-8 respectively.
|
#
111082 |
|
18-Feb-2003 |
nectar |
Whack 28 unused variables.
|
#
104828 |
|
10-Oct-2002 |
tjr |
Add a UTF-8 encoding method, which will eventually replace the antique "UTF2" method. Although UTF-8 and the old UTF2 encoding are compatible for 16-bit characters, the new UTF-8 implementation is much stricter about rejecting malformed input and also handles the full 31-bit range of characters.
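For context on the 31-bit range: the original UTF-8 definition (RFC 2279) allowed sequences of up to 6 bytes, with the length determined by the lead byte. The sketch below shows that classification; `sketch_legacy_seq_len` is a hypothetical name and this is not the code that was committed. The 5- and 6-byte forms were later removed from this file (see r286490).

```c
/*
 * Hypothetical helper: sequence length implied by a lead byte under the
 * original 31-bit UTF-8 scheme, or 0 if the byte cannot start a sequence.
 */
int
sketch_legacy_seq_len(unsigned char lead)
{
	if ((lead & 0x80) == 0x00)
		return (1);	/* 0xxxxxxx:  7 bits */
	if ((lead & 0xe0) == 0xc0)
		return (2);	/* 110xxxxx: 11 bits */
	if ((lead & 0xf0) == 0xe0)
		return (3);	/* 1110xxxx: 16 bits */
	if ((lead & 0xf8) == 0xf0)
		return (4);	/* 11110xxx: 21 bits */
	if ((lead & 0xfc) == 0xf8)
		return (5);	/* 111110xx: 26 bits */
	if ((lead & 0xfe) == 0xfc)
		return (6);	/* 1111110x: 31 bits */
	return (0);		/* continuation byte, fe or ff */
}
```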
|