History log of /freebsd-11-stable/lib/libc/locale/utf8.c
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
# 331722 29-Mar-2018 eadler

Revert r330897:

This was intended to be a non-functional change. It wasn't. The commit
message was thus wrong. In addition it broke arm, and merged crypto
related code.

Revert with prejudice.

This revert skips files touched in r316370 since that commit was since
MFCed. This revert also skips files that require $FreeBSD$ property
changes.

Thank you to those who helped me get out of this mess including but not
limited to gonzo, kevans, rgrimes.

Requested by: gjb (re)


# 330897 14-Mar-2018 eadler

Partial merge of the SPDX changes

These changes are incomplete but are making it difficult
to determine what other changes can/should be merged.

No objections from: pfg


# 302408 07-Jul-2016 gjb

Copy head@r302406 to stable/11 as part of the 11.0-RELEASE cycle.
Prune svn:mergeinfo from the new branch, as nothing has been merged
here.

Additional commits post-branch will follow.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


/freebsd-11-stable/MAINTAINERS
/freebsd-11-stable/cddl
/freebsd-11-stable/cddl/contrib/opensolaris
/freebsd-11-stable/cddl/contrib/opensolaris/cmd/dtrace/test/tst/common/print
/freebsd-11-stable/cddl/contrib/opensolaris/cmd/zfs
/freebsd-11-stable/cddl/contrib/opensolaris/lib/libzfs
/freebsd-11-stable/contrib/amd
/freebsd-11-stable/contrib/apr
/freebsd-11-stable/contrib/apr-util
/freebsd-11-stable/contrib/atf
/freebsd-11-stable/contrib/binutils
/freebsd-11-stable/contrib/bmake
/freebsd-11-stable/contrib/byacc
/freebsd-11-stable/contrib/bzip2
/freebsd-11-stable/contrib/com_err
/freebsd-11-stable/contrib/compiler-rt
/freebsd-11-stable/contrib/dialog
/freebsd-11-stable/contrib/dma
/freebsd-11-stable/contrib/dtc
/freebsd-11-stable/contrib/ee
/freebsd-11-stable/contrib/elftoolchain
/freebsd-11-stable/contrib/elftoolchain/ar
/freebsd-11-stable/contrib/elftoolchain/brandelf
/freebsd-11-stable/contrib/elftoolchain/elfdump
/freebsd-11-stable/contrib/expat
/freebsd-11-stable/contrib/file
/freebsd-11-stable/contrib/gcc
/freebsd-11-stable/contrib/gcclibs/libgomp
/freebsd-11-stable/contrib/gdb
/freebsd-11-stable/contrib/gdtoa
/freebsd-11-stable/contrib/groff
/freebsd-11-stable/contrib/ipfilter
/freebsd-11-stable/contrib/ldns
/freebsd-11-stable/contrib/ldns-host
/freebsd-11-stable/contrib/less
/freebsd-11-stable/contrib/libarchive
/freebsd-11-stable/contrib/libarchive/cpio
/freebsd-11-stable/contrib/libarchive/libarchive
/freebsd-11-stable/contrib/libarchive/libarchive_fe
/freebsd-11-stable/contrib/libarchive/tar
/freebsd-11-stable/contrib/libc++
/freebsd-11-stable/contrib/libc-vis
/freebsd-11-stable/contrib/libcxxrt
/freebsd-11-stable/contrib/libexecinfo
/freebsd-11-stable/contrib/libpcap
/freebsd-11-stable/contrib/libstdc++
/freebsd-11-stable/contrib/libucl
/freebsd-11-stable/contrib/libxo
/freebsd-11-stable/contrib/llvm
/freebsd-11-stable/contrib/llvm/projects/libunwind
/freebsd-11-stable/contrib/llvm/tools/clang
/freebsd-11-stable/contrib/llvm/tools/lldb
/freebsd-11-stable/contrib/llvm/tools/llvm-dwarfdump
/freebsd-11-stable/contrib/llvm/tools/llvm-lto
/freebsd-11-stable/contrib/mdocml
/freebsd-11-stable/contrib/mtree
/freebsd-11-stable/contrib/ncurses
/freebsd-11-stable/contrib/netcat
/freebsd-11-stable/contrib/ntp
/freebsd-11-stable/contrib/nvi
/freebsd-11-stable/contrib/one-true-awk
/freebsd-11-stable/contrib/openbsm
/freebsd-11-stable/contrib/openpam
/freebsd-11-stable/contrib/openresolv
/freebsd-11-stable/contrib/pf
/freebsd-11-stable/contrib/sendmail
/freebsd-11-stable/contrib/serf
/freebsd-11-stable/contrib/sqlite3
/freebsd-11-stable/contrib/subversion
/freebsd-11-stable/contrib/tcpdump
/freebsd-11-stable/contrib/tcsh
/freebsd-11-stable/contrib/tnftp
/freebsd-11-stable/contrib/top
/freebsd-11-stable/contrib/top/install-sh
/freebsd-11-stable/contrib/tzcode/stdtime
/freebsd-11-stable/contrib/tzcode/zic
/freebsd-11-stable/contrib/tzdata
/freebsd-11-stable/contrib/unbound
/freebsd-11-stable/contrib/vis
/freebsd-11-stable/contrib/wpa
/freebsd-11-stable/contrib/xz
/freebsd-11-stable/crypto/heimdal
/freebsd-11-stable/crypto/openssh
/freebsd-11-stable/crypto/openssl
/freebsd-11-stable/gnu/lib
/freebsd-11-stable/gnu/usr.bin/binutils
/freebsd-11-stable/gnu/usr.bin/cc/cc_tools
/freebsd-11-stable/gnu/usr.bin/gdb
/freebsd-11-stable/lib/libc/locale/ascii.c
/freebsd-11-stable/sys/cddl/contrib/opensolaris
/freebsd-11-stable/sys/contrib/dev/acpica
/freebsd-11-stable/sys/contrib/ipfilter
/freebsd-11-stable/sys/contrib/libfdt
/freebsd-11-stable/sys/contrib/octeon-sdk
/freebsd-11-stable/sys/contrib/x86emu
/freebsd-11-stable/sys/contrib/xz-embedded
/freebsd-11-stable/usr.sbin/bhyve/atkbdc.h
/freebsd-11-stable/usr.sbin/bhyve/bhyvegc.c
/freebsd-11-stable/usr.sbin/bhyve/bhyvegc.h
/freebsd-11-stable/usr.sbin/bhyve/console.c
/freebsd-11-stable/usr.sbin/bhyve/console.h
/freebsd-11-stable/usr.sbin/bhyve/pci_fbuf.c
/freebsd-11-stable/usr.sbin/bhyve/pci_xhci.c
/freebsd-11-stable/usr.sbin/bhyve/pci_xhci.h
/freebsd-11-stable/usr.sbin/bhyve/ps2kbd.c
/freebsd-11-stable/usr.sbin/bhyve/ps2kbd.h
/freebsd-11-stable/usr.sbin/bhyve/ps2mouse.c
/freebsd-11-stable/usr.sbin/bhyve/ps2mouse.h
/freebsd-11-stable/usr.sbin/bhyve/rfb.c
/freebsd-11-stable/usr.sbin/bhyve/rfb.h
/freebsd-11-stable/usr.sbin/bhyve/sockstream.c
/freebsd-11-stable/usr.sbin/bhyve/sockstream.h
/freebsd-11-stable/usr.sbin/bhyve/usb_emul.c
/freebsd-11-stable/usr.sbin/bhyve/usb_emul.h
/freebsd-11-stable/usr.sbin/bhyve/usb_mouse.c
/freebsd-11-stable/usr.sbin/bhyve/vga.c
/freebsd-11-stable/usr.sbin/bhyve/vga.h
# 290494 07-Nov-2015 bapt

Improve collation string and locales support

Merge collation support from Illumos and DragonflyBSD.

Locales are now generated with the new localedef(1) tool from CLDR POSIX files.
The generated files are now identified as "BSD 1.0" format.

The libc now only read "BSD 1.0" locales definitions, all other version will be
set to "C"
The localedef(1) tool has been imported from Illumos and modified to use tree(3)
instead of the CDDL avl(3)
A set of tool created by edwin@ and extended by marino@ for dragonfly has been
added to be able to generate locales and the Makefiles from the vanilla CLDR
unicode databases + a universal UTF-8 charmap (by marino@)
Update the locales to unicode v27
Given our regex(3) does not support multibyte (yet) it has been forced to always
use locale C
Remove now unused colldef(1) and mklocale(1)
Finish implementing the numeric BSD extension for ctypes
The number of supported locales has grown from 175 to 250 locales. Among the new
locales: 6 Arabic locales (AE EG JO MA QA SA), Different variations of spanish
locales.
Added new 3 components locales for mn_Cyrl_MN, sr_Cyrl_RS sr_Latn_RS,
zh_Hans_CN, zh_Hant_HK and zh_Hant_TW. Some aliases has been for 2 components
version when possible.

Thanks: Garrett D'Amore (Illumos) who made sure all his work was done under
BSD license!, Edwin Groothuis (edwin@) for the work he made on tools to be able
to generate locales definition usable in freebsd sources out of vanilla CLDR
definitions, John Marino (DragonflyBSD) who first merge the Illumos work into
Dragonfly and spent hours tracking down bugs.


# 287125 25-Aug-2015 ed

Make UTF-8 parsing and generation more strict.

- in mbrtowc() we need to disallow codepoints above 0x10ffff.
- In wcrtomb() we need to disallow codepoints between 0xd800 and 0xdfff.

Reviewed by: bapt
Differential Revision: https://reviews.freebsd.org/D3399


# 286491 08-Aug-2015 bapt

Remove 5 and 6 bytes sequences which are illegal in UTF-8 space. (part2)

Per rfc3629 value greater than 0x10ffff should be rejected

Suggested by: jilles


# 286490 08-Aug-2015 bapt

Remove 5 and 6 bytes sequences which are illegal in UTF-8 space.

Per rfc3629 value greater than 0x10ffff should be rejected

Suggested by: jilles


# 268272 04-Jul-2014 pfg

minor perf enhancement for UTF-8

Reduce some duplicate code.

Reference:
https://www.illumos.org/issues/628

Obtained from: Illumos
MFC after: 1 week


# 265167 30-Apr-2014 pfg

citrus: Avoid invalid code points.

From the OpenBSD log:
The UTF-8 decoder should not accept byte sequences which decode to unicode
code positions U+D800 to U+DFFF (UTF-16 surrogates), U+FFFE, and U+FFFF.

http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
http://unicode.org/faq/utf_bom.html#utf8-4

Reported by: Stefan Sperling
Obtained from: OpenBSD
MFC after: 5 days


# 265095 29-Apr-2014 pfg

citrus: Avoid invalid code points.

From the OpenBSD log:
The UTF-8 decoder should not accept byte sequences which decode to unicode
code positions U+D800 to U+DFFF (UTF-16 surrogates), U+FFFE, and U+FFFF.

http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
http://unicode.org/faq/utf_bom.html#utf8-4

Reported by: Stefan Sperling
Obtained from: OpenBSD
MFC after: 5 days


# 227753 20-Nov-2011 theraven

Implement xlocale APIs from Darwin, mainly for use by libc++. This adds a
load of _l suffixed versions of various standard library functions that use
the global locale, making them take an explicit locale parameter. Also
adds support for per-thread locales. This work was funded by the FreeBSD
Foundation.

Please test any code you have that uses the C standard locale functions!

Reviewed by: das (gdtoa changes)
Approved by: dim (mentor)


# 172661 15-Oct-2007 ache

Add comment explaining __mb_sb_limit trick here.


# 172619 13-Oct-2007 ache

The problem is: currently our single byte ctype(3) functions are broken
for wide characters locales in the argument range >= 0x80 - they may
return false positives.

Example 1: for UTF-8 locale we currently have:
iswspace(0xA0)==1 and isspace(0xA0)==1
(because iswspace() and isspace() are the same code)
but must have
iswspace(0xA0)==1 and isspace(0xA0)==0
(because there is no such character and all others in the range
0x80..0xff for the UTF-8 locale, it keeps ASCII only in the single byte
range because our internal wchar_t representation for UTF-8 is UCS-4).

Example 2: for all wide character locales isalpha(arg) when arg > 0xFF may
return false positives (must be 0).
(because iswalpha() and isalpha() are the same code)

This change address this issue separating single byte and wide ctype
and also fix iswascii() (currently iswascii() is broken for
arguments > 0xFF).
This change is 100% binary compatible with old binaries.

Reviewied by: i18n@


# 157289 30-Mar-2006 trhodes

Fix a bug where, for 6-byte sequences, the top 6 bits get compared to
111111 rather than the top 7 bits being compared against 1111110 causing
illegal bytes fe and ff being treated the same as legal bytes fc and fd.


# 142654 27-Feb-2005 phantom

. Static'ize functions exported via function reference variables only.
. Replace inclusion of sys/param.h to sys/cdefs.h and sys/types.h where
appropriate.
. move _*_init() prototypes to mblocal.h, and remove these prototypes
from .c files
. use _none_init() in __setrunelocale() instead of duplicating code
. move __mb* variables from table.c to none.c allowing us to not to
export _none_*() externs, and appropriately remove them from mblocal.h

Ok'ed by: tjr


# 141716 12-Feb-2005 stefanf

Fix comparisons that test if an unsigned value is < 0.

Reviewed by: tjr


# 132687 27-Jul-2004 tjr

Add UTF-8-specific implementations of mbsnrtowcs() and wcsnrtombs().
These convert plain ASCII characters in-line, making them only slightly
slower than the single-byte ("NONE" encoding) version when processing
ASCII strings.


# 131881 09-Jul-2004 tjr

Add fast paths for conversion of plain ASCII characters.


# 129336 17-May-2004 tjr

Use conversion state objects to store the accumulated wide character,
low bound, and the number of bytes remaining instead of storing the
raw byte sequence and deriving them every time mbrtowc() is called.
This is much faster -- about twice as fast in some crude benchmarks.


# 129153 12-May-2004 tjr

Move prototypes of various encoding-related functions into a new header
file to avoid extern'ing them all over the place.


# 128155 12-Apr-2004 tjr

Perform some basic validation of multibyte conversion state objects.


# 128081 09-Apr-2004 tjr

Don't cast away const qualifiers.

Spotted by: bde


# 128004 07-Apr-2004 tjr

Allow partial multibyte characters to accumulate in conversion state
objects passed to mbrtowc(), mbsrtowcs(), and mbrlen(), as required
by C99.


# 122467 11-Nov-2003 tjr

Fix a typo that caused mbrtowc() to always return 0.


# 121893 02-Nov-2003 tjr

Convert the Big5, EUC, MSKanji and UTF-8 encoding methods to implement
mbrtowc() and wcrtomb() directly. GB18030, GBK and UTF2 are left
unconverted; GB18030 will be done eventually, but GBK and UTF2 may just
be removed, as they are subsets of GB18030 and UTF-8 respectively.


# 111082 18-Feb-2003 nectar

Whack 28 unused variables.


# 104828 10-Oct-2002 tjr

Add a UTF-8 encoding method, which will eventually replace the antique
"UTF2" method. Although UTF-8 and the old UTF2 encoding are compatible
for 16-bit characters, the new UTF-8 implementation is much more strict
about rejecting malformed input and also handles the full 31 bit range
of characters.