History log of /seL4-test-master/projects/musllibc/src/multibyte/mbrtowc.c
Revision Date Author Comments
# 6cec7bc5 21-Jun-2016 Rich Felker <dalias@aerifal.cx>

remove comments on copyright status from UTF-8 implementation files

despite clarifications made to the COPYRIGHT file in commit
f0a61399330bae42beeb27d6ecd05570b3382a60, there continues to be
confusion about whether the permissions granted actually apply to all
files. I am the sole author of these files and clearly intend, and
have always intended, for the grant of permission to apply to them.


# 1507ebf8 15-Jun-2015 Rich Felker <dalias@aerifal.cx>

byte-based C locale, phase 1: multibyte character handling functions

this patch makes the functions which work directly on multibyte
characters treat the high bytes as individual abstract code units
rather than as multibyte sequences when MB_CUR_MAX is 1. since
MB_CUR_MAX is presently defined as a constant 4, all of the new code
added is dead code, and optimizing compilers' code generation should
not be affected at all. a future commit will activate the new code.

as abstract code units, bytes 0x80 to 0xff are represented by wchar_t
values 0xdf80 to 0xdfff, at the end of the surrogates range. this
ensures that they will never be misinterpreted as Unicode characters,
and that all wctype functions return false for these "characters"
without needing locale-specific logic. a high range outside of Unicode
such as 0x7fffff80 to 0x7fffffff was also considered, but since C11's
char16_t also needs to be able to represent conversions of these
bytes, the surrogate range was the natural choice.
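
(illustration, not part of the commit) The mapping described above can be
sketched in C roughly as follows. The helper names byte_to_codeunit,
is_codeunit and codeunit_to_byte are hypothetical and chosen only for this
sketch; the commit message does not say how the mapping is spelled in the
source.

```c
#include <assert.h>
#include <wchar.h>

/* Hypothetical helper: map a high byte (0x80..0xff) to its abstract
 * code unit in 0xdf80..0xdfff, the top of the surrogate range. */
static wchar_t byte_to_codeunit(unsigned char b)
{
    assert(b >= 0x80);
    return (wchar_t)(0xdf00 + b);
}

/* Hypothetical helper: nonzero if wc is one of the 128 abstract code
 * units. The unsigned subtraction wraps for values below 0xdf80, so a
 * single comparison covers both ends of the range. */
static int is_codeunit(wchar_t wc)
{
    return (unsigned long)wc - 0xdf80UL < 0x80UL;
}

/* Hypothetical helper: recover the original byte from a code unit. */
static unsigned char codeunit_to_byte(wchar_t wc)
{
    assert(is_codeunit(wc));
    return (unsigned char)(wc & 0xff);
}
```

Because 0xdf80 to 0xdfff lies inside the surrogate range, none of these values
is a valid Unicode scalar value, which is the property the commit relies on.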


# e89cfe51 01-Jul-2014 Rich Felker <dalias@aerifal.cx>

fix aliasing violations in mbtowc and mbrtowc

these functions were setting wc to point to wchar_t aliasing itself as
a "cheap" way to support null wc arguments. doing so was anything but
cheap, since even without the aliasing violation, it would limit the
compiler's ability to optimize.

making wc point to a dummy object is equally easy and does not suffer
from the above problems.
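
(illustration, not part of the commit) A minimal sketch of the dummy-object
approach, assuming a toy decoder that accepts ASCII only. The function
decode_ascii and its return convention are hypothetical; only the handling of
a null wc argument mirrors what the message describes.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical ASCII-only decoder. Returns 1 for a non-null character,
 * 0 for the null character, and -1 on invalid or missing input. */
static int decode_ascii(wchar_t *wc, const char *s, size_t n)
{
    wchar_t dummy;
    unsigned char c;

    /* A null destination is redirected to a throwaway object of the
     * correct type, so every later store is an ordinary wchar_t store. */
    if (!wc) wc = &dummy;

    if (!s || n == 0) return -1;
    c = (unsigned char)*s;
    if (c >= 0x80) return -1;

    *wc = c;
    return c != 0;
}
```

Callers that only need the length can pass a null pointer, for example
decode_ascii(NULL, "A", 1).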


# 57174444 11-Dec-2013 Szabolcs Nagy <nsz@port70.net>

include cleanups: remove unused headers and add feature test macros


# 23ab8c25 08-Apr-2013 Rich Felker <dalias@aerifal.cx>

mbrtowc: do not leave mbstate_t in permanent-fail state after EILSEQ

the standard is clear that the old behavior is conforming: "In this
case, [EILSEQ] shall be stored in errno and the conversion state is
undefined."

however, the specification of mbrtowc has one peculiarity when the
source argument is a null pointer: in this case, it's required to
behave as mbrtowc(NULL, "", 1, ps). no motivation is provided for this
requirement, but the natural one that comes to mind is that the intent
is to reset the mbstate_t object. for stateful encodings, such
behavior is actually specified: "If the corresponding wide character
is the null wide character, the resulting state described shall be the
initial conversion state." but in the case of UTF-8 where the
mbstate_t object contains a partially-decoded character rather than a
shift state, a subsequent '\0' byte indicates that the previous
partial character is incomplete and thus an illegal sequence.

naturally, applications using their own mbstate_t object should clear
it themselves after an error, but the standard presently provides no
way to clear the builtin mbstate_t object used when the ps argument is
a null pointer. I suspect this issue may be addressed in the future by
specifying that a null source argument resets the state, as this seems
to have been the intent all along.
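
(illustration, not part of the commit) For the application-side case mentioned
above, a sketch of clearing a caller-owned mbstate_t after an error might look
like this. The wrapper name decode_or_reset is hypothetical, and it assumes ps
is never a null pointer. It relies on the C standard's guarantee that a
zero-valued mbstate_t describes the initial conversion state.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical wrapper around mbrtowc: on an encoding error the
 * caller-owned state is zeroed so the next call starts from the
 * initial conversion state. ps must not be null in this sketch. */
static size_t decode_or_reset(wchar_t *wc, const char *s, size_t n,
                              mbstate_t *ps)
{
    size_t r = mbrtowc(wc, s, n, ps);
    if (r == (size_t)-1)
        memset(ps, 0, sizeof *ps);
    return r;
}
```

This does nothing for the internal state used when ps is a null pointer, which
is the gap the commit message points out.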

for what it's worth, this change also slightly reduces code size.


# a49e038b 08-Apr-2013 Rich Felker <dalias@aerifal.cx>

optimize mbrtowc

this simple change, in my measurements, makes about a 7% performance
improvement. at first glance this change would seem like a
compiler-specific hack, since the modified code is not even used.
however, I suspect the reason is that I'm eliminating a second path
into the main body of the code, allowing the compiler more flexibility
to optimize the normal (hot) path into the main body. so even if it
weren't for the measurable (and quite notable) difference in
performance, I think the change makes sense.


# 400c5e5c 06-Sep-2012 Rich Felker <dalias@aerifal.cx>

use restrict everywhere it's required by c99 and/or posix 2008

to deal with the fact that the public headers may be used with pre-c99
compilers, __restrict is used in place of restrict, and defined
appropriately for any supported compiler. we also avoid the form
[restrict] since older versions of gcc rejected it due to a bug in the
original c99 standard, and instead use the form *restrict.
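
(illustration, not part of the commit) One way such a definition could be
arranged is sketched below. MY_RESTRICT is a hypothetical macro name used here
to avoid redefining a reserved identifier, and the compiler checks are
assumptions rather than the conditions the headers actually use.

```c
#include <stddef.h>

/* MY_RESTRICT is a hypothetical stand-in for the qualifier macro. */
#if defined(__cplusplus)
#define MY_RESTRICT                 /* C++ has no restrict keyword */
#elif __STDC_VERSION__ >= 199901L
#define MY_RESTRICT restrict        /* C99 and later */
#elif defined(__GNUC__)
#define MY_RESTRICT __restrict      /* GNU extension in pre-C99 modes */
#else
#define MY_RESTRICT                 /* unknown compiler: no qualifier */
#endif

/* The qualifier is written in the pointer form (*restrict), not the
 * array-parameter form ([restrict]). */
static void copy_bytes(unsigned char *MY_RESTRICT dst,
                       const unsigned char *MY_RESTRICT src, size_t n)
{
    while (n--) *dst++ = *src++;
}
```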


# 9ae8d5fc 25-Mar-2011 Rich Felker <dalias@aerifal.cx>

fix all implicit conversion between signed/unsigned pointers

sadly the C language does not specify any such implicit conversion, so
this is not a matter of just fixing warnings (as gcc treats it) but
actual errors. i would like to revisit a number of these changes and
possibly revise the types used to reduce the number of casts required.
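
(illustration, not part of the commit) The kind of explicit conversion this
refers to can be sketched as follows. The function count_high_bytes is
hypothetical and exists only to show a char pointer being viewed as unsigned
char through a cast rather than an implicit conversion.

```c
#include <stddef.h>

/* Hypothetical example: count bytes with the high bit set. Assigning a
 * const char * directly to a const unsigned char * is a constraint
 * violation in C, so the conversion is spelled out with a cast. */
static size_t count_high_bytes(const char *s, size_t n)
{
    const unsigned char *u = (const unsigned char *)s;
    size_t i, count = 0;

    for (i = 0; i < n; i++)
        if (u[i] >= 0x80) count++;
    return count;
}
```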


# f9d880d2 13-Feb-2011 Rich Felker <dalias@aerifal.cx>

cleanup multibyte stuff to remove ugly casts, sanitize the ptr align casts


# 0b44a031 11-Feb-2011 Rich Felker <dalias@aerifal.cx>

initial check-in, version 0.5.0