#
16f18d03 |
|
15-Jun-2015 |
Rich Felker <dalias@aerifal.cx> |
byte-based C locale, phase 2: stdio and iconv (multibyte callers)

this patch adjusts libc components which use the multibyte functions internally, and which depend on them operating in a particular encoding, to make the appropriate locale changes before calling them and restore the calling thread's locale afterwards. activating the byte-based C locale without these changes would cause regressions in stdio and iconv.

in the case of iconv, the current implementation was simply using the multibyte functions as UTF-8 conversions. setting a multibyte UTF-8 locale for the duration of the iconv operation allows the code to continue working.

in the case of stdio, POSIX requires that FILE streams have an encoding rule bound at the time of setting wide orientation. as long as all locales, including the C locale, used the same encoding, treating high bytes as UTF-8, there was no need to store an encoding rule as part of the stream's state.

a new locale field in the FILE structure points to the locale that should be made active during fgetwc/fputwc/ungetwc on the stream. it cannot point to the locale active at the time the stream becomes oriented, because this locale could be mutable (the global locale) or could be destroyed (locale_t objects produced by newlocale) before the stream is closed. instead, a pointer to the static C or C.UTF-8 locale object added in commit aeeac9ca5490d7d90fe061ab72da446c01ddf746 is used. this is valid since categories other than LC_CTYPE will not affect these functions.
|
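The pattern this commit describes (switch the calling thread's locale, call the multibyte function, then restore the previous locale) can be sketched in application code with POSIX `uselocale`. This is only an illustration of the idea, not musl's internal implementation; `decode_in_locale` is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not part of musl): decode one character from s
 * using the LC_CTYPE rules of loc, then restore the thread's locale. */
static size_t decode_in_locale(locale_t loc, wchar_t *wc,
                               const char *s, size_t n)
{
    assert(wc != NULL && s != NULL);

    mbstate_t st;
    memset(&st, 0, sizeof st);          /* start in the initial shift state */

    locale_t saved = uselocale(loc);    /* switch this thread only */
    size_t r = mbrtowc(wc, s, n, &st);  /* decodes per loc's encoding */
    uselocale(saved);                   /* restore the caller's locale */
    return r;
}
```

The locale passed in must outlive the call, which is the same lifetime concern the commit message raises for FILE streams.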
#
3b0e8326 |
|
21-May-2015 |
Rich Felker <dalias@aerifal.cx> |
remove outdated and misleading comment in iconv.c

the comment claimed that EUC/GBK/Big5 are not implemented, which has been incorrect since commit 19b4a0a20efc6b9df98b6a43536ecdd628ba4643.
|
#
39b8ce66 |
|
21-May-2015 |
Rich Felker <dalias@aerifal.cx> |
in iconv_open, accept "CHAR" and "" as aliases for "UTF-8"

while not a requirement, it's common convention in other iconv implementations to accept "CHAR" as an alias for nl_langinfo(CODESET), meaning the encoding used for char[] strings in the current locale, and also "" as an alternate form. supporting this is not costly and improves compatibility.
|
#
109bd65a |
|
17-Aug-2013 |
Rich Felker <dalias@aerifal.cx> |
add hkscs/big5-2003/eten extensions to iconv big5

with these changes, the character set implemented as "big5" in musl is a pure superset of cp950, the canonical "big5", and agrees with the normative parts of Unicode. this means it has minor differences from both hkscs and big5-2003:

- the range A2CC-A2CE maps to CJK ideographs rather than numerals, contrary to changes made in big5-2003.

- C6CD maps to a CJK ideograph rather than its corresponding Kangxi radical character, contrary to changes made in hkscs.

- F9FE maps to U+2593 rather than U+FFED.

of these differences, none but the last are visually distinct, and the last is a character used purely for text-based graphics, not to convey linguistic content.

should there be future demand for strict conformance to big5-2003 or hkscs mappings, the present charset aliases can be replaced with distinct variants.

reportedly there are other non-standard big5 extensions in common use in Taiwan and perhaps elsewhere, which could also be added as layers on top of the existing big5 support. there may be additional characters which should be added to the hkscs table: the whatwg standard for big5 defines what appears to be a superset of hkscs.
|
#
19b4a0a2 |
|
07-Aug-2013 |
Rich Felker <dalias@aerifal.cx> |
add Big5 charset support to iconv

at this point, it is just the common base charset equivalent to Windows CP 950, with no further extensions. HKSCS and possibly other supersets will be added later. other aliases may need to be added too.
|
#
734062b2 |
|
05-Aug-2013 |
Rich Felker <dalias@aerifal.cx> |
iconv support for legacy Korean encodings

like for other character sets, stateful iso-2022 form is not supported yet but everything else should work. all charset aliases are treated the same, as Windows codepage 949, because reportedly the EUC-KR charset name is in widespread (mis?)usage in email and on the web for data which actually uses the extended characters outside the standard 93x94 grid. this could easily be changed if desired.

the principle of this converter for handling the giant bulk of rare Hangul syllables outside of the standard KS X 1001 93x94 grid is the same as the GB18030 converter's treatment of non-explicitly-coded Unicode codepoints: sequences in the extension range are mapped to an integer index N, and the converter explicitly computes the Nth Hangul syllable not explicitly encoded in the character map. empirically, this requires at most 7 passes over the grid.

this approach reduces the table size required for Korean legacy encodings from roughly 44k to 17k and should have minimal performance impact on real-world text conversions since the "slow" characters are rare. where it does have impact, the cost is merely a large constant time factor.
|
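The "Nth value not explicitly encoded" computation can be sketched generically as repeated passes over a table of listed values. The following is an illustration under simplifying assumptions (a flat array of distinct listed values, a 0-based index), not the converter's actual code; `nth_unlisted` and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return the n-th (0-based) value >= base that does
 * not appear in listed[0..count-1]. Entries must be distinct. Each pass
 * counts the listed values in the newly covered range [lo, c] and pushes
 * the candidate c forward by that many positions. */
static unsigned nth_unlisted(const unsigned *listed, size_t count,
                             unsigned base, unsigned n)
{
    assert(listed != NULL || count == 0);

    unsigned lo = base;      /* start of the range not yet scanned */
    unsigned c = base + n;   /* candidate if nothing were listed */
    for (;;) {
        unsigned k = 0;
        for (size_t i = 0; i < count; i++)
            if (listed[i] >= lo && listed[i] <= c)
                k++;
        if (k == 0)
            return c;        /* no listed values left to skip over */
        lo = c + 1;
        c += k;
    }
}
```

With listed values {10, 12, 13} and base 10, indices 0, 1 and 2 yield 11, 14 and 15. Every pass scans the whole table, which is where the large constant factor mentioned in the commit message comes from.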
#
6a4cfbdb |
|
26-Jun-2013 |
Rich Felker <dalias@aerifal.cx> |
fix iconv conversion to legacy 8bit codepages

this seems to have been a simple copy-and-paste error from the code for converting from legacy codepages.
|
#
400c5e5c |
|
06-Sep-2012 |
Rich Felker <dalias@aerifal.cx> |
use restrict everywhere it's required by c99 and/or posix 2008

to deal with the fact that the public headers may be used with pre-c99 compilers, __restrict is used in place of restrict, and defined appropriately for any supported compiler. we also avoid the form [restrict] since older versions of gcc rejected it due to a bug in the original c99 standard, and instead use the form *restrict.
|
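A sketch of the portability approach described above: pick a spelling of `restrict` that the compiler accepts, and use the pointer form in prototypes rather than the array form. `MY_RESTRICT` and `copy_bytes` are hypothetical names chosen for illustration so that nothing reserved is redefined; the real headers use `__restrict`.

```c
#include <assert.h>
#include <stddef.h>

/* Choose a spelling the compiler understands (illustrative macro name). */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#define MY_RESTRICT restrict
#elif defined(__GNUC__)
#define MY_RESTRICT __restrict
#else
#define MY_RESTRICT
#endif

/* Pointer form (*restrict) instead of the array form dst[restrict]. */
static void copy_bytes(unsigned char *MY_RESTRICT dst,
                       const unsigned char *MY_RESTRICT src, size_t n)
{
    assert(dst != NULL && src != NULL);
    while (n--)
        *dst++ = *src++;
}
```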
#
26710be7 |
|
18-Jun-2012 |
Rich Felker <dalias@aerifal.cx> |
fix multiple iconv bugs reading utf-16/32 and wchar_t
|
#
673633c6 |
|
18-Jun-2012 |
Rich Felker <dalias@aerifal.cx> |
fix iconv dest utf-16: unavailable chars must be replaced; EILSEQ is wrong
|
#
a2f149b5 |
|
18-Jun-2012 |
Rich Felker <dalias@aerifal.cx> |
fix erroneous utf-16 encoding with surrogates in iconv

apparently this was never tested before.
|
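For reference, the standard UTF-16 surrogate-pair encoding that this fix concerns can be sketched as follows. This is a standalone illustration, not the code from iconv.c; `utf16_encode` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode scalar value as UTF-16 code
 * units. Returns the number of units written (1 or 2), or 0 if c is not
 * a valid scalar value. */
static size_t utf16_encode(uint32_t c, uint16_t out[2])
{
    assert(out != NULL);
    if (c >= 0xD800 && c <= 0xDFFF)
        return 0;                       /* surrogates are not characters */
    if (c > 0x10FFFF)
        return 0;                       /* beyond the Unicode range */
    if (c < 0x10000) {
        out[0] = (uint16_t)c;           /* BMP: one code unit */
        return 1;
    }
    c -= 0x10000;                       /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (c >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (c & 0x3FF));  /* low surrogate */
    return 2;
}
```

For example, U+1F600 encodes as the pair D83D DE00.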
#
80d7859f |
|
21-Apr-2012 |
Rich Felker <dalias@aerifal.cx> |
fix major breakage in iconv, bogus rejecting of dest charsets
|
#
0e2331c9 |
|
12-Jul-2011 |
Rich Felker <dalias@aerifal.cx> |
gb18030 support in iconv (only from, not to)

also support (and restrict to subsets) older chinese sets, and explicitly refuse to convert to cjk (since there's no code for it yet)
|
#
95a85e04 |
|
12-Jul-2011 |
Rich Felker <dalias@aerifal.cx> |
legacy japanese charset support in iconv (only from, not to)
|
#
594b16e0 |
|
11-Jul-2011 |
Rich Felker <dalias@aerifal.cx> |
simplify iconv and support more legacy codepages
|
#
2f0c415c |
|
03-Jul-2011 |
Rich Felker <dalias@aerifal.cx> |
iconv was not returning -1 on most failure

this broke most uses of iconv in real-world programs, especially glib's iconv wrappers.
|
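Callers typically depend on the `(size_t)-1` return value as shown in the sketch below, which is why a missing -1 breaks them. This is an illustrative wrapper around the standard `iconv_open`/`iconv`/`iconv_close` calls; `convert` is a hypothetical name, and the sketch assumes a stateless target encoding (no final flush call).

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical wrapper: convert inlen bytes of in and return the number
 * of bytes written to out, or (size_t)-1 with errno set on failure. */
static size_t convert(const char *to, const char *from,
                      const char *in, size_t inlen,
                      char *out, size_t outlen)
{
    assert(in != NULL && out != NULL);

    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return (size_t)-1;              /* unsupported conversion */

    char *src = (char *)in;             /* iconv takes char **; input is only read */
    char *dst = out;
    size_t inleft = inlen, outleft = outlen;

    size_t r = iconv(cd, &src, &inleft, &dst, &outleft);
    int saved = errno;                  /* iconv_close may change errno */
    iconv_close(cd);

    if (r == (size_t)-1) {              /* EILSEQ, EINVAL or E2BIG */
        errno = saved;
        return (size_t)-1;
    }
    return outlen - outleft;
}
```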
#
bb8d3d00 |
|
07-Apr-2011 |
Rich Felker <dalias@aerifal.cx> |
fix breakage due to converting a return type to size_t in iconv...
|
#
9ae8d5fc |
|
25-Mar-2011 |
Rich Felker <dalias@aerifal.cx> |
fix all implicit conversion between signed/unsigned pointers

sadly the C language does not specify any such implicit conversion, so this is not a matter of just fixing warnings (as gcc treats it) but actual errors. i would like to revisit a number of these changes and possibly revise the types used to reduce the number of casts required.
|
#
7fe308eb |
|
13-Feb-2011 |
Rich Felker <dalias@aerifal.cx> |
use a more-correct integer type, and silence 64-bit warnings as a bonus
|
#
0b44a031 |
|
11-Feb-2011 |
Rich Felker <dalias@aerifal.cx> |
initial check-in, version 0.5.0
|