multibyte.3 (122730) multibyte.3 (123222)
1.\" Copyright (c) 2002, 2003 Tim J. Robbins. All rights reserved.
1.\" Copyright (c) 1993
2.\" The Regents of the University of California. All rights reserved.
3.\"
4.\" This code is derived from software contributed to Berkeley by
5.\" Donn Seeley of BSDI.
6.\"
7.\" Redistribution and use in source and binary forms, with or without
8.\" modification, are permitted provided that the following conditions

--- 19 unchanged lines hidden ---

28.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
29.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
30.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
31.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
32.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
33.\" SUCH DAMAGE.
34.\"
35.\" @(#)multibyte.3 8.1 (Berkeley) 6/4/93
2.\" Copyright (c) 1993
3.\" The Regents of the University of California. All rights reserved.
4.\"
5.\" This code is derived from software contributed to Berkeley by
6.\" Donn Seeley of BSDI.
7.\"
8.\" Redistribution and use in source and binary forms, with or without
9.\" modification, are permitted provided that the following conditions

--- 19 unchanged lines hidden ---

29.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
30.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
31.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
32.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
33.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
34.\" SUCH DAMAGE.
35.\"
36.\" @(#)multibyte.3 8.1 (Berkeley) 6/4/93
36.\" $FreeBSD: head/lib/libc/locale/multibyte.3 122730 2003-11-15 02:26:04Z tjr $
37.\" $FreeBSD: head/lib/libc/locale/multibyte.3 123222 2003-12-07 06:33:52Z tjr $
37.\"
38.\"
38.Dd August 10, 2003
39.Dd December 7, 2003
39.Dt MULTIBYTE 3
40.Os
41.Sh NAME
40.Dt MULTIBYTE 3
41.Os
42.Sh NAME
42.Nm mblen ,
43.Nm mbstowcs ,
44.Nm mbtowc ,
45.Nm wcstombs ,
46.Nm wctomb
47.Nd multibyte character support for C
43.Nm multibyte
44.Nd multibyte and wide character manipulation functions
48.Sh LIBRARY
49.Lb libc
50.Sh SYNOPSIS
45.Sh LIBRARY
46.Lb libc
47.Sh SYNOPSIS
48.In limits.h
51.In stdlib.h
49.In stdlib.h
52.Ft int
53.Fn mblen "const char *mbchar" "size_t nbytes"
54.Ft size_t
55.Fn mbstowcs "wchar_t * restrict wcstring" "const char * restrict mbstring" "size_t nwchars"
56.Ft int
57.Fn mbtowc "wchar_t * restrict wcharp" "const char * restrict mbchar" "size_t nbytes"
58.Ft size_t
59.Fn wcstombs "char * restrict mbstring" "const wchar_t * restrict wcstring" "size_t nbytes"
60.Ft int
61.Fn wctomb "char *mbchar" "wchar_t wchar"
50.In wchar.h
62.Sh DESCRIPTION
51.Sh DESCRIPTION
63The basic elements of some written natural languages such as Chinese
52The basic elements of some written natural languages, such as Chinese,
64cannot be represented uniquely with single C
65.Va char Ns s .
66The C standard supports two different ways of dealing with
53cannot be represented uniquely with single C
54.Va char Ns s .
55The C standard supports two different ways of dealing with
67extended natural language encodings,
68.Em wide
69characters and
70.Em multibyte
71characters.
56extended natural language encodings:
57wide characters and
58multibyte characters.
72Wide characters are an internal representation
73which allows each basic element to map
74to a single object of type
75.Va wchar_t .
76Multibyte characters are used for input and output
77and code each basic element as a sequence of C
78.Va char Ns s .
79Individual basic elements may map into one or more

--- 17 unchanged lines hidden ---

97indicators to switch to and from
98particular modes within the given representation.
99If explicit bytes are used to signal shifting,
100these are not recognized as separate characters
101but are lumped with a neighboring character.
102There is always a distinguished
103.Sq initial
104shift state.
59Wide characters are an internal representation
60which allows each basic element to map
61to a single object of type
62.Va wchar_t .
63Multibyte characters are used for input and output
64and code each basic element as a sequence of C
65.Va char Ns s .
66Individual basic elements may map into one or more

--- 17 unchanged lines hidden ---

84indicators to switch to and from
85particular modes within the given representation.
86If explicit bytes are used to signal shifting,
87these are not recognized as separate characters
88but are lumped with a neighboring character.
89There is always a distinguished
90.Sq initial
91shift state.
105The
106.Fn mbstowcs
107and
108.Fn wcstombs
109functions assume that multibyte strings are interpreted
110starting from the initial shift state.
111The
92Some functions (e.g.
112.Fn mblen ,
113.Fn mbtowc
114and
93.Fn mblen ,
94.Fn mbtowc
95and
115.Fn wctomb
116functions maintain static shift state internally.
117A call with a null
118.Fa mbchar
119pointer returns nonzero if the current locale requires shift states,
120zero otherwise;
121if shift states are required, the shift state is reset to the initial state.
122The internal shift states are undefined after a call to
96.Fn wctomb )
97maintain static shift state internally, whereas
98others store it in an
99.Vt mbstate_t
100object passed by the caller.
101Shift states are undefined after a call to
123.Fn setlocale
124with the
125.Dv LC_CTYPE
126or
127.Dv LC_ALL
128categories.
129.Pp
130For convenience in processing,
131the wide character with value 0
132(the null wide character)
133is recognized as the wide character string terminator,
134and the character with value 0
135(the null byte)
136is recognized as the multibyte character string terminator.
137Null bytes are not permitted within multibyte characters.
138.Pp
102.Fn setlocale
103with the
104.Dv LC_CTYPE
105or
106.Dv LC_ALL
107categories.
108.Pp
109For convenience in processing,
110the wide character with value 0
111(the null wide character)
112is recognized as the wide character string terminator,
113and the character with value 0
114(the null byte)
115is recognized as the multibyte character string terminator.
116Null bytes are not permitted within multibyte characters.
117.Pp
139The
140.Fn mblen
141function computes the length in bytes
142of a multibyte character
143.Fa mbchar .
144Up to
145.Fa nbytes
146bytes are examined.
147.Pp
148The
149.Fn mbtowc
150function converts a multibyte character
151.Fa mbchar
152into a wide character and stores the result
153in the object pointed to by
154.Fa wcharp .
155Up to
156.Fa nbytes
157bytes are examined.
158.Pp
159The
160.Fn wctomb
161function converts a wide character
162.Fa wchar
163into a multibyte character and stores
164the result in
165.Fa mbchar .
166The object pointed to by
167.Fa mbchar
168must be large enough to accommodate the multibyte character.
169.Pp
170The
171.Fn mbstowcs
172function converts a multibyte character string
173.Fa mbstring
174into a wide character string
175.Fa wcstring .
176No more than
177.Fa nwchars
178wide characters are stored.
179A terminating null wide character is appended if there is room.
180.Pp
181The
182.Fn wcstombs
183function converts a wide character string
184.Fa wcstring
185into a multibyte character string
186.Fa mbstring .
187Up to
188.Fa nbytes
189bytes are stored in
190.Fa mbstring .
191Partial multibyte characters at the end of the string are not stored.
192The multibyte character string is null terminated if there is room.
193.Sh "RETURN VALUES
194If
195.Fa mbchar
196is
197.Dv NULL ,
198the
199.Fn mblen ,
200.Fn mbtowc
201and
202.Fn wctomb
203functions return nonzero if shift states are supported,
204zero otherwise.
205If
206.Fa mbchar
207is valid,
208then these functions return
209the number of bytes processed in
210.Fa mbchar ,
211or \-1 if no multibyte character
212could be recognized or converted.
213.Pp
214The
215.Fn mbstowcs
216function returns the number of wide characters converted,
217not counting any terminating null wide character.
218The
219.Fn wcstombs
220function returns the number of bytes converted,
221not counting any terminating null byte.
222If any invalid multibyte characters are encountered,
223both functions return \-1.
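
The behaviour described in the removed text above can be illustrated with a short sketch. It is not part of the manual page. The helper names count_mbchars and roundtrip and the 64-element buffer are assumptions made for illustration only, and the code relies solely on the return values documented above: \-1 from mblen, and (size_t)\-1 from mbstowcs and wcstombs, on invalid input.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Count the multibyte characters in s under the current LC_CTYPE locale.
 * Returns -1 if an invalid sequence is found.
 * count_mbchars is a hypothetical helper, not part of libc.
 */
long
count_mbchars(const char *s)
{
	size_t left = strlen(s);
	long count = 0;
	int n;

	(void)mblen(NULL, 0);		/* reset the internal shift state */
	while (left > 0) {
		n = mblen(s, left);	/* examine at most `left' bytes */
		if (n < 0)
			return (-1);	/* no multibyte character recognized */
		if (n == 0)
			break;		/* null byte: end of string */
		s += n;
		left -= (size_t)n;
		count++;
	}
	return (count);
}

/*
 * Convert mbs to a wide character string and back into out.
 * Returns 0 on success, or -1 if a conversion fails or the result
 * (including its terminator) does not fit.
 * roundtrip is a hypothetical helper; the buffer size is arbitrary.
 */
int
roundtrip(const char *mbs, char *out, size_t outsize)
{
	wchar_t wbuf[64];
	size_t nwchars = sizeof(wbuf) / sizeof(wbuf[0]);
	size_t nw, nb;

	nw = mbstowcs(wbuf, mbs, nwchars);
	if (nw == (size_t)-1 || nw == nwchars)
		return (-1);	/* invalid input, or no room for terminator */
	nb = wcstombs(out, wbuf, outsize);
	if (nb == (size_t)-1 || nb == outsize)
		return (-1);	/* unconvertible, or no room for terminator */
	return (0);
}
```

Both helpers compare the result against the buffer size because the terminating null is appended only if there is room. A return value equal to the limit therefore means the output was not terminated.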
118The C library provides the following functions for dealing with
119multibyte characters:
120.Bl -column "Description"
121.It Sy "Function	Description"
122.It "mblen	get number of bytes in a character"
123.It "mbrlen	get number of bytes in a character (restartable)"
124.It "mbrtowc	convert a character to a wide-character code (restartable)"
125.It "mbsrtowcs	convert a character string to a wide-character string (restartable)"
126.It "mbstowcs	convert a character string to a wide-character string"
127.It "mbtowc	convert a character to a wide-character code"
128.It "wcrtomb	convert a wide-character code to a character (restartable)"
129.It "wcstombs	convert a wide-character string to a character string"
130.It "wcsrtombs	convert a wide-character string to a character string (restartable)"
131.It "wctomb	convert a wide-character code to a character"
132.El
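
The difference between the plain and the restartable functions listed above can be sketched as follows. This is an illustration rather than text from the manual page. The helper name decode_restartable is an assumption, and the sketch only shows the pattern of keeping the conversion state in a caller-owned mbstate_t object instead of in static storage inside the library.

```c
#include <string.h>
#include <wchar.h>

/*
 * Decode up to max wide characters from the multibyte string s into out,
 * keeping the conversion state in a caller-owned mbstate_t.
 * Returns the number of wide characters stored, or (size_t)-1 on error.
 * decode_restartable is a hypothetical helper, not part of libc.
 */
size_t
decode_restartable(const char *s, wchar_t *out, size_t max)
{
	mbstate_t st;
	size_t left = strlen(s);
	size_t n, count = 0;

	memset(&st, 0, sizeof(st));	/* a zeroed object is the initial state */
	while (left > 0 && count < max) {
		n = mbrtowc(&out[count], s, left, &st);
		if (n == (size_t)-1 || n == (size_t)-2)
			return ((size_t)-1);	/* invalid or truncated character */
		if (n == 0)
			break;			/* null character reached */
		s += n;
		left -= n;
		count++;
	}
	return (count);
}
```

Because the state lives in st rather than in the library, two independent conversions can be interleaved without disturbing each other, which the functions with static shift state do not allow.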
224.Sh SEE ALSO
133.Sh SEE ALSO
225.Xr btowc 3 ,
226.Xr mbrlen 3 ,
227.Xr mbrtowc 3 ,
228.Xr mbrune 3 ,
229.Xr mbsinit 3 ,
230.Xr mbsrtowcs 3 ,
231.Xr rune 3 ,
134.Xr mklocale 1 ,
135.Xr stdio 3 ,
232.Xr setlocale 3 ,
136.Xr setlocale 3 ,
233.Xr wcrtomb 3 ,
234.Xr wcsrtombs 3 ,
235.Xr big5 5 ,
236.Xr euc 5 ,
237.Xr gb18030 5 ,
238.Xr gb2312 5 ,
239.Xr gbk 5 ,
240.Xr mskanji 5 ,
241.Xr utf2 5 ,
242.Xr utf8 5
243.Sh STANDARDS
137.Xr big5 5 ,
138.Xr euc 5 ,
139.Xr gb18030 5 ,
140.Xr gb2312 5 ,
141.Xr gbk 5 ,
142.Xr mskanji 5 ,
143.Xr utf2 5 ,
144.Xr utf8 5
145.Sh STANDARDS
244The
245.Fn mblen ,
246.Fn mbstowcs ,
247.Fn mbtowc ,
248.Fn wcstombs
146These functions conform to
147.St -isoC
249and
148and
250.Fn wctomb
251functions conform to
252.St -isoC .
149.St -isoC-99
150as documented in their individual manual pages.
253.Sh BUGS
254The current implementation does not support shift states.
151.Sh BUGS
152The current implementation does not support shift states.