<html lang="en">
<head>
<title>Character sets - The C Preprocessor</title>
<meta http-equiv="Content-Type" content="text/html">
<meta name="description" content="The C Preprocessor">
<meta name="generator" content="makeinfo 4.13">
<link title="Top" rel="start" href="index.html#Top">
<link rel="up" href="Overview.html#Overview" title="Overview">
<link rel="next" href="Initial-processing.html#Initial-processing" title="Initial processing">
<link href="http://www.gnu.org/software/texinfo/" rel="generator-home" title="Texinfo Homepage">
<!--
Copyright (C) 1987, 1989, 1991, 1992, 1993, 1994, 1995, 1996,
1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007,
2008, 2009, 2010, 2011
Free Software Foundation, Inc.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation.  A copy of
the license is included in the
section entitled ``GNU Free Documentation License''.

This manual contains no Invariant Sections.  The Front-Cover Texts are
(a) (see below), and the Back-Cover Texts are (b) (see below).

(a) The FSF's Front-Cover Text is:

     A GNU Manual

(b) The FSF's Back-Cover Text is:

     You have freedom to copy and modify this GNU Manual, like GNU
     software.  Copies published by the Free Software Foundation raise
     funds for GNU development.
-->
<meta http-equiv="Content-Style-Type" content="text/css">
<style type="text/css"><!--
  pre.display { font-family:inherit }
  pre.format  { font-family:inherit }
  pre.smalldisplay { font-family:inherit; font-size:smaller }
  pre.smallformat  { font-family:inherit; font-size:smaller }
  pre.smallexample { font-size:smaller }
  pre.smalllisp    { font-size:smaller }
  span.sc    { font-variant:small-caps }
  span.roman { font-family:serif; font-weight:normal; }
  span.sansserif { font-family:sans-serif; font-weight:normal; }
--></style>
<link rel="stylesheet" type="text/css" href="../cs.css">
</head>
<body>
<div class="node">
<a name="Character-sets"></a>
<p>
Next:&nbsp;<a rel="next" accesskey="n" href="Initial-processing.html#Initial-processing">Initial processing</a>,
Up:&nbsp;<a rel="up" accesskey="u" href="Overview.html#Overview">Overview</a>
<hr>
</div>

<h3 class="section">1.1 Character sets</h3>

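<p>Source code character set processing in C and related languages is
rather complicated.  The C standard discusses two character sets, but
there are really at least four: the character set of the input files,
the source character set, the execution character set, and the wide
execution character set.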

   <p>The files input to CPP may be in any character set.  CPP's very
first action, before it even looks for line boundaries, is to convert
each file into the character set it uses for internal processing.
That set is what the C standard calls the <dfn>source</dfn> character
set.  It must be isomorphic with ISO 10646, also known as Unicode.
CPP uses the UTF-8 encoding of Unicode.

   <p>The character set of the input files is specified with the
<samp><span class="option">-finput-charset=</span></samp> option.

   <p>All preprocessing work (the subject of the rest of this manual) is
carried out in the source character set.  If you request textual
output from the preprocessor with the <samp><span class="option">-E</span></samp> option, that
output is in UTF-8.

   <p>After preprocessing is complete, string and character constants are
converted again, into the <dfn>execution</dfn> character set.  This
character set is under the user's control (through the
<samp><span class="option">-fexec-charset=</span></samp> option); the default is UTF-8,
matching the source character set.  Wide string and character
constants have their own character set, which the standard does not
name specifically.  It too is under the user's control (through the
<samp><span class="option">-fwide-exec-charset=</span></samp> option).
The default is UTF-16 or UTF-32, whichever fits in the target's
<code>wchar_t</code> type, in the target machine's byte
order.<a rel="footnote" href="#fn-1" name="fnd-1"><sup>1</sup></a>  Octal and hexadecimal escape sequences do not undergo
conversion; <tt>'\x12'</tt> has the value 0x12 regardless of the currently
selected execution character set.  All other escapes are replaced by
the character in the source character set that they represent, then
converted to the execution character set, just like unescaped
characters.

   <p>Unless the experimental <samp><span class="option">-fextended-identifiers</span></samp> option is used,
GCC does not permit characters outside the ASCII range, nor the
&lsquo;<samp><span class="samp">\u</span></samp>&rsquo; and &lsquo;<samp><span class="samp">\U</span></samp>&rsquo; escapes, in identifiers.  Even with that
option, characters outside the ASCII range can be specified in identifiers only with
the &lsquo;<samp><span class="samp">\u</span></samp>&rsquo; and &lsquo;<samp><span class="samp">\U</span></samp>&rsquo; escapes; they cannot be written directly.

   <div class="footnote">
<hr>
<h4>Footnotes</h4><p class="footnote"><small>[<a name="fn-1" href="#fnd-1">1</a>]</small> UTF-16 does not meet the requirements of the C
standard for a wide character set: characters outside the Basic
Multilingual Plane need two 16-bit units (a surrogate pair), whereas
the standard requires every character of the wide set to fit in a
single <code>wchar_t</code>.  However, the choice of 16-bit
<code>wchar_t</code> is enshrined in some system ABIs, so we cannot fix
this.</p>

   <hr></div>

   </body></html>