cppinternals.texi (169690) cppinternals.texi (220755)
1\input texinfo
2@setfilename cppinternals.info
3@settitle The GNU C Preprocessor Internals
4
5@include gcc-common.texi
6
7@ifinfo
8@dircategory Software development

--- 58 unchanged lines hidden ---

67@page
68
69@node Top
70@top
71@chapter Cpplib---the GNU C Preprocessor
72
73The GNU C preprocessor is
74implemented as a library, @dfn{cpplib}, so it can be easily shared between
75a stand-alone preprocessor, and a preprocessor integrated with the C,
76C++ and Objective-C front ends. It is also available for use by other
77programs, though this is not recommended as its exposed interface has
78not yet reached a point of reasonable stability.
75a stand-alone preprocessor, and a preprocessor integrated with the C
76and C++ front ends. It is also available for use by other programs,
77though this is not recommended as its exposed interface has not yet
78reached a point of reasonable stability.
79
80The library has been written to be re-entrant, so that it can be used
81to preprocess many files simultaneously if necessary. It has also been
82written with the preprocessing token as the fundamental unit; the
83preprocessor in previous versions of GCC would operate on text strings
84as the fundamental unit.
85
86This brief manual documents the internals of cpplib, and explains some
87of the tricky issues. It is intended that, along with the comments in
88the source code, a reasonably competent C programmer should be able to
89figure out what the code is doing, and why things have been implemented
90the way they have.
91
92@menu
93* Conventions:: Conventions used in the code.
94* Lexer:: The combined C, C++ and Objective-C Lexer.
94* Lexer:: The combined C and C++ Lexer.
95* Hash Nodes:: All identifiers are entered into a hash table.
96* Macro Expansion:: Macro expansion algorithm.
97* Token Spacing:: Spacing and paste avoidance issues.
98* Line Numbering:: Tracking location within files.
99* Guard Macros:: Optimizing header files with guard macros.
100* Files:: File handling.
101* Concept Index:: Index.
102@end menu

--- 23 unchanged lines hidden ---

126@node Lexer
127@unnumbered The Lexer
128@cindex lexer
129@cindex newlines
130@cindex escaped newlines
131
132@section Overview
133The lexer is contained in the file @file{lex.c}. It is a hand-coded
134lexer, and not implemented as a state machine. It can understand C, C++
135and Objective-C source code, and has been extended to allow reasonably
136successful preprocessing of assembly language. The lexer does not make
137an initial pass to strip out trigraphs and escaped newlines, but handles
138them as they are encountered in a single pass of the input file. It
139returns preprocessing tokens individually, not a line at a time.
134lexer, and not implemented as a state machine. It can understand C and
135C++ source code, and has been extended to allow reasonably successful
136preprocessing of assembly language. The lexer does not make an initial
137pass to strip out trigraphs and escaped newlines, but handles them as
138they are encountered in a single pass of the input file. It returns
139preprocessing tokens individually, not a line at a time.
140
141It is mostly transparent to users of the library, since the library's
142interface for obtaining the next token, @code{cpp_get_token}, takes care
143of lexing new tokens, handling directives, and expanding macros as
144necessary. However, the lexer does expose some functionality so that
145clients of the library can easily spell a given token, such as
146@code{cpp_spell_token} and @code{cpp_token_len}. These functions are
147useful when generating diagnostics, and for emitting the preprocessed

--- 150 unchanged lines hidden ---

298Another place where state flags are used to change behavior is whilst
299lexing header names. Normally, a @samp{<} would be lexed as a single
300token. After a @code{#include} directive, though, it should be lexed as
301a single token as far as the nearest @samp{>} character. Note that we
302don't allow the terminators of header names to be escaped; the first
303@samp{"} or @samp{>} terminates the header name.
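
The header-name rule above can be illustrated with a small sketch. This is not cpplib's code: the structure, its field and the function name are hypothetical stand-ins, and the sketch ignores multi-character operators such as @samp{<=}. It only shows how one state flag changes what a @samp{<} lexes to, and that the first @samp{>} always ends the header name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the lexer's state flags; this is an
   assumption for illustration, not cpplib's real layout.  */
struct lexer_state
{
  int angled_headers;   /* Nonzero just after an #include directive.  */
};

/* Return the length of the token that starts at SRC, where SRC[0]
   is '<'.  In header-name mode the token runs up to and including
   the first '>' on the line; no escape mechanism is recognized.
   Otherwise, or if no '>' is found before the end of the line,
   the '<' is treated as a token on its own.  */
size_t
lex_less_than (const struct lexer_state *state, const char *src)
{
  assert (state != NULL && src != NULL && src[0] == '<');

  if (state->angled_headers)
    {
      const char *end = src + 1;

      while (*end != '\0' && *end != '\n' && *end != '>')
        end++;
      if (*end == '>')
        return (size_t) (end - src) + 1;
    }
  return 1;
}
```

With the flag set, the input @samp{<stdio.h> x} gives a token of nine characters; with the flag clear the same input gives a one-character token. A backslash before @samp{>} has no effect, so the first @samp{>} still ends the name.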
304
305Interpretation of some character sequences depends upon whether we are
306lexing C, C++ or Objective-C, and on the revision of the standard in
307force. For example, @samp{::} is a single token in C++, but in C it is
308two separate @samp{:} tokens and almost certainly a syntax error. Such
306lexing C or C++, and on the revision of the standard in force. For
307example, @samp{::} is a single token in C++, but in C it is two
308separate @samp{:} tokens and almost certainly a syntax error. Such
309cases are handled by @code{_cpp_lex_direct} based upon command-line
310flags stored in the @code{cpp_options} structure.
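
As a rough illustration of the language-dependent case described above, the following sketch decides how many characters a token starting with @samp{:} consumes. The structure and function are hypothetical and greatly simplified; they are not the real @code{cpp_options} or @code{_cpp_lex_direct}, and digraphs such as @samp{:>} are ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down options structure.  The real cpp_options
   holds many more settings; this layout is an assumption.  */
struct lang_options
{
  int cplusplus;   /* Nonzero when lexing C++.  */
};

/* Return the length of the token that starts at SRC, where SRC[0]
   is ':'.  In C++ a following ':' is absorbed to form the single
   token "::"; in C each ':' is a separate token.  */
size_t
lex_colon (const struct lang_options *opts, const char *src)
{
  assert (opts != NULL && src != NULL && src[0] == ':');

  if (opts->cplusplus && src[1] == ':')
    return 2;
  return 1;
}
```

Keeping the decision behind a flag that is read at lexing time, rather than in a separate lexer for each language, fits the single shared lexer described in this section.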
311
312Once a token has been lexed, it leads an independent existence. The
313spelling of numbers, identifiers and strings is copied to permanent
314storage from the original input buffer, so a token remains valid and
315correct even if its source buffer is freed with @code{_cpp_pop_buffer}.
316The storage holding the spellings of such tokens remains until the

--- 749 unchanged lines hidden ---