cppinternals.texi (169690) | cppinternals.texi (220755) |
---|---|
1\input texinfo 2@setfilename cppinternals.info 3@settitle The GNU C Preprocessor Internals 4 5@include gcc-common.texi 6 7@ifinfo 8@dircategory Software development --- 58 unchanged lines hidden (view full) --- 67@page 68 69@node Top 70@top 71@chapter Cpplib---the GNU C Preprocessor 72 73The GNU C preprocessor is 74implemented as a library, @dfn{cpplib}, so it can be easily shared between | 1\input texinfo 2@setfilename cppinternals.info 3@settitle The GNU C Preprocessor Internals 4 5@include gcc-common.texi 6 7@ifinfo 8@dircategory Software development --- 58 unchanged lines hidden (view full) --- 67@page 68 69@node Top 70@top 71@chapter Cpplib---the GNU C Preprocessor 72 73The GNU C preprocessor is 74implemented as a library, @dfn{cpplib}, so it can be easily shared between |
75a stand-alone preprocessor, and a preprocessor integrated with the C, 76C++ and Objective-C front ends. It is also available for use by other 77programs, though this is not recommended as its exposed interface has 78not yet reached a point of reasonable stability. | 75a stand-alone preprocessor, and a preprocessor integrated with the C 76and C++ front ends. It is also available for use by other programs, 77though this is not recommended as its exposed interface has not yet 78reached a point of reasonable stability. |
79 80The library has been written to be re-entrant, so that it can be used 81to preprocess many files simultaneously if necessary. It has also been 82written with the preprocessing token as the fundamental unit; the 83preprocessor in previous versions of GCC would operate on text strings 84as the fundamental unit. 85 86This brief manual documents the internals of cpplib, and explains some 87of the tricky issues. It is intended that, along with the comments in 88the source code, a reasonably competent C programmer should be able to 89figure out what the code is doing, and why things have been implemented 90the way they have. 91 92@menu 93* Conventions:: Conventions used in the code. | 79 80The library has been written to be re-entrant, so that it can be used 81to preprocess many files simultaneously if necessary. It has also been 82written with the preprocessing token as the fundamental unit; the 83preprocessor in previous versions of GCC would operate on text strings 84as the fundamental unit. 85 86This brief manual documents the internals of cpplib, and explains some 87of the tricky issues. It is intended that, along with the comments in 88the source code, a reasonably competent C programmer should be able to 89figure out what the code is doing, and why things have been implemented 90the way they have. 91 92@menu 93* Conventions:: Conventions used in the code. |
94* Lexer:: The combined C, C++ and Objective-C Lexer. | 94* Lexer:: The combined C and C++ Lexer. |
95* Hash Nodes:: All identifiers are entered into a hash table. 96* Macro Expansion:: Macro expansion algorithm. 97* Token Spacing:: Spacing and paste avoidance issues. 98* Line Numbering:: Tracking location within files. 99* Guard Macros:: Optimizing header files with guard macros. 100* Files:: File handling. 101* Concept Index:: Index. 102@end menu --- 23 unchanged lines hidden (view full) --- 126@node Lexer 127@unnumbered The Lexer 128@cindex lexer 129@cindex newlines 130@cindex escaped newlines 131 132@section Overview 133The lexer is contained in the file @file{lex.c}. It is a hand-coded | 95* Hash Nodes:: All identifiers are entered into a hash table. 96* Macro Expansion:: Macro expansion algorithm. 97* Token Spacing:: Spacing and paste avoidance issues. 98* Line Numbering:: Tracking location within files. 99* Guard Macros:: Optimizing header files with guard macros. 100* Files:: File handling. 101* Concept Index:: Index. 102@end menu --- 23 unchanged lines hidden (view full) --- 126@node Lexer 127@unnumbered The Lexer 128@cindex lexer 129@cindex newlines 130@cindex escaped newlines 131 132@section Overview 133The lexer is contained in the file @file{lex.c}. It is a hand-coded |
134lexer, and not implemented as a state machine. It can understand C, C++ 135and Objective-C source code, and has been extended to allow reasonably 136successful preprocessing of assembly language. The lexer does not make 137an initial pass to strip out trigraphs and escaped newlines, but handles 138them as they are encountered in a single pass of the input file. It 139returns preprocessing tokens individually, not a line at a time. | 134lexer, and not implemented as a state machine. It can understand C and 135C++ source code, and has been extended to allow reasonably successful 136preprocessing of assembly language. The lexer does not make an initial 137pass to strip out trigraphs and escaped newlines, but handles them as 138they are encountered in a single pass of the input file. It returns 139preprocessing tokens individually, not a line at a time. |
140 141It is mostly transparent to users of the library, since the library's 142interface for obtaining the next token, @code{cpp_get_token}, takes care 143of lexing new tokens, handling directives, and expanding macros as 144necessary. However, the lexer does expose some functionality so that 145clients of the library can easily spell a given token, such as 146@code{cpp_spell_token} and @code{cpp_token_len}. These functions are 147useful when generating diagnostics, and for emitting the preprocessed --- 150 unchanged lines hidden (view full) --- 298Another place where state flags are used to change behavior is whilst 299lexing header names. Normally, a @samp{<} would be lexed as a single 300token. After a @code{#include} directive, though, it should be lexed as 301a single token as far as the nearest @samp{>} character. Note that we 302don't allow the terminators of header names to be escaped; the first 303@samp{"} or @samp{>} terminates the header name. 304 305Interpretation of some character sequences depends upon whether we are | 140 141It is mostly transparent to users of the library, since the library's 142interface for obtaining the next token, @code{cpp_get_token}, takes care 143of lexing new tokens, handling directives, and expanding macros as 144necessary. However, the lexer does expose some functionality so that 145clients of the library can easily spell a given token, such as 146@code{cpp_spell_token} and @code{cpp_token_len}. These functions are 147useful when generating diagnostics, and for emitting the preprocessed --- 150 unchanged lines hidden (view full) --- 298Another place where state flags are used to change behavior is whilst 299lexing header names. Normally, a @samp{<} would be lexed as a single 300token. After a @code{#include} directive, though, it should be lexed as 301a single token as far as the nearest @samp{>} character. Note that we 302don't allow the terminators of header names to be escaped; the first 303@samp{"} or @samp{>} terminates the header name. 304 305Interpretation of some character sequences depends upon whether we are |
306lexing C, C++ or Objective-C, and on the revision of the standard in 307force. For example, @samp{::} is a single token in C++, but in C it is 308two separate @samp{:} tokens and almost certainly a syntax error. Such | 306lexing C or C++, and on the revision of the standard in force. For 307example, @samp{::} is a single token in C++, but in C it is two 308separate @samp{:} tokens and almost certainly a syntax error. Such |
309cases are handled by @code{_cpp_lex_direct} based upon command-line 310flags stored in the @code{cpp_options} structure. 311 312Once a token has been lexed, it leads an independent existence. The 313spelling of numbers, identifiers and strings is copied to permanent 314storage from the original input buffer, so a token remains valid and 315correct even if its source buffer is freed with @code{_cpp_pop_buffer}. 316The storage holding the spellings of such tokens remains until the --- 749 unchanged lines hidden --- | 309cases are handled by @code{_cpp_lex_direct} based upon command-line 310flags stored in the @code{cpp_options} structure. 311 312Once a token has been lexed, it leads an independent existence. The 313spelling of numbers, identifiers and strings is copied to permanent 314storage from the original input buffer, so a token remains valid and 315correct even if its source buffer is freed with @code{_cpp_pop_buffer}. 316The storage holding the spellings of such tokens remains until the --- 749 unchanged lines hidden --- |
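
The text in both revisions notes that @samp{::} is a single token in C++ but two separate @samp{:} tokens in C, with the choice driven by flags held in @code{cpp_options}. The following is a minimal C sketch of that idea only. The names @code{lex_options}, @code{lex_one} and @code{count_tokens} are hypothetical and chosen for illustration; they are not part of cpplib, and the real @code{_cpp_lex_direct} is considerably more involved.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the language flag kept in cpp_options. */
struct lex_options {
    bool cplusplus;   /* true when lexing C++ */
};

enum tok_type { TOK_COLON, TOK_SCOPE, TOK_OTHER, TOK_EOF };

/* Lex one token starting at *p and advance *p past it.
   Only ':' and '::' are distinguished; every other character is
   treated as a one-character token to keep the sketch short.  */
static enum tok_type lex_one(const char **p, const struct lex_options *opts)
{
    const char *s = *p;

    if (*s == '\0')
        return TOK_EOF;

    if (*s == ':') {
        /* In C++ mode, two adjacent colons form a single token.  */
        if (opts->cplusplus && s[1] == ':') {
            *p = s + 2;
            return TOK_SCOPE;
        }
        /* In C mode, each colon is its own token.  */
        *p = s + 1;
        return TOK_COLON;
    }

    *p = s + 1;
    return TOK_OTHER;
}

/* Count the tokens in a string under the given options.  */
static int count_tokens(const char *s, const struct lex_options *opts)
{
    int n = 0;
    while (lex_one(&s, opts) != TOK_EOF)
        n++;
    return n;
}
```

With @code{cplusplus} set, @code{count_tokens("::", &opts)} sees one token; with it clear, the same input yields two, which matches the behavior the text describes.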