cpplex.c: Update comments.
* cpplex.c: Update comments.
* README.Portability: Small update.

From-SVN: r35058
parent
bf4467813b
commit
f67798e710
|
@ -1,3 +1,8 @@
|
||||||
|
2000-07-16 Neil Booth <NeilB@earthling.net>
|
||||||
|
|
||||||
|
* cpplex.c: Update comments.
|
||||||
|
* README.Portability: Small update.
|
||||||
|
|
||||||
2000-07-16 Neil Booth <NeilB@earthling.net>
|
2000-07-16 Neil Booth <NeilB@earthling.net>
|
||||||
|
|
||||||
* README.Portability: Small update.
|
* README.Portability: Small update.
|
||||||
|
|
|
@ -46,6 +46,10 @@ should be written
|
||||||
|
|
||||||
free ((PTR) h->value.expansion);
|
free ((PTR) h->value.expansion);
|
||||||
|
|
||||||
|
Further, an initial investigation indicates that pointers to functions
|
||||||
|
returning void are okay. Thus the example given by "Calling functions
|
||||||
|
through pointers to functions" below appears not to cause a problem.
|
||||||
|
|
||||||
|
|
||||||
String literals
|
String literals
|
||||||
---------------
|
---------------
|
||||||
|
@ -87,7 +91,7 @@ needs to be coded in some other way.
|
||||||
signed keyword
|
signed keyword
|
||||||
--------------
|
--------------
|
||||||
|
|
||||||
The signed keyword did not exist in K+R compilers, it was introduced
|
The signed keyword did not exist in K+R compilers; it was introduced
|
||||||
in ISO C89, so you cannot use it. In both K+R and standard C,
|
in ISO C89, so you cannot use it. In both K+R and standard C,
|
||||||
unqualified char and bitfields may be signed or unsigned. There is no
|
unqualified char and bitfields may be signed or unsigned. There is no
|
||||||
way to portably declare signed chars or signed bitfields.
|
way to portably declare signed chars or signed bitfields.
|
||||||
|
|
169
gcc/cpplex.c
|
@ -71,7 +71,8 @@ struct cpp_context
|
||||||
/* Pushed token to be returned by next call to get_raw_token. */
|
/* Pushed token to be returned by next call to get_raw_token. */
|
||||||
const cpp_token *pushed_token;
|
const cpp_token *pushed_token;
|
||||||
|
|
||||||
struct macro_args *args; /* 0 for arguments and object-like macros. */
|
struct macro_args *args; /* The arguments for a function-like
|
||||||
|
macro. NULL otherwise. */
|
||||||
unsigned short posn; /* Current posn, index into u. */
|
unsigned short posn; /* Current posn, index into u. */
|
||||||
unsigned short count; /* No. of tokens in u. */
|
unsigned short count; /* No. of tokens in u. */
|
||||||
unsigned short level;
|
unsigned short level;
|
||||||
|
@ -762,8 +763,7 @@ cpp_ideq (token, string)
|
||||||
have been pushed on the top of the stack as a CPP_BACKSLASH. The
|
have been pushed on the top of the stack as a CPP_BACKSLASH. The
|
||||||
newline ('\n' or '\r') handler looks at the token at the top of the
|
newline ('\n' or '\r') handler looks at the token at the top of the
|
||||||
stack to see if it is a CPP_BACKSLASH, and if so discards both.
|
stack to see if it is a CPP_BACKSLASH, and if so discards both.
|
||||||
Otherwise it pushes the newline (CPP_VSPACE) token as normal. Hence
|
Hence the '=' handler would never see any intervening tokens.
|
||||||
the '=' handler would never see any intervening escaped newlines.
|
|
||||||
|
|
||||||
To make trigraphs work in this context, as in precedence trigraphs
|
To make trigraphs work in this context, as in precedence trigraphs
|
||||||
are highest and converted before anything else, the '?' handler does
|
are highest and converted before anything else, the '?' handler does
|
||||||
|
@ -2023,7 +2023,168 @@ _cpp_spell_operator (type)
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
/* Macro expansion algorithm. TODO. */
|
/* Macro expansion algorithm.
|
||||||
|
|
||||||
|
Macro expansion is implemented by a single-pass algorithm; there are
|
||||||
|
no rescan passes involved. cpp_get_token expands just enough to be
|
||||||
|
able to return a token to the caller; a consequence is that when it
|
||||||
|
returns the preprocessor can be in a state of mid-expansion. The
|
||||||
|
algorithm does not work by fully expanding a macro invocation into
|
||||||
|
some kind of token list, and then returning them one by one.
|
||||||
|
|
||||||
|
Our expansion state is recorded in a context stack. We start out with
|
||||||
|
a single context on the stack, let's call it the base context. This
|
||||||
|
consists of the token list returned by lex_line that forms the next
|
||||||
|
logical line in the source file.
|
||||||
|
|
||||||
|
The current level in the context stack is stored in the cur_context
|
||||||
|
member of the cpp_reader structure. The context it references keeps,
|
||||||
|
amongst other things, a count of how many tokens form that context and
|
||||||
|
our position within those tokens.
|
||||||
|
|
||||||
|
Fundamentally, calling cpp_get_token will return the next token from
|
||||||
|
the current context. If we're at the end of the current context, that
|
||||||
|
context is popped from the stack first, unless it is the base context,
|
||||||
|
in which case the next logical line is lexed from the source file.
|
||||||
|
|
||||||
|
However, before returning the token, if it is a CPP_NAME token
|
||||||
|
_cpp_get_token checks to see if it is a macro and if it is enabled.
|
||||||
|
Each time it encounters a macro name, it calls push_macro_context.
|
||||||
|
This function checks that the macro should be expanded (with
|
||||||
|
is_macro_enabled), and if so pushes a new macro context on the stack
|
||||||
|
which becomes the current context. It then loops back to read the
|
||||||
|
first token of the macro context.
|
||||||
|
|
||||||
|
A macro context basically consists of the token list representing the
|
||||||
|
macro's replacement list, which was saved in the hash table by
|
||||||
|
save_macro_expansion when its #define statement was parsed. If the
|
||||||
|
macro is function-like, it also contains the tokens that form the
|
||||||
|
arguments to the macro. I say more about macro arguments below, but
|
||||||
|
for now just saying that each argument is a set of pointers to tokens
|
||||||
|
is enough.
|
||||||
|
|
||||||
|
When taking tokens from a macro context, we may get a CPP_MACRO_ARG
|
||||||
|
token. This represents an argument passed to the macro, with the
|
||||||
|
argument number stored in the token's AUX field. The argument should
|
||||||
|
be substituted; this is achieved by pushing an "argument context". An
|
||||||
|
argument context just refers to the tokens forming the argument,
|
||||||
|
which are obtained directly from the macro context. The STRINGIFY
|
||||||
|
flag on a CPP_MACRO_ARG token indicates that the argument should be
|
||||||
|
stringified.
|
||||||
|
|
||||||
|
Here are a few simple rules the context stack obeys:-
|
||||||
|
|
||||||
|
1) The lex_line token list is always context zero.
|
||||||
|
|
||||||
|
2) Context 1, if it exists, must be a macro context.
|
||||||
|
|
||||||
|
3) An argument context can only appear above a macro context.
|
||||||
|
|
||||||
|
4) A macro context can appear above the base context, another macro
|
||||||
|
context, or an argument context.
|
||||||
|
|
||||||
|
5) These imply that the minimal level of an argument context is 2.
|
||||||
|
|
||||||
|
The only tricky thing left is ensuring that macros are enabled and
|
||||||
|
disabled correctly. The algorithm controls macro expansion by the
|
||||||
|
level of the context a token is taken from in the context stack. If a
|
||||||
|
token is taken from a level equal to no_expand_level (a member of
|
||||||
|
struct cpp_reader), no expansion is performed.
|
||||||
|
|
||||||
|
When popping a context off the stack, if no_expand_level equals the
|
||||||
|
level of the popped context, it is reduced by one to match the new
|
||||||
|
context level, so that expansion is still disabled. It does not
|
||||||
|
increase if a context is pushed, though. It starts out life as
|
||||||
|
UINT_MAX, which has the effect that initially macro expansion is
|
||||||
|
enabled. I explain how this mechanism works below.
|
||||||
|
|
||||||
|
The standard requires:-
|
||||||
|
|
||||||
|
1) Arguments to be fully expanded before substitution.
|
||||||
|
|
||||||
|
2) Stringified arguments to not be expanded, nor the tokens
|
||||||
|
immediately surrounding a ## operator.
|
||||||
|
|
||||||
|
3) Continual rescanning until there are no more macros left to
|
||||||
|
replace.
|
||||||
|
|
||||||
|
4) Once a macro has been expanded in stage 1) or 3), it cannot be
|
||||||
|
expanded again during later rescans. This prevents infinite
|
||||||
|
recursion.
|
||||||
|
|
||||||
|
The first thing to observe is that stage 3) is mostly redundant.
|
||||||
|
Since a macro is disabled once it has been expanded, how can a rescan
|
||||||
|
find an unexpanded macro name? There are only two cases where this is
|
||||||
|
possible:-
|
||||||
|
|
||||||
|
a) If the macro name results from a token paste operation.
|
||||||
|
|
||||||
|
b) If the macro in question is a function-like macro that hasn't
|
||||||
|
already been expanded because previously there was not the required
|
||||||
|
'(' token immediately following it. This is only possible when an
|
||||||
|
argument is substituted, and after substitution the last token of
|
||||||
|
the argument can bind with a parenthesis appearing in the tokens
|
||||||
|
following the substitution. Note that if the '(' appears within the
|
||||||
|
argument, the ')' must too, as expanding macro arguments cannot
|
||||||
|
"suck in" tokens outside the argument.
|
||||||
|
|
||||||
|
So we tackle this as follows. When parsing the macro invocation for
|
||||||
|
arguments, we record the tokens forming each argument as a list of
|
||||||
|
pointers to those tokens. We do not expand any tokens that are "raw",
|
||||||
|
i.e. directly from the macro invocation, but other tokens that come
|
||||||
|
from (nested) argument substitution are fully expanded.
|
||||||
|
|
||||||
|
This is achieved by setting the no_expand_level to that of the macro
|
||||||
|
invocation. A CPP_MACRO_ARG token never appears in the list of tokens
|
||||||
|
forming an argument, because parse_args (indirectly) calls
|
||||||
|
get_raw_token which automatically pushes argument contexts and traces
|
||||||
|
into them. Since these contexts are at a higher level than the
|
||||||
|
no_expand_level, they get fully macro expanded.
|
||||||
|
|
||||||
|
"Raw" and non-raw tokens are separated in arguments by null pointers,
|
||||||
|
with the policy that the initial state of an argument is raw. If the
|
||||||
|
first token is not raw, it should be preceded by a null pointer. When
|
||||||
|
tracing through the tokens of an argument context, each time
|
||||||
|
get_raw_token encounters a null pointer, it toggles the flag
|
||||||
|
CONTEXT_RAW.
|
||||||
|
|
||||||
|
This flag, when set, indicates to is_macro_disabled that we are
|
||||||
|
reading raw tokens which should be macro-expanded. Similarly, if
|
||||||
|
clear, is_macro_disabled suppresses re-expansion.
|
||||||
|
|
||||||
|
It's probably time for an example.
|
||||||
|
|
||||||
|
#define hash #
|
||||||
|
#define str(x) #x
|
||||||
|
#define xstr(y) str(y hash)
|
||||||
|
str(hash) // "hash"
|
||||||
|
xstr(hash) // "# hash"
|
||||||
|
|
||||||
|
In the invocation of str, parse_args turns off macro expansion and so
|
||||||
|
parses the argument as <hash>. This is the only token (pointer)
|
||||||
|
passed as the argument to str. Since <hash> is raw there is no need
|
||||||
|
for an initial null pointer. stringify_arg is called from
|
||||||
|
get_raw_token when tracing through the expansion of str, since the
|
||||||
|
argument has the STRINGIFY flag set. stringify_arg turns off
|
||||||
|
macro expansion by setting the no_expand_level to that of the argument
|
||||||
|
context. Thus it gets the token <hash> and stringifies it to "hash"
|
||||||
|
correctly.
|
||||||
|
|
||||||
|
Similarly xstr is passed <hash>. However, when parse_args is parsing
|
||||||
|
the invocation of str() in xstr's expansion, get_raw_token encounters
|
||||||
|
a CPP_MACRO_ARG token for y. Transparently to parse_args, it pushes
|
||||||
|
an argument context, and enters the tokens of the argument,
|
||||||
|
i.e. <hash>. This is at a higher context level than parse_args
|
||||||
|
disabled, and so is_macro_disabled permits expansion of it and a macro
|
||||||
|
context is pushed on top of the argument context. This contains the
|
||||||
|
<#> token, and the end result is that <hash> is macro expanded.
|
||||||
|
However, after popping off the argument context, the <hash> of xstr's
|
||||||
|
expansion does not get macro expanded because we're back at the
|
||||||
|
no_expand_level. The end result is that the argument passed to str is
|
||||||
|
<NULL> <#> <NULL> <hash>. Note the nulls: the policy is that we start off
|
||||||
|
raw, <#> is not raw, but then <hash> is.
|
||||||
|
|
||||||
|
*/
|
||||||
|
|
||||||
|
|
||||||
/* Free the storage allocated for macro arguments. */
|
/* Free the storage allocated for macro arguments. */
|
||||||
|
|