cpplex.c: Update comments.

* cpplex.c: Update comments.
* README.Portability: Small update.
From-SVN: r35058
2000-07-16 13:35:23 +00:00 · 2000-07-16 13:35:23 +00:00 · f67798e710
parent bf4467813b
commit f67798e710
3 changed files with 175 additions and 5 deletions
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
 2000-07-16  Neil Booth  <NeilB@earthling.net>
 	* cpplex.c: Update comments.
 	* README.Portability: Small update.
 2000-07-16  Neil Booth  <NeilB@earthling.net>
 	* README.Portability:  Small update.
--- a/gcc/README.Portability
+++ b/gcc/README.Portability
@@ -46,6 +46,10 @@ should be written
  free ((PTR) h->value.expansion);
 Further, an initial investigation indicates that pointers to functions
 returning void are okay.  Thus the example given by "Calling functions
 through pointers to functions" below appears not to cause a problem.
 String literals
 ---------------
@@ -87,7 +91,7 @@ needs to be coded in some other way.
 signed keyword
 --------------
-The signed keyword did not exist in K+R compilers, it was introduced
+The signed keyword did not exist in K+R compilers; it was introduced
 in ISO C89, so you cannot use it.  In both K+R and standard C,
 unqualified char and bitfields may be signed or unsigned.  There is no
 way to portably declare signed chars or signed bitfields.
--- a/gcc/cpplex.c
+++ b/gcc/cpplex.c
@@ -71,7 +71,8 @@ struct cpp_context
  /* Pushed token to be returned by next call to get_raw_token.  */
  const cpp_token *pushed_token;
-  struct macro_args *args;	/* 0 for arguments and object-like macros.  */
+  struct macro_args *args;	/* The arguments for a function-like
 				   macro.  NULL otherwise.  */
  unsigned short posn;		/* Current posn, index into u.  */
  unsigned short count;		/* No. of tokens in u.  */
  unsigned short level;
@@ -762,8 +763,7 @@ cpp_ideq (token, string)
 have been pushed on the top of the stack as a CPP_BACKSLASH.  The
 newline ('\n' or '\r') handler looks at the token at the top of the
 stack to see if it is a CPP_BACKSLASH, and if so discards both.
- Otherwise it pushes the newline (CPP_VSPACE) token as normal.  Hence
+ Hence the '=' handler would never see any intervening tokens.
 the '=' handler would never see any intervening escaped newlines.
 To make trigraphs work in this context, as in precedence trigraphs
 are highest and converted before anything else, the '?' handler does
@@ -2023,7 +2023,168 @@ _cpp_spell_operator (type)
 }
-/* Macro expansion algorithm.  TODO.  */
+/* Macro expansion algorithm.
 Macro expansion is implemented by a single-pass algorithm; there are
 no rescan passes involved.  cpp_get_token expands just enough to be
 able to return a token to the caller; a consequence is that when it
 returns, the preprocessor can be in a state of mid-expansion.  The
 algorithm does not work by fully expanding a macro invocation into
 some kind of token list, and then returning them one by one.
 Our expansion state is recorded in a context stack.  We start out with
 a single context on the stack; let's call it the base context.  This
 consists of the token list returned by lex_line that forms the next
 logical line in the source file.
 The current level in the context stack is stored in the cur_context
 member of the cpp_reader structure.  The context it references keeps,
 amongst other things, a count of how many tokens form that context and
 our position within those tokens.
 Fundamentally, calling cpp_get_token will return the next token from
 the current context.  If we're at the end of the current context, that
 context is popped from the stack first, unless it is the base context,
 in which case the next logical line is lexed from the source file.
 However, before returning the token, if it is a CPP_NAME token
 _cpp_get_token checks to see if it is a macro and if it is enabled.
 Each time it encounters a macro name, it calls push_macro_context.
 This function checks that the macro should be expanded (with
 is_macro_enabled), and if so pushes a new macro context on the stack
 which becomes the current context.  It then loops back to read the
 first token of the macro context.
 A macro context basically consists of the token list representing the
 macro's replacement list, which was saved in the hash table by
 save_macro_expansion when its #define statement was parsed.  If the
 macro is function-like, it also contains the tokens that form the
 arguments to the macro.  I say more about macro arguments below, but
 for now it is enough to say that each argument is a set of pointers
 to tokens.
 When taking tokens from a macro context, we may get a CPP_MACRO_ARG
 token.  This represents an argument passed to the macro, with the
 argument number stored in the token's AUX field.  The argument should
 be substituted; this is achieved by pushing an "argument context".  An
 argument context just refers to the tokens forming the argument,
 which are obtained directly from the macro context.  The STRINGIFY
 flag on a CPP_MACRO_ARG token indicates that the argument should be
 stringified.
 Here are a few simple rules the context stack obeys:-
  1) The lex_line token list is always context zero.
  2) Context 1, if it exists, must be a macro context.
  3) An argument context can only appear above a macro context.
  4) A macro context can appear above the base context, another macro
  context, or an argument context.
  5) These imply that the minimal level of an argument context is 2.
 The only tricky thing left is ensuring that macros are enabled and
 disabled correctly.  The algorithm controls macro expansion by the
 level of the context a token is taken from in the context stack.  If a
 token is taken from a level equal to no_expand_level (a member of
 struct cpp_reader), no expansion is performed.
 When popping a context off the stack, if no_expand_level equals the
 level of the popped context, it is reduced by one to match the new
 context level, so that expansion is still disabled.  It does not
 increase if a context is pushed, though.  It starts out life as
 UINT_MAX, which has the effect that initially macro expansion is
 enabled.  I explain how this mechanism works below.
 The standard requires:-
  1) Arguments to be fully expanded before substitution.
  2) Stringified arguments to not be expanded, nor the tokens
  immediately surrounding a ## operator.
  3) Continual rescanning until there are no more macros left to
  replace.
  4) Once a macro has been expanded in stage 1) or 3), it cannot be
  expanded again during later rescans.  This prevents infinite
  recursion.
 The first thing to observe is that stage 3) is mostly redundant.
 Since a macro is disabled once it has been expanded, how can a rescan
 find an unexpanded macro name?  There are only two cases where this is
 possible:-
  a) If the macro name results from a token paste operation.
  b) If the macro in question is a function-like macro that hasn't
  already been expanded because previously there was not the required
  '(' token immediately following it.  This is only possible when an
  argument is substituted, and after substitution the last token of
  the argument can bind with a parenthesis appearing in the tokens
  following the substitution.  Note that if the '(' appears within the
  argument, the ')' must too, as expanding macro arguments cannot
  "suck in" tokens outside the argument.
 So we tackle this as follows.  When parsing the macro invocation for
 arguments, we record the tokens forming each argument as a list of
 pointers to those tokens.  We do not expand any tokens that are "raw",
 i.e. directly from the macro invocation, but other tokens that come
 from (nested) argument substitution are fully expanded.
 This is achieved by setting the no_expand_level to that of the macro
 invocation.  A CPP_MACRO_ARG token never appears in the list of tokens
 forming an argument, because parse_args (indirectly) calls
 get_raw_token which automatically pushes argument contexts and traces
 into them.  Since these contexts are at a higher level than the
 no_expand_level, they get fully macro expanded.
 "Raw" and non-raw tokens are separated in arguments by null pointers,
 with the policy that the initial state of an argument is raw.  If the
 first token is not raw, it should be preceded by a null pointer.  When
 tracing through the tokens of an argument context, each time
 get_raw_token encounters a null pointer, it toggles the flag
 CONTEXT_RAW.
 This flag, when set, indicates to is_macro_disabled that we are
 reading raw tokens which should be macro-expanded.  Similarly, if
 clear, is_macro_disabled suppresses re-expansion.
 It's probably time for an example.
 #define hash #
 #define str(x) #x
 #define xstr(y) str(y hash)
 str(hash)			// "hash"
 xstr(hash)			// "# hash"
 In the invocation of str, parse_args turns off macro expansion and so
 parses the argument as <hash>.  This is the only token (pointer)
 passed as the argument to str.  Since <hash> is raw there is no need
 for an initial null pointer.  stringify_arg is called from
 get_raw_token when tracing through the expansion of str, since the
 argument has the STRINGIFY flag set.  stringify_arg turns off
 macro expansion by setting the no_expand_level to that of the argument
 context.  Thus it gets the token <hash> and stringifies it to "hash"
 correctly.
 Similarly, xstr is passed <hash>.  However, when parse_args is parsing
 the invocation of str() in xstr's expansion, get_raw_token encounters
 a CPP_MACRO_ARG token for y.  Transparently to parse_args, it pushes
 an argument context, and enters the tokens of the argument,
 i.e. <hash>.  This is at a higher context level than parse_args
 disabled, and so is_macro_disabled permits expansion of it and a macro
 context is pushed on top of the argument context.  This contains the
 <#> token, and the end result is that <hash> is macro expanded.
 However, after popping off the argument context, the <hash> of xstr's
 expansion does not get macro expanded because we're back at the
 no_expand_level.  The end result is that the argument passed to str is
 <NULL> <#> <NULL> <hash>.  Note the nulls - policy is we start off
 raw, <#> is not raw, but then <hash> is.
 */
 /* Free the storage allocated for macro arguments.  */
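
The context-stack rules and the no_expand_level bookkeeping described in the new cpplex.c comment can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the code in cpplex.c: the names toy_reader, toy_push, toy_pop and toy_expansion_enabled are hypothetical, contexts carry no tokens, and the stack depth is fixed for simplicity. It models only what the comment states: an argument context may sit only on a macro context, expansion is off when the current level equals no_expand_level, and popping a context at that level lowers no_expand_level by one.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical model of the context stack; not the cpplex.c types.  */
enum ctx_kind { CTX_BASE, CTX_MACRO, CTX_ARG };

#define MAX_DEPTH 16

struct toy_reader
{
  enum ctx_kind kinds[MAX_DEPTH]; /* Kind of context at each level.  */
  unsigned int cur_context;       /* Level of the current context.  */
  unsigned int no_expand_level;   /* UINT_MAX: expansion enabled.  */
};

/* Start with only the base context (level 0) and expansion enabled.  */
static void
toy_init (struct toy_reader *r)
{
  r->kinds[0] = CTX_BASE;
  r->cur_context = 0;
  r->no_expand_level = UINT_MAX;
}

/* Push a context.  Returns 1 on success, 0 if the push would break
   the stacking rules: the base context is never pushed, and an
   argument context may only appear directly above a macro context.
   Pushing never changes no_expand_level.  */
static int
toy_push (struct toy_reader *r, enum ctx_kind kind)
{
  enum ctx_kind below = r->kinds[r->cur_context];

  if (r->cur_context + 1 >= MAX_DEPTH)
    return 0;
  if (kind == CTX_BASE)
    return 0;
  if (kind == CTX_ARG && below != CTX_MACRO)
    return 0;
  r->kinds[++r->cur_context] = kind;
  return 1;
}

/* Pop the current context.  Returns 0 for the base context, which is
   never popped (the real lexer reads the next logical line instead).
   If expansion was disabled at the popped level, no_expand_level
   follows the stack down so that expansion stays disabled.  */
static int
toy_pop (struct toy_reader *r)
{
  assert (r->cur_context < MAX_DEPTH);
  if (r->cur_context == 0)
    return 0;
  if (r->no_expand_level == r->cur_context)
    r->no_expand_level--;
  r->cur_context--;
  return 1;
}

/* Tokens taken from a level equal to no_expand_level are not expanded.  */
static int
toy_expansion_enabled (const struct toy_reader *r)
{
  return r->cur_context != r->no_expand_level;
}
```

To mimic the xstr(hash) walk-through in this model: after toy_init, push a CTX_MACRO context and set no_expand_level to cur_context, as the comment says parse_args does. Pushing a CTX_ARG context then moves to a higher level where toy_expansion_enabled returns nonzero, and popping it returns to the disabled level.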