This patch implements C++23 P2334R1, which is easy because Joseph has done
all the hard work for C2X already.
Unlike the C N2645 paper, the C++ P2334R1 contains one important addition
(but not in the normative text):
"While this is a new preprocessor feature and cannot be treated as a defect
report, implementations that support older versions of the standard are
encouraged to implement this feature in the older language modes as well
as C++23."
so there are several ways to implement it.
One is ignoring that sentence and only implementing it
for -std=c++23/-std=gnu++23 like it is only implemented for -std=c2x.
Another option would be to implement it also in the older GNU modes but
not in the C/CXX modes (but it would be strange if we did that just for
C++ and not for C).
Yet another option is to enable it unconditionally.
And yet another option would be to enable it unconditionally but emit
a warning (or pedwarn) when it is seen.
Note, when it is enabled for the older language modes, as Joseph wrote
in the c11-elifdef-1.c testcase, it can result e.g. in rejecting previously
valid code:
#define A
#undef B
#if 0
#elifdef A
#error "#elifdef A applied"
#endif
#if 0
#elifndef B
#error "#elifndef B applied"
#endif
Note, clang seems to have chosen to enable it unconditionally in all
standard versions of both C and C++, with no warnings whatsoever, so it
essentially treated it as a DR that changed the behavior of e.g. the above code.
After feedback, this patch enables #elifdef/#elifndef for -std=c2x
and -std=c++2{b,3} and also for -std=gnu*. For GNU modes older than
C2X or C++23 it emits a pedwarn under -pedantic on directives that
would be rejected in the corresponding -std=c* modes, e.g.
#if 1
#elifdef A // pedwarn if -pedantic
#endif
or on directives that those modes would silently accept, but whose
recognition changes behavior, e.g.
#define A
#if 0
#elifdef A // pedwarn if -pedantic
#define M 1
#endif
It won't pedwarn if the directives would be silently ignored and wouldn't
change anything, like:
#define A
#if 0
#elifndef A
#define M 1
#endif
or
#undef B
#if 0
#elifdef B
#define M 1
#endif
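To summarize the cases above, here is a minimal sketch in C of when the
pedwarn fires. It is not the libcpp code: struct elif_case and
model_pedwarn are made-up names, and the model only covers a two-group
chain (#if followed by a single #elifdef/#elifndef), as in the examples.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one #elifdef/#elifndef directive that
   directly follows an #if group.  None of these names exist in libcpp.  */
struct elif_case
{
  bool feature_enabled;   /* -std=c2x, -std=c++23 or the gnu equivalents.  */
  bool pedantic;          /* -pedantic was given.  */
  bool prev_group_taken;  /* The preceding #if group was processed.  */
  bool condition_true;    /* The #elifdef/#elifndef test itself holds.  */
};

/* Return true if the rules described above call for a pedwarn in an
   older GNU mode.  */
static bool
model_pedwarn (const struct elif_case *c)
{
  assert (c != NULL);
  if (c->feature_enabled || !c->pedantic)
    return false;
  /* Case 1: the directive sits in live text, where -std=c* would reject it.
     Case 2: it sits in skipped text but would start a new live group.  */
  return c->prev_group_taken || c->condition_true;
}
```

With these assumptions, "#if 1 / #elifdef A" and "#if 0 / #elifdef A" with
A defined both warn, while "#if 0 / #elifndef A" with A defined does not.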
2021-10-06 Jakub Jelinek <jakub@redhat.com>
libcpp/
* init.c (lang_defaults): Implement P2334R1, enable elifdef for
-std=c++23 and -std=gnu++23.
* directives.c (_cpp_handle_directive): Support elifdef/elifndef if
either CPP_OPTION (pfile, elifdef) or !CPP_OPTION (pfile, std).
(do_elif): For older non-std modes if pedantic pedwarn about
#elifdef/#elifndef directives that change behavior.
gcc/testsuite/
* gcc.dg/cpp/gnu11-elifdef-1.c: New test.
* gcc.dg/cpp/gnu11-elifdef-2.c: New test.
* gcc.dg/cpp/gnu11-elifdef-3.c: New test.
* gcc.dg/cpp/gnu11-elifdef-4.c: New test.
* g++.dg/cpp/elifdef-1.C: New test.
* g++.dg/cpp/elifdef-2.C: New test.
* g++.dg/cpp/elifdef-3.C: New test.
* g++.dg/cpp/elifdef-4.C: New test.
* g++.dg/cpp/elifdef-5.C: New test.
* g++.dg/cpp/elifdef-6.C: New test.
* g++.dg/cpp/elifdef-7.C: New test.
The following patch implements the
P1949R7 - C++ Identifier Syntax using Unicode Standard Annex 31
paper. We already allow UTF-8 characters in the source, so that part
is already implemented, so IMHO all we need to do is pedwarn instead of
just warn for the (default) -Wnormalize=nfc (or for -Wnormalize={id,nfkc})
if the character is not in NFC, and use the Unicode XID_Start and
XID_Continue derived core properties to find out what characters are allowed
(the standard actually adds U+005F to XID_Start, but we are handling the
ASCII compatible characters differently already and they aren't allowed
in UCNs in identifiers). Instead of hardcoding the large tables
in ucnid.tab, this patch makes makeucnid.c read them from the Unicode
tables (13.0.0 version at this point).
For non-pedantic mode, we accept as 2nd+ char in identifiers a union
of valid characters in all supported modes, but for the 1st char it
was actually pedantically requiring that it is not any of the characters
that may not appear in the currently chosen standard as the first character.
This patch changes it such that also what is allowed at the start of an
identifier is a union of characters valid at the start of an identifier
in any of the pedantic modes.
2021-09-01 Jakub Jelinek <jakub@redhat.com>
PR c++/100977
libcpp/
* include/cpplib.h (struct cpp_options): Add cxx23_identifiers.
* charset.c (CXX23, NXX23): New enumerators.
(CID, NFC, NKC, CTX): Renumber.
(ucn_valid_in_identifier): Implement P1949R7 - use CXX23 and
NXX23 flags for cxx23_identifiers. For start character in
non-pedantic mode, allow characters that are allowed as start
characters in any of the supported language modes, rather than
disallowing characters allowed only as non-start characters in
current mode but for characters from other language modes allowing
them even if they are never allowed at start.
* init.c (struct lang_flags): Add cxx23_identifiers.
(lang_defaults): Add cxx23_identifiers column.
(cpp_set_lang): Initialize CPP_OPTION (pfile, cxx23_identifiers).
* lex.c (warn_about_normalization): If cxx23_identifiers, use
cpp_pedwarning_with_line instead of cpp_warning_with_line for
"is not in NFC" diagnostics.
* makeucnid.c: Adjust usage comment.
(CXX23, NXX23): New enumerators.
(all_languages): Add CXX23.
(not_NFC, not_NFKC, maybe_not_NFC): Renumber.
(read_derivedcore): New function.
(write_table): Print also CXX23 and NXX23 columns.
(main): Require 5 arguments instead of 4, call read_derivedcore.
* ucnid.h: Regenerated using Unicode 13.0.0 files.
gcc/testsuite/
* g++.dg/cpp23/normalize1.C: New test.
* g++.dg/cpp23/normalize2.C: New test.
* g++.dg/cpp23/normalize3.C: New test.
* g++.dg/cpp23/normalize4.C: New test.
* g++.dg/cpp23/normalize5.C: New test.
* g++.dg/cpp23/normalize6.C: New test.
* g++.dg/cpp23/normalize7.C: New test.
* g++.dg/cpp23/ucnid-1-utf8.C: New test.
* g++.dg/cpp23/ucnid-2-utf8.C: New test.
* gcc.dg/cpp/ucnid-4.c: Don't expect
"not valid at the start of an identifier" errors.
* gcc.dg/cpp/ucnid-4-utf8.c: Likewise.
* gcc.dg/cpp/ucnid-5-utf8.c: New test.
> We want to remove the latter <placemarker> but not the former one, and
> the patch adds the vaopt_padding_tokens counter for it to control
> how many placemarkers are removed on vaopt_state::END.
> As can be seen in #c1 and #c2 of the PR, I've tried various approaches,
> but none worked for all the cases except the posted one.
I notice that the second placemarker you mention is avoid_paste, which seems
relevant. This seems to also work, at least it doesn't seem to break any of
the va_opt tests.
2021-09-01 Jason Merrill <jason@redhat.com>
* macro.c (replace_args): When __VA_OPT__ is on the LHS of ##,
remove trailing avoid_paste tokens.
So, besides the missing #__VA_OPT__ support, for which I posted a patch
last week, P1042R1 introduced some placemarker changes for __VA_OPT__, most
notably the addition of "removal of placemarker tokens," before
"rescanning ..." and the
#define H4(X, ...) __VA_OPT__(a X ## X) ## b
H4(, 1) // replaced by a b
example mentioned there, which we currently replace with ab.
The following patch contains the minimum changes (except for the
__builtin_expect) that achieve the same preprocessing between current
clang++ and patched gcc on all the testcases I've tried (i.e. gcc __VA_OPT__
testsuite in c-c++-common/cpp/va-opt* including the new test and the clang
clang/test/Preprocessor/macro_va_opt* testcases).
At one point I was trying to implement the __VA_OPT__(args) case as if
for non-empty __VA_ARGS__ it expanded as if __VA_OPT__( and ) were missing,
but from the tests it seems that is not how it should work, in particular
if after (or before) we have some macro argument and it is not followed
(or preceded) by ##, then it should be macro expanded even when __VA_OPT__
is after ## or ) is followed by ##. And it seems that not removing any
padding tokens isn't possible either, because the expansion of the arguments
typically has a padding token at the start and end and those at least
according to the testsuite need to go. It is unclear if it would be enough
to remove just one or if all padding tokens should be removed.
Anyway, e.g. the previous removal of all padding tokens at the end of
__VA_OPT__ is undesirable, as it e.g. eats also the padding tokens needed
for the H4 example from the paper.
2021-09-01 Jakub Jelinek <jakub@redhat.com>
PR preprocessor/101488
* macro.c (replace_args): Fix up handling of CPP_PADDING tokens at the
start or end of __VA_OPT__ arguments when preceded or followed by ##.
* c-c++-common/cpp/va-opt-3.c: Adjust expected output.
* c-c++-common/cpp/va-opt-7.c: New test.
Adds the logic to handle -finput-charset in layout_get_source_line(), so that
source lines are converted from their input encodings prior to being output by
diagnostics machinery. Also adds the ability to strip a UTF-8 BOM similarly.
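For reference, a UTF-8 BOM is the three-byte sequence EF BB BF at the very
start of the file. The following is only a sketch of such a check, assuming
a hypothetical helper utf8_bom_length; it is not the cpp_check_utf8_bom
interface added by this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Return the number of leading bytes to skip: 3 if BUF starts with the
   UTF-8 byte order mark EF BB BF, otherwise 0.  Hypothetical helper.  */
static size_t
utf8_bom_length (const unsigned char *buf, size_t len)
{
  assert (buf != NULL || len == 0);
  if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
    return 3;
  return 0;
}
```

A caller would advance its read pointer by the returned length before
handing the line to the diagnostics code.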
gcc/c-family/ChangeLog:
PR other/93067
* c-opts.c (c_common_input_charset_cb): New function.
(c_common_post_options): Call new function
diagnostic_initialize_input_context().
gcc/d/ChangeLog:
PR other/93067
* d-lang.cc (d_input_charset_callback): New function.
(d_init): Call new function
diagnostic_initialize_input_context().
gcc/fortran/ChangeLog:
PR other/93067
* cpp.c (gfc_cpp_post_options): Call new function
diagnostic_initialize_input_context().
gcc/ChangeLog:
PR other/93067
* coretypes.h (typedef diagnostic_input_charset_callback): Declare.
* diagnostic.c (diagnostic_initialize_input_context): New function.
* diagnostic.h (diagnostic_initialize_input_context): Declare.
* input.c (default_charset_callback): New function.
(file_cache::initialize_input_context): New function.
(file_cache_slot::create): Added ability to convert the input
according to the input context.
(file_cache::file_cache): Initialize the new input context.
(class file_cache_slot): Added new m_alloc_offset member.
(file_cache_slot::file_cache_slot): Initialize the new member.
(file_cache_slot::~file_cache_slot): Handle potentially offset buffer.
(file_cache_slot::maybe_grow): Likewise.
(file_cache_slot::needs_read_p): Handle NULL fp, which is now possible.
(file_cache_slot::get_next_line): Likewise.
* input.h (class file_cache): Added input context member.
libcpp/ChangeLog:
PR other/93067
* charset.c (init_iconv_desc): Adapt to permit PFILE argument to
be NULL.
(_cpp_convert_input): Likewise. Also move UTF-8 BOM logic to...
(cpp_check_utf8_bom): ...here. New function.
(cpp_input_conversion_is_trivial): New function.
* files.c (read_file_guts): Allow PFILE argument to be NULL. Add
INPUT_CHARSET argument as an alternate source of this information.
(read_file): Pass the new argument to read_file_guts.
(cpp_get_converted_source): New function.
* include/cpplib.h (struct cpp_converted_source): Declare.
(cpp_get_converted_source): Declare.
(cpp_input_conversion_is_trivial): Declare.
(cpp_check_utf8_bom): Declare.
gcc/testsuite/ChangeLog:
PR other/93067
* gcc.dg/diagnostic-input-charset-1.c: New test.
* gcc.dg/diagnostic-input-utf8-bom.c: New test.
The following patch implements C++20 # __VA_OPT__ (...) support.
Testcases cover what I came up with myself and what LLVM has for #__VA_OPT__
in its testsuite and the string literals are identical between the two
compilers on the va-opt-5.c testcase.
2021-08-17 Jakub Jelinek <jakub@redhat.com>
libcpp/
* macro.c (vaopt_state): Add m_stringify member.
(vaopt_state::vaopt_state): Initialize it.
(vaopt_state::update): Overwrite it.
(vaopt_state::stringify): New method.
(stringify_arg): Replace arg argument with first, count arguments
and add va_opt argument. Use first instead of arg->first and
count instead of arg->count, for va_opt add paste_tokens handling.
(paste_tokens): Fix up len calculation. Don't spell rhs twice,
instead use %.*s to supply lhs and rhs spelling lengths. Don't call
_cpp_backup_tokens here.
(paste_all_tokens): Call it here instead.
(replace_args): Adjust stringify_arg caller. For vaopt_state::END
if stringify is true handle __VA_OPT__ stringification.
(create_iso_definition): Handle # __VA_OPT__ similarly to # macro_arg.
gcc/testsuite/
* c-c++-common/cpp/va-opt-5.c: New test.
* c-c++-common/cpp/va-opt-6.c: New test.
The following testcase ICEs in cpp_sys_macro_p, because cpp_sys_macro_p
is called for a builtin macro which doesn't use node->value.macro union
member but a different one and so dereferencing it ICEs.
As the testcase is distilled from contemporary glibc headers, it means
basically -Wtraditional now ICEs on almost everything.
The fix can be either the patch below, which returns true for builtin
macros, or we could instead return false for builtin macros, or the fix
could also be (untested):
--- libcpp/expr.c 2021-05-07 10:34:46.345122608 +0200
+++ libcpp/expr.c 2021-08-12 09:54:01.837556365 +0200
@@ -783,13 +783,13 @@ cpp_classify_number (cpp_reader *pfile,
/* Traditional C only accepted the 'L' suffix.
Suppress warning about 'LL' with -Wno-long-long. */
- if (CPP_WTRADITIONAL (pfile) && ! cpp_sys_macro_p (pfile))
+ if (CPP_WTRADITIONAL (pfile))
{
int u_or_i = (result & (CPP_N_UNSIGNED|CPP_N_IMAGINARY));
int large = (result & CPP_N_WIDTH) == CPP_N_LARGE
&& CPP_OPTION (pfile, cpp_warn_long_long);
- if (u_or_i || large)
+ if ((u_or_i || large) && ! cpp_sys_macro_p (pfile))
cpp_warning_with_line (pfile, large ? CPP_W_LONG_LONG : CPP_W_TRADITIONAL,
virtual_location, 0,
"traditional C rejects the \"%.*s\" suffix",
The builtin macros at least currently don't add any suffixes
or numbers -Wtraditional would like to warn about. For floating
point suffixes, -Wtraditional calls cpp_sys_macro_p only right
before emitting the warning, but in the above case the ICE
is because cpp_sys_macro_p is called even if the number doesn't
have any suffixes (that is I think always for builtin macros
right now).
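The underlying problem is reading a union member that is not the active
one. A minimal sketch of the guard, using made-up types (struct macro_node,
enum node_kind and model_sys_macro_p are illustrative stand-ins, not the
libcpp definitions):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a macro node whose payload is a union.  */
enum node_kind { NODE_USER_MACRO, NODE_BUILTIN_MACRO };

struct user_macro { bool from_system_header; };

struct macro_node
{
  enum node_kind kind;
  union
  {
    struct user_macro *macro;  /* Valid only for NODE_USER_MACRO.  */
    int builtin_code;          /* Valid only for NODE_BUILTIN_MACRO.  */
  } value;
};

static bool
model_sys_macro_p (const struct macro_node *node)
{
  assert (node != NULL);
  /* Check the discriminator before touching the union member.  */
  if (node->kind == NODE_BUILTIN_MACRO)
    return true;
  return node->value.macro != NULL && node->value.macro->from_system_header;
}
```

Returning true for the builtin case mirrors the choice made in this patch;
returning false would be the other option mentioned above.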
2021-08-12 Jakub Jelinek <jakub@redhat.com>
PR preprocessor/101638
* macro.c (cpp_sys_macro_p): Return true instead of
crashing on builtin macros.
* gcc.dg/cpp/pr101638.c: New test.
The following patch (incremental to the makeucnid.c fix) regenerates
ucnid.h with https://www.unicode.org/Public/13.0.0/ucd/ files.
2021-08-05 Jakub Jelinek <jakub@redhat.com>
PR c++/100977
* ucnid.h: Regenerated using Unicode 13.0.0 files.
I've noticed in ucnid.h two adjacent lines that had all flags and combine
values identical and as such were supposed to be merged.
This is due to a bug in makeucnid.c, which records last_flag,
last_combine and really_safe of what has just been printed, but
because of a typo mishandles last_combine: it always compares against
combining_value[0], which is 0.
This has two effects on the table. One is that the table is often
unnecessarily large, as for non-zero .combine every character has its own
record instead of adjacent characters with the same flags and combine
being merged.
The other is that sometimes the last char that has combine set doesn't
actually have it in the tables, because the code is printing entries only
upon seeing the next character and if that character does have
combining_value of 0 and flags are otherwise the same as previously printed,
it will not print anything.
The following patch fixes that. To make clear exactly what it affects,
I've regenerated ucnid.h with the same Unicode files that were used the
last time it was regenerated.
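The intended merging can be sketched as follows. This is not the
makeucnid.c code: struct run and merge_runs are hypothetical, and the
sketch only shows that each character must be compared against the values
of the previous character rather than against a fixed entry.

```c
#include <assert.h>
#include <stddef.h>

/* One output record: the last character of a run and its properties.  */
struct run { unsigned last_char; unsigned flags; unsigned combine; };

/* Merge adjacent characters 0..N-1 with equal flags and combine values
   into runs.  Returns the number of runs written to OUT, which must have
   room for N entries.  */
static size_t
merge_runs (const unsigned *flags, const unsigned *combine, size_t n,
            struct run *out)
{
  size_t count = 0;
  assert (flags != NULL && combine != NULL && out != NULL);
  for (size_t i = 1; i < n; i++)
    {
      /* Compare against the previous character, not against combine[0].  */
      if (flags[i] != flags[i - 1] || combine[i] != combine[i - 1])
        {
          out[count].last_char = (unsigned) (i - 1);
          out[count].flags = flags[i - 1];
          out[count].combine = combine[i - 1];
          count++;
        }
    }
  if (n > 0)
    {
      /* Flush the final run, which no later character triggers.  */
      out[count].last_char = (unsigned) (n - 1);
      out[count].flags = flags[n - 1];
      out[count].combine = combine[n - 1];
      count++;
    }
  return count;
}
```

For five characters with equal flags and combine values 0, 230, 230, 0, 0
this yields three runs, ending at characters 0, 2 and 4.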
2021-08-05 Jakub Jelinek <jakub@redhat.com>
PR c++/100977
* makeucnid.c (write_table): Fix computation of last_combine.
* ucnid.h: Regenerated using Unicode 6.3.0 files.
The patch for PR 96391 changed linemap_compare_locations to give up on
comparing locations from macro expansions if we don't have column
information. But in this testcase, the BOILERPLATE macro is multiple lines
long, so we do want to compare locations within the macro. So this patch
moves the LINE_MAP_MAX_LOCATION_WITH_COLS check inside the block, to use it
for failing gracefully.
PR c++/100796
PR preprocessor/96391
libcpp/ChangeLog:
* line-map.c (linemap_compare_locations): Only use comparison with
LINE_MAP_MAX_LOCATION_WITH_COLS to avoid abort.
gcc/testsuite/ChangeLog:
* g++.dg/plugin/location-overflow-test-pr100796.c: New test.
* g++.dg/plugin/plugin.exp: Run it.
The toolchain provided by ST for stm32 has had support for
__FILENAME__ for a while, but clang/llvm has recently implemented
support for __FILE_NAME__, so it seems better to use the same macro
name in GCC.
It happens that the ST patch is similar to the one proposed in PR
c/42579.
Given these input files:
::::::::::::::
mydir/myinc.h
::::::::::::::
char* mystringh_file = __FILE__;
char* mystringh_filename = __FILE_NAME__;
char* mystringh_base_file = __BASE_FILE__;
::::::::::::::
mydir/mysrc.c
::::::::::::::
char* mystring_file = __FILE__;
char* mystring_filename = __FILE_NAME__;
char* mystring_base_file = __BASE_FILE__;
we produce:
$ gcc mydir/mysrc.c -I . -E
char* mystringh_file = "./mydir/myinc.h";
char* mystringh_filename = "myinc.h";
char* mystringh_base_file = "mydir/mysrc.c";
char* mystring_file = "mydir/mysrc.c";
char* mystring_filename = "mysrc.c";
char* mystring_base_file = "mydir/mysrc.c";
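In other words, __FILE_NAME__ is the last path component of what __FILE__
would give. A rough sketch of that relationship, assuming '/' is the only
directory separator (file_name_component is a hypothetical helper, not the
code added to _cpp_builtin_macro_text):

```c
#include <assert.h>
#include <string.h>

/* Return the part of PATH after the last '/', or PATH itself when it has
   no '/'.  Assumes '/' is the only directory separator.  */
static const char *
file_name_component (const char *path)
{
  assert (path != NULL);
  const char *slash = strrchr (path, '/');
  return slash ? slash + 1 : path;
}
```

Applied to "./mydir/myinc.h" this gives "myinc.h", matching the output
shown above.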
2021-05-20 Christophe Lyon <christophe.lyon@linaro.org>
Torbjörn Svensson <torbjorn.svensson@st.com>
PR c/42579
libcpp/
* include/cpplib.h (cpp_builtin_type): Add BT_FILE_NAME entry.
* init.c (builtin_array): Likewise.
* macro.c (_cpp_builtin_macro_text): Add support for BT_FILE_NAME.
gcc/
* doc/cpp.texi (Common Predefined Macros): Document __FILE_NAME__.
gcc/testsuite/
* c-c++-common/spellcheck-reserved.c: Add tests for __FILE_NAME__.
* c-c++-common/cpp/file-name-1.c: New test.
As can be seen in the testcases, before the -fdirectives-only preprocessing
rewrite the preprocessor would assume // comments are terminated by the
end of file even when the newline wasn't there, but now we error out.
The following patch restores the previous behavior.
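The restored behavior amounts to treating the end of the buffer like a
newline when scanning a // comment. A simplified sketch, assuming a
hypothetical helper skip_line_comment and ignoring backslash-newline
continuation (this is not the cpp_directive_only_process code):

```c
#include <assert.h>
#include <stddef.h>

/* BUF[POS] is the first character after "//".  Return the index where the
   comment ends: the position of the terminating newline, or LEN when the
   buffer ends first.  */
static size_t
skip_line_comment (const char *buf, size_t len, size_t pos)
{
  assert (buf != NULL && pos <= len);
  while (pos < len && buf[pos] != '\n')
    pos++;
  /* Reaching LEN is treated as a normal end of comment, not an error.  */
  return pos;
}
```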
2021-05-20 Jakub Jelinek <jakub@redhat.com>
PR preprocessor/100646
* lex.c (cpp_directive_only_process): Treat end of file as termination
for !is_block comments.
* gcc.dg/cpp/pr100646-1.c: New test.
* gcc.dg/cpp/pr100646-2.c: New test.
If a header doesn't end with a new-line, with -fdirectives-only we right now
preprocess it as
int i = 1;# 2 "pr100392.c" 2
i.e. the line directive isn't on the next line, which means we fail to parse
it when compiling.
GCC 10 and earlier libcpp/directives-only.c had for this:
if (!pfile->state.skipping && cur != base)
{
/* If the file was not newline terminated, add rlimit, which is
guaranteed to point to a newline, to the end of our range. */
if (cur[-1] != '\n')
{
cur++;
CPP_INCREMENT_LINE (pfile, 0);
lines++;
}
cb->print_lines (lines, base, cur - base);
}
and we have the assertion
/* Files always end in a newline or carriage return. We rely on this for
character peeking safety. */
gcc_assert (buffer->rlimit[0] == '\n' || buffer->rlimit[0] == '\r');
So this patch just re-adds more or less the same thing, so that we emit
a newline after the last line even when it wasn't there before.
2021-05-12 Jakub Jelinek <jakub@redhat.com>
PR preprocessor/100392
* lex.c (cpp_directive_only_process): If buffer doesn't end with '\n',
add buffer->rlimit[0] character to the printed range and
CPP_INCREMENT_LINE and increment line_count.
* gcc.dg/cpp/pr100392.c: New test.
* gcc.dg/cpp/pr100392.h: New file.
C2X adds #elifdef and #elifndef preprocessor directives; these have
also been proposed for C++. Implement these directives in libcpp
accordingly.
In this implementation, #elifdef and #elifndef are treated as
non-directives for any language version other than c2x and gnu2x (if
the feature is accepted for C++, it can trivially be enabled for
relevant C++ versions). In strict conformance modes for prior
language versions, this is required, as illustrated by the
c11-elifdef-1.c test added.
Bootstrapped with no regressions for x86_64-pc-linux-gnu.
libcpp/
* include/cpplib.h (struct cpp_options): Add elifdef.
* init.c (struct lang_flags): Add elifdef.
(lang_defaults): Update to include elifdef initializers.
(cpp_set_lang): Set elifdef for pfile based on language.
* directives.c (STDC2X, ELIFDEF): New macros.
(EXTENSION): Increase value to 3.
(DIRECTIVE_TABLE): Add #elifdef and #elifndef.
(_cpp_handle_directive): Do not treat ELIFDEF directives as
directives for language versions without the #elifdef feature.
(do_elif): Handle #elifdef and #elifndef.
(do_elifdef, do_elifndef): New functions.
gcc/testsuite/
* gcc.dg/cpp/c11-elifdef-1.c, gcc.dg/cpp/c2x-elifdef-1.c,
gcc.dg/cpp/c2x-elifdef-2.c: New tests.
The libcpp function cpp_avoid_paste is used to insert whitespace in
preprocessed output where needed to avoid two consecutive
preprocessing tokens, that logically (e.g. when stringized) do not
have whitespace between them, from being incorrectly lexed as one when
the preprocessed input is reread by a compiler.
This fails to allow for digit separators, meaning that invalid
code, that has a CPP_NUMBER (from a macro expansion) followed by a
character literal, can result in preprocessed output with a valid use
of digit separators, so that required syntax errors do not occur when
compiling with -save-temps. Fix this by handling that case in
cpp_avoid_paste (as with other cases in cpp_avoid_paste, this doesn't
try to check whether the language version in use supports digit
separators; it's always OK to have unnecessary whitespace in
preprocessed output).
Note: there are other cases, with various kinds of wide character or
string literal following a CPP_NUMBER, where spurious pasting of
preprocessing tokens can occur but the sequence of tokens remains
invalid both before and after that pasting. Maybe cpp_avoid_paste
should also handle those cases (and similar cases after a CPP_NAME),
to ensure the sequence of preprocessing tokens in preprocessed output
is exactly right, whether or not it affects whether syntax errors
occur. This patch only addresses the case with digit separators where
invalid code can fail to be diagnosed without the space inserted.
Bootstrapped with no regressions for x86_64-pc-linux-gnu.
libcpp/
* lex.c (cpp_avoid_paste): Do not allow pasting CPP_NUMBER with
CPP_CHAR.
gcc/testsuite/
* g++.dg/cpp1y/digit-sep-paste.C, gcc.dg/c2x-digit-separators-3.c:
New tests.
C2X adds digit separators, as in C++. Enable them accordingly in
libcpp and c-lex.c. Some basic tests are added that digit separators
behave as expected for C2X and are properly disabled for C11; further
test coverage is included in the existing g++.dg/cpp1y/digit-sep*.C
tests.
Bootstrapped with no regressions for x86_64-pc-linux-gnu.
gcc/c-family/
* c-lex.c (interpret_float): Handle digit separators for C2X.
libcpp/
* init.c (lang_defaults): Enable digit separators for GNUC2X and
STDC2X.
gcc/testsuite/
* gcc.dg/c11-digit-separators-1.c,
gcc.dg/c2x-digit-separators-1.c, gcc.dg/c2x-digit-separators-2.c:
New tests.
Since the r0-85991-ga25a8f3be322fe0f838947b679f73d6efc2a412c
https://gcc.gnu.org/legacy-ml/gcc-patches/2008-02/msg01329.html
changes, so that we handle macros inside of pragmas that should expand
macros, during preprocessing we print those pragmas token by token,
with CPP_PRAGMA printed as
fputs ("#pragma ", print.outf);
if (space)
fprintf (print.outf, "%s %s", space, name);
else
fprintf (print.outf, "%s", name);
where name is some identifier (so e.g. print
#pragma omp parallel
or
#pragma omp for
etc.). Because it ends in an identifier, we need to handle it like
an identifier (i.e. CPP_NAME) for the decision whether a space needs
to be emitted in between that #pragma whatever or #pragma whatever whatever
and following token, otherwise the attached testcase is preprocessed as
#pragma omp forreduction(+:red)
rather than
#pragma omp for reduction(+:red)
The cpp_avoid_paste function is only called for this purpose.
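The idea can be sketched with a toy version of the decision. The token
kinds and model_avoid_paste below are hypothetical and cover only the
combinations discussed here; the real cpp_avoid_paste handles many more
token types.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical token kinds; only the ones needed for the illustration.  */
enum tok_kind
{
  TOK_NAME,
  TOK_PRAGMA,
  TOK_NUMBER,
  TOK_OPEN_PAREN,
  TOK_KIND_COUNT
};

/* Return true if a space must be printed between a token of kind A and a
   following token of kind B so that they are not re-lexed as one.  */
static bool
model_avoid_paste (enum tok_kind a, enum tok_kind b)
{
  assert (a < TOK_KIND_COUNT && b < TOK_KIND_COUNT);
  /* Printed "#pragma omp for" ends in an identifier, so treat it as a name.  */
  if (a == TOK_PRAGMA)
    a = TOK_NAME;
  if (a == TOK_NAME)
    return b == TOK_NAME || b == TOK_NUMBER;
  return false;
}
```

In this model a pragma followed by the identifier reduction needs a space,
while a pragma followed by an opening parenthesis does not.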
2021-05-07 Jakub Jelinek <jakub@redhat.com>
PR c/100450
* lex.c (cpp_avoid_paste): Handle token1 CPP_PRAGMA like CPP_NAME.
* c-c++-common/gomp/pr100450.c: New test.
When the preprocessor lexes preprocessing numbers in lex_number, it
accepts digit separators in more cases than actually permitted in
pp-numbers by the standard syntax.
One thing this accepts is adjacent digit separators; there is some
code to reject those later, but as noted in bug 83873 it fails to
cover the case of adjacent digit separators within a floating-point
exponent. Accepting adjacent digit separators only results in a
missing diagnostic, not in valid code being rejected or being accepted
with incorrect semantics, because the correct lexing in such a case
would have '' start the following preprocessing tokens, and no valid
preprocessing token starts '' while ' isn't valid on its own as a
preprocessing token either. So this patch fixes that case by moving
the error for adjacent digit separators to lex_number (allowing a more
specific diagnostic than if '' were excluded from the pp-number
completely).
Other cases inappropriately accepted involve digit separators before
'.', 'e+', 'e-', 'p+' or 'p-' (or corresponding uppercase variants).
In those cases, as shown by the test digit-sep-pp-number.C added, this
can result in valid code being wrongly rejected as a result of too
many characters being included in the pp-number. So this case is
fixed by terminating the pp-number at the correct character according
to the standard. That test also covers the case where a digit
separator was followed by an identifier-nondigit that is not a
nondigit (e.g. a UCN); that case was already handled correctly.
Bootstrapped with no regressions for x86_64-pc-linux-gnu.
libcpp/
PR c++/83873
PR preprocessor/97604
* lex.c (lex_number): Reject adjacent digit separators here. Do
not allow digit separators before '.' or an exponent with sign.
* expr.c (cpp_classify_number): Do not check for adjacent digit
separators here.
gcc/testsuite/
PR c++/83873
PR preprocessor/97604
* g++.dg/cpp1y/digit-sep-neg-2.C,
g++.dg/cpp1y/digit-sep-pp-number.C: New tests.
* g++.dg/cpp1y/digit-sep-line-neg.C, g++.dg/cpp1y/digit-sep-neg.C:
Adjust expected messages.
As reported in bug 82359, the preprocessor does not allow C++ digit
separators in the line number in a #line directive, despite the
standard syntax for that directive using digit-sequence which allows
digit separators.
There is some confusion in that bug about whether C++ is meant to
allow digit separators there or not, but the last comment there
suggests they are meant to be allowed, and the version of digit
separators accepted for C2X at the March meeting explicitly mentions
digit separators in the #line specification to avoid any ambiguity
there.
This patch thus adds code to handle digit separators in the line
number in #line, as part of the preparation for enabling digit
separators in C2X mode. The code changed does not contain any
conditionals for whether digit separators are supported in the chosen
language version, because that was handled earlier in pp-number lexing
and if they aren't supported they won't appear in the string passed to
that function. It does however make sure not to allow adjacent digit
separators because those are only handled at a later stage of lexing
at present. (Problems with how certain source character sequences
involving digit separators that don't actually match the pp-number
syntax get lexed as a pp-number and only diagnosed later, if at all,
are bugs 83873 and 97604, to be addressed separately.)
Making the change in this location will have the effect of allowing
digit separators in the "# <line-number> <file> <flags>" form of
directive as well as #line; I don't think that's a problem.
Bootstrapped with no regressions for x86_64-pc-linux-gnu.
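As an illustration of the rule described above (single digit separators
between digits are accepted, adjacent ones are not), here is a standalone
sketch. parse_line_number is a hypothetical function, not the strtolinenum
implementation, and it does not check for overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Parse LEN characters at STR as a decimal line number that may contain
   single ' digit separators between digits.  Stores the value in *OUT and
   returns true on success.  Overflow is not checked in this sketch.  */
static bool
parse_line_number (const char *str, size_t len, unsigned long *out)
{
  unsigned long value = 0;
  bool prev_was_digit = false;

  assert (str != NULL && out != NULL);
  for (size_t i = 0; i < len; i++)
    {
      char c = str[i];
      if (c == '\'')
        {
          /* A separator needs a digit on both sides, which also rules
             out adjacent separators.  */
          if (!prev_was_digit || i + 1 == len)
            return false;
          prev_was_digit = false;
          continue;
        }
      if (c < '0' || c > '9')
        return false;
      value = value * 10 + (unsigned long) (c - '0');
      prev_was_digit = true;
    }
  if (!prev_was_digit)
    return false;
  *out = value;
  return true;
}
```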
libcpp/
PR preprocessor/82359
* directives.c (strtolinenum): Handle digit separators.
gcc/testsuite/
PR preprocessor/82359
* g++.dg/cpp1y/digit-sep-line.C,
g++.dg/cpp1y/digit-sep-line-neg.C: New tests.
This reverts a s/column_offset/column/ change in the fix for PR99446.
2021-04-19 Richard Biener <rguenther@suse.de>
PR preprocessor/100142
libcpp/
* line-map.c (linemap_position_for_loc_and_offset): Revert
unintended s/column_offset/column/ change.
gcc/testsuite/
* gcc.dg/pr100142.c: New testcase.
* g++.dg/diagnostic/pr72803.C: Revert last change.
This ICE was because when adjusting a column offset we could advance
into a linemap for a different file. We only checked the next line
map was not for a line further advanced in any file, forgetting that
it could be for an earlier line in a different file. The testcase
needed adjusting as column 512 was unrepresentable, once that was
taken into consideration.
PR preprocessor/99446
libcpp/
* line-map.c (linemap_position_for_loc_and_offset): Do not advance
to linemaps for
different files.
gcc/testsuite/
* g++.dg/diagnostic/pr72803.C: Adjust expected column.
The problem is that the new IS_MACRO_LOC macro:
inline bool
IS_MACRO_LOC (location_t loc)
{
return !IS_ORDINARY_LOC (loc) && !IS_ADHOC_LOC (loc);
}
is not fully correct since the position of the macro lines is not fixed:
/* Returns the lowest location [of a token resulting from macro
expansion] encoded in this line table. */
inline location_t
LINEMAPS_MACRO_LOWEST_LOCATION (const line_maps *set)
{
return LINEMAPS_MACRO_USED (set)
? MAP_START_LOCATION (LINEMAPS_LAST_MACRO_MAP (set))
: MAX_LOCATION_T + 1;
}
In Ada, LINEMAPS_MACRO_USED is false so LINEMAPS_MACRO_LOWEST_LOCATION is
MAX_LOCATION_T + 1, but IS_MACRO_LOC nevertheless returns true for anything
in the range [LINE_MAP_MAX_LOCATION; MAX_LOCATION_T], thus yielding an ICE
in linemap_macro_map_lookup for very large files.
libcpp/
* include/line-map.h (IS_MACRO_LOC): Delete.
* line-map.c (linemap_location_from_macro_expansion_p): Test
LINEMAPS_MACRO_LOWEST_LOCATION of the linemap.
gcc/cp/
* module.cc (ordinary_loc_of): Test LINEMAPS_MACRO_LOWEST_LOCATION
of the linemap.
(module_state::write_location): Likewise.
PR c/99323 describes an ICE due to a failed assertion deep inside the
fix-it printing machinery, where the fix-it hints on one line have not
been properly sorted in layout's constructor.
The underlying issue occurs when multiple fix-it hints affect a line
wider than LINE_MAP_MAX_COLUMN_NUMBER, where the location_t values for
characters after that threshold fall back to having column zero.
It's not meaningful to try to handle fix-it hints without column
information, so this patch rejects them as they are added to the
rich_location, falling back to the "no fix-it hints on this diagnostic"
case, fixing the crash.
gcc/ChangeLog:
PR c/99323
* diagnostic-show-locus.c
(selftest::test_one_liner_many_fixits_2): Fix accidental usage of
column 0.
gcc/testsuite/ChangeLog:
PR c/99323
* gcc.dg/pr99323-1.c: New test.
* gcc.dg/pr99323-2.c: New test.
libcpp/ChangeLog:
PR c/99323
* line-map.c (rich_location::maybe_add_fixit): Reject fix-it hints
at column 0.
This fixes some issues with macro maps. We were incorrectly
calculating the number of macro expansions in a location span, and I
had a workaround that partially covered that up. Further, while macro
location spans are monotonic, that is not true of ordinary location
spans. Thus we need to insert an indirection array when binary
searching the latter. (We load ordinary locations before loading
imports, but macro locations afterwards. We make sure an import
location is de-macrofied, if needed.)
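The indirection idea can be sketched as follows: keep the spans in their
original order, sort a separate array of indices by starting location, and
binary search through that array. The names below (struct span,
build_order, find_span) are hypothetical and the sketch assumes
non-overlapping spans; it is not the module.cc code. The file-scope
pointer used by the comparator makes it unsuitable for concurrent use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical span of ordinary locations [first, last).  */
struct span { unsigned first; unsigned last; };

static const struct span *sort_base;  /* Spans seen by the comparator.  */

static int
span_index_cmp (const void *a, const void *b)
{
  unsigned fa = sort_base[*(const size_t *) a].first;
  unsigned fb = sort_base[*(const size_t *) b].first;
  return (fa > fb) - (fa < fb);
}

/* Fill ORDER with the indices of SPANS sorted by starting location.  */
static void
build_order (const struct span *spans, size_t n, size_t *order)
{
  assert (spans != NULL && order != NULL);
  for (size_t i = 0; i < n; i++)
    order[i] = i;
  sort_base = spans;
  qsort (order, n, sizeof *order, span_index_cmp);
}

/* Binary search through ORDER; return the index into SPANS of the span
   containing LOC, or N if there is none.  */
static size_t
find_span (const struct span *spans, const size_t *order, size_t n,
           unsigned loc)
{
  size_t lo = 0, hi = n;
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      const struct span *s = &spans[order[mid]];
      if (loc < s->first)
        hi = mid;
      else if (loc >= s->last)
        lo = mid + 1;
      else
        return order[mid];
    }
  return n;
}
```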
PR c++/98718
gcc/cp/
* module.cc (ool): New indirection vector.
(loc_spans::maybe_propagate): Location is not optional.
(loc_spans::open): Likewise. Assert monotonically advancing.
(module_for_ordinary_loc): Use ool indirection vector.
(module_state::write_prepare_maps): Do not count empty macro
expansions. Elide empty spans.
(module_state::write_macro_maps): Skip empty expansions.
(ool_cmp): New qsort comparator.
(module_state::write): Create and destroy ool vector.
(name_pending_imports): Fix dump push/pop.
(preprocess_module): Likewise. Add more dumping.
(preprocessed_module): Likewise.
libcpp/
* include/line-map.h
* line-map.c
gcc/testsuite/
* g++.dg/modules/pr98718_a.C: New.
* g++.dg/modules/pr98718_b.C: New.
When we read preprocessed source, we deal with a couple of special
location lines at the start of the file. These provide information
about the original filename of the source and the current directory,
so we can process the source in the same manner. When updating that
code, I had a somewhat philosophical question: Should the line table
contain evidence of the filename the user provided to the compiler? I
figured to leave it there, as it did no harm. But this defect shows
an issue. It's in the line table and our (non-optimizing) line table
serializer emits that filename. Which means if one re-preprocesses
the original source to a differently-named intermediate file, the
resultant CMI is different. Boo. That's a difference that doesn't
matter, except the CRC matching then fails. We should elide the
filename, so that one can preprocess to mktemp intermediate filenames
for whatever reason.
This patch takes the approach of expunging it from the line table --
so the line table will end up with exactly the same form. That seems
a better bet than trying to fix up mismatching line tables in CMI
emission.
PR c++/99072
libcpp/
* init.c (read_original_filename): Expunge all evidence of the
original filename.
gcc/testsuite/
* g++.dg/modules/pr99072.H: New.
This defect really required building header-units and include translation
of pieces of the standard library. This adds smarts to the modules
test harness to do that -- accept .X files as the source file, but
provide '-x c++-system-header $HDR' in the options. The .X file will
be considered by the driver to be a linker script and ignored (with a
warning).
Using this we can add 2 tests that end up building initializer_list
and iostream, along with a test that iostream's build
include-translates initializer_list's #include. That discovered a set
of issues with the -flang-info-include-translate=HDR handling, also
fixed and documented here.
PR c++/99023
gcc/cp/
* module.cc (canonicalize_header_name): Use
cpp_probe_header_unit.
(maybe_translate_include): Fix note_includes comparison.
(init_modules): Fix note_includes string termination.
libcpp/
* include/cpplib.h (cpp_find_header_unit): Rename to ...
(cpp_probe_header_unit): ... this.
* internal.h (_cpp_find_header_unit): Declare.
* files.c (cpp_find_header_unit): Break apart to ...
(test_header_unit): ... this, and ...
(_cpp_find_header_unit): ... this, and ...
(cpp_probe_header_unit): ... this.
* macro.c (cpp_get_token_1): Call _cpp_find_header_unit.
gcc/
* doc/invoke.texi (flang-info-include-translate): Document header
lookup behaviour.
gcc/testsuite/
* g++.dg/modules/modules.exp: Bail on cross-testing. Add support
for .X files.
* g++.dg/modules/pr99023_a.X: New.
* g++.dg/modules/pr99023_b.X: New.
We make sure files end in \n by placing one at the limit of the buffer
(just past the end of what is read). We need to do the same for
buffers generated via include-translation. Fortunately they have
space.
libcpp/
* files.c (_cpp_stack_file): Make buffers end in unread \n.
gcc/testsuite/
* g++.dg/modules/pr99050_a.H: New.
* g++.dg/modules/pr99050_b.C: New.
PR preprocessor/96391 describes an ICE in the C++ frontend on:
#define CONST const
#define VOID void
typedef CONST VOID *PCVOID;
where the typedef line occurs after enough code has been compiled
that location_t values are beyond LINE_MAP_MAX_LOCATION_WITH_COLS,
and hence no column numbers are available.
The issue occurs in linemap_compare_locations when comparing the
locations of the "const" and "void" tokens.
Upon resolving the LRK_MACRO_EXPANSION_POINT, both have the same
location_t, the line of the "typedef" (with no column), and so
the l0 == l1 clause is triggered, but they are not from the
same macro expansion, leading first_map_in_common to return NULL
and triggering the "abort" condition.
This patch fixes the issue by additionally checking, when the two macro
expansion point location_t values are equal, that the value is
<= LINE_MAP_MAX_LOCATION_WITH_COLS and thus has column information.
Only then are the locations treated as coming from the same macro
expansion.
gcc/testsuite/ChangeLog:
PR preprocessor/96391
* g++.dg/plugin/location-overflow-test-pr96391.c: New test.
* g++.dg/plugin/plugin.exp (plugin_test_list): Add it,
using the location_overflow_plugin.c from gcc.dg/plugin.
libcpp/ChangeLog:
PR preprocessor/96391
* line-map.c (linemap_compare_locations): Require that
the location be <= LINE_MAP_MAX_LOCATION_WITH_COLS when
treating locations as coming from the same macro expansion.
The following patch uses make_signed_t<size_t> instead of
make_signed<size_t>::type in the diagnostics, because the former is shorter.
It is true that one can't use make_signed_t<size_t> in C++11 code (which
is why I haven't changed it in the testcase which is c++11 effective
target), but the message talks about C++23 and make_signed_t is a C++14 and
later feature, so I think it is fine.
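For reference, the two spellings name the same type; the short form is simply the C++14 alias template. A small check, where the helper name signed_size_matches_size is made up for illustration:

```cpp
#include <cstddef>
#include <type_traits>

// std::make_signed_t<T> (C++14) is an alias for
// std::make_signed<T>::type (C++11).
static_assert(std::is_same<std::make_signed_t<std::size_t>,
                           std::make_signed<std::size_t>::type>::value,
              "both spellings name the same type");

// Hypothetical helper: the signed counterpart is signed and has the same
// size as size_t.
constexpr bool signed_size_matches_size() {
  return sizeof(std::make_signed_t<std::size_t>) == sizeof(std::size_t)
         && std::is_signed<std::make_signed_t<std::size_t>>::value;
}
```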
2021-02-04 Jakub Jelinek <jakub@redhat.com>
* expr.c (cpp_classify_number): Use make_signed_t<size_t> instead of
make_signed<size_t>::type in the diagnostics.
* g++.dg/warn/Wsize_t-literals.C: Expect make_signed_t<size_t> instead
of make_signed<size_t>::type in the diagnostics.
GCC 11 ICEs on all -fdirectives-only preprocessing when the files don't end
with a newline.
The problem is in the assertion: for empty TUs, buffer->cur == buffer->rlimit,
so the buffer->rlimit[-1] access triggers UB in the preprocessor; for
non-empty TUs it refers to the last character in the file, which can be
anything.
The preprocessor adds a '\n' character (or '\r', in particular if the
user file ends with '\r' then it adds another '\r' rather than '\n'), but
that is added after the limit, i.e. at buffer->rlimit[0].
Now, if the routine handles occasional bumping of pos to buffer->rlimit + 1,
I think it is just the assert that needs changing, usually we read from *pos
if pos < limit and then e.g. if it is '\r', look at the following character
(which could be one of those '\n' or '\r' at buffer->rlimit[0]). There is
also the case where for '\\' before the limit we read following character
and if it is '\n', do one thing, if it is '\r' read another character.
But in that case if '\\' was the last char in the TU, the limit char will be
'\n', so we are ok.
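To make the buffer layout concrete, here is a small sketch. Buffer, make_buffer and sentinel_ok are hypothetical names, not the libcpp ones; the sketch only models the sentinel placed at rlimit[0] and the condition the corrected assertion checks.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a preprocessor buffer.
struct Buffer {
  std::string storage;  // file contents plus one sentinel character
  std::size_t length;   // number of characters that came from the file

  const char *cur() const { return storage.data(); }
  const char *rlimit() const { return storage.data() + length; }
};

// Append the sentinel just past the limit: '\r' if the contents end in
// '\r', otherwise '\n'.
Buffer make_buffer(const std::string &contents) {
  Buffer b;
  b.length = contents.size();
  b.storage = contents;
  bool ends_in_cr = !contents.empty() && contents.back() == '\r';
  b.storage.push_back(ends_in_cr ? '\r' : '\n');
  return b;
}

// The property that is safe to assert: rlimit[0] is the sentinel.
// rlimit[-1] is out of bounds for an empty buffer and arbitrary otherwise.
bool sentinel_ok(const Buffer &b) {
  const char *limit = b.rlimit();
  return limit[0] == '\n' || limit[0] == '\r';
}
```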
2021-02-03 Jakub Jelinek <jakub@redhat.com>
PR preprocessor/98882
* lex.c (cpp_directive_only_process): Don't assert that rlimit[-1]
is a newline, instead assert that rlimit[0] is either newline or
carriage return. When seeing '\\' followed by '\r', check limit
before accessing pos[1].
* gcc.dg/cpp/pr98882.c: New test.
Integer literal suffixes for signed size ('z') and unsigned size
(some permutation of 'zu') are provided as a language addition.
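A short usage sketch follows. It is guarded by the __cpp_size_t_suffix feature-test macro so that it also compiles in language modes where the suffix is not enabled; the function name kilo is made up for illustration.

```cpp
#include <cstddef>
#include <type_traits>

#if defined(__cpp_size_t_suffix)
// With the feature enabled, 'uz' yields std::size_t and 'z' yields the
// corresponding signed type.
static_assert(std::is_same<decltype(0uz), std::size_t>::value,
              "uz literal has type size_t");
static_assert(std::is_same<decltype(0z),
                           std::make_signed<std::size_t>::type>::value,
              "z literal has the signed counterpart of size_t");
#endif

// Hypothetical helper: uses the suffix when available and a cast
// otherwise.
constexpr std::size_t kilo() {
#if defined(__cpp_size_t_suffix)
  return 1000uz;
#else
  return static_cast<std::size_t>(1000);
#endif
}
```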
gcc/c-family/ChangeLog:
* c-cppbuiltin.c (c_cpp_builtins): Define __cpp_size_t_suffix.
* c-lex.c (interpret_integer): Set node type for size literal.
libcpp/ChangeLog:
* expr.c (interpret_int_suffix): Detect 'z' integer suffix.
(cpp_classify_number): Compat warning for use of 'z' suffix.
* include/cpplib.h (struct cpp_options): New flag.
(enum cpp_warning_reason): New flag.
(CPP_N_USERDEF): Comment C++0x -> C++11.
(CPP_N_SIZE_T): New flag for cpp_classify_number.
* init.c (cpp_set_lang): Initialize new flag.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/udlit-shadow-neg.C: Test for 'z' and 'zu' shadowing.
* g++.dg/cpp23/feat-cxx2b.C: New test.
* g++.dg/cpp23/size_t-literals.C: New test.
* g++.dg/warn/Wsize_t-literals.C: New test.
Derived from the changes that added C++2a support in 2017.
r8-3237-g026a79f70cf33f836ea5275eda72d4870a3041e5
No C++23 features are added here.
Use of -std=c++23 sets __cplusplus to 202100L.
$ g++ -std=c++23 -dM -E -x c++ - < /dev/null | grep cplusplus
#define __cplusplus 202100L
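Code that wants to detect the new mode can compare __cplusplus against the C++20 value, 202002L. The sketch below uses a hypothetical helper, is_post_cxx20; comparing with "greater than" rather than testing for 202100L exactly keeps working with the value of the published standard (202302L) as well.

```cpp
// Hypothetical helper: true for any __cplusplus value newer than C++20's
// 202002L, which includes the provisional 202100L set by this patch.
constexpr bool is_post_cxx20(long value) {
  return value > 202002L;
}

// Whether the current translation unit is compiled in such a mode.
constexpr bool this_tu_is_post_cxx20 = is_post_cxx20(__cplusplus);
```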
gcc/
* doc/cpp.texi (__cplusplus): Document value for -std=c++23
or -std=gnu++23.
* doc/invoke.texi: Document -std=c++23 and -std=gnu++23.
* dwarf2out.c (highest_c_language): Recognise C++20 and C++23.
(gen_compile_unit_die): Recognise C++23.
gcc/c-family/
* c-common.h (cxx_dialect): Add cxx23 as a dialect.
* c.opt: Add options for -std=c++23, -std=c++2b, -std=gnu++23
and -std=gnu++2b.
* c-opts.c (set_std_cxx23): New.
(c_common_handle_option): Set options when -std=c++23 is enabled.
(c_common_post_options): Adjust comments.
(set_std_cxx20): Likewise.
gcc/testsuite/
* lib/target-supports.exp (check_effective_target_c++2a):
Check for C++2a or C++23.
(check_effective_target_c++20_down): New.
(check_effective_target_c++23_only): New.
(check_effective_target_c++23): New.
* g++.dg/cpp23/cplusplus.C: New.
libcpp/
* include/cpplib.h (c_lang): Add CXX23 and GNUCXX23.
* init.c (lang_defaults): Add rows for CXX23 and GNUCXX23.
(cpp_init_builtins): Set __cplusplus to 202100L for C++23.
For deferred macros we also need a new field on the macro itself, so
that the module machinery can determine the macro was imported. Also
the documentation for the hashnode's deferred field was incomplete.
libcpp/
* include/cpplib.h (struct cpp_macro): Add imported_p field.
(struct cpp_hashnode): Tweak deferred field documentation.
* macro.c (_cpp_new_macro): Clear new field.
(cpp_get_deferred_macro, get_deferred_or_lazy_macro): Assert
more.
The preprocessor check for overflow (of linenum_type = unsigned int)
when reading the line number in a #line directive is incomplete; it
checks "reg < reg_prev" which doesn't cover all cases where
multiplying by 10 overflowed. Fix this by checking for overflow
before rather than after it occurs (using essentially the same logic
as used by e.g. glibc printf when reading width and precision values
from strings).
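The check-before-overflow idea can be sketched like this. It is not the strtolinenum source; parse_linenum and its signature are assumptions for illustration. Each step compares against the maximum before the multiplication and before the addition.

```cpp
#include <limits>
#include <string>

typedef unsigned int linenum_type;

// Parse a string of decimal digits.  Returns false if the string is empty
// or contains a non-digit.  Sets `wrapped` if the value does not fit in
// linenum_type; `out` then holds the wrapped (modular) result.
bool parse_linenum(const std::string &text, linenum_type &out,
                   bool &wrapped) {
  const linenum_type max = std::numeric_limits<linenum_type>::max();
  linenum_type reg = 0;
  wrapped = false;
  if (text.empty())
    return false;
  for (char ch : text) {
    if (ch < '0' || ch > '9')
      return false;
    linenum_type digit = static_cast<linenum_type>(ch - '0');
    if (reg > max / 10)  // reg * 10 would overflow
      wrapped = true;
    reg *= 10;
    if (reg > max - digit)  // reg + digit would overflow
      wrapped = true;
    reg += digit;
  }
  out = reg;
  return true;
}
```

Assuming a 32-bit unsigned int, the input 5000000000 shows why an after-the-fact comparison is not enough: 500000000 * 10 wraps to 705032704, which is still larger than the previous value, so a "reg < reg_prev" test does not fire.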
Bootstrapped with no regressions for x86_64-pc-linux-gnu.
libcpp/
2020-11-27 Joseph Myers <joseph@codesourcery.com>
PR preprocessor/97602
* directives.c (strtolinenum): Check for overflow before it
occurs. Correct comment.
gcc/testsuite/
2020-11-27 Joseph Myers <joseph@codesourcery.com>
PR preprocessor/97602
* gcc.dg/cpp/line9.c, gcc.dg/cpp/line10.c: New tests.
Deferred macros are needed for C++ modules. Header units may export
macro definitions and undefinitions. These are resolved lazily at the
point of (potential) use. (The language specifies that, it's not just
a useful optimization.) Thus, identifier nodes grow a 'deferred'
field, which fortunately doesn't expand the structure on 64-bit
systems as there was padding there. This is non-zero on NT_MACRO
nodes, if the macro is deferred. When such an identifier is lexed, it
is resolved via a callback that I added recently. That will either
provide the macro definition, or discover that there was an overriding
undef. Either way the identifier is no longer a deferred macro.
Notice it is now possible for NT_MACRO nodes to have a NULL macro
expansion.
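The lazy resolution can be pictured with the following sketch. MacroNode, Resolver and get_macro are hypothetical names and the layout is far simpler than cpp_hashnode. In particular, clearing the macro flag after an overriding undef is an assumption of this sketch, not something the text above states.

```cpp
#include <functional>
#include <string>

// Hypothetical, simplified identifier node.
struct MacroNode {
  std::string name;
  bool is_macro;                 // loosely analogous to an NT_MACRO node
  unsigned deferred;             // non-zero while the macro is unresolved
  const std::string *expansion;  // may be null while deferred
};

// Callback that resolves a deferred macro.  It returns the definition, or
// null if an overriding undef was found.
using Resolver = std::function<const std::string *(unsigned deferred_id)>;

// Return the macro's expansion, resolving it on first use.
const std::string *get_macro(MacroNode &node, const Resolver &resolve) {
  if (!node.is_macro)
    return nullptr;
  if (node.deferred != 0) {
    node.expansion = resolve(node.deferred);
    node.deferred = 0;  // either way, no longer deferred
    if (node.expansion == nullptr)
      node.is_macro = false;  // assumption: an undef makes it a non-macro
  }
  return node.expansion;
}
```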
libcpp/
* include/cpplib.h (struct cpp_hashnode): Add deferred field.
(cpp_set_deferred_macro): Define.
(cpp_get_deferred_macro): Declare.
(cpp_macro_definition): Reformat, add overload.
(cpp_macro_definition_location): Deal with deferred macro.
(cpp_alloc_token_string, cpp_compare_macro): Declare.
* internal.h (_cpp_notify_macro_use): Return bool.
(_cpp_maybe_notify_macro_use): Likewise.
* directives.c (do_undef): Check macro is not undef before
warning.
(do_ifdef, do_ifndef): Deal with deferred macro.
* expr.c (parse_defined): Likewise.
* lex.c (cpp_allocate_token_string): Break out of ...
(create_literal): ... here. Call it.
(cpp_maybe_module_directive): Deal with deferred macro.
* macro.c (cpp_get_token_1): Deal with deferred macro.
(warn_of_redefinition): Deal with deferred macro.
(compare_macros): Rename to ...
(cpp_compare_macro): ... here. Make extern.
(cpp_get_deferred_macro): New.
(_cpp_notify_macro_use): Deal with deferred macro, return bool
indicating definedness.
(cpp_macro_definition): Deal with deferred macro.
This adds the capability to locate the main file on the user or system
include paths. That's extremely useful to users building header
units. Searching has to be requested (plain header-unit compilation
will not search). Also, to make include_next work as expected when
building a header unit, we add a mechanism to retrofit a non-searched
source file as one on the include path.
libcpp/
* include/cpplib.h (enum cpp_main_search): New.
(struct cpp_options): Add main_search field.
(cpp_main_loc): Declare.
(cpp_retrofit_as_include): Declare.
* internal.h (struct cpp_reader): Add main_loc field.
(_cpp_in_main_source_file): Not main if main is a header.
* init.c (cpp_read_main_file): Use main_search option to locate
main file. Set main_loc.
* files.c (cpp_retrofit_as_include): New.
In preparing module patch 7 I realized there was a cleanup I could
make to simplify it. This is that cleanup. Also, when doing the
cleanup I noticed some macros had been turned into inline functions,
but not renamed to the preprocessor's internal namespace
(_cpp_$INTERNAL rather than cpp_$USER). Thus, this renames those
functions, deletes an internal field of the file structure, and
determines whether we're in the main file by comparing to
pfile->main_file, the _cpp_file of the main file.
libcpp/
* internal.h (cpp_in_system_header): Rename to ...
(_cpp_in_system_header): ... here.
(cpp_in_primary_file): Rename to ...
(_cpp_in_main_source_file): ... here. Compare main_file equality
and check main_search value.
* lex.c (maybe_va_opt_error, _cpp_lex_direct): Adjust for rename.
* macro.c (_cpp_builtin_macro_text): Likewise.
(replace_args): Likewise.
* directives.c (do_include_next): Likewise.
(do_pragma_once, do_pragma_system_header): Likewise.
* files.c (struct _cpp_file): Delete main_file field.
(pch_open): Check pfile->main_file equality.
(make_cpp_file): Drop cpp_reader parm, don't set main_file.
(_cpp_find_file): Adjust.
(_cpp_stack_file): Check pfile->main_file equality.
(struct report_missing_guard_data): Add cpp_reader field.
(report_missing_guard): Check pfile->main_file equality.
(_cpp_report_missing_guards): Adjust.
C++20 modules introduce a new kind of preprocessor directive -- a
module directive. These are directives but without the leading '#'.
We have to detect them by sniffing the start of a logical line. When
detected we replace the initial identifiers with unspellable tokens
and pass them through to the language parser the same way deferred
pragmas are. There's a PRAGMA_EOL at the logical end of line too.
One additional complication is that we have to do header-name lexing
after the initial tokens, and that requires changes in the macro-aware
piece of the preprocessor. The above sniffer sets a counter in the
lexer state, and that triggers at the appropriate point. We then do
the same header-name lexing that occurs on a #include directive or
has_include pseudo-macro. Except that the header name ends up in the
token stream.
A couple of token emitters need to deal with the new token possibility.
gcc/c-family/
* c-lex.c (c_lex_with_flags): CPP_HEADER_NAMEs can now be seen.
libcpp/
* include/cpplib.h (struct cpp_options): Add module_directives
option.
(NODE_MODULE): New node flag.
(struct cpp_hashnode): Make rid-code a bitfield, increase bits in
flags and swap with type field.
* init.c (post_options): Create module-directive identifier nodes.
* internal.h (struct lexer_state): Add directive_file_token &
n_modules fields. Add module node enumerator.
* lex.c (cpp_maybe_module_directive): New.
(_cpp_lex_token): Call it.
(cpp_output_token): Add '"' around CPP_HEADER_NAME token.
(do_peek_ident, do_peek_module): New.
(cpp_directives_only): Detect module-directive lines.
* macro.c (cpp_get_token_1): Deal with directive_file_token
triggering.
This is slightly different to the original patch I posted. This adds
separate module target and dependency functions (rather than a single
bi-modal function).
libcpp/
* include/cpplib.h (struct cpp_options): Add modules to
dep-options.
* include/mkdeps.h (deps_add_module_target): Declare.
(deps_add_module_dep): Declare.
* mkdeps.c (class mkdeps): Add modules, module_name, cmi_name,
is_header_unit fields. Adjust cdtors.
(deps_add_module_target, deps_add_module_dep): New.
(make_write): Write module dependencies, if enabled.