Deferred macros are needed for C++ modules. Header units may export
macro definitions and undefinitions. These are resolved lazily at the
point of (potential) use. (The language specifies that, it's not just
a useful optimization.) Thus, identifier nodes grow a 'deferred'
field, which fortunately doesn't expand the structure on 64-bit
systems as there was padding there. This is non-zero on NT_MACRO
nodes, if the macro is deferred. When such an identifier is lexed, it
is resolved via a callback that I added recently. That will either
provide the macro definition, or discover it there was an overriding
undef. Either way the identifier is no longer a deferred macro.
Notice it is now possible for NT_MACRO nodes to have a NULL macro
expansion.
libcpp/
* include/cpplib.h (struct cpp_hashnode): Add deferred field.
(cpp_set_deferred_macro): Define.
(cpp_get_deferred_macro): Declare.
(cpp_macro_definition): Reformat, add overload.
(cpp_macro_definition_location): Deal with deferred macro.
(cpp_alloc_token_string, cpp_compare_macro): Declare.
* internal.h (_cpp_notify_macro_use): Return bool
(_cpp_maybe_notify_macro_use): Likewise.
* directives.c (do_undef): Check macro is not undef before
warning.
(do_ifdef, do_ifndef): Deal with deferred macro.
* expr.c (parse_defined): Likewise.
* lex.c (cpp_allocate_token_string): Break out of ...
(create_literal): ... here. Call it.
(cpp_maybe_module_directive): Deal with deferred macro.
* macro.c (cpp_get_token_1): Deal with deferred macro.
(warn_of_redefinition): Deal with deferred macro.
(compare_macros): Rename to ...
(cpp_compare_macro): ... here. Make extern.
(cpp_get_deferred_macro): New.
(_cpp_notify_macro_use): Deal with deferred macro, return bool
indicating definedness.
(cpp_macro_definition): Deal with deferred macro.
Various aapcs64 tests were failing at -O1 and above because
the assignments to testfunc_ptr were being deleted as dead.
That in turn happened because FUNC_VAL_CHECK hid the tail call
to myfunc using an LR asm trick:
asm volatile ("mov %0, x30" : "=r" (saved_return_address));
asm volatile ("mov x30, %0" : : "r" ((unsigned long long) myfunc));
and so the compiler couldn't see any calls that might read
testfunc_ptr.
That in itself could be fixed by adding a memory clobber to the
second asm above, forcing the compiler to keep both the testfunc_ptr
and the saved_return_address assignments. But since this is an ABI
test, it seems better to make sure that we don't do any IPA at all.
The fact that doing IPA caused a problem was kind-of helpful and
so it might be better to avoid making the test “work” in the
presence of IPA.
The patch therefore just replaced “noinline” with “noipa”.
gcc/testsuite/
* gcc.target/aarch64/aapcs64/abitest.h (FUNC_VAL_CHECK): Use
noipa rather than noinline.
* gcc.target/aarch64/aapcs64/abitest-2.h (FUNC_VAL_CHECK): Likewise.
This turns a mysterious segfault into an exception with a more useful
message. If the exception isn't caught, the user sees this instead of
just a segfault:
terminate called after throwing an instance of 'std::system_error'
what(): Enable multithreading to use std:🧵 Operation not permitted
Aborted (core dumped)
libstdc++-v3/ChangeLog:
PR libstdc++/67791
* src/c++11/thread.cc (thread::_M_start_thread(_State_ptr, void (*)())):
Check that gthreads is available before calling __gthread_create.
Most initialization of locales and facets happens before main() during
startup, when the program is likely to only have one thread. By using
the new __gnu_cxx::__is_single_threaded() function instead of checking
__gthread_active_p() we can avoid using pthread_once or atomics for the
common case.
That said, I'm not sure why we don't just use a local static variable
instead, as __cxa_guard_acquire() already optimizes for the
single-threaded case:
static const bool init = (_S_initialize_once(), true);
I'll revisit that for GCC 12.
libstdc++-v3/ChangeLog:
* src/c++98/locale.cc (locale::facet::_S_get_c_locale())
(locale:🆔:_M_id() const): Use __is_single_threaded.
* src/c++98/locale_init.cc (locale::_S_initialize()):
Likewise.
Ensure the code will continue to compile when elf.h gets these definitions.
libgomp/ChangeLog:
* plugin/plugin-gcn.c: Don't redefine relocations if elf.h has them.
(reserved): Delete unused define.
Commit 5d9ade39b8 ("IBM Z: Fix PR97326: Enable fp compares in
vec_cmp") made it possible to create rtxes that describe signaling
comparisons on z13, which are not supported by the hardware. Restrict
this by using vcond_comparison_operator predicate.
gcc/ChangeLog:
2020-11-24 Ilya Leoshkevich <iii@linux.ibm.com>
* config/s390/vector.md: Use vcond_comparison_operator
predicate.
Commit 229752afe3 ("VEC_COND_EXPR optimizations") has improved code
generation: we no longer need "vx x,x,-1", which turned out to be
superfluous. Instead, we simply swap 0 and -1 arguments of the
preceding "vsel".
gcc/testsuite/ChangeLog:
2020-11-23 Ilya Leoshkevich <iii@linux.ibm.com>
* gcc.target/s390/zvector/autovec-double-quiet-uneq.c: Expect
that "vx" is not emitted.
* gcc.target/s390/zvector/autovec-float-quiet-uneq.c: Likewise.
This patch implements the following set of changes:
1. If a component flag of -ffast-math (or -funsafe-math-optimizations)
is explicitly set (or reset) on the command line, this should override
any implicit change due to -f(no-)fast-math, no matter in which order
the flags come on the command line. This change affects all flags.
2. Any component flag modified from its default by -ffast-math should
be reset to the default by -fno-fast-math. This was previously
not done for the following flags:
-fcx-limited-range
-fexcess-precision=
3. Once -ffinite-math-only is true, the -f(no-)signaling-nans flag has
no meaning (if we have no NaNs at all, it does not matter whether
there is a difference between quiet and signaling NaNs). Therefore,
it does not make sense for -ffast-math to imply -fno-signaling-nans.
(This is also a documentation change.)
4. -ffast-math is documented to imply -fno-rounding-math, however the
latter setting is the default anyway; therefore it does not make
sense to try to modify it from its default setting.
5. The __FAST_MATH__ preprocessor macro should be defined if and only
if all the component flags of -ffast-math are set to the value that
is documented as the effect of -ffast-math. The following flags
were currently *not* so tested:
-fcx-limited-range
-fassociative-math
-freciprocal-math
-frounding-math
(Note that we should still *test* for -fno-rounding-math here even
though it is not set as to 4. -ffast-math -frounding-math should
not set the __FAST_MATH__ macro.)
This is also a documentation change.
2020-11-24 Ulrich Weigand <uweigand@de.ibm.com>
gcc/
* doc/invoke.texi (-ffast-math): Remove mention of -fno-signaling-nans.
Clarify conditions when __FAST_MATH__ preprocessor macro is defined.
* opts.c (common_handle_option): Pass OPTS_SET to set_fast_math_flags
and set_unsafe_math_optimizations_flags.
(set_fast_math_flags): Add OPTS_SET argument, and use it to avoid
setting flags already explicitly set on the command line. In the !set
case, also reset x_flag_cx_limited_range and x_flag_excess_precision.
Never reset x_flag_signaling_nans or x_flag_rounding_math.
(set_unsafe_math_optimizations_flags): Add OPTS_SET argument, and use
it to avoid setting flags already explicitly set on the command line.
(fast_math_flags_set_p): Also test x_flag_cx_limited_range,
x_flag_associative_math, x_flag_reciprocal_math, and
x_flag_rounding_math.
gcc/ada/
* exp_attr.adb (Expand_N_Attribute_Reference): Replace calls to
Sloc with a local constant Loc; remove call to
Analyze_And_Resolve and return, which is exactly what happens
anyway (and other branches in the Constrained declare block
appear to rely on analysis, resolution and returning happening
in all cases).
* sem_util.adb: Remove useless parens.
gcc/ada/
* sem_type.adb (Add_One_Interp.Is_Universal_Operation): Account
for universal_access = operator.
(Disambiguate): Take into account preference on universal_access
= operator when relevant.
(Disambiguate.Is_User_Defined_Anonymous_Access_Equality): New.
gcc/ada/
* exp_util.adb (Is_Finalizable_Transient): Take into account return
statements containing N_Expression_With_Actions. Also clean up a
condition to make it more readable.
* exp_ch6.adb: Fix typo.
gcc/ada/
* sem_ch13.adb (Validate_Literal_Aspect): Add support for named
numbers and in particular overload of the Real_Literal function.
* sem_res.adb (Resolve): Add support for named numbers in
Real_Literal and Integer_Literal resolution.
* einfo.adb, einfo.ads (Related_Expression,
Set_Related_Expression): Allow E_Function.
* uintp.ads (UI_Image_Max): Bump size of buffer to avoid loosing
precision.
* sem_eval.adb: Fix typo in comment.
* libgnat/a-nbnbin.adb, libgnat/a-nbnbin.ads (From_String):
Return a Valid_Big_Integer.
* libgnat/a-nbnbre.adb, libgnat/a-nbnbre.ads (From_String): New
variant taking two strings. Return a Valid_Big_Real.
gcc/ada/
* sem_ch12.adb (Analyze_Associations) <Explicit_Freeze_Check>: Test
that the instance is in a statement sequence instead of local scope.
(Freeze_Subprogram_Body): Use the special delayed placement with
regard to the parent instance only if its Sloc is strictly greater.
(Install_Body): Likewise.
gcc/ada/
* sem_ch13.adb (Validate_Literal_Aspect): Call to Base_Type
needed in order to correctly check result type of String_Literal
function when the first named subtype differs from the base
type (e.g.:
type T is range 1 .. 10 with String_Literal => ... ;
).
gcc/ada/
* sem_prag.adb (Analyze_Global_Item): Handle specially the
current instance of a PO.
* sem_util.ads (Is_Effectively_Volatile,
Is_Effectively_Volatile_For_Reading): Add parameter
Ignore_Protected.
* sem_util.adb (Is_Effectively_Volatile,
Is_Effectively_Volatile_For_Reading): Add parameter
Ignore_Protected to compute the query results ignoring protected
objects/types.
(Is_Effectively_Volatile_Object,
Is_Effectively_Volatile_Object_For_Reading): Adapt to new
signature.
gcc/ada/
* exp_ch6.adb (Add_Cond_Expression_Extra_Actual): Simplify
handling of function calls and remove bug in handling of
transient objects. Minor reformatting along the way.
gcc/ada/
* sem_aggr.adb (Resolve_Delta_Array_Aggregate): If the choice is
a subtype_indication then call
Resolve_Discrete_Subtype_Indication; both for choices
immediately inside array delta aggregates and inside
iterated_component_association within array delta aggregates.
gcc/ada/
* exp_spark.adb (Expand_SPARK_Array_Aggregate,
Expand_SPARK_N_Aggregate): Remove, no longer needed.
* sem_aggr.adb (Resolve_Iterated_Component_Association): Only
remove references in the analyzed expression when generating
code and the expression needs to be analyzed anew after being
rewritten into a loop.
As the following testcase shows, unlike char, int or long long sized
__builtin_*_overflow{,_p}, for short sized one in most cases the ce1 pass
doesn't optimize the jo/jno or jc/jnc jumps with setting of a pseudo to 0/1
into seto/setc. The reason is missing *setcc_hi_1* pattern. The following
patch implements it using mode iterators so that on i486 and pentium?
one can get the zero extension through and instead of movzbw.
2020-11-24 Jakub Jelinek <jakub@redhat.com>
PR target/97950
* config/i386/i386.md (*setcc_si_1_and): Macroize into...
(*setcc_<mode>_1_and): New define_insn_and_split with SWI24 iterator.
(*setcc_si_1_movzbl): Macroize into...
(*setcc_<mode>_1_movzbl): New define_insn_and_split with SWI24
iterator.
* gcc.target/i386/pr97950.c: New test.
Currently the __builtin_clear_padding expansion code emits no code for
full words that don't have any padding bits, and most of the time if
the only padding bytes are from the start of the word it attempts to merge
them with previous padding store (via {}) or if the only padding bytes are
from the end of the word, it attempts to merge it with following padding
bytes. For everything else it was using a RMW, except when it found
an aligned char/short/int covering all the padding bytes and all those
padding bytes were all ones in that store.
The following patch changes it, such that we only use RMW if the padding has
any bytes which have some padding and some non-padding bits (i.e. bitfields
are involved), often it is the same amount of instructions in the end and
avoids being thread-unsafe unless necessary (and avoids having to wait for
the reads to make it into the CPU). So, if there are no bitfields,
the function will just store some zero bytes, shorts, ints, long longs etc.
where needed.
2020-11-24 Jakub Jelinek <jakub@redhat.com>
* gimple-fold.c (clear_padding_flush): If a word contains only 0
or 0xff bytes of padding other than all set, all clear, all set
followed by all clear or all clear followed by all set, don't emit
a RMW operation on the whole word or parts of it, but instead
clear the individual bytes of padding. For paddings of one byte
size, don't use char[1] and {}, but instead just char and 0.
This testcase started failing with r8-2090 and works again starting
with r11-4755.
2020-11-24 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/97964
* gcc.dg/tree-ssa/pr97964.c: New test.
In particular, more precisely highlight what applies generally vs. the special
handling for the current 'parloops'-based OpenACC 'kernels' implementation.
gcc/
* omp-expand.c (expand_oacc_for): More explicit checking of which
OMP constructs we're expecting.
The diagnostics produced by 'dg-optimized', 'dg-missed' aren't error
diagnostics (fatal, meaning: causes compilation to fail) but rather warning
diagnostics (non-fatal, doesn't cause compilation to fail). Thus, same as
'dg-message', these should use 'saved-dg-warning' instead of 'saved-dg-error',
which then prints: "(test for *warnings*, line [...]) instead of currently:
"(test for *errors*, line [...])".
This is a small bug-fix for commit ed2d9d3720
"dumpfile.c: use prefixes other than 'note: ' for
MSG_{OPTIMIZED_LOCATIONS|MISSED_OPTIMIZATION}", which added 'dg-optimized',
'dg-missed'.
gcc/testsuite/
* lib/gcc-dg.exp (dg-optimized, dg-missed): Use 'saved-dg-warning'
instead of 'saved-dg-error'.
'process-message' would like the 'msgprefix' argument without trailing space.
This is a small bug-fix for commit ed2d9d3720
"dumpfile.c: use prefixes other than 'note: ' for
MSG_{OPTIMIZED_LOCATIONS|MISSED_OPTIMIZATION}", which added 'dg-optimized',
'dg-missed'.
gcc/testsuite/
* lib/gcc-dg.exp (dg-optimized, dg-missed): Fix 'process-message'
call.
* gcc.dg/vect/nodump-vect-opt-info-1.c: Demonstrate.
* gcc.dg/vect/nodump-vect-opt-info-2.c: Likewise.
c_parser_binary_expression was using build2 to create a temporary holder
for binary expression that c_parser_atomic and c_finish_omp_atomic can then
handle. The latter performs then all the needed checking.
Unfortunately, build2 performs some checking too, e.g. PLUS_EXPR vs.
POINTER_PLUS_EXPR or matching types of the arguments, nothing we can guarantee
at the parsing time. So we need something like C++ build_min_nt*. This
patch implements that inline.
2020-11-24 Jakub Jelinek <jakub@redhat.com>
PR c/97958
* c-parser.c (c_parser_binary_expression): For omp atomic binary
expressions, use make_node instead of build2 to avoid checking build2
performs.
* c-c++-common/gomp/pr97958.c: New test.
The PR38359 change made the -1 >> x to -1 optimization less useful by
requiring that the x must be non-negative.
Shifts by negative amount are UB, but we for historic reasons had in some
(but not all) places some hack to treat shifts by negative value as the
other direction shifts by the negated amount.
The following patch just removes that special handling, instead we punt on
optimizing those (and ideally path isolation should catch that up and turn
those into __builtin_unreachable, perhaps with __builtin_warning next to
it). Folding the shifts in some places as if they were rotates and in other
as if they were saturating just leads to inconsistencies.
For C++ constexpr diagnostics and -fpermissive, I've added code to pretend
fold-const.c has not changed, without -fpermissive it will be an error
anyway and I think it is better not to change all the diagnostics.
During x86_64-linux and i686-linux bootstrap/regtest, my statistics
gathering patch noted 185 unique -m32/-m64 x TU x function_name x shift_kind
x fold-const/tree-ssa-ccp cases. I have investigated the
64 ../../gcc/config/i386/i386.c x86_output_aligned_bss LSHIFT_EXPR wide_int_bitop
64 ../../gcc/config/i386/i386-expand.c emit_memmov LSHIFT_EXPR wide_int_bitop
64 ../../gcc/config/i386/i386-expand.c ix86_expand_carry_flag_compare LSHIFT_EXPR wide_int_bitop
64 ../../gcc/expmed.c expand_divmod LSHIFT_EXPR wide_int_bitop
64 ../../gcc/lra-lives.c process_bb_lives LSHIFT_EXPR wide_int_bitop
64 ../../gcc/rtlanal.c nonzero_bits1 LSHIFT_EXPR wide_int_bitop
64 ../../gcc/varasm.c optimize_constant_pool.isra LSHIFT_EXPR wide_int_bitop
cases and all of them are either during jump threading (dom) or during PRE.
For jump threading, the most common case is 1 << floor_log2 (whatever) where
floor_log2 is return HOST_BITS_PER_WIDE_INT - 1 - clz_hwi (x);
and clz_hwi is if (x == 0) return HOST_BITS_PER_WIDE_INT; return __builtin_clz* (x);
and so has range [-1, 63] and a comparison against == 0 which makes the
threader think it might be nice to jump thread the case leading to 1 << -1.
I think it is better to keep the 1 << -1 s in the IL for this and let path
isolation turn that into __builtin_unreachable () if the user wishes so.
2020-11-24 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/96929
* fold-const.c (wide_int_binop) <case LSHIFT_EXPR, case RSHIFT_EXPR>:
Return false on negative second argument rather than trying to handle
it as shift in the other direction.
* tree-ssa-ccp.c (bit_value_binop) <case LSHIFT_EXPR,
case RSHIFT_EXPR>: Punt on negative shift count rather than trying
to handle it as shift in the other direction.
* match.pd (-1 >> x to -1): Remove tree_expr_nonnegative_p check.
* constexpr.c (cxx_eval_binary_expression): For shifts by constant
with MSB set, emulate older wide_int_binop behavior to preserve
diagnostics and -fpermissive behavior.
* gcc.dg/tree-ssa/pr96929.c: New test.
Commit r11-3393 improved the epilogue loop handling of partial
vectors and we won't use partial vectors to vectorize a single
iteration scalar loop any more.
The affected test cases have only one single iteration in their
epilogues, so we shouldn't expect the vectorization with
partial vector there.
Tested with explicit --param=vect-partial-vector-usage=1 and
default enablement.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/slp-perm-1.c: Adjust for partial vectors.
* gcc.dg/vect/slp-perm-5.c: Likewise.
* gcc.dg/vect/slp-perm-6.c: Likewise.
* gcc.dg/vect/slp-perm-7.c: Likewise.