According to
https://gcc.gnu.org/pipermail/gcc/2020-November/234344.html, GCC is
allowed to perform optimizations that remove floating point traps,
since they do not affect the modeled control flow. This interferes with
two signaling comparison tests, where (a <= b && a >= b) is turned into
(a <= b && a == b) by test_for_singularity, into ((a <= b) & (a == b))
by the vectorizer and then into (a == b) by eliminate_redundant_comparison.
Fix by making traps affect the control flow by turning them into
exceptions.
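For reference, a minimal sketch (not the actual testcases) of the kind of
signaling comparison involved: with -fsignaling-nans each ordered comparison
must raise the invalid-operation exception on a NaN operand, which the folded
quiet (a == b) would not.

  /* Sketch only.  */
  int
  within_bounds (double a, double b)
  {
    /* Both ordered comparisons may trap on a NaN operand; folding the
       pair down to a quiet equality would lose the trap.  */
    return a <= b && a >= b;
  }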
gcc/testsuite/ChangeLog:
2020-12-03 Ilya Leoshkevich <iii@linux.ibm.com>
* gcc.target/s390/zvector/autovec-double-signaling-eq.c: Build
with exceptions.
* gcc.target/s390/zvector/autovec-float-signaling-eq.c:
Likewise.
These are the same header files that exist in the Radeon Open Compute Runtime
project (as of October 2020), but they have been specially relicensed by AMD
for use in GCC.
The header files retain AMD copyright.
include/ChangeLog:
* hsa.h: Replace whole file.
* hsa_ext_amd.h: New file.
* hsa_ext_image.h: New file.
libgomp/ChangeLog:
* plugin/plugin-gcn.c: Include hsa_ext_amd.h.
(HSA_AMD_AGENT_INFO_COMPUTE_UNIT_COUNT): Delete redundant definition.
This avoids ICEing by making sure to propagate the error early.
2020-12-09 Richard Biener <rguenther@suse.de>
PR c/98200
gcc/c/
* gimple-parser.c (c_parser_gimple_postfix_expression): Return
early on error.
gcc/testsuite/
* gcc.dg/gimplefe-error-8.c: New testcase.
Fix to 'omp scan' commit 005cff4e2e
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/reduction4.f90: Update scan-trees, add
lost testcase; move test with FE error to ...
* gfortran.dg/gomp/reduction5.f90: ... here.
With the bit_cast changes, I have added support for bitfields which don't
have scalar representatives. For bit_cast it works fine, as when mask
is non-NULL, off is asserted to be 0. But when native_encode_initializer
is called e.g. from sccvn with off > 0 (i.e. we are interested in encoding
just a few bytes out of it somewhere from the middle or at the end), the
following computations are incorrect.
pos is a byte position from the start of the constructor, repr_size is the
size in bytes of the bit-field representative and len is the length
of the buffer. If the buffer is offset by a positive off, though, those numbers
are not directly comparable; we need to add off to len to make both
count bytes from the start of the constructor. o is a utility temporary
set to off != -1 ? off : 0 (because off == -1 also means starting at offset 0,
just forcing the special behavior).
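As a hedged sketch (illustrative names, not the actual fold-const.c code),
the corrected overlap test effectively compares everything in
constructor-relative byte offsets:

  /* Sketch only: the len-byte buffer starts off bytes into the constructor
     (off == -1 meaning 0), so normalize before comparing against pos.  */
  static int
  repr_overlaps_buffer (long pos, long repr_size, long off, long len)
  {
    long o = off != -1 ? off : 0;
    return pos < o + len && pos + repr_size > o;
  }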
2020-12-09 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/98199
* fold-const.c (native_encode_initializer): Fix handling bit-fields
when off > 0.
* gcc.c-torture/compile/pr98199.c: New test.
When native_encode_initializer is called with non-NULL mask (i.e. ATM
bit_cast only), it checks if the current index in the CONSTRUCTOR (if any)
is the next initializable FIELD_DECL, and if not, decrements cnt and
performs the iteration with that FIELD_DECL as field and val of zero
(so that it computes mask properly). As the testcase shows, I forgot to
set pos to the byte position of the field, though (like it is done
e.g. for index-referenced FIELD_DECLs in the constructor).
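A hedged illustration (not the actual bit-cast7.C test) of the kind of code
that can reach this path: an aggregate whose written initializer omits a
field, consumed by a constant-evaluated bit_cast.

  // Sketch only; requires -std=c++20 and assumes 32-bit unsigned int.
  #include <bit>

  struct S { unsigned int a : 16; unsigned int b : 16; };

  constexpr S s{.b = 0x1234};   // no initializer written for a
  constexpr unsigned int r = std::bit_cast<unsigned int>(s);
  static_assert(r != 0);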
2020-12-09 Jakub Jelinek <jakub@redhat.com>
PR c++/98193
* fold-const.c (native_encode_initializer): Set pos to field's
byte position if iterating over a field with missing initializer.
* g++.dg/cpp2a/bit-cast7.C: New test.
If we aren't really evaluating the expression, it doesn't matter that the
return value is discarded.
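A hedged sketch of the situation (assumed to be along the lines of the
testcase): a discarded call to a [[nodiscard]] function inside a
requires-expression, which is not really evaluated and so should not warn.

  // Sketch only; requires -std=c++20.
  [[nodiscard]] int f ();

  template<typename T>
  concept usable = requires { f (); };   // f() is never actually evaluated

  static_assert(usable<int>);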
gcc/cp/ChangeLog:
PR c++/98019
* cvt.c (maybe_warn_nodiscard): Check c_inhibit_evaluation_warnings.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/concepts-nodiscard1.C: Remove xfail.
Jakub noticed that in build_new_1 we needed to add tf_no_cleanup to avoid
building a cleanup for a TARGET_EXPR that we already know is going to be
used to initialize something, so the cleanup will never be run. The best
place to add it is close to where we build the INIT_EXPR; doing so in
cp_build_modify_expr fixes the single-object new, and doing so in
expand_default_init fixes array new.
Co-authored-by: Jakub Jelinek <jakub@redhat.com>
gcc/cp/ChangeLog:
PR c++/59238
* init.c (expand_default_init): Pass tf_no_cleanup when building
a TARGET_EXPR to go on the RHS of an INIT_EXPR.
* typeck.c (cp_build_modify_expr): Likewise.
gcc/testsuite/ChangeLog:
PR c++/59238
* g++.dg/cpp0x/new4.C: New test.
-fsanitize=vptr initializes all vtable pointers to null so that it can
catch invalid calls; see cp_ubsan_maybe_initialize_vtbl_ptrs. That
means that evaluating a vtable reference can produce a null pointer
in this mode, so cxx_eval_dynamic_cast_fn should check that and give
an error.
gcc/cp/ChangeLog:
PR c++/98103
* constexpr.c (cxx_eval_dynamic_cast_fn): If evaluating the vtable
yields a null pointer, give an error and return. Use objtype.
gcc/testsuite/ChangeLog:
PR c++/98103
* g++.dg/ubsan/vptr-18.C: New test.
With modules, streamed entities have two new properties -- the module
that declares them and the module that instantiates them. Here
'instantiate' applies to more than just templates -- for instance an
implicit member fn. These may well be the same module. This adds the
calls to places that need it.
gcc/cp/
* class.c (layout_class_type): Call set_instantiating_module.
(build_self_reference): Likewise.
* decl.c (grokfndecl): Call set_originating_module.
(grokvardecl): Likewise.
(grokdeclarator): Likewise.
* pt.c (maybe_new_partial_specialization): Call
set_instantiating_module, propagate DECL_MODULE_EXPORT_P.
(lookup_template_class_1): Likewise.
(tsubst_function_decl): Likewise.
(tsubst_decl, instantiate_template_1): Likewise.
(build_template_decl): Propagate module flags.
(tsubst_template_dcl): Likewise.
(finish_concept_definition): Call set_originating_module.
* module.cc (set_instantiating_module, set_originating_module): Stubs.
I thought I had implemented P1186R3, but apparently I didn't read it closely
enough to understand the point of the paper, namely that for a defaulted
operator<=>, if a member type doesn't have a viable operator<=>, we will use
its operator< and operator== if the defaulted operator has a specific
comparison category as its return type; the compiler can't guess if it
should be strong_ordering or something else, but the user can make that
choice explicit.
The libstdc++ test change was necessary because of the change in
genericize_spaceship from op0 > op1 to op1 < op0; this should be equivalent,
but isn't because of PR88173.
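A hedged sketch of the P1186R3 rule (illustrative types, not the new
testcase): the member type provides only < and ==, so the defaulted
operator<=> can be synthesized only when the user names the comparison
category explicitly.

  // Sketch only; requires -std=c++20.
  #include <compare>

  struct L
  {
    int v;
    bool operator< (const L&) const;
    bool operator== (const L&) const;
  };

  struct W
  {
    L m;
    // OK under P1186R3: synthesized from L's operator< and operator==.
    std::strong_ordering operator<=> (const W&) const = default;
    // With 'auto' as the return type this would be defined as deleted,
    // since the compiler cannot guess the comparison category.
  };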
gcc/cp/ChangeLog:
PR c++/96299
* cp-tree.h (build_new_op): Add overload that omits some parms.
(genericize_spaceship): Add location_t parm.
* constexpr.c (cxx_eval_binary_expression): Pass it.
* cp-gimplify.c (genericize_spaceship): Pass it.
* method.c (genericize_spaceship): Handle class-type arguments.
(build_comparison_op): Fall back to op</== when appropriate.
gcc/testsuite/ChangeLog:
PR c++/96299
* g++.dg/cpp2a/spaceship-synth-neg2.C: Move error.
* g++.dg/cpp2a/spaceship-p1186.C: New test.
libstdc++-v3/ChangeLog:
PR c++/96299
* testsuite/18_support/comparisons/algorithms/partial_order.cc:
One more line needs to use VERIFY instead of static_assert.
Several recent C++ features are specified to try overload resolution, and if
no viable candidate is found, do something else. But our error return
doesn't distinguish between that situation and finding multiple viable
candidates that end up being ambiguous. We're already trying to separately
return the single function we found even if it ends up being ill-formed for
some reason; for ambiguity let's pass back error_mark_node, to be
distinguished from NULL_TREE meaning no viable candidate.
gcc/cp/ChangeLog:
* call.c (build_new_op_1): Set *overload for ambiguity.
(build_new_method_call_1): Likewise.
When the atomic access involves a call to __sync_synchronize
it is better to call __cxa_guard_acquire unconditionally,
since it handles the atomics too, or is a non-threaded
implementation when there is no gthread support for this target.
This also fixes a bug for the ARM EABI big-endian target:
previously the wrong bit was checked.
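For context, a hedged sketch of the construct involved: dynamic
initialization of a function-local static, which is guarded by an inline
atomic check of the guard variable before falling back to
__cxa_guard_acquire.

  // Sketch only.
  int compute ();                // initializer that must run at run time

  int counter ()
  {
    static int c = compute ();   // guard check emitted around this
    return ++c;
  }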
2020-12-08 Bernd Edlinger <bernd.edlinger@hotmail.de>
* decl2.c (is_atomic_expensive_p): New helper function.
(build_atomic_load_byte): Rename to...
(build_atomic_load_type): ... and add new parameter type.
(get_guard_cond): Skip the atomic here if that is expensive.
Use the correct type for the atomic load on certain targets.
We need to expose build_cdtor_clones; it fortunately has the desired
API -- gosh, how did that happen? :) The template machinery will need
to cache path-of-instantiation information, so add two more fields to
the tinst_level struct. I also had to adjust the
match_mergeable_specialization API since adding it, so that change is
included too.
gcc/cp/
* cp-tree.h (struct tinst_level): Add path & visible fields.
(build_cdtor_clones): Declare.
(match_mergeable_specialization): Use a spec_entry, add insert parm.
* class.c (build_cdtor_clones): Externalize.
* pt.c (push_tinst_level_loc): Clear new fields.
(match_mergeable_specialization): Adjust API.
Here are the couple of raw accessors I make use of in the module streaming.
gcc/
* tree.h (DECL_ALIGN_RAW): New.
(DECL_ALIGN): Use it.
(DECL_WARN_IF_NOT_ALIGN_RAW): New.
(DECL_WARN_IF_NOT_ALIGN): Use it.
(SET_DECL_WARN_IF_NOT_ALIGN): Likewise.
C++20 modules add some new rules about when the global initializers
of imported modules run. They must run no later than before any
initializers in the importer that appear after the import. To provide
this, each named module emits an idempotent global initializer that
calls the global initializer functions of its imports (these of course
may call further import initializers). This is the machinery in our
global-init emission to accomplish that, other than the actual
emission of calls, which is in the module file. The naming of this
global init is a new piece of the ABI.
FWIW, the module's emitter does some optimization to avoid calling a
direct import's initializer when it can determine that import is also
indirect.
gcc/cp/
* decl2.c (start_objects): Refactor and adjust for named module
initializers.
(finish_objects): Likewise.
(generate_ctor_or_dtor_function): Likewise.
* module.cc (module_initializer_kind)
(module_add_import_initializers): Stubs.
The documentation says:
For a named pattern, the condition may not depend on the data in
the insn being matched, but only the target-machine-type flags.
The i386 backend violates that by using flag_excess_precision and
flag_unsafe_math_optimizations in the conditions too, which is bad
when the optimize attribute or pragmas are used. The problem is that the
middle-end caches the enabled conditions for the optabs for a particular
switchable target, but multiple functions can share the same
TARGET_OPTION_NODE while having different TREE_OPTIMIZATION_NODEs with
different flag_excess_precision or flag_unsafe_math_optimizations, so the
enabled conditions then match only one of those.
I think best would be to just have a single options node for both the
generic and target options, then such problems wouldn't exist, but that
would be very risky at this point and quite a large change.
So, instead, the following patch just shadows the flag_excess_precision and
flag_unsafe_math_optimizations values for use in the instruction conditions
via TargetVariables, and during set_cfun artificially creates a new
TARGET_OPTION_NODE if flag_excess_precision and/or
flag_unsafe_math_optimizations differ from what is recorded in the function's
TARGET_OPTION_NODE. The target nodes are hashed, so in the worst case we can
get 4 times as many target option nodes, if for each unique target option one
tried all the flag_excess_precision and flag_unsafe_math_optimizations values.
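A hedged illustration of the trigger (illustrative functions, not the PR
testcase): two functions differing only in optimization flags, and hence in
flag_unsafe_math_optimizations, which can nevertheless share the same target
option node.

  /* Sketch only.  */
  __attribute__((optimize ("fast-math")))
  double fast_div (double x, double y) { return x / y; }

  double strict_div (double x, double y) { return x / y; }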
2020-12-08 Jakub Jelinek <jakub@redhat.com>
PR target/94440
* config/i386/i386.opt (ix86_excess_precision,
ix86_unsafe_math_optimizations): New TargetVariables.
* config/i386/i386.h (X87_ENABLE_ARITH, X87_ENABLE_FLOAT): Use
ix86_unsafe_math_optimizations instead of
flag_unsafe_math_optimizations and ix86_excess_precision instead of
flag_excess_precision.
* config/i386/i386.c (ix86_excess_precision): Rename to ...
(ix86_get_excess_precision): ... this.
(TARGET_C_EXCESS_PRECISION): Define to ix86_get_excess_precision.
* config/i386/i386-options.c (ix86_valid_target_attribute_tree,
ix86_option_override_internal): Update ix86_unsafe_math_optimizations
from flag_unsafe_math_optimizations and ix86_excess_precision
from flag_excess_precision when constructing target option nodes.
(ix86_set_current_function): If flag_unsafe_math_optimizations
or flag_excess_precision is different from the one recorded
in TARGET_OPTION_NODE, create a new target option node for the
current function and switch to that.
Adding includes to module.cc triggered the kind of build failure I
wanted to check for. In this case it was MODULE_VERSION not being
defined, and module.cc's internal #error triggering. I've relaxed the
check in Make-lang, so we provide MODULE_VERSION when DEVPHASE is not
empty (rather than when it is 'experimental'). AFAICT devphase is
empty for release builds, and the #error will force us to decide
whether modules is sufficiently baked at that point.
gcc/cp
* Make-lang.in (MODULE_VERSION): Override when DEVPHASE not empty.
* module.cc: Comment.
This is the mangling changes for modules. These were developed in
collaboration with clang, which also implements the same ABI (or
plans to, I do not think the global init is in clang). The global
init mangling is captured in
https://github.com/itanium-cxx-abi/cxx-abi/issues/99
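For illustration (hypothetical module name, and assuming the scheme
discussed in that issue), a module-attached entity and its module's global
initializer mangle along these lines:

  // Sketch only; requires -fmodules-ts.
  export module foo;

  export int bar () { return 0; }   // mangles as _ZW3foo3barv
                                    // module initializer: _ZGIW3foo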
gcc/cp/
* cp-tree.h (mangle_module_substitution, mangle_identifier)
(mangle_module_global_init): Declare.
* mangle.c (struct globals): Add mod field.
(mangle_module_substitution, mangle_identifier)
(mangle_module_global_init): Define.
(write_module, maybe_write_module): New.
(write_name): Call it.
(start_mangling): Clear mod field.
(finish_mangling_internal): Adjust.
* module.cc (mangle_module, mangle_module_fini)
(get_originating_module): Stubs.
As mentioned in the preprocessor patches, there's a new kind of
preprocessor directive for modules, and it interacts with the
compiler proper, which has to stream in header-unit macro
information (when the directive is an import that names a
header-unit). This is that machinery. It's an FSM that inspects the
token stream and does the minimal parsing to detect such imports.
This ends up being called from the C++ parser's tokenizer and from the
-E tokenizer (via a lang hook). The actual module streaming is a stub
here.
gcc/cp/
* cp-tree.h (module_token_pre, module_token_cdtor)
(module_token_lang): Declare.
* lex.c: Include langhooks.
(struct module_token_filter): New.
(module_token_pre, module_token_cdtor, module_token_lang): Define.
* module.cc (get_module, preprocess_module, preprocessed_module):
Nop stubs.
Two recent AVX512 tests FAIL on Solaris/x86 with /bin/as:
FAIL: gcc.target/i386/avx512vpopcntdq-pr97770-2.c (test for excess errors)
Excess errors:
Assembler: avx512vpopcntdq-pr97770-2.c
"/var/tmp//ccM4Gt1a.s", line 171 : Illegal mnemonic
Near line: " vpopcntd (%eax), %zmm0"
"/var/tmp//ccM4Gt1a.s", line 171 : Syntax error
Near line: " vpopcntd (%eax), %zmm0"
FAIL: gcc.target/i386/avx512vpopcntdqvl-pr97770-1.c (test for excess errors)
similarly.
Fixed as follows.
Tested on i386-pc-solaris2.11 with as and gas and x86_64-pc-linux-gnu.
2020-12-07 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>
gcc/testsuite:
* gcc.target/i386/avx512vpopcntdq-pr97770-2.c: Require
avx512vpopcntdq support.
* gcc.target/i386/avx512vpopcntdqvl-pr97770-1.c: Require
avx512vpopcntdq, avx512vl support.
The new gcc.target/i386/pr98100.c test FAILs on Solaris/x86:
FAIL: gcc.target/i386/pr98100.c (test for excess errors)
Excess errors:
/vol/gcc/src/hg/master/local/gcc/testsuite/gcc.target/i386/pr98100.c:6:1: error: the call requires 'ifunc', which is not supported by this target
Fixed as follows.
Tested on i386-pc-solaris2.11 and x86_64-pc-linux-gnu.
2020-12-07 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>
gcc/testsuite:
* gcc.target/i386/pr98100.c: Require ifunc support.
This makes sure to clear the vector pointer on release.
2020-12-08 Richard Biener <rguenther@suse.de>
PR tree-optimization/98192
* tree-vect-slp.c (vect_build_slp_instance): Get scalar_stmts
by reference.
We require a vector-by-scalar shift; there's no appropriate target
selector, so use SSE2 for now.
2020-12-08 Richard Biener <rguenther@suse.de>
PR testsuite/95900
* gcc.dg/vect/bb-slp-pr95866.c: Require sse2 for the
BIT_FIELD_REF match.
These tests violated strict aliasing; fixed by using a union and
type punning through it.
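A hedged sketch of the shape of the fix (not the actual CALC bodies):
compute the reference value by punning through a union rather than by
casting between double and integer pointers.

  /* Sketch only; assumes 64-bit unsigned long long.  */
  static double
  calc_andn (double a, double b)
  {
    union { double d; unsigned long long u; } ua = { a }, ub = { b }, r;
    r.u = ~ua.u & ub.u;   /* bitwise ANDN on the representations */
    return r.d;
  }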
2020-12-08 Jakub Jelinek <jakub@redhat.com>
* gcc.target/i386/avx512dq-vandnpd-2.c (CALC): Use union
to avoid aliasing violations.
* gcc.target/i386/avx512dq-vandnps-2.c (CALC): Likewise.
* gcc.target/i386/avx512dq-vandpd-2.c (CALC): Likewise.
* gcc.target/i386/avx512dq-vandps-2.c (CALC): Likewise.
* gcc.target/i386/avx512dq-vorpd-2.c (CALC): Likewise.
* gcc.target/i386/avx512dq-vorps-2.c (CALC): Likewise.
* gcc.target/i386/avx512dq-vxorpd-2.c (CALC): Likewise.
* gcc.target/i386/avx512dq-vxorps-2.c (CALC): Likewise.
This patch fixes two bugs in the -fopenmp-simd support. One is that
in C++ #pragma omp parallel master would actually create OMP_PARALLEL
in the IL, which is a big no-no for -fopenmp-simd; we should be creating
only the constructs -fopenmp-simd handles (mainly OMP_SIMD, OMP_LOOP, which
is gimplified as simd in that case, declare simd/reduction and ordered simd).
The other bug was that #pragma omp master taskloop simd combined construct
contains simd and thus should be recognized as #pragma omp simd (with only
the simd applicable clauses), but as master wasn't included in
omp_pragmas_simd, we'd ignore it completely instead.
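A hedged illustration of the two constructs (not the new testcase): with
-fopenmp-simd only the simd parts below should be honored, and no
OMP_PARALLEL should ever be created.

  /* Sketch only; compile with -fopenmp-simd.  */
  void
  f (float *a, float *b, int n)
  {
    #pragma omp parallel master
    a[0] = b[0];

    #pragma omp master taskloop simd
    for (int i = 0; i < n; i++)
      a[i] += b[i];
  }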
2020-12-08 Jakub Jelinek <jakub@redhat.com>
PR c++/98187
* c-pragma.c (omp_pragmas): Remove "master".
(omp_pragmas_simd): Add "master".
* parser.c (cp_parser_omp_parallel): For parallel master with
-fopenmp-simd only, just call cp_parser_omp_master instead of
wrapping it in OMP_PARALLEL.
* c-c++-common/gomp/pr98187.c: New test.
This adds a missing check.
2020-12-08 Richard Biener <rguenther@suse.de>
PR tree-optimization/98191
* tree-vect-slp.c (vect_slp_check_for_constructors): Do not
follow a non-SSA def chain.
* gcc.dg/torture/pr98191.c: New testcase.
This fixes sinking of loads when irreducible regions are involved
and the heuristic that looks for stores on the path to the sink
location breaks down, since it uses dominator queries.
2020-12-08 Richard Biener <rguenther@suse.de>
PR tree-optimization/97559
* tree-ssa-sink.c (statement_sink_location): Never ignore
PHIs on sink paths in irreducible regions.
* gcc.dg/torture/pr97559-1.c: New testcase.
* gcc.dg/torture/pr97559-2.c: Likewise.
gcc/
2020-12-08 Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
PR target/97872
* gimple-isel.cc (gimple_expand_vec_cond_expr): Try to fold
x CMP y ? -1 : 0 to x CMP y.
gcc/testsuite/
2020-12-08 Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
PR target/97872
* gcc.target/arm/pr97872.c: New test.
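A hedged source-level illustration of the folded pattern (not the arm
testcase): once vectorized, the conditional below becomes a VEC_COND_EXPR
whose arms are all-ones and zero, which can be reduced to the comparison
itself.

  /* Sketch only.  */
  void
  f (int *restrict r, const int *restrict x, const int *restrict y, int n)
  {
    for (int i = 0; i < n; i++)
      r[i] = x[i] < y[i] ? -1 : 0;
  }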
This adds a missing check for the first inserted value.
2020-12-08 Richard Biener <rguenther@suse.de>
PR tree-optimization/98180
* tree-vect-slp.c (vect_slp_check_for_constructors): Check the
first inserted value has a def.
This forces the scalarization of the testcase on PowerPC.
gcc/testsuite/ChangeLog:
PR target/96470
* gnat.dg/opt39.adb: Add dg-additional-options for PowerPC.
The very recent addition of the if_to_switch pass has partially disabled
the optimization added back in June to optimize_range_tests_to_bit_test,
as witnessed by the 3 new failures in the gnat.dg testsuite. It turns out
that both tree-ssa-reassoc.c and tree-switch-conversion.c can turn things
into bit tests, so the optimization is added to bit_test_cluster::emit too.
The patch also contains a secondary optimization, whereby the full bit-test
sequence is sent to the folder before being gimplified in case there is only
one test, so that the optimal sequence (bt + jc on x86) can be emitted like
with optimize_range_tests_to_bit_test.
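A hedged example of the kind of test both passes can now turn into a single
bit test (so that bt + jc can be emitted on x86):

  /* Sketch only.  */
  int
  is_blank_like (unsigned char c)
  {
    return c == ' ' || c == '\t' || c == '\n'
           || c == '\v' || c == '\f' || c == '\r';
  }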
gcc/ChangeLog:
PR tree-optimization/96344
* tree-switch-conversion.c (bit_test_cluster::emit): Compute the
range only if an entry test is necessary. Merge the entry test in
the bit test when possible. Use PREC local variable consistently.
When there is only one test, do a single gimplification at the end.
We'll try to canonicalize the arch string for --with-arch, and the script
doing that is written in python; however, it turns out this makes python
a requirement for building the RISC-V port of GCC, which is not intended
to be a GCC requirement.
So this patch makes the canonicalization optional: detect python and only
use it when it is available. Nothing breaks without the canonicalization;
we just might build one more redundant multilib.
gcc/ChangeLog:
PR target/98152
* config.gcc (riscv*-*-*): Check whether python, python3 or python2
is available, and skip the with_arch canonicalization if no python
is available.
To handle atomic loads correctly, we need to move the code that
drops qualifiers in lvalue conversion after the code that
handles atomics.
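A hedged illustration of the kind of access affected (not the new
testcase): lvalue conversion of a qualified atomic object must go through
the atomic-load path, with the qualifiers dropped only afterwards.

  /* Sketch only.  */
  int
  load_it (const _Atomic int *p)
  {
    return *p;   /* atomic load; the result is a plain int */
  }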
2020-12-07 Martin Uecker <muecker@gwdg.de>
gcc/c/
PR c/97981
* c-typeck.c (convert_lvalue_to_rvalue): Move the code
that drops qualifiers to the end of the function.
gcc/testsuite/
PR c/97981
* gcc.dg/pr97981.c: New test.
* gcc.dg/pr60195.c: Adapt test.
The function that calls targetm.emit_call_builtin___clear_cache
asserts that each of the begin and end operands has either ptr_mode or
Pmode.
On most targets that is the same mode, but e.g. on aarch64 -mabi=ilp32
or a few others it is different. When a target has a non-library
clear-cache handler, it will use create_address_operand, which will do the
conversion to the right mode automatically, but when emitting a library
call, we just say the operands are ptr_mode even when they can be Pmode
too; in that case we need to convert explicitly.
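For reference, a hedged example of the user-level builtin that reaches this
path; on such targets the library-call case now converts both pointer
operands explicitly.

  /* Sketch only.  */
  void
  flush_range (void *begin, void *end)
  {
    __builtin___clear_cache (begin, end);
  }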
2020-12-07 Jakub Jelinek <jakub@redhat.com>
PR target/98147
* builtins.c (default_emit_call_builtin___clear_cache): Call
convert_memory_address to ptr_mode on both begin and end.
* gcc.dg/pr98147.c: New test.