Conditionally #undef some more names that are used in system headers.
libstdc++-v3/ChangeLog:
PR libstdc++/97088
* testsuite/17_intro/names.cc: Undef more names for newlib and
also for arm-none-linux-gnueabi.
* testsuite/experimental/names.cc: Disable PCH.
The dg-shouldfail testcase libgomp.c-c++-common/struct-elem-5.c does not
properly fail for non-shared address space offloading. Adjust testcase
to limit testing only for "target offload_device_nonshared_as".
libgomp/ChangeLog:
PR testsuite/101114
* testsuite/libgomp.c-c++-common/struct-elem-5.c:
Add "target offload_device_nonshared_as" condition for enabling test.
The __bit_cast function was a hack to achieve what __builtin_bit_cast
can do, therefore use __builtin_bit_cast if possible. However,
__builtin_bit_cast cannot be used to cast from/to fixed_size_simd, since
it isn't trivially copyable (in the language sense — in principle it
is). Therefore add __proposed::simd_bit_cast to enable the use case
required in the test framework.
Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:
* include/experimental/bits/simd.h (__bit_cast): Implement via
__builtin_bit_cast #if available.
(__proposed::simd_bit_cast): Add overloads for simd and
simd_mask, which use __builtin_bit_cast (or __bit_cast #if not
available), which return an object of the requested type with
the same bits as the argument.
* include/experimental/bits/simd_math.h: Use simd_bit_cast
instead of __bit_cast to allow casts to fixed_size_simd.
(copysign): Remove branch that was only required if __bit_cast
cannot be constexpr.
* testsuite/experimental/simd/tests/bits/test_values.h: Switch
from __bit_cast to __proposed::simd_bit_cast since the former
will not cast fixed_size objects anymore.
This fixes an ICE with failed backedge SLP nodes still in the graph
while doing permute optimization by explicitely handling those.
2021-06-25 Richard Biener <rguenther@suse.de>
PR tree-optimization/101202
* tree-vect-slp.c (vect_optimize_slp): Explicitely handle
failed nodes.
* gcc.dg/torture/pr101202.c: New testcase.
gcc.dg/vect/pr96854.c shows we need to copy over reduction info
to the SLP pattern as already done for the complex patterns.
2021-06-25 Richard Biener <rguenther@suse.de>
* tree-vect-slp-patterns.c (addsub_pattern::build): Copy
STMT_VINFO_REDUC_DEF from the original representative.
This adds a -ltrans-objects option to lto-plugin that by-passes
lto-wrapper invocation and instead feeds LD the final LTRANS objects
directly from the response file given as argument to the option.
This allows LD issues involving the linker-plugin path to be
debugged in an easier way with just the IR objects (their symtab)
and the LTRANS objects as testcase.
I've tested the path re-building stage2 build/genmatch from an
LTO bootstrap and got a bit-identical executable by adding
-plugin-opt=-ltrans-objects=y to the original collect2 invocation,
seeding y with the final objects as printed by building genmatch
with -save-temps -v.
2021-06-22 Richard Biener <rguenther@suse.de>
lto-plugin/
* lto-plugin.c (ltrans_objects): New global.
(all_symbols_read_handler): If -ltrans-objects was specified,
add the output files from the specified file directly.
(process_option): Handle -ltrans-objects.
On MIPS a call to __stack_chk_fail needs an additional .reloc pseudo-op,
so "stack_chk_fail" will appear two times.
gcc/testsuite/
* g++.dg/no-stack-protector-attr-3.C (dg-final): Adjust for MIPS.
Still put general regs as first alloca order.
gcc/ChangeLog:
PR target/101185
* config/i386/i386.c (x86_order_regs_for_local_alloc):
Revert r12-1669.
gcc/testsuite/ChangeLog
PR target/101185
* gcc.target/i386/bitwise_mask_op-3.c: Add xfail to
temporarily avoid regression, eventually xfail should be
removed.
Register a relation on a conditional edge only if the LHS supports
this edge being taken.
gcc/
PR tree-optimization/101189
* gimple-range-fold.cc (fold_using_range::range_of_range_op): Pass
LHS range of condition to postfold routine.
(fold_using_range::postfold_gcond_edges): Only process the TRUE or
FALSE edge if the LHS range supports it being taken.
* gimple-range-fold.h (postfold_gcond_edges): Add range parameter.
gcc/testsuite/
* gcc.dg/tree-ssa/pr101189.c: New.
When looking for relations between equivalencies, a typo was causing
the wrong bitmap to be checked. Effect was is missed them.
Plus don't dump blocks which don't exist.
* value-relation.cc (equiv_oracle::dump): Do not dump NULL blocks.
(relation_oracle::find_relation_block): Check correct bitmap.
(relation_oracle::dump): Do not dump NULL blocks.
When propagating the on-entry cache, new block ranges are calculated
by combining all the incoming edges and comparing to the old value.
When a recomputation was performed on an edge, it didn't take into account
that the value in the block may already be better than a potential
recompuation... Thus a worse values was sometimes propagated.
Fixed by simply calling the now correct range_on_edge the cache provides.
* gimple-range-cache.cc (ranger_cache::propagate_cache): Call
range_on_edge instead of manually calculating.
During alias CTAD, we're accidentally ignoring the aggregate deduction
candidate for the underlying template because this guide is added
separately via maybe_aggr_guide (which doesn't yet handle alias
templates) instead of via deduction_guides_for (which does). This patch
makes maybe_aggr_guide handle alias templates in a manner similar to
deduction_guides_for.
PR c++/98832
gcc/cp/ChangeLog:
* pt.c (maybe_aggr_guide): Handle alias templates appropriately.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/class-deduction-alias9.C: New test.
Here we're crashing because cp_fold_function walks into the (templated)
requirements of a requires-expression outside a template, but the
folding routines aren't prepared to handle templated trees. This patch
fixes this by making cp_fold use evaluate_requires_expr to fold a
requires-expression as a whole, which also means we no longer need to
explicitly do so during gimplification. (Note that we delay folding
of such requires-expressions for sake of better diagnostics when one is
used as the condition of a failed static_assert.)
PR c++/101182
gcc/cp/ChangeLog:
* constraint.cc (evaluate_requires_expr): Adjust function comment.
* cp-gimplify.c (cp_genericize_r) <case REQUIRES_EXPR>: Move to ...
(cp_fold) <case REQUIRES_EXPR>: ... here.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/concepts-requires25.C: New test.
This function keeps src_range member of the result uninitialized, which at
least under valgrind can show up later when those uninitialized location_t's
can make it into the IL or location_t hash tables.
2021-06-24 Jakub Jelinek <jakub@redhat.com>
PR c/101176
* c-parser.c (c_parser_has_attribute_expression): Set source range for
the result.
The following testcase ICEs during error-recovery, as build_c_cast calls
note_integer_operands on error_mark_node and that wraps it into
C_MAYBE_CONST_EXPR which is unexpected and causes ICE later on.
Seems most other callers of note_integer_operands check early if something
is error_mark_node and return before calling note_integer_operands on it.
The following patch fixes it by not calling on error_mark_node, another
possibility would be to handle error_mark_node in note_integer_operands and
just return it.
2021-06-24 Jakub Jelinek <jakub@redhat.com>
PR c/101171
* c-typeck.c (build_c_cast): Don't call note_integer_operands on
error_mark_node.
* gcc.dg/pr101171.c: New test.
Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:
* include/experimental/bits/simd_x86.h (_S_trunc, _S_floor)
(_S_ceil): Set bit 8 (_MM_FROUND_NO_EXC) on AVX and SSE4.1
roundp[sd] calls.
This improves codegen of ldexp if AVX512VL is available.
Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:
* include/experimental/bits/simd_x86.h (_S_ldexp): The AVX512F
implementation doesn't require a _VecBltnBtmsk ABI tag, it
requires either a 64-Byte input (in which case AVX512F must be
available) or AVX512VL.
Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:
* include/experimental/bits/simd_math.h: Undefine internal
macros after use.
(frexp): Move #if to a more sensible position and reformat
preceding code.
(logb): Call _SimdImpl::_S_logb for fixed_size instead of
duplicating the code here.
(modf): Simplify condition.
fabs(int) returns double, this one didn't. This overload is not
specified in the Parallelism TS 2. Also remove the comment about labs
and llabs: it doesn't belong here.
Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:
* include/experimental/bits/simd_math.h (fabs): Remove
fabs(simd<integral>) overload.
Sometimes fixed_size objects will get unnecessarily copied on the stack.
The simd implementation should never pass _SimdTuple by value to avoid
requiring the optimizer to see through these copies.
Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:
* include/experimental/bits/simd_converter.h
(_SimdConverter::operator()): Pass _SimdTuple by const-ref.
* include/experimental/bits/simd_fixed_size.h
(_GLIBCXX_SIMD_FIXED_OP): Pass binary operator _SimdTuple
arguments by const-ref.
(_S_masked_unary): Pass _SimdTuple by const-ref.
This helper type became unused at some point.
Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:
* include/experimental/bits/simd_fixed_size.h
(_AbisInSimdTuple): Removed.
This also resolves a test failure on aarch64 with -ffast-math and
fixed_size<N> with large N.
Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:
* include/experimental/bits/simd.h: Add missing operator~
overload for simd<floating-point> to __float_bitwise_operators.
* include/experimental/bits/simd_builtin.h
(_SimdImplBuiltin::_S_complement): Bitcast to int (and back) to
implement complement for floating-point vectors.
* include/experimental/bits/simd_fixed_size.h
(_SimdImplFixedSize::_S_copysign): New function, forwarding to
copysign implementation of _SimdTuple members.
* include/experimental/bits/simd_math.h (copysign): Call
_SimdImpl::_S_copysign for fixed_size arguments. Simplify
generic copysign implementation using the new ~ operator.
The LWG issue proposes to add a conditional noexcept-specifier to
std::unique_ptr's dereference operator. The issue is currently in
Tentatively Ready status, but even if it isn't voted into the draft, we
can do it as a conforming extensions. This commit also adds a similar
noexcept-specifier to operator[] for the unique_ptr<T[], D> partial
specialization.
Also ensure that all dereference operators for shared_ptr are noexcept,
and adds tests for the std::optional accessors modified by the issue,
which were already noexcept in our implementation.
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:
* include/bits/shared_ptr_base.h (__shared_ptr_access::operator[]):
Add noexcept.
* include/bits/unique_ptr.h (unique_ptr::operator*): Add
conditional noexcept as per LWG 2762.
* testsuite/20_util/shared_ptr/observers/array.cc: Check that
dereferencing cannot throw.
* testsuite/20_util/shared_ptr/observers/get.cc: Likewise.
* testsuite/20_util/optional/observers/lwg2762.cc: New test.
* testsuite/20_util/unique_ptr/lwg2762.cc: New test.
When the assembler supports it, the compiler automatically passes --gdwarf-5
to it, which has an interesting side effect: any assembly instruction prior
to the first .file directive defines a new line associated with .file 0 in
the .debug_line section and of course the numbering of these implicit lines
has nothing to do with that of the source code. This can be problematic in
Ada when we do not generate .file/.loc directives for compiled-generated
functions to avoid too jumpy a debugging experience.
gcc/
* dwarf2out.c (dwarf2out_assembly_start): Emit .file 0 marker here..
(dwarf2out_finish): ...instead of here.
The issues are that 1) they use readelf instead of objdump and 2) they use
ELF syntax in the assembly code.
gcc/
* configure.ac (--gdwarf-5 option): Use objdump instead of readelf.
(working --gdwarf-4/--gdwarf-5 for all sources): Likewise.
(--gdwarf-4 not refusing generated .debug_line): Adjust for Windows.
* configure: Regenerate.
This merges the vec_addsub<mode>3 patterns using a mode attribute
for the vec_merge merge operand.
2021-06-18 Richard Biener <rguenther@suse.de>
* config/i386/sse.md (vec_addsubv4df3, vec_addsubv2df3,
vec_addsubv8sf3, vec_addsubv4sf3): Merge into ...
(vec_addsub<mode>3): ... using a new addsub_cst mode attribute.
This addds SLP pattern recognition for the SSE3/AVX [v]addsubp{ds} v0, v1
instructions which compute { v0[0] - v1[0], v0[1], + v1[1], ... }
thus subtract, add alternating on lanes, starting with subtract.
It adds a corresponding optab and direct internal function,
vec_addsub$a3 and renames the existing i386 backend patterns to
the new canonical name.
The SLP pattern matches the exact alternating lane sequence rather
than trying to be clever and anticipating incoming permutes - we
could permute the two input vectors to the needed lane alternation,
do the addsub and then permute the result vector back but that's
only profitable in case the two input or the output permute will
vanish - something Tamars refactoring of SLP pattern recog should
make possible.
2021-06-17 Richard Biener <rguenther@suse.de>
* config/i386/sse.md (avx_addsubv4df3): Rename to
vec_addsubv4df3.
(avx_addsubv8sf3): Rename to vec_addsubv8sf3.
(sse3_addsubv2df3): Rename to vec_addsubv2df3.
(sse3_addsubv4sf3): Rename to vec_addsubv4sf3.
* config/i386/i386-builtin.def: Adjust.
* internal-fn.def (VEC_ADDSUB): New internal optab fn.
* optabs.def (vec_addsub_optab): New optab.
* tree-vect-slp-patterns.c (class addsub_pattern): New.
(slp_patterns): Add addsub_pattern.
* tree-vect-slp.c (vect_optimize_slp): Disable propagation
across CFN_VEC_ADDSUB.
* tree-vectorizer.h (vect_pattern::vect_pattern): Make
m_ops optional.
* doc/md.texi (vec_addsub<mode>3): Document.
* gcc.target/i386/vect-addsubv2df.c: New testcase.
* gcc.target/i386/vect-addsubv4sf.c: Likewise.
* gcc.target/i386/vect-addsubv4df.c: Likewise.
* gcc.target/i386/vect-addsubv8sf.c: Likewise.
* gcc.target/i386/vect-addsub-2.c: Likewise.
* gcc.target/i386/vect-addsub-3.c: Likewise.
The recent addition of gcc_assert (regno < endregno); triggers during
glibc build on m68k.
The problem is that RA decisions shouldn't depend on expressions in
DEBUG_INSNs and those expressions can contain paradoxical subregs of certain
pseudos. If RA then decides to allocate the pseudo to a register
with very small hard register REGNO, we can trigger the new assert,
as (int) subreg_regno_offset may be negative on big endian and the small
REGNO + the negative offset can wrap around.
The following patch in that case records the range from the REGNO 0 to
endregno, before the addition of the assert as both regno and endregno are
unsigned it wouldn't record anything at all silently.
2021-06-24 Jakub Jelinek <jakub@redhat.com>
PR middle-end/101170
* df-scan.c (df_ref_record): For paradoxical big-endian SUBREGs
where regno + subreg_regno_offset wraps around use 0 as starting
regno.
* gcc.dg/pr101170.c: New test.
finish_bitfield_representative has an early out if the field after a
bitfield has error_mark_node type, but that early out leads to TREE_TYPE
of the DECL_BIT_FIELD_REPRESENTATIVE being NULL, which breaks assumptions
on code that uses the DECL_BIT_FIELD_REPRESENTATIVE during error-recovery.
The following patch instead sets TREE_TYPE of the representative to
error_mark_node, something the users can deal with better. At this point
the representative can be set as DECL_BIT_FIELD_REPRESENTATIVE for multiple
bitfields, so making sure that we clear the DECL_BIT_FIELD_REPRESENTATIVE
instead would be harder (but doable, e.g. with the error_mark_node TREE_TYPE
set by this patch set some flag in the caller and if the flag is there, walk
all the fields once again and clear all DECL_BIT_FIELD_REPRESENTATIVE that
have error_mark_node TREE_TYPE).
2021-06-24 Jakub Jelinek <jakub@redhat.com>
PR middle-end/101172
* stor-layout.c (finish_bitfield_representative): If nextf has
error_mark_node type, set repr type to error_mark_node too.
* gcc.dg/pr101172.c: New test.
s390 glibc does not need counters in the .data section, since it stores
edge hits in its own data structure. Therefore counters only waste
space and confuse diffing tools (e.g. kpatch), so don't generate them.
gcc/ChangeLog:
* config/s390/s390.c (s390_function_profiler): Ignore labelno
parameter.
* config/s390/s390.h (NO_PROFILE_COUNTERS): Define.
gcc/testsuite/ChangeLog:
* gcc.target/s390/mnop-mcount-m31-mzarch.c: Adapt to the new
prologue size.
* gcc.target/s390/mnop-mcount-m64.c: Likewise.
This fixes SLP permute propagation to not propagate across operations
that have different semantics on different lanes like for example
the recently added COMPLEX_ADD_ROT90.
2021-06-24 Richard Biener <rguenther@suse.de>
* tree-vect-slp.c (vect_optimize_slp): Do not propagate
across operations that have different semantics on different
lanes.
This patch adds support for in_reduction clause on target construct, though
for now only for synchronous targets (without nowait clause).
The encountering thread in that case runs the target task and blocks until
the target region ends, so it is implemented by remapping it before entering
the target, initializing the private copy if not yet initialized for the
current thread and then using the remapped addresses for the mapping
addresses.
For nowait combined with in_reduction the patch contains a hack where the
nowait clause is ignored. To implement it correctly, I think we would need
to create a new private variable for the in_reduction and initialize it before
doing the async target and adjust the map addresses to that private variable
and then pass a function pointer to the library routine with code where the callback
would remap the address to the current threads private variable and use in_reduction
combiner to combine the private variable we've created into the thread's copy.
The library would then need to make sure that the routine is called in some thread
participating in the parallel (and not in an unshackeled thread).
2021-06-24 Jakub Jelinek <jakub@redhat.com>
gcc/
* tree.h (OMP_CLAUSE_MAP_IN_REDUCTION): Document meaning for OpenMP.
* gimplify.c (gimplify_scan_omp_clauses): For OpenMP map clauses
with OMP_CLAUSE_MAP_IN_REDUCTION flag partially defer gimplification
of non-decl OMP_CLAUSE_DECL. For OMP_CLAUSE_IN_REDUCTION on
OMP_TARGET user outer_ctx instead of ctx for placeholders and
initializer/combiner gimplification.
* omp-low.c (scan_sharing_clauses): Handle OMP_CLAUSE_MAP_IN_REDUCTION
on target constructs.
(lower_rec_input_clauses): Likewise.
(lower_omp_target): Likewise.
* omp-expand.c (expand_omp_target): Temporarily ignore nowait clause
on target if in_reduction is present.
gcc/c-family/
* c-common.h (enum c_omp_region_type): Add C_ORT_TARGET and
C_ORT_OMP_TARGET.
* c-omp.c (c_omp_split_clauses): For OMP_CLAUSE_IN_REDUCTION on
combined target constructs also add map (always, tofrom:) clause.
gcc/c/
* c-parser.c (omp_split_clauses): Pass C_ORT_OMP_TARGET instead of
C_ORT_OMP for clauses on target construct.
(OMP_TARGET_CLAUSE_MASK): Add in_reduction clause.
(c_parser_omp_target): For non-combined target add
map (always, tofrom:) clauses for OMP_CLAUSE_IN_REDUCTION. Pass
C_ORT_OMP_TARGET to c_finish_omp_clauses.
* c-typeck.c (handle_omp_array_sections): Adjust ort handling
for addition of C_ORT_OMP_TARGET and simplify, mapping clauses are
never present on C_ORT_*DECLARE_SIMD.
(c_finish_omp_clauses): Likewise. Handle OMP_CLAUSE_IN_REDUCTION
on C_ORT_OMP_TARGET, set OMP_CLAUSE_MAP_IN_REDUCTION on
corresponding map clauses.
gcc/cp/
* parser.c (cp_omp_split_clauses): Pass C_ORT_OMP_TARGET instead of
C_ORT_OMP for clauses on target construct.
(OMP_TARGET_CLAUSE_MASK): Add in_reduction clause.
(cp_parser_omp_target): For non-combined target add
map (always, tofrom:) clauses for OMP_CLAUSE_IN_REDUCTION. Pass
C_ORT_OMP_TARGET to finish_omp_clauses.
* semantics.c (handle_omp_array_sections_1): Adjust ort handling
for addition of C_ORT_OMP_TARGET and simplify, mapping clauses are
never present on C_ORT_*DECLARE_SIMD.
(handle_omp_array_sections): Likewise.
(finish_omp_clauses): Likewise. Handle OMP_CLAUSE_IN_REDUCTION
on C_ORT_OMP_TARGET, set OMP_CLAUSE_MAP_IN_REDUCTION on
corresponding map clauses.
* pt.c (tsubst_expr): Pass C_ORT_OMP_TARGET instead of C_ORT_OMP for
clauses on target construct.
gcc/testsuite/
* c-c++-common/gomp/target-in-reduction-1.c: New test.
* c-c++-common/gomp/clauses-1.c: Add in_reduction clauses on
target or combined target constructs.
libgomp/
* testsuite/libgomp.c-c++-common/target-in-reduction-1.c: New test.
* testsuite/libgomp.c-c++-common/target-in-reduction-2.c: New test.
* testsuite/libgomp.c++/target-in-reduction-1.C: New test.
* testsuite/libgomp.c++/target-in-reduction-2.C: New test.
This refactors SLP permute propagation to record the outgoing permute
separately from the incoming/materialized one. Instead of separate
arrays/bitmaps I've now created a struct to represent the state.
2021-06-23 Richard Biener <rguenther@suse.de>
* tree-vect-slp.c (slpg_vertex): New struct.
(vect_slp_build_vertices): Adjust.
(vect_optimize_slp): Likewise. Maintain an outgoing permute
and a materialized one.
We were ignoring DR_STEP for VF == 1 which is OK only in case
the scalar order is preserved or both DR steps are the same.
2021-06-23 Richard Biener <rguenther@suse.de>
PR tree-optimization/101105
* tree-vect-data-refs.c (vect_prune_runtime_alias_test_list):
Only ignore steps when they are equal or scalar order is preserved.
* gcc.dg/torture/pr101105.c: New testcase.