As the following testcase shows, unwind.h on ARM can't be (starting with GCC
10) compiled with -std=c* modes, only -std=gnu* modes.
The problem is it uses asm keyword, which isn't a keyword in those modes
(system headers vs. non-system ones don't make a difference here).
glibc and other installed headers use __asm or __asm__ keywords instead that
work fine in both standard and gnu modes.
While there, as it is an installed header, I think it is also wrong to
completely ignore any identifier namespace rules.
The generic unwind.h defines just _Unwind* namespace identifiers plus
_sleb128_t/_uleb128_t (but e.g. unlike libstdc++/glibc headers doesn't
uglify operand names), the ARM unwind.h is much worse here. I've just
changed the gnu_Unwind_Find_got function at least not be in user identifier
namespace, but perhaps it would be good to go further and rename e.g.
or e.g.
typedef _Unwind_Reason_Code (*personality_routine) (_Unwind_State,
_Unwind_Control_Block *, _Unwind_Context *);
in unwind-arm-common.h.
2020-02-07 Jakub Jelinek <jakub@redhat.com>
PR target/93615
* config/arm/unwind-arm.h (gnu_Unwind_Find_got): Rename to ...
(_Unwind_gnu_Find_got): ... this. Use __asm instead of asm. Remove
trailing :s in asm. Formatting fixes.
(_Unwind_decode_typeinfo_ptr): Adjust caller.
* gcc.dg/pr93615.c: New test.
After thinking some more on this, we can do better; rather than having to
add a new prereload splitter pattern to catch all other cases where it might
be beneficial to fold first part of an UNSPEC_CAST back to the unspec
operand, this patch reverts the *.md changes I've made yesterday and instead
tweaks the patterns, so that simplify-rtx.c can optimize those on its own.
Instead of the whole SET_SRC being an UNSPEC through which simplify-rtx.c
obviously can't optimize anything, this represents those patterns through a
VEC_CONCAT (or two nested ones for the 128-bit -> 512-bit casts) with the
operand as the low part of it and UNSPEC representing just the high part of
it (the undefined, to be ignored, bits). While richi suggested using
already in GIMPLE for those using a SSA_NAME default definition (i.e.
clearly uninitialized use), I'd say that uninit pass would warn about those,
but more importantly, in RTL it would probably force zero initialization of
that or use or an uninitialized pseudo, all of which is hard to match in an
pattern, so I think an UNSPEC is better for that.
2020-02-07 Jakub Jelinek <jakub@redhat.com>
PR target/93594
* config/i386/predicates.md (avx_identity_operand): Remove.
* config/i386/sse.md (*avx_vec_concat<mode>_1): Remove.
(avx_<castmode><avxsizesuffix>_<castmode>,
avx512f_<castmode><avxsizesuffix>_256<castmode>): Change patterns to
a VEC_CONCAT of the operand and UNSPEC_CAST.
(avx512f_<castmode><avxsizesuffix>_<castmode>): Change pattern to
a VEC_CONCAT of VEC_CONCAT of the operand and UNSPEC_CAST with
UNSPEC_CAST.
The following testcase ICEs. The generated split_insns starts
with recog_data.insn = NULL and then tries to put various operands into
recog_data.operand array and checks various splitter conditions.
The problem is that some atom related tuning splitters indirectly call
extract_insn_cached on the insn they are used in. This can change
recog_data.operand, but most likely it will just keep it as is, but
sets recog_data.insn to the current instruction. If that splitter doesn't
match, we continue trying some other split conditions and modify
recog_data.operand array again. If even that doesn't find any usable
splitter, we punt, but at that point recog_data.insn says that recog_data
is valid for that particular instruction, even when recog_data.operand array
can be anything.
The safest thing would be to copy whole recog_data to a temporary object
before doing the calls that can call extract_insn_cached and restore it
afterwards, but it would be also very costly, recog_data has 1280 bytes.
So, this patch just makes sure to clear recog_data.insn if it has changed
during the extract_insn_cached call, which means if we extract_insn_cached
later, we'll extract it properly, while if we call it say from some other
context than splitter conditions, the insn is already cached, we don't reset
the cache.
2020-02-07 Jakub Jelinek <jakub@redhat.com>
PR target/93611
* config/i386/i386.c (ix86_lea_outperforms): Make sure to clear
recog_data.insn if distance_non_agu_define changed it.
* gcc.target/i386/pr93611.c: New test.
This patch implements the C++20 ranges overloads for the algorithms in
[algorithms]. Most of the algorithms were reimplemented, with each of their
implementations very closely following the existing implementation in
bits/stl_algo.h and bits/stl_algobase.h. The reason for reimplementing most of
the algorithms instead of forwarding to their STL-style overload is because
forwarding cannot be conformantly and efficiently performed for algorithms that
operate on non-random-access iterators. But algorithms that operate on random
access iterators can safely and efficiently be forwarded to the STL-style
implementation, and this patch does so for push_heap, pop_heap, make_heap,
sort_heap, sort, stable_sort, nth_element, inplace_merge and stable_partition.
What's missing from this patch is debug-iterator and container specializations
that are present for some of the STL-style algorithms that need to be ported
over to the ranges algos. I marked them missing at TODO comments. There are
also some other minor outstanding TODOs.
The code that could use the most thorough review is ranges::__copy_or_move,
ranges::__copy_or_move_backward, ranges::__equal and
ranges::__lexicographical_compare. In the tests, I tried to test the interface
of each new overload, as well as the correctness of the new implementation.
libstdc++-v3/ChangeLog:
Implement C++20 constrained algorithms
* include/Makefile.am: Add new header.
* include/Makefile.in: Regenerate.
* include/std/algorithm: Include <bits/ranges_algo.h>.
* include/bits/ranges_algo.h: New file.
* testsuite/25_algorithms/adjacent_find/constrained.cc: New test.
* testsuite/25_algorithms/all_of/constrained.cc: New test.
* testsuite/25_algorithms/any_of/constrained.cc: New test.
* testsuite/25_algorithms/binary_search/constrained.cc: New test.
* testsuite/25_algorithms/copy/constrained.cc: New test.
* testsuite/25_algorithms/copy_backward/constrained.cc: New test.
* testsuite/25_algorithms/copy_if/constrained.cc: New test.
* testsuite/25_algorithms/copy_n/constrained.cc: New test.
* testsuite/25_algorithms/count/constrained.cc: New test.
* testsuite/25_algorithms/count_if/constrained.cc: New test.
* testsuite/25_algorithms/equal/constrained.cc: New test.
* testsuite/25_algorithms/equal_range/constrained.cc: New test.
* testsuite/25_algorithms/fill/constrained.cc: New test.
* testsuite/25_algorithms/fill_n/constrained.cc: New test.
* testsuite/25_algorithms/find/constrained.cc: New test.
* testsuite/25_algorithms/find_end/constrained.cc: New test.
* testsuite/25_algorithms/find_first_of/constrained.cc: New test.
* testsuite/25_algorithms/find_if/constrained.cc: New test.
* testsuite/25_algorithms/find_if_not/constrained.cc: New test.
* testsuite/25_algorithms/for_each/constrained.cc: New test.
* testsuite/25_algorithms/generate/constrained.cc: New test.
* testsuite/25_algorithms/generate_n/constrained.cc: New test.
* testsuite/25_algorithms/heap/constrained.cc: New test.
* testsuite/25_algorithms/includes/constrained.cc: New test.
* testsuite/25_algorithms/inplace_merge/constrained.cc: New test.
* testsuite/25_algorithms/is_partitioned/constrained.cc: New test.
* testsuite/25_algorithms/is_permutation/constrained.cc: New test.
* testsuite/25_algorithms/is_sorted/constrained.cc: New test.
* testsuite/25_algorithms/is_sorted_until/constrained.cc: New test.
* testsuite/25_algorithms/lexicographical_compare/constrained.cc: New
test.
* testsuite/25_algorithms/lower_bound/constrained.cc: New test.
* testsuite/25_algorithms/max/constrained.cc: New test.
* testsuite/25_algorithms/max_element/constrained.cc: New test.
* testsuite/25_algorithms/merge/constrained.cc: New test.
* testsuite/25_algorithms/min/constrained.cc: New test.
* testsuite/25_algorithms/min_element/constrained.cc: New test.
* testsuite/25_algorithms/minmax/constrained.cc: New test.
* testsuite/25_algorithms/minmax_element/constrained.cc: New test.
* testsuite/25_algorithms/mismatch/constrained.cc: New test.
* testsuite/25_algorithms/move/constrained.cc: New test.
* testsuite/25_algorithms/move_backward/constrained.cc: New test.
* testsuite/25_algorithms/next_permutation/constrained.cc: New test.
* testsuite/25_algorithms/none_of/constrained.cc: New test.
* testsuite/25_algorithms/nth_element/constrained.cc: New test.
* testsuite/25_algorithms/partial_sort/constrained.cc: New test.
* testsuite/25_algorithms/partial_sort_copy/constrained.cc: New test.
* testsuite/25_algorithms/partition/constrained.cc: New test.
* testsuite/25_algorithms/partition_copy/constrained.cc: New test.
* testsuite/25_algorithms/partition_point/constrained.cc: New test.
* testsuite/25_algorithms/prev_permutation/constrained.cc: New test.
* testsuite/25_algorithms/remove/constrained.cc: New test.
* testsuite/25_algorithms/remove_copy/constrained.cc: New test.
* testsuite/25_algorithms/remove_copy_if/constrained.cc: New test.
* testsuite/25_algorithms/remove_if/constrained.cc: New test.
* testsuite/25_algorithms/replace/constrained.cc: New test.
* testsuite/25_algorithms/replace_copy/constrained.cc: New test.
* testsuite/25_algorithms/replace_copy_if/constrained.cc: New test.
* testsuite/25_algorithms/replace_if/constrained.cc: New test.
* testsuite/25_algorithms/reverse/constrained.cc: New test.
* testsuite/25_algorithms/reverse_copy/constrained.cc: New test.
* testsuite/25_algorithms/rotate/constrained.cc: New test.
* testsuite/25_algorithms/rotate_copy/constrained.cc: New test.
* testsuite/25_algorithms/search/constrained.cc: New test.
* testsuite/25_algorithms/search_n/constrained.cc: New test.
* testsuite/25_algorithms/set_difference/constrained.cc: New test.
* testsuite/25_algorithms/set_intersection/constrained.cc: New test.
* testsuite/25_algorithms/set_symmetric_difference/constrained.cc: New
test.
* testsuite/25_algorithms/set_union/constrained.cc: New test.
* testsuite/25_algorithms/shuffle/constrained.cc: New test.
* testsuite/25_algorithms/sort/constrained.cc: New test.
* testsuite/25_algorithms/stable_partition/constrained.cc: New test.
* testsuite/25_algorithms/stable_sort/constrained.cc: New test.
* testsuite/25_algorithms/swap_ranges/constrained.cc: New test.
* testsuite/25_algorithms/transform/constrained.cc: New test.
* testsuite/25_algorithms/unique/constrained.cc: New test.
* testsuite/25_algorithms/unique_copy/constrained.cc: New test.
* testsuite/25_algorithms/upper_bound/constrained.cc: New test.
Reproducing the ICE in PR analyzer/93375 required some kind of
analyzer diagnostic occurring after a call with fewer arguments
than required by the callee.
The testcase used __builtin_memcpy with a NULL argument for this.
On x86_64-pc-linux-gnu this happened to be already optimized into:
_4 = MEM <unsigned int> [(char * {ref-all})0B];
MEM <unsigned int> [(char * {ref-all})rl_1] = _4;
by the time of the analyzer pass, leading to the diagnostic in question
being:
warning: dereference of NULL ‘rl’ [CWE-690] [-Wanalyzer-null-dereference]
On other targets e.g. arm-unknown-linux-gnueabi, the builtin isn't
optimized at the time of the analyzer pass, leading to this diagnostic
instead:
warning: use of NULL ‘rl’ where non-null expected [CWE-690] [-Wanalyzer-null-argument]
<built-in>: note: argument 1 of ‘__builtin_memcpy’ must be non-null
This patch fixes the test case by using a custom function marked as
nonnull. I manually verified that it still reproduces the ICE if the
patch for the PR is reverted.
gcc/testsuite/ChangeLog:
PR analyzer/93375
* gcc.dg/analyzer/pr93375.c: Rework test case to avoid per-target
differences in how __builtin_memcpy has been optimized at the time
the analyzer runs.
2020-02-06 Michael Meissner <meissner@linux.ibm.com>
PR target/93569
* config/rs6000/rs6000.c (reg_to_non_prefixed): Before ISA 3.0
we only had X-FORM (reg+reg) addressing for vectors. Also before
ISA 3.0, we only had X-FORM addressing for scalars in the
traditional Altivec registers.
2020-02-06 <zhongyunde@huawei.com>
Vladimir Makarov <vmakarov@redhat.com>
PR rtl-optimization/93561
* lra-assigns.c (spill_for): Check that tested hard regno is not out of
hard register range.
When investigating how the analyzer handles malloc/free of Cray pointers
in gfortran I noticed that that analyzer was losing information on
pointers that were cast to an integer type, and then back to a pointer
type again.
The root cause is that region_model::maybe_cast_1 was only preserving
the region_svalue-ness of the result if both types were pointers,
instead returning an unknown_svalue for a pointer-to-int cast.
This patch updates the above code so that it attempts to use a
region_svalue if *either* type is a pointer
Doing so allows the analyzer to recognize that the same underlying
region is in use through various casts through integer types.
gcc/analyzer/ChangeLog:
* region-model.cc (region_model::maybe_cast_1): Attempt to provide
a region_svalue if either type is a pointer, rather than if both
types are pointers.
gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/torture/intptr_t.c: New test.
Kyrill pointed out off-list that this new pattern was missing
a type attribute.
2020-02-06 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* config/aarch64/aarch64.md (aarch64_movk<mode>): Add a type
attribute.
We currently use an (up to) five instruction sequence to generate such
constants. After this change we just generate a 32-bit constant and do
a rotate-and-mask-insert instruction, making the sequence only up to
three instructions.
* config/rs6000/rs6000.c (rs6000_emit_set_long_const): Handle the case
where the low and the high 32 bits are equal to each other specially,
with an rldimi instruction.
gcc/testsuite/
* gcc.target/powerpc/pr93012.c: New.
This patch adds a second movk pattern that models the instruction
as a "normal" and/ior operation rather than an insertion. It fixes
the third insv_1.c failure in PR87763, which was a regression from
GCC 8.
2020-02-06 Richard Sandiford <richard.sandiford@arm.com>
gcc/
PR target/87763
* config/aarch64/aarch64-protos.h (aarch64_movk_shift): Declare.
* config/aarch64/aarch64.c (aarch64_movk_shift): New function.
* config/aarch64/aarch64.md (aarch64_movk<mode>): New pattern.
gcc/testsuite/
PR target/87763
* gcc.target/aarch64/movk_2.c: New test.
This patch matches another form of sbfiz, in which the input
has DImode and the output has SImode. It fixes a regression
in gcc.target/aarch64/lsl_asr_sbfiz.c from GCC 8.
2020-02-06 Richard Sandiford <richard.sandiford@arm.com>
gcc/
PR rtl-optimization/87763
* config/aarch64/aarch64.md (*ashiftsi_extvdi_bfiz): New pattern.
After -fno-common became the default, we can unify various
scan strings between 64bit and 32bit targets.
* gcc.target/i386/memcpy-strategy-1.c (dg-final):
Unify scan-assembler strings for all targets.
* gcc.target/i386/memcpy-strategy-2.c (dg-final): Ditto.
* gcc.target/i386/memcpy-strategy-3.c (dg-final): Ditto.
* gcc.target/i386/memcpy-vector_loop-1.c (dg-final): Ditto.
If we are going to use get_first_fn let's make sure we operate on
is_overloaded_fn, as the rest of the codebase does, and if lookup finds
any class-scope declaration, return early too.
PR c++/93597 - ICE with lambda in operator function.
* name-lookup.c (maybe_save_operator_binding): Check is_overloaded_fn.
* g++.dg/cpp0x/lambda/lambda-93597.C: New test.
This patch adds the ARMv8.6 ACLE intrinsics for bfmmla, bfmlalb and
bfmlalt as part of the BFloat16 extension.
(https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics)
The intrinsics are declared in arm_neon.h and the RTL patterns are
defined in aarch64-simd.md. Two new tests are added to check assembler
output.
2020-02-06 Delia Burduv <delia.burduv@arm.com>
gcc/
* config/aarch64/aarch64-simd-builtins.def
(bfmlaq): New built-in function.
(bfmlalb): New built-in function.
(bfmlalt): New built-in function.
(bfmlalb_lane): New built-in function.
(bfmlalt_lane): New built-in function.
* config/aarch64/aarch64-simd.md
(aarch64_bfmmlaqv4sf): New pattern.
(aarch64_bfmlal<bt>v4sf): New pattern.
(aarch64_bfmlal<bt>_lane<q>v4sf): New pattern.
* config/aarch64/arm_neon.h (vbfmmlaq_f32): New intrinsic.
(vbfmlalbq_f32): New intrinsic.
(vbfmlaltq_f32): New intrinsic.
(vbfmlalbq_lane_f32): New intrinsic.
(vbfmlaltq_lane_f32): New intrinsic.
(vbfmlalbq_laneq_f32): New intrinsic.
(vbfmlaltq_laneq_f32): New intrinsic.
* config/aarch64/iterators.md (BF_MLA): New int iterator.
(bt): New int attribute.
Implement standard approach by emitting "#" for insns that have to be split.
* config/i386/i386.md (*pushtf): Emit "#" instead of
calling gcc_unreachable in insn output.
(*pushxf): Ditto.
(*pushdf): Ditto.
(*pushsf_rex64): Ditto for alternatives other than 1.
(*pushsf): Ditto for alternatives other than 1.
PR93570 reports that the documentation shows __builtin_mtfsf to
return a double, but this is incorrect. The return signature
should be void.
2020-02-06 Bill Schmidt <wschmidt@linux.ibm.com>
PR target/93570
* doc/extend.texi (Basic PowerPC Built-in Functions): Correct
prototype for __builtin_mtfsf.
PR gcov-profile/91971
PR gcov-profile/93466
* coverage.c (coverage_init): Revert mangling of
path into filename. It can lead to huge filename length.
Creation of subfolders seem more natural.
* gcc.target/arm/multilib.exp (multilib_config): Pass flags to
…_target_compile as (additional_flags=) option and not as source
filename to make it work with remote execution.
* lib/target-supports.exp (check_runtime, check_gc_sections_available,
check_effective_target_gas, check_effective_target_gld): Likewise.
The __iter_swap class template and explicit specialization are only
declared (and used) for C++03 so _GLIBCXX20_CONSTEXPR does nothing here.
* include/bits/stl_algobase.h (__iter_swap, __iter_swap<true>): Remove
redundant _GLIBCXX20_CONSTEXPR.
This was sent and approved on gcc-patches as "[GCC][BUG][Aarch64][ARM]
(PR93300) Fix ICE due to BFmode placement in GET_MODES_WIDER chain".
The observed error came about because BFmode was placed between HFmode
and SFmode in the GET_MODES_WIDER chain, resulting in convert_mode_scalar
attempting to gen a libfunc for a HFmode -> BFmode conversion.
This patch registers NULL for all libfuncs in BFmode, which stops the
middle-end from attempting to generate them.
gcc/ChangeLog:
2020-02-06 Stam Markianos-Wright <stam.markianos-wright@arm.com>
PR target/93300
* config/arm/arm.c (arm_block_arith_comp_libfuncs_for_mode): New.
(arm_init_libfuncs): Add BFmode support to block spurious BF libfuncs.
Use arm_block_arith_comp_libfuncs_for_mode for HFmode.
The following testcase shows that for _mm256_set*_m128i and similar
intrinsics, we sometimes generate bad code. All 4 routines are expressing
the same thing, a 128-bit vector zero padded to 256-bit vector, but only the
3rd one actually emits the desired vmovdqa %xmm0, %xmm0 insn, the
others vpxor %xmm1, %xmm1, %xmm1; vinserti128 $0x1, %xmm1, %ymm0, %ymm0
The problem is that the cast builtins use UNSPEC_CAST which is after reload
simplified using a splitter, but during combine it prevents optimizations.
We do have avx_vec_concat* patterns that generate efficient code, both for
this low part + zero concatenation special case and for other cases too, so
the following define_insn_and_split just recognizes avx_vec_concat made of a
low half of a cast and some other reg.
2020-02-06 Jakub Jelinek <jakub@redhat.com>
PR target/93594
* config/i386/predicates.md (avx_identity_operand): New predicate.
* config/i386/sse.md (*avx_vec_concat<mode>_1): New
define_insn_and_split.
* gcc.target/i386/avx2-pr93594.c: New test.
As the following testcase shows, we need to consider even target to be a construct
that forces not to use copy in/out for shared on parallel inside of the target.
E.g. for parallel nested inside another parallel or host teams, we already avoid
copy in/out and we need to treat target the same.
2020-02-06 Jakub Jelinek <jakub@redhat.com>
PR libgomp/93515
* omp-low.c (use_pointer_for_field): For nested constructs, also
look for map clauses on target construct.
(scan_omp_1_stmt) <case GIMPLE_OMP_TARGET>: Bump temporarily
taskreg_nesting_level.
* testsuite/libgomp.c-c++-common/pr93515.c: New test.
If we call omp_add_variable, following omp_notice_variable will already find it
on that construct and not go through outer constructs, the following patch fixes that.
Note, this still doesn't follow OpenMP 5.0 semantics on target combined with other
constructs with reduction/lastprivate/linear clauses, will handle that for GCC11.
2020-02-06 Jakub Jelinek <jakub@redhat.com>
PR libgomp/93515
* gimplify.c (gimplify_scan_omp_clauses) <do_notice>: If adding
shared clause, call omp_notice_variable on outer context if any.
The ARM Exception Handling ABI requires personality functions in
phase1 to initialize barrier_cache before returning
_URC_HANDLER_FOUND, and we don't.
Although our own ARM personality function does not use barrier_cache
at all, other languages' ARM personality functions, during phase2, are
allowed and expected to test barrier_cache.sp to check whether the
handler frame was reached, which implies that personality functions is
in charge of the frame, and the remaining fields of barrier_cache hold
whatever values it put there in phase1.
Since we did not set barrier_cache.sp, an earlier exception, already
handled by a non-Ada handler and then released, may have its storage
reused for a new exception, that phase1 matches to an Ada frame, but
if that leaves barrier_cache.sp alone, the phase2 personality function
that handled the earlier exception, upon reaching the frame that
handled the earlier exception, may believe the information in
barrier_cache applies to the current exception. The C++ personality
function, for example, would take the information in the barrier_cache
and end up activating the handler that handled the earlier exception:
try {
throw 1;
} catch (int i) {
std::cout << "caught " << i << " by c++" << std::endl;
}
raise_ada_exception (); // might loop back to the handler above
for gcc/ada/ChangeLog
* raise-gcc.c (personality_body) [__ARM_EABI_UNWINDER__]:
Initialize barrier_cache.sp when ending phase1.
We should be able to assume that a template instantiation or other COMDAT
has non-zero address even if MAKE_DECL_ONE_ONLY for the target sets
DECL_WEAK and we haven't yet decided to emit a definition in this
translation unit.
PR c++/92003
* symtab.c (symtab_node::nonzero_address): A DECL_COMDAT decl has
non-zero address even if weak and not yet defined.
In unevaluated context, we only substitute a single PARM_DECL, not the
entire chain, but the handling of an empty pack expansion was missing that
check.
PR c++/93140
* pt.c (tsubst_decl) [PARM_DECL]: Check cp_unevaluated_operand in
handling of TREE_CHAIN for empty pack.
Now that we have post epilogue_completed split point for all
optimization levels, we can simplify post epilogue_completed splitters
considerably. If corresponding define_peephole2 pattern fails to
allocate a temporary register (or if peephole2 pass isn't run at all),
we can now always split invalid RTX after epilogue_completed is set.
Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
* config/i386/i386.md (*pushdi2_rex64 peephole2): Remove.
(*pushdi2_rex64 peephole2): Unconditionally split after
epilogue_completed.
(*ashl<mode>3_doubleword): Ditto.
(*<shift_insn><mode>3_doubleword): Ditto.
In C++ we weren't calling mark_exp_read on the __builtin_convertvector first
argument. I guess it could misbehave even with lambda implicit captures.
Fixed by calling decay_conversion on the argument, we use the argument as
rvalue so we want the standard lvalue to rvalue conversions, but as the
argument must be a vector type, e.g. integral promotions aren't really
needed.
2020-02-05 Jakub Jelinek <jakub@redhat.com>
PR c++/93557
* semantics.c (cp_build_vec_convert): Call decay_conversion on arg
prior to passing it to c_build_vec_convert.
* c-c++-common/Wunused-var-17.c: New test.
Since reshape_init_array_1 can now reuse a single constructor for
an array of non-aggregate type, we might run into a scenario where
we reuse a constructor with TREE_SIDE_EFFECTS. This broke this test
because we have something like { { expr } } and we try to reshape it,
so we recurse on the inner CONSTRUCTOR, reuse an existing CONSTRUCTOR
with TREE_SIDE_EFFECTS, and then ICE on the discrepancy because the
outermost CONSTRUCTOR doesn't have TREE_SIDE_EFFECTS. In this case
EXPR was a call to an operator function so TREE_SIDE_EFFECTS should
be set. Naturally one would want to fix this by calling
recompute_constructor_flags in an appropriate place so that the flags
on the CONSTRUCTORs match. The appropriate place would be at the end
of reshape_init, but this breaks initlist109.C: there we are dealing
with { { TARGET_EXPR <{}> } } where the outermost { } is TREE_CONSTANT
but the inner { } is not, so recompute_constructor_flags would clear
the constant flag in the outermost { }. Seems resonable but it upsets
check_initializer which then complains about "non-constant in-class
initialization invalid for static member". TARGET_EXPRs are always
created with TREE_SIDE_EFFECTS on, but that is mutually exclusive
with TREE_CONSTANT. So we're in a bind.
Fixed by not reusing a CONSTRUCTOR that has TREE_SIDE_EFFECTS; in the
grand scheme of things it isn't measurable: it only affects ~3 tests
in the testsuite.
PR c++/93559 - ICE with CONSTRUCTOR flags verification.
* decl.c (reshape_init_array_1): Don't reuse a CONSTRUCTOR with
TREE_SIDE_EFFECTS.
* g++.dg/cpp0x/initlist119.C: New test.
* g++.dg/cpp0x/initlist120.C: New test.
In the testcase, since there's no declaration of T, ref_view(T) declares a
non-static data member T of type ref_view, the same type as its enclosing
class. Then when we try to do C++20 aggregate class template argument
deduction we recursively try to adjust the braced-init-list to match the
template class definition until we run out of stack.
Fixed by rejecting the template data member.
PR c++/92593
* decl.c (grokdeclarator): Reject field of current class type even
in a template.
* testsuite/lib/libgomp.exp
(check_effective_target_offload_target_nvptx): Pass flags as 'options'
and not as 'source' argument to libgomp_target_compile.
The G++ bug has been fixed for a couple of months so we can remove these
workarounds that define alias templates in terms of constrained class
templates. We can just apply constraints directly to alias templates as
specified in the C++20 working draft.
* include/bits/iterator_concepts.h (iter_reference_t)
(iter_rvalue_reference_t, iter_common_reference_t, indirect_result_t):
Remove workarounds for PR c++/67704.
* testsuite/24_iterators/aliases.cc: New test.
The analyzer recognizes __analyzer_dump_exploded_nodes as a "magic"
function for use in DejaGnu tests: at the end of the pass, it issues
a warning at each such call, dumping the count of exploded nodes seen at
the call, which can be checked in test cases via dg-warning directives,
along with the IDs of the enodes (which is helpful when debugging).
My intent was to give a way of testing the results of the state-merging
code.
The state-merging code can generate duplicate exploded nodes at a point
when state merging occurs, taking a pair of enodes from the worklist
that share a program_point and sufficiently similar state. For these
cases it generates a merged state, and adds edges from those enodes to
the merged-state enode (potentially a new or a pre-existing enode); the
input enodes don't have process_node called on them.
This means that at a CFG join point there can be an unpredictable number
of enodes that we don't care about, where the precise number depends on
the details of the state-merger code, immediately followed by a more
predictable number that we do care about.
I've been papering over this in the analyzer DejaGnu tests somewhat
by adding pairs of __analyzer_dump_exploded_nodes calls at CFG join
points, where the output at the first call is somewhat arbitrary, and
the second has the number we care about; the first number tends to
change "at random" as I tweak the state merging code, in ways that
aren't interesting, but require the tests to be updated.
See e.g. gcc.dg/analyzer/paths-6.c which had:
__analyzer_dump_exploded_nodes (0); /* { dg-warning "2 exploded nodes" } */
// FIXME: the above can vary between 2 and 3 exploded nodes
__analyzer_dump_exploded_nodes (0); /* { dg-warning "1 exploded node" } */
This patch remedies this situation by tracking which enodes are
processed, and which are merely "merger" enodes. It updates the
output for __analyzer_dump_exploded_nodes so that count of enodes
only includes the *processed* enodes, and that the IDs are split
into "processed" and "merger" enodes.
The patch simplifies the testsuite by eliminating the redundant calls
described above; the example above becomes:
__analyzer_dump_exploded_nodes (0); /* { dg-warning "1 processed enode" } */
where the output in question is now:
warning: 1 processed enode: [EN: 94] merger(s): [EN: 93]
The patch also adds various checks on the status of enodes, to ensure
e.g. that each enode is processed at most once.
gcc/analyzer/ChangeLog:
* engine.cc (exploded_node::dump_dot): Show merger enodes.
(worklist::add_node): Assert that the node's m_status is
STATUS_WORKLIST.
(exploded_graph::process_worklist): Likewise for nodes from the
worklist. Set status of merged nodes to STATUS_MERGER.
(exploded_graph::process_node): Set status of node to
STATUS_PROCESSED.
(exploded_graph::dump_exploded_nodes): Rework handling of
"__analyzer_dump_exploded_nodes", splitting enodes by status into
"processed" and "merger", showing the count of just the processed
enodes at the call, rather than the count of all enodes.
* exploded-graph.h (exploded_node::status): New enum.
(exploded_node::exploded_node): Initialize m_status to
STATUS_WORKLIST.
(exploded_node::get_status): New getter.
(exploded_node::set_status): New setter.
(exploded_node::m_status): New field.
gcc/ChangeLog:
* doc/analyzer.texi
(Special Functions for Debugging the Analyzer): Update description
of __analyzer_dump_exploded_nodes.
gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/data-model-1.c: Update for changed output to
__analyzer_dump_exploded_nodes, dropping redundant call at merger.
* gcc.dg/analyzer/data-model-7.c: Likewise.
* gcc.dg/analyzer/loop-2.c: Update for changed output format.
* gcc.dg/analyzer/loop-2a.c: Likewise.
* gcc.dg/analyzer/loop-4.c: Likewise.
* gcc.dg/analyzer/loop.c: Likewise.
* gcc.dg/analyzer/malloc-paths-10.c: Likewise; drop redundant
call at merger.
* gcc.dg/analyzer/malloc-vs-local-1a.c: Likewise.
* gcc.dg/analyzer/malloc-vs-local-1b.c: Likewise.
* gcc.dg/analyzer/malloc-vs-local-2.c: Likewise.
* gcc.dg/analyzer/malloc-vs-local-3.c: Likewise.
* gcc.dg/analyzer/paths-1.c: Likewise.
* gcc.dg/analyzer/paths-1a.c: Likewise.
* gcc.dg/analyzer/paths-2.c: Likewise.
* gcc.dg/analyzer/paths-3.c: Likewise.
* gcc.dg/analyzer/paths-4.c: Update for changed output format.
* gcc.dg/analyzer/paths-5.c: Likewise.
* gcc.dg/analyzer/paths-6.c: Likewise; drop redundant calls
at merger.
* gcc.dg/analyzer/paths-7.c: Likewise.
* gcc.dg/analyzer/torture/conditionals-2.c: Update for changed
output format.
* gcc.dg/analyzer/zlib-1.c: Likewise; drop redundant calls.
* gcc.dg/analyzer/zlib-5.c: Update for changed output format.
As mentioned in the PR, the CLOBBERs in vzeroupper are added there even for
registers that aren't ever live in the function before and break the
prologue/epilogue expansion with ms ABI (normal ABIs are fine, as they
consider all [xyz]mm registers call clobbered, but the ms ABI considers
xmm0-15 call used but the bits above low 128 ones call clobbered).
The following patch fixes it by not adding the clobbers during vzeroupper
pass (before pro_and_epilogue), but adding them for -fipa-ra purposes only
during the final output. Perhaps we could add some CLOBBERs early (say for
df_regs_ever_live_p regs that aren't live in the live_regs bitmap, or
depending on the ABI either add all of them immediately, or for ms ABI add
CLOBBERs for xmm0-xmm5 if they don't have a SET) and add the rest later.
And the addition could be perhaps done at other spots, e.g. in an
epilogue_completed guarded splitter.
2020-02-05 Jakub Jelinek <jakub@redhat.com>
PR target/92190
* config/i386/i386-features.c (ix86_add_reg_usage_to_vzeroupper): Only
include sets and not clobbers in the vzeroupper pattern.
* config/i386/sse.md (*avx_vzeroupper): Require in insn condition that
the parallel has 17 (64-bit) or 9 (32-bit) elts.
(*avx_vzeroupper_1): New define_insn_and_split.
* gcc.target/i386/pr92190.c: New test.