Commit Graph

187179 Commits

Author SHA1 Message Date
Jakub Jelinek 02e5ffd5db libgcc: Honor LDFLAGS_FOR_TARGET when linking libgcc_s
When building gcc with some specific LDFLAGS_FOR_TARGET, e.g.
LDFLAGS_FOR_TARGET=-Wl,-z,relro,-z,now
those flags propagate into linking of target shared libraries,
e.g. lib{ubsan,tsan,stdc++,quadmath,objc,lsan,itm,gphobos,gdruntime,gomp,go,gfortran,atomic,asan}.so.*
but there is one important exception, libgcc_s.so.* linking ignores it.

The following patch fixes that.

Bootstrapped/regtested on x86_64-linux with LDFLAGS_FOR_TARGET=-Wl,-z,relro,-z,now
and verified that libgcc_s.so.* is BIND_NOW when it previously wasn't, and
without any LDFLAGS_FOR_TARGET on x86_64-linux and i686-linux.
There on x86_64-linux I've verified that the libgcc_s.so.1 linking command
line for -m64 is identical except for whitespace to one without the patch,
and for -m32 multilib $(LDFLAGS) actually do supply there an extra -m32
that also repeats later in the @multilib_flags@, which should be harmless.

2021-08-04  Jakub Jelinek  <jakub@redhat.com>

	* config/t-slibgcc (SHLIB_LINK): Add $(LDFLAGS).
	* config/t-slibgcc-darwin (SHLIB_LINK): Likewise.
	* config/t-slibgcc-vms (SHLIB_LINK): Likewise.
	* config/t-slibgcc-fuchsia (SHLIB_LDFLAGS): Remove $(LDFLAGS).
2021-08-05 17:32:06 +02:00
Chung-Lin Tang 0bac793ed6 openmp: Implement omp_get_device_num routine
This patch implements the omp_get_device_num library routine, specified in
OpenMP 5.0.

GOMP_DEVICE_NUM_VAR is a macro symbol which defines the name of a "device
number" variable.  The variable is defined in the device-side libgomp and has
its address returned to host-side libgomp during device initialization; the
host libgomp then sets its value to the designated device number.

libgomp/ChangeLog:

	* icv-device.c (omp_get_device_num): New API function, host side.
	* fortran.c (omp_get_device_num_): New interface function.
	* libgomp-plugin.h (GOMP_DEVICE_NUM_VAR): Define macro symbol.
	* libgomp.map (OMP_5.0.2): New version space with omp_get_device_num,
	omp_get_device_num_.
	* libgomp.texi (omp_get_device_num): Add documentation for new API
	function.
	* omp.h.in (omp_get_device_num): Add declaration.
	* omp_lib.f90.in (omp_get_device_num): Likewise.
	* omp_lib.h.in (omp_get_device_num): Likewise.
	* target.c (gomp_load_image_to_device): If additional entry for device
	number exists at end of returned entries from 'load_image_func' hook,
	copy the assigned device number over to the device variable.

	* config/gcn/icv-device.c (GOMP_DEVICE_NUM_VAR): Define static global.
	(omp_get_device_num): New API function, device side.
	* plugin/plugin-gcn.c ("symcat.h"): Add include.
	(GOMP_OFFLOAD_load_image): Add addresses of device GOMP_DEVICE_NUM_VAR
	at end of returned 'target_table' entries.

	* config/nvptx/icv-device.c (GOMP_DEVICE_NUM_VAR): Define static global.
	(omp_get_device_num): New API function, device side.
	* plugin/plugin-nvptx.c ("symcat.h"): Add include.
	(GOMP_OFFLOAD_load_image): Add addresses of device GOMP_DEVICE_NUM_VAR
	at end of returned 'target_table' entries.

	* testsuite/lib/libgomp.exp
	(check_effective_target_offload_target_intelmic): New function for
	testing for intelmic offloading.
	* testsuite/libgomp.c-c++-common/target-45.c: New test.
	* testsuite/libgomp.fortran/target10.f90: New test.
2021-08-05 23:29:03 +08:00
Jonathan Wakely 8dec72aeb5 libstdc++: Add [[nodiscard]] to <compare>
This adds the [[nodiscard]] attribute to all conversion operators,
comparison operators, call operators and non-member functions in
<compare>. Nothing in this header except constructors has side effects.

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++-v3/ChangeLog:

	* libsupc++/compare (partial_ordering, weak_ordering)
	(strong_ordering, is_eq, is_neq, is_lt, is_lteq, is_gt, is_gteq)
	(compare_three_way, strong_order, weak_order, partial_order)
	(compare_strong_order_fallback, compare_weak_order_fallback)
	(compare_partial_order_fallback, __detail::__synth3way): Add
	nodiscard attribute.
	* testsuite/18_support/comparisons/categories/zero_neg.cc: Add
	-Wno-unused-result to options.
2021-08-05 15:16:58 +01:00
Jonathan Wakely 03d47da7e1 testsuite: Fix warning introduced by nodiscard in libstdc++
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>

gcc/testsuite/ChangeLog:

	* g++.old-deja/g++.other/inline7.C: Cast nodiscard call to void.
2021-08-05 15:16:58 +01:00
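
As an illustration of the two nodiscard commits above, here is a minimal
sketch of what the attribute does to callers and how a deliberate discard is
written. The functions is_less, count_less and deliberately_ignore are
hypothetical stand-ins, not libstdc++ code; the only assumption is a compiler
supporting C++17.

```cpp
// Hypothetical side-effect-free helper, marked the way the <compare>
// functions are marked by the commit above.
[[nodiscard]] constexpr bool is_less(int a, int b) noexcept
{
  return a < b;
}

// Normal use: the result is consumed, so no diagnostic is expected.
int count_less(const int *values, int n, int pivot)
{
  int count = 0;
  for (int i = 0; i < n; ++i)
    if (is_less(values[i], pivot))
      ++count;
  return count;
}

// Deliberate discard: casting the call to void states the intent and
// suppresses the unused-result warning, as done in inline7.C.
void deliberately_ignore()
{
  (void) is_less(1, 2);
}
```

Writing is_less(1, 2); as a statement on its own would normally draw a
warning, which is why the testsuite change either adds a cast or passes
-Wno-unused-result.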
Jonathan Wakely 7b1de3eb9e libstdc++: Move attributes that follow requires-clauses [PR101782]
As explained in the PR, the grammar in the Concepts TS means that a [
token following a requires-clause is parsed as part of the
logical-or-expression rather than the start of an attribute. That makes
the following ill-formed when using -fconcepts-ts:

  template<typename T> requires foo<T> [[nodiscard]] int f(T);

This change moves all attributes that follow a requires-clause to the
end of the function declarator.

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++-v3/ChangeLog:

	PR libstdc++/101782
	* include/bits/ranges_base.h (ranges::begin, ranges::end)
	(ranges::rbegin, ranges::rend, ranges::size, ranges::ssize)
	(ranges::empty, ranges::data): Move attribute to the end of
	the declarator.
	* include/bits/stl_iterator.h (__gnu_cxx::__normal_iterator)
	(common_iterator): Likewise for non-member operator functions.
	* include/std/ranges (views::all, views::filter)
	(views::transform, views::take, views::take_while, views::drop)
	(views::drop_while, views::join, views::lazy_split)
	(views::split, views::counted, views::common, views::reverse)
	(views::elements): Likewise.
	* testsuite/std/ranges/access/101782.cc: New test.
2021-08-05 15:16:58 +01:00
H.J. Lu 72264a6397 <x86gprintrin.h>: Add pragma GCC target("general-regs-only")
1. Intrinsics in <x86gprintrin.h> only require GPR ISAs.  Add

 #if defined __MMX__ || defined __SSE__
 #pragma GCC push_options
 #pragma GCC target("general-regs-only")
 #define __DISABLE_GENERAL_REGS_ONLY__
 #endif

and

 #ifdef __DISABLE_GENERAL_REGS_ONLY__
 #undef __DISABLE_GENERAL_REGS_ONLY__
 #pragma GCC pop_options
 #endif /* __DISABLE_GENERAL_REGS_ONLY__ */

to <x86gprintrin.h> to disable non-GPR ISAs so that they can be used in
functions with __attribute__ ((target("general-regs-only"))).
2. When checking the always_inline attribute, if the callee only uses GPRs,
ignore MASK_80387, since enabling MASK_80387 in the caller has no impact on
inlining the callee.

gcc/

	PR target/99744
	* config/i386/i386.c (ix86_can_inline_p): Ignore MASK_80387 if
	callee only uses GPRs.
	* config/i386/ia32intrin.h: Revert commit 5463cee277.
	* config/i386/serializeintrin.h: Revert commit 71958f740f.
	* config/i386/x86gprintrin.h: Add
	#pragma GCC target("general-regs-only") and #pragma GCC pop_options
	to disable non-GPR ISAs.

gcc/testsuite/

	PR target/99744
	* gcc.target/i386/pr99744-3.c: New test.
	* gcc.target/i386/pr99744-4.c: Likewise.
	* gcc.target/i386/pr99744-5.c: Likewise.
	* gcc.target/i386/pr99744-6.c: Likewise.
	* gcc.target/i386/pr99744-7.c: Likewise.
	* gcc.target/i386/pr99744-8.c: Likewise.
2021-08-05 06:23:03 -07:00
Richard Sandiford c04bb6d93f doc: Document cond_* shift optabs in md.texi
gcc/
	PR middle-end/101787
	* doc/md.texi (cond_ashl, cond_ashr, cond_lshr): Document.
2021-08-05 14:03:24 +01:00
Richard Sandiford 783d809f0b vect: Move costing helpers from aarch64 code
aarch64.c has various routines to test for specific kinds of
vector statement cost.  The routines aren't really target-specific,
so following a suggestion from Richi, this patch moves them to a new
section of tree-vectorizer.h.

gcc/
	* tree-vectorizer.h (vect_is_store_elt_extraction, vect_is_reduction)
	(vect_reduc_type, vect_embedded_comparison_type, vect_comparison_type)
	(vect_is_extending_load, vect_is_integer_truncation): New functions,
	moved from aarch64.c but given different names.
	* config/aarch64/aarch64.c (aarch64_is_store_elt_extraction)
	(aarch64_is_reduction, aarch64_reduc_type)
	(aarch64_embedded_comparison_type, aarch64_comparison_type)
	(aarch64_extending_load_p, aarch64_integer_truncation_p): Delete
	in favor of the above.  Update callers accordingly.
2021-08-05 14:03:23 +01:00
Richard Earnshaw c1cdabe3aa arm: reorder assembler architecture directives [PR101723]
A change to the way gas interprets the .fpu directive in binutils-2.34
means that issuing .fpu will clear any features set by .arch_extension
that apply to the floating point or simd units.  This unfortunately
causes problems for more recent versions of the architecture because
we currently emit .arch, .arch_extension and .fpu directives at
different times and try to suppress redundant changes.

This change addresses this by firstly unifying all the places where we
emit these directives to a single block of code and secondly
(re)emitting all the directives if any changes have been made to the
target options.  Whilst this is slightly more than the strict minimum
it should be enough to catch all cases where a change could have
happened.  The new code also emits the directives in the order: .arch,
.fpu, .arch_extension.  This ensures that the additional architectural
extensions are not removed by a later .fpu directive.

Whilst writing this patch I also noticed that in the corner case where
the last function to be compiled had a non-standard set of
architecture flags, the assembler would add an incorrect set of
derived attributes for the file as a whole.  Instead of reflecting the
command-line options it would reflect the flags from the last function in
the file.  To address this I've also added a call to re-emit the
flags from the asm_file_end callback so the assembler will be in the
correct state when it finishes processing the input.

There's some slight churn to the testsuite as a consequence of this,
because previously we had a hack to suppress emitting a .fpu directive
for one specific case, but with the new order this is no longer
necessary.

gcc/ChangeLog:

	PR target/101723
	* config/arm/arm-cpus.in (generic-armv7-a): Add quirk to suppress
	writing .cpu directive in asm output.
	* config/arm/arm.c (arm_identify_fpu_from_isa): New variable.
	(arm_last_printed_arch_string): Delete.
	(arm_last_printed_fpu_string): Delete.
	(arm_configure_build_target): If use of floating-point/SIMD is
	disabled, remove all fp/simd related features from the target ISA.
	(last_arm_targ_options): New variable.
	(arm_print_asm_arch_directives): Add new parameters.  Change order
	of emitted directives and handle all cases here.
	(arm_file_start): Always call arm_print_asm_arch_directives, move
	all generation of .arch/.arch_extension here.
	(arm_file_end): Call arm_print_asm_arch.
	(arm_declare_function_name): Call arm_print_asm_arch_directives
	instead of printing .arch/.fpu directives directly.

gcc/testsuite/ChangeLog:

	PR target/101723
	* gcc.target/arm/cortex-m55-nofp-flag-hard.c: Update expected output.
	* gcc.target/arm/cortex-m55-nofp-flag-softfp.c: Likewise.
	* gcc.target/arm/cortex-m55-nofp-nomve-flag-softfp.c: Likewise.
	* gcc.target/arm/mve/intrinsics/mve_fpu1.c: Convert to dg-do assemble.
	Add a non-no-op function body.
	* gcc.target/arm/mve/intrinsics/mve_fpu2.c: Likewise.
	* gcc.target/arm/pr98636.c (dg-options): Add -mfloat-abi=softfp.
	* gcc.target/arm/attr-neon.c: Tighten scan-assembler tests.
	* gcc.target/arm/attr-neon2.c: Use -Ofast, convert test to use
	check-function-bodies.
	* gcc.target/arm/attr-neon3.c: Likewise.
	* gcc.target/arm/pr69245.c: Tighten scan-assembler match, but allow
	multiple instances.
	* gcc.target/arm/pragma_fpu_attribute.c: Likewise.
	* gcc.target/arm/pragma_fpu_attribute_2.c: Likewise.
2021-08-05 12:51:14 +01:00
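
The ordering rule described in the commit above (.arch first, then .fpu, then
every .arch_extension) can be sketched as a small string builder. This is an
illustration only: arch_directives and its parameters are hypothetical and do
not correspond to the actual arm_print_asm_arch_directives code.

```cpp
#include <string>
#include <vector>

// Build the directive block in the order the commit describes.  Emitting
// .fpu before the .arch_extension lines means a later .fpu can no longer
// clear FP/SIMD features that an extension enabled.
std::string arch_directives(const std::string &arch,
                            const std::string &fpu,
                            const std::vector<std::string> &extensions)
{
  std::string out;
  out += "\t.arch " + arch + "\n";
  out += "\t.fpu " + fpu + "\n";
  for (const std::string &ext : extensions)
    out += "\t.arch_extension " + ext + "\n";
  return out;
}
```

Re-emitting the whole block whenever the target options change, rather than
suppressing individual directives, keeps the relative order fixed.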
Richard Earnshaw 6a37d0331c arm: Don't reconfigure globals in arm_configure_build_target
arm_configure_build_target is usually used to reconfigure the
arm_active_target structure, which is then used to reconfigure a
number of other global variables describing the current target.
Occasionally, however, we need to use arm_configure_build_target to
construct a temporary target structure and in that case it is wrong to
try to reconfigure the global variables (although probably harmless,
since arm_option_reconfigure_globals() only looks at
arm_active_target).  At the very least, however, this is wasted work,
so it is best not to do it unless needed.  What's more, several
callers of arm_configure_build_target call
arm_option_reconfigure_globals themselves within a few lines, making
the call from within arm_configure_build_target completely redundant.

So this patch moves the responsibility for calling
arm_option_reconfigure_globals to the callers of
arm_configure_build_target (only two places needed updating).

gcc:
	* config/arm/arm.c (arm_configure_build_target): Don't call
	arm_option_reconfigure_globals.
	(arm_option_restore): Call arm_option_reconfigure_globals after
	reconfiguring the target.
	* config/arm/arm-c.c (arm_pragma_target_parse): Likewise.
2021-08-05 12:51:14 +01:00
Richard Earnshaw 62e66c6a6c arm: ensure the arch_name is always set for the build target
This should never happen now if GCC is invoked by the driver, but in
the unusual case of calling cc1 (or its ilk) directly from the command
line the build target's arch_name string can remain NULL.  This can
complicate later processing meaning that we need to check for this
case explicitly in some circumstances.  Nothing should rely on this
behaviour, so it's simpler to always set the arch_name when
configuring the build target and be done with it.

gcc:

	* config/arm/arm.c (arm_configure_build_target): Ensure the target's
	arch_name is always set.
2021-08-05 12:51:14 +01:00
Jonathan Wright 0c3aab7f2a aarch64: Don't include vec_select high-half in SIMD subtract cost
The Neon subtract-long/subtract-widen instructions can select the top
or bottom half of the operand registers. This selection does not
change the cost of the underlying instruction and this should be
reflected by the RTL cost function.

This patch adds RTL tree traversal in the Neon subtract cost function
to match vec_select high-half of its operands. This traversal
prevents the cost of the vec_select from being added into the cost of
the subtract - meaning that these instructions can now be emitted in
the combine pass as they are no longer deemed prohibitively
expensive.

gcc/ChangeLog:

2021-07-28  Jonathan Wright  <jonathan.wright@arm.com>

	* config/aarch64/aarch64.c: Traverse RTL tree to prevent cost
	of vec_select high-half from being added into Neon subtract
	cost.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/vsubX_high_cost.c: New test.
2021-08-05 11:52:13 +01:00
Jonathan Wright 8cd27a3b25 aarch64: Don't include vec_select high-half in SIMD add cost
The Neon add-long/add-widen instructions can select the top or bottom
half of the operand registers. This selection does not change the
cost of the underlying instruction and this should be reflected by
the RTL cost function.

This patch adds RTL tree traversal in the Neon add cost function to
match vec_select high-half of its operands. This traversal prevents
the cost of the vec_select from being added into the cost of the
add - meaning that these instructions can now be emitted in the
combine pass as they are no longer deemed prohibitively expensive.

gcc/ChangeLog:

2021-07-28  Jonathan Wright  <jonathan.wright@arm.com>

	* config/aarch64/aarch64.c: Traverse RTL tree to prevent cost
	of vec_select high-half from being added into Neon add cost.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/vaddX_high_cost.c: New test.
2021-08-05 11:51:57 +01:00
Richard Biener f0fc1e6623 Adjust gcc.dg/vect/bb-slp-pr101756.c
This adjusts the testcase for excess diagnostics emitted by some
targets because of the attribute simd usage like

warning: GCC does not currently support mixed size types for 'simd' functions

on aarch64.

2021-08-05  Richard Biener  <rguenther@suse.de>

	* gcc.dg/vect/bb-slp-pr101756.c: Add -w.
2021-08-05 11:41:38 +02:00
Kewen Lin d0a5624bb4 cfgloop: Make loops_list support an optional loop_p root
This patch follows Richi's suggestion to add an optional
class loop* root argument to the loops_list constructor.  It
makes it possible to construct a visiting list starting
from the given class loop* ROOT rather than from the default
tree_root of loops_for_fn (FN), for visiting a subset of
the loop tree.

It unifies all walking orders in walk_loop_tree, but
it still uses linear search for LI_ONLY_INNERMOST when
looking at the whole loop tree since it has a more stable
bound.

gcc/ChangeLog:

	* cfgloop.h (loops_list::loops_list): Add one optional argument
	root and adjust accordingly, update loop tree walking and factor
	out to ...
	* cfgloop.c (loops_list::walk_loop_tree): ... this.  New function.
2021-08-05 03:44:20 -05:00
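
The idea of an optional root can be sketched with a toy loop tree. Everything
below is a simplified assumption: Loop, loops_from and this walk_loop_tree are
hypothetical and only mirror the shape of the change, not the real cfgloop
data structures or the LI_* walking flags.

```cpp
#include <vector>

// Toy loop-tree node: a loop number plus its directly nested loops.
struct Loop
{
  int num;
  std::vector<const Loop *> inner;
};

// Preorder walk of the subtree rooted at LOOP, appending loop numbers.
void walk_loop_tree(const Loop *loop, std::vector<int> &out)
{
  out.push_back(loop->num);
  for (const Loop *child : loop->inner)
    walk_loop_tree(child, out);
}

// Build the visiting list.  With no ROOT the walk starts at TREE_ROOT and
// covers the whole tree; with a ROOT it covers only that subtree.
std::vector<int> loops_from(const Loop *tree_root, const Loop *root = nullptr)
{
  std::vector<int> out;
  walk_loop_tree(root ? root : tree_root, out);
  return out;
}
```

Defaulting the extra argument keeps existing callers unchanged while letting
new callers restrict the walk to one loop nest.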
Eric Botcazou 4e3129b0ca Fix oversight in handling of reverse SSO in SRA pass
The scalar storage order does not apply to pointer and vector components.

gcc/
	PR tree-optimization/101626
	* tree-sra.c (propagate_subaccesses_from_rhs): Do not set the
	reverse scalar storage order on a pointer or vector component.

gcc/testsuite/
	* gcc.dg/sso-15.c: New test.
2021-08-05 10:24:50 +02:00
Cherry Mui ac8a2fbedf compiler: make escape analysis more robust about builtin functions
In the places where we handle builtin functions, list all
supported ones, and fail if an unexpected one is seen. So if a
new builtin function is added in the future we can detect it,
instead of silently treating it as nonescaping.

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/339992
2021-08-04 21:24:00 -07:00
liuhongt c16f21c7cf Support cond_{xor,ior,and} for vector integer mode under AVX512.
gcc/ChangeLog:

	* config/i386/sse.md (cond_<code><mode>): New expander.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/cond_op_anylogic_d-1.c: New test.
	* gcc.target/i386/cond_op_anylogic_d-2.c: New test.
	* gcc.target/i386/cond_op_anylogic_q-1.c: New test.
	* gcc.target/i386/cond_op_anylogic_q-2.c: New test.
2021-08-05 09:11:35 +08:00
liuhongt f7aa81892e Support cond_{smax,smin} for vector float/double modes under AVX512.
gcc/ChangeLog:

	* config/i386/sse.md (cond_<code><mode>): New expander.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/cond_op_maxmin_double-1.c: New test.
	* gcc.target/i386/cond_op_maxmin_double-2.c: New test.
	* gcc.target/i386/cond_op_maxmin_float-1.c: New test.
	* gcc.target/i386/cond_op_maxmin_float-2.c: New test.
2021-08-05 09:11:31 +08:00
liuhongt 9a8c3fc2b2 Support cond_{smax,smin,umax,umin} for vector integer modes under AVX512.
gcc/ChangeLog:

	* config/i386/sse.md (cond_<code><mode>): New expander.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/cond_op_maxmin_b-1.c: New test.
	* gcc.target/i386/cond_op_maxmin_b-2.c: New test.
	* gcc.target/i386/cond_op_maxmin_d-1.c: New test.
	* gcc.target/i386/cond_op_maxmin_d-2.c: New test.
	* gcc.target/i386/cond_op_maxmin_q-1.c: New test.
	* gcc.target/i386/cond_op_maxmin_q-2.c: New test.
	* gcc.target/i386/cond_op_maxmin_ub-1.c: New test.
	* gcc.target/i386/cond_op_maxmin_ub-2.c: New test.
	* gcc.target/i386/cond_op_maxmin_ud-1.c: New test.
	* gcc.target/i386/cond_op_maxmin_ud-2.c: New test.
	* gcc.target/i386/cond_op_maxmin_uq-1.c: New test.
	* gcc.target/i386/cond_op_maxmin_uq-2.c: New test.
	* gcc.target/i386/cond_op_maxmin_uw-1.c: New test.
	* gcc.target/i386/cond_op_maxmin_uw-2.c: New test.
	* gcc.target/i386/cond_op_maxmin_w-1.c: New test.
	* gcc.target/i386/cond_op_maxmin_w-2.c: New test.
2021-08-05 09:11:28 +08:00
GCC Administrator 2697f8324f Daily bump. 2021-08-05 00:17:03 +00:00
David Malcolm ded2c2c068 analyzer: initial implementation of asm support [PR101570]
gcc/ChangeLog:
	PR analyzer/101570
	* Makefile.in (ANALYZER_OBJS): Add analyzer/region-model-asm.o.

gcc/analyzer/ChangeLog:
	PR analyzer/101570
	* analyzer.cc (maybe_reconstruct_from_def_stmt): Add GIMPLE_ASM
	case.
	* analyzer.h (class asm_output_svalue): New forward decl.
	(class reachable_regions): New forward decl.
	* complexity.cc (complexity::from_vec_svalue): New.
	* complexity.h (complexity::from_vec_svalue): New decl.
	* engine.cc (feasibility_state::maybe_update_for_edge): Handle
	asm stmts by calling on_asm_stmt.
	* region-model-asm.cc: New file.
	* region-model-manager.cc
	(region_model_manager::maybe_fold_asm_output_svalue): New.
	(region_model_manager::get_or_create_asm_output_svalue): New.
	(region_model_manager::log_stats): Log m_asm_output_values_map.
	* region-model.cc (region_model::on_stmt_pre): Handle GIMPLE_ASM.
	* region-model.h (visitor::visit_asm_output_svalue): New.
	(region_model_manager::get_or_create_asm_output_svalue): New decl.
	(region_model_manager::maybe_fold_asm_output_svalue): New decl.
	(region_model_manager::asm_output_values_map_t): New typedef.
	(region_model_manager::m_asm_output_values_map): New field.
	(region_model::on_asm_stmt): New.
	* store.cc (binding_cluster::on_asm): New.
	* store.h (binding_cluster::on_asm): New decl.
	* svalue.cc (svalue::cmp_ptr): Handle SK_ASM_OUTPUT.
	(asm_output_svalue::dump_to_pp): New.
	(asm_output_svalue::dump_input): New.
	(asm_output_svalue::input_idx_to_asm_idx): New.
	(asm_output_svalue::accept): New.
	* svalue.h (enum svalue_kind): Add SK_ASM_OUTPUT.
	(svalue::dyn_cast_asm_output_svalue): New.
	(class asm_output_svalue): New.
	(is_a_helper <const asm_output_svalue *>::test): New.
	(struct default_hash_traits<asm_output_svalue::key_t>): New.

gcc/testsuite/ChangeLog:
	PR analyzer/101570
	* gcc.dg/analyzer/asm-x86-1.c: New test.
	* gcc.dg/analyzer/asm-x86-lp64-1.c: New test.
	* gcc.dg/analyzer/asm-x86-lp64-2.c: New test.
	* gcc.dg/analyzer/pr101570.c: New test.
	* gcc.dg/analyzer/torture/asm-x86-linux-array_index_mask_nospec.c:
	New test.
	* gcc.dg/analyzer/torture/asm-x86-linux-cpuid-paravirt-1.c: New
	test.
	* gcc.dg/analyzer/torture/asm-x86-linux-cpuid-paravirt-2.c: New
	test.
	* gcc.dg/analyzer/torture/asm-x86-linux-cpuid.c: New test.
	* gcc.dg/analyzer/torture/asm-x86-linux-rdmsr-paravirt.c: New
	test.
	* gcc.dg/analyzer/torture/asm-x86-linux-rdmsr.c: New test.
	* gcc.dg/analyzer/torture/asm-x86-linux-wfx_get_ps_timeout-full.c:
	New test.
	* gcc.dg/analyzer/torture/asm-x86-linux-wfx_get_ps_timeout-reduced.c:
	New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2021-08-04 18:21:25 -04:00
H.J. Lu 5738a64f8b x86: Update STORE_MAX_PIECES
Update STORE_MAX_PIECES to allow 16/32/64 bytes only if inter-unit move
is enabled since vec_duplicate enabled by inter-unit move is used to
implement store_by_pieces of 16/32/64 bytes.

gcc/

	PR target/101742
	* config/i386/i386.h (STORE_MAX_PIECES): Allow 16/32/64 bytes
	only if TARGET_INTER_UNIT_MOVES_TO_VEC is true.

gcc/testsuite/

	PR target/101742
	* gcc.target/i386/pr101742a.c: New test.
	* gcc.target/i386/pr101742b.c: Likewise.
2021-08-04 12:59:38 -07:00
H.J. Lu 09dba016db x86: Avoid stack realignment when copying data with SSE register
To avoid stack realignment, call ix86_gen_scratch_sse_rtx to get a
scratch SSE register to copy data with SSE register from one
memory location to another.

gcc/

	PR target/101772
	* config/i386/i386-expand.c (ix86_expand_vector_move): Call
	ix86_gen_scratch_sse_rtx to get a scratch SSE register to copy
	data with SSE register from one memory location to another.

gcc/testsuite/

	PR target/101772
	* gcc.target/i386/eh_return-2.c: New test.
2021-08-04 12:51:12 -07:00
Andreas Krebbel 361da782a2 IBM Z: Implement TARGET_VECTORIZE_VEC_PERM_CONST for vpdi
This patch makes use of the vector permute double immediate
instruction for constant permute vectors.

gcc/ChangeLog:

	* config/s390/s390.c (expand_perm_with_vpdi): New function.
	(vectorize_vec_perm_const_1): Call expand_perm_with_vpdi.
	* config/s390/vector.md (*vpdi1<mode>, @vpdi1<mode>): Enable a
	parameterized expander.
	(*vpdi4<mode>, @vpdi4<mode>): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/s390/vector/perm-vpdi.c: New test.
2021-08-04 18:40:11 +02:00
Andreas Krebbel 6dc8c46564 IBM Z: Implement TARGET_VECTORIZE_VEC_PERM_CONST for vector merge
This patch implements the TARGET_VECTORIZE_VEC_PERM_CONST in the IBM Z
backend. The initial implementation only exploits the vector merge
instruction but there is more to come.

gcc/ChangeLog:

	* config/s390/s390.c (MAX_VECT_LEN): Define macro.
	(struct expand_vec_perm_d): Define struct.
	(expand_perm_with_merge): New function.
	(vectorize_vec_perm_const_1): New function.
	(s390_vectorize_vec_perm_const): New function.
	(TARGET_VECTORIZE_VEC_PERM_CONST): Define target macro.

gcc/testsuite/ChangeLog:

	* gcc.target/s390/vector/perm-merge.c: New test.
	* gcc.target/s390/vector/vec-types.h: New test.
2021-08-04 18:40:10 +02:00
Andreas Krebbel 4e34925ef1 IBM Z: Remove redundant V_HW_64 mode iterator.
gcc/ChangeLog:

	* config/s390/vector.md (V_HW_64): Remove mode iterator.
	(*vec_load_pair<mode>): Use V_HW_2 instead of V_HW_64.
	* config/s390/vx-builtins.md
	(vec_scatter_element<V_HW_2:mode>_SI): Use V_HW_2 instead of
	V_HW_64.
2021-08-04 18:40:10 +02:00
Andreas Krebbel 0aa7091bef IBM Z: Get rid of vpdi unspec
The patch gets rid of the unspec used for the vector permute double
immediate instruction and replaces it with generic rtx.

gcc/ChangeLog:

	* config/s390/s390.md (UNSPEC_VEC_PERMI): Remove constant
	definition.
	* config/s390/vector.md (*vpdi1<mode>, *vpdi4<mode>): New pattern
	definitions.
	* config/s390/vx-builtins.md (*vec_permi<mode>): Emit generic rtx
	instead of an unspec.

gcc/testsuite/ChangeLog:

	* gcc.target/s390/zvector/vec-permi.c: Removed.
	* gcc.target/s390/zvector/vec_permi.c: New test.
2021-08-04 18:40:09 +02:00
Andreas Krebbel 5391688acc IBM Z: Get rid of vec merge unspec
This patch gets rid of the unspecs we were using for the vector merge
instruction and replaces them with generic rtx.

gcc/ChangeLog:

	* config/s390/s390-modes.def: Add more vector modes to support
	concatenation of two vectors.
	* config/s390/s390-protos.h (s390_expand_merge_perm_const): Add
	prototype.
	(s390_expand_merge): Likewise.
	* config/s390/s390.c (s390_expand_merge_perm_const): New function.
	(s390_expand_merge): New function.
	* config/s390/s390.md (UNSPEC_VEC_MERGEH, UNSPEC_VEC_MERGEL):
	Remove constant definitions.
	* config/s390/vector.md (V_HW_2): Add mode iterators.
	(VI_HW_4, V_HW_4): Rename VI_HW_4 to V_HW_4.
	(vec_2x_nelts, vec_2x_wide): New mode attributes.
	(*vmrhb, *vmrlb, *vmrhh, *vmrlh, *vmrhf, *vmrlf, *vmrhg, *vmrlg):
	New pattern definitions.
	(vec_widen_umult_lo_<mode>, vec_widen_umult_hi_<mode>)
	(vec_widen_smult_lo_<mode>, vec_widen_smult_hi_<mode>)
	(vec_unpacks_lo_v4sf, vec_unpacks_hi_v4sf, vec_unpacks_lo_v2df)
	(vec_unpacks_hi_v2df): Adjust expanders to emit non-unspec RTX for
	vec merge.
	* config/s390/vx-builtins.md (V_HW_4): Remove mode iterator. Now
	in vector.md.
	(vec_mergeh<mode>, vec_mergel<mode>): Use s390_expand_merge to
	emit vec merge pattern.

gcc/testsuite/ChangeLog:

	* gcc.target/s390/vector/long-double-asm-in-out-hard-fp-reg.c:
	Instead of vpdi with 0 and 5 vmrlg and vmrhg are used now.
	* gcc.target/s390/vector/long-double-asm-inout-hard-fp-reg.c: Likewise.
	* gcc.target/s390/zvector/vec-types.h: New test.
	* gcc.target/s390/zvector/vec_merge.c: New test.
2021-08-04 18:40:09 +02:00
Jonathan Wright 63834c84d4 aarch64: Don't include vec_select high-half in SIMD multiply cost
The Neon multiply/multiply-accumulate/multiply-subtract instructions
can select the top or bottom half of the operand registers. This
selection does not change the cost of the underlying instruction and
this should be reflected by the RTL cost function.

This patch adds RTL tree traversal in the Neon multiply cost function
to match vec_select high-half of its operands. This traversal
prevents the cost of the vec_select from being added into the cost of
the multiply - meaning that these instructions can now be emitted in
the combine pass as they are no longer deemed prohibitively
expensive.

gcc/ChangeLog:

2021-07-19  Jonathan Wright  <jonathan.wright@arm.com>

	* config/aarch64/aarch64.c (aarch64_strip_extend_vec_half):
	Define.
	(aarch64_rtx_mult_cost): Traverse RTL tree to prevent cost of
	vec_select high-half from being added into Neon multiply
	cost.
	* rtlanal.c (vec_series_highpart_p): Define.
	* rtlanal.h (vec_series_highpart_p): Declare.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/vmul_high_cost.c: New test.
2021-08-04 16:58:26 +01:00
Jonathan Wright 1d65c9d251 aarch64: Don't include vec_select element in SIMD multiply cost
The Neon multiply/multiply-accumulate/multiply-subtract instructions
can take various forms - multiplying full vector registers of values
or multiplying one vector by a single element of another. Regardless
of the form used, these instructions have the same cost, and this
should be reflected by the RTL cost function.

This patch adds RTL tree traversal in the Neon multiply cost function
to match the vec_select used by the lane-referencing forms of the
instructions already mentioned. This traversal prevents the cost of
the vec_select from being added into the cost of the multiply -
meaning that these instructions can now be emitted in the combine
pass as they are no longer deemed prohibitively expensive.

gcc/ChangeLog:

2021-07-19  Jonathan Wright  <jonathan.wright@arm.com>

	* config/aarch64/aarch64.c (aarch64_strip_duplicate_vec_elt):
	Define.
	(aarch64_rtx_mult_cost): Traverse RTL tree to prevent
	vec_select cost from being added into Neon multiply cost.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/vmul_element_cost.c: New test.
2021-08-04 16:57:38 +01:00
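
The vec_select cost commits above share one idea: look through the
vec_select wrapper on an operand before costing the arithmetic, so the
wrapper contributes nothing. A minimal sketch on a toy expression tree
follows. The Rtx struct, the Kind enum, the cost constants and the function
names are all hypothetical; the real logic lives in aarch64.c and works on
GCC's RTL.

```cpp
// Toy expression node, standing in for an RTL expression.
enum class Kind { Reg, VecSelectHigh, Mult };

struct Rtx
{
  Kind kind;
  const Rtx *op0 = nullptr;
  const Rtx *op1 = nullptr;
};

constexpr int kMultCost = 4;       // assumed cost of the multiply itself
constexpr int kVecSelectCost = 4;  // assumed cost of a standalone vec_select

// Look through a high-half vec_select, returning the selected operand.
const Rtx *strip_high_half(const Rtx *x)
{
  return x->kind == Kind::VecSelectHigh ? x->op0 : x;
}

// Cost an expression.  With STRIP set, the multiply looks through
// vec_select on its operands, so the select is not charged separately.
int cost(const Rtx *x, bool strip)
{
  switch (x->kind)
    {
    case Kind::Reg:
      return 0;
    case Kind::VecSelectHigh:
      return kVecSelectCost + cost(x->op0, strip);
    case Kind::Mult:
      {
        const Rtx *a = strip ? strip_high_half(x->op0) : x->op0;
        const Rtx *b = strip ? strip_high_half(x->op1) : x->op1;
        return kMultCost + cost(a, strip) + cost(b, strip);
      }
    }
  return 0;
}
```

With both operands wrapped in a high-half select, the toy cost drops from
three times the base cost to the base cost alone, which is the kind of
difference that lets combine accept the fused instruction.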
Richard Sandiford 5a1017dc30 vect: Tweak comparisons with existing epilogue loops
This patch uses a more accurate scalar iteration estimate when
comparing the epilogue of a constant-iteration loop with a candidate
replacement epilogue.

In the testcase, the patch prevents a 1-to-3-element SVE epilogue
from seeming better than a 64-bit Advanced SIMD epilogue.

gcc/
	* tree-vect-loop.c (vect_better_loop_vinfo_p): Detect cases in
	which old_loop_vinfo is an epilogue loop that handles a constant
	number of iterations.

gcc/testsuite/
	* gcc.target/aarch64/sve/cost_model_12.c: New test.
2021-08-04 16:52:09 +01:00
Richard Sandiford 315a1c3756 vect: Tweak dump messages for vector mode choice
After vect_analyze_loop has successfully analysed a loop for
one base vector mode B1, it considers using following base vector
modes to vectorise an epilogue.  However, for VECT_COMPARE_COSTS,
a later mode B2 might turn out to be better than B1 was.  Initially
this comparison will be between an epilogue loop (for B2) and a main
loop (for B1).  However, in r11-6458 I'd added code to reanalyse the
B2 epilogue loop as a main loop, partly for correctness and partly
for better costing.

This can lead to a situation in which we think that the B2 epilogue
loop was better than the B1 main loop, but that the B2 main loop is
not better than the B1 main loop.  There was no dump message to say
that this had happened, which made it look like B2 had still won.

gcc/
	* tree-vect-loop.c (vect_analyze_loop): Print a dump message
	when a reanalyzed loop fails to be cheaper than the current
	main loop.
2021-08-04 16:52:08 +01:00
Richard Sandiford eb55b5b0df aarch64: Fix a typo
gcc/
	* config/aarch64/aarch64.c: Fix a typo.
2021-08-04 16:52:07 +01:00
Vincent Lefèvre 929f2cf410 gcov: check return code of a fclose
gcc/ChangeLog:

	PR gcov-profile/101773
	* gcov-io.c (gcov_close): Check return code of a fclose.
2021-08-04 17:26:28 +02:00
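
Why the return code of fclose matters: buffered data is flushed at close
time, so a write error can surface only there. The sketch below shows the
general pattern with the standard C library; write_and_close is a
hypothetical helper and is not the gcov_close implementation.

```cpp
#include <cstdio>

// Write TEXT to PATH.  Returns 0 on success and -1 if opening, writing or
// closing the file failed.
int write_and_close(const char *path, const char *text)
{
  std::FILE *f = std::fopen(path, "w");
  if (!f)
    return -1;

  int status = 0;
  if (std::fputs(text, f) == EOF)
    status = -1;

  // fclose flushes any buffered output; a nonzero result means the data
  // may not have reached the file even though fputs reported success.
  if (std::fclose(f) != 0)
    status = -1;

  return status;
}
```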
Bernd Edlinger 96c82a16b2 Fix debug info for ignored decls at start of assembly
Ignored function decls that are compiled at the start of
the assembly have bogus line numbers until the first .file
directive, as reported in PR101575.

The corresponding binutils bug report is
https://sourceware.org/bugzilla/show_bug.cgi?id=28149

The workaround for this issue is to emit a dummy .file
directive before the first function is compiled, unless
another .file directive was already emitted previously.

2021-08-04  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	PR ada/101575
	* dwarf2out.c (dwarf2out_assembly_start): Emit a dummy
	.file statement when needed.
2021-08-04 16:18:07 +02:00
Tamar Christina 9fcb8ec603 [testsuite] Fix trapping access in test PR101750
I believe PR101750 to be a testism. Fix it by giving the class a name.

gcc/testsuite/ChangeLog:

	PR tree-optimization/101750
	* g++.dg/vect/pr99149.cc: Name class.
2021-08-04 14:36:26 +01:00
Richard Biener 31855ba6b1 Add emulated gather capability to the vectorizer
This adds a gather vectorization capability to the vectorizer
without target support by decomposing the offset vector, doing
scalar loads and then building a vector from the result.  This
is aimed mainly at cases where vectorizing the rest of the loop
offsets the cost of vectorizing the gather.
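
The decomposition described above can be sketched in plain C.  This is
only a scalar model of the idea, not the code the vectorizer emits; the
name emulated_gather and the fixed lane count of 4 are assumptions made
for the example.

```c
/* Assumed lane count for the sketch.  */
enum { LANES = 4 };

/* Hypothetical model of an emulated gather: dst[k] = base[idx[k]].
   IDX plays the role of the offset vector and DST the role of the
   vector that is constructed from the scalar results.  */
void emulated_gather(double dst[LANES], const double *base,
                     const int idx[LANES])
{
  for (int k = 0; k < LANES; k++)
    {
      int off = idx[k];     /* extract one scalar offset */
      dst[k] = base[off];   /* ordinary scalar load */
    }
}
```

The rest of the loop body can still operate on whole vectors, which is
where the benefit is expected to come from.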

Note it's difficult to avoid vectorizing the offset load, but in
some cases later passes can turn the vector load + extract into
scalar loads, see the followup patch.

On SPEC CPU 2017 510.parest_r this improves runtime from 250s
to 219s on a Zen2 CPU which has its native gather instructions
disabled (with them enabled, the runtime instead increases to 254s)
using -Ofast -march=znver2 [-flto].  It turns out the critical
loops in this benchmark all perform gather operations.

2021-07-30  Richard Biener  <rguenther@suse.de>

	* tree-vect-data-refs.c (vect_check_gather_scatter):
	Include widening conversions only when the result is
	still handled by native gather or the current offset
	size does not already match the data size.
	Also succeed analysis in case there's no native support,
	noted by an IFN_LAST ifn and a NULL decl.
	(vect_analyze_data_refs): Always consider gathers.
	* tree-vect-patterns.c (vect_recog_gather_scatter_pattern):
	Test for no IFN gather rather than decl gather.
	* tree-vect-stmts.c (vect_model_load_cost): Pass in the
	gather-scatter info and cost emulated gathers accordingly.
	(vect_truncate_gather_scatter_offset): Properly test for
	no IFN gather.
	(vect_use_strided_gather_scatters_p): Likewise.
	(get_load_store_type): Handle emulated gathers and their
	restrictions.
	(vectorizable_load): Likewise.  Emulate them by extracting
	scalar offsets, doing scalar loads and a vector construct.

	* gcc.target/i386/vect-gather-1.c: New testcase.
	* gfortran.dg/vect/vect-8.f90: Adjust.
2021-08-04 15:28:07 +02:00
H.J. Lu f2e5d2717d by_pieces: Pass MAX_PIECES to op_by_pieces_d
Pass MAX_PIECES to op_by_pieces_d::op_by_pieces_d for move, store and
compare.

	PR target/101742
	* expr.c (op_by_pieces_d::op_by_pieces_d): Add a max_pieces
	argument to set m_max_size.
	(move_by_pieces_d): Pass MOVE_MAX_PIECES to op_by_pieces_d.
	(store_by_pieces_d): Pass STORE_MAX_PIECES to op_by_pieces_d.
	(compare_by_pieces_d): Pass COMPARE_MAX_PIECES to op_by_pieces_d.
2021-08-04 06:24:46 -07:00
Roger Sayle 96146e61cd Fold (X<<C1)^(X<<C2) to a multiplication when possible.
The easiest way to motivate these additions to match.pd is with the
following example:

unsigned int foo(unsigned char i) {
  return i | (i<<8) | (i<<16) | (i<<24);
}

which mainline with -O2 on x86_64 currently generates:
foo:	movzbl  %dil, %edi
	movl    %edi, %eax
	movl    %edi, %edx
	sall    $8, %eax
	sall    $16, %edx
	orl     %edx, %eax
	orl     %edi, %eax
	sall    $24, %edi
	orl     %edi, %eax
	ret

but with this patch now becomes:
foo:	movzbl  %dil, %eax
	imull   $16843009, %eax, %eax
	ret

Interestingly, this transformation is already applied when using
addition, allowing synth_mult to select an optimal sequence, but
not when using the equivalent bit-wise ior or xor operators.

The solution is to use tree_nonzero_bits to check that the
potentially non-zero bits of each operand don't overlap, which
ensures that BIT_IOR_EXPR and BIT_XOR_EXPR produce the same
results as PLUS_EXPR, which effectively generalizes the old
fold_plusminus_mult_expr.  Technically, the transformation
is to canonicalize (X*C1)|(X*C2) and (X*C1)^(X*C2) to
X*(C1+C2) where X and X<<C are considered special cases.
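
The equivalence can be checked exhaustively for the example above.  The
sketch below is a standalone C check, not part of the patch or its
testsuite; the function names are made up for illustration.  Because the
shifted copies of an 8-bit value occupy disjoint bits, ior and addition
agree, and the sum of the shifts is a multiplication by
1 + (1<<8) + (1<<16) + (1<<24) = 16843009 (0x01010101).

```c
#include <stdint.h>

/* Shift-and-ior form, as in the commit message.  */
uint32_t splat_or(uint8_t i)
{
  uint32_t x = i;
  return x | (x << 8) | (x << 16) | (x << 24);
}

/* Multiplication form: the constant is the sum of the shift factors.  */
uint32_t splat_mul(uint8_t i)
{
  return (uint32_t) i * UINT32_C(0x01010101);
}

/* Returns 1 if both forms agree for every 8-bit input, 0 otherwise.  */
int splat_forms_agree(void)
{
  for (unsigned v = 0; v < 256; v++)
    if (splat_or((uint8_t) v) != splat_mul((uint8_t) v))
      return 0;
  return 1;
}
```

If the operands could share set bits, ior and addition would differ,
which is why the disjointness check is needed.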

2021-08-04  Roger Sayle  <roger@nextmovesoftware.com>
	    Marc Glisse  <marc.glisse@inria.fr>

gcc/ChangeLog
	* match.pd (bit_ior, bit_xor): Canonicalize (X*C1)|(X*C2) and
	(X*C1)^(X*C2) as X*(C1+C2), and related variants, using
	tree_nonzero_bits to ensure that operands are bit-wise disjoint.

gcc/testsuite/ChangeLog
	* gcc.dg/fold-ior-4.c: New test.
2021-08-04 14:22:51 +01:00
Jonathan Wakely 0d04fe4923 libstdc++: Add [[nodiscard]] to sequence containers
... and container adaptors.

This adds the [[nodiscard]] attribute to functions with no side-effects
for the sequence containers and their iterators, and the debug versions
of those containers, and the container adaptors.

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++-v3/ChangeLog:

	* include/bits/forward_list.h: Add [[nodiscard]] to functions
	with no side-effects.
	* include/bits/stl_bvector.h: Likewise.
	* include/bits/stl_deque.h: Likewise.
	* include/bits/stl_list.h: Likewise.
	* include/bits/stl_queue.h: Likewise.
	* include/bits/stl_stack.h: Likewise.
	* include/bits/stl_vector.h: Likewise.
	* include/debug/deque: Likewise.
	* include/debug/forward_list: Likewise.
	* include/debug/list: Likewise.
	* include/debug/safe_iterator.h: Likewise.
	* include/debug/vector: Likewise.
	* include/std/array: Likewise.
	* testsuite/23_containers/array/creation/3_neg.cc: Use
	-Wno-unused-result.
	* testsuite/23_containers/array/debug/back1_neg.cc: Cast result
	to void.
	* testsuite/23_containers/array/debug/back2_neg.cc: Likewise.
	* testsuite/23_containers/array/debug/front1_neg.cc: Likewise.
	* testsuite/23_containers/array/debug/front2_neg.cc: Likewise.
	* testsuite/23_containers/array/debug/square_brackets_operator1_neg.cc:
	Likewise.
	* testsuite/23_containers/array/debug/square_brackets_operator2_neg.cc:
	Likewise.
	* testsuite/23_containers/array/tuple_interface/get_neg.cc:
	Adjust dg-error line numbers.
	* testsuite/23_containers/deque/cons/clear_allocator.cc: Cast
	result to void.
	* testsuite/23_containers/deque/debug/invalidation/4.cc:
	Likewise.
	* testsuite/23_containers/deque/types/1.cc: Use
	-Wno-unused-result.
	* testsuite/23_containers/list/types/1.cc: Cast result to void.
	* testsuite/23_containers/priority_queue/members/7161.cc:
	Likewise.
	* testsuite/23_containers/queue/members/7157.cc: Likewise.
	* testsuite/23_containers/vector/59829.cc: Likewise.
	* testsuite/23_containers/vector/ext_pointer/types/1.cc:
	Likewise.
	* testsuite/23_containers/vector/ext_pointer/types/2.cc:
	Likewise.
	* testsuite/23_containers/vector/types/1.cc: Use
	-Wno-unused-result.
2021-08-04 12:54:29 +01:00
Jonathan Wakely 240b01b021 libstdc++: Add [[nodiscard]] to iterators and related utilities
This adds [[nodiscard]] throughout <iterator>, as proposed by P2377R0
(with some minor corrections).

The attribute is added for all modes from C++11 up, using
[[__nodiscard__]] or _GLIBCXX_NODISCARD where C++17 [[nodiscard]] can't
be used directly.

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++-v3/ChangeLog:

	* include/bits/iterator_concepts.h (iter_move): Add
	[[nodiscard]].
	* include/bits/range_access.h (begin, end, cbegin, cend)
	(rbegin, rend, crbegin, crend, size, data, ssize): Likewise.
	* include/bits/ranges_base.h (ranges::begin, ranges::end)
	(ranges::cbegin, ranges::cend, ranges::rbegin, ranges::rend)
	(ranges::crbegin, ranges::crend, ranges::size, ranges::ssize)
	(ranges::empty, ranges::data, ranges::cdata): Likewise.
	* include/bits/stl_iterator.h (reverse_iterator, __normal_iterator)
	(back_insert_iterator, front_insert_iterator, insert_iterator)
	(move_iterator, move_sentinel, common_iterator)
	(counted_iterator): Likewise.
	* include/bits/stl_iterator_base_funcs.h (distance, next, prev):
	Likewise.
	* include/bits/stream_iterator.h (istream_iterator)
	(ostream_iterator): Likewise.
	* include/bits/streambuf_iterator.h (istreambuf_iterator)
	(ostreambuf_iterator): Likewise.
	* include/std/ranges (views::single, views::iota, views::all)
	(views::filter, views::transform, views::take, views::take_while)
	(views::drop, views::drop_while, views::join, views::lazy_split)
	(views::split, views::counted, views::common, views::reverse)
	(views::elements): Likewise.
	* testsuite/20_util/rel_ops.cc: Use -Wno-unused-result.
	* testsuite/24_iterators/move_iterator/greedy_ops.cc: Likewise.
	* testsuite/24_iterators/normal_iterator/greedy_ops.cc:
	Likewise.
	* testsuite/24_iterators/reverse_iterator/2.cc: Likewise.
	* testsuite/24_iterators/reverse_iterator/greedy_ops.cc:
	Likewise.
	* testsuite/21_strings/basic_string/range_access/char/1.cc:
	Cast result to void.
	* testsuite/21_strings/basic_string/range_access/wchar_t/1.cc:
	Likewise.
	* testsuite/21_strings/basic_string_view/range_access/char/1.cc:
	Likewise.
	* testsuite/21_strings/basic_string_view/range_access/wchar_t/1.cc:
	Likewise.
	* testsuite/23_containers/array/range_access.cc: Likewise.
	* testsuite/23_containers/deque/range_access.cc: Likewise.
	* testsuite/23_containers/forward_list/range_access.cc:
	Likewise.
	* testsuite/23_containers/list/range_access.cc: Likewise.
	* testsuite/23_containers/map/range_access.cc: Likewise.
	* testsuite/23_containers/multimap/range_access.cc: Likewise.
	* testsuite/23_containers/multiset/range_access.cc: Likewise.
	* testsuite/23_containers/set/range_access.cc: Likewise.
	* testsuite/23_containers/unordered_map/range_access.cc:
	Likewise.
	* testsuite/23_containers/unordered_multimap/range_access.cc:
	Likewise.
	* testsuite/23_containers/unordered_multiset/range_access.cc:
	Likewise.
	* testsuite/23_containers/unordered_set/range_access.cc:
	Likewise.
	* testsuite/23_containers/vector/range_access.cc: Likewise.
	* testsuite/24_iterators/customization_points/iter_move.cc:
	Likewise.
	* testsuite/24_iterators/istream_iterator/sentinel.cc:
	Likewise.
	* testsuite/24_iterators/istreambuf_iterator/sentinel.cc:
	Likewise.
	* testsuite/24_iterators/move_iterator/dr2061.cc: Likewise.
	* testsuite/24_iterators/operations/prev_neg.cc: Likewise.
	* testsuite/24_iterators/ostreambuf_iterator/2.cc: Likewise.
	* testsuite/24_iterators/range_access/range_access.cc:
	Likewise.
	* testsuite/24_iterators/range_operations/100768.cc: Likewise.
	* testsuite/26_numerics/valarray/range_access2.cc: Likewise.
	* testsuite/28_regex/range_access.cc: Likewise.
	* testsuite/experimental/string_view/range_access/char/1.cc:
	Likewise.
	* testsuite/experimental/string_view/range_access/wchar_t/1.cc:
	Likewise.
	* testsuite/ext/vstring/range_access.cc: Likewise.
	* testsuite/std/ranges/adaptors/take.cc: Likewise.
	* testsuite/std/ranges/p2259.cc: Likewise.
2021-08-04 12:54:28 +01:00
Richard Biener 2724d1bba6 Rewrite more vector loads to scalar loads
This teaches forwprop to rewrite more vector loads that are only
used in BIT_FIELD_REFs as scalar loads.  This provides the
remaining uplift to SPEC CPU 2017 510.parest_r on Zen 2 which
has CPU gathers disabled.

In particular vector load + vec_unpack + bit-field-ref is turned
into (extending) scalar loads which avoids costly XMM/GPR
transitions.  To not conflict with vector load + bit-field-ref
+ vector constructor matching to vector load + shuffle, the
extended transform is only done after vector lowering.

2021-07-30  Richard Biener  <rguenther@suse.de>

	* tree-ssa-forwprop.c (pass_forwprop::execute): Split
	out code to decompose vector loads ...
	(optimize_vector_load): ... here.  Generalize it to
	handle intermediate widening and TARGET_MEM_REF loads
	and apply it to loads with a supported vector mode as well.
2021-08-04 12:38:03 +02:00
Richard Biener 87a0b607e4 tree-optimization/101756 - avoid vectorizing boolean MAX reductions
The following avoids vectorizing MIN/MAX reductions on bools which,
when ending up as vector(2) <signed-boolean:64> would need to be
adjusted because of the sign change.  The fix instead avoids any
reduction vectorization where the result isn't compatible
with the original scalar type since we don't compensate for that
either.

2021-08-04  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/101756
	* tree-vect-slp.c (vectorizable_bb_reduc_epilogue): Make sure
	the result of the reduction epilogue is compatible with the original
	scalar result.

	* gcc.dg/vect/bb-slp-pr101756.c: New testcase.
2021-08-04 12:33:23 +02:00
Jakub Jelinek af31cab047 c++: Fix up #pragma omp declare {simd,variant} and acc routine parsing
When parsing default arguments, we need to temporarily clear parser->omp_declare_simd
and parser->oacc_routine, otherwise it can clash with further declarations
inside of e.g. lambdas inside of those default arguments.

2021-08-04  Jakub Jelinek  <jakub@redhat.com>

	PR c++/101759
	* parser.c (cp_parser_default_argument): Temporarily override
	parser->omp_declare_simd and parser->oacc_routine to NULL.

	* g++.dg/gomp/pr101759.C: New test.
	* g++.dg/goacc/pr101759.C: New test.
2021-08-04 11:53:48 +02:00
Jakub Jelinek 8aa14fa7d9 testsuite: Fix duplicated content of gcc.c-torture/execute/ieee/pr29302-1.x
The file has two identical halves, which looks like a patch applied twice.

2021-08-04  Jakub Jelinek  <jakub@redhat.com>

	* gcc.c-torture/execute/ieee/pr29302-1.x: Undo doubly applied patch.
2021-08-04 11:44:45 +02:00
liuhongt 9f26640f7b Refine predicate of peephole2 to general_reg_operand. [PR target/101743]
The define_peephole2 added by r12-2640-gf7bf03cf69ccb7dc should
only work on general registers.  Since x86 also supports mov
instructions between GPRs, SSE registers and mask registers, limit
the peephole2 predicate to general_reg_operand.

gcc/ChangeLog:

	PR target/101743
	* config/i386/i386.md (peephole2): Refine predicate from
	register_operand to general_reg_operand.
2021-08-04 17:43:17 +08:00
Jakub Jelinek 7195fa03e7 libgcc: Fix duplicated content of config/t-slibgcc-fuchsia
The file has two identical halves, which looks like a patch applied twice.

2021-08-04  Jakub Jelinek  <jakub@redhat.com>

	* config/t-slibgcc-fuchsia: Undo doubly applied patch.
2021-08-04 11:40:52 +02:00
Aldy Hernandez 9db0bcd9fd Mark path_range_query::dump as override.
gcc/ChangeLog:

	* gimple-range-path.h (path_range_query::dump): Mark override.
2021-08-04 10:57:11 +02:00
Richard Biener 4d56259101 tree-optimization/101769 - tail recursion creates possibly infinite loop
This makes tail recursion optimization produce a loop structure
manually rather than relying on loop fixup.  That also allows the
loop to be marked as finite (it would eventually blow the stack
if it were not).
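
As a source-level illustration of what tail recursion elimination does,
the two hypothetical functions below compute the same result.  This is
only a sketch in C, not the GIMPLE transformation in tree-tailcall.c;
gcd_rec and gcd_loop are names chosen for the example.

```c
/* Tail-recursive form: the recursive call is the last action.  */
unsigned gcd_rec(unsigned a, unsigned b)
{
  if (b == 0)
    return a;
  return gcd_rec(b, a % b);
}

/* Loop form: the tail call becomes a jump back to the function
   start with updated arguments.  */
unsigned gcd_loop(unsigned a, unsigned b)
{
  while (b != 0)
    {
      unsigned t = a % b;
      a = b;
      b = t;
    }
  return a;
}
```

The resulting loop is finite for the same reason the recursion is: an
unbounded recursion would eventually exhaust the stack.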

2021-08-04  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/101769
	* tree-tailcall.c (eliminate_tail_call): Add the created loop
	for the first recursion and return it via the new output parameter.
	(optimize_tail_call): Pass through new output param.
	(tree_optimize_tail_calls_1): After creating all latches,
	add the created loop to the loop tree.  Do not mark loops for fixup.

	* g++.dg/tree-ssa/pr101769.C: New testcase.
2021-08-04 10:35:27 +02:00