Commit Graph

187159 Commits

Author SHA1 Message Date
GCC Administrator 2697f8324f Daily bump. 2021-08-05 00:17:03 +00:00
David Malcolm ded2c2c068 analyzer: initial implementation of asm support [PR101570]
gcc/ChangeLog:
	PR analyzer/101570
	* Makefile.in (ANALYZER_OBJS): Add analyzer/region-model-asm.o.

gcc/analyzer/ChangeLog:
	PR analyzer/101570
	* analyzer.cc (maybe_reconstruct_from_def_stmt): Add GIMPLE_ASM
	case.
	* analyzer.h (class asm_output_svalue): New forward decl.
	(class reachable_regions): New forward decl.
	* complexity.cc (complexity::from_vec_svalue): New.
	* complexity.h (complexity::from_vec_svalue): New decl.
	* engine.cc (feasibility_state::maybe_update_for_edge): Handle
	asm stmts by calling on_asm_stmt.
	* region-model-asm.cc: New file.
	* region-model-manager.cc
	(region_model_manager::maybe_fold_asm_output_svalue): New.
	(region_model_manager::get_or_create_asm_output_svalue): New.
	(region_model_manager::log_stats): Log m_asm_output_values_map.
	* region-model.cc (region_model::on_stmt_pre): Handle GIMPLE_ASM.
	* region-model.h (visitor::visit_asm_output_svalue): New.
	(region_model_manager::get_or_create_asm_output_svalue): New decl.
	(region_model_manager::maybe_fold_asm_output_svalue): New decl.
	(region_model_manager::asm_output_values_map_t): New typedef.
	(region_model_manager::m_asm_output_values_map): New field.
	(region_model::on_asm_stmt): New.
	* store.cc (binding_cluster::on_asm): New.
	* store.h (binding_cluster::on_asm): New decl.
	* svalue.cc (svalue::cmp_ptr): Handle SK_ASM_OUTPUT.
	(asm_output_svalue::dump_to_pp): New.
	(asm_output_svalue::dump_input): New.
	(asm_output_svalue::input_idx_to_asm_idx): New.
	(asm_output_svalue::accept): New.
	* svalue.h (enum svalue_kind): Add SK_ASM_OUTPUT.
	(svalue::dyn_cast_asm_output_svalue): New.
	(class asm_output_svalue): New.
	(is_a_helper <const asm_output_svalue *>::test): New.
	(struct default_hash_traits<asm_output_svalue::key_t>): New.

gcc/testsuite/ChangeLog:
	PR analyzer/101570
	* gcc.dg/analyzer/asm-x86-1.c: New test.
	* gcc.dg/analyzer/asm-x86-lp64-1.c: New test.
	* gcc.dg/analyzer/asm-x86-lp64-2.c: New test.
	* gcc.dg/analyzer/pr101570.c: New test.
	* gcc.dg/analyzer/torture/asm-x86-linux-array_index_mask_nospec.c:
	New test.
	* gcc.dg/analyzer/torture/asm-x86-linux-cpuid-paravirt-1.c: New
	test.
	* gcc.dg/analyzer/torture/asm-x86-linux-cpuid-paravirt-2.c: New
	test.
	* gcc.dg/analyzer/torture/asm-x86-linux-cpuid.c: New test.
	* gcc.dg/analyzer/torture/asm-x86-linux-rdmsr-paravirt.c: New
	test.
	* gcc.dg/analyzer/torture/asm-x86-linux-rdmsr.c: New test.
	* gcc.dg/analyzer/torture/asm-x86-linux-wfx_get_ps_timeout-full.c:
	New test.
	* gcc.dg/analyzer/torture/asm-x86-linux-wfx_get_ps_timeout-reduced.c:
	New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2021-08-04 18:21:25 -04:00
H.J. Lu 5738a64f8b x86: Update STORE_MAX_PIECES
Update STORE_MAX_PIECES to allow 16/32/64 bytes only if inter-unit move
is enabled since vec_duplicate enabled by inter-unit move is used to
implement store_by_pieces of 16/32/64 bytes.

gcc/

	PR target/101742
	* config/i386/i386.h (STORE_MAX_PIECES): Allow 16/32/64 bytes
	only if TARGET_INTER_UNIT_MOVES_TO_VEC is true.

gcc/testsuite/

	PR target/101742
	* gcc.target/i386/pr101742a.c: New test.
	* gcc.target/i386/pr101742b.c: Likewise.
2021-08-04 12:59:38 -07:00
H.J. Lu 09dba016db x86: Avoid stack realignment when copying data with SSE register
To avoid stack realignment, call ix86_gen_scratch_sse_rtx to get a
scratch SSE register to copy data with with SSE register from one
memory location to another.

gcc/

	PR target/101772
	* config/i386/i386-expand.c (ix86_expand_vector_move): Call
	ix86_gen_scratch_sse_rtx to get a scratch SSE register to copy
	data with SSE register from one memory location to another.

gcc/testsuite/

	PR target/101772
	* gcc.target/i386/eh_return-2.c: New test.
2021-08-04 12:51:12 -07:00
Andreas Krebbel 361da782a2 IBM Z: Implement TARGET_VECTORIZE_VEC_PERM_CONST for vpdi
This patch makes use of the vector permute double immediate
instruction for constant permute vectors.

gcc/ChangeLog:

	* config/s390/s390.c (expand_perm_with_vpdi): New function.
	(vectorize_vec_perm_const_1): Call expand_perm_with_vpdi.
	* config/s390/vector.md (*vpdi1<mode>, @vpdi1<mode>): Enable a
	parameterized expander.
	(*vpdi4<mode>, @vpdi4<mode>): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/s390/vector/perm-vpdi.c: New test.
2021-08-04 18:40:11 +02:00
Andreas Krebbel 6dc8c46564 IBM Z: Implement TARGET_VECTORIZE_VEC_PERM_CONST for vector merge
This patch implements the TARGET_VECTORIZE_VEC_PERM_CONST in the IBM Z
backend. The initial implementation only exploits the vector merge
instruction but there is more to come.

gcc/ChangeLog:

	* config/s390/s390.c (MAX_VECT_LEN): Define macro.
	(struct expand_vec_perm_d): Define struct.
	(expand_perm_with_merge): New function.
	(vectorize_vec_perm_const_1): New function.
	(s390_vectorize_vec_perm_const): New function.
	(TARGET_VECTORIZE_VEC_PERM_CONST): Define target macro.

gcc/testsuite/ChangeLog:

	* gcc.target/s390/vector/perm-merge.c: New test.
	* gcc.target/s390/vector/vec-types.h: New test.
2021-08-04 18:40:10 +02:00
Andreas Krebbel 4e34925ef1 IBM Z: Remove redundant V_HW_64 mode iterator.
gcc/ChangeLog:

	* config/s390/vector.md (V_HW_64): Remove mode iterator.
	(*vec_load_pair<mode>): Use V_HW_2 instead of V_HW_64.
	* config/s390/vx-builtins.md
	(vec_scatter_element<V_HW_2:mode>_SI): Use V_HW_2 instead of
	V_HW_64.
2021-08-04 18:40:10 +02:00
Andreas Krebbel 0aa7091bef IBM Z: Get rid of vpdi unspec
The patch gets rid of the unspec used for the vector permute double
immediate instruction and replaces it with generic rtx.

gcc/ChangeLog:

	* config/s390/s390.md (UNSPEC_VEC_PERMI): Remove constant
	definition.
	* config/s390/vector.md (*vpdi1<mode>, *vpdi4<mode>): New pattern
	definitions.
	* config/s390/vx-builtins.md (*vec_permi<mode>): Emit generic rtx
	instead of an unspec.

gcc/testsuite/ChangeLog:

	* gcc.target/s390/zvector/vec-permi.c: Removed.
	* gcc.target/s390/zvector/vec_permi.c: New test.
2021-08-04 18:40:09 +02:00
Andreas Krebbel 5391688acc IBM Z: Get rid of vec merge unspec
This patch gets rid of the unspecs we were using for the vector merge
instruction and replaces it with generic rtx.

gcc/ChangeLog:

	* config/s390/s390-modes.def: Add more vector modes to support
	concatenation of two vectors.
	* config/s390/s390-protos.h (s390_expand_merge_perm_const): Add
	prototype.
	(s390_expand_merge): Likewise.
	* config/s390/s390.c (s390_expand_merge_perm_const): New function.
	(s390_expand_merge): New function.
	* config/s390/s390.md (UNSPEC_VEC_MERGEH, UNSPEC_VEC_MERGEL):
	Remove constant definitions.
	* config/s390/vector.md (V_HW_2): Add mode iterators.
	(VI_HW_4, V_HW_4): Rename VI_HW_4 to V_HW_4.
	(vec_2x_nelts, vec_2x_wide): New mode attributes.
	(*vmrhb, *vmrlb, *vmrhh, *vmrlh, *vmrhf, *vmrlf, *vmrhg, *vmrlg):
	New pattern definitions.
	(vec_widen_umult_lo_<mode>, vec_widen_umult_hi_<mode>)
	(vec_widen_smult_lo_<mode>, vec_widen_smult_hi_<mode>)
	(vec_unpacks_lo_v4sf, vec_unpacks_hi_v4sf, vec_unpacks_lo_v2df)
	(vec_unpacks_hi_v2df): Adjust expanders to emit non-unspec RTX for
	vec merge.
	* config/s390/vx-builtins.md (V_HW_4): Remove mode iterator. Now
	in vector.md.
	(vec_mergeh<mode>, vec_mergel<mode>): Use s390_expand_merge to
	emit vec merge pattern.

gcc/testsuite/ChangeLog:

	* gcc.target/s390/vector/long-double-asm-in-out-hard-fp-reg.c:
	Instead of vpdi with 0 and 5 vmrlg and vmrhg are used now.
	* gcc.target/s390/vector/long-double-asm-inout-hard-fp-reg.c: Likewise.
	* gcc.target/s390/zvector/vec-types.h: New test.
	* gcc.target/s390/zvector/vec_merge.c: New test.
2021-08-04 18:40:09 +02:00
Jonathan Wright 63834c84d4 aarch64: Don't include vec_select high-half in SIMD multiply cost
The Neon multiply/multiply-accumulate/multiply-subtract instructions
can select the top or bottom half of the operand registers. This
selection does not change the cost of the underlying instruction and
this should be reflected by the RTL cost function.

This patch adds RTL tree traversal in the Neon multiply cost function
to match vec_select high-half of its operands. This traversal
prevents the cost of the vec_select from being added into the cost of
the multiply - meaning that these instructions can now be emitted in
the combine pass as they are no longer deemed prohibitively
expensive.

gcc/ChangeLog:

2021-07-19  Jonathan Wright  <jonathan.wright@arm.com>

	* config/aarch64/aarch64.c (aarch64_strip_extend_vec_half):
	Define.
	(aarch64_rtx_mult_cost): Traverse RTL tree to prevent cost of
	vec_select high-half from being added into Neon multiply
	cost.
	* rtlanal.c (vec_series_highpart_p): Define.
	* rtlanal.h (vec_series_highpart_p): Declare.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/vmul_high_cost.c: New test.
2021-08-04 16:58:26 +01:00
Jonathan Wright 1d65c9d251 aarch64: Don't include vec_select element in SIMD multiply cost
The Neon multiply/multiply-accumulate/multiply-subtract instructions
can take various forms - multiplying full vector registers of values
or multiplying one vector by a single element of another. Regardless
of the form used, these instructions have the same cost, and this
should be reflected by the RTL cost function.

This patch adds RTL tree traversal in the Neon multiply cost function
to match the vec_select used by the lane-referencing forms of the
instructions already mentioned. This traversal prevents the cost of
the vec_select from being added into the cost of the multiply -
meaning that these instructions can now be emitted in the combine
pass as they are no longer deemed prohibitively expensive.

gcc/ChangeLog:

2021-07-19  Jonathan Wright  <jonathan.wright@arm.com>

	* config/aarch64/aarch64.c (aarch64_strip_duplicate_vec_elt):
	Define.
	(aarch64_rtx_mult_cost): Traverse RTL tree to prevent
	vec_select cost from being added into Neon multiply cost.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/vmul_element_cost.c: New test.
2021-08-04 16:57:38 +01:00
Richard Sandiford 5a1017dc30 vect: Tweak comparisons with existing epilogue loops
This patch uses a more accurate scalar iteration estimate when
comparing the epilogue of a constant-iteration loop with a candidate
replacement epilogue.

In the testcase, the patch prevents a 1-to-3-element SVE epilogue
from seeming better than a 64-bit Advanced SIMD epilogue.

gcc/
	* tree-vect-loop.c (vect_better_loop_vinfo_p): Detect cases in
	which old_loop_vinfo is an epilogue loop that handles a constant
	number of iterations.

gcc/testsuite/
	* gcc.target/aarch64/sve/cost_model_12.c: New test.
2021-08-04 16:52:09 +01:00
Richard Sandiford 315a1c3756 vect: Tweak dump messages for vector mode choice
After vect_analyze_loop has successfully analysed a loop for
one base vector mode B1, it considers using following base vector
modes to vectorise an epilogue.  However, for VECT_COMPARE_COSTS,
a later mode B2 might turn out to be better than B1 was.  Initially
this comparison will be between an epilogue loop (for B2) and a main
loop (for B1).  However, in r11-6458 I'd added code to reanalyse the
B2 epilogue loop as a main loop, partly for correctness and partly
for better costing.

This can lead to a situation in which we think that the B2 epilogue
loop was better than the B1 main loop, but that the B2 main loop is
not better than the B1 main loop.  There was no dump message to say
that this had happened, which made it look like B2 had still won.

gcc/
	* tree-vect-loop.c (vect_analyze_loop): Print a dump message
	when a reanalyzed loop fails to be cheaper than the current
	main loop.
2021-08-04 16:52:08 +01:00
Richard Sandiford eb55b5b0df aarch64: Fix a typo
gcc/
	* config/aarch64/aarch64.c: Fix a typo.
2021-08-04 16:52:07 +01:00
Vincent Lefèvre 929f2cf410 gcov: check return code of a fclose
gcc/ChangeLog:

	PR gcov-profile/101773
	* gcov-io.c (gcov_close): Check return code of a fclose.
2021-08-04 17:26:28 +02:00
Bernd Edlinger 96c82a16b2 Fix debug info for ignored decls at start of assembly
Ignored functions decls that are compiled at the start of
the assembly have bogus line numbers until the first .file
directive, as reported in PR101575.

The corresponding binutils bug report is
https://sourceware.org/bugzilla/show_bug.cgi?id=28149

The work around for this issue is to emit a dummy .file
directive before the first function is compiled, unless
another .file directive was already emitted previously.

2021-08-04  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	PR ada/101575
	* dwarf2out.c (dwarf2out_assembly_start): Emit a dummy
	.file statement when needed.
2021-08-04 16:18:07 +02:00
Tamar Christina 9fcb8ec603 [testsuite] Fix trapping access in test PR101750
I believe PR101750 to be a testism. Fix it by giving the class a name.

gcc/testsuite/ChangeLog:

	PR tree-optimization/101750
	* g++.dg/vect/pr99149.cc: Name class.
2021-08-04 14:36:26 +01:00
Richard Biener 31855ba6b1 Add emulated gather capability to the vectorizer
This adds a gather vectorization capability to the vectorizer
without target support by decomposing the offset vector, doing
sclar loads and then building a vector from the result.  This
is aimed mainly at cases where vectorizing the rest of the loop
offsets the cost of vectorizing the gather.

Note it's difficult to avoid vectorizing the offset load, but in
some cases later passes can turn the vector load + extract into
scalar loads, see the followup patch.

On SPEC CPU 2017 510.parest_r this improves runtime from 250s
to 219s on a Zen2 CPU which has its native gather instructions
disabled (using those the runtime instead increases to 254s)
using -Ofast -march=znver2 [-flto].  It turns out the critical
loops in this benchmark all perform gather operations.

2021-07-30  Richard Biener  <rguenther@suse.de>

	* tree-vect-data-refs.c (vect_check_gather_scatter):
	Include widening conversions only when the result is
	still handed by native gather or the current offset
	size not already matches the data size.
	Also succeed analysis in case there's no native support,
	noted by a IFN_LAST ifn and a NULL decl.
	(vect_analyze_data_refs): Always consider gathers.
	* tree-vect-patterns.c (vect_recog_gather_scatter_pattern):
	Test for no IFN gather rather than decl gather.
	* tree-vect-stmts.c (vect_model_load_cost): Pass in the
	gather-scatter info and cost emulated gathers accordingly.
	(vect_truncate_gather_scatter_offset): Properly test for
	no IFN gather.
	(vect_use_strided_gather_scatters_p): Likewise.
	(get_load_store_type): Handle emulated gathers and its
	restrictions.
	(vectorizable_load): Likewise.  Emulate them by extracting
	scalar offsets, doing scalar loads and a vector construct.

	* gcc.target/i386/vect-gather-1.c: New testcase.
	* gfortran.dg/vect/vect-8.f90: Adjust.
2021-08-04 15:28:07 +02:00
H.J. Lu f2e5d2717d by_pieces: Pass MAX_PIECES to op_by_pieces_d
Pass MAX_PIECES to op_by_pieces_d::op_by_pieces_d for move, store and
compare.

	PR target/101742
	* expr.c (op_by_pieces_d::op_by_pieces_d): Add a max_pieces
	argument to set m_max_size.
	(move_by_pieces_d): Pass MOVE_MAX_PIECES to op_by_pieces_d.
	(store_by_pieces_d): Pass STORE_MAX_PIECES to op_by_pieces_d.
	(compare_by_pieces_d): Pass COMPARE_MAX_PIECES to op_by_pieces_d.
2021-08-04 06:24:46 -07:00
Roger Sayle 96146e61cd Fold (X<<C1)^(X<<C2) to a multiplication when possible.
The easiest way to motivate these additions to match.pd is with the
following example:

unsigned int foo(unsigned char i) {
  return i | (i<<8) | (i<<16) | (i<<24);
}

which mainline with -O2 on x86_64 currently generates:
foo:	movzbl  %dil, %edi
	movl    %edi, %eax
	movl    %edi, %edx
	sall    $8, %eax
	sall    $16, %edx
	orl     %edx, %eax
	orl     %edi, %eax
	sall    $24, %edi
	orl     %edi, %eax
	ret

but with this patch now becomes:
foo:	movzbl  %dil, %eax
        imull   $16843009, %eax, %eax
        ret

Interestingly, this transformation is already applied when using
addition, allowing synth_mult to select an optimal sequence, but
not when using the equivalent bit-wise ior or xor operators.

The solution is to use tree_nonzero_bits to check that the
potentially non-zero bits of each operand don't overlap, which
ensures that BIT_IOR_EXPR and BIT_XOR_EXPR produce the same
results as PLUS_EXPR, which effectively generalizes the old
fold_plusminus_mult_expr.  Technically, the transformation
is to canonicalize (X*C1)|(X*C2) and (X*C1)^(X*C2) to
X*(C1+C2) where X and X<<C are considered special cases.

2021-08-04  Roger Sayle  <roger@nextmovesoftware.com>
	    Marc Glisse  <marc.glisse@inria.fr>

gcc/ChangeLog
	* match.pd (bit_ior, bit_xor): Canonicalize (X*C1)|(X*C2) and
	(X*C1)^(X*C2) as X*(C1+C2), and related variants, using
	tree_nonzero_bits to ensure that operands are bit-wise disjoint.

gcc/testsuite/ChangeLog
	* gcc.dg/fold-ior-4.c: New test.
2021-08-04 14:22:51 +01:00
Jonathan Wakely 0d04fe4923 libstdc++: Add [[nodiscard]] to sequence containers
... and container adaptors.

This adds the [[nodiscard]] attribute to functions with no side-effects
for the sequence containers and their iterators, and the debug versions
of those containers, and the container adaptors,

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++-v3/ChangeLog:

	* include/bits/forward_list.h: Add [[nodiscard]] to functions
	with no side-effects.
	* include/bits/stl_bvector.h: Likewise.
	* include/bits/stl_deque.h: Likewise.
	* include/bits/stl_list.h: Likewise.
	* include/bits/stl_queue.h: Likewise.
	* include/bits/stl_stack.h: Likewise.
	* include/bits/stl_vector.h: Likewise.
	* include/debug/deque: Likewise.
	* include/debug/forward_list: Likewise.
	* include/debug/list: Likewise.
	* include/debug/safe_iterator.h: Likewise.
	* include/debug/vector: Likewise.
	* include/std/array: Likewise.
	* testsuite/23_containers/array/creation/3_neg.cc: Use
	-Wno-unused-result.
	* testsuite/23_containers/array/debug/back1_neg.cc: Cast result
	to void.
	* testsuite/23_containers/array/debug/back2_neg.cc: Likewise.
	* testsuite/23_containers/array/debug/front1_neg.cc: Likewise.
	* testsuite/23_containers/array/debug/front2_neg.cc: Likewise.
	* testsuite/23_containers/array/debug/square_brackets_operator1_neg.cc:
	Likewise.
	* testsuite/23_containers/array/debug/square_brackets_operator2_neg.cc:
	Likewise.
	* testsuite/23_containers/array/tuple_interface/get_neg.cc:
	Adjust dg-error line numbers.
	* testsuite/23_containers/deque/cons/clear_allocator.cc: Cast
	result to void.
	* testsuite/23_containers/deque/debug/invalidation/4.cc:
	Likewise.
	* testsuite/23_containers/deque/types/1.cc: Use
	-Wno-unused-result.
	* testsuite/23_containers/list/types/1.cc: Cast result to void.
	* testsuite/23_containers/priority_queue/members/7161.cc:
	Likewise.
	* testsuite/23_containers/queue/members/7157.cc: Likewise.
	* testsuite/23_containers/vector/59829.cc: Likewise.
	* testsuite/23_containers/vector/ext_pointer/types/1.cc:
	Likewise.
	* testsuite/23_containers/vector/ext_pointer/types/2.cc:
	Likewise.
	* testsuite/23_containers/vector/types/1.cc: Use
	-Wno-unused-result.
2021-08-04 12:54:29 +01:00
Jonathan Wakely 240b01b021 libstdc++: Add [[nodiscard]] to iterators and related utilities
This adds [[nodiscard]] throughout <iterator>, as proposed by P2377R0
(with some minor corrections).

The attribute is added for all modes from C++11 up, using
[[__nodiscard__]] or _GLIBCXX_NODISCARD where C++17 [[nodiscard]] can't
be used directly.

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++-v3/ChangeLog:

	* include/bits/iterator_concepts.h (iter_move): Add
	[[nodiscard]].
	* include/bits/range_access.h (begin, end, cbegin, cend)
	(rbegin, rend, crbegin, crend, size, data, ssize): Likewise.
	* include/bits/ranges_base.h (ranges::begin, ranges::end)
	(ranges::cbegin, ranges::cend, ranges::rbegin, ranges::rend)
	(ranges::crbegin, ranges::crend, ranges::size, ranges::ssize)
	(ranges::empty, ranges::data, ranges::cdata): Likewise.
	* include/bits/stl_iterator.h (reverse_iterator, __normal_iterator)
	(back_insert_iterator, front_insert_iterator, insert_iterator)
	(move_iterator, move_sentinel, common_iterator)
	(counted_iterator): Likewise.
	* include/bits/stl_iterator_base_funcs.h (distance, next, prev):
	Likewise.
	* include/bits/stream_iterator.h (istream_iterator)
	(ostream_iterartor): Likewise.
	* include/bits/streambuf_iterator.h (istreambuf_iterator)
	(ostreambuf_iterator): Likewise.
	* include/std/ranges (views::single, views::iota, views::all)
	(views::filter, views::transform, views::take, views::take_while)
	(views::drop, views::drop_while, views::join, views::lazy_split)
	(views::split, views::counted, views::common, views::reverse)
	(views::elements): Likewise.
	* testsuite/20_util/rel_ops.cc: Use -Wno-unused-result.
	* testsuite/24_iterators/move_iterator/greedy_ops.cc: Likewise.
	* testsuite/24_iterators/normal_iterator/greedy_ops.cc:
	Likewise.
	* testsuite/24_iterators/reverse_iterator/2.cc: Likewise.
	* testsuite/24_iterators/reverse_iterator/greedy_ops.cc:
	Likewise.
	* testsuite/21_strings/basic_string/range_access/char/1.cc:
	Cast result to void.
	* testsuite/21_strings/basic_string/range_access/wchar_t/1.cc:
	Likewise.
	* testsuite/21_strings/basic_string_view/range_access/char/1.cc:
	Likewise.
	* testsuite/21_strings/basic_string_view/range_access/wchar_t/1.cc:
	Likewise.
	* testsuite/23_containers/array/range_access.cc: Likewise.
	* testsuite/23_containers/deque/range_access.cc: Likewise.
	* testsuite/23_containers/forward_list/range_access.cc:
	Likewise.
	* testsuite/23_containers/list/range_access.cc: Likewise.
	* testsuite/23_containers/map/range_access.cc: Likewise.
	* testsuite/23_containers/multimap/range_access.cc: Likewise.
	* testsuite/23_containers/multiset/range_access.cc: Likewise.
	* testsuite/23_containers/set/range_access.cc: Likewise.
	* testsuite/23_containers/unordered_map/range_access.cc:
	Likewise.
	* testsuite/23_containers/unordered_multimap/range_access.cc:
	Likewise.
	* testsuite/23_containers/unordered_multiset/range_access.cc:
	Likewise.
	* testsuite/23_containers/unordered_set/range_access.cc:
	Likewise.
	* testsuite/23_containers/vector/range_access.cc: Likewise.
	* testsuite/24_iterators/customization_points/iter_move.cc:
	Likewise.
	* testsuite/24_iterators/istream_iterator/sentinel.cc:
	Likewise.
	* testsuite/24_iterators/istreambuf_iterator/sentinel.cc:
	Likewise.
	* testsuite/24_iterators/move_iterator/dr2061.cc: Likewise.
	* testsuite/24_iterators/operations/prev_neg.cc: Likewise.
	* testsuite/24_iterators/ostreambuf_iterator/2.cc: Likewise.
	* testsuite/24_iterators/range_access/range_access.cc:
	Likewise.
	* testsuite/24_iterators/range_operations/100768.cc: Likewise.
	* testsuite/26_numerics/valarray/range_access2.cc: Likewise.
	* testsuite/28_regex/range_access.cc: Likewise.
	* testsuite/experimental/string_view/range_access/char/1.cc:
	Likewise.
	* testsuite/experimental/string_view/range_access/wchar_t/1.cc:
	Likewise.
	* testsuite/ext/vstring/range_access.cc: Likewise.
	* testsuite/std/ranges/adaptors/take.cc: Likewise.
	* testsuite/std/ranges/p2259.cc: Likewise.
2021-08-04 12:54:28 +01:00
Richard Biener 2724d1bba6 Rewrite more vector loads to scalar loads
This teaches forwprop to rewrite more vector loads that are only
used in BIT_FIELD_REFs as scalar loads.  This provides the
remaining uplift to SPEC CPU 2017 510.parest_r on Zen 2 which
has CPU gathers disabled.

In particular vector load + vec_unpack + bit-field-ref is turned
into (extending) scalar loads which avoids costly XMM/GPR
transitions.  To not conflict with vector load + bit-field-ref
+ vector constructor matching to vector load + shuffle the
extended transform is only done after vector lowering.

2021-07-30  Richard Biener  <rguenther@suse.de>

	* tree-ssa-forwprop.c (pass_forwprop::execute): Split
	out code to decompose vector loads ...
	(optimize_vector_load): ... here.  Generalize it to
	handle intermediate widening and TARGET_MEM_REF loads
	and apply it to loads with a supported vector mode as well.
2021-08-04 12:38:03 +02:00
Richard Biener 87a0b607e4 tree-optimization/101756 - avoid vectorizing boolean MAX reductions
The following avoids vectorizing MIN/MAX reductions on bools which,
when ending up as vector(2) <signed-boolean:64> would need to be
adjusted because of the sign change.  The fix instead avoids any
reduction vectorization where the result isn't compatible
to the original scalar type since we don't compensate for that
either.

2021-08-04  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/101756
	* tree-vect-slp.c (vectorizable_bb_reduc_epilogue): Make sure
	the result of the reduction epilogue is compatible to the original
	scalar result.

	* gcc.dg/vect/bb-slp-pr101756.c: New testcase.
2021-08-04 12:33:23 +02:00
Jakub Jelinek af31cab047 c++: Fix up #pragma omp declare {simd,variant} and acc routine parsing
When parsing default arguments, we need to temporarily clear parser->omp_declare_simd
and parser->oacc_routine, otherwise it can clash with further declarations
inside of e.g. lambdas inside of those default arguments.

2021-08-04  Jakub Jelinek  <jakub@redhat.com>

	PR c++/101759
	* parser.c (cp_parser_default_argument): Temporarily override
	parser->omp_declare_simd and parser->oacc_routine to NULL.

	* g++.dg/gomp/pr101759.C: New test.
	* g++.dg/goacc/pr101759.C: New test.
2021-08-04 11:53:48 +02:00
Jakub Jelinek 8aa14fa7d9 testsuite: Fix duplicated content of gcc.c-torture/execute/ieee/pr29302-1.x
The file has two identical halves, seems like twice applied patch.

2021-08-04  Jakub Jelinek  <jakub@redhat.com>

	* gcc.c-torture/execute/ieee/pr29302-1.x: Undo doubly applied patch.
2021-08-04 11:44:45 +02:00
liuhongt 9f26640f7b Refine predicate of peephole2 to general_reg_operand. [PR target/101743]
The define_peephole2 which is added by r12-2640-gf7bf03cf69ccb7dc
should only work on general registers, considering that x86 also
supports mov instructions between gpr, sse reg, mask reg, limiting the
peephole2 predicate to general_reg_operand.

gcc/ChangeLog:

	PR target/101743
	* config/i386/i386.md (peephole2): Refine predicate from
	register_operand to general_reg_operand.
2021-08-04 17:43:17 +08:00
Jakub Jelinek 7195fa03e7 libgcc: Fix duplicated content of config/t-slibgcc-fuchsia
The file has two identical halves, seems like twice applied patch.

2021-08-04  Jakub Jelinek  <jakub@redhat.com>

	* config/t-slibgcc-fuchsia: Undo doubly applied patch.
2021-08-04 11:40:52 +02:00
Aldy Hernandez 9db0bcd9fd Mark path_range_query::dump as override.
gcc/ChangeLog:

	* gimple-range-path.h (path_range_query::dump): Mark override.
2021-08-04 10:57:11 +02:00
Richard Biener 4d56259101 tree-optimization/101769 - tail recursion creates possibly infinite loop
This makes tail recursion optimization produce a loop structure
manually rather than relying on loop fixup.  That also allows the
loop to be marked as finite (it would eventually blow the stack
if it were not).

2021-08-04  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/101769
	* tree-tailcall.c (eliminate_tail_call): Add the created loop
	for the first recursion and return it via the new output parameter.
	(optimize_tail_call): Pass through new output param.
	(tree_optimize_tail_calls_1): After creating all latches,
	add the created loop to the loop tree.  Do not mark loops for fixup.

	* g++.dg/tree-ssa/pr101769.C: New testcase.
2021-08-04 10:35:27 +02:00
Martin Liska 5c73b94fdc docs: document threader-mode param
gcc/ChangeLog:

	* doc/invoke.texi: Document threader-mode param.
2021-08-04 09:48:05 +02:00
liuhongt 3ae1468e26 Add dg-require-effective-target for testcases.
gcc/testsuite/ChangeLog:

	* gcc.target/i386/cond_op_addsubmul_d-2.c: Add
	dg-require-effective-target for avx512.
	* gcc.target/i386/cond_op_addsubmul_q-2.c: Ditto.
	* gcc.target/i386/cond_op_addsubmul_w-2.c: Ditto.
	* gcc.target/i386/cond_op_addsubmuldiv_double-2.c: Ditto.
	* gcc.target/i386/cond_op_addsubmuldiv_float-2.c: Ditto.
	* gcc.target/i386/cond_op_fma_double-2.c: Ditto.
	* gcc.target/i386/cond_op_fma_float-2.c: Ditto.
2021-08-04 13:25:46 +08:00
liuhongt 2fc2e3917f Support cond_{fma,fms,fnma,fnms} for vector float/double under AVX512.
gcc/ChangeLog:

	* config/i386/sse.md (cond_fma<mode>): New expander.
	(cond_fms<mode>): Ditto.
	(cond_fnma<mode>): Ditto.
	(cond_fnms<mode>): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/cond_op_fma_double-1.c: New test.
	* gcc.target/i386/cond_op_fma_double-2.c: New test.
	* gcc.target/i386/cond_op_fma_float-1.c: New test.
	* gcc.target/i386/cond_op_fma_float-2.c: New test.
2021-08-04 12:58:01 +08:00
Cherry Mui 22e40cc7fe compiler: support new language constructs in escape analysis
Previous CLs add new language constructs in Go 1.17, specifically,
unsafe.Add, unsafe.Slice, and conversion from a slice to a pointer
to an array. This CL handles them in the escape analysis.

At the point of the escape analysis, unsafe.Add and unsafe.Slice
are still builtin calls, so just handle them in data flow.
Conversion from a slice to a pointer to an array has already been
lowered to a combination of compound expression, conditional
expression and slice info expressions, so handle them in the
escape analysis.

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/339671
2021-08-03 18:32:07 -07:00
GCC Administrator fa1407c761 Daily bump. 2021-08-04 00:16:51 +00:00
Ian Lance Taylor e435e72ad7 compile, runtime: make selectnbrecv return two values
The only different between selectnbrecv and selectnbrecv2 is the later
set the input pointer value by second return value from chanrecv.

So by making selectnbrecv return two values from chanrecv, we can get
rid of selectnbrecv2, the compiler can now call only selectnbrecv and
generate simpler code.

This is the gofrontend version of https://golang.org/cl/292890.

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/339529
2021-08-03 16:40:00 -07:00
Ian Lance Taylor cbbd439a33 compiler: check slice to pointer-to-array conversion element type
When checking a slice to pointer-to-array conversion, I forgot to
verify that the elements types are identical.

For golang/go#395

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/339329
2021-08-03 16:36:20 -07:00
Segher Boessenkool 3a7794b469 rs6000: Replace & by &&
2021-08-03  Segher Boessenkool  <segher@kernel.crashing.org>

	* config/rs6000/vsx.md (*vsx_le_perm_store_<mode>): Use && instead of &.
2021-08-03 22:33:52 +00:00
Segher Boessenkool ebff536cf4 rs6000: "e" is not a free constraint letter
It is the prefix of the "es" and "eI" constraints.

2021-08-03  Segher Boessenkool  <segher@kernel.crashing.org>

	* config/rs6000/constraints.md: Remove "e" from the list of available
	constraint characters.
2021-08-03 22:26:40 +00:00
Eugene Rozenfeld 285aa6895d Fix indirect call inlining with AutoFDO
The histogram value for indirect calls was incorrectly set up.
That is fixed now.

With this change the tree-prof tests checking indirect call inlining with AutoFDO
in gcc.dg and g++.dg are passing.

Resolves:
PR gcov-profile/71672 - inlining indirect calls does not work with autofdo

gcc/ChangeLog:
	PR gcov-profile/71672
	* auto-profile.c (afdo_indirect_call): Fix setup of the historgram value for indirect calls.
2021-08-03 14:36:33 -07:00
Eugene Rozenfeld 9265b37853 Fixes for AutoFDO testing
* create_gcov tool doesn't currently support dwarf 5 so I made a change in profopt.exp
  to pass -gdwarf-4 when compiling the binary to profile.

* I updated the invocation of create_gcov in profopt.exp to pass -gcov_version=2.
  I recently made a change to create_gcov to support version 2:
  https://github.com/google/autofdo/pull/117 .

* I removed useless -o perf.data from the invocation of gcc-auto-profile in
  target-supports.exp.

These changes contribute to fixing PR gcov-profile/71672.

gcc/testsuite/ChangeLog:

	* lib/profopt.exp: Pass gdwarf-4 when compiling test to profile; pass -gcov_version=2.
	* lib/target-supports.exp: Remove unnecessary -o perf.data passed to gcc-auto-profile.
2021-08-03 14:28:42 -07:00
Eugene Rozenfeld 0ed093c7c3 Fix indir-call-prof-2.c with AutoFDO
indir-call-prof-2.c has -fno-early-inlining but AutoFDO can't work without
early inlining (it needs to match the inlining of the profiled binary).
I changed profopt.exp to always pass -fearly-inlining for AutoFDO.
With that change the indirect call inlining in indir-call-prof-2.c happens in the early inliner
so I changed the dg-final-use-autofdo.

Contributes to fixing PR gcov-profile/71672

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-prof/indir-call-prof-2.c: Fix dg-final-use-autofdo.
	* lib/profopt.exp: Pass -fearly-inlining when compiling with AutoFDO.
2021-08-03 14:26:27 -07:00
Eugene Rozenfeld f9ad3d5339 Fixes for AutoFDO tests
* Changed several tests to use -fdump-ipa-afdo-optimized instead of -fdump-ipa-afdo
in dg-options so that the expected output can be found

* Increased the number of iterations in several tests so that perf can have
enough sampling events

Contributes to fixing PR gcov-profile/71672.

gcc/testsuite/ChangeLog:

	* g++.dg/tree-prof/indir-call-prof.C: Fix options, increase the number of iterations.
	* g++.dg/tree-prof/morefunc.C: Fix options, increase the number of iterations.
	* g++.dg/tree-prof/reorder.C: Fix options, increase the number of iterations.
	* gcc.dg/tree-prof/indir-call-prof-2.c: Fix options, increase the number of iterations.
	* gcc.dg/tree-prof/indir-call-prof.c: Fix options.
2021-08-03 14:25:47 -07:00
Martin Sebor aabf07cd5d Disable a test case in ILP32 [PR101688].
Resolves:
PR testsuite/101688 - g++.dg/warn/Wstringop-overflow-4.C fails on 32-bit archs with new jump threader

gcc/testsuite:
	PR testsuite/101688
	* g++.dg/warn/Wstringop-overflow-4.C: Disable a test case in ILP32.
2021-08-03 13:56:56 -06:00
Paul A. Clarke 0f44b09732 rs6000: Add test for _mm_minpos_epu16
Copy the test for _mm_minpos_epu16 from
gcc/testsuite/gcc.target/i386/sse4_1-phminposuw.c, with
a few adjustments:

- Adjust the dejagnu directives for powerpc platform.
- Make the data not be monotonically increasing,
  such that some of the returned values are not
  always the first value (index 0).
- Create a list of input data testing various scenarios
  including more than one minimum value and different
  orders and indices of the minimum value.
- Fix a masking issue where the index was being truncated
  to 2 bits instead of 3 bits, which wasn't found because
  all of the returned indices were 0 with the original
  generated data.
- Support big-endian.

2021-08-03  Paul A. Clarke  <pc@us.ibm.com>

gcc/testsuite
	* gcc.target/powerpc/sse4_1-phminposuw.c: Copy from
	gcc/testsuite/gcc.target/i386, adjust dg directives to suit,
	make more robust.
2021-08-03 13:58:41 -05:00
Paul A. Clarke eaa93a0f3d rs6000: Add support for _mm_minpos_epu16
Add a naive implementation of the subject x86 intrinsic to
ease porting.

2021-08-03  Paul A. Clarke  <pc@us.ibm.com>

gcc
	* config/rs6000/smmintrin.h (_mm_minpos_epu16): New.
2021-08-03 13:58:31 -05:00
Jonathan Wakely a77a46d9ae libstdc++: Suppress redundant definitions of inline variables
In C++17 the out-of-class definitions for static constexpr variables are
redundant, because they are implicitly inline. This change avoids
"redundant redeclaration" warnings from -Wsystem-headers -Wdeprecated.

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++-v3/ChangeLog:

	* include/bits/random.tcc (linear_congruential_engine): Do not
	define static constexpr members when they are implicitly inline.
	* include/std/ratio (ratio, __ratio_multiply, __ratio_divide)
	(__ratio_add, __ratio_subtract): Likewise.
	* include/std/type_traits (integral_constant): Likewise.
	* testsuite/26_numerics/random/pr60037-neg.cc: Adjust dg-error
	line number.
2021-08-03 15:41:11 +01:00
Jonathan Wakely 5c6759e416 libstdc++: Replace TR1 components with C++11 ones in test utils
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++-v3/ChangeLog:

	* testsuite/util/testsuite_common_types.h: Replace uses of
	tr1::unordered_map and tr1::unordered_set with their C++11
	equivalents.
	* testsuite/29_atomics/atomic/cons/assign_neg.cc: Adjust
	dg-error line number.
	* testsuite/29_atomics/atomic/cons/copy_neg.cc: Likewise.
	* testsuite/29_atomics/atomic_integral/cons/assign_neg.cc:
	Likewise.
	* testsuite/29_atomics/atomic_integral/cons/copy_neg.cc:
	Likewise.
	* testsuite/29_atomics/atomic_integral/operators/bitwise_neg.cc:
	Likewise.
	* testsuite/29_atomics/atomic_integral/operators/decrement_neg.cc:
	Likewise.
	* testsuite/29_atomics/atomic_integral/operators/increment_neg.cc:
	Likewise.
2021-08-03 15:40:42 +01:00
Jonathan Wakely 13a1ac9f6f libstdc++: Specialize allocator_traits<pmr::polymorphic_allocator<T>>
This adds a partial specialization of allocator_traits, similar to what
was already done for std::allocator. This means that most uses of
polymorphic_allocator via the traits can avoid the metaprogramming
overhead needed to deduce the properties from polymorphic_allocator.

In addition, I'm changing polymorphic_allocator::delete_object to invoke
the destructor (or pseudo-destructor) directly, rather than calling
allocator_traits::destroy, which calls polymorphic_allocator::destroy
(which is deprecated). This is observable if a user has specialized
allocator_traits<polymorphic_allocator<Foo>> and expects to see its
destroy member function called. I consider explicit specializations of
allocator_traits to be wrong-headed, and this use case seems unnecessary
to support. So delete_object just invokes the destructor directly.

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++-v3/ChangeLog:

	* include/std/memory_resource (polymorphic_allocator::delete_object):
	Call destructor directly instead of using destroy.
	(allocator_traits<polymorphic_allocator<T>>): Define partial
	specialization.
2021-08-03 15:30:36 +01:00
Jonathan Wakely 9bd87e3887 libstdc++: Remove trailing whitespace in some tests
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++-v3/ChangeLog:

	* testsuite/20_util/function_objects/binders/3113.cc: Remove
	trailing whitespace.
	* testsuite/20_util/shared_ptr/assign/auto_ptr.cc: Likewise.
	* testsuite/20_util/shared_ptr/assign/auto_ptr_neg.cc: Likewise.
	* testsuite/20_util/shared_ptr/assign/auto_ptr_rvalue.cc:
	Likewise.
	* testsuite/20_util/shared_ptr/creation/dr925.cc: Likewise.
	* testsuite/25_algorithms/headers/algorithm/synopsis.cc:
	Likewise.
	* testsuite/25_algorithms/random_shuffle/requirements/explicit_instantiation/2.cc:
	Likewise.
	* testsuite/25_algorithms/random_shuffle/requirements/explicit_instantiation/pod.cc:
	Likewise.
2021-08-03 15:30:36 +01:00