187142 Commits

Author SHA1 Message Date
Richard Biener
31855ba6b1 Add emulated gather capability to the vectorizer
This adds a gather vectorization capability to the vectorizer
without target support by decomposing the offset vector, doing
sclar loads and then building a vector from the result.  This
is aimed mainly at cases where vectorizing the rest of the loop
offsets the cost of vectorizing the gather.

Note it's difficult to avoid vectorizing the offset load, but in
some cases later passes can turn the vector load + extract into
scalar loads, see the followup patch.

On SPEC CPU 2017 510.parest_r this improves runtime from 250s
to 219s on a Zen2 CPU which has its native gather instructions
disabled (using those the runtime instead increases to 254s)
using -Ofast -march=znver2 [-flto].  It turns out the critical
loops in this benchmark all perform gather operations.

2021-07-30  Richard Biener  <rguenther@suse.de>

	* tree-vect-data-refs.c (vect_check_gather_scatter):
	Include widening conversions only when the result is
	still handed by native gather or the current offset
	size not already matches the data size.
	Also succeed analysis in case there's no native support,
	noted by a IFN_LAST ifn and a NULL decl.
	(vect_analyze_data_refs): Always consider gathers.
	* tree-vect-patterns.c (vect_recog_gather_scatter_pattern):
	Test for no IFN gather rather than decl gather.
	* tree-vect-stmts.c (vect_model_load_cost): Pass in the
	gather-scatter info and cost emulated gathers accordingly.
	(vect_truncate_gather_scatter_offset): Properly test for
	no IFN gather.
	(vect_use_strided_gather_scatters_p): Likewise.
	(get_load_store_type): Handle emulated gathers and its
	restrictions.
	(vectorizable_load): Likewise.  Emulate them by extracting
	scalar offsets, doing scalar loads and a vector construct.

	* gcc.target/i386/vect-gather-1.c: New testcase.
	* gfortran.dg/vect/vect-8.f90: Adjust.
2021-08-04 15:28:07 +02:00
H.J. Lu
f2e5d2717d by_pieces: Pass MAX_PIECES to op_by_pieces_d
Pass MAX_PIECES to op_by_pieces_d::op_by_pieces_d for move, store and
compare.

	PR target/101742
	* expr.c (op_by_pieces_d::op_by_pieces_d): Add a max_pieces
	argument to set m_max_size.
	(move_by_pieces_d): Pass MOVE_MAX_PIECES to op_by_pieces_d.
	(store_by_pieces_d): Pass STORE_MAX_PIECES to op_by_pieces_d.
	(compare_by_pieces_d): Pass COMPARE_MAX_PIECES to op_by_pieces_d.
2021-08-04 06:24:46 -07:00
Roger Sayle
96146e61cd Fold (X<<C1)^(X<<C2) to a multiplication when possible.
The easiest way to motivate these additions to match.pd is with the
following example:

unsigned int foo(unsigned char i) {
  return i | (i<<8) | (i<<16) | (i<<24);
}

which mainline with -O2 on x86_64 currently generates:
foo:	movzbl  %dil, %edi
	movl    %edi, %eax
	movl    %edi, %edx
	sall    $8, %eax
	sall    $16, %edx
	orl     %edx, %eax
	orl     %edi, %eax
	sall    $24, %edi
	orl     %edi, %eax
	ret

but with this patch now becomes:
foo:	movzbl  %dil, %eax
        imull   $16843009, %eax, %eax
        ret

Interestingly, this transformation is already applied when using
addition, allowing synth_mult to select an optimal sequence, but
not when using the equivalent bit-wise ior or xor operators.

The solution is to use tree_nonzero_bits to check that the
potentially non-zero bits of each operand don't overlap, which
ensures that BIT_IOR_EXPR and BIT_XOR_EXPR produce the same
results as PLUS_EXPR, which effectively generalizes the old
fold_plusminus_mult_expr.  Technically, the transformation
is to canonicalize (X*C1)|(X*C2) and (X*C1)^(X*C2) to
X*(C1+C2) where X and X<<C are considered special cases.

2021-08-04  Roger Sayle  <roger@nextmovesoftware.com>
	    Marc Glisse  <marc.glisse@inria.fr>

gcc/ChangeLog
	* match.pd (bit_ior, bit_xor): Canonicalize (X*C1)|(X*C2) and
	(X*C1)^(X*C2) as X*(C1+C2), and related variants, using
	tree_nonzero_bits to ensure that operands are bit-wise disjoint.

gcc/testsuite/ChangeLog
	* gcc.dg/fold-ior-4.c: New test.
2021-08-04 14:22:51 +01:00
Jonathan Wakely
0d04fe4923 libstdc++: Add [[nodiscard]] to sequence containers
... and container adaptors.

This adds the [[nodiscard]] attribute to functions with no side-effects
for the sequence containers and their iterators, and the debug versions
of those containers, and the container adaptors,

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++-v3/ChangeLog:

	* include/bits/forward_list.h: Add [[nodiscard]] to functions
	with no side-effects.
	* include/bits/stl_bvector.h: Likewise.
	* include/bits/stl_deque.h: Likewise.
	* include/bits/stl_list.h: Likewise.
	* include/bits/stl_queue.h: Likewise.
	* include/bits/stl_stack.h: Likewise.
	* include/bits/stl_vector.h: Likewise.
	* include/debug/deque: Likewise.
	* include/debug/forward_list: Likewise.
	* include/debug/list: Likewise.
	* include/debug/safe_iterator.h: Likewise.
	* include/debug/vector: Likewise.
	* include/std/array: Likewise.
	* testsuite/23_containers/array/creation/3_neg.cc: Use
	-Wno-unused-result.
	* testsuite/23_containers/array/debug/back1_neg.cc: Cast result
	to void.
	* testsuite/23_containers/array/debug/back2_neg.cc: Likewise.
	* testsuite/23_containers/array/debug/front1_neg.cc: Likewise.
	* testsuite/23_containers/array/debug/front2_neg.cc: Likewise.
	* testsuite/23_containers/array/debug/square_brackets_operator1_neg.cc:
	Likewise.
	* testsuite/23_containers/array/debug/square_brackets_operator2_neg.cc:
	Likewise.
	* testsuite/23_containers/array/tuple_interface/get_neg.cc:
	Adjust dg-error line numbers.
	* testsuite/23_containers/deque/cons/clear_allocator.cc: Cast
	result to void.
	* testsuite/23_containers/deque/debug/invalidation/4.cc:
	Likewise.
	* testsuite/23_containers/deque/types/1.cc: Use
	-Wno-unused-result.
	* testsuite/23_containers/list/types/1.cc: Cast result to void.
	* testsuite/23_containers/priority_queue/members/7161.cc:
	Likewise.
	* testsuite/23_containers/queue/members/7157.cc: Likewise.
	* testsuite/23_containers/vector/59829.cc: Likewise.
	* testsuite/23_containers/vector/ext_pointer/types/1.cc:
	Likewise.
	* testsuite/23_containers/vector/ext_pointer/types/2.cc:
	Likewise.
	* testsuite/23_containers/vector/types/1.cc: Use
	-Wno-unused-result.
2021-08-04 12:54:29 +01:00
Jonathan Wakely
240b01b021 libstdc++: Add [[nodiscard]] to iterators and related utilities
This adds [[nodiscard]] throughout <iterator>, as proposed by P2377R0
(with some minor corrections).

The attribute is added for all modes from C++11 up, using
[[__nodiscard__]] or _GLIBCXX_NODISCARD where C++17 [[nodiscard]] can't
be used directly.

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++-v3/ChangeLog:

	* include/bits/iterator_concepts.h (iter_move): Add
	[[nodiscard]].
	* include/bits/range_access.h (begin, end, cbegin, cend)
	(rbegin, rend, crbegin, crend, size, data, ssize): Likewise.
	* include/bits/ranges_base.h (ranges::begin, ranges::end)
	(ranges::cbegin, ranges::cend, ranges::rbegin, ranges::rend)
	(ranges::crbegin, ranges::crend, ranges::size, ranges::ssize)
	(ranges::empty, ranges::data, ranges::cdata): Likewise.
	* include/bits/stl_iterator.h (reverse_iterator, __normal_iterator)
	(back_insert_iterator, front_insert_iterator, insert_iterator)
	(move_iterator, move_sentinel, common_iterator)
	(counted_iterator): Likewise.
	* include/bits/stl_iterator_base_funcs.h (distance, next, prev):
	Likewise.
	* include/bits/stream_iterator.h (istream_iterator)
	(ostream_iterartor): Likewise.
	* include/bits/streambuf_iterator.h (istreambuf_iterator)
	(ostreambuf_iterator): Likewise.
	* include/std/ranges (views::single, views::iota, views::all)
	(views::filter, views::transform, views::take, views::take_while)
	(views::drop, views::drop_while, views::join, views::lazy_split)
	(views::split, views::counted, views::common, views::reverse)
	(views::elements): Likewise.
	* testsuite/20_util/rel_ops.cc: Use -Wno-unused-result.
	* testsuite/24_iterators/move_iterator/greedy_ops.cc: Likewise.
	* testsuite/24_iterators/normal_iterator/greedy_ops.cc:
	Likewise.
	* testsuite/24_iterators/reverse_iterator/2.cc: Likewise.
	* testsuite/24_iterators/reverse_iterator/greedy_ops.cc:
	Likewise.
	* testsuite/21_strings/basic_string/range_access/char/1.cc:
	Cast result to void.
	* testsuite/21_strings/basic_string/range_access/wchar_t/1.cc:
	Likewise.
	* testsuite/21_strings/basic_string_view/range_access/char/1.cc:
	Likewise.
	* testsuite/21_strings/basic_string_view/range_access/wchar_t/1.cc:
	Likewise.
	* testsuite/23_containers/array/range_access.cc: Likewise.
	* testsuite/23_containers/deque/range_access.cc: Likewise.
	* testsuite/23_containers/forward_list/range_access.cc:
	Likewise.
	* testsuite/23_containers/list/range_access.cc: Likewise.
	* testsuite/23_containers/map/range_access.cc: Likewise.
	* testsuite/23_containers/multimap/range_access.cc: Likewise.
	* testsuite/23_containers/multiset/range_access.cc: Likewise.
	* testsuite/23_containers/set/range_access.cc: Likewise.
	* testsuite/23_containers/unordered_map/range_access.cc:
	Likewise.
	* testsuite/23_containers/unordered_multimap/range_access.cc:
	Likewise.
	* testsuite/23_containers/unordered_multiset/range_access.cc:
	Likewise.
	* testsuite/23_containers/unordered_set/range_access.cc:
	Likewise.
	* testsuite/23_containers/vector/range_access.cc: Likewise.
	* testsuite/24_iterators/customization_points/iter_move.cc:
	Likewise.
	* testsuite/24_iterators/istream_iterator/sentinel.cc:
	Likewise.
	* testsuite/24_iterators/istreambuf_iterator/sentinel.cc:
	Likewise.
	* testsuite/24_iterators/move_iterator/dr2061.cc: Likewise.
	* testsuite/24_iterators/operations/prev_neg.cc: Likewise.
	* testsuite/24_iterators/ostreambuf_iterator/2.cc: Likewise.
	* testsuite/24_iterators/range_access/range_access.cc:
	Likewise.
	* testsuite/24_iterators/range_operations/100768.cc: Likewise.
	* testsuite/26_numerics/valarray/range_access2.cc: Likewise.
	* testsuite/28_regex/range_access.cc: Likewise.
	* testsuite/experimental/string_view/range_access/char/1.cc:
	Likewise.
	* testsuite/experimental/string_view/range_access/wchar_t/1.cc:
	Likewise.
	* testsuite/ext/vstring/range_access.cc: Likewise.
	* testsuite/std/ranges/adaptors/take.cc: Likewise.
	* testsuite/std/ranges/p2259.cc: Likewise.
2021-08-04 12:54:28 +01:00
Richard Biener
2724d1bba6 Rewrite more vector loads to scalar loads
This teaches forwprop to rewrite more vector loads that are only
used in BIT_FIELD_REFs as scalar loads.  This provides the
remaining uplift to SPEC CPU 2017 510.parest_r on Zen 2 which
has CPU gathers disabled.

In particular vector load + vec_unpack + bit-field-ref is turned
into (extending) scalar loads which avoids costly XMM/GPR
transitions.  To not conflict with vector load + bit-field-ref
+ vector constructor matching to vector load + shuffle the
extended transform is only done after vector lowering.

2021-07-30  Richard Biener  <rguenther@suse.de>

	* tree-ssa-forwprop.c (pass_forwprop::execute): Split
	out code to decompose vector loads ...
	(optimize_vector_load): ... here.  Generalize it to
	handle intermediate widening and TARGET_MEM_REF loads
	and apply it to loads with a supported vector mode as well.
2021-08-04 12:38:03 +02:00
Richard Biener
87a0b607e4 tree-optimization/101756 - avoid vectorizing boolean MAX reductions
The following avoids vectorizing MIN/MAX reductions on bools which,
when ending up as vector(2) <signed-boolean:64> would need to be
adjusted because of the sign change.  The fix instead avoids any
reduction vectorization where the result isn't compatible
to the original scalar type since we don't compensate for that
either.

2021-08-04  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/101756
	* tree-vect-slp.c (vectorizable_bb_reduc_epilogue): Make sure
	the result of the reduction epilogue is compatible to the original
	scalar result.

	* gcc.dg/vect/bb-slp-pr101756.c: New testcase.
2021-08-04 12:33:23 +02:00
Jakub Jelinek
af31cab047 c++: Fix up #pragma omp declare {simd,variant} and acc routine parsing
When parsing default arguments, we need to temporarily clear parser->omp_declare_simd
and parser->oacc_routine, otherwise it can clash with further declarations
inside of e.g. lambdas inside of those default arguments.

2021-08-04  Jakub Jelinek  <jakub@redhat.com>

	PR c++/101759
	* parser.c (cp_parser_default_argument): Temporarily override
	parser->omp_declare_simd and parser->oacc_routine to NULL.

	* g++.dg/gomp/pr101759.C: New test.
	* g++.dg/goacc/pr101759.C: New test.
2021-08-04 11:53:48 +02:00
Jakub Jelinek
8aa14fa7d9 testsuite: Fix duplicated content of gcc.c-torture/execute/ieee/pr29302-1.x
The file has two identical halves, seems like twice applied patch.

2021-08-04  Jakub Jelinek  <jakub@redhat.com>

	* gcc.c-torture/execute/ieee/pr29302-1.x: Undo doubly applied patch.
2021-08-04 11:44:45 +02:00
liuhongt
9f26640f7b Refine predicate of peephole2 to general_reg_operand. [PR target/101743]
The define_peephole2 which is added by r12-2640-gf7bf03cf69ccb7dc
should only work on general registers, considering that x86 also
supports mov instructions between gpr, sse reg, mask reg, limiting the
peephole2 predicate to general_reg_operand.

gcc/ChangeLog:

	PR target/101743
	* config/i386/i386.md (peephole2): Refine predicate from
	register_operand to general_reg_operand.
2021-08-04 17:43:17 +08:00
Jakub Jelinek
7195fa03e7 libgcc: Fix duplicated content of config/t-slibgcc-fuchsia
The file has two identical halves, seems like twice applied patch.

2021-08-04  Jakub Jelinek  <jakub@redhat.com>

	* config/t-slibgcc-fuchsia: Undo doubly applied patch.
2021-08-04 11:40:52 +02:00
Aldy Hernandez
9db0bcd9fd Mark path_range_query::dump as override.
gcc/ChangeLog:

	* gimple-range-path.h (path_range_query::dump): Mark override.
2021-08-04 10:57:11 +02:00
Richard Biener
4d56259101 tree-optimization/101769 - tail recursion creates possibly infinite loop
This makes tail recursion optimization produce a loop structure
manually rather than relying on loop fixup.  That also allows the
loop to be marked as finite (it would eventually blow the stack
if it were not).

2021-08-04  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/101769
	* tree-tailcall.c (eliminate_tail_call): Add the created loop
	for the first recursion and return it via the new output parameter.
	(optimize_tail_call): Pass through new output param.
	(tree_optimize_tail_calls_1): After creating all latches,
	add the created loop to the loop tree.  Do not mark loops for fixup.

	* g++.dg/tree-ssa/pr101769.C: New testcase.
2021-08-04 10:35:27 +02:00
Martin Liska
5c73b94fdc docs: document threader-mode param
gcc/ChangeLog:

	* doc/invoke.texi: Document threader-mode param.
2021-08-04 09:48:05 +02:00
liuhongt
3ae1468e26 Add dg-require-effective-target for testcases.
gcc/testsuite/ChangeLog:

	* gcc.target/i386/cond_op_addsubmul_d-2.c: Add
	dg-require-effective-target for avx512.
	* gcc.target/i386/cond_op_addsubmul_q-2.c: Ditto.
	* gcc.target/i386/cond_op_addsubmul_w-2.c: Ditto.
	* gcc.target/i386/cond_op_addsubmuldiv_double-2.c: Ditto.
	* gcc.target/i386/cond_op_addsubmuldiv_float-2.c: Ditto.
	* gcc.target/i386/cond_op_fma_double-2.c: Ditto.
	* gcc.target/i386/cond_op_fma_float-2.c: Ditto.
2021-08-04 13:25:46 +08:00
liuhongt
2fc2e3917f Support cond_{fma,fms,fnma,fnms} for vector float/double under AVX512.
gcc/ChangeLog:

	* config/i386/sse.md (cond_fma<mode>): New expander.
	(cond_fms<mode>): Ditto.
	(cond_fnma<mode>): Ditto.
	(cond_fnms<mode>): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/cond_op_fma_double-1.c: New test.
	* gcc.target/i386/cond_op_fma_double-2.c: New test.
	* gcc.target/i386/cond_op_fma_float-1.c: New test.
	* gcc.target/i386/cond_op_fma_float-2.c: New test.
2021-08-04 12:58:01 +08:00
Cherry Mui
22e40cc7fe compiler: support new language constructs in escape analysis
Previous CLs add new language constructs in Go 1.17, specifically,
unsafe.Add, unsafe.Slice, and conversion from a slice to a pointer
to an array. This CL handles them in the escape analysis.

At the point of the escape analysis, unsafe.Add and unsafe.Slice
are still builtin calls, so just handle them in data flow.
Conversion from a slice to a pointer to an array has already been
lowered to a combination of compound expression, conditional
expression and slice info expressions, so handle them in the
escape analysis.

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/339671
2021-08-03 18:32:07 -07:00
GCC Administrator
fa1407c761 Daily bump. 2021-08-04 00:16:51 +00:00
Ian Lance Taylor
e435e72ad7 compile, runtime: make selectnbrecv return two values
The only different between selectnbrecv and selectnbrecv2 is the later
set the input pointer value by second return value from chanrecv.

So by making selectnbrecv return two values from chanrecv, we can get
rid of selectnbrecv2, the compiler can now call only selectnbrecv and
generate simpler code.

This is the gofrontend version of https://golang.org/cl/292890.

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/339529
2021-08-03 16:40:00 -07:00
Ian Lance Taylor
cbbd439a33 compiler: check slice to pointer-to-array conversion element type
When checking a slice to pointer-to-array conversion, I forgot to
verify that the elements types are identical.

For golang/go#395

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/339329
2021-08-03 16:36:20 -07:00
Segher Boessenkool
3a7794b469 rs6000: Replace & by &&
2021-08-03  Segher Boessenkool  <segher@kernel.crashing.org>

	* config/rs6000/vsx.md (*vsx_le_perm_store_<mode>): Use && instead of &.
2021-08-03 22:33:52 +00:00
Segher Boessenkool
ebff536cf4 rs6000: "e" is not a free constraint letter
It is the prefix of the "es" and "eI" constraints.

2021-08-03  Segher Boessenkool  <segher@kernel.crashing.org>

	* config/rs6000/constraints.md: Remove "e" from the list of available
	constraint characters.
2021-08-03 22:26:40 +00:00
Eugene Rozenfeld
285aa6895d Fix indirect call inlining with AutoFDO
The histogram value for indirect calls was incorrectly set up.
That is fixed now.

With this change the tree-prof tests checking indirect call inlining with AutoFDO
in gcc.dg and g++.dg are passing.

Resolves:
PR gcov-profile/71672 - inlining indirect calls does not work with autofdo

gcc/ChangeLog:
	PR gcov-profile/71672
	* auto-profile.c (afdo_indirect_call): Fix setup of the historgram value for indirect calls.
2021-08-03 14:36:33 -07:00
Eugene Rozenfeld
9265b37853 Fixes for AutoFDO testing
* create_gcov tool doesn't currently support dwarf 5 so I made a change in profopt.exp
  to pass -gdwarf-4 when compiling the binary to profile.

* I updated the invocation of create_gcov in profopt.exp to pass -gcov_version=2.
  I recently made a change to create_gcov to support version 2:
  https://github.com/google/autofdo/pull/117 .

* I removed useless -o perf.data from the invocation of gcc-auto-profile in
  target-supports.exp.

These changes contribute to fixing PR gcov-profile/71672.

gcc/testsuite/ChangeLog:

	* lib/profopt.exp: Pass gdwarf-4 when compiling test to profile; pass -gcov_version=2.
	* lib/target-supports.exp: Remove unnecessary -o perf.data passed to gcc-auto-profile.
2021-08-03 14:28:42 -07:00
Eugene Rozenfeld
0ed093c7c3 Fix indir-call-prof-2.c with AutoFDO
indir-call-prof-2.c has -fno-early-inlining but AutoFDO can't work without
early inlining (it needs to match the inlining of the profiled binary).
I changed profopt.exp to always pass -fearly-inlining for AutoFDO.
With that change the indirect call inlining in indir-call-prof-2.c happens in the early inliner
so I changed the dg-final-use-autofdo.

Contributes to fixing PR gcov-profile/71672

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-prof/indir-call-prof-2.c: Fix dg-final-use-autofdo.
	* lib/profopt.exp: Pass -fearly-inlining when compiling with AutoFDO.
2021-08-03 14:26:27 -07:00
Eugene Rozenfeld
f9ad3d5339 Fixes for AutoFDO tests
* Changed several tests to use -fdump-ipa-afdo-optimized instead of -fdump-ipa-afdo
in dg-options so that the expected output can be found

* Increased the number of iterations in several tests so that perf can have
enough sampling events

Contributes to fixing PR gcov-profile/71672.

gcc/testsuite/ChangeLog:

	* g++.dg/tree-prof/indir-call-prof.C: Fix options, increase the number of iterations.
	* g++.dg/tree-prof/morefunc.C: Fix options, increase the number of iterations.
	* g++.dg/tree-prof/reorder.C: Fix options, increase the number of iterations.
	* gcc.dg/tree-prof/indir-call-prof-2.c: Fix options, increase the number of iterations.
	* gcc.dg/tree-prof/indir-call-prof.c: Fix options.
2021-08-03 14:25:47 -07:00
Martin Sebor
aabf07cd5d Disable a test case in ILP32 [PR101688].
Resolves:
PR testsuite/101688 - g++.dg/warn/Wstringop-overflow-4.C fails on 32-bit archs with new jump threader

gcc/testsuite:
	PR testsuite/101688
	* g++.dg/warn/Wstringop-overflow-4.C: Disable a test case in ILP32.
2021-08-03 13:56:56 -06:00
Paul A. Clarke
0f44b09732 rs6000: Add test for _mm_minpos_epu16
Copy the test for _mm_minpos_epu16 from
gcc/testsuite/gcc.target/i386/sse4_1-phminposuw.c, with
a few adjustments:

- Adjust the dejagnu directives for powerpc platform.
- Make the data not be monotonically increasing,
  such that some of the returned values are not
  always the first value (index 0).
- Create a list of input data testing various scenarios
  including more than one minimum value and different
  orders and indices of the minimum value.
- Fix a masking issue where the index was being truncated
  to 2 bits instead of 3 bits, which wasn't found because
  all of the returned indices were 0 with the original
  generated data.
- Support big-endian.

2021-08-03  Paul A. Clarke  <pc@us.ibm.com>

gcc/testsuite
	* gcc.target/powerpc/sse4_1-phminposuw.c: Copy from
	gcc/testsuite/gcc.target/i386, adjust dg directives to suit,
	make more robust.
2021-08-03 13:58:41 -05:00
Paul A. Clarke
eaa93a0f3d rs6000: Add support for _mm_minpos_epu16
Add a naive implementation of the subject x86 intrinsic to
ease porting.

2021-08-03  Paul A. Clarke  <pc@us.ibm.com>

gcc
	* config/rs6000/smmintrin.h (_mm_minpos_epu16): New.
2021-08-03 13:58:31 -05:00
Jonathan Wakely
a77a46d9ae libstdc++: Suppress redundant definitions of inline variables
In C++17 the out-of-class definitions for static constexpr variables are
redundant, because they are implicitly inline. This change avoids
"redundant redeclaration" warnings from -Wsystem-headers -Wdeprecated.

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++-v3/ChangeLog:

	* include/bits/random.tcc (linear_congruential_engine): Do not
	define static constexpr members when they are implicitly inline.
	* include/std/ratio (ratio, __ratio_multiply, __ratio_divide)
	(__ratio_add, __ratio_subtract): Likewise.
	* include/std/type_traits (integral_constant): Likewise.
	* testsuite/26_numerics/random/pr60037-neg.cc: Adjust dg-error
	line number.
2021-08-03 15:41:11 +01:00
Jonathan Wakely
5c6759e416 libstdc++: Replace TR1 components with C++11 ones in test utils
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++-v3/ChangeLog:

	* testsuite/util/testsuite_common_types.h: Replace uses of
	tr1::unordered_map and tr1::unordered_set with their C++11
	equivalents.
	* testsuite/29_atomics/atomic/cons/assign_neg.cc: Adjust
	dg-error line number.
	* testsuite/29_atomics/atomic/cons/copy_neg.cc: Likewise.
	* testsuite/29_atomics/atomic_integral/cons/assign_neg.cc:
	Likewise.
	* testsuite/29_atomics/atomic_integral/cons/copy_neg.cc:
	Likewise.
	* testsuite/29_atomics/atomic_integral/operators/bitwise_neg.cc:
	Likewise.
	* testsuite/29_atomics/atomic_integral/operators/decrement_neg.cc:
	Likewise.
	* testsuite/29_atomics/atomic_integral/operators/increment_neg.cc:
	Likewise.
2021-08-03 15:40:42 +01:00
Jonathan Wakely
13a1ac9f6f libstdc++: Specialize allocator_traits<pmr::polymorphic_allocator<T>>
This adds a partial specialization of allocator_traits, similar to what
was already done for std::allocator. This means that most uses of
polymorphic_allocator via the traits can avoid the metaprogramming
overhead needed to deduce the properties from polymorphic_allocator.

In addition, I'm changing polymorphic_allocator::delete_object to invoke
the destructor (or pseudo-destructor) directly, rather than calling
allocator_traits::destroy, which calls polymorphic_allocator::destroy
(which is deprecated). This is observable if a user has specialized
allocator_traits<polymorphic_allocator<Foo>> and expects to see its
destroy member function called. I consider explicit specializations of
allocator_traits to be wrong-headed, and this use case seems unnecessary
to support. So delete_object just invokes the destructor directly.

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++-v3/ChangeLog:

	* include/std/memory_resource (polymorphic_allocator::delete_object):
	Call destructor directly instead of using destroy.
	(allocator_traits<polymorphic_allocator<T>>): Define partial
	specialization.
2021-08-03 15:30:36 +01:00
Jonathan Wakely
9bd87e3887 libstdc++: Remove trailing whitespace in some tests
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++-v3/ChangeLog:

	* testsuite/20_util/function_objects/binders/3113.cc: Remove
	trailing whitespace.
	* testsuite/20_util/shared_ptr/assign/auto_ptr.cc: Likewise.
	* testsuite/20_util/shared_ptr/assign/auto_ptr_neg.cc: Likewise.
	* testsuite/20_util/shared_ptr/assign/auto_ptr_rvalue.cc:
	Likewise.
	* testsuite/20_util/shared_ptr/creation/dr925.cc: Likewise.
	* testsuite/25_algorithms/headers/algorithm/synopsis.cc:
	Likewise.
	* testsuite/25_algorithms/random_shuffle/requirements/explicit_instantiation/2.cc:
	Likewise.
	* testsuite/25_algorithms/random_shuffle/requirements/explicit_instantiation/pod.cc:
	Likewise.
2021-08-03 15:30:36 +01:00
Jonathan Wakely
7f2f4b8791 libstdc++: Deprecate std::random_shuffle for C++14
The std::random_shuffle algorithm was removed in C++14 (without
deprecation). This adds the deprecated attribute for C++14 and later, so
that users are warned they should not be using it in those dialects.

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++-v3/ChangeLog:

	* doc/xml/manual/evolution.xml: Document deprecation.
	* doc/html/*: Regenerate.
	* include/bits/c++config (_GLIBCXX14_DEPRECATED): Define.
	(_GLIBCXX14_DEPRECATED_SUGGEST): Define.
	* include/bits/stl_algo.h (random_shuffle): Deprecate for C++14
	and later.
	* testsuite/25_algorithms/headers/algorithm/synopsis.cc: Adjust
	for C++11 and C++14 changes to std::random_shuffle and
	std::shuffle.
	* testsuite/25_algorithms/random_shuffle/1.cc: Add options to
	use deprecated algorithms.
	* testsuite/25_algorithms/random_shuffle/59603.cc: Likewise.
	* testsuite/25_algorithms/random_shuffle/moveable.cc: Likewise.
	* testsuite/25_algorithms/random_shuffle/requirements/explicit_instantiation/2.cc:
	Likewise.
	* testsuite/25_algorithms/random_shuffle/requirements/explicit_instantiation/pod.cc:
	Likewise.
2021-08-03 15:30:35 +01:00
Jonathan Wakely
07b70dfc4e libstdc++: Add testsuite proc for testing deprecated features
This change adds options to tests that explicitly use deprecated
features, so that -D_GLIBCXX_USE_DEPRECATED=0 can be used to run the
rest of the testsuite. The tests that explicitly/intentionally use
deprecated features will still be able to use them, but they can be
disabled for the majority of tests.

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++-v3/ChangeLog:

	* testsuite/23_containers/forward_list/operations/3.cc:
	Use lambda instead of std::bind2nd.
	* testsuite/20_util/function_objects/binders/3113.cc: Add
	options for testing deprecated features.
	* testsuite/20_util/pair/cons/99957.cc: Likewise.
	* testsuite/20_util/shared_ptr/assign/auto_ptr.cc: Likewise.
	* testsuite/20_util/shared_ptr/assign/auto_ptr_neg.cc: Likewise.
	* testsuite/20_util/shared_ptr/assign/auto_ptr_rvalue.cc:
	Likewise.
	* testsuite/20_util/shared_ptr/cons/43820_neg.cc: Likewise.
	* testsuite/20_util/shared_ptr/cons/auto_ptr.cc: Likewise.
	* testsuite/20_util/shared_ptr/cons/auto_ptr_neg.cc: Likewise.
	* testsuite/20_util/shared_ptr/creation/dr925.cc: Likewise.
	* testsuite/20_util/unique_ptr/cons/auto_ptr.cc: Likewise.
	* testsuite/20_util/unique_ptr/cons/auto_ptr_neg.cc: Likewise.
	* testsuite/ext/pb_ds/example/priority_queue_erase_if.cc:
	Likewise.
	* testsuite/ext/pb_ds/example/priority_queue_split_join.cc:
	Likewise.
	* testsuite/lib/dg-options.exp (dg_add_options_using-deprecated):
	New proc.
2021-08-03 15:30:17 +01:00
Jonathan Wakely
e9f64fff64 libstdc++: Reduce header dependencies in <regex>
This reduces the size of <regex> a little. This is one of the largest
and slowest headers in the library.

By using <bits/stl_algobase.h> and <bits/stl_algo.h> instead of
<algorithm> we don't need to parse all the parallel algorithms and
std::ranges:: algorithms that are not needed by <regex>. Similarly, by
using <bits/stl_tree.h> and <bits/stl_map.h> instead of <map> we don't
need to parse the definition of std::multimap.

The _State_info type is not movable or copyable, so doesn't need to use
std::unique_ptr<bool[]> to manage a bitset, we can just delete it in the
destructor. It would use a lot less space if we used a bitset instead,
but that would be an ABI break. We could do it for the versioned
namespace, but this patch doesn't do so. For future reference, using
vector<bool> would work, but would increase sizeof(_State_info) by two
pointers, because it's three times as large as unique_ptr<bool[]>. We
can't use std::bitset because the length isn't constant. We want a
bitset with a non-constant but fixed length.

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++-v3/ChangeLog:

	* include/bits/regex_executor.h (_State_info): Replace
	unique_ptr<bool[]> with array of bool.
	* include/bits/regex_executor.tcc: Likewise.
	* include/bits/regex_scanner.tcc: Replace std::strchr with
	__builtin_strchr.
	* include/std/regex: Replace standard headers with smaller
	internal ones.
	* testsuite/28_regex/traits/char/lookup_classname.cc: Include
	<string.h> for strlen.
	* testsuite/28_regex/traits/char/lookup_collatename.cc:
	Likewise.
2021-08-03 15:24:52 +01:00
H.J. Lu
98d7f305d5 x86: Use XMM31 for scratch SSE register
In 64-bit mode, use XMM31 for scratch SSE register to avoid vzeroupper
if possible.

gcc/

	* config/i386/i386.c (ix86_gen_scratch_sse_rtx): In 64-bit mode,
	try XMM31 to avoid vzeroupper.

gcc/testsuite/

	* gcc.target/i386/avx-vzeroupper-14.c: Pass -mno-avx512f to
	disable XMM31.
	* gcc.target/i386/avx-vzeroupper-15.c: Likewise.
	* gcc.target/i386/pr82941-1.c: Updated.  Check for vzeroupper.
	* gcc.target/i386/pr82942-1.c: Likewise.
	* gcc.target/i386/pr82990-1.c: Likewise.
	* gcc.target/i386/pr82990-3.c: Likewise.
	* gcc.target/i386/pr82990-5.c: Likewise.
	* gcc.target/i386/pr100865-4b.c: Likewise.
	* gcc.target/i386/pr100865-6b.c: Likewise.
	* gcc.target/i386/pr100865-7b.c: Likewise.
	* gcc.target/i386/pr100865-10b.c: Likewise.
	* gcc.target/i386/pr100865-8b.c: Updated.
	* gcc.target/i386/pr100865-9b.c: Likewise.
	* gcc.target/i386/pr100865-11b.c: Likewise.
	* gcc.target/i386/pr100865-12b.c: Likewise.
2021-08-03 07:11:58 -07:00
Jonathan Wakely
a1a2654cdc libstdc++: Avoid using std::unique_ptr in <locale>
std::wstring_convert and std::wbuffer_convert types are not copyable or
movable, and store a plain pointer without a deleter. That means a much
simpler type that just uses delete in its destructor can be used instead
of std::unique_ptr.

That avoids including and parsing all of <bits/unique_ptr.h> in every
header that includes <locale>. It also avoids instantiating
unique_ptr<C> and std::tuple<C*, default_delete<C>> when the conversion
utilities are used.

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++-v3/ChangeLog:

	* include/bits/locale_conv.h (__detail::_Scoped_ptr): Define new
	RAII class template.
	(wstring_convert, wbuffer_convert): Use __detail::_Scoped_ptr
	instead of unique_ptr.
2021-08-03 15:06:56 +01:00
Richard Sandiford
048039c49b aarch64: Add -mtune=neoverse-512tvb
This patch adds an option to tune for Neoverse cores that have
a total vector bandwidth of 512 bits (4x128 for Advanced SIMD
and a vector-length-dependent equivalent for SVE).  This is intended
to be a compromise between tuning aggressively for a single core like
Neoverse V1 (which can be too narrow) and tuning for AArch64 cores
in general (which can be too wide).

-mcpu=neoverse-512tvb is equivalent to -mcpu=neoverse-v1
-mtune=neoverse-512tvb.

gcc/
	* doc/invoke.texi: Document -mtune=neoverse-512tvb and
	-mcpu=neoverse-512tvb.
	* config/aarch64/aarch64-cores.def (neoverse-512tvb): New entry.
	* config/aarch64/aarch64-tune.md: Regenerate.
	* config/aarch64/aarch64.c (neoverse512tvb_sve_vector_cost)
	(neoverse512tvb_sve_issue_info, neoverse512tvb_vec_issue_info)
	(neoverse512tvb_vector_cost, neoverse512tvb_tunings): New structures.
	(aarch64_adjust_body_cost_sve): Handle -mtune=neoverse-512tvb.
	(aarch64_adjust_body_cost): Likewise.
2021-08-03 13:00:49 +01:00
Richard Sandiford
9690309baf aarch64: Restrict issue heuristics to inner vector loop
The AArch64 vector costs try to take issue rates into account.
However, when vectorising an outer loop, we lumped the inner
and outer operations together, which is somewhat meaningless.
This patch restricts the heuristic to the inner loop.

gcc/
	* config/aarch64/aarch64.c (aarch64_add_stmt_cost): Only
	record issue information for operations that occur in the
	innermost loop.
2021-08-03 13:00:48 +01:00
Richard Sandiford
028059b46e aarch64: Tweak MLA vector costs
The issue-based vector costs currently assume that a multiply-add
sequence can be implemented using a single instruction.  This is
generally true for scalars (which have a 4-operand instruction)
and SVE (which allows the output to be tied to any input).
However, for Advanced SIMD, multiplying two values and adding
an invariant will end up being a move and an MLA.

The only target to use the issue-based vector costs is Neoverse V1,
which would generally prefer SVE in this case anyway.  I therefore
don't have a self-contained testcase.  However, the distinction
becomes more important with a later patch.

gcc/
	* config/aarch64/aarch64.c (aarch64_multiply_add_p): Add a vec_flags
	parameter.  Detect cases in which an Advanced SIMD MLA would almost
	certainly require a MOV.
	(aarch64_count_ops): Update accordingly.
2021-08-03 13:00:47 +01:00
Richard Sandiford
537afb0857 aarch64: Tweak the cost of elementwise stores
When the vectoriser scalarises a strided store, it counts one
scalar_store for each element plus one vec_to_scalar extraction
for each element.  However, extracting element 0 is free on AArch64,
so it should have zero cost.

I don't have a testcase that requires this for existing -mtune
options, but it becomes more important with a later patch.

gcc/
	* config/aarch64/aarch64.c (aarch64_is_store_elt_extraction): New
	function, split out from...
	(aarch64_detect_vector_stmt_subtype): ...here.
	(aarch64_add_stmt_cost): Treat extracting element 0 as free.
2021-08-03 13:00:46 +01:00
Richard Sandiford
78770e0e5d aarch64: Add gather_load_xNN_cost tuning fields
This patch adds tuning fields for the total cost of a gather load
instruction.  Until now, we've costed them as one scalar load
per element instead.  Those scalar_load-based values are also
what the patch uses to fill in the new fields for existing
cost structures.

gcc/
	* config/aarch64/aarch64-protos.h (sve_vec_cost):
	Add gather_load_x32_cost and gather_load_x64_cost.
	* config/aarch64/aarch64.c (generic_sve_vector_cost)
	(a64fx_sve_vector_cost, neoversev1_sve_vector_cost): Update
	accordingly, using the values given by the scalar_load * number
	of elements calculation that we used previously.
	(aarch64_detect_vector_stmt_subtype): Use the new fields.
2021-08-03 13:00:45 +01:00
Richard Sandiford
b585f0112f aarch64: Split out aarch64_adjust_body_cost_sve
This patch splits the SVE-specific part of aarch64_adjust_body_cost
out into its own subroutine, so that a future patch can call it
more than once.  I wondered about using a lambda to avoid having
to pass all the arguments, but in the end this way seemed clearer.

gcc/
	* config/aarch64/aarch64.c (aarch64_adjust_body_cost_sve): New
	function, split out from...
	(aarch64_adjust_body_cost): ...here.
2021-08-03 13:00:45 +01:00
Richard Sandiford
83d796d3e5 aarch64: Add a simple fixed-point class for costing
This patch adds a simple fixed-point class for holding fractional
cost values.  It can exactly represent the reciprocal of any
single-vector SVE element count (including the non-power-of-2 ones).
This means that it can also hold 1/N for all N in [1, 16], which should
be enough for the various *_per_cycle fields.

For now the assumption is that the number of possible reciprocals
is fixed at compile time and so the class should always be able
to hold an exact value.

The class uses a uint64_t to hold the fixed-point value, which means
that it can hold any scaled uint32_t cost.  Normally we don't worry
about overflow when manipulating raw uint32_t costs, but just to be
on the safe side, the class uses saturating arithmetic for all
operations.

As far as the changes to the cost routines themselves go:

- The changes to aarch64_add_stmt_cost and its subroutines are
  just laying groundwork for future patches; no functional change
  intended.

- The changes to aarch64_adjust_body_cost mean that we now
  take fractional differences into account.

gcc/
	* config/aarch64/fractional-cost.h: New file.
	* config/aarch64/aarch64.c: Include <algorithm> (indirectly)
	and cost_fraction.h.
	(vec_cost_fraction): New typedef.
	(aarch64_detect_scalar_stmt_subtype): Use it for statement costs.
	(aarch64_detect_vector_stmt_subtype): Likewise.
	(aarch64_sve_adjust_stmt_cost, aarch64_adjust_stmt_cost): Likewise.
	(aarch64_estimate_min_cycles_per_iter): Use vec_cost_fraction
	for cycle counts.
	(aarch64_adjust_body_cost): Likewise.
	(aarch64_test_cost_fraction): New function.
	(aarch64_run_selftests): Call it.
2021-08-03 13:00:44 +01:00
Richard Sandiford
fa3ca6151c aarch64: Turn sve_width tuning field into a bitmask
The tuning structures have an sve_width field that specifies the
number of bits in an SVE vector (or SVE_NOT_IMPLEMENTED if not
applicable).  This patch turns the field into a bitmask so that
it can specify multiple widths at the same time.  For now we
always treat the mininum width as the likely width.

An alternative would have been to add extra fields, which would
have coped correctly with non-power-of-2 widths.  However,
we're very far from supporting constant non-power-of-2 vectors
in GCC, so I think the non-power-of-2 case will in reality always
have to be hidden behind VLA.

gcc/
	* config/aarch64/aarch64-protos.h (tune_params::sve_width): Turn
	into a bitmask.
	* config/aarch64/aarch64.c (aarch64_cmp_autovec_modes): Update
	accordingly.
	(aarch64_estimated_poly_value): Likewise.  Use the least significant
	set bit for the minimum and likely values.  Use the most significant
	set bit for the maximum value.
2021-08-03 13:00:43 +01:00
liuhongt
d0b952edd3 Add cond_add/sub/mul for vector integer modes.
gcc/ChangeLog:

	* config/i386/sse.md (cond_<insn><mode>): New expander.
	(cond_mul<mode>): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/cond_op_addsubmul_d-1.c: New test.
	* gcc.target/i386/cond_op_addsubmul_d-2.c: New test.
	* gcc.target/i386/cond_op_addsubmul_q-1.c: New test.
	* gcc.target/i386/cond_op_addsubmul_q-2.c: New test.
	* gcc.target/i386/cond_op_addsubmul_w-1.c: New test.
	* gcc.target/i386/cond_op_addsubmul_w-2.c: New test.
2021-08-03 19:27:52 +08:00
Mosè Giordano
759f3854f0 Fix bashism in `libsanitizer/configure.tgt'
Appending to a string variable with `+=' is a bashism and does not work in
strict POSIX shells like dash.  This results in the extra compilation flags not
to be set correctly.  This patch replaces the `+=' syntax with a simple string
interpolation to append to the `EXTRA_CXXFLAGS' variable.

libsanitizer/ChangeLog

	PR sanitizer/101111
	* configure.tgt: Fix bashism in setting of `EXTRA_CXXFLAGS'.
2021-08-03 13:24:47 +02:00
Jakub Jelinek
1a830c0636 analyzer: Fix ICE on MD builtin [PR101721]
The following testcase ICEs because DECL_FUNCTION_CODE asserts the builtin
is BUILT_IN_NORMAL, but it sees a backend (MD) builtin instead.
The FE, normal and MD builtin numbers overlap, so one should always
check what kind of builtin it is before looking at specific codes.

On the other side, region-model.cc has:
      if (fndecl_built_in_p (callee_fndecl, BUILT_IN_NORMAL)
          && gimple_builtin_call_types_compatible_p (call, callee_fndecl))
        switch (DECL_UNCHECKED_FUNCTION_CODE (callee_fndecl))
which IMO should use DECL_FUNCTION_CODE instead, it checked first it is
a normal builtin...

2021-08-03  Jakub Jelinek  <jakub@redhat.com>

	PR analyzer/101721
	* sm-malloc.cc (known_allocator_p): Only check DECL_FUNCTION_CODE on
	BUILT_IN_NORMAL builtins.

	* gcc.dg/analyzer/pr101721.c: New test.
2021-08-03 12:44:17 +02:00
Martin Liska
872c1a56e3 ChangeLog: add problematic commit 2e96b5f14e4025691b57d2301d71aa6092ed44bc.
gcc/ChangeLog:

	* ChangeLog: Add manually.

libgomp/ChangeLog:

	* ChangeLog: Add manually.

gcc/testsuite/ChangeLog:

	* ChangeLog: Add manually.
2021-08-03 09:57:21 +02:00