Commit Graph

187091 Commits

Author SHA1 Message Date
Martin Liska e460471571 gcc-changelog: ignore one more commit
contrib/ChangeLog:

	* gcc-changelog/git_update_version.py: Ignore problematic
	  commit.
2021-08-03 09:22:30 +02:00
H.J. Lu 585394d30d x86: Add testcases for PR target/80566
	PR target/80566
	* g++.target/i386/pr80566-1.C: New test.
	* g++.target/i386/pr80566-2.C: Likewise.
2021-08-02 20:34:13 -07:00
Kewen Lin daaed9e365 tree-cfg: Fix typos on dloop in move_sese_region_to_fn
As mentioned in [1], there is one pre-existing issue that predates
the refactoring of FOR_EACH_LOOP_FN.  The macro always sets the
given LOOP to NULL at the end of iteration unless there is an
early break inside.  Here there is obviously no early break, so
dloop is NULL once the loop finishes.  It is still NULL after the
refactoring.
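
The behaviour described above can be sketched with a simplified
iteration macro.  This is only an illustration: Node, FOR_EACH_NODE,
iterator_is_null_after_full_walk and find_first are hypothetical names,
not GCC's struct loop or FOR_EACH_LOOP_FN.

```cpp
// Hypothetical stand-in for a loop node; not GCC's struct loop.
struct Node {
    int orig_num;
    Node *next;
};

// Simplified iteration macro: it advances VAR until it becomes null,
// so VAR is null afterwards unless the body breaks out early.
#define FOR_EACH_NODE(VAR, HEAD) \
    for ((VAR) = (HEAD); (VAR) != nullptr; (VAR) = (VAR)->next)

// Walks the whole list without an early break and reports whether the
// iteration variable ended up null.
inline bool iterator_is_null_after_full_walk(Node *head) {
    Node *n = nullptr;
    FOR_EACH_NODE(n, head) {
        // no early break in the body
    }
    return n == nullptr;    // dereferencing n here would be a bug
}

// With an early break the variable still points at a live node.
inline Node *find_first(Node *head, int wanted) {
    Node *n = nullptr;
    FOR_EACH_NODE(n, head) {
        if (n->orig_num == wanted)
            break;
    }
    return n;
}
```

Any code placed after such a loop has to use a value captured inside
the loop rather than the iteration variable itself.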

I tried to debug the test case gcc.dg/graphite/pr83359.c
with commit 555758de90 (also reproduced the ICE with
555758de90074~), and noticed the compilation of the test
case only covers the hunk:

  else
    {
      moved_orig_loop_num[dloop->orig_loop_num] = -1;
      dloop->orig_loop_num = 0;
    }

it doesn't touch the if condition hunk to increase
"moved_orig_loop_num[dloop->orig_loop_num]".  So the
following hunk guarded with

  if (moved_orig_loop_num[orig_loop_num] == 2)

which dereferences dloop, doesn't get executed.  That
explains why the problem wasn't exposed before.

Looking at the code that uses dloop, I think it's a copy-paste
typo: the modified assertion code uses the same wording as the
condition check above.  In that context, the expected original
number has already been assigned to the variable orig_loop_num,
extracted from arg0 of the IFN_LOOP_DIST_ALIAS call.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-July/576367.html

gcc/ChangeLog:

	* tree-cfg.c (move_sese_region_to_fn): Fix typos on dloop.
2021-08-02 22:12:00 -05:00
liuhongt 724adffe65 Support cond_add/sub/mul/div for vector float/double.
gcc/ChangeLog:

	* config/i386/sse.md (cond_<insn><mode>): New expander.
	(cond_mul<mode>): Ditto.
	(cond_div<mode>): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/cond_op_addsubmuldiv_double-1.c: New test.
	* gcc.target/i386/cond_op_addsubmuldiv_double-2.c: New test.
	* gcc.target/i386/cond_op_addsubmuldiv_float-1.c: New test.
	* gcc.target/i386/cond_op_addsubmuldiv_float-2.c: New test.
2021-08-03 09:10:27 +08:00
Ian Lance Taylor 7459bfa8a3 compiler, runtime: allow slice to array pointer conversion
Panic if the slice is too short.

For golang/go#395

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/338630
2021-08-02 15:27:08 -07:00
Ian Lance Taylor 06d0437d4a compiler, runtime: support unsafe.Add and unsafe.Slice
For golang/go#19367
For golang/go#40481

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/338949
2021-08-02 13:56:28 -07:00
Patrick Palka 14d8a5ae47 libstdc++: Add missing std::move to ranges::copy/move/reverse_copy [PR101599]
In passing, this also renames the template parameter _O2 to _Out2 in
ranges::partition_copy and uglifies two of its function parameters,
out_true and out_false.
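
Why a missing std::move matters for a move-only iterator can be
sketched as follows.  This is a minimal illustration, not the
libstdc++ code: MoveOnlyOut, InOutResult and finish are hypothetical
names.

```cpp
#include <memory>
#include <utility>

// Hypothetical move-only type; stands in for an output iterator that
// cannot be copied.
struct MoveOnlyOut {
    std::unique_ptr<int> slot;
    explicit MoveOnlyOut(int v) : slot(new int(v)) {}
    MoveOnlyOut(MoveOnlyOut &&) = default;
    MoveOnlyOut &operator=(MoveOnlyOut &&) = default;
};

// Hypothetical result type, loosely modelled on an {in, out} pair.
template <typename In, typename Out>
struct InOutResult {
    In in;
    Out out;
};

// The parameter `out` is an lvalue inside the body, so building the
// result from it needs an explicit std::move; `return {in, out};`
// would try to copy and fail to compile for MoveOnlyOut.
template <typename In>
InOutResult<In, MoveOnlyOut> finish(In in, MoveOnlyOut out) {
    return {std::move(in), std::move(out)};
}
```

The same reasoning applies to any call that forwards such an iterator
by value, including a recursive call.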

	PR libstdc++/101599

libstdc++-v3/ChangeLog:

	* include/bits/ranges_algo.h (__reverse_copy_fn::operator()):
	Add missing std::move in return statement.
	(__partition_copy_fn::operator()): Rename template parameter
	_O2 to _Out2.  Uglify function parameters out_true and out_false.
	* include/bits/ranges_algobase.h (__copy_or_move): Add missing
	std::move to recursive call that unwraps a __normal_iterator
	output iterator.
	* testsuite/25_algorithms/copy/constrained.cc (test06): New test.
	* testsuite/25_algorithms/move/constrained.cc (test05): New test.
2021-08-02 15:30:15 -04:00
Patrick Palka 4414057186 libstdc++: Fix up implementation of LWG 3533 [PR101589]
In r12-569 I accidentally applied the LWG 3533 change to
elements_view::iterator::base instead of to elements_view::base.

This patch corrects this, and also applies the corresponding LWG 3533
change to lazy_split_view::inner-iter::base now that we implement P2210.
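
The shape of the base() overloads this refers to can be sketched as
below.  Wrapper is a hypothetical adaptor written for illustration; it
is not the libstdc++ implementation.

```cpp
#include <utility>

// Hypothetical iterator adaptor; not the libstdc++ implementation.
template <typename It>
class Wrapper {
    It current_;

public:
    explicit Wrapper(It it) : current_(std::move(it)) {}

    // Lvalue overload: unconstrained, returns a const reference and
    // therefore never copies and cannot throw.
    const It &base() const & noexcept { return current_; }

    // Rvalue overload: hands the underlying iterator out by move.
    It base() && { return std::move(current_); }
};
```

Returning a const reference from the lvalue overload means it no
longer needs the underlying iterator to be copyable.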

	PR libstdc++/101589

libstdc++-v3/ChangeLog:

	* include/std/ranges (lazy_split_view::_InnerIter::base): Make
	the const& overload unconstrained and return a const reference
	as per LWG 3533.  Make unconditionally noexcept.
	(elements_view::base): Revert accidental r12-569 change.
	(elements_view::_Iterator::base): Make the const& overload
	unconstrained and return a const reference as per LWG 3533.
	Make unconditionally noexcept.
2021-08-02 15:30:13 -04:00
Patrick Palka 0e1bb3c88c libstdc++: Add missing std::move to join_view::iterator ctor [PR101483]
	PR libstdc++/101483

libstdc++-v3/ChangeLog:

	* include/std/ranges (join_view::_Iterator::_Iterator): Add
	missing std::move.
2021-08-02 15:30:10 -04:00
H.J. Lu af863ef935 x86: Also pass -mno-sse to vect8-ret.c
Also pass -mno-sse to vect8-ret.c to disable XMM load/store when running
GCC tests with "-march=x86-64 -m32".

	* gcc.target/i386/vect8-ret.c: Also pass -mno-sse.
2021-08-02 10:40:50 -07:00
H.J. Lu ff12cc3d4e x86: Update gcc.target/i386/incoming-11.c
Expect no stack realignment since we no longer realign stack when
copying data.

	* gcc.target/i386/incoming-11.c: Expect no stack realignment.
2021-08-02 10:40:50 -07:00
H.J. Lu dadbb1a886 x86: Also pass -mno-avx to sw-1.c for ia32
Also pass -mno-avx to sw-1.c for ia32 since copying data with YMM or ZMM
registers disables shrink-wrapping when the second argument is passed on
stack.

	* gcc.target/i386/sw-1.c: Also pass -mno-avx for ia32.
2021-08-02 10:40:50 -07:00
H.J. Lu 20a1c9aae0 x86: Also pass -mno-avx to cold-attribute-1.c
Also pass -mno-avx to cold-attribute-1.c to avoid copying data with YMM or
ZMM registers.

	* gcc.target/i386/cold-attribute-1.c: Also pass -mno-avx.
2021-08-02 10:40:50 -07:00
H.J. Lu d7d74754a0 x86: Also pass -mno-avx to pr72839.c
Also pass -mno-avx to pr72839.c to avoid copying data with YMM or ZMM
registers.

	* gcc.target/i386/pr72839.c: Also pass -mno-avx.
2021-08-02 10:40:50 -07:00
H.J. Lu 0d3be08a23 x86: Add tests for piecewise move and store
	* gcc.target/i386/pieces-memcpy-10.c: New test.
	* gcc.target/i386/pieces-memcpy-11.c: Likewise.
	* gcc.target/i386/pieces-memcpy-12.c: Likewise.
	* gcc.target/i386/pieces-memcpy-13.c: Likewise.
	* gcc.target/i386/pieces-memcpy-14.c: Likewise.
	* gcc.target/i386/pieces-memcpy-15.c: Likewise.
	* gcc.target/i386/pieces-memcpy-16.c: Likewise.
	* gcc.target/i386/pieces-memset-1.c: Likewise.
	* gcc.target/i386/pieces-memset-2.c: Likewise.
	* gcc.target/i386/pieces-memset-3.c: Likewise.
	* gcc.target/i386/pieces-memset-4.c: Likewise.
	* gcc.target/i386/pieces-memset-5.c: Likewise.
	* gcc.target/i386/pieces-memset-6.c: Likewise.
	* gcc.target/i386/pieces-memset-7.c: Likewise.
	* gcc.target/i386/pieces-memset-8.c: Likewise.
	* gcc.target/i386/pieces-memset-9.c: Likewise.
	* gcc.target/i386/pieces-memset-10.c: Likewise.
	* gcc.target/i386/pieces-memset-11.c: Likewise.
	* gcc.target/i386/pieces-memset-12.c: Likewise.
	* gcc.target/i386/pieces-memset-13.c: Likewise.
	* gcc.target/i386/pieces-memset-14.c: Likewise.
	* gcc.target/i386/pieces-memset-15.c: Likewise.
	* gcc.target/i386/pieces-memset-16.c: Likewise.
	* gcc.target/i386/pieces-memset-17.c: Likewise.
	* gcc.target/i386/pieces-memset-18.c: Likewise.
	* gcc.target/i386/pieces-memset-19.c: Likewise.
	* gcc.target/i386/pieces-memset-20.c: Likewise.
	* gcc.target/i386/pieces-memset-21.c: Likewise.
	* gcc.target/i386/pieces-memset-22.c: Likewise.
	* gcc.target/i386/pieces-memset-23.c: Likewise.
	* gcc.target/i386/pieces-memset-24.c: Likewise.
	* gcc.target/i386/pieces-memset-25.c: Likewise.
	* gcc.target/i386/pieces-memset-26.c: Likewise.
	* gcc.target/i386/pieces-memset-27.c: Likewise.
	* gcc.target/i386/pieces-memset-28.c: Likewise.
	* gcc.target/i386/pieces-memset-29.c: Likewise.
	* gcc.target/i386/pieces-memset-30.c: Likewise.
	* gcc.target/i386/pieces-memset-31.c: Likewise.
	* gcc.target/i386/pieces-memset-32.c: Likewise.
	* gcc.target/i386/pieces-memset-33.c: Likewise.
	* gcc.target/i386/pieces-memset-34.c: Likewise.
	* gcc.target/i386/pieces-memset-35.c: Likewise.
	* gcc.target/i386/pieces-memset-36.c: Likewise.
	* gcc.target/i386/pieces-memset-37.c: Likewise.
	* gcc.target/i386/pieces-memset-38.c: Likewise.
	* gcc.target/i386/pieces-memset-39.c: Likewise.
	* gcc.target/i386/pieces-memset-40.c: Likewise.
	* gcc.target/i386/pieces-memset-41.c: Likewise.
	* gcc.target/i386/pieces-memset-42.c: Likewise.
	* gcc.target/i386/pieces-memset-43.c: Likewise.
	* gcc.target/i386/pieces-memset-44.c: Likewise.
2021-08-02 10:40:32 -07:00
H.J. Lu bf159e5e12 x86: Add AVX2 tests for PR middle-end/90773
	PR middle-end/90773
	* gcc.target/i386/pr90773-20.c: New test.
	* gcc.target/i386/pr90773-21.c: Likewise.
	* gcc.target/i386/pr90773-22.c: Likewise.
	* gcc.target/i386/pr90773-23.c: Likewise.
	* gcc.target/i386/pr90773-26.c: Likewise.
2021-08-02 10:38:19 -07:00
H.J. Lu 29f0e955c9 x86: Update piecewise move and store
We can use TImode/OImode/XImode integers for piecewise move and store.

1. Define MAX_MOVE_MAX to 64, which is the constant maximum number of
bytes that a single instruction can move quickly between memory and
registers or between two memory locations.
2. Define MOVE_MAX to the maximum number of bytes we can move from memory
to memory in one reasonably fast instruction.  The difference between
MAX_MOVE_MAX and MOVE_MAX is that MAX_MOVE_MAX must be a constant,
independent of compiler options, since it is used in reload.h to define
struct target_reload and MOVE_MAX can vary, depending on compiler options.
3. When a vector register is used for piecewise move and store, we don't
increase stack_alignment_needed since a vector register spill isn't
required for piecewise move and store.  Since stack_realign_needed is
set to true by checking stack_alignment_estimated, which is set by pseudo
vector register usage, we also need to check stack_realign_needed to
eliminate the frame pointer.
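
For orientation, the kind of source this affects is a fill or copy
whose size is a small compile-time constant.  The sketch below uses
hypothetical names (Block32, fill_block, copy_block); whether a given
compiler actually expands these calls inline with vector registers
depends on the target and options, so treat that as an assumption.

```cpp
#include <cstring>

// A 32-byte object: small enough that a compiler may expand the calls
// below inline ("by pieces") instead of calling the library routine.
struct Block32 {
    unsigned char bytes[32];
};

// Constant-size fill; a candidate for piecewise store.
inline void fill_block(Block32 *dst, unsigned char value) {
    std::memset(dst->bytes, value, sizeof dst->bytes);
}

// Constant-size copy; a candidate for piecewise move.
inline void copy_block(Block32 *dst, const Block32 *src) {
    std::memcpy(dst->bytes, src->bytes, sizeof dst->bytes);
}
```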

gcc/

	* config/i386/i386.c (ix86_finalize_stack_frame_flags): Also
	check stack_realign_needed for stack realignment.
	(ix86_legitimate_constant_p): Always allow CONST_WIDE_INT smaller
	than the largest integer supported by vector register.
	* config/i386/i386.h (MAX_MOVE_MAX): New.  Set to 64.
	(MOVE_MAX): Set to bytes of the largest integer supported by
	vector register.
	(STORE_MAX_PIECES): New.

gcc/testsuite/

	* gcc.target/i386/pr90773-1.c: Adjust to expect movq for 32-bit.
	* gcc.target/i386/pr90773-4.c: Also run for 32-bit.
	* gcc.target/i386/pr90773-15.c: Likewise.
	* gcc.target/i386/pr90773-16.c: Likewise.
	* gcc.target/i386/pr90773-17.c: Likewise.
	* gcc.target/i386/pr90773-24.c: Likewise.
	* gcc.target/i386/pr90773-25.c: Likewise.
	* gcc.target/i386/pr100865-1.c: Likewise.
	* gcc.target/i386/pr100865-2.c: Likewise.
	* gcc.target/i386/pr100865-3.c: Likewise.
	* gcc.target/i386/pr90773-14.c: Also run for 32-bit and expect
	XMM movd to store 4 bytes.
	* gcc.target/i386/pr100865-4a.c: Also run for 32-bit and expect
	YMM registers.
	* gcc.target/i386/pr100865-4b.c: Likewise.
	* gcc.target/i386/pr100865-10a.c: Expect YMM registers.
	* gcc.target/i386/pr100865-10b.c: Likewise.
2021-08-02 10:38:19 -07:00
H.J. Lu 7f4c3943f7 x86: Avoid stack realignment when copying data
To avoid stack realignment, use SCRATCH_SSE_REG to copy data from one
memory location to another.

gcc/

	* config/i386/i386-expand.c (ix86_expand_vector_move): Call
	ix86_gen_scratch_sse_rtx to get a scratch SSE register to copy
	data from one memory location to another.

gcc/testsuite/

	* gcc.target/i386/eh_return-1.c: New test.
2021-08-02 10:38:18 -07:00
H.J. Lu 1bee034e01 x86: Add TARGET_GEN_MEMSET_SCRATCH_RTX
Define TARGET_GEN_MEMSET_SCRATCH_RTX to ix86_gen_scratch_sse_rtx to
return a scratch SSE register for memset.

gcc/

	PR middle-end/90773
	* config/i386/i386.c (TARGET_GEN_MEMSET_SCRATCH_RTX): New.

gcc/testsuite/

	PR middle-end/90773
	* gcc.target/i386/pr90773-5.c: Updated to expect XMM register.
	* gcc.target/i386/pr90773-14.c: Likewise.
	* gcc.target/i386/pr90773-15.c: New test.
	* gcc.target/i386/pr90773-16.c: Likewise.
	* gcc.target/i386/pr90773-17.c: Likewise.
	* gcc.target/i386/pr90773-18.c: Likewise.
	* gcc.target/i386/pr90773-19.c: Likewise.
2021-08-02 10:38:06 -07:00
Jonathan Wakely 38fb24ba4d libstdc++: Fix filesystem::temp_directory_path [PR101709]
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++-v3/ChangeLog:

	PR libstdc++/101709
	* src/filesystem/ops-common.h (get_temp_directory_from_env):
	Add error_code parameter.
	* src/c++17/fs_ops.cc (fs::temp_directory_path): Pass error_code
	argument to get_temp_directory_from_env and check it.
	* src/filesystem/ops.cc (fs::temp_directory_path): Likewise.
2021-08-02 16:33:44 +01:00
Jonathan Wakely 2aaf69133f libstdc++: Add dg-error for additional error in C++11 mode
When the comparison with a nullptr_t is ill-formed, there is an
additional error for C++11 mode due to the constexpr function body being
invalid.

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++-v3/ChangeLog:

	* testsuite/20_util/tuple/comparison_operators/overloaded2.cc:
	Add dg-error for c++11_only target.
2021-08-02 16:22:24 +01:00
Aldy Hernandez cac2353f8b Remove --param=threader-iterative.
This was meant to be an internal construct, but I see folks are using
it and submitting PRs against it.  Let's just remove this to avoid
further confusion.

Tested on x86-64 Linux.

gcc/ChangeLog:

	PR tree-optimization/101724
	* params.opt: Remove --param=threader-iterative.
	* tree-ssa-threadbackward.c (pass_thread_jumps::execute): Remove
	iterative mode.
2021-08-02 16:58:07 +02:00
Tom de Vries 7d8577dd46 [gcc/doc] Improve nonnull attribute documentation
Improve nonnull attribute documentation in a number of ways:

Reorganize discussion of effects into:
- effects for calls to functions with nonnull-marked parameters, and
- effects for function definitions with nonnull-marked parameters.
This makes it clear that -fno-delete-null-pointer-checks has no effect for
optimizations based on nonnull-marked parameters in function definitions
(see PR100404).

Mention -Wnonnull-compare.
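
A small sketch of the definition-side effect described above, using
the GNU attribute syntax.  count_chars is a hypothetical function;
whether the check is really removed depends on the compiler and
optimization level.

```cpp
#include <cstddef>

// Declaration carrying the attribute; a call that passes a literal
// null pointer here is what -Wnonnull diagnoses.
std::size_t count_chars(const char *s) __attribute__((nonnull(1)));

// Definition: because `s` is marked nonnull, the compiler may assume
// it is non-null here and drop the check below.  -Wnonnull-compare is
// the warning that flags such a check.
std::size_t count_chars(const char *s) {
    if (s == nullptr)       // may be optimized away
        return 0;
    std::size_t n = 0;
    while (s[n] != '\0')
        ++n;
    return n;
}
```

Code that must tolerate null pointers should therefore not mark the
parameter nonnull in the first place.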

gcc/ChangeLog:

2021-07-28  Tom de Vries  <tdevries@suse.de>

	PR middle-end/101665
	* doc/extend.texi (nonnull attribute): Improve documentation.
2021-08-02 16:49:27 +02:00
Andrew Pinski 99b520f031 Fix PR 101683: FP exceptions for float->unsigned
Just like the old bug PR9651, unsigned_fix rtl should
also be handled as a trapping instruction.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

	PR rtl-optimization/101683
	* rtlanal.c (may_trap_p_1): Handle UNSIGNED_FIX.
2021-08-02 14:47:03 +00:00
Patrick Palka f48c3cd2e3 c++: Improve memory usage of subsumption [PR100828]
Constraint subsumption is implemented in two steps.  The first step
computes the disjunctive (or conjunctive) normal form of one of the
constraints, and the second step verifies that each clause in the
decomposed form implies the other constraint.   Performing these two
steps separately is problematic because in the first step the DNF/CNF
can be exponentially larger than the original constraint, and by
computing it ahead of time we'd have to keep all of it in memory.

This patch fixes this exponential blowup in memory usage by interleaving
the two steps, so that as soon as we decompose one clause we check
implication for it.  In turn, memory usage during subsumption is now
worst case linear in the size of the constraints rather than
exponential, and so we can safely remove the hard limit of 16 clauses
without introducing runaway memory usage on some inputs.  (Note the
_time_ complexity of subsumption is still exponential in the worst case.)

In order for this to work we need to make formula::branch() insert the
copy of the current clause directly after the current clause rather than
at the end of the list, so that we fully decompose a clause shortly
after creating it.  Otherwise we'd end up accumulating exponentially
many (partially decomposed) clauses in memory anyway.
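
The memory argument can be sketched on a toy model: a conjunction of
n two-way disjunctions has 2^n DNF clauses.  The names below (Clause,
expand_all, all_clauses_imply) are hypothetical and the code models
only the eager versus interleaved strategies, not logic.cc itself.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// One DNF clause of (a0 | b0) & (a1 | b1) & ...: bit i says whether
// literal a_i (0) or b_i (1) was picked from disjunct i.
// Both functions below assume n < 32.
using Clause = std::uint32_t;

// Eager variant: materializes every clause first, so memory grows as
// 2^n before any implication check runs.
inline std::vector<Clause> expand_all(unsigned n) {
    std::vector<Clause> clauses;
    for (Clause c = 0; c < (Clause(1) << n); ++c)
        clauses.push_back(c);
    return clauses;
}

// Interleaved variant: produce one clause, check it, discard it.
// Time is still 2^n in the worst case, but only one clause is live.
inline bool all_clauses_imply(unsigned n,
                              const std::function<bool(Clause)> &implies) {
    for (Clause c = 0; c < (Clause(1) << n); ++c)
        if (!implies(c))
            return false;   // stop at the first failing clause
    return true;
}
```

The interleaved form also stops at the first clause that fails the
check, which the eager form cannot do until expansion has finished.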

	PR c++/100828

gcc/cp/ChangeLog:

	* logic.cc (formula::formula): Use emplace_back instead of
	push_back.
	(formula::branch): Insert a copy of m_current directly after
	m_current instead of at the end of the list.
	(formula::erase): Define.
	(decompose_formula): Remove.
	(decompose_antecedents): Remove.
	(decompose_consequents): Remove.
	(derive_proofs): Remove.
	(max_problem_size): Remove.
	(diagnose_constraint_size): Remove.
	(subsumes_constraints_nonnull): Rewrite directly in terms of
	decompose_clause and derive_proof, interleaving decomposition
	with implication checking.  Remove limit on constraint complexity.
	Use formula::erase to free the current clause before moving on to
	the next one.
2021-08-02 09:59:56 -04:00
Roger Sayle f9fcf75482 Optimize x ? bswap(x) : 0 in tree-ssa-phiopt
Many thanks again to Jakub Jelinek for a speedy fix for PR 101642.
Interestingly, that test case "bswap16(x) ? : x" also reveals a
missed optimization opportunity.  The resulting "x ? bswap(x) : 0"
can be further simplified to just bswap(x).

Conveniently, tree-ssa-phiopt.c already recognizes/optimizes the
related "x ? popcount(x) : 0", so this patch simply makes that
transformation more general, additionally handling bswap, parity,
ffs and clrsb.  All of the required infrastructure is already
present thanks to Jakub previously adding support for clz/ctz.
To reflect this generalization, the name of the function is changed
from cond_removal_in_popcount_clz_ctz_pattern to the hopefully
equally descriptive cond_removal_in_builtin_zero_pattern.
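
The source-level identities involved can be checked with a short
sketch.  The guarded_* helpers are hypothetical names; the __builtin_*
functions are the GCC/Clang builtins.  clrsb is left out because this
sketch only covers builtins that return 0 for a zero argument.

```cpp
#include <cstdint>

// Guarded forms, as they might appear in source.  Each builtin used
// here already yields 0 for a zero argument, so the guard is redundant.
inline std::uint32_t guarded_bswap(std::uint32_t x) {
    return x ? __builtin_bswap32(x) : 0;
}

inline int guarded_popcount(unsigned x) {
    return x ? __builtin_popcount(x) : 0;
}

inline int guarded_parity(unsigned x) {
    return x ? __builtin_parity(x) : 0;
}

inline int guarded_ffs(int x) {
    return x ? __builtin_ffs(x) : 0;
}
```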

2021-08-02  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* tree-ssa-phiopt.c (cond_removal_in_builtin_zero_pattern):
	Renamed from cond_removal_in_popcount_clz_ctz_pattern.
	Add support for BSWAP, FFS, PARITY and CLRSB builtins.
	(tree_ssa_phiopt_worker): Update call to function above.

gcc/testsuite/ChangeLog
	* gcc.dg/tree-ssa/phi-opt-25.c: New test case.
2021-08-02 13:30:38 +01:00
H.J. Lu 6f0c43e978 i386: Improve SImode constant - __builtin_clzll for -mno-lzcnt
Add a zero_extend pattern for bsr_rex64_1 and use it to split SImode
constant - __builtin_clzll to avoid unnecessary zero_extend.

gcc/

	PR target/78103
	* config/i386/i386.md (bsr_rex64_1_zext): New.
	(combine splitter for constant - clzll): Replace gen_bsr_rex64_1
	with gen_bsr_rex64_1_zext.

gcc/testsuite/

	PR target/78103
	* gcc.target/i386/pr78103-2.c: Also scan incl.
	* gcc.target/i386/pr78103-3.c: Scan leal|addl|incl for x32.  Also
	scan incq.
2021-08-01 13:32:55 -07:00
Jonathan Wakely 8dd1644734 Add missing descriptions gcc/testsuite/ChangeLog
2021-08-01 19:37:52 +01:00
Joseph Myers 9a89a0643c Update gcc fr.po.
	* fr.po: Update.
2021-07-31 19:30:11 +00:00
Jason Merrill af76342b44 c++: ICE on anon struct with base [PR96636]
pinski pointed out that my recent change to reject anonymous structs with
bases was relevant to this PR.  But we still ICEd after giving that error;
this fixes the ICE.

	PR c++/96636

gcc/cp/ChangeLog:

	* decl.c (fixup_anonymous_aggr): Clear TYPE_NEEDS_CONSTRUCTING
	after error.

gcc/testsuite/ChangeLog:

	* g++.dg/ext/anon-struct9.C: New test.
2021-07-31 10:43:42 -04:00
Jason Merrill 5b759cdcb7 c++: pretty-print TYPE_PACK_EXPANSION better
gcc/cp/ChangeLog:

	* ptree.c (cxx_print_type) [TYPE_PACK_EXPANSION]: Also print
	PACK_EXPANSION_PATTERN.
2021-07-31 10:43:07 -04:00
Roger Sayle 4c4249b71d [Committed] Tweak new test case gcc.target/i386/dec-cmov-2.c
With -m32, this test case is sensitive to the instruction timings of
the target (for ifcvt to normalize bar() to foo() during the ce1 pass,
prior to the transformations actually being tested here).  Specifying
-march=core2 prevents these failures.  Committed as obvious.

2021-07-31  Roger Sayle  <roger@nextmovesoftware.com>

gcc/testsuite/ChangeLog
	* gcc.target/i386/dec-cmov-2.c: Require -march=core2 with -m32.
2021-07-31 11:09:31 +01:00
Jakub Jelinek 05bcef5a88 openmp: Handle OpenMP directives in attribute syntax in attribute-declaration
Now that we parse attribute-declaration (outside of functions), the following
patch handles OpenMP directives in its attribute(s).
What needs handling incrementally is diagnosing mismatched begin/end pairs
like
 [[omp::directive (declare target)]];
 int a;
 #pragma omp end declare target
or
 #pragma omp declare target
 int b;
 [[omp::directive (end declare target)]];
and handling declare simd/declare variant on declarations (function
definitions and declarations), for those in two different spots.

2021-07-31  Jakub Jelinek  <jakub@redhat.com>

	* parser.c (cp_parser_declaration): Handle OpenMP directives
	in attribute-declaration.

	* g++.dg/gomp/attrs-9.C: New test.
2021-07-31 09:35:25 +02:00
Jakub Jelinek 91425e2ade i386: Improve extensions of __builtin_clz and constant - __builtin_clz for -mno-lzcnt [PR78103]
This patch improves emitted code for the non-TARGET_LZCNT case.
As __builtin_clz* is UB on 0 argument and for !TARGET_LZCNT
CLZ_VALUE_DEFINED_AT_ZERO is 0, it is UB even at RTL time and so we
can take advantage of that and assume the result will be 0 to 31 or
0 to 63.
Given that, sign or zero extension of that result are the same and
are actually already performed by bsrl or xorl instructions.
And constant - __builtin_clz* can be simplified into
bsr + constant - bitmask.
For TARGET_LZCNT, a lot of this is already fine as is (e.g. the sign or
zero extensions), and other optimizations are IMHO not possible
(if we have lzcnt, we've lost information on whether it is UB at
zero or not and so can't transform it into bsr even when that is
1-2 insns shorter).
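
The arithmetic behind "constant - __builtin_clz" can be sketched as
below, assuming a 32-bit unsigned int.  highest_set_bit is a
hypothetical portable stand-in for what bsr computes, and the other
helper names are likewise illustrative.

```cpp
// Index of the highest set bit, i.e. what x86 `bsr` computes.
// Precondition: x != 0 (bsr and __builtin_clz are undefined there).
inline int highest_set_bit(unsigned x) {
    int index = 0;
    while (x >>= 1)
        ++index;
    return index;
}

// clz(x) == 31 - bsr(x), and for values in 0..31 that equals
// bsr(x) ^ 31, which is the xorl $31 in the assembly.
inline int clz_via_bsr(unsigned x) {
    return highest_set_bit(x) ^ 31;
}

// 32 - clz(x) == bsr(x) + 1, the form that becomes bsr plus lea.
inline int bit_width_via_bsr(unsigned x) {
    return highest_set_bit(x) + 1;
}
```

Likewise 31 - clz(x) is just bsr(x), which is why some of the
sequences below shrink to a single bsr.
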
The changes on the 3 testcases between unpatched and patched gcc
are for -m64:
pr78103-1.s:
        bsrq    %rdi, %rax
-       xorq    $63, %rax
-       cltq
+       xorl    $63, %eax
...
        bsrq    %rdi, %rax
-       xorq    $63, %rax
-       cltq
+       xorl    $63, %eax
...
        bsrl    %edi, %eax
        xorl    $31, %eax
-       cltq
...
        bsrl    %edi, %eax
        xorl    $31, %eax
-       cltq
pr78103-2.s:
        bsrl    %edi, %edi
-       movl    $32, %eax
-       xorl    $31, %edi
-       subl    %edi, %eax
+       leal    1(%rdi), %eax
...
-       bsrl    %edi, %edi
-       movl    $31, %eax
-       xorl    $31, %edi
-       subl    %edi, %eax
+       bsrl    %edi, %eax
...
        bsrq    %rdi, %rdi
-       movl    $64, %eax
-       xorq    $63, %rdi
-       subl    %edi, %eax
+       leal    1(%rdi), %eax
...
-       bsrq    %rdi, %rdi
-       movl    $63, %eax
-       xorq    $63, %rdi
-       subl    %edi, %eax
+       bsrq    %rdi, %rax
pr78103-3.s:
        bsrl    %edi, %edi
-       movl    $32, %eax
-       xorl    $31, %edi
-       movslq  %edi, %rdi
-       subq    %rdi, %rax
+       leaq    1(%rdi), %rax
...
-       bsrl    %edi, %edi
-       movl    $31, %eax
-       xorl    $31, %edi
-       movslq  %edi, %rdi
-       subq    %rdi, %rax
+       bsrl    %edi, %eax
...
        bsrq    %rdi, %rdi
-       movl    $64, %eax
-       xorq    $63, %rdi
-       movslq  %edi, %rdi
-       subq    %rdi, %rax
+       leaq    1(%rdi), %rax
...
-       bsrq    %rdi, %rdi
-       movl    $63, %eax
-       xorq    $63, %rdi
-       movslq  %edi, %rdi
-       subq    %rdi, %rax
+       bsrq    %rdi, %rax

Most of the changes are done with combine splitters, but for
*bsr_rex64_2 and *bsr_2 I had to use define_insn_and_split, because
as mentioned in the PR the combiner unfortunately doesn't create LOG_LINKS
in between the two insns created by combine splitter, so it can't be
combined further with following instructions.

2021-07-31  Jakub Jelinek  <jakub@redhat.com>

	PR target/78103
	* config/i386/i386.md (bsr_rex64_1, bsr_1, bsr_zext_1): New
	define_insn patterns.
	(*bsr_rex64_2, *bsr_2): New define_insn_and_split patterns.
	Add combine splitters for constant - clz.
	(clz<mode>2): Use a temporary pseudo for bsr result.

	* gcc.target/i386/pr78103-1.c: New test.
	* gcc.target/i386/pr78103-2.c: New test.
	* gcc.target/i386/pr78103-3.c: New test.
2021-07-31 09:19:32 +02:00
Hans-Peter Nilsson cfd60b39cd gcc.dg/tree-ssa/ssa-dse-26.c: Skip on mmix-knuth-mmixware
Commit r12-432, rewriting the dg-stuff, reverted the
adjustment for mmix-knuth-mmixware that I added in r11-2335.
(See those commits for context.)

Hopefully this variant will age better, just skipping it
with a trivial extra line less prone to pile-on.  (Not much
is won by covering this generic case for MMIX too; might as
well skip it.)

Beware that the dg-skip-if text can't say
"temporary variables are not x and y but x::3 and y::4"
because that leads to (on one line):

ERROR: gcc.dg/tree-ssa/ssa-dse-26.c: can't set "{temporary
 variables are not x and y but x::3 and y::4} {
 mmix-knuth-mmixware }": parent namespace doesn't exist for
 " dg-skip-if 4 "temporary variables are not x and y but
 x::3 and y::4" { mmix-knuth-mmixware } "

gcc/testsuite:
	* gcc.dg/tree-ssa/ssa-dse-26.c: Skip on mmix-knuth-mmixware.
2021-07-31 02:31:26 +02:00
Hans-Peter Nilsson 309ddde04f gcc.dg/uninit-pred-9_b.c: Xfail for MMIX too
Looks like MMIX is the "correct target" too (cf. 2f6bdd51cf)
and from
https://gcc.gnu.org/pipermail/gcc-testresults/2021-July/710188.html
it seems powerpc-ibm-aix7.2.3.0 is too, but I've not found
other targets failing.

gcc/testsuite:
	PR middle-end/101674
	* gcc.dg/uninit-pred-9_b.c: Xfail for mmix-*-* too.
2021-07-31 02:31:09 +02:00
Paul A. Clarke 15c8ad00d8 rs6000: Add tests for SSE4.1 "floor" intrinsics
Add the tests for _mm_floor_pd, _mm_floor_ps, _mm_floor_sd, _mm_floor_ss.
These are modelled after (and depend upon parts of) the tests for
_mm_ceil intrinsics, recently posted.

Copy a test for _mm_floor_sd from gcc/testsuite/gcc.target/i386.

2021-07-30  Paul A. Clarke  <pc@us.ibm.com>

gcc/testsuite
	* gcc.target/powerpc/sse4_1-floorpd.c: New.
	* gcc.target/powerpc/sse4_1-floorps.c: New.
	* gcc.target/powerpc/sse4_1-floorsd.c: New.
	* gcc.target/powerpc/sse4_1-floorss.c: New.
	* gcc.target/powerpc/sse4_1-roundpd-2.c: Copy from
	gcc/testsuite/gcc.target/i386 and adjust dg directives to suit.
2021-07-30 16:53:40 -05:00
Paul A. Clarke 5f50071543 rs6000: Add support for SSE4.1 "floor" intrinsics
2021-07-30  Paul A. Clarke  <pc@us.ibm.com>

gcc
	* config/rs6000/smmintrin.h (_mm_floor_pd, _mm_floor_ps,
	_mm_floor_sd, _mm_floor_ss): New.
2021-07-30 16:53:39 -05:00
Paul A. Clarke d656a3d3ce rs6000: Add tests for SSE4.1 "ceil" intrinsics
Add the tests for _mm_ceil_pd, _mm_ceil_ps, _mm_ceil_sd, _mm_ceil_ss.

Copy a test for _mm_ceil_pd and _mm_ceil_ps from
gcc/testsuite/gcc.target/i386.

Define __VSX_SSE2__ to pick up some union definitions in
m128-check.h.

2021-07-30  Paul A. Clarke  <pc@us.ibm.com>

gcc/testsuite
	* gcc.target/powerpc/sse4_1-ceilpd.c: New.
	* gcc.target/powerpc/sse4_1-ceilps.c: New.
	* gcc.target/powerpc/sse4_1-ceilsd.c: New.
	* gcc.target/powerpc/sse4_1-ceilss.c: New.
	* gcc.target/powerpc/sse4_1-round-data.h: New.
	* gcc.target/powerpc/sse4_1-round.h: New.
	* gcc.target/powerpc/sse4_1-round2.h: New.
	* gcc.target/powerpc/sse4_1-roundpd-3.c: Copy from gcc.target/i386
	and adjust dg directives to suit.
	* gcc.target/powerpc/sse4_1-check.h (__VSX_SSE2__): Define.
2021-07-30 16:53:39 -05:00
Paul A. Clarke bd9a8737d4 rs6000: Add support for SSE4.1 "ceil" intrinsics
2021-07-30  Paul A. Clarke  <pc@us.ibm.com>

gcc
	* config/rs6000/smmintrin.h (_mm_ceil_pd, _mm_ceil_ps,
	_mm_ceil_sd, _mm_ceil_ss): New.
2021-07-30 16:53:39 -05:00
Paul A. Clarke ed04cf6d73 rs6000: Add tests for SSE4.1 "blend" intrinsics
Copy the tests for _mm_blend_pd, _mm_blendv_pd, _mm_blend_ps,
_mm_blendv_ps from gcc/testsuite/gcc.target/i386.

2021-07-30  Paul A. Clarke  <pc@us.ibm.com>

gcc/testsuite
	* gcc.target/powerpc/sse4_1-blendpd.c: Copy from gcc.target/i386
	and adjust dg directives to suit.
	* gcc.target/powerpc/sse4_1-blendps-2.c: Likewise.
	* gcc.target/powerpc/sse4_1-blendps.c: Likewise.
	* gcc.target/powerpc/sse4_1-blendvpd.c: Likewise.
2021-07-30 16:53:39 -05:00
Paul A. Clarke 9d352c68e8 rs6000: Add support for SSE4.1 "blend" intrinsics
_mm_blend_epi16 and _mm_blendv_epi8 were added earlier.
Add these four to complete the set.

2021-07-30  Paul A. Clarke  <pc@us.ibm.com>

gcc
	* config/rs6000/smmintrin.h (_mm_blend_pd, _mm_blendv_pd,
	_mm_blend_ps, _mm_blendv_ps): New.
2021-07-30 16:53:39 -05:00
Roger Sayle f7bf03cf69 Decrement followed by cmov improvements.
The following patch to the x86_64 backend improves the code generated
for a decrement followed by a conditional move.  The primary change is
to recognize that after subtracting one, checking the result is -1 (or
equivalently that the original value was zero) can be implemented using
the borrow/carry flag instead of requiring an explicit test instruction.
This is achieved by a new define_insn_and_split that allows combine to
split the desired sequence/composite into a *subsi_3 and *movsicc_noc.

The other change in this patch is a pair of peephole2 optimizations
to eliminate register-to-register moves generated during register
allocation.  During reload, the compiler doesn't know that inverting
the condition of a conditional move can sometimes reduce register
pressure, but this is easy to tidy up during the peephole2 pass (where
swapping the order of the insn's operands performs the required
logic inversion).

Both improvements are demonstrated by the case below:

int foo(int x) {
  if (x == 0)
    x = 16;
  else x--;
  return x;
}

Before:
foo:	leal    -1(%rdi), %eax
        testl   %edi, %edi
        movl    $16, %edx
        cmove   %edx, %eax
        ret

After:
foo:	subl    $1, %edi
        movl    $16, %eax
        cmovnc  %edi, %eax
        ret
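
The reason the carry flag is enough in foo can be sketched in source
form.  The sketch uses unsigned values so the decrement is defined for
every input, and dec_or_reset and dec_or_reset_borrow are hypothetical
names.

```cpp
// Reference form of foo above, on unsigned values.
inline unsigned dec_or_reset(unsigned x) {
    if (x == 0)
        return 16;
    return x - 1;
}

// Same result keyed off the borrow of the subtraction: x - 1 wraps
// around (borrows) exactly when x was zero.
inline unsigned dec_or_reset_borrow(unsigned x) {
    unsigned diff = x - 1u;
    bool borrow = diff > x;     // true only for x == 0
    return borrow ? 16u : diff;
}
```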

And the value of the peephole2 clean-up can be seen on its own in:

int bar(int x) {
  x--;
  if (x == 0)
    x = 16;
  return x;
}

Before:
bar:	movl    %edi, %eax
        movl    $16, %edx
        subl    $1, %eax
        cmove   %edx, %eax
        ret

After:
bar:	subl    $1, %edi
        movl    $16, %eax
        cmovne  %edi, %eax
        ret

These idioms were inspired by the source code of NIST SciMark4's
Random_nextDouble function, where the tweaks above result in
a ~1% improvement in the MonteCarlo benchmark kernel.

2021-07-30  Roger Sayle  <roger@nextmovesoftware.com>
	    Uroš Bizjak  <ubizjak@gmail.com>

gcc/ChangeLog
	* config/i386/i386.md (*dec_cmov<mode>): New define_insn_and_split
	to generate a conditional move using the carry flag after sub $1.
	(peephole2): Eliminate a register-to-register move by inverting
	the condition of a conditional move.

gcc/testsuite/ChangeLog
	* gcc.target/i386/dec-cmov-1.c: New test.
	* gcc.target/i386/dec-cmov-2.c: New test.
2021-07-30 22:46:32 +01:00
Hans-Peter Nilsson 5b2515f5ae MMIX: remove generic placeholder parameters in call insn patterns.
I guess the best way to describe these operands, at least for MMIX, is
"ballast".  Some targets seem to drag along one or two of the incoming
pattern operands through the rtl passes without dropping them until
assembly output.  Let's stop doing that for MMIX.  There really are
*two* unused parameters: one is a number corresponding to the
stack-size of arguments as a const_int and the other is whatever the
target yields for targetm.calls.function_arg (args_so_far,
function_arg_info::end_marker ()).  There's a mandatory second
argument to the "call" RTX, but the target doesn't have to keep it a
variable number; it can be replaced by (const_int 0) early, like this.

Astute readers may object that as the MMIX call-type insns (PUSHJ,
PUSHGO) have a parameter in addition to the address of the called
function, so should the emitted RTL.  But, that parameter depends only
on the local function, not the called function (IOW, it's the same for
all calls in a function), and its value isn't known until frame layout
time.  Having it a parameter in the emitted RTL for the call would
just be confusing.  (Maybe this will be amended later, if/when
improving "shrink-wrapping".)

gcc:
	* config/mmix/mmix.md ("call", "call_value", "*call_real")
	("*call_value_real"): Don't generate rtx mentioning the generic
	operands 1 and 2 to "call", and similarly for "call_value".
	* config/mmix/mmix.c (mmix_print_operand_punct_valid_p)
	(mmix_print_operand): Use '!' instead of 'p'.
2021-07-30 23:38:49 +02:00
Hans-Peter Nilsson ee189a7327 doc: correct documentation of "call" (et al) operand 2.
An old itch being scratched: the documentation lies; it's not "the
number of registers used as operands", unless the target makes a
special arrangement to that effect, and there's nothing in the guts of
gcc setting up or assuming those semantics.

Instead, see calls.c:expand_call, variable next_arg_reg.  Or just
consider the variable name.  The text is somewhat transcribed from the
head comment of emit_call_1 for parameter next_arg_reg.  Most
important is to document the relation to function_arg_info::end_marker()
and the TARGET_FUNCTION_ARG hook.

The "normally" in the head comment, in "normally it is the first
arg-register beyond those used for args in this call, or 0 if all the
arg-registers are used in this call" means "by default", unless the
target tests end_marker_p and does something special, but the port is
free to return whatever it likes when it sees the end-marker.

And, I do mean "whatever it likes" because if the port doesn't
actually mention that operand in the RTX emitted for its "call" or
"call_value" patterns ("usually" define_expands), it can be any
mumbo-jumbo, such as a VOIDmode register, which seems to happen for
some targets, or NULL, which happens for others.  Until recently, the
targets returning a VOIDmode register included MMIX, where it made it
into the emitted RTL and confused later passes, recently exposed as an ICE.

Tested by inspecting the info and generated pdf for sanity.

gcc:
	* doc/md.texi (call): Correct information about operand 2.
	* config/mmix/mmix.md ("call", "call_value"): Remove fixed FIXMEs.
2021-07-30 23:34:54 +02:00
Andrew MacLeod 145bc41dae Handle constants in wi_fold for trunc_mod.
Handle const % const, as wi_fold_in_parts may now provide this.  Before this,
[10, 10] % [4, 4] would produce [0, 3] instead of [2, 2].
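
The difference between the two results can be illustrated with a minimal C sketch. This is not the code in range-op.cc: the `irange_sketch` type and both function names are hypothetical, and the sketch assumes non-negative operands and a strictly positive divisor.

```c
/* Hypothetical stand-in for a closed integer range [lo, hi].  */
struct irange_sketch { long lo, hi; };

/* Bounds-only reasoning (assumes x >= 0 and y > 0):
   x % y lies in [0, min (x_max, y_max - 1)].  */
struct irange_sketch
mod_bounds_only (struct irange_sketch lh, struct irange_sketch rh)
{
  struct irange_sketch r;
  long ub = rh.hi - 1;
  r.lo = 0;
  r.hi = lh.hi < ub ? lh.hi : ub;
  return r;
}

/* Same, but fold directly when both operands are single constants.  */
struct irange_sketch
mod_with_const_fold (struct irange_sketch lh, struct irange_sketch rh)
{
  if (lh.lo == lh.hi && rh.lo == rh.hi)
    {
      struct irange_sketch r = { lh.lo % rh.lo, lh.lo % rh.lo };
      return r;
    }
  return mod_bounds_only (lh, rh);
}
```

With lh = [10, 10] and rh = [4, 4], the bounds-only version yields [0, 3], while the folding version yields the exact [2, 2].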

	gcc/
	* range-op.cc (operator_trunc_mod::wi_fold): Fold constants.

	gcc/testsuite/
	* gcc.dg/tree-ssa/pr61839_2.c: Adjust.  Add new const fold test.
2021-07-30 15:10:49 -04:00
Andrew MacLeod ebbcdd7fae Change integral divide by zero to produce UNDEFINED.
Instead of VARYING, we can get better results by treating divide by zero
as producing an undefined result.
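
One way to see why UNDEFINED can give better results is that an undefined (empty) range contributes nothing when ranges from several paths are merged, whereas VARYING swallows everything. The following C sketch is an illustration only; the types and function names are hypothetical and it handles only non-negative operands.

```c
/* Hypothetical three-state range: UNDEFINED is the empty set,
   VARYING is "any value", BOUNDED carries [lo, hi].  */
enum range_kind { RK_UNDEFINED, RK_BOUNDED, RK_VARYING };
struct vrange_sketch { enum range_kind kind; long lo, hi; };

/* Division of non-negative ranges; a [0, 0] divisor yields UNDEFINED.
   Operands that are not BOUNDED are treated as VARYING for simplicity.  */
struct vrange_sketch
div_sketch (struct vrange_sketch lh, struct vrange_sketch rh)
{
  struct vrange_sketch r = { RK_UNDEFINED, 0, 0 };
  if (lh.kind != RK_BOUNDED || rh.kind != RK_BOUNDED)
    {
      r.kind = RK_VARYING;
      return r;
    }
  if (rh.lo == 0 && rh.hi == 0)
    return r;                           /* divide by zero: UNDEFINED */
  long dmin = rh.lo > 0 ? rh.lo : 1;    /* smallest non-zero divisor */
  r.kind = RK_BOUNDED;
  r.lo = lh.lo / rh.hi;
  r.hi = lh.hi / dmin;
  return r;
}

/* Union of two ranges: UNDEFINED is the identity, VARYING absorbs.  */
struct vrange_sketch
union_sketch (struct vrange_sketch a, struct vrange_sketch b)
{
  if (a.kind == RK_UNDEFINED)
    return b;
  if (b.kind == RK_UNDEFINED)
    return a;
  if (a.kind == RK_VARYING || b.kind == RK_VARYING)
    {
      struct vrange_sketch v = { RK_VARYING, 0, 0 };
      return v;
    }
  struct vrange_sketch r = { RK_BOUNDED,
                             a.lo < b.lo ? a.lo : b.lo,
                             a.hi > b.hi ? a.hi : b.hi };
  return r;
}
```

In this sketch, merging the result of a divide by [0, 0] with a [4, 4] result from another path keeps [4, 4]; had the divide returned VARYING, the merged range would be VARYING as well.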

	gcc/
	* range-op.cc (operator_div::wi_fold): Return UNDEFINED for [0, 0] divisor.

	gcc/testsuite/
	* gcc.dg/tree-ssa/pr61839_2.c: Adjust.
2021-07-30 15:10:48 -04:00
Andrew MacLeod d242acc396 Change const basic_block to const_basic_block.
	* gimple-range-cache.cc (*::set_bb_range): Change const basic_block to
	const_basic_block.
	(*::get_bb_range): Ditto.
	(*::bb_range_p): Ditto.
	* gimple-range-cache.h: Change prototypes.
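
The distinction matters because applying const to a pointer typedef makes the pointer const, not the object it points to. A minimal C sketch, in which the struct body and the two helper functions are hypothetical stand-ins and only the typedef shape is meant to resemble GCC's:

```c
/* Stand-in for the real basic block structure.  */
struct basic_block_def { int index; };

typedef struct basic_block_def *basic_block;
typedef const struct basic_block_def *const_basic_block;

/* 'const basic_block' is a const pointer to a writable block,
   so this store through BB compiles and takes effect.  */
void
renumber (const basic_block bb, int n)
{
  bb->index = n;
}

/* 'const_basic_block' is a pointer to a read-only block;
   writing 'bb->index = 0;' here would be rejected by the compiler.  */
int
block_index (const_basic_block bb)
{
  return bb->index;
}
```

A prototype that takes const_basic_block therefore promises not to modify the block, which const basic_block does not.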
2021-07-30 15:10:48 -04:00
Martin Sebor 0b3560d3a9 Move failed part of a test to a new file [PR101671]
Related:
PR middle-end/101671 - pr83510 fails with -Os because threader confuses -Warray-bounds

gcc/testsuite:
	PR middle-end/101671
	* gcc.c-torture/compile/pr83510.c: Move test functions...
	* gcc.dg/Warray-bounds-87.c: ...to this file.
2021-07-30 11:44:09 -06:00
H.J. Lu e5e164effa Add QI vector mode support to by-pieces for memset
1. Replace scalar_int_mode with fixed_size_mode in the by-pieces
infrastructure to allow non-integer modes.
2. Rename widest_int_mode_for_size to widest_fixed_size_mode_for_size
to return QI vector mode for memset.
3. Add op_by_pieces_d::smallest_fixed_size_mode_for_size to return the
smallest integer or QI vector mode.
4. Remove clear_by_pieces_1 and use builtin_memset_read_str in
clear_by_pieces to support vector mode broadcast.
5. Add lowpart_subreg_regno, a wrapper around simplify_subreg_regno that
uses subreg_lowpart_offset (mode, prev_mode) as the offset.
6. Add TARGET_GEN_MEMSET_SCRATCH_RTX to allow the backend to use a hard
scratch register to avoid stack realignment when expanding memset.
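
As a rough illustration of why a wider piece helps, the following C sketch counts how many stores cover a block when pieces are chosen widest-first. It is a simplified model, not GCC's by_pieces_ninsns: the function name is hypothetical, piece sizes are assumed to be powers of two, and alignment and target-specific considerations are ignored.

```c
#include <stddef.h>

/* Count the stores needed to cover LEN bytes when the widest usable
   piece is MAX_PIECE bytes (assumed to be a power of two), halving the
   piece size whenever the remainder no longer fits.  */
size_t
count_pieces (size_t len, size_t max_piece)
{
  size_t n = 0;
  for (size_t piece = max_piece; piece > 0; piece /= 2)
    {
      n += len / piece;   /* as many full pieces of this width as fit */
      len %= piece;       /* the remainder goes to narrower pieces */
    }
  return n;
}
```

In this model, clearing 64 bytes takes 8 stores with 8-byte integer pieces but 4 stores if a 16-byte QI vector piece is available.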

gcc/

	PR middle-end/90773
	* builtins.c (builtin_memcpy_read_str): Change the mode argument
	from scalar_int_mode to fixed_size_mode.
	(builtin_strncpy_read_str): Likewise.
	(gen_memset_value_from_prev): New function.
	(builtin_memset_read_str): Change the mode argument from
	scalar_int_mode to fixed_size_mode.  Use gen_memset_value_from_prev
	and support CONST_VECTOR.
	(builtin_memset_gen_str): Likewise.
	(try_store_by_multiple_pieces): Use by_pieces_constfn to declare
	constfun.
	* builtins.h (builtin_strncpy_read_str): Replace scalar_int_mode
	with fixed_size_mode.
	(builtin_memset_read_str): Likewise.
	* expr.c (widest_int_mode_for_size): Renamed to ...
	(widest_fixed_size_mode_for_size): Add a bool argument to
	indicate if QI vector mode can be used.
	(by_pieces_ninsns): Call widest_fixed_size_mode_for_size
	instead of widest_int_mode_for_size.
	(pieces_addr::adjust): Change the mode argument from
	scalar_int_mode to fixed_size_mode.
	(op_by_pieces_d): Make m_len read-only.  Add a bool member,
	m_qi_vector_mode, to indicate that QI vector mode can be used.
	(op_by_pieces_d::op_by_pieces_d): Add a bool argument to
	initialize m_qi_vector_mode.  Call widest_fixed_size_mode_for_size
	instead of widest_int_mode_for_size.
	(op_by_pieces_d::get_usable_mode): Change the mode argument from
	scalar_int_mode to fixed_size_mode.  Call
	widest_fixed_size_mode_for_size instead of
	widest_int_mode_for_size.
	(op_by_pieces_d::smallest_fixed_size_mode_for_size): New member
	function to return the smallest integer or QI vector mode.
	(op_by_pieces_d::run): Call widest_fixed_size_mode_for_size
	instead of widest_int_mode_for_size.  Call
	smallest_fixed_size_mode_for_size instead of
	smallest_int_mode_for_size.
	(store_by_pieces_d::store_by_pieces_d): Add a bool argument to
	indicate that QI vector mode can be used and pass it to
	op_by_pieces_d::op_by_pieces_d.
	(can_store_by_pieces): Call widest_fixed_size_mode_for_size
	instead of widest_int_mode_for_size.  Pass memsetp to
	widest_fixed_size_mode_for_size to support QI vector mode.
	Allow all CONST_VECTORs for memset if vec_duplicate is supported.
	(store_by_pieces): Pass memsetp to
	store_by_pieces_d::store_by_pieces_d.
	(clear_by_pieces_1): Removed.
	(clear_by_pieces): Replace clear_by_pieces_1 with
	builtin_memset_read_str and pass true to store_by_pieces_d to
	support vector mode broadcast.
	(string_cst_read_str): Change the mode argument from
	scalar_int_mode to fixed_size_mode.
	* expr.h (by_pieces_constfn): Change scalar_int_mode to
	fixed_size_mode.
	(by_pieces_prev): Likewise.
	* rtl.h (lowpart_subreg_regno): New.
	* rtlanal.c (lowpart_subreg_regno): New.  A wrapper around
	simplify_subreg_regno.
	* target.def (gen_memset_scratch_rtx): New hook.
	* doc/tm.texi.in: Add TARGET_GEN_MEMSET_SCRATCH_RTX.
	* doc/tm.texi: Regenerated.

gcc/testsuite/

	* gcc.target/i386/pr100865-3.c: Expect vmovdqu8 instead of
	vmovdqu.
	* gcc.target/i386/pr100865-4b.c: Likewise.
2021-07-30 10:34:19 -07:00