Commit Graph

186921 Commits

GCC Administrator
c4fee1c646 Daily bump. 2021-07-15 00:16:54 +00:00
Peter Bergner
69feb7601e rs6000: Generate an lxvp instead of two adjacent lxv instructions
The MMA build built-ins currently use individual lxv instructions to
load up the registers of a __vector_pair or __vector_quad.  If the
memory addresses of the built-in operands refer to adjacent locations,
then in some cases we can use an lxvp to load two registers at once.
The patch below adds support for checking whether memory addresses are
adjacent and emitting an lxvp instead of two lxv instructions.
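The idea of the adjacency check can be sketched in plain C++ (an illustration only: the real adjacent_mem_locations works on RTL memory expressions, and the helper name and address-based interface here are my own):

```cpp
#include <cstddef>
#include <cstdint>

// Sketch: two 16-byte lxv loads can be fused into one 32-byte lxvp when
// one address immediately follows the other.  Mirroring the patch, the
// check also reports which location is lower-addressed.
static bool
adjacent_vec_locations (std::uintptr_t a, std::uintptr_t b,
			std::uintptr_t *lower)
{
  const std::size_t vec_bytes = 16;  /* size of one lxv load */
  if (b == a + vec_bytes) { *lower = a; return true; }
  if (a == b + vec_bytes) { *lower = b; return true; }
  return false;
}
```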

2021-07-14  Peter Bergner  <bergner@linux.ibm.com>

gcc/
	* config/rs6000/rs6000.c (adjacent_mem_locations): Return the lower
	addressed memory rtx, if any.
	(rs6000_split_multireg_move): Fix code formatting.
	Handle MMA build built-ins with operands in adjacent memory locations.

gcc/testsuite/
	* gcc.target/powerpc/mma-builtin-9.c: New test.
2021-07-14 18:27:02 -05:00
Peter Bergner
7d914777fc rs6000: Move rs6000_split_multireg_move to later in file
An upcoming change to rs6000_split_multireg_move requires it to be
moved later in the file to fix a declaration issue.

2021-07-14  Peter Bergner  <bergner@linux.ibm.com>

gcc/
	* config/rs6000/rs6000.c (rs6000_split_multireg_move): Move to later
	in the file.
2021-07-14 18:23:31 -05:00
Patrick Palka
bebd8e9da8 c++: CTAD and forwarding references [PR88252]
Here during CTAD we're incorrectly treating T&& as a forwarding
reference even though T is a template parameter of the class template.

This happens because the template parameter T in the out-of-line
definition of the constructor doesn't have the flag
TEMPLATE_TYPE_PARM_FOR_CLASS set, and during duplicate_decls the
redeclaration (which is in terms of this unflagged T) prevails.
To fix this, we could perhaps be more consistent about setting the flag,
but it appears we don't really need this flag to make the determination.

Since the template parameters of a synthesized guide consist of the
template parameters of the class template followed by those of the
constructor (if any), it should suffice to look at the index of the
template parameter to determine whether it comes from the class
template or the constructor (template).  This patch replaces the
TEMPLATE_TYPE_PARM_FOR_CLASS flag with this approach.
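A reduced illustration of the rule (my own example, not the PR's testcase): T&& below is not a forwarding reference, because T comes from the class template rather than from the constructor, even in the synthesized deduction guide:

```cpp
#include <type_traits>

template<typename T>
struct A {
  A(T&&);   // not a forwarding reference: T belongs to the class template
};

template<typename T>
A<T>::A(T&&) {}   // out-of-line definition, the shape that triggered the PR

A a(42);          // rvalue argument: deduces A<int>
static_assert(std::is_same_v<decltype(a), A<int>>);
// With the bug, T&& was wrongly treated as a forwarding reference, so
// deduction could produce a reference type for T.
```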

	PR c++/88252

gcc/cp/ChangeLog:

	* cp-tree.h (TEMPLATE_TYPE_PARM_FOR_CLASS): Remove.
	* pt.c (push_template_decl): Remove TEMPLATE_TYPE_PARM_FOR_CLASS
	handling.
	(redeclare_class_template): Likewise.
	(forwarding_reference_p): Define.
	(maybe_adjust_types_for_deduction): Use it instead.  Add 'tparms'
	parameter.
	(unify_one_argument): Pass tparms to
	maybe_adjust_types_for_deduction.
	(try_one_overload): Likewise.
	(unify): Likewise.
	(rewrite_template_parm): Remove TEMPLATE_TYPE_PARM_FOR_CLASS
	handling.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp1z/class-deduction96.C: New test.
2021-07-14 15:37:30 -04:00
Jason Merrill
91bb571d20 vec: use auto_vec in a few more places
The uses of vec<T> in get_all_loop_exits and process_conditional were memory
leaks, as .release() was never called for them.  The other changes are some
cases that did have proper release handling, but it's simpler to leave
releasing to the auto_vec destructor.
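The difference can be modelled with a tiny stand-in for GCC's types (a simplified sketch; the real vec/auto_vec API is much richer):

```cpp
#include <cstdlib>

// Simplified model: vec<T> owns heap storage but has no destructor, so a
// forgotten release() leaks (as in get_all_loop_exits); auto_vec adds a
// destructor that does the release automatically.
template<typename T>
struct vec
{
  T *data = nullptr;
  void safe_push (T v)
  {
    /* The real vec grows geometrically; one element suffices here.  */
    if (!data)
      data = static_cast<T *> (std::malloc (sizeof (T)));
    *data = v;
  }
  void release () { std::free (data); data = nullptr; }
};

template<typename T>
struct auto_vec : vec<T>
{
  ~auto_vec () { this->release (); }
};
```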

gcc/ChangeLog:

	* sel-sched-ir.h (get_all_loop_exits): Use auto_vec.

gcc/cp/ChangeLog:

	* class.c (struct find_final_overrider_data): Use auto_vec.
	(find_final_overrider): Remove explicit release.
	* coroutines.cc (process_conditional): Use auto_vec.
	* cp-gimplify.c (struct cp_genericize_data): Use auto_vec.
	(cp_genericize_tree): Remove explicit release.
	* parser.c (cp_parser_objc_at_property_declaration): Use
	auto_delete_vec.
	* semantics.c (omp_reduction_lookup): Use auto_vec.
2021-07-14 15:01:27 -04:00
Jason Merrill
b15e301748 c++: enable -fdelete-dead-exceptions by default
As I was discussing with richi, I don't think it makes sense to protect
calls to pure/const functions from DCE just because they aren't explicitly
declared noexcept.  PR100382 indicates that there are different
considerations for Go, which has non-call exceptions.  But still turn the
flag off for that specific testcase.

gcc/c-family/ChangeLog:

	* c-opts.c (c_common_post_options): Set -fdelete-dead-exceptions.

gcc/ChangeLog:

	* doc/invoke.texi: -fdelete-dead-exceptions is on by default for
	C++.

gcc/testsuite/ChangeLog:

	* g++.dg/torture/pr100382.C: Pass -fno-delete-dead-exceptions.
2021-07-14 14:59:56 -04:00
Tamar Christina
4940166a15 Vect: correct rebase issue
The lines being removed have been updated and merged into a new
condition.  But when resolving some conflicts I accidentally
reintroduced them, causing some test failures.

This removes them.

Committed as the changes were previously approved in
https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574977.html
but the hunk was misapplied during a rebase.

gcc/ChangeLog:

	* tree-vect-patterns.c (vect_recog_dot_prod_pattern):
	Remove erroneous line.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/vect-reduc-dot-11.c: Expect pass.
	* gcc.dg/vect/vect-reduc-dot-15.c: Likewise.
	* gcc.dg/vect/vect-reduc-dot-19.c: Likewise.
	* gcc.dg/vect/vect-reduc-dot-21.c: Likewise.
2021-07-14 19:00:59 +01:00
Andrew MacLeod
398572c154 Turn hybrid mode off, default to ranger-only mode for EVRP.
Change the default EVRP mode to ranger-only.

	gcc/
	* params.opt (param_evrp_mode): Change default.

	gcc/testsuite/
	* gcc.dg/pr80776-1.c: Remove xfail.
2021-07-14 12:47:10 -04:00
Marek Polacek
a42f812044 c++: constexpr array reference and value-initialization [PR101371]
This PR gave me a hard time: I saw multiple issues starting with
different revisions.  But ultimately the root cause seems to be
the following, and the attached patch fixes all issues I've found
here.

In cxx_eval_array_reference we create a new constexpr context for the
CP_AGGREGATE_TYPE_P case, but we also have to create it for the
non-aggregate case.  In this test, we are evaluating

  ((B *)this)->a = rhs->a

which means that we set ctx.object to ((B *)this)->a.  Then we proceed
to evaluate the initializer, rhs->a.  For *rhs, we eval rhs, a PARM_DECL,
for which we have (const B &) &c.arr[0] in the hash table.  Then
cxx_fold_indirect_ref gives us c.arr[0].  c is evaluated to {.arr={}} so
c.arr is {}.  Now we want c.arr[0], so we end up in cxx_eval_array_reference
and since we're initializing from {}, we call build_value_init which
gives us an AGGR_INIT_EXPR that calls 'constexpr B::B()'.  Then we
evaluate this AGGR_INIT_EXPR and since its first argument is dummy,
we take ctx.object instead.  But that is the wrong object, we're not
initializing ((B *)this)->a here.  And so we wound up with an
initializer for A, and then crash in cxx_eval_component_reference:

  gcc_assert (DECL_CONTEXT (part) == TYPE_MAIN_VARIANT (TREE_TYPE (whole)));

where DECL_CONTEXT (part) is B (as it should be) but the type of whole
was A.

So create a new object, if there already was one, and the element type
is not a scalar.

	PR c++/101371

gcc/cp/ChangeLog:

	* constexpr.c (cxx_eval_array_reference): Create a new .object
	and .ctor for the non-aggregate non-scalar case too when
	value-initializing.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp1y/constexpr-101371-2.C: New test.
	* g++.dg/cpp1y/constexpr-101371.C: New test.
2021-07-14 11:54:07 -04:00
Harald Anlauf
269ca408e2 Fortran - ICE in gfc_conv_expr_present initializing non-dummy class variable
gcc/fortran/ChangeLog:

	PR fortran/100949
	* trans-expr.c (gfc_trans_class_init_assign): Call
	gfc_conv_expr_present only for dummy variables.

gcc/testsuite/ChangeLog:

	PR fortran/100949
	* gfortran.dg/pr100949.f90: New test.
2021-07-14 17:25:29 +02:00
Tamar Christina
6d1cdb2782 AArch64: Correct dot-product auto-vect optab RTL
The current RTL for the vectorizer dot-product patterns is incorrect:
operand 3 isn't an output parameter, so we can't write to it.

This patch fixes the issue and also reduces the amount of RTL.

gcc/ChangeLog:

	* config/aarch64/aarch64-simd-builtins.def (udot, sdot): Rename to...
	(sdot_prod, udot_prod): ...These.
	* config/aarch64/aarch64-simd.md (<sur>dot_prod<vsi2qi>): Remove.
	(aarch64_<sur>dot<vsi2qi>): Rename to...
	(<sur>dot_prod<vsi2qi>): ...This.
	* config/aarch64/arm_neon.h (vdot_u32, vdotq_u32, vdot_s32, vdotq_s32):
	Update builtins.
2021-07-14 15:41:31 +01:00
Tamar Christina
c9165e2d58 AArch32: Correct sdot RTL on aarch32
The RTL generated from <sup>dot_prod<vsi2qi> is invalid: operand 3 is a
normal input and cannot be written to.  For the expander it is just
another operand, but the caller does not expect it to be written to.

gcc/ChangeLog:

	* config/arm/neon.md (<sup>dot_prod<vsi2qi>): Drop statements.
2021-07-14 15:22:37 +01:00
Tamar Christina
1e0ab1c4ba middle-end: Add generic middle-end tests for sign-differing dot product.
This adds testcases checking auto-vectorization detection of the new
sign-differing dot product.

gcc/ChangeLog:

	* doc/sourcebuild.texi (arm_v8_2a_i8mm_neon_hw): Document.

gcc/testsuite/ChangeLog:

	* lib/target-supports.exp
	(check_effective_target_arm_v8_2a_imm8_neon_ok_nocache,
	check_effective_target_arm_v8_2a_i8mm_neon_hw,
	check_effective_target_vect_usdot_qi): New.
	* gcc.dg/vect/vect-reduc-dot-9.c: New test.
	* gcc.dg/vect/vect-reduc-dot-10.c: New test.
	* gcc.dg/vect/vect-reduc-dot-11.c: New test.
	* gcc.dg/vect/vect-reduc-dot-12.c: New test.
	* gcc.dg/vect/vect-reduc-dot-13.c: New test.
	* gcc.dg/vect/vect-reduc-dot-14.c: New test.
	* gcc.dg/vect/vect-reduc-dot-15.c: New test.
	* gcc.dg/vect/vect-reduc-dot-16.c: New test.
	* gcc.dg/vect/vect-reduc-dot-17.c: New test.
	* gcc.dg/vect/vect-reduc-dot-18.c: New test.
	* gcc.dg/vect/vect-reduc-dot-19.c: New test.
	* gcc.dg/vect/vect-reduc-dot-20.c: New test.
	* gcc.dg/vect/vect-reduc-dot-21.c: New test.
	* gcc.dg/vect/vect-reduc-dot-22.c: New test.
2021-07-14 15:21:40 +01:00
Tamar Christina
6412c58c78 AArch32: Add support for sign differing dot-product usdot for NEON.
This adds optabs implementing usdot_prod.

The following testcase:

#define N 480
#define SIGNEDNESS_1 unsigned
#define SIGNEDNESS_2 signed
#define SIGNEDNESS_3 signed
#define SIGNEDNESS_4 unsigned

SIGNEDNESS_1 int __attribute__ ((noipa))
f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
   SIGNEDNESS_4 char *restrict b)
{
  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
    {
      int av = a[i];
      int bv = b[i];
      SIGNEDNESS_2 short mult = av * bv;
      res += mult;
    }
  return res;
}

Generates

f:
        vmov.i32        q8, #0  @ v4si
        add     r3, r2, #480
.L2:
        vld1.8  {q10}, [r2]!
        vld1.8  {q9}, [r1]!
        vusdot.s8       q8, q9, q10
        cmp     r3, r2
        bne     .L2
        vadd.i32        d16, d16, d17
        vpadd.i32       d16, d16, d16
        vmov.32 r3, d16[0]
        add     r0, r0, r3
        bx      lr

instead of

f:
        vmov.i32        q8, #0  @ v4si
        add     r3, r2, #480
.L2:
        vld1.8  {q9}, [r2]!
        vld1.8  {q11}, [r1]!
        cmp     r3, r2
        vmull.s8 q10, d18, d22
        vmull.s8 q9, d19, d23
        vaddw.s16       q8, q8, d20
        vaddw.s16       q8, q8, d21
        vaddw.s16       q8, q8, d18
        vaddw.s16       q8, q8, d19
        bne     .L2
        vadd.i32        d16, d16, d17
        vpadd.i32       d16, d16, d16
        vmov.32 r3, d16[0]
        add     r0, r0, r3
        bx      lr

This is for NEON only.  I couldn't figure out whether the MVE
instruction vmlaldav.s16 could be used to emulate this; because it would
require additional widening to work, I left MVE out of this patch set,
but perhaps someone should take a look.

gcc/ChangeLog:

	* config/arm/neon.md (usdot_prod<vsi2qi>): New.

gcc/testsuite/ChangeLog:

	* gcc.target/arm/simd/vusdot-autovec.c: New test.
2021-07-14 15:20:45 +01:00
Tamar Christina
752045ed1e AArch64: Add support for sign differing dot-product usdot for NEON and SVE.
This adds optabs implementing usdot_prod.

The following testcase:

#define N 480
#define SIGNEDNESS_1 unsigned
#define SIGNEDNESS_2 signed
#define SIGNEDNESS_3 signed
#define SIGNEDNESS_4 unsigned

SIGNEDNESS_1 int __attribute__ ((noipa))
f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
   SIGNEDNESS_4 char *restrict b)
{
  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
    {
      int av = a[i];
      int bv = b[i];
      SIGNEDNESS_2 short mult = av * bv;
      res += mult;
    }
  return res;
}

Generates for NEON

f:
        movi    v0.4s, 0
        mov     x3, 0
        .p2align 3,,7
.L2:
        ldr     q1, [x2, x3]
        ldr     q2, [x1, x3]
        usdot   v0.4s, v1.16b, v2.16b
        add     x3, x3, 16
        cmp     x3, 480
        bne     .L2
        addv    s0, v0.4s
        fmov    w1, s0
        add     w0, w0, w1
        ret

and for SVE

f:
        mov     x3, 0
        cntb    x5
        mov     w4, 480
        mov     z1.b, #0
        whilelo p0.b, wzr, w4
        mov     z3.b, #0
        ptrue   p1.b, all
        .p2align 3,,7
.L2:
        ld1b    z2.b, p0/z, [x1, x3]
        ld1b    z0.b, p0/z, [x2, x3]
        add     x3, x3, x5
        sel     z0.b, p0, z0.b, z3.b
        whilelo p0.b, w3, w4
        usdot   z1.s, z0.b, z2.b
        b.any   .L2
        uaddv   d0, p1, z1.s
        fmov    x1, d0
        add     w0, w0, w1
        ret

instead of

f:
        movi    v0.4s, 0
        mov     x3, 0
        .p2align 3,,7
.L2:
        ldr     q2, [x1, x3]
        ldr     q1, [x2, x3]
        add     x3, x3, 16
        sxtl    v4.8h, v2.8b
        sxtl2   v3.8h, v2.16b
        uxtl    v2.8h, v1.8b
        uxtl2   v1.8h, v1.16b
        mul     v2.8h, v2.8h, v4.8h
        mul     v1.8h, v1.8h, v3.8h
        saddw   v0.4s, v0.4s, v2.4h
        saddw2  v0.4s, v0.4s, v2.8h
        saddw   v0.4s, v0.4s, v1.4h
        saddw2  v0.4s, v0.4s, v1.8h
        cmp     x3, 480
        bne     .L2
        addv    s0, v0.4s
        fmov    w1, s0
        add     w0, w0, w1
        ret

and

f:
        mov     x3, 0
        cnth    x5
        mov     w4, 480
        mov     z1.b, #0
        whilelo p0.h, wzr, w4
        ptrue   p2.b, all
        .p2align 3,,7
.L2:
        ld1sb   z2.h, p0/z, [x1, x3]
        punpklo p1.h, p0.b
        ld1b    z0.h, p0/z, [x2, x3]
        add     x3, x3, x5
        mul     z0.h, p2/m, z0.h, z2.h
        sunpklo z2.s, z0.h
        sunpkhi z0.s, z0.h
        add     z1.s, p1/m, z1.s, z2.s
        punpkhi p1.h, p0.b
        whilelo p0.h, w3, w4
        add     z1.s, p1/m, z1.s, z0.s
        b.any   .L2
        uaddv   d0, p2, z1.s
        fmov    x1, d0
        add     w0, w0, w1
        ret

gcc/ChangeLog:

	* config/aarch64/aarch64-simd.md (aarch64_usdot<vsi2qi>): Rename to...
	(usdot_prod<vsi2qi>): ... This.
	* config/aarch64/aarch64-simd-builtins.def (usdot): Rename to...
	(usdot_prod): ...This.
	* config/aarch64/arm_neon.h (vusdot_s32, vusdotq_s32): Likewise.
	* config/aarch64/aarch64-sve.md (@aarch64_<sur>dot_prod<vsi2qi>):
	Rename to...
	(@<sur>dot_prod<vsi2qi>): ...This.
	* config/aarch64/aarch64-sve-builtins-base.cc
	(svusdot_impl::expand): Use it.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/simd/vusdot-autovec.c: New test.
	* gcc.target/aarch64/sve/vusdot-autovec.c: New test.
2021-07-14 15:19:32 +01:00
Tamar Christina
ab0a6b213a Vect: Add support for dot-product where the sign for the multiplicant changes.
This patch adds support for a dot product where the signs of the
multiplication arguments differ, i.e. one is signed and one is unsigned
but the precisions are the same.

#define N 480
#define SIGNEDNESS_1 unsigned
#define SIGNEDNESS_2 signed
#define SIGNEDNESS_3 signed
#define SIGNEDNESS_4 unsigned

SIGNEDNESS_1 int __attribute__ ((noipa))
f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
   SIGNEDNESS_4 char *restrict b)
{
  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
    {
      int av = a[i];
      int bv = b[i];
      SIGNEDNESS_2 short mult = av * bv;
      res += mult;
    }
  return res;
}

The operations are performed as if the operands were extended to a 32-bit value.
As such this operation isn't valid if there is an intermediate
conversion to an unsigned value, i.e. if SIGNEDNESS_2 is unsigned.

Moreover, if the signs of SIGNEDNESS_3 and SIGNEDNESS_4 are flipped,
the same optab is used but the operands are swapped in the optab
expansion.
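The intended semantics can be modelled in scalar code (my sketch, not GCC source): each product behaves as if both bytes were first widened to 32 bits, and a flipped signed/unsigned pairing just swaps the operands:

```cpp
#include <cstdint>

// Scalar model of usdot_prod: a signed byte times an unsigned byte, both
// widened to 32 bits before the multiply, accumulated into a 32-bit sum.
static std::int32_t
usdot_prod (std::int32_t acc, const std::int8_t *a,
	    const std::uint8_t *b, int n)
{
  for (int i = 0; i < n; ++i)
    acc += std::int32_t (a[i]) * std::int32_t (b[i]);
  return acc;
}
```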

To support this the patch extends the dot-product detection to optionally
ignore operands with different signs and stores this information in the optab
subtype, which is now made a bitfield.

The subtype now additionally controls which optab an EXPR can expand to.

gcc/ChangeLog:

	* optabs.def (usdot_prod_optab): New.
	* doc/md.texi: Document it and clarify other dot prod optabs.
	* optabs-tree.h (enum optab_subtype): Add optab_vector_mixed_sign.
	* optabs-tree.c (optab_for_tree_code): Support usdot_prod_optab.
	* optabs.c (expand_widen_pattern_expr): Likewise.
	* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
	* tree-vect-loop.c (vectorizable_reduction): Query dot-product kind.
	* tree-vect-patterns.c (vect_supportable_direct_optab_p): Take optional
	optab subtype.
	(vect_widened_op_tree): Optionally ignore
	mismatch types.
	(vect_recog_dot_prod_pattern): Support usdot_prod_optab.
2021-07-14 14:54:26 +01:00
H.J. Lu
cc11b924bf x86: Don't enable UINTR in 32-bit mode
UINTR is available only in 64-bit mode.  Since the codegen target is
unknown when the gcc driver is processing -march=native, to properly
handle UINTR for -march=native:

1. Pass "arch [32|64]" and "tune [32|64]" to host_detect_local_cpu to
indicate 32-bit and 64-bit codegen.
2. Change ix86_option_override_internal to enable UINTR only in 64-bit
mode for -march=CPU when PTA_CPU includes PTA_UINTR.

gcc/

	PR target/101395
	* config/i386/driver-i386.c (host_detect_local_cpu): Check
	"arch [32|64]" and "tune [32|64]" for 32-bit and 64-bit codegen.
	Enable UINTR only for 64-bit codegen.
	* config/i386/i386-options.c
	(ix86_option_override_internal::DEF_PTA): Skip PTA_UINTR if not
	in 64-bit mode.
	* config/i386/i386.h (ARCH_ARG): New.
	(CC1_CPU_SPEC): Pass "[arch|tune] 32" for 32-bit codegen and
	"[arch|tune] 64" for 64-bit codegen.

gcc/testsuite/

	PR target/101395
	* gcc.target/i386/pr101395-1.c: New test.
	* gcc.target/i386/pr101395-2.c: Likewise.
	* gcc.target/i386/pr101395-3.c: Likewise.
2021-07-14 05:14:31 -07:00
Jonathan Wakely
f9c2ce1dae libstdc++: Add noexcept-specifier to basic_string_view(It, End)
This adds a conditional noexcept to the C++20 constructor. The
std::to_address call cannot throw, so only taking the difference of the
two iterators can throw.

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++-v3/ChangeLog:

	* include/std/string_view (basic_string_view(It, End)): Add
	noexcept-specifier.
	* testsuite/21_strings/basic_string_view/cons/char/range.cc:
	Check noexcept-specifier. Also check construction without CTAD.
2021-07-14 12:23:33 +01:00
Richard Biener
a967a3efd3 tree-optimization/101445 - fix negative stride SLP vect with gaps
The following fixes the IV adjustment for the gap in a negative
stride SLP vectorization.  The adjustment was in the wrong direction;
the patch fixes it.

2021-07-14  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/101445
	* tree-vect-stmts.c (vectorizable_load): Do the gap adjustment
	of the IV in the correct direction for negative stride
	accesses.

	* gcc.dg/vect/pr101445.c: New testcase.
2021-07-14 12:31:42 +02:00
Jakub Jelinek
3be762c2ed godump: Fix -fdump-go-spec= reproducibility issue [PR101407]
pot_dummy_types is a hash_set from whose traversal the code prints some type
lines.  hash_set normally uses default_hash_traits, which for pointer
types (the hash set hashes const char *) uses pointer_hash, which hashes
the pointer addresses while discarding the least significant 3 bits.
With address space randomization, that results in non-determinism in
the -fdump-go-spec= generated file: each invocation can emit the lines
from the pot_dummy_types traversal in a different order.

This patch fixes it by hashing the string contents instead, making the
hashes reproducible.
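The effect can be modelled outside GCC (a sketch; GCC's string_hash is its own trait type, and std::hash stands in for it here):

```cpp
#include <cstdint>
#include <functional>
#include <string_view>

// Hashing the pointer value keys on the address (as pointer_hash does,
// dropping the low 3 bits), so identical strings at different addresses
// land in different buckets and traversal order varies run to run.
static std::size_t
ptr_style_hash (const char *s)
{
  return reinterpret_cast<std::uintptr_t> (s) >> 3;
}

// Hashing the contents gives equal hashes for equal strings, so the
// traversal order no longer depends on where the strings happen to live.
static std::size_t
content_hash (const char *s)
{
  return std::hash<std::string_view> {} (s);
}
```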

2021-07-14  Jakub Jelinek  <jakub@redhat.com>

	PR go/101407
	* godump.c (godump_str_hash): New type.
	(godump_container::pot_dummy_types): Use string_hash instead of
	ptr_hash in the hash_set.
2021-07-14 10:22:50 +02:00
Richard Biener
1dd3f21095 Support reduction def re-use for epilogue with different vector size
The following adds support for re-using the vector reduction def
from the main loop in vectorized epilogue loops on architectures
which use different vector sizes for the epilogue.  That's only
x86 as far as I am aware.
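The re-use can be illustrated with a scalar model (my own sketch): the main loop's wider accumulator is folded to the epilogue's narrower vector type by adding its halves lane-wise, and the epilogue then keeps accumulating into the result.

```cpp
#include <array>

// Model: fold an 8-lane main-loop accumulator down to the 4-lane type an
// epilogue with a smaller vector size would use, so the epilogue can keep
// accumulating instead of redoing the full vector->scalar reduction.
static std::array<int, 4>
fold_accumulator (const std::array<int, 8> &acc)
{
  std::array<int, 4> out {};
  for (int i = 0; i < 4; ++i)
    out[i] = acc[i] + acc[i + 4];   /* lane-wise add of the two halves */
  return out;
}
```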

2021-07-13  Richard Biener  <rguenther@suse.de>

	* tree-vect-loop.c (vect_find_reusable_accumulator): Handle
	vector types where the old vector type has a multiple of
	the new vector type elements.
	(vect_create_partial_epilog): New function, split out from...
	(vect_create_epilog_for_reduction): ... here.
	(vect_transform_cycle_phi): Reduce the re-used accumulator
	to the new vector type.

	* gcc.target/i386/vect-reduc-1.c: New testcase.
2021-07-14 08:15:17 +02:00
Alexandre Oliva
a7098d6ef4 fix typo in attr_fnspec::verify
Odd-numbered indices describing argument access sizes in the fnspec
string can only hold 't' or a digit, as tested in the beginning of the
case.  When checking that the size-supplying argument does not have
additional information associated with it, the test that excludes the
't' possibility looks for it at the even position in the fnspec
string.  Oops.

This might yield false positives and negatives if a function has a
fnspec in which an argument uses a 't' access-size, and ('t' - '1')
happens to be the index of an argument described in an fnspec string.
Assuming ASCII encoding, it would take a function with at least 68
arguments described in fnspec.  Still, probably worth fixing.
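The indexing being fixed can be sketched with a simplified model of the fnspec layout (illustrative; attr_fnspec recognizes more states than shown):

```cpp
#include <cctype>
#include <string>

// Simplified fnspec layout: two leading chars describe the function and
// return value, then two chars per argument.  For argument i, position
// 2 + 2*i holds the access kind and the odd position 3 + 2*i holds the
// access size: either 't' or a digit naming the size-supplying argument.
static char
arg_access_size (const std::string &fnspec, unsigned i)
{
  return fnspec[3 + 2 * i];
}

// The verify check must test for 't' at the odd (size) position; the bug
// was looking at the even (kind) position instead.
static bool
size_supplied_by_arg (const std::string &fnspec, unsigned i)
{
  char c = arg_access_size (fnspec, i);
  return c != 't' && std::isdigit ((unsigned char) c);
}
```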


for  gcc/ChangeLog

	* tree-ssa-alias.c (attr_fnspec::verify): Fix index in
	non-'t'-sized arg check.
2021-07-13 22:28:25 -03:00
Alexandre Oliva
66907e7399 adjust landing pads when changing main label
If an artificial label created for a landing pad ends up being
dropped in favor of a user-supplied label, the user-supplied label
inherits the landing pad index, but the post_landing_pad field is not
adjusted to point to the new label.

This patch fixes the problem, and adds verification that we don't
remove a label that's still used as a landing pad.

The circumstance in which this problem can be hit was unusual: removal
of a block with an unreachable label moves the label to some other
unrelated block, in case its address is taken.  In the case at hand
(pr42739.C, complicated by wrappers and cleanups), the chosen block
happened to be an EH landing pad.  (A followup patch will change that.)


for  gcc/ChangeLog

	* tree-cfg.c (cleanup_dead_labels_eh): Update
	post_landing_pad label upon change of landing pad block's
	primary label.
	(cleanup_dead_labels): Check that a removed label is not that
	of a landing pad.
2021-07-13 22:25:54 -03:00
GCC Administrator
0e7754560f Daily bump. 2021-07-14 00:16:44 +00:00
Jonathan Wright
8695bf78da gcc: Add vec_select -> subreg RTL simplification
Add a new RTL simplification for the case of a VEC_SELECT selecting
the low part of a vector. The simplification returns a SUBREG.
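Schematically, for the little-endian case (illustrative RTL, not taken from the patch):

```
;; A vec_select of the low lanes of a vector ...
(vec_select:V2SI (reg:V4SI x)
                 (parallel [(const_int 0) (const_int 1)]))
;; ... simplifies to the low-part subreg of the same register:
(subreg:V2SI (reg:V4SI x) 0)
```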

The primary goal of this patch is to enable better combinations of
Neon RTL patterns - specifically allowing generation of
'write-to-high-half' narrowing instructions.

Adding this RTL simplification means that the expected results for a
number of tests need to be updated:
* aarch64 Neon: Update the scan-assembler regex for intrinsics tests
  to expect a scalar register instead of lane 0 of a vector.
* aarch64 SVE: Likewise.
* arm MVE: Use lane 1 instead of lane 0 for lane-extraction
  intrinsics tests (as the move instructions get optimized away for
  lane 0.)

This patch also adds new code generation tests to
narrow_high_combine.c to verify the benefit of this RTL
simplification.

gcc/ChangeLog:

2021-06-08  Jonathan Wright  <jonathan.wright@arm.com>

	* combine.c (combine_simplify_rtx): Add vec_select -> subreg
	simplification.
	* config/aarch64/aarch64.md (*zero_extend<SHORT:mode><GPI:mode>2_aarch64):
	Add Neon to general purpose register case for zero-extend
	pattern.
	* config/arm/vfp.md (*arm_movsi_vfp): Remove "*" from *t -> r
	case to prevent some cases opting to go through memory.
	* cse.c (fold_rtx): Add vec_select -> subreg simplification.
	* rtl.c (rtvec_series_p): Define predicate to determine
	whether a vector contains a linear series of integers.
	* rtl.h (rtvec_series_p): Define.
	* rtlanal.c (vec_series_lowpart_p): Define predicate to
	determine if a vector selection is equivalent to the low part
	of the vector.
	* rtlanal.h (vec_series_lowpart_p): Define.
	* simplify-rtx.c (simplify_context::simplify_binary_operation_1):
	Add vec_select -> subreg simplification.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/extract_zero_extend.c: Remove dump scan
	for RTL pattern match.
	* gcc.target/aarch64/narrow_high_combine.c: Add new tests.
	* gcc.target/aarch64/simd/vmulx_laneq_f64_1.c: Update
	scan-assembler regex to look for a scalar register instead of
	lane 0 of a vector.
	* gcc.target/aarch64/simd/vmulxd_laneq_f64_1.c: Likewise.
	* gcc.target/aarch64/simd/vmulxs_lane_f32_1.c: Likewise.
	* gcc.target/aarch64/simd/vmulxs_laneq_f32_1.c: Likewise.
	* gcc.target/aarch64/simd/vqdmlalh_lane_s16.c: Likewise.
	* gcc.target/aarch64/simd/vqdmlals_lane_s32.c: Likewise.
	* gcc.target/aarch64/simd/vqdmlslh_lane_s16.c: Likewise.
	* gcc.target/aarch64/simd/vqdmlsls_lane_s32.c: Likewise.
	* gcc.target/aarch64/simd/vqdmullh_lane_s16.c: Likewise.
	* gcc.target/aarch64/simd/vqdmullh_laneq_s16.c: Likewise.
	* gcc.target/aarch64/simd/vqdmulls_lane_s32.c: Likewise.
	* gcc.target/aarch64/simd/vqdmulls_laneq_s32.c: Likewise.
	* gcc.target/aarch64/sve/dup_lane_1.c: Likewise.
	* gcc.target/aarch64/sve/extract_1.c: Likewise.
	* gcc.target/aarch64/sve/extract_2.c: Likewise.
	* gcc.target/aarch64/sve/extract_3.c: Likewise.
	* gcc.target/aarch64/sve/extract_4.c: Likewise.
	* gcc.target/aarch64/sve/live_1.c: Update scan-assembler regex
	cases to look for 'b' and 'h' registers instead of 'w'.
	* gcc.target/arm/crypto-vsha1cq_u32.c: Update scan-assembler
	regex to reflect lane 0 vector extractions being simplified
	to scalar register moves.
	* gcc.target/arm/crypto-vsha1h_u32.c: Likewise.
	* gcc.target/arm/crypto-vsha1mq_u32.c: Likewise.
	* gcc.target/arm/crypto-vsha1pq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_f16.c: Extract
	lane 1 as the moves for lane 0 now get optimized away.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_u8.c: Likewise.
2021-07-13 21:02:58 +01:00
Paul A. Clarke
60aee15bb7 rs6000: Add tests for SSE4.1 "test" intrinsics
Copy the test for _mm_testz_si128, _mm_testc_si128,
_mm_testnzc_si128, _mm_test_all_ones, _mm_test_all_zeros,
_mm_test_mix_ones_zeros from gcc/testsuite/gcc.target/i386.

2021-07-13  Paul A. Clarke  <pc@us.ibm.com>

gcc/testsuite
	* gcc.target/powerpc/sse4_1-ptest-1.c: Copy from
	gcc/testsuite/gcc.target/i386.
2021-07-13 13:50:24 -05:00
Paul A. Clarke
acd4b9103c rs6000: Add support for SSE4.1 "test" intrinsics
2021-07-13  Paul A. Clarke  <pc@us.ibm.com>

gcc
	* config/rs6000/smmintrin.h (_mm_testz_si128, _mm_testc_si128,
	_mm_testnzc_si128, _mm_test_all_ones, _mm_test_all_zeros,
	_mm_test_mix_ones_zeros): New.
2021-07-13 13:46:34 -05:00
Jonathan Wakely
4d3eaeb4f5 libstdc++: Simplify basic_string_view::ends_with [PR 101361]
The use of npos triggers a diagnostic as described in PR c++/101361.
This change replaces the use of npos with the exact length, which is
already known. We can further simplify it by inlining the effects of
compare and substr, avoiding the redundant range checks in the latter.

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++-v3/ChangeLog:

	PR c++/101361
	* include/std/string_view (ends_with): Use traits_type::compare
	directly.
2021-07-13 15:21:26 +01:00
Andrew MacLeod
f75560398a Adjust testcase to verify that the call is removed.
Ranger now handles the test.

	gcc/testsuite
	PR tree-optimization/93781
	* gcc.dg/tree-ssa/pr93781-1.c: Check that call is removed.
2021-07-13 09:43:18 -04:00
Roger Sayle
9aa5001ef4 Make gimple_could_trap_p const-safe.
Allow gimple_could_trap_p (which previously took a non-const gimple)
to be called from functions that take a const gimple (such as
gimple_has_side_effects), and update its prototypes.  Pre-approved
as obvious.

2021-07-13  Roger Sayle  <roger@nextmovesoftware.com>
	    Richard Biener  <rguenther@suse.de>

gcc/ChangeLog
	* gimple.c (gimple_could_trap_p_1):  Make S argument a
	"const gimple*".  Preserve constness in call to
	gimple_asm_volatile_p.
	(gimple_could_trap_p): Make S argument a "const gimple*".
	* gimple.h (gimple_could_trap_p_1, gimple_could_trap_p):
	Update function prototypes.
2021-07-13 14:01:41 +01:00
Jonathan Wakely
bd1eb556b9 libstdc++: Remove duplicate #include in <string_view>
When I added the new C++23 constructor I added a conditional include of
<bits/ranges_base.h>, which was already being included unconditionally.
This removes the unconditional include but changes the condition for the
other one, so it's used for C++20 as well.

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++-v3/ChangeLog:

	* include/std/string_view: Only include <bits/ranges_base.h>
	once, and only for C++20 and later.
2021-07-13 12:09:37 +01:00
Richard Sandiford
1583b8bff0 vect: Reuse reduction accumulators between loops
This patch adds support for reusing a main loop's reduction accumulator
in an epilogue loop.  This in turn lets the loops share a single piece
of vector->scalar reduction code.

The patch has the following restrictions:

(1) The epilogue reduction can only operate on a single vector
    (e.g. ncopies must be 1 for non-SLP reductions, and the group size
    must be <= the element count for SLP reductions).

(2) Both loops must use the same vector mode for their accumulators.
    This means that the patch is restricted to targets that support
    --param vect-partial-vector-usage=1.

(3) The reduction must be a standard “tree code” reduction.

However, these restrictions could be lifted in future.  For example,
if the main loop operates on 128-bit vectors and the epilogue loop
operates on 64-bit vectors, we could in future reduce the 128-bit
vector by one stage and use the 64-bit result as the starting point
for the epilogue result.

The patch tries to handle chained SLP reductions, unchained SLP
reductions and non-SLP reductions.  It also handles cases in which
the epilogue loop is entered directly (rather than via the main loop)
and cases in which the epilogue loop can be skipped.

vect_get_main_loop_result is a bit more general than the current
patch needs.

gcc/
	* tree-vectorizer.h (vect_reusable_accumulator): New structure.
	(_loop_vec_info::main_loop_edge): New field.
	(_loop_vec_info::skip_main_loop_edge): Likewise.
	(_loop_vec_info::skip_this_loop_edge): Likewise.
	(_loop_vec_info::reusable_accumulators): Likewise.
	(_stmt_vec_info::reduc_scalar_results): Likewise.
	(_stmt_vec_info::reused_accumulator): Likewise.
	(vect_get_main_loop_result): Declare.
	* tree-vectorizer.c (vec_info::new_stmt_vec_info): Initialize
	reduc_scalar_inputs.
	(vec_info::free_stmt_vec_info): Free reduc_scalar_inputs.
	* tree-vect-loop-manip.c (vect_get_main_loop_result): New function.
	(vect_do_peeling): Fill an epilogue loop's main_loop_edge,
	skip_main_loop_edge and skip_this_loop_edge fields.
	* tree-vect-loop.c (INCLUDE_ALGORITHM): Define.
	(vect_emit_reduction_init_stmts): New function.
	(get_initial_def_for_reduction): Use it.
	(get_initial_defs_for_reduction): Likewise.  Change the vinfo
	parameter to a loop_vec_info.
	(vect_create_epilog_for_reduction): Store the scalar results
	in the reduc_info.  If an epilogue loop is reusing an accumulator
	from the main loop, and if the epilogue loop can also be skipped,
	try to place the reduction code in the join block.  Record
	accumulators that could potentially be reused by epilogue loops.
	(vect_transform_cycle_phi): When vectorizing epilogue loops,
	try to reuse accumulators from the main loop.  Record the initial
	value in reduc_info for non-SLP reductions too.

gcc/testsuite/
	* gcc.target/aarch64/sve/reduc_9.c: New test.
	* gcc.target/aarch64/sve/reduc_9_run.c: Likewise.
	* gcc.target/aarch64/sve/reduc_10.c: Likewise.
	* gcc.target/aarch64/sve/reduc_10_run.c: Likewise.
	* gcc.target/aarch64/sve/reduc_11.c: Likewise.
	* gcc.target/aarch64/sve/reduc_11_run.c: Likewise.
	* gcc.target/aarch64/sve/reduc_12.c: Likewise.
	* gcc.target/aarch64/sve/reduc_12_run.c: Likewise.
	* gcc.target/aarch64/sve/reduc_13.c: Likewise.
	* gcc.target/aarch64/sve/reduc_13_run.c: Likewise.
	* gcc.target/aarch64/sve/reduc_14.c: Likewise.
	* gcc.target/aarch64/sve/reduc_14_run.c: Likewise.
	* gcc.target/aarch64/sve/reduc_15.c: Likewise.
	* gcc.target/aarch64/sve/reduc_15_run.c: Likewise.
2021-07-13 10:17:43 +01:00
Richard Sandiford
7670b6633e vect: Simplify get_initial_def_for_reduction
After previous patches, we can now easily provide the neutral op
as an argument to get_initial_def_for_reduction.  This in turn
allows the adjustment calculation to be moved outside of
get_initial_def_for_reduction, which is the main motivation
of the patch.

gcc/
	* tree-vect-loop.c (get_initial_def_for_reduction): Remove
	adjustment handling.  Take the neutral value as an argument,
	in place of the code argument.
	(vect_transform_cycle_phi): Update accordingly.  Handle the
	initial values of cond reductions separately from code reductions.
	Choose the adjustment here rather than in
	get_initial_def_for_reduction.  Sink the splat of vec_initial_def.
2021-07-13 10:17:42 +01:00
Richard Sandiford
221bdb333b vect: Generalise neutral_op_for_slp_reduction
This patch generalises the interface to neutral_op_for_slp_reduction
so that it can be used for non-SLP reductions too.  This isn't much
of a win on its own, but it helps later patches.

gcc/
	* tree-vect-loop.c (neutral_op_for_slp_reduction): Replace with...
	(neutral_op_for_reduction): ...this, providing a more general
	interface.
	(vect_create_epilog_for_reduction): Update accordingly.
	(vectorizable_reduction): Likewise.
	(vect_transform_cycle_phi): Likewise.
2021-07-13 10:17:41 +01:00
Richard Sandiford
bd5a69191f vect: Pass reduc_info to get_initial_def_for_reduction
Similarly to the previous patch, this one passes the reduc_info
to get_initial_def_for_reduction, rather than a stmt_vec_info that
lacks the metadata.  This again becomes useful later.

gcc/
	* tree-vect-loop.c (get_initial_def_for_reduction): Take the
	reduc_info instead of the original stmt_vec_info.
	(vect_transform_cycle_phi): Update accordingly.
2021-07-13 10:17:40 +01:00
Richard Sandiford
826c452e57 vect: Pass reduc_info to get_initial_defs_for_reduction
This patch passes the reduc_info to get_initial_defs_for_reduction,
so that the function can get general information from there rather
than from the first SLP statement.  This isn't a win on its own,
but it becomes important with later patches.

gcc/
	* tree-vect-loop.c (get_initial_defs_for_reduction): Take the
	reduc_info as an additional parameter.
	(vect_transform_cycle_phi): Update accordingly.
2021-07-13 10:17:39 +01:00
Richard Sandiford
d592920c89 vect: Add a vect_phi_initial_value helper function
This patch adds a helper function called vect_phi_initial_value
for returning the incoming value of a given loop phi.  The main
reason for adding it is to ensure that the right preheader edge
is used when vectorising nested loops.  (PHI_ARG_DEF_FROM_EDGE
itself doesn't assert that the given edge is for the right block,
although I guess that would be good to add separately.)

gcc/
	* tree-vectorizer.h: Include tree-ssa-operands.h.
	(vect_phi_initial_value): New function.
	* tree-vect-loop.c (neutral_op_for_slp_reduction): Use it.
	(get_initial_defs_for_reduction, info_for_reduction): Likewise.
	(vect_create_epilog_for_reduction, vectorizable_reduction): Likewise.
	(vect_transform_cycle_phi, vectorizable_induction): Likewise.
2021-07-13 10:17:39 +01:00
Richard Sandiford
32b8edd529 vect: Ensure reduc_inputs always have vectype
Vector reduction accumulators can differ in signedness from the
final scalar result.  The conversions to handle that case were
distributed through vect_create_epilog_for_reduction; this patch
does the conversion up-front instead.

gcc/
	* tree-vect-loop.c (vect_create_epilog_for_reduction): Convert
	the phi results to vectype after creating them.  Remove later
	conversion code that thus becomes redundant.
2021-07-13 10:17:38 +01:00
Richard Sandiford
81ad6bfc07 vect: Remove new_phis from vect_create_epilog_for_reduction
vect_create_epilog_for_reduction had a variable called new_phis.
It collected the statements that produce the exit block definitions
of the vector reduction accumulators.  Although those statements
are indeed phis initially, they are often replaced with normal
statements later, leading to puzzling code like:

          FOR_EACH_VEC_ELT (new_phis, i, new_phi)
            {
              int bit_offset;
              if (gimple_code (new_phi) == GIMPLE_PHI)
                vec_temp = PHI_RESULT (new_phi);
              else
                vec_temp = gimple_assign_lhs (new_phi);

Also, although the array collects statements, in practice all users want
the lhs instead.

This patch therefore replaces new_phis with a vector of gimple values
called “reduc_inputs”.

Also, reduction chains and ncopies>1 were handled with identical code
(and there was a comment saying so).  The patch unites them into
a single “if”.

gcc/
	* tree-vect-loop.c (vect_create_epilog_for_reduction): Replace
	the new_phis vector with a reduc_inputs vector.  Combine handling
	of reduction chains and ncopies > 1.
2021-07-13 10:17:37 +01:00
Richard Sandiford
b68eb70bd6 vect: Create array_slice of live-out stmts
This patch constructs an array_slice of the scalar statements that
produce live-out reduction results in the original unvectorised loop.
There are three cases:

- SLP reduction chains: the final SLP stmt is live-out
- full SLP reductions: all SLP stmts are live-out
- non-SLP reductions: the single scalar stmt is live-out

This is a slight simplification on its own, mostly because it means
“group_size” has a consistent meaning throughout the function.
The main justification though is that it helps with later patches.

gcc/
	* tree-vect-loop.c (vect_create_epilog_for_reduction): Truncate
	scalar_results to group_size elements after reducing down from
	N*group_size elements.  Construct an array_slice of the live-out
	stmts and assert that there is one stmt per scalar result.
2021-07-13 10:17:36 +01:00
Richard Sandiford
3658ee4c73 vect: Simplify epilogue reduction code
vect_create_epilog_for_reduction only handles two cases: single-loop
reductions and double reductions.  “nested cycles” (i.e. reductions
in the inner loop when vectorising an outer loop) are handled elsewhere
and don't need a vector->scalar reduction.

The function had variables called nested_in_vect_loop and double_reduc
and asserted that nested_in_vect_loop implied double_reduc, but it
still had code to handle nested_in_vect_loop && !double_reduc.
This patch removes that and uses double_reduc everywhere.

gcc/
	* tree-vect-loop.c (vect_create_epilog_for_reduction): Remove
	nested_in_vect_loop and use double_reduc everywhere.  Remove dead
	assignment to "loop".
2021-07-13 10:17:35 +01:00
Richard Sandiford
0ae469e8c0 ifcvt: Improve tests for predicated operations
-msve-vector-bits=128 causes the AArch64 port to list 128-bit Advanced
SIMD as the first-choice mode for vectorisation, with SVE being used for
things that Advanced SIMD can't handle as easily.  However, ifcvt would
not then try to use SVE's predicated FP arithmetic, leading to tests
like TSVC ControlFlow-flt failing to vectorise.

The mask load/store code did try other vector modes, but could also be
improved to make sure that SVEness sticks when computing derived modes.

(Unlike mode_for_vector, related_vector_mode always returns a vector
mode, so there's no need to check VECTOR_MODE_P as well.)

gcc/
	* internal-fn.c (vectorized_internal_fn_supported_p): Handle
	vector types first.  For scalar types, consider both the preferred
	vector mode and the alternative vector modes.
	* optabs-query.c (can_vec_mask_load_store_p): Use the same
	structure as above, in particular using related_vector_mode
	for modes provided by autovectorize_vector_modes.

gcc/testsuite/
	* gcc.target/aarch64/sve/cond_arith_6.c: New test.
2021-07-13 10:17:34 +01:00
Jakub Jelinek
dddb6ffdc5 passes: Fix up subobject __bos [PR101419]
The following testcase is miscompiled, because VN during cunrolli changes
the __bos argument from the address of a larger field to the address of a
smaller field, and so __builtin_object_size (, 1) then folds into a smaller
value than the actually available size.
copy_reference_ops_from_ref has a hack for this, but it was using
cfun->after_inlining as a check for whether the hack can be ignored, and
cunrolli is after_inlining.

This patch uses a property to make the check exact (set at the end of the
objsz pass that doesn't do insert_min_max_p) and additionally, based on
discussions in the PR, moves the objsz pass earlier, right after IPA.

2021-07-13  Jakub Jelinek  <jakub@redhat.com>
	    Richard Biener  <rguenther@suse.de>

	PR tree-optimization/101419
	* tree-pass.h (PROP_objsz): Define.
	(make_pass_early_object_sizes): Declare.
	* passes.def (pass_all_early_optimizations): Rename pass_object_sizes
	there to pass_early_object_sizes, drop parameter.
	(pass_all_optimizations): Move pass_object_sizes right after pass_ccp,
	drop parameter, move pass_post_ipa_warn right after that.
	* tree-object-size.c (pass_object_sizes::execute): Rename to...
	(object_sizes_execute): ... this.  Add insert_min_max_p argument.
	(pass_data_object_sizes): Move after object_sizes_execute.
	(pass_object_sizes): Likewise.  In execute method call
	object_sizes_execute, drop set_pass_param method and insert_min_max_p
	non-static data member and its initializer in the ctor.
	(pass_data_early_object_sizes, pass_early_object_sizes,
	make_pass_early_object_sizes): New.
	* tree-ssa-sccvn.c (copy_reference_ops_from_ref): Use
	(cfun->curr_properties & PROP_objsz) instead of cfun->after_inlining.

	* gcc.dg/builtin-object-size-10.c: Pass -fdump-tree-early_objsz-details
	instead of -fdump-tree-objsz1-details in dg-options and adjust names
	of dump file in scan-tree-dump.
	* gcc.dg/pr101419.c: New test.
2021-07-13 11:04:22 +02:00
Jakub Jelinek
42f10ba5b5 libgomp: Don't include limits.h instead of hidden visibility block
sem.h is included in between # pragma GCC visibility push(hidden)
and # pragma GCC visibility pop and includes limits.h there, which,
since the introduction of the sysconf declaration in recent glibcs'
limits.h, causes trouble.  libgomp assumes it is compiled by gcc,
so we don't really need to include limits.h there and can use
-__INT_MAX__ - 1 instead (which clang and icc have supported for years too).

2021-07-13  Jakub Jelinek  <jakub@redhat.com>
	    Florian Weimer  <fweimer@redhat.com>

	* config/linux/sem.h: Don't include limits.h.
	(SEM_WAIT): Define to -__INT_MAX__ - 1 instead of INT_MIN.
	* config/linux/affinity.c: Include limits.h.
2021-07-13 09:50:49 +02:00
Kito Cheng
18a463bb66 docs: Add 'S' to Machine Constraints for RISC-V
It was undocumented before, but it may be used in the Linux kernel to
resolve a code model issue, so the LLVM community suggested we document it,
making it a supported/documented/non-internal machine constraint.

gcc/ChangeLog:

	PR target/101275
	* config/riscv/constraints.md ("S"): Update description and remove
	@internal.
	* doc/md.texi (Machine Constraints): Document the 'S' constraints
	for RISC-V.
2021-07-13 14:09:34 +08:00
Richard Biener
f546e2b6cc Revert "Display the number of components BB vectorized"
This reverts commit c03cae4e06.
2021-07-13 08:04:55 +02:00
Michael Meissner
063eba7ca7 Deal with prefixed loads/stores in tests, PR testsuite/100166
This patch updates the various tests in the testsuite to treat plxv
and pstxv as being vector loads/stores.  This shows up if you run the
testsuite with a compiler configured with the option: --with-cpu=power10.

2021-07-13  Michael Meissner  <meissner@linux.ibm.com>

gcc/testsuite/
	PR testsuite/100166
	* gcc.dg/vect/costmodel/ppc/costmodel-bb-slp-9a-pr63175.c: Update
	insn counts to account for power10 prefixed loads and stores.
	* gcc.target/powerpc/fold-vec-load-builtin_vec_xl-char.c:
	Likewise.
	* gcc.target/powerpc/fold-vec-load-builtin_vec_xl-double.c:
	Likewise.
	* gcc.target/powerpc/fold-vec-load-builtin_vec_xl-float.c:
	Likewise.
	* gcc.target/powerpc/fold-vec-load-builtin_vec_xl-int.c:
	Likewise.
	* gcc.target/powerpc/fold-vec-load-builtin_vec_xl-longlong.c:
	Likewise.
	* gcc.target/powerpc/fold-vec-load-builtin_vec_xl-short.c:
	Likewise.
	* gcc.target/powerpc/fold-vec-load-vec_vsx_ld-char.c: Likewise.
	* gcc.target/powerpc/fold-vec-load-vec_vsx_ld-double.c: Likewise.
	* gcc.target/powerpc/fold-vec-load-vec_vsx_ld-float.c: Likewise.
	* gcc.target/powerpc/fold-vec-load-vec_vsx_ld-int.c: Likewise.
	* gcc.target/powerpc/fold-vec-load-vec_vsx_ld-longlong.c:
	Likewise.
	* gcc.target/powerpc/fold-vec-load-vec_vsx_ld-short.c: Likewise.
	* gcc.target/powerpc/fold-vec-load-vec_xl-char.c: Likewise.
	* gcc.target/powerpc/fold-vec-load-vec_xl-double.c: Likewise.
	* gcc.target/powerpc/fold-vec-load-vec_xl-float.c: Likewise.
	* gcc.target/powerpc/fold-vec-load-vec_xl-int.c: Likewise.
	* gcc.target/powerpc/fold-vec-load-vec_xl-longlong.c: Likewise.
	* gcc.target/powerpc/fold-vec-load-vec_xl-short.c: Likewise.
	* gcc.target/powerpc/fold-vec-splat-floatdouble.c: Likewise.
	* gcc.target/powerpc/fold-vec-splat-longlong.c: Likewise.
	* gcc.target/powerpc/fold-vec-store-builtin_vec_xst-char.c:
	Likewise.
	* gcc.target/powerpc/fold-vec-store-builtin_vec_xst-double.c:
	Likewise.
	* gcc.target/powerpc/fold-vec-store-builtin_vec_xst-float.c:
	Likewise.
	* gcc.target/powerpc/fold-vec-store-builtin_vec_xst-int.c:
	Likewise.
	* gcc.target/powerpc/fold-vec-store-builtin_vec_xst-longlong.c:
	Likewise.
	* gcc.target/powerpc/fold-vec-store-builtin_vec_xst-short.c:
	Likewise.
	* gcc.target/powerpc/fold-vec-store-vec_vsx_st-char.c: Likewise.
	* gcc.target/powerpc/fold-vec-store-vec_vsx_st-double.c:
	Likewise.
	* gcc.target/powerpc/fold-vec-store-vec_vsx_st-float.c: Likewise.
	* gcc.target/powerpc/fold-vec-store-vec_vsx_st-int.c: Likewise.
	* gcc.target/powerpc/fold-vec-store-vec_vsx_st-longlong.c:
	Likewise.
	* gcc.target/powerpc/fold-vec-store-vec_vsx_st-short.c: Likewise.
	* gcc.target/powerpc/fold-vec-store-vec_xst-char.c: Likewise.
	* gcc.target/powerpc/fold-vec-store-vec_xst-double.c: Likewise.
	* gcc.target/powerpc/fold-vec-store-vec_xst-float.c: Likewise.
	* gcc.target/powerpc/fold-vec-store-vec_xst-int.c: Likewise.
	* gcc.target/powerpc/fold-vec-store-vec_xst-longlong.c: Likewise.
	* gcc.target/powerpc/fold-vec-store-vec_xst-short.c: Likewise.
	* gcc.target/powerpc/lvsl-lvsr.c: Likewise.
	* gcc.target/powerpc/pr86731-fwrapv-longlong.c: Likewise.
2021-07-13 00:41:21 -04:00
Michael Meissner
31ff034a1e Fix vec-splati-runnable.c test.
I noticed that the vec-splati-runnable.c test did not have an abort after
one of its checks.  If the test is run with optimization, the optimizer
could delete some of the checks and throw off the count.  However, because
the value being loaded in that check is undefined, I did not verify what
value was loaded; I just stored it into a volatile global variable.

2021-07-12  Michael Meissner  <meissner@linux.ibm.com>

gcc/testsuite/
	* gcc.target/powerpc/vec-splati-runnable.c: Run test with -O2
	optimization.  Do not check what XXSPLTIDP generates if the value
	is undefined.
2021-07-12 23:51:24 -04:00
Michael Meissner
7591309696 Change rs6000_const_f32_to_i32 return type.
The function rs6000_const_f32_to_i32 calls REAL_VALUE_TO_TARGET_SINGLE
with a long long type and returns the result.  This patch changes the type
to long, which is the proper type for REAL_VALUE_TO_TARGET_SINGLE.

2021-07-12  Michael Meissner  <meissner@linux.ibm.com>

gcc/
	* config/rs6000/altivec.md (xxspltiw_v4sf): Change local variable
	value to long.
	* config/rs6000/rs6000-protos.h (rs6000_const_f32_to_i32): Change
	return type to long.
	* config/rs6000/rs6000.c (rs6000_const_f32_to_i32): Change return
	type to long.
2021-07-12 23:50:38 -04:00
GCC Administrator
07bcbf9cc2 Daily bump. 2021-07-13 00:16:30 +00:00