Commit Graph

186731 Commits

Author SHA1 Message Date
Tamar Christina
8e321f2a63 Revert "AArch32: Correct sdot RTL on aarch32"
This reverts commit c9165e2d58.
2021-07-15 13:16:15 +01:00
Tamar Christina
5402023f05 Revert "AArch64: Correct dot-product auto-vect optab RTL"
This reverts commit 6d1cdb2782.
2021-07-15 13:16:00 +01:00
Jakub Jelinek
f6dde32b9d gimplify: Fix endless recursion on volatile empty type reads/writes [PR101437]
Andrew's recent change to optimize away during gimplification not just
assignments of zero sized types, but also assignments of empty types,
caused infinite recursion in the gimplifier.
If such assignment is optimized away, we gimplify separately the to_p
and from_p operands and throw away the result.  When gimplifying the
operand that is volatile, we run into the gimplifier code below, which has
different handling for types with non-BLKmode mode, tries to gimplify
those as vol.N = expr, and for BLKmode just throws those away.
Zero sized types will always have BLKmode and so are fine, but for the
non-BLKmode ones like struct S in the testcase, the vol.N = expr
gimplification will reach again the gimplify_modify_expr code, see it is
assignment of empty type and will gimplify again vol.N separately
(non-volatile, so ok) and expr, on which it will recurse again.

The following patch breaks that infinite recursion by ignoring bare
volatile loads from empty types.
If volatile loads or stores of aggregates are supposed to be member-wise
loads or stores, then there are no non-padding members in the empty types that
should be copied and so it is probably ok.

2021-07-15  Jakub Jelinek  <jakub@redhat.com>

	PR middle-end/101437
	* gimplify.c (gimplify_expr): Throw away volatile reads from empty
	types even if they have non-BLKmode TYPE_MODE.

	* gcc.c-torture/compile/pr101437.c: New test.
2021-07-15 10:17:06 +02:00
Alan Modra
cd6ca96f5d [POWER10] __morestack calls from pcrel code
Compiling gcc/testsuite/gcc.dg/split-*.c and others with -mcpu=power10
and linking with a non-pcrel libgcc results in crashes due to the
power10 pcrel code not having r2 set for the generic-morestack.c
functions called from __morestack.  There is also a problem when
non-pcrel code calls a pcrel libgcc.  See the patch comments.

A similar situation theoretically occurs with ELFv1 multi-toc
executables, when __morestack might be located in a different toc
group to its caller.  This patch makes no attempt to fix that, since
the gold linker does not support multi-toc (gold is needed for proper
support of -fsplit-stack code) nor does gcc emit __morestack calls
that support multi-toc.

	* config/rs6000/morestack.S (R2_SAVE): Define.
	(__morestack): Save and restore r2.  Set up r2 for called
	functions.
2021-07-15 15:27:09 +09:30
Richard Biener
4f3b383cf8 driver/101383 - handle -gtoggle in driver
The driver amends assembler options with for example --gdwarf-5
when debugging is enabled but the check for that does not consider
the effect of -gtoggle which is not handled in the common option
machinery.  The following alters debug_info_level according to
-gtoggle mimicking what process_options later does in the compiler.

This in particular avoids changing of the cc1-checksum with every
bootstrap (debug) cycle as we compute that from stage2 where we
use -g -gtoggle but with --gdwarf-5 and no debug info from the
compiler the assembler will fill the line table with the temporary
assembler file names.

2021-07-09  Richard Biener  <rguenther@suse.de>

	PR driver/101383
	* gcc.c (process_command): Process -gtoggle like process_options
	would after parsing options.
2021-07-15 07:56:08 +02:00
Trevor Saunders
ef3bb641e9 add myself to DCO section
ChangeLog:

	* MAINTAINERS: Add myself to DCO section.

Signed-off-by: Trevor Saunders <tbsaunde@tbsaunde.org>
2021-07-15 01:16:51 -04:00
Trevor Saunders
8d76ff9922 pass location to md_asm_adjust
So the hook can use it as the location of diagnostics.

gcc/ChangeLog:

	* cfgexpand.c (expand_asm_loc): Adjust.
	(expand_asm_stmt): Likewise.
	* config/arm/aarch-common-protos.h (arm_md_asm_adjust): Likewise.
	* config/arm/aarch-common.c (arm_md_asm_adjust): Likewise.
	* config/arm/arm.c (thumb1_md_asm_adjust): Likewise.
	* config/avr/avr.c (avr_md_asm_adjust): Likewise.
	* config/cris/cris.c (cris_md_asm_adjust): Likewise.
	* config/i386/i386.c (ix86_md_asm_adjust): Likewise.
	* config/mn10300/mn10300.c (mn10300_md_asm_adjust): Likewise.
	* config/nds32/nds32.c (nds32_md_asm_adjust): Likewise.
	* config/pdp11/pdp11.c (pdp11_md_asm_adjust): Likewise.
	* config/rs6000/rs6000.c (rs6000_md_asm_adjust): Likewise.
	* config/s390/s390.c (s390_md_asm_adjust): Likewise.
	* config/vax/vax.c (vax_md_asm_adjust): Likewise.
	* config/visium/visium.c (visium_md_asm_adjust): Likewise.
	* doc/tm.texi: Regenerate.
	* target.def: Add location argument to md_asm_adjust.

Signed-off-by: Trevor Saunders <tbsaunde@tbsaunde.org>
2021-07-15 01:10:47 -04:00
Trevor Saunders
329769b720 use diagnostic location in diagnostic_report_current_function
It appears that input_location was used here before the diagnostic's location
was available, and never updated, when the other part of the header was added
that uses it, so this makes it consistent.

gcc/ChangeLog:

	* tree-diagnostic.c (diagnostic_report_current_function): Use the
	diagnostic's location, not input_location.

Signed-off-by: Trevor Saunders <tbsaunde@tbsaunde.org>
2021-07-15 01:10:34 -04:00
Trevor Saunders
28ca844641 use error_at and warning_at in cfgexpand.c
gcc/ChangeLog:

	* cfgexpand.c (tree_conflicts_with_clobbers_p): Pass location to
	diagnostics.
	(expand_asm_stmt): Likewise.

Signed-off-by: Trevor Saunders <tbsaunde@tbsaunde.org>
2021-07-15 01:10:19 -04:00
Jason Merrill
0b7a11874d c++: fix tree_contains_struct for C++ types [PR101095]
Many of the types from cp-tree.def were only marked as having tree_common,
when actually most of them have type_non_common.  This broke
g++.dg/modules/xtreme-header-2, as the modules code relies on
tree_contains_struct to know what bits it needs to stream.

We don't seem to use type_non_common for TYPE_ARGUMENT_PACK, so I bumped it
down to TS_TYPE_COMMON.  I tried doing the same in cp_tree_size, but that
breaks without more extensive changes to tree_node_structure.

Why do we need the init_ts function anyway?  It seems redundant with
tree_node_structure.

	PR c++/101095

gcc/cp/ChangeLog:

	* cp-objcp-common.c (cp_common_init_ts): Mark types as types.
	(cp_tree_size): Remove redundant entries.
2021-07-14 23:18:14 -04:00
GCC Administrator
c4fee1c646 Daily bump. 2021-07-15 00:16:54 +00:00
Peter Bergner
69feb7601e rs6000: Generate an lxvp instead of two adjacent lxv instructions
The MMA build built-ins currently use individual lxv instructions to
load up the registers of a __vector_pair or __vector_quad.  If the
memory addresses of the built-in operands are to adjacent locations,
then we can use an lxvp in some cases to load up two registers at once.
The patch below adds support for checking whether memory addresses are
adjacent and emitting an lxvp instead of two lxv instructions.
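
The adjacency check described above can be sketched roughly as follows.  This
is only an illustration, not the rs6000 code: MemRef and lower_if_adjacent are
hypothetical names, and real memory rtxes carry far more information than a
base register, an offset and a size.

```cpp
#include <cstdint>

// Hypothetical stand-in for a memory operand: base register number,
// byte offset from that base, and access size in bytes.
struct MemRef
{
  int base_reg;
  std::int64_t offset;
  std::int64_t size;
};

// Return the lower-addressed operand if the two accesses use the same base
// and one ends exactly where the other begins; otherwise return nullptr.
inline const MemRef *
lower_if_adjacent (const MemRef &a, const MemRef &b)
{
  if (a.base_reg != b.base_reg)
    return nullptr;
  if (a.offset + a.size == b.offset)
    return &a;
  if (b.offset + b.size == a.offset)
    return &b;
  return nullptr;
}
```

Returning the lower-addressed operand rather than a plain boolean lets a
caller know which address a paired load would have to start from.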

2021-07-14  Peter Bergner  <bergner@linux.ibm.com>

gcc/
	* config/rs6000/rs6000.c (adjacent_mem_locations): Return the lower
	addressed memory rtx, if any.
	(rs6000_split_multireg_move): Fix code formatting.
	Handle MMA build built-ins with operands in adjacent memory locations.

gcc/testsuite/
	* gcc.target/powerpc/mma-builtin-9.c: New test.
2021-07-14 18:27:02 -05:00
Peter Bergner
7d914777fc rs6000: Move rs6000_split_multireg_move to later in file
An upcoming change to rs6000_split_multireg_move requires it to be
moved later in the file to fix a declaration issue.

2021-07-14  Peter Bergner  <bergner@linux.ibm.com>

gcc/
	* config/rs6000/rs6000.c (rs6000_split_multireg_move): Move to later
	in the file.
2021-07-14 18:23:31 -05:00
Patrick Palka
bebd8e9da8 c++: CTAD and forwarding references [PR88252]
Here during CTAD we're incorrectly treating T&& as a forwarding
reference even though T is a template parameter of the class template.

This happens because the template parameter T in the out-of-line
definition of the constructor doesn't have the flag
TEMPLATE_TYPE_PARM_FOR_CLASS set, and during duplicate_decls the
redeclaration (which is in terms of this unflagged T) prevails.
To fix this, we could perhaps be more consistent about setting the flag,
but it appears we don't really need this flag to make the determination.

Since the template parameters of a synthesized guide consist of the
template parameters of the class template followed by those of the
constructor (if any), it should suffice to look at the index of the
template parameter to determine whether it comes from the class
template or the constructor (template).  This patch replaces the
TEMPLATE_TYPE_PARM_FOR_CLASS flag with this approach.
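
A small C++17 illustration of the language rule involved, not the testcase
from the PR: Box is a made-up class template.  In the synthesized guides, T
comes from the class template, so T&& is an ordinary rvalue reference, while U
comes from the constructor template, so U&& is a forwarding reference.

```cpp
#include <type_traits>
#include <utility>

template<class T>
struct Box
{
  // T is a parameter of the class template: only rvalues can drive
  // deduction through this constructor.
  Box (T &&) {}

  // U is a parameter of the constructor template: U&& accepts lvalues
  // and rvalues alike.
  template<class U> Box (T &&, U &&) {}
};

// An rvalue int deduces Box<int>.
static_assert (std::is_same_v<decltype (Box (0)), Box<int>>);

// An lvalue is fine in the U&& position (U is deduced as long &).
static_assert (std::is_same_v<decltype (Box (0, std::declval<long &> ())),
                              Box<int>>);

// By contrast, Box (std::declval<int &> ()) is expected to be ill-formed,
// because T&& must not be treated as a forwarding reference here.
```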

	PR c++/88252

gcc/cp/ChangeLog:

	* cp-tree.h (TEMPLATE_TYPE_PARM_FOR_CLASS): Remove.
	* pt.c (push_template_decl): Remove TEMPLATE_TYPE_PARM_FOR_CLASS
	handling.
	(redeclare_class_template): Likewise.
	(forwarding_reference_p): Define.
	(maybe_adjust_types_for_deduction): Use it instead.  Add 'tparms'
	parameter.
	(unify_one_argument): Pass tparms to
	maybe_adjust_types_for_deduction.
	(try_one_overload): Likewise.
	(unify): Likewise.
	(rewrite_template_parm): Remove TEMPLATE_TYPE_PARM_FOR_CLASS
	handling.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp1z/class-deduction96.C: New test.
2021-07-14 15:37:30 -04:00
Jason Merrill
91bb571d20 vec: use auto_vec in a few more places
The uses of vec<T> in get_all_loop_exits and process_conditional were memory
leaks, as .release() was never called for them.  The other changes are some
cases that did have proper release handling, but it's simpler to leave
releasing to the auto_vec destructor.
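
The difference between the two styles can be sketched with stand-in types.
raw_buf and scoped_buf below are hypothetical and much simpler than GCC's
vec<T> and auto_vec; the live_allocations counter exists only to make the
effect observable.

```cpp
#include <cstddef>
#include <cstdlib>

// Number of buffers currently allocated (illustration only).
inline int live_allocations = 0;

// Stand-in for a manually managed vector: the owner must call release ().
struct raw_buf
{
  int *data = nullptr;

  void reserve (std::size_t n)
  {
    release ();
    data = static_cast<int *> (std::malloc (n * sizeof (int)));
    if (data)
      ++live_allocations;
  }

  void release ()
  {
    if (data)
      {
        std::free (data);
        data = nullptr;
        --live_allocations;
      }
  }
};

// Stand-in for the RAII flavour: the destructor releases the storage, so
// every exit path from the enclosing scope frees it.
struct scoped_buf : raw_buf
{
  scoped_buf () = default;
  scoped_buf (const scoped_buf &) = delete;
  scoped_buf &operator= (const scoped_buf &) = delete;
  ~scoped_buf () { release (); }
};
```

A raw_buf that goes out of scope without release () keeps its allocation
alive, which is the kind of leak described above; a scoped_buf cannot.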

gcc/ChangeLog:

	* sel-sched-ir.h (get_all_loop_exits): Use auto_vec.

gcc/cp/ChangeLog:

	* class.c (struct find_final_overrider_data): Use auto_vec.
	(find_final_overrider): Remove explicit release.
	* coroutines.cc (process_conditional): Use auto_vec.
	* cp-gimplify.c (struct cp_genericize_data): Use auto_vec.
	(cp_genericize_tree): Remove explicit release.
	* parser.c (cp_parser_objc_at_property_declaration): Use
	auto_delete_vec.
	* semantics.c (omp_reduction_lookup): Use auto_vec.
2021-07-14 15:01:27 -04:00
Jason Merrill
b15e301748 c++: enable -fdelete-dead-exceptions by default
As I was discussing with richi, I don't think it makes sense to protect
calls to pure/const functions from DCE just because they aren't explicitly
declared noexcept.  PR100382 indicates that there are different
considerations for Go, which has non-call exceptions.  But still turn the
flag off for that specific testcase.

gcc/c-family/ChangeLog:

	* c-opts.c (c_common_post_options): Set -fdelete-dead-exceptions.

gcc/ChangeLog:

	* doc/invoke.texi: -fdelete-dead-exceptions is on by default for
	C++.

gcc/testsuite/ChangeLog:

	* g++.dg/torture/pr100382.C: Pass -fno-delete-dead-exceptions.
2021-07-14 14:59:56 -04:00
Tamar Christina
4940166a15 Vect: correct rebase issue
The lines being removed have been updated and merged into a new
condition.  But when resolving some conflicts I accidentally
reintroduced them, causing some test failures.

This removes them.

Committed as the changes were previously approved in
https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574977.html
but the hunk was misapplied during a rebase.

gcc/ChangeLog:

	* tree-vect-patterns.c (vect_recog_dot_prod_pattern):
	Remove erroneous line.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/vect-reduc-dot-11.c: Expect pass.
	* gcc.dg/vect/vect-reduc-dot-15.c: Likewise.
	* gcc.dg/vect/vect-reduc-dot-19.c: Likewise.
	* gcc.dg/vect/vect-reduc-dot-21.c: Likewise.
2021-07-14 19:00:59 +01:00
Andrew MacLeod
398572c154 Turn hybrid mode off, default to ranger-only mode for EVRP.
Change the default EVRP mode to ranger-only.

	gcc/
	* params.opt (param_evrp_mode): Change default.

	gcc/testsuite/
	* gcc.dg/pr80776-1.c: Remove xfail.
2021-07-14 12:47:10 -04:00
Marek Polacek
a42f812044 c++: constexpr array reference and value-initialization [PR101371]
This PR gave me a hard time: I saw multiple issues starting with
different revisions.  But ultimately the root cause seems to be
the following, and the attached patch fixes all issues I've found
here.

In cxx_eval_array_reference we create a new constexpr context for the
CP_AGGREGATE_TYPE_P case, but we also have to create it for the
non-aggregate case.  In this test, we are evaluating

  ((B *)this)->a = rhs->a

which means that we set ctx.object to ((B *)this)->a.  Then we proceed
to evaluate the initializer, rhs->a.  For *rhs, we eval rhs, a PARM_DECL,
for which we have (const B &) &c.arr[0] in the hash table.  Then
cxx_fold_indirect_ref gives us c.arr[0].  c is evaluated to {.arr={}} so
c.arr is {}.  Now we want c.arr[0], so we end up in cxx_eval_array_reference
and since we're initializing from {}, we call build_value_init which
gives us an AGGR_INIT_EXPR that calls 'constexpr B::B()'.  Then we
evaluate this AGGR_INIT_EXPR and since its first argument is dummy,
we take ctx.object instead.  But that is the wrong object, we're not
initializing ((B *)this)->a here.  And so we wound up with an
initializer for A, and then crash in cxx_eval_component_reference:

  gcc_assert (DECL_CONTEXT (part) == TYPE_MAIN_VARIANT (TREE_TYPE (whole)));

where DECL_CONTEXT (part) is B (as it should be) but the type of whole
was A.

So create a new object, if there already was one, and the element type
is not a scalar.

	PR c++/101371

gcc/cp/ChangeLog:

	* constexpr.c (cxx_eval_array_reference): Create a new .object
	and .ctor for the non-aggregate non-scalar case too when
	value-initializing.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp1y/constexpr-101371-2.C: New test.
	* g++.dg/cpp1y/constexpr-101371.C: New test.
2021-07-14 11:54:07 -04:00
Harald Anlauf
269ca408e2 Fortran - ICE in gfc_conv_expr_present initializing non-dummy class variable
gcc/fortran/ChangeLog:

	PR fortran/100949
	* trans-expr.c (gfc_trans_class_init_assign): Call
	gfc_conv_expr_present only for dummy variables.

gcc/testsuite/ChangeLog:

	PR fortran/100949
	* gfortran.dg/pr100949.f90: New test.
2021-07-14 17:25:29 +02:00
Tamar Christina
6d1cdb2782 AArch64: Correct dot-product auto-vect optab RTL
The current RTL for the vectorizer patterns for dot-product are incorrect.
Operand3 isn't an output parameter so we can't write to it.

This fixes this issue and reduces the number of RTL patterns.

gcc/ChangeLog:

	* config/aarch64/aarch64-simd-builtins.def (udot, sdot): Rename to...
	(sdot_prod, udot_prod): ...These.
	* config/aarch64/aarch64-simd.md (<sur>dot_prod<vsi2qi>): Remove.
	(aarch64_<sur>dot<vsi2qi>): Rename to...
	(<sur>dot_prod<vsi2qi>): ...This.
	* config/aarch64/arm_neon.h (vdot_u32, vdotq_u32, vdot_s32, vdotq_s32):
	Update builtins.
2021-07-14 15:41:31 +01:00
Tamar Christina
c9165e2d58 AArch32: Correct sdot RTL on aarch32
The RTL Generated from <sup>dot_prod<vsi2qi> is invalid as operand3 cannot be
written to, it's a normal input.  For the expand it's just another operand
but the caller does not expect it to be written to.

gcc/ChangeLog:

	* config/arm/neon.md (<sup>dot_prod<vsi2qi>): Drop statements.
2021-07-14 15:22:37 +01:00
Tamar Christina
1e0ab1c4ba middle-end: Add middle end generic tests for sign differing dotproduct.
This adds testcases to test for auto-vect detection of the new sign differing
dot product.

gcc/ChangeLog:

	* doc/sourcebuild.texi (arm_v8_2a_i8mm_neon_hw): Document.

gcc/testsuite/ChangeLog:

	* lib/target-supports.exp
	(check_effective_target_arm_v8_2a_imm8_neon_ok_nocache,
	check_effective_target_arm_v8_2a_i8mm_neon_hw,
	check_effective_target_vect_usdot_qi): New.
	* gcc.dg/vect/vect-reduc-dot-9.c: New test.
	* gcc.dg/vect/vect-reduc-dot-10.c: New test.
	* gcc.dg/vect/vect-reduc-dot-11.c: New test.
	* gcc.dg/vect/vect-reduc-dot-12.c: New test.
	* gcc.dg/vect/vect-reduc-dot-13.c: New test.
	* gcc.dg/vect/vect-reduc-dot-14.c: New test.
	* gcc.dg/vect/vect-reduc-dot-15.c: New test.
	* gcc.dg/vect/vect-reduc-dot-16.c: New test.
	* gcc.dg/vect/vect-reduc-dot-17.c: New test.
	* gcc.dg/vect/vect-reduc-dot-18.c: New test.
	* gcc.dg/vect/vect-reduc-dot-19.c: New test.
	* gcc.dg/vect/vect-reduc-dot-20.c: New test.
	* gcc.dg/vect/vect-reduc-dot-21.c: New test.
	* gcc.dg/vect/vect-reduc-dot-22.c: New test.
2021-07-14 15:21:40 +01:00
Tamar Christina
6412c58c78 AArch32: Add support for sign differing dot-product usdot for NEON.
This adds optabs implementing usdot_prod.

The following testcase:

#define N 480
#define SIGNEDNESS_1 unsigned
#define SIGNEDNESS_2 signed
#define SIGNEDNESS_3 signed
#define SIGNEDNESS_4 unsigned

SIGNEDNESS_1 int __attribute__ ((noipa))
f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
   SIGNEDNESS_4 char *restrict b)
{
  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
    {
      int av = a[i];
      int bv = b[i];
      SIGNEDNESS_2 short mult = av * bv;
      res += mult;
    }
  return res;
}

Generates

f:
        vmov.i32        q8, #0  @ v4si
        add     r3, r2, #480
.L2:
        vld1.8  {q10}, [r2]!
        vld1.8  {q9}, [r1]!
        vusdot.s8       q8, q9, q10
        cmp     r3, r2
        bne     .L2
        vadd.i32        d16, d16, d17
        vpadd.i32       d16, d16, d16
        vmov.32 r3, d16[0]
        add     r0, r0, r3
        bx      lr

instead of

f:
        vmov.i32        q8, #0  @ v4si
        add     r3, r2, #480
.L2:
        vld1.8  {q9}, [r2]!
        vld1.8  {q11}, [r1]!
        cmp     r3, r2
        vmull.s8 q10, d18, d22
        vmull.s8 q9, d19, d23
        vaddw.s16       q8, q8, d20
        vaddw.s16       q8, q8, d21
        vaddw.s16       q8, q8, d18
        vaddw.s16       q8, q8, d19
        bne     .L2
        vadd.i32        d16, d16, d17
        vpadd.i32       d16, d16, d16
        vmov.32 r3, d16[0]
        add     r0, r0, r3
        bx      lr

For NEON.  I couldn't figure out if the MVE instruction vmlaldav.s16 could be
used to emulate this.  Because it would require additional widening to work I
left MVE out of this patch set but perhaps someone should take a look.

gcc/ChangeLog:

	* config/arm/neon.md (usdot_prod<vsi2qi>): New.

gcc/testsuite/ChangeLog:

	* gcc.target/arm/simd/vusdot-autovec.c: New test.
2021-07-14 15:20:45 +01:00
Tamar Christina
752045ed1e AArch64: Add support for sign differing dot-product usdot for NEON and SVE.
Hi All,

This adds optabs implementing usdot_prod.

The following testcase:

#define N 480
#define SIGNEDNESS_1 unsigned
#define SIGNEDNESS_2 signed
#define SIGNEDNESS_3 signed
#define SIGNEDNESS_4 unsigned

SIGNEDNESS_1 int __attribute__ ((noipa))
f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
   SIGNEDNESS_4 char *restrict b)
{
  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
    {
      int av = a[i];
      int bv = b[i];
      SIGNEDNESS_2 short mult = av * bv;
      res += mult;
    }
  return res;
}

Generates for NEON

f:
        movi    v0.4s, 0
        mov     x3, 0
        .p2align 3,,7
.L2:
        ldr     q1, [x2, x3]
        ldr     q2, [x1, x3]
        usdot   v0.4s, v1.16b, v2.16b
        add     x3, x3, 16
        cmp     x3, 480
        bne     .L2
        addv    s0, v0.4s
        fmov    w1, s0
        add     w0, w0, w1
        ret

and for SVE

f:
        mov     x3, 0
        cntb    x5
        mov     w4, 480
        mov     z1.b, #0
        whilelo p0.b, wzr, w4
        mov     z3.b, #0
        ptrue   p1.b, all
        .p2align 3,,7
.L2:
        ld1b    z2.b, p0/z, [x1, x3]
        ld1b    z0.b, p0/z, [x2, x3]
        add     x3, x3, x5
        sel     z0.b, p0, z0.b, z3.b
        whilelo p0.b, w3, w4
        usdot   z1.s, z0.b, z2.b
        b.any   .L2
        uaddv   d0, p1, z1.s
        fmov    x1, d0
        add     w0, w0, w1
        ret

instead of

f:
        movi    v0.4s, 0
        mov     x3, 0
        .p2align 3,,7
.L2:
        ldr     q2, [x1, x3]
        ldr     q1, [x2, x3]
        add     x3, x3, 16
        sxtl    v4.8h, v2.8b
        sxtl2   v3.8h, v2.16b
        uxtl    v2.8h, v1.8b
        uxtl2   v1.8h, v1.16b
        mul     v2.8h, v2.8h, v4.8h
        mul     v1.8h, v1.8h, v3.8h
        saddw   v0.4s, v0.4s, v2.4h
        saddw2  v0.4s, v0.4s, v2.8h
        saddw   v0.4s, v0.4s, v1.4h
        saddw2  v0.4s, v0.4s, v1.8h
        cmp     x3, 480
        bne     .L2
        addv    s0, v0.4s
        fmov    w1, s0
        add     w0, w0, w1
        ret

and

f:
        mov     x3, 0
        cnth    x5
        mov     w4, 480
        mov     z1.b, #0
        whilelo p0.h, wzr, w4
        ptrue   p2.b, all
        .p2align 3,,7
.L2:
        ld1sb   z2.h, p0/z, [x1, x3]
        punpklo p1.h, p0.b
        ld1b    z0.h, p0/z, [x2, x3]
        add     x3, x3, x5
        mul     z0.h, p2/m, z0.h, z2.h
        sunpklo z2.s, z0.h
        sunpkhi z0.s, z0.h
        add     z1.s, p1/m, z1.s, z2.s
        punpkhi p1.h, p0.b
        whilelo p0.h, w3, w4
        add     z1.s, p1/m, z1.s, z0.s
        b.any   .L2
        uaddv   d0, p2, z1.s
        fmov    x1, d0
        add     w0, w0, w1
        ret

gcc/ChangeLog:

	* config/aarch64/aarch64-simd.md (aarch64_usdot<vsi2qi>): Rename to...
	(usdot_prod<vsi2qi>): ... This.
	* config/aarch64/aarch64-simd-builtins.def (usdot): Rename to...
	(usdot_prod): ...This.
	* config/aarch64/arm_neon.h (vusdot_s32, vusdotq_s32): Likewise.
	* config/aarch64/aarch64-sve.md (@aarch64_<sur>dot_prod<vsi2qi>):
	Rename to...
	(@<sur>dot_prod<vsi2qi>): ...This.
	* config/aarch64/aarch64-sve-builtins-base.cc
	(svusdot_impl::expand): Use it.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/simd/vusdot-autovec.c: New test.
	* gcc.target/aarch64/sve/vusdot-autovec.c: New test.
2021-07-14 15:19:32 +01:00
Tamar Christina
ab0a6b213a Vect: Add support for dot-product where the sign for the multiplicand changes.
This patch adds support for a dot product where the signs of the multiplication
arguments differ, i.e. one is signed and one is unsigned but the precisions are
the same.

#define N 480
#define SIGNEDNESS_1 unsigned
#define SIGNEDNESS_2 signed
#define SIGNEDNESS_3 signed
#define SIGNEDNESS_4 unsigned

SIGNEDNESS_1 int __attribute__ ((noipa))
f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
   SIGNEDNESS_4 char *restrict b)
{
  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
    {
      int av = a[i];
      int bv = b[i];
      SIGNEDNESS_2 short mult = av * bv;
      res += mult;
    }
  return res;
}

The operations are performed as if the operands were extended to a 32-bit value.
As such this operation isn't valid if there is an intermediate conversion to an
unsigned value, i.e. if SIGNEDNESS_2 is unsigned.

Moreover, if the signs of SIGNEDNESS_3 and SIGNEDNESS_4 are flipped, the same
optab is used but the operands are flipped in the optab expansion.
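
As a scalar reference for the semantics described above, one 32-bit lane of a
mixed-sign dot product could be modelled as below.  This is a sketch, not GCC
code: usdot_lane_ref and sudot_lane_ref are hypothetical names, and four byte
elements per lane is an assumption taken from the byte-to-int testcase.

```cpp
#include <cstdint>

// Accumulate the products of four unsigned bytes and four signed bytes.
// Both operands are widened to 32 bits before they are multiplied.
inline std::int32_t
usdot_lane_ref (std::int32_t acc, const std::uint8_t (&u)[4],
                const std::int8_t (&s)[4])
{
  for (int i = 0; i < 4; ++i)
    acc += static_cast<std::int32_t> (u[i]) * static_cast<std::int32_t> (s[i]);
  return acc;
}

// Signed-by-unsigned form: the same operation with the operands swapped.
inline std::int32_t
sudot_lane_ref (std::int32_t acc, const std::int8_t (&s)[4],
                const std::uint8_t (&u)[4])
{
  return usdot_lane_ref (acc, u, s);
}
```

Multiplication is commutative, so swapping the two inputs is enough to cover
the flipped-sign case with a single operation.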

To support this the patch extends the dot-product detection to optionally
ignore operands with different signs and stores this information in the optab
subtype which is now made a bitfield.

The subtype can now additionally control which optab an EXPR can expand to.

gcc/ChangeLog:

	* optabs.def (usdot_prod_optab): New.
	* doc/md.texi: Document it and clarify other dot prod optabs.
	* optabs-tree.h (enum optab_subtype): Add optab_vector_mixed_sign.
	* optabs-tree.c (optab_for_tree_code): Support usdot_prod_optab.
	* optabs.c (expand_widen_pattern_expr): Likewise.
	* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
	* tree-vect-loop.c (vectorizable_reduction): Query dot-product kind.
	* tree-vect-patterns.c (vect_supportable_direct_optab_p): Take optional
	optab subtype.
	(vect_widened_op_tree): Optionally ignore
	mismatch types.
	(vect_recog_dot_prod_pattern): Support usdot_prod_optab.
2021-07-14 14:54:26 +01:00
H.J. Lu
cc11b924bf x86: Don't enable UINTR in 32-bit mode
UINTR is available only in 64-bit mode.  Since the codegen target is
unknown when the gcc driver is processing -march=native, to properly
handle UINTR for -march=native:

1. Pass "arch [32|64]" and "tune [32|64]" to host_detect_local_cpu to
indicate 32-bit and 64-bit codegen.
2. Change ix86_option_override_internal to enable UINTR only in 64-bit
mode for -march=CPU when PTA_CPU includes PTA_UINTR.

gcc/

	PR target/101395
	* config/i386/driver-i386.c (host_detect_local_cpu): Check
	"arch [32|64]" and "tune [32|64]" for 32-bit and 64-bit codegen.
	Enable UINTR only for 64-bit codegen.
	* config/i386/i386-options.c
	(ix86_option_override_internal::DEF_PTA): Skip PTA_UINTR if not
	in 64-bit mode.
	* config/i386/i386.h (ARCH_ARG): New.
	(CC1_CPU_SPEC): Pass "[arch|tune] 32" for 32-bit codegen and
	"[arch|tune] 64" for 64-bit codegen.

gcc/testsuite/

	PR target/101395
	* gcc.target/i386/pr101395-1.c: New test.
	* gcc.target/i386/pr101395-2.c: Likewise.
	* gcc.target/i386/pr101395-3.c: Likewise.
2021-07-14 05:14:31 -07:00
Jonathan Wakely
f9c2ce1dae libstdc++: Add noexcept-specifier to basic_string_view(It, End)
This adds a conditional noexcept to the C++20 constructor. The
std::to_address call cannot throw, so only taking the difference of the
two iterators can throw.

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++-v3/ChangeLog:

	* include/std/string_view (basic_string_view(It, End)): Add
	noexcept-specifier.
	* testsuite/21_strings/basic_string_view/cons/char/range.cc:
	Check noexcept-specifier. Also check construction without CTAD.
2021-07-14 12:23:33 +01:00
Richard Biener
a967a3efd3 tree-optimization/101445 - fix negative stride SLP vect with gaps
The following fixes the IV adjustment for the gap in a negative
stride SLP vectorization.  The adjustment was in the wrong direction,
now fixed as in the patch.

2021-07-14  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/101445
	* tree-vect-stmts.c (vectorizable_load): Do the gap adjustment
	of the IV in the correct direction for negative stride
	accesses.

	* gcc.dg/vect/pr101445.c: New testcase.
2021-07-14 12:31:42 +02:00
Jakub Jelinek
3be762c2ed godump: Fix -fdump-go-spec= reproducibility issue [PR101407]
pot_dummy_types is a hash_set from whose traversal the code prints some type
lines.  hash_set normally uses default_hash_traits which for pointer types
(the hash set hashes const char *) uses pointer_hash which hashes the
addresses of the pointers except for the least significant 3 bits.
With address space randomization, that results in non-determinism in the
-fdump-go-spec= generated file, each invocation can have different order of
the lines emitted from pot_dummy_types traversal.

This patch fixes it by hashing the string contents instead to make the
hashes reproducible.
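
The contrast between the two hashing strategies can be sketched as follows.
The function names are hypothetical and the content hash is FNV-1a, picked
only for illustration; it is not the hash function GCC uses.

```cpp
#include <cstdint>

// Hash of the pointer value itself, ignoring the low 3 bits.  The result
// depends on where the string happens to live, so it can change from run
// to run under address space randomization.
inline std::uintptr_t
hash_by_address (const char *p)
{
  return reinterpret_cast<std::uintptr_t> (p) >> 3;
}

// Hash of the characters (32-bit FNV-1a).  Equal strings always hash the
// same, wherever they are stored.
inline std::uint32_t
hash_by_contents (const char *p)
{
  std::uint32_t h = 2166136261u;
  for (; *p != '\0'; ++p)
    {
      h ^= static_cast<unsigned char> (*p);
      h *= 16777619u;
    }
  return h;
}
```

Because the traversal order of a hash set follows the hash values, only the
content-based variant gives the same output order on every invocation.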

2021-07-14  Jakub Jelinek  <jakub@redhat.com>

	PR go/101407
	* godump.c (godump_str_hash): New type.
	(godump_container::pot_dummy_types): Use string_hash instead of
	ptr_hash in the hash_set.
2021-07-14 10:22:50 +02:00
Richard Biener
1dd3f21095 Support reduction def re-use for epilogue with different vector size
The following adds support for re-using the vector reduction def
from the main loop in vectorized epilogue loops on architectures
which use different vector sizes for the epilogue.  That's only
x86 as far as I am aware.

2021-07-13  Richard Biener  <rguenther@suse.de>

	* tree-vect-loop.c (vect_find_reusable_accumulator): Handle
	vector types where the old vector type has a multiple of
	the new vector type elements.
	(vect_create_partial_epilog): New function, split out from...
	(vect_create_epilog_for_reduction): ... here.
	(vect_transform_cycle_phi): Reduce the re-used accumulator
	to the new vector type.

	* gcc.target/i386/vect-reduc-1.c: New testcase.
2021-07-14 08:15:17 +02:00
Alexandre Oliva
a7098d6ef4 fix typo in attr_fnspec::verify
Odd-numbered indices describing argument access sizes in the fnspec
string can only hold 't' or a digit, as tested in the beginning of the
case.  When checking that the size-supplying argument does not have
additional information associated with it, the test that excludes the
't' possibility looks for it at the even position in the fnspec
string.  Oops.

This might yield false positives and negatives if a function has a
fnspec in which an argument uses a 't' access-size, and ('t' - '1')
happens to be the index of an argument described in an fnspec string.
Assuming ASCII encoding, it would take a function with at least 68
arguments described in fnspec.  Still, probably worth fixing.
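
The arithmetic behind the "at least 68 arguments" estimate can be checked
directly, assuming an ASCII execution character set (the constant names are
made up for this note):

```cpp
// In ASCII, 't' is 116 and '1' is 49, so the stray index is 67.
constexpr int stray_arg_index = 't' - '1';
static_assert (stray_arg_index == 67,
               "assumes an ASCII execution character set");

// Index 67 only names a described argument if there are at least 68 of them.
constexpr int min_args_needed = stray_arg_index + 1;
```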


for  gcc/ChangeLog

	* tree-ssa-alias.c (attr_fnspec::verify): Fix index in
	non-'t'-sized arg check.
2021-07-13 22:28:25 -03:00
Alexandre Oliva
66907e7399 adjust landing pads when changing main label
If an artificial label created for a landing pad ends up being
dropped in favor of a user-supplied label, the user-supplied label
inherits the landing pad index, but the post_landing_pad field is not
adjusted to point to the new label.

This patch fixes the problem, and adds verification that we don't
remove a label that's still used as a landing pad.

The circumstance in which this problem can be hit was unusual: removal
of a block with an unreachable label moves the label to some other
unrelated block, in case its address is taken.  In the case at hand
(pr42739.C, complicated by wrappers and cleanups), the chosen block
happened to be an EH landing pad.  (A followup patch will change that.)


for  gcc/ChangeLog

	* tree-cfg.c (cleanup_dead_labels_eh): Update
	post_landing_pad label upon change of landing pad block's
	primary label.
	(cleanup_dead_labels): Check that a removed label is not that
	of a landing pad.
2021-07-13 22:25:54 -03:00
GCC Administrator
0e7754560f Daily bump. 2021-07-14 00:16:44 +00:00
Jonathan Wright
8695bf78da gcc: Add vec_select -> subreg RTL simplification
Add a new RTL simplification for the case of a VEC_SELECT selecting
the low part of a vector. The simplification returns a SUBREG.
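
The condition being tested can be sketched on plain index lists.  This is a
simplified model, not the rtvec_series_p or vec_series_lowpart_p code: the
helper names are hypothetical, and it assumes little-endian lane numbering,
where the low part is lanes 0..n-1.

```cpp
#include <cstddef>
#include <vector>

// True if sel is the linear series start, start + 1, start + 2, ...
inline bool
is_linear_series (const std::vector<int> &sel, int start)
{
  for (std::size_t i = 0; i < sel.size (); ++i)
    if (sel[i] != start + static_cast<int> (i))
      return false;
  return true;
}

// True if the selection picks the low lanes of a source vector with
// source_lanes elements, in order (little-endian lane numbering assumed).
inline bool
selects_low_part (const std::vector<int> &sel, std::size_t source_lanes)
{
  return !sel.empty () && sel.size () <= source_lanes
         && is_linear_series (sel, 0);
}
```

A selection such as {0, 1} from a four-lane vector passes the check, whereas
{2, 3} or {0, 2} does not.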

The primary goal of this patch is to enable better combinations of
Neon RTL patterns - specifically allowing generation of
'write-to-high-half' narrowing instructions.

Adding this RTL simplification means that the expected results for a
number of tests need to be updated:
* aarch64 Neon: Update the scan-assembler regex for intrinsics tests
  to expect a scalar register instead of lane 0 of a vector.
* aarch64 SVE: Likewise.
* arm MVE: Use lane 1 instead of lane 0 for lane-extraction
  intrinsics tests (as the move instructions get optimized away for
  lane 0.)

This patch also adds new code generation tests to
narrow_high_combine.c to verify the benefit of this RTL
simplification.

gcc/ChangeLog:

2021-06-08  Jonathan Wright  <jonathan.wright@arm.com>

	* combine.c (combine_simplify_rtx): Add vec_select -> subreg
	simplification.
	* config/aarch64/aarch64.md (*zero_extend<SHORT:mode><GPI:mode>2_aarch64):
	Add Neon to general purpose register case for zero-extend
	pattern.
	* config/arm/vfp.md (*arm_movsi_vfp): Remove "*" from *t -> r
	case to prevent some cases opting to go through memory.
	* cse.c (fold_rtx): Add vec_select -> subreg simplification.
	* rtl.c (rtvec_series_p): Define predicate to determine
	whether a vector contains a linear series of integers.
	* rtl.h (rtvec_series_p): Define.
	* rtlanal.c (vec_series_lowpart_p): Define predicate to
	determine if a vector selection is equivalent to the low part
	of the vector.
	* rtlanal.h (vec_series_lowpart_p): Define.
	* simplify-rtx.c (simplify_context::simplify_binary_operation_1):
	Add vec_select -> subreg simplification.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/extract_zero_extend.c: Remove dump scan
	for RTL pattern match.
	* gcc.target/aarch64/narrow_high_combine.c: Add new tests.
	* gcc.target/aarch64/simd/vmulx_laneq_f64_1.c: Update
	scan-assembler regex to look for a scalar register instead of
	lane 0 of a vector.
	* gcc.target/aarch64/simd/vmulxd_laneq_f64_1.c: Likewise.
	* gcc.target/aarch64/simd/vmulxs_lane_f32_1.c: Likewise.
	* gcc.target/aarch64/simd/vmulxs_laneq_f32_1.c: Likewise.
	* gcc.target/aarch64/simd/vqdmlalh_lane_s16.c: Likewise.
	* gcc.target/aarch64/simd/vqdmlals_lane_s32.c: Likewise.
	* gcc.target/aarch64/simd/vqdmlslh_lane_s16.c: Likewise.
	* gcc.target/aarch64/simd/vqdmlsls_lane_s32.c: Likewise.
	* gcc.target/aarch64/simd/vqdmullh_lane_s16.c: Likewise.
	* gcc.target/aarch64/simd/vqdmullh_laneq_s16.c: Likewise.
	* gcc.target/aarch64/simd/vqdmulls_lane_s32.c: Likewise.
	* gcc.target/aarch64/simd/vqdmulls_laneq_s32.c: Likewise.
	* gcc.target/aarch64/sve/dup_lane_1.c: Likewise.
	* gcc.target/aarch64/sve/extract_1.c: Likewise.
	* gcc.target/aarch64/sve/extract_2.c: Likewise.
	* gcc.target/aarch64/sve/extract_3.c: Likewise.
	* gcc.target/aarch64/sve/extract_4.c: Likewise.
	* gcc.target/aarch64/sve/live_1.c: Update scan-assembler regex
	cases to look for 'b' and 'h' registers instead of 'w'.
	* gcc.target/arm/crypto-vsha1cq_u32.c: Update scan-assembler
	regex to reflect lane 0 vector extractions being simplified
	to scalar register moves.
	* gcc.target/arm/crypto-vsha1h_u32.c: Likewise.
	* gcc.target/arm/crypto-vsha1mq_u32.c: Likewise.
	* gcc.target/arm/crypto-vsha1pq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_f16.c: Extract
	lane 1 as the moves for lane 0 now get optimized away.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_u8.c: Likewise.
2021-07-13 21:02:58 +01:00
Paul A. Clarke
60aee15bb7 rs6000: Add tests for SSE4.1 "test" intrinsics
Copy the test for _mm_testz_si128, _mm_testc_si128,
_mm_testnzc_si128, _mm_test_all_ones, _mm_test_all_zeros,
_mm_test_mix_ones_zeros from gcc/testsuite/gcc.target/i386.

2021-07-13  Paul A. Clarke  <pc@us.ibm.com>

gcc/testsuite
	* gcc.target/powerpc/sse4_1-ptest-1.c: Copy from
	gcc/testsuite/gcc.target/i386.
2021-07-13 13:50:24 -05:00
Paul A. Clarke
acd4b9103c rs6000: Add support for SSE4.1 "test" intrinsics
2021-07-13  Paul A. Clarke  <pc@us.ibm.com>

gcc
	* config/rs6000/smmintrin.h (_mm_testz_si128, _mm_testc_si128,
	_mm_testnzc_si128, _mm_test_all_ones, _mm_test_all_zeros,
	_mm_test_mix_ones_zeros): New.
2021-07-13 13:46:34 -05:00
Jonathan Wakely
4d3eaeb4f5 libstdc++: Simplify basic_string_view::ends_with [PR 101361]
The use of npos triggers a diagnostic as described in PR c++/101361.
This change replaces the use of npos with the exact length, which is
already known. We can further simplify it by inlining the effects of
compare and substr, avoiding the redundant range checks in the latter.
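
As a rough illustration of the idea described above (not the actual
libstdc++ code), a suffix check can compare the last characters directly
with traits_type::compare instead of going through substr and npos.
The free function name sketch_ends_with is hypothetical and only handles
std::string_view:

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical free-function analogue of basic_string_view::ends_with.
// This is a sketch of the technique, not the libstdc++ implementation.
constexpr bool
sketch_ends_with (std::string_view s, std::string_view x) noexcept
{
  const std::size_t len = s.size ();
  const std::size_t xlen = x.size ();
  // The length check replaces the range checks that substr would do;
  // the exact suffix length is passed to compare, so npos is never needed.
  return len >= xlen
    && std::char_traits<char>::compare (s.data () + (len - xlen),
                                        x.data (), xlen) == 0;
}

// Compile-time check of the sketch (requires C++17 or later).
static_assert (sketch_ends_with ("gimple.c", ".c"));
```

The length comparison comes first so that the pointer arithmetic on
s.data () is only performed when the suffix fits.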

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++-v3/ChangeLog:

	PR c++/101361
	* include/std/string_view (ends_with): Use traits_type::compare
	directly.
2021-07-13 15:21:26 +01:00
Andrew MacLeod
f75560398a Adjust testcase to test the call is removed.
Ranger now handles the test.

	gcc/testsuite
	PR tree-optimization/93781
	* gcc.dg/tree-ssa/pr93781-1.c: Check that call is removed.
2021-07-13 09:43:18 -04:00
Roger Sayle
9aa5001ef4 Make gimple_could_trap_p const-safe.
Allow gimple_could_trap_p (which previously took a non-const gimple)
to be called from functions that take a const gimple (such as
gimple_has_side_effects), and update its prototypes.  Pre-approved
as obvious.
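
A minimal sketch of why the const qualification matters, using a
hypothetical stand-in type (stmt) and hypothetical function names rather
than GCC's real gimple API:

```cpp
// Hypothetical stand-in for a statement type; this is not GCC's gimple.
struct stmt
{
  bool is_volatile_asm;
  bool may_fault;
};

// Taking a pointer-to-const lets callers holding either a const or a
// non-const statement use this predicate.
inline bool
could_trap_p (const stmt *s)
{
  return s->is_volatile_asm || s->may_fault;
}

// A function that itself takes a const pointer can now call the
// predicate without a const_cast.
inline bool
has_side_effects (const stmt *s)
{
  return could_trap_p (s);
}
```

If could_trap_p took a plain "stmt *", the call inside has_side_effects
would fail to compile.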

2021-07-13  Roger Sayle  <roger@nextmovesoftware.com>
	    Richard Biener  <rguenther@suse.de>

gcc/ChangeLog
	* gimple.c (gimple_could_trap_p_1): Make S argument a
	"const gimple*".  Preserve constness in call to
	gimple_asm_volatile_p.
	(gimple_could_trap_p): Make S argument a "const gimple*".
	* gimple.h (gimple_could_trap_p_1, gimple_could_trap_p):
	Update function prototypes.
2021-07-13 14:01:41 +01:00
Jonathan Wakely
bd1eb556b9 libstdc++: Remove duplicate #include in <string_view>
When I added the new C++23 constructor I added a conditional include of
<bits/ranges_base.h>, which was already being included unconditionally.
This removes the unconditional include but changes the condition for the
other one, so it's used for C++20 as well.

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++-v3/ChangeLog:

	* include/std/string_view: Only include <bits/ranges_base.h>
	once, and only for C++20 and later.
2021-07-13 12:09:37 +01:00
Richard Sandiford
1583b8bff0 vect: Reuse reduction accumulators between loops
This patch adds support for reusing a main loop's reduction accumulator
in an epilogue loop.  This in turn lets the loops share a single piece
of vector->scalar reduction code.

The patch has the following restrictions:

(1) The epilogue reduction can only operate on a single vector
    (e.g. ncopies must be 1 for non-SLP reductions, and the group size
    must be <= the element count for SLP reductions).

(2) Both loops must use the same vector mode for their accumulators.
    This means that the patch is restricted to targets that support
    --param vect-partial-vector-usage=1.

(3) The reduction must be a standard “tree code” reduction.

However, these restrictions could be lifted in future.  For example,
if the main loop operates on 128-bit vectors and the epilogue loop
operates on 64-bit vectors, we could in future reduce the 128-bit
vector by one stage and use the 64-bit result as the starting point
for the epilogue result.
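
The possible future extension mentioned above is not implemented by this
patch.  Purely as an illustration, and assuming a plus reduction over
int, it could look roughly like the scalar model below, where 4-lane and
2-lane std::array values stand in for 128-bit and 64-bit vectors.  All
names here are hypothetical:

```cpp
#include <array>
#include <cstddef>

// Hypothetical: fold a 4-lane accumulator into 2 lanes, i.e. perform
// one stage of the vector->scalar reduction.
constexpr std::array<int, 2>
fold_one_stage (const std::array<int, 4> &acc)
{
  return { acc[0] + acc[2], acc[1] + acc[3] };
}

// Sum N ints with a 4-lane main loop, a 2-lane epilogue loop that
// starts from the folded main-loop accumulator, and a scalar tail.
inline int
sum_with_shared_accumulator (const int *data, std::size_t n)
{
  std::array<int, 4> wide = { 0, 0, 0, 0 };
  std::size_t i = 0;
  for (; i + 4 <= n; i += 4)
    for (std::size_t lane = 0; lane < 4; ++lane)
      wide[lane] += data[i + lane];

  // The epilogue accumulator is seeded from the main loop's result
  // instead of being reduced to a scalar and restarted from zero.
  std::array<int, 2> narrow = fold_one_stage (wide);
  for (; i + 2 <= n; i += 2)
    for (std::size_t lane = 0; lane < 2; ++lane)
      narrow[lane] += data[i + lane];

  // Single piece of final vector->scalar reduction code.
  int result = narrow[0] + narrow[1];
  for (; i < n; ++i)
    result += data[i];
  return result;
}
```

Only one final reduction to a scalar is needed, whichever of the two
loops actually ran.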

The patch tries to handle chained SLP reductions, unchained SLP
reductions and non-SLP reductions.  It also handles cases in which
the epilogue loop is entered directly (rather than via the main loop)
and cases in which the epilogue loop can be skipped.

vect_get_main_loop_result is a bit more general than the current
patch needs.

gcc/
	* tree-vectorizer.h (vect_reusable_accumulator): New structure.
	(_loop_vec_info::main_loop_edge): New field.
	(_loop_vec_info::skip_main_loop_edge): Likewise.
	(_loop_vec_info::skip_this_loop_edge): Likewise.
	(_loop_vec_info::reusable_accumulators): Likewise.
	(_stmt_vec_info::reduc_scalar_results): Likewise.
	(_stmt_vec_info::reused_accumulator): Likewise.
	(vect_get_main_loop_result): Declare.
	* tree-vectorizer.c (vec_info::new_stmt_vec_info): Initialize
	reduc_scalar_inputs.
	(vec_info::free_stmt_vec_info): Free reduc_scalar_inputs.
	* tree-vect-loop-manip.c (vect_get_main_loop_result): New function.
	(vect_do_peeling): Fill an epilogue loop's main_loop_edge,
	skip_main_loop_edge and skip_this_loop_edge fields.
	* tree-vect-loop.c (INCLUDE_ALGORITHM): Define.
	(vect_emit_reduction_init_stmts): New function.
	(get_initial_def_for_reduction): Use it.
	(get_initial_defs_for_reduction): Likewise.  Change the vinfo
	parameter to a loop_vec_info.
	(vect_create_epilog_for_reduction): Store the scalar results
	in the reduc_info.  If an epilogue loop is reusing an accumulator
	from the main loop, and if the epilogue loop can also be skipped,
	try to place the reduction code in the join block.  Record
	accumulators that could potentially be reused by epilogue loops.
	(vect_transform_cycle_phi): When vectorizing epilogue loops,
	try to reuse accumulators from the main loop.  Record the initial
	value in reduc_info for non-SLP reductions too.

gcc/testsuite/
	* gcc.target/aarch64/sve/reduc_9.c: New test.
	* gcc.target/aarch64/sve/reduc_9_run.c: Likewise.
	* gcc.target/aarch64/sve/reduc_10.c: Likewise.
	* gcc.target/aarch64/sve/reduc_10_run.c: Likewise.
	* gcc.target/aarch64/sve/reduc_11.c: Likewise.
	* gcc.target/aarch64/sve/reduc_11_run.c: Likewise.
	* gcc.target/aarch64/sve/reduc_12.c: Likewise.
	* gcc.target/aarch64/sve/reduc_12_run.c: Likewise.
	* gcc.target/aarch64/sve/reduc_13.c: Likewise.
	* gcc.target/aarch64/sve/reduc_13_run.c: Likewise.
	* gcc.target/aarch64/sve/reduc_14.c: Likewise.
	* gcc.target/aarch64/sve/reduc_14_run.c: Likewise.
	* gcc.target/aarch64/sve/reduc_15.c: Likewise.
	* gcc.target/aarch64/sve/reduc_15_run.c: Likewise.
2021-07-13 10:17:43 +01:00
Richard Sandiford
7670b6633e vect: Simplify get_initial_def_for_reduction
After previous patches, we can now easily provide the neutral op
as an argument to get_initial_def_for_reduction.  This in turn
allows the adjustment calculation to be moved outside of
get_initial_def_for_reduction, which is the main motivation
of the patch.

gcc/
	* tree-vect-loop.c (get_initial_def_for_reduction): Remove
	adjustment handling.  Take the neutral value as an argument,
	in place of the code argument.
	(vect_transform_cycle_phi): Update accordingly.  Handle the
	initial values of cond reductions separately from code reductions.
	Choose the adjustment here rather than in
	get_initial_def_for_reduction.  Sink the splat of vec_initial_def.
2021-07-13 10:17:42 +01:00
Richard Sandiford
221bdb333b vect: Generalise neutral_op_for_slp_reduction
This patch generalises the interface to neutral_op_for_slp_reduction
so that it can be used for non-SLP reductions too.  This isn't much
of a win on its own, but it helps later patches.

gcc/
	* tree-vect-loop.c (neutral_op_for_slp_reduction): Replace with...
	(neutral_op_for_reduction): ...this, providing a more general
	interface.
	(vect_create_epilog_for_reduction): Update accordingly.
	(vectorizable_reduction): Likewise.
	(vect_transform_cycle_phi): Likewise.
2021-07-13 10:17:41 +01:00
Richard Sandiford
bd5a69191f vect: Pass reduc_info to get_initial_def_for_reduction
Similarly to the previous patch, this one passes the reduc_info
to get_initial_def_for_reduction, rather than a stmt_vec_info that
lacks the metadata.  This again becomes useful later.

gcc/
	* tree-vect-loop.c (get_initial_def_for_reduction): Take the
	reduc_info instead of the original stmt_vec_info.
	(vect_transform_cycle_phi): Update accordingly.
2021-07-13 10:17:40 +01:00
Richard Sandiford
826c452e57 vect: Pass reduc_info to get_initial_defs_for_reduction
This patch passes the reduc_info to get_initial_defs_for_reduction,
so that the function can get general information from there rather
than from the first SLP statement.  This isn't a win on its own,
but it becomes important with later patches.

gcc/
	* tree-vect-loop.c (get_initial_defs_for_reduction): Take the
	reduc_info as an additional parameter.
	(vect_transform_cycle_phi): Update accordingly.
2021-07-13 10:17:39 +01:00
Richard Sandiford
d592920c89 vect: Add a vect_phi_initial_value helper function
This patch adds a helper function called vect_phi_initial_value
for returning the incoming value of a given loop phi.  The main
reason for adding it is to ensure that the right preheader edge
is used when vectorising nested loops.  (PHI_ARG_DEF_FROM_EDGE
itself doesn't assert that the given edge is for the right block,
although I guess that would be good to add separately.)

gcc/
	* tree-vectorizer.h: Include tree-ssa-operands.h.
	(vect_phi_initial_value): New function.
	* tree-vect-loop.c (neutral_op_for_slp_reduction): Use it.
	(get_initial_defs_for_reduction, info_for_reduction): Likewise.
	(vect_create_epilog_for_reduction, vectorizable_reduction): Likewise.
	(vect_transform_cycle_phi, vectorizable_induction): Likewise.
2021-07-13 10:17:39 +01:00
Richard Sandiford
32b8edd529 vect: Ensure reduc_inputs always have vectype
Vector reduction accumulators can differ in signedness from the
final scalar result.  The conversions to handle that case were
distributed through vect_create_epilog_for_reduction; this patch
does the conversion up-front instead.

gcc/
	* tree-vect-loop.c (vect_create_epilog_for_reduction): Convert
	the phi results to vectype after creating them.  Remove later
	conversion code that thus becomes redundant.
2021-07-13 10:17:38 +01:00
Richard Sandiford
81ad6bfc07 vect: Remove new_phis from vect_create_epilog_for_reduction
vect_create_epilog_for_reduction had a variable called new_phis.
It collected the statements that produce the exit block definitions
of the vector reduction accumulators.  Although those statements
are indeed phis initially, they are often replaced with normal
statements later, leading to puzzling code like:

          FOR_EACH_VEC_ELT (new_phis, i, new_phi)
            {
              int bit_offset;
              if (gimple_code (new_phi) == GIMPLE_PHI)
                vec_temp = PHI_RESULT (new_phi);
              else
                vec_temp = gimple_assign_lhs (new_phi);

Also, although the array collects statements, in practice all users want
the lhs instead.

This patch therefore replaces new_phis with a vector of gimple values
called “reduc_inputs”.

Also, reduction chains and ncopies>1 were handled with identical code
(and there was a comment saying so).  The patch unites them into
a single “if”.

gcc/
	* tree-vect-loop.c (vect_create_epilog_for_reduction): Replace
	the new_phis vector with a reduc_inputs vector.  Combine handling
	of reduction chains and ncopies > 1.
2021-07-13 10:17:37 +01:00
Richard Sandiford
b68eb70bd6 vect: Create array_slice of live-out stmts
This patch constructs an array_slice of the scalar statements that
produce live-out reduction results in the original unvectorised loop.
There are three cases:

- SLP reduction chains: the final SLP stmt is live-out
- full SLP reductions: all SLP stmts are live-out
- non-SLP reductions: the single scalar stmt is live-out

This is a slight simplification on its own, mostly because it means
“group_size” has a consistent meaning throughout the function.
The main justification though is that it helps with later patches.

gcc/
	* tree-vect-loop.c (vect_create_epilog_for_reduction): Truncate
	scalar_results to group_size elements after reducing down from
	N*group_size elements.  Construct an array_slice of the live-out
	stmts and assert that there is one stmt per scalar result.
2021-07-13 10:17:36 +01:00