Andrew's recent change to optimize away during gimplification not just
assignments of zero-sized types, but also assignments of empty types,
caused infinite recursion in the gimplifier.
If such an assignment is optimized away, we gimplify the to_p and
from_p operands separately and throw away the result. When gimplifying
an operand that is volatile, we run into the gimplifier code below,
which handles types with a non-BLKmode TYPE_MODE differently, trying to
gimplify those as vol.N = expr, while BLKmode ones are simply thrown
away. Zero-sized types always have BLKmode and so are fine, but for
non-BLKmode ones like struct S in the testcase, the vol.N = expr
gimplification reaches the gimplify_modify_expr code again, sees it is
an assignment of an empty type, again gimplifies vol.N separately
(non-volatile, so ok) and expr, on which it recurses once more.
The following patch breaks that infinite recursion by ignoring bare
volatile loads from empty types.
If volatile loads or stores of aggregates are supposed to be member-wise
loads or stores, then there are no non-padding members in the empty
types that would need to be copied, so this is probably ok.
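As a hedged illustration (a sketch, not the exact
gcc.c-torture/compile/pr101437.c testcase), the problematic shape is
roughly:
/* An empty type that nevertheless has a non-BLKmode TYPE_MODE,
   accessed through volatile lvalues.  */
struct S { int : 1; };
volatile struct S s, t;
void
foo (void)
{
  /* Assignment of an empty type: both volatile operands used to be
     gimplified separately, recursing forever.  */
  t = s;
}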
2021-07-15 Jakub Jelinek <jakub@redhat.com>
PR middle-end/101437
* gimplify.c (gimplify_expr): Throw away volatile reads from empty
types even if they have non-BLKmode TYPE_MODE.
* gcc.c-torture/compile/pr101437.c: New test.
Compiling gcc/testsuite/gcc.dg/split-*.c and others with -mcpu=power10
and linking with a non-pcrel libgcc results in crashes due to the
power10 pcrel code not having r2 set for the generic-morestack.c
functions called from __morestack. There is also a problem when
non-pcrel code calls a pcrel libgcc. See the patch comments.
A similar situation theoretically occurs with ELFv1 multi-toc
executables, when __morestack might be located in a different toc
group to its caller. This patch makes no attempt to fix that, since
the gold linker does not support multi-toc (gold is needed for proper
support of -fsplit-stack code) nor does gcc emit __morestack calls
that support multi-toc.
* config/rs6000/morestack.S (R2_SAVE): Define.
(__morestack): Save and restore r2. Set up r2 for called
functions.
The driver amends assembler options with, for example, --gdwarf-5
when debugging is enabled, but the check for that does not consider
the effect of -gtoggle, which is not handled in the common option
machinery. The following alters debug_info_level according to
-gtoggle, mimicking what process_options later does in the compiler.
This in particular avoids the cc1 checksum changing with every
bootstrap (debug) cycle: we compute that checksum from stage2, where
we use -g -gtoggle, and with --gdwarf-5 but no debug info from the
compiler, the assembler would fill the line table with the temporary
assembler file names.
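A minimal sketch of the intended logic (an approximation, not the
exact gcc.c hunk):
/* Mirror what process_options does for -gtoggle: flip between no
   debug info and the default level before deciding whether to pass
   --gdwarf-N to the assembler.  */
if (flag_gtoggle)
  {
    if (debug_info_level == DINFO_LEVEL_NONE)
      {
        debug_info_level = DINFO_LEVEL_NORMAL;
        if (write_symbols == NO_DEBUG)
          write_symbols = PREFERRED_DEBUGGING_TYPE;
      }
    else
      debug_info_level = DINFO_LEVEL_NONE;
  }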
2021-07-09 Richard Biener <rguenther@suse.de>
PR driver/101383
* gcc.c (process_command): Process -gtoggle like process_options
would after parsing options.
It appears that input_location was used here before the diagnostic's
location was available, and was never updated when the other part of
the header that uses the location was added; this makes the two
consistent.
gcc/ChangeLog:
* tree-diagnostic.c (diagnostic_report_current_function): Use the
diagnostic's location, not input_location.
Signed-off-by: Trevor Saunders <tbsaunde@tbsaunde.org>
Many of the types from cp-tree.def were only marked as having tree_common,
when actually most of them have type_non_common. This broke
g++.dg/modules/xtreme-header-2, as the modules code relies on
tree_contains_struct to know what bits it needs to stream.
We don't seem to use type_non_common for TYPE_ARGUMENT_PACK, so I bumped it
down to TS_TYPE_COMMON. I tried doing the same in cp_tree_size, but that
breaks without more extensive changes to tree_node_structure.
Why do we need the init_ts function anyway? It seems redundant with
tree_node_structure.
PR c++/101095
gcc/cp/ChangeLog:
* cp-objcp-common.c (cp_common_init_ts): Mark types as types.
(cp_tree_size): Remove redundant entries.
The MMA build built-ins currently use individual lxv instructions to
load up the registers of a __vector_pair or __vector_quad. If the
memory addresses of the built-in operands refer to adjacent locations,
then we can use an lxvp in some cases to load up two registers at once.
The patch below adds support for checking whether memory addresses are
adjacent and emitting an lxvp instead of two lxv instructions.
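A hedged sketch of the kind of code that benefits (the real test is
gcc.target/powerpc/mma-builtin-9.c):
/* src[0] and src[1] live in adjacent memory, so the two lxv loads
   feeding the built-in can become a single lxvp.  */
void
foo (__vector_pair *dst, vector unsigned char *src)
{
  __builtin_vsx_build_pair (dst, src[0], src[1]);
}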
2021-07-14 Peter Bergner <bergner@linux.ibm.com>
gcc/
* config/rs6000/rs6000.c (adjacent_mem_locations): Return the lower
addressed memory rtx, if any.
(rs6000_split_multireg_move): Fix code formatting.
Handle MMA build built-ins with operands in adjacent memory locations.
gcc/testsuite/
* gcc.target/powerpc/mma-builtin-9.c: New test.
An upcoming change to rs6000_split_multireg_move requires it to be
moved later in the file to fix a declaration issue.
2021-07-14 Peter Bergner <bergner@linux.ibm.com>
gcc/
* config/rs6000/rs6000.c (rs6000_split_multireg_move): Move to later
in the file.
Here during CTAD we're incorrectly treating T&& as a forwarding
reference even though T is a template parameter of the class template.
This happens because the template parameter T in the out-of-line
definition of the constructor doesn't have the flag
TEMPLATE_TYPE_PARM_FOR_CLASS set, and during duplicate_decls the
redeclaration (which is in terms of this unflagged T) prevails.
To fix this, we could perhaps be more consistent about setting the flag,
but it appears we don't really need this flag to make the determination.
Since the template parameters of a synthesized guide consist of the
template parameters of the class template followed by those of the
constructor (if any), it should suffice to look at the index of the
template parameter to determine whether it comes from the class
template or the constructor (template). This patch replaces the
TEMPLATE_TYPE_PARM_FOR_CLASS flag with this approach.
PR c++/88252
gcc/cp/ChangeLog:
* cp-tree.h (TEMPLATE_TYPE_PARM_FOR_CLASS): Remove.
* pt.c (push_template_decl): Remove TEMPLATE_TYPE_PARM_FOR_CLASS
handling.
(redeclare_class_template): Likewise.
(forwarding_reference_p): Define.
(maybe_adjust_types_for_deduction): Use it instead. Add 'tparms'
parameter.
(unify_one_argument): Pass tparms to
maybe_adjust_types_for_deduction.
(try_one_overload): Likewise.
(unify): Likewise.
(rewrite_template_parm): Remove TEMPLATE_TYPE_PARM_FOR_CLASS
handling.
gcc/testsuite/ChangeLog:
* g++.dg/cpp1z/class-deduction96.C: New test.
The uses of vec<T> in get_all_loop_exits and process_conditional leaked
memory, as .release() was never called for them. The other changes are some
cases that did have proper release handling, but it's simpler to leave
releasing to the auto_vec destructor.
gcc/ChangeLog:
* sel-sched-ir.h (get_all_loop_exits): Use auto_vec.
gcc/cp/ChangeLog:
* class.c (struct find_final_overrider_data): Use auto_vec.
(find_final_overrider): Remove explicit release.
* coroutines.cc (process_conditional): Use auto_vec.
* cp-gimplify.c (struct cp_genericize_data): Use auto_vec.
(cp_genericize_tree): Remove explicit release.
* parser.c (cp_parser_objc_at_property_declaration): Use
auto_delete_vec.
* semantics.c (omp_reduction_lookup): Use auto_vec.
As I was discussing with richi, I don't think it makes sense to protect
calls to pure/const functions from DCE just because they aren't explicitly
declared noexcept. PR100382 indicates that there are different
considerations for Go, which has non-call exceptions. But still turn the
flag off for that specific testcase.
gcc/c-family/ChangeLog:
* c-opts.c (c_common_post_options): Set -fdelete-dead-exceptions.
gcc/ChangeLog:
* doc/invoke.texi: -fdelete-dead-exceptions is on by default for
C++.
gcc/testsuite/ChangeLog:
* g++.dg/torture/pr100382.C: Pass -fno-delete-dead-exceptions.
The lines being removed have been updated and merged into a new
condition. But when resolving some conflicts I accidentally
reintroduced them, causing some test failures.
This removes them.
Committed as the changes were previously approved in
https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574977.html
but the hunk was misapplied during a rebase.
gcc/ChangeLog:
* tree-vect-patterns.c (vect_recog_dot_prod_pattern):
Remove erroneous line.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-reduc-dot-11.c: Expect pass.
* gcc.dg/vect/vect-reduc-dot-15.c: Likewise.
* gcc.dg/vect/vect-reduc-dot-19.c: Likewise.
* gcc.dg/vect/vect-reduc-dot-21.c: Likewise.
This PR gave me a hard time: I saw multiple issues starting with
different revisions. But ultimately the root cause seems to be
the following, and the attached patch fixes all issues I've found
here.
In cxx_eval_array_reference we create a new constexpr context for the
CP_AGGREGATE_TYPE_P case, but we also have to create it for the
non-aggregate case. In this test, we are evaluating
((B *)this)->a = rhs->a
which means that we set ctx.object to ((B *)this)->a. Then we proceed
to evaluate the initializer, rhs->a. For *rhs, we eval rhs, a PARM_DECL,
for which we have (const B &) &c.arr[0] in the hash table. Then
cxx_fold_indirect_ref gives us c.arr[0]. c is evaluated to {.arr={}} so
c.arr is {}. Now we want c.arr[0], so we end up in cxx_eval_array_reference
and since we're initializing from {}, we call build_value_init which
gives us an AGGR_INIT_EXPR that calls 'constexpr B::B()'. Then we
evaluate this AGGR_INIT_EXPR and since its first argument is dummy,
we take ctx.object instead. But that is the wrong object; we're not
initializing ((B *)this)->a here. And so we wind up with an
initializer for A, and then crash in cxx_eval_component_reference:
gcc_assert (DECL_CONTEXT (part) == TYPE_MAIN_VARIANT (TREE_TYPE (whole)));
where DECL_CONTEXT (part) is B (as it should be) but the type of whole
was A.
So create a new object if there already was one and the element type
is not a scalar.
PR c++/101371
gcc/cp/ChangeLog:
* constexpr.c (cxx_eval_array_reference): Create a new .object
and .ctor for the non-aggregate non-scalar case too when
value-initializing.
gcc/testsuite/ChangeLog:
* g++.dg/cpp1y/constexpr-101371-2.C: New test.
* g++.dg/cpp1y/constexpr-101371.C: New test.
The current RTL for the vectorizer dot-product patterns is incorrect:
operand 3 isn't an output parameter, so we can't write to it.
This fixes the issue and reduces the amount of RTL.
gcc/ChangeLog:
* config/aarch64/aarch64-simd-builtins.def (udot, sdot): Rename to...
(sdot_prod, udot_prod): ...These.
* config/aarch64/aarch64-simd.md (<sur>dot_prod<vsi2qi>): Remove.
(aarch64_<sur>dot<vsi2qi>): Rename to...
(<sur>dot_prod<vsi2qi>): ...This.
* config/aarch64/arm_neon.h (vdot_u32, vdotq_u32, vdot_s32, vdotq_s32):
Update builtins.
The RTL generated from <sup>dot_prod<vsi2qi> is invalid, as operand 3
cannot be written to; it's a normal input. For the expander it's just
another operand, but the caller does not expect it to be written to.
gcc/ChangeLog:
* config/arm/neon.md (<sup>dot_prod<vsi2qi>): Drop statements.
This adds testcases for auto-vectorization detection of the new
sign-differing dot product.
gcc/ChangeLog:
* doc/sourcebuild.texi (arm_v8_2a_i8mm_neon_hw): Document.
gcc/testsuite/ChangeLog:
* lib/target-supports.exp
(check_effective_target_arm_v8_2a_imm8_neon_ok_nocache,
check_effective_target_arm_v8_2a_i8mm_neon_hw,
check_effective_target_vect_usdot_qi): New.
* gcc.dg/vect/vect-reduc-dot-9.c: New test.
* gcc.dg/vect/vect-reduc-dot-10.c: New test.
* gcc.dg/vect/vect-reduc-dot-11.c: New test.
* gcc.dg/vect/vect-reduc-dot-12.c: New test.
* gcc.dg/vect/vect-reduc-dot-13.c: New test.
* gcc.dg/vect/vect-reduc-dot-14.c: New test.
* gcc.dg/vect/vect-reduc-dot-15.c: New test.
* gcc.dg/vect/vect-reduc-dot-16.c: New test.
* gcc.dg/vect/vect-reduc-dot-17.c: New test.
* gcc.dg/vect/vect-reduc-dot-18.c: New test.
* gcc.dg/vect/vect-reduc-dot-19.c: New test.
* gcc.dg/vect/vect-reduc-dot-20.c: New test.
* gcc.dg/vect/vect-reduc-dot-21.c: New test.
* gcc.dg/vect/vect-reduc-dot-22.c: New test.
This adds an expander implementing the usdot_prod optab.
The following testcase:
#define N 480
#define SIGNEDNESS_1 unsigned
#define SIGNEDNESS_2 signed
#define SIGNEDNESS_3 signed
#define SIGNEDNESS_4 unsigned
SIGNEDNESS_1 int __attribute__ ((noipa))
f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
   SIGNEDNESS_4 char *restrict b)
{
  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
    {
      int av = a[i];
      int bv = b[i];
      SIGNEDNESS_2 short mult = av * bv;
      res += mult;
    }
  return res;
}
Generates
f:
	vmov.i32	q8, #0	@ v4si
	add	r3, r2, #480
.L2:
	vld1.8	{q10}, [r2]!
	vld1.8	{q9}, [r1]!
	vusdot.s8	q8, q9, q10
	cmp	r3, r2
	bne	.L2
	vadd.i32	d16, d16, d17
	vpadd.i32	d16, d16, d16
	vmov.32	r3, d16[0]
	add	r0, r0, r3
	bx	lr
instead of
f:
	vmov.i32	q8, #0	@ v4si
	add	r3, r2, #480
.L2:
	vld1.8	{q9}, [r2]!
	vld1.8	{q11}, [r1]!
	cmp	r3, r2
	vmull.s8	q10, d18, d22
	vmull.s8	q9, d19, d23
	vaddw.s16	q8, q8, d20
	vaddw.s16	q8, q8, d21
	vaddw.s16	q8, q8, d18
	vaddw.s16	q8, q8, d19
	bne	.L2
	vadd.i32	d16, d16, d17
	vpadd.i32	d16, d16, d16
	vmov.32	r3, d16[0]
	add	r0, r0, r3
	bx	lr
for NEON. I couldn't figure out whether the MVE instruction
vmlaldav.s16 could be used to emulate this. Because it would require
additional widening to work, I left MVE out of this patch set, but
perhaps someone should take a look.
gcc/ChangeLog:
* config/arm/neon.md (usdot_prod<vsi2qi>): New.
gcc/testsuite/ChangeLog:
* gcc.target/arm/simd/vusdot-autovec.c: New test.
This patch adds support for a dot product where the sign of the multiplication
arguments differ. i.e. one is signed and one is unsigned but the precisions are
the same.
#define N 480
#define SIGNEDNESS_1 unsigned
#define SIGNEDNESS_2 signed
#define SIGNEDNESS_3 signed
#define SIGNEDNESS_4 unsigned
SIGNEDNESS_1 int __attribute__ ((noipa))
f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
   SIGNEDNESS_4 char *restrict b)
{
  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
    {
      int av = a[i];
      int bv = b[i];
      SIGNEDNESS_2 short mult = av * bv;
      res += mult;
    }
  return res;
}
The operations are performed as if the operands were extended to a
32-bit value. As such, this operation isn't valid if there is an
intermediate conversion to an unsigned value, i.e. if SIGNEDNESS_2 is
unsigned. Moreover, if the signs of SIGNEDNESS_3 and SIGNEDNESS_4 are
flipped, the same optab is used but the operands are swapped in the
optab expansion.
To support this, the patch extends the dot-product detection to
optionally ignore operands with differing signs and stores this
information in the optab subtype, which is now made a bitfield.
The subtype now additionally controls which optab an EXPR can expand to.
gcc/ChangeLog:
* optabs.def (usdot_prod_optab): New.
* doc/md.texi: Document it and clarify other dot prod optabs.
* optabs-tree.h (enum optab_subtype): Add optab_vector_mixed_sign.
* optabs-tree.c (optab_for_tree_code): Support usdot_prod_optab.
* optabs.c (expand_widen_pattern_expr): Likewise.
* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
* tree-vect-loop.c (vectorizable_reduction): Query dot-product kind.
* tree-vect-patterns.c (vect_supportable_direct_optab_p): Take optional
optab subtype.
(vect_widened_op_tree): Optionally ignore
mismatch types.
(vect_recog_dot_prod_pattern): Support usdot_prod_optab.
UINTR is available only in 64-bit mode. Since the codegen target is
unknown when the gcc driver is processing -march=native, to properly
handle UINTR for -march=native:
1. Pass "arch [32|64]" and "tune [32|64]" to host_detect_local_cpu to
indicate 32-bit and 64-bit codegen.
2. Change ix86_option_override_internal to enable UINTR only in 64-bit
mode for -march=CPU when PTA_CPU includes PTA_UINTR.
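A hedged sketch in the spirit of the new tests (not the exact
pr101395 files):
/* With -m32 -march=native on a UINTR-capable machine, UINTR must
   stay disabled.  */
/* { dg-do compile { target ia32 } } */
/* { dg-options "-march=native" } */
#ifdef __UINTR__
#error "__UINTR__ should not be defined for 32-bit codegen"
#endif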
gcc/
PR target/101395
* config/i386/driver-i386.c (host_detect_local_cpu): Check
"arch [32|64]" and "tune [32|64]" for 32-bit and 64-bit codegen.
Enable UINTR only for 64-bit codegen.
* config/i386/i386-options.c
(ix86_option_override_internal::DEF_PTA): Skip PTA_UINTR if not
in 64-bit mode.
* config/i386/i386.h (ARCH_ARG): New.
(CC1_CPU_SPEC): Pass "[arch|tune] 32" for 32-bit codegen and
"[arch|tune] 64" for 64-bit codegen.
gcc/testsuite/
PR target/101395
* gcc.target/i386/pr101395-1.c: New test.
* gcc.target/i386/pr101395-2.c: Likewise.
* gcc.target/i386/pr101395-3.c: Likewise.
This adds a conditional noexcept to the C++20 constructor. The
std::to_address call cannot throw, so only taking the difference of the
two iterators can throw.
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:
* include/std/string_view (basic_string_view(It, End)): Add
noexcept-specifier.
* testsuite/21_strings/basic_string_view/cons/char/range.cc:
Check noexcept-specifier. Also check construction without CTAD.
The following fixes the IV adjustment for the gap in a negative-stride
SLP vectorization. The adjustment was in the wrong direction; the
patch corrects it.
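A hedged sketch of the kind of access pattern involved (not the exact
pr101445.c testcase):
/* A two-element SLP group loading two of every four elements of b,
   with i counting down, i.e. a negative stride with a gap.  */
int a[512], b[1024];
void
foo (void)
{
  for (int i = 255; i >= 0; --i)
    {
      a[2 * i] = b[4 * i];
      a[2 * i + 1] = b[4 * i + 1];
    }
}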
2021-07-14 Richard Biener <rguenther@suse.de>
PR tree-optimization/101445
* tree-vect-stmts.c (vectorizable_load): Do the gap adjustment
of the IV in the correct direction for negative stride
accesses.
* gcc.dg/vect/pr101445.c: New testcase.
pot_dummy_types is a hash_set from whose traversal the code prints some type
lines. hash_set normally uses default_hash_traits which for pointer types
(the hash set hashes const char *) uses pointer_hash, which hashes the
addresses of the pointers except for the least significant 3 bits.
With address space randomization, that results in non-determinism in the
-fdump-go-specs= generated file, each invocation can have different order of
the lines emitted from pot_dummy_types traversal.
This patch fixes it by hashing the string contents instead, making the
hashes reproducible.
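A sketch of the shape of the fix (see godump.c for the real
definitions; this assumes GCC's internal hash-table headers):
/* Hash the string contents, not the pointer address, so traversal
   order is reproducible across runs.  */
struct godump_str_hash : string_hash, ggc_remove <const char *> {};
hash_set <const char *, false, godump_str_hash> pot_dummy_types;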
2021-07-14 Jakub Jelinek <jakub@redhat.com>
PR go/101407
* godump.c (godump_str_hash): New type.
(godump_container::pot_dummy_types): Use string_hash instead of
ptr_hash in the hash_set.
The following adds support for re-using the vector reduction def
from the main loop in vectorized epilogue loops on architectures
which use different vector sizes for the epilogue. That's only
x86 as far as I am aware.
2021-07-13 Richard Biener <rguenther@suse.de>
* tree-vect-loop.c (vect_find_reusable_accumulator): Handle
vector types where the old vector type has a multiple of
the new vector type elements.
(vect_create_partial_epilog): New function, split out from...
(vect_create_epilog_for_reduction): ... here.
(vect_transform_cycle_phi): Reduce the re-used accumulator
to the new vector type.
* gcc.target/i386/vect-reduc-1.c: New testcase.
Odd-numbered indices describing argument access sizes in the fnspec
string can only hold 't' or a digit, as tested at the beginning of the
case. When checking that the size-supplying argument does not have
additional information associated with it, the test that excludes the
't' possibility looks for it at the even position in the fnspec
string. Oops.
This might yield false positives and negatives if a function has a
fnspec in which an argument uses a 't' access size, and ('t' - '1')
happens to be the index of an argument described in the fnspec string.
Assuming ASCII encoding, 't' - '1' is 116 - 49 = 67, so it would take a
function with at least 68 arguments described in fnspec. Still, this is
probably worth fixing.
for gcc/ChangeLog
* tree-ssa-alias.c (attr_fnspec::verify): Fix index in
non-'t'-sized arg check.
If an artificial label created for a landing pad ends up being
dropped in favor of a user-supplied label, the user-supplied label
inherits the landing pad index, but the post_landing_pad field is not
adjusted to point to the new label.
This patch fixes the problem, and adds verification that we don't
remove a label that's still used as a landing pad.
The circumstance in which this problem can be hit was unusual: removal
of a block with an unreachable label moves the label to some other
unrelated block, in case its address is taken. In the case at hand
(pr42739.C, complicated by wrappers and cleanups), the chosen block
happened to be an EH landing pad. (A followup patch will change that.)
for gcc/ChangeLog
* tree-cfg.c (cleanup_dead_labels_eh): Update
post_landing_pad label upon change of landing pad block's
primary label.
(cleanup_dead_labels): Check that a removed label is not that
of a landing pad.
Add a new RTL simplification for the case of a VEC_SELECT selecting
the low part of a vector. The simplification returns a SUBREG.
The primary goal of this patch is to enable better combinations of
Neon RTL patterns - specifically allowing generation of 'write-to-
high-half' narrowing instructions.
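For example (a hedged sketch, not one of the new tests), a
write-to-high-half narrowing operation on AArch64:
#include <arm_neon.h>
/* With the vec_select -> subreg simplification, combine can match the
   write-to-high-half pattern and emit xtn2 instead of xtn plus a
   register move.  */
uint8x16_t
narrow_high (uint8x8_t low, uint16x8_t wide)
{
  return vcombine_u8 (low, vmovn_u16 (wide));
}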
Adding this RTL simplification means that the expected results for a
number of tests need to be updated:
* aarch64 Neon: Update the scan-assembler regex for intrinsics tests
to expect a scalar register instead of lane 0 of a vector.
* aarch64 SVE: Likewise.
* arm MVE: Use lane 1 instead of lane 0 for lane-extraction
intrinsics tests (as the move instructions get optimized away for
lane 0).
This patch also adds new code generation tests to
narrow_high_combine.c to verify the benefit of this RTL
simplification.
gcc/ChangeLog:
2021-06-08 Jonathan Wright <jonathan.wright@arm.com>
* combine.c (combine_simplify_rtx): Add vec_select -> subreg
simplification.
* config/aarch64/aarch64.md (*zero_extend<SHORT:mode><GPI:mode>2_aarch64):
Add Neon to general purpose register case for zero-extend
pattern.
* config/arm/vfp.md (*arm_movsi_vfp): Remove "*" from *t -> r
case to prevent some cases opting to go through memory.
* cse.c (fold_rtx): Add vec_select -> subreg simplification.
* rtl.c (rtvec_series_p): Define predicate to determine
whether a vector contains a linear series of integers.
* rtl.h (rtvec_series_p): Define.
* rtlanal.c (vec_series_lowpart_p): Define predicate to
determine if a vector selection is equivalent to the low part
of the vector.
* rtlanal.h (vec_series_lowpart_p): Define.
* simplify-rtx.c (simplify_context::simplify_binary_operation_1):
Add vec_select -> subreg simplification.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/extract_zero_extend.c: Remove dump scan
for RTL pattern match.
* gcc.target/aarch64/narrow_high_combine.c: Add new tests.
* gcc.target/aarch64/simd/vmulx_laneq_f64_1.c: Update
scan-assembler regex to look for a scalar register instead of
lane 0 of a vector.
* gcc.target/aarch64/simd/vmulxd_laneq_f64_1.c: Likewise.
* gcc.target/aarch64/simd/vmulxs_lane_f32_1.c: Likewise.
* gcc.target/aarch64/simd/vmulxs_laneq_f32_1.c: Likewise.
* gcc.target/aarch64/simd/vqdmlalh_lane_s16.c: Likewise.
* gcc.target/aarch64/simd/vqdmlals_lane_s32.c: Likewise.
* gcc.target/aarch64/simd/vqdmlslh_lane_s16.c: Likewise.
* gcc.target/aarch64/simd/vqdmlsls_lane_s32.c: Likewise.
* gcc.target/aarch64/simd/vqdmullh_lane_s16.c: Likewise.
* gcc.target/aarch64/simd/vqdmullh_laneq_s16.c: Likewise.
* gcc.target/aarch64/simd/vqdmulls_lane_s32.c: Likewise.
* gcc.target/aarch64/simd/vqdmulls_laneq_s32.c: Likewise.
* gcc.target/aarch64/sve/dup_lane_1.c: Likewise.
* gcc.target/aarch64/sve/extract_1.c: Likewise.
* gcc.target/aarch64/sve/extract_2.c: Likewise.
* gcc.target/aarch64/sve/extract_3.c: Likewise.
* gcc.target/aarch64/sve/extract_4.c: Likewise.
* gcc.target/aarch64/sve/live_1.c: Update scan-assembler regex
cases to look for 'b' and 'h' registers instead of 'w'.
* gcc.target/arm/crypto-vsha1cq_u32.c: Update scan-assembler
regex to reflect lane 0 vector extractions being simplified
to scalar register moves.
* gcc.target/arm/crypto-vsha1h_u32.c: Likewise.
* gcc.target/arm/crypto-vsha1mq_u32.c: Likewise.
* gcc.target/arm/crypto-vsha1pq_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vgetq_lane_f16.c: Extract
lane 1 as the moves for lane 0 now get optimized away.
* gcc.target/arm/mve/intrinsics/vgetq_lane_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vgetq_lane_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vgetq_lane_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vgetq_lane_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vgetq_lane_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vgetq_lane_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vgetq_lane_u8.c: Likewise.
Copy the test for _mm_testz_si128, _mm_testc_si128,
_mm_testnzc_si128, _mm_test_all_ones, _mm_test_all_zeros,
_mm_test_mix_ones_zeros from gcc/testsuite/gcc.target/i386.
2021-07-13 Paul A. Clarke <pc@us.ibm.com>
gcc/testsuite
* gcc.target/powerpc/sse4_1-ptest-1.c: Copy from
gcc/testsuite/gcc.target/i386.
The use of npos triggers a diagnostic as described in PR c++/101361.
This change replaces the use of npos with the exact length, which is
already known. We can further simplify it by inlining the effects of
compare and substr, avoiding the redundant range checks in the latter.
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:
PR c++/101361
* include/std/string_view (ends_with): Use traits_type::compare
directly.
Allow gimple_could_trap_p (which previously took a non-const gimple)
to be called from functions that take a const gimple (such as
gimple_has_side_effects), and update its prototypes. Pre-approved
as obvious.
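The resulting prototypes look like this (a sketch of the gimple.h
declarations after the change):
extern bool gimple_could_trap_p_1 (const gimple *, bool, bool);
extern bool gimple_could_trap_p (const gimple *);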
2021-07-13 Roger Sayle <roger@nextmovesoftware.com>
Richard Biener <rguenther@suse.de>
gcc/ChangeLog
* gimple.c (gimple_could_trap_p_1): Make S argument a
"const gimple*". Preserve constness in call to
gimple_asm_volatile_p.
(gimple_could_trap_p): Make S argument a "const gimple*".
* gimple.h (gimple_could_trap_p_1, gimple_could_trap_p):
Update function prototypes.
When I added the new C++23 constructor I added a conditional include of
<bits/ranges_base.h>, which was already being included unconditionally.
This removes the unconditional include but changes the condition for the
other one, so it's used for C++20 as well.
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:
* include/std/string_view: Only include <bits/ranges_base.h>
once, and only for C++20 and later.
This patch adds support for reusing a main loop's reduction accumulator
in an epilogue loop. This in turn lets the loops share a single piece
of vector->scalar reduction code.
The patch has the following restrictions:
(1) The epilogue reduction can only operate on a single vector
(e.g. ncopies must be 1 for non-SLP reductions, and the group size
must be <= the element count for SLP reductions).
(2) Both loops must use the same vector mode for their accumulators.
This means that the patch is restricted to targets that support
--param vect-partial-vector-usage=1.
(3) The reduction must be a standard “tree code” reduction.
However, these restrictions could be lifted in future. For example,
if the main loop operates on 128-bit vectors and the epilogue loop
operates on 64-bit vectors, we could in future reduce the 128-bit
vector by one stage and use the 64-bit result as the starting point
for the epilogue result.
The patch tries to handle chained SLP reductions, unchained SLP
reductions and non-SLP reductions. It also handles cases in which
the epilogue loop is entered directly (rather than via the main loop)
and cases in which the epilogue loop can be skipped.
vect_get_main_loop_result is a bit more general than the current
patch needs.
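As a simple illustration (a sketch, not one of the new tests), a
straightforward reduction such as:
int
sum (int *x, int n)
{
  int res = 0;
  for (int i = 0; i < n; ++i)
    res += x[i];
  return res;
}
can now keep the main loop's vector accumulator live into the
vectorized epilogue, with a single vector->scalar reduction at the end.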
gcc/
* tree-vectorizer.h (vect_reusable_accumulator): New structure.
(_loop_vec_info::main_loop_edge): New field.
(_loop_vec_info::skip_main_loop_edge): Likewise.
(_loop_vec_info::skip_this_loop_edge): Likewise.
(_loop_vec_info::reusable_accumulators): Likewise.
(_stmt_vec_info::reduc_scalar_results): Likewise.
(_stmt_vec_info::reused_accumulator): Likewise.
(vect_get_main_loop_result): Declare.
* tree-vectorizer.c (vec_info::new_stmt_vec_info): Initialize
reduc_scalar_results.
(vec_info::free_stmt_vec_info): Free reduc_scalar_results.
* tree-vect-loop-manip.c (vect_get_main_loop_result): New function.
(vect_do_peeling): Fill an epilogue loop's main_loop_edge,
skip_main_loop_edge and skip_this_loop_edge fields.
* tree-vect-loop.c (INCLUDE_ALGORITHM): Define.
(vect_emit_reduction_init_stmts): New function.
(get_initial_def_for_reduction): Use it.
(get_initial_defs_for_reduction): Likewise. Change the vinfo
parameter to a loop_vec_info.
(vect_create_epilog_for_reduction): Store the scalar results
in the reduc_info. If an epilogue loop is reusing an accumulator
from the main loop, and if the epilogue loop can also be skipped,
try to place the reduction code in the join block. Record
accumulators that could potentially be reused by epilogue loops.
(vect_transform_cycle_phi): When vectorizing epilogue loops,
try to reuse accumulators from the main loop. Record the initial
value in reduc_info for non-SLP reductions too.
gcc/testsuite/
* gcc.target/aarch64/sve/reduc_9.c: New test.
* gcc.target/aarch64/sve/reduc_9_run.c: Likewise.
* gcc.target/aarch64/sve/reduc_10.c: Likewise.
* gcc.target/aarch64/sve/reduc_10_run.c: Likewise.
* gcc.target/aarch64/sve/reduc_11.c: Likewise.
* gcc.target/aarch64/sve/reduc_11_run.c: Likewise.
* gcc.target/aarch64/sve/reduc_12.c: Likewise.
* gcc.target/aarch64/sve/reduc_12_run.c: Likewise.
* gcc.target/aarch64/sve/reduc_13.c: Likewise.
* gcc.target/aarch64/sve/reduc_13_run.c: Likewise.
* gcc.target/aarch64/sve/reduc_14.c: Likewise.
* gcc.target/aarch64/sve/reduc_14_run.c: Likewise.
* gcc.target/aarch64/sve/reduc_15.c: Likewise.
* gcc.target/aarch64/sve/reduc_15_run.c: Likewise.
After previous patches, we can now easily provide the neutral op
as an argument to get_initial_def_for_reduction. This in turn
allows the adjustment calculation to be moved outside of
get_initial_def_for_reduction, which is the main motivation
of the patch.
gcc/
* tree-vect-loop.c (get_initial_def_for_reduction): Remove
adjustment handling. Take the neutral value as an argument,
in place of the code argument.
(vect_transform_cycle_phi): Update accordingly. Handle the
initial values of cond reductions separately from code reductions.
Choose the adjustment here rather than in
get_initial_def_for_reduction. Sink the splat of vec_initial_def.
This patch generalises the interface to neutral_op_for_slp_reduction
so that it can be used for non-SLP reductions too. This isn't much
of a win on its own, but it helps later patches.
gcc/
* tree-vect-loop.c (neutral_op_for_slp_reduction): Replace with...
(neutral_op_for_reduction): ...this, providing a more general
interface.
(vect_create_epilog_for_reduction): Update accordingly.
(vectorizable_reduction): Likewise.
(vect_transform_cycle_phi): Likewise.
Similarly to the previous patch, this one passes the reduc_info
to get_initial_def_for_reduction, rather than a stmt_vec_info that
lacks the metadata. This again becomes useful later.
gcc/
* tree-vect-loop.c (get_initial_def_for_reduction): Take the
reduc_info instead of the original stmt_vec_info.
(vect_transform_cycle_phi): Update accordingly.
This patch passes the reduc_info to get_initial_defs_for_reduction,
so that the function can get general information from there rather
than from the first SLP statement. This isn't a win on its own,
but it becomes important with later patches.
gcc/
* tree-vect-loop.c (get_initial_defs_for_reduction): Take the
reduc_info as an additional parameter.
(vect_transform_cycle_phi): Update accordingly.
This patch adds a helper function called vect_phi_initial_value
for returning the incoming value of a given loop phi. The main
reason for adding it is to ensure that the right preheader edge
is used when vectorising nested loops. (PHI_ARG_DEF_FROM_EDGE
itself doesn't assert that the given edge is for the right block,
although I guess that would be good to add separately.)
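A hedged sketch of the helper's shape (the real definition lives in
tree-vectorizer.h and relies on GCC's internal APIs):
/* Return the incoming value of PHI from the preheader edge of the
   loop containing it, checking the edge really enters PHI's block.  */
static inline tree
vect_phi_initial_value (gphi *phi)
{
  basic_block bb = gimple_bb (phi);
  edge pe = loop_preheader_edge (bb->loop_father);
  gcc_checking_assert (pe->dest == bb);
  return PHI_ARG_DEF_FROM_EDGE (phi, pe);
}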
gcc/
* tree-vectorizer.h: Include tree-ssa-operands.h.
(vect_phi_initial_value): New function.
* tree-vect-loop.c (neutral_op_for_slp_reduction): Use it.
(get_initial_defs_for_reduction, info_for_reduction): Likewise.
(vect_create_epilog_for_reduction, vectorizable_reduction): Likewise.
(vect_transform_cycle_phi, vectorizable_induction): Likewise.
Vector reduction accumulators can differ in signedness from the
final scalar result. The conversions to handle that case were
distributed through vect_create_epilog_for_reduction; this patch
does the conversion up-front instead.
gcc/
* tree-vect-loop.c (vect_create_epilog_for_reduction): Convert
the phi results to vectype after creating them. Remove later
conversion code that thus becomes redundant.
vect_create_epilog_for_reduction had a variable called new_phis.
It collected the statements that produce the exit block definitions
of the vector reduction accumulators. Although those statements
are indeed phis initially, they are often replaced with normal
statements later, leading to puzzling code like:
  FOR_EACH_VEC_ELT (new_phis, i, new_phi)
    {
      int bit_offset;
      if (gimple_code (new_phi) == GIMPLE_PHI)
        vec_temp = PHI_RESULT (new_phi);
      else
        vec_temp = gimple_assign_lhs (new_phi);
Also, although the array collects statements, in practice all users want
the lhs instead.
This patch therefore replaces new_phis with a vector of gimple values
called “reduc_inputs”.
Also, reduction chains and ncopies>1 were handled with identical code
(and there was a comment saying so). The patch unites them into
a single “if”.
gcc/
* tree-vect-loop.c (vect_create_epilog_for_reduction): Replace
the new_phis vector with a reduc_inputs vector. Combine handling
of reduction chains and ncopies > 1.
This patch constructs an array_slice of the scalar statements that
produce live-out reduction results in the original unvectorised loop.
There are three cases:
- SLP reduction chains: the final SLP stmt is live-out
- full SLP reductions: all SLP stmts are live-out
- non-SLP reductions: the single scalar stmt is live-out
This is a slight simplification on its own, mostly because it means
“group_size” has a consistent meaning throughout the function.
The main justification though is that it helps with later patches.
gcc/
* tree-vect-loop.c (vect_create_epilog_for_reduction): Truncate
scalar_results to group_size elements after reducing down from
N*group_size elements. Construct an array_slice of the live-out
stmts and assert that there is one stmt per scalar result.