OpenE2K/gcc - gcc - Expired Mentality Git

Author	SHA1	Message	Date
Kyrylo Tkachov	9b588cfb42	aarch64: Reimplement vabdl_high* intrinsics using builtins This patch reimplements the vabdl_high intrinsics using builtins. It slightly cleans up the RTL pattern (the mode iterators) but nothing interesting apart from that. gcc/ChangeLog: * config/aarch64/aarch64-simd-builtins.def (sabdl2, uabdl2): Define builtins. * config/aarch64/aarch64-simd.md (aarch64_<sur>abdl2<mode>_3): Rename to... (aarch64_<sur>abdl2<mode>): ... This. (<sur>sadv16qi): Adjust use of above. * config/aarch64/arm_neon.h (vabdl_high_s8): Reimplement using builtin. (vabdl_high_s16): Likewise. (vabdl_high_s32): Likewise. (vabdl_high_u8): Likewise. (vabdl_high_u16): Likewise. (vabdl_high_u32): Likewise.	2021-01-29 13:49:19 +00:00
Kyrylo Tkachov	9f499a86b2	aarch64: Re-implement vabal_high* intrinsics using builtins This patch reimplements the vabal_high* intrinsics using RTL builtins. It's straightforward, defining new unspecs and a new pattern. gcc/ChangeLog: * config/aarch64/aarch64-simd-builtins.def (sabal2): Define builtin. (uabal2): Likewise. * config/aarch64/aarch64-simd.md (aarch64_<sur>abal2<mode>): New pattern. * config/aarch64/aarch64.md (unspec): Add UNSPEC_SABAL2 and UNSPEC_UABAL2. * config/aarch64/arm_neon.h (vabal_high_s8): Reimplement using builtin. (vabal_high_s16): Likewise. (vabal_high_s32): Likewise. (vabal_high_u8): Likewise. (vabal_high_u16): Likewise. (vabal_high_u32): Likewise. * config/aarch64/iterators.md (ABAL2): New mode iterator. (sur): Handle UNSPEC_SABAL2, UNSPEC_UABAL2.	2021-01-29 13:49:19 +00:00
Kyrylo Tkachov	d5e0d1f1d2	aarch64: Reimplement vabal* intrinsics using builtins This patch reimplements the vabal intrinsics with builtins. The RTL pattern is cleaned up to emit the right .8b suffixes for the inputs (though .16b is also accepted) and iterate over the right modes. The pattern's only other use is through the sadv16qi expander, which is adjusted. I've verified that the codegen for sadv16qi is not worse off. gcc/ChangeLog: * config/aarch64/aarch64-simd-builtins.def (sabal): Define builtin. (uabal): Likewise. * config/aarch64/aarch64-simd.md (aarch64_<sur>abal<mode>_4): Rename to... (aarch64_<sur>abal<mode>): ... This. (<sur>sadv16qi): Adjust use of the above. * config/aarch64/arm_neon.h (vabal_s8): Reimplement using builtin. (vabal_s16): Likewise. (vabal_s32): Likewise. (vabal_u8): Likewise. (vabal_u16): Likewise. (vabal_u32): Likewise.	2021-01-29 13:49:19 +00:00
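For reference, the lane-wise semantics of the unsigned 8-bit variants touched by the vabdl_high, vabal_high and vabal commits above can be sketched in plain C. This is a hypothetical reference model (the model_* names are invented), not GCC's implementation, and it does not use <arm_neon.h>:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar models; vectors are plain arrays of lanes. */

/* vabdl_high_u8 model: |a[8+i] - b[8+i]| widened to 16 bits (UABDL2). */
static void model_vabdl_high_u8(uint16_t r[8], const uint8_t a[16],
                                const uint8_t b[16])
{
    for (int i = 0; i < 8; i++)
        r[i] = (uint16_t)abs((int)a[8 + i] - (int)b[8 + i]);
}

/* vabal_u8 model: acc[i] + |a[i] - b[i]|, wrapping modulo 2^16 (UABAL). */
static void model_vabal_u8(uint16_t r[8], const uint16_t acc[8],
                           const uint8_t a[8], const uint8_t b[8])
{
    for (int i = 0; i < 8; i++)
        r[i] = (uint16_t)(acc[i] + abs((int)a[i] - (int)b[i]));
}

/* vabal_high_u8 model: as vabal_u8, but on lanes 8..15 of a and b (UABAL2). */
static void model_vabal_high_u8(uint16_t r[8], const uint16_t acc[8],
                                const uint8_t a[16], const uint8_t b[16])
{
    model_vabal_u8(r, acc, a + 8, b + 8);
}
```

In this model the high variants differ only in reading lanes 8 to 15 of the 128-bit inputs; the commits give them their own sabdl2/uabdl2 and sabal2/uabal2 builtins.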
Kyrylo Tkachov	cb995de62a	aarch64: Reimplement vaddlv* intrinsics using builtins This patch reimplements the vaddlv* intrinsics using builtins. The vaddlv_s32 and vaddlv_u32 intrinsics actually perform a pairwise SADDLP/UADDLP instead of a SADDLV/UADDLV but because they only use two elements it has the same semantics. gcc/ChangeLog: * config/aarch64/aarch64-simd-builtins.def (saddlv, uaddlv): Define builtins. * config/aarch64/aarch64-simd.md (aarch64_<su>addlv<mode>): Define. * config/aarch64/arm_neon.h (vaddlv_s8): Reimplement using builtin. (vaddlv_s16): Likewise. (vaddlv_u8): Likewise. (vaddlv_u16): Likewise. (vaddlvq_s8): Likewise. (vaddlvq_s16): Likewise. (vaddlvq_s32): Likewise. (vaddlvq_u8): Likewise. (vaddlvq_u16): Likewise. (vaddlvq_u32): Likewise. (vaddlv_s32): Likewise. (vaddlv_u32): Likewise. * config/aarch64/iterators.md (VDQV_L): New mode iterator. (unspec): Add UNSPEC_SADDLV, UNSPEC_UADDLV. (Vwstype): New mode attribute. (Vwsuf): Likewise. (VWIDE_S): Likewise. (USADDLV): New int iterator. (su): Handle UNSPEC_SADDLV, UNSPEC_UADDLV. gcc/testsuite/ChangeLog: * gcc.target/aarch64/simd/vaddlv_1.c: New test.	2021-01-29 13:49:19 +00:00
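The remark about vaddlv_s32/vaddlv_u32 can be checked with a small scalar model: with only two lanes, a pairwise widening add (SADDLP) produces exactly one sum, and that sum is the across-lanes widening sum (SADDLV). The functions below are hypothetical illustrations, not arm_neon.h code:

```c
#include <stdint.h>

/* SADDLV-style model: sum all n lanes into a double-width result. */
static int64_t model_saddlv_s32(const int32_t *v, int n)
{
    int64_t sum = 0;
    for (int i = 0; i < n; i++)
        sum += v[i];
    return sum;
}

/* SADDLP-style model: add adjacent lane pairs into double-width lanes. */
static void model_saddlp_s32(int64_t *out, const int32_t *v, int n)
{
    for (int i = 0; i < n / 2; i++)
        out[i] = (int64_t)v[2 * i] + v[2 * i + 1];
}

/* vaddlv_s32 model: a 2-lane vector has exactly one pair, so the single
   pairwise result equals the across-lanes sum. */
static int64_t model_vaddlv_s32(const int32_t v[2])
{
    int64_t pair;
    model_saddlp_s32(&pair, v, 2);
    return pair;
}
```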
Jonathan Wright	e053f96a9f	aarch64: Use RTL builtins for [su]mlsl_lane[q] intrinsics Rewrite [su]mlsl_lane[q] Neon intrinsics to use RTL builtins rather than inline assembly code, allowing for better scheduling and optimization. gcc/ChangeLog: 2021-01-28 Jonathan Wright <jonathan.wright@arm.com> * config/aarch64/aarch64-simd-builtins.def: Add [su]mlsl_lane[q] builtin generator macros. * config/aarch64/aarch64-simd.md (aarch64_vec_<su>mlsl_lane<Qlane>): Define. * config/aarch64/arm_neon.h (vmlsl_lane_s16): Use RTL builtin instead of inline asm. (vmlsl_lane_s32): Likewise. (vmlsl_lane_u16): Likewise. (vmlsl_lane_u32): Likewise. (vmlsl_laneq_s16): Likewise. (vmlsl_laneq_s32): Likewise. (vmlsl_laneq_u16): Likewise. (vmlsl_laneq_u32): Likewise.	2021-01-29 13:42:00 +00:00
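As a reference for what the new builtins must compute, here is a hypothetical scalar model of vmlsl_lane_s16 (the model_ name is invented; plain arrays stand in for Neon vector types):

```c
#include <stdint.h>

/* Hypothetical model of vmlsl_lane_s16 (acc, a, v, lane): each 32-bit lane
   becomes acc[i] - a[i] * v[lane], with the 16x16 product formed at 32-bit
   width.  The subtraction is done in uint32_t so it wraps like the
   instruction instead of hitting signed overflow; converting back to
   int32_t is modulo 2^32 on GCC. */
static void model_vmlsl_lane_s16(int32_t r[4], const int32_t acc[4],
                                 const int16_t a[4], const int16_t v[4],
                                 int lane)
{
    for (int i = 0; i < 4; i++) {
        uint32_t prod = (uint32_t)((int32_t)a[i] * v[lane]);
        r[i] = (int32_t)((uint32_t)acc[i] - prod);
    }
}
```

The laneq forms differ only in taking the lane from a 128-bit (8-lane) vector.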
Richard Biener	0833e3e1ff	change unit of --param max-gcse-memory to kB This changes it from bytes to kB since its value is limited to 2147483648. 2021-01-29 Richard Biener <rguenther@suse.de> * doc/invoke.texi (--param max-gcse-memory): Document unit of size. * gcse.c (gcse_or_cprop_is_too_expensive): Adjust. * params.opt (--param max-gcse-memory): Adjust default and document unit of size.	2021-01-29 14:01:21 +01:00
Richard Biener	cb52e59e33	rtl-optimization/98863 - fix PRE/CPROP memory usage check This fixes overflow of the memory usage estimate in turn failing to disable itself on WRF with LTO, causing a few GBs worth of memory peak. 2021-01-29 Richard Biener <rguenther@suse.de> PR rtl-optimization/98863 * gcse.c (gcse_or_cprop_is_too_expensive): Use unsigned HOST_WIDE_INT for the memory estimate.	2021-01-29 14:01:21 +01:00
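The overflow fixed by the second of these two commits, and the value of expressing the limit in kB, can be illustrated with a hypothetical per-block bitmap estimate. The formula and function names are assumptions for illustration and do not reproduce gcse_or_cprop_is_too_expensive:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical estimate: one bitmap of NREGS bits, rounded up to 64-bit
   words, per basic block. */

/* 32-bit unsigned arithmetic: the product silently wraps modulo 2^32. */
static uint32_t estimate_bytes_u32(uint32_t nblocks, uint32_t nregs)
{
    return nblocks * ((nregs + 63) / 64) * 8u;
}

/* 64-bit arithmetic: no wrap for realistic inputs. */
static uint64_t estimate_bytes_u64(uint32_t nblocks, uint32_t nregs)
{
    return (uint64_t)nblocks * ((nregs + 63) / 64) * 8u;
}

/* Compare against a limit given in kB, scaling the limit in 64 bits too. */
static bool too_expensive(uint32_t nblocks, uint32_t nregs, uint32_t limit_kb)
{
    return estimate_bytes_u64(nblocks, nregs) > (uint64_t)limit_kb * 1024;
}
```

With 65536 blocks and 524288 registers the 32-bit estimate wraps to exactly 0, so a check based on it would never disable the pass, while the 64-bit estimate is 4 GiB.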
Richard Biener	f4e426f7bd	tree-optimization/97627 - Avoid computing niters for fake edges This avoids computing niters information for fake edges. 2021-01-29 Bin Cheng <bin.cheng@linux.alibaba.com> Richard Biener <rguenther@suse.de> PR tree-optimization/97627 * tree-ssa-loop-niter.c (number_of_iterations_exit_assumptions): Do not analyze fake edges. * g++.dg/pr97627.C: New testcase.	2021-01-29 12:09:10 +01:00
Richard Biener	a8c455bafd	rtl-optimization/98144 - tame REE memory usage This changes the REE dataflow to make the all-ones starting solution implicit, tracked via a visited flag, instead of explicit, removing the need to start with fully populated bitmaps for all basic blocks. That reduces peak memory use when compiling the RTL-checking-enabled insn-extract.c testcase from PR98144 from 6GB to less than 2GB. 2021-01-29 Richard Biener <rguenther@suse.de> PR rtl-optimization/98144 * df.h (df_mir_bb_info): Add con_visited member. * df-problems.c (df_mir_alloc): Initialize con_visited, do not fully populate IN and OUT. (df_mir_reset): Likewise. (df_mir_confluence_0): Set con_visited. (df_mir_confluence_n): Properly handle implicitly fully populated IN and OUT as designated by con_visited and update con_visited accordingly.	2021-01-29 12:01:58 +01:00
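The visited-flag idea can be sketched for a single intersection-style confluence step. This is an assumed, simplified model (one 64-bit word per set, invented names) rather than the df-problems.c code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-block state: while con_visited is false, IN is
   implicitly "all ones" and is never materialised. */
struct bb_info {
    uint64_t in;
    bool con_visited;
};

/* Merge one predecessor's OUT into the successor's IN by intersection.
   Returns true if IN (possibly) changed. */
static bool confluence_n(struct bb_info *succ, uint64_t pred_out)
{
    if (!succ->con_visited) {
        /* all-ones AND pred_out == pred_out, so just copy it. */
        succ->in = pred_out;
        succ->con_visited = true;
        return true;
    }
    uint64_t old = succ->in;
    succ->in &= pred_out;
    return succ->in != old;
}
```

The saving comes from never allocating the fully populated starting bitmaps; the first merge into a block replaces the implicit all-ones value with a plain copy.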
Jakub Jelinek	e7429bc9d6	arm: Fix up -mcpu=iwmmxt ICEs [PR98849] The https://gcc.gnu.org/r11-6707-g7432f255b70811dafaf325d94036ac580891de69 https://gcc.gnu.org/r11-6708-gbfab355012ca0f5219da8beb04f2fdaf757d34b7 changes moved the vashl/vashr/vlshr expanders from neon.md to vec-common.md and changed their condition from TARGET_NEON to ARM_HAVE_<MODE>_ARITH, so that they also apply for TARGET_HAVE_MVE. But the ARM_HAVE_<MODE>_ARITH macros are sometimes also true for TARGET_REALLY_IWMMXT, which, at least from a quick skim of the former iwmmxt.md, doesn't have such instructions, so it seems incorrect to enable them for iwmmxt. Furthermore, even if it had them, iwmmxt doesn't support any way to broadcast values in those modes (vec_duplicate and vec_init optabs), and the middle end relies on the fact that, if the vector x vector shift/rotate patterns are supported, it can emit a vector x scalar shift/rotate by broadcasting the shift amount to a vector. As TARGET_NEON, TARGET_REALLY_IWMMXT and TARGET_HAVE_MVE never seem to be enabled together, I think we can just write it the following way. Note that iwmmxt actually seems to support vector x scalar shifts, but doesn't enable the optabs that would tell the middle-end code that it does (and neon and mve don't seem to support those). I'll defer that to anybody who cares about iwmmxt (if any). 2021-01-29 Jakub Jelinek <jakub@redhat.com> PR target/98849 * config/arm/vec-common.md (mve_vshlq_<supf><mode>, vashl<mode>3, vashr<mode>3, vlshr<mode>3): Add && !TARGET_REALLY_IWMMXT to conditions. * gcc.c-torture/compile/pr98849.c: New test.	2021-01-29 11:54:22 +01:00
Jakub Jelinek	9c445c07cd	expand: Fix up find_bb_boundaries [PR98331] When expansion emits some control flow insns etc. inside of a former GIMPLE basic block, find_bb_boundaries needs to split it into multiple basic blocks. The code needs to ignore debug insns when deciding how many splits to do or where between non-debug insns the split should be done, but it can decide where to put debug insns if they can be kept and otherwise throws them away (they can't stay outside of basic blocks). On the following testcase, we end up in the bb from the expander with the sequence: control flow insn, debug insns, barrier, some other insn; the last insn is effectively dead after __builtin_unreachable and we'll optimize it out later. Without debug insns, we'd do the split when encountering some other insn and split after PREV_INSN (some other insn), i.e. after the barrier (and the splitting code then moves the barrier in between basic blocks). But if there are debug insns, we actually split before the first debug insn that appeared after the control flow insn, i.e. right after the control flow insn, and get a basic block that starts with debug insns and then has a barrier in the middle that nothing moves out of the bb. This leads to ICEs and, even if it didn't, to behavior different from -g0. The reason for treating debug insns that way is a different case, e.g. the sequence control flow insn, debug insns, some other insn, or even control flow insn, barrier, debug insns, some other insn, where splitting before the first such debug insn allows us to keep them while otherwise we would have to drop them on the floor, and in those situations we behave the same with -g0 and -g.
So, the following patch fixes it by resetting debug_insn not just when splitting the blocks (it is set only after seeing a control flow insn and before splitting for it if needed), but also when seeing a barrier, which effectively means we always throw away debug insns after a control flow insn and before following barrier if any, but there is no way around that, control flow insn must be the last in the bb (BB_END) and BARRIER after it, debug insns aren't allowed outside of bb. We still handle the other cases fine (when there is no barrier or when debug insns appear only after the barrier). 2021-01-29 Jakub Jelinek <jakub@redhat.com> PR debug/98331 * cfgbuild.c (find_bb_boundaries): Reset debug_insn when seeing a BARRIER. * gcc.dg/pr98331.c: New test.	2021-01-29 10:30:09 +01:00
Xionghu Luo	280a59d921	testsuite: Run vec_insert case on P8 and P9 with option specified Move run_test and TEST_VEC_INSERT_ALL to a header file for shared use. gcc/testsuite/ChangeLog: 2021-01-29 Xionghu Luo <luoxhu@linux.ibm.com> * gcc.target/powerpc/pr79251.p8.c: Move TEST_VEC_INSERT_ALL to ... * gcc.target/powerpc/pr79251.h: ...this. * gcc.target/powerpc/pr79251.p9.c: Likewise. * gcc.target/powerpc/pr79251-run.c: Move run_test to pr79251.h. Rename to... * gcc.target/powerpc/pr79251-run.p8.c: ...this. * gcc.target/powerpc/pr79251-run.p9.c: New test.	2021-01-29 01:33:09 -06:00
Marek Polacek	f8f5388c9e	c++: Fix infinite looping with invalid operator [PR96137] My r11-86 adjusted cp_parser_class_name to do - scope = parser->scope; + scope = parser->scope ? parser->scope : parser->context->object_type; if (scope == error_mark_node) return error_mark_node; but that caused endless looping in cp_parser_type_specifier_seq (the while (true) loop) in this invalid test, because we never set a parser error, therefore cp_parser_type_specifier returned error_mark_node instead of NULL_TREE, and we never issued the "expected type-specifier" error. At first I thought I'd just add cp_parser_simulate_error right before the return, but that regresses crash81.C -- we'd emit multiple errors for "T::X". So the next best thing seemed to revert to pre-r11-86 behavior: return early when parser->scope is bad, otherwise proceed to get the parser error. gcc/cp/ChangeLog: PR c++/96137 * parser.c (cp_parser_class_name): If parser->scope is error_mark_node, return it, otherwise continue. gcc/testsuite/ChangeLog: PR c++/96137 * g++.dg/parse/error63.C: New test.	2021-01-28 23:29:35 -05:00
GCC Administrator	85d04a2ecb	Daily bump.	2021-01-29 00:16:21 +00:00
Ian Lance Taylor	e6bce7fe17	gccgo driver: always act as though -g is passed The go1 compiler always turns on debugging, to support Go stack traces and functions like runtime.Callers. With the recent switch to turn on DWARF 5 by default, this caused failures with some versions of gas, such as 2.35.1, because the assembly code would assume DWARF 5 but the driver would not pass --gdwarf-5 to gas. gas would then give an error: "file number less than one". This change avoids that problem by having the gccgo driver spec add a -g option to the command line if no other -g option is present. The newly added -g option is passed to the assembler as --gdwarf-5. * gospec.c (lang_specific_driver): Add -g if no debugging options were passed.	2021-01-28 15:54:03 -08:00
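The driver-side change amounts to "append -g unless the user already gave a -g option". A minimal sketch of that check follows; the function name is hypothetical, it works on raw argv strings whereas the real lang_specific_driver works on already-decoded options, and it deliberately simplifies by treating every argument that begins with -g as a debugging option:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: return true if no argument already starts with
   "-g", i.e. the driver should append an implicit -g.
   Simplification: any "-g..." spelling counts as a debugging option. */
static bool needs_implicit_g(int argc, const char *const argv[])
{
    for (int i = 1; i < argc; i++)
        if (strncmp(argv[i], "-g", 2) == 0)
            return false;
    return true;
}
```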
Jakub Jelinek	850a8ec54c	c++: Fix -Weffc++ in templates [PR98841] We emit a bogus warning on the following testcase, suggesting that the operator should return this even when it does that already. The problem is that normally cp_build_indirect_ref_1 ensures that this is folded as current_class_ref, but in templates (if return type is non-dependent, otherwise check_return_expr doesn't check it) it didn't go through cp_build_indirect_ref_1, but just built another INDIRECT_REF. Which means it then doesn't compare pointer-equal to current_class_ref. The following patch fixes it by doing in build_x_indirect_ref for this what cp_build_indirect_ref_1 would do. 2021-01-28 Jakub Jelinek <jakub@redhat.com> PR c++/98841 typeck.c (build_x_indirect_ref): For this, return current_class_ref. g++.dg/warn/effc5.C: New test.	2021-01-29 00:39:00 +01:00
Marek Polacek	513ee7d2cd	tree: Don't reuse types if TYPE_USER_ALIGN differ [PR94775] A year ago I submitted this patch: ~~ Here we trip on the TYPE_USER_ALIGN (t) assert in strip_typedefs: it gets "const d[0]" with TYPE_USER_ALIGN=0 but the result built by build_cplus_array_type is "const char[0]" with TYPE_USER_ALIGN=1. When we strip_typedefs the element of the array "const d", we see it's a typedef_variant_p, so we look at its DECL_ORIGINAL_TYPE, which is char, but we need to add the const qualifier, so we call cp_build_qualified_type -> build_qualified_type where get_qualified_type checks to see if we already have such a type by walking the variants list, which in this case is: char -> c -> const char -> const char -> d -> const d Because check_base_type only checks TYPE_ALIGN and not TYPE_USER_ALIGN, we choose the first const char, which has TYPE_USER_ALIGN set. If the element type of an array has TYPE_USER_ALIGN, the array type gets it too. So we can make check_base_type stricter. I was afraid that it might make us reuse types less often, but measuring showed that we build the same amount of types with and without the patch, while bootstrapping. ~~ However, the patch broke a few tests on STRICT_ALIGNMENT platforms and had to be reverted. This is another try. The original patch is kept unchanged, but I added the finalize_type_size hunk that ought to fix the STRICT_ALIGNMENT issues. The problem is that finalize_type_size can clear TYPE_USER_ALIGN on the main variant of a type, but doesn't clear it on any of the variants. Then we end up with types which share the same TYPE_MAIN_VARIANT, but their TYPE_CANONICAL differs and then the usual "canonical types differ for identical types" follows. I've created alignas19.C to exercise this scenario. 
What happens is: - when parsing the class S we create a type S in xref_tag, - we see alignas(8) so common_handle_aligned_attribute sets T_U_A in S, - we parse the member function fn and build_memfn_type creates a copy of S to add const; this variant has T_U_A set, - we finish_struct S which calls layout_class_type -> finish_record_type -> finalize_type_size where we reset T_U_A in S (but const S keeps it), - finish_non_static_data_member for arr calls maybe_dummy_object with type = S, - maybe_dummy_object calls same_type_ignoring_top_level_qualifiers_p to check if S and TREE_TYPE (current_class_ref), which is const S, are the same, - same_type_ignoring_top_level_qualifiers_p creates cv-unqualified versions of the passed types. Previously we'd use our main variant S when stripping "const S" of const, but since the T_U_A flags don't match (check_base_type), we create a new variant S'. Then we crash in comptypes because S and S' have the same TYPE_MAIN_VARIANT but different TYPE_CANONICALs. With my patch we'll clear T_U_A for S's variants too, and then instead of S' we'll just use S. gcc/ChangeLog: PR c++/94775 * stor-layout.c (finalize_type_size): If we reset TYPE_USER_ALIGN in the main variant, maybe reset it in its variants too. * tree.c (check_base_type): Return true only if TYPE_USER_ALIGN match. (check_aligned_type): Check if TYPE_USER_ALIGN match. gcc/testsuite/ChangeLog: PR c++/94775 * g++.dg/cpp0x/alignas19.C: New test. * g++.dg/warn/Warray-bounds15.C: New test.	2021-01-28 16:21:50 -05:00
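The effect of making check_base_type compare TYPE_USER_ALIGN can be shown with a toy variant list. The struct and functions below are hypothetical stand-ins for GCC trees that keep only the fields relevant here; the real check also compares names, contexts and attributes:

```c
#include <stdbool.h>

/* Toy type: qualifiers, alignment and the user-alignment flag only. */
struct type_sketch {
    int quals;          /* e.g. 1 for const */
    unsigned align;
    bool user_align;
};

/* Reuse a cached variant only if qualifiers, alignment and (after the
   patch) the user-alignment flag all match. */
static bool can_reuse_variant(const struct type_sketch *cand, int quals,
                              const struct type_sketch *base)
{
    return cand->quals == quals
           && cand->align == base->align
           && cand->user_align == base->user_align;
}

/* Walk the variant list and return the first reusable index, or -1. */
static int find_variant(const struct type_sketch *v, int n, int quals,
                        const struct type_sketch *base)
{
    for (int i = 0; i < n; i++)
        if (can_reuse_variant(&v[i], quals, base))
            return i;
    return -1;
}
```

With the variants char, const char (user-aligned) and const char, asking for a const variant of a non-user-aligned base now yields the third entry rather than the first const one.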
Jonathan Wakely	a054608c9c	libstdc++: Fix copyright dates for simd headers and tests libstdc++-v3/ChangeLog: * include/experimental/bits/numeric_traits.h: Update copyright dates. * include/experimental/bits/simd.h: Likewise. * include/experimental/bits/simd_builtin.h: Likewise. * include/experimental/bits/simd_converter.h: Likewise. * include/experimental/bits/simd_detail.h: Likewise. * include/experimental/bits/simd_fixed_size.h: Likewise. * include/experimental/bits/simd_math.h: Likewise. * include/experimental/bits/simd_neon.h: Likewise. * include/experimental/bits/simd_ppc.h: Likewise. * include/experimental/bits/simd_scalar.h: Likewise. * include/experimental/bits/simd_x86.h: Likewise. * include/experimental/bits/simd_x86_conversions.h: Likewise. * include/experimental/simd: Likewise. * testsuite/experimental/simd/*: Likewise.	2021-01-28 18:13:03 +00:00
Christophe Lyon	31a0ab9213	arm: Adjust cost of vector of constant zero Neon vector comparisons have a dedicated version when comparing with constant zero: it means its cost is free. Adjust the cost in arm_rtx_costs_internal accordingly, for Neon only, since MVE does not support this. 2021-01-28 Christophe Lyon <christophe.lyon@linaro.org> gcc/ PR target/98730 * config/arm/arm.c (arm_rtx_costs_internal): Adjust cost of vector of constant zero for comparisons. gcc/testsuite/ PR target/98730 * gcc.target/arm/simd/vceqzq_p64.c: Update expected result.	2021-01-28 17:55:45 +00:00
David Edelsohn	e28bd09498	testsuite: Fix up a testcase to find the right ISO_Fortran_binding.h. gcc/testsuite/ChangeLog: * gfortran.dg/ISO_Fortran_binding_18.c: Include ../../../libgfortran/ISO_Fortran_binding.h rather than ISO_Fortran_binding.h.	2021-01-28 12:44:30 -05:00
Michael Meissner	e11e5d3889	Map long double built-ins correctly with IEEE 128-bit long double. The PowerPC has two different 128-bit long double types, one that uses a pair of doubles to get more mantissa range, and the other using the IEEE 128-bit 754R binary floating point format. The pair of doubles has been used as the traditional format, and we are in the process of moving to allow an implementation to switch to using IEEE 128-bit floating point. The GLIBC and LIBSTDC++ libraries have been modified to have functions using the two different formats in their libraries with different names. This patch goes through all of the built-in functions that either take long double arguments or return long double, and changes the name from the traditional name to the IEEE 128-bit name. The minimum GLIBC version to support IEEE 128-bit floating point is 2.32. The names changed are: * <name>l is usually mapped to __<name>ieee128; * <extra>printf is mapped to __<extra>printfieee128; (and) * <extra>scanf is mapped to __isoc99_<extra>scanfieee128. A few functions have different mappings: * dreml => __remainderieee128; * gammal => __lgammaieee128; * gammal_r => __lgammaieee128_r; * lgammal_r => __lgammaieee128_r; * nexttoward => __nexttoward_to_ieee128; * nexttowardf => __nexttowardf_to_ieee128; * nexttowardl => __nexttowardl_to_ieee128; * pow10l => __exp10ieee128; * scalbl => __scalbieee128; * significandl => __significandieee128; (and) * sincosl => __sincosieee128. gcc/ 2021-01-28 Michael Meissner <meissner@linux.ibm.com> * config/rs6000/rs6000.c (rs6000_mangle_decl_assembler_name): Add support for mapping built-in function names for long double built-in functions if long double is IEEE 128-bit. gcc/testsuite/ 2021-01-28 Michael Meissner <meissner@linux.ibm.com> * gcc.target/powerpc/float128-longdouble-math.c: New test. * gcc.target/powerpc/float128-longdouble-stdio.c: New test. * gcc.target/powerpc/float128-math.c: Adjust test for new name being generated. 
Add support for running test on power10. Add support for running if long double defaults to 64-bits.	2021-01-28 11:30:46 -05:00
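The renaming rules listed in this commit can be restated as a small lookup function. It is an illustrative sketch only (the name ieee128_name is invented); GCC applies these rules inside rs6000_mangle_decl_assembler_name and only to built-ins that take or return long double, which this sketch does not check:

```c
#include <stdio.h>
#include <string.h>

/* Illustrative only: write the IEEE 128-bit assembler name for NAME into
   OUT (which must be large enough).  Returns 1 if a rule applied. */
static int ieee128_name(const char *name, char *out, size_t outlen)
{
    /* Functions with non-standard mappings, as listed in the commit. */
    static const struct { const char *from, *to; } special[] = {
        { "dreml", "__remainderieee128" },
        { "gammal", "__lgammaieee128" },
        { "gammal_r", "__lgammaieee128_r" },
        { "lgammal_r", "__lgammaieee128_r" },
        { "nexttoward", "__nexttoward_to_ieee128" },
        { "nexttowardf", "__nexttowardf_to_ieee128" },
        { "nexttowardl", "__nexttowardl_to_ieee128" },
        { "pow10l", "__exp10ieee128" },
        { "scalbl", "__scalbieee128" },
        { "significandl", "__significandieee128" },
        { "sincosl", "__sincosieee128" },
    };
    size_t len = strlen(name);

    for (size_t i = 0; i < sizeof special / sizeof special[0]; i++)
        if (strcmp(name, special[i].from) == 0) {
            snprintf(out, outlen, "%s", special[i].to);
            return 1;
        }
    if (len >= 5 && strcmp(name + len - 5, "scanf") == 0) {
        /* <extra>scanf => __isoc99_<extra>scanfieee128 */
        snprintf(out, outlen, "__isoc99_%sieee128", name);
        return 1;
    }
    if (len >= 6 && strcmp(name + len - 6, "printf") == 0) {
        /* <extra>printf => __<extra>printfieee128 */
        snprintf(out, outlen, "__%sieee128", name);
        return 1;
    }
    if (len >= 2 && name[len - 1] == 'l') {
        /* Usual case: <name>l => __<name>ieee128 */
        snprintf(out, outlen, "__%.*sieee128", (int)(len - 1), name);
        return 1;
    }
    return 0;
}
```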
Jakub Jelinek	6bb207b468	c++: Fix up handling of register ... asm ("...") vars in templates [PR33661, PR98847] As the testcase shows, for vars appearing in templates, we don't attach the asm spec string to the pattern decls, nor pass it back to cp_finish_decl during instantiation. The following patch does that. 2021-01-28 Jakub Jelinek <jakub@redhat.com> PR c++/33661 PR c++/98847 * decl.c (cp_finish_decl): For register vars with asmspec in templates call set_user_assembler_name and set DECL_HARD_REGISTER. * pt.c (tsubst_expr): When instantiating DECL_HARD_REGISTER vars, pass asmspec_tree to cp_finish_decl. * g++.target/i386/pr98847.C: New test.	2021-01-28 16:13:11 +01:00
Jonathan Wright	8a8e515c2b	aarch64: Use RTL builtins for [su]mlsl_n intrinsics Rewrite [su]mlsl_n Neon intrinsics to use RTL builtins rather than inline assembly code, allowing for better scheduling and optimization. gcc/ChangeLog: 2021-01-27 Jonathan Wright <jonathan.wright@arm.com> * config/aarch64/aarch64-simd-builtins.def: Add [su]mlsl_n builtin generator macros. * config/aarch64/aarch64-simd.md (aarch64_<su>mlsl_n<mode>): Define. * config/aarch64/arm_neon.h (vmlsl_n_s16): Use RTL builtin instead of inline asm. (vmlsl_n_s32): Likewise. (vmlsl_n_u16): Likewise. (vmlsl_n_u32): Likewise.	2021-01-28 14:18:17 +00:00
Kyrylo Tkachov	ff119f340e	aarch64: Fix gcc.target/aarch64/narrow_high-intrinsics.c testism Pushing to fix recently-updated assembly generation gcc/testsuite/ * gcc.target/aarch64/narrow_high-intrinsics.c: Fix shrn2 scan.	2021-01-28 14:10:29 +00:00
Jonathan Wright	87301e3956	aarch64: Use RTL builtins for [su]mlal_n intrinsics Rewrite [su]mlal_n Neon intrinsics to use RTL builtins rather than inline assembly code, allowing for better scheduling and optimization. gcc/ChangeLog: 2021-01-26 Jonathan Wright <jonathan.wright@arm.com> * config/aarch64/aarch64-simd-builtins.def: Add [su]mlal_n builtin generator macros. * config/aarch64/aarch64-simd.md (aarch64_<su>mlal_n<mode>): Define. * config/aarch64/arm_neon.h (vmlal_n_s16): Use RTL builtin instead of inline asm. (vmlal_n_s32): Likewise. (vmlal_n_u16): Likewise. (vmlal_n_u32): Likewise.	2021-01-28 13:12:52 +00:00
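Similarly, the unsigned _n forms covered by this commit and the vmlsl_n commit above can be modelled on plain arrays. The functions are hypothetical illustrations with invented names:

```c
#include <stdint.h>

/* Hypothetical model of vmlal_n_u16: widen each 16-bit lane of A, multiply
   by the scalar N and add to the 32-bit accumulator (wraps like UMLAL). */
static void model_vmlal_n_u16(uint32_t r[4], const uint32_t acc[4],
                              const uint16_t a[4], uint16_t n)
{
    for (int i = 0; i < 4; i++)
        r[i] = acc[i] + (uint32_t)a[i] * n;
}

/* Hypothetical model of vmlsl_n_u16: as above, but subtract (UMLSL). */
static void model_vmlsl_n_u16(uint32_t r[4], const uint32_t acc[4],
                              const uint16_t a[4], uint16_t n)
{
    for (int i = 0; i < 4; i++)
        r[i] = acc[i] - (uint32_t)a[i] * n;
}
```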
Nathan Sidwell	af66f4f1b0	c++: header unit template alias merging [PR 98770] Typedefs are streamed by streaming the underlying type, and then recreating the typedef. But this breaks checking that a duplicate is the same as the original when it is a template alias -- we end up checking a template alias (e.g. __void_t) against the underlying type (void). And those are not the same template alias. This stops pretending that the underlying type is the typedef for that checking and tells is_matching_decl 'you have a typedef', so it knows what to do. (We do not want to recreate the typedef of the duplicate, because that whole set of nodes is going to go away.) PR c++/98770 gcc/cp/ * module.cc (trees_out::decl_value): Swap is_typedef & TYPE_NAME check order. (trees_in::decl_value): Do typedef frobbing only when installing a new typedef, adjust is_matching_decl call. Swap is_typedef & TYPE_NAME check. (trees_in::is_matching_decl): Add is_typedef parm. Adjust variable names and deal with typedef checking. gcc/testsuite/ * g++.dg/modules/pr98770_a.C: New. * g++.dg/modules/pr98770_b.C: New.	2021-01-28 04:55:02 -08:00
Kyrylo Tkachov	d61ca09ec9	aarch64: Reimplement vshrn_high_n* intrinsics using builtins This patch reimplements the vshrn_high_n* intrinsics that generate the SHRN2 instruction. It is a vec_concat of the narrowing shift with the bottom part of the destination register, so we need a little-endian and a big-endian version and an expander to pick between them. gcc/ChangeLog: * config/aarch64/aarch64-simd-builtins.def (shrn2): Define builtin. * config/aarch64/aarch64-simd.md (aarch64_shrn2<mode>_insn_le): Define. (aarch64_shrn2<mode>_insn_be): Likewise. (aarch64_shrn2<mode>): Likewise. * config/aarch64/arm_neon.h (vshrn_high_n_s16): Reimplement using builtins. (vshrn_high_n_s32): Likewise. (vshrn_high_n_s64): Likewise. (vshrn_high_n_u16): Likewise. (vshrn_high_n_u32): Likewise. (vshrn_high_n_u64): Likewise.	2021-01-28 11:43:06 +00:00
Kyrylo Tkachov	fdb904a182	aarch64: Reimplement vshrn_n* intrinsics using builtins This patch reimplements the vshrn_n* intrinsics to use RTL builtins. These perform a narrowing right shift. Although the intrinsic generates the half-width mode (e.g. V8HI -> V8QI), the new pattern generates a full 128-bit mode (V8HI -> V16QI) by representing the fill-with-zeroes semantics of the SHRN instruction. The narrower (V8QI) result is extracted with a lowpart subreg. I found this allows the RTL optimisers to do a better job of optimising away redundant moves in frequently-occurring SHRN+SHRN2 pairs, like in: uint8x16_t foo (uint16x8_t in1, uint16x8_t in2) { uint8x8_t tmp = vshrn_n_u16 (in2, 7); uint8x16_t tmp2 = vshrn_high_n_u16 (tmp, in1, 4); return tmp2; } gcc/ChangeLog: * config/aarch64/aarch64-simd-builtins.def (shrn): Define builtin. * config/aarch64/aarch64-simd.md (aarch64_shrn<mode>_insn_le): Define. (aarch64_shrn<mode>_insn_be): Likewise. (aarch64_shrn<mode>): Likewise. * config/aarch64/arm_neon.h (vshrn_n_s16): Reimplement using builtins. (vshrn_n_s32): Likewise. (vshrn_n_s64): Likewise. (vshrn_n_u16): Likewise. (vshrn_n_u32): Likewise. (vshrn_n_u64): Likewise. * config/aarch64/iterators.md (vn_mode): New mode attribute.	2021-01-28 11:42:20 +00:00
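The fill-with-zeroes semantics of SHRN and the concatenation performed by SHRN2, as described in these two commits, can be modelled on plain arrays. This hypothetical sketch assumes little-endian lane numbering and shift amounts from 1 to 8 for 16-bit lanes; the model_* names are invented:

```c
#include <stdint.h>

/* SHRN model: shift each 16-bit lane right by n and keep the low 8 bits;
   the upper half of the 128-bit destination is zero-filled. */
static void model_shrn_u16(uint8_t dst[16], const uint16_t src[8], int n)
{
    for (int i = 0; i < 8; i++) {
        dst[i] = (uint8_t)(src[i] >> n);
        dst[8 + i] = 0;
    }
}

/* SHRN2 model (vshrn_high_n_u16 (low, src, n)): keep LOW as the lower half
   and write the narrowed SRC lanes into the upper half. */
static void model_shrn2_u16(uint8_t dst[16], const uint8_t low[8],
                            const uint16_t src[8], int n)
{
    for (int i = 0; i < 8; i++) {
        dst[i] = low[i];
        dst[8 + i] = (uint8_t)(src[i] >> n);
    }
}
```

In the foo example from the commit, the SHRN result already sits in the lower half of the register that SHRN2 then completes, which is the kind of redundant move the full-width pattern lets the optimisers remove.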
Eric Botcazou	f7a6d314e7	Fix LTO bootstrap on Windows The latest fix introduced a comparison of executables and this cannot directly work on Windows because they are timestamped. Moreover nobody sets $(exeext) at top level, at least on MinGW, so you get weird behavior because some tools add the implicit .exe suffix and others do not. contrib/ PR lto/85574 * compare-lto: Deal with PE-COFF executables specifically.	2021-01-28 11:33:53 +01:00
Harald Anlauf	33a7a93218	PR fortran/86470 - ICE with OpenMP, class() allocatable gfc_call_malloc should malloc an area of size 1 if no size given. gcc/fortran/ChangeLog: PR fortran/86470 trans.c (gfc_call_malloc): Allocate area of size 1 if passed size is NULL (as documented). gcc/testsuite/ChangeLog: PR fortran/86470 * gfortran.dg/gomp/pr86470.f90: New test.	2021-01-28 10:13:46 +01:00
Jakub Jelinek	c392d040f6	c++: Some C++20 and C++23 option help fixes I've noticed we still refer to C++20 as draft standard, and there is a pasto in C++23 description. 2021-01-28 Jakub Jelinek <jakub@redhat.com> * c.opt (-std=c++2a, -std=c++20, -std=gnu++2a, -std=gnu++20): Remove draft from description. (-std=c++2b): Fix a pasto, 2020 -> 2023.	2021-01-28 10:00:52 +01:00
Richard Biener	a523add327	rtl-optimization/80960 - avoid creating garbage RTL in DSE The following avoids repeatedly turning VALUE RTXen into something useful and re-applying a constant offset through get_addr via DSE check_mem_read_rtx. Instead, do this once for all stores to be visited in check_mem_read_rtx. This avoids allocating 1.6GB of garbage PLUS RTXen on the PR80960 testcase, fixing the memory usage regression from old GCC. 2021-01-27 Richard Biener <rguenther@suse.de> PR rtl-optimization/80960 * dse.c (check_mem_read_rtx): Call get_addr on the offsetted address.	2021-01-28 09:14:46 +01:00
Xionghu Luo	fbe37371cf	rs6000: Fix vec insert ilp32 ICE and test failures [PR98799] UNSPEC_SI_FROM_SF is not supported when TARGET_DIRECT_MOVE_64BIT is false for -m32, so don't generate VIEW_CONVERT_EXPR(ARRAY_REF) for variable vector insert in that case. Remove the rs6000_expand_vector_set_var helper function, move the p8 and p9 definitions and make them static. The previous commit r11-6858 missed the -m32 check. This patch passes testing on P7BE{m32,m64}/P8BE{m32,m64}/P8LE/P9LE with RUNTESTFLAGS="--target_board =unix'{-m32,-m64}'" for BE targets. gcc/ChangeLog: 2021-01-27 Xionghu Luo <luoxhu@linux.ibm.com> David Edelsohn <dje.gcc@gmail.com> PR target/98799 * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin): Don't generate VIEW_CONVERT_EXPR for fcode ALTIVEC_BUILTIN_VEC_INSERT when -m32. * config/rs6000/rs6000-protos.h (rs6000_expand_vector_set_var): Delete. * config/rs6000/rs6000.c (rs6000_expand_vector_set): Remove the wrapper call rs6000_expand_vector_set_var for cleanup. Call rs6000_expand_vector_set_var_p9 and rs6000_expand_vector_set_var_p8 directly. (rs6000_expand_vector_set_var): Delete. (rs6000_expand_vector_set_var_p9): Make static. (rs6000_expand_vector_set_var_p8): Make static. gcc/testsuite/ChangeLog: 2021-01-27 Xionghu Luo <luoxhu@linux.ibm.com> PR target/98827 * gcc.target/powerpc/fold-vec-insert-char-p8.c: Adjust ilp32. * gcc.target/powerpc/fold-vec-insert-char-p9.c: Likewise. * gcc.target/powerpc/fold-vec-insert-double.c: Likewise. * gcc.target/powerpc/fold-vec-insert-float-p8.c: Likewise. * gcc.target/powerpc/fold-vec-insert-float-p9.c: Likewise. * gcc.target/powerpc/fold-vec-insert-int-p8.c: Likewise. * gcc.target/powerpc/fold-vec-insert-int-p9.c: Likewise. * gcc.target/powerpc/fold-vec-insert-longlong.c: Likewise. * gcc.target/powerpc/fold-vec-insert-short-p8.c: Likewise. * gcc.target/powerpc/fold-vec-insert-short-p9.c: Likewise. * gcc.target/powerpc/pr79251.p8.c: Likewise. * gcc.target/powerpc/pr79251.p9.c: Likewise. 
* gcc.target/powerpc/vsx-builtin-7.c: Likewise. * gcc.target/powerpc/pr79251-run.c: Build and run with vsx option.	2021-01-27 21:34:08 -06:00
Xing GUO	f76d0d8645	RISC-V: Fix -march option parsing when `p` extension exists. This patch fixes -march option parsing when `p` extension exists, e.g., -march=rv64imafdcp should produce .attribute arch, "rv64i2p0_m2p0_a2p0_f2p0_d2p0_c2p0_p" rather than .attribute arch, "rv64i2p0_m2p0_a2p0_f2p0_d2p0_c_p". gcc/ChangeLog: * common/config/riscv/riscv-common.c (riscv_subset_list::parsing_subset_version): Fix -march option parsing when `p` extension exists. gcc/testsuite/ChangeLog: * gcc.target/riscv/attribute-18.c: New test.	2021-01-28 11:25:50 +08:00
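One way to avoid misreading a following `p` extension as a version separator is to accept `p` only between two digits. The function below is a hypothetical parser for an optional <major>p<minor> suffix after an extension letter; it is not the riscv-common.c code and ignores the rest of the ISA-string grammar:

```c
#include <ctype.h>

/* Hypothetical sketch: parse an optional "<major>[p<minor>]" at S.
   'p' is consumed as a separator only if a major version was read and a
   digit follows it; otherwise it is left for the next extension, so in
   "cp" the 'p' stays an extension.  -1 means "not given".
   Returns the number of characters consumed. */
static int parse_version(const char *s, int *major, int *minor)
{
    const char *p = s;

    *major = *minor = -1;
    if (!isdigit((unsigned char)*p))
        return 0;
    *major = 0;
    while (isdigit((unsigned char)*p))
        *major = *major * 10 + (*p++ - '0');
    if (*p == 'p' && isdigit((unsigned char)p[1])) {
        p++;
        *minor = 0;
        while (isdigit((unsigned char)*p))
            *minor = *minor * 10 + (*p++ - '0');
    }
    return (int)(p - s);
}
```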
GCC Administrator	aa69f0a820	Daily bump.	2021-01-28 00:16:56 +00:00
Harris Snyder	1cdca4261e	Fix strides for C descriptors with stride > 2. libgfortran/ChangeLog: * runtime/ISO_Fortran_binding.c (CFI_establish): fixed strides for rank >2 arrays. gcc/testsuite/ChangeLog: * gfortran.dg/ISO_Fortran_binding_18.c: New test. * gfortran.dg/ISO_Fortran_binding_18.f90: New test.	2021-01-27 22:57:41 +01:00
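For a contiguous array, the byte stride (sm) of each dimension follows a recurrence that has to run across all dimensions, which is where rank > 2 matters. The function below is a hypothetical sketch of that computation using plain arrays instead of CFI_cdesc_t; it does not reproduce libgfortran's code:

```c
#include <stddef.h>

/* Hypothetical sketch: byte strides for a contiguous column-major array.
   sm[0] is the element length; each later stride is the previous stride
   times the previous extent, for every dimension up to RANK. */
static void contiguous_strides(ptrdiff_t sm[], const ptrdiff_t extent[],
                               int rank, size_t elem_len)
{
    if (rank <= 0)
        return;
    sm[0] = (ptrdiff_t)elem_len;
    for (int i = 1; i < rank; i++)
        sm[i] = sm[i - 1] * extent[i - 1];
}
```

For a 2 x 3 x 4 array of 8-byte elements this gives strides of 8, 16 and 48 bytes.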
Vladimir N. Makarov	081c96621d	[PR97684] IRA: Recalculate pseudo classes if we added new pseudos since last calculation before updating equiv regs. update_equiv_regs can use the reg classes of pseudos, which are set up in register-pressure-sensitive scheduling, loop invariant motion and live range shrinking. This info can become obsolete if we add new pseudos after it was set up, so recalculate it if new pseudos were added. gcc/ChangeLog: PR rtl-optimization/97684 * ira.c (ira): Call ira_set_pseudo_classes before update_equiv_regs when it is necessary. gcc/testsuite/ChangeLog: PR rtl-optimization/97684 * gcc.target/i386/pr97684.c: New.	2021-01-27 15:59:05 -05:00
Jason Merrill	9cd7c32549	c++: Dependent using enum [PR97874] The handling of dependent scopes and unsuitable scopes in lookup_using_decl was a bit convoluted; I tweaked it for a while and then eventually reorganized much of the function to hopefully be clearer. Along the way I noticed a couple of ways we were mishandling inherited constructors. The local binding for a dependent using is the USING_DECL. Implement instantiation of a dependent USING_DECL at function scope. gcc/cp/ChangeLog: PR c++/97874 * name-lookup.c (lookup_using_decl): Clean up handling of dependency and inherited constructors. (finish_nonmember_using_decl): Handle DECL_DEPENDENT_P. * pt.c (tsubst_expr): Handle DECL_DEPENDENT_P. gcc/testsuite/ChangeLog: PR c++/97874 * g++.dg/lookup/using4.C: No error in C++20. * g++.dg/cpp0x/decltype37.C: Adjust message. * g++.dg/template/crash75.C: Adjust message. * g++.dg/template/crash76.C: Adjust message. * g++.dg/cpp0x/inh-ctor36.C: New test. * g++.dg/cpp1z/inh-ctor39.C: New test. * g++.dg/cpp2a/using-enum-7.C: New test.	2021-01-27 15:08:05 -05:00
Jakub Jelinek	5516341921	aarch64: Fix up aarch64_bfxilsi_uxtw [PR98853] The https://gcc.gnu.org/legacy-ml/gcc-patches/2018-07/msg01895.html patch that introduced this pattern claimed: Would generate: combine_balanced_int: bfxil w0, w1, 0, 16 uxtw x0, w0 ret But with this patch generates: combine_balanced_int: bfxil w0, w1, 0, 16 ret That is indeed what it should generate, but it doesn't: it emits bfxil x0, x1, 0, 16 instead, which doesn't zero-extend from 32 to 64 bits but preserves the remaining bits of the destination register. 2021-01-27 Jakub Jelinek <jakub@redhat.com> PR target/98853 * config/aarch64/aarch64.md (aarch64_bfxilsi_uxtw): Use %w0, %w1 and %2 instead of %0, %1 and %2. * gcc.c-torture/execute/pr98853-1.c: New test. * gcc.c-torture/execute/pr98853-2.c: New test.	2021-01-27 20:35:21 +01:00
Aaron Sawdey	7a279bed24	Combine patterns for p10 load-cmpi fusion This patch adds the first batch of patterns to support p10 fusion. These will allow combine to create a single insn for a pair of instructions that power10 can fuse and execute. These particular fusion pairs have the requirement that only cr0 can be used when fusing a load with a compare immediate of -1/0/1 (if signed) or 0/1 (if unsigned), so we want combine to put that requirement in, and if it doesn't work out, the splitter can change it back into 2 insns so scheduling can move them apart. The patterns are generated by a script, genfusion.pl, and live in the new file fusion.md. This script will be expanded to generate more patterns for fusion. This also adds the option -mpower10-fusion, which defaults to on for power10 and will gate all these fusion patterns. In addition, I have added an undocumented option -mpower10-fusion-ld-cmpi (which may be removed later) that controls only the load+compare-immediate patterns. I have made these default to on for power10, but they are not disallowed for earlier processors because the fused sequences are still valid code. This allows us to test the correctness of fusion code generation by turning it on explicitly. gcc/ChangeLog: * config/rs6000/genfusion.pl: New script to generate define_insn_and_split patterns so combine can arrange fused instructions next to each other. * config/rs6000/fusion.md: New file, generated fused instruction patterns for combine. * config/rs6000/predicates.md (const_m1_to_1_operand): New predicate. (non_update_memory_operand): New predicate. * config/rs6000/rs6000-cpus.def: Add OPTION_MASK_P10_FUSION and OPTION_MASK_P10_FUSION_LD_CMPI to ISA_3_1_MASKS_SERVER and POWERPC_MASKS. * config/rs6000/rs6000-protos.h (address_is_non_pfx_d_or_x): Add prototype. * config/rs6000/rs6000.c (rs6000_option_override_internal): Automatically set OPTION_MASK_P10_FUSION and OPTION_MASK_P10_FUSION_LD_CMPI if target is power10. (rs6000_opt_masks): Allow -mpower10-fusion in function attributes.
(address_is_non_pfx_d_or_x): New function. * config/rs6000/rs6000.h: Add MASK_P10_FUSION. * config/rs6000/rs6000.md: Include fusion.md. * config/rs6000/rs6000.opt: Add -mpower10-fusion and -mpower10-fusion-ld-cmpi. * config/rs6000/t-rs6000: Add dependencies involving fusion.md.	2021-01-27 12:24:59 -06:00
Jonathan Wakely	3670dbe490	libstdc++: Regenerate libstdc++ HTML docs libstdc++-v3/ChangeLog: * doc/xml/manual/status_cxx2017.xml: Replace invalid entity. * doc/html/*: Regenerate.	2021-01-27 17:53:07 +00:00
Jonathan Wright	d53a4f9b68	aarch64: Use RTL builtins for [su]mlal intrinsics Rewrite [su]mlal Neon intrinsics to use RTL builtins rather than inline assembly code, allowing for better scheduling and optimization. gcc/ChangeLog: 2021-01-26 Jonathan Wright <jonathan.wright@arm.com> * config/aarch64/aarch64-simd-builtins.def: Add [su]mlal builtin generator macros. * config/aarch64/aarch64-simd.md (*aarch64_<su>mlal<mode>): Rename to... (aarch64_<su>mlal<mode>): This. * config/aarch64/arm_neon.h (vmlal_s8): Use RTL builtin instead of inline asm. (vmlal_s16): Likewise. (vmlal_s32): Likewise. (vmlal_u8): Likewise. (vmlal_u16): Likewise. (vmlal_u32): Likewise.	2021-01-27 17:44:43 +00:00
Jonathan Wakely	c31a633e13	libstdc++: Use printf to print control characters Bash and GNU echo do not interpret backslash escapes by default, so use printf when printing \n or \t in strings. libstdc++-v3/ChangeLog: * testsuite/experimental/simd/generate_makefile.sh: Use printf instead of echo when printing escape characters.	2021-01-27 16:37:27 +00:00
Matthias Kretz	02e32295b2	libstdc++: Add simd testsuite Add a new check-simd target to the testsuite. The new target creates a subdirectory, generates the necessary Makefiles, and spawns submakes to build and run the tests. Running this testsuite with defaults on my machine takes half of the time the dejagnu testsuite required only to determine whether to run tests. Since integrating the simd testsuite in dejagnu increased the time of the whole libstdc++ testsuite by ~100%, this approach is a compromise for speed while not sacrificing coverage too much. Since the test driver is invoked individually per test executable from a Makefile, make's jobserver (-j) trivially parallelizes testing. Testing with different flags and with a simulator (or remote execution) is possible. E.g. `make check-simd DRIVEROPTS=-q target_list="unix{-m64,-m32}{-march=sandybridge,-march=skylake-avx512}{,-ffast-math}"` runs the testsuite 8 times in different subdirectories, using 8 different combinations of compiler flags, only outputs failing tests (-q), and prints all summaries at the end. It skips most ABI tags by default unless --run-expensive is passed to DRIVEROPTS or GCC_TEST_RUN_EXPENSIVE is not empty. To use a simulator, the CHECK_SIMD_CONFIG variable needs to point to a shell script which calls `define_target <name> <flags> <simulator>` and sets target_list as needed. E.g.: case "$target_triplet" in x86_64-*) target_list="unix{-march=sandybridge,-march=skylake-avx512}" ;; powerpc64le-*) define_target power8 "-static -mcpu=power8" "/usr/bin/qemu-ppc64le -cpu power8" define_target power9 -mcpu=power9 "$HOME/bin/run_on_gcc135" target_list="power8 power9{,-ffast-math}" ;; esac libstdc++-v3/ChangeLog: * scripts/check_simd: New file. This script is called from the check-simd target. It determines a set of compiler flags and simulator setups for calling generate_makefile.sh and passes the information back to the check-simd target, which recurses to the generated Makefiles.
* scripts/create_testsuite_files: Remove files below simd/tests/ from testsuite_files and place them in testsuite_files_simd. * testsuite/Makefile.am: Add testsuite_files_simd. Add check-simd target. * testsuite/Makefile.in: Regenerate. * testsuite/experimental/simd/driver.sh: New file. This script compiles and runs a given simd test, logging its output and status. It uses the timeout command to implement compile and test timeouts. * testsuite/experimental/simd/generate_makefile.sh: New file. This script generates a Makefile which uses driver.sh to compile and run the tests and collect the logs into a single log file. * testsuite/experimental/simd/tests/abs.cc: New file. Tests abs(simd). * testsuite/experimental/simd/tests/algorithms.cc: New file. Tests min/max(simd, simd). * testsuite/experimental/simd/tests/bits/conversions.h: New file. Contains functions to support tests involving conversions. * testsuite/experimental/simd/tests/bits/make_vec.h: New file. Support functions make_mask and make_vec. * testsuite/experimental/simd/tests/bits/mathreference.h: New file. Support functions to supply precomputed math function reference data. * testsuite/experimental/simd/tests/bits/metahelpers.h: New file. Support code for SFINAE testing. * testsuite/experimental/simd/tests/bits/simd_view.h: New file. * testsuite/experimental/simd/tests/bits/test_values.h: New file. Test functions to easily drive a test with simd objects initialized from a given list of values and a range of random values. * testsuite/experimental/simd/tests/bits/ulp.h: New file. Support code to determine the ULP distance of simd objects. * testsuite/experimental/simd/tests/bits/verify.h: New file. Test framework for COMPARE'ing simd objects and instantiating the test templates with value_type and ABI tag. * testsuite/experimental/simd/tests/broadcast.cc: New file. Test simd broadcasts. * testsuite/experimental/simd/tests/casts.cc: New file. Test simd casts. 
* testsuite/experimental/simd/tests/fpclassify.cc: New file. Test floating-point classification functions. * testsuite/experimental/simd/tests/frexp.cc: New file. Test frexp(simd). * testsuite/experimental/simd/tests/generator.cc: New file. Test simd generator constructor. * testsuite/experimental/simd/tests/hypot3_fma.cc: New file. Test 3-arg hypot(simd,simd,simd) and fma(simd,simd,simd). * testsuite/experimental/simd/tests/integer_operators.cc: New file. Test integer operators. * testsuite/experimental/simd/tests/ldexp_scalbn_scalbln_modf.cc: New file. Test ldexp(simd), scalbn(simd), scalbln(simd), and modf(simd). * testsuite/experimental/simd/tests/loadstore.cc: New file. Test (converting) simd loads and stores. * testsuite/experimental/simd/tests/logarithm.cc: New file. Test log(simd). * testsuite/experimental/simd/tests/mask_broadcast.cc: New file. Test simd_mask broadcasts. * testsuite/experimental/simd/tests/mask_conversions.cc: New file. Test simd_mask conversions. * testsuite/experimental/simd/tests/mask_implicit_cvt.cc: New file. Test simd_mask implicit conversions. * testsuite/experimental/simd/tests/mask_loadstore.cc: New file. Test simd_mask loads and stores. * testsuite/experimental/simd/tests/mask_operator_cvt.cc: New file. Test that simd_mask operators convert as specified. * testsuite/experimental/simd/tests/mask_operators.cc: New file. Test simd_mask compares, subscripts, and negation. * testsuite/experimental/simd/tests/mask_reductions.cc: New file. Test simd_mask reductions. * testsuite/experimental/simd/tests/math_1arg.cc: New file. Test 1-arg math functions on simd. * testsuite/experimental/simd/tests/math_2arg.cc: New file. Test 2-arg math functions on simd. * testsuite/experimental/simd/tests/operator_cvt.cc: New file. Test that implicit conversions on simd binary operators behave as specified. * testsuite/experimental/simd/tests/operators.cc: New file.
Test simd compares, subscripts, not, unary minus, plus, minus, multiplies, divides, increment, and decrement. * testsuite/experimental/simd/tests/reductions.cc: New file. Test reduce(simd). * testsuite/experimental/simd/tests/remqo.cc: New file. Test remquo(simd). * testsuite/experimental/simd/tests/simd.cc: New file. Basic sanity checks of simd types. * testsuite/experimental/simd/tests/sincos.cc: New file. Test sin(simd) and cos(simd). * testsuite/experimental/simd/tests/split_concat.cc: New file. Test split(simd) and concat(simd, simd). * testsuite/experimental/simd/tests/splits.cc: New file. Test split(simd_mask). * testsuite/experimental/simd/tests/trigonometric.cc: New file. Test remaining trigonometric functions on simd. * testsuite/experimental/simd/tests/trunc_ceil_floor.cc: New file. Test trunc(simd), ceil(simd), and floor(simd). * testsuite/experimental/simd/tests/where.cc: New file. Test masked operations using where.	2021-01-27 16:37:26 +00:00
Matthias Kretz	2bcceb6fc5	libstdc++: Add std::experimental::simd from the Parallelism TS 2 Adds <experimental/simd>. This implements the simd and simd_mask class templates via [[gnu::vector_size(N)]] data members. It implements overloads for all of <cmath> for simd. Explicit vectorization of the <cmath> functions is not finished. The majority of functions are marked as [[gnu::always_inline]] to enable quasi-ODR-conforming linking of TUs with different -m flags. Performance optimization was done for x86_64. ARM, AArch64, and POWER rely on the compiler to recognize reduction, conversion, and shuffle patterns. Besides verification using many different machine flags, the code was also verified with different fast-math flags. libstdc++-v3/ChangeLog: * doc/xml/manual/status_cxx2017.xml: Add implementation status of the Parallelism TS 2. Document implementation-defined types and behavior. * include/Makefile.am: Add new headers. * include/Makefile.in: Regenerate. * include/experimental/simd: New file. New header for Parallelism TS 2. * include/experimental/bits/numeric_traits.h: New file. Implementation of P1841R1 using internal naming. Addition of missing IEC559 functionality query. * include/experimental/bits/simd.h: New file. Definition of the public simd interfaces and general implementation helpers. * include/experimental/bits/simd_builtin.h: New file. Implementation of the _VecBuiltin simd_abi. * include/experimental/bits/simd_converter.h: New file. Generic simd conversions. * include/experimental/bits/simd_detail.h: New file. Internal macros for the simd implementation. * include/experimental/bits/simd_fixed_size.h: New file. Simd fixed_size ABI specific implementations. * include/experimental/bits/simd_math.h: New file. Math overloads for simd. * include/experimental/bits/simd_neon.h: New file. Simd NEON specific implementations. * include/experimental/bits/simd_ppc.h: New file. Implement bit shifts to avoid invalid results for integral types smaller than int.
* include/experimental/bits/simd_scalar.h: New file. Simd scalar ABI specific implementations. * include/experimental/bits/simd_x86.h: New file. Simd x86 specific implementations. * include/experimental/bits/simd_x86_conversions.h: New file. x86 specific conversion optimizations. The conversion patterns work around missing conversion patterns in the compiler and should be removed as soon as PR85048 is resolved. * testsuite/experimental/simd/standard_abi_usable.cc: New file. Test that all (not all fixed_size<N>, though) standard simd and simd_mask types are usable. * testsuite/experimental/simd/standard_abi_usable_2.cc: New file. As above but with -ffast-math. * testsuite/libstdc++-dg/conformance.exp: Don't build simd tests from the standard test loop. Instead use check_vect_support_and_set_flags to build simd tests with the relevant machine flags.	2021-01-27 16:37:26 +00:00
Richard Biener	c91db798ec	tree-optimization/98854 - avoid some PHI BB vectorization This avoids cases of PHI node vectorization that just cause us to insert vector CTORs inside loops for values only required outside of the loop. 2021-01-27 Richard Biener <rguenther@suse.de> PR tree-optimization/98854 * tree-vect-slp.c (vect_build_slp_tree_2): Also build PHIs from scalars when the number of CTORs matches the number of children. * gcc.dg/vect/bb-slp-pr98854.c: New testcase.	2021-01-27 17:33:34 +01:00
Jonathan Wright	3fd10728cb	aarch64: Use RTL builtins for integer mls_n intrinsics Rewrite integer mls_n Neon intrinsics to use RTL builtins rather than inline assembly code, allowing for better scheduling and optimization. gcc/ChangeLog: 2021-01-15 Jonathan Wright <jonathan.wright@arm.com> * config/aarch64/aarch64-simd-builtins.def: Add mls_n builtin generator macro. * config/aarch64/aarch64-simd.md (aarch64_mls_elt_merge<mode>): Rename to... (aarch64_mls_n<mode>): This. * config/aarch64/arm_neon.h (vmls_n_s16): Use RTL builtin instead of asm. (vmls_n_s32): Likewise. (vmls_n_u16): Likewise. (vmls_n_u32): Likewise. (vmlsq_n_s16): Likewise. (vmlsq_n_s32): Likewise. (vmlsq_n_u16): Likewise. (vmlsq_n_u32): Likewise.	2021-01-27 15:55:55 +00:00
Jonathan Wright	d2201ac0df	aarch64: Use RTL builtins for integer mls intrinsics Rewrite integer mls Neon intrinsics to use RTL builtins rather than inline assembly code, allowing for better scheduling and optimization. gcc/ChangeLog: 2021-01-11 Jonathan Wright <jonathan.wright@arm.com> * config/aarch64/aarch64-simd-builtins.def: Add mls builtin generator macro. * config/aarch64/arm_neon.h (vmls_s8): Use RTL builtin rather than asm. (vmls_s16): Likewise. (vmls_s32): Likewise. (vmls_u8): Likewise. (vmls_u16): Likewise. (vmls_u32): Likewise. (vmlsq_s8): Likewise. (vmlsq_s16): Likewise. (vmlsq_s32): Likewise. (vmlsq_u8): Likewise. (vmlsq_u16): Likewise. (vmlsq_u32): Likewise.	2021-01-27 14:59:08 +00:00
Jonathan Wakely	a199da782f	libstdc++: Optimize std::string_view::find [PR 66414] This reuses the code from std::string::find, which was improved by r244225, but string_view was not changed to match. libstdc++-v3/ChangeLog: PR libstdc++/66414 * include/bits/string_view.tcc (basic_string_view::find(const CharT*, size_type, size_type)): Optimize.	2021-01-27 13:45:52 +00:00
Jonathan Wright	9d66505a5d	aarch64: Use RTL builtins for integer mla_n intrinsics Rewrite integer mla_n Neon intrinsics to use RTL builtins rather than inline assembly code, allowing for better scheduling and optimization. gcc/ChangeLog: 2021-01-15 Jonathan Wright <jonathan.wright@arm.com> * config/aarch64/aarch64-simd-builtins.def: Add mla_n builtin generator macro. * config/aarch64/aarch64-simd.md (aarch64_mla_elt_merge<mode>): Rename to... (aarch64_mla_n<mode>): This. * config/aarch64/arm_neon.h (vmla_n_s16): Use RTL builtin instead of asm. (vmla_n_s32): Likewise. (vmla_n_u16): Likewise. (vmla_n_u32): Likewise. (vmlaq_n_s16): Likewise. (vmlaq_n_s32): Likewise. (vmlaq_n_u16): Likewise. (vmlaq_n_u32): Likewise.	2021-01-27 12:44:49 +00:00


183131 Commits