OpenE2K/gcc - gcc - Expired Mentality Git

Commit Graph

Author	SHA1	Message	Date
Jakub Jelinek	463d910876	widening_mul, i386: Improve spaceship expansion on x86 [PR103973] C++20: #include <compare> auto cmp4way(double a, double b) { return a <=> b; } expands to: ucomisd %xmm1, %xmm0 jp .L8 movl $0, %eax jne .L8 .L2: ret .p2align 4,,10 .p2align 3 .L8: comisd %xmm0, %xmm1 movl $-1, %eax ja .L2 ucomisd %xmm1, %xmm0 setbe %al addl $1, %eax ret That is 3 comparisons of the same operands. The following patch improves it to just one comparison: comisd %xmm1, %xmm0 jp .L4 seta %al movl $0, %edx leal -1(%rax,%rax), %eax cmove %edx, %eax ret .L4: movl $2, %eax ret While a <=> b expands to a == b ? 0 : a < b ? -1 : a > b ? 1 : 2 where the first comparison is equality and this shouldn't raise exceptions on qNaN operands, if the operands aren't equal (which includes unordered cases), then it immediately performs < or > comparison and that raises exceptions even on qNaNs, so we can just perform a single comparison that raises exceptions on qNaN. As the 4 different cases are encoded as ZF CF PF 1 1 1 a unordered b 0 0 0 a > b 0 1 0 a < b 1 0 0 a == b we can emit optimal sequence of comparions, first jp for the unordered case, then je for the == case and finally jb for the < case. The patch pattern recognizes spaceship-like comparisons during widening_mul if the spaceship optab is implemented, and replaces those comparisons with comparisons of .SPACESHIP ifn which returns -1/0/1/2 based on the comparison. This seems to work well both for the case of just returning the -1/0/1/2 (when we have just a common successor with a PHI) or when the different cases are handled with various other basic blocks. The testcases cover both of those cases, the latter with different function calls in those. 2022-01-17 Jakub Jelinek <jakub@redhat.com> PR target/103973 * tree-cfg.h (cond_only_block_p): Declare. * tree-ssa-phiopt.c (cond_only_block_p): Move function to ... * tree-cfg.c (cond_only_block_p): ... here. No longer static. * optabs.def (spaceship_optab): New optab. * internal-fn.def (SPACESHIP): New internal function. * internal-fn.h (expand_SPACESHIP): Declare. * internal-fn.c (expand_PHI): Formatting fix. (expand_SPACESHIP): New function. * tree-ssa-math-opts.c (optimize_spaceship): New function. (math_opts_dom_walker::after_dom_children): Use it. * config/i386/i386.md (spaceship<mode>3): New define_expand. * config/i386/i386-protos.h (ix86_expand_fp_spaceship): Declare. * config/i386/i386-expand.c (ix86_expand_fp_spaceship): New function. * doc/md.texi (spaceship@var{m}3): Document. * gcc.target/i386/pr103973-1.c: New test. * gcc.target/i386/pr103973-2.c: New test. * gcc.target/i386/pr103973-3.c: New test. * gcc.target/i386/pr103973-4.c: New test. * gcc.target/i386/pr103973-5.c: New test. * gcc.target/i386/pr103973-6.c: New test. * gcc.target/i386/pr103973-7.c: New test. * gcc.target/i386/pr103973-8.c: New test. * gcc.target/i386/pr103973-9.c: New test. * gcc.target/i386/pr103973-10.c: New test. * gcc.target/i386/pr103973-11.c: New test. * gcc.target/i386/pr103973-12.c: New test. * gcc.target/i386/pr103973-13.c: New test. * gcc.target/i386/pr103973-14.c: New test. * gcc.target/i386/pr103973-15.c: New test. * gcc.target/i386/pr103973-16.c: New test. * gcc.target/i386/pr103973-17.c: New test. * gcc.target/i386/pr103973-18.c: New test. * gcc.target/i386/pr103973-19.c: New test. * gcc.target/i386/pr103973-20.c: New test. * g++.target/i386/pr103973-1.C: New test. * g++.target/i386/pr103973-2.C: New test. * g++.target/i386/pr103973-3.C: New test. * g++.target/i386/pr103973-4.C: New test. * g++.target/i386/pr103973-5.C: New test. * g++.target/i386/pr103973-6.C: New test. * g++.target/i386/pr103973-7.C: New test. * g++.target/i386/pr103973-8.C: New test. * g++.target/i386/pr103973-9.C: New test. * g++.target/i386/pr103973-10.C: New test. * g++.target/i386/pr103973-11.C: New test. * g++.target/i386/pr103973-12.C: New test. * g++.target/i386/pr103973-13.C: New test. * g++.target/i386/pr103973-14.C: New test. * g++.target/i386/pr103973-15.C: New test. * g++.target/i386/pr103973-16.C: New test. * g++.target/i386/pr103973-17.C: New test. * g++.target/i386/pr103973-18.C: New test. * g++.target/i386/pr103973-19.C: New test. * g++.target/i386/pr103973-20.C: New test.	2022-01-17 13:39:05 +01:00
Jakub Jelinek	6362627b27	i386, fab: Optimize __atomic_{add,sub,and,or,xor}_fetch (x, y, z) {==,!=,<,<=,>,>=} 0 [PR98737] On Wed, Jan 27, 2021 at 12:27:13PM +0100, Ulrich Drepper via Gcc-patches wrote: > On 1/27/21 11:37 AM, Jakub Jelinek wrote: > > Would equality comparison against 0 handle the most common cases. > > > > The user can write it as > > __atomic_sub_fetch (x, y, z) == 0 > > or > > __atomic_fetch_sub (x, y, z) - y == 0 > > thouch, so the expansion code would need to be able to cope with both. > > Please also keep !=0, <0, <=0, >0, and >=0 in mind. They all can be > useful and can be handled with the flags. <= 0 and > 0 don't really work well with lock {add,sub,inc,dec}, x86 doesn't have comparisons that would look solely at both SF and ZF and not at other flags (and emitting two separate conditional jumps or two setcc insns and oring them together looks awful). But the rest can work. Here is a patch that adds internal functions and optabs for these, recognizes them at the same spot as e.g. .ATOMIC_BIT_TEST_AND* internal functions (fold all builtins pass) and expands them appropriately (or for the <= 0 and > 0 cases of +/- FAILs and let's middle-end fall back). So far I have handled just the op_fetch builtins, IMHO instead of handling also __atomic_fetch_sub (x, y, z) - y == 0 etc. we should canonicalize __atomic_fetch_sub (x, y, z) - y to __atomic_sub_fetch (x, y, z) (and vice versa). 2022-01-03 Jakub Jelinek <jakub@redhat.com> PR target/98737 * internal-fn.def (ATOMIC_ADD_FETCH_CMP_0, ATOMIC_SUB_FETCH_CMP_0, ATOMIC_AND_FETCH_CMP_0, ATOMIC_OR_FETCH_CMP_0, ATOMIC_XOR_FETCH_CMP_0): New internal fns. * internal-fn.h (ATOMIC_OP_FETCH_CMP_0_EQ, ATOMIC_OP_FETCH_CMP_0_NE, ATOMIC_OP_FETCH_CMP_0_LT, ATOMIC_OP_FETCH_CMP_0_LE, ATOMIC_OP_FETCH_CMP_0_GT, ATOMIC_OP_FETCH_CMP_0_GE): New enumerators. * internal-fn.c (expand_ATOMIC_ADD_FETCH_CMP_0, expand_ATOMIC_SUB_FETCH_CMP_0, expand_ATOMIC_AND_FETCH_CMP_0, expand_ATOMIC_OR_FETCH_CMP_0, expand_ATOMIC_XOR_FETCH_CMP_0): New functions. * optabs.def (atomic_add_fetch_cmp_0_optab, atomic_sub_fetch_cmp_0_optab, atomic_and_fetch_cmp_0_optab, atomic_or_fetch_cmp_0_optab, atomic_xor_fetch_cmp_0_optab): New direct optabs. * builtins.h (expand_ifn_atomic_op_fetch_cmp_0): Declare. * builtins.c (expand_ifn_atomic_op_fetch_cmp_0): New function. * tree-ssa-ccp.c: Include internal-fn.h. (optimize_atomic_bit_test_and): Add . before internal fn call in function comment. Change return type from void to bool and return true only if successfully replaced. (optimize_atomic_op_fetch_cmp_0): New function. (pass_fold_builtins::execute): Use optimize_atomic_op_fetch_cmp_0 for BUILT_IN_ATOMIC_{ADD,SUB,AND,OR,XOR}_FETCH_{1,2,4,8,16} and BUILT_IN_SYNC_{ADD,SUB,AND,OR,XOR}_AND_FETCH_{1,2,4,8,16}, for XOR ones only if optimize_atomic_bit_test_and failed. * config/i386/sync.md (atomic_<plusminus_mnemonic>_fetch_cmp_0<mode>, atomic_<logic>_fetch_cmp_0<mode>): New define_expand patterns. (atomic_add_fetch_cmp_0<mode>_1, atomic_sub_fetch_cmp_0<mode>_1, atomic_<logic>_fetch_cmp_0<mode>_1): New define_insn patterns. * doc/md.texi (atomic_add_fetch_cmp_0<mode>, atomic_sub_fetch_cmp_0<mode>, atomic_and_fetch_cmp_0<mode>, atomic_or_fetch_cmp_0<mode>, atomic_xor_fetch_cmp_0<mode>): Document new named patterns. * gcc.target/i386/pr98737-1.c: New test. * gcc.target/i386/pr98737-2.c: New test. * gcc.target/i386/pr98737-3.c: New test. * gcc.target/i386/pr98737-4.c: New test. * gcc.target/i386/pr98737-5.c: New test. * gcc.target/i386/pr98737-6.c: New test. * gcc.target/i386/pr98737-7.c: New test.	2022-01-03 14:17:26 +01:00
Jakub Jelinek	7adcbafe45	Update copyright years.	2022-01-03 10:42:10 +01:00
Richard Sandiford	e32b9eb32d	vect: Add support for fmax and fmin reductions This patch adds support for reductions involving calls to fmax() and fmin(), without the -ffast-math flags that allow them to be converted to MAX_EXPR and MIN_EXPR. gcc/ * doc/md.texi (reduc_fmin_scal_@var{m}): Document. (reduc_fmax_scal_@var{m}): Likewise. * optabs.def (reduc_fmax_scal_optab): New optab. (reduc_fmin_scal_optab): Likewise * internal-fn.def (REDUC_FMAX, REDUC_FMIN): New functions. * tree-vect-loop.c (reduction_fn_for_scalar_code): Handle CASE_CFN_FMAX and CASE_CFN_FMIN. (neutral_op_for_reduction): Likewise. (needs_fold_left_reduction_p): Likewise. * config/aarch64/iterators.md (FMAXMINV): New iterator. (fmaxmin): Handle UNSPEC_FMAXNMV and UNSPEC_FMINNMV. * config/aarch64/aarch64-simd.md (reduc_<optab>_scal_<mode>): Fix unspec mode. (reduc_<fmaxmin>_scal_<mode>): New pattern. * config/aarch64/aarch64-sve.md (reduc_<fmaxmin>_scal_<mode>): Likewise. gcc/testsuite/ * gcc.dg/vect/vect-fmax-1.c: New test. * gcc.dg/vect/vect-fmax-2.c: Likewise. * gcc.dg/vect/vect-fmax-3.c: Likewise. * gcc.dg/vect/vect-fmin-1.c: New test. * gcc.dg/vect/vect-fmin-2.c: Likewise. * gcc.dg/vect/vect-fmin-3.c: Likewise. * gcc.target/aarch64/fmaxnm_1.c: Likewise. * gcc.target/aarch64/fmaxnm_2.c: Likewise. * gcc.target/aarch64/fminnm_1.c: Likewise. * gcc.target/aarch64/fminnm_2.c: Likewise. * gcc.target/aarch64/sve/fmaxnm_2.c: Likewise. * gcc.target/aarch64/sve/fmaxnm_3.c: Likewise. * gcc.target/aarch64/sve/fminnm_2.c: Likewise. * gcc.target/aarch64/sve/fminnm_3.c: Likewise.	2021-11-30 09:52:25 +00:00
Richard Sandiford	7061300025	Add IFN_COND_FMIN/FMAX functions This patch adds conditional forms of FMAX and FMIN, following the pattern for existing conditional binary functions. gcc/ * doc/md.texi (cond_fmin@var{mode}, cond_fmax@var{mode}): Document. * optabs.def (cond_fmin_optab, cond_fmax_optab): New optabs. * internal-fn.def (COND_FMIN, COND_FMAX): New functions. * internal-fn.c (first_commutative_argument): Handle them. (FOR_EACH_COND_FN_PAIR): Likewise. * match.pd (UNCOND_BINARY, COND_BINARY): Likewise. * config/aarch64/aarch64-sve.md (cond_<fmaxmin><mode>): New pattern. gcc/testsuite/ * gcc.target/aarch64/sve/cond_fmaxnm_5.c: New test. * gcc.target/aarch64/sve/cond_fmaxnm_5_run.c: Likewise. * gcc.target/aarch64/sve/cond_fmaxnm_6.c: Likewise. * gcc.target/aarch64/sve/cond_fmaxnm_6_run.c: Likewise. * gcc.target/aarch64/sve/cond_fmaxnm_7.c: Likewise. * gcc.target/aarch64/sve/cond_fmaxnm_7_run.c: Likewise. * gcc.target/aarch64/sve/cond_fmaxnm_8.c: Likewise. * gcc.target/aarch64/sve/cond_fmaxnm_8_run.c: Likewise. * gcc.target/aarch64/sve/cond_fminnm_5.c: Likewise. * gcc.target/aarch64/sve/cond_fminnm_5_run.c: Likewise. * gcc.target/aarch64/sve/cond_fminnm_6.c: Likewise. * gcc.target/aarch64/sve/cond_fminnm_6_run.c: Likewise. * gcc.target/aarch64/sve/cond_fminnm_7.c: Likewise. * gcc.target/aarch64/sve/cond_fminnm_7_run.c: Likewise. * gcc.target/aarch64/sve/cond_fminnm_8.c: Likewise. * gcc.target/aarch64/sve/cond_fminnm_8_run.c: Likewise.	2021-11-17 12:28:44 +00:00
prathamesh.kulkarni	20dcda98ed	[sve] PR93183 - Add support for conditional neg. gcc/ChangeLog: PR target/93183 * gimple-match-head.c (try_conditional_simplification): Add case for single operand. * internal-fn.def: Add entry for COND_NEG internal function. * internal-fn.c (FOR_EACH_CODE_MAPPING): Add entry for NEGATE_EXPR, COND_NEG mapping. * optabs.def: Add entry for cond_neg_optab. * match.pd (UNCOND_UNARY, COND_UNARY): New operator lists. (vec_cond COND (foo A) B) -> (IFN_COND_FOO COND A B): New pattern. (vec_cond COND B (foo A)) -> (IFN_COND_FOO ~COND A B): Likewise. gcc/testsuite/ChangeLog: PR target/93183 * gcc.target/aarch64/sve/cond_unary_4.c: Adjust. * gcc.target/aarch64/sve/pr93183.c: New test.	2021-10-18 15:44:06 +05:30
Stefan Schulze Frielinghaus	6f966f0614	ldist: Recognize strlen and rawmemchr like loops This patch adds support for recognizing loops which mimic the behaviour of functions strlen and rawmemchr, and replaces those with internal function calls in case a target provides them. In contrast to the standard strlen and rawmemchr functions, this patch also supports different instances where the memory pointed to is interpreted as 8, 16, and 32-bit sized, respectively. gcc/ChangeLog: * builtins.c (get_memory_rtx): Change to external linkage. * builtins.h (get_memory_rtx): Add function prototype. * doc/md.texi (rawmemchr<mode>): Document. * internal-fn.c (expand_RAWMEMCHR): Define. * internal-fn.def (RAWMEMCHR): Add. * optabs.def (rawmemchr_optab): Add. * tree-loop-distribution.c (find_single_drs): Change return code behaviour by also returning true if no single store was found but a single load. (loop_distribution::classify_partition): Respect the new return code behaviour of function find_single_drs. (loop_distribution::execute): Call new function transform_reduction_loop in order to replace rawmemchr or strlen like loops by calls into builtins. (generate_reduction_builtin_1): New function. (generate_rawmemchr_builtin): New function. (generate_strlen_builtin_1): New function. (generate_strlen_builtin): New function. (generate_strlen_builtin_using_rawmemchr): New function. (reduction_var_overflows_first): New function. (determine_reduction_stmt_1): New function. (determine_reduction_stmt): New function. (loop_distribution::transform_reduction_loop): New function. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/ldist-rawmemchr-1.c: New test. * gcc.dg/tree-ssa/ldist-rawmemchr-2.c: New test. * gcc.dg/tree-ssa/ldist-strlen-1.c: New test. * gcc.dg/tree-ssa/ldist-strlen-2.c: New test. * gcc.dg/tree-ssa/ldist-strlen-3.c: New test.	2021-10-11 09:59:13 +02:00
qing zhao	a25e0b5e6a	Add -ftrivial-auto-var-init option and uninitialized variable attribute. Initialize automatic variables with either a pattern or with zeroes to increase the security and predictability of a program by preventing uninitialized memory disclosure and use. GCC still considers an automatic variable that doesn't have an explicit initializer as uninitialized, -Wuninitialized will still report warning messages on such automatic variables. With this option, GCC will also initialize any padding of automatic variables that have structure or union types to zeroes. You can control this behavior for a specific variable by using the variable attribute "uninitialized" to control runtime overhead. gcc/ChangeLog: 2021-09-09 qing zhao <qing.zhao@oracle.com> * builtins.c (expand_builtin_memset): Make external visible. * builtins.h (expand_builtin_memset): Declare extern. * common.opt (ftrivial-auto-var-init=): New option. * doc/extend.texi: Document the uninitialized attribute. * doc/invoke.texi: Document -ftrivial-auto-var-init. * flag-types.h (enum auto_init_type): New enumerated type auto_init_type. * gimple-fold.c (clear_padding_type): Add one new parameter. (clear_padding_union): Likewise. (clear_padding_emit_loop): Likewise. (clear_type_padding_in_mask): Likewise. (gimple_fold_builtin_clear_padding): Handle this new parameter. * gimplify.c (gimple_add_init_for_auto_var): New function. (gimple_add_padding_init_for_auto_var): New function. (is_var_need_auto_init): New function. (gimplify_decl_expr): Add initialization to automatic variables per users' requests. (gimplify_call_expr): Add one new parameter for call to __builtin_clear_padding. (gimplify_init_constructor): Add padding initialization in the end. * internal-fn.c (INIT_PATTERN_VALUE): New macro. (expand_DEFERRED_INIT): New function. * internal-fn.def (DEFERRED_INIT): New internal function. * tree-cfg.c (verify_gimple_call): Verify calls to .DEFERRED_INIT. * tree-sra.c (generate_subtree_deferred_init): New function. (scan_function): Avoid setting cannot_scalarize_away_bitmap for calls to .DEFERRED_INIT. (sra_modify_deferred_init): New function. (sra_modify_function_body): Handle calls to DEFERRED_INIT specially. * tree-ssa-structalias.c (find_func_aliases_for_call): Likewise. * tree-ssa-uninit.c (warn_uninit): Handle calls to DEFERRED_INIT specially. (check_defs): Likewise. (warn_uninitialized_vars): Likewise. * tree-ssa.c (ssa_undefined_value_p): Likewise. * tree.c (build_common_builtin_nodes): Build tree node for BUILT_IN_CLEAR_PADDING when needed. gcc/c-family/ChangeLog: 2021-09-09 qing zhao <qing.zhao@oracle.com> * c-attribs.c (handle_uninitialized_attribute): New function. (c_common_attribute_table): Add "uninitialized" attribute. gcc/testsuite/ChangeLog: 2021-09-09 qing zhao <qing.zhao@oracle.com> * c-c++-common/auto-init-1.c: New test. * c-c++-common/auto-init-10.c: New test. * c-c++-common/auto-init-11.c: New test. * c-c++-common/auto-init-12.c: New test. * c-c++-common/auto-init-13.c: New test. * c-c++-common/auto-init-14.c: New test. * c-c++-common/auto-init-15.c: New test. * c-c++-common/auto-init-16.c: New test. * c-c++-common/auto-init-2.c: New test. * c-c++-common/auto-init-3.c: New test. * c-c++-common/auto-init-4.c: New test. * c-c++-common/auto-init-5.c: New test. * c-c++-common/auto-init-6.c: New test. * c-c++-common/auto-init-7.c: New test. * c-c++-common/auto-init-8.c: New test. * c-c++-common/auto-init-9.c: New test. * c-c++-common/auto-init-esra.c: New test. * c-c++-common/auto-init-padding-1.c: New test. * c-c++-common/auto-init-padding-2.c: New test. * c-c++-common/auto-init-padding-3.c: New test. * g++.dg/auto-init-uninit-pred-1_a.C: New test. * g++.dg/auto-init-uninit-pred-2_a.C: New test. * g++.dg/auto-init-uninit-pred-3_a.C: New test. * g++.dg/auto-init-uninit-pred-4.C: New test. * gcc.dg/auto-init-sra-1.c: New test. * gcc.dg/auto-init-sra-2.c: New test. * gcc.dg/auto-init-uninit-1.c: New test. * gcc.dg/auto-init-uninit-12.c: New test. * gcc.dg/auto-init-uninit-13.c: New test. * gcc.dg/auto-init-uninit-14.c: New test. * gcc.dg/auto-init-uninit-15.c: New test. * gcc.dg/auto-init-uninit-16.c: New test. * gcc.dg/auto-init-uninit-17.c: New test. * gcc.dg/auto-init-uninit-18.c: New test. * gcc.dg/auto-init-uninit-19.c: New test. * gcc.dg/auto-init-uninit-2.c: New test. * gcc.dg/auto-init-uninit-20.c: New test. * gcc.dg/auto-init-uninit-21.c: New test. * gcc.dg/auto-init-uninit-22.c: New test. * gcc.dg/auto-init-uninit-23.c: New test. * gcc.dg/auto-init-uninit-24.c: New test. * gcc.dg/auto-init-uninit-25.c: New test. * gcc.dg/auto-init-uninit-26.c: New test. * gcc.dg/auto-init-uninit-3.c: New test. * gcc.dg/auto-init-uninit-34.c: New test. * gcc.dg/auto-init-uninit-36.c: New test. * gcc.dg/auto-init-uninit-37.c: New test. * gcc.dg/auto-init-uninit-4.c: New test. * gcc.dg/auto-init-uninit-5.c: New test. * gcc.dg/auto-init-uninit-6.c: New test. * gcc.dg/auto-init-uninit-8.c: New test. * gcc.dg/auto-init-uninit-9.c: New test. * gcc.dg/auto-init-uninit-A.c: New test. * gcc.dg/auto-init-uninit-B.c: New test. * gcc.dg/auto-init-uninit-C.c: New test. * gcc.dg/auto-init-uninit-H.c: New test. * gcc.dg/auto-init-uninit-I.c: New test. * gcc.target/aarch64/auto-init-1.c: New test. * gcc.target/aarch64/auto-init-2.c: New test. * gcc.target/aarch64/auto-init-3.c: New test. * gcc.target/aarch64/auto-init-4.c: New test. * gcc.target/aarch64/auto-init-5.c: New test. * gcc.target/aarch64/auto-init-6.c: New test. * gcc.target/aarch64/auto-init-7.c: New test. * gcc.target/aarch64/auto-init-8.c: New test. * gcc.target/aarch64/auto-init-padding-1.c: New test. * gcc.target/aarch64/auto-init-padding-10.c: New test. * gcc.target/aarch64/auto-init-padding-11.c: New test. * gcc.target/aarch64/auto-init-padding-12.c: New test. * gcc.target/aarch64/auto-init-padding-2.c: New test. * gcc.target/aarch64/auto-init-padding-3.c: New test. * gcc.target/aarch64/auto-init-padding-4.c: New test. * gcc.target/aarch64/auto-init-padding-5.c: New test. * gcc.target/aarch64/auto-init-padding-6.c: New test. * gcc.target/aarch64/auto-init-padding-7.c: New test. * gcc.target/aarch64/auto-init-padding-8.c: New test. * gcc.target/aarch64/auto-init-padding-9.c: New test. * gcc.target/i386/auto-init-1.c: New test. * gcc.target/i386/auto-init-2.c: New test. * gcc.target/i386/auto-init-21.c: New test. * gcc.target/i386/auto-init-22.c: New test. * gcc.target/i386/auto-init-23.c: New test. * gcc.target/i386/auto-init-24.c: New test. * gcc.target/i386/auto-init-3.c: New test. * gcc.target/i386/auto-init-4.c: New test. * gcc.target/i386/auto-init-5.c: New test. * gcc.target/i386/auto-init-6.c: New test. * gcc.target/i386/auto-init-7.c: New test. * gcc.target/i386/auto-init-8.c: New test. * gcc.target/i386/auto-init-padding-1.c: New test. * gcc.target/i386/auto-init-padding-10.c: New test. * gcc.target/i386/auto-init-padding-11.c: New test. * gcc.target/i386/auto-init-padding-12.c: New test. * gcc.target/i386/auto-init-padding-2.c: New test. * gcc.target/i386/auto-init-padding-3.c: New test. * gcc.target/i386/auto-init-padding-4.c: New test. * gcc.target/i386/auto-init-padding-5.c: New test. * gcc.target/i386/auto-init-padding-6.c: New test. * gcc.target/i386/auto-init-padding-7.c: New test. * gcc.target/i386/auto-init-padding-8.c: New test. * gcc.target/i386/auto-init-padding-9.c: New test.	2021-09-09 15:44:49 -07:00
Kewen Lin	a1d2756077	vect: Recog mul_highpart pattern [PR100696] This patch is to extend the existing pattern mulhs handlings to cover normal multiply highpart pattern recognization, it introduces one new internal function IFN_MULH for 1:1 map to [su]mul_highpart optab. Since it covers MULT_HIGHPART_EXPR with optab support, i386 part change is to ensure it follows the consistent costing path. Bootstrapped & regtested on powerpc64le-linux-gnu P9, x86_64-redhat-linux and aarch64-linux-gnu. gcc/ChangeLog: PR tree-optimization/100696 * internal-fn.c (first_commutative_argument): Add info for IFN_MULH. * internal-fn.def (IFN_MULH): New internal function. * tree-vect-patterns.c (vect_recog_mulhs_pattern): Add support to recog normal multiply highpart as IFN_MULH. * config/i386/i386.c (ix86_add_stmt_cost): Adjust for combined function CFN_MULH. gcc/testsuite/ChangeLog: PR tree-optimization/100696 * gcc.target/i386/pr100637-3w.c: Adjust for mul_highpart recog.	2021-07-19 20:49:17 -05:00
Richard Biener	7d810646d4	Add FMADDSUB and FMSUBADD SLP vectorization patterns and optabs This adds named expanders for vec_fmaddsub<mode>4 and vec_fmsubadd<mode>4 which map to x86 vfmaddsubXXXp{ds} and vfmsubaddXXXp{ds} instructions. This complements the previous addition of ADDSUB support. x86 lacks SUBADD and the negate variants of FMA with mixed plus minus so I did not add optabs or patterns for those but it would not be difficult if there's a target that has them. 2021-07-05 Richard Biener <rguenther@suse.de> * doc/md.texi (vec_fmaddsub<mode>4): Document. (vec_fmsubadd<mode>4): Likewise. * optabs.def (vec_fmaddsub$a4): Add. (vec_fmsubadd$a4): Likewise. * internal-fn.def (IFN_VEC_FMADDSUB): Add. (IFN_VEC_FMSUBADD): Likewise. * tree-vect-slp-patterns.c (addsub_pattern::recognize): Refactor to handle IFN_VEC_FMADDSUB and IFN_VEC_FMSUBADD. (addsub_pattern::build): Likewise. * tree-vect-slp.c (vect_optimize_slp): CFN_VEC_FMADDSUB and CFN_VEC_FMSUBADD are not transparent for permutes. * config/i386/sse.md (vec_fmaddsub<mode>4): New expander. (vec_fmsubadd<mode>4): Likewise. * gcc.target/i386/vect-fmaddsubXXXpd.c: New testcase. * gcc.target/i386/vect-fmaddsubXXXps.c: Likewise. * gcc.target/i386/vect-fmsubaddXXXpd.c: Likewise. * gcc.target/i386/vect-fmsubaddXXXps.c: Likewise.	2021-07-06 11:56:47 +02:00
Richard Biener	7a6c31f0f8	Add x86 addsub SLP pattern This addds SLP pattern recognition for the SSE3/AVX [v]addsubp{ds} v0, v1 instructions which compute { v0[0] - v1[0], v0[1], + v1[1], ... } thus subtract, add alternating on lanes, starting with subtract. It adds a corresponding optab and direct internal function, vec_addsub$a3 and renames the existing i386 backend patterns to the new canonical name. The SLP pattern matches the exact alternating lane sequence rather than trying to be clever and anticipating incoming permutes - we could permute the two input vectors to the needed lane alternation, do the addsub and then permute the result vector back but that's only profitable in case the two input or the output permute will vanish - something Tamars refactoring of SLP pattern recog should make possible. 2021-06-17 Richard Biener <rguenther@suse.de> * config/i386/sse.md (avx_addsubv4df3): Rename to vec_addsubv4df3. (avx_addsubv8sf3): Rename to vec_addsubv8sf3. (sse3_addsubv2df3): Rename to vec_addsubv2df3. (sse3_addsubv4sf3): Rename to vec_addsubv4sf3. * config/i386/i386-builtin.def: Adjust. * internal-fn.def (VEC_ADDSUB): New internal optab fn. * optabs.def (vec_addsub_optab): New optab. * tree-vect-slp-patterns.c (class addsub_pattern): New. (slp_patterns): Add addsub_pattern. * tree-vect-slp.c (vect_optimize_slp): Disable propagation across CFN_VEC_ADDSUB. * tree-vectorizer.h (vect_pattern::vect_pattern): Make m_ops optional. * doc/md.texi (vec_addsub<mode>3): Document. * gcc.target/i386/vect-addsubv2df.c: New testcase. * gcc.target/i386/vect-addsubv4sf.c: Likewise. * gcc.target/i386/vect-addsubv4df.c: Likewise. * gcc.target/i386/vect-addsubv8sf.c: Likewise. * gcc.target/i386/vect-addsub-2.c: Likewise. * gcc.target/i386/vect-addsub-3.c: Likewise.	2021-06-24 13:08:25 +02:00
Richard Biener	ef8176e0fa	c++/88601 - [C/C++] __builtin_shufflevector support This adds support for the clang __builtin_shufflevector extension to the C and C++ frontends. The builtin is lowered to VEC_PERM_EXPR. Because VEC_PERM_EXPR does not support different sized vector inputs or result or the special permute index of -1 (don't-care) c_build_shufflevector applies lowering by widening inputs and output to the widest vector, replacing -1 by a defined index and subsetting the final vector if we produced a wider result than desired. Code generation thus can be sub-optimal, followup patches will aim to fix that by recovering from part of the missing features during RTL expansion and by relaxing the constraints of the GIMPLE IL with regard to VEC_PERM_EXPR. 2021-05-21 Richard Biener <rguenther@suse.de> PR c++/88601 gcc/c-family/ * c-common.c: Include tree-vector-builder.h and vec-perm-indices.h. (c_common_reswords): Add __builtin_shufflevector. (c_build_shufflevector): New funtion. * c-common.h (enum rid): Add RID_BUILTIN_SHUFFLEVECTOR. (c_build_shufflevector): Declare. gcc/c/ * c-decl.c (names_builtin_p): Handle RID_BUILTIN_SHUFFLEVECTOR. * c-parser.c (c_parser_postfix_expression): Likewise. gcc/cp/ * cp-objcp-common.c (names_builtin_p): Handle RID_BUILTIN_SHUFFLEVECTOR. * cp-tree.h (build_x_shufflevector): Declare. * parser.c (cp_parser_postfix_expression): Handle RID_BUILTIN_SHUFFLEVECTOR. * pt.c (tsubst_copy_and_build): Handle IFN_SHUFFLEVECTOR. * typeck.c (build_x_shufflevector): Build either a lowered VEC_PERM_EXPR or an unlowered shufflevector via a temporary internal function IFN_SHUFFLEVECTOR. gcc/ * internal-fn.c (expand_SHUFFLEVECTOR): Define. * internal-fn.def (SHUFFLEVECTOR): New. * internal-fn.h (expand_SHUFFLEVECTOR): Declare. * doc/extend.texi: Document __builtin_shufflevector. gcc/testsuite/ * c-c++-common/builtin-shufflevector-2.c: New testcase. * c-c++-common/torture/builtin-shufflevector-1.c: Likewise. * g++.dg/ext/builtin-shufflevector-1.C: Likewise. * g++.dg/ext/builtin-shufflevector-2.C: Likewise.	2021-05-31 08:46:04 +02:00
Tamar Christina	478e571a3e	slp: support complex FMS and complex FMS conjugate This adds support for FMS and FMS conjugated to the slp pattern matcher. Example of matches: #include <stdio.h> #include <complex.h> #define N 200 #define ROT #define TYPE float #define TYPE2 float void g (TYPE2 complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N]) { for (int i=0; i < N; i++) { c[i] -= a[i] * (b[i] ROT); } } void g_f1 (TYPE2 complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N]) { for (int i=0; i < N; i++) { c[i] -= conjf (a[i]) * (b[i]); } } void g_s1 (TYPE2 complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N]) { for (int i=0; i < N; i++) { c[i] -= a[i] * conjf (b[i] ROT); } } void caxpy_sub(double complex * restrict y, double complex * restrict x, size_t N, double complex f) { for (size_t i = 0; i < N; ++i) y[i] -= x[i]* f; } gcc/ChangeLog: * internal-fn.def (COMPLEX_FMS, COMPLEX_FMS_CONJ): New. * optabs.def (cmls_optab, cmls_conj_optab): New. * doc/md.texi: Document them. * tree-vect-slp-patterns.c (class complex_fms_pattern, complex_fms_pattern::matches, complex_fms_pattern::recognize, complex_fms_pattern::build): New.	2021-01-14 20:59:12 +00:00
Tamar Christina	31fac31800	slp: support complex FMA and complex FMA conjugate This adds support for FMA and FMA conjugated to the slp pattern matcher. Example of instructions matched: #include <stdio.h> #include <complex.h> #define N 200 #define ROT #define TYPE float #define TYPE2 float void g (TYPE2 complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N]) { for (int i=0; i < N; i++) { c[i] += a[i] * (b[i] ROT); } } void g_f1 (TYPE2 complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N]) { for (int i=0; i < N; i++) { c[i] += conjf (a[i]) * (b[i] ROT); } } void g_s1 (TYPE2 complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N]) { for (int i=0; i < N; i++) { c[i] += a[i] * conjf (b[i] ROT); } } void caxpy_add(double complex * restrict y, double complex * restrict x, size_t N, double complex f) { for (size_t i = 0; i < N; ++i) y[i] += x[i]* f; } gcc/ChangeLog: * internal-fn.def (COMPLEX_FMA, COMPLEX_FMA_CONJ): New. * optabs.def (cmla_optab, cmla_conj_optab): New. * doc/md.texi: Document them. * tree-vect-slp-patterns.c (vect_match_call_p, class complex_fma_pattern, vect_slp_reset_pattern, complex_fma_pattern::matches, complex_fma_pattern::recognize, complex_fma_pattern::build): New.	2021-01-14 20:58:12 +00:00
Tamar Christina	e09173d84d	slp: support complex multiply and complex multiply conjugate This adds support for complex multiply and complex multiply and accumulate to the vect pattern detector. Example of instructions matched: #include <stdio.h> #include <complex.h> #define N 200 #define ROT #define TYPE float #define TYPE2 float void g (TYPE2 complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N]) { for (int i=0; i < N; i++) { c[i] = a[i] * (b[i] ROT); } } void g_f1 (TYPE2 complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N]) { for (int i=0; i < N; i++) { c[i] = conjf (a[i]) * (b[i] ROT); } } void g_s1 (TYPE2 complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N]) { for (int i=0; i < N; i++) { c[i] = a[i] * conjf (b[i] ROT); } } gcc/ChangeLog: * internal-fn.def (COMPLEX_MUL, COMPLEX_MUL_CONJ): New. * optabs.def (cmul_optab, cmul_conj_optab): New. * doc/md.texi: Document them. * tree-vect-slp-patterns.c (vect_match_call_complex_mla, vect_normalize_conj_loc, is_eq_or_top, vect_validate_multiplication, vect_build_combine_node, class complex_mul_pattern, complex_mul_pattern::matches, complex_mul_pattern::recognize, complex_mul_pattern::build): New.	2021-01-14 20:57:17 +00:00
Richard Sandiford	298e76e656	gimple-isel: Check whether IFN_VCONDEQ is supported [PR98560] This patch follows on from the previous one for the PR and makes sure that we can handle == as well as <. Previously we assumed without checking that IFN_VCONDEQ was available if IFN_VCOND or IFN_VCONDU wasn't. The patch also fixes the definition of the IFN_VCOND* functions. The optabs are convert optabs in which the first mode is the data mode and the second mode is the comparison or mask mode. gcc/ PR tree-optimization/98560 * internal-fn.def (IFN_VCONDU, IFN_VCONDEQ): Use type vec_cond. * internal-fn.c (vec_cond_mask_direct): Get the data mode from argument 1. (vec_cond_direct): Likewise argument 2. (vec_condu_direct, vec_condeq_direct): Delete. (expand_vect_cond_optab_fn): Rename to... (expand_vec_cond_optab_fn): ...this, replacing old macro. (expand_vec_condu_optab_fn, expand_vec_condeq_optab_fn): Delete. (expand_vect_cond_mask_optab_fn): Rename to... (expand_vec_cond_mask_optab_fn): ...this, replacing old macro. (direct_vec_cond_mask_optab_supported_p): Treat the optab as a convert optab. (direct_vec_cond_optab_supported_p): Likewise. (direct_vec_condu_optab_supported_p): Delete. (direct_vec_condeq_optab_supported_p): Delete. * gimple-isel.cc: Include internal-fn.h. (gimple_expand_vec_cond_expr): Check that IFN_VCONDEQ is supported before using it. gcc/testsuite/ PR tree-optimization/98560 * gcc.dg/vect/pr98560-2.c: New test.	2021-01-07 15:00:39 +00:00
Jakub Jelinek	99dee82307	Update copyright years.	2021-01-04 10:26:59 +01:00
Tamar Christina	3ed472af6b	middle-end: Support complex Addition This patch adds support for * Complex Addition with rotation of 90 and 270. Addition with rotation of the second argument around the Argand plane. Supported rotations are 90 and 180. c = a + (b * I) and c = a + (b * I * I * I) gcc/ChangeLog: * tree-vect-slp-patterns.c: New file. * Makefile.in: Add it. * doc/passes.texi: Document it. * internal-fn.def (COMPLEX_ADD_ROT90, COMPLEX_ADD_ROT270): New. * optabs.def (cadd90_optab, cadd270_optab): New. * doc/md.texi: Document them. * tree-vect-loop.c (vect_analyze_loop_2): Add dissolve code. * tree-vect-slp.c: (vect_free_slp_instance, vect_create_new_slp_node): Export. (vect_match_slp_patterns_2, vect_match_slp_patterns): New. (vect_analyze_slp): Use it. * tree-vectorizer.h (vect_free_slp_tree): Export. (enum _complex_operation): Forward declare. (class vect_pattern): New gcc/testsuite/ChangeLog: * lib/target-supports.exp (check_effective_target_arm_v8_3a_complex_neon_ok_nocache): Fix it. (check_effective_target_vect_complex_add_byte ,check_effective_target_vect_complex_add_int ,check_effective_target_vect_complex_add_short ,check_effective_target_vect_complex_add_long ,check_effective_target_vect_complex_add_half ,check_effective_target_vect_complex_add_float ,check_effective_target_vect_complex_add_double): New. * gcc.dg/vect/complex/bb-slp-complex-add-pattern-byte.c: New test. * gcc.dg/vect/complex/bb-slp-complex-add-pattern-int.c: New test. * gcc.dg/vect/complex/bb-slp-complex-add-pattern-long.c: New test. * gcc.dg/vect/complex/bb-slp-complex-add-pattern-short.c: New test. * gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-byte.c: New test. * gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-int.c: New test. * gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-long.c: New test. * gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-short.c: New test. * gcc.dg/vect/complex/complex-add-pattern-template.c: New test. * gcc.dg/vect/complex/complex-add-template.c: New test. * gcc.dg/vect/complex/complex-operations-run.c: New test. * gcc.dg/vect/complex/complex-operations.c: New test. * gcc.dg/vect/complex/complex.exp: New test. * gcc.dg/vect/complex/fast-math-bb-slp-complex-add-double.c: New test. * gcc.dg/vect/complex/fast-math-bb-slp-complex-add-float.c: New test. * gcc.dg/vect/complex/fast-math-bb-slp-complex-add-half-float.c: New test. * gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-double.c: New test. * gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-float.c: New test. * gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-half-float.c: New test. * gcc.dg/vect/complex/fast-math-complex-add-double.c: New test. * gcc.dg/vect/complex/fast-math-complex-add-float.c: New test. * gcc.dg/vect/complex/fast-math-complex-add-half-float.c: New test. * gcc.dg/vect/complex/fast-math-complex-add-pattern-double.c: New test. * gcc.dg/vect/complex/fast-math-complex-add-pattern-float.c: New test. * gcc.dg/vect/complex/fast-math-complex-add-pattern-half-float.c: New test. * gcc.dg/vect/complex/vect-complex-add-pattern-byte.c: New test. * gcc.dg/vect/complex/vect-complex-add-pattern-int.c: New test. * gcc.dg/vect/complex/vect-complex-add-pattern-long.c: New test. * gcc.dg/vect/complex/vect-complex-add-pattern-short.c: New test. * gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-byte.c: New test. * gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-int.c: New test. * gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-long.c: New test. * gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-short.c: New test.	2020-12-13 14:09:11 +00:00
Matthew Malcomson	93a7325148	libsanitizer: Add hwasan pass and associated gimple changes There are four main features to this change: 1) Check pointer tags match address tags. When sanitizing for hwasan we now put HWASAN_CHECK internal functions before memory accesses in the `asan` pass. This checks that a tag in the pointer being used match the tag stored in shadow memory for the memory region being used. These internal functions are expanded into actual checks in the sanopt pass that happens just before expansion into RTL. We use the same mechanism that currently inserts ASAN_CHECK internal functions to insert the new HWASAN_CHECK functions. 2) Instrument known builtin function calls. Handle all builtin functions that we know use memory accesses. This commit uses the machinery added for ASAN to identify builtin functions that access memory. The main differences between the approaches for HWASAN and ASAN are: - libhwasan intercepts much less builtin functions. - Alloca needs to be transformed differently (instead of adding redzones it needs to tag shadow memory and return a tagged pointer). - stack_restore needs to untag the shadow stack between the current position and where it's going. - `noreturn` functions can not be handled by simply unpoisoning the entire shadow stack -- there is no "always valid" tag. (exceptions and things such as longjmp need to be handled in a different way, usually in the runtime). For hardware implemented checking (such as AArch64's memory tagging extension) alloca and stack_restore will need to be handled by hooks in the backend rather than transformation at the gimple level. This will allow architecture specific handling of such stack modifications. 3) Introduce HWASAN block-scope poisoning Here we use exactly the same mechanism as ASAN_MARK to poison/unpoison variables on entry/exit of a block. In order to simply use the exact same machinery we're using the same internal functions until the SANOPT pass. This means that all handling of ASAN_MARK is the same. This has the negative that the naming may be a little confusing, but a positive that handling of the internal function doesn't have to be duplicated for a function that behaves exactly the same but has a different name. gcc/ChangeLog: * asan.c (asan_instrument_reads): New. (asan_instrument_writes): New. (asan_memintrin): New. (handle_builtin_stack_restore): Account for HWASAN. (handle_builtin_alloca): Account for HWASAN. (get_mem_refs_of_builtin_call): Special case strlen for HWASAN. (hwasan_instrument_reads): New. (hwasan_instrument_writes): New. (hwasan_memintrin): New. (report_error_func): Assert not HWASAN. (build_check_stmt): Make HWASAN_CHECK instead of ASAN_CHECK. (instrument_derefs): HWASAN does not tag globals. (instrument_builtin_call): Use new helper functions. (maybe_instrument_call): Don't instrument `noreturn` functions. (initialize_sanitizer_builtins): Add new type. (asan_expand_mark_ifn): Account for HWASAN. (asan_expand_check_ifn): Assert never called by HWASAN. (asan_expand_poison_ifn): Account for HWASAN. (asan_instrument): Branch based on whether using HWASAN or ASAN. (pass_asan::gate): Return true if sanitizing HWASAN. (pass_asan_O0::gate): Return true if sanitizing HWASAN. (hwasan_check_func): New. (hwasan_expand_check_ifn): New. (hwasan_expand_mark_ifn): New. (gate_hwasan): New. * asan.h (hwasan_expand_check_ifn): New decl. (hwasan_expand_mark_ifn): New decl. (gate_hwasan): New decl. (asan_intercepted_p): Always false for hwasan. (asan_sanitize_use_after_scope): Account for HWASAN. * builtin-types.def (BT_FN_PTR_CONST_PTR_UINT8): New. * gimple-fold.c (gimple_build): New overload for building function calls without arguments. (gimple_build_round_up): New. * gimple-fold.h (gimple_build): New decl. (gimple_build): New inline function. (gimple_build_round_up): New decl. (gimple_build_round_up): New inline function. * gimple-pretty-print.c (dump_gimple_call_args): Account for HWASAN. * gimplify.c (asan_poison_variable): Account for HWASAN. (gimplify_function_tree): Remove requirement of SANITIZE_ADDRESS, requiring asan or hwasan is accounted for in `asan_sanitize_use_after_scope`. * internal-fn.c (expand_HWASAN_CHECK): New. (expand_HWASAN_ALLOCA_UNPOISON): New. (expand_HWASAN_CHOOSE_TAG): New. (expand_HWASAN_MARK): New. (expand_HWASAN_SET_TAG): New. * internal-fn.def (HWASAN_ALLOCA_UNPOISON): New. (HWASAN_CHOOSE_TAG): New. (HWASAN_CHECK): New. (HWASAN_MARK): New. (HWASAN_SET_TAG): New. * sanitizer.def (BUILT_IN_HWASAN_LOAD1): New. (BUILT_IN_HWASAN_LOAD2): New. (BUILT_IN_HWASAN_LOAD4): New. (BUILT_IN_HWASAN_LOAD8): New. (BUILT_IN_HWASAN_LOAD16): New. (BUILT_IN_HWASAN_LOADN): New. (BUILT_IN_HWASAN_STORE1): New. (BUILT_IN_HWASAN_STORE2): New. (BUILT_IN_HWASAN_STORE4): New. (BUILT_IN_HWASAN_STORE8): New. (BUILT_IN_HWASAN_STORE16): New. (BUILT_IN_HWASAN_STOREN): New. (BUILT_IN_HWASAN_LOAD1_NOABORT): New. (BUILT_IN_HWASAN_LOAD2_NOABORT): New. (BUILT_IN_HWASAN_LOAD4_NOABORT): New. (BUILT_IN_HWASAN_LOAD8_NOABORT): New. (BUILT_IN_HWASAN_LOAD16_NOABORT): New. (BUILT_IN_HWASAN_LOADN_NOABORT): New. (BUILT_IN_HWASAN_STORE1_NOABORT): New. (BUILT_IN_HWASAN_STORE2_NOABORT): New. (BUILT_IN_HWASAN_STORE4_NOABORT): New. (BUILT_IN_HWASAN_STORE8_NOABORT): New. (BUILT_IN_HWASAN_STORE16_NOABORT): New. (BUILT_IN_HWASAN_STOREN_NOABORT): New. (BUILT_IN_HWASAN_TAG_MISMATCH4): New. (BUILT_IN_HWASAN_HANDLE_LONGJMP): New. (BUILT_IN_HWASAN_TAG_PTR): New. * sanopt.c (sanopt_optimize_walker): Act for hwasan. (pass_sanopt::execute): Act for hwasan. * toplev.c (compile_file): Use `gate_hwasan` function.	2020-11-25 16:39:07 +00:00
Jan Hubicka	762cca0023	Perforate fnspec strings gcc/ChangeLog: 2020-10-02 Jan Hubicka <hubicka@ucw.cz> * attr-fnspec.h: Update documentation. (attr_fnsec::return_desc_size): Set to 2 (attr_fnsec::arg_desc_size): Set to 2 * builtin-attrs.def (STR1): Update fnspec. * internal-fn.def (UBSAN_NULL): Update fnspec. (UBSAN_VPTR): Update fnspec. (UBSAN_PTR): Update fnspec. (ASAN_CHECK): Update fnspec. (GOACC_DIM_SIZE): Remove fnspec. (GOACC_DIM_POS): Remove fnspec. * tree-ssa-alias.c (attr_fnspec::verify): Update verification. gcc/fortran/ChangeLog: 2020-10-02 Jan Hubicka <hubicka@ucw.cz> * trans-decl.c (gfc_build_library_function_decl_with_spec): Verify fnspec. (gfc_build_intrinsic_function_decls): Update fnspecs. (gfc_build_builtin_function_decls): Update fnspecs. * trans-io.c (gfc_build_io_library_fndecls): Update fnspecs. * trans-types.c (create_fn_spec): Update fnspecs.	2020-10-02 15:56:12 +02:00
Xionghu Luo	683e55facf	IFN: Implement IFN_VEC_SET for ARRAY_REF with VIEW_CONVERT_EXPR This patch enables transformation from ARRAY_REF(VIEW_CONVERT_EXPR) to VEC_SET internal function in gimple-isel pass if target supports vec_set with variable index by checking can_vec_set_var_idx_p. gcc/ChangeLog: 2020-09-27 Xionghu Luo <luoxhu@linux.ibm.com> * gimple-isel.cc (gimple_expand_vec_set_expr): New function. (gimple_expand_vec_cond_exprs): Rename to ... (gimple_expand_vec_exprs): ... this and call gimple_expand_vec_set_expr. * internal-fn.c (vec_set_direct): New define. (expand_vec_set_optab_fn): New function. (direct_vec_set_optab_supported_p): New define. * internal-fn.def (VEC_SET): New DEF_INTERNAL_OPTAB_FN. * optabs.c (can_vec_set_var_idx_p): New function. * optabs.h (can_vec_set_var_idx_p): New declaration.	2020-09-27 00:27:32 -05:00
Kewen Lin	d496134a6b	IFN/optabs: Support vector load/store with length This patch is to add the internal function and optabs support for vector load/store with length. For the vector load/store with length optab, the length item would be measured in lanes by default. For the targets which support length measured in bytes like Power, they should only define VnQI modes to wrap the other same size vector modes. If the length is larger than total lane/byte count of the given mode, the behavior is undefined. For the remaining lanes/bytes which isn't specified by length, they would be taken as undefined value. gcc/ChangeLog: * doc/md.texi (len_load_@var{m}): Document. (len_store_@var{m}): Likewise. * internal-fn.c (len_load_direct): New macro. (len_store_direct): Likewise. (expand_len_load_optab_fn): Likewise. (expand_len_store_optab_fn): Likewise. (direct_len_load_optab_supported_p): Likewise. (direct_len_store_optab_supported_p): Likewise. (expand_mask_load_optab_fn): New macro. Original renamed to ... (expand_partial_load_optab_fn): ... here. Add handlings for len_load_optab. (expand_mask_store_optab_fn): New macro. Original renamed to ... (expand_partial_store_optab_fn): ... here. Add handlings for len_store_optab. (internal_load_fn_p): Handle IFN_LEN_LOAD. (internal_store_fn_p): Handle IFN_LEN_STORE. (internal_fn_stored_value_index): Handle IFN_LEN_STORE. * internal-fn.def (LEN_LOAD): New internal function. (LEN_STORE): Likewise. * optabs.def (len_load_optab, len_store_optab): New optab.	2020-07-08 02:33:03 -05:00
Martin Liska	502d63b6d6	Lower VEC_COND_EXPR into internal functions. gcc/ChangeLog: * Makefile.in: Add new file. * expr.c (expand_expr_real_2): Add gcc_unreachable as we should not meet this condition. (do_store_flag): Likewise. * gimplify.c (gimplify_expr): Gimplify first argument of VEC_COND_EXPR to be a SSA name. * internal-fn.c (vec_cond_mask_direct): New. (vec_cond_direct): Likewise. (vec_condu_direct): Likewise. (vec_condeq_direct): Likewise. (expand_vect_cond_optab_fn): New. (expand_vec_cond_optab_fn): Likewise. (expand_vec_condu_optab_fn): Likewise. (expand_vec_condeq_optab_fn): Likewise. (expand_vect_cond_mask_optab_fn): Likewise. (expand_vec_cond_mask_optab_fn): Likewise. (direct_vec_cond_mask_optab_supported_p): Likewise. (direct_vec_cond_optab_supported_p): Likewise. (direct_vec_condu_optab_supported_p): Likewise. (direct_vec_condeq_optab_supported_p): Likewise. * internal-fn.def (VCOND): New OPTAB. (VCONDU): Likewise. (VCONDEQ): Likewise. (VCOND_MASK): Likewise. * optabs.c (get_rtx_code): Make it global. (expand_vec_cond_mask_expr): Removed. (expand_vec_cond_expr): Removed. * optabs.h (expand_vec_cond_expr): Likewise. (vector_compare_rtx): Make it global. * passes.def: Add new pass_gimple_isel pass. * tree-cfg.c (verify_gimple_assign_ternary): Add check for VEC_COND_EXPR about first argument. * tree-pass.h (make_pass_gimple_isel): New. * tree-ssa-forwprop.c (pass_forwprop::execute): Prevent propagation of the first argument of a VEC_COND_EXPR. * tree-ssa-reassoc.c (ovce_extract_ops): Support SSA_NAME as first argument of a VEC_COND_EXPR. (optimize_vec_cond_expr): Likewise. * tree-vect-generic.c (expand_vector_divmod): Make SSA_NAME for a first argument of created VEC_COND_EXPR. (expand_vector_condition): Fix coding style. * tree-vect-stmts.c (vectorizable_condition): Gimplify first argument. * gimple-isel.cc: New file. gcc/testsuite/ChangeLog: * g++.dg/vect/vec-cond-expr-eh.C: New test.	2020-06-17 12:04:22 +02:00
Iain Sandoe	49789fd083	[C++ coroutines] Initial implementation. This is the squashed version of the first 6 patches that were split to facilitate review. The changes to libiberty (7th patch) to support demangling the co_await operator stand alone and are applied separately. The patch series is an initial implementation of a coroutine feature, expected to be standardised in C++20. Standardisation status (and potential impact on this implementation) -------------------------------------------------------------------- The facility was accepted into the working draft for C++20 by WG21 in February 2019. During following WG21 meetings, design and national body comments have been reviewed, with no significant change resulting. The current GCC implementation is against n4835 [1]. At this stage, the remaining potential for change comes from: * Areas of national body comments that were not resolved in the version we have worked to: (a) handling of the situation where aligned allocation is available. (b) handling of the situation where a user wants coroutines, but does not want exceptions (e.g. a GPU). * Agreed changes that have not yet been worded in a draft standard that we have worked to. It is not expected that the resolution to these can produce any major change at this phase of the standardisation process. Such changes should be limited to the coroutine-specific code. ABI --- The various compiler developers 'vendors' have discussed a minimal ABI to allow one implementation to call coroutines compiled by another. This amounts to: 1. The layout of a public portion of the coroutine frame. Coroutines need to preserve state across suspension points, the storage for this is called a "coroutine frame". The ABI mandates that pointers into the coroutine frame point to an area begining with two function pointers (to the resume and destroy functions described below); these are immediately followed by the "promise object" described in the standard. This is sufficient that the builtins can take a coroutine frame pointer and determine the address of the promise (or call the resume/destroy functions). 2. A number of compiler builtins that the standard library might use. These are implemented by this patch series. 3. This introduces a new operator 'co_await' the mangling for which is also agreed between vendors (and has an issue filed for that against the upstream c++abi). Demangling for this is added to libiberty in a separate patch. The ABI has currently no target-specific content (a given psABI might elect to mandate alignment, but the common ABI does not do this). Standard Library impact ----------------------- The current implementations require addition of only a single header to the standard library (no change to the runtime). This header is part of the patch. GCC Implementation outline -------------------------- The standard's design for coroutines does not decorate the definition of a coroutine in any way, so that a function is only known to be a coroutine when one of the keywords (co_await, co_yield, co_return) is encountered. This means that we cannot special-case such functions from the outset, but must process them differently when they are finalised - which we do from "finish_function ()". At a high level, this design of coroutine produces four pieces from the original user's function: 1. A coroutine state frame (taking the logical place of the activation record for a regular function). One item stored in that state is the index of the current suspend point. 2. A "ramp" function This is what the user calls to construct the coroutine frame and start the coroutine execution. This will return some object representing the coroutine's eventual return value (or means to continue it when it it suspended). 3. A "resume" function. This is what gets called when a the coroutine is resumed when suspended. 4. A "destroy" function. This is what gets called when the coroutine state should be destroyed and its memory released. The standard's coroutines involve cooperation of the user's authored function with a provided "promise" class, which includes mandatory methods for handling the state transitions and providing output values. Most realistic coroutines will also have one or more 'awaiter' classes that implement the user's actions for each suspend point. As we parse (or during template expansion) the types of the promise and awaiter classes become known, and can then be verified against the signatures expected by the standard. Once the function is parsed (and templates expanded) we are able to make the transformation into the four pieces noted above. The implementation here takes the approach of a series of AST transforms. The state machine suspend points are encoded in three internal functions (one of which represents an exit from scope without cleanups). These three IFNs are lowered early in the middle end, such that the majority of GCC's optimisers can be run on the resulting output. As a design choice, we have carried out the outlining of the user's function in the front end, and taken advantage of the existing middle end's abilities to inline and DCE where that is profitable. Since the state machine is actually common to both resumer and destroyer functions, we make only a single function "actor" that contains both the resume and destroy paths. The destroy function is represented by a small stub that sets a value to signal the use of the destroy path and calls the actor. The idea is that optimisation of the state machine need only be done once - and then the resume and destroy paths can be identified allowing the middle end's inline and DCE machinery to optimise as profitable as noted above. The middle end components for this implementation are: A pass that: 1. Lowers the coroutine builtins that allow the standard library header to interact with the coroutine frame (these fairly simple logical or numerical substitution of values, given a coroutine frame pointer). 2. Lowers the IFN that represents the exit from state without cleanup. Essentially, this becomes a gimple goto. 3. Sets the final size of the coroutine frame at this stage. A second pass (that requires the revised CFG that results from the lowering of the scope exit IFNs in the first). 1. Lower the IFNs that represent the state machine paths for the resume and destroy cases. Patches squashed into this commit: [C++ coroutines 1] Common code and base definitions. This part of the patch series provides the gating flag, the keywords, cpp defines etc. [C++ coroutines 2] Define builtins and internal functions. This part of the patch series provides the builtin functions used by the standard library code and the internal functions used to implement lowering of the coroutine state machine. [C++ coroutines 3] Front end parsing and transforms. There are two parts to this. 1. Parsing, template instantiation and diagnostics for the standard- mandated class entries. The user authors a function that becomes a coroutine (lazily) by making use of any of the co_await, co_yield or co_return keywords. Unlike a regular function, where the activation record is placed on the stack, and is destroyed on function exit, a coroutine has some state that persists between calls - the 'coroutine frame' (thus analogous to a stack frame). We transform the user's function into three pieces: 1. A so-called ramp function, that establishes the coroutine frame and begins execution of the coroutine. 2. An actor function that contains the state machine corresponding to the user's suspend/resume structure. 3. A stub function that calls the actor function in 'destroy' mode. The actor function is executed: * from "resume point 0" by the ramp. * from resume point N ( > 0 ) for handle.resume() calls. * from the destroy stub for destroy point N for handle.destroy() calls. The C++ coroutine design described in the standard makes use of some helper methods that are authored in a so-called "promise" class provided by the user. At parse time (or post substitution) the type of the coroutine promise will be determined. At that point, we can look up the required promise class methods and issue diagnostics if they are missing or incorrect. To avoid repeating these actions at code-gen time, we make use of temporary 'proxy' variables for the coroutine handle and the promise - which will eventually be instantiated in the coroutine frame. Each of the keywords will expand to a code sequence (although co_yield is just syntactic sugar for a co_await). We defer the analysis and transformatin until template expansion is complete so that we have complete types at that time. 2. AST analysis and transformation which performs the code-gen for the outlined state machine. The entry point here is morph_fn_to_coro () which is called from finish_function () when we have completed any template expansion. This is preceded by helper functions that implement the phases below. The process proceeds in four phases. A Initial framing. The user's function body is wrapped in the initial and final suspend points and we begin building the coroutine frame. We build empty decls for the actor and destroyer functions at this time too. When exceptions are enabled, the user's function body will also be wrapped in a try-catch block with the catch invoking the promise class 'unhandled_exception' method. B Analysis. The user's function body is analysed to determine the suspend points, if any, and to capture local variables that might persist across such suspensions. In most cases, it is not necessary to capture compiler temporaries, since the tree-lowering nests the suspensions correctly. However, in the case of a captured reference, there is a lifetime extension to the end of the full expression - which can mean across a suspend point in which case it must be promoted to a frame variable. At the conclusion of analysis, we have a conservative frame layout and maps of the local variables to their frame entry points. C Build the ramp function. Carry out the allocation for the coroutine frame (NOTE; the actual size computation is deferred until late in the middle end to allow for future optimisations that will be allowed to elide unused frame entries). We build the return object. D Build and expand the actor and destroyer function bodies. The destroyer is a trivial shim that sets a bit to indicate that the destroy dispatcher should be used and then calls into the actor. The actor function is the implementation of the user's state machine. The current suspend point is noted in an index. Each suspend point is encoded as a pair of internal functions, one in the relevant dispatcher, and one representing the suspend point. During this process, the user's local variables and the proxies for the self-handle and the promise class instanceare re-written to their coroutine frame equivalents. The complete bodies for the ramp, actor and destroy function are passed back to finish_function for folding and gimplification. [C++ coroutines 4] Middle end expanders and transforms. The first part of this is a pass that provides: * expansion of the library support builtins, these are simple boolean or numerical substitutions. * The functionality of implementing an exit from scope without cleanup is performed here by lowering an IFN to a gimple goto. This pass has to run for non-coroutine functions, since functions calling the builtins are not necessarily coroutines (i.e. they are implementing the library interfaces which may be called from anywhere). The second part is the expansion of the coroutine IFNs that describe the state machine connections to the dispatchers. This only has to be run for functions that are coroutine components. The work done by this pass is: In the front end we construct a single actor function that contains the coroutine state machine. The actor function has three entry conditions: 1. from the ramp, resume point 0 - to initial-suspend. 2. when resume () is executed (resume point N). 3. from the destroy () shim when that is executed. The actor function begins with two dispatchers; one for resume and one for destroy (where the initial entry from the ramp is a special- case of resume point 0). Each suspend point and each dispatch entry is marked with an IFN such that we can connect the relevant dispatchers to their target labels. So, if we have: CO_YIELD (NUM, FINAL, RES_LAB, DEST_LAB, FRAME_PTR) This is await point NUM, and is the final await if FINAL is non-zero. The resume point is RES_LAB, and the destroy point is DEST_LAB. We expect to find a CO_ACTOR (NUM) in the resume dispatcher and a CO_ACTOR (NUM+1) in the destroy dispatcher. Initially, the intent of keeping the resume and destroy paths together is that the conditionals controlling them are identical, and thus there would be duplication of any optimisation of those paths if the split were earlier. Subsequent inlining of the actor (and DCE) is then able to extract the resume and destroy paths as separate functions if that is found profitable by the optimisers. Once we have remade the connections to their correct postions, we elide the labels that the front end inserted. [C++ coroutines 5] Standard library header. This provides the interfaces mandated by the standard and implements the interaction with the coroutine frame by means of inline use of builtins expanded at compile-time. There should be a 1:1 correspondence with the standard sections which are cross-referenced. There is no runtime content. At this stage, we have the content in an inline namespace "__n4835" for the CD we worked to. [C++ coroutines 6] Testsuite. There are two categories of test: 1. Checks for correctly formed source code and the error reporting. 2. Checks for transformation and code-gen. The second set are run as 'torture' tests for the standard options set, including LTO. These are also intentionally run with no options provided (from the coroutines.exp script). gcc/ChangeLog: 2020-01-18 Iain Sandoe <iain@sandoe.co.uk> * Makefile.in: Add coroutine-passes.o. * builtin-types.def (BT_CONST_SIZE): New. (BT_FN_BOOL_PTR): New. (BT_FN_PTR_PTR_CONST_SIZE_BOOL): New. * builtins.def (DEF_COROUTINE_BUILTIN): New. * coroutine-builtins.def: New file. * coroutine-passes.cc: New file. * function.h (struct GTY function): Add a bit to indicate that the function is a coroutine component. * internal-fn.c (expand_CO_FRAME): New. (expand_CO_YIELD): New. (expand_CO_SUSPN): New. (expand_CO_ACTOR): New. * internal-fn.def (CO_ACTOR): New. (CO_YIELD): New. (CO_SUSPN): New. (CO_FRAME): New. * passes.def: Add pass_coroutine_lower_builtins, pass_coroutine_early_expand_ifns. * tree-pass.h (make_pass_coroutine_lower_builtins): New. (make_pass_coroutine_early_expand_ifns): New. * doc/invoke.texi: Document the fcoroutines command line switch. gcc/c-family/ChangeLog: 2020-01-18 Iain Sandoe <iain@sandoe.co.uk> * c-common.c (co_await, co_yield, co_return): New. * c-common.h (RID_CO_AWAIT, RID_CO_YIELD, RID_CO_RETURN): New enumeration values. (D_CXX_COROUTINES): Bit to identify coroutines are active. (D_CXX_COROUTINES_FLAGS): Guard for coroutine keywords. * c-cppbuiltin.c (__cpp_coroutines): New cpp define. * c.opt (fcoroutines): New command-line switch. gcc/cp/ChangeLog: 2020-01-18 Iain Sandoe <iain@sandoe.co.uk> * Make-lang.in: Add coroutines.o. * cp-tree.h (lang_decl-fn): coroutine_p, new bit. (DECL_COROUTINE_P): New. * lex.c (init_reswords): Enable keywords when the coroutine flag is set, * operators.def (co_await): New operator. * call.c (add_builtin_candidates): Handle CO_AWAIT_EXPR. (op_error): Likewise. (build_new_op_1): Likewise. (build_new_function_call): Validate coroutine builtin arguments. * constexpr.c (potential_constant_expression_1): Handle CO_AWAIT_EXPR, CO_YIELD_EXPR, CO_RETURN_EXPR. * coroutines.cc: New file. * cp-objcp-common.c (cp_common_init_ts): Add CO_AWAIT_EXPR, CO_YIELD_EXPR, CO_RETRN_EXPR as TS expressions. * cp-tree.def (CO_AWAIT_EXPR, CO_YIELD_EXPR, (CO_RETURN_EXPR): New. * cp-tree.h (coro_validate_builtin_call): New. * decl.c (emit_coro_helper): New. (finish_function): Handle the case when a function is found to be a coroutine, perform the outlining and emit the outlined functions. Set a bit to signal that this is a coroutine component. * parser.c (enum required_token): New enumeration RT_CO_YIELD. (cp_parser_unary_expression): Handle co_await. (cp_parser_assignment_expression): Handle co_yield. (cp_parser_statement): Handle RID_CO_RETURN. (cp_parser_jump_statement): Handle co_return. (cp_parser_operator): Handle co_await operator. (cp_parser_yield_expression): New. (cp_parser_required_error): Handle RT_CO_YIELD. * pt.c (tsubst_copy): Handle CO_AWAIT_EXPR. (tsubst_expr): Handle CO_AWAIT_EXPR, CO_YIELD_EXPR and CO_RETURN_EXPRs. * tree.c (cp_walk_subtrees): Likewise. libstdc++-v3/ChangeLog: 2020-01-18 Iain Sandoe <iain@sandoe.co.uk> * include/Makefile.am: Add coroutine to the std set. * include/Makefile.in: Regenerated. * include/std/coroutine: New file. gcc/testsuite/ChangeLog: 2020-01-18 Iain Sandoe <iain@sandoe.co.uk> * g++.dg/coroutines/co-await-syntax-00-needs-expr.C: New test. * g++.dg/coroutines/co-await-syntax-01-outside-fn.C: New test. * g++.dg/coroutines/co-await-syntax-02-outside-fn.C: New test. * g++.dg/coroutines/co-await-syntax-03-auto.C: New test. * g++.dg/coroutines/co-await-syntax-04-ctor-dtor.C: New test. * g++.dg/coroutines/co-await-syntax-05-constexpr.C: New test. * g++.dg/coroutines/co-await-syntax-06-main.C: New test. * g++.dg/coroutines/co-await-syntax-07-varargs.C: New test. * g++.dg/coroutines/co-await-syntax-08-lambda-auto.C: New test. * g++.dg/coroutines/co-return-syntax-01-outside-fn.C: New test. * g++.dg/coroutines/co-return-syntax-02-outside-fn.C: New test. * g++.dg/coroutines/co-return-syntax-03-auto.C: New test. * g++.dg/coroutines/co-return-syntax-04-ctor-dtor.C: New test. * g++.dg/coroutines/co-return-syntax-05-constexpr-fn.C: New test. * g++.dg/coroutines/co-return-syntax-06-main.C: New test. * g++.dg/coroutines/co-return-syntax-07-vararg.C: New test. * g++.dg/coroutines/co-return-syntax-08-bad-return.C: New test. * g++.dg/coroutines/co-return-syntax-09-lambda-auto.C: New test. * g++.dg/coroutines/co-yield-syntax-00-needs-expr.C: New test. * g++.dg/coroutines/co-yield-syntax-01-outside-fn.C: New test. * g++.dg/coroutines/co-yield-syntax-02-outside-fn.C: New test. * g++.dg/coroutines/co-yield-syntax-03-auto.C: New test. * g++.dg/coroutines/co-yield-syntax-04-ctor-dtor.C: New test. * g++.dg/coroutines/co-yield-syntax-05-constexpr.C: New test. * g++.dg/coroutines/co-yield-syntax-06-main.C: New test. * g++.dg/coroutines/co-yield-syntax-07-varargs.C: New test. * g++.dg/coroutines/co-yield-syntax-08-needs-expr.C: New test. * g++.dg/coroutines/co-yield-syntax-09-lambda-auto.C: New test. * g++.dg/coroutines/coro-builtins.C: New test. * g++.dg/coroutines/coro-missing-gro.C: New test. * g++.dg/coroutines/coro-missing-promise-yield.C: New test. * g++.dg/coroutines/coro-missing-ret-value.C: New test. * g++.dg/coroutines/coro-missing-ret-void.C: New test. * g++.dg/coroutines/coro-missing-ueh-1.C: New test. * g++.dg/coroutines/coro-missing-ueh-2.C: New test. * g++.dg/coroutines/coro-missing-ueh-3.C: New test. * g++.dg/coroutines/coro-missing-ueh.h: New test. * g++.dg/coroutines/coro-pre-proc.C: New test. * g++.dg/coroutines/coro.h: New file. * g++.dg/coroutines/coro1-ret-int-yield-int.h: New file. * g++.dg/coroutines/coroutines.exp: New file. * g++.dg/coroutines/torture/alloc-00-gro-on-alloc-fail.C: New test. * g++.dg/coroutines/torture/alloc-01-overload-newdel.C: New test. * g++.dg/coroutines/torture/call-00-co-aw-arg.C: New test. * g++.dg/coroutines/torture/call-01-multiple-co-aw.C: New test. * g++.dg/coroutines/torture/call-02-temp-co-aw.C: New test. * g++.dg/coroutines/torture/call-03-temp-ref-co-aw.C: New test. * g++.dg/coroutines/torture/class-00-co-ret.C: New test. * g++.dg/coroutines/torture/class-01-co-ret-parm.C: New test. * g++.dg/coroutines/torture/class-02-templ-parm.C: New test. * g++.dg/coroutines/torture/class-03-operator-templ-parm.C: New test. * g++.dg/coroutines/torture/class-04-lambda-1.C: New test. * g++.dg/coroutines/torture/class-05-lambda-capture-copy-local.C: New test. * g++.dg/coroutines/torture/class-06-lambda-capture-ref.C: New test. * g++.dg/coroutines/torture/co-await-00-trivial.C: New test. * g++.dg/coroutines/torture/co-await-01-with-value.C: New test. * g++.dg/coroutines/torture/co-await-02-xform.C: New test. * g++.dg/coroutines/torture/co-await-03-rhs-op.C: New test. * g++.dg/coroutines/torture/co-await-04-control-flow.C: New test. * g++.dg/coroutines/torture/co-await-05-loop.C: New test. * g++.dg/coroutines/torture/co-await-06-ovl.C: New test. * g++.dg/coroutines/torture/co-await-07-tmpl.C: New test. * g++.dg/coroutines/torture/co-await-08-cascade.C: New test. * g++.dg/coroutines/torture/co-await-09-pair.C: New test. * g++.dg/coroutines/torture/co-await-10-template-fn-arg.C: New test. * g++.dg/coroutines/torture/co-await-11-forwarding.C: New test. * g++.dg/coroutines/torture/co-await-12-operator-2.C: New test. * g++.dg/coroutines/torture/co-await-13-return-ref.C: New test. * g++.dg/coroutines/torture/co-ret-00-void-return-is-ready.C: New test. * g++.dg/coroutines/torture/co-ret-01-void-return-is-suspend.C: New test. * g++.dg/coroutines/torture/co-ret-03-different-GRO-type.C: New test. * g++.dg/coroutines/torture/co-ret-04-GRO-nontriv.C: New test. * g++.dg/coroutines/torture/co-ret-05-return-value.C: New test. * g++.dg/coroutines/torture/co-ret-06-template-promise-val-1.C: New test. * g++.dg/coroutines/torture/co-ret-07-void-cast-expr.C: New test. * g++.dg/coroutines/torture/co-ret-08-template-cast-ret.C: New test. * g++.dg/coroutines/torture/co-ret-09-bool-await-susp.C: New test. * g++.dg/coroutines/torture/co-ret-10-expression-evaluates-once.C: New test. * g++.dg/coroutines/torture/co-ret-11-co-ret-co-await.C: New test. * g++.dg/coroutines/torture/co-ret-12-co-ret-fun-co-await.C: New test. * g++.dg/coroutines/torture/co-ret-13-template-2.C: New test. * g++.dg/coroutines/torture/co-ret-14-template-3.C: New test. * g++.dg/coroutines/torture/co-yield-00-triv.C: New test. * g++.dg/coroutines/torture/co-yield-01-multi.C: New test. * g++.dg/coroutines/torture/co-yield-02-loop.C: New test. * g++.dg/coroutines/torture/co-yield-03-tmpl.C: New test. * g++.dg/coroutines/torture/co-yield-04-complex-local-state.C: New test. * g++.dg/coroutines/torture/co-yield-05-co-aw.C: New test. * g++.dg/coroutines/torture/co-yield-06-fun-parm.C: New test. * g++.dg/coroutines/torture/co-yield-07-template-fn-param.C: New test. * g++.dg/coroutines/torture/co-yield-08-more-refs.C: New test. * g++.dg/coroutines/torture/co-yield-09-more-templ-refs.C: New test. * g++.dg/coroutines/torture/coro-torture.exp: New file. * g++.dg/coroutines/torture/exceptions-test-0.C: New test. * g++.dg/coroutines/torture/func-params-00.C: New test. * g++.dg/coroutines/torture/func-params-01.C: New test. * g++.dg/coroutines/torture/func-params-02.C: New test. * g++.dg/coroutines/torture/func-params-03.C: New test. * g++.dg/coroutines/torture/func-params-04.C: New test. * g++.dg/coroutines/torture/func-params-05.C: New test. * g++.dg/coroutines/torture/func-params-06.C: New test. * g++.dg/coroutines/torture/lambda-00-co-ret.C: New test. * g++.dg/coroutines/torture/lambda-01-co-ret-parm.C: New test. * g++.dg/coroutines/torture/lambda-02-co-yield-values.C: New test. * g++.dg/coroutines/torture/lambda-03-auto-parm-1.C: New test. * g++.dg/coroutines/torture/lambda-04-templ-parm.C: New test. * g++.dg/coroutines/torture/lambda-05-capture-copy-local.C: New test. * g++.dg/coroutines/torture/lambda-06-multi-capture.C: New test. * g++.dg/coroutines/torture/lambda-07-multi-yield.C: New test. * g++.dg/coroutines/torture/lambda-08-co-ret-parm-ref.C: New test. * g++.dg/coroutines/torture/local-var-0.C: New test. * g++.dg/coroutines/torture/local-var-1.C: New test. * g++.dg/coroutines/torture/local-var-2.C: New test. * g++.dg/coroutines/torture/local-var-3.C: New test. * g++.dg/coroutines/torture/local-var-4.C: New test. * g++.dg/coroutines/torture/mid-suspend-destruction-0.C: New test. * g++.dg/coroutines/torture/pr92933.C: New test.	2020-01-18 11:55:56 +00:00
Jakub Jelinek	8d9254fc8a	Update copyright years. From-SVN: r279813	2020-01-01 12:51:42 +01:00
Richard Sandiford	58c036c835	Add optabs for accelerating RAW and WAR alias checks This patch adds optabs that check whether a read followed by a write or a write followed by a read can be divided into interleaved byte accesses without changing the dependencies between the bytes. This is one of the uses of the SVE2 WHILERW and WHILEWR instructions. (The instructions can also be used to limit the VF at runtime, but that's future work.) 2019-11-18 Richard Sandiford <richard.sandiford@arm.com> gcc/ * doc/sourcebuild.texi (vect_check_ptrs): Document. * optabs.def (check_raw_ptrs_optab, check_war_ptrs_optab): New optabs. * doc/md.texi: Document them. * internal-fn.def (IFN_CHECK_RAW_PTRS, IFN_CHECK_WAR_PTRS): New internal functions. * internal-fn.h (internal_check_ptrs_fn_supported_p): Declare. * internal-fn.c (check_ptrs_direct): New macro. (expand_check_ptrs_optab_fn): Likewise. (direct_check_ptrs_optab_supported_p): Likewise. (internal_check_ptrs_fn_supported_p): New fuction. * tree-data-ref.c: Include internal-fn.h. (create_ifn_alias_checks): New function. (create_intersect_range_checks): Use it. * config/aarch64/iterators.md (SVE2_WHILE_PTR): New int iterator. (optab, cmp_op): Handle it. (raw_war, unspec): New int attributes. * config/aarch64/aarch64.md (UNSPEC_WHILERW, UNSPEC_WHILE_WR): New constants. * config/aarch64/predicates.md (aarch64_bytes_per_sve_vector_operand): New predicate. * config/aarch64/aarch64-sve2.md (check_<raw_war>_ptrs<mode>): New expander. (@aarch64_sve2_while<cmp_op><GPI:mode><PRED_ALL:mode>_ptest): New pattern. gcc/testsuite/ * lib/target-supports.exp (check_effective_target_vect_check_ptrs): New procedure. * gcc.dg/vect/vect-alias-check-14.c: Expect IFN_CHECK_WAR to be used, if available. * gcc.dg/vect/vect-alias-check-15.c: Likewise. * gcc.dg/vect/vect-alias-check-16.c: Likewise IFN_CHECK_RAW. * gcc.target/aarch64/sve2/whilerw_1.c: New test. * gcc.target/aarch64/sve2/whilewr_1.c: Likewise. * gcc.target/aarch64/sve2/whilewr_2.c: Likewise. From-SVN: r278414	2019-11-18 15:36:10 +00:00
Yuliang Wang	c0c2f01390	[AArch64][SVE] Utilize ASRD instruction for division and remainder 2019-09-30 Yuliang Wang <yuliang.wang@arm.com> gcc/ * config/aarch64/aarch64-sve.md (sdiv_pow2<mode>3): New pattern for ASRD. * config/aarch64/iterators.md (UNSPEC_ASRD): New unspec. * internal-fn.def (IFN_DIV_POW2): New internal function. * optabs.def (sdiv_pow2_optab): New optab. * tree-vect-patterns.c (vect_recog_divmod_pattern): Modify pattern to support new operation. * doc/md.texi (sdiv_pow2$var{m3}): Documentation for the above. * doc/sourcebuild.texi (vect_sdiv_pow2_si): Document new target selector. gcc/testsuite/ * gcc.dg/vect/vect-sdiv-pow2-1.c: New test. * gcc.target/aarch64/sve/asrdiv_1.c: As above. * lib/target-supports.exp (check_effective_target_vect_sdiv_pow2_si): Return true for AArch64 with SVE. From-SVN: r276343	2019-09-30 16:55:45 +00:00
Yuliang Wang	58cc98767a	Vectorise multiply high with scaling operations (PR 89386) 2019-09-12 Yuliang Wang <yuliang.wang@arm.com> gcc/ PR tree-optimization/89386 * config/aarch64/aarch64-sve2.md (<su>mull<bt><Vwide>) (<r>shrnb<mode>, <r>shrnt<mode>): New SVE2 patterns. (<su>mulh<r>s<mode>3): New pattern for MULHRS. * config/aarch64/iterators.md (UNSPEC_SMULLB, UNSPEC_SMULLT) (UNSPEC_UMULLB, UNSPEC_UMULLT, UNSPEC_SHRNB, UNSPEC_SHRNT) (UNSPEC_RSHRNB, UNSPEC_RSHRNT, UNSPEC_SMULHS, UNSPEC_SMULHRS) UNSPEC_UMULHS, UNSPEC_UMULHRS): New unspecs. (MULLBT, SHRNB, SHRNT, MULHRS): New int iterators. (su, r): Handle the unspecs above. (bt): New int attribute. * internal-fn.def (IFN_MULHS, IFN_MULHRS): New internal functions. * internal-fn.c (first_commutative_argument): Commutativity info for above. * optabs.def (smulhs_optab, smulhrs_optab, umulhs_optab) (umulhrs_optab): New optabs. * doc/md.texi (smulhs$var{m3}, umulhs$var{m3}) (smulhrs$var{m3}, umulhrs$var{m3}): Documentation for the above. * tree-vect-patterns.c (vect_recog_mulhs_pattern): New pattern function. (vect_vect_recog_func_ptrs): Add it. * testsuite/gcc.target/aarch64/sve2/mulhrs_1.c: New test. * testsuite/gcc.dg/vect/vect-mulhrs-1.c: As above. * testsuite/gcc.dg/vect/vect-mulhrs-2.c: As above. * testsuite/gcc.dg/vect/vect-mulhrs-3.c: As above. * testsuite/gcc.dg/vect/vect-mulhrs-4.c: As above. * doc/sourcebuild.texi (vect_mulhrs_hi): Document new target selector. * testsuite/lib/target-supports.exp (check_effective_target_vect_mulhrs_hi): Return true for AArch64 with SVE2. From-SVN: r275682	2019-09-12 09:59:58 +00:00
Tejas Joshi	d3b92f35d8	i386: Roundeven expansion for SSE4.1+ gcc/ChangeLog: 2019-08-26 Tejas Joshi <tejasjoshi9673@gmail.com> Uros Bizjak <ubizjak@gmail.com> * builtins.c (mathfn_built_in_2): Change CASE_MATHFN to CASE_MATHFN_FLOATN for roundeven. * config/i386/i386.c (ix86_i387_mode_needed): Add case I387_ROUNDEVEN. (ix86_mode_needed): Likewise. (ix86_mode_after): Likewise. (ix86_mode_entry): Likewise. (ix86_mode_exit): Likewise. (ix86_emit_mode_set): Likewise. (emit_i387_cw_initialization): Add case I387_CW_ROUNDEVEN. * config/i386/i386.h (ix86_stack_slot): Add SLOT_CW_ROUNDEVEN. (ix86_entry): Add I387_ROUNDEVEN. (avx_u128_state): Add I387_CW_ANY. * config/i386/i386.md: Define UNSPEC_FRNDINT_ROUNDEVEN. (define_int_iterator): Likewise. (define_int_attr): Likewise for rounding_insn, rounding and ROUNDING. (define_constant): Define ROUND_ROUNDEVEN mode. (define_attr): Add roundeven mode for i387_cw. (<rouding_insn><mode>2): Add condition for ROUND_ROUNDEVEN. * internal-fn.def (ROUNDEVEN): New builtin function. * optabs.def (roundeven_optab): New optab. gcc/testsuite/ChangeLog: 2019-08-26 Tejas Joshi <tejasjoshi9673@gmail.com> * gcc.target/i386/sse4_1-round-roundeven-1.c: New test. * gcc.target/i386/sse4_1-round-roundeven-2.c: New test. Co-Authored-By: Uros Bizjak <ubizjak@gmail.com> From-SVN: r274928	2019-08-26 14:41:59 +02:00
Richard Sandiford	20103c0ea9	Add support for conditional shifts This patch adds support for IFN_COND shifts left and shifts right. This is mostly mechanical, but since we try to handle conditional operations in the same way as unconditional operations in match.pd, we need to support IFN_COND shifts by scalars as well as vectors. E.g.: IFN_COND_SHL (cond, a, { 1, 1, ... }, fallback) and: IFN_COND_SHL (cond, a, 1, fallback) are the same operation, with: (for shiftrotate (lrotate rrotate lshift rshift) ... /* Prefer vector1 << scalar to vector1 << vector2 if vector2 is uniform. / (for vec (VECTOR_CST CONSTRUCTOR) (simplify (shiftrotate @0 vec@1) (with { tree tem = uniform_vector_p (@1); } (if (tem) (shiftrotate @0 { tem; })))))) preferring the latter. The patch copes with this by extending create_convert_operand_from to handle scalar-to-vector conversions. 2019-08-15 Richard Sandiford <richard.sandiford@arm.com> Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> gcc/ internal-fn.def (IFN_COND_SHL, IFN_COND_SHR): New internal functions. * internal-fn.c (FOR_EACH_CODE_MAPPING): Handle shifts. * match.pd (UNCOND_BINARY, COND_BINARY): Likewise. * optabs.def (cond_ashl_optab, cond_ashr_optab, cond_lshr_optab): New optabs. * optabs.h (create_convert_operand_from): Expand comment. * optabs.c (maybe_legitimize_operand): Allow implicit broadcasts when mapping scalar rtxes to vector operands. * config/aarch64/iterators.md (SVE_INT_BINARY): Add ashift, ashiftrt and lshiftrt. (sve_int_op, sve_int_op_rev, sve_pred_int_rhs2_operand): Handle them. * config/aarch64/aarch64-sve.md (cond_<optab><mode>_2_const) (cond_<optab><mode>_any_const): New patterns. gcc/testsuite/ * gcc.target/aarch64/sve/cond_shift_1.c: New test. * gcc.target/aarch64/sve/cond_shift_1_run.c: Likewise. * gcc.target/aarch64/sve/cond_shift_2.c: Likewise. * gcc.target/aarch64/sve/cond_shift_2_run.c: Likewise. * gcc.target/aarch64/sve/cond_shift_3.c: Likewise. * gcc.target/aarch64/sve/cond_shift_3_run.c: Likewise. * gcc.target/aarch64/sve/cond_shift_4.c: Likewise. * gcc.target/aarch64/sve/cond_shift_4_run.c: Likewise. * gcc.target/aarch64/sve/cond_shift_5.c: Likewise. * gcc.target/aarch64/sve/cond_shift_5_run.c: Likewise. * gcc.target/aarch64/sve/cond_shift_6.c: Likewise. * gcc.target/aarch64/sve/cond_shift_6_run.c: Likewise. * gcc.target/aarch64/sve/cond_shift_7.c: Likewise. * gcc.target/aarch64/sve/cond_shift_7_run.c: Likewise. * gcc.target/aarch64/sve/cond_shift_8.c: Likewise. * gcc.target/aarch64/sve/cond_shift_8_run.c: Likewise. * gcc.target/aarch64/sve/cond_shift_9.c: Likewise. * gcc.target/aarch64/sve/cond_shift_9_run.c: Likewise. Co-Authored-By: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> From-SVN: r274505	2019-08-15 08:05:50 +00:00
Alejandro Martinez	bce29d65eb	[Vectorizer] Support masking fold left reductions This patch adds support in the vectorizer for masking fold left reductions. This avoids the need to insert a conditional assignement with some identity value. From-SVN: r272407	2019-06-18 08:09:00 +00:00
Przemyslaw Wirkus	a52cf5cf27	2019-05-14 Przemyslaw Wirkus <przemyslaw.wirkus@arm.com\> gcc/ * internal-fn.def (SIGNBIT): New. * config/aarch64/aarch64-simd.md (signbitv2sf2): New expand defined. (signbitv4sf2): Likewise. gcc/testsuite/ * gcc.target/aarch64/signbitv4sf.c: New test. * gcc.target/aarch64/signbitv2sf.c: New test. From-SVN: r271149	2019-05-14 08:07:56 +00:00
Jakub Jelinek	d8fcab6894	re PR c++/85052 (Implement support for clang's __builtin_convertvector) PR c++/85052 * tree-vect-generic.c: Include insn-config.h and recog.h. (expand_vector_piecewise): Add defaulted ret_type argument, if non-NULL, use that in preference to type for the result type. (expand_vector_parallel): Formatting fix. (do_vec_conversion, do_vec_narrowing_conversion, expand_vector_conversion): New functions. (expand_vector_operations_1): Call expand_vector_conversion for VEC_CONVERT ifn calls. * internal-fn.def (VEC_CONVERT): New internal function. * internal-fn.c (expand_VEC_CONVERT): New function. * fold-const-call.c (fold_const_vec_convert): New function. (fold_const_call): Use it for CFN_VEC_CONVERT. * doc/extend.texi (__builtin_convertvector): Document. c-family/ * c-common.h (enum rid): Add RID_BUILTIN_CONVERTVECTOR. (c_build_vec_convert): Declare. * c-common.c (c_build_vec_convert): New function. c/ * c-parser.c (c_parser_postfix_expression): Parse __builtin_convertvector. cp/ * cp-tree.h (cp_build_vec_convert): Declare. * parser.c (cp_parser_postfix_expression): Parse __builtin_convertvector. * constexpr.c: Include fold-const-call.h. (cxx_eval_internal_function): Handle IFN_VEC_CONVERT. (potential_constant_expression_1): Likewise. * semantics.c (cp_build_vec_convert): New function. * pt.c (tsubst_copy_and_build): Handle CALL_EXPR to IFN_VEC_CONVERT. testsuite/ * c-c++-common/builtin-convertvector-1.c: New test. * c-c++-common/torture/builtin-convertvector-1.c: New test. * g++.dg/ext/builtin-convertvector-1.C: New test. * g++.dg/cpp0x/constexpr-builtin4.C: New test. From-SVN: r267632	2019-01-07 09:49:08 +01:00
Jakub Jelinek	a554497024	Update copyright years. From-SVN: r267494	2019-01-01 13:31:55 +01:00
Uros Bizjak	247c45b265	re PR target/88556 (Inline built-in sinh, cosh, tanh for -ffast-math) PR target/88556 * internal-fn.def (COSH): New. (SINH): Ditto. (TANH): Ditto. * optabs.def (cosh_optab): New. (sinh_optab): Ditto. (tanh_optab): Ditto. * config/i386/i386-protos.h (ix86_emit_i387_sinh): New prototype. (ix86_emit_i387_cosh): Ditto. (ix86_emit_i387_tanh): Ditto. * config/i386/i386.c (ix86_emit_i387_sinh): New function. (ix86_emit_i387_cosh): Ditto. (ix86_emit_i387_tanh): Ditto. * config/i386/i386.md (sinhxf2): New expander. (sinh<mode>2): Ditto. (coshxf2): Ditto. (cosh<mode>2): Ditto. (tanhxf2): Ditto. (tanh<mode>2): Ditto. From-SVN: r267325	2018-12-21 14:30:58 +01:00
Uros Bizjak	a81037cea6	re PR target/88502 (Inline built-in asinh, acosh, atanh for -ffast-math) PR target/88502 * internal-fn.def (ACOSH): New. (ASINH): Ditto. (ATANH): Ditto. * optabs.def (acosh_optab): New. (asinh_optab): Ditto. (atanh_optab): Ditto. * config/i386/i386-protos.h (ix86_emit_i387_asinh): New prototype. (ix86_emit_i387_acosh): Ditto. (ix86_emit_i387_atanh): Ditto. * config/i386/i386.c (ix86_emit_i387_asinh): New function. (ix86_emit_i387_acosh): Ditto. (ix86_emit_i387_atanh): Ditto. * config/i386/i386.md (asinhxf2): New expander. (asinh<mode>2): Ditto. (acoshxf2): Ditto. (acosh<mode>2): Ditto. (atanhxf2): Ditto. (atanh<mode>2): Ditto. From-SVN: r267204	2018-12-17 16:46:20 +01:00
Uros Bizjak	4dd9b6c6bc	re PR target/88474 (Inline built-in hypot for -ffast-math) PR target/88474 * internal-fn.def (HYPOT): New. * optabs.def (hypot_optab): New. * config/i386/i386.md (hypot<mode>3): New expander. From-SVN: r267137	2018-12-14 18:04:48 +01:00
Richard Sandiford	b41d1f6ed7	Add IFN_COND_FMA functions This patch adds conditional equivalents of the IFN_FMA built-in functions. Most of it is just a mechanical extension of the binary stuff. 2018-07-12 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * doc/md.texi (cond_fma, cond_fms, cond_fnma, cond_fnms): Document. * optabs.def (cond_fma_optab, cond_fms_optab, cond_fnma_optab) (cond_fnms_optab): New optabs. * internal-fn.def (COND_FMA, COND_FMS, COND_FNMA, COND_FNMS): New internal functions. (FMA): Use DEF_INTERNAL_FLT_FN rather than DEF_INTERNAL_FLT_FLOATN_FN. * internal-fn.h (get_conditional_internal_fn): Declare. (get_unconditional_internal_fn): Likewise. * internal-fn.c (cond_ternary_direct): New macro. (expand_cond_ternary_optab_fn): Likewise. (direct_cond_ternary_optab_supported_p): Likewise. (FOR_EACH_COND_FN_PAIR): Likewise. (get_conditional_internal_fn): New function. (get_unconditional_internal_fn): Likewise. * gimple-match.h (gimple_match_op::MAX_NUM_OPS): Bump to 5. (gimple_match_op::gimple_match_op): Add a new overload for 5 operands. (gimple_match_op::set_op): Likewise. (gimple_resimplify5): Declare. * genmatch.c (decision_tree::gen): Generate simplifications for 5 operands. * gimple-match-head.c (gimple_simplify): Define an overload for 5 operands. Handle calls with 5 arguments in the top-level overload. (convert_conditional_op): Handle conversions from unconditional internal functions to conditional ones. (gimple_resimplify5): New function. (build_call_internal): Pass a fifth operand. (maybe_push_res_to_seq): Likewise. (try_conditional_simplification): Try converting conditional internal functions to unconditional internal functions. Handle 3-operand unconditional forms. * match.pd (UNCOND_TERNARY, COND_TERNARY): Operator lists. Define ternary equivalents of the current rules for binary conditional internal functions. * config/aarch64/aarch64.c (aarch64_preferred_else_value): Handle ternary operations. * config/aarch64/iterators.md (UNSPEC_COND_FMLA, UNSPEC_COND_FMLS) (UNSPEC_COND_FNMLA, UNSPEC_COND_FNMLS): New unspecs. (optab): Handle them. (SVE_COND_FP_TERNARY): New int iterator. (sve_fmla_op, sve_fmad_op): New int attributes. * config/aarch64/aarch64-sve.md (cond_<optab><mode>) (cond_<optab><mode>_2, cond_<optab><mode_4) (cond_<optab><mode>_any): New SVE_COND_FP_TERNARY patterns. gcc/testsuite/ gcc.dg/vect/vect-cond-arith-3.c: New test. * gcc.target/aarch64/sve/vcond_13.c: Likewise. * gcc.target/aarch64/sve/vcond_13_run.c: Likewise. * gcc.target/aarch64/sve/vcond_14.c: Likewise. * gcc.target/aarch64/sve/vcond_14_run.c: Likewise. * gcc.target/aarch64/sve/vcond_15.c: Likewise. * gcc.target/aarch64/sve/vcond_15_run.c: Likewise. * gcc.target/aarch64/sve/vcond_16.c: Likewise. * gcc.target/aarch64/sve/vcond_16_run.c: Likewise. From-SVN: r262587	2018-07-12 13:01:33 +00:00
Richard Sandiford	0267732bae	[16/n] PR85694: Add detection of averaging operations This patch adds detection of average instructions: a = (((wide) b + (wide) c) >> 1); --> a = (wide) .AVG_FLOOR (b, c); a = (((wide) b + (wide) c + 1) >> 1); --> a = (wide) .AVG_CEIL (b, c); in cases where users of "a" need only the low half of the result, making the cast to (wide) redundant. The heavy lifting was done by earlier patches. This showed up another problem in vectorizable_call: if the call is a pattern definition statement rather than the main pattern statement, the type of vectorised call might be different from the type of the original statement. 2018-07-03 Richard Sandiford <richard.sandiford@arm.com> gcc/ PR tree-optimization/85694 * doc/md.texi (avgM3_floor, uavgM3_floor, avgM3_ceil) (uavgM3_ceil): Document new optabs. * doc/sourcebuild.texi (vect_avg_qi): Document new target selector. * internal-fn.def (IFN_AVG_FLOOR, IFN_AVG_CEIL): New internal functions. * optabs.def (savg_floor_optab, uavg_floor_optab, savg_ceil_optab) (savg_ceil_optab): New optabs. * tree-vect-patterns.c (vect_recog_average_pattern): New function. (vect_vect_recog_func_ptrs): Add it. * tree-vect-stmts.c (vectorizable_call): Get the type of the zero constant directly from the associated lhs. gcc/testsuite/ PR tree-optimization/85694 * lib/target-supports.exp (check_effective_target_vect_avg_qi): New proc. * gcc.dg/vect/vect-avg-1.c: New test. * gcc.dg/vect/vect-avg-2.c: Likewise. * gcc.dg/vect/vect-avg-3.c: Likewise. * gcc.dg/vect/vect-avg-4.c: Likewise. * gcc.dg/vect/vect-avg-5.c: Likewise. * gcc.dg/vect/vect-avg-6.c: Likewise. * gcc.dg/vect/vect-avg-7.c: Likewise. * gcc.dg/vect/vect-avg-8.c: Likewise. * gcc.dg/vect/vect-avg-9.c: Likewise. * gcc.dg/vect/vect-avg-10.c: Likewise. * gcc.dg/vect/vect-avg-11.c: Likewise. * gcc.dg/vect/vect-avg-12.c: Likewise. * gcc.dg/vect/vect-avg-13.c: Likewise. * gcc.dg/vect/vect-avg-14.c: Likewise. From-SVN: r262335	2018-07-03 10:03:44 +00:00
Richard Sandiford	6c4fd4a9fe	Add IFN_COND_{MUL,DIV,MOD,RDIV} This patch adds support for conditional multiplication and division. It's mostly mechanical, but a few notes: * The _optab name and the .md names are the same as the unconditional forms, just with "cond_" added to the front. This means we still have the awkward difference between sdiv and div, etc. It was easier to retain the difference between integer and FP division in the function names, given that they map to different tree codes (TRUNC_DIV_EXPR and RDIV_EXPR). * SVE has no direct support for IFN_COND_MOD, but it seemed more consistent to add it anyway. * Adding IFN_COND_MUL enables an extra fully-masked reduction in gcc.dg/vect/pr53773.c. * In practice we don't actually use the integer division forms without if-conversion support (added by a later patch). 2018-05-25 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * doc/sourcebuild.texi (vect_double_cond_arith): Include multiplication and division. * doc/md.texi (cond_mul@var{m}, cond_div@var{m}, cond_mod@var{m}) (cond_udiv@var{m}, cond_umod@var{m}): Document. * optabs.def (cond_smul_optab, cond_sdiv_optab, cond_smod_optab) (cond_udiv_optab, cond_umod_optab): New optabs. * internal-fn.def (IFN_COND_MUL, IFN_COND_DIV, IFN_COND_MOD) (IFN_COND_RDIV): New internal functions. * internal-fn.c (get_conditional_internal_fn): Handle TRUNC_DIV_EXPR, TRUNC_MOD_EXPR and RDIV_EXPR. * match.pd (UNCOND_BINARY, COND_BINARY): Handle them. * config/aarch64/iterators.md (UNSPEC_COND_MUL, UNSPEC_COND_DIV): New unspecs. (SVE_INT_BINARY): Include mult. (SVE_COND_FP_BINARY): Include UNSPEC_MUL and UNSPEC_DIV. (optab, sve_int_op): Handle mult. (optab, sve_fp_op, commutative): Handle UNSPEC_COND_MUL and UNSPEC_COND_DIV. * config/aarch64/aarch64-sve.md (cond_<optab><mode>): New pattern for SVE_INT_BINARY_SD. gcc/testsuite/ * lib/target-supports.exp (check_effective_target_vect_double_cond_arith): Include multiplication and division. * gcc.dg/vect/pr53773.c: Do not expect a scalar tail when using fully-masked loops with a fixed vector length. * gcc.dg/vect/vect-cond-arith-1.c: Add multiplication and division tests. * gcc.target/aarch64/sve/vcond_8.c: Likewise. * gcc.target/aarch64/sve/vcond_9.c: Likewise. * gcc.target/aarch64/sve/vcond_12.c: Add multiplication tests. From-SVN: r260713	2018-05-25 08:53:15 +00:00
Richard Sandiford	9d4ac06e02	Add an "else" argument to IFN_COND_* functions As suggested by Richard B, this patch changes the IFN_COND_* functions so that they take the else value of the ?: operation as a final argument, rather than always using argument 1. All current callers will still use the equivalent of argument 1, so this patch makes the SVE code assert that for now. Later patches add the general case. 2018-05-25 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * doc/md.texi: Update the documentation of the cond_* optabs to mention the new final operand. Fix GET_MODE_NUNITS call. Describe the scalar case too. * internal-fn.def (IFN_EXTRACT_LAST): Change type to fold_left. * internal-fn.c (expand_cond_unary_optab_fn): Expect 3 operands instead of 2. (expand_cond_binary_optab_fn): Expect 4 operands instead of 3. (get_conditional_internal_fn): Update comment. * tree-vect-loop.c (vectorizable_reduction): Pass the original accumulator value as a final argument to conditional functions. * config/aarch64/aarch64-sve.md (cond_<optab><mode>): Turn into a define_expand and add an "else" operand. Assert for now that the else operand is equal to operand 2. Use SVE_INT_BINARY and SVE_COND_FP_BINARY instead of SVE_COND_INT_OP and SVE_COND_FP_OP. (cond_<optab><mode>): New patterns. config/aarch64/iterators.md (UNSPEC_COND_SMAX, UNSPEC_COND_UMAX) (UNSPEC_COND_SMIN, UNSPEC_COND_UMIN, UNSPEC_COND_AND, UNSPEC_COND_ORR) (UNSPEC_COND_EOR): Delete. (optab): Remove associated mappings. (SVE_INT_BINARY): New code iterator. (sve_int_op): Remove int attribute and add "minus" to the code attribute. (SVE_COND_INT_OP): Delete. (SVE_COND_FP_OP): Rename to... (SVE_COND_FP_BINARY): ...this. From-SVN: r260707	2018-05-25 06:48:47 +00:00
Richard Sandiford	c566cc9f78	Replace FMA_EXPR with one internal fn per optab There are four optabs for various forms of fused multiply-add: fma, fms, fnma and fnms. Of these, only fma had a direct gimple representation. For the other three we relied on special pattern- matching during expand, although tree-ssa-math-opts.c did have some code to try to second-guess what expand would do. This patch removes the old FMA_EXPR representation of fma and introduces four new internal functions, one for each optab. IFN_FMA is tied to BUILT_IN_FMA* while the other three are independent directly-mapped internal functions. It's then possible to do the pattern-matching in match.pd and tree-ssa-math-opts.c (via folding) can select the exact FMA-based operation. The BRIG & HSA parts are a best guess, but seem relatively simple. 2018-05-18 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * doc/sourcebuild.texi (scalar_all_fma): Document. * tree.def (FMA_EXPR): Delete. * internal-fn.def (FMA, FMS, FNMA, FNMS): New internal functions. * internal-fn.c (ternary_direct): New macro. (expand_ternary_optab_fn): Likewise. (direct_ternary_optab_supported_p): Likewise. * Makefile.in (build/genmatch.o): Depend on case-fn-macros.h. * builtins.c (fold_builtin_fma): Delete. (fold_builtin_3): Don't call it. * cfgexpand.c (expand_debug_expr): Remove FMA_EXPR handling. * expr.c (expand_expr_real_2): Likewise. * fold-const.c (operand_equal_p): Likewise. (fold_ternary_loc): Likewise. * gimple-pretty-print.c (dump_ternary_rhs): Likewise. * gimple.c (DEFTREECODE): Likewise. * gimplify.c (gimplify_expr): Likewise. * optabs-tree.c (optab_for_tree_code): Likewise. * tree-cfg.c (verify_gimple_assign_ternary): Likewise. * tree-eh.c (operation_could_trap_p): Likewise. (stmt_could_throw_1_p): Likewise. * tree-inline.c (estimate_operator_cost): Likewise. * tree-pretty-print.c (dump_generic_node): Likewise. (op_code_prio): Likewise. * tree-ssa-loop-im.c (stmt_cost): Likewise. * tree-ssa-operands.c (get_expr_operands): Likewise. * tree.c (commutative_ternary_tree_code, add_expr): Likewise. * fold-const-call.h (fold_fma): Delete. * fold-const-call.c (fold_const_call_ssss): Handle CFN_FMS, CFN_FNMA and CFN_FNMS. (fold_fma): Delete. * genmatch.c (combined_fn): New enum. (commutative_ternary_tree_code): Remove FMA_EXPR handling. (commutative_op): New function. (commutate): Use it. Handle more than 2 operands. (dt_operand::gen_gimple_expr): Use commutative_op. (parser::parse_expr): Allow :c to be used with non-binary operators if the commutative operand is known. * gimple-ssa-backprop.c (backprop::process_builtin_call_use): Handle CFN_FMS, CFN_FNMA and CFN_FNMS. (backprop::process_assign_use): Remove FMA_EXPR handling. * hsa-gen.c (gen_hsa_insns_for_operation_assignment): Likewise. (gen_hsa_fma): New function. (gen_hsa_insn_for_internal_fn_call): Use it for IFN_FMA, IFN_FMS, IFN_FNMA and IFN_FNMS. * match.pd: Add folds for IFN_FMS, IFN_FNMA and IFN_FNMS. * gimple-fold.h (follow_all_ssa_edges): Declare. * gimple-fold.c (follow_all_ssa_edges): New function. * tree-ssa-math-opts.c (convert_mult_to_fma_1): Use the gimple_build interface and use follow_all_ssa_edges to fold the result. (convert_mult_to_fma): Use direct_internal_fn_suppoerted_p instead of checking for optabs directly. * config/i386/i386.c (ix86_add_stmt_cost): Recognize FMAs as calls rather than FMA_EXPRs. * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Create a call to IFN_FMA instead of an FMA_EXPR. gcc/brig/ * brigfrontend/brig-function.cc (brig_function::get_builtin_for_hsa_opcode): Use BUILT_IN_FMA for BRIG_OPCODE_FMA. (brig_function::get_tree_code_for_hsa_opcode): Treat BUILT_IN_FMA as a call. gcc/c/ * gimple-parser.c (c_parser_gimple_postfix_expression): Remove __FMA_EXPR handlng. gcc/cp/ * constexpr.c (cxx_eval_constant_expression): Remove FMA_EXPR handling. (potential_constant_expression_1): Likewise. gcc/testsuite/ * lib/target-supports.exp (check_effective_target_scalar_all_fma): New proc. * gcc.dg/fma-1.c: New test. * gcc.dg/fma-2.c: Likewise. * gcc.dg/fma-3.c: Likewise. * gcc.dg/fma-4.c: Likewise. * gcc.dg/fma-5.c: Likewise. * gcc.dg/fma-6.c: Likewise. * gcc.dg/fma-7.c: Likewise. * gcc.dg/gimplefe-26.c: Use .FMA instead of __FMA and require scalar_all_fma. * gfortran.dg/reassoc_7.f: Pass -ffp-contract=off. * gfortran.dg/reassoc_8.f: Likewise. * gfortran.dg/reassoc_9.f: Likewise. * gfortran.dg/reassoc_10.f: Likewise. From-SVN: r260348	2018-05-18 08:27:58 +00:00
Martin Liska	d80956bb05	Set proper internal functions fnspec (PR sanitizer/84307). 2018-02-16 Martin Liska <mliska@suse.cz> PR sanitizer/84307 * internal-fn.def (ASAN_CHECK): Set proper flags. (ASAN_MARK): Likewise. From-SVN: r257729	2018-02-16 10:03:47 +00:00
Paolo Bonzini	1bbae6518f	re PR sanitizer/84340 (g++.dg/asan/use-after-scope-types-1.C (and others) fails after r257585) gcc: 2018-02-13 Paolo Bonzini <bonzini@gnu.org> PR sanitizer/84340 * internal-fn.def (ASAN_CHECK, ASAN_MARK): Revert changes to fnspec. gcc/testsuite: 2018-02-13 Paolo Bonzini <bonzini@gnu.org> PR sanitizer/84307 * gcc.dg/asan/pr84307.c: Remove test. From-SVN: r257625	2018-02-13 13:03:22 +00:00
Paolo Bonzini	74a5138a61	re PR sanitizer/84307 (asan blocks dead-store elimination) gcc: 2018-02-12 Paolo Bonzini <bonzini@gnu.org> PR sanitizer/84307 * internal-fn.def (ASAN_CHECK): Fix fnspec to account for return value. (ASAN_MARK): Fix fnspec to account for return value, change pointer argument from 'R' to 'W' so that the pointed-to datum is clobbered. gcc/testsuite: 2018-02-12 Paolo Bonzini <bonzini@gnu.org> PR sanitizer/84307 * gcc.dg/asan/pr84307.c: New test. From-SVN: r257585	2018-02-12 12:47:56 +00:00
Richard Sandiford	f307441ac4	Add support for SVE scatter stores This is mostly a mechanical extension of the previous gather load support to scatter stores. The internal functions in this case are: IFN_SCATTER_STORE (base, offsets, scale, values) IFN_MASK_SCATTER_STORE (base, offsets, scale, values, mask) However, one nonobvious change is to vect_analyze_data_ref_access. If we're treating an access as a gather load or scatter store (i.e. if STMT_VINFO_GATHER_SCATTER_P is true), the existing code would create a dummy data_reference whose step is 0. There's not really much else it could do, since the whole point is that the step isn't predictable from iteration to iteration. We then went into this code in vect_analyze_data_ref_access: /* Allow loads with zero step in inner-loop vectorization. / if (loop_vinfo && integer_zerop (step)) { GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt)) = NULL; if (!nested_in_vect_loop_p (loop, stmt)) return DR_IS_READ (dr); I.e. we'd take the step literally and assume that this is a load or store to an invariant address. Loads from invariant addresses are supported but stores to them aren't. The code therefore had the effect of disabling all scatter stores. AFAICT this is true of AVX too: although tests like avx512f-scatter-1.c test for the correctness of a scatter-like loop, they don't seem to check whether a scatter instruction is actually used. The patch therefore makes vect_analyze_data_ref_access return true for scatters. We do seem to handle the aliasing correctly; that's tested by other functions, and is symmetrical to the already-working gather case. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ doc/sourcebuild.texi (vect_scatter_store): Document. * optabs.def (scatter_store_optab, mask_scatter_store_optab): New optabs. * doc/md.texi (scatter_store@var{m}, mask_scatter_store@var{m}): Document. * genopinit.c (main): Add supports_vec_scatter_store and supports_vec_scatter_store_cached to target_optabs. * gimple.h (gimple_expr_type): Handle IFN_SCATTER_STORE and IFN_MASK_SCATTER_STORE. * internal-fn.def (SCATTER_STORE, MASK_SCATTER_STORE): New internal functions. * internal-fn.h (internal_store_fn_p): Declare. (internal_fn_stored_value_index): Likewise. * internal-fn.c (scatter_store_direct): New macro. (expand_scatter_store_optab_fn): New function. (direct_scatter_store_optab_supported_p): New macro. (internal_store_fn_p): New function. (internal_gather_scatter_fn_p): Handle IFN_SCATTER_STORE and IFN_MASK_SCATTER_STORE. (internal_fn_mask_index): Likewise. (internal_fn_stored_value_index): New function. (internal_gather_scatter_fn_supported_p): Adjust operand numbers for scatter stores. * optabs-query.h (supports_vec_scatter_store_p): Declare. * optabs-query.c (supports_vec_scatter_store_p): New function. * tree-vectorizer.h (vect_get_store_rhs): Declare. * tree-vect-data-refs.c (vect_analyze_data_ref_access): Return true for scatter stores. (vect_gather_scatter_fn_p): Handle scatter stores too. (vect_check_gather_scatter): Consider using scatter stores if supports_vec_scatter_store_p. * tree-vect-patterns.c (vect_try_gather_scatter_pattern): Handle scatter stores too. * tree-vect-stmts.c (exist_non_indexing_operands_for_use_p): Use internal_fn_stored_value_index. (check_load_store_masking): Handle scatter stores too. (vect_get_store_rhs): Make public. (vectorizable_call): Use internal_store_fn_p. (vectorizable_store): Handle scatter store internal functions. (vect_transform_stmt): Compare GROUP_STORE_COUNT with GROUP_SIZE when deciding whether the end of the group has been reached. * config/aarch64/aarch64.md (UNSPEC_ST1_SCATTER): New unspec. * config/aarch64/aarch64-sve.md (scatter_store<mode>): New expander. (mask_scatter_store<mode>): New insns. gcc/testsuite/ * lib/target-supports.exp (check_effective_target_vect_scatter_store): New proc. * gcc.dg/vect/pr25413a.c: Expect both loops to be optimized on targets with scatter stores. * gcc.dg/vect/vect-71.c: Restrict XFAIL to targets without scatter stores. * gcc.target/aarch64/sve/mask_scatter_store_1.c: New test. * gcc.target/aarch64/sve/mask_scatter_store_2.c: Likewise. * gcc.target/aarch64/sve/scatter_store_1.c: Likewise. * gcc.target/aarch64/sve/scatter_store_2.c: Likewise. * gcc.target/aarch64/sve/scatter_store_3.c: Likewise. * gcc.target/aarch64/sve/scatter_store_4.c: Likewise. * gcc.target/aarch64/sve/scatter_store_5.c: Likewise. * gcc.target/aarch64/sve/scatter_store_6.c: Likewise. * gcc.target/aarch64/sve/scatter_store_7.c: Likewise. * gcc.target/aarch64/sve/strided_store_1.c: Likewise. * gcc.target/aarch64/sve/strided_store_2.c: Likewise. * gcc.target/aarch64/sve/strided_store_3.c: Likewise. * gcc.target/aarch64/sve/strided_store_4.c: Likewise. * gcc.target/aarch64/sve/strided_store_5.c: Likewise. * gcc.target/aarch64/sve/strided_store_6.c: Likewise. * gcc.target/aarch64/sve/strided_store_7.c: Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256643	2018-01-13 18:01:59 +00:00
Richard Sandiford	bfaa08b7ba	Add support for SVE gather loads This patch adds support for SVE gather loads. It uses the basically the same analysis code as the AVX gather support, but after that there are two major differences: - It uses new internal functions rather than target built-ins. The interface is: IFN_GATHER_LOAD (base, offsets scale) IFN_MASK_GATHER_LOAD (base, offsets scale, mask) which should be reasonably generic. One of the advantages of using internal functions is that other passes can understand what the functions do, but a more immediate advantage is that we can query the underlying target pattern to see which scales it supports. - It uses pattern recognition to convert the offset to the right width, if it was originally narrower than that. This avoids having to do a widening operation as part of the gather expansion itself. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * doc/md.texi (gather_load@var{m}): Document. (mask_gather_load@var{m}): Likewise. * genopinit.c (main): Add supports_vec_gather_load and supports_vec_gather_load_cached to target_optabs. * optabs-tree.c (init_tree_optimization_optabs): Use ggc_cleared_alloc to allocate target_optabs. * optabs.def (gather_load_optab, mask_gather_laod_optab): New optabs. * internal-fn.def (GATHER_LOAD, MASK_GATHER_LOAD): New internal functions. * internal-fn.h (internal_load_fn_p): Declare. (internal_gather_scatter_fn_p): Likewise. (internal_fn_mask_index): Likewise. (internal_gather_scatter_fn_supported_p): Likewise. * internal-fn.c (gather_load_direct): New macro. (expand_gather_load_optab_fn): New function. (direct_gather_load_optab_supported_p): New macro. (direct_internal_fn_optab): New function. (internal_load_fn_p): Likewise. (internal_gather_scatter_fn_p): Likewise. (internal_fn_mask_index): Likewise. (internal_gather_scatter_fn_supported_p): Likewise. * optabs-query.c (supports_at_least_one_mode_p): New function. (supports_vec_gather_load_p): Likewise. * optabs-query.h (supports_vec_gather_load_p): Declare. * tree-vectorizer.h (gather_scatter_info): Add ifn, element_type and memory_type field. (NUM_PATTERNS): Bump to 15. * tree-vect-data-refs.c: Include internal-fn.h. (vect_gather_scatter_fn_p): New function. (vect_describe_gather_scatter_call): Likewise. (vect_check_gather_scatter): Try using internal functions for gather loads. Recognize existing calls to a gather load function. (vect_analyze_data_refs): Consider using gather loads if supports_vec_gather_load_p. * tree-vect-patterns.c (vect_get_load_store_mask): New function. (vect_get_gather_scatter_offset_type): Likewise. (vect_convert_mask_for_vectype): Likewise. (vect_add_conversion_to_patterm): Likewise. (vect_try_gather_scatter_pattern): Likewise. (vect_recog_gather_scatter_pattern): New pattern recognizer. (vect_vect_recog_func_ptrs): Add it. * tree-vect-stmts.c (exist_non_indexing_operands_for_use_p): Use internal_fn_mask_index and internal_gather_scatter_fn_p. (check_load_store_masking): Take the gather_scatter_info as an argument and handle gather loads. (vect_get_gather_scatter_ops): New function. (vectorizable_call): Check internal_load_fn_p. (vectorizable_load): Likewise. Handle gather load internal functions. (vectorizable_store): Update call to check_load_store_masking. * config/aarch64/aarch64.md (UNSPEC_LD1_GATHER): New unspec. * config/aarch64/iterators.md (SVE_S, SVE_D): New mode iterators. * config/aarch64/predicates.md (aarch64_gather_scale_operand_w) (aarch64_gather_scale_operand_d): New predicates. * config/aarch64/aarch64-sve.md (gather_load<mode>): New expander. (mask_gather_load<mode>): New insns. gcc/testsuite/ * gcc.target/aarch64/sve/gather_load_1.c: New test. * gcc.target/aarch64/sve/gather_load_2.c: Likewise. * gcc.target/aarch64/sve/gather_load_3.c: Likewise. * gcc.target/aarch64/sve/gather_load_4.c: Likewise. * gcc.target/aarch64/sve/gather_load_5.c: Likewise. * gcc.target/aarch64/sve/gather_load_6.c: Likewise. * gcc.target/aarch64/sve/gather_load_7.c: Likewise. * gcc.target/aarch64/sve/mask_gather_load_1.c: Likewise. * gcc.target/aarch64/sve/mask_gather_load_2.c: Likewise. * gcc.target/aarch64/sve/mask_gather_load_3.c: Likewise. * gcc.target/aarch64/sve/mask_gather_load_4.c: Likewise. * gcc.target/aarch64/sve/mask_gather_load_5.c: Likewise. * gcc.target/aarch64/sve/mask_gather_load_6.c: Likewise. * gcc.target/aarch64/sve/mask_gather_load_7.c: Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256640	2018-01-13 18:01:34 +00:00
Richard Sandiford	b781a135a0	Add support for in-order addition reduction using SVE FADDA This patch adds support for in-order floating-point addition reductions, which are suitable even in strict IEEE mode. Previously vect_is_simple_reduction would reject any cases that forbid reassociation. The idea is instead to tentatively accept them as "FOLD_LEFT_REDUCTIONs" and only fail later if there is no support for them. Although this patch only handles the particular case of plus and minus on floating-point types, there's no reason in principle why we couldn't handle other cases. The reductions use a new fold_left_plus_optab if available, otherwise they fall back to elementwise additions or subtractions. The vect_force_simple_reduction change makes it easier for parloops to read the type of reduction. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * optabs.def (fold_left_plus_optab): New optab. * doc/md.texi (fold_left_plus_@var{m}): Document. * internal-fn.def (IFN_FOLD_LEFT_PLUS): New internal function. * internal-fn.c (fold_left_direct): Define. (expand_fold_left_optab_fn): Likewise. (direct_fold_left_optab_supported_p): Likewise. * fold-const-call.c (fold_const_fold_left): New function. (fold_const_call): Use it to fold CFN_FOLD_LEFT_PLUS. * tree-parloops.c (valid_reduction_p): New function. (gather_scalar_reductions): Use it. * tree-vectorizer.h (FOLD_LEFT_REDUCTION): New vect_reduction_type. (vect_finish_replace_stmt): Declare. * tree-vect-loop.c (fold_left_reduction_fn): New function. (needs_fold_left_reduction_p): New function, split out from... (vect_is_simple_reduction): ...here. Accept reductions that forbid reassociation, but give them type FOLD_LEFT_REDUCTION. (vect_force_simple_reduction): Also store the reduction type in the assignment's STMT_VINFO_REDUC_TYPE. (vect_model_reduction_cost): Handle FOLD_LEFT_REDUCTION. (merge_with_identity): New function. (vect_expand_fold_left): Likewise. (vectorize_fold_left_reduction): Likewise. (vectorizable_reduction): Handle FOLD_LEFT_REDUCTION. Leave the scalar phi in place for it. Check for target support and reject cases that would reassociate the operation. Defer the transform phase to vectorize_fold_left_reduction. * config/aarch64/aarch64.md (UNSPEC_FADDA): New unspec. * config/aarch64/aarch64-sve.md (fold_left_plus_<mode>): New expander. (fold_left_plus_<mode>, pred_fold_left_plus_<mode>): New insns. gcc/testsuite/ * gcc.dg/vect/no-fast-math-vect16.c: Expect the test to pass and check for a message about using in-order reductions. * gcc.dg/vect/pr79920.c: Expect both loops to be vectorized and check for a message about using in-order reductions. * gcc.dg/vect/trapv-vect-reduc-4.c: Expect all three loops to be vectorized and check for a message about using in-order reductions. Expect targets with variable-length vectors to fall back to the fixed-length mininum. * gcc.dg/vect/vect-reduc-6.c: Expect the loop to be vectorized and check for a message about using in-order reductions. * gcc.dg/vect/vect-reduc-in-order-1.c: New test. * gcc.dg/vect/vect-reduc-in-order-2.c: Likewise. * gcc.dg/vect/vect-reduc-in-order-3.c: Likewise. * gcc.dg/vect/vect-reduc-in-order-4.c: Likewise. * gcc.target/aarch64/sve/reduc_strict_1.c: New test. * gcc.target/aarch64/sve/reduc_strict_1_run.c: Likewise. * gcc.target/aarch64/sve/reduc_strict_2.c: Likewise. * gcc.target/aarch64/sve/reduc_strict_2_run.c: Likewise. * gcc.target/aarch64/sve/reduc_strict_3.c: Likewise. * gcc.target/aarch64/sve/slp_13.c: Add floating-point types. * gfortran.dg/vect/vect-8.f90: Expect 22 loops to be vectorized if vect_fold_left_plus. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256639	2018-01-13 18:01:24 +00:00
Richard Sandiford	bb6c2b68d6	Add support for conditional reductions using SVE CLASTB This patch uses SVE CLASTB to optimise conditional reductions. It means that we no longer need to maintain a separate index vector to record the most recent valid value, and no longer need to worry about overflow cases. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * doc/md.texi (fold_extract_last_@var{m}): Document. * doc/sourcebuild.texi (vect_fold_extract_last): Likewise. * optabs.def (fold_extract_last_optab): New optab. * internal-fn.def (FOLD_EXTRACT_LAST): New internal function. * internal-fn.c (fold_extract_direct): New macro. (expand_fold_extract_optab_fn): Likewise. (direct_fold_extract_optab_supported_p): Likewise. * tree-vectorizer.h (EXTRACT_LAST_REDUCTION): New vect_reduction_type. * tree-vect-loop.c (vect_model_reduction_cost): Handle EXTRACT_LAST_REDUCTION. (get_initial_def_for_reduction): Do not create an initial vector for EXTRACT_LAST_REDUCTION reductions. (vectorizable_reduction): Leave the scalar phi in place for EXTRACT_LAST_REDUCTIONs. Try using EXTRACT_LAST_REDUCTION ahead of INTEGER_INDUC_COND_REDUCTION. Do not check for an epilogue code for EXTRACT_LAST_REDUCTION and defer the transform phase to vectorizable_condition. * tree-vect-stmts.c (vect_finish_stmt_generation_1): New function, split out from... (vect_finish_stmt_generation): ...here. (vect_finish_replace_stmt): New function. (vectorizable_condition): Handle EXTRACT_LAST_REDUCTION. * config/aarch64/aarch64-sve.md (fold_extract_last_<mode>): New pattern. * config/aarch64/aarch64.md (UNSPEC_CLASTB): New unspec. gcc/testsuite/ * lib/target-supports.exp (check_effective_target_vect_fold_extract_last): New proc. * gcc.dg/vect/pr65947-1.c: Update dump messages. Add markup for fold_extract_last. * gcc.dg/vect/pr65947-2.c: Likewise. * gcc.dg/vect/pr65947-3.c: Likewise. * gcc.dg/vect/pr65947-4.c: Likewise. * gcc.dg/vect/pr65947-5.c: Likewise. * gcc.dg/vect/pr65947-6.c: Likewise. * gcc.dg/vect/pr65947-9.c: Likewise. * gcc.dg/vect/pr65947-10.c: Likewise. * gcc.dg/vect/pr65947-12.c: Likewise. * gcc.dg/vect/pr65947-14.c: Likewise. * gcc.dg/vect/pr80631-1.c: Likewise. * gcc.target/aarch64/sve/clastb_1.c: New test. * gcc.target/aarch64/sve/clastb_1_run.c: Likewise. * gcc.target/aarch64/sve/clastb_2.c: Likewise. * gcc.target/aarch64/sve/clastb_2_run.c: Likewise. * gcc.target/aarch64/sve/clastb_3.c: Likewise. * gcc.target/aarch64/sve/clastb_3_run.c: Likewise. * gcc.target/aarch64/sve/clastb_4.c: Likewise. * gcc.target/aarch64/sve/clastb_4_run.c: Likewise. * gcc.target/aarch64/sve/clastb_5.c: Likewise. * gcc.target/aarch64/sve/clastb_5_run.c: Likewise. * gcc.target/aarch64/sve/clastb_6.c: Likewise. * gcc.target/aarch64/sve/clastb_6_run.c: Likewise. * gcc.target/aarch64/sve/clastb_7.c: Likewise. * gcc.target/aarch64/sve/clastb_7_run.c: Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256633	2018-01-13 17:59:59 +00:00
Richard Sandiford	bfe1bb57ba	Add support for vectorising live-out values using SVE LASTB This patch uses the SVE LASTB instruction to optimise cases in which a value produced by the final scalar iteration of a vectorised loop is live outside the loop. Previously this situation would stop us from using a fully-masked loop. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * doc/md.texi (extract_last_@var{m}): Document. * optabs.def (extract_last_optab): New optab. * internal-fn.def (EXTRACT_LAST): New internal function. * internal-fn.c (cond_unary_direct): New macro. (expand_cond_unary_optab_fn): Likewise. (direct_cond_unary_optab_supported_p): Likewise. * tree-vect-loop.c (vectorizable_live_operation): Allow fully-masked loops using EXTRACT_LAST. * config/aarch64/aarch64-sve.md (aarch64_sve_lastb<mode>): Rename to... (extract_last_<mode>): ...this optab. (vec_extract<mode><Vel>): Update accordingly. gcc/testsuite/ * gcc.target/aarch64/sve/live_1.c: New test. * gcc.target/aarch64/sve/live_1_run.c: Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256632	2018-01-13 17:59:50 +00:00

1 2 3

118 Commits