Commit Graph

118 Commits

Author SHA1 Message Date
Jakub Jelinek 463d910876 widening_mul, i386: Improve spaceship expansion on x86 [PR103973]
C++20:
 #include <compare>
 auto cmp4way(double a, double b)
 {
   return a <=> b;
 }
expands to:
        ucomisd %xmm1, %xmm0
        jp      .L8
        movl    $0, %eax
        jne     .L8
.L2:
        ret
        .p2align 4,,10
        .p2align 3
.L8:
        comisd  %xmm0, %xmm1
        movl    $-1, %eax
        ja      .L2
        ucomisd %xmm1, %xmm0
        setbe   %al
        addl    $1, %eax
        ret
That is 3 comparisons of the same operands.
The following patch improves it to just one comparison:
        comisd  %xmm1, %xmm0
        jp      .L4
        seta    %al
        movl    $0, %edx
        leal    -1(%rax,%rax), %eax
        cmove   %edx, %eax
        ret
.L4:
        movl    $2, %eax
        ret
While a <=> b expands to a == b ? 0 : a < b ? -1 : a > b ? 1 : 2,
where the first comparison is an equality test that shouldn't raise
exceptions on qNaN operands, if the operands aren't equal (which
includes the unordered cases) it immediately performs a < or >
comparison, and those raise exceptions even on qNaNs, so we can just
perform a single comparison that raises exceptions on qNaN.
As the 4 different cases are encoded as
ZF CF PF
1  1  1  a unordered b
0  0  0  a > b
0  1  0  a < b
1  0  0  a == b
we can emit an optimal sequence of comparisons: first jp
for the unordered case, then je for the == case and finally jb
for the < case.

The patch pattern recognizes spaceship-like comparisons during
widening_mul if the spaceship optab is implemented, and replaces
those comparisons with comparisons of .SPACESHIP ifn which returns
-1/0/1/2 based on the comparison.  This seems to work well both for the
case of just returning the -1/0/1/2 (when we have just a common
successor with a PHI) and for the case where the different outcomes are
handled in various other basic blocks.  The testcases cover both of those
cases, the latter with different function calls in those blocks.
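For illustration (a sketch in the spirit of those tests, not one of the
committed testcases), the latter shape corresponds to something like:
  #include <compare>
  void f0 (void); void f1 (void); void f2 (void); void f3 (void);
  void
  foo (double a, double b)
  {
    auto c = a <=> b;
    if (c < 0) f0 ();        // a < b
    else if (c > 0) f1 ();   // a > b
    else if (c == 0) f2 ();  // a == b
    else f3 ();              // unordered
  }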

2022-01-17  Jakub Jelinek  <jakub@redhat.com>

	PR target/103973
	* tree-cfg.h (cond_only_block_p): Declare.
	* tree-ssa-phiopt.c (cond_only_block_p): Move function to ...
	* tree-cfg.c (cond_only_block_p): ... here.  No longer static.
	* optabs.def (spaceship_optab): New optab.
	* internal-fn.def (SPACESHIP): New internal function.
	* internal-fn.h (expand_SPACESHIP): Declare.
	* internal-fn.c (expand_PHI): Formatting fix.
	(expand_SPACESHIP): New function.
	* tree-ssa-math-opts.c (optimize_spaceship): New function.
	(math_opts_dom_walker::after_dom_children): Use it.
	* config/i386/i386.md (spaceship<mode>3): New define_expand.
	* config/i386/i386-protos.h (ix86_expand_fp_spaceship): Declare.
	* config/i386/i386-expand.c (ix86_expand_fp_spaceship): New function.
	* doc/md.texi (spaceship@var{m}3): Document.

	* gcc.target/i386/pr103973-1.c: New test.
	* gcc.target/i386/pr103973-2.c: New test.
	* gcc.target/i386/pr103973-3.c: New test.
	* gcc.target/i386/pr103973-4.c: New test.
	* gcc.target/i386/pr103973-5.c: New test.
	* gcc.target/i386/pr103973-6.c: New test.
	* gcc.target/i386/pr103973-7.c: New test.
	* gcc.target/i386/pr103973-8.c: New test.
	* gcc.target/i386/pr103973-9.c: New test.
	* gcc.target/i386/pr103973-10.c: New test.
	* gcc.target/i386/pr103973-11.c: New test.
	* gcc.target/i386/pr103973-12.c: New test.
	* gcc.target/i386/pr103973-13.c: New test.
	* gcc.target/i386/pr103973-14.c: New test.
	* gcc.target/i386/pr103973-15.c: New test.
	* gcc.target/i386/pr103973-16.c: New test.
	* gcc.target/i386/pr103973-17.c: New test.
	* gcc.target/i386/pr103973-18.c: New test.
	* gcc.target/i386/pr103973-19.c: New test.
	* gcc.target/i386/pr103973-20.c: New test.
	* g++.target/i386/pr103973-1.C: New test.
	* g++.target/i386/pr103973-2.C: New test.
	* g++.target/i386/pr103973-3.C: New test.
	* g++.target/i386/pr103973-4.C: New test.
	* g++.target/i386/pr103973-5.C: New test.
	* g++.target/i386/pr103973-6.C: New test.
	* g++.target/i386/pr103973-7.C: New test.
	* g++.target/i386/pr103973-8.C: New test.
	* g++.target/i386/pr103973-9.C: New test.
	* g++.target/i386/pr103973-10.C: New test.
	* g++.target/i386/pr103973-11.C: New test.
	* g++.target/i386/pr103973-12.C: New test.
	* g++.target/i386/pr103973-13.C: New test.
	* g++.target/i386/pr103973-14.C: New test.
	* g++.target/i386/pr103973-15.C: New test.
	* g++.target/i386/pr103973-16.C: New test.
	* g++.target/i386/pr103973-17.C: New test.
	* g++.target/i386/pr103973-18.C: New test.
	* g++.target/i386/pr103973-19.C: New test.
	* g++.target/i386/pr103973-20.C: New test.
2022-01-17 13:39:05 +01:00
Jakub Jelinek 6362627b27 i386, fab: Optimize __atomic_{add,sub,and,or,xor}_fetch (x, y, z) {==,!=,<,<=,>,>=} 0 [PR98737]
On Wed, Jan 27, 2021 at 12:27:13PM +0100, Ulrich Drepper via Gcc-patches wrote:
> On 1/27/21 11:37 AM, Jakub Jelinek wrote:
> > Would equality comparison against 0 handle the most common cases.
> >
> > The user can write it as
> > __atomic_sub_fetch (x, y, z) == 0
> > or
> > __atomic_fetch_sub (x, y, z) - y == 0
> > though, so the expansion code would need to be able to cope with both.
>
> Please also keep !=0, <0, <=0, >0, and >=0 in mind.  They all can be
> useful and can be handled with the flags.

<= 0 and > 0 don't really work well with lock {add,sub,inc,dec}; x86 doesn't
have comparisons that would look solely at both SF and ZF and not at other
flags (and emitting two separate conditional jumps or two setcc insns and
ORing them together looks awful).

But the rest can work.

Here is a patch that adds internal functions and optabs for these,
recognizes them at the same spot as e.g. the .ATOMIC_BIT_TEST_AND* internal
functions (the fold-all-builtins pass) and expands them appropriately (or,
for the <= 0 and > 0 cases of +/-, FAILs and lets the middle-end fall back).

So far I have handled just the op_fetch builtins; IMHO, instead of also
handling __atomic_fetch_sub (x, y, z) - y == 0 etc., we should canonicalize
__atomic_fetch_sub (x, y, z) - y to __atomic_sub_fetch (x, y, z) (and vice
versa).
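As a concrete illustration (not taken from the patch; obj, refcount and
release are placeholder names), a reference-count release such as
  if (__atomic_sub_fetch (&obj->refcount, 1, __ATOMIC_ACQ_REL) == 0)
    release (obj);
can now be expanded to a lock sub/dec whose flags feed the conditional
branch directly, instead of obtaining the value (e.g. via lock xadd) and
comparing it separately.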

2022-01-03  Jakub Jelinek  <jakub@redhat.com>

	PR target/98737
	* internal-fn.def (ATOMIC_ADD_FETCH_CMP_0, ATOMIC_SUB_FETCH_CMP_0,
	ATOMIC_AND_FETCH_CMP_0, ATOMIC_OR_FETCH_CMP_0, ATOMIC_XOR_FETCH_CMP_0):
	New internal fns.
	* internal-fn.h (ATOMIC_OP_FETCH_CMP_0_EQ, ATOMIC_OP_FETCH_CMP_0_NE,
	ATOMIC_OP_FETCH_CMP_0_LT, ATOMIC_OP_FETCH_CMP_0_LE,
	ATOMIC_OP_FETCH_CMP_0_GT, ATOMIC_OP_FETCH_CMP_0_GE): New enumerators.
	* internal-fn.c (expand_ATOMIC_ADD_FETCH_CMP_0,
	expand_ATOMIC_SUB_FETCH_CMP_0, expand_ATOMIC_AND_FETCH_CMP_0,
	expand_ATOMIC_OR_FETCH_CMP_0, expand_ATOMIC_XOR_FETCH_CMP_0): New
	functions.
	* optabs.def (atomic_add_fetch_cmp_0_optab,
	atomic_sub_fetch_cmp_0_optab, atomic_and_fetch_cmp_0_optab,
	atomic_or_fetch_cmp_0_optab, atomic_xor_fetch_cmp_0_optab): New
	direct optabs.
	* builtins.h (expand_ifn_atomic_op_fetch_cmp_0): Declare.
	* builtins.c (expand_ifn_atomic_op_fetch_cmp_0): New function.
	* tree-ssa-ccp.c: Include internal-fn.h.
	(optimize_atomic_bit_test_and): Add . before internal fn call
	in function comment.  Change return type from void to bool and
	return true only if successfully replaced.
	(optimize_atomic_op_fetch_cmp_0): New function.
	(pass_fold_builtins::execute): Use optimize_atomic_op_fetch_cmp_0
	for BUILT_IN_ATOMIC_{ADD,SUB,AND,OR,XOR}_FETCH_{1,2,4,8,16} and
	BUILT_IN_SYNC_{ADD,SUB,AND,OR,XOR}_AND_FETCH_{1,2,4,8,16},
	for *XOR* ones only if optimize_atomic_bit_test_and failed.
	* config/i386/sync.md (atomic_<plusminus_mnemonic>_fetch_cmp_0<mode>,
	atomic_<logic>_fetch_cmp_0<mode>): New define_expand patterns.
	(atomic_add_fetch_cmp_0<mode>_1, atomic_sub_fetch_cmp_0<mode>_1,
	atomic_<logic>_fetch_cmp_0<mode>_1): New define_insn patterns.
	* doc/md.texi (atomic_add_fetch_cmp_0<mode>,
	atomic_sub_fetch_cmp_0<mode>, atomic_and_fetch_cmp_0<mode>,
	atomic_or_fetch_cmp_0<mode>, atomic_xor_fetch_cmp_0<mode>): Document
	new named patterns.

	* gcc.target/i386/pr98737-1.c: New test.
	* gcc.target/i386/pr98737-2.c: New test.
	* gcc.target/i386/pr98737-3.c: New test.
	* gcc.target/i386/pr98737-4.c: New test.
	* gcc.target/i386/pr98737-5.c: New test.
	* gcc.target/i386/pr98737-6.c: New test.
	* gcc.target/i386/pr98737-7.c: New test.
2022-01-03 14:17:26 +01:00
Jakub Jelinek 7adcbafe45 Update copyright years. 2022-01-03 10:42:10 +01:00
Richard Sandiford e32b9eb32d vect: Add support for fmax and fmin reductions
This patch adds support for reductions involving calls to fmax*()
and fmin*(), without the -ffast-math flags that allow them to be
converted to MAX_EXPR and MIN_EXPR.
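For example (an illustrative sketch, not one of the new testcases), a
reduction of the following shape can now be vectorized on targets that
provide the new reduc_fmax_scal optab:
  #include <math.h>
  double
  f (double *x, int n)
  {
    double res = x[0];
    for (int i = 1; i < n; ++i)
      res = fmax (res, x[i]);
    return res;
  }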

gcc/
	* doc/md.texi (reduc_fmin_scal_@var{m}): Document.
	(reduc_fmax_scal_@var{m}): Likewise.
	* optabs.def (reduc_fmax_scal_optab): New optab.
	(reduc_fmin_scal_optab): Likewise
	* internal-fn.def (REDUC_FMAX, REDUC_FMIN): New functions.
	* tree-vect-loop.c (reduction_fn_for_scalar_code): Handle
	CASE_CFN_FMAX and CASE_CFN_FMIN.
	(neutral_op_for_reduction): Likewise.
	(needs_fold_left_reduction_p): Likewise.
	* config/aarch64/iterators.md (FMAXMINV): New iterator.
	(fmaxmin): Handle UNSPEC_FMAXNMV and UNSPEC_FMINNMV.
	* config/aarch64/aarch64-simd.md (reduc_<optab>_scal_<mode>): Fix
	unspec mode.
	(reduc_<fmaxmin>_scal_<mode>): New pattern.
	* config/aarch64/aarch64-sve.md (reduc_<fmaxmin>_scal_<mode>):
	Likewise.

gcc/testsuite/
	* gcc.dg/vect/vect-fmax-1.c: New test.
	* gcc.dg/vect/vect-fmax-2.c: Likewise.
	* gcc.dg/vect/vect-fmax-3.c: Likewise.
	* gcc.dg/vect/vect-fmin-1.c: New test.
	* gcc.dg/vect/vect-fmin-2.c: Likewise.
	* gcc.dg/vect/vect-fmin-3.c: Likewise.
	* gcc.target/aarch64/fmaxnm_1.c: Likewise.
	* gcc.target/aarch64/fmaxnm_2.c: Likewise.
	* gcc.target/aarch64/fminnm_1.c: Likewise.
	* gcc.target/aarch64/fminnm_2.c: Likewise.
	* gcc.target/aarch64/sve/fmaxnm_2.c: Likewise.
	* gcc.target/aarch64/sve/fmaxnm_3.c: Likewise.
	* gcc.target/aarch64/sve/fminnm_2.c: Likewise.
	* gcc.target/aarch64/sve/fminnm_3.c: Likewise.
2021-11-30 09:52:25 +00:00
Richard Sandiford 7061300025 Add IFN_COND_FMIN/FMAX functions
This patch adds conditional forms of FMAX and FMIN, following
the pattern for existing conditional binary functions.
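An illustrative sketch (not one of the committed tests) of the kind of
conditional operation this lets the vectorizer express as a predicated
FMAXNM on SVE:
  #include <math.h>
  void
  f (double *restrict r, double *restrict a, double *restrict b, int n)
  {
    for (int i = 0; i < n; ++i)
      r[i] = a[i] > 0.0 ? fmax (a[i], b[i]) : a[i];
  }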

gcc/
	* doc/md.texi (cond_fmin@var{mode}, cond_fmax@var{mode}): Document.
	* optabs.def (cond_fmin_optab, cond_fmax_optab): New optabs.
	* internal-fn.def (COND_FMIN, COND_FMAX): New functions.
	* internal-fn.c (first_commutative_argument): Handle them.
	(FOR_EACH_COND_FN_PAIR): Likewise.
	* match.pd (UNCOND_BINARY, COND_BINARY): Likewise.
	* config/aarch64/aarch64-sve.md (cond_<fmaxmin><mode>): New
	pattern.

gcc/testsuite/
	* gcc.target/aarch64/sve/cond_fmaxnm_5.c: New test.
	* gcc.target/aarch64/sve/cond_fmaxnm_5_run.c: Likewise.
	* gcc.target/aarch64/sve/cond_fmaxnm_6.c: Likewise.
	* gcc.target/aarch64/sve/cond_fmaxnm_6_run.c: Likewise.
	* gcc.target/aarch64/sve/cond_fmaxnm_7.c: Likewise.
	* gcc.target/aarch64/sve/cond_fmaxnm_7_run.c: Likewise.
	* gcc.target/aarch64/sve/cond_fmaxnm_8.c: Likewise.
	* gcc.target/aarch64/sve/cond_fmaxnm_8_run.c: Likewise.
	* gcc.target/aarch64/sve/cond_fminnm_5.c: Likewise.
	* gcc.target/aarch64/sve/cond_fminnm_5_run.c: Likewise.
	* gcc.target/aarch64/sve/cond_fminnm_6.c: Likewise.
	* gcc.target/aarch64/sve/cond_fminnm_6_run.c: Likewise.
	* gcc.target/aarch64/sve/cond_fminnm_7.c: Likewise.
	* gcc.target/aarch64/sve/cond_fminnm_7_run.c: Likewise.
	* gcc.target/aarch64/sve/cond_fminnm_8.c: Likewise.
	* gcc.target/aarch64/sve/cond_fminnm_8_run.c: Likewise.
2021-11-17 12:28:44 +00:00
prathamesh.kulkarni 20dcda98ed [sve] PR93183 - Add support for conditional neg.
gcc/ChangeLog:
	PR target/93183
	* gimple-match-head.c (try_conditional_simplification): Add case for single operand.
	* internal-fn.def: Add entry for COND_NEG internal function.
	* internal-fn.c (FOR_EACH_CODE_MAPPING): Add entry for
	NEGATE_EXPR, COND_NEG mapping.
	* optabs.def: Add entry for cond_neg_optab.
	* match.pd (UNCOND_UNARY, COND_UNARY): New operator lists.
	(vec_cond COND (foo A) B) -> (IFN_COND_FOO COND A B): New pattern.
	(vec_cond COND B (foo A)) -> (IFN_COND_FOO ~COND A B): Likewise.

gcc/testsuite/ChangeLog:
	PR target/93183
	* gcc.target/aarch64/sve/cond_unary_4.c: Adjust.
	* gcc.target/aarch64/sve/pr93183.c: New test.
2021-10-18 15:44:06 +05:30
Stefan Schulze Frielinghaus 6f966f0614 ldist: Recognize strlen and rawmemchr like loops
This patch adds support for recognizing loops which mimic the behaviour
of functions strlen and rawmemchr, and replaces those with internal
function calls in case a target provides them.  In contrast to the
standard strlen and rawmemchr functions, this patch also supports
different instances where the memory pointed to is interpreted as 8, 16,
and 32-bit sized, respectively.
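For instance (an illustrative sketch, not one of the added tests), a
rawmemchr-like loop over 16-bit elements that can be replaced when the
target provides the corresponding optab:
  unsigned short *
  f (unsigned short *p, unsigned short c)
  {
    while (*p != c)
      ++p;
    return p;
  }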

gcc/ChangeLog:

	* builtins.c (get_memory_rtx): Change to external linkage.
	* builtins.h (get_memory_rtx): Add function prototype.
	* doc/md.texi (rawmemchr<mode>): Document.
	* internal-fn.c (expand_RAWMEMCHR): Define.
	* internal-fn.def (RAWMEMCHR): Add.
	* optabs.def (rawmemchr_optab): Add.
	* tree-loop-distribution.c (find_single_drs): Change return code
	behaviour by also returning true if no single store was found
	but a single load.
	(loop_distribution::classify_partition): Respect the new return
	code behaviour of function find_single_drs.
	(loop_distribution::execute): Call new function
	transform_reduction_loop in order to replace rawmemchr or strlen
	like loops by calls into builtins.
	(generate_reduction_builtin_1): New function.
	(generate_rawmemchr_builtin): New function.
	(generate_strlen_builtin_1): New function.
	(generate_strlen_builtin): New function.
	(generate_strlen_builtin_using_rawmemchr): New function.
	(reduction_var_overflows_first): New function.
	(determine_reduction_stmt_1): New function.
	(determine_reduction_stmt): New function.
	(loop_distribution::transform_reduction_loop): New function.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/ldist-rawmemchr-1.c: New test.
	* gcc.dg/tree-ssa/ldist-rawmemchr-2.c: New test.
	* gcc.dg/tree-ssa/ldist-strlen-1.c: New test.
	* gcc.dg/tree-ssa/ldist-strlen-2.c: New test.
	* gcc.dg/tree-ssa/ldist-strlen-3.c: New test.
2021-10-11 09:59:13 +02:00
qing zhao a25e0b5e6a Add -ftrivial-auto-var-init option and uninitialized variable attribute.
Initialize automatic variables with either a pattern or with zeroes to increase
the security and predictability of a program by preventing uninitialized memory
disclosure and use.
GCC still considers an automatic variable that doesn't have an explicit
initializer as uninitialized; -Wuninitialized will still report warning messages
on such automatic variables.
With this option, GCC will also initialize any padding of automatic variables
that have structure or union types to zeroes.
You can exempt a specific variable from this initialization (for example to
control runtime overhead) by using the variable attribute "uninitialized".
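For example (an illustrative sketch; use () is just a placeholder), with
-ftrivial-auto-var-init=zero:
  void use (void *);
  void
  f (void)
  {
    int a;                                        /* zero-initialized  */
    char buf[64] __attribute__ ((uninitialized)); /* opted out         */
    use (&a);
    use (buf);
  }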

gcc/ChangeLog:

2021-09-09  qing zhao  <qing.zhao@oracle.com>

	* builtins.c (expand_builtin_memset): Make external visible.
	* builtins.h (expand_builtin_memset): Declare extern.
	* common.opt (ftrivial-auto-var-init=): New option.
	* doc/extend.texi: Document the uninitialized attribute.
	* doc/invoke.texi: Document -ftrivial-auto-var-init.
	* flag-types.h (enum auto_init_type): New enumerated type
	auto_init_type.
	* gimple-fold.c (clear_padding_type): Add one new parameter.
	(clear_padding_union): Likewise.
	(clear_padding_emit_loop): Likewise.
	(clear_type_padding_in_mask): Likewise.
	(gimple_fold_builtin_clear_padding): Handle this new parameter.
	* gimplify.c (gimple_add_init_for_auto_var): New function.
	(gimple_add_padding_init_for_auto_var): New function.
	(is_var_need_auto_init): New function.
	(gimplify_decl_expr): Add initialization to automatic variables per
	users' requests.
	(gimplify_call_expr): Add one new parameter for call to
	__builtin_clear_padding.
	(gimplify_init_constructor): Add padding initialization in the end.
	* internal-fn.c (INIT_PATTERN_VALUE): New macro.
	(expand_DEFERRED_INIT): New function.
	* internal-fn.def (DEFERRED_INIT): New internal function.
	* tree-cfg.c (verify_gimple_call): Verify calls to .DEFERRED_INIT.
	* tree-sra.c (generate_subtree_deferred_init): New function.
	(scan_function): Avoid setting cannot_scalarize_away_bitmap for
	calls to .DEFERRED_INIT.
	(sra_modify_deferred_init): New function.
	(sra_modify_function_body): Handle calls to DEFERRED_INIT specially.
	* tree-ssa-structalias.c (find_func_aliases_for_call): Likewise.
	* tree-ssa-uninit.c (warn_uninit): Handle calls to DEFERRED_INIT
	specially.
	(check_defs): Likewise.
	(warn_uninitialized_vars): Likewise.
	* tree-ssa.c (ssa_undefined_value_p): Likewise.
	* tree.c (build_common_builtin_nodes): Build tree node for
	BUILT_IN_CLEAR_PADDING when needed.

gcc/c-family/ChangeLog:

2021-09-09  qing zhao  <qing.zhao@oracle.com>

	* c-attribs.c (handle_uninitialized_attribute): New function.
	(c_common_attribute_table): Add "uninitialized" attribute.

gcc/testsuite/ChangeLog:

2021-09-09  qing zhao  <qing.zhao@oracle.com>

	* c-c++-common/auto-init-1.c: New test.
	* c-c++-common/auto-init-10.c: New test.
	* c-c++-common/auto-init-11.c: New test.
	* c-c++-common/auto-init-12.c: New test.
	* c-c++-common/auto-init-13.c: New test.
	* c-c++-common/auto-init-14.c: New test.
	* c-c++-common/auto-init-15.c: New test.
	* c-c++-common/auto-init-16.c: New test.
	* c-c++-common/auto-init-2.c: New test.
	* c-c++-common/auto-init-3.c: New test.
	* c-c++-common/auto-init-4.c: New test.
	* c-c++-common/auto-init-5.c: New test.
	* c-c++-common/auto-init-6.c: New test.
	* c-c++-common/auto-init-7.c: New test.
	* c-c++-common/auto-init-8.c: New test.
	* c-c++-common/auto-init-9.c: New test.
	* c-c++-common/auto-init-esra.c: New test.
	* c-c++-common/auto-init-padding-1.c: New test.
	* c-c++-common/auto-init-padding-2.c: New test.
	* c-c++-common/auto-init-padding-3.c: New test.
	* g++.dg/auto-init-uninit-pred-1_a.C: New test.
	* g++.dg/auto-init-uninit-pred-2_a.C: New test.
	* g++.dg/auto-init-uninit-pred-3_a.C: New test.
	* g++.dg/auto-init-uninit-pred-4.C: New test.
	* gcc.dg/auto-init-sra-1.c: New test.
	* gcc.dg/auto-init-sra-2.c: New test.
	* gcc.dg/auto-init-uninit-1.c: New test.
	* gcc.dg/auto-init-uninit-12.c: New test.
	* gcc.dg/auto-init-uninit-13.c: New test.
	* gcc.dg/auto-init-uninit-14.c: New test.
	* gcc.dg/auto-init-uninit-15.c: New test.
	* gcc.dg/auto-init-uninit-16.c: New test.
	* gcc.dg/auto-init-uninit-17.c: New test.
	* gcc.dg/auto-init-uninit-18.c: New test.
	* gcc.dg/auto-init-uninit-19.c: New test.
	* gcc.dg/auto-init-uninit-2.c: New test.
	* gcc.dg/auto-init-uninit-20.c: New test.
	* gcc.dg/auto-init-uninit-21.c: New test.
	* gcc.dg/auto-init-uninit-22.c: New test.
	* gcc.dg/auto-init-uninit-23.c: New test.
	* gcc.dg/auto-init-uninit-24.c: New test.
	* gcc.dg/auto-init-uninit-25.c: New test.
	* gcc.dg/auto-init-uninit-26.c: New test.
	* gcc.dg/auto-init-uninit-3.c: New test.
	* gcc.dg/auto-init-uninit-34.c: New test.
	* gcc.dg/auto-init-uninit-36.c: New test.
	* gcc.dg/auto-init-uninit-37.c: New test.
	* gcc.dg/auto-init-uninit-4.c: New test.
	* gcc.dg/auto-init-uninit-5.c: New test.
	* gcc.dg/auto-init-uninit-6.c: New test.
	* gcc.dg/auto-init-uninit-8.c: New test.
	* gcc.dg/auto-init-uninit-9.c: New test.
	* gcc.dg/auto-init-uninit-A.c: New test.
	* gcc.dg/auto-init-uninit-B.c: New test.
	* gcc.dg/auto-init-uninit-C.c: New test.
	* gcc.dg/auto-init-uninit-H.c: New test.
	* gcc.dg/auto-init-uninit-I.c: New test.
	* gcc.target/aarch64/auto-init-1.c: New test.
	* gcc.target/aarch64/auto-init-2.c: New test.
	* gcc.target/aarch64/auto-init-3.c: New test.
	* gcc.target/aarch64/auto-init-4.c: New test.
	* gcc.target/aarch64/auto-init-5.c: New test.
	* gcc.target/aarch64/auto-init-6.c: New test.
	* gcc.target/aarch64/auto-init-7.c: New test.
	* gcc.target/aarch64/auto-init-8.c: New test.
	* gcc.target/aarch64/auto-init-padding-1.c: New test.
	* gcc.target/aarch64/auto-init-padding-10.c: New test.
	* gcc.target/aarch64/auto-init-padding-11.c: New test.
	* gcc.target/aarch64/auto-init-padding-12.c: New test.
	* gcc.target/aarch64/auto-init-padding-2.c: New test.
	* gcc.target/aarch64/auto-init-padding-3.c: New test.
	* gcc.target/aarch64/auto-init-padding-4.c: New test.
	* gcc.target/aarch64/auto-init-padding-5.c: New test.
	* gcc.target/aarch64/auto-init-padding-6.c: New test.
	* gcc.target/aarch64/auto-init-padding-7.c: New test.
	* gcc.target/aarch64/auto-init-padding-8.c: New test.
	* gcc.target/aarch64/auto-init-padding-9.c: New test.
	* gcc.target/i386/auto-init-1.c: New test.
	* gcc.target/i386/auto-init-2.c: New test.
	* gcc.target/i386/auto-init-21.c: New test.
	* gcc.target/i386/auto-init-22.c: New test.
	* gcc.target/i386/auto-init-23.c: New test.
	* gcc.target/i386/auto-init-24.c: New test.
	* gcc.target/i386/auto-init-3.c: New test.
	* gcc.target/i386/auto-init-4.c: New test.
	* gcc.target/i386/auto-init-5.c: New test.
	* gcc.target/i386/auto-init-6.c: New test.
	* gcc.target/i386/auto-init-7.c: New test.
	* gcc.target/i386/auto-init-8.c: New test.
	* gcc.target/i386/auto-init-padding-1.c: New test.
	* gcc.target/i386/auto-init-padding-10.c: New test.
	* gcc.target/i386/auto-init-padding-11.c: New test.
	* gcc.target/i386/auto-init-padding-12.c: New test.
	* gcc.target/i386/auto-init-padding-2.c: New test.
	* gcc.target/i386/auto-init-padding-3.c: New test.
	* gcc.target/i386/auto-init-padding-4.c: New test.
	* gcc.target/i386/auto-init-padding-5.c: New test.
	* gcc.target/i386/auto-init-padding-6.c: New test.
	* gcc.target/i386/auto-init-padding-7.c: New test.
	* gcc.target/i386/auto-init-padding-8.c: New test.
	* gcc.target/i386/auto-init-padding-9.c: New test.
2021-09-09 15:44:49 -07:00
Kewen Lin a1d2756077 vect: Recog mul_highpart pattern [PR100696]
This patch extends the existing mulhs pattern handling to cover normal
multiply-highpart pattern recognition; it introduces one new internal
function IFN_MULH that maps 1:1 to the [su]mul_highpart optabs.  Since it
covers MULT_HIGHPART_EXPR with optab support, the i386 part of the change
ensures it follows the consistent costing path.
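A typical loop whose highpart multiply is recognized (an illustrative
sketch assuming 32-bit int, not one of the adjusted testcases):
  void
  f (unsigned int *restrict r, unsigned int *restrict a,
     unsigned int *restrict b, int n)
  {
    for (int i = 0; i < n; ++i)
      r[i] = ((unsigned long long) a[i] * b[i]) >> 32;
  }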

Bootstrapped & regtested on powerpc64le-linux-gnu P9,
x86_64-redhat-linux and aarch64-linux-gnu.

gcc/ChangeLog:

	PR tree-optimization/100696
	* internal-fn.c (first_commutative_argument): Add info for IFN_MULH.
	* internal-fn.def (IFN_MULH): New internal function.
	* tree-vect-patterns.c (vect_recog_mulhs_pattern): Add support to
	recog normal multiply highpart as IFN_MULH.
	* config/i386/i386.c (ix86_add_stmt_cost): Adjust for combined
	function CFN_MULH.

gcc/testsuite/ChangeLog:

	PR tree-optimization/100696
	* gcc.target/i386/pr100637-3w.c: Adjust for mul_highpart recog.
2021-07-19 20:49:17 -05:00
Richard Biener 7d810646d4 Add FMADDSUB and FMSUBADD SLP vectorization patterns and optabs
This adds named expanders for vec_fmaddsub<mode>4 and
vec_fmsubadd<mode>4 which map to x86 vfmaddsubXXXp{ds} and
vfmsubaddXXXp{ds} instructions.  This complements the previous
addition of ADDSUB support.

x86 lacks SUBADD and the negated variants of FMA with mixed
plus/minus, so I did not add optabs or patterns for those, but
it would not be difficult if there's a target that has them.
2021-07-05  Richard Biener  <rguenther@suse.de>

	* doc/md.texi (vec_fmaddsub<mode>4): Document.
	(vec_fmsubadd<mode>4): Likewise.
	* optabs.def (vec_fmaddsub$a4): Add.
	(vec_fmsubadd$a4): Likewise.
	* internal-fn.def (IFN_VEC_FMADDSUB): Add.
	(IFN_VEC_FMSUBADD): Likewise.
	* tree-vect-slp-patterns.c (addsub_pattern::recognize):
	Refactor to handle IFN_VEC_FMADDSUB and IFN_VEC_FMSUBADD.
	(addsub_pattern::build): Likewise.
	* tree-vect-slp.c (vect_optimize_slp): CFN_VEC_FMADDSUB
	and CFN_VEC_FMSUBADD are not transparent for permutes.
	* config/i386/sse.md (vec_fmaddsub<mode>4): New expander.
	(vec_fmsubadd<mode>4): Likewise.

	* gcc.target/i386/vect-fmaddsubXXXpd.c: New testcase.
	* gcc.target/i386/vect-fmaddsubXXXps.c: Likewise.
	* gcc.target/i386/vect-fmsubaddXXXpd.c: Likewise.
	* gcc.target/i386/vect-fmsubaddXXXps.c: Likewise.
2021-07-06 11:56:47 +02:00
Richard Biener 7a6c31f0f8 Add x86 addsub SLP pattern
This adds SLP pattern recognition for the SSE3/AVX [v]addsubp{ds} v0, v1
instructions which compute { v0[0] - v1[0], v0[1] + v1[1], ... },
i.e. subtract and add alternating on lanes, starting with subtract.

It adds a corresponding optab and direct internal function,
vec_addsub$a3 and renames the existing i386 backend patterns to
the new canonical name.

The SLP pattern matches the exact alternating lane sequence rather
than trying to be clever and anticipating incoming permutes - we
could permute the two input vectors to the needed lane alternation,
do the addsub and then permute the result vector back, but that's
only profitable in case the two input permutes or the output permute
will vanish - something Tamar's refactoring of SLP pattern recog
should make possible.
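In source form the alternation corresponds to something like this
(illustrative sketch, not one of the new testcases):
  void
  f (double *restrict r, double *restrict a, double *restrict b)
  {
    r[0] = a[0] - b[0];
    r[1] = a[1] + b[1];
  }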

2021-06-17  Richard Biener  <rguenther@suse.de>

	* config/i386/sse.md (avx_addsubv4df3): Rename to
	vec_addsubv4df3.
	(avx_addsubv8sf3): Rename to vec_addsubv8sf3.
	(sse3_addsubv2df3): Rename to vec_addsubv2df3.
	(sse3_addsubv4sf3): Rename to vec_addsubv4sf3.
	* config/i386/i386-builtin.def: Adjust.
	* internal-fn.def (VEC_ADDSUB): New internal optab fn.
	* optabs.def (vec_addsub_optab): New optab.
	* tree-vect-slp-patterns.c (class addsub_pattern): New.
	(slp_patterns): Add addsub_pattern.
	* tree-vect-slp.c (vect_optimize_slp): Disable propagation
	across CFN_VEC_ADDSUB.
	* tree-vectorizer.h (vect_pattern::vect_pattern): Make
	m_ops optional.
	* doc/md.texi (vec_addsub<mode>3): Document.

	* gcc.target/i386/vect-addsubv2df.c: New testcase.
	* gcc.target/i386/vect-addsubv4sf.c: Likewise.
	* gcc.target/i386/vect-addsubv4df.c: Likewise.
	* gcc.target/i386/vect-addsubv8sf.c: Likewise.
	* gcc.target/i386/vect-addsub-2.c: Likewise.
	* gcc.target/i386/vect-addsub-3.c: Likewise.
2021-06-24 13:08:25 +02:00
Richard Biener ef8176e0fa c++/88601 - [C/C++] __builtin_shufflevector support
This adds support for the clang __builtin_shufflevector extension to
the C and C++ frontends.  The builtin is lowered to VEC_PERM_EXPR.
Because VEC_PERM_EXPR does not support different sized vector inputs
or result, or the special permute index of -1 (don't-care),
c_build_shufflevector applies lowering by widening inputs and output
to the widest vector, replacing -1 by a defined index and
subsetting the final vector if we produced a wider result than
desired.

Code generation thus can be sub-optimal; followup patches will
aim to fix that by recovering some of the missing features
during RTL expansion and by relaxing the constraints of the GIMPLE
IL with regard to VEC_PERM_EXPR.
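Example usage (illustrative only, not one of the added testcases):
  typedef int v4si __attribute__ ((vector_size (16)));
  v4si
  f (v4si a, v4si b)
  {
    /* Indices select lanes from the concatenation of a and b;
       -1 marks a don't-care lane.  */
    return __builtin_shufflevector (a, b, 0, 4, 2, -1);
  }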

2021-05-21  Richard Biener  <rguenther@suse.de>

	PR c++/88601
gcc/c-family/
	* c-common.c: Include tree-vector-builder.h and
	vec-perm-indices.h.
	(c_common_reswords): Add __builtin_shufflevector.
	(c_build_shufflevector): New function.
	* c-common.h (enum rid): Add RID_BUILTIN_SHUFFLEVECTOR.
	(c_build_shufflevector): Declare.

gcc/c/
	* c-decl.c (names_builtin_p): Handle RID_BUILTIN_SHUFFLEVECTOR.
	* c-parser.c (c_parser_postfix_expression): Likewise.

gcc/cp/
	* cp-objcp-common.c (names_builtin_p): Handle
	RID_BUILTIN_SHUFFLEVECTOR.
	* cp-tree.h (build_x_shufflevector): Declare.
	* parser.c (cp_parser_postfix_expression): Handle
	RID_BUILTIN_SHUFFLEVECTOR.
	* pt.c (tsubst_copy_and_build): Handle IFN_SHUFFLEVECTOR.
	* typeck.c (build_x_shufflevector): Build either a lowered
	VEC_PERM_EXPR or an unlowered shufflevector via a temporary
	internal function IFN_SHUFFLEVECTOR.

gcc/
	* internal-fn.c (expand_SHUFFLEVECTOR): Define.
	* internal-fn.def (SHUFFLEVECTOR): New.
	* internal-fn.h (expand_SHUFFLEVECTOR): Declare.
	* doc/extend.texi: Document __builtin_shufflevector.

gcc/testsuite/
	* c-c++-common/builtin-shufflevector-2.c: New testcase.
	* c-c++-common/torture/builtin-shufflevector-1.c: Likewise.
	* g++.dg/ext/builtin-shufflevector-1.C: Likewise.
	* g++.dg/ext/builtin-shufflevector-2.C: Likewise.
2021-05-31 08:46:04 +02:00
Tamar Christina 478e571a3e slp: support complex FMS and complex FMS conjugate
This adds support for FMS and FMS conjugated to the slp pattern matcher.

Example of matches:

#include <stdio.h>
#include <complex.h>

#define N 200
#define ROT
#define TYPE float
#define TYPE2 float

void g (TYPE2 complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
{
  for (int i=0; i < N; i++)
    {
      c[i] -=  a[i] * (b[i] ROT);
    }
}

void g_f1 (TYPE2 complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
{
  for (int i=0; i < N; i++)
    {
      c[i] -=  conjf (a[i]) * (b[i]);
    }
}

void g_s1 (TYPE2 complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
{
  for (int i=0; i < N; i++)
    {
      c[i] -=  a[i] * conjf (b[i] ROT);
    }
}

void caxpy_sub(double complex * restrict y, double complex * restrict x, size_t N, double complex f) {
  for (size_t i = 0; i < N; ++i)
    y[i] -= x[i]* f;
}

gcc/ChangeLog:

	* internal-fn.def (COMPLEX_FMS, COMPLEX_FMS_CONJ): New.
	* optabs.def (cmls_optab, cmls_conj_optab): New.
	* doc/md.texi: Document them.
	* tree-vect-slp-patterns.c (class complex_fms_pattern,
	complex_fms_pattern::matches, complex_fms_pattern::recognize,
	complex_fms_pattern::build): New.
2021-01-14 20:59:12 +00:00
Tamar Christina 31fac31800 slp: support complex FMA and complex FMA conjugate
This adds support for FMA and FMA conjugated to the slp pattern matcher.

Example of instructions matched:

#include <stdio.h>
#include <complex.h>

#define N 200
#define ROT
#define TYPE float
#define TYPE2 float

void g (TYPE2 complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
{
  for (int i=0; i < N; i++)
    {
      c[i] +=  a[i] * (b[i] ROT);
    }
}

void g_f1 (TYPE2 complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
{
  for (int i=0; i < N; i++)
    {
      c[i] +=  conjf (a[i]) * (b[i] ROT);
    }
}

void g_s1 (TYPE2 complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
{
  for (int i=0; i < N; i++)
    {
      c[i] +=  a[i] * conjf (b[i] ROT);
    }
}

void caxpy_add(double complex * restrict y, double complex * restrict x, size_t N, double complex f) {
  for (size_t i = 0; i < N; ++i)
    y[i] += x[i]* f;
}

gcc/ChangeLog:

	* internal-fn.def (COMPLEX_FMA, COMPLEX_FMA_CONJ): New.
	* optabs.def (cmla_optab, cmla_conj_optab): New.
	* doc/md.texi: Document them.
	* tree-vect-slp-patterns.c (vect_match_call_p,
	class complex_fma_pattern, vect_slp_reset_pattern,
	complex_fma_pattern::matches, complex_fma_pattern::recognize,
	complex_fma_pattern::build): New.
2021-01-14 20:58:12 +00:00
Tamar Christina e09173d84d slp: support complex multiply and complex multiply conjugate
This adds support for complex multiply and complex multiply and accumulate to
the vect pattern detector.

Example of instructions matched:

#include <stdio.h>
#include <complex.h>

#define N 200
#define ROT
#define TYPE float
#define TYPE2 float

void g (TYPE2 complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
{
  for (int i=0; i < N; i++)
    {
      c[i] =  a[i] * (b[i] ROT);
    }
}

void g_f1 (TYPE2 complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
{
  for (int i=0; i < N; i++)
    {
      c[i] =  conjf (a[i]) * (b[i] ROT);
    }
}

void g_s1 (TYPE2 complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
{
  for (int i=0; i < N; i++)
    {
      c[i] =  a[i] * conjf (b[i] ROT);
    }
}

gcc/ChangeLog:

	* internal-fn.def (COMPLEX_MUL, COMPLEX_MUL_CONJ): New.
	* optabs.def (cmul_optab, cmul_conj_optab): New.
	* doc/md.texi: Document them.
	* tree-vect-slp-patterns.c (vect_match_call_complex_mla,
	vect_normalize_conj_loc, is_eq_or_top, vect_validate_multiplication,
	vect_build_combine_node, class complex_mul_pattern,
	complex_mul_pattern::matches, complex_mul_pattern::recognize,
	complex_mul_pattern::build): New.
2021-01-14 20:57:17 +00:00
Richard Sandiford 298e76e656 gimple-isel: Check whether IFN_VCONDEQ is supported [PR98560]
This patch follows on from the previous one for the PR and
makes sure that we can handle == as well as <.  Previously
we assumed without checking that IFN_VCONDEQ was available
if IFN_VCOND or IFN_VCONDU wasn't.

The patch also fixes the definition of the IFN_VCOND* functions.
The optabs are convert optabs in which the first mode is the
data mode and the second mode is the comparison or mask mode.

gcc/
	PR tree-optimization/98560
	* internal-fn.def (IFN_VCONDU, IFN_VCONDEQ): Use type vec_cond.
	* internal-fn.c (vec_cond_mask_direct): Get the data mode from
	argument 1.
	(vec_cond_direct): Likewise argument 2.
	(vec_condu_direct, vec_condeq_direct): Delete.
	(expand_vect_cond_optab_fn): Rename to...
	(expand_vec_cond_optab_fn): ...this, replacing old macro.
	(expand_vec_condu_optab_fn, expand_vec_condeq_optab_fn): Delete.
	(expand_vect_cond_mask_optab_fn): Rename to...
	(expand_vec_cond_mask_optab_fn): ...this, replacing old macro.
	(direct_vec_cond_mask_optab_supported_p): Treat the optab as a
	convert optab.
	(direct_vec_cond_optab_supported_p): Likewise.
	(direct_vec_condu_optab_supported_p): Delete.
	(direct_vec_condeq_optab_supported_p): Delete.
	* gimple-isel.cc: Include internal-fn.h.
	(gimple_expand_vec_cond_expr): Check that IFN_VCONDEQ is supported
	before using it.

gcc/testsuite/
	PR tree-optimization/98560
	* gcc.dg/vect/pr98560-2.c: New test.
2021-01-07 15:00:39 +00:00
Jakub Jelinek 99dee82307 Update copyright years. 2021-01-04 10:26:59 +01:00
Tamar Christina 3ed472af6b middle-end: Support complex Addition
This patch adds support for

  * Complex Addition with rotation of 90 and 270.

  Addition with rotation of the second argument around the Argand plane.
    Supported rotations are 90 and 270.

    c = a + (b * I) and c = a + (b * I * I * I)
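    A loop matching the ROT90 form (illustrative sketch, not one of the
    new testcases):
      #include <complex.h>
      void
      f (float complex *restrict c, float complex *restrict a,
         float complex *restrict b, int n)
      {
        for (int i = 0; i < n; i++)
          c[i] = a[i] + b[i] * I;
      }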

gcc/ChangeLog:

	* tree-vect-slp-patterns.c: New file.
	* Makefile.in: Add it.
	* doc/passes.texi: Document it.
	* internal-fn.def (COMPLEX_ADD_ROT90, COMPLEX_ADD_ROT270): New.
	* optabs.def (cadd90_optab, cadd270_optab): New.
	* doc/md.texi: Document them.
	* tree-vect-loop.c (vect_analyze_loop_2): Add dissolve code.
	* tree-vect-slp.c:
	(vect_free_slp_instance, vect_create_new_slp_node): Export.
	(vect_match_slp_patterns_2, vect_match_slp_patterns): New.
	(vect_analyze_slp): Use it.
	* tree-vectorizer.h (vect_free_slp_tree): Export.
	(enum _complex_operation): Forward declare.
	(class vect_pattern): New

gcc/testsuite/ChangeLog:

	* lib/target-supports.exp
	(check_effective_target_arm_v8_3a_complex_neon_ok_nocache): Fix it.
	(check_effective_target_vect_complex_add_byte
	,check_effective_target_vect_complex_add_int
	,check_effective_target_vect_complex_add_short
	,check_effective_target_vect_complex_add_long
	,check_effective_target_vect_complex_add_half
	,check_effective_target_vect_complex_add_float
	,check_effective_target_vect_complex_add_double): New.
	* gcc.dg/vect/complex/bb-slp-complex-add-pattern-byte.c: New test.
	* gcc.dg/vect/complex/bb-slp-complex-add-pattern-int.c: New test.
	* gcc.dg/vect/complex/bb-slp-complex-add-pattern-long.c: New test.
	* gcc.dg/vect/complex/bb-slp-complex-add-pattern-short.c: New test.
	* gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-byte.c: New test.
	* gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-int.c: New test.
	* gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-long.c: New test.
	* gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-short.c: New test.
	* gcc.dg/vect/complex/complex-add-pattern-template.c: New test.
	* gcc.dg/vect/complex/complex-add-template.c: New test.
	* gcc.dg/vect/complex/complex-operations-run.c: New test.
	* gcc.dg/vect/complex/complex-operations.c: New test.
	* gcc.dg/vect/complex/complex.exp: New test.
	* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-double.c: New test.
	* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-float.c: New test.
	* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-half-float.c: New test.
	* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-double.c: New test.
	* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-float.c: New test.
	* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-half-float.c: New test.
	* gcc.dg/vect/complex/fast-math-complex-add-double.c: New test.
	* gcc.dg/vect/complex/fast-math-complex-add-float.c: New test.
	* gcc.dg/vect/complex/fast-math-complex-add-half-float.c: New test.
	* gcc.dg/vect/complex/fast-math-complex-add-pattern-double.c: New test.
	* gcc.dg/vect/complex/fast-math-complex-add-pattern-float.c: New test.
	* gcc.dg/vect/complex/fast-math-complex-add-pattern-half-float.c: New test.
	* gcc.dg/vect/complex/vect-complex-add-pattern-byte.c: New test.
	* gcc.dg/vect/complex/vect-complex-add-pattern-int.c: New test.
	* gcc.dg/vect/complex/vect-complex-add-pattern-long.c: New test.
	* gcc.dg/vect/complex/vect-complex-add-pattern-short.c: New test.
	* gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-byte.c: New test.
	* gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-int.c: New test.
	* gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-long.c: New test.
	* gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-short.c: New test.
2020-12-13 14:09:11 +00:00
Matthew Malcomson 93a7325148 libsanitizer: Add hwasan pass and associated gimple changes
There are four main features to this change:

1) Check pointer tags match address tags.

When sanitizing for hwasan we now put HWASAN_CHECK internal functions before
memory accesses in the `asan` pass.  This checks that the tag in the pointer
being used matches the tag stored in shadow memory for the memory region being
accessed.

These internal functions are expanded into actual checks in the sanopt
pass that happens just before expansion into RTL.

We use the same mechanism that currently inserts ASAN_CHECK internal
functions to insert the new HWASAN_CHECK functions.

2) Instrument known builtin function calls.

Handle all builtin functions that we know use memory accesses.
This commit uses the machinery added for ASAN to identify builtin
functions that access memory.

The main differences between the approaches for HWASAN and ASAN are:
 - libhwasan intercepts much less builtin functions.
 - Alloca needs to be transformed differently (instead of adding
   redzones it needs to tag shadow memory and return a tagged pointer).
 - stack_restore needs to untag the shadow stack between the current
   position and where it's going.
 - `noreturn` functions can not be handled by simply unpoisoning the
   entire shadow stack -- there is no "always valid" tag.
   (exceptions and things such as longjmp need to be handled in a
   different way, usually in the runtime).

For hardware implemented checking (such as AArch64's memory tagging
extension) alloca and stack_restore will need to be handled by hooks in
the backend rather than transformation at the gimple level.  This will
allow architecture specific handling of such stack modifications.

3) Introduce HWASAN block-scope poisoning

Here we use exactly the same mechanism as ASAN_MARK to poison/unpoison
variables on entry/exit of a block.

In order to simply use the exact same machinery we're using the same
internal functions until the SANOPT pass.  This means that all handling
of ASAN_MARK is the same.
This has the negative that the naming may be a little confusing, but a
positive that handling of the internal function doesn't have to be
duplicated for a function that behaves exactly the same but has a
different name.

gcc/ChangeLog:

	* asan.c (asan_instrument_reads): New.
	(asan_instrument_writes): New.
	(asan_memintrin): New.
	(handle_builtin_stack_restore): Account for HWASAN.
	(handle_builtin_alloca): Account for HWASAN.
	(get_mem_refs_of_builtin_call): Special case strlen for HWASAN.
	(hwasan_instrument_reads): New.
	(hwasan_instrument_writes): New.
	(hwasan_memintrin): New.
	(report_error_func): Assert not HWASAN.
	(build_check_stmt): Make HWASAN_CHECK instead of ASAN_CHECK.
	(instrument_derefs): HWASAN does not tag globals.
	(instrument_builtin_call): Use new helper functions.
	(maybe_instrument_call): Don't instrument `noreturn` functions.
	(initialize_sanitizer_builtins): Add new type.
	(asan_expand_mark_ifn): Account for HWASAN.
	(asan_expand_check_ifn): Assert never called by HWASAN.
	(asan_expand_poison_ifn): Account for HWASAN.
	(asan_instrument): Branch based on whether using HWASAN or ASAN.
	(pass_asan::gate): Return true if sanitizing HWASAN.
	(pass_asan_O0::gate): Return true if sanitizing HWASAN.
	(hwasan_check_func): New.
	(hwasan_expand_check_ifn): New.
	(hwasan_expand_mark_ifn): New.
	(gate_hwasan): New.
	* asan.h (hwasan_expand_check_ifn): New decl.
	(hwasan_expand_mark_ifn): New decl.
	(gate_hwasan): New decl.
	(asan_intercepted_p): Always false for hwasan.
	(asan_sanitize_use_after_scope): Account for HWASAN.
	* builtin-types.def (BT_FN_PTR_CONST_PTR_UINT8): New.
	* gimple-fold.c (gimple_build): New overload for building function
	calls without arguments.
	(gimple_build_round_up): New.
	* gimple-fold.h (gimple_build): New decl.
	(gimple_build): New inline function.
	(gimple_build_round_up): New decl.
	(gimple_build_round_up): New inline function.
	* gimple-pretty-print.c (dump_gimple_call_args): Account for
	HWASAN.
	* gimplify.c (asan_poison_variable): Account for HWASAN.
	(gimplify_function_tree): Remove requirement of
	SANITIZE_ADDRESS, requiring asan or hwasan is accounted for in
	`asan_sanitize_use_after_scope`.
	* internal-fn.c (expand_HWASAN_CHECK): New.
	(expand_HWASAN_ALLOCA_UNPOISON): New.
	(expand_HWASAN_CHOOSE_TAG): New.
	(expand_HWASAN_MARK): New.
	(expand_HWASAN_SET_TAG): New.
	* internal-fn.def (HWASAN_ALLOCA_UNPOISON): New.
	(HWASAN_CHOOSE_TAG): New.
	(HWASAN_CHECK): New.
	(HWASAN_MARK): New.
	(HWASAN_SET_TAG): New.
	* sanitizer.def (BUILT_IN_HWASAN_LOAD1): New.
	(BUILT_IN_HWASAN_LOAD2): New.
	(BUILT_IN_HWASAN_LOAD4): New.
	(BUILT_IN_HWASAN_LOAD8): New.
	(BUILT_IN_HWASAN_LOAD16): New.
	(BUILT_IN_HWASAN_LOADN): New.
	(BUILT_IN_HWASAN_STORE1): New.
	(BUILT_IN_HWASAN_STORE2): New.
	(BUILT_IN_HWASAN_STORE4): New.
	(BUILT_IN_HWASAN_STORE8): New.
	(BUILT_IN_HWASAN_STORE16): New.
	(BUILT_IN_HWASAN_STOREN): New.
	(BUILT_IN_HWASAN_LOAD1_NOABORT): New.
	(BUILT_IN_HWASAN_LOAD2_NOABORT): New.
	(BUILT_IN_HWASAN_LOAD4_NOABORT): New.
	(BUILT_IN_HWASAN_LOAD8_NOABORT): New.
	(BUILT_IN_HWASAN_LOAD16_NOABORT): New.
	(BUILT_IN_HWASAN_LOADN_NOABORT): New.
	(BUILT_IN_HWASAN_STORE1_NOABORT): New.
	(BUILT_IN_HWASAN_STORE2_NOABORT): New.
	(BUILT_IN_HWASAN_STORE4_NOABORT): New.
	(BUILT_IN_HWASAN_STORE8_NOABORT): New.
	(BUILT_IN_HWASAN_STORE16_NOABORT): New.
	(BUILT_IN_HWASAN_STOREN_NOABORT): New.
	(BUILT_IN_HWASAN_TAG_MISMATCH4): New.
	(BUILT_IN_HWASAN_HANDLE_LONGJMP): New.
	(BUILT_IN_HWASAN_TAG_PTR): New.
	* sanopt.c (sanopt_optimize_walker): Act for hwasan.
	(pass_sanopt::execute): Act for hwasan.
	* toplev.c (compile_file): Use `gate_hwasan` function.
2020-11-25 16:39:07 +00:00
Jan Hubicka 762cca0023 Perforate fnspec strings
gcc/ChangeLog:

2020-10-02  Jan Hubicka  <hubicka@ucw.cz>

	* attr-fnspec.h: Update documentation.
	(attr_fnsec::return_desc_size): Set to 2
	(attr_fnsec::arg_desc_size): Set to 2
	* builtin-attrs.def (STR1): Update fnspec.
	* internal-fn.def (UBSAN_NULL): Update fnspec.
	(UBSAN_VPTR): Update fnspec.
	(UBSAN_PTR): Update fnspec.
	(ASAN_CHECK): Update fnspec.
	(GOACC_DIM_SIZE): Remove fnspec.
	(GOACC_DIM_POS): Remove fnspec.
	* tree-ssa-alias.c (attr_fnspec::verify): Update verification.

gcc/fortran/ChangeLog:

2020-10-02  Jan Hubicka  <hubicka@ucw.cz>

	* trans-decl.c (gfc_build_library_function_decl_with_spec): Verify
	fnspec.
	(gfc_build_intrinsic_function_decls): Update fnspecs.
	(gfc_build_builtin_function_decls): Update fnspecs.
	* trans-io.c (gfc_build_io_library_fndecls): Update fnspecs.
	* trans-types.c (create_fn_spec): Update fnspecs.
2020-10-02 15:56:12 +02:00
Xionghu Luo 683e55facf IFN: Implement IFN_VEC_SET for ARRAY_REF with VIEW_CONVERT_EXPR
This patch enables the transformation from ARRAY_REF (VIEW_CONVERT_EXPR) to
the VEC_SET internal function in the gimple-isel pass if the target supports
vec_set with a variable index, as checked by can_vec_set_var_idx_p.
gcc/ChangeLog:

2020-09-27  Xionghu Luo  <luoxhu@linux.ibm.com>

	* gimple-isel.cc (gimple_expand_vec_set_expr): New function.
	(gimple_expand_vec_cond_exprs): Rename to ...
	(gimple_expand_vec_exprs): ... this and call
	gimple_expand_vec_set_expr.
	* internal-fn.c (vec_set_direct): New define.
	(expand_vec_set_optab_fn): New function.
	(direct_vec_set_optab_supported_p): New define.
	* internal-fn.def (VEC_SET): New DEF_INTERNAL_OPTAB_FN.
	* optabs.c (can_vec_set_var_idx_p): New function.
	* optabs.h (can_vec_set_var_idx_p): New declaration.
2020-09-27 00:27:32 -05:00
Kewen Lin d496134a6b IFN/optabs: Support vector load/store with length
This patch adds the internal function and optab support for
vector load/store with length.

For the vector load/store with length optab, the length item is
measured in lanes by default.  Targets which support a length
measured in bytes, like Power, should only define VnQI modes to
wrap the other same-size vector modes.  If the length is larger
than the total lane/byte count of the given mode, the behavior
is undefined.  The remaining lanes/bytes which aren't covered by
the length are taken as having undefined values.
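For example, on a target where the length is measured in bytes and the
wrapper mode is V16QI, a len_load with length 10 loads the first 10
bytes and leaves the remaining 6 bytes of the result undefined, while a
length greater than 16 invokes undefined behavior.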

gcc/ChangeLog:

	* doc/md.texi (len_load_@var{m}): Document.
	(len_store_@var{m}): Likewise.
	* internal-fn.c (len_load_direct): New macro.
	(len_store_direct): Likewise.
	(expand_len_load_optab_fn): Likewise.
	(expand_len_store_optab_fn): Likewise.
	(direct_len_load_optab_supported_p): Likewise.
	(direct_len_store_optab_supported_p): Likewise.
	(expand_mask_load_optab_fn): New macro.  Original renamed to ...
	(expand_partial_load_optab_fn): ... here.  Add handlings for
	len_load_optab.
	(expand_mask_store_optab_fn): New macro.  Original renamed to ...
	(expand_partial_store_optab_fn): ... here. Add handlings for
	len_store_optab.
	(internal_load_fn_p): Handle IFN_LEN_LOAD.
	(internal_store_fn_p): Handle IFN_LEN_STORE.
	(internal_fn_stored_value_index): Handle IFN_LEN_STORE.
	* internal-fn.def (LEN_LOAD): New internal function.
	(LEN_STORE): Likewise.
	* optabs.def (len_load_optab, len_store_optab): New optab.
2020-07-08 02:33:03 -05:00
Martin Liska 502d63b6d6 Lower VEC_COND_EXPR into internal functions.
gcc/ChangeLog:

	* Makefile.in: Add new file.
	* expr.c (expand_expr_real_2): Add gcc_unreachable as we should
	not meet this condition.
	(do_store_flag): Likewise.
	* gimplify.c (gimplify_expr): Gimplify first argument of
	VEC_COND_EXPR to be a SSA name.
	* internal-fn.c (vec_cond_mask_direct): New.
	(vec_cond_direct): Likewise.
	(vec_condu_direct): Likewise.
	(vec_condeq_direct): Likewise.
	(expand_vect_cond_optab_fn):  New.
	(expand_vec_cond_optab_fn): Likewise.
	(expand_vec_condu_optab_fn): Likewise.
	(expand_vec_condeq_optab_fn): Likewise.
	(expand_vect_cond_mask_optab_fn): Likewise.
	(expand_vec_cond_mask_optab_fn): Likewise.
	(direct_vec_cond_mask_optab_supported_p): Likewise.
	(direct_vec_cond_optab_supported_p): Likewise.
	(direct_vec_condu_optab_supported_p): Likewise.
	(direct_vec_condeq_optab_supported_p): Likewise.
	* internal-fn.def (VCOND): New OPTAB.
	(VCONDU): Likewise.
	(VCONDEQ): Likewise.
	(VCOND_MASK): Likewise.
	* optabs.c (get_rtx_code): Make it global.
	(expand_vec_cond_mask_expr): Removed.
	(expand_vec_cond_expr): Removed.
	* optabs.h (expand_vec_cond_expr): Likewise.
	(vector_compare_rtx): Make it global.
	* passes.def: Add new pass_gimple_isel pass.
	* tree-cfg.c (verify_gimple_assign_ternary): Add check
	for VEC_COND_EXPR about first argument.
	* tree-pass.h (make_pass_gimple_isel): New.
	* tree-ssa-forwprop.c (pass_forwprop::execute): Prevent
	propagation of the first argument of a VEC_COND_EXPR.
	* tree-ssa-reassoc.c (ovce_extract_ops): Support SSA_NAME as
	first argument of a VEC_COND_EXPR.
	(optimize_vec_cond_expr): Likewise.
	* tree-vect-generic.c (expand_vector_divmod): Make SSA_NAME
	for a first argument of created VEC_COND_EXPR.
	(expand_vector_condition): Fix coding style.
	* tree-vect-stmts.c (vectorizable_condition): Gimplify
	first argument.
	* gimple-isel.cc: New file.

gcc/testsuite/ChangeLog:

	* g++.dg/vect/vec-cond-expr-eh.C: New test.
2020-06-17 12:04:22 +02:00
Iain Sandoe 49789fd083 [C++ coroutines] Initial implementation.
This is the squashed version of the first 6 patches that were split to
facilitate review.

The changes to libiberty (7th patch) to support demangling the co_await
operator stand alone and are applied separately.

The patch series is an initial implementation of a coroutine feature,
expected to be standardised in C++20.

Standardisation status (and potential impact on this implementation)
--------------------------------------------------------------------

The facility was accepted into the working draft for C++20 by WG21 in
February 2019.  During following WG21 meetings, design and national body
comments have been reviewed, with no significant change resulting.

The current GCC implementation is against n4835 [1].

At this stage, the remaining potential for change comes from:

* Areas of national body comments that were not resolved in the version we
  have worked to:
  (a) handling of the situation where aligned allocation is available.
  (b) handling of the situation where a user wants coroutines, but does not
      want exceptions (e.g. a GPU).

* Agreed changes that have not yet been worded in a draft standard that we
  have worked to.

It is not expected that the resolution to these can produce any major
change at this phase of the standardisation process.  Such changes should be
limited to the coroutine-specific code.

ABI
---

The various compiler developers 'vendors' have discussed a minimal ABI to
allow one implementation to call coroutines compiled by another.

This amounts to:

1. The layout of a public portion of the coroutine frame.

 Coroutines need to preserve state across suspension points, the storage for
 this is called a "coroutine frame".

 The ABI mandates that pointers into the coroutine frame point to an area
 beginning with two function pointers (to the resume and destroy functions
 described below); these are immediately followed by the "promise object"
 described in the standard.

 This is sufficient that the builtins can take a coroutine frame pointer and
 determine the address of the promise (or call the resume/destroy functions).

2. A number of compiler builtins that the standard library might use.

  These are implemented by this patch series.

3. This introduces a new operator 'co_await' the mangling for which is also
agreed between vendors (and has an issue filed for that against the upstream
c++abi).  Demangling for this is added to libiberty in a separate patch.

The ABI has currently no target-specific content (a given psABI might elect
to mandate alignment, but the common ABI does not do this).

Standard Library impact
-----------------------

The current implementations require addition of only a single header to
the standard library (no change to the runtime).  This header is part of
the patch.

GCC Implementation outline
--------------------------

The standard's design for coroutines does not decorate the definition of
a coroutine in any way, so that a function is only known to be a coroutine
when one of the keywords (co_await, co_yield, co_return) is encountered.
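For reference, a minimal piece of user code that triggers this handling
(illustrative only; the 'task' type below is an example, not part of the
patch):
  #include <coroutine>
  struct task {
    struct promise_type {
      task get_return_object () { return {}; }
      std::suspend_never initial_suspend () { return {}; }
      std::suspend_never final_suspend () noexcept { return {}; }
      void return_void () {}
      void unhandled_exception () {}
    };
  };
  task
  f ()
  {
    co_return;
  }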

This means that we cannot special-case such functions from the outset, but
must process them differently when they are finalised - which we do from
"finish_function ()".

At a high level, this design of coroutine produces four pieces from the
original user's function:

  1. A coroutine state frame (taking the logical place of the activation
     record for a regular function).  One item stored in that state is the
     index of the current suspend point.
  2. A "ramp" function
     This is what the user calls to construct the coroutine frame and start
     the coroutine execution.  This will return some object representing the
     coroutine's eventual return value (or means to continue it when it it
     suspended).
  3. A "resume" function.
     This is what gets called when the coroutine is resumed after being suspended.
  4. A "destroy" function.
     This is what gets called when the coroutine state should be destroyed
     and its memory released.

The standard's coroutines involve cooperation of the user's authored function
with a provided "promise" class, which includes mandatory methods for
handling the state transitions and providing output values.  Most realistic
coroutines will also have one or more 'awaiter' classes that implement the
user's actions for each suspend point.  As we parse (or during template
expansion) the types of the promise and awaiter classes become known, and can
then be verified against the signatures expected by the standard.

Once the function is parsed (and templates expanded) we are able to make the
transformation into the four pieces noted above.

The implementation here takes the approach of a series of AST transforms.
The state machine suspend points are encoded in three internal functions
(one of which represents an exit from scope without cleanups).  These three
IFNs are lowered early in the middle end, such that the majority of GCC's
optimisers can be run on the resulting output.

As a design choice, we have carried out the outlining of the user's function
in the front end, and taken advantage of the existing middle end's abilities
to inline and DCE where that is profitable.

Since the state machine is actually common to both resumer and destroyer
functions, we make only a single function "actor" that contains both the
resume and destroy paths.  The destroy function is represented by a small
stub that sets a value to signal the use of the destroy path and calls the
actor.  The idea is that optimisation of the state machine need only be done
once - and then the resume and destroy paths can be identified allowing the
middle end's inline and DCE machinery to optimise as profitable as noted
above.

The middle end components for this implementation are:

A pass that:
 1. Lowers the coroutine builtins that allow the standard library header to
    interact with the coroutine frame (these fairly simple logical or
    numerical substitution of values, given a coroutine frame pointer).
 2. Lowers the IFN that represents the exit from state without cleanup.
    Essentially, this becomes a gimple goto.
 3. Sets the final size of the coroutine frame at this stage.

A second pass (that requires the revised CFG that results from the lowering
of the scope exit IFNs in the first).

 1. Lower the IFNs that represent the state machine paths for the resume and
    destroy cases.

Patches squashed into this commit:

[C++ coroutines 1] Common code and base definitions.

This part of the patch series provides the gating flag, the keywords,
cpp defines etc.

[C++ coroutines 2] Define builtins and internal functions.

This part of the patch series provides the builtin functions
used by the standard library code and the internal functions
used to implement lowering of the coroutine state machine.

[C++ coroutines 3] Front end parsing and transforms.

There are two parts to this.

1. Parsing, template instantiation and diagnostics for the standard-
   mandated class entries.

  The user authors a function that becomes a coroutine (lazily) by
  making use of any of the co_await, co_yield or co_return keywords.

  Unlike a regular function, where the activation record is placed on the
  stack, and is destroyed on function exit, a coroutine has some state that
  persists between calls - the 'coroutine frame' (thus analogous to a stack
  frame).

  We transform the user's function into three pieces:
  1. A so-called ramp function, that establishes the coroutine frame and
     begins execution of the coroutine.
  2. An actor function that contains the state machine corresponding to the
     user's suspend/resume structure.
  3. A stub function that calls the actor function in 'destroy' mode.

  The actor function is executed:
   * from "resume point 0" by the ramp.
   * from resume point N ( > 0 ) for handle.resume() calls.
   * from the destroy stub for destroy point N for handle.destroy() calls.

  The C++ coroutine design described in the standard makes use of some helper
  methods that are authored in a so-called "promise" class provided by the
  user.

  At parse time (or post substitution) the type of the coroutine promise
  will be determined.  At that point, we can look up the required promise
  class methods and issue diagnostics if they are missing or incorrect.  To
  avoid repeating these actions at code-gen time, we make use of temporary
  'proxy' variables for the coroutine handle and the promise - which will
  eventually be instantiated in the coroutine frame.

  Each of the keywords will expand to a code sequence (although co_yield is
  just syntactic sugar for a co_await).

  We defer the analysis and transformation until template expansion is
  complete so that we have complete types at that time.
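
  As an illustration only (not part of the patch), a minimal generator-style
  coroutine and its promise class might look like the following; the promise
  member names are the standard-mandated ones the front end looks up, while
  the class and function names are invented for the example:

    #include <coroutine>

    struct generator
    {
      struct promise_type
      {
        int value;
        generator get_return_object ()
          { return {std::coroutine_handle<promise_type>::from_promise (*this)}; }
        std::suspend_always initial_suspend () { return {}; }
        std::suspend_always final_suspend () noexcept { return {}; }
        std::suspend_always yield_value (int v) { value = v; return {}; }
        void return_void () {}
        void unhandled_exception () {}
      };
      /* Owns the coroutine frame; destruction omitted for brevity.  */
      std::coroutine_handle<promise_type> handle;
    };

    generator
    count_to (int n)
    {
      for (int i = 0; i < n; ++i)
        co_yield i;   /* suspend point; i persists in the coroutine frame  */
    }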

2. AST analysis and transformation which performs the code-gen for the
   outlined state machine.

   The entry point here is morph_fn_to_coro () which is called from
   finish_function () when we have completed any template expansion.

   This is preceded by helper functions that implement the phases below.

   The process proceeds in four phases.

   A Initial framing.
     The user's function body is wrapped in the initial and final suspend
     points and we begin building the coroutine frame.
     We build empty decls for the actor and destroyer functions at this
     time too.
     When exceptions are enabled, the user's function body will also be
     wrapped in a try-catch block with the catch invoking the promise
     class 'unhandled_exception' method.

   B Analysis.
     The user's function body is analysed to determine the suspend points,
     if any, and to capture local variables that might persist across such
     suspensions.  In most cases, it is not necessary to capture compiler
     temporaries, since the tree-lowering nests the suspensions correctly.
     However, in the case of a captured reference, the lifetime is extended
     to the end of the full expression - which can mean across a suspend
     point, in which case the temporary must be promoted to a frame variable.

     At the conclusion of analysis, we have a conservative frame layout and
     maps of the local variables to their frame entry points.

   C Build the ramp function.
     Carry out the allocation for the coroutine frame (NOTE: the actual size
     computation is deferred until late in the middle end to allow future
     optimisations to elide unused frame entries).
     We build the return object.

   D Build and expand the actor and destroyer function bodies.
     The destroyer is a trivial shim that sets a bit to indicate that the
     destroy dispatcher should be used and then calls into the actor.

     The actor function is the implementation of the user's state machine.
     The current suspend point is noted in an index.
     Each suspend point is encoded as a pair of internal functions, one in
     the relevant dispatcher, and one representing the suspend point.

     During this process, the user's local variables and the proxies for the
     self-handle and the promise class instance are rewritten to their
     coroutine frame equivalents.

     The complete bodies for the ramp, actor and destroyer functions are passed
     back to finish_function for folding and gimplification.

[C++ coroutines 4] Middle end expanders and transforms.

The first part of this is a pass that provides:
 * expansion of the library support builtins; these are simple boolean
   or numerical substitutions.

 * The functionality of implementing an exit from scope without cleanup
   is performed here by lowering an IFN to a gimple goto.

This pass has to run for non-coroutine functions, since functions calling
the builtins are not necessarily coroutines (i.e. they implement the
library interfaces, which may be called from anywhere).

The second part is the expansion of the coroutine IFNs that describe the
state machine connections to the dispatchers.  This only has to be run
for functions that are coroutine components.  The work done by this pass
is:

   In the front end we construct a single actor function that contains
   the coroutine state machine.

   The actor function has three entry conditions:
    1. from the ramp, resume point 0 - to initial-suspend.
    2. when resume () is executed (resume point N).
    3. from the destroy () shim when that is executed.

   The actor function begins with two dispatchers; one for resume and
   one for destroy (where the initial entry from the ramp is a special-
   case of resume point 0).

   Each suspend point and each dispatch entry is marked with an IFN such
   that we can connect the relevant dispatchers to their target labels.

   So, if we have:

   CO_YIELD (NUM, FINAL, RES_LAB, DEST_LAB, FRAME_PTR)

   This is await point NUM, and is the final await if FINAL is non-zero.
   The resume point is RES_LAB, and the destroy point is DEST_LAB.

   We expect to find a CO_ACTOR (NUM) in the resume dispatcher and a
   CO_ACTOR (NUM+1) in the destroy dispatcher.

   Initially, the intent of keeping the resume and destroy paths together
   is that the conditionals controlling them are identical, and thus there
   would be duplication of any optimisation of those paths if the split
   were earlier.

   Subsequent inlining of the actor (and DCE) is then able to extract the
   resume and destroy paths as separate functions if that is found
   profitable by the optimisers.

   Once we have remade the connections to their correct positions, we elide
   the labels that the front end inserted.

[C++ coroutines 5] Standard library header.

This provides the interfaces mandated by the standard and implements
the interaction with the coroutine frame by means of inline use of
builtins expanded at compile-time.  There should be a 1:1 correspondence
with the standard sections which are cross-referenced.

There is no runtime content.

At this stage, we have the content in an inline namespace "__n4835" for
the committee draft (N4835) we worked to.
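
As an illustration (not part of the patch), client code drives a suspended
coroutine entirely through these inline interfaces, which in turn expand to
the builtins defined in part 2:

  #include <coroutine>

  /* Run a suspended coroutine to completion via the library interface;
     done (), resume () and destroy () are thin inline wrappers over the
     compiler builtins.  */
  void
  pump (std::coroutine_handle<> h)
  {
    while (!h.done ())
      h.resume ();
    h.destroy ();
  }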

[C++ coroutines 6] Testsuite.

There are two categories of test:

1. Checks for correctly formed source code and the error reporting.
2. Checks for transformation and code-gen.

The second set is run as 'torture' tests for the standard options
set, including LTO.  These are also intentionally run with no options
provided (from the coroutines.exp script).

gcc/ChangeLog:

2020-01-18  Iain Sandoe  <iain@sandoe.co.uk>

	* Makefile.in: Add coroutine-passes.o.
	* builtin-types.def (BT_CONST_SIZE): New.
	(BT_FN_BOOL_PTR): New.
	(BT_FN_PTR_PTR_CONST_SIZE_BOOL): New.
	* builtins.def (DEF_COROUTINE_BUILTIN): New.
	* coroutine-builtins.def: New file.
	* coroutine-passes.cc: New file.
	* function.h (struct GTY function): Add a bit to indicate that the
	function is a coroutine component.
	* internal-fn.c (expand_CO_FRAME): New.
	(expand_CO_YIELD): New.
	(expand_CO_SUSPN): New.
	(expand_CO_ACTOR): New.
	* internal-fn.def (CO_ACTOR): New.
	(CO_YIELD): New.
	(CO_SUSPN): New.
	(CO_FRAME): New.
	* passes.def: Add pass_coroutine_lower_builtins,
	pass_coroutine_early_expand_ifns.
	* tree-pass.h (make_pass_coroutine_lower_builtins): New.
	(make_pass_coroutine_early_expand_ifns): New.
	* doc/invoke.texi: Document the fcoroutines command line
	switch.

gcc/c-family/ChangeLog:

2020-01-18  Iain Sandoe  <iain@sandoe.co.uk>

	* c-common.c (co_await, co_yield, co_return): New.
	* c-common.h (RID_CO_AWAIT, RID_CO_YIELD,
	RID_CO_RETURN): New enumeration values.
	(D_CXX_COROUTINES): Bit to identify coroutines are active.
	(D_CXX_COROUTINES_FLAGS): Guard for coroutine keywords.
	* c-cppbuiltin.c (__cpp_coroutines): New cpp define.
	* c.opt (fcoroutines): New command-line switch.

gcc/cp/ChangeLog:

2020-01-18  Iain Sandoe  <iain@sandoe.co.uk>

	* Make-lang.in: Add coroutines.o.
	* cp-tree.h (lang_decl-fn): coroutine_p, new bit.
	(DECL_COROUTINE_P): New.
	* lex.c (init_reswords): Enable keywords when the coroutine flag
	is set.
	* operators.def (co_await): New operator.
	* call.c (add_builtin_candidates): Handle CO_AWAIT_EXPR.
	(op_error): Likewise.
	(build_new_op_1): Likewise.
	(build_new_function_call): Validate coroutine builtin arguments.
	* constexpr.c (potential_constant_expression_1): Handle
	CO_AWAIT_EXPR, CO_YIELD_EXPR, CO_RETURN_EXPR.
	* coroutines.cc: New file.
	* cp-objcp-common.c (cp_common_init_ts): Add CO_AWAIT_EXPR,
	CO_YIELD_EXPR, CO_RETURN_EXPR as TS expressions.
	* cp-tree.def (CO_AWAIT_EXPR, CO_YIELD_EXPR, CO_RETURN_EXPR): New.
	* cp-tree.h (coro_validate_builtin_call): New.
	* decl.c (emit_coro_helper): New.
	(finish_function): Handle the case when a function is found to
	be a coroutine, perform the outlining and emit the outlined
	functions. Set a bit to signal that this is a coroutine component.
	* parser.c (enum required_token): New enumeration RT_CO_YIELD.
	(cp_parser_unary_expression): Handle co_await.
	(cp_parser_assignment_expression): Handle co_yield.
	(cp_parser_statement): Handle RID_CO_RETURN.
	(cp_parser_jump_statement): Handle co_return.
	(cp_parser_operator): Handle co_await operator.
	(cp_parser_yield_expression): New.
	(cp_parser_required_error): Handle RT_CO_YIELD.
	* pt.c (tsubst_copy): Handle CO_AWAIT_EXPR.
	(tsubst_expr): Handle CO_AWAIT_EXPR, CO_YIELD_EXPR and
	CO_RETURN_EXPRs.
	* tree.c (cp_walk_subtrees): Likewise.

libstdc++-v3/ChangeLog:

2020-01-18  Iain Sandoe  <iain@sandoe.co.uk>

	* include/Makefile.am: Add coroutine to the std set.
	* include/Makefile.in: Regenerated.
	* include/std/coroutine: New file.

gcc/testsuite/ChangeLog:

2020-01-18  Iain Sandoe  <iain@sandoe.co.uk>

	* g++.dg/coroutines/co-await-syntax-00-needs-expr.C: New test.
	* g++.dg/coroutines/co-await-syntax-01-outside-fn.C: New test.
	* g++.dg/coroutines/co-await-syntax-02-outside-fn.C: New test.
	* g++.dg/coroutines/co-await-syntax-03-auto.C: New test.
	* g++.dg/coroutines/co-await-syntax-04-ctor-dtor.C: New test.
	* g++.dg/coroutines/co-await-syntax-05-constexpr.C: New test.
	* g++.dg/coroutines/co-await-syntax-06-main.C: New test.
	* g++.dg/coroutines/co-await-syntax-07-varargs.C: New test.
	* g++.dg/coroutines/co-await-syntax-08-lambda-auto.C: New test.
	* g++.dg/coroutines/co-return-syntax-01-outside-fn.C: New test.
	* g++.dg/coroutines/co-return-syntax-02-outside-fn.C: New test.
	* g++.dg/coroutines/co-return-syntax-03-auto.C: New test.
	* g++.dg/coroutines/co-return-syntax-04-ctor-dtor.C: New test.
	* g++.dg/coroutines/co-return-syntax-05-constexpr-fn.C: New test.
	* g++.dg/coroutines/co-return-syntax-06-main.C: New test.
	* g++.dg/coroutines/co-return-syntax-07-vararg.C: New test.
	* g++.dg/coroutines/co-return-syntax-08-bad-return.C: New test.
	* g++.dg/coroutines/co-return-syntax-09-lambda-auto.C: New test.
	* g++.dg/coroutines/co-yield-syntax-00-needs-expr.C: New test.
	* g++.dg/coroutines/co-yield-syntax-01-outside-fn.C: New test.
	* g++.dg/coroutines/co-yield-syntax-02-outside-fn.C: New test.
	* g++.dg/coroutines/co-yield-syntax-03-auto.C: New test.
	* g++.dg/coroutines/co-yield-syntax-04-ctor-dtor.C: New test.
	* g++.dg/coroutines/co-yield-syntax-05-constexpr.C: New test.
	* g++.dg/coroutines/co-yield-syntax-06-main.C: New test.
	* g++.dg/coroutines/co-yield-syntax-07-varargs.C: New test.
	* g++.dg/coroutines/co-yield-syntax-08-needs-expr.C: New test.
	* g++.dg/coroutines/co-yield-syntax-09-lambda-auto.C: New test.
	* g++.dg/coroutines/coro-builtins.C: New test.
	* g++.dg/coroutines/coro-missing-gro.C: New test.
	* g++.dg/coroutines/coro-missing-promise-yield.C: New test.
	* g++.dg/coroutines/coro-missing-ret-value.C: New test.
	* g++.dg/coroutines/coro-missing-ret-void.C: New test.
	* g++.dg/coroutines/coro-missing-ueh-1.C: New test.
	* g++.dg/coroutines/coro-missing-ueh-2.C: New test.
	* g++.dg/coroutines/coro-missing-ueh-3.C: New test.
	* g++.dg/coroutines/coro-missing-ueh.h: New test.
	* g++.dg/coroutines/coro-pre-proc.C: New test.
	* g++.dg/coroutines/coro.h: New file.
	* g++.dg/coroutines/coro1-ret-int-yield-int.h: New file.
	* g++.dg/coroutines/coroutines.exp: New file.
	* g++.dg/coroutines/torture/alloc-00-gro-on-alloc-fail.C: New test.
	* g++.dg/coroutines/torture/alloc-01-overload-newdel.C: New test.
	* g++.dg/coroutines/torture/call-00-co-aw-arg.C: New test.
	* g++.dg/coroutines/torture/call-01-multiple-co-aw.C: New test.
	* g++.dg/coroutines/torture/call-02-temp-co-aw.C: New test.
	* g++.dg/coroutines/torture/call-03-temp-ref-co-aw.C: New test.
	* g++.dg/coroutines/torture/class-00-co-ret.C: New test.
	* g++.dg/coroutines/torture/class-01-co-ret-parm.C: New test.
	* g++.dg/coroutines/torture/class-02-templ-parm.C: New test.
	* g++.dg/coroutines/torture/class-03-operator-templ-parm.C: New test.
	* g++.dg/coroutines/torture/class-04-lambda-1.C: New test.
	* g++.dg/coroutines/torture/class-05-lambda-capture-copy-local.C: New test.
	* g++.dg/coroutines/torture/class-06-lambda-capture-ref.C: New test.
	* g++.dg/coroutines/torture/co-await-00-trivial.C: New test.
	* g++.dg/coroutines/torture/co-await-01-with-value.C: New test.
	* g++.dg/coroutines/torture/co-await-02-xform.C: New test.
	* g++.dg/coroutines/torture/co-await-03-rhs-op.C: New test.
	* g++.dg/coroutines/torture/co-await-04-control-flow.C: New test.
	* g++.dg/coroutines/torture/co-await-05-loop.C: New test.
	* g++.dg/coroutines/torture/co-await-06-ovl.C: New test.
	* g++.dg/coroutines/torture/co-await-07-tmpl.C: New test.
	* g++.dg/coroutines/torture/co-await-08-cascade.C: New test.
	* g++.dg/coroutines/torture/co-await-09-pair.C: New test.
	* g++.dg/coroutines/torture/co-await-10-template-fn-arg.C: New test.
	* g++.dg/coroutines/torture/co-await-11-forwarding.C: New test.
	* g++.dg/coroutines/torture/co-await-12-operator-2.C: New test.
	* g++.dg/coroutines/torture/co-await-13-return-ref.C: New test.
	* g++.dg/coroutines/torture/co-ret-00-void-return-is-ready.C: New test.
	* g++.dg/coroutines/torture/co-ret-01-void-return-is-suspend.C: New test.
	* g++.dg/coroutines/torture/co-ret-03-different-GRO-type.C: New test.
	* g++.dg/coroutines/torture/co-ret-04-GRO-nontriv.C: New test.
	* g++.dg/coroutines/torture/co-ret-05-return-value.C: New test.
	* g++.dg/coroutines/torture/co-ret-06-template-promise-val-1.C: New test.
	* g++.dg/coroutines/torture/co-ret-07-void-cast-expr.C: New test.
	* g++.dg/coroutines/torture/co-ret-08-template-cast-ret.C: New test.
	* g++.dg/coroutines/torture/co-ret-09-bool-await-susp.C: New test.
	* g++.dg/coroutines/torture/co-ret-10-expression-evaluates-once.C: New test.
	* g++.dg/coroutines/torture/co-ret-11-co-ret-co-await.C: New test.
	* g++.dg/coroutines/torture/co-ret-12-co-ret-fun-co-await.C: New test.
	* g++.dg/coroutines/torture/co-ret-13-template-2.C: New test.
	* g++.dg/coroutines/torture/co-ret-14-template-3.C: New test.
	* g++.dg/coroutines/torture/co-yield-00-triv.C: New test.
	* g++.dg/coroutines/torture/co-yield-01-multi.C: New test.
	* g++.dg/coroutines/torture/co-yield-02-loop.C: New test.
	* g++.dg/coroutines/torture/co-yield-03-tmpl.C: New test.
	* g++.dg/coroutines/torture/co-yield-04-complex-local-state.C: New test.
	* g++.dg/coroutines/torture/co-yield-05-co-aw.C: New test.
	* g++.dg/coroutines/torture/co-yield-06-fun-parm.C: New test.
	* g++.dg/coroutines/torture/co-yield-07-template-fn-param.C: New test.
	* g++.dg/coroutines/torture/co-yield-08-more-refs.C: New test.
	* g++.dg/coroutines/torture/co-yield-09-more-templ-refs.C: New test.
	* g++.dg/coroutines/torture/coro-torture.exp: New file.
	* g++.dg/coroutines/torture/exceptions-test-0.C: New test.
	* g++.dg/coroutines/torture/func-params-00.C: New test.
	* g++.dg/coroutines/torture/func-params-01.C: New test.
	* g++.dg/coroutines/torture/func-params-02.C: New test.
	* g++.dg/coroutines/torture/func-params-03.C: New test.
	* g++.dg/coroutines/torture/func-params-04.C: New test.
	* g++.dg/coroutines/torture/func-params-05.C: New test.
	* g++.dg/coroutines/torture/func-params-06.C: New test.
	* g++.dg/coroutines/torture/lambda-00-co-ret.C: New test.
	* g++.dg/coroutines/torture/lambda-01-co-ret-parm.C: New test.
	* g++.dg/coroutines/torture/lambda-02-co-yield-values.C: New test.
	* g++.dg/coroutines/torture/lambda-03-auto-parm-1.C: New test.
	* g++.dg/coroutines/torture/lambda-04-templ-parm.C: New test.
	* g++.dg/coroutines/torture/lambda-05-capture-copy-local.C: New test.
	* g++.dg/coroutines/torture/lambda-06-multi-capture.C: New test.
	* g++.dg/coroutines/torture/lambda-07-multi-yield.C: New test.
	* g++.dg/coroutines/torture/lambda-08-co-ret-parm-ref.C: New test.
	* g++.dg/coroutines/torture/local-var-0.C: New test.
	* g++.dg/coroutines/torture/local-var-1.C: New test.
	* g++.dg/coroutines/torture/local-var-2.C: New test.
	* g++.dg/coroutines/torture/local-var-3.C: New test.
	* g++.dg/coroutines/torture/local-var-4.C: New test.
	* g++.dg/coroutines/torture/mid-suspend-destruction-0.C: New test.
	* g++.dg/coroutines/torture/pr92933.C: New test.
2020-01-18 11:55:56 +00:00
Jakub Jelinek 8d9254fc8a Update copyright years.
From-SVN: r279813
2020-01-01 12:51:42 +01:00
Richard Sandiford 58c036c835 Add optabs for accelerating RAW and WAR alias checks
This patch adds optabs that check whether a read followed by a write
or a write followed by a read can be divided into interleaved byte
accesses without changing the dependencies between the bytes.
This is one of the uses of the SVE2 WHILERW and WHILEWR instructions.
(The instructions can also be used to limit the VF at runtime,
but that's future work.)
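
As an illustration (not from the patch), a loop of the following shape needs
a runtime alias check when x and y may overlap, before its reads and writes
can be interleaved at vector width:

   void
   f (int *x, int *y, int n)
   {
     for (int i = 0; i < n; ++i)
       x[i] = y[i] + 1;   /* y is read, x is written; the new optabs check
                             whether the byte accesses can be interleaved
                             without changing the scalar dependences.  */
   }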

2019-11-18  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* doc/sourcebuild.texi (vect_check_ptrs): Document.
	* optabs.def (check_raw_ptrs_optab, check_war_ptrs_optab): New optabs.
	* doc/md.texi: Document them.
	* internal-fn.def (IFN_CHECK_RAW_PTRS, IFN_CHECK_WAR_PTRS): New
	internal functions.
	* internal-fn.h (internal_check_ptrs_fn_supported_p): Declare.
	* internal-fn.c (check_ptrs_direct): New macro.
	(expand_check_ptrs_optab_fn): Likewise.
	(direct_check_ptrs_optab_supported_p): Likewise.
	(internal_check_ptrs_fn_supported_p): New function.
	* tree-data-ref.c: Include internal-fn.h.
	(create_ifn_alias_checks): New function.
	(create_intersect_range_checks): Use it.
	* config/aarch64/iterators.md (SVE2_WHILE_PTR): New int iterator.
	(optab, cmp_op): Handle it.
	(raw_war, unspec): New int attributes.
	* config/aarch64/aarch64.md (UNSPEC_WHILERW, UNSPEC_WHILE_WR): New
	constants.
	* config/aarch64/predicates.md (aarch64_bytes_per_sve_vector_operand):
	New predicate.
	* config/aarch64/aarch64-sve2.md (check_<raw_war>_ptrs<mode>): New
	expander.
	(@aarch64_sve2_while<cmp_op><GPI:mode><PRED_ALL:mode>_ptest): New
	pattern.

gcc/testsuite/
	* lib/target-supports.exp (check_effective_target_vect_check_ptrs):
	New procedure.
	* gcc.dg/vect/vect-alias-check-14.c: Expect IFN_CHECK_WAR to be
	used, if available.
	* gcc.dg/vect/vect-alias-check-15.c: Likewise.
	* gcc.dg/vect/vect-alias-check-16.c: Likewise IFN_CHECK_RAW.
	* gcc.target/aarch64/sve2/whilerw_1.c: New test.
	* gcc.target/aarch64/sve2/whilewr_1.c: Likewise.
	* gcc.target/aarch64/sve2/whilewr_2.c: Likewise.

From-SVN: r278414
2019-11-18 15:36:10 +00:00
Yuliang Wang c0c2f01390 [AArch64][SVE] Utilize ASRD instruction for division and remainder
2019-09-30  Yuliang Wang  <yuliang.wang@arm.com>

gcc/
	* config/aarch64/aarch64-sve.md (sdiv_pow2<mode>3):
	New pattern for ASRD.
	* config/aarch64/iterators.md (UNSPEC_ASRD): New unspec.
	* internal-fn.def (IFN_DIV_POW2): New internal function.
	* optabs.def (sdiv_pow2_optab): New optab.
	* tree-vect-patterns.c (vect_recog_divmod_pattern):
	Modify pattern to support new operation.
	* doc/md.texi (sdiv_pow2@var{m}3): Documentation for the above.
	* doc/sourcebuild.texi (vect_sdiv_pow2_si):
	Document new target selector.

gcc/testsuite/
	* gcc.dg/vect/vect-sdiv-pow2-1.c: New test.
	* gcc.target/aarch64/sve/asrdiv_1.c: As above.
	* lib/target-supports.exp (check_effective_target_vect_sdiv_pow2_si):
	Return true for AArch64 with SVE.

From-SVN: r276343
2019-09-30 16:55:45 +00:00
Yuliang Wang 58cc98767a Vectorise multiply high with scaling operations (PR 89386)
2019-09-12  Yuliang Wang  <yuliang.wang@arm.com>

gcc/
	PR tree-optimization/89386
	* config/aarch64/aarch64-sve2.md (<su>mull<bt><Vwide>)
	(<r>shrnb<mode>, <r>shrnt<mode>): New SVE2 patterns.
	(<su>mulh<r>s<mode>3): New pattern for MULHRS.
	* config/aarch64/iterators.md (UNSPEC_SMULLB, UNSPEC_SMULLT)
	(UNSPEC_UMULLB, UNSPEC_UMULLT, UNSPEC_SHRNB, UNSPEC_SHRNT)
	(UNSPEC_RSHRNB, UNSPEC_RSHRNT, UNSPEC_SMULHS, UNSPEC_SMULHRS)
	(UNSPEC_UMULHS, UNSPEC_UMULHRS): New unspecs.
	(MULLBT, SHRNB, SHRNT, MULHRS): New int iterators.
	(su, r): Handle the unspecs above.
	(bt): New int attribute.
	* internal-fn.def (IFN_MULHS, IFN_MULHRS): New internal functions.
	* internal-fn.c (first_commutative_argument): Commutativity info for
	above.
	* optabs.def (smulhs_optab, smulhrs_optab, umulhs_optab)
	(umulhrs_optab): New optabs.
	* doc/md.texi (smulhs@var{m}3, umulhs@var{m}3)
	(smulhrs@var{m}3, umulhrs@var{m}3): Documentation for the above.
	* tree-vect-patterns.c (vect_recog_mulhs_pattern): New pattern
	function.
	(vect_vect_recog_func_ptrs): Add it.
	* testsuite/gcc.target/aarch64/sve2/mulhrs_1.c: New test.
	* testsuite/gcc.dg/vect/vect-mulhrs-1.c: As above.
	* testsuite/gcc.dg/vect/vect-mulhrs-2.c: As above.
	* testsuite/gcc.dg/vect/vect-mulhrs-3.c: As above.
	* testsuite/gcc.dg/vect/vect-mulhrs-4.c: As above.
	* doc/sourcebuild.texi (vect_mulhrs_hi): Document new target selector.
	* testsuite/lib/target-supports.exp
	(check_effective_target_vect_mulhrs_hi): Return true for AArch64
	with SVE2.

From-SVN: r275682
2019-09-12 09:59:58 +00:00
Tejas Joshi d3b92f35d8 i386: Roundeven expansion for SSE4.1+
gcc/ChangeLog:

2019-08-26  Tejas Joshi  <tejasjoshi9673@gmail.com>
            Uros Bizjak  <ubizjak@gmail.com>

	* builtins.c (mathfn_built_in_2): Change CASE_MATHFN to
	CASE_MATHFN_FLOATN for roundeven.
	* config/i386/i386.c (ix86_i387_mode_needed): Add case
	I387_ROUNDEVEN.
	(ix86_mode_needed): Likewise.
	(ix86_mode_after): Likewise.
	(ix86_mode_entry): Likewise.
	(ix86_mode_exit): Likewise.
	(ix86_emit_mode_set): Likewise.
	(emit_i387_cw_initialization): Add case I387_CW_ROUNDEVEN.
	* config/i386/i386.h (ix86_stack_slot): Add SLOT_CW_ROUNDEVEN.
	(ix86_entry): Add I387_ROUNDEVEN.
	(avx_u128_state): Add I387_CW_ANY.
	* config/i386/i386.md: Define UNSPEC_FRNDINT_ROUNDEVEN.
	(define_int_iterator): Likewise.
	(define_int_attr): Likewise for rounding_insn, rounding and ROUNDING.
	(define_constant): Define ROUND_ROUNDEVEN mode.
	(define_attr): Add roundeven mode for i387_cw.
	(<rounding_insn><mode>2): Add condition for ROUND_ROUNDEVEN.
	* internal-fn.def (ROUNDEVEN): New builtin function.
	* optabs.def (roundeven_optab): New optab.

gcc/testsuite/ChangeLog:

2019-08-26  Tejas Joshi  <tejasjoshi9673@gmail.com>

	* gcc.target/i386/sse4_1-round-roundeven-1.c: New test.
	* gcc.target/i386/sse4_1-round-roundeven-2.c: New test.


Co-Authored-By: Uros Bizjak <ubizjak@gmail.com>

From-SVN: r274928
2019-08-26 14:41:59 +02:00
Richard Sandiford 20103c0ea9 Add support for conditional shifts
This patch adds support for IFN_COND shifts left and shifts right.
This is mostly mechanical, but since we try to handle conditional
operations in the same way as unconditional operations in match.pd,
we need to support IFN_COND shifts by scalars as well as vectors.
E.g.:

   IFN_COND_SHL (cond, a, { 1, 1, ... }, fallback)

and:

   IFN_COND_SHL (cond, a, 1, fallback)

are the same operation, with:

   (for shiftrotate (lrotate rrotate lshift rshift)
    ...
    /* Prefer vector1 << scalar to vector1 << vector2
       if vector2 is uniform.  */
    (for vec (VECTOR_CST CONSTRUCTOR)
     (simplify
      (shiftrotate @0 vec@1)
      (with { tree tem = uniform_vector_p (@1); }
       (if (tem)
	(shiftrotate @0 { tem; }))))))

preferring the latter.  The patch copes with this by extending
create_convert_operand_from to handle scalar-to-vector conversions.
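
An illustrative source loop (not from the patch) that gives rise to a
conditional shift by a uniform scalar after if-conversion:

   void
   f (int *x, int *c, int n)
   {
     for (int i = 0; i < n; ++i)
       if (c[i])
         x[i] <<= 2;   /* conditional shift-left by the scalar 2  */
   }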

2019-08-15  Richard Sandiford  <richard.sandiford@arm.com>
	    Prathamesh Kulkarni  <prathamesh.kulkarni@linaro.org>

gcc/
	* internal-fn.def (IFN_COND_SHL, IFN_COND_SHR): New internal functions.
	* internal-fn.c (FOR_EACH_CODE_MAPPING): Handle shifts.
	* match.pd (UNCOND_BINARY, COND_BINARY): Likewise.
	* optabs.def (cond_ashl_optab, cond_ashr_optab, cond_lshr_optab): New
	optabs.
	* optabs.h (create_convert_operand_from): Expand comment.
	* optabs.c (maybe_legitimize_operand): Allow implicit broadcasts
	when mapping scalar rtxes to vector operands.
	* config/aarch64/iterators.md (SVE_INT_BINARY): Add ashift,
	ashiftrt and lshiftrt.
	(sve_int_op, sve_int_op_rev, sve_pred_int_rhs2_operand): Handle them.
	* config/aarch64/aarch64-sve.md (*cond_<optab><mode>_2_const)
	(*cond_<optab><mode>_any_const): New patterns.

gcc/testsuite/
	* gcc.target/aarch64/sve/cond_shift_1.c: New test.
	* gcc.target/aarch64/sve/cond_shift_1_run.c: Likewise.
	* gcc.target/aarch64/sve/cond_shift_2.c: Likewise.
	* gcc.target/aarch64/sve/cond_shift_2_run.c: Likewise.
	* gcc.target/aarch64/sve/cond_shift_3.c: Likewise.
	* gcc.target/aarch64/sve/cond_shift_3_run.c: Likewise.
	* gcc.target/aarch64/sve/cond_shift_4.c: Likewise.
	* gcc.target/aarch64/sve/cond_shift_4_run.c: Likewise.
	* gcc.target/aarch64/sve/cond_shift_5.c: Likewise.
	* gcc.target/aarch64/sve/cond_shift_5_run.c: Likewise.
	* gcc.target/aarch64/sve/cond_shift_6.c: Likewise.
	* gcc.target/aarch64/sve/cond_shift_6_run.c: Likewise.
	* gcc.target/aarch64/sve/cond_shift_7.c: Likewise.
	* gcc.target/aarch64/sve/cond_shift_7_run.c: Likewise.
	* gcc.target/aarch64/sve/cond_shift_8.c: Likewise.
	* gcc.target/aarch64/sve/cond_shift_8_run.c: Likewise.
	* gcc.target/aarch64/sve/cond_shift_9.c: Likewise.
	* gcc.target/aarch64/sve/cond_shift_9_run.c: Likewise.

Co-Authored-By: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>

From-SVN: r274505
2019-08-15 08:05:50 +00:00
Alejandro Martinez bce29d65eb [Vectorizer] Support masking fold left reductions
This patch adds support in the vectorizer for masking fold left reductions.
This avoids the need to insert a conditional assignment with some identity
value.
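
As an illustration (not from the patch), a fold-left (in-order) floating-point
reduction whose vector form ends up masked, for example because the update is
conditional or the loop is fully masked:

   double
   f (double *x, int *c, int n)
   {
     double s = 0.0;
     for (int i = 0; i < n; ++i)
       if (c[i])
         s += x[i];   /* masked, in-order (fold-left) reduction  */
     return s;
   }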

From-SVN: r272407
2019-06-18 08:09:00 +00:00
Przemyslaw Wirkus a52cf5cf27 2019-05-14 Przemyslaw Wirkus <przemyslaw.wirkus@arm.com>
gcc/
	* internal-fn.def (SIGNBIT): New.
	* config/aarch64/aarch64-simd.md (signbitv2sf2): New expand
	defined.
	(signbitv4sf2): Likewise.

gcc/testsuite/
	* gcc.target/aarch64/signbitv4sf.c: New test.
	* gcc.target/aarch64/signbitv2sf.c: New test.

From-SVN: r271149
2019-05-14 08:07:56 +00:00
Jakub Jelinek d8fcab6894 re PR c++/85052 (Implement support for clang's __builtin_convertvector)
PR c++/85052
	* tree-vect-generic.c: Include insn-config.h and recog.h.
	(expand_vector_piecewise): Add defaulted ret_type argument,
	if non-NULL, use that in preference to type for the result type.
	(expand_vector_parallel): Formatting fix.
	(do_vec_conversion, do_vec_narrowing_conversion,
	expand_vector_conversion): New functions.
	(expand_vector_operations_1): Call expand_vector_conversion
	for VEC_CONVERT ifn calls.
	* internal-fn.def (VEC_CONVERT): New internal function.
	* internal-fn.c (expand_VEC_CONVERT): New function.
	* fold-const-call.c (fold_const_vec_convert): New function.
	(fold_const_call): Use it for CFN_VEC_CONVERT.
	* doc/extend.texi (__builtin_convertvector): Document.
c-family/
	* c-common.h (enum rid): Add RID_BUILTIN_CONVERTVECTOR.
	(c_build_vec_convert): Declare.
	* c-common.c (c_build_vec_convert): New function.
c/
	* c-parser.c (c_parser_postfix_expression): Parse
	__builtin_convertvector.
cp/
	* cp-tree.h (cp_build_vec_convert): Declare.
	* parser.c (cp_parser_postfix_expression): Parse
	__builtin_convertvector.
	* constexpr.c: Include fold-const-call.h.
	(cxx_eval_internal_function): Handle IFN_VEC_CONVERT.
	(potential_constant_expression_1): Likewise.
	* semantics.c (cp_build_vec_convert): New function.
	* pt.c (tsubst_copy_and_build): Handle CALL_EXPR to
	IFN_VEC_CONVERT.
testsuite/
	* c-c++-common/builtin-convertvector-1.c: New test.
	* c-c++-common/torture/builtin-convertvector-1.c: New test.
	* g++.dg/ext/builtin-convertvector-1.C: New test.
	* g++.dg/cpp0x/constexpr-builtin4.C: New test.

From-SVN: r267632
2019-01-07 09:49:08 +01:00
Jakub Jelinek a554497024 Update copyright years.
From-SVN: r267494
2019-01-01 13:31:55 +01:00
Uros Bizjak 247c45b265 re PR target/88556 (Inline built-in sinh, cosh, tanh for -ffast-math)
PR target/88556
	* internal-fn.def (COSH): New.
	(SINH): Ditto.
	(TANH): Ditto.
	* optabs.def (cosh_optab): New.
	(sinh_optab): Ditto.
	(tanh_optab): Ditto.
	* config/i386/i386-protos.h (ix86_emit_i387_sinh): New prototype.
	(ix86_emit_i387_cosh): Ditto.
	(ix86_emit_i387_tanh): Ditto.
	* config/i386/i386.c (ix86_emit_i387_sinh): New function.
	(ix86_emit_i387_cosh): Ditto.
	(ix86_emit_i387_tanh): Ditto.
	* config/i386/i386.md (sinhxf2): New expander.
	(sinh<mode>2):	Ditto.
	(coshxf2): Ditto.
	(cosh<mode>2): Ditto.
	(tanhxf2): Ditto.
	(tanh<mode>2): Ditto.

From-SVN: r267325
2018-12-21 14:30:58 +01:00
Uros Bizjak a81037cea6 re PR target/88502 (Inline built-in asinh, acosh, atanh for -ffast-math)
PR target/88502
	* internal-fn.def (ACOSH): New.
	(ASINH): Ditto.
	(ATANH): Ditto.
	* optabs.def (acosh_optab): New.
	(asinh_optab): Ditto.
	(atanh_optab): Ditto.
	* config/i386/i386-protos.h (ix86_emit_i387_asinh): New prototype.
	(ix86_emit_i387_acosh): Ditto.
	(ix86_emit_i387_atanh): Ditto.
	* config/i386/i386.c (ix86_emit_i387_asinh): New function.
	(ix86_emit_i387_acosh): Ditto.
	(ix86_emit_i387_atanh): Ditto.
	* config/i386/i386.md (asinhxf2): New expander.
	(asinh<mode>2):	Ditto.
	(acoshxf2): Ditto.
	(acosh<mode>2): Ditto.
	(atanhxf2): Ditto.
	(atanh<mode>2): Ditto.

From-SVN: r267204
2018-12-17 16:46:20 +01:00
Uros Bizjak 4dd9b6c6bc re PR target/88474 (Inline built-in hypot for -ffast-math)
PR target/88474
	* internal-fn.def (HYPOT): New.
	* optabs.def (hypot_optab): New.
	* config/i386/i386.md (hypot<mode>3): New expander.

From-SVN: r267137
2018-12-14 18:04:48 +01:00
Richard Sandiford b41d1f6ed7 Add IFN_COND_FMA functions
This patch adds conditional equivalents of the IFN_FMA built-in functions.
Most of it is just a mechanical extension of the binary stuff.
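
An illustrative loop (not from the patch) that if-conversion can turn into a
conditional fused multiply-add:

   void
   f (double *r, double *a, double *b, int *c, int n)
   {
     for (int i = 0; i < n; ++i)
       if (c[i])
         r[i] = a[i] * b[i] + r[i];   /* candidate for a conditional FMA  */
   }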

2018-07-12  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	* doc/md.texi (cond_fma, cond_fms, cond_fnma, cond_fnms): Document.
	* optabs.def (cond_fma_optab, cond_fms_optab, cond_fnma_optab)
	(cond_fnms_optab): New optabs.
	* internal-fn.def (COND_FMA, COND_FMS, COND_FNMA, COND_FNMS): New
	internal functions.
	(FMA): Use DEF_INTERNAL_FLT_FN rather than DEF_INTERNAL_FLT_FLOATN_FN.
	* internal-fn.h (get_conditional_internal_fn): Declare.
	(get_unconditional_internal_fn): Likewise.
	* internal-fn.c (cond_ternary_direct): New macro.
	(expand_cond_ternary_optab_fn): Likewise.
	(direct_cond_ternary_optab_supported_p): Likewise.
	(FOR_EACH_COND_FN_PAIR): Likewise.
	(get_conditional_internal_fn): New function.
	(get_unconditional_internal_fn): Likewise.
	* gimple-match.h (gimple_match_op::MAX_NUM_OPS): Bump to 5.
	(gimple_match_op::gimple_match_op): Add a new overload for 5
	operands.
	(gimple_match_op::set_op): Likewise.
	(gimple_resimplify5): Declare.
	* genmatch.c (decision_tree::gen): Generate simplifications for
	5 operands.
	* gimple-match-head.c (gimple_simplify): Define an overload for
	5 operands.  Handle calls with 5 arguments in the top-level overload.
	(convert_conditional_op): Handle conversions from unconditional
	internal functions to conditional ones.
	(gimple_resimplify5): New function.
	(build_call_internal): Pass a fifth operand.
	(maybe_push_res_to_seq): Likewise.
	(try_conditional_simplification): Try converting conditional
	internal functions to unconditional internal functions.
	Handle 3-operand unconditional forms.
	* match.pd (UNCOND_TERNARY, COND_TERNARY): Operator lists.
	Define ternary equivalents of the current rules for binary conditional
	internal functions.
	* config/aarch64/aarch64.c (aarch64_preferred_else_value): Handle
	ternary operations.
	* config/aarch64/iterators.md (UNSPEC_COND_FMLA, UNSPEC_COND_FMLS)
	(UNSPEC_COND_FNMLA, UNSPEC_COND_FNMLS): New unspecs.
	(optab): Handle them.
	(SVE_COND_FP_TERNARY): New int iterator.
	(sve_fmla_op, sve_fmad_op): New int attributes.
	* config/aarch64/aarch64-sve.md (cond_<optab><mode>)
	(*cond_<optab><mode>_2, *cond_<optab><mode>_4)
	(*cond_<optab><mode>_any): New SVE_COND_FP_TERNARY patterns.

gcc/testsuite/
	* gcc.dg/vect/vect-cond-arith-3.c: New test.
	* gcc.target/aarch64/sve/vcond_13.c: Likewise.
	* gcc.target/aarch64/sve/vcond_13_run.c: Likewise.
	* gcc.target/aarch64/sve/vcond_14.c: Likewise.
	* gcc.target/aarch64/sve/vcond_14_run.c: Likewise.
	* gcc.target/aarch64/sve/vcond_15.c: Likewise.
	* gcc.target/aarch64/sve/vcond_15_run.c: Likewise.
	* gcc.target/aarch64/sve/vcond_16.c: Likewise.
	* gcc.target/aarch64/sve/vcond_16_run.c: Likewise.

From-SVN: r262587
2018-07-12 13:01:33 +00:00
Richard Sandiford 0267732bae [16/n] PR85694: Add detection of averaging operations
This patch adds detection of average instructions:

       a = (((wide) b + (wide) c) >> 1);
   --> a = (wide) .AVG_FLOOR (b, c);

       a = (((wide) b + (wide) c + 1) >> 1);
   --> a = (wide) .AVG_CEIL (b, c);

in cases where users of "a" need only the low half of the result,
making the cast to (wide) redundant.  The heavy lifting was done by
earlier patches.

This showed up another problem in vectorizable_call: if the call is a
pattern definition statement rather than the main pattern statement,
the type of the vectorised call might be different from the type of the
original statement.
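
An illustrative loop (not from the patch) matching the AVG_CEIL form; the sum
is computed in int, but only the low byte of the result is kept:

   void
   f (unsigned char *__restrict__ a, const unsigned char *__restrict__ b,
      const unsigned char *__restrict__ c, int n)
   {
     for (int i = 0; i < n; ++i)
       a[i] = (b[i] + c[i] + 1) >> 1;   /* ((wide) b + (wide) c + 1) >> 1,
                                           truncated back to char  */
   }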

2018-07-03  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	PR tree-optimization/85694
	* doc/md.texi (avgM3_floor, uavgM3_floor, avgM3_ceil)
	(uavgM3_ceil): Document new optabs.
	* doc/sourcebuild.texi (vect_avg_qi): Document new target selector.
	* internal-fn.def (IFN_AVG_FLOOR, IFN_AVG_CEIL): New internal
	functions.
	* optabs.def (savg_floor_optab, uavg_floor_optab, savg_ceil_optab)
	(savg_ceil_optab): New optabs.
	* tree-vect-patterns.c (vect_recog_average_pattern): New function.
	(vect_vect_recog_func_ptrs): Add it.
	* tree-vect-stmts.c (vectorizable_call): Get the type of the zero
	constant directly from the associated lhs.

gcc/testsuite/
	PR tree-optimization/85694
	* lib/target-supports.exp (check_effective_target_vect_avg_qi): New
	proc.
	* gcc.dg/vect/vect-avg-1.c: New test.
	* gcc.dg/vect/vect-avg-2.c: Likewise.
	* gcc.dg/vect/vect-avg-3.c: Likewise.
	* gcc.dg/vect/vect-avg-4.c: Likewise.
	* gcc.dg/vect/vect-avg-5.c: Likewise.
	* gcc.dg/vect/vect-avg-6.c: Likewise.
	* gcc.dg/vect/vect-avg-7.c: Likewise.
	* gcc.dg/vect/vect-avg-8.c: Likewise.
	* gcc.dg/vect/vect-avg-9.c: Likewise.
	* gcc.dg/vect/vect-avg-10.c: Likewise.
	* gcc.dg/vect/vect-avg-11.c: Likewise.
	* gcc.dg/vect/vect-avg-12.c: Likewise.
	* gcc.dg/vect/vect-avg-13.c: Likewise.
	* gcc.dg/vect/vect-avg-14.c: Likewise.

From-SVN: r262335
2018-07-03 10:03:44 +00:00
Richard Sandiford 6c4fd4a9fe Add IFN_COND_{MUL,DIV,MOD,RDIV}
This patch adds support for conditional multiplication and division.
It's mostly mechanical, but a few notes:

* The *_optab name and the .md names are the same as the unconditional
  forms, just with "cond_" added to the front.  This means we still
  have the awkward difference between sdiv and div, etc.

* It was easier to retain the difference between integer and FP
  division in the function names, given that they map to different
  tree codes (TRUNC_DIV_EXPR and RDIV_EXPR).

* SVE has no direct support for IFN_COND_MOD, but it seemed more
  consistent to add it anyway.

* Adding IFN_COND_MUL enables an extra fully-masked reduction
  in gcc.dg/vect/pr53773.c.

* In practice we don't actually use the integer division forms without
  if-conversion support (added by a later patch).
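
As an illustration (not from the patch), a loop that uses the new conditional
FP division once if-conversion support is in place:

   void
   f (double *x, double *y, int n)
   {
     for (int i = 0; i < n; ++i)
       if (y[i] != 0.0)
         x[i] = x[i] / y[i];   /* becomes a conditional RDIV under
                                  if-conversion  */
   }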

2018-05-25  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	* doc/sourcebuild.texi (vect_double_cond_arith): Include
	multiplication and division.
	* doc/md.texi (cond_mul@var{m}, cond_div@var{m}, cond_mod@var{m})
	(cond_udiv@var{m}, cond_umod@var{m}): Document.
	* optabs.def (cond_smul_optab, cond_sdiv_optab, cond_smod_optab)
	(cond_udiv_optab, cond_umod_optab): New optabs.
	* internal-fn.def (IFN_COND_MUL, IFN_COND_DIV, IFN_COND_MOD)
	(IFN_COND_RDIV): New internal functions.
	* internal-fn.c (get_conditional_internal_fn): Handle TRUNC_DIV_EXPR,
	TRUNC_MOD_EXPR and RDIV_EXPR.
	* match.pd (UNCOND_BINARY, COND_BINARY): Handle them.
	* config/aarch64/iterators.md (UNSPEC_COND_MUL, UNSPEC_COND_DIV):
	New unspecs.
	(SVE_INT_BINARY): Include mult.
	(SVE_COND_FP_BINARY): Include UNSPEC_MUL and UNSPEC_DIV.
	(optab, sve_int_op): Handle mult.
	(optab, sve_fp_op, commutative): Handle UNSPEC_COND_MUL and
	UNSPEC_COND_DIV.
	* config/aarch64/aarch64-sve.md (cond_<optab><mode>): New pattern
	for SVE_INT_BINARY_SD.

gcc/testsuite/
	* lib/target-supports.exp
	(check_effective_target_vect_double_cond_arith): Include
	multiplication and division.
	* gcc.dg/vect/pr53773.c: Do not expect a scalar tail when using
	fully-masked loops with a fixed vector length.
	* gcc.dg/vect/vect-cond-arith-1.c: Add multiplication and division
	tests.
	* gcc.target/aarch64/sve/vcond_8.c: Likewise.
	* gcc.target/aarch64/sve/vcond_9.c: Likewise.
	* gcc.target/aarch64/sve/vcond_12.c: Add multiplication tests.

From-SVN: r260713
2018-05-25 08:53:15 +00:00
Richard Sandiford 9d4ac06e02 Add an "else" argument to IFN_COND_* functions
As suggested by Richard B, this patch changes the IFN_COND_*
functions so that they take the else value of the ?: operation
as a final argument, rather than always using argument 1.

All current callers will still use the equivalent of argument 1,
so this patch makes the SVE code assert that for now.  Later patches
add the general case.

2018-05-25  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	* doc/md.texi: Update the documentation of the cond_* optabs
	to mention the new final operand.  Fix GET_MODE_NUNITS call.
	Describe the scalar case too.
	* internal-fn.def (IFN_EXTRACT_LAST): Change type to fold_left.
	* internal-fn.c (expand_cond_unary_optab_fn): Expect 3 operands
	instead of 2.
	(expand_cond_binary_optab_fn): Expect 4 operands instead of 3.
	(get_conditional_internal_fn): Update comment.
	* tree-vect-loop.c (vectorizable_reduction): Pass the original
	accumulator value as a final argument to conditional functions.
	* config/aarch64/aarch64-sve.md (cond_<optab><mode>): Turn into
	a define_expand and add an "else" operand.  Assert for now that
	the else operand is equal to operand 2.  Use SVE_INT_BINARY and
	SVE_COND_FP_BINARY instead of SVE_COND_INT_OP and SVE_COND_FP_OP.
	(*cond_<optab><mode>): New patterns.
	* config/aarch64/iterators.md (UNSPEC_COND_SMAX, UNSPEC_COND_UMAX)
	(UNSPEC_COND_SMIN, UNSPEC_COND_UMIN, UNSPEC_COND_AND, UNSPEC_COND_ORR)
	(UNSPEC_COND_EOR): Delete.
	(optab): Remove associated mappings.
	(SVE_INT_BINARY): New code iterator.
	(sve_int_op): Remove int attribute and add "minus" to the code
	attribute.
	(SVE_COND_INT_OP): Delete.
	(SVE_COND_FP_OP): Rename to...
	(SVE_COND_FP_BINARY): ...this.

From-SVN: r260707
2018-05-25 06:48:47 +00:00
Richard Sandiford c566cc9f78 Replace FMA_EXPR with one internal fn per optab
There are four optabs for various forms of fused multiply-add:
fma, fms, fnma and fnms.  Of these, only fma had a direct gimple
representation.  For the other three we relied on special pattern-
matching during expand, although tree-ssa-math-opts.c did have
some code to try to second-guess what expand would do.

This patch removes the old FMA_EXPR representation of fma and
introduces four new internal functions, one for each optab.
IFN_FMA is tied to BUILT_IN_FMA* while the other three are
independent directly-mapped internal functions.  It's then
possible to do the pattern-matching in match.pd and
tree-ssa-math-opts.c (via folding) can select the exact
FMA-based operation.

The BRIG & HSA parts are a best guess, but seem relatively simple.
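
As an illustration only, and assuming the natural reading of the four optab
names (not spelled out in this message), the fused forms correspond to:

   /* Assumed correspondence (illustrative):
        .FMA  (a, b, c) =  a * b + c
        .FMS  (a, b, c) =  a * b - c
        .FNMA (a, b, c) = -a * b + c
        .FNMS (a, b, c) = -a * b - c  */
   double
   f (double a, double b, double c)
   {
     return a * b - c;   /* with contraction enabled, folds to the FMS form  */
   }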

2018-05-18  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	* doc/sourcebuild.texi (scalar_all_fma): Document.
	* tree.def (FMA_EXPR): Delete.
	* internal-fn.def (FMA, FMS, FNMA, FNMS): New internal functions.
	* internal-fn.c (ternary_direct): New macro.
	(expand_ternary_optab_fn): Likewise.
	(direct_ternary_optab_supported_p): Likewise.
	* Makefile.in (build/genmatch.o): Depend on case-fn-macros.h.
	* builtins.c (fold_builtin_fma): Delete.
	(fold_builtin_3): Don't call it.
	* cfgexpand.c (expand_debug_expr): Remove FMA_EXPR handling.
	* expr.c (expand_expr_real_2): Likewise.
	* fold-const.c (operand_equal_p): Likewise.
	(fold_ternary_loc): Likewise.
	* gimple-pretty-print.c (dump_ternary_rhs): Likewise.
	* gimple.c (DEFTREECODE): Likewise.
	* gimplify.c (gimplify_expr): Likewise.
	* optabs-tree.c (optab_for_tree_code): Likewise.
	* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
	* tree-eh.c (operation_could_trap_p): Likewise.
	(stmt_could_throw_1_p): Likewise.
	* tree-inline.c (estimate_operator_cost): Likewise.
	* tree-pretty-print.c (dump_generic_node): Likewise.
	(op_code_prio): Likewise.
	* tree-ssa-loop-im.c (stmt_cost): Likewise.
	* tree-ssa-operands.c (get_expr_operands): Likewise.
	* tree.c (commutative_ternary_tree_code, add_expr): Likewise.
	* fold-const-call.h (fold_fma): Delete.
	* fold-const-call.c (fold_const_call_ssss): Handle CFN_FMS,
	CFN_FNMA and CFN_FNMS.
	(fold_fma): Delete.
	* genmatch.c (combined_fn): New enum.
	(commutative_ternary_tree_code): Remove FMA_EXPR handling.
	(commutative_op): New function.
	(commutate): Use it.  Handle more than 2 operands.
	(dt_operand::gen_gimple_expr): Use commutative_op.
	(parser::parse_expr): Allow :c to be used with non-binary
	operators if the commutative operand is known.
	* gimple-ssa-backprop.c (backprop::process_builtin_call_use): Handle
	CFN_FMS, CFN_FNMA and CFN_FNMS.
	(backprop::process_assign_use): Remove FMA_EXPR handling.
	* hsa-gen.c (gen_hsa_insns_for_operation_assignment): Likewise.
	(gen_hsa_fma): New function.
	(gen_hsa_insn_for_internal_fn_call): Use it for IFN_FMA, IFN_FMS,
	IFN_FNMA and IFN_FNMS.
	* match.pd: Add folds for IFN_FMS, IFN_FNMA and IFN_FNMS.
	* gimple-fold.h (follow_all_ssa_edges): Declare.
	* gimple-fold.c (follow_all_ssa_edges): New function.
	* tree-ssa-math-opts.c (convert_mult_to_fma_1): Use the
	gimple_build interface and use follow_all_ssa_edges to fold the result.
	(convert_mult_to_fma): Use direct_internal_fn_supported_p
	instead of checking for optabs directly.
	* config/i386/i386.c (ix86_add_stmt_cost): Recognize FMAs as calls
	rather than FMA_EXPRs.
	* config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Create a
	call to IFN_FMA instead of an FMA_EXPR.

gcc/brig/
	* brigfrontend/brig-function.cc
	(brig_function::get_builtin_for_hsa_opcode): Use BUILT_IN_FMA
	for BRIG_OPCODE_FMA.
	(brig_function::get_tree_code_for_hsa_opcode): Treat BUILT_IN_FMA
	as a call.

gcc/c/
	* gimple-parser.c (c_parser_gimple_postfix_expression): Remove
	__FMA_EXPR handling.

gcc/cp/
	* constexpr.c (cxx_eval_constant_expression): Remove FMA_EXPR handling.
	(potential_constant_expression_1): Likewise.

gcc/testsuite/
	* lib/target-supports.exp (check_effective_target_scalar_all_fma):
	New proc.
	* gcc.dg/fma-1.c: New test.
	* gcc.dg/fma-2.c: Likewise.
	* gcc.dg/fma-3.c: Likewise.
	* gcc.dg/fma-4.c: Likewise.
	* gcc.dg/fma-5.c: Likewise.
	* gcc.dg/fma-6.c: Likewise.
	* gcc.dg/fma-7.c: Likewise.
	* gcc.dg/gimplefe-26.c: Use .FMA instead of __FMA and require
	scalar_all_fma.
	* gfortran.dg/reassoc_7.f: Pass -ffp-contract=off.
	* gfortran.dg/reassoc_8.f: Likewise.
	* gfortran.dg/reassoc_9.f: Likewise.
	* gfortran.dg/reassoc_10.f: Likewise.

From-SVN: r260348
2018-05-18 08:27:58 +00:00
Martin Liska d80956bb05 Set proper internal functions fnspec (PR sanitizer/84307).
2018-02-16  Martin Liska  <mliska@suse.cz>

	PR sanitizer/84307
	* internal-fn.def (ASAN_CHECK): Set proper flags.
	(ASAN_MARK): Likewise.

From-SVN: r257729
2018-02-16 10:03:47 +00:00
Paolo Bonzini 1bbae6518f re PR sanitizer/84340 (g++.dg/asan/use-after-scope-types-1.C (and others) fails after r257585)
gcc:
2018-02-13  Paolo Bonzini <bonzini@gnu.org>

	PR sanitizer/84340
	* internal-fn.def (ASAN_CHECK, ASAN_MARK): Revert changes to fnspec.

gcc/testsuite:
2018-02-13  Paolo Bonzini  <bonzini@gnu.org>

	PR sanitizer/84307
	* gcc.dg/asan/pr84307.c: Remove test.

From-SVN: r257625
2018-02-13 13:03:22 +00:00
Paolo Bonzini 74a5138a61 re PR sanitizer/84307 (asan blocks dead-store elimination)
gcc:
2018-02-12  Paolo Bonzini <bonzini@gnu.org>

	PR sanitizer/84307
	* internal-fn.def (ASAN_CHECK): Fix fnspec to account for return value.
	(ASAN_MARK): Fix fnspec to account for return value, change pointer
	argument from 'R' to 'W' so that the pointed-to datum is clobbered.

gcc/testsuite:
2018-02-12  Paolo Bonzini  <bonzini@gnu.org>

	PR sanitizer/84307
	* gcc.dg/asan/pr84307.c: New test.

From-SVN: r257585
2018-02-12 12:47:56 +00:00
Richard Sandiford f307441ac4 Add support for SVE scatter stores
This is mostly a mechanical extension of the previous gather load
support to scatter stores.  The internal functions in this case are:

  IFN_SCATTER_STORE (base, offsets, scale, values)
  IFN_MASK_SCATTER_STORE (base, offsets, scale, values, mask)

However, one nonobvious change is to vect_analyze_data_ref_access.
If we're treating an access as a gather load or scatter store
(i.e. if STMT_VINFO_GATHER_SCATTER_P is true), the existing code
would create a dummy data_reference whose step is 0.  There's not
really much else it could do, since the whole point is that the
step isn't predictable from iteration to iteration.  We then
went into this code in vect_analyze_data_ref_access:

  /* Allow loads with zero step in inner-loop vectorization.  */
  if (loop_vinfo && integer_zerop (step))
    {
      GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt)) = NULL;
      if (!nested_in_vect_loop_p (loop, stmt))
	return DR_IS_READ (dr);

I.e. we'd take the step literally and assume that this is a load
or store to an invariant address.  Loads from invariant addresses
are supported but stores to them aren't.

The code therefore had the effect of disabling all scatter stores.
AFAICT this is true of AVX too: although tests like avx512f-scatter-1.c
test for the correctness of a scatter-like loop, they don't seem to
check whether a scatter instruction is actually used.

The patch therefore makes vect_analyze_data_ref_access return true
for scatters.  We do seem to handle the aliasing correctly;
that's tested by other functions, and is symmetrical to the
already-working gather case.
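
An illustrative scatter-store loop (not from the patch):

   void
   f (double *dest, const int *index, const double *src, int n)
   {
     for (int i = 0; i < n; ++i)
       dest[index[i]] = src[i];   /* values stored at base + index[i] * scale  */
   }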

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* doc/sourcebuild.texi (vect_scatter_store): Document.
	* optabs.def (scatter_store_optab, mask_scatter_store_optab): New
	optabs.
	* doc/md.texi (scatter_store@var{m}, mask_scatter_store@var{m}):
	Document.
	* genopinit.c (main): Add supports_vec_scatter_store and
	supports_vec_scatter_store_cached to target_optabs.
	* gimple.h (gimple_expr_type): Handle IFN_SCATTER_STORE and
	IFN_MASK_SCATTER_STORE.
	* internal-fn.def (SCATTER_STORE, MASK_SCATTER_STORE): New internal
	functions.
	* internal-fn.h (internal_store_fn_p): Declare.
	(internal_fn_stored_value_index): Likewise.
	* internal-fn.c (scatter_store_direct): New macro.
	(expand_scatter_store_optab_fn): New function.
	(direct_scatter_store_optab_supported_p): New macro.
	(internal_store_fn_p): New function.
	(internal_gather_scatter_fn_p): Handle IFN_SCATTER_STORE and
	IFN_MASK_SCATTER_STORE.
	(internal_fn_mask_index): Likewise.
	(internal_fn_stored_value_index): New function.
	(internal_gather_scatter_fn_supported_p): Adjust operand numbers
	for scatter stores.
	* optabs-query.h (supports_vec_scatter_store_p): Declare.
	* optabs-query.c (supports_vec_scatter_store_p): New function.
	* tree-vectorizer.h (vect_get_store_rhs): Declare.
	* tree-vect-data-refs.c (vect_analyze_data_ref_access): Return
	true for scatter stores.
	(vect_gather_scatter_fn_p): Handle scatter stores too.
	(vect_check_gather_scatter): Consider using scatter stores if
	supports_vec_scatter_store_p.
	* tree-vect-patterns.c (vect_try_gather_scatter_pattern): Handle
	scatter stores too.
	* tree-vect-stmts.c (exist_non_indexing_operands_for_use_p): Use
	internal_fn_stored_value_index.
	(check_load_store_masking): Handle scatter stores too.
	(vect_get_store_rhs): Make public.
	(vectorizable_call): Use internal_store_fn_p.
	(vectorizable_store): Handle scatter store internal functions.
	(vect_transform_stmt): Compare GROUP_STORE_COUNT with GROUP_SIZE
	when deciding whether the end of the group has been reached.
	* config/aarch64/aarch64.md (UNSPEC_ST1_SCATTER): New unspec.
	* config/aarch64/aarch64-sve.md (scatter_store<mode>): New expander.
	(mask_scatter_store<mode>): New insns.

gcc/testsuite/
	* lib/target-supports.exp (check_effective_target_vect_scatter_store):
	New proc.
	* gcc.dg/vect/pr25413a.c: Expect both loops to be optimized on
	targets with scatter stores.
	* gcc.dg/vect/vect-71.c: Restrict XFAIL to targets without scatter
	stores.
	* gcc.target/aarch64/sve/mask_scatter_store_1.c: New test.
	* gcc.target/aarch64/sve/mask_scatter_store_2.c: Likewise.
	* gcc.target/aarch64/sve/scatter_store_1.c: Likewise.
	* gcc.target/aarch64/sve/scatter_store_2.c: Likewise.
	* gcc.target/aarch64/sve/scatter_store_3.c: Likewise.
	* gcc.target/aarch64/sve/scatter_store_4.c: Likewise.
	* gcc.target/aarch64/sve/scatter_store_5.c: Likewise.
	* gcc.target/aarch64/sve/scatter_store_6.c: Likewise.
	* gcc.target/aarch64/sve/scatter_store_7.c: Likewise.
	* gcc.target/aarch64/sve/strided_store_1.c: Likewise.
	* gcc.target/aarch64/sve/strided_store_2.c: Likewise.
	* gcc.target/aarch64/sve/strided_store_3.c: Likewise.
	* gcc.target/aarch64/sve/strided_store_4.c: Likewise.
	* gcc.target/aarch64/sve/strided_store_5.c: Likewise.
	* gcc.target/aarch64/sve/strided_store_6.c: Likewise.
	* gcc.target/aarch64/sve/strided_store_7.c: Likewise.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>

From-SVN: r256643
2018-01-13 18:01:59 +00:00
Richard Sandiford bfaa08b7ba Add support for SVE gather loads
This patch adds support for SVE gather loads.  It uses the basically
the same analysis code as the AVX gather support, but after that
there are two major differences:

- It uses new internal functions rather than target built-ins.
  The interface is:

     IFN_GATHER_LOAD (base, offsets, scale)
     IFN_MASK_GATHER_LOAD (base, offsets, scale, mask)

  which should be reasonably generic.  One of the advantages of
  using internal functions is that other passes can understand what
  the functions do, but a more immediate advantage is that we can
  query the underlying target pattern to see which scales it supports.

- It uses pattern recognition to convert the offset to the right width,
  if it was originally narrower than that.  This avoids having to do
  a widening operation as part of the gather expansion itself.
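
An illustrative gather-load loop (not from the patch):

   void
   f (double *__restrict__ dest, const double *data, const int *index, int n)
   {
     for (int i = 0; i < n; ++i)
       dest[i] = data[index[i]];   /* loads from base + index[i] * scale  */
   }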

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* doc/md.texi (gather_load@var{m}): Document.
	(mask_gather_load@var{m}): Likewise.
	* genopinit.c (main): Add supports_vec_gather_load and
	supports_vec_gather_load_cached to target_optabs.
	* optabs-tree.c (init_tree_optimization_optabs): Use
	ggc_cleared_alloc to allocate target_optabs.
	* optabs.def (gather_load_optab, mask_gather_load_optab): New optabs.
	* internal-fn.def (GATHER_LOAD, MASK_GATHER_LOAD): New internal
	functions.
	* internal-fn.h (internal_load_fn_p): Declare.
	(internal_gather_scatter_fn_p): Likewise.
	(internal_fn_mask_index): Likewise.
	(internal_gather_scatter_fn_supported_p): Likewise.
	* internal-fn.c (gather_load_direct): New macro.
	(expand_gather_load_optab_fn): New function.
	(direct_gather_load_optab_supported_p): New macro.
	(direct_internal_fn_optab): New function.
	(internal_load_fn_p): Likewise.
	(internal_gather_scatter_fn_p): Likewise.
	(internal_fn_mask_index): Likewise.
	(internal_gather_scatter_fn_supported_p): Likewise.
	* optabs-query.c (supports_at_least_one_mode_p): New function.
	(supports_vec_gather_load_p): Likewise.
	* optabs-query.h (supports_vec_gather_load_p): Declare.
	* tree-vectorizer.h (gather_scatter_info): Add ifn, element_type
	and memory_type field.
	(NUM_PATTERNS): Bump to 15.
	* tree-vect-data-refs.c: Include internal-fn.h.
	(vect_gather_scatter_fn_p): New function.
	(vect_describe_gather_scatter_call): Likewise.
	(vect_check_gather_scatter): Try using internal functions for
	gather loads.  Recognize existing calls to a gather load function.
	(vect_analyze_data_refs): Consider using gather loads if
	supports_vec_gather_load_p.
	* tree-vect-patterns.c (vect_get_load_store_mask): New function.
	(vect_get_gather_scatter_offset_type): Likewise.
	(vect_convert_mask_for_vectype): Likewise.
	(vect_add_conversion_to_pattern): Likewise.
	(vect_try_gather_scatter_pattern): Likewise.
	(vect_recog_gather_scatter_pattern): New pattern recognizer.
	(vect_vect_recog_func_ptrs): Add it.
	* tree-vect-stmts.c (exist_non_indexing_operands_for_use_p): Use
	internal_fn_mask_index and internal_gather_scatter_fn_p.
	(check_load_store_masking): Take the gather_scatter_info as an
	argument and handle gather loads.
	(vect_get_gather_scatter_ops): New function.
	(vectorizable_call): Check internal_load_fn_p.
	(vectorizable_load): Likewise.  Handle gather load internal
	functions.
	(vectorizable_store): Update call to check_load_store_masking.
	* config/aarch64/aarch64.md (UNSPEC_LD1_GATHER): New unspec.
	* config/aarch64/iterators.md (SVE_S, SVE_D): New mode iterators.
	* config/aarch64/predicates.md (aarch64_gather_scale_operand_w)
	(aarch64_gather_scale_operand_d): New predicates.
	* config/aarch64/aarch64-sve.md (gather_load<mode>): New expander.
	(mask_gather_load<mode>): New insns.

gcc/testsuite/
	* gcc.target/aarch64/sve/gather_load_1.c: New test.
	* gcc.target/aarch64/sve/gather_load_2.c: Likewise.
	* gcc.target/aarch64/sve/gather_load_3.c: Likewise.
	* gcc.target/aarch64/sve/gather_load_4.c: Likewise.
	* gcc.target/aarch64/sve/gather_load_5.c: Likewise.
	* gcc.target/aarch64/sve/gather_load_6.c: Likewise.
	* gcc.target/aarch64/sve/gather_load_7.c: Likewise.
	* gcc.target/aarch64/sve/mask_gather_load_1.c: Likewise.
	* gcc.target/aarch64/sve/mask_gather_load_2.c: Likewise.
	* gcc.target/aarch64/sve/mask_gather_load_3.c: Likewise.
	* gcc.target/aarch64/sve/mask_gather_load_4.c: Likewise.
	* gcc.target/aarch64/sve/mask_gather_load_5.c: Likewise.
	* gcc.target/aarch64/sve/mask_gather_load_6.c: Likewise.
	* gcc.target/aarch64/sve/mask_gather_load_7.c: Likewise.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>

From-SVN: r256640
2018-01-13 18:01:34 +00:00
Richard Sandiford b781a135a0 Add support for in-order addition reduction using SVE FADDA
This patch adds support for in-order floating-point addition reductions,
which are suitable even in strict IEEE mode.

Previously vect_is_simple_reduction would reject any cases that forbid
reassociation.  The idea is instead to tentatively accept them as
"FOLD_LEFT_REDUCTIONs" and only fail later if there is no support
for them.  Although this patch only handles the particular case of plus
and minus on floating-point types, there's no reason in principle why
we couldn't handle other cases.

The reductions use a new fold_left_plus_optab if available, otherwise
they fall back to elementwise additions or subtractions.

The vect_force_simple_reduction change makes it easier for parloops
to read the type of reduction.
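
An illustrative in-order reduction (not from the patch); without -ffast-math
the additions may not be reassociated, so the loop needs the fold-left form
rather than a tree reduction:

   double
   f (const double *x, int n)
   {
     double s = 0.0;
     for (int i = 0; i < n; ++i)
       s += x[i];   /* strict IEEE: must accumulate in original order  */
     return s;
   }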

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* optabs.def (fold_left_plus_optab): New optab.
	* doc/md.texi (fold_left_plus_@var{m}): Document.
	* internal-fn.def (IFN_FOLD_LEFT_PLUS): New internal function.
	* internal-fn.c (fold_left_direct): Define.
	(expand_fold_left_optab_fn): Likewise.
	(direct_fold_left_optab_supported_p): Likewise.
	* fold-const-call.c (fold_const_fold_left): New function.
	(fold_const_call): Use it to fold CFN_FOLD_LEFT_PLUS.
	* tree-parloops.c (valid_reduction_p): New function.
	(gather_scalar_reductions): Use it.
	* tree-vectorizer.h (FOLD_LEFT_REDUCTION): New vect_reduction_type.
	(vect_finish_replace_stmt): Declare.
	* tree-vect-loop.c (fold_left_reduction_fn): New function.
	(needs_fold_left_reduction_p): New function, split out from...
	(vect_is_simple_reduction): ...here.  Accept reductions that
	forbid reassociation, but give them type FOLD_LEFT_REDUCTION.
	(vect_force_simple_reduction): Also store the reduction type in
	the assignment's STMT_VINFO_REDUC_TYPE.
	(vect_model_reduction_cost): Handle FOLD_LEFT_REDUCTION.
	(merge_with_identity): New function.
	(vect_expand_fold_left): Likewise.
	(vectorize_fold_left_reduction): Likewise.
	(vectorizable_reduction): Handle FOLD_LEFT_REDUCTION.  Leave the
	scalar phi in place for it.  Check for target support and reject
	cases that would reassociate the operation.  Defer the transform
	phase to vectorize_fold_left_reduction.
	* config/aarch64/aarch64.md (UNSPEC_FADDA): New unspec.
	* config/aarch64/aarch64-sve.md (fold_left_plus_<mode>): New expander.
	(*fold_left_plus_<mode>, *pred_fold_left_plus_<mode>): New insns.

gcc/testsuite/
	* gcc.dg/vect/no-fast-math-vect16.c: Expect the test to pass and
	check for a message about using in-order reductions.
	* gcc.dg/vect/pr79920.c: Expect both loops to be vectorized and
	check for a message about using in-order reductions.
	* gcc.dg/vect/trapv-vect-reduc-4.c: Expect all three loops to be
	vectorized and check for a message about using in-order reductions.
	Expect targets with variable-length vectors to fall back to the
	fixed-length minimum.
	* gcc.dg/vect/vect-reduc-6.c: Expect the loop to be vectorized and
	check for a message about using in-order reductions.
	* gcc.dg/vect/vect-reduc-in-order-1.c: New test.
	* gcc.dg/vect/vect-reduc-in-order-2.c: Likewise.
	* gcc.dg/vect/vect-reduc-in-order-3.c: Likewise.
	* gcc.dg/vect/vect-reduc-in-order-4.c: Likewise.
	* gcc.target/aarch64/sve/reduc_strict_1.c: New test.
	* gcc.target/aarch64/sve/reduc_strict_1_run.c: Likewise.
	* gcc.target/aarch64/sve/reduc_strict_2.c: Likewise.
	* gcc.target/aarch64/sve/reduc_strict_2_run.c: Likewise.
	* gcc.target/aarch64/sve/reduc_strict_3.c: Likewise.
	* gcc.target/aarch64/sve/slp_13.c: Add floating-point types.
	* gfortran.dg/vect/vect-8.f90: Expect 22 loops to be vectorized if
	vect_fold_left_plus.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>

From-SVN: r256639
2018-01-13 18:01:24 +00:00
Richard Sandiford bb6c2b68d6 Add support for conditional reductions using SVE CLASTB
This patch uses SVE CLASTB to optimise conditional reductions.  It means
that we no longer need to maintain a separate index vector to record
the most recent valid value, and no longer need to worry about overflow
cases.
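
A minimal sketch of such a conditional reduction (a hypothetical example,
not one of the test files listed below; the function name is made up):
the loop keeps the most recent a[i] whose b[i] satisfies the condition,
which previously required tracking a separate index vector and can now
map to CLASTB.

 int
 last_match (const int *a, const int *b, int n)
 {
   int last = -1;
   for (int i = 0; i < n; ++i)
     if (b[i] > 0)
       /* Record the most recent value that satisfied the condition.  */
       last = a[i];
   return last;
 }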

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* doc/md.texi (fold_extract_last_@var{m}): Document.
	* doc/sourcebuild.texi (vect_fold_extract_last): Likewise.
	* optabs.def (fold_extract_last_optab): New optab.
	* internal-fn.def (FOLD_EXTRACT_LAST): New internal function.
	* internal-fn.c (fold_extract_direct): New macro.
	(expand_fold_extract_optab_fn): Likewise.
	(direct_fold_extract_optab_supported_p): Likewise.
	* tree-vectorizer.h (EXTRACT_LAST_REDUCTION): New vect_reduction_type.
	* tree-vect-loop.c (vect_model_reduction_cost): Handle
	EXTRACT_LAST_REDUCTION.
	(get_initial_def_for_reduction): Do not create an initial vector
	for EXTRACT_LAST_REDUCTION reductions.
	(vectorizable_reduction): Leave the scalar phi in place for
	EXTRACT_LAST_REDUCTIONs.  Try using EXTRACT_LAST_REDUCTION
	ahead of INTEGER_INDUC_COND_REDUCTION.  Do not check for
	epilogue code for EXTRACT_LAST_REDUCTION and defer the
	transform phase to vectorizable_condition.
	* tree-vect-stmts.c (vect_finish_stmt_generation_1): New function,
	split out from...
	(vect_finish_stmt_generation): ...here.
	(vect_finish_replace_stmt): New function.
	(vectorizable_condition): Handle EXTRACT_LAST_REDUCTION.
	* config/aarch64/aarch64-sve.md (fold_extract_last_<mode>): New
	pattern.
	* config/aarch64/aarch64.md (UNSPEC_CLASTB): New unspec.

gcc/testsuite/
	* lib/target-supports.exp
	(check_effective_target_vect_fold_extract_last): New proc.
	* gcc.dg/vect/pr65947-1.c: Update dump messages.  Add markup
	for fold_extract_last.
	* gcc.dg/vect/pr65947-2.c: Likewise.
	* gcc.dg/vect/pr65947-3.c: Likewise.
	* gcc.dg/vect/pr65947-4.c: Likewise.
	* gcc.dg/vect/pr65947-5.c: Likewise.
	* gcc.dg/vect/pr65947-6.c: Likewise.
	* gcc.dg/vect/pr65947-9.c: Likewise.
	* gcc.dg/vect/pr65947-10.c: Likewise.
	* gcc.dg/vect/pr65947-12.c: Likewise.
	* gcc.dg/vect/pr65947-14.c: Likewise.
	* gcc.dg/vect/pr80631-1.c: Likewise.
	* gcc.target/aarch64/sve/clastb_1.c: New test.
	* gcc.target/aarch64/sve/clastb_1_run.c: Likewise.
	* gcc.target/aarch64/sve/clastb_2.c: Likewise.
	* gcc.target/aarch64/sve/clastb_2_run.c: Likewise.
	* gcc.target/aarch64/sve/clastb_3.c: Likewise.
	* gcc.target/aarch64/sve/clastb_3_run.c: Likewise.
	* gcc.target/aarch64/sve/clastb_4.c: Likewise.
	* gcc.target/aarch64/sve/clastb_4_run.c: Likewise.
	* gcc.target/aarch64/sve/clastb_5.c: Likewise.
	* gcc.target/aarch64/sve/clastb_5_run.c: Likewise.
	* gcc.target/aarch64/sve/clastb_6.c: Likewise.
	* gcc.target/aarch64/sve/clastb_6_run.c: Likewise.
	* gcc.target/aarch64/sve/clastb_7.c: Likewise.
	* gcc.target/aarch64/sve/clastb_7_run.c: Likewise.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>

From-SVN: r256633
2018-01-13 17:59:59 +00:00
Richard Sandiford bfe1bb57ba Add support for vectorising live-out values using SVE LASTB
This patch uses the SVE LASTB instruction to optimise cases in which
a value produced by the final scalar iteration of a vectorised loop is
live outside the loop.  Previously this situation would stop us from
using a fully-masked loop.
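
A minimal sketch of such a live-out value (a hypothetical example, not
the live_1.c test itself; the function name is made up): the final value
assigned to "last" inside the loop is used after it, which previously
prevented a fully-masked loop; EXTRACT_LAST (SVE LASTB) extracts it from
the last active lane instead.

 int
 live_out (int *a, int n)
 {
   int last = 0;
   for (int i = 0; i < n; ++i)
     {
       last = a[i] + 1;   /* value defined in the loop ...  */
       a[i] = last;
     }
   return last;           /* ... and used after it.  */
 }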

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* doc/md.texi (extract_last_@var{m}): Document.
	* optabs.def (extract_last_optab): New optab.
	* internal-fn.def (EXTRACT_LAST): New internal function.
	* internal-fn.c (cond_unary_direct): New macro.
	(expand_cond_unary_optab_fn): Likewise.
	(direct_cond_unary_optab_supported_p): Likewise.
	* tree-vect-loop.c (vectorizable_live_operation): Allow fully-masked
	loops using EXTRACT_LAST.
	* config/aarch64/aarch64-sve.md (aarch64_sve_lastb<mode>): Rename to...
	(extract_last_<mode>): ...this optab.
	(vec_extract<mode><Vel>): Update accordingly.

gcc/testsuite/
	* gcc.target/aarch64/sve/live_1.c: New test.
	* gcc.target/aarch64/sve/live_1_run.c: Likewise.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>

From-SVN: r256632
2018-01-13 17:59:50 +00:00