Commit Graph

159014 Commits

H.J. Lu
95d11c1707 x86: Disallow -mindirect-branch=/-mfunction-return= with -mcmodel=large
Since the thunk function may not be reachable in the large code model,
-mcmodel=large is incompatible with -mindirect-branch=thunk,
-mindirect-branch=thunk-extern, -mfunction-return=thunk and
-mfunction-return=thunk-extern.  Issue an error when they are used with
-mcmodel=large.
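
As an illustration, a minimal translation unit that the new check rejects
(the exact diagnostic wording below is an assumption, not quoted from the
compiler; the new indirect-thunk-8/9/10 tests carry the real messages):

/* gcc -O2 -mcmodel=large -mindirect-branch=thunk -c indirect.c
   now fails with an error instead of emitting an unreachable thunk call.  */
extern void (*callback) (void);

void
dispatch (void)
{
  callback ();	/* indirect call that would need a thunk */
}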

gcc/

	* config/i386/i386.c (ix86_set_indirect_branch_type): Disallow
	-mcmodel=large with -mindirect-branch=thunk,
	-mindirect-branch=thunk-extern, -mfunction-return=thunk and
	-mfunction-return=thunk-extern.
	* doc/invoke.texi: Document -mcmodel=large is incompatible with
	-mindirect-branch=thunk, -mindirect-branch=thunk-extern,
	-mfunction-return=thunk and -mfunction-return=thunk-extern.

gcc/testsuite/

	* gcc.target/i386/indirect-thunk-10.c: New test.
	* gcc.target/i386/indirect-thunk-8.c: Likewise.
	* gcc.target/i386/indirect-thunk-9.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-10.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-11.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-9.c: Likewise.
	* gcc.target/i386/ret-thunk-17.c: Likewise.
	* gcc.target/i386/ret-thunk-18.c: Likewise.
	* gcc.target/i386/ret-thunk-19.c: Likewise.
	* gcc.target/i386/ret-thunk-20.c: Likewise.
	* gcc.target/i386/ret-thunk-21.c: Likewise.

From-SVN: r256664
2018-01-14 06:43:10 -08:00
H.J. Lu
6abe11c1a3 x86: Add 'V' register operand modifier
Add 'V', a special modifier which prints the name of the full integer
register without '%'.  For

extern void (*func_p) (void);

void
foo (void)
{
  asm ("call __x86_indirect_thunk_%V0" : : "a" (func_p));
}

it generates:

foo:
	movq	func_p(%rip), %rax
	call	__x86_indirect_thunk_rax
	ret

gcc/

	* config/i386/i386.c (print_reg): Print the name of the full
	integer register without '%'.
	(ix86_print_operand): Handle 'V'.
	* doc/extend.texi: Document 'V' modifier.

gcc/testsuite/

	* gcc.target/i386/indirect-thunk-register-4.c: New test.

From-SVN: r256663
2018-01-14 06:41:25 -08:00
H.J. Lu
d543c04b79 x86: Add -mindirect-branch-register
Add -mindirect-branch-register to force indirect branches via a register.
This is implemented by disabling the patterns for indirect branches via
memory, similar to TARGET_X32.

-mindirect-branch= and -mfunction-return= tests are updated with
-mno-indirect-branch-register to avoid false test failures when
-mindirect-branch-register is added to RUNTESTFLAGS for "make check".
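
As a hedged sketch (function and variable names are illustrative), the
option changes how an indirect call through memory is emitted:

extern void (*func_p) (void);

void
foo (void)
{
  func_p ();	/* indirect call through a memory operand */
}

With -mindirect-branch-register the call target is first loaded into a
register and the branch goes via that register, rather than using a
"call *func_p(%rip)" memory form.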

gcc/

	* config/i386/constraints.md (Bs): Disallow memory operand for
	-mindirect-branch-register.
	(Bw): Likewise.
	* config/i386/predicates.md (indirect_branch_operand): Likewise.
	(GOT_memory_operand): Likewise.
	(call_insn_operand): Likewise.
	(sibcall_insn_operand): Likewise.
	(GOT32_symbol_operand): Likewise.
	* config/i386/i386.md (indirect_jump): Call convert_memory_address
	for -mindirect-branch-register.
	(tablejump): Likewise.
	(*sibcall_memory): Likewise.
	(*sibcall_value_memory): Likewise.
	Disallow peepholes of indirect call and jump via memory for
	-mindirect-branch-register.
	(*call_pop): Replace m with Bw.
	(*call_value_pop): Likewise.
	(*sibcall_pop_memory): Replace m with Bs.
	* config/i386/i386.opt (mindirect-branch-register): New option.
	* doc/invoke.texi: Document -mindirect-branch-register option.

gcc/testsuite/

	* gcc.target/i386/indirect-thunk-1.c (dg-options): Add
	-mno-indirect-branch-register.
	* gcc.target/i386/indirect-thunk-2.c: Likewise.
	* gcc.target/i386/indirect-thunk-3.c: Likewise.
	* gcc.target/i386/indirect-thunk-4.c: Likewise.
	* gcc.target/i386/indirect-thunk-5.c: Likewise.
	* gcc.target/i386/indirect-thunk-6.c: Likewise.
	* gcc.target/i386/indirect-thunk-7.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-1.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-2.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-3.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-4.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-5.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-6.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-7.c: Likewise.
	* gcc.target/i386/indirect-thunk-bnd-1.c: Likewise.
	* gcc.target/i386/indirect-thunk-bnd-2.c: Likewise.
	* gcc.target/i386/indirect-thunk-bnd-3.c: Likewise.
	* gcc.target/i386/indirect-thunk-bnd-4.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-1.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-2.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-3.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-4.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-5.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-6.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-7.c: Likewise.
	* gcc.target/i386/indirect-thunk-inline-1.c: Likewise.
	* gcc.target/i386/indirect-thunk-inline-2.c: Likewise.
	* gcc.target/i386/indirect-thunk-inline-3.c: Likewise.
	* gcc.target/i386/indirect-thunk-inline-4.c: Likewise.
	* gcc.target/i386/indirect-thunk-inline-5.c: Likewise.
	* gcc.target/i386/indirect-thunk-inline-6.c: Likewise.
	* gcc.target/i386/indirect-thunk-inline-7.c: Likewise.
	* gcc.target/i386/ret-thunk-10.c: Likewise.
	* gcc.target/i386/ret-thunk-11.c: Likewise.
	* gcc.target/i386/ret-thunk-12.c: Likewise.
	* gcc.target/i386/ret-thunk-13.c: Likewise.
	* gcc.target/i386/ret-thunk-14.c: Likewise.
	* gcc.target/i386/ret-thunk-15.c: Likewise.
	* gcc.target/i386/ret-thunk-9.c: Likewise.
	* gcc.target/i386/indirect-thunk-register-1.c: New test.
	* gcc.target/i386/indirect-thunk-register-2.c: Likewise.
	* gcc.target/i386/indirect-thunk-register-3.c: Likewise.

From-SVN: r256662
2018-01-14 06:40:01 -08:00
H.J. Lu
45e1401938 x86: Add -mfunction-return=
Add -mfunction-return= option to convert function return to call and
return thunks.  The default is 'keep', which keeps function return
unmodified.  'thunk' converts function return to call and return thunk.
'thunk-inline' converts function return to inlined call and return thunk.
'thunk-extern' converts function return to external call and return
thunk provided in a separate object file.  You can control this behavior
for a specific function by using the function attribute function_return.

The function return thunk is the same as the memory thunk for
-mindirect-branch=, where the return address is at the top of the stack:

__x86_return_thunk:
	call L2
L1:
	pause
	lfence
	jmp L1
L2:
	lea 8(%rsp), %rsp|lea 4(%esp), %esp
	ret

and function return becomes

	jmp __x86_return_thunk

-mindirect-branch= tests are updated with -mfunction-return=keep to
avoid false test failures when -mfunction-return=thunk is added to
RUNTESTFLAGS for "make check".
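
A minimal sketch of the per-function control via the new attribute (the
function name is illustrative):

/* Everything else keeps the default return; only this function's return
   is converted to a jump to the return thunk.  */
__attribute__ ((function_return ("thunk")))
void
sensitive (void)
{
}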

gcc/

	* config/i386/i386-protos.h (ix86_output_function_return): New.
	* config/i386/i386.c (ix86_set_indirect_branch_type): Also
	set function_return_type.
	(indirect_thunk_name): Add ret_p to indicate thunk for function
	return.
	(output_indirect_thunk_function): Pass false to
	indirect_thunk_name.
	(ix86_output_indirect_branch): Likewise.
	(output_indirect_thunk_function): Create alias for function
	return thunk if regno < 0.
	(ix86_output_function_return): New function.
	(ix86_handle_fndecl_attribute): Handle function_return.
	(ix86_attribute_table): Add function_return.
	* config/i386/i386.h (machine_function): Add
	function_return_type.
	* config/i386/i386.md (simple_return_internal): Use
	ix86_output_function_return.
	(simple_return_internal_long): Likewise.
	* config/i386/i386.opt (mfunction-return=): New option.
	(indirect_branch): Mention -mfunction-return=.
	* doc/extend.texi: Document function_return function attribute.
	* doc/invoke.texi: Document -mfunction-return= option.

gcc/testsuite/

	* gcc.target/i386/indirect-thunk-1.c (dg-options): Add
	-mfunction-return=keep.
	* gcc.target/i386/indirect-thunk-2.c: Likewise.
	* gcc.target/i386/indirect-thunk-3.c: Likewise.
	* gcc.target/i386/indirect-thunk-4.c: Likewise.
	* gcc.target/i386/indirect-thunk-5.c: Likewise.
	* gcc.target/i386/indirect-thunk-6.c: Likewise.
	* gcc.target/i386/indirect-thunk-7.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-1.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-2.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-3.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-4.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-5.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-6.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-7.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-8.c: Likewise.
	* gcc.target/i386/indirect-thunk-bnd-1.c: Likewise.
	* gcc.target/i386/indirect-thunk-bnd-2.c: Likewise.
	* gcc.target/i386/indirect-thunk-bnd-3.c: Likewise.
	* gcc.target/i386/indirect-thunk-bnd-4.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-1.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-2.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-3.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-4.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-5.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-6.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-7.c: Likewise.
	* gcc.target/i386/indirect-thunk-inline-1.c: Likewise.
	* gcc.target/i386/indirect-thunk-inline-2.c: Likewise.
	* gcc.target/i386/indirect-thunk-inline-3.c: Likewise.
	* gcc.target/i386/indirect-thunk-inline-4.c: Likewise.
	* gcc.target/i386/indirect-thunk-inline-5.c: Likewise.
	* gcc.target/i386/indirect-thunk-inline-6.c: Likewise.
	* gcc.target/i386/indirect-thunk-inline-7.c: Likewise.
	* gcc.target/i386/ret-thunk-1.c: New test.
	* gcc.target/i386/ret-thunk-10.c: Likewise.
	* gcc.target/i386/ret-thunk-11.c: Likewise.
	* gcc.target/i386/ret-thunk-12.c: Likewise.
	* gcc.target/i386/ret-thunk-13.c: Likewise.
	* gcc.target/i386/ret-thunk-14.c: Likewise.
	* gcc.target/i386/ret-thunk-15.c: Likewise.
	* gcc.target/i386/ret-thunk-16.c: Likewise.
	* gcc.target/i386/ret-thunk-2.c: Likewise.
	* gcc.target/i386/ret-thunk-3.c: Likewise.
	* gcc.target/i386/ret-thunk-4.c: Likewise.
	* gcc.target/i386/ret-thunk-5.c: Likewise.
	* gcc.target/i386/ret-thunk-6.c: Likewise.
	* gcc.target/i386/ret-thunk-7.c: Likewise.
	* gcc.target/i386/ret-thunk-8.c: Likewise.
	* gcc.target/i386/ret-thunk-9.c: Likewise.

From-SVN: r256661
2018-01-14 06:37:39 -08:00
H.J. Lu
da99fd4a3c x86: Add -mindirect-branch=
Add -mindirect-branch= option to convert indirect call and jump to call
and return thunks.  The default is 'keep', which keeps indirect call and
jump unmodified.  'thunk' converts indirect call and jump to call and
return thunk.  'thunk-inline' converts indirect call and jump to inlined
call and return thunk.  'thunk-extern' converts indirect call and jump to
external call and return thunk provided in a separate object file.  You
can control this behavior for a specific function by using the function
attribute indirect_branch.

Two kinds of thunks are generated.  Memory thunk where the function address
is at the top of the stack:

__x86_indirect_thunk:
	call L2
L1:
	pause
	lfence
	jmp L1
L2:
	lea 8(%rsp), %rsp|lea 4(%esp), %esp
	ret

Indirect jmp via memory, "jmp mem", is converted to

	push memory
	jmp __x86_indirect_thunk

Indirect call via memory, "call mem", is converted to

	jmp L2
L1:
	push [mem]
	jmp __x86_indirect_thunk
L2:
	call L1

Register thunk where the function address is in a register, reg:

__x86_indirect_thunk_reg:
	call	L2
L1:
	pause
	lfence
	jmp	L1
L2:
	movq	%reg, (%rsp)|movl    %reg, (%esp)
	ret

where reg is one of (r|e)ax, (r|e)dx, (r|e)cx, (r|e)bx, (r|e)si, (r|e)di,
(r|e)bp, r8, r9, r10, r11, r12, r13, r14 and r15.

Indirect jmp via register, "jmp reg", is converted to

	jmp __x86_indirect_thunk_reg

Indirect call via register, "call reg", is converted to

	call __x86_indirect_thunk_reg
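
A minimal sketch of the per-function control via the indirect_branch
attribute mentioned above (names are illustrative):

extern void (*func_p) (void);

/* Only this function's indirect branches are converted; the rest of the
   file keeps the default -mindirect-branch=keep behaviour.  */
__attribute__ ((indirect_branch ("thunk")))
void
dispatch (void)
{
  func_p ();
}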

gcc/

	* config/i386/i386-opts.h (indirect_branch): New.
	* config/i386/i386-protos.h (ix86_output_indirect_jmp): Likewise.
	* config/i386/i386.c (ix86_using_red_zone): Disallow red-zone
	with local indirect jump when converting indirect call and jump.
	(ix86_set_indirect_branch_type): New.
	(ix86_set_current_function): Call ix86_set_indirect_branch_type.
	(indirectlabelno): New.
	(indirect_thunk_needed): Likewise.
	(indirect_thunk_bnd_needed): Likewise.
	(indirect_thunks_used): Likewise.
	(indirect_thunks_bnd_used): Likewise.
	(INDIRECT_LABEL): Likewise.
	(indirect_thunk_name): Likewise.
	(output_indirect_thunk): Likewise.
	(output_indirect_thunk_function): Likewise.
	(ix86_output_indirect_branch): Likewise.
	(ix86_output_indirect_jmp): Likewise.
	(ix86_code_end): Call output_indirect_thunk_function if needed.
	(ix86_output_call_insn): Call ix86_output_indirect_branch if
	needed.
	(ix86_handle_fndecl_attribute): Handle indirect_branch.
	(ix86_attribute_table): Add indirect_branch.
	* config/i386/i386.h (machine_function): Add indirect_branch_type
	and has_local_indirect_jump.
	* config/i386/i386.md (indirect_jump): Set has_local_indirect_jump
	to true.
	(tablejump): Likewise.
	(*indirect_jump): Use ix86_output_indirect_jmp.
	(*tablejump_1): Likewise.
	(simple_return_indirect_internal): Likewise.
	* config/i386/i386.opt (mindirect-branch=): New option.
	(indirect_branch): New.
	(keep): Likewise.
	(thunk): Likewise.
	(thunk-inline): Likewise.
	(thunk-extern): Likewise.
	* doc/extend.texi: Document indirect_branch function attribute.
	* doc/invoke.texi: Document -mindirect-branch= option.

gcc/testsuite/

	* gcc.target/i386/indirect-thunk-1.c: New test.
	* gcc.target/i386/indirect-thunk-2.c: Likewise.
	* gcc.target/i386/indirect-thunk-3.c: Likewise.
	* gcc.target/i386/indirect-thunk-4.c: Likewise.
	* gcc.target/i386/indirect-thunk-5.c: Likewise.
	* gcc.target/i386/indirect-thunk-6.c: Likewise.
	* gcc.target/i386/indirect-thunk-7.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-1.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-2.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-3.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-4.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-5.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-6.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-7.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-8.c: Likewise.
	* gcc.target/i386/indirect-thunk-bnd-1.c: Likewise.
	* gcc.target/i386/indirect-thunk-bnd-2.c: Likewise.
	* gcc.target/i386/indirect-thunk-bnd-3.c: Likewise.
	* gcc.target/i386/indirect-thunk-bnd-4.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-1.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-2.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-3.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-4.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-5.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-6.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-7.c: Likewise.
	* gcc.target/i386/indirect-thunk-inline-1.c: Likewise.
	* gcc.target/i386/indirect-thunk-inline-2.c: Likewise.
	* gcc.target/i386/indirect-thunk-inline-3.c: Likewise.
	* gcc.target/i386/indirect-thunk-inline-4.c: Likewise.
	* gcc.target/i386/indirect-thunk-inline-5.c: Likewise.
	* gcc.target/i386/indirect-thunk-inline-6.c: Likewise.
	* gcc.target/i386/indirect-thunk-inline-7.c: Likewise.

From-SVN: r256660
2018-01-14 06:35:19 -08:00
Jan Hubicka
3f05a4f072 re PR ipa/83051 (ICE on valid code at -O3: in edge_badness, at ipa-inline.c:1024)
PR ipa/83051
	* gcc.c-torture/compile/pr83051.c: New testcase.
	* ipa-inline.c (edge_badness): Tolerate roundoff errors.

From-SVN: r256659
2018-01-14 11:20:31 +00:00
Richard Sandiford
01b9bf0615 inline_small_functions speedup
After inlining A into B, inline_small_functions updates the information
for (most) callees and callers of the new B:

	  update_callee_keys (&edge_heap, where, updated_nodes);
      [...]
      /* Our profitability metric can depend on local properties
	 such as number of inlinable calls and size of the function body.
	 After inlining these properties might change for the function we
	 inlined into (since it's body size changed) and for the functions
	 called by function we inlined (since number of it inlinable callers
	 might change).  */
      update_caller_keys (&edge_heap, where, updated_nodes, NULL);

These functions in turn call can_inline_edge_p for most of the associated
edges:

	    if (can_inline_edge_p (edge, false)
		&& want_inline_small_function_p (edge, false))
	      update_edge_key (heap, edge);

can_inline_edge_p indirectly calls estimate_calls_size_and_time
on the caller node, which seems to recursively process all callee
edges rooted at the node.  It looks from this like the algorithm
can be at least quadratic in the worst case.

Maybe there's something we can do to make can_inline_edge_p cheaper, but
since neither of these two calls is responsible for reporting an inline
failure reason, it seems cheaper to test want_inline_small_function_p
first, so that we don't calculate an estimate for something that we
already know isn't a "small function".  I think the only change
needed to make that work is to check for CIF_FINAL_ERROR in
want_inline_small_function_p; at the moment we rely on can_inline_edge_p
to make that check.

This cuts the time to build optabs.ii by over 4% with an
--enable-checking=release compiler on x86_64-linux-gnu.  I've seen more
dramatic wins on aarch64-linux-gnu due to the NUM_POLY_INT_COEFFS==2
thing.  The patch doesn't affect the output code.
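
A hedged sketch of the reordered test (simplified; the real update_*_keys
code passes additional arguments and handles dump output):

  /* Run the cheap classification first so that can_inline_edge_p, and
     hence estimate_calls_size_and_time, is only reached for edges that
     are plausible "small function" candidates.  */
  if (want_inline_small_function_p (edge, false)
      && can_inline_edge_p (edge, false))
    update_edge_key (heap, edge);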

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	* ipa-inline.c (want_inline_small_function_p): Return false if
	inlining has already failed with CIF_FINAL_ERROR.
	(update_caller_keys): Call want_inline_small_function_p before
	can_inline_edge_p.
	(update_callee_keys): Likewise.

From-SVN: r256658
2018-01-14 10:56:56 +00:00
Prathamesh Kulkarni
61760b925c re PR tree-optimization/83501 (strlen(a) not folded after strcpy(a, "..."))
2018-01-14  Prathamesh Kulkarni  <prathamesh.kulkarni@linaro.org>

	PR tree-optimization/83501
	* gcc.dg/strlenopt-39.c: Restrict to i?86 and x86_64-*-* targets.

From-SVN: r256657
2018-01-14 08:58:58 +00:00
Kelvin Nilsen
a3a821c903 rs6000-p8swap.c (rs6000_sum_of_two_registers_p): New function.
gcc/ChangeLog:

2018-01-10  Kelvin Nilsen  <kelvin@gcc.gnu.org>

	* config/rs6000/rs6000-p8swap.c (rs6000_sum_of_two_registers_p):
	New function.
	(rs6000_quadword_masked_address_p): Likewise.
	(quad_aligned_load_p): Likewise.
	(quad_aligned_store_p): Likewise.
	(const_load_sequence_p): Add comment to describe the outer-most loop.
	(mimic_memory_attributes_and_flags): New function.
	(rs6000_gen_stvx): Likewise.
	(replace_swapped_aligned_store): Likewise.
	(rs6000_gen_lvx): Likewise.
	(replace_swapped_aligned_load): Likewise.
	(replace_swapped_load_constant): Capitalize argument name in
	comment describing this function.
	(rs6000_analyze_swaps): Add a third pass to search for vector loads
	and stores that access quad-word aligned addresses and replace
	with stvx or lvx instructions when appropriate.
	* config/rs6000/rs6000-protos.h (rs6000_sum_of_two_registers_p):
	New function prototype.
	(rs6000_quadword_masked_address_p): Likewise.
	(rs6000_gen_lvx): Likewise.
	(rs6000_gen_stvx): Likewise.
	* config/rs6000/vsx.md (*vsx_le_perm_load_<mode>): For modes
	VSX_D (V2DF, V2DI), modify this split to select lvx instruction
	when memory address is aligned.
	(*vsx_le_perm_load_<mode>): For modes VSX_W (V4SF, V4SI), modify
	this split to select lvx instruction when memory address is aligned.
	(*vsx_le_perm_load_v8hi): Modify this split to select lvx
	instruction when memory address is aligned.
	(*vsx_le_perm_load_v16qi): Likewise.
	(four unnamed splitters): Modify to select the stvx instruction
	when memory is aligned.

gcc/testsuite/ChangeLog:

2018-01-10  Kelvin Nilsen  <kelvin@gcc.gnu.org>

	* gcc.target/powerpc/pr48857.c: Modify dejagnu directives to look
	for lvx and stvx instead of lxvd2x and stxvd2x and require
	little-endian target.  Add comments.
	* gcc.target/powerpc/swaps-p8-28.c: Add functions for more
	comprehensive testing.
	* gcc.target/powerpc/swaps-p8-29.c: Likewise.
	* gcc.target/powerpc/swaps-p8-30.c: Likewise.
	* gcc.target/powerpc/swaps-p8-31.c: Likewise.
	* gcc.target/powerpc/swaps-p8-32.c: Likewise.
	* gcc.target/powerpc/swaps-p8-33.c: Likewise.
	* gcc.target/powerpc/swaps-p8-34.c: Likewise.
	* gcc.target/powerpc/swaps-p8-35.c: Likewise.
	* gcc.target/powerpc/swaps-p8-36.c: Likewise.
	* gcc.target/powerpc/swaps-p8-37.c: Likewise.
	* gcc.target/powerpc/swaps-p8-38.c: Likewise.
	* gcc.target/powerpc/swaps-p8-39.c: Likewise.
	* gcc.target/powerpc/swaps-p8-40.c: Likewise.
	* gcc.target/powerpc/swaps-p8-41.c: Likewise.
	* gcc.target/powerpc/swaps-p8-42.c: Likewise.
	* gcc.target/powerpc/swaps-p8-43.c: Likewise.
	* gcc.target/powerpc/swaps-p8-44.c: Likewise.
	* gcc.target/powerpc/swaps-p8-45.c: Likewise.
	* gcc.target/powerpc/vec-extract-2.c: Add comment and remove
	scan-assembler-not directives that forbid lvx and xxpermdi.
	* gcc.target/powerpc/vec-extract-3.c: Likewise.
	* gcc.target/powerpc/vec-extract-5.c: Likewise.
	* gcc.target/powerpc/vec-extract-6.c: Likewise.
	* gcc.target/powerpc/vec-extract-7.c: Likewise.
	* gcc.target/powerpc/vec-extract-8.c: Likewise.
	* gcc.target/powerpc/vec-extract-9.c: Likewise.
	* gcc.target/powerpc/vsx-vector-6-le.c: Change
	scan-assembler-times directives to reflect different numbers of
	expected xxlnor, xxlor, xvcmpgtdp, and xxland instructions.

libcpp/ChangeLog:

2018-01-10  Kelvin Nilsen  <kelvin@gcc.gnu.org>

	* lex.c (search_line_fast): Remove illegal coercion of an
	unaligned pointer value to vector pointer type and replace with
	use of __builtin_vec_vsx_ld () built-in function, which operates
	on unaligned pointer values.

From-SVN: r256656
2018-01-14 05:19:29 +00:00
Ian Lance Taylor
ffad1c54d2 go/types: implement SizesFor for gccgo
Move the architecture-specific settings out of configure.ac into a new
shell script goarch.sh.  Use the new script to collect the values for
all architectures to make them available in go/types.

Also fix cmd/vet to pass the right compiler when it calls SizesFor.

This fixes cmd/vet for systems that are not implemented in the gc
toolchain, such as alpha and ia64.

Reviewed-on: https://go-review.googlesource.com/87635

From-SVN: r256655
2018-01-14 04:59:01 +00:00
Tim Shen
8532713fc4 re PR libstdc++/83601 (std::regex_replace C++14 conformance issue: escaping in SED mode)
PR libstdc++/83601
	* include/bits/regex.tcc (regex_replace): Fix escaping in sed.
	* testsuite/28_regex/algorithms/regex_replace/char/pr83601.cc: Tests.
	* testsuite/28_regex/algorithms/regex_replace/wchar_t/pr83601.cc: Tests.

From-SVN: r256654
2018-01-14 00:48:30 +00:00
GCC Administrator
8bc5a5c57c Daily bump.
From-SVN: r256653
2018-01-14 00:16:15 +00:00
Rainer Orth
1f7273e5db Allow for lack of VM_MEMORY_OS_ALLOC_ONCE on Mac OS X (PR sanitizer/82824)
PR sanitizer/82824
	* lsan/lsan_common_mac.cc: Cherry-pick upstream r322437.

From-SVN: r256650
2018-01-13 21:01:27 +00:00
Jerry DeLisle
f208c5ccc7 re PR fortran/82007 (DTIO write format stored in a string leads to severe errors)
2018-01-13  Jerry DeLisle  <jvdelisle@gcc.gnu.org>

        PR fortran/82007
        * resolve.c (resolve_transfer): Delete code looking for 'DT'
        format specifiers in format strings. Set formatted to true if a
        format string or format label is present.
        * trans-io.c (get_dtio_proc): Likewise. (transfer_expr): Fix
        whitespace.

From-SVN: r256649
2018-01-13 20:41:00 +00:00
Jan Hubicka
f36180f4a4 predict.c (determine_unlikely_bbs): Handle correctly BBs which appears in the queue multiple times.
* predict.c (determine_unlikely_bbs): Handle correctly BBs
	which appears in the queue multiple times.

From-SVN: r256648
2018-01-13 19:32:04 +00:00
Thomas Koenig
39f309aca6 re PR fortran/83744 (ICE in ../../gcc/gcc/fortran/dump-parse-tree.c:3093 while using -fc-prototypes)
2018-01-13  Thomas Koenig <tkoenig@gcc.gnu.org>

	PR fortran/83744
	* dump-parse-tree.c (get_c_type_name): Remove extra line.
	Change for loop to use declaration in for loop. Handle BT_LOGICAL
	and BT_CHARACTER.
	(write_decl): Add where argument. Fix indentation. Replace
	assert with error message. Add typename to warning
	in comment.
	(write_type): Adjust locus to call of write_decl.
	(write_variable): Likewise.
	(write_proc): Likewise. Replace assert with error message.

From-SVN: r256645
2018-01-13 18:22:36 +00:00
Richard Sandiford
a57776a113 Support for aliasing with variable strides
This patch adds runtime alias checks for loops with variable strides,
so that we can vectorise them even without a restrict qualifier.
There are several parts to doing this:

1) For accesses like:

     x[i * n] += 1;

   we need to check whether n (and thus the DR_STEP) is nonzero.
   vect_analyze_data_ref_dependence records values that need to be
   checked in this way, then prune_runtime_alias_test_list records a
   bounds check on DR_STEP being outside the range [0, 0].

2) For accesses like:

     x[i * n] = x[i * n + 1] + 1;

   we simply need to test whether abs (n) >= 2.
   prune_runtime_alias_test_list looks for cases like this and tries
   to guess whether it is better to use this kind of check or a check
   for non-overlapping ranges.  (We could do an OR of the two conditions
   at runtime, but that isn't implemented yet.)

3) Checks for overlapping ranges need to cope with variable strides.
   At present the "length" of each segment in a range check is
   represented as an offset from the base that lies outside the
   touched range, in the same direction as DR_STEP.  The length
   can therefore be negative and is sometimes conservative.

   With variable steps it's easier to reason about if we split
   this into two:

     seg_len:
       distance travelled from the first iteration of interest
       to the last, e.g. DR_STEP * (VF - 1)

     access_size:
       the number of bytes accessed in each iteration

   with access_size always being a positive constant and seg_len
   possibly being variable.  We can then combine alias checks
   for two accesses that are a constant number of bytes apart by
   adjusting the access size to account for the gap.  This leaves
   the segment length unchanged, which allows the check to be combined
   with further accesses.

   When seg_len is positive, the runtime alias check has the form:

        base_a >= base_b + seg_len_b + access_size_b
     || base_b >= base_a + seg_len_a + access_size_a

   In many accesses the base will be aligned to the access size, which
   allows us to skip the addition:

        base_a > base_b + seg_len_b
     || base_b > base_a + seg_len_a

   A similar saving is possible with "negative" lengths.

   The patch therefore tracks the alignment in addition to seg_len
   and access_size.
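
In C-like terms, a hedged sketch of the generated versioning condition for
two accesses a and b with positive segment lengths (variable names are
illustrative, not the generated SSA names):

  /* Take the vector loop only if the two accessed ranges cannot overlap.  */
  if (base_a >= base_b + seg_len_b + access_size_b
      || base_b >= base_a + seg_len_a + access_size_a)
    /* vectorised loop */;
  else
    /* scalar fallback */;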

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree-vectorizer.h (vec_lower_bound): New structure.
	(_loop_vec_info): Add check_nonzero and lower_bounds.
	(LOOP_VINFO_CHECK_NONZERO): New macro.
	(LOOP_VINFO_LOWER_BOUNDS): Likewise.
	(LOOP_REQUIRES_VERSIONING_FOR_ALIAS): Check lower_bounds too.
	* tree-data-ref.h (dr_with_seg_len): Add access_size and align
	fields.  Make seg_len the distance travelled, not including the
	access size.
	(dr_direction_indicator): Declare.
	(dr_zero_step_indicator): Likewise.
	(dr_known_forward_stride_p): Likewise.
	* tree-data-ref.c: Include stringpool.h, tree-vrp.h and
	tree-ssanames.h.
	(runtime_alias_check_p): Allow runtime alias checks with
	variable strides.
	(operator ==): Compare access_size and align.
	(prune_runtime_alias_test_list): Rework for new distinction between
	the access_size and seg_len.
	(create_intersect_range_checks_index): Likewise.  Cope with polynomial
	segment lengths.
	(get_segment_min_max): New function.
	(create_intersect_range_checks): Use it.
	(dr_step_indicator): New function.
	(dr_direction_indicator): Likewise.
	(dr_zero_step_indicator): Likewise.
	(dr_known_forward_stride_p): Likewise.
	* tree-loop-distribution.c (data_ref_segment_size): Return
	DR_STEP * (niters - 1).
	(compute_alias_check_pairs): Update call to the dr_with_seg_len
	constructor.
	* tree-vect-data-refs.c (vect_check_nonzero_value): New function.
	(vect_preserves_scalar_order_p): New function, split out from...
	(vect_analyze_data_ref_dependence): ...here.  Check for zero steps.
	(vect_vfa_segment_size): Return DR_STEP * (length_factor - 1).
	(vect_vfa_access_size): New function.
	(vect_vfa_align): Likewise.
	(vect_compile_time_alias): Take access_size_a and access_size_b arguments.
	(dump_lower_bound): New function.
	(vect_check_lower_bound): Likewise.
	(vect_small_gap_p): Likewise.
	(vectorizable_with_step_bound_p): Likewise.
	(vect_prune_runtime_alias_test_list): Ignore cross-iteration
	dependencies if the vectorization factor is 1.  Convert the checks
	for nonzero steps into checks on the bounds of DR_STEP.  Try using
	a bounds check for variable steps if the minimum required step is
	relatively small. Update calls to the dr_with_seg_len
	constructor and to vect_compile_time_alias.
	* tree-vect-loop-manip.c (vect_create_cond_for_lower_bounds): New
	function.
	(vect_loop_versioning): Call it.
	* tree-vect-loop.c (vect_analyze_loop_2): Clear LOOP_VINFO_LOWER_BOUNDS
	when retrying.
	(vect_estimate_min_profitable_iters): Account for any bounds checks.

gcc/testsuite/
	* gcc.dg/vect/bb-slp-cond-1.c: Expect loop vectorization rather
	than SLP vectorization.
	* gcc.dg/vect/vect-alias-check-10.c: New test.
	* gcc.dg/vect/vect-alias-check-11.c: Likewise.
	* gcc.dg/vect/vect-alias-check-12.c: Likewise.
	* gcc.dg/vect/vect-alias-check-8.c: Likewise.
	* gcc.dg/vect/vect-alias-check-9.c: Likewise.
	* gcc.target/aarch64/sve/strided_load_8.c: Likewise.
	* gcc.target/aarch64/sve/var_stride_1.c: Likewise.
	* gcc.target/aarch64/sve/var_stride_1.h: Likewise.
	* gcc.target/aarch64/sve/var_stride_1_run.c: Likewise.
	* gcc.target/aarch64/sve/var_stride_2.c: Likewise.
	* gcc.target/aarch64/sve/var_stride_2_run.c: Likewise.
	* gcc.target/aarch64/sve/var_stride_3.c: Likewise.
	* gcc.target/aarch64/sve/var_stride_3_run.c: Likewise.
	* gcc.target/aarch64/sve/var_stride_4.c: Likewise.
	* gcc.target/aarch64/sve/var_stride_4_run.c: Likewise.
	* gcc.target/aarch64/sve/var_stride_5.c: Likewise.
	* gcc.target/aarch64/sve/var_stride_5_run.c: Likewise.
	* gcc.target/aarch64/sve/var_stride_6.c: Likewise.
	* gcc.target/aarch64/sve/var_stride_6_run.c: Likewise.
	* gcc.target/aarch64/sve/var_stride_7.c: Likewise.
	* gcc.target/aarch64/sve/var_stride_7_run.c: Likewise.
	* gcc.target/aarch64/sve/var_stride_8.c: Likewise.
	* gcc.target/aarch64/sve/var_stride_8_run.c: Likewise.
	* gfortran.dg/vect/vect-alias-check-1.F90: Likewise.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>

From-SVN: r256644
2018-01-13 18:02:10 +00:00
Richard Sandiford
f307441ac4 Add support for SVE scatter stores
This is mostly a mechanical extension of the previous gather load
support to scatter stores.  The internal functions in this case are:

  IFN_SCATTER_STORE (base, offsets, scale, values)
  IFN_MASK_SCATTER_STORE (base, offsets, scale, values, mask)

However, one nonobvious change is to vect_analyze_data_ref_access.
If we're treating an access as a gather load or scatter store
(i.e. if STMT_VINFO_GATHER_SCATTER_P is true), the existing code
would create a dummy data_reference whose step is 0.  There's not
really much else it could do, since the whole point is that the
step isn't predictable from iteration to iteration.  We then
went into this code in vect_analyze_data_ref_access:

  /* Allow loads with zero step in inner-loop vectorization.  */
  if (loop_vinfo && integer_zerop (step))
    {
      GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt)) = NULL;
      if (!nested_in_vect_loop_p (loop, stmt))
	return DR_IS_READ (dr);

I.e. we'd take the step literally and assume that this is a load
or store to an invariant address.  Loads from invariant addresses
are supported but stores to them aren't.

The code therefore had the effect of disabling all scatter stores.
AFAICT this is true of AVX too: although tests like avx512f-scatter-1.c
test for the correctness of a scatter-like loop, they don't seem to
check whether a scatter instruction is actually used.

The patch therefore makes vect_analyze_data_ref_access return true
for scatters.  We do seem to handle the aliasing correctly;
that's tested by other functions, and is symmetrical to the
already-working gather case.
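
For reference, a minimal loop of the kind that can now be vectorised with a
scatter store (a hedged illustration, not taken from the new tests):

void
scatter (double *dest, int *index, double *src, int n)
{
  for (int i = 0; i < n; ++i)
    dest[index[i]] = src[i];	/* store address varies per element */
}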

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* doc/sourcebuild.texi (vect_scatter_store): Document.
	* optabs.def (scatter_store_optab, mask_scatter_store_optab): New
	optabs.
	* doc/md.texi (scatter_store@var{m}, mask_scatter_store@var{m}):
	Document.
	* genopinit.c (main): Add supports_vec_scatter_store and
	supports_vec_scatter_store_cached to target_optabs.
	* gimple.h (gimple_expr_type): Handle IFN_SCATTER_STORE and
	IFN_MASK_SCATTER_STORE.
	* internal-fn.def (SCATTER_STORE, MASK_SCATTER_STORE): New internal
	functions.
	* internal-fn.h (internal_store_fn_p): Declare.
	(internal_fn_stored_value_index): Likewise.
	* internal-fn.c (scatter_store_direct): New macro.
	(expand_scatter_store_optab_fn): New function.
	(direct_scatter_store_optab_supported_p): New macro.
	(internal_store_fn_p): New function.
	(internal_gather_scatter_fn_p): Handle IFN_SCATTER_STORE and
	IFN_MASK_SCATTER_STORE.
	(internal_fn_mask_index): Likewise.
	(internal_fn_stored_value_index): New function.
	(internal_gather_scatter_fn_supported_p): Adjust operand numbers
	for scatter stores.
	* optabs-query.h (supports_vec_scatter_store_p): Declare.
	* optabs-query.c (supports_vec_scatter_store_p): New function.
	* tree-vectorizer.h (vect_get_store_rhs): Declare.
	* tree-vect-data-refs.c (vect_analyze_data_ref_access): Return
	true for scatter stores.
	(vect_gather_scatter_fn_p): Handle scatter stores too.
	(vect_check_gather_scatter): Consider using scatter stores if
	supports_vec_scatter_store_p.
	* tree-vect-patterns.c (vect_try_gather_scatter_pattern): Handle
	scatter stores too.
	* tree-vect-stmts.c (exist_non_indexing_operands_for_use_p): Use
	internal_fn_stored_value_index.
	(check_load_store_masking): Handle scatter stores too.
	(vect_get_store_rhs): Make public.
	(vectorizable_call): Use internal_store_fn_p.
	(vectorizable_store): Handle scatter store internal functions.
	(vect_transform_stmt): Compare GROUP_STORE_COUNT with GROUP_SIZE
	when deciding whether the end of the group has been reached.
	* config/aarch64/aarch64.md (UNSPEC_ST1_SCATTER): New unspec.
	* config/aarch64/aarch64-sve.md (scatter_store<mode>): New expander.
	(mask_scatter_store<mode>): New insns.

gcc/testsuite/
	* lib/target-supports.exp (check_effective_target_vect_scatter_store):
	New proc.
	* gcc.dg/vect/pr25413a.c: Expect both loops to be optimized on
	targets with scatter stores.
	* gcc.dg/vect/vect-71.c: Restrict XFAIL to targets without scatter
	stores.
	* gcc.target/aarch64/sve/mask_scatter_store_1.c: New test.
	* gcc.target/aarch64/sve/mask_scatter_store_2.c: Likewise.
	* gcc.target/aarch64/sve/scatter_store_1.c: Likewise.
	* gcc.target/aarch64/sve/scatter_store_2.c: Likewise.
	* gcc.target/aarch64/sve/scatter_store_3.c: Likewise.
	* gcc.target/aarch64/sve/scatter_store_4.c: Likewise.
	* gcc.target/aarch64/sve/scatter_store_5.c: Likewise.
	* gcc.target/aarch64/sve/scatter_store_6.c: Likewise.
	* gcc.target/aarch64/sve/scatter_store_7.c: Likewise.
	* gcc.target/aarch64/sve/strided_store_1.c: Likewise.
	* gcc.target/aarch64/sve/strided_store_2.c: Likewise.
	* gcc.target/aarch64/sve/strided_store_3.c: Likewise.
	* gcc.target/aarch64/sve/strided_store_4.c: Likewise.
	* gcc.target/aarch64/sve/strided_store_5.c: Likewise.
	* gcc.target/aarch64/sve/strided_store_6.c: Likewise.
	* gcc.target/aarch64/sve/strided_store_7.c: Likewise.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>

From-SVN: r256643
2018-01-13 18:01:59 +00:00
Richard Sandiford
429ef523f7 Allow gather loads to be used for grouped accesses
Following on from the previous patch for strided accesses, this patch
allows gather loads to be used with grouped accesses, if we otherwise
would need to fall back to VMAT_ELEMENTWISE.  However, as the comment
says, this is restricted to single-element groups for now:

	 ??? Although the code can handle all group sizes correctly,
	 it probably isn't a win to use separate strided accesses based
	 on nearby locations.  Or, even if it's a win over scalar code,
	 it might not be a win over vectorizing at a lower VF, if that
	 allows us to use contiguous accesses.

Single-element groups are an important special case though,
and this means that code is less sensitive to GCC's classification
of single accesses with constant steps as "grouped" and ones with
variable steps as "strided".
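
A hedged example of a single-element group with a constant step that can
now use a gather load instead of VMAT_ELEMENTWISE (illustrative only):

void
f (double *dest, double *src, int n)
{
  for (int i = 0; i < n; ++i)
    dest[i] += src[i * 100];	/* constant stride, classified as a
				   single-element "grouped" access */
}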

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree-vectorizer.h (vect_gather_scatter_fn_p): Declare.
	* tree-vect-data-refs.c (vect_gather_scatter_fn_p): Make public.
	* tree-vect-stmts.c (vect_truncate_gather_scatter_offset): New
	function.
	(vect_use_strided_gather_scatters_p): Take a masked_p argument.
	Use vect_truncate_gather_scatter_offset if we can't treat the
	operation as a normal gather load or scatter store.
	(get_group_load_store_type): Take the gather_scatter_info
	as argument.  Try using a gather load or scatter store for
	single-element groups.
	(get_load_store_type): Update calls to get_group_load_store_type
	and vect_use_strided_gather_scatters_p.

gcc/testsuite/
	* gcc.target/aarch64/sve/reduc_strict_3.c: Expect FADDA to be used
	for double_reduc1.
	* gcc.target/aarch64/sve/strided_load_4.c: New test.
	* gcc.target/aarch64/sve/strided_load_5.c: Likewise.
	* gcc.target/aarch64/sve/strided_load_6.c: Likewise.
	* gcc.target/aarch64/sve/strided_load_7.c: Likewise.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>

From-SVN: r256642
2018-01-13 18:01:49 +00:00
Richard Sandiford
ab2fc78250 Use gather loads for strided accesses
This patch tries to use gather loads for strided accesses,
rather than falling back to VMAT_ELEMENTWISE.
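
A hedged sketch of the kind of strided access this targets (illustrative):

void
f (double *dest, double *src, int n, int stride)
{
  for (int i = 0; i < n; ++i)
    dest[i] += src[i * stride];	/* runtime stride: a "strided" access */
}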

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree-vectorizer.h (vect_create_data_ref_ptr): Take an extra
	optional tree argument.
	* tree-vect-data-refs.c (vect_check_gather_scatter): Check for
	null target hooks.
	(vect_create_data_ref_ptr): Take the iv_step as an optional argument,
	but continue to use the current value as a fallback.
	(bump_vector_ptr): Use operand_equal_p rather than tree_int_cst_compare
	to compare the updates.
	* tree-vect-stmts.c (vect_use_strided_gather_scatters_p): New function.
	(get_load_store_type): Use it when handling a strided access.
	(vect_get_strided_load_store_ops): New function.
	(vect_get_data_ptr_increment): Likewise.
	(vectorizable_load): Handle strided gather loads.  Always pass
	a step to vect_create_data_ref_ptr and bump_vector_ptr.

gcc/testsuite/
	* gcc.target/aarch64/sve/strided_load_1.c: New test.
	* gcc.target/aarch64/sve/strided_load_2.c: Likewise.
	* gcc.target/aarch64/sve/strided_load_3.c: Likewise.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>

From-SVN: r256641
2018-01-13 18:01:42 +00:00
Richard Sandiford
bfaa08b7ba Add support for SVE gather loads
This patch adds support for SVE gather loads.  It uses basically the
the same analysis code as the AVX gather support, but after that
there are two major differences:

- It uses new internal functions rather than target built-ins.
  The interface is:

     IFN_GATHER_LOAD (base, offsets, scale)
     IFN_MASK_GATHER_LOAD (base, offsets, scale, mask)

  which should be reasonably generic.  One of the advantages of
  using internal functions is that other passes can understand what
  the functions do, but a more immediate advantage is that we can
  query the underlying target pattern to see which scales it supports.

- It uses pattern recognition to convert the offset to the right width,
  if it was originally narrower than that.  This avoids having to do
  a widening operation as part of the gather expansion itself.
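
A minimal gather-style loop for illustration (hedged; the committed
gather_load_*.c tests are the authoritative examples):

void
gather (double *dest, double *src, int *index, int n)
{
  for (int i = 0; i < n; ++i)
    dest[i] = src[index[i]];	/* load address varies per element */
}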

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* doc/md.texi (gather_load@var{m}): Document.
	(mask_gather_load@var{m}): Likewise.
	* genopinit.c (main): Add supports_vec_gather_load and
	supports_vec_gather_load_cached to target_optabs.
	* optabs-tree.c (init_tree_optimization_optabs): Use
	ggc_cleared_alloc to allocate target_optabs.
	* optabs.def (gather_load_optab, mask_gather_load_optab): New optabs.
	* internal-fn.def (GATHER_LOAD, MASK_GATHER_LOAD): New internal
	functions.
	* internal-fn.h (internal_load_fn_p): Declare.
	(internal_gather_scatter_fn_p): Likewise.
	(internal_fn_mask_index): Likewise.
	(internal_gather_scatter_fn_supported_p): Likewise.
	* internal-fn.c (gather_load_direct): New macro.
	(expand_gather_load_optab_fn): New function.
	(direct_gather_load_optab_supported_p): New macro.
	(direct_internal_fn_optab): New function.
	(internal_load_fn_p): Likewise.
	(internal_gather_scatter_fn_p): Likewise.
	(internal_fn_mask_index): Likewise.
	(internal_gather_scatter_fn_supported_p): Likewise.
	* optabs-query.c (supports_at_least_one_mode_p): New function.
	(supports_vec_gather_load_p): Likewise.
	* optabs-query.h (supports_vec_gather_load_p): Declare.
	* tree-vectorizer.h (gather_scatter_info): Add ifn, element_type
	and memory_type field.
	(NUM_PATTERNS): Bump to 15.
	* tree-vect-data-refs.c: Include internal-fn.h.
	(vect_gather_scatter_fn_p): New function.
	(vect_describe_gather_scatter_call): Likewise.
	(vect_check_gather_scatter): Try using internal functions for
	gather loads.  Recognize existing calls to a gather load function.
	(vect_analyze_data_refs): Consider using gather loads if
	supports_vec_gather_load_p.
	* tree-vect-patterns.c (vect_get_load_store_mask): New function.
	(vect_get_gather_scatter_offset_type): Likewise.
	(vect_convert_mask_for_vectype): Likewise.
	(vect_add_conversion_to_patterm): Likewise.
	(vect_try_gather_scatter_pattern): Likewise.
	(vect_recog_gather_scatter_pattern): New pattern recognizer.
	(vect_vect_recog_func_ptrs): Add it.
	* tree-vect-stmts.c (exist_non_indexing_operands_for_use_p): Use
	internal_fn_mask_index and internal_gather_scatter_fn_p.
	(check_load_store_masking): Take the gather_scatter_info as an
	argument and handle gather loads.
	(vect_get_gather_scatter_ops): New function.
	(vectorizable_call): Check internal_load_fn_p.
	(vectorizable_load): Likewise.  Handle gather load internal
	functions.
	(vectorizable_store): Update call to check_load_store_masking.
	* config/aarch64/aarch64.md (UNSPEC_LD1_GATHER): New unspec.
	* config/aarch64/iterators.md (SVE_S, SVE_D): New mode iterators.
	* config/aarch64/predicates.md (aarch64_gather_scale_operand_w)
	(aarch64_gather_scale_operand_d): New predicates.
	* config/aarch64/aarch64-sve.md (gather_load<mode>): New expander.
	(mask_gather_load<mode>): New insns.

gcc/testsuite/
	* gcc.target/aarch64/sve/gather_load_1.c: New test.
	* gcc.target/aarch64/sve/gather_load_2.c: Likewise.
	* gcc.target/aarch64/sve/gather_load_3.c: Likewise.
	* gcc.target/aarch64/sve/gather_load_4.c: Likewise.
	* gcc.target/aarch64/sve/gather_load_5.c: Likewise.
	* gcc.target/aarch64/sve/gather_load_6.c: Likewise.
	* gcc.target/aarch64/sve/gather_load_7.c: Likewise.
	* gcc.target/aarch64/sve/mask_gather_load_1.c: Likewise.
	* gcc.target/aarch64/sve/mask_gather_load_2.c: Likewise.
	* gcc.target/aarch64/sve/mask_gather_load_3.c: Likewise.
	* gcc.target/aarch64/sve/mask_gather_load_4.c: Likewise.
	* gcc.target/aarch64/sve/mask_gather_load_5.c: Likewise.
	* gcc.target/aarch64/sve/mask_gather_load_6.c: Likewise.
	* gcc.target/aarch64/sve/mask_gather_load_7.c: Likewise.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>

From-SVN: r256640
2018-01-13 18:01:34 +00:00
Richard Sandiford
b781a135a0 Add support for in-order addition reduction using SVE FADDA
This patch adds support for in-order floating-point addition reductions,
which are suitable even in strict IEEE mode.

Previously vect_is_simple_reduction would reject any cases that forbid
reassociation.  The idea is instead to tentatively accept them as
"FOLD_LEFT_REDUCTIONs" and only fail later if there is no support
for them.  Although this patch only handles the particular case of plus
and minus on floating-point types, there's no reason in principle why
we couldn't handle other cases.

The reductions use a new fold_left_plus_optab if available, otherwise
they fall back to elementwise additions or subtractions.

The vect_force_simple_reduction change makes it easier for parloops
to read the type of reduction.
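
A hedged example of a reduction that can now be vectorised even in strict
IEEE mode, i.e. without -ffast-math (illustrative):

double
sum (double *x, int n)
{
  double res = 0.0;
  for (int i = 0; i < n; ++i)
    res += x[i];	/* must be accumulated in source order */
  return res;
}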

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* optabs.def (fold_left_plus_optab): New optab.
	* doc/md.texi (fold_left_plus_@var{m}): Document.
	* internal-fn.def (IFN_FOLD_LEFT_PLUS): New internal function.
	* internal-fn.c (fold_left_direct): Define.
	(expand_fold_left_optab_fn): Likewise.
	(direct_fold_left_optab_supported_p): Likewise.
	* fold-const-call.c (fold_const_fold_left): New function.
	(fold_const_call): Use it to fold CFN_FOLD_LEFT_PLUS.
	* tree-parloops.c (valid_reduction_p): New function.
	(gather_scalar_reductions): Use it.
	* tree-vectorizer.h (FOLD_LEFT_REDUCTION): New vect_reduction_type.
	(vect_finish_replace_stmt): Declare.
	* tree-vect-loop.c (fold_left_reduction_fn): New function.
	(needs_fold_left_reduction_p): New function, split out from...
	(vect_is_simple_reduction): ...here.  Accept reductions that
	forbid reassociation, but give them type FOLD_LEFT_REDUCTION.
	(vect_force_simple_reduction): Also store the reduction type in
	the assignment's STMT_VINFO_REDUC_TYPE.
	(vect_model_reduction_cost): Handle FOLD_LEFT_REDUCTION.
	(merge_with_identity): New function.
	(vect_expand_fold_left): Likewise.
	(vectorize_fold_left_reduction): Likewise.
	(vectorizable_reduction): Handle FOLD_LEFT_REDUCTION.  Leave the
	scalar phi in place for it.  Check for target support and reject
	cases that would reassociate the operation.  Defer the transform
	phase to vectorize_fold_left_reduction.
	* config/aarch64/aarch64.md (UNSPEC_FADDA): New unspec.
	* config/aarch64/aarch64-sve.md (fold_left_plus_<mode>): New expander.
	(*fold_left_plus_<mode>, *pred_fold_left_plus_<mode>): New insns.

gcc/testsuite/
	* gcc.dg/vect/no-fast-math-vect16.c: Expect the test to pass and
	check for a message about using in-order reductions.
	* gcc.dg/vect/pr79920.c: Expect both loops to be vectorized and
	check for a message about using in-order reductions.
	* gcc.dg/vect/trapv-vect-reduc-4.c: Expect all three loops to be
	vectorized and check for a message about using in-order reductions.
	Expect targets with variable-length vectors to fall back to the
	fixed-length minimum.
	* gcc.dg/vect/vect-reduc-6.c: Expect the loop to be vectorized and
	check for a message about using in-order reductions.
	* gcc.dg/vect/vect-reduc-in-order-1.c: New test.
	* gcc.dg/vect/vect-reduc-in-order-2.c: Likewise.
	* gcc.dg/vect/vect-reduc-in-order-3.c: Likewise.
	* gcc.dg/vect/vect-reduc-in-order-4.c: Likewise.
	* gcc.target/aarch64/sve/reduc_strict_1.c: New test.
	* gcc.target/aarch64/sve/reduc_strict_1_run.c: Likewise.
	* gcc.target/aarch64/sve/reduc_strict_2.c: Likewise.
	* gcc.target/aarch64/sve/reduc_strict_2_run.c: Likewise.
	* gcc.target/aarch64/sve/reduc_strict_3.c: Likewise.
	* gcc.target/aarch64/sve/slp_13.c: Add floating-point types.
	* gfortran.dg/vect/vect-8.f90: Expect 22 loops to be vectorized if
	vect_fold_left_plus.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>

From-SVN: r256639
2018-01-13 18:01:24 +00:00
Richard Sandiford
b89fa419ca Remove unnecessary temporary in tree-if-conv.c
The call to ifc_temp_var in predicate_mem_writes became redundant
in r230099.  Before that point the mask was calculated using
fold_build_*s, but now it's calculated by gimple_build and so
is already a valid gimple value.

As it stands, the call forces an SSA_NAME-to-SSA_NAME copy
to be created, whereas SLP expects that such redundant copies
have already been eliminated.

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	* tree-if-conv.c (predicate_mem_writes): Remove redundant
	call to ifc_temp_var.

From-SVN: r256638
2018-01-13 18:01:14 +00:00
Richard Sandiford
9005477f25 Rework the legitimize_address_displacement hook
This patch:

- tweaks the handling of legitimize_address_displacement
  so that it gets called before rather than after the address has
  been expanded.  This means that we're no longer at the mercy
  of LRA being able to interpret the expanded instructions.

- passes the original offset to legitimize_address_displacement.

- adds SVE support to the AArch64 implementation of
  legitimize_address_displacement.

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* target.def (legitimize_address_displacement): Take the original
	offset as a poly_int.
	* targhooks.h (default_legitimize_address_displacement): Update
	accordingly.
	* targhooks.c (default_legitimize_address_displacement): Likewise.
	* doc/tm.texi: Regenerate.
	* lra-constraints.c (base_plus_disp_to_reg): Take the displacement
	as an argument, moving assert of ad->disp == ad->disp_term to...
	(process_address_1): ...here.  Update calls to base_plus_disp_to_reg.
	Try calling targetm.legitimize_address_displacement before expanding
	the address rather than afterwards, and adjust for the new interface.
	* config/aarch64/aarch64.c (aarch64_legitimize_address_displacement):
	Match the new hook interface.  Handle SVE addresses.
	* config/sh/sh.c (sh_legitimize_address_displacement): Make the
	new hook interface.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>

From-SVN: r256637
2018-01-13 18:00:59 +00:00
Richard Sandiford
5cce817119 Add an "early rematerialisation" pass
This patch looks for pseudo registers that are live across a call
and for which no call-preserved hard registers exist.  It then
recomputes the pseudos as necessary to ensure that they are no
longer live across a call.  The comment at the head of the file
describes the approach.

A new target hook selects which modes should be treated in this way.
By default none are, in which case the pass is skipped very early.

It might also be worth looking for cases like:

   C1: R1 := f (...)
   ...
   C2: R2 := f (...)
   C3: R1 := C2

and giving the same value number to C1 and C3, effectively treating
it like:

   C1: R1 := f (...)
   ...
   C2: R2 := f (...)
   C3: R1 := f (...)

Another (much more expensive) enhancement would be to apply value
numbering to all pseudo registers (not just rematerialisation
candidates), so that we can handle things like:

  C1: R1 := f (...R2...)
  ...
  C2: R1 := f (...R3...)

where R2 and R3 hold the same value.  But the current pass seems
to catch the vast majority of cases.

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	* Makefile.in (OBJS): Add early-remat.o.
	* target.def (select_early_remat_modes): New hook.
	* doc/tm.texi.in (TARGET_SELECT_EARLY_REMAT_MODES): New hook.
	* doc/tm.texi: Regenerate.
	* targhooks.h (default_select_early_remat_modes): Declare.
	* targhooks.c (default_select_early_remat_modes): New function.
	* timevar.def (TV_EARLY_REMAT): New timevar.
	* passes.def (pass_early_remat): New pass.
	* tree-pass.h (make_pass_early_remat): Declare.
	* early-remat.c: New file.
	* config/aarch64/aarch64.c (aarch64_select_early_remat_modes): New
	function.
	(TARGET_SELECT_EARLY_REMAT_MODES): Define.

gcc/testsuite/
	* gcc.target/aarch64/sve/spill_1.c: Also test that no predicates
	are spilled.
	* gcc.target/aarch64/sve/spill_2.c: New test.
	* gcc.target/aarch64/sve/spill_3.c: Likewise.
	* gcc.target/aarch64/sve/spill_4.c: Likewise.
	* gcc.target/aarch64/sve/spill_5.c: Likewise.
	* gcc.target/aarch64/sve/spill_6.c: Likewise.
	* gcc.target/aarch64/sve/spill_7.c: Likewise.

From-SVN: r256636
2018-01-13 18:00:51 +00:00
Richard Sandiford
d1d20a49a7 Use single-iteration epilogues when peeling for gaps
This patch adds support for fully-masking loops that require peeling
for gaps.  It peels exactly one scalar iteration and uses the masked
loop to handle the rest.  Previously we would fall back on using a
standard unmasked loop instead.

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree-vect-loop-manip.c (vect_gen_scalar_loop_niters): Replace
	vfm1 with a bound_epilog parameter.
	(vect_do_peeling): Update calls accordingly, and move the prologue
	call earlier in the function.  Treat the base bound_epilog as 0 for
	fully-masked loops and retain vf - 1 for other loops.  Add 1 to
	this base when peeling for gaps.
	* tree-vect-loop.c (vect_analyze_loop_2): Allow peeling for gaps
	with fully-masked loops.
	(vect_estimate_min_profitable_iters): Handle the single peeled
	iteration in that case.

gcc/testsuite/
	* gcc.target/aarch64/sve/struct_vect_18.c: Check the number
	of branches.
	* gcc.target/aarch64/sve/struct_vect_19.c: Likewise.
	* gcc.target/aarch64/sve/struct_vect_20.c: New test.
	* gcc.target/aarch64/sve/struct_vect_20_run.c: Likewise.
	* gcc.target/aarch64/sve/struct_vect_21.c: Likewise.
	* gcc.target/aarch64/sve/struct_vect_21_run.c: Likewise.
	* gcc.target/aarch64/sve/struct_vect_22.c: Likewise.
	* gcc.target/aarch64/sve/struct_vect_22_run.c: Likewise.
	* gcc.target/aarch64/sve/struct_vect_23.c: Likewise.
	* gcc.target/aarch64/sve/struct_vect_23_run.c: Likewise.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>

From-SVN: r256635
2018-01-13 18:00:41 +00:00
Richard Sandiford
4aa157e8d2 Allow single-element interleaving for non-power-of-2 strides
This allows LD3 to be used for isolated a[i * 3] accesses, in a similar
way to the current a[i * 2] and a[i * 4] for LD2 and LD4 respectively.
Given the problems with the cost model underestimating the cost of
elementwise accesses, the patch continues to reject the VMAT_ELEMENTWISE
cases that are currently rejected.
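
A hedged example of the kind of access that can now use LD3 (illustrative):

void
f (double *a, double *b, int n)
{
  for (int i = 0; i < n; ++i)
    b[i] = a[i * 3];	/* single-element interleaving, group size 3 */
}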

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree-vect-data-refs.c (vect_analyze_group_access_1): Allow
	single-element interleaving even if the size is not a power of 2.
	* tree-vect-stmts.c (get_load_store_type): Disallow elementwise
	accesses for single-element interleaving if the group size is
	not a power of 2.

gcc/testsuite/
	* gcc.target/aarch64/sve/struct_vect_18.c: New test.
	* gcc.target/aarch64/sve/struct_vect_18_run.c: Likewise.
	* gcc.target/aarch64/sve/struct_vect_19.c: Likewise.
	* gcc.target/aarch64/sve/struct_vect_19_run.c: Likewise.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>

From-SVN: r256634
2018-01-13 18:00:31 +00:00
Richard Sandiford
bb6c2b68d6 Add support for conditional reductions using SVE CLASTB
This patch uses SVE CLASTB to optimise conditional reductions.  It means
that we no longer need to maintain a separate index vector to record
the most recent valid value, and no longer need to worry about overflow
cases.
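
A hedged example of the kind of conditional reduction this targets
(illustrative; the new clastb_*.c tests are the authoritative examples):

int
last_match (int *a, int *b, int n)
{
  int last = -1;
  for (int i = 0; i < n; ++i)
    if (b[i] > 0)
      last = a[i];	/* keep the most recently selected value */
  return last;
}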

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* doc/md.texi (fold_extract_last_@var{m}): Document.
	* doc/sourcebuild.texi (vect_fold_extract_last): Likewise.
	* optabs.def (fold_extract_last_optab): New optab.
	* internal-fn.def (FOLD_EXTRACT_LAST): New internal function.
	* internal-fn.c (fold_extract_direct): New macro.
	(expand_fold_extract_optab_fn): Likewise.
	(direct_fold_extract_optab_supported_p): Likewise.
	* tree-vectorizer.h (EXTRACT_LAST_REDUCTION): New vect_reduction_type.
	* tree-vect-loop.c (vect_model_reduction_cost): Handle
	EXTRACT_LAST_REDUCTION.
	(get_initial_def_for_reduction): Do not create an initial vector
	for EXTRACT_LAST_REDUCTION reductions.
	(vectorizable_reduction): Leave the scalar phi in place for
	EXTRACT_LAST_REDUCTIONs.  Try using EXTRACT_LAST_REDUCTION
	ahead of INTEGER_INDUC_COND_REDUCTION.  Do not check for
	epilogue code for EXTRACT_LAST_REDUCTION and defer the
	transform phase to vectorizable_condition.
	* tree-vect-stmts.c (vect_finish_stmt_generation_1): New function,
	split out from...
	(vect_finish_stmt_generation): ...here.
	(vect_finish_replace_stmt): New function.
	(vectorizable_condition): Handle EXTRACT_LAST_REDUCTION.
	* config/aarch64/aarch64-sve.md (fold_extract_last_<mode>): New
	pattern.
	* config/aarch64/aarch64.md (UNSPEC_CLASTB): New unspec.

gcc/testsuite/
	* lib/target-supports.exp
	(check_effective_target_vect_fold_extract_last): New proc.
	* gcc.dg/vect/pr65947-1.c: Update dump messages.  Add markup
	for fold_extract_last.
	* gcc.dg/vect/pr65947-2.c: Likewise.
	* gcc.dg/vect/pr65947-3.c: Likewise.
	* gcc.dg/vect/pr65947-4.c: Likewise.
	* gcc.dg/vect/pr65947-5.c: Likewise.
	* gcc.dg/vect/pr65947-6.c: Likewise.
	* gcc.dg/vect/pr65947-9.c: Likewise.
	* gcc.dg/vect/pr65947-10.c: Likewise.
	* gcc.dg/vect/pr65947-12.c: Likewise.
	* gcc.dg/vect/pr65947-14.c: Likewise.
	* gcc.dg/vect/pr80631-1.c: Likewise.
	* gcc.target/aarch64/sve/clastb_1.c: New test.
	* gcc.target/aarch64/sve/clastb_1_run.c: Likewise.
	* gcc.target/aarch64/sve/clastb_2.c: Likewise.
	* gcc.target/aarch64/sve/clastb_2_run.c: Likewise.
	* gcc.target/aarch64/sve/clastb_3.c: Likewise.
	* gcc.target/aarch64/sve/clastb_3_run.c: Likewise.
	* gcc.target/aarch64/sve/clastb_4.c: Likewise.
	* gcc.target/aarch64/sve/clastb_4_run.c: Likewise.
	* gcc.target/aarch64/sve/clastb_5.c: Likewise.
	* gcc.target/aarch64/sve/clastb_5_run.c: Likewise.
	* gcc.target/aarch64/sve/clastb_6.c: Likewise.
	* gcc.target/aarch64/sve/clastb_6_run.c: Likewise.
	* gcc.target/aarch64/sve/clastb_7.c: Likewise.
	* gcc.target/aarch64/sve/clastb_7_run.c: Likewise.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>

From-SVN: r256633
2018-01-13 17:59:59 +00:00
Richard Sandiford
bfe1bb57ba Add support for vectorising live-out values using SVE LASTB
This patch uses the SVE LASTB instruction to optimise cases in which
a value produced by the final scalar iteration of a vectorised loop is
live outside the loop.  Previously this situation would stop us from
using a fully-masked loop.
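
The kind of loop affected looks like this (an illustrative sketch):
the final value of x is live after the loop, and LASTB can extract it
from the last active lane of the vector:

int
foo (int *a, int *b, int n)
{
  int x = 0;
  for (int i = 0; i < n; ++i)
    {
      x = a[i] + b[i];  /* x is live outside the loop */
      a[i] = x;
    }
  return x;
}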

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* doc/md.texi (extract_last_@var{m}): Document.
	* optabs.def (extract_last_optab): New optab.
	* internal-fn.def (EXTRACT_LAST): New internal function.
	* internal-fn.c (cond_unary_direct): New macro.
	(expand_cond_unary_optab_fn): Likewise.
	(direct_cond_unary_optab_supported_p): Likewise.
	* tree-vect-loop.c (vectorizable_live_operation): Allow fully-masked
	loops using EXTRACT_LAST.
	* config/aarch64/aarch64-sve.md (aarch64_sve_lastb<mode>): Rename to...
	(extract_last_<mode>): ...this optab.
	(vec_extract<mode><Vel>): Update accordingly.

gcc/testsuite/
	* gcc.target/aarch64/sve/live_1.c: New test.
	* gcc.target/aarch64/sve/live_1_run.c: Likewise.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>

From-SVN: r256632
2018-01-13 17:59:50 +00:00
Richard Sandiford
76a34e3f85 Add an empty_mask_is_expensive hook
This patch adds a hook to control whether we avoid executing masked
(predicated) stores when the mask is all false.  We don't want to do
that by default for SVE.
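
In source terms, the transformation guarded by the hook applies to loops
like the sketch below; when the hook returns true, optimize_mask_stores
branches around the vectorised store block whenever the whole mask
happens to be false:

void
f (int *restrict out, int *restrict a, int n)
{
  for (int i = 0; i < n; ++i)
    if (a[i] > 0)     /* becomes the store mask */
      out[i] = a[i];  /* masked store */
}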

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* target.def (empty_mask_is_expensive): New hook.
	* doc/tm.texi.in (TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE): New hook.
	* doc/tm.texi: Regenerate.
	* targhooks.h (default_empty_mask_is_expensive): Declare.
	* targhooks.c (default_empty_mask_is_expensive): New function.
	* tree-vectorizer.c (vectorize_loops): Only call optimize_mask_stores
	if the target says that empty masks are expensive.
	* config/aarch64/aarch64.c (aarch64_empty_mask_is_expensive):
	New function.
	(TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE): Redefine.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>

From-SVN: r256631
2018-01-13 17:59:40 +00:00
Richard Sandiford
535e7c114a Handle peeling for alignment with masking
This patch adds support for aligning vectors by using a partial
first iteration.  E.g. if the start pointer is 3 elements beyond
an aligned address, the first iteration will have a mask in which
the first three elements are false.

On SVE, the optimisation is only useful for vector-length-specific
code.  Vector-length-agnostic code doesn't try to align vectors
since the vector length might not be a power of 2.
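
As a concrete sketch, with -msve-vector-bits=256 and 32-bit elements
(eight elements per vector), a loop such as the one below whose start
pointer is 3 elements past a 32-byte boundary would get a first-iteration
mask of { 0, 0, 0, 1, 1, 1, 1, 1 }:

void
f (int *a, int n)
{
  for (int i = 0; i < n; ++i)
    a[i] = a[i] * 2;  /* a may start at an unaligned element */
}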

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree-vectorizer.h (_loop_vec_info::mask_skip_niters): New field.
	(LOOP_VINFO_MASK_SKIP_NITERS): New macro.
	(vect_use_loop_mask_for_alignment_p): New function.
	(vect_prepare_for_masked_peels, vect_gen_while_not): Declare.
	* tree-vect-loop-manip.c (vect_set_loop_masks_directly): Add an
	niters_skip argument.  Make sure that the first niters_skip elements
	of the first iteration are inactive.
	(vect_set_loop_condition_masked): Handle LOOP_VINFO_MASK_SKIP_NITERS.
	Update call to vect_set_loop_masks_directly.
	(get_misalign_in_elems): New function, split out from...
	(vect_gen_prolog_loop_niters): ...here.
	(vect_update_init_of_dr): Take a code argument that specifies whether
	the adjustment should be added or subtracted.
	(vect_update_inits_of_drs): Likewise.
	(vect_prepare_for_masked_peels): New function.
	(vect_do_peeling): Skip prologue peeling if we're using a mask
	instead.  Update call to vect_update_inits_of_drs.
	* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize
	mask_skip_niters.
	(vect_analyze_loop_2): Allow fully-masked loops with peeling for
	alignment.  Do not include the number of peeled iterations in
	the minimum threshold in that case.
	(vectorizable_induction): Adjust the start value down by
	LOOP_VINFO_MASK_SKIP_NITERS iterations.
	(vect_transform_loop): Call vect_prepare_for_masked_peels.
	Take the number of skipped iterations into account when calculating
	the loop bounds.
	* tree-vect-stmts.c (vect_gen_while_not): New function.

gcc/testsuite/
	* gcc.target/aarch64/sve/nopeel_1.c: New test.
	* gcc.target/aarch64/sve/peel_ind_1.c: Likewise.
	* gcc.target/aarch64/sve/peel_ind_1_run.c: Likewise.
	* gcc.target/aarch64/sve/peel_ind_2.c: Likewise.
	* gcc.target/aarch64/sve/peel_ind_2_run.c: Likewise.
	* gcc.target/aarch64/sve/peel_ind_3.c: Likewise.
	* gcc.target/aarch64/sve/peel_ind_3_run.c: Likewise.
	* gcc.target/aarch64/sve/peel_ind_4.c: Likewise.
	* gcc.target/aarch64/sve/peel_ind_4_run.c: Likewise.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>

From-SVN: r256630
2018-01-13 17:59:32 +00:00
Richard Sandiford
c2700f7466 Allow the number of iterations to be smaller than VF
Fully-masked loops can be profitable even if the iteration
count is smaller than the vectorisation factor.  In this case
we're effectively doing a complete unroll followed by SLP.

The documentation for min-vect-loop-bound says that the
default value is 0, but actually the default and minimum
were 1.  We need it to be 0 for this case since the parameter
counts a whole number of vector iterations.
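
For example (sketch only), with 256-bit SVE the loop below runs for only
three iterations, fewer than the four doubles per vector, yet it can
still be implemented as a single fully-masked vector iteration:

void
f (double *x, double *y)
{
  for (int i = 0; i < 3; ++i)  /* iteration count below the VF */
    x[i] += y[i];
}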

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* doc/sourcebuild.texi (vect_fully_masked): Document.
	* params.def (PARAM_MIN_VECT_LOOP_BOUND): Change minimum and
	default value to 0.
	* tree-vect-loop.c (vect_analyze_loop_costing): New function,
	split out from...
	(vect_analyze_loop_2): ...here. Don't check the vectorization
	factor against the number of loop iterations if the loop is
	fully-masked.

gcc/testsuite/
	* lib/target-supports.exp (check_effective_target_vect_fully_masked):
	New proc.
	* gcc.dg/vect/slp-3.c: Expect all loops to be vectorized if
	vect_fully_masked.
	* gcc.target/aarch64/sve/loop_add_4.c: New test.
	* gcc.target/aarch64/sve/loop_add_4_run.c: Likewise.
	* gcc.target/aarch64/sve/loop_add_5.c: Likewise.
	* gcc.target/aarch64/sve/loop_add_5_run.c: Likewise.
	* gcc.target/aarch64/sve/miniloop_1.c: Likewise.
	* gcc.target/aarch64/sve/miniloop_2.c: Likewise.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>

From-SVN: r256629
2018-01-13 17:59:23 +00:00
Richard Sandiford
8277ddf9ee Make ivopts handle calls to internal functions
ivopts previously treated pointer arguments to internal functions
like IFN_MASK_LOAD and IFN_MASK_STORE as normal gimple values.
This patch makes it treat them as addresses instead.  This makes
a significant difference to the code quality for SVE loops,
since we can then use loads and stores with scaled indices.
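
The affected uses come from conditional accesses such as this sketch,
where the pointer arguments to the IFN_MASK_LOAD and IFN_MASK_STORE
calls can now be expressed with a scaled index rather than a separately
incremented pointer:

void
f (int *restrict dst, int *restrict src, int *restrict cond, int n)
{
  for (int i = 0; i < n; ++i)
    if (cond[i])
      dst[i] = src[i];  /* masked load feeding a masked store */
}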

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree-ssa-loop-ivopts.c (USE_ADDRESS): Split into...
	(USE_REF_ADDRESS, USE_PTR_ADDRESS): ...these new use types.
	(dump_groups): Update accordingly.
	(iv_use::mem_type): New member variable.
	(address_p): New function.
	(record_use): Add a mem_type argument and initialize the new
	mem_type field.
	(record_group_use): Add a mem_type argument.  Use address_p.
	Remove obsolete null checks of base_object.  Update call to record_use.
	(find_interesting_uses_op): Update call to record_group_use.
	(find_interesting_uses_cond): Likewise.
	(find_interesting_uses_address): Likewise.
	(get_mem_type_for_internal_fn): New function.
	(find_address_like_use): Likewise.
	(find_interesting_uses_stmt): Try find_address_like_use before
	calling find_interesting_uses_op.
	(addr_offset_valid_p): Use the iv mem_type field as the type
	of the addressed memory.
	(add_autoinc_candidates): Likewise.
	(get_address_cost): Likewise.
	(split_small_address_groups_p): Use address_p.
	(split_address_groups): Likewise.
	(add_iv_candidate_for_use): Likewise.
	(autoinc_possible_for_pair): Likewise.
	(rewrite_groups): Likewise.
	(get_use_type): Check for USE_REF_ADDRESS instead of USE_ADDRESS.
	(determine_group_iv_cost): Update after split of USE_ADDRESS.
	(get_alias_ptr_type_for_ptr_address): New function.
	(rewrite_use_address): Rewrite address uses in calls that were
	identified by find_address_like_use.

gcc/testsuite/
	* gcc.dg/tree-ssa/scev-9.c: Expect REFERENCE ADDRESS
	instead of just ADDRESS.
	* gcc.dg/tree-ssa/scev-10.c: Likewise.
	* gcc.dg/tree-ssa/scev-11.c: Likewise.
	* gcc.dg/tree-ssa/scev-12.c: Likewise.
	* gcc.target/aarch64/sve/index_offset_1.c: New test.
	* gcc.target/aarch64/sve/index_offset_1_run.c: Likewise.
	* gcc.target/aarch64/sve/loop_add_2.c: Likewise.
	* gcc.target/aarch64/sve/loop_add_3.c: Likewise.
	* gcc.target/aarch64/sve/while_1.c: Check for indexed addressing modes.
	* gcc.target/aarch64/sve/while_2.c: Likewise.
	* gcc.target/aarch64/sve/while_3.c: Likewise.
	* gcc.target/aarch64/sve/while_4.c: Likewise.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>

From-SVN: r256628
2018-01-13 17:59:15 +00:00
Richard Sandiford
65dd134602 Allow ADDR_EXPRs of TARGET_MEM_REFs
This patch allows ADDR_EXPR <TARGET_MEM_REF ...>, which is useful
when calling internal functions that take pointers to memory that
is conditionally loaded or stored.  This is a prerequisite to the
following ivopts patch.

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* expr.c (expand_expr_addr_expr_1): Handle ADDR_EXPRs of
	TARGET_MEM_REFs.
	* gimple-expr.h (is_gimple_addressable): Likewise.
	* gimple-expr.c (is_gimple_address): Likewise.
	* internal-fn.c (expand_call_mem_ref): New function.
	(expand_mask_load_optab_fn): Use it.
	(expand_mask_store_optab_fn): Likewise.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>

From-SVN: r256627
2018-01-13 17:59:08 +00:00
Richard Sandiford
0972596e6d Add support for reductions in fully-masked loops
This patch removes the restriction that fully-masked loops cannot
have reductions.  The key thing here is to make sure that the
reduction accumulator doesn't include any values associated with
inactive lanes; the patch adds a bunch of conditional binary
operations for doing that.
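
A simple case is an integer sum like the sketch below; the vector add
becomes a conditional add (COND_ADD) under the loop mask, so inactive
lanes keep the previous accumulator value:

int
sum (int *a, int n)
{
  int res = 0;
  for (int i = 0; i < n; ++i)
    res += a[i];  /* reduction accumulator */
  return res;
}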

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* doc/md.texi (cond_add@var{mode}, cond_sub@var{mode})
	(cond_and@var{mode}, cond_ior@var{mode}, cond_xor@var{mode})
	(cond_smin@var{mode}, cond_smax@var{mode}, cond_umin@var{mode})
	(cond_umax@var{mode}): Document.
	* optabs.def (cond_add_optab, cond_sub_optab, cond_and_optab)
	(cond_ior_optab, cond_xor_optab, cond_smin_optab, cond_smax_optab)
	(cond_umin_optab, cond_umax_optab): New optabs.
	* internal-fn.def (COND_ADD, COND_SUB, COND_MIN, COND_MAX, COND_AND)
	(COND_IOR, COND_XOR): New internal functions.
	* internal-fn.h (get_conditional_internal_fn): Declare.
	* internal-fn.c (cond_binary_direct): New macro.
	(expand_cond_binary_optab_fn): Likewise.
	(direct_cond_binary_optab_supported_p): Likewise.
	(get_conditional_internal_fn): New function.
	* tree-vect-loop.c (vectorizable_reduction): Handle fully-masked loops.
	Cope with reduction statements that are vectorized as calls rather
	than assignments.
	* config/aarch64/aarch64-sve.md (cond_<optab><mode>): New insns.
	* config/aarch64/iterators.md (UNSPEC_COND_ADD, UNSPEC_COND_SUB)
	(UNSPEC_COND_SMAX, UNSPEC_COND_UMAX, UNSPEC_COND_SMIN)
	(UNSPEC_COND_UMIN, UNSPEC_COND_AND, UNSPEC_COND_ORR)
	(UNSPEC_COND_EOR): New unspecs.
	(optab): Add mappings for them.
	(SVE_COND_INT_OP, SVE_COND_FP_OP): New int iterators.
	(sve_int_op, sve_fp_op): New int attributes.

gcc/testsuite/
	* gcc.dg/vect/pr60482.c: Remove XFAIL for variable-length vectors.
	* gcc.target/aarch64/sve/reduc_1.c: Expect the loop operations
	to be predicated.
	* gcc.target/aarch64/sve/slp_5.c: Check for a fully-masked loop.
	* gcc.target/aarch64/sve/slp_7.c: Likewise.
	* gcc.target/aarch64/sve/reduc_5.c: New test.
	* gcc.target/aarch64/sve/slp_13.c: Likewise.
	* gcc.target/aarch64/sve/slp_13_run.c: Likewise.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>

From-SVN: r256626
2018-01-13 17:59:00 +00:00
Richard Sandiford
7cfb4d9359 Add support for fully-predicated loops
This patch adds support for using a single fully-predicated loop instead
of a vector loop and a scalar tail.  An SVE WHILELO instruction generates
the predicate for each iteration of the loop, given the current scalar
iv value and the loop bound.  This operation is wrapped up in a new internal
function called WHILE_ULT.  E.g.:

   WHILE_ULT (0, 3, { 0, 0, 0, 0 }) -> { 1, 1, 1, 0 }
   WHILE_ULT (UINT_MAX - 1, UINT_MAX, { 0, 0, 0, 0 }) -> { 1, 0, 0, 0 }

The third WHILE_ULT argument is needed to make the operation
unambiguous: without it, WHILE_ULT (0, 3) for one vector type would
seem equivalent to WHILE_ULT (0, 3) for another, even if the types have
different numbers of elements.
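
In source terms, a simple loop such as the sketch below can now be
vectorised as a single predicated loop, with WHILE_ULT computing the
mask from the current index and n on each iteration, instead of a
vector body followed by a scalar tail:

void
f (int *restrict dst, int *restrict src, int n)
{
  for (int i = 0; i < n; ++i)
    dst[i] = src[i] + 1;
}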

Note that the patch uses "mask" and "fully-masked" instead of
"predicate" and "fully-predicated", to follow existing GCC terminology.

This patch just handles the simple cases, punting for things like
reductions and live-out values.  Later patches remove most of these
restrictions.

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* optabs.def (while_ult_optab): New optab.
	* doc/md.texi (while_ult@var{m}@var{n}): Document.
	* internal-fn.def (WHILE_ULT): New internal function.
	* internal-fn.h (direct_internal_fn_supported_p): New override
	that takes two types as argument.
	* internal-fn.c (while_direct): New macro.
	(expand_while_optab_fn): New function.
	(convert_optab_supported_p): Likewise.
	(direct_while_optab_supported_p): New macro.
	* wide-int.h (wi::udiv_ceil): New function.
	* tree-vectorizer.h (rgroup_masks): New structure.
	(vec_loop_masks): New typedef.
	(_loop_vec_info): Add masks, mask_compare_type, can_fully_mask_p
	and fully_masked_p.
	(LOOP_VINFO_CAN_FULLY_MASK_P, LOOP_VINFO_FULLY_MASKED_P)
	(LOOP_VINFO_MASKS, LOOP_VINFO_MASK_COMPARE_TYPE): New macros.
	(vect_max_vf): New function.
	(slpeel_make_loop_iterate_ntimes): Delete.
	(vect_set_loop_condition, vect_get_loop_mask_type, vect_gen_while)
	(vect_halve_mask_nunits, vect_double_mask_nunits): Declare.
	(vect_record_loop_mask, vect_get_loop_mask): Likewise.
	* tree-vect-loop-manip.c: Include tree-ssa-loop-niter.h,
	internal-fn.h, stor-layout.h and optabs-query.h.
	(vect_set_loop_mask): New function.
	(add_preheader_seq): Likewise.
	(add_header_seq): Likewise.
	(interleave_supported_p): Likewise.
	(vect_maybe_permute_loop_masks): Likewise.
	(vect_set_loop_masks_directly): Likewise.
	(vect_set_loop_condition_masked): Likewise.
	(vect_set_loop_condition_unmasked): New function, split out from
	slpeel_make_loop_iterate_ntimes.
	(slpeel_make_loop_iterate_ntimes): Rename to...
	(vect_set_loop_condition): ...this.  Use vect_set_loop_condition_masked
	for fully-masked loops and vect_set_loop_condition_unmasked otherwise.
	(vect_do_peeling): Update call accordingly.
	(vect_gen_vector_loop_niters): Use VF as the step for fully-masked
	loops.
	* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize
	mask_compare_type, can_fully_mask_p and fully_masked_p.
	(release_vec_loop_masks): New function.
	(_loop_vec_info): Use it to free the loop masks.
	(can_produce_all_loop_masks_p): New function.
	(vect_get_max_nscalars_per_iter): Likewise.
	(vect_verify_full_masking): Likewise.
	(vect_analyze_loop_2): Save LOOP_VINFO_CAN_FULLY_MASK_P around
	retries, and free the mask rgroups before retrying.  Check loop-wide
	reasons for disallowing fully-masked loops.  Make the final decision
	about whether to use a fully-masked loop or not.
	(vect_estimate_min_profitable_iters): Do not assume that peeling
	for the number of iterations will be needed for fully-masked loops.
	(vectorizable_reduction): Disable fully-masked loops.
	(vectorizable_live_operation): Likewise.
	(vect_halve_mask_nunits): New function.
	(vect_double_mask_nunits): Likewise.
	(vect_record_loop_mask): Likewise.
	(vect_get_loop_mask): Likewise.
	(vect_transform_loop): Handle the case in which the final loop
	iteration might handle a partial vector.  Call vect_set_loop_condition
	instead of slpeel_make_loop_iterate_ntimes.
	* tree-vect-stmts.c: Include tree-ssa-loop-niter.h and gimple-fold.h.
	(check_load_store_masking): New function.
	(prepare_load_store_mask): Likewise.
	(vectorizable_store): Handle fully-masked loops.
	(vectorizable_load): Likewise.
	(supportable_widening_operation): Use vect_halve_mask_nunits for
	booleans.
	(supportable_narrowing_operation): Likewise vect_double_mask_nunits.
	(vect_gen_while): New function.
	* config/aarch64/aarch64.md (umax<mode>3): New expander.
	(aarch64_uqdec<mode>): New insn.

gcc/testsuite/
	* gcc.dg/tree-ssa/cunroll-10.c: Disable vectorization.
	* gcc.dg/tree-ssa/peel1.c: Likewise.
	* gcc.dg/vect/vect-load-lanes-peeling-1.c: Remove XFAIL for
	variable-length vectors.
	* gcc.target/aarch64/sve/vcond_6.c: XFAIL test for AND.
	* gcc.target/aarch64/sve/vec_bool_cmp_1.c: Expect BIC instead of NOT.
	* gcc.target/aarch64/sve/slp_1.c: Check for a fully-masked loop.
	* gcc.target/aarch64/sve/slp_2.c: Likewise.
	* gcc.target/aarch64/sve/slp_3.c: Likewise.
	* gcc.target/aarch64/sve/slp_4.c: Likewise.
	* gcc.target/aarch64/sve/slp_6.c: Likewise.
	* gcc.target/aarch64/sve/slp_8.c: New test.
	* gcc.target/aarch64/sve/slp_8_run.c: Likewise.
	* gcc.target/aarch64/sve/slp_9.c: Likewise.
	* gcc.target/aarch64/sve/slp_9_run.c: Likewise.
	* gcc.target/aarch64/sve/slp_10.c: Likewise.
	* gcc.target/aarch64/sve/slp_10_run.c: Likewise.
	* gcc.target/aarch64/sve/slp_11.c: Likewise.
	* gcc.target/aarch64/sve/slp_11_run.c: Likewise.
	* gcc.target/aarch64/sve/slp_12.c: Likewise.
	* gcc.target/aarch64/sve/slp_12_run.c: Likewise.
	* gcc.target/aarch64/sve/ld1r_2.c: Likewise.
	* gcc.target/aarch64/sve/ld1r_2_run.c: Likewise.
	* gcc.target/aarch64/sve/while_1.c: Likewise.
	* gcc.target/aarch64/sve/while_2.c: Likewise.
	* gcc.target/aarch64/sve/while_3.c: Likewise.
	* gcc.target/aarch64/sve/while_4.c: Likewise.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>

From-SVN: r256625
2018-01-13 17:58:52 +00:00
Richard Sandiford
898f07b045 Add support for bitwise reductions
This patch adds support for the SVE bitwise reduction instructions
(ANDV, ORV and EORV).  It's a fairly mechanical extension of existing
REDUC_* operators.
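
For example (sketch only), an inclusive-OR reduction such as the
following can now use ORV for the final reduction step:

unsigned int
reduc_or (unsigned int *a, int n)
{
  unsigned int res = 0;
  for (int i = 0; i < n; ++i)
    res |= a[i];  /* bitwise OR reduction */
  return res;
}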

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* optabs.def (reduc_and_scal_optab, reduc_ior_scal_optab)
	(reduc_xor_scal_optab): New optabs.
	* doc/md.texi (reduc_and_scal_@var{m}, reduc_ior_scal_@var{m})
	(reduc_xor_scal_@var{m}): Document.
	* doc/sourcebuild.texi (vect_logical_reduc): Likewise.
	* internal-fn.def (IFN_REDUC_AND, IFN_REDUC_IOR, IFN_REDUC_XOR): New
	internal functions.
	* fold-const-call.c (fold_const_call): Handle them.
	* tree-vect-loop.c (reduction_fn_for_scalar_code): Return the new
	internal functions for BIT_AND_EXPR, BIT_IOR_EXPR and BIT_XOR_EXPR.
	* config/aarch64/aarch64-sve.md (reduc_<bit_reduc>_scal_<mode>)
	(*reduc_<bit_reduc>_scal_<mode>): New patterns.
	* config/aarch64/iterators.md (UNSPEC_ANDV, UNSPEC_ORV)
	(UNSPEC_XORV): New unspecs.
	(optab): Add entries for them.
	(BITWISEV): New int iterator.
	(bit_reduc_op): New int attributes.

gcc/testsuite/
	* lib/target-supports.exp (check_effective_target_vect_logical_reduc):
	New proc.
	* gcc.dg/vect/vect-reduc-or_1.c: Also run for vect_logical_reduc
	and add an associated scan-dump test.  Prevent vectorization
	of the first two loops.
	* gcc.dg/vect/vect-reduc-or_2.c: Likewise.
	* gcc.target/aarch64/sve/reduc_1.c: Add AND, IOR and XOR reductions.
	* gcc.target/aarch64/sve/reduc_2.c: Likewise.
	* gcc.target/aarch64/sve/reduc_1_run.c: Likewise.
	(INIT_VECTOR): Tweak initial value so that some bits are always set.
	* gcc.target/aarch64/sve/reduc_2_run.c: Likewise.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>

From-SVN: r256624
2018-01-13 17:58:42 +00:00
Richard Sandiford
f1739b4829 SLP reductions with variable-length vectors
Two things stopped us using SLP reductions with variable-length vectors:

(1) We didn't have a way of constructing the initial vector.
    This patch does it by creating a vector full of the neutral
    identity value and then using a shift-and-insert function
    to insert any non-identity inputs into the low-numbered elements.
    (The non-identity values are needed for double reductions.)
    Alternatively, for unchained MIN/MAX reductions that have no neutral
    value, we instead use the same duplicate-and-interleave approach as
    for SLP constant and external definitions (added by a previous
    patch).

(2) The epilogue for constant-length vectors would extract the vector
    elements associated with each SLP statement and do scalar arithmetic
    on these individual elements.  For variable-length vectors, the patch
    instead creates a reduction vector for each SLP statement, replacing
    the elements for other SLP statements with the identity value.
    It then uses a hardware reduction instruction on each vector.
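
A typical SLP reduction has the shape below (an illustrative sketch).
For addition the neutral value is 0, so the initial vector can be built
by shifting { r0, r1 } into a vector of zeros, and the epilogue creates
one reduction vector per accumulator:

void
f (int *restrict a, int *restrict res, int n)
{
  int r0 = res[0], r1 = res[1];
  for (int i = 0; i < n; ++i)
    {
      r0 += a[i * 2];      /* accumulator for even elements */
      r1 += a[i * 2 + 1];  /* accumulator for odd elements */
    }
  res[0] = r0;
  res[1] = r1;
}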

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* doc/md.texi (vec_shl_insert_@var{m}): New optab.
	* internal-fn.def (VEC_SHL_INSERT): New internal function.
	* optabs.def (vec_shl_insert_optab): New optab.
	* tree-vectorizer.h (can_duplicate_and_interleave_p): Declare.
	(duplicate_and_interleave): Likewise.
	* tree-vect-loop.c: Include internal-fn.h.
	(neutral_op_for_slp_reduction): New function, split out from
	get_initial_defs_for_reduction.
	(get_initial_def_for_reduction): Handle option 2 for variable-length
	vectors by loading the neutral value into a vector and then shifting
	the initial value into element 0.
	(get_initial_defs_for_reduction): Replace the code argument with
	the neutral value calculated by neutral_op_for_slp_reduction.
	Use gimple_build_vector for constant-length vectors.
	Use IFN_VEC_SHL_INSERT for variable-length vectors if all
	but the first group_size elements have a neutral value.
	Use duplicate_and_interleave otherwise.
	(vect_create_epilog_for_reduction): Take a neutral_op parameter.
	Update call to get_initial_defs_for_reduction.  Handle SLP
	reductions for variable-length vectors by creating one vector
	result for each scalar result, with the elements associated
	with other scalar results stubbed out with the neutral value.
	(vectorizable_reduction): Call neutral_op_for_slp_reduction.
	Require IFN_VEC_SHL_INSERT for double reductions on
	variable-length vectors, or SLP reductions that have
	a neutral value.  Require can_duplicate_and_interleave_p
	support for variable-length unchained SLP reductions if there
	is no neutral value, such as for MIN/MAX reductions.  Also require
	the number of vector elements to be a multiple of the number of
	SLP statements when doing variable-length unchained SLP reductions.
	Update call to vect_create_epilog_for_reduction.
	* tree-vect-slp.c (can_duplicate_and_interleave_p): Make public
	and remove initial values.
	(duplicate_and_interleave): Make public.
	* config/aarch64/aarch64.md (UNSPEC_INSR): New unspec.
	* config/aarch64/aarch64-sve.md (vec_shl_insert_<mode>): New insn.

gcc/testsuite/
	* gcc.dg/vect/pr37027.c: Remove XFAIL for variable-length vectors.
	* gcc.dg/vect/pr67790.c: Likewise.
	* gcc.dg/vect/slp-reduc-1.c: Likewise.
	* gcc.dg/vect/slp-reduc-2.c: Likewise.
	* gcc.dg/vect/slp-reduc-3.c: Likewise.
	* gcc.dg/vect/slp-reduc-5.c: Likewise.
	* gcc.target/aarch64/sve/slp_5.c: New test.
	* gcc.target/aarch64/sve/slp_5_run.c: Likewise.
	* gcc.target/aarch64/sve/slp_6.c: Likewise.
	* gcc.target/aarch64/sve/slp_6_run.c: Likewise.
	* gcc.target/aarch64/sve/slp_7.c: Likewise.
	* gcc.target/aarch64/sve/slp_7_run.c: Likewise.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>

From-SVN: r256623
2018-01-13 17:58:33 +00:00
Richard Sandiford
018b2744fc Handle more SLP constant and extern definitions for variable VF
This patch adds support for vectorising SLP definitions that are
constant or external (i.e. from outside the loop) when the vectorisation
factor isn't known at compile time.  It can only handle cases where the
number of SLP statements is a power of 2.
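
A simple example (sketch only) is a loop whose SLP group uses different
constants for each statement; for variable VF the { 1, 2, 1, 2, ... }
vector is built by duplicating each constant across a vector and
interleaving the results:

void
f (int *a, int n)
{
  for (int i = 0; i < n; ++i)
    {
      a[i * 2] += 1;      /* constant operand for even elements */
      a[i * 2 + 1] += 2;  /* constant operand for odd elements */
    }
}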

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree-vect-slp.c: Include gimple-fold.h and internal-fn.h
	(can_duplicate_and_interleave_p): New function.
	(vect_get_and_check_slp_defs): Take the vector of statements
	rather than just the current one.  Remove excess parentheses.
	Restrict rejection of vect_constant_def and vect_external_def
	for variable-length vectors to boolean types, or types for which
	can_duplicate_and_interleave_p is false.
	(vect_build_slp_tree_2): Update call to vect_get_and_check_slp_defs.
	(duplicate_and_interleave): New function.
	(vect_get_constant_vectors): Use gimple_build_vector for
	constant-length vectors and suitable variable-length constant
	vectors.  Use duplicate_and_interleave for other variable-length
	vectors.  Don't defer the update when inserting new statements.

gcc/testsuite/
	* gcc.dg/vect/no-scevccp-slp-30.c: Don't XFAIL for vect_variable_length
	&& vect_load_lanes.
	* gcc.dg/vect/slp-1.c: Likewise.
	* gcc.dg/vect/slp-10.c: Likewise.
	* gcc.dg/vect/slp-12b.c: Likewise.
	* gcc.dg/vect/slp-12c.c: Likewise.
	* gcc.dg/vect/slp-17.c: Likewise.
	* gcc.dg/vect/slp-19b.c: Likewise.
	* gcc.dg/vect/slp-20.c: Likewise.
	* gcc.dg/vect/slp-21.c: Likewise.
	* gcc.dg/vect/slp-22.c: Likewise.
	* gcc.dg/vect/slp-23.c: Likewise.
	* gcc.dg/vect/slp-24-big-array.c: Likewise.
	* gcc.dg/vect/slp-24.c: Likewise.
	* gcc.dg/vect/slp-28.c: Likewise.
	* gcc.dg/vect/slp-39.c: Likewise.
	* gcc.dg/vect/slp-6.c: Likewise.
	* gcc.dg/vect/slp-7.c: Likewise.
	* gcc.dg/vect/slp-cond-1.c: Likewise.
	* gcc.dg/vect/slp-cond-2-big-array.c: Likewise.
	* gcc.dg/vect/slp-cond-2.c: Likewise.
	* gcc.dg/vect/slp-multitypes-1.c: Likewise.
	* gcc.dg/vect/slp-multitypes-8.c: Likewise.
	* gcc.dg/vect/slp-multitypes-9.c: Likewise.
	* gcc.dg/vect/slp-multitypes-10.c: Likewise.
	* gcc.dg/vect/slp-multitypes-12.c: Likewise.
	* gcc.dg/vect/slp-perm-6.c: Likewise.
	* gcc.dg/vect/slp-widen-mult-half.c: Likewise.
	* gcc.dg/vect/vect-live-slp-1.c: Likewise.
	* gcc.dg/vect/vect-live-slp-2.c: Likewise.
	* gcc.dg/vect/pr33953.c: Don't XFAIL for vect_variable_length.
	* gcc.dg/vect/slp-12a.c: Likewise.
	* gcc.dg/vect/slp-14.c: Likewise.
	* gcc.dg/vect/slp-15.c: Likewise.
	* gcc.dg/vect/slp-multitypes-2.c: Likewise.
	* gcc.dg/vect/slp-multitypes-4.c: Likewise.
	* gcc.dg/vect/slp-multitypes-5.c: Likewise.
	* gcc.target/aarch64/sve/slp_1.c: New test.
	* gcc.target/aarch64/sve/slp_1_run.c: Likewise.
	* gcc.target/aarch64/sve/slp_2.c: Likewise.
	* gcc.target/aarch64/sve/slp_2_run.c: Likewise.
	* gcc.target/aarch64/sve/slp_3.c: Likewise.
	* gcc.target/aarch64/sve/slp_3_run.c: Likewise.
	* gcc.target/aarch64/sve/slp_4.c: Likewise.
	* gcc.target/aarch64/sve/slp_4_run.c: Likewise.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>

From-SVN: r256622
2018-01-13 17:58:14 +00:00
Richard Sandiford
3ea518f6f6 Protect against min_profitable_iters going negative
We had:

      if (vec_outside_cost <= 0)
        min_profitable_iters = 0;
      else
        {
	  min_profitable_iters = ((vec_outside_cost - scalar_outside_cost)
				  * assumed_vf
				  - vec_inside_cost * peel_iters_prologue
				  - vec_inside_cost * peel_iters_epilogue)
				 / ((scalar_single_iter_cost * assumed_vf)
				    - vec_inside_cost);

which can lead to negative min_profitable_iters when the *_outside_costs
are the same and peel_iters_epilogue is nonzero (e.g. if we're peeling
for gaps).
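
For example, with hypothetical costs vec_outside_cost =
scalar_outside_cost = 10, vec_inside_cost = 2, scalar_single_iter_cost
= 1, assumed_vf = 4, peel_iters_prologue = 0 and peel_iters_epilogue = 1:

  ((10 - 10) * 4 - 2 * 0 - 2 * 1) / (1 * 4 - 2) = -2 / 2 = -1

so min_profitable_iters would end up negative without a guard.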

This is tested as part of the patch that adds support for fully-predicated
loops.

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree-vect-loop.c (vect_estimate_min_profitable_iters): Make sure
	min_profitable_iters doesn't go negative.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>

From-SVN: r256621
2018-01-13 17:58:06 +00:00
Richard Sandiford
7e11fc7f5c Add support for masked load/store_lanes
This patch adds support for vectorising groups of IFN_MASK_LOADs
and IFN_MASK_STOREs using conditional load/store-lanes instructions.
This requires new internal functions to represent the result
(IFN_MASK_{LOAD,STORE}_LANES), as well as associated optabs.

The normal IFN_{LOAD,STORE}_LANES functions are const operations
that logically just perform the permute: the load or store is
encoded as a MEM operand to the call statement.  In contrast,
the IFN_MASK_{LOAD,STORE}_LANES functions use the same kind of
interface as IFN_MASK_{LOAD,STORE}, since the memory is only
conditionally accessed.
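
The kind of access involved is a conditional grouped access, as in this
sketch (similar in spirit to the new mask_struct_* tests): the loads
from "in" become a single IFN_MASK_LOAD_LANES and the stores to "out"
a single IFN_MASK_STORE_LANES:

void
f (int *restrict cond, int *restrict in, int *restrict out, int n)
{
  for (int i = 0; i < n; ++i)
    if (cond[i])
      {
        out[i * 2] = in[i * 2];  /* group of 2, accessed under one mask */
        out[i * 2 + 1] = in[i * 2 + 1];
      }
}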

The AArch64 patterns were added as part of the main LD[234]/ST[234] patch.

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* doc/md.texi (vec_mask_load_lanes@var{m}@var{n}): Document.
	(vec_mask_store_lanes@var{m}@var{n}): Likewise.
	* optabs.def (vec_mask_load_lanes_optab): New optab.
	(vec_mask_store_lanes_optab): Likewise.
	* internal-fn.def (MASK_LOAD_LANES): New internal function.
	(MASK_STORE_LANES): Likewise.
	* internal-fn.c (mask_load_lanes_direct): New macro.
	(mask_store_lanes_direct): Likewise.
	(expand_mask_load_optab_fn): Handle masked operations.
	(expand_mask_load_lanes_optab_fn): New macro.
	(expand_mask_store_optab_fn): Handle masked operations.
	(expand_mask_store_lanes_optab_fn): New macro.
	(direct_mask_load_lanes_optab_supported_p): Likewise.
	(direct_mask_store_lanes_optab_supported_p): Likewise.
	* tree-vectorizer.h (vect_store_lanes_supported): Take a masked_p
	parameter.
	(vect_load_lanes_supported): Likewise.
	* tree-vect-data-refs.c (strip_conversion): New function.
	(can_group_stmts_p): Likewise.
	(vect_analyze_data_ref_accesses): Use it instead of checking
	for a pair of assignments.
	(vect_store_lanes_supported): Take a masked_p parameter.
	(vect_load_lanes_supported): Likewise.
	* tree-vect-loop.c (vect_analyze_loop_2): Update calls to
	vect_store_lanes_supported and vect_load_lanes_supported.
	* tree-vect-slp.c (vect_analyze_slp_instance): Likewise.
	* tree-vect-stmts.c (get_group_load_store_type): Take a masked_p
	parameter.  Don't allow gaps for masked accesses.
	Use vect_get_store_rhs.  Update calls to vect_store_lanes_supported
	and vect_load_lanes_supported.
	(get_load_store_type): Take a masked_p parameter and update
	call to get_group_load_store_type.
	(vectorizable_store): Update call to get_load_store_type.
	Handle IFN_MASK_STORE_LANES.
	(vectorizable_load): Update call to get_load_store_type.
	Handle IFN_MASK_LOAD_LANES.

gcc/testsuite/
	* gcc.dg/vect/vect-ooo-group-1.c: New test.
	* gcc.target/aarch64/sve/mask_struct_load_1.c: Likewise.
	* gcc.target/aarch64/sve/mask_struct_load_1_run.c: Likewise.
	* gcc.target/aarch64/sve/mask_struct_load_2.c: Likewise.
	* gcc.target/aarch64/sve/mask_struct_load_2_run.c: Likewise.
	* gcc.target/aarch64/sve/mask_struct_load_3.c: Likewise.
	* gcc.target/aarch64/sve/mask_struct_load_3_run.c: Likewise.
	* gcc.target/aarch64/sve/mask_struct_load_4.c: Likewise.
	* gcc.target/aarch64/sve/mask_struct_load_5.c: Likewise.
	* gcc.target/aarch64/sve/mask_struct_load_6.c: Likewise.
	* gcc.target/aarch64/sve/mask_struct_load_7.c: Likewise.
	* gcc.target/aarch64/sve/mask_struct_load_8.c: Likewise.
	* gcc.target/aarch64/sve/mask_struct_store_1.c: Likewise.
	* gcc.target/aarch64/sve/mask_struct_store_1_run.c: Likewise.
	* gcc.target/aarch64/sve/mask_struct_store_2.c: Likewise.
	* gcc.target/aarch64/sve/mask_struct_store_2_run.c: Likewise.
	* gcc.target/aarch64/sve/mask_struct_store_3.c: Likewise.
	* gcc.target/aarch64/sve/mask_struct_store_3_run.c: Likewise.
	* gcc.target/aarch64/sve/mask_struct_store_4.c: Likewise.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>

From-SVN: r256620
2018-01-13 17:57:57 +00:00
Richard Sandiford
abc8eb9a45 [AArch64] Tests for SVE structure modes
This patch adds tests for the SVE structure mode move patterns
and for LD[234] and ST[234] vectorisation.

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/testsuite/
	* gcc.target/aarch64/sve/struct_move_1.c: New test.
	* gcc.target/aarch64/sve/struct_move_2.c: Likewise.
	* gcc.target/aarch64/sve/struct_move_3.c: Likewise.
	* gcc.target/aarch64/sve/struct_move_4.c: Likewise.
	* gcc.target/aarch64/sve/struct_move_5.c: Likewise.
	* gcc.target/aarch64/sve/struct_move_6.c: Likewise.
	* gcc.target/aarch64/sve/struct_vect_1.c: Likewise.
	* gcc.target/aarch64/sve/struct_vect_1_run.c: Likewise.
	* gcc.target/aarch64/sve/struct_vect_2.c: Likewise.
	* gcc.target/aarch64/sve/struct_vect_2_run.c: Likewise.
	* gcc.target/aarch64/sve/struct_vect_3.c: Likewise.
	* gcc.target/aarch64/sve/struct_vect_3_run.c: Likewise.
	* gcc.target/aarch64/sve/struct_vect_4.c: Likewise.
	* gcc.target/aarch64/sve/struct_vect_4_run.c: Likewise.
	* gcc.target/aarch64/sve/struct_vect_5.c: Likewise.
	* gcc.target/aarch64/sve/struct_vect_5_run.c: Likewise.
	* gcc.target/aarch64/sve/struct_vect_6.c: Likewise.
	* gcc.target/aarch64/sve/struct_vect_6_run.c: Likewise.
	* gcc.target/aarch64/sve/struct_vect_7.c: Likewise.
	* gcc.target/aarch64/sve/struct_vect_7_run.c: Likewise.
	* gcc.target/aarch64/sve/struct_vect_8.c: Likewise.
	* gcc.target/aarch64/sve/struct_vect_8_run.c: Likewise.
	* gcc.target/aarch64/sve/struct_vect_9.c: Likewise.
	* gcc.target/aarch64/sve/struct_vect_9_run.c: Likewise.
	* gcc.target/aarch64/sve/struct_vect_10.c: Likewise.
	* gcc.target/aarch64/sve/struct_vect_10_run.c: Likewise.
	* gcc.target/aarch64/sve/struct_vect_11.c: Likewise.
	* gcc.target/aarch64/sve/struct_vect_11_run.c: Likewise.
	* gcc.target/aarch64/sve/struct_vect_12.c: Likewise.
	* gcc.target/aarch64/sve/struct_vect_12_run.c: Likewise.
	* gcc.target/aarch64/sve/struct_vect_13.c: Likewise.
	* gcc.target/aarch64/sve/struct_vect_13_run.c: Likewise.
	* gcc.target/aarch64/sve/struct_vect_14.c: Likewise.
	* gcc.target/aarch64/sve/struct_vect_15.c: Likewise.
	* gcc.target/aarch64/sve/struct_vect_16.c: Likewise.
	* gcc.target/aarch64/sve/struct_vect_17.c: Likewise.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>

From-SVN: r256619
2018-01-13 17:57:47 +00:00
Richard Sandiford
9f4cbab84d [AArch64] SVE load/store_lanes support
This patch adds support for SVE LD[234], ST[234] and associated
structure modes.  Unlike Advanced SIMD, these modes are extra-long
vector modes instead of integer modes.

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* config/aarch64/aarch64-modes.def: Define x2, x3 and x4 vector
	modes for SVE.
	* config/aarch64/aarch64-protos.h
	(aarch64_sve_struct_memory_operand_p): Declare.
	* config/aarch64/iterators.md (SVE_STRUCT): New mode iterator.
	(vector_count, insn_length, VSINGLE, vsingle): New mode attributes.
	(VPRED, vpred): Handle SVE structure modes.
	* config/aarch64/constraints.md (Utx): New constraint.
	* config/aarch64/predicates.md (aarch64_sve_struct_memory_operand)
	(aarch64_sve_struct_nonimmediate_operand): New predicates.
	* config/aarch64/aarch64.md (UNSPEC_LDN, UNSPEC_STN): New unspecs.
	* config/aarch64/aarch64-sve.md (mov<mode>, *aarch64_sve_mov<mode>_le)
	(*aarch64_sve_mov<mode>_be, pred_mov<mode>): New patterns for
	structure modes.  Split into pieces after RA.
	(vec_load_lanes<mode><vsingle>, vec_mask_load_lanes<mode><vsingle>)
	(vec_store_lanes<mode><vsingle>, vec_mask_store_lanes<mode><vsingle>):
	New patterns.
	* config/aarch64/aarch64.c (aarch64_classify_vector_mode): Handle
	SVE structure modes.
	(aarch64_classify_address): Likewise.
	(sizetochar): Move earlier in file.
	(aarch64_print_operand): Handle SVE register lists.
	(aarch64_array_mode): New function.
	(aarch64_sve_struct_memory_operand_p): Likewise.
	(TARGET_ARRAY_MODE): Redefine.

gcc/testsuite/
	* lib/target-supports.exp (check_effective_target_vect_load_lanes):
	Return true for SVE too.
	* g++.dg/vect/pr36648.cc: XFAIL for variable-length vectors
	if load/store lanes are supported.
	* gcc.dg/vect/slp-10.c: Likewise.
	* gcc.dg/vect/slp-12c.c: Likewise.
	* gcc.dg/vect/slp-17.c: Likewise.
	* gcc.dg/vect/slp-33.c: Likewise.
	* gcc.dg/vect/slp-6.c: Likewise.
	* gcc.dg/vect/slp-cond-1.c: Likewise.
	* gcc.dg/vect/slp-multitypes-11-big-array.c: Likewise.
	* gcc.dg/vect/slp-multitypes-11.c: Likewise.
	* gcc.dg/vect/slp-multitypes-12.c: Likewise.
	* gcc.dg/vect/slp-perm-5.c: Remove XFAIL for variable-length SVE.
	* gcc.dg/vect/slp-perm-6.c: Likewise.
	* gcc.dg/vect/slp-perm-9.c: Likewise.
	* gcc.dg/vect/slp-reduc-6.c: Remove XFAIL for variable-length vectors.
	* gcc.dg/vect/vect-load-lanes-peeling-1.c: Expect an epilogue loop
	for variable-length vectors.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>

From-SVN: r256618
2018-01-13 17:57:36 +00:00
Richard Sandiford
695da53448 Give the target more control over ARRAY_TYPE modes
So far we've used integer modes for LD[234] and ST[234] arrays.
That doesn't scale well to SVE, since the sizes aren't fixed at
compile time (and even if they were, we wouldn't want integers
to be so wide).

This patch lets the target use double-, triple- and quadruple-length
vectors instead.

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* target.def (array_mode): New target hook.
	* doc/tm.texi.in (TARGET_ARRAY_MODE): New hook.
	* doc/tm.texi: Regenerate.
	* hooks.h (hook_optmode_mode_uhwi_none): Declare.
	* hooks.c (hook_optmode_mode_uhwi_none): New function.
	* tree-vect-data-refs.c (vect_lanes_optab_supported_p): Use
	targetm.array_mode.
	* stor-layout.c (mode_for_array): Likewise.  Support polynomial
	type sizes.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>

From-SVN: r256617
2018-01-13 17:57:25 +00:00
Richard Sandiford
779fed5fdb Fix folding of vector mask EQ/NE expressions
fold_binary_loc assumed that if the type of the result wasn't a vector,
the operands wouldn't be either.  This isn't necessarily true for
EQ_EXPR and NE_EXPR of vector masks, which can return a single scalar
for the mask as a whole.

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* fold-const.c (fold_binary_loc): Check the argument types
	rather than the result type when testing for a vector operation.

gcc/testsuite/
	* gcc.target/aarch64/sve/vec_bool_cmp_1.c: New test.
	* gcc.target/aarch64/sve/vec_bool_cmp_1_run.c: Likewise.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>

From-SVN: r256616
2018-01-13 17:57:17 +00:00
Richard Sandiford
dbc3af4fc6 SVE unwinding
This patch adds support for unwinding frames that use the SVE
pseudo VG register.  We want this register to act like a normal
register if the CFI explicitly sets it, but want to provide a
default value otherwise.  Computing the default value requires
an SVE target, so we only want to compute it on demand.

aarch64_vg uses a hard-coded .inst in order to avoid a build
dependency on binutils 2.28 or later.

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	* doc/tm.texi.in (DWARF_LAZY_REGISTER_VALUE): Document.
	* doc/tm.texi: Regenerate.

libgcc/
	* config/aarch64/value-unwind.h (aarch64_vg): New function.
	(DWARF_LAZY_REGISTER_VALUE): Define.
	* unwind-dw2.c (_Unwind_GetGR): Use DWARF_LAZY_REGISTER_VALUE
	to provide a fallback register value.

gcc/testsuite/
	* g++.target/aarch64/sve/aarch64-sve.exp: New harness.
	* g++.target/aarch64/sve/catch_1.C: New test.
	* g++.target/aarch64/sve/catch_2.C: Likewise.
	* g++.target/aarch64/sve/catch_3.C: Likewise.
	* g++.target/aarch64/sve/catch_4.C: Likewise.
	* g++.target/aarch64/sve/catch_5.C: Likewise.
	* g++.target/aarch64/sve/catch_6.C: Likewise.

Reviewed-by: James Greenhalgh <james.greenhalgh@arm.com>

From-SVN: r256615
2018-01-13 17:56:52 +00:00
Richard Sandiford
825b856cd0 [AArch64] SVE tests
This patch adds gcc.target/aarch64 tests for SVE, and forces some
existing Advanced SIMD tests to use -march=armv8-a.

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/testsuite/
	* lib/target-supports.exp (check_effective_target_aarch64_asm_sve_ok):
	New proc.
	* gcc.target/aarch64/bic_imm_1.c: Use #pragma GCC target "+nosve".
	* gcc.target/aarch64/fmaxmin.c: Likewise.
	* gcc.target/aarch64/fmul_fcvt_2.c: Likewise.
	* gcc.target/aarch64/orr_imm_1.c: Likewise.
	* gcc.target/aarch64/pr62178.c: Likewise.
	* gcc.target/aarch64/pr71727-2.c: Likewise.
	* gcc.target/aarch64/saddw-1.c: Likewise.
	* gcc.target/aarch64/saddw-2.c: Likewise.
	* gcc.target/aarch64/uaddw-1.c: Likewise.
	* gcc.target/aarch64/uaddw-2.c: Likewise.
	* gcc.target/aarch64/uaddw-3.c: Likewise.
	* gcc.target/aarch64/vect-add-sub-cond.c: Likewise.
	* gcc.target/aarch64/vect-compile.c: Likewise.
	* gcc.target/aarch64/vect-faddv-compile.c: Likewise.
	* gcc.target/aarch64/vect-fcm-eq-d.c: Likewise.
	* gcc.target/aarch64/vect-fcm-eq-f.c: Likewise.
	* gcc.target/aarch64/vect-fcm-ge-d.c: Likewise.
	* gcc.target/aarch64/vect-fcm-ge-f.c: Likewise.
	* gcc.target/aarch64/vect-fcm-gt-d.c: Likewise.
	* gcc.target/aarch64/vect-fcm-gt-f.c: Likewise.
	* gcc.target/aarch64/vect-fmax-fmin-compile.c: Likewise.
	* gcc.target/aarch64/vect-fmaxv-fminv-compile.c: Likewise.
	* gcc.target/aarch64/vect-fmovd-zero.c: Likewise.
	* gcc.target/aarch64/vect-fmovd.c: Likewise.
	* gcc.target/aarch64/vect-fmovf-zero.c: Likewise.
	* gcc.target/aarch64/vect-fmovf.c: Likewise.
	* gcc.target/aarch64/vect-fp-compile.c: Likewise.
	* gcc.target/aarch64/vect-ld1r-compile-fp.c: Likewise.
	* gcc.target/aarch64/vect-ld1r-compile.c: Likewise.
	* gcc.target/aarch64/vect-movi.c: Likewise.
	* gcc.target/aarch64/vect-mull-compile.c: Likewise.
	* gcc.target/aarch64/vect-reduc-or_1.c: Likewise.
	* gcc.target/aarch64/vect-vaddv.c: Likewise.
	* gcc.target/aarch64/vect_saddl_1.c: Likewise.
	* gcc.target/aarch64/vect_smlal_1.c: Likewise.
	* gcc.target/aarch64/vector_initialization_nostack.c: XFAIL for
	fixed-length SVE.
	* gcc.target/aarch64/sve/aarch64-sve.exp: New file.
	* gcc.target/aarch64/sve/arith_1.c: New test.
	* gcc.target/aarch64/sve/const_pred_1.C: Likewise.
	* gcc.target/aarch64/sve/const_pred_2.C: Likewise.
	* gcc.target/aarch64/sve/const_pred_3.C: Likewise.
	* gcc.target/aarch64/sve/const_pred_4.C: Likewise.
	* gcc.target/aarch64/sve/cvtf_signed_1.c: Likewise.
	* gcc.target/aarch64/sve/cvtf_signed_1_run.c: Likewise.
	* gcc.target/aarch64/sve/cvtf_unsigned_1.c: Likewise.
	* gcc.target/aarch64/sve/cvtf_unsigned_1_run.c: Likewise.
	* gcc.target/aarch64/sve/dup_imm_1.c: Likewise.
	* gcc.target/aarch64/sve/dup_imm_1_run.c: Likewise.
	* gcc.target/aarch64/sve/dup_lane_1.c: Likewise.
	* gcc.target/aarch64/sve/ext_1.c: Likewise.
	* gcc.target/aarch64/sve/ext_2.c: Likewise.
	* gcc.target/aarch64/sve/extract_1.c: Likewise.
	* gcc.target/aarch64/sve/extract_2.c: Likewise.
	* gcc.target/aarch64/sve/extract_3.c: Likewise.
	* gcc.target/aarch64/sve/extract_4.c: Likewise.
	* gcc.target/aarch64/sve/fabs_1.c: Likewise.
	* gcc.target/aarch64/sve/fcvtz_signed_1.c: Likewise.
	* gcc.target/aarch64/sve/fcvtz_signed_1_run.c: Likewise.
	* gcc.target/aarch64/sve/fcvtz_unsigned_1.c: Likewise.
	* gcc.target/aarch64/sve/fcvtz_unsigned_1_run.c: Likewise.
	* gcc.target/aarch64/sve/fdiv_1.c: Likewise.
	* gcc.target/aarch64/sve/fdup_1.c: Likewise.
	* gcc.target/aarch64/sve/fdup_1_run.c: Likewise.
	* gcc.target/aarch64/sve/fmad_1.c: Likewise.
	* gcc.target/aarch64/sve/fmla_1.c: Likewise.
	* gcc.target/aarch64/sve/fmls_1.c: Likewise.
	* gcc.target/aarch64/sve/fmsb_1.c: Likewise.
	* gcc.target/aarch64/sve/fmul_1.c: Likewise.
	* gcc.target/aarch64/sve/fneg_1.c: Likewise.
	* gcc.target/aarch64/sve/fnmad_1.c: Likewise.
	* gcc.target/aarch64/sve/fnmla_1.c: Likewise.
	* gcc.target/aarch64/sve/fnmls_1.c: Likewise.
	* gcc.target/aarch64/sve/fnmsb_1.c: Likewise.
	* gcc.target/aarch64/sve/fp_arith_1.c: Likewise.
	* gcc.target/aarch64/sve/frinta_1.c: Likewise.
	* gcc.target/aarch64/sve/frinti_1.c: Likewise.
	* gcc.target/aarch64/sve/frintm_1.c: Likewise.
	* gcc.target/aarch64/sve/frintp_1.c: Likewise.
	* gcc.target/aarch64/sve/frintx_1.c: Likewise.
	* gcc.target/aarch64/sve/frintz_1.c: Likewise.
	* gcc.target/aarch64/sve/fsqrt_1.c: Likewise.
	* gcc.target/aarch64/sve/fsubr_1.c: Likewise.
	* gcc.target/aarch64/sve/index_1.c: Likewise.
	* gcc.target/aarch64/sve/index_1_run.c: Likewise.
	* gcc.target/aarch64/sve/ld1r_1.c: Likewise.
	* gcc.target/aarch64/sve/load_const_offset_1.c: Likewise.
	* gcc.target/aarch64/sve/load_const_offset_2.c: Likewise.
	* gcc.target/aarch64/sve/load_const_offset_3.c: Likewise.
	* gcc.target/aarch64/sve/load_scalar_offset_1.c: Likewise.
	* gcc.target/aarch64/sve/logical_1.c: Likewise.
	* gcc.target/aarch64/sve/loop_add_1.c: Likewise.
	* gcc.target/aarch64/sve/loop_add_1_run.c: Likewise.
	* gcc.target/aarch64/sve/mad_1.c: Likewise.
	* gcc.target/aarch64/sve/maxmin_1.c: Likewise.
	* gcc.target/aarch64/sve/maxmin_1_run.c: Likewise.
	* gcc.target/aarch64/sve/maxmin_strict_1.c: Likewise.
	* gcc.target/aarch64/sve/maxmin_strict_1_run.c: Likewise.
	* gcc.target/aarch64/sve/mla_1.c: Likewise.
	* gcc.target/aarch64/sve/mls_1.c: Likewise.
	* gcc.target/aarch64/sve/mov_rr_1.c: Likewise.
	* gcc.target/aarch64/sve/msb_1.c: Likewise.
	* gcc.target/aarch64/sve/mul_1.c: Likewise.
	* gcc.target/aarch64/sve/neg_1.c: Likewise.
	* gcc.target/aarch64/sve/nlogical_1.c: Likewise.
	* gcc.target/aarch64/sve/nlogical_1_run.c: Likewise.
	* gcc.target/aarch64/sve/pack_1.c: Likewise.
	* gcc.target/aarch64/sve/pack_1_run.c: Likewise.
	* gcc.target/aarch64/sve/pack_fcvt_signed_1.c: Likewise.
	* gcc.target/aarch64/sve/pack_fcvt_signed_1_run.c: Likewise.
	* gcc.target/aarch64/sve/pack_fcvt_unsigned_1.c: Likewise.
	* gcc.target/aarch64/sve/pack_fcvt_unsigned_1_run.c: Likewise.
	* gcc.target/aarch64/sve/pack_float_1.c: Likewise.
	* gcc.target/aarch64/sve/pack_float_1_run.c: Likewise.
	* gcc.target/aarch64/sve/popcount_1.c: Likewise.
	* gcc.target/aarch64/sve/popcount_1_run.c: Likewise.
	* gcc.target/aarch64/sve/reduc_1.c: Likewise.
	* gcc.target/aarch64/sve/reduc_1_run.c: Likewise.
	* gcc.target/aarch64/sve/reduc_2.c: Likewise.
	* gcc.target/aarch64/sve/reduc_2_run.c: Likewise.
	* gcc.target/aarch64/sve/reduc_3.c: Likewise.
	* gcc.target/aarch64/sve/rev_1.c: Likewise.
	* gcc.target/aarch64/sve/revb_1.c: Likewise.
	* gcc.target/aarch64/sve/revh_1.c: Likewise.
	* gcc.target/aarch64/sve/revw_1.c: Likewise.
	* gcc.target/aarch64/sve/shift_1.c: Likewise.
	* gcc.target/aarch64/sve/single_1.c: Likewise.
	* gcc.target/aarch64/sve/single_2.c: Likewise.
	* gcc.target/aarch64/sve/single_3.c: Likewise.
	* gcc.target/aarch64/sve/single_4.c: Likewise.
	* gcc.target/aarch64/sve/spill_1.c: Likewise.
	* gcc.target/aarch64/sve/store_scalar_offset_1.c: Likewise.
	* gcc.target/aarch64/sve/subr_1.c: Likewise.
	* gcc.target/aarch64/sve/trn1_1.c: Likewise.
	* gcc.target/aarch64/sve/trn2_1.c: Likewise.
	* gcc.target/aarch64/sve/unpack_fcvt_signed_1.c: Likewise.
	* gcc.target/aarch64/sve/unpack_fcvt_signed_1_run.c: Likewise.
	* gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c: Likewise.
	* gcc.target/aarch64/sve/unpack_fcvt_unsigned_1_run.c: Likewise.
	* gcc.target/aarch64/sve/unpack_float_1.c: Likewise.
	* gcc.target/aarch64/sve/unpack_float_1_run.c: Likewise.
	* gcc.target/aarch64/sve/unpack_signed_1.c: Likewise.
	* gcc.target/aarch64/sve/unpack_signed_1_run.c: Likewise.
	* gcc.target/aarch64/sve/unpack_unsigned_1.c: Likewise.
	* gcc.target/aarch64/sve/unpack_unsigned_1_run.c: Likewise.
	* gcc.target/aarch64/sve/uzp1_1.c: Likewise.
	* gcc.target/aarch64/sve/uzp1_1_run.c: Likewise.
	* gcc.target/aarch64/sve/uzp2_1.c: Likewise.
	* gcc.target/aarch64/sve/uzp2_1_run.c: Likewise.
	* gcc.target/aarch64/sve/vcond_1.C: Likewise.
	* gcc.target/aarch64/sve/vcond_1_run.C: Likewise.
	* gcc.target/aarch64/sve/vcond_2.c: Likewise.
	* gcc.target/aarch64/sve/vcond_2_run.c: Likewise.
	* gcc.target/aarch64/sve/vcond_3.c: Likewise.
	* gcc.target/aarch64/sve/vcond_4.c: Likewise.
	* gcc.target/aarch64/sve/vcond_4_run.c: Likewise.
	* gcc.target/aarch64/sve/vcond_5.c: Likewise.
	* gcc.target/aarch64/sve/vcond_5_run.c: Likewise.
	* gcc.target/aarch64/sve/vcond_6.c: Likewise.
	* gcc.target/aarch64/sve/vcond_6_run.c: Likewise.
	* gcc.target/aarch64/sve/vec_init_1.c: Likewise.
	* gcc.target/aarch64/sve/vec_init_1_run.c: Likewise.
	* gcc.target/aarch64/sve/vec_init_2.c: Likewise.
	* gcc.target/aarch64/sve/vec_perm_1.c: Likewise.
	* gcc.target/aarch64/sve/vec_perm_1_run.c: Likewise.
	* gcc.target/aarch64/sve/vec_perm_1_overrange_run.c: Likewise.
	* gcc.target/aarch64/sve/vec_perm_const_1.c: Likewise.
	* gcc.target/aarch64/sve/vec_perm_const_1_overrun.c: Likewise.
	* gcc.target/aarch64/sve/vec_perm_const_1_run.c: Likewise.
	* gcc.target/aarch64/sve/vec_perm_const_single_1.c: Likewise.
	* gcc.target/aarch64/sve/vec_perm_const_single_1_run.c: Likewise.
	* gcc.target/aarch64/sve/vec_perm_single_1.c: Likewise.
	* gcc.target/aarch64/sve/vec_perm_single_1_run.c: Likewise.
	* gcc.target/aarch64/sve/zip1_1.c: Likewise.
	* gcc.target/aarch64/sve/zip2_1.c: Likewise.

Reviewed-by: James Greenhalgh <james.greenhalgh@arm.com>

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>

From-SVN: r256614
2018-01-13 17:55:24 +00:00
Richard Sandiford
801e38459d [AArch64] Testsuite markup for SVE
This patch adds new target selectors for SVE and updates existing
selectors accordingly.  It also XFAILs some tests that don't yet
work for some SVE modes; most of these go away with follow-on
vectorisation enhancements.

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/testsuite/
	* lib/target-supports.exp (check_effective_target_aarch64_sve)
	(aarch64_sve_bits, check_effective_target_aarch64_sve_hw)
	(aarch64_sve_hw_bits, check_effective_target_aarch64_sve256_hw):
	New procedures.
	(check_effective_target_vect_perm): Handle SVE.
	(check_effective_target_vect_perm_byte): Likewise.
	(check_effective_target_vect_perm_short): Likewise.
	(check_effective_target_vect_widen_sum_hi_to_si_pattern): Likewise.
	(check_effective_target_vect_widen_mult_qi_to_hi): Likewise.
	(check_effective_target_vect_widen_mult_hi_to_si): Likewise.
	(check_effective_target_vect_element_align_preferred): Likewise.
	(check_effective_target_vect_align_stack_vars): Likewise.
	(check_effective_target_vect_load_lanes): Likewise.
	(check_effective_target_vect_masked_store): Likewise.
	(available_vector_sizes): Use aarch64_sve_bits for SVE.
	* gcc.dg/vect/tree-vect.h (VECTOR_BITS): Define appropriately
	for SVE.
	* gcc.dg/tree-ssa/ssa-dom-cse-2.c: Add SVE XFAIL.
	* gcc.dg/vect/bb-slp-pr69907.c: Likewise.
	* gcc.dg/vect/no-vfa-vect-depend-2.c: Likewise.
	* gcc.dg/vect/no-vfa-vect-depend-3.c: Likewise.
	* gcc.dg/vect/slp-23.c: Likewise.
	* gcc.dg/vect/slp-perm-5.c: Likewise.
	* gcc.dg/vect/slp-perm-6.c: Likewise.
	* gcc.dg/vect/slp-perm-9.c: Likewise.
	* gcc.dg/vect/slp-reduc-3.c: Likewise.
	* gcc.dg/vect/vect-114.c: Likewise.
	* gcc.dg/vect/vect-mult-const-pattern-1.c: Likewise.
	* gcc.dg/vect/vect-mult-const-pattern-2.c: Likewise.

Reviewed-by: James Greenhalgh <james.greenhalgh@arm.com>

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>

From-SVN: r256613
2018-01-13 17:50:45 +00:00
Richard Sandiford
43cacb12fc [AArch64] Add SVE support
This patch adds support for ARM's Scalable Vector Extension.
The patch just contains the core features that work with the
current vectoriser framework; later patches will add extra
capabilities to both the target-independent code and AArch64 code.
The patch doesn't include:

- support for unwinding frames whose size depends on the vector length
- modelling the effect of __tls_get_addr on the SVE registers

These are handled by later patches instead.

Some notes:

- The copyright years for aarch64-sve.md start at 2009 because some of
  the code is based on aarch64.md, which also starts from then.

- The patch inserts spaces between items in the AArch64 section
  of sourcebuild.texi.  This matches at least the surrounding
  architectures and looks a little nicer in the info output.

- aarch64-sve.md includes a pattern:

    while_ult<GPI:mode><PRED_ALL:mode>

  A later patch adds a matching "while_ult" optab, but the pattern
  is also needed by the predicate vec_duplicate expander.

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* doc/invoke.texi (-msve-vector-bits=): Document new option.
	(sve): Document new AArch64 extension.
	* doc/md.texi (w): Extend the description of the AArch64
	constraint to include SVE vectors.
	(Upl, Upa): Document new AArch64 predicate constraints.
	* config/aarch64/aarch64-opts.h (aarch64_sve_vector_bits_enum): New
	enum.
	* config/aarch64/aarch64.opt (sve_vector_bits): New enum.
	(msve-vector-bits=): New option.
	* config/aarch64/aarch64-option-extensions.def (fp, simd): Disable
	SVE when these are disabled.
	(sve): New extension.
	* config/aarch64/aarch64-modes.def: Define SVE vector and predicate
	modes.  Adjust their number of units based on aarch64_sve_vg.
	(MAX_BITSIZE_MODE_ANY_MODE): Define.
	* config/aarch64/aarch64-protos.h (ADDR_QUERY_ANY): New
	aarch64_addr_query_type.
	(aarch64_const_vec_all_same_in_range_p, aarch64_sve_pred_mode)
	(aarch64_sve_cnt_immediate_p, aarch64_sve_addvl_addpl_immediate_p)
	(aarch64_sve_inc_dec_immediate_p, aarch64_add_offset_temporaries)
	(aarch64_split_add_offset, aarch64_output_sve_cnt_immediate)
	(aarch64_output_sve_addvl_addpl, aarch64_output_sve_inc_dec_immediate)
	(aarch64_output_sve_mov_immediate, aarch64_output_ptrue): Declare.
	(aarch64_simd_imm_zero_p): Delete.
	(aarch64_check_zero_based_sve_index_immediate): Declare.
	(aarch64_sve_index_immediate_p, aarch64_sve_arith_immediate_p)
	(aarch64_sve_bitmask_immediate_p, aarch64_sve_dup_immediate_p)
	(aarch64_sve_cmp_immediate_p, aarch64_sve_float_arith_immediate_p)
	(aarch64_sve_float_mul_immediate_p): Likewise.
	(aarch64_classify_symbol): Take the offset as a HOST_WIDE_INT
	rather than an rtx.
	(aarch64_sve_ld1r_operand_p, aarch64_sve_ldr_operand_p): Declare.
	(aarch64_expand_mov_immediate): Take a gen_vec_duplicate callback.
	(aarch64_emit_sve_pred_move, aarch64_expand_sve_mem_move): Declare.
	(aarch64_expand_sve_vec_cmp_int, aarch64_expand_sve_vec_cmp_float)
	(aarch64_expand_sve_vcond, aarch64_expand_sve_vec_perm): Declare.
	(aarch64_regmode_natural_size): Likewise.
	* config/aarch64/aarch64.h (AARCH64_FL_SVE): New macro.
	(AARCH64_FL_V8_3, AARCH64_FL_RCPC, AARCH64_FL_DOTPROD): Shift
	left one place.
	(AARCH64_ISA_SVE, TARGET_SVE): New macros.
	(FIXED_REGISTERS, CALL_USED_REGISTERS, REGISTER_NAMES): Add entries
	for VG and the SVE predicate registers.
	(V_ALIASES): Add a "z"-prefixed alias.
	(FIRST_PSEUDO_REGISTER): Change to P15_REGNUM + 1.
	(AARCH64_DWARF_VG, AARCH64_DWARF_P0): New macros.
	(PR_REGNUM_P, PR_LO_REGNUM_P): Likewise.
	(PR_LO_REGS, PR_HI_REGS, PR_REGS): New reg_classes.
	(REG_CLASS_NAMES): Add entries for them.
	(REG_CLASS_CONTENTS): Likewise.  Update ALL_REGS to include VG
	and the predicate registers.
	(aarch64_sve_vg): Declare.
	(BITS_PER_SVE_VECTOR, BYTES_PER_SVE_VECTOR, BYTES_PER_SVE_PRED)
	(SVE_BYTE_MODE, MAX_COMPILE_TIME_VEC_BYTES): New macros.
	(REGMODE_NATURAL_SIZE): Define.
	* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Handle
	SVE macros.
	* config/aarch64/aarch64.c: Include cfgrtl.h.
	(simd_immediate_info): Add a constructor for series vectors,
	and an associated step field.
	(aarch64_sve_vg): New variable.
	(aarch64_dbx_register_number): Handle VG and the predicate registers.
	(aarch64_vect_struct_mode_p, aarch64_vector_mode_p): Delete.
	(VEC_ADVSIMD, VEC_SVE_DATA, VEC_SVE_PRED, VEC_STRUCT, VEC_ANY_SVE)
	(VEC_ANY_DATA, VEC_STRUCT): New constants.
	(aarch64_advsimd_struct_mode_p, aarch64_sve_pred_mode_p)
	(aarch64_classify_vector_mode, aarch64_vector_data_mode_p)
	(aarch64_sve_data_mode_p, aarch64_sve_pred_mode)
	(aarch64_get_mask_mode): New functions.
	(aarch64_hard_regno_nregs): Handle SVE data modes for FP_REGS
	and FP_LO_REGS.  Handle PR_REGS, PR_LO_REGS and PR_HI_REGS.
	(aarch64_hard_regno_mode_ok): Handle VG.  Also handle the SVE
	predicate modes and predicate registers.  Explicitly restrict
	GPRs to modes of 16 bytes or smaller.  Only allow FP registers
	to store a vector mode if it is recognized by
	aarch64_classify_vector_mode.
	(aarch64_regmode_natural_size): New function.
	(aarch64_hard_regno_caller_save_mode): Return the original mode
	for predicates.
	(aarch64_sve_cnt_immediate_p, aarch64_output_sve_cnt_immediate)
	(aarch64_sve_addvl_addpl_immediate_p, aarch64_output_sve_addvl_addpl)
	(aarch64_sve_inc_dec_immediate_p, aarch64_output_sve_inc_dec_immediate)
	(aarch64_add_offset_1_temporaries, aarch64_offset_temporaries): New
	functions.
	(aarch64_add_offset): Add a temp2 parameter.  Assert that temp1
	does not overlap dest if the function is frame-related.  Handle
	SVE constants.
	(aarch64_split_add_offset): New function.
	(aarch64_add_sp, aarch64_sub_sp): Add temp2 parameters and pass
	them to aarch64_add_offset.
	(aarch64_allocate_and_probe_stack_space): Add a temp2 parameter
	and update call to aarch64_sub_sp.
	(aarch64_add_cfa_expression): New function.
	(aarch64_expand_prologue): Pass extra temporary registers to the
	functions above.  Handle the case in which we need to emit new
	DW_CFA_expressions for registers that were originally saved
	relative to the stack pointer, but now have to be expressed
	relative to the frame pointer.
	(aarch64_output_mi_thunk): Pass extra temporary registers to the
	functions above.
	(aarch64_expand_epilogue): Likewise.  Prevent inheritance of
	IP0 and IP1 values for SVE frames.
	(aarch64_expand_vec_series): New function.
	(aarch64_expand_sve_widened_duplicate): Likewise.
	(aarch64_expand_sve_const_vector): Likewise.
	(aarch64_expand_mov_immediate): Add a gen_vec_duplicate parameter.
	Handle SVE constants.  Use emit_move_insn to move a force_const_mem
	into the register, rather than emitting a SET directly.
	(aarch64_emit_sve_pred_move, aarch64_expand_sve_mem_move)
	(aarch64_get_reg_raw_mode, offset_4bit_signed_scaled_p)
	(offset_6bit_unsigned_scaled_p, aarch64_offset_7bit_signed_scaled_p)
	(offset_9bit_signed_scaled_p): New functions.
	(aarch64_replicate_bitmask_imm): New function.
	(aarch64_bitmask_imm): Use it.
	(aarch64_cannot_force_const_mem): Reject expressions involving
	a CONST_POLY_INT.  Update call to aarch64_classify_symbol.
	(aarch64_classify_index): Handle SVE indices, by requiring
	a plain register index with a scale that matches the element size.
	(aarch64_classify_address): Handle SVE addresses.  Assert that
	the mode of the address is VOIDmode or an integer mode.
	Update call to aarch64_classify_symbol.
	(aarch64_classify_symbolic_expression): Update call to
	aarch64_classify_symbol.
	(aarch64_const_vec_all_in_range_p): New function.
	(aarch64_print_vector_float_operand): Likewise.
	(aarch64_print_operand): Handle 'N' and 'C'.  Use "zN" rather than
	"vN" for FP registers with SVE modes.  Handle (const ...) vectors
	and the FP immediates 1.0 and 0.5.
	(aarch64_print_address_internal): Handle SVE addresses.
	(aarch64_print_operand_address): Use ADDR_QUERY_ANY.
	(aarch64_regno_regclass): Handle predicate registers.
	(aarch64_secondary_reload): Handle big-endian reloads of SVE
	data modes.
	(aarch64_class_max_nregs): Handle SVE modes and predicate registers.
	(aarch64_rtx_costs): Check for ADDVL and ADDPL instructions.
	(aarch64_convert_sve_vector_bits): New function.
	(aarch64_override_options): Use it to handle -msve-vector-bits=.
	(aarch64_classify_symbol): Take the offset as a HOST_WIDE_INT
	rather than an rtx.
	(aarch64_legitimate_constant_p): Use aarch64_classify_vector_mode.
	Handle SVE vector and predicate modes.  Accept VL-based constants
	that need only one temporary register, and VL offsets that require
	no temporary registers.
	(aarch64_conditional_register_usage): Mark the predicate registers
	as fixed if SVE isn't available.
	(aarch64_vector_mode_supported_p): Use aarch64_classify_vector_mode.
	Return true for SVE vector and predicate modes.
	(aarch64_simd_container_mode): Take the number of bits as a poly_int64
	rather than an unsigned int.  Handle SVE modes.
	(aarch64_preferred_simd_mode): Update call accordingly.  Handle
	SVE modes.
	(aarch64_autovectorize_vector_sizes): Add BYTES_PER_SVE_VECTOR
	if SVE is enabled.
	(aarch64_sve_index_immediate_p, aarch64_sve_arith_immediate_p)
	(aarch64_sve_bitmask_immediate_p, aarch64_sve_dup_immediate_p)
	(aarch64_sve_cmp_immediate_p, aarch64_sve_float_arith_immediate_p)
	(aarch64_sve_float_mul_immediate_p): New functions.
	(aarch64_sve_valid_immediate): New function.
	(aarch64_simd_valid_immediate): Use it as the fallback for SVE vectors.
	Explicitly reject structure modes.  Check for INDEX constants.
	Handle PTRUE and PFALSE constants.
	(aarch64_check_zero_based_sve_index_immediate): New function.
	(aarch64_simd_imm_zero_p): Delete.
	(aarch64_mov_operand_p): Use aarch64_simd_valid_immediate for
	vector modes.  Accept constants in the range of CNT[BHWD].
	(aarch64_simd_scalar_immediate_valid_for_move): Explicitly
	ask for an Advanced SIMD mode.
	(aarch64_sve_ld1r_operand_p, aarch64_sve_ldr_operand_p): New functions.
	(aarch64_simd_vector_alignment): Handle SVE predicates.
	(aarch64_vectorize_preferred_vector_alignment): New function.
	(aarch64_simd_vector_alignment_reachable): Use it instead of
	the vector size.
	(aarch64_shift_truncation_mask): Use aarch64_vector_data_mode_p.
	(aarch64_output_sve_mov_immediate, aarch64_output_ptrue): New
	functions.
	(MAX_VECT_LEN): Delete.
	(expand_vec_perm_d): Add a vec_flags field.
	(emit_unspec2, aarch64_expand_sve_vec_perm): New functions.
	(aarch64_evpc_trn, aarch64_evpc_uzp, aarch64_evpc_zip)
	(aarch64_evpc_ext): Don't apply a big-endian lane correction
	for SVE modes.
	(aarch64_evpc_rev): Rename to...
	(aarch64_evpc_rev_local): ...this.  Use a predicated operation for SVE.
	(aarch64_evpc_rev_global): New function.
	(aarch64_evpc_dup): Enforce a 64-byte range for SVE DUP.
	(aarch64_evpc_tbl): Use MAX_COMPILE_TIME_VEC_BYTES instead of
	MAX_VECT_LEN.
	(aarch64_evpc_sve_tbl): New function.
	(aarch64_expand_vec_perm_const_1): Update after rename of
	aarch64_evpc_rev.  Handle SVE permutes too, trying
	aarch64_evpc_rev_global and using aarch64_evpc_sve_tbl rather
	than aarch64_evpc_tbl.
	(aarch64_vectorize_vec_perm_const): Initialize vec_flags.
	(aarch64_sve_cmp_operand_p, aarch64_unspec_cond_code)
	(aarch64_gen_unspec_cond, aarch64_expand_sve_vec_cmp_int)
	(aarch64_emit_unspec_cond, aarch64_emit_unspec_cond_or)
	(aarch64_emit_inverted_unspec_cond, aarch64_expand_sve_vec_cmp_float)
	(aarch64_expand_sve_vcond): New functions.
	(aarch64_modes_tieable_p): Use aarch64_vector_data_mode_p instead
	of aarch64_vector_mode_p.
	(aarch64_dwarf_poly_indeterminate_value): New function.
	(aarch64_compute_pressure_classes): Likewise.
	(aarch64_can_change_mode_class): Likewise.
	(TARGET_GET_RAW_RESULT_MODE, TARGET_GET_RAW_ARG_MODE): Redefine.
	(TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT): Likewise.
	(TARGET_VECTORIZE_GET_MASK_MODE): Likewise.
	(TARGET_DWARF_POLY_INDETERMINATE_VALUE): Likewise.
	(TARGET_COMPUTE_PRESSURE_CLASSES): Likewise.
	(TARGET_CAN_CHANGE_MODE_CLASS): Likewise.
	* config/aarch64/constraints.md (Upa, Upl, Uav, Uat, Usv, Usi, Utr)
	(Uty, Dm, vsa, vsc, vsd, vsi, vsn, vsl, vsm, vsA, vsM, vsN): New
	constraints.
	(Dn, Dl, Dr): Accept const as well as const_vector.
	(Dz): Likewise.  Compare against CONST0_RTX.
	* config/aarch64/iterators.md: Refer to "Advanced SIMD" instead
	of "vector" where appropriate.
	(SVE_ALL, SVE_BH, SVE_BHS, SVE_BHSI, SVE_HSDI, SVE_HSF, SVE_SD)
	(SVE_SDI, SVE_I, SVE_F, PRED_ALL, PRED_BHS): New mode iterators.
	(UNSPEC_SEL, UNSPEC_ANDF, UNSPEC_IORF, UNSPEC_XORF, UNSPEC_COND_LT)
	(UNSPEC_COND_LE, UNSPEC_COND_EQ, UNSPEC_COND_NE, UNSPEC_COND_GE)
	(UNSPEC_COND_GT, UNSPEC_COND_LO, UNSPEC_COND_LS, UNSPEC_COND_HS)
	(UNSPEC_COND_HI, UNSPEC_COND_UO): New unspecs.
	(Vetype, VEL, Vel, VWIDE, Vwide, vw, vwcore, V_INT_EQUIV)
	(v_int_equiv): Extend to SVE modes.
	(Vesize, V128, v128, Vewtype, V_FP_EQUIV, v_fp_equiv, VPRED): New
	mode attributes.
	(LOGICAL_OR, SVE_INT_UNARY, SVE_FP_UNARY): New code iterators.
	(optab): Handle popcount, smin, smax, umin, umax, abs and sqrt.
	(logical_nn, lr, sve_int_op, sve_fp_op): New code attributes.
	(LOGICALF, OPTAB_PERMUTE, UNPACK, UNPACK_UNSIGNED, SVE_COND_INT_CMP)
	(SVE_COND_FP_CMP): New int iterators.
	(perm_hilo): Handle the new unpack unspecs.
	(optab, logicalf_op, su, perm_optab, cmp_op, imm_con): New int
	attributes.
	* config/aarch64/predicates.md (aarch64_sve_cnt_immediate)
	(aarch64_sve_addvl_addpl_immediate, aarch64_split_add_offset_immediate)
	(aarch64_pluslong_or_poly_operand, aarch64_nonmemory_operand)
	(aarch64_equality_operator, aarch64_constant_vector_operand)
	(aarch64_sve_ld1r_operand, aarch64_sve_ldr_operand): New predicates.
	(aarch64_sve_nonimmediate_operand): Likewise.
	(aarch64_sve_general_operand): Likewise.
	(aarch64_sve_dup_operand, aarch64_sve_arith_immediate): Likewise.
	(aarch64_sve_sub_arith_immediate, aarch64_sve_inc_dec_immediate)
	(aarch64_sve_logical_immediate, aarch64_sve_mul_immediate): Likewise.
	(aarch64_sve_dup_immediate, aarch64_sve_cmp_vsc_immediate): Likewise.
	(aarch64_sve_cmp_vsd_immediate, aarch64_sve_index_immediate): Likewise.
	(aarch64_sve_float_arith_immediate): Likewise.
	(aarch64_sve_float_arith_with_sub_immediate): Likewise.
	(aarch64_sve_float_mul_immediate, aarch64_sve_arith_operand): Likewise.
	(aarch64_sve_add_operand, aarch64_sve_logical_operand): Likewise.
	(aarch64_sve_lshift_operand, aarch64_sve_rshift_operand): Likewise.
	(aarch64_sve_mul_operand, aarch64_sve_cmp_vsc_operand): Likewise.
	(aarch64_sve_cmp_vsd_operand, aarch64_sve_index_operand): Likewise.
	(aarch64_sve_float_arith_operand): Likewise.
	(aarch64_sve_float_arith_with_sub_operand): Likewise.
	(aarch64_sve_float_mul_operand): Likewise.
	(aarch64_sve_vec_perm_operand): Likewise.
	(aarch64_pluslong_operand): Include aarch64_sve_addvl_addpl_immediate.
	(aarch64_mov_operand): Accept const_poly_int and const_vector.
	(aarch64_simd_lshift_imm, aarch64_simd_rshift_imm): Accept const
	as well as const_vector.
	(aarch64_simd_imm_zero, aarch64_simd_imm_minus_one): Move earlier
	in file.  Use CONST0_RTX and CONSTM1_RTX.
	(aarch64_simd_or_scalar_imm_zero): Likewise.  Add match_codes.
	(aarch64_simd_reg_or_zero): Accept const as well as const_vector.
	Use aarch64_simd_imm_zero.
	* config/aarch64/aarch64-sve.md: New file.
	* config/aarch64/aarch64.md: Include it.
	(VG_REGNUM, P0_REGNUM, P7_REGNUM, P15_REGNUM): New register numbers.
	(UNSPEC_REV, UNSPEC_LD1_SVE, UNSPEC_ST1_SVE, UNSPEC_MERGE_PTRUE)
	(UNSPEC_PTEST_PTRUE, UNSPEC_UNPACKSHI, UNSPEC_UNPACKUHI)
	(UNSPEC_UNPACKSLO, UNSPEC_UNPACKULO, UNSPEC_PACK)
	(UNSPEC_FLOAT_CONVERT, UNSPEC_WHILE_LO): New unspec constants.
	(sve): New attribute.
	(enabled): Disable instructions with the sve attribute unless
	TARGET_SVE.
	(movqi, movhi): Pass CONST_POLY_INT operands through
	aarch64_expand_mov_immediate.
	(*mov<mode>_aarch64, *movsi_aarch64, *movdi_aarch64): Handle
	CNT[BHSD] immediates.
	(movti): Split CONST_POLY_INT moves into two halves.
	(add<mode>3): Accept aarch64_pluslong_or_poly_operand.
	Split additions that need a temporary here if the destination
	is the stack pointer.
	(*add<mode>3_aarch64): Handle ADDVL and ADDPL immediates.
	(*add<mode>3_poly_1): New instruction.
	(set_clobber_cc): New expander.

Reviewed-by: James Greenhalgh <james.greenhalgh@arm.com>

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>

From-SVN: r256612
2018-01-13 17:50:35 +00:00
Richard Sandiford
11e0322aea Mark SLP failures for vect_variable_length
Until SLP support for variable-length vectors is added, many tests
fall back to non-SLP vectorisation with permutes.
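
For illustration only (the exact strings and dump names vary between
the listed tests), the typical change XFAILs the SLP scan for
vect_variable_length while still expecting the loop itself to be
vectorised:

/* The loop is still vectorised ...  */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
/* ... but not via SLP on variable-length vector targets.  */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail vect_variable_length } } } */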

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/testsuite/
	* gcc.dg/vect/no-scevccp-slp-30.c: XFAIL SLP test for
	vect_variable_length, expecting the test to be vectorized
	without SLP instead.
	* gcc.dg/vect/pr33953.c: Likewise.
	* gcc.dg/vect/pr37027.c: Likewise.
	* gcc.dg/vect/pr67790.c: Likewise.
	* gcc.dg/vect/pr68445.c: Likewise.
	* gcc.dg/vect/slp-1.c: Likewise.
	* gcc.dg/vect/slp-10.c: Likewise.
	* gcc.dg/vect/slp-12a.c: Likewise.
	* gcc.dg/vect/slp-12b.c: Likewise.
	* gcc.dg/vect/slp-12c.c: Likewise.
	* gcc.dg/vect/slp-13-big-array.c: Likewise.
	* gcc.dg/vect/slp-13.c: Likewise.
	* gcc.dg/vect/slp-14.c: Likewise.
	* gcc.dg/vect/slp-15.c: Likewise.
	* gcc.dg/vect/slp-17.c: Likewise.
	* gcc.dg/vect/slp-19b.c: Likewise.
	* gcc.dg/vect/slp-2.c: Likewise.
	* gcc.dg/vect/slp-20.c: Likewise.
	* gcc.dg/vect/slp-21.c: Likewise.
	* gcc.dg/vect/slp-22.c: Likewise.
	* gcc.dg/vect/slp-24-big-array.c: Likewise.
	* gcc.dg/vect/slp-24.c: Likewise.
	* gcc.dg/vect/slp-28.c: Likewise.
	* gcc.dg/vect/slp-39.c: Likewise.
	* gcc.dg/vect/slp-42.c: Likewise.
	* gcc.dg/vect/slp-6.c: Likewise.
	* gcc.dg/vect/slp-7.c: Likewise.
	* gcc.dg/vect/slp-cond-1.c: Likewise.
	* gcc.dg/vect/slp-cond-2-big-array.c: Likewise.
	* gcc.dg/vect/slp-cond-2.c: Likewise.
	* gcc.dg/vect/slp-multitypes-1.c: Likewise.
	* gcc.dg/vect/slp-multitypes-10.c: Likewise.
	* gcc.dg/vect/slp-multitypes-12.c: Likewise.
	* gcc.dg/vect/slp-multitypes-2.c: Likewise.
	* gcc.dg/vect/slp-multitypes-4.c: Likewise.
	* gcc.dg/vect/slp-multitypes-5.c: Likewise.
	* gcc.dg/vect/slp-multitypes-8.c: Likewise.
	* gcc.dg/vect/slp-multitypes-9.c: Likewise.
	* gcc.dg/vect/slp-reduc-1.c: Likewise.
	* gcc.dg/vect/slp-reduc-2.c: Likewise.
	* gcc.dg/vect/slp-reduc-4.c: Likewise.
	* gcc.dg/vect/slp-reduc-5.c: Likewise.
	* gcc.dg/vect/slp-reduc-7.c: Likewise.
	* gcc.dg/vect/slp-widen-mult-half.c: Likewise.
	* gcc.dg/vect/vect-live-slp-1.c: Likewise.
	* gcc.dg/vect/vect-live-slp-2.c: Likewise.
	* gcc.dg/vect/vect-live-slp-3.c: Likewise.

From-SVN: r256611
2018-01-13 17:50:25 +00:00