OpenE2K/gcc - gcc - Expired Mentality Git

Author	SHA1	Message	Date
Ville Voutilainen	c89f2d2468	Make optional conditionally trivially_{copy,move}_{constructible,assignable} * include/std/optional (_Optional_payload): Fix the comment in the class head and turn into a primary and one specialization. (_Optional_payload::_M_engaged): Strike the NSDMI. (_Optional_payload<_Tp, false>::operator=(const _Optional_payload&)): New. (_Optional_payload<_Tp, false>::operator=(_Optional_payload&&)): Likewise. (_Optional_payload<_Tp, false>::_M_get): Likewise. (_Optional_payload<_Tp, false>::_M_reset): Likewise. (_Optional_base_impl): Likewise. (_Optional_base): Turn into a primary and three specializations. (optional(nullopt)): Change the base init. * testsuite/20_util/optional/assignment/8.cc: New. * testsuite/20_util/optional/cons/trivial.cc: Likewise. * testsuite/20_util/optional/cons/value_neg.cc: Adjust. From-SVN: r256694	2018-01-15 13:32:24 +02:00
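A minimal sketch of what the change above makes observable from user code, assuming a C++17 compiler and a standard library that implements conditionally trivial copy/move for `std::optional` (older libraries may not). The helper names `optional_copy_move_is_trivial` and `check_optional_triviality` are hypothetical and not part of the commit.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <type_traits>

// Hypothetical helper: true when std::optional<T> has trivial copy/move
// constructors and trivial copy/move assignment operators.
template <typename T>
constexpr bool optional_copy_move_is_trivial() {
    using Opt = std::optional<T>;
    return std::is_trivially_copy_constructible_v<Opt> &&
           std::is_trivially_move_constructible_v<Opt> &&
           std::is_trivially_copy_assignable_v<Opt> &&
           std::is_trivially_move_assignable_v<Opt>;
}

// Hypothetical self-check: a trivially copyable payload should yield a
// trivial optional, while a payload such as std::string should not.
void check_optional_triviality() {
    assert(optional_copy_move_is_trivial<int>());
    assert(!optional_copy_move_is_trivial<std::string>());
}
```

Calling `check_optional_triviality()` from `main` runs both checks. The first assertion is the one that depends on the library implementing the behaviour described in the commit.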
Georg-Johann Lay	1759d1167a	Adjust tests to AVR_TINY. * gcc.target/avr/progmem.h (pgm_read_char): Handle AVR_TINY. * gcc.target/avr/pr52472.c: Add "! avr_tiny" target filter. * gcc.target/avr/pr71627.c: Same. * gcc.target/avr/torture/addr-space-1-0.c: Same. * gcc.target/avr/torture/addr-space-1-1.c: Same. * gcc.target/avr/torture/addr-space-1-x.c: Same. * gcc.target/avr/torture/addr-space-2-0.c: Same. * gcc.target/avr/torture/addr-space-2-1.c: Same. * gcc.target/avr/torture/addr-space-2-x.c: Same. * gcc.target/avr/torture/sat-hr-plus-minus.c: Same. * gcc.target/avr/torture/sat-k-plus-minus.c: Same. * gcc.target/avr/torture/sat-llk-plus-minus.c: Same. * gcc.target/avr/torture/sat-r-plus-minus.c: Same. * gcc.target/avr/torture/sat-uhr-plus-minus.c: Same. * gcc.target/avr/torture/sat-uk-plus-minus.c: Same. * gcc.target/avr/torture/sat-ullk-plus-minus.c: Same. * gcc.target/avr/torture/sat-ur-plus-minus.c: Same. * gcc.target/avr/torture/pr61055.c: Same. * gcc.target/avr/torture/builtins-3-absfx.c: Only use __flash if available. * gcc.target/avr/torture/int24-mul.c: Same. * gcc.target/avr/torture/pr51782-1.c: Same. * gcc.target/avr/torture/pr61443.c: Same. * gcc.target/avr/torture/builtins-2.c: Factor out addr-space stuff... * gcc.target/avr/torture/builtins-2-flash.c: ...to this new test. From-SVN: r256690	2018-01-15 11:18:18 +00:00
Jonathan Wakely	bab0a26de5	PR libstdc++/80276 fix template argument handling in type printers PR libstdc++/80276 * python/libstdcxx/v6/printers.py (strip_inline_namespaces): New. (get_template_arg_list): New. (StdVariantPrinter._template_args): Remove, use get_template_arg_list instead. (TemplateTypePrinter): Rewrite to work with gdb.Type objects instead of strings and regular expressions. (add_one_template_type_printer): Adapt to new TemplateTypePrinter. (FilteringTypePrinter): Add docstring. Match using startswith. Use strip_inline_namespaces instead of strip_versioned_namespace. (add_one_type_printer): Prepend namespace to match argument. (register_type_printers): Add type printers for char16_t and char32_t string types and for types using cxx11 ABI. Update calls to add_one_template_type_printer to provide default argument dicts. * testsuite/libstdc++-prettyprinters/80276.cc: New test. * testsuite/libstdc++-prettyprinters/whatis.cc: Remove tests for basic_string<unsigned char> and basic_string<signed char>. * testsuite/libstdc++-prettyprinters/whatis2.cc: Duplicate whatis.cc to test local variables, without overriding _GLIBCXX_USE_CXX11_ABI. From-SVN: r256689	2018-01-15 11:13:53 +00:00
Juraj Oršulić	ed99ae13bb	Correct earlier ChangeLog entry Add Juraj Oršulić as original patch author. From-SVN: r256688	2018-01-15 11:13:49 +00:00
Georg-Johann Lay	93c74e5970	re PR c/83801 ([avr] String constant in __flash not put into .progmem) PR c/83801 PR c/83729 * gcc.target/avr/torture/pr83729.c: New test. * gcc.target/avr/torture/pr83801.c: New test. From-SVN: r256687	2018-01-15 10:04:32 +00:00
Jakub Jelinek	3fccbb9ece	re PR middle-end/82694 (Linux kernel miscompiled since r250765) PR middle-end/82694 * common.opt (fstrict-overflow): No longer an alias. (fwrapv-pointer): New option. * tree.h (TYPE_OVERFLOW_WRAPS, TYPE_OVERFLOW_UNDEFINED): Define also for pointer types based on flag_wrapv_pointer. * opts.c (common_handle_option) <case OPT_fstrict_overflow>: Set opts->x_flag_wrap[pv] to !value, clear opts->x_flag_trapv if opts->x_flag_wrapv got set. * fold-const.c (fold_comparison, fold_binary_loc): Revert 2017-08-01 changes, just use TYPE_OVERFLOW_UNDEFINED on pointer type instead of POINTER_TYPE_OVERFLOW_UNDEFINED. * match.pd: Likewise in address comparison pattern. * doc/invoke.texi: Document -fwrapv and -fstrict-overflow. * gcc.dg/no-strict-overflow-7.c: Revert 2017-08-01 changes. * gcc.dg/tree-ssa/pr81388-1.c: Likewise. From-SVN: r256686	2018-01-15 10:05:59 +01:00
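The entry above concerns how the compiler treats pointer overflow. As a loosely related sketch, and not something taken from the commit, code that needs a wraparound test that does not depend on `-fwrapv-pointer` or `-fstrict-overflow` can do the arithmetic on unsigned integers, where overflow is well defined. The function name `range_end_wraps` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: reports whether base + len wraps around the address
// space. The addition is done on std::uintptr_t, so it is defined modulo
// 2^N regardless of the compiler's pointer-overflow settings.
bool range_end_wraps(std::uintptr_t base, std::size_t len) {
    const std::uintptr_t end = base + static_cast<std::uintptr_t>(len);
    return end < base;
}

// Hypothetical self-check with one wrapping and one non-wrapping range.
void check_range_end_wraps() {
    assert(range_end_wraps(UINTPTR_MAX - 4, 8));
    assert(!range_end_wraps(100, 8));
}
```

The design choice here is to avoid forming an out-of-range pointer at all, which sidesteps the question of whether the optimizer may assume pointer arithmetic never overflows.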
Richard Biener	2aa89839f5	re PR lto/83804 ([meta] LTO memory consumption) 2018-01-15 Richard Biener <rguenther@suse.de> PR lto/83804 * tree.c (free_lang_data_in_type): Always unlink TYPE_DECLs from TYPE_FIELDS. Free TYPE_BINFO if not used by devirtualization. Reset type names to their identifier if their TYPE_DECL doesn't have linkage (and thus is used for ODR and devirt). (save_debug_info_for_decl): Remove. (save_debug_info_for_type): Likewise. (add_tree_to_fld_list): Adjust. * tree-pretty-print.c (dump_generic_node): Make dumping of type names more robust. From-SVN: r256685	2018-01-15 08:57:28 +00:00
Richard Biener	a55e8b53d0	BASE-VER: Bump to 8.0.1. 2018-01-15 Richard Biener <rguenther@suse.de> * BASE-VER: Bump to 8.0.1. From-SVN: r256684	2018-01-15 08:28:13 +00:00
Martin Sebor	e0676e2e71	re PR other/83508 ([arm] c-c++-common/Wrestrict.c fails since r255836) PR other/83508 * builtins.c (check_access): Avoid warning when the no-warning bit is set. PR other/83508 * gcc.dg/Wstringop-overflow-2.c: New test. From-SVN: r256683	2018-01-14 23:15:09 -07:00
Cory Fields	5804f62712	tree-ssa-loop-im.c (sort_bbs_in_loop_postorder_cmp): Stabilize sort. * tree-ssa-loop-im.c (sort_bbs_in_loop_postorder_cmp): Stabilize sort. * ira-color (allocno_hard_regs_compare): Likewise. From-SVN: r256682	2018-01-14 23:05:50 -07:00
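A sketch of one common way to "stabilize" a `qsort`-style comparison, assuming the goal is output that does not depend on the host's `qsort` implementation. The `Block` type and its fields are hypothetical stand-ins, not the data structures touched by the commit.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical record: `depth` is the primary sort key, `index` is unique.
struct Block {
    int depth;
    int index;
};

// qsort comparator that never reports two distinct elements as equal:
// ties on the primary key are broken by the unique index, so every
// conforming qsort produces the same final order.
int compare_blocks(const void* pa, const void* pb) {
    const Block* a = static_cast<const Block*>(pa);
    const Block* b = static_cast<const Block*>(pb);
    if (a->depth != b->depth)
        return a->depth < b->depth ? -1 : 1;
    if (a->index != b->index)
        return a->index < b->index ? -1 : 1;
    return 0;
}

void sort_blocks(Block* blocks, std::size_t count) {
    std::qsort(blocks, count, sizeof(Block), compare_blocks);
}
```

With the input `{2,0}, {1,1}, {2,2}, {1,3}` the sorted index order is 1, 3, 0, 2 on any platform, because no two elements compare equal.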
Nathan Rossi	aba0d181dc	re PR target/83013 (MicroBlaze - #ident - Error: operation combines symbols in different segments) PR target/83013 * config/microblaze/microblaze.c (microblaze_asm_output_ident): Use .pushsection/.popsection. From-SVN: r256681	2018-01-14 23:02:19 -07:00
GCC Administrator	42d3f20a15	Daily bump. From-SVN: r256680	2018-01-15 00:16:26 +00:00
Martin Sebor	656280b0b4	PR c++/81327 - cast to void* does not suppress -Wclass-memaccess gcc/ChangeLog: PR c++/81327 * doc/invoke.texi (-Wclass-memaccess): Document suppression by casting. From-SVN: r256677	2018-01-14 14:54:25 -07:00
Jerry DeLisle	ba791a6c72	Fix date in log. From-SVN: r256676	2018-01-14 21:46:43 +00:00
Jerry DeLisle	511f5ccf43	Fix date in Changelog From-SVN: r256674	2018-01-14 21:00:29 +00:00
H.J. Lu	616ef62f57	Correct ChangeLog of x86: Add -mfunction-return= From-SVN: r256673	2018-01-14 12:57:36 -08:00
H.J. Lu	dfc358bf02	Correct ChangeLog of x86: Add -mindirect-branch= From-SVN: r256672	2018-01-14 12:56:07 -08:00
Jerry DeLisle	33b2b069c1	re PR libfortran/83811 (fortran 'e' format broken for single digit exponents) 2018-01-18 Jerry DeLisle <jvdelisle@gcc.gnu.org> PR libgfortran/83811 * write.c (select_buffer): Adjust buffer size up by 1. * gfortran.dg/fmt_e.f90: New test. From-SVN: r256669	2018-01-14 17:36:29 +00:00
Andreas Schwab	a61bac1ea9	re PR libstdc++/81092 (Missing symbols for new std::wstring constructors) PR libstdc++/81092 * config/abi/post/ia64-linux-gnu/baseline_symbols.txt: Update. From-SVN: r256668	2018-01-14 17:32:20 +00:00
Jakub Jelinek	2abaf67e41	config.gcc (i[34567]86-*-*): Remove one duplicate gfniintrin.h entry from extra_headers. * config.gcc (i[34567]86-*-*): Remove one duplicate gfniintrin.h entry from extra_headers. (x86_64-*-*): Remove two duplicate gfniintrin.h entries from extra_headers, make the list bitwise identical to the i?86-*-* one. From-SVN: r256667	2018-01-14 17:19:14 +01:00
H.J. Lu	95d11c1707	x86: Disallow -mindirect-branch=/-mfunction-return= with -mcmodel=large Since the thunk function may not be reachable in large code model, -mcmodel=large is incompatible with -mindirect-branch=thunk, -mindirect-branch=thunk-extern, -mfunction-return=thunk and -mfunction-return=thunk-extern. Issue an error when they are used with -mcmodel=large. gcc/ * config/i386/i386.c (ix86_set_indirect_branch_type): Disallow -mcmodel=large with -mindirect-branch=thunk, -mindirect-branch=thunk-extern, -mfunction-return=thunk and -mfunction-return=thunk-extern. * doc/invoke.texi: Document -mcmodel=large is incompatible with -mindirect-branch=thunk, -mindirect-branch=thunk-extern, -mfunction-return=thunk and -mfunction-return=thunk-extern. gcc/testsuite/ * gcc.target/i386/indirect-thunk-10.c: New test. * gcc.target/i386/indirect-thunk-8.c: Likewise. * gcc.target/i386/indirect-thunk-9.c: Likewise. * gcc.target/i386/indirect-thunk-attr-10.c: Likewise. * gcc.target/i386/indirect-thunk-attr-11.c: Likewise. * gcc.target/i386/indirect-thunk-attr-9.c: Likewise. * gcc.target/i386/ret-thunk-17.c: Likewise. * gcc.target/i386/ret-thunk-18.c: Likewise. * gcc.target/i386/ret-thunk-19.c: Likewise. * gcc.target/i386/ret-thunk-20.c: Likewise. * gcc.target/i386/ret-thunk-21.c: Likewise. From-SVN: r256664	2018-01-14 06:43:10 -08:00
H.J. Lu	6abe11c1a3	x86: Add 'V' register operand modifier Add 'V', a special modifier which prints the name of the full integer register without '%'. For extern void (*func_p) (void); void foo (void) { asm ("call __x86_indirect_thunk_%V0" : : "a" (func_p)); } it generates: foo: movq func_p(%rip), %rax call __x86_indirect_thunk_rax ret gcc/ * config/i386/i386.c (print_reg): Print the name of the full integer register without '%'. (ix86_print_operand): Handle 'V'. * doc/extend.texi: Document 'V' modifier. gcc/testsuite/ * gcc.target/i386/indirect-thunk-register-4.c: New test. From-SVN: r256663	2018-01-14 06:41:25 -08:00
H.J. Lu	d543c04b79	x86: Add -mindirect-branch-register Add -mindirect-branch-register to force indirect branch via register. This is implemented by disabling patterns of indirect branch via memory, similar to TARGET_X32. -mindirect-branch= and -mfunction-return= tests are updated with -mno-indirect-branch-register to avoid false test failures when -mindirect-branch-register is added to RUNTESTFLAGS for "make check". gcc/ * config/i386/constraints.md (Bs): Disallow memory operand for -mindirect-branch-register. (Bw): Likewise. * config/i386/predicates.md (indirect_branch_operand): Likewise. (GOT_memory_operand): Likewise. (call_insn_operand): Likewise. (sibcall_insn_operand): Likewise. (GOT32_symbol_operand): Likewise. * config/i386/i386.md (indirect_jump): Call convert_memory_address for -mindirect-branch-register. (tablejump): Likewise. (sibcall_memory): Likewise. (sibcall_value_memory): Likewise. Disallow peepholes of indirect call and jump via memory for -mindirect-branch-register. (call_pop): Replace m with Bw. (call_value_pop): Likewise. (sibcall_pop_memory): Replace m with Bs. config/i386/i386.opt (mindirect-branch-register): New option. * doc/invoke.texi: Document -mindirect-branch-register option. gcc/testsuite/ * gcc.target/i386/indirect-thunk-1.c (dg-options): Add -mno-indirect-branch-register. * gcc.target/i386/indirect-thunk-2.c: Likewise. * gcc.target/i386/indirect-thunk-3.c: Likewise. * gcc.target/i386/indirect-thunk-4.c: Likewise. * gcc.target/i386/indirect-thunk-5.c: Likewise. * gcc.target/i386/indirect-thunk-6.c: Likewise. * gcc.target/i386/indirect-thunk-7.c: Likewise. * gcc.target/i386/indirect-thunk-attr-1.c: Likewise. * gcc.target/i386/indirect-thunk-attr-2.c: Likewise. * gcc.target/i386/indirect-thunk-attr-3.c: Likewise. * gcc.target/i386/indirect-thunk-attr-4.c: Likewise. * gcc.target/i386/indirect-thunk-attr-5.c: Likewise. * gcc.target/i386/indirect-thunk-attr-6.c: Likewise. * gcc.target/i386/indirect-thunk-attr-7.c: Likewise. 
* gcc.target/i386/indirect-thunk-bnd-1.c: Likewise. * gcc.target/i386/indirect-thunk-bnd-2.c: Likewise. * gcc.target/i386/indirect-thunk-bnd-3.c: Likewise. * gcc.target/i386/indirect-thunk-bnd-4.c: Likewise. * gcc.target/i386/indirect-thunk-extern-1.c: Likewise. * gcc.target/i386/indirect-thunk-extern-2.c: Likewise. * gcc.target/i386/indirect-thunk-extern-3.c: Likewise. * gcc.target/i386/indirect-thunk-extern-4.c: Likewise. * gcc.target/i386/indirect-thunk-extern-5.c: Likewise. * gcc.target/i386/indirect-thunk-extern-6.c: Likewise. * gcc.target/i386/indirect-thunk-extern-7.c: Likewise. * gcc.target/i386/indirect-thunk-inline-1.c: Likewise. * gcc.target/i386/indirect-thunk-inline-2.c: Likewise. * gcc.target/i386/indirect-thunk-inline-3.c: Likewise. * gcc.target/i386/indirect-thunk-inline-4.c: Likewise. * gcc.target/i386/indirect-thunk-inline-5.c: Likewise. * gcc.target/i386/indirect-thunk-inline-6.c: Likewise. * gcc.target/i386/indirect-thunk-inline-7.c: Likewise. * gcc.target/i386/ret-thunk-10.c: Likewise. * gcc.target/i386/ret-thunk-11.c: Likewise. * gcc.target/i386/ret-thunk-12.c: Likewise. * gcc.target/i386/ret-thunk-13.c: Likewise. * gcc.target/i386/ret-thunk-14.c: Likewise. * gcc.target/i386/ret-thunk-15.c: Likewise. * gcc.target/i386/ret-thunk-9.c: Likewise. * gcc.target/i386/indirect-thunk-register-1.c: New test. * gcc.target/i386/indirect-thunk-register-2.c: Likewise. * gcc.target/i386/indirect-thunk-register-3.c: Likewise. From-SVN: r256662	2018-01-14 06:40:01 -08:00
H.J. Lu	45e1401938	x86: Add -mfunction-return= Add -mfunction-return= option to convert function return to call and return thunks. The default is 'keep', which keeps function return unmodified. 'thunk' converts function return to call and return thunk. 'thunk-inline' converts function return to inlined call and return thunk. 'thunk-extern' converts function return to external call and return thunk provided in a separate object file. You can control this behavior for a specific function by using the function attribute function_return. Function return thunk is the same as memory thunk for -mindirect-branch= where the return address is at the top of the stack: __x86_return_thunk: call L2 L1: pause lfence jmp L1 L2: lea 8(%rsp), %rsp|lea 4(%esp), %esp ret and function return becomes jmp __x86_return_thunk -mindirect-branch= tests are updated with -mfunction-return=keep to avoid false test failures when -mfunction-return=thunk is added to RUNTESTFLAGS for "make check". gcc/ * config/i386/i386-protos.h (ix86_output_function_return): New. * config/i386/i386.c (ix86_set_indirect_branch_type): Also set function_return_type. (indirect_thunk_name): Add ret_p to indicate thunk for function return. (output_indirect_thunk_function): Pass false to indirect_thunk_name. (ix86_output_indirect_branch): Likewise. (output_indirect_thunk_function): Create alias for function return thunk if regno < 0. (ix86_output_function_return): New function. (ix86_handle_fndecl_attribute): Handle function_return. (ix86_attribute_table): Add function_return. * config/i386/i386.h (machine_function): Add function_return_type. * config/i386/i386.md (simple_return_internal): Use ix86_output_function_return. (simple_return_internal_long): Likewise. * config/i386/i386.opt (mfunction-return=): New option. (indirect_branch): Mention -mfunction-return=. * doc/extend.texi: Document function_return function attribute. * doc/invoke.texi: Document -mfunction-return= option. 
gcc/testsuite/ * gcc.target/i386/indirect-thunk-1.c (dg-options): Add -mfunction-return=keep. * gcc.target/i386/indirect-thunk-2.c: Likewise. * gcc.target/i386/indirect-thunk-3.c: Likewise. * gcc.target/i386/indirect-thunk-4.c: Likewise. * gcc.target/i386/indirect-thunk-5.c: Likewise. * gcc.target/i386/indirect-thunk-6.c: Likewise. * gcc.target/i386/indirect-thunk-7.c: Likewise. * gcc.target/i386/indirect-thunk-attr-1.c: Likewise. * gcc.target/i386/indirect-thunk-attr-2.c: Likewise. * gcc.target/i386/indirect-thunk-attr-3.c: Likewise. * gcc.target/i386/indirect-thunk-attr-4.c: Likewise. * gcc.target/i386/indirect-thunk-attr-5.c: Likewise. * gcc.target/i386/indirect-thunk-attr-6.c: Likewise. * gcc.target/i386/indirect-thunk-attr-7.c: Likewise. * gcc.target/i386/indirect-thunk-attr-8.c: Likewise. * gcc.target/i386/indirect-thunk-bnd-1.c: Likewise. * gcc.target/i386/indirect-thunk-bnd-2.c: Likewise. * gcc.target/i386/indirect-thunk-bnd-3.c: Likewise. * gcc.target/i386/indirect-thunk-bnd-4.c: Likewise. * gcc.target/i386/indirect-thunk-extern-1.c: Likewise. * gcc.target/i386/indirect-thunk-extern-2.c: Likewise. * gcc.target/i386/indirect-thunk-extern-3.c: Likewise. * gcc.target/i386/indirect-thunk-extern-4.c: Likewise. * gcc.target/i386/indirect-thunk-extern-5.c: Likewise. * gcc.target/i386/indirect-thunk-extern-6.c: Likewise. * gcc.target/i386/indirect-thunk-extern-7.c: Likewise. * gcc.target/i386/indirect-thunk-inline-1.c: Likewise. * gcc.target/i386/indirect-thunk-inline-2.c: Likewise. * gcc.target/i386/indirect-thunk-inline-3.c: Likewise. * gcc.target/i386/indirect-thunk-inline-4.c: Likewise. * gcc.target/i386/indirect-thunk-inline-5.c: Likewise. * gcc.target/i386/indirect-thunk-inline-6.c: Likewise. * gcc.target/i386/indirect-thunk-inline-7.c: Likewise. * gcc.target/i386/ret-thunk-1.c: New test. * gcc.target/i386/ret-thunk-10.c: Likewise. * gcc.target/i386/ret-thunk-11.c: Likewise. * gcc.target/i386/ret-thunk-12.c: Likewise. 
* gcc.target/i386/ret-thunk-13.c: Likewise. * gcc.target/i386/ret-thunk-14.c: Likewise. * gcc.target/i386/ret-thunk-15.c: Likewise. * gcc.target/i386/ret-thunk-16.c: Likewise. * gcc.target/i386/ret-thunk-2.c: Likewise. * gcc.target/i386/ret-thunk-3.c: Likewise. * gcc.target/i386/ret-thunk-4.c: Likewise. * gcc.target/i386/ret-thunk-5.c: Likewise. * gcc.target/i386/ret-thunk-6.c: Likewise. * gcc.target/i386/ret-thunk-7.c: Likewise. * gcc.target/i386/ret-thunk-8.c: Likewise. * gcc.target/i386/ret-thunk-9.c: Likewise. From-SVN: r256661	2018-01-14 06:37:39 -08:00
H.J. Lu	da99fd4a3c	x86: Add -mindirect-branch= Add -mindirect-branch= option to convert indirect call and jump to call and return thunks. The default is 'keep', which keeps indirect call and jump unmodified. 'thunk' converts indirect call and jump to call and return thunk. 'thunk-inline' converts indirect call and jump to inlined call and return thunk. 'thunk-extern' converts indirect call and jump to external call and return thunk provided in a separate object file. You can control this behavior for a specific function by using the function attribute indirect_branch. 2 kinds of thunks are generated. Memory thunk where the function address is at the top of the stack: __x86_indirect_thunk: call L2 L1: pause lfence jmp L1 L2: lea 8(%rsp), %rsp|lea 4(%esp), %esp ret Indirect jmp via memory, "jmp mem", is converted to push memory jmp __x86_indirect_thunk Indirect call via memory, "call mem", is converted to jmp L2 L1: push [mem] jmp __x86_indirect_thunk L2: call L1 Register thunk where the function address is in a register, reg: __x86_indirect_thunk_reg: call L2 L1: pause lfence jmp L1 L2: movq %reg, (%rsp)|movl %reg, (%esp) ret where reg is one of (r|e)ax, (r|e)dx, (r|e)cx, (r|e)bx, (r|e)si, (r|e)di, (r|e)bp, r8, r9, r10, r11, r12, r13, r14 and r15. Indirect jmp via register, "jmp reg", is converted to jmp __x86_indirect_thunk_reg Indirect call via register, "call reg", is converted to call __x86_indirect_thunk_reg gcc/ * config/i386/i386-opts.h (indirect_branch): New. * config/i386/i386-protos.h (ix86_output_indirect_jmp): Likewise. * config/i386/i386.c (ix86_using_red_zone): Disallow red-zone with local indirect jump when converting indirect call and jump. (ix86_set_indirect_branch_type): New. (ix86_set_current_function): Call ix86_set_indirect_branch_type. (indirectlabelno): New. (indirect_thunk_needed): Likewise. (indirect_thunk_bnd_needed): Likewise. (indirect_thunks_used): Likewise. (indirect_thunks_bnd_used): Likewise. (INDIRECT_LABEL): Likewise. 
(indirect_thunk_name): Likewise. (output_indirect_thunk): Likewise. (output_indirect_thunk_function): Likewise. (ix86_output_indirect_branch): Likewise. (ix86_output_indirect_jmp): Likewise. (ix86_code_end): Call output_indirect_thunk_function if needed. (ix86_output_call_insn): Call ix86_output_indirect_branch if needed. (ix86_handle_fndecl_attribute): Handle indirect_branch. (ix86_attribute_table): Add indirect_branch. * config/i386/i386.h (machine_function): Add indirect_branch_type and has_local_indirect_jump. * config/i386/i386.md (indirect_jump): Set has_local_indirect_jump to true. (tablejump): Likewise. (indirect_jump): Use ix86_output_indirect_jmp. (tablejump_1): Likewise. (simple_return_indirect_internal): Likewise. * config/i386/i386.opt (mindirect-branch=): New option. (indirect_branch): New. (keep): Likewise. (thunk): Likewise. (thunk-inline): Likewise. (thunk-extern): Likewise. * doc/extend.texi: Document indirect_branch function attribute. * doc/invoke.texi: Document -mindirect-branch= option. gcc/testsuite/ * gcc.target/i386/indirect-thunk-1.c: New test. * gcc.target/i386/indirect-thunk-2.c: Likewise. * gcc.target/i386/indirect-thunk-3.c: Likewise. * gcc.target/i386/indirect-thunk-4.c: Likewise. * gcc.target/i386/indirect-thunk-5.c: Likewise. * gcc.target/i386/indirect-thunk-6.c: Likewise. * gcc.target/i386/indirect-thunk-7.c: Likewise. * gcc.target/i386/indirect-thunk-attr-1.c: Likewise. * gcc.target/i386/indirect-thunk-attr-2.c: Likewise. * gcc.target/i386/indirect-thunk-attr-3.c: Likewise. * gcc.target/i386/indirect-thunk-attr-4.c: Likewise. * gcc.target/i386/indirect-thunk-attr-5.c: Likewise. * gcc.target/i386/indirect-thunk-attr-6.c: Likewise. * gcc.target/i386/indirect-thunk-attr-7.c: Likewise. * gcc.target/i386/indirect-thunk-attr-8.c: Likewise. * gcc.target/i386/indirect-thunk-bnd-1.c: Likewise. * gcc.target/i386/indirect-thunk-bnd-2.c: Likewise. * gcc.target/i386/indirect-thunk-bnd-3.c: Likewise. 
* gcc.target/i386/indirect-thunk-bnd-4.c: Likewise. * gcc.target/i386/indirect-thunk-extern-1.c: Likewise. * gcc.target/i386/indirect-thunk-extern-2.c: Likewise. * gcc.target/i386/indirect-thunk-extern-3.c: Likewise. * gcc.target/i386/indirect-thunk-extern-4.c: Likewise. * gcc.target/i386/indirect-thunk-extern-5.c: Likewise. * gcc.target/i386/indirect-thunk-extern-6.c: Likewise. * gcc.target/i386/indirect-thunk-extern-7.c: Likewise. * gcc.target/i386/indirect-thunk-inline-1.c: Likewise. * gcc.target/i386/indirect-thunk-inline-2.c: Likewise. * gcc.target/i386/indirect-thunk-inline-3.c: Likewise. * gcc.target/i386/indirect-thunk-inline-4.c: Likewise. * gcc.target/i386/indirect-thunk-inline-5.c: Likewise. * gcc.target/i386/indirect-thunk-inline-6.c: Likewise. * gcc.target/i386/indirect-thunk-inline-7.c: Likewise. From-SVN: r256660	2018-01-14 06:35:19 -08:00
Jan Hubicka	3f05a4f072	re PR ipa/83051 (ICE on valid code at -O3: in edge_badness, at ipa-inline.c:1024) PR ipa/83051 * gcc.c-torture/compile/pr83051.c: New testcase. * ipa-inline.c (edge_badness): Tolerate roundoff errors. From-SVN: r256659	2018-01-14 11:20:31 +00:00
Richard Sandiford	01b9bf0615	inline_small_functions speedup After inlining A into B, inline_small_functions updates the information for (most) callees and callers of the new B: update_callee_keys (&edge_heap, where, updated_nodes); [...] /* Our profitability metric can depend on local properties such as number of inlinable calls and size of the function body. After inlining these properties might change for the function we inlined into (since it's body size changed) and for the functions called by function we inlined (since number of it inlinable callers might change). */ update_caller_keys (&edge_heap, where, updated_nodes, NULL); These functions in turn call can_inline_edge_p for most of the associated edges: if (can_inline_edge_p (edge, false) && want_inline_small_function_p (edge, false)) update_edge_key (heap, edge); can_inline_edge_p indirectly calls estimate_calls_size_and_time on the caller node, which seems to recursively process all callee edges rooted at the node. It looks from this like the algorithm can be at least quadratic in the worst case. Maybe there's something we can do to make can_inline_edge_p cheaper, but since neither of these two calls is responsible for reporting an inline failure reason, it seems cheaper to test want_inline_small_function_p first, so that we don't calculate an estimate for something that we already know isn't a "small function". I think the only change needed to make that work is to check for CIF_FINAL_ERROR in want_inline_small_function_p; at the moment we rely on can_inline_edge_p to make that check. This cuts the time to build optabs.ii by over 4% with an --enable-checking=release compiler on x86_64-linux-gnu. I've seen more dramatic wins on aarch64-linux-gnu due to the NUM_POLY_INT_COEFFS==2 thing. The patch doesn't affect the output code. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * ipa-inline.c (want_inline_small_function_p): Return false if inlining has already failed with CIF_FINAL_ERROR. 
(update_caller_keys): Call want_inline_small_function_p before can_inline_edge_p. (update_callee_keys): Likewise. From-SVN: r256658	2018-01-14 10:56:56 +00:00
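The reordering described in the entry above can be sketched in isolation, assuming two side-effect-free predicates of very different cost. Everything below (`Edge`, `Counters`, `want_small`, `can_inline`, `should_update_key`) is a hypothetical stand-in for illustration and not GCC's actual code.

```cpp
#include <cassert>

// Hypothetical call-graph edge with just enough state for the example.
struct Edge {
    int callee_size;
    bool final_error;  // stands in for an earlier, permanent inline failure
};

// Counts how often each predicate runs so the saving is visible.
struct Counters {
    int cheap = 0;
    int expensive = 0;
};

// Cheap test: rejects edges that already failed for good or are too large.
bool want_small(const Edge& e, Counters& c) {
    ++c.cheap;
    return !e.final_error && e.callee_size <= 10;
}

// Stand-in for the costly test that would compute a size/time estimate.
bool can_inline(const Edge& e, Counters& c) {
    ++c.expensive;
    return e.callee_size > 0;
}

// Cheap test first: && short-circuits, so the expensive estimate is only
// computed for edges the cheap test has not already rejected.
bool should_update_key(const Edge& e, Counters& c) {
    return want_small(e, c) && can_inline(e, c);
}
```

For an edge with `callee_size` 1000 the expensive predicate is never called. Because neither predicate reports diagnostics in this sketch, swapping their order changes only the cost, not the result.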
Prathamesh Kulkarni	61760b925c	re PR tree-optimization/83501 (strlen(a) not folded after strcpy(a, "...")) 2018-01-14 Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> PR tree-optimization/83501 * gcc.dg/strlenopt-39.c: Restrict to i?86 and x86_64-*-* targets. From-SVN: r256657	2018-01-14 08:58:58 +00:00
Kelvin Nilsen	a3a821c903	rs6000-p8swap.c (rs6000_sum_of_two_registers_p): New function. gcc/ChangeLog: 2018-01-10 Kelvin Nilsen <kelvin@gcc.gnu.org> * config/rs6000/rs6000-p8swap.c (rs6000_sum_of_two_registers_p): New function. (rs6000_quadword_masked_address_p): Likewise. (quad_aligned_load_p): Likewise. (quad_aligned_store_p): Likewise. (const_load_sequence_p): Add comment to describe the outer-most loop. (mimic_memory_attributes_and_flags): New function. (rs6000_gen_stvx): Likewise. (replace_swapped_aligned_store): Likewise. (rs6000_gen_lvx): Likewise. (replace_swapped_aligned_load): Likewise. (replace_swapped_load_constant): Capitalize argument name in comment describing this function. (rs6000_analyze_swaps): Add a third pass to search for vector loads and stores that access quad-word aligned addresses and replace with stvx or lvx instructions when appropriate. * config/rs6000/rs6000-protos.h (rs6000_sum_of_two_registers_p): New function prototype. (rs6000_quadword_masked_address_p): Likewise. (rs6000_gen_lvx): Likewise. (rs6000_gen_stvx): Likewise. * config/rs6000/vsx.md (vsx_le_perm_load_<mode>): For modes VSX_D (V2DF, V2DI), modify this split to select lvx instruction when memory address is aligned. (vsx_le_perm_load_<mode>): For modes VSX_W (V4SF, V4SI), modify this split to select lvx instruction when memory address is aligned. (vsx_le_perm_load_v8hi): Modify this split to select lvx instruction when memory address is aligned. (vsx_le_perm_load_v16qi): Likewise. (four unnamed splitters): Modify to select the stvx instruction when memory is aligned. gcc/testsuite/ChangeLog: 2018-01-10 Kelvin Nilsen <kelvin@gcc.gnu.org> * gcc.target/powerpc/pr48857.c: Modify dejagnu directives to look for lvx and stvx instead of lxvd2x and stxvd2x and require little-endian target. Add comments. * gcc.target/powerpc/swaps-p8-28.c: Add functions for more comprehensive testing. * gcc.target/powerpc/swaps-p8-29.c: Likewise. * gcc.target/powerpc/swaps-p8-30.c: Likewise. 
* gcc.target/powerpc/swaps-p8-31.c: Likewise. * gcc.target/powerpc/swaps-p8-32.c: Likewise. * gcc.target/powerpc/swaps-p8-33.c: Likewise. * gcc.target/powerpc/swaps-p8-34.c: Likewise. * gcc.target/powerpc/swaps-p8-35.c: Likewise. * gcc.target/powerpc/swaps-p8-36.c: Likewise. * gcc.target/powerpc/swaps-p8-37.c: Likewise. * gcc.target/powerpc/swaps-p8-38.c: Likewise. * gcc.target/powerpc/swaps-p8-39.c: Likewise. * gcc.target/powerpc/swaps-p8-40.c: Likewise. * gcc.target/powerpc/swaps-p8-41.c: Likewise. * gcc.target/powerpc/swaps-p8-42.c: Likewise. * gcc.target/powerpc/swaps-p8-43.c: Likewise. * gcc.target/powerpc/swaps-p8-44.c: Likewise. * gcc.target/powerpc/swaps-p8-45.c: Likewise. * gcc.target/powerpc/vec-extract-2.c: Add comment and remove scan-assembler-not directives that forbid lvx and xxpermdi. * gcc.target/powerpc/vec-extract-3.c: Likewise. * gcc.target/powerpc/vec-extract-5.c: Likewise. * gcc.target/powerpc/vec-extract-6.c: Likewise. * gcc.target/powerpc/vec-extract-7.c: Likewise. * gcc.target/powerpc/vec-extract-8.c: Likewise. * gcc.target/powerpc/vec-extract-9.c: Likewise. * gcc.target/powerpc/vsx-vector-6-le.c: Change scan-assembler-times directives to reflect different numbers of expected xxlnor, xxlor, xvcmpgtdp, and xxland instructions. libcpp/ChangeLog: 2018-01-10 Kelvin Nilsen <kelvin@gcc.gnu.org> * lex.c (search_line_fast): Remove illegal coercion of an unaligned pointer value to vector pointer type and replace with use of __builtin_vec_vsx_ld () built-in function, which operates on unaligned pointer values. From-SVN: r256656	2018-01-14 05:19:29 +00:00
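The libcpp part of the entry above replaces a cast of an unaligned pointer with a built-in that accepts unaligned addresses. A portable analogue of the same idea, offered only as a sketch and not as the code in the commit, is to copy the bytes with `std::memcpy` instead of dereferencing a cast pointer. The function name `load_u64_unaligned` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: reads 8 bytes starting at p, which may have any
// alignment. memcpy has no alignment requirement, unlike dereferencing a
// pointer that was cast to std::uint64_t*.
std::uint64_t load_u64_unaligned(const unsigned char* p) {
    std::uint64_t value;
    std::memcpy(&value, p, sizeof value);
    return value;
}

// Hypothetical self-check: every byte is 0xAB, so the result is the same
// on little-endian and big-endian hosts.
void check_load_u64_unaligned() {
    unsigned char buffer[16];
    std::memset(buffer, 0xAB, sizeof buffer);
    assert(load_u64_unaligned(buffer + 1) == 0xABABABABABABABABULL);
}
```

Compilers typically turn the fixed-size `memcpy` into a single load on targets that allow unaligned access, so this form usually costs nothing extra there.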
Ian Lance Taylor	ffad1c54d2	go/types: implement SizesFor for gccgo Move the architecture-specific settings out of configure.ac into a new shell script goarch.sh. Use the new script to collect the values for all architectures to make them available in go/types. Also fix cmd/vet to pass the right compiler when it calls SizesFor. This fixes cmd/vet for systems that are not implemented in the gc toolchain, such as alpha and ia64. Reviewed-on: https://go-review.googlesource.com/87635 From-SVN: r256655	2018-01-14 04:59:01 +00:00
Tim Shen	8532713fc4	re PR libstdc++/83601 (std::regex_replace C++14 conformance issue: escaping in SED mode) PR libstdc++/83601 * include/bits/regex.tcc (regex_replace): Fix escaping in sed. * testsuite/28_regex/algorithms/regex_replace/char/pr83601.cc: Tests. * testsuite/28_regex/algorithms/regex_replace/wchar_t/pr83601.cc: Tests. From-SVN: r256654	2018-01-14 00:48:30 +00:00
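For context on the sed mode mentioned above, here is a minimal sketch of `std::regex_replace` with `std::regex_constants::format_sed`, where `&` stands for the whole match and `\1`, `\2` for sub-matches. The wrapper `sed_replace` is a hypothetical convenience function. The backslash-escaping cases that the PR addresses are deliberately left out, since their result depends on the library version.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical wrapper: replaces every match of `pattern` in `input` using
// sed-style format rules ("&" = whole match, "\N" = N-th sub-match).
std::string sed_replace(const std::string& input,
                        const std::string& pattern,
                        const std::string& format) {
    return std::regex_replace(input, std::regex(pattern), format,
                              std::regex_constants::format_sed);
}

// Hypothetical self-check covering "&" and numbered sub-matches.
void check_sed_replace() {
    assert(sed_replace("abc", "b", "[&]") == "a[b]c");
    assert(sed_replace("john smith", "(\\w+) (\\w+)", "\\2 \\1") ==
           "smith john");
}
```

Without `format_sed` the default ECMAScript rules apply instead, in which `$&` and `$1` play the roles that `&` and `\1` play here.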
GCC Administrator	8bc5a5c57c	Daily bump. From-SVN: r256653	2018-01-14 00:16:15 +00:00
Rainer Orth	1f7273e5db	Allow for lack of VM_MEMORY_OS_ALLOC_ONCE on Mac OS X (PR sanitizer/82824) PR sanitizer/82824 * lsan/lsan_common_mac.cc: Cherry-pick upstream r322437. From-SVN: r256650	2018-01-13 21:01:27 +00:00
Jerry DeLisle	f208c5ccc7	re PR fortran/82007 (DTIO write format stored in a string leads to severe errors) 2018-01-13 Jerry DeLisle <jvdelisle@gcc.gnu.org> PR fortran/82007 * resolve.c (resolve_transfer): Delete code looking for 'DT' format specifiers in format strings. Set formatted to true if a format string or format label is present. * trans-io.c (get_dtio_proc): Likewise. (transfer_expr): Fix whitespace. From-SVN: r256649	2018-01-13 20:41:00 +00:00
Jan Hubicka	f36180f4a4	predict.c (determine_unlikely_bbs): Handle correctly BBs which appears in the queue multiple times. * predict.c (determine_unlikely_bbs): Handle correctly BBs which appears in the queue multiple times. From-SVN: r256648	2018-01-13 19:32:04 +00:00
Thomas Koenig	39f309aca6	re PR fortran/83744 (ICE in ../../gcc/gcc/fortran/dump-parse-tree.c:3093 while using -fc-prototypes) 2018-01-13 Thomas Koenig <tkoenig@gcc.gnu.org> PR fortran/83744 * dump-parse-tree.c (get_c_type_name): Remove extra line. Change for loop to use declaration in for loop. Handle BT_LOGICAL and BT_CHARACTER. (write_decl): Add where argument. Fix indentation. Replace assert with error message. Add typename to warning in comment. (write_type): Adjust locus to call of write_decl. (write_variable): Likewise. (write_proc): Likewise. Replace assert with error message. From-SVN: r256645	2018-01-13 18:22:36 +00:00
Richard Sandiford	a57776a113	Support for aliasing with variable strides This patch adds runtime alias checks for loops with variable strides, so that we can vectorise them even without a restrict qualifier. There are several parts to doing this: 1) For accesses like: x[i * n] += 1; we need to check whether n (and thus the DR_STEP) is nonzero. vect_analyze_data_ref_dependence records values that need to be checked in this way, then prune_runtime_alias_test_list records a bounds check on DR_STEP being outside the range [0, 0]. 2) For accesses like: x[i * n] = x[i * n + 1] + 1; we simply need to test whether abs (n) >= 2. prune_runtime_alias_test_list looks for cases like this and tries to guess whether it is better to use this kind of check or a check for non-overlapping ranges. (We could do an OR of the two conditions at runtime, but that isn't implemented yet.) 3) Checks for overlapping ranges need to cope with variable strides. At present the "length" of each segment in a range check is represented as an offset from the base that lies outside the touched range, in the same direction as DR_STEP. The length can therefore be negative and is sometimes conservative. With variable steps it's easier to reason about if we split this into two: seg_len: distance travelled from the first iteration of interest to the last, e.g. DR_STEP * (VF - 1) access_size: the number of bytes accessed in each iteration with access_size always being a positive constant and seg_len possibly being variable. We can then combine alias checks for two accesses that are a constant number of bytes apart by adjusting the access size to account for the gap. This leaves the segment length unchanged, which allows the check to be combined with further accesses. 
When seg_len is positive, the runtime alias check has the form: base_a >= base_b + seg_len_b + access_size_b || base_b >= base_a + seg_len_a + access_size_a In many accesses the base will be aligned to the access size, which allows us to skip the addition: base_a > base_b + seg_len_b || base_b > base_a + seg_len_a A similar saving is possible with "negative" lengths. The patch therefore tracks the alignment in addition to seg_len and access_size. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * tree-vectorizer.h (vec_lower_bound): New structure. (_loop_vec_info): Add check_nonzero and lower_bounds. (LOOP_VINFO_CHECK_NONZERO): New macro. (LOOP_VINFO_LOWER_BOUNDS): Likewise. (LOOP_REQUIRES_VERSIONING_FOR_ALIAS): Check lower_bounds too. * tree-data-ref.h (dr_with_seg_len): Add access_size and align fields. Make seg_len the distance travelled, not including the access size. (dr_direction_indicator): Declare. (dr_zero_step_indicator): Likewise. (dr_known_forward_stride_p): Likewise. * tree-data-ref.c: Include stringpool.h, tree-vrp.h and tree-ssanames.h. (runtime_alias_check_p): Allow runtime alias checks with variable strides. (operator ==): Compare access_size and align. (prune_runtime_alias_test_list): Rework for new distinction between the access_size and seg_len. (create_intersect_range_checks_index): Likewise. Cope with polynomial segment lengths. (get_segment_min_max): New function. (create_intersect_range_checks): Use it. (dr_step_indicator): New function. (dr_direction_indicator): Likewise. (dr_zero_step_indicator): Likewise. (dr_known_forward_stride_p): Likewise. * tree-loop-distribution.c (data_ref_segment_size): Return DR_STEP * (niters - 1). (compute_alias_check_pairs): Update call to the dr_with_seg_len constructor. * tree-vect-data-refs.c (vect_check_nonzero_value): New function. (vect_preserves_scalar_order_p): New function, split out from...
(vect_analyze_data_ref_dependence): ...here. Check for zero steps. (vect_vfa_segment_size): Return DR_STEP * (length_factor - 1). (vect_vfa_access_size): New function. (vect_vfa_align): Likewise. (vect_compile_time_alias): Take access_size_a and access_size_b arguments. (dump_lower_bound): New function. (vect_check_lower_bound): Likewise. (vect_small_gap_p): Likewise. (vectorizable_with_step_bound_p): Likewise. (vect_prune_runtime_alias_test_list): Ignore cross-iteration dependencies if the vectorization factor is 1. Convert the checks for nonzero steps into checks on the bounds of DR_STEP. Try using a bounds check for variable steps if the minimum required step is relatively small. Update calls to the dr_with_seg_len constructor and to vect_compile_time_alias. * tree-vect-loop-manip.c (vect_create_cond_for_lower_bounds): New function. (vect_loop_versioning): Call it. * tree-vect-loop.c (vect_analyze_loop_2): Clear LOOP_VINFO_LOWER_BOUNDS when retrying. (vect_estimate_min_profitable_iters): Account for any bounds checks. gcc/testsuite/ * gcc.dg/vect/bb-slp-cond-1.c: Expect loop vectorization rather than SLP vectorization. * gcc.dg/vect/vect-alias-check-10.c: New test. * gcc.dg/vect/vect-alias-check-11.c: Likewise. * gcc.dg/vect/vect-alias-check-12.c: Likewise. * gcc.dg/vect/vect-alias-check-8.c: Likewise. * gcc.dg/vect/vect-alias-check-9.c: Likewise. * gcc.target/aarch64/sve/strided_load_8.c: Likewise. * gcc.target/aarch64/sve/var_stride_1.c: Likewise. * gcc.target/aarch64/sve/var_stride_1.h: Likewise. * gcc.target/aarch64/sve/var_stride_1_run.c: Likewise. * gcc.target/aarch64/sve/var_stride_2.c: Likewise. * gcc.target/aarch64/sve/var_stride_2_run.c: Likewise. * gcc.target/aarch64/sve/var_stride_3.c: Likewise. * gcc.target/aarch64/sve/var_stride_3_run.c: Likewise. * gcc.target/aarch64/sve/var_stride_4.c: Likewise. * gcc.target/aarch64/sve/var_stride_4_run.c: Likewise. * gcc.target/aarch64/sve/var_stride_5.c: Likewise. * gcc.target/aarch64/sve/var_stride_5_run.c: Likewise.
* gcc.target/aarch64/sve/var_stride_6.c: Likewise. * gcc.target/aarch64/sve/var_stride_6_run.c: Likewise. * gcc.target/aarch64/sve/var_stride_7.c: Likewise. * gcc.target/aarch64/sve/var_stride_7_run.c: Likewise. * gcc.target/aarch64/sve/var_stride_8.c: Likewise. * gcc.target/aarch64/sve/var_stride_8_run.c: Likewise. * gfortran.dg/vect/vect-alias-check-1.F90: Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256644	2018-01-13 18:02:10 +00:00
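The range check quoted in the commit above (r256644) can be written out as ordinary C. This is only an illustrative sketch, not GCC's code: `struct segment`, `segments_independent_p` and `step_clears_gap_p` are hypothetical names, addresses are plain integers, and `seg_len` is assumed to be non-negative (the "positive" case in the message).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One access stream, using the commit's terminology (hypothetical type).  */
struct segment {
  uintptr_t base;      /* address touched by the first iteration */
  size_t seg_len;      /* distance travelled, e.g. step * (VF - 1); assumed >= 0 */
  size_t access_size;  /* bytes accessed per iteration, always > 0 */
};

/* True if the two streams cannot overlap, following the formula
   base_a >= base_b + seg_len_b + access_size_b
   || base_b >= base_a + seg_len_a + access_size_a.  */
static bool
segments_independent_p (struct segment a, struct segment b)
{
  assert (a.access_size > 0 && b.access_size > 0);
  return (a.base >= b.base + b.seg_len + b.access_size
          || b.base >= a.base + a.seg_len + a.access_size);
}

/* Case 2 of the message: for x[i * n] = x[i * n + 1] + 1 it is enough
   to test abs (n) >= 2.  */
static bool
step_clears_gap_p (long n)
{
  return n >= 2 || n <= -2;
}
```

With four-byte accesses and a segment length of 12, a stream starting at 1000 touches bytes 1000 to 1015, so a second stream may start at 1016 but not at 1012.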
Richard Sandiford	f307441ac4	Add support for SVE scatter stores This is mostly a mechanical extension of the previous gather load support to scatter stores. The internal functions in this case are: IFN_SCATTER_STORE (base, offsets, scale, values) IFN_MASK_SCATTER_STORE (base, offsets, scale, values, mask) However, one nonobvious change is to vect_analyze_data_ref_access. If we're treating an access as a gather load or scatter store (i.e. if STMT_VINFO_GATHER_SCATTER_P is true), the existing code would create a dummy data_reference whose step is 0. There's not really much else it could do, since the whole point is that the step isn't predictable from iteration to iteration. We then went into this code in vect_analyze_data_ref_access: /* Allow loads with zero step in inner-loop vectorization. */ if (loop_vinfo && integer_zerop (step)) { GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt)) = NULL; if (!nested_in_vect_loop_p (loop, stmt)) return DR_IS_READ (dr); I.e. we'd take the step literally and assume that this is a load or store to an invariant address. Loads from invariant addresses are supported but stores to them aren't. The code therefore had the effect of disabling all scatter stores. AFAICT this is true of AVX too: although tests like avx512f-scatter-1.c test for the correctness of a scatter-like loop, they don't seem to check whether a scatter instruction is actually used. The patch therefore makes vect_analyze_data_ref_access return true for scatters. We do seem to handle the aliasing correctly; that's tested by other functions, and is symmetrical to the already-working gather case. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * doc/sourcebuild.texi (vect_scatter_store): Document. * optabs.def (scatter_store_optab, mask_scatter_store_optab): New optabs. * doc/md.texi (scatter_store@var{m}, mask_scatter_store@var{m}): Document.
* genopinit.c (main): Add supports_vec_scatter_store and supports_vec_scatter_store_cached to target_optabs. * gimple.h (gimple_expr_type): Handle IFN_SCATTER_STORE and IFN_MASK_SCATTER_STORE. * internal-fn.def (SCATTER_STORE, MASK_SCATTER_STORE): New internal functions. * internal-fn.h (internal_store_fn_p): Declare. (internal_fn_stored_value_index): Likewise. * internal-fn.c (scatter_store_direct): New macro. (expand_scatter_store_optab_fn): New function. (direct_scatter_store_optab_supported_p): New macro. (internal_store_fn_p): New function. (internal_gather_scatter_fn_p): Handle IFN_SCATTER_STORE and IFN_MASK_SCATTER_STORE. (internal_fn_mask_index): Likewise. (internal_fn_stored_value_index): New function. (internal_gather_scatter_fn_supported_p): Adjust operand numbers for scatter stores. * optabs-query.h (supports_vec_scatter_store_p): Declare. * optabs-query.c (supports_vec_scatter_store_p): New function. * tree-vectorizer.h (vect_get_store_rhs): Declare. * tree-vect-data-refs.c (vect_analyze_data_ref_access): Return true for scatter stores. (vect_gather_scatter_fn_p): Handle scatter stores too. (vect_check_gather_scatter): Consider using scatter stores if supports_vec_scatter_store_p. * tree-vect-patterns.c (vect_try_gather_scatter_pattern): Handle scatter stores too. * tree-vect-stmts.c (exist_non_indexing_operands_for_use_p): Use internal_fn_stored_value_index. (check_load_store_masking): Handle scatter stores too. (vect_get_store_rhs): Make public. (vectorizable_call): Use internal_store_fn_p. (vectorizable_store): Handle scatter store internal functions. (vect_transform_stmt): Compare GROUP_STORE_COUNT with GROUP_SIZE when deciding whether the end of the group has been reached. * config/aarch64/aarch64.md (UNSPEC_ST1_SCATTER): New unspec. * config/aarch64/aarch64-sve.md (scatter_store<mode>): New expander. (mask_scatter_store<mode>): New insns. gcc/testsuite/ * lib/target-supports.exp (check_effective_target_vect_scatter_store): New proc. 
* gcc.dg/vect/pr25413a.c: Expect both loops to be optimized on targets with scatter stores. * gcc.dg/vect/vect-71.c: Restrict XFAIL to targets without scatter stores. * gcc.target/aarch64/sve/mask_scatter_store_1.c: New test. * gcc.target/aarch64/sve/mask_scatter_store_2.c: Likewise. * gcc.target/aarch64/sve/scatter_store_1.c: Likewise. * gcc.target/aarch64/sve/scatter_store_2.c: Likewise. * gcc.target/aarch64/sve/scatter_store_3.c: Likewise. * gcc.target/aarch64/sve/scatter_store_4.c: Likewise. * gcc.target/aarch64/sve/scatter_store_5.c: Likewise. * gcc.target/aarch64/sve/scatter_store_6.c: Likewise. * gcc.target/aarch64/sve/scatter_store_7.c: Likewise. * gcc.target/aarch64/sve/strided_store_1.c: Likewise. * gcc.target/aarch64/sve/strided_store_2.c: Likewise. * gcc.target/aarch64/sve/strided_store_3.c: Likewise. * gcc.target/aarch64/sve/strided_store_4.c: Likewise. * gcc.target/aarch64/sve/strided_store_5.c: Likewise. * gcc.target/aarch64/sve/strided_store_6.c: Likewise. * gcc.target/aarch64/sve/strided_store_7.c: Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256643	2018-01-13 18:01:59 +00:00
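As a reading aid for the two internal functions named in r256643, the following is a scalar model of what a scatter store does lane by lane. It is a sketch under stated assumptions, not the GCC expansion: `model_mask_scatter_store` is a hypothetical helper, lanes are taken to be 32-bit integers, each address is assumed to be `base + offsets[i] * scale` in bytes, and a null `mask` stands for the unmasked IFN_SCATTER_STORE form.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of IFN_MASK_SCATTER_STORE (base, offsets, scale, values, mask)
   for int32_t lanes.  Passing mask == NULL models IFN_SCATTER_STORE.  */
static void
model_mask_scatter_store (void *base, const int64_t *offsets, int scale,
                          const int32_t *values, const unsigned char *mask,
                          size_t nlanes)
{
  assert (scale > 0);
  for (size_t i = 0; i < nlanes; ++i)
    if (mask == NULL || mask[i])
      /* Inactive lanes leave memory untouched.  */
      memcpy ((char *) base + offsets[i] * scale, &values[i],
              sizeof values[i]);
}
```

Because the per-lane addresses are unrelated, no single step describes the access, which is why the message says a dummy step of 0 was being misread as a store to an invariant address.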
Richard Sandiford	429ef523f7	Allow gather loads to be used for grouped accesses Following on from the previous patch for strided accesses, this patch allows gather loads to be used with grouped accesses, if we otherwise would need to fall back to VMAT_ELEMENTWISE. However, as the comment says, this is restricted to single-element groups for now: ??? Although the code can handle all group sizes correctly, it probably isn't a win to use separate strided accesses based on nearby locations. Or, even if it's a win over scalar code, it might not be a win over vectorizing at a lower VF, if that allows us to use contiguous accesses. Single-element groups are an important special case though, and this means that code is less sensitive to GCC's classification of single accesses with constant steps as "grouped" and ones with variable steps as "strided". 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * tree-vectorizer.h (vect_gather_scatter_fn_p): Declare. * tree-vect-data-refs.c (vect_gather_scatter_fn_p): Make public. * tree-vect-stmts.c (vect_truncate_gather_scatter_offset): New function. (vect_use_strided_gather_scatters_p): Take a masked_p argument. Use vect_truncate_gather_scatter_offset if we can't treat the operation as a normal gather load or scatter store. (get_group_load_store_type): Take the gather_scatter_info as argument. Try using a gather load or scatter store for single-element groups. (get_load_store_type): Update calls to get_group_load_store_type and vect_use_strided_gather_scatters_p. gcc/testsuite/ * gcc.target/aarch64/sve/reduc_strict_3.c: Expect FADDA to be used for double_reduc1. * gcc.target/aarch64/sve/strided_load_4.c: New test. * gcc.target/aarch64/sve/strided_load_5.c: Likewise. * gcc.target/aarch64/sve/strided_load_6.c: Likewise. * gcc.target/aarch64/sve/strided_load_7.c: Likewise. 
Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256642	2018-01-13 18:01:49 +00:00
Richard Sandiford	ab2fc78250	Use gather loads for strided accesses This patch tries to use gather loads for strided accesses, rather than falling back to VMAT_ELEMENTWISE. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * tree-vectorizer.h (vect_create_data_ref_ptr): Take an extra optional tree argument. * tree-vect-data-refs.c (vect_check_gather_scatter): Check for null target hooks. (vect_create_data_ref_ptr): Take the iv_step as an optional argument, but continue to use the current value as a fallback. (bump_vector_ptr): Use operand_equal_p rather than tree_int_cst_compare to compare the updates. * tree-vect-stmts.c (vect_use_strided_gather_scatters_p): New function. (get_load_store_type): Use it when handling a strided access. (vect_get_strided_load_store_ops): New function. (vect_get_data_ptr_increment): Likewise. (vectorizable_load): Handle strided gather loads. Always pass a step to vect_create_data_ref_ptr and bump_vector_ptr. gcc/testsuite/ * gcc.target/aarch64/sve/strided_load_1.c: New test. * gcc.target/aarch64/sve/strided_load_2.c: Likewise. * gcc.target/aarch64/sve/strided_load_3.c: Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256641	2018-01-13 18:01:42 +00:00
Richard Sandiford	bfaa08b7ba	Add support for SVE gather loads This patch adds support for SVE gather loads. It uses basically the same analysis code as the AVX gather support, but after that there are two major differences: - It uses new internal functions rather than target built-ins. The interface is: IFN_GATHER_LOAD (base, offsets, scale) IFN_MASK_GATHER_LOAD (base, offsets, scale, mask) which should be reasonably generic. One of the advantages of using internal functions is that other passes can understand what the functions do, but a more immediate advantage is that we can query the underlying target pattern to see which scales it supports. - It uses pattern recognition to convert the offset to the right width, if it was originally narrower than that. This avoids having to do a widening operation as part of the gather expansion itself. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * doc/md.texi (gather_load@var{m}): Document. (mask_gather_load@var{m}): Likewise. * genopinit.c (main): Add supports_vec_gather_load and supports_vec_gather_load_cached to target_optabs. * optabs-tree.c (init_tree_optimization_optabs): Use ggc_cleared_alloc to allocate target_optabs. * optabs.def (gather_load_optab, mask_gather_load_optab): New optabs. * internal-fn.def (GATHER_LOAD, MASK_GATHER_LOAD): New internal functions. * internal-fn.h (internal_load_fn_p): Declare. (internal_gather_scatter_fn_p): Likewise. (internal_fn_mask_index): Likewise. (internal_gather_scatter_fn_supported_p): Likewise. * internal-fn.c (gather_load_direct): New macro. (expand_gather_load_optab_fn): New function. (direct_gather_load_optab_supported_p): New macro. (direct_internal_fn_optab): New function. (internal_load_fn_p): Likewise. (internal_gather_scatter_fn_p): Likewise. (internal_fn_mask_index): Likewise. (internal_gather_scatter_fn_supported_p): Likewise.
* optabs-query.c (supports_at_least_one_mode_p): New function. (supports_vec_gather_load_p): Likewise. * optabs-query.h (supports_vec_gather_load_p): Declare. * tree-vectorizer.h (gather_scatter_info): Add ifn, element_type and memory_type field. (NUM_PATTERNS): Bump to 15. * tree-vect-data-refs.c: Include internal-fn.h. (vect_gather_scatter_fn_p): New function. (vect_describe_gather_scatter_call): Likewise. (vect_check_gather_scatter): Try using internal functions for gather loads. Recognize existing calls to a gather load function. (vect_analyze_data_refs): Consider using gather loads if supports_vec_gather_load_p. * tree-vect-patterns.c (vect_get_load_store_mask): New function. (vect_get_gather_scatter_offset_type): Likewise. (vect_convert_mask_for_vectype): Likewise. (vect_add_conversion_to_patterm): Likewise. (vect_try_gather_scatter_pattern): Likewise. (vect_recog_gather_scatter_pattern): New pattern recognizer. (vect_vect_recog_func_ptrs): Add it. * tree-vect-stmts.c (exist_non_indexing_operands_for_use_p): Use internal_fn_mask_index and internal_gather_scatter_fn_p. (check_load_store_masking): Take the gather_scatter_info as an argument and handle gather loads. (vect_get_gather_scatter_ops): New function. (vectorizable_call): Check internal_load_fn_p. (vectorizable_load): Likewise. Handle gather load internal functions. (vectorizable_store): Update call to check_load_store_masking. * config/aarch64/aarch64.md (UNSPEC_LD1_GATHER): New unspec. * config/aarch64/iterators.md (SVE_S, SVE_D): New mode iterators. * config/aarch64/predicates.md (aarch64_gather_scale_operand_w) (aarch64_gather_scale_operand_d): New predicates. * config/aarch64/aarch64-sve.md (gather_load<mode>): New expander. (mask_gather_load<mode>): New insns. gcc/testsuite/ * gcc.target/aarch64/sve/gather_load_1.c: New test. * gcc.target/aarch64/sve/gather_load_2.c: Likewise. * gcc.target/aarch64/sve/gather_load_3.c: Likewise. * gcc.target/aarch64/sve/gather_load_4.c: Likewise. 
* gcc.target/aarch64/sve/gather_load_5.c: Likewise. * gcc.target/aarch64/sve/gather_load_6.c: Likewise. * gcc.target/aarch64/sve/gather_load_7.c: Likewise. * gcc.target/aarch64/sve/mask_gather_load_1.c: Likewise. * gcc.target/aarch64/sve/mask_gather_load_2.c: Likewise. * gcc.target/aarch64/sve/mask_gather_load_3.c: Likewise. * gcc.target/aarch64/sve/mask_gather_load_4.c: Likewise. * gcc.target/aarch64/sve/mask_gather_load_5.c: Likewise. * gcc.target/aarch64/sve/mask_gather_load_6.c: Likewise. * gcc.target/aarch64/sve/mask_gather_load_7.c: Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256640	2018-01-13 18:01:34 +00:00
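The second point of r256640, widening a narrow offset to the element width before the gather, is easiest to see on a source loop. The function below is a hypothetical example of the loop shape involved, not one of the listed tests: the indices are 32-bit while the loaded elements are 64-bit, so a vectorised version would need the offsets converted to 64 bits. Whether GCC actually emits a gather for it depends on the target and the cost model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Indexed load loop: 32-bit indices, 64-bit data.  The index must be
   widened before it can be used as a per-lane offset for 64-bit elements.
   This sketch assumes non-negative indices.  */
static void
gather_loop (int64_t *restrict dst, const int64_t *restrict src,
             const int32_t *restrict idx, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    {
      assert (idx[i] >= 0);
      dst[i] = src[idx[i]];
    }
}
```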
Richard Sandiford	b781a135a0	Add support for in-order addition reduction using SVE FADDA This patch adds support for in-order floating-point addition reductions, which are suitable even in strict IEEE mode. Previously vect_is_simple_reduction would reject any cases that forbid reassociation. The idea is instead to tentatively accept them as "FOLD_LEFT_REDUCTIONs" and only fail later if there is no support for them. Although this patch only handles the particular case of plus and minus on floating-point types, there's no reason in principle why we couldn't handle other cases. The reductions use a new fold_left_plus_optab if available, otherwise they fall back to elementwise additions or subtractions. The vect_force_simple_reduction change makes it easier for parloops to read the type of reduction. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * optabs.def (fold_left_plus_optab): New optab. * doc/md.texi (fold_left_plus_@var{m}): Document. * internal-fn.def (IFN_FOLD_LEFT_PLUS): New internal function. * internal-fn.c (fold_left_direct): Define. (expand_fold_left_optab_fn): Likewise. (direct_fold_left_optab_supported_p): Likewise. * fold-const-call.c (fold_const_fold_left): New function. (fold_const_call): Use it to fold CFN_FOLD_LEFT_PLUS. * tree-parloops.c (valid_reduction_p): New function. (gather_scalar_reductions): Use it. * tree-vectorizer.h (FOLD_LEFT_REDUCTION): New vect_reduction_type. (vect_finish_replace_stmt): Declare. * tree-vect-loop.c (fold_left_reduction_fn): New function. (needs_fold_left_reduction_p): New function, split out from... (vect_is_simple_reduction): ...here. Accept reductions that forbid reassociation, but give them type FOLD_LEFT_REDUCTION. (vect_force_simple_reduction): Also store the reduction type in the assignment's STMT_VINFO_REDUC_TYPE. (vect_model_reduction_cost): Handle FOLD_LEFT_REDUCTION. (merge_with_identity): New function. 
(vect_expand_fold_left): Likewise. (vectorize_fold_left_reduction): Likewise. (vectorizable_reduction): Handle FOLD_LEFT_REDUCTION. Leave the scalar phi in place for it. Check for target support and reject cases that would reassociate the operation. Defer the transform phase to vectorize_fold_left_reduction. * config/aarch64/aarch64.md (UNSPEC_FADDA): New unspec. * config/aarch64/aarch64-sve.md (fold_left_plus_<mode>): New expander. (fold_left_plus_<mode>, pred_fold_left_plus_<mode>): New insns. gcc/testsuite/ * gcc.dg/vect/no-fast-math-vect16.c: Expect the test to pass and check for a message about using in-order reductions. * gcc.dg/vect/pr79920.c: Expect both loops to be vectorized and check for a message about using in-order reductions. * gcc.dg/vect/trapv-vect-reduc-4.c: Expect all three loops to be vectorized and check for a message about using in-order reductions. Expect targets with variable-length vectors to fall back to the fixed-length minimum. * gcc.dg/vect/vect-reduc-6.c: Expect the loop to be vectorized and check for a message about using in-order reductions. * gcc.dg/vect/vect-reduc-in-order-1.c: New test. * gcc.dg/vect/vect-reduc-in-order-2.c: Likewise. * gcc.dg/vect/vect-reduc-in-order-3.c: Likewise. * gcc.dg/vect/vect-reduc-in-order-4.c: Likewise. * gcc.target/aarch64/sve/reduc_strict_1.c: New test. * gcc.target/aarch64/sve/reduc_strict_1_run.c: Likewise. * gcc.target/aarch64/sve/reduc_strict_2.c: Likewise. * gcc.target/aarch64/sve/reduc_strict_2_run.c: Likewise. * gcc.target/aarch64/sve/reduc_strict_3.c: Likewise. * gcc.target/aarch64/sve/slp_13.c: Add floating-point types. * gfortran.dg/vect/vect-8.f90: Expect 22 loops to be vectorized if vect_fold_left_plus. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256639	2018-01-13 18:01:24 +00:00
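Why r256639 needs a separate in-order reduction is clearer with a concrete case where reassociation changes the result. The sketch below compares a left-to-right sum with a sum split across two independent partial accumulators, which is roughly what an ordinary vectorised reduction does. Both function names are hypothetical, and the code assumes it is compiled without -ffast-math.

```c
#include <assert.h>
#include <stddef.h>

/* In-order (fold-left) sum: ((acc + v[0]) + v[1]) + ...  */
static double
fold_left_plus (double acc, const double *v, size_t n)
{
  assert (v != NULL || n == 0);
  for (size_t i = 0; i < n; ++i)
    acc += v[i];
  return acc;
}

/* Reassociated sum using two independent partial sums, one per "lane".  */
static double
reassociated_plus (const double *v, size_t n)
{
  double lane[2] = { 0.0, 0.0 };
  for (size_t i = 0; i < n; ++i)
    lane[i % 2] += v[i];
  return lane[0] + lane[1];
}
```

For the input {1e100, 1.0, -1e100, 0.0} the in-order sum is 0.0, because 1e100 + 1.0 rounds back to 1e100, while the two-lane sum is 1.0. Strict IEEE semantics require the first answer, which is the case the FOLD_LEFT_REDUCTION path is meant to preserve.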
Richard Sandiford	b89fa419ca	Remove unnecessary temporary in tree-if-conv.c The call to ifc_temp_var in predicate_mem_writes became redundant in r230099. Before that point the mask was calculated using fold_build_*s, but now it's calculated by gimple_build and so is already a valid gimple value. As it stands, the call forces an SSA_NAME-to-SSA_NAME copy to be created, whereas SLP expects that such redundant copies have already been eliminated. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * tree-if-conv.c (predicate_mem_writes): Remove redundant call to ifc_temp_var. From-SVN: r256638	2018-01-13 18:01:14 +00:00
Richard Sandiford	9005477f25	Rework the legitimize_address_displacement hook This patch: - tweaks the handling of legitimize_address_displacement so that it gets called before rather than after the address has been expanded. This means that we're no longer at the mercy of LRA being able to interpret the expanded instructions. - passes the original offset to legitimize_address_displacement. - adds SVE support to the AArch64 implementation of legitimize_address_displacement. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * target.def (legitimize_address_displacement): Take the original offset as a poly_int. * targhooks.h (default_legitimize_address_displacement): Update accordingly. * targhooks.c (default_legitimize_address_displacement): Likewise. * doc/tm.texi: Regenerate. * lra-constraints.c (base_plus_disp_to_reg): Take the displacement as an argument, moving assert of ad->disp == ad->disp_term to... (process_address_1): ...here. Update calls to base_plus_disp_to_reg. Try calling targetm.legitimize_address_displacement before expanding the address rather than afterwards, and adjust for the new interface. * config/aarch64/aarch64.c (aarch64_legitimize_address_displacement): Match the new hook interface. Handle SVE addresses. * config/sh/sh.c (sh_legitimize_address_displacement): Match the new hook interface. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256637	2018-01-13 18:00:59 +00:00
Richard Sandiford	5cce817119	Add an "early rematerialisation" pass This patch looks for pseudo registers that are live across a call and for which no call-preserved hard registers exist. It then recomputes the pseudos as necessary to ensure that they are no longer live across a call. The comment at the head of the file describes the approach. A new target hook selects which modes should be treated in this way. By default none are, in which case the pass is skipped very early. It might also be worth looking for cases like: C1: R1 := f (...) ... C2: R2 := f (...) C3: R1 := C2 and giving the same value number to C1 and C3, effectively treating it like: C1: R1 := f (...) ... C2: R2 := f (...) C3: R1 := f (...) Another (much more expensive) enhancement would be to apply value numbering to all pseudo registers (not just rematerialisation candidates), so that we can handle things like: C1: R1 := f (...R2...) ... C2: R1 := f (...R3...) where R2 and R3 hold the same value. But the current pass seems to catch the vast majority of cases. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * Makefile.in (OBJS): Add early-remat.o. * target.def (select_early_remat_modes): New hook. * doc/tm.texi.in (TARGET_SELECT_EARLY_REMAT_MODES): New hook. * doc/tm.texi: Regenerate. * targhooks.h (default_select_early_remat_modes): Declare. * targhooks.c (default_select_early_remat_modes): New function. * timevar.def (TV_EARLY_REMAT): New timevar. * passes.def (pass_early_remat): New pass. * tree-pass.h (make_pass_early_remat): Declare. * early-remat.c: New file. * config/aarch64/aarch64.c (aarch64_select_early_remat_modes): New function. (TARGET_SELECT_EARLY_REMAT_MODES): Define. gcc/testsuite/ * gcc.target/aarch64/sve/spill_1.c: Also test that no predicates are spilled. * gcc.target/aarch64/sve/spill_2.c: New test. * gcc.target/aarch64/sve/spill_3.c: Likewise. * gcc.target/aarch64/sve/spill_4.c: Likewise. * gcc.target/aarch64/sve/spill_5.c: Likewise. 
* gcc.target/aarch64/sve/spill_6.c: Likewise. * gcc.target/aarch64/sve/spill_7.c: Likewise. From-SVN: r256636	2018-01-13 18:00:51 +00:00
Richard Sandiford	d1d20a49a7	Use single-iteration epilogues when peeling for gaps This patch adds support for fully-masking loops that require peeling for gaps. It peels exactly one scalar iteration and uses the masked loop to handle the rest. Previously we would fall back on using a standard unmasked loop instead. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * tree-vect-loop-manip.c (vect_gen_scalar_loop_niters): Replace vfm1 with a bound_epilog parameter. (vect_do_peeling): Update calls accordingly, and move the prologue call earlier in the function. Treat the base bound_epilog as 0 for fully-masked loops and retain vf - 1 for other loops. Add 1 to this base when peeling for gaps. * tree-vect-loop.c (vect_analyze_loop_2): Allow peeling for gaps with fully-masked loops. (vect_estimate_min_profitable_iters): Handle the single peeled iteration in that case. gcc/testsuite/ * gcc.target/aarch64/sve/struct_vect_18.c: Check the number of branches. * gcc.target/aarch64/sve/struct_vect_19.c: Likewise. * gcc.target/aarch64/sve/struct_vect_20.c: New test. * gcc.target/aarch64/sve/struct_vect_20_run.c: Likewise. * gcc.target/aarch64/sve/struct_vect_21.c: Likewise. * gcc.target/aarch64/sve/struct_vect_21_run.c: Likewise. * gcc.target/aarch64/sve/struct_vect_22.c: Likewise. * gcc.target/aarch64/sve/struct_vect_22_run.c: Likewise. * gcc.target/aarch64/sve/struct_vect_23.c: Likewise. * gcc.target/aarch64/sve/struct_vect_23_run.c: Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256635	2018-01-13 18:00:41 +00:00
Richard Sandiford	4aa157e8d2	Allow single-element interleaving for non-power-of-2 strides This allows LD3 to be used for isolated a[i * 3] accesses, in a similar way to the current a[i * 2] and a[i * 4] for LD2 and LD4 respectively. Given the problems with the cost model underestimating the cost of elementwise accesses, the patch continues to reject the VMAT_ELEMENTWISE cases that are currently rejected. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * tree-vect-data-refs.c (vect_analyze_group_access_1): Allow single-element interleaving even if the size is not a power of 2. * tree-vect-stmts.c (get_load_store_type): Disallow elementwise accesses for single-element interleaving if the group size is not a power of 2. gcc/testsuite/ * gcc.target/aarch64/sve/struct_vect_18.c: New test. * gcc.target/aarch64/sve/struct_vect_18_run.c: Likewise. * gcc.target/aarch64/sve/struct_vect_19.c: Likewise. * gcc.target/aarch64/sve/struct_vect_19_run.c: Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256634	2018-01-13 18:00:31 +00:00
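For reference, the isolated a[i * 3] access mentioned in r256634 corresponds to a loop of the following shape. `take_every_third` is a hypothetical function written for illustration, not one of the struct_vect tests. Only one element of each three-element group is used, which is the "single-element interleaving" case with a group size that is not a power of 2. Whether LD3 is chosen for it depends on the target and the cost model.

```c
#include <assert.h>
#include <stddef.h>

/* Read every third element of src.  src must hold at least
   3 * (n - 1) + 1 elements when n > 0.  */
static void
take_every_third (int *restrict dst, const int *restrict src, size_t n)
{
  assert (n == 0 || (dst != NULL && src != NULL));
  for (size_t i = 0; i < n; ++i)
    dst[i] = src[i * 3];
}
```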
Richard Sandiford	bb6c2b68d6	Add support for conditional reductions using SVE CLASTB This patch uses SVE CLASTB to optimise conditional reductions. It means that we no longer need to maintain a separate index vector to record the most recent valid value, and no longer need to worry about overflow cases. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * doc/md.texi (fold_extract_last_@var{m}): Document. * doc/sourcebuild.texi (vect_fold_extract_last): Likewise. * optabs.def (fold_extract_last_optab): New optab. * internal-fn.def (FOLD_EXTRACT_LAST): New internal function. * internal-fn.c (fold_extract_direct): New macro. (expand_fold_extract_optab_fn): Likewise. (direct_fold_extract_optab_supported_p): Likewise. * tree-vectorizer.h (EXTRACT_LAST_REDUCTION): New vect_reduction_type. * tree-vect-loop.c (vect_model_reduction_cost): Handle EXTRACT_LAST_REDUCTION. (get_initial_def_for_reduction): Do not create an initial vector for EXTRACT_LAST_REDUCTION reductions. (vectorizable_reduction): Leave the scalar phi in place for EXTRACT_LAST_REDUCTIONs. Try using EXTRACT_LAST_REDUCTION ahead of INTEGER_INDUC_COND_REDUCTION. Do not check for an epilogue code for EXTRACT_LAST_REDUCTION and defer the transform phase to vectorizable_condition. * tree-vect-stmts.c (vect_finish_stmt_generation_1): New function, split out from... (vect_finish_stmt_generation): ...here. (vect_finish_replace_stmt): New function. (vectorizable_condition): Handle EXTRACT_LAST_REDUCTION. * config/aarch64/aarch64-sve.md (fold_extract_last_<mode>): New pattern. * config/aarch64/aarch64.md (UNSPEC_CLASTB): New unspec. gcc/testsuite/ * lib/target-supports.exp (check_effective_target_vect_fold_extract_last): New proc. * gcc.dg/vect/pr65947-1.c: Update dump messages. Add markup for fold_extract_last. * gcc.dg/vect/pr65947-2.c: Likewise. * gcc.dg/vect/pr65947-3.c: Likewise. * gcc.dg/vect/pr65947-4.c: Likewise. 
* gcc.dg/vect/pr65947-5.c: Likewise. * gcc.dg/vect/pr65947-6.c: Likewise. * gcc.dg/vect/pr65947-9.c: Likewise. * gcc.dg/vect/pr65947-10.c: Likewise. * gcc.dg/vect/pr65947-12.c: Likewise. * gcc.dg/vect/pr65947-14.c: Likewise. * gcc.dg/vect/pr80631-1.c: Likewise. * gcc.target/aarch64/sve/clastb_1.c: New test. * gcc.target/aarch64/sve/clastb_1_run.c: Likewise. * gcc.target/aarch64/sve/clastb_2.c: Likewise. * gcc.target/aarch64/sve/clastb_2_run.c: Likewise. * gcc.target/aarch64/sve/clastb_3.c: Likewise. * gcc.target/aarch64/sve/clastb_3_run.c: Likewise. * gcc.target/aarch64/sve/clastb_4.c: Likewise. * gcc.target/aarch64/sve/clastb_4_run.c: Likewise. * gcc.target/aarch64/sve/clastb_5.c: Likewise. * gcc.target/aarch64/sve/clastb_5_run.c: Likewise. * gcc.target/aarch64/sve/clastb_6.c: Likewise. * gcc.target/aarch64/sve/clastb_6_run.c: Likewise. * gcc.target/aarch64/sve/clastb_7.c: Likewise. * gcc.target/aarch64/sve/clastb_7_run.c: Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256633	2018-01-13 17:59:59 +00:00
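The conditional reduction handled in r256633 can be modelled in scalar C. This is a sketch with hypothetical names (`model_fold_extract_last`, `last_below`) and a made-up vector length of 4. It assumes the operation returns the last active lane of the vector, or the previous scalar if no lane is active. Under that assumption no separate index vector is needed, which is the saving the message describes.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar model of a fold-extract-last step: the value of the last lane
   whose mask bit is set, or prev if no lane is active.  */
static int
model_fold_extract_last (int prev, const unsigned char *mask,
                         const int *vec, size_t nlanes)
{
  int result = prev;
  for (size_t i = 0; i < nlanes; ++i)
    if (mask[i])
      result = vec[i];
  return result;
}

/* Conditional reduction: the last a[i] that is below limit, else init.
   Processed in blocks of VF lanes to mimic a fully-masked vector loop.  */
static int
last_below (const int *a, size_t n, int limit, int init)
{
  enum { VF = 4 };  /* hypothetical number of lanes */
  int last = init;
  assert (a != NULL || n == 0);
  for (size_t i = 0; i < n; i += VF)
    {
      unsigned char mask[VF];
      int vec[VF];
      for (size_t lane = 0; lane < VF; ++lane)
        {
          int in_range = i + lane < n;
          vec[lane] = in_range ? a[i + lane] : 0;
          mask[lane] = in_range && vec[lane] < limit;
        }
      last = model_fold_extract_last (last, mask, vec, VF);
    }
  return last;
}
```

The scalar `last` carried between blocks plays the role of the scalar phi that the ChangeLog says is left in place for EXTRACT_LAST_REDUCTION.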
Richard Sandiford	bfe1bb57ba	Add support for vectorising live-out values using SVE LASTB This patch uses the SVE LASTB instruction to optimise cases in which a value produced by the final scalar iteration of a vectorised loop is live outside the loop. Previously this situation would stop us from using a fully-masked loop. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * doc/md.texi (extract_last_@var{m}): Document. * optabs.def (extract_last_optab): New optab. * internal-fn.def (EXTRACT_LAST): New internal function. * internal-fn.c (cond_unary_direct): New macro. (expand_cond_unary_optab_fn): Likewise. (direct_cond_unary_optab_supported_p): Likewise. * tree-vect-loop.c (vectorizable_live_operation): Allow fully-masked loops using EXTRACT_LAST. * config/aarch64/aarch64-sve.md (aarch64_sve_lastb<mode>): Rename to... (extract_last_<mode>): ...this optab. (vec_extract<mode><Vel>): Update accordingly. gcc/testsuite/ * gcc.target/aarch64/sve/live_1.c: New test. * gcc.target/aarch64/sve/live_1_run.c: Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256632	2018-01-13 17:59:50 +00:00
Richard Sandiford	76a34e3f85	Add an empty_mask_is_expensive hook This patch adds a hook to control whether we avoid executing masked (predicated) stores when the mask is all false. We don't want to do that by default for SVE. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * target.def (empty_mask_is_expensive): New hook. * doc/tm.texi.in (TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE): New hook. * doc/tm.texi: Regenerate. * targhooks.h (default_empty_mask_is_expensive): Declare. * targhooks.c (default_empty_mask_is_expensive): New function. * tree-vectorizer.c (vectorize_loops): Only call optimize_mask_stores if the target says that empty masks are expensive. * config/aarch64/aarch64.c (aarch64_empty_mask_is_expensive): New function. (TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE): Redefine. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256631	2018-01-13 17:59:40 +00:00

159034 Commits