GCC expands switch statements in a simplistic way and tries to use a table
expansion even when that is a bad idea for performance or code size.
GCC typically emits extremely sparse tables that contain mostly default entries
(something which currently cannot be tuned by backends). Additionally, the
computation of the minimum/maximum label offsets is too simplistic, so the
tables are often twice as large as necessary.
The cost of a table switch is significant due to the setup overhead, the table
lookup (which, because the table is sparse and large, adds unnecessary cache
misses) and the hard-to-predict indirect jump. Therefore it is best to avoid
using a table unless there are many real case labels.
This patch fixes that by setting the default aarch64_case_values_threshold to
16 when the per-CPU tuning is not set. On SPEC2006 this improves the
switch-heavy benchmarks GCC and perlbench in both performance (1-2%) and size
(0.5-1% smaller).
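As a rough illustration of the reasoning above, the following C sketch models a switch by its number of real case labels and its label range, and prefers a table only once the label count reaches a threshold. The struct and function names are hypothetical and this is a simplified model, not the code in aarch64.c or in GCC's switch expansion.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a switch statement.  These names are illustrative
   only and are not GCC internals.  */
struct switch_shape
{
  size_t n_labels;   /* number of real (non-default) case labels */
  long min_label;    /* smallest case value */
  long max_label;    /* largest case value */
};

/* Number of entries a jump table covering [min_label, max_label] needs.  */
static size_t
table_entries (const struct switch_shape *s)
{
  return (size_t) (s->max_label - s->min_label) + 1;
}

/* Number of table entries that would only point at the default label.  */
static size_t
default_entries (const struct switch_shape *s)
{
  return table_entries (s) - s->n_labels;
}

/* Prefer a table only when there are at least THRESHOLD real labels;
   otherwise a sequence of compares and branches is assumed cheaper.  */
static bool
prefer_jump_table (const struct switch_shape *s, size_t threshold)
{
  return s->n_labels >= threshold;
}
```

With a threshold of 16, a switch with 5 labels spread over the values 0..40 would not use a table in this model, which avoids a 41-entry table in which 36 entries are defaults.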
gcc/
* config/aarch64/aarch64.c (aarch64_case_values_threshold):
Return a better case_values_threshold when optimizing.
From-SVN: r236771
so increase the cost of integer registers slightly to avoid unnecessary int<->FP
moves. This improves register allocation of scalar SIMD operations.
* config/aarch64/aarch64-simd.md (aarch64_combinez):
Add ? to integer variant.
(aarch64_combinez_be): Likewise.
From-SVN: r236770
libgomp/
2016-05-26 Chung-Lin Tang <cltang@codesourcery.com>
* target.c (gomp_device_copy): New function.
(gomp_copy_host2dev): Likewise.
(gomp_copy_dev2host): Likewise.
(gomp_free_device_memory): Likewise.
(gomp_map_vars_existing): Adjust to call gomp_copy_host2dev.
(gomp_map_pointer): Likewise.
(gomp_map_vars): Adjust to call gomp_copy_host2dev, handle
NULL value from alloc_func plugin hook.
(gomp_unmap_tgt): Adjust to call gomp_free_device_memory.
(gomp_copy_from_async): Adjust to call gomp_copy_dev2host.
(gomp_unmap_vars): Likewise.
(gomp_update): Adjust to call gomp_copy_dev2host and
gomp_copy_host2dev functions.
(gomp_unload_image_from_device): Handle false value from
unload_image_func plugin hook.
(gomp_init_device): Handle false value from init_device_func
plugin hook.
(gomp_exit_data): Adjust to call gomp_copy_dev2host.
(omp_target_free): Adjust to call gomp_free_device_memory.
(omp_target_memcpy): Handle return values from host2dev_func,
dev2host_func, and dev2dev_func plugin hooks.
(omp_target_memcpy_rect_worker): Likewise.
(gomp_target_fini): Handle false value from fini_device_func
plugin hook.
* libgomp.h (struct gomp_device_descr): Adjust return type of
init_device_func, fini_device_func, unload_image_func, free_func,
dev2host_func, host2dev_func, and dev2dev_func plugin hooks to 'bool'.
* oacc-init.c (acc_shutdown_1): Handle false value from
fini_device_func plugin hook.
* oacc-host.c (host_init_device): Change return type to bool.
(host_fini_device): Likewise.
(host_unload_image): Likewise.
(host_free): Likewise.
(host_dev2host): Likewise.
(host_host2dev): Likewise.
* oacc-mem.c (acc_free): Handle plugin hook fatal error case.
(acc_memcpy_to_device): Likewise.
(acc_memcpy_from_device): Likewise.
(delete_copyout): Add libfnname parameter, handle free_func
hook fatal error case.
(acc_delete): Adjust delete_copyout call.
(acc_copyout): Likewise.
(update_dev_host): Move gomp_mutex_unlock to after
host2dev/dev2host hook calls.
* plugin/plugin-hsa.c (hsa_warn): Adjust 'hsa_error' local variable
to 'hsa_error_msg', for clarity.
(hsa_fatal): Likewise.
(hsa_error): New function.
(init_hsa_context): Change return type to bool, adjust to return
false on error.
(GOMP_OFFLOAD_get_num_devices): Adjust to handle init_hsa_context
return value.
(GOMP_OFFLOAD_init_device): Change return type to bool, adjust to
return false on error.
(get_agent_info): Adjust to return NULL on error.
(destroy_hsa_program): Change return type to bool, adjust to
return false on error.
(GOMP_OFFLOAD_load_image): Adjust to return -1 on error.
(destroy_module): Change return type to bool, adjust to
return false on error.
(GOMP_OFFLOAD_unload_image): Likewise.
(GOMP_OFFLOAD_fini_device): Likewise.
(GOMP_OFFLOAD_alloc): Change to return NULL when called.
(GOMP_OFFLOAD_free): Change to return false when called.
(GOMP_OFFLOAD_dev2host): Likewise.
(GOMP_OFFLOAD_host2dev): Likewise.
(GOMP_OFFLOAD_dev2dev): Likewise.
* plugin/plugin-nvptx.c (CUDA_CALL_ERET): New convenience macro.
(CUDA_CALL): Likewise.
(CUDA_CALL_ASSERT): Likewise.
(map_init): Change return type to bool, use CUDA_CALL* macros.
(map_fini): Likewise.
(init_streams_for_device): Change return type to bool, adjust
call to map_init.
(fini_streams_for_device): Change return type to bool, adjust
call to map_fini.
(select_stream_for_async): Release stream_lock before calls to
GOMP_PLUGIN_fatal, adjust call to map_init.
(nvptx_init): Use CUDA_CALL* macros.
(nvptx_attach_host_thread_to_device): Change return type to bool,
use CUDA_CALL* macros.
(nvptx_open_device): Use CUDA_CALL* macros.
(nvptx_close_device): Change return type to bool, use CUDA_CALL*
macros.
(nvptx_get_num_devices): Use CUDA_CALL* macros.
(link_ptx): Change return type to bool, use CUDA_CALL* macros.
(nvptx_exec): Use CUDA_CALL* macros.
(nvptx_alloc): Use CUDA_CALL* macros.
(nvptx_free): Change return type to bool, use CUDA_CALL* macros.
(nvptx_host2dev): Likewise.
(nvptx_dev2host): Likewise.
(nvptx_wait): Use CUDA_CALL* macros.
(nvptx_wait_async): Likewise.
(nvptx_wait_all): Likewise.
(nvptx_wait_all_async): Likewise.
(nvptx_set_cuda_stream): Adjust order of stream_lock acquire,
use CUDA_CALL* macros, adjust call to map_fini.
(GOMP_OFFLOAD_init_device): Change return type to bool,
adjust code accordingly.
(GOMP_OFFLOAD_fini_device): Likewise.
(GOMP_OFFLOAD_load_image): Adjust calls to
nvptx_attach_host_thread_to_device/link_ptx to handle errors,
use CUDA_CALL* macros.
(GOMP_OFFLOAD_unload_image): Change return type to bool, adjust
return code.
(GOMP_OFFLOAD_alloc): Adjust calls to code to handle error return.
(GOMP_OFFLOAD_free): Change return type to bool, adjust calls to
handle error return.
(GOMP_OFFLOAD_dev2host): Likewise.
(GOMP_OFFLOAD_host2dev): Likewise.
(GOMP_OFFLOAD_openacc_register_async_cleanup): Use CUDA_CALL* macros.
(GOMP_OFFLOAD_openacc_create_thread_data): Likewise.
liboffloadmic/
2016-05-26 Chung-Lin Tang <cltang@codesourcery.com>
* plugin/libgomp-plugin-intelmic.cpp (offload): Change return type
to bool, adjust return code.
(GOMP_OFFLOAD_init_device): Likewise.
(GOMP_OFFLOAD_fini_device): Likewise.
(get_target_table): Likewise.
(offload_image): Likewise.
(GOMP_OFFLOAD_load_image): Adjust call to offload_image(), change
to return -1 on error.
(GOMP_OFFLOAD_unload_image): Change return type to bool, adjust return
code.
(GOMP_OFFLOAD_alloc): Likewise.
(GOMP_OFFLOAD_free): Likewise.
(GOMP_OFFLOAD_host2dev): Likewise.
(GOMP_OFFLOAD_dev2host): Likewise.
(GOMP_OFFLOAD_dev2dev): Likewise.
From-SVN: r236768
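The general pattern in the change above (plugin hooks return bool rather than aborting, with a convenience macro that returns early on a failed device call) can be sketched in plain C as follows. Every name here (dev_status, fake_dev_free, DEV_CALL_ERET, plugin_free, runtime_free) is hypothetical; this is not the definition of the CUDA_CALL* macros or of any libgomp hook.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for a device API; none of these names come from
   libgomp or CUDA.  */
enum dev_status { DEV_OK = 0, DEV_FAIL = 1 };

static enum dev_status
fake_dev_free (void *p)
{
  return p ? DEV_OK : DEV_FAIL;
}

/* Call FN with the given arguments; on failure, report the error and
   return ERET from the enclosing function.  */
#define DEV_CALL_ERET(ERET, FN, ...)                        \
  do                                                        \
    {                                                       \
      enum dev_status st_ = FN (__VA_ARGS__);               \
      if (st_ != DEV_OK)                                    \
        {                                                   \
          fprintf (stderr, #FN " error: %d\n", (int) st_);  \
          return ERET;                                      \
        }                                                   \
    }                                                       \
  while (0)

/* Plugin-side hook: returns false on error instead of aborting.  */
static bool
plugin_free (void *p)
{
  DEV_CALL_ERET (false, fake_dev_free, p);
  return true;
}

/* Caller-side wrapper: sees the failure and decides how to react.  */
static bool
runtime_free (void *p)
{
  if (!plugin_free (p))
    {
      /* A real runtime could release any held locks here before
         reporting a fatal error.  */
      return false;
    }
  return true;
}
```

Returning the failure to the caller, rather than exiting inside the plugin, leaves the caller free to clean up (for example, to unlock a mutex) before it reports the error.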
2016-05-26 Chung-Lin Tang <cltang@codesourcery.com>
include/
* gomp-constants.h (GOMP_VERSION): Increment to 1, add comment to
describe the need for incrementing this macro whenever the plugin
interface is modified.
From-SVN: r236766
* config/i386/sse.md (*vcvtps2ph_store<mask_name>): Use v constraint
instead of x constraint.
(vcvtps2ph256<mask_name>): Likewise.
* gcc.target/i386/avx512vl-vcvtps2ph-3.c: New test.
From-SVN: r236765
(<mask_codefor>avx512vl_shuf_<shuffletype>32x4_1<mask_name>): Rename
to ...
(avx512vl_shuf_<shuffletype>32x4_1<mask_name>): ... this.
(*avx_vperm_broadcast_v4sf): Use v constraint instead of x. Use
maybe_evex prefix instead of vex.
(*avx_vperm_broadcast_<mode>): Use v constraint instead of x. Handle
EXT_REX_SSE_REG_P (op0) case in the splitter.
* gcc.target/i386/avx512vl-vbroadcast-3.c: New test.
From-SVN: r236763
This patch adds support for the vec_cmpne altivec builtins from the Power
Architecture 64-Bit ELF V2 ABI OpenPOWER ABI for Linux Supplement (16 July
2015 Version 1.1). Many of the builtins are missing, and this is part of a
series of patches to add them.
There are no instructions for vec_cmpne, so the output code is built from other
built-ins that do have instructions, which in this case is the following:
vec_cmpne (va, vb) == vec_nor (vec_cmpeq (va, vb), vec_cmpeq (va, vb))
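The identity can be checked with a portable scalar model. The sketch below does not use AltiVec; the v4u32 type and the model_* functions are hypothetical stand-ins that mimic the lane-wise semantics of vec_cmpeq and vec_nor on four 32-bit lanes.

```c
#include <stdint.h>

#define LANES 4

/* Scalar stand-in for a 4 x 32-bit vector; not an AltiVec type.  */
typedef struct { uint32_t lane[LANES]; } v4u32;

/* Model of vec_cmpeq: all-ones in lanes that are equal, zero elsewhere.  */
static v4u32
model_cmpeq (v4u32 a, v4u32 b)
{
  v4u32 r;
  for (int i = 0; i < LANES; i++)
    r.lane[i] = (a.lane[i] == b.lane[i]) ? UINT32_MAX : 0;
  return r;
}

/* Model of vec_nor: bitwise NOT of (a OR b), lane by lane.  */
static v4u32
model_nor (v4u32 a, v4u32 b)
{
  v4u32 r;
  for (int i = 0; i < LANES; i++)
    r.lane[i] = ~(a.lane[i] | b.lane[i]);
  return r;
}

/* vec_cmpne built as nor (cmpeq, cmpeq), i.e. the complement of cmpeq.  */
static v4u32
model_cmpne (v4u32 a, v4u32 b)
{
  v4u32 eq = model_cmpeq (a, b);
  return model_nor (eq, eq);
}
```

Since nor (x, x) is simply the complement of x, each result lane is all-ones exactly where the inputs differ.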
The new test cases are executable tests which verify that the generated
code produces expected values. C macros were used so that the same
test case could be used for both the signed and unsigned versions of various
basic types. A separate executable test case is used for the long long versions
of vec_cmpne because of some differences in loading and storing the vectors.
[gcc]
2016-05-25 Bill Seurer <seurer@linux.vnet.ibm.com>
* config/rs6000/altivec.h (vec_cmpne): Add #define for vec_cmpne.
* config/rs6000/rs6000-builtin.def (vec_cmpne): Add vec_cmpne as a
special case builtin.
* config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin): Add
code for ALTIVEC_BUILTIN_VEC_CMPNE.
* config/rs6000/rs6000.c (altivec_init_builtins): Add definition
for __builtin_vec_cmpne.
[gcc/testsuite]
2016-05-25 Bill Seurer <seurer@linux.vnet.ibm.com>
* gcc.target/powerpc/vec-cmpne.c: New test.
* gcc.target/powerpc/vec-cmpne-long.c: New test.
From-SVN: r236753
* tree-ssa-phiopt.c (factor_out_conditional_conversion): Remove
redundant test and bail out if the type of the new operand is not
a GIMPLE register type after stripping a VIEW_CONVERT_EXPR.
From-SVN: r236748
PR rtl-optimization/66940
* ifcvt.c (noce_get_alt_condition): Check that incrementing or
decrementing desired_val will not overflow before performing these
operations.
* gcc.c-torture/execute/pr66940.c: New test.
From-SVN: r236728
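The kind of guard described in the ifcvt.c entry above, checking that a value can be incremented or decremented without overflow before doing so, can be sketched as follows. The helper names are hypothetical and use plain long rather than GCC's HOST_WIDE_INT; this is not the code in noce_get_alt_condition.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helpers; not the code in ifcvt.c.  */

/* Store VAL + 1 in *OUT and return true, or return false without
   touching *OUT if the increment would overflow.  */
static bool
checked_increment (long val, long *out)
{
  if (val == LONG_MAX)
    return false;
  *out = val + 1;
  return true;
}

/* Store VAL - 1 in *OUT and return true, or return false without
   touching *OUT if the decrement would overflow.  */
static bool
checked_decrement (long val, long *out)
{
  if (val == LONG_MIN)
    return false;
  *out = val - 1;
  return true;
}
```

A caller that gets false back would skip the transformation instead of working with a wrapped value.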
* config/msp430/msp430.c (msp430_attr): Produce an error if a
static interrupt handler is detected.
* config/msp430/msp430.h (LIB_SPEC): Do not use msp430.ld as the
default linker script.
* config/msp430/msp430.md (movpsihi2_lo): New pattern for loading
the low part of a symbolic pointer.
From-SVN: r236704
2016-05-25 Richard Biener <rguenther@suse.de>
PR tree-optimization/71261
* tree-if-conv.c (ifcvt_split_def_stmt): Walk uses on the
interesting stmt instead of immediate uses when looking
for the use operand to replace.
* c-c++-common/torture/pr71261.c: New testcase.
From-SVN: r236701
2016-05-25 Richard Biener <rguenther@suse.de>
PR tree-optimization/71264
* tree-vect-stmts.c (vect_init_vector): Properly deal with
vector type val.
* gcc.dg/vect/pr71264.c: New testcase.
From-SVN: r236699
Fix PR tree-optimization/71239.
* g++.dg/pr71239.C: New test.
PR tree-optimization/71239
* tree.c (array_at_struct_end_p): Do not call operand_equal_p
if DECL_SIZE is NULL.
From-SVN: r236696
2016-05-25 Richard Biener <rguenther@suse.de>
* timevar.def (TV_TREE_LOOP_IFCVT): Add.
* tree-if-conv.c (pass_data_if_conversion): Use it.
From-SVN: r236695
[gcc]
2016-05-24 Michael Meissner <meissner@linux.vnet.ibm.com>
* config/rs6000/altivec.md (VNEG iterator): New iterator for
VNEGW/VNEGD instructions.
(p9_neg<mode>2): New insns for ISA 3.0 VNEGW/VNEGD.
(neg<mode>2): Add expander for V2DImode added in ISA 2.06, and
support for ISA 3.0 VNEGW/VNEGD instructions.
[gcc/testsuite]
2016-05-24 Michael Meissner <meissner@linux.vnet.ibm.com>
* gcc.target/powerpc/p9-vneg.c: New test for ISA 3.0 VNEGW/VNEGD
instructions.
From-SVN: r236679
gcc/ChangeLog:
2016-05-24 Martin Sebor <msebor@redhat.com>
PR c++/71147
* tree.h (complete_or_array_type_p): New inline function.
gcc/testsuite/ChangeLog:
2016-05-24 Martin Sebor <msebor@redhat.com>
PR c++/71147
* g++.dg/ext/flexary16.C: New test.
gcc/cp/ChangeLog:
2016-05-24 Martin Sebor <msebor@redhat.com>
PR c++/71147
* decl.c (layout_var_decl, grokdeclarator): Use complete_or_array_type_p.
* pt.c (instantiate_class_template_1): Try to complete the element
type of a flexible array member.
(can_complete_type_without_circularity): Handle arrays of unknown bound.
* typeck.c (complete_type): Also complete the type of the elements of
arrays with an unspecified bound.
From-SVN: r236664
* config/i386/i386.h (TARGET_AVOID_4BYTE_PREFIXES): Define.
* config/i386/constraints.md (Yr): Test TARGET_AVOID_4BYTE_PREFIXES
rather than X86_TUNE_AVOID_4BYTE_PREFIXES.
From-SVN: r236662