Cuda driver api functions cuLinkAddData and cuLinkCreate are available starting
version 5.5. In version 6.5, they are remapped onto _v2 versions.
The dlopen interface of the libgomp nvptx plugin uses the _v2 versions, so it
won't work with a cuda driver with driver api version lower than 6.5.
This patch fixes the problem by testing for the presence of the _v2 versions,
and falling back to the original versions in case of absence of the _v2
versions.
Build on x86_64 with nvptx accelerator and reg-tested libgomp, both with and
without --without-cuda-driver.
2018-08-08 Tom de Vries <tdevries@suse.de>
* plugin/cuda-lib.def (cuLinkAddData_v2, cuLinkCreate_v2): Declare using
CUDA_ONE_CALL_MAYBE_NULL.
* plugin/plugin-nvptx.c (cuLinkAddData, cuLinkCreate): Undef and declare.
(cuLinkAddData_v2, cuLinkCreate_v2): Declare.
(link_ptx): Fall back to cuLinkAddData/cuLinkCreate if the _v2 versions
are not found.
From-SVN: r263408
Cuda driver api function cuGetErrorString is available in version 6.0 and
higher.
Currently, when the driver that is used does not contain this function, the
libgomp nvptx plugin will not build (PLUGIN_NVPTX_DYNAMIC == 0) or run
(PLUGIN_NVPTX_DYNAMIC == 1).
This patch fixes this problem by testing for the presence of the function, and
handling absence.
Build on x86_64 with nvptx accelerator and reg-tested libgomp, both with and
without --without-cuda-driver.
2018-08-08 Tom de Vries <tdevries@suse.de>
* plugin/cuda-lib.def (cuGetErrorString): Use CUDA_ONE_CALL_MAYBE_NULL.
* plugin/plugin-nvptx.c (cuda_error): Handle if cuGetErrorString is not
present.
From-SVN: r263407
CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR is defined in cuda driver
api version 6.0 and higher.
Currently nvptx_open_device uses a hard-coded constant instead.
This patch fixes that by:
- defining CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR to the hardcoded
constant at toplevel, if not present in cuda.h, and
- using CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR in nvptx_open_device
Build on x86_64 with nvptx accelerator and reg-tested libgomp.
2018-08-08 Tom de Vries <tdevries@suse.de>
* plugin/plugin-nvptx.c
(CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR): Define.
(nvptx_open_device): Use
CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR.
From-SVN: r263406
Cuda driver api function cuGetErrorString is available in version 6.0 and
higher.
This patch:
- removes a comment saying the declaration is not available in cuda.h 6.0
- fixes the presence test to use CUDA_VERSION < 6000
- moves the declaration to toplevel
Build on x86_64 with nvptx accelerator and reg-tested libgomp.
2018-08-08 Tom de Vries <tdevries@suse.de>
* plugin/plugin-nvptx.c (cuda_error): Move declaration of cuGetErrorString ...
(cuGetErrorString): ... here. Guard with CUDA_VERSION < 6000.
From-SVN: r263405
HXT semiconductor's CPU core Phecda, as a variant of Qualcomm qdf24xx,
reuses the same tuning structure and pipeline with it.
Applied on behalf of: Hongbo Zhang <hongbo.zhang@linaro.org>
* config/aarch64/aarch64-cores.def: Add phecda core.
* config/aarch64/aarch64-tune.md: Regenerate.
* doc/invoke.texi: Add phecda core.
From-SVN: r263404
PR libstdc++/86597
* include/bits/fs_dir.h (directory_entry::_M_file_type(error_code&)):
Clear error_code when cached type is used.
* testsuite/27_io/filesystem/directory_entry/86597.cc: New test.
From-SVN: r263397
TARGET_CPU_ZARCH allowed to distinguish between g5/g6 and newer
machines. Since the former are now gone, we can assume that
TARGET_CPU_ZARCH is always true. As a side-effect, branch splitting
is now completely gone. Some parts of literal pool splitting are also
gone, but it's still there: we need to support it because floating
point and vector instructions still cannot use relative addressing.
gcc/ChangeLog:
2018-08-08 Ilya Leoshkevich <iii@linux.ibm.com>
* config/s390/s390.c (s390_loadrelative_operand_p):
Remove TARGET_CPU_ZARCH usages.
(s390_rtx_costs): Likewise.
(s390_legitimate_constant_p): Likewise.
(s390_cannot_force_const_mem): Likewise.
(legitimate_reload_constant_p): Likewise.
(s390_preferred_reload_class): Likewise.
(legitimize_pic_address): Likewise.
(legitimize_tls_address): Likewise.
(s390_split_branches): Removed.
(s390_add_execute): Removed.
(s390_dump_pool): Remove TARGET_CPU_ZARCH usages.
(s390_mainpool_start): Likewise.
(s390_mainpool_finish): Likewise.
(s390_mainpool_cancel): Removed.
(s390_chunkify_start): Remove TARGET_CPU_ZARCH usages.
(s390_chunkify_cancel): Likewise.
(s390_return_addr_rtx): Likewise.
(s390_register_info): Remove split_branches_pending_p uages.
(s390_optimize_register_info): Likewise.
(s390_init_frame_layout): Remove TARGET_CPU_ZARCH and
split_branches_pending_p usages.
(s390_can_eliminate): Remove TARGET_CPU_ZARCH usages.
(s390_load_got): Likewise.
(s390_expand_split_stack_prologue): Likewise.
(output_asm_nops): Likewise.
(s390_function_profiler): Likewise.
(s390_emit_call): Likewise.
(s390_conditional_register_usage): Likewise.
(s390_optimize_prologue): Likewise.
(s390_reorg): Remove TARGET_CPU_ZARCH and
split_branches_pending_p usages.
(s390_option_override_internal): Remove TARGET_CPU_ZARCH
usages.
(s390_output_indirect_thunk_function): Likewise.
* config/s390/s390.h (TARGET_CPU_ZARCH): Removed.
(TARGET_CPU_ZARCH_P): Removed.
(struct machine_function): Remove split_branches_pending_p.
* config/s390/s390.md: Remove TARGET_CPU_ZARCH usages.
From-SVN: r263394
P0595R1 - is_constant_evaluated
cp/
* cp-tree.h (enum cp_built_in_function): New.
(maybe_constant_init): Add pretend_const_required argument.
* typeck2.c (store_init_value): Pass true as new argument to
maybe_constant_init.
* constexpr.c (constexpr_fn_retval): Check also DECL_BUILT_IN_CLASS
for BUILT_IN_UNREACHABLE.
(struct constexpr_ctx): Add pretend_const_required field.
(cxx_eval_builtin_function_call): Use DECL_IS_BUILTIN_CONSTANT_P
macro. Handle CP_BUILT_IN_IS_CONSTANT_EVALUATED. Check also
DECL_BUILT_IN_CLASS for BUILT_IN_UNREACHABLE.
(cxx_eval_outermost_constant_expr): Add pretend_const_required
argument, initialize pretend_const_required field in ctx. If the
result is TREE_CONSTANT and non_constant_p, retry with
pretend_const_required false if it was true.
(is_sub_constant_expr): Initialize pretend_const_required_field in
ctx.
(cxx_constant_value): Pass true as pretend_const_required to
cxx_eval_outermost_constant_expr.
(maybe_constant_value): Pass false as pretend_const_required to
cxx_eval_outermost_constant_expr.
(fold_non_dependent_expr): Likewise.
(maybe_constant_init_1): Add pretend_const_required argument, pass it
down to cxx_eval_outermost_constant_expr. Pass !allow_non_constant
instead of false as strict to cxx_eval_outermost_constant_expr.
(maybe_constant_init): Add pretend_const_required argument, pass it
down to maybe_constant_init_1.
(cxx_constant_init): Pass true as pretend_const_required to
maybe_constant_init_1.
* cp-gimplify.c (cp_gimplify_expr): Handle CALL_EXPRs to
CP_BUILT_IN_IS_CONSTANT_EVALUATED.
(cp_fold): Don't fold CP_BUILT_IN_IS_CONSTANT_EVALUATED calls.
* decl.c: Include langhooks.h.
(cxx_init_decl_processing): Register __builtin_is_constant_evaluated
built-in.
* tree.c (builtin_valid_in_constant_expr_p): Return true for
CP_BUILT_IN_IS_CONSTANT_EVALUATED.
* pt.c (declare_integer_pack): Initialize DECL_FUNCTION_CODE.
testsuite/
* g++.dg/cpp2a/is-constant-evaluated1.C: New test.
From-SVN: r263392
PR c++/86836
* pt.c (tsubst_expr): For structured bindings, call tsubst_decomp_names
before tsubst_init, not after it.
* g++.dg/cpp1z/decomp46.C: New test.
From-SVN: r263391
PR c++/86738
* constexpr.c (cxx_eval_binary_expression): For arithmetics involving
NULL pointer set *non_constant_p to true.
(cxx_eval_component_reference): For dereferencing of a NULL pointer,
set *non_constant_p to true and return t.
* g++.dg/opt/pr86738.C: New test.
From-SVN: r263390
The adjusted vector costs give Falkor a reasonable boost in performance for FP
benchmarks (both CPU2017 and CPU2006) and doesn't change INT benchmarks that
much. There are some regressions that will be investigated as follow on work.
Numbers from the CI run:
CPU2017:
(R) 605.mcf_s: -1.8%
(R) 620.omnetpp_s: -2%
623.xalancbmk_s: 2%
654.roms_s: 7%
(R) INT mean: -0.09%
FP mean: 0.70%
CPU2006:
(R) 429.mc: -5%
(R) 471.omnetpp: -9.5% (potentially noise/fluctuations)
483.xalancbmk: 6.02%
410.bwaves: 5.03%
433.milc: 2%
434.zeusmp: 10.5%
(R) 436.cactusADM: -12.75%
437.leslie3d: 5.94%
(R) 453.povray: -0.82%
459.GemsFDTD: 16.87%
465.tonto: 1%
(R) INT mean: -0.79%
FP mean: 1.54%
gcc/ChangeLog:
2018-08-08 Luis Machado <luis.machado@linaro.org>
* config/aarch64/aarch64.c (qdf24xx_vector_cost): New static global.
(qdf24xx_tunings): Set vector cost structure to qdf24xx_vector_cost.
From-SVN: r263389
Adjust Falkor's register_sextend cost from 4 to 3. This fixes a testsuite
failure in gcc.target/aarch64/extend.c:ldr_sxtw where GCC was generating
a sbfiz instruction rather than a load with sign extension.
No performance changes.
gcc/ChangeLog:
2018-08-08 Luis Machado <luis.machado@linaro.org>
* config/aarch64/aarch64.c (qdf24xx_addrcost_table)
<register_sextend>: Set to 3.
From-SVN: r263388
PR libstdc++/86874
* include/std/variant (_Copy_ctor_base::_M_destructive_move): Define
here instead of in _Move_assign_base.
(_Copy_ctor_base<true, _Types...>::_M_destructive_move): Define.
(_Copy_assign_base::operator=): Use _M_destructive_move when changing
the contained value to another alternative.
(_Move_assign_base::operator=): Likewise.
(_Move_assign_base::_M_destructive_move): Remove.
* testsuite/20_util/variant/86874.cc: New test.
From-SVN: r263365
Fix up the testing package to insure that execution traces
work properly (e.g. "-test.trace=<XXX>" command line option). The
call to stop tracing and emit the output file was stubbed out.
Reviewed-on: https://go-review.googlesource.com/128275
From-SVN: r263363
The "@" handling broke -mlow-precision-div, because the scalar forms of
the instruction were provided by a pattern that also provided FRECPX
(and so were parameterised on an unspec code as well as a mode),
while the SIMD versions had a dedicated FRECPE pattern. This patch
moves the scalar FRECPE handling to the SIMD pattern too (as for FRECPS)
and uses a separate pattern for FRECPX.
The convention in aarch64-simd-builtins.def seemed to be to add
comments only if the mapping wasn't obvious (i.e. not just sticking
"aarch64_" on the beginning and "<mode>" on the end), so the patch
deletes the reference to the combined pattern instead of rewording it.
There didn't seem to be any coverage of -mlow-precision-div in the
testsuite, so the patch adds some tests for it.
2018-08-07 Richard Sandiford <richard.sandiford@arm.com>
gcc/
PR target/86838
* config/aarch64/iterators.md (FRECP, frecp_suffix): Delete.
* config/aarch64/aarch64-simd.md
(aarch64_frecp<FRECP:frecp_suffix><mode>): Fold FRECPE into...
(@aarch64_frecpe<mode>): ...here and the move FRECPX to...
(aarch64_frecpx<mode>): ...this new pattern.
* config/aarch64/aarch64-simd-builtins.def: Remove comment
about aarch64_frecp<FRECP:frecp_suffix><mode>.
gcc/testsuite/
PR target/86838
* gcc.target/aarch64/frecpe_1.c: New test.
* gcc.target/aarch64/frecpe_2.c: Likewise.
From-SVN: r263362
Solaris memalign requires alignment to be at least sizeof(int), so
increase it as needed.
Also move the check for valid alignments from the fallback
implementation of aligned_alloc into operator new, as it's required for
all of aligned_alloc, memalign, posix_memalign and __aligned_malloc.
This adds a branch to check for undefined behaviour which we could just
ignore, so the check could just be removed. It should certainly be
removed if PR 86878 is implemented to issue a warning about calls with
invalid alignments.
PR libstdc++/86861
* libsupc++/new_opa.cc [_GLIBCXX_HAVE_MEMALIGN] (aligned_alloc):
Replace macro with inline function.
[__sun]: Increase alignment to meet memalign precondition.
[!HAVE__ALIGNED_MALLOC && !HAVE_POSIX_MEMALIGN && !HAVE_MEMALIGN]
(aligned_alloc): Move check for valid alignment to operator new.
Remove redundant check for non-zero size, it's enforced by the caller.
(operator new): Move check for valid alignment here. Use
__builtin_expect on check for zero size.
From-SVN: r263360
2018-08-07 Martin Liska <mliska@suse.cz>
PR middle-end/83023
* predict.c (expr_expected_value_1): Handle DECL_IS_MALLOC,
BUILT_IN_REALLOC and DECL_IS_OPERATOR_NEW.
* predict.def (PRED_MALLOC_NONNULL): New predictor.
* doc/extend.texi: Document that malloc attribute adds
hit to compiler.
2018-08-07 Martin Liska <mliska@suse.cz>
PR middle-end/83023
* gcc.dg/predict-16.c: New test.
* g++.dg/predict-1.C: New test.
From-SVN: r263355
Move the allocation logic into libstdc++.so so that it can be changed
without worrying about inlined code in existing binaries.
Leave do_allocate inline so that calls to it can be devirtualized, and
only the slow path needs to call into the library.
* config/abi/pre/gnu.ver: Export monotonic_buffer_resource members.
* include/std/memory_resource (monotonic_buffer_resource::release):
Call _M_release_buffers to free buffers.
(monotonic_buffer_resource::do_allocate): Call _M_new_buffer to
allocate a new buffer from upstream.
(monotonic_buffer_resource::_M_new_buffer): Declare.
(monotonic_buffer_resource::_M_release_buffers): Declare.
(monotonic_buffer_resource::_Chunk): Replace definition with
declaration as opaque type.
* src/c++17/memory_resource.cc (monotonic_buffer_resource::_Chunk):
Define.
(monotonic_buffer_resource::_M_new_buffer): Define.
(monotonic_buffer_resource::_M_release_buffers): Define.
From-SVN: r263354
This patch adds handling of functions that may not be present in the cuda
driver.
Such a function can be declared using CUDA_ONE_CALL_MAYBE_NULL in cuda-lib.def,
it can be called with the usual convenience macros, but before calling its
presence needs to be tested using new macro CUDA_CALL_EXISTS.
When using the dlopen interface (PLUGIN_NVPTX_DYNAMIC == 1), we allow
non-present functions by allowing dlsym to return NULL. Otherwise
(PLUGIN_NVPTX_DYNAMIC == 0) we declare the non-present function to be weak.
Build and reg-tested libgomp on x86_64 with nvidia accelerator, with and without
--disable-cuda-driver, in combination with a trigger patch that adds a
non-existing function foo to cuda-lib.def:
...
CUDA_ONE_CALL_MAYBE_NULL (foo)
...
and declares it in plugin-nvptx.c:
...
CUresult foo (void);
...
and then uses it in nvptx_init after the init_cuda_lib call:
...
if (CUDA_CALL_EXISTS (foo))
CUDA_CALL (foo);
...
Also build and reg-tested on x86_64 with nvidia accelerator, with and without
--disable-cuda-driver, in combination with a trigger patch that replaces all
CUDA_ONE_CALLs in cuda-lib.def with CUDA_ONE_CALL_MAYBE_NULL, and guards two
CUDA_CALLs with CUDA_CALL_EXISTS, one for a regular fn, and one for a fn that is
a define in cuda/cuda.h.
2018-08-07 Tom de Vries <tdevries@suse.de>
* plugin/plugin-nvptx.c (DO_PRAGMA): Define.
(struct cuda_lib_s): Add def/undef of CUDA_ONE_CALL_MAYBE_NULL.
(init_cuda_lib): Add new param to CUDA_ONE_CALL_1. Add arg to
corresponding call in CUDA_ONE_CALL. Add def/undef of
CUDA_ONE_CALL_MAYBE_NULL.
(CUDA_CALL_EXISTS): Define.
From-SVN: r263346
This patch makes sure that the lifetimes of the CUDA_ONE_CALL macro (which is
defined twice in plugin-nvptx.c) are minimized, to make it obvious that the
definitions are used only in the lib-cuda.def include.
Build on x86_64 with nvptx accelerator and reg-tested libgomp.
2018-08-07 Tom de Vries <tdevries@suse.de>
* plugin/plugin-nvptx.c (struct cuda_lib_s, init_cuda_lib): Put
CUDA_ONE_CALL defines right before the cuda-lib.def include, and the
corresponding undefs right after.
From-SVN: r263345
* tree-ssa-dom.c (dom_opt_dom_walker::optimize_stmt): Pass down
the vr_values instance to cprop_into_stmt.
(cprop_into_stmt): Pass vr_values instance down to cprop_operand.
(cprop_operand): Also query EVRP to determine if OP is a constant.
From-SVN: r263342
"make selftest-valgrind" shows:
187 bytes in 1 blocks are definitely lost in loss record 567 of 669
at 0x4A081D4: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
by 0x1F08260: xcalloc (xmalloc.c:162)
by 0xB24F32: init_emit() (emit-rtl.c:5843)
by 0xC10080: prepare_function_start() (function.c:4803)
by 0xC10254: init_function_start(tree_node*) (function.c:4877)
by 0x1CDF92A: selftest::test_expansion_to_rtl() (function-tests.c:595)
by 0x1CE007C: selftest::function_tests_c_tests() (function-tests.c:676)
by 0x1E010E7: selftest::run_tests() (selftest-run-tests.c:98)
by 0x1062D1E: toplev::run_self_tests() (toplev.c:2225)
by 0x1062F40: toplev::main(int, char**) (toplev.c:2303)
by 0x1E5B90A: main (main.c:39)
The allocation in question is:
crtl->emit.regno_pointer_align
= XCNEWVEC (unsigned char, crtl->emit.regno_pointer_align_length);
This patch fixes this leak (and makes the output of
"make selftest-valgrind" clean) by calling free_after_compilation at the
end of the selftest in question.
gcc/ChangeLog:
* function-tests.c (selftest::test_expansion_to_rtl): Call
free_after_compilation.
From-SVN: r263339
gcc/ChangeLog:
2018-08-06 Andreas Krebbel <krebbel@linux.ibm.com>
* config/s390/s390.c (s390_loop_unroll_adjust): Prevent small
loops with memory block operations from getting unrolled.
gcc/testsuite/ChangeLog:
2018-08-06 Andreas Krebbel <krebbel@linux.ibm.com>
* gcc.target/s390/nomemloopunroll-1.c: New test.
From-SVN: r263336
The SPU processor is not affected by speculation, so this macro can
safely be defined as speculation_safe_value_not_needed.
gcc/ChangeLog:
PR target/86807
* config/spu/spu.c (TARGET_HAVE_SPECULATION_SAFE_VALUE):
Define to speculation_safe_value_not_needed.
From-SVN: r263335