The CUDA driver API, starting with version 6.5, offers a set of runtime
functions to calculate several occupancy-related measures, as a replacement for
the occupancy calculator spreadsheet.
This patch adds a heuristic for default runtime launch geometry, based on the
new runtime function cuOccupancyMaxPotentialBlockSize.
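
As a rough illustration of the idea (not the code in plugin-nvptx.c), the
sketch below picks default gang and worker counts from an occupancy query when
one is available and falls back to fixed values otherwise. The occupancy_fn
type, launch_dims, the FALLBACK_* values, and fake_query are hypothetical names
for this example only; the real cuOccupancyMaxPotentialBlockSize takes
additional arguments (the CUfunction, a CUoccupancyB2DSize callback, a dynamic
shared memory size, and a block size limit).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the driver entry point.  Returns 0
   on success and stores a suggested grid size and block size.  */
typedef int (*occupancy_fn) (int *min_grid_size, int *block_size);

struct launch_dims
{
  int gangs;
  int workers;
  int vectors;
};

/* Assumed fallback defaults, used when the driver lacks the occupancy API.  */
enum { FALLBACK_GANGS = 32, FALLBACK_WORKERS = 8 };

/* Choose default launch dimensions.  QUERY may be NULL, which models a
   driver older than 6.5 where the symbol could not be resolved.  */
static struct launch_dims
default_launch_dims (occupancy_fn query, int vector_length)
{
  struct launch_dims d = { FALLBACK_GANGS, FALLBACK_WORKERS, vector_length };
  int grid = 0, block = 0;

  assert (vector_length > 0);

  if (query != NULL
      && query (&grid, &block) == 0
      && grid > 0
      && block >= vector_length)
    {
      /* One gang per suggested block; split each block into workers of
         VECTOR_LENGTH threads.  */
      d.gangs = grid;
      d.workers = block / vector_length;
    }
  return d;
}

/* Fake query for demonstration: suggests 40 blocks of 1024 threads.  */
static int
fake_query (int *min_grid_size, int *block_size)
{
  *min_grid_size = 40;
  *block_size = 1024;
  return 0;
}
```

Calling default_launch_dims (fake_query, 32) uses the suggested values, while
default_launch_dims (NULL, 32) keeps the fallback values.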
Built on x86_64 with nvptx accelerator and ran the libgomp testsuite.
2018-08-13 Cesar Philippidis <cesar@codesourcery.com>
Tom de Vries <tdevries@suse.de>
PR target/85590
* plugin/cuda/cuda.h (CUoccupancyB2DSize): New typedef.
(cuOccupancyMaxPotentialBlockSize): Declare.
* plugin/cuda-lib.def (cuOccupancyMaxPotentialBlockSize): New
CUDA_ONE_CALL_MAYBE_NULL.
* plugin/plugin-nvptx.c (CUDA_VERSION < 6050): Define
CUoccupancyB2DSize and declare
cuOccupancyMaxPotentialBlockSize.
(nvptx_exec): Use cuOccupancyMaxPotentialBlockSize to set the
default num_gangs and num_workers when the driver supports it.
Co-Authored-By: Tom de Vries <tdevries@suse.de>
From-SVN: r263505
* plugin/configfrag.ac: For --without-cuda-driver don't initialize
CUDA_DRIVER_INCLUDE nor CUDA_DRIVER_LIB. If both
CUDA_DRIVER_INCLUDE and CUDA_DRIVER_LIB are empty and linking a small
CUDA program fails, define PLUGIN_NVPTX_DYNAMIC to 1 and use
plugin/include/cuda as include dir and -ldl instead of -lcuda as
library to link ptx plugin against.
* plugin/plugin-nvptx.c: Include dlfcn.h if PLUGIN_NVPTX_DYNAMIC.
(CUDA_CALLS): Define.
(cuda_lib, cuda_lib_inited): New variables.
(init_cuda_lib): New function.
(CUDA_CALL_PREFIX): Define.
(CUDA_CALL_ERET, CUDA_CALL_ASSERT): Use CUDA_CALL_PREFIX.
(CUDA_CALL): Use FN instead of (FN).
(CUDA_CALL_NOCHECK): Define.
(cuda_error, fini_streams_for_device, select_stream_for_async,
nvptx_attach_host_thread_to_device, nvptx_open_device, link_ptx,
event_gc, nvptx_exec, nvptx_async_test, nvptx_async_test_all,
nvptx_wait_all, nvptx_set_clocktick, GOMP_OFFLOAD_unload_image,
nvptx_stacks_alloc, nvptx_stacks_free, GOMP_OFFLOAD_run): Use
CUDA_CALL_NOCHECK.
(nvptx_init): Call init_cuda_lib, if it fails, return false. Use
CUDA_CALL_NOCHECK.
(nvptx_get_num_devices): Call init_cuda_lib, if it fails, return 0.
Use CUDA_CALL_NOCHECK.
* plugin/cuda/cuda.h: New file.
* config.h.in: Regenerated.
* configure: Regenerated.
From-SVN: r244522
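
The dynamic-loading scheme described above (a call list macro, a table of
function pointers, and a one-time init function using dlfcn.h) can be sketched
roughly as follows. This is an illustration under stated assumptions, not the
plugin's code: DEMO_CALLS, demo_lib, and init_demo_lib are hypothetical names,
and the example resolves two C library functions from the running program as a
stand-in for the CUDA driver entry points so that it runs without a GPU.

```c
#include <dlfcn.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical call list; each entry names a function of type int (int).  */
#define DEMO_CALLS \
  DEMO_ONE_CALL (toupper) \
  DEMO_ONE_CALL (tolower)

/* Table of function pointers, one member per entry in DEMO_CALLS.  */
struct demo_lib_table
{
#define DEMO_ONE_CALL(name) int (*fn_##name) (int);
  DEMO_CALLS
#undef DEMO_ONE_CALL
};

static struct demo_lib_table demo_lib;

/* -1 if init has not been attempted, 0 if it failed, 1 if it succeeded.  */
static int demo_lib_inited = -1;

/* Open SONAME (NULL means the running program and its dependencies) and
   resolve every entry in DEMO_CALLS.  The result is cached, so later calls
   return the outcome of the first attempt.  */
static bool
init_demo_lib (const char *soname)
{
  if (demo_lib_inited != -1)
    return demo_lib_inited;
  demo_lib_inited = false;

  void *handle = dlopen (soname, RTLD_LAZY);
  if (handle == NULL)
    return false;

#define DEMO_ONE_CALL(name) \
  demo_lib.fn_##name = (int (*) (int)) dlsym (handle, #name); \
  if (demo_lib.fn_##name == NULL) \
    { \
      dlclose (handle); \
      return false; \
    }
  DEMO_CALLS
#undef DEMO_ONE_CALL

  demo_lib_inited = true;
  return true;
}
```

After a successful init_demo_lib (NULL), calls go through the table, for
example demo_lib.fn_toupper ('a'). Callers that get false back are expected to
report failure, in the same spirit as nvptx_init returning false and
nvptx_get_num_devices returning 0. Older C libraries may need -ldl at link
time.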