bd9b3d3d1a
Starting with version 6.5, the CUDA driver API offers a set of runtime
functions to calculate several occupancy-related measures, as a replacement
for the occupancy calculator spreadsheet.

This patch adds a heuristic for the default runtime launch geometry, based on
the new runtime function cuOccupancyMaxPotentialBlockSize.

Built on x86_64 with nvptx accelerator and ran the libgomp testsuite.

2018-08-13  Cesar Philippidis  <cesar@codesourcery.com>
	    Tom de Vries  <tdevries@suse.de>

	PR target/85590
	* plugin/cuda/cuda.h (CUoccupancyB2DSize): New typedef.
	(cuOccupancyMaxPotentialBlockSize): Declare.
	* plugin/cuda-lib.def (cuOccupancyMaxPotentialBlockSize): New
	CUDA_ONE_CALL_MAYBE_NULL.
	* plugin/plugin-nvptx.c (CUDA_VERSION < 6050): Define
	CUoccupancyB2DSize and declare cuOccupancyMaxPotentialBlockSize.
	(nvptx_exec): Use cuOccupancyMaxPotentialBlockSize to set the
	default num_gangs and num_workers when the driver supports it.

Co-Authored-By: Tom de Vries <tdevries@suse.de>

From-SVN: r263505
50 lines
1.6 KiB
Modula-2
CUDA_ONE_CALL (cuCtxCreate)
CUDA_ONE_CALL (cuCtxDestroy)
CUDA_ONE_CALL (cuCtxGetCurrent)
CUDA_ONE_CALL (cuCtxGetDevice)
CUDA_ONE_CALL (cuCtxPopCurrent)
CUDA_ONE_CALL (cuCtxPushCurrent)
CUDA_ONE_CALL (cuCtxSynchronize)
CUDA_ONE_CALL (cuDeviceGet)
CUDA_ONE_CALL (cuDeviceGetAttribute)
CUDA_ONE_CALL (cuDeviceGetCount)
CUDA_ONE_CALL (cuEventCreate)
CUDA_ONE_CALL (cuEventDestroy)
CUDA_ONE_CALL (cuEventElapsedTime)
CUDA_ONE_CALL (cuEventQuery)
CUDA_ONE_CALL (cuEventRecord)
CUDA_ONE_CALL (cuEventSynchronize)
CUDA_ONE_CALL (cuFuncGetAttribute)
CUDA_ONE_CALL_MAYBE_NULL (cuGetErrorString)
CUDA_ONE_CALL (cuInit)
CUDA_ONE_CALL (cuLaunchKernel)
CUDA_ONE_CALL (cuLinkAddData)
CUDA_ONE_CALL_MAYBE_NULL (cuLinkAddData_v2)
CUDA_ONE_CALL (cuLinkComplete)
CUDA_ONE_CALL (cuLinkCreate)
CUDA_ONE_CALL_MAYBE_NULL (cuLinkCreate_v2)
CUDA_ONE_CALL (cuLinkDestroy)
CUDA_ONE_CALL (cuMemAlloc)
CUDA_ONE_CALL (cuMemAllocHost)
CUDA_ONE_CALL (cuMemcpy)
CUDA_ONE_CALL (cuMemcpyDtoDAsync)
CUDA_ONE_CALL (cuMemcpyDtoH)
CUDA_ONE_CALL (cuMemcpyDtoHAsync)
CUDA_ONE_CALL (cuMemcpyHtoD)
CUDA_ONE_CALL (cuMemcpyHtoDAsync)
CUDA_ONE_CALL (cuMemFree)
CUDA_ONE_CALL (cuMemFreeHost)
CUDA_ONE_CALL (cuMemGetAddressRange)
CUDA_ONE_CALL (cuMemHostGetDevicePointer)
CUDA_ONE_CALL (cuModuleGetFunction)
CUDA_ONE_CALL (cuModuleGetGlobal)
CUDA_ONE_CALL (cuModuleLoad)
CUDA_ONE_CALL (cuModuleLoadData)
CUDA_ONE_CALL (cuModuleUnload)
CUDA_ONE_CALL_MAYBE_NULL (cuOccupancyMaxPotentialBlockSize)
CUDA_ONE_CALL (cuStreamCreate)
CUDA_ONE_CALL (cuStreamDestroy)
CUDA_ONE_CALL (cuStreamQuery)
CUDA_ONE_CALL (cuStreamSynchronize)
CUDA_ONE_CALL (cuStreamWaitEvent)
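
The listing above is an X-macro table: it only has meaning once an including file defines CUDA_ONE_CALL and CUDA_ONE_CALL_MAYBE_NULL. The following is a minimal sketch of how such a table might be consumed, not the actual plugin-nvptx.c code. It assumes a symbol-lookup callback (hypothetical, standing in for something like dlsym), inlines a four-entry excerpt of the table as DEMO_CUDA_CALLS so the example fits in one file, and uses a generic function-pointer type instead of the real CUDA prototypes. All demo_* names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the .def file: a short excerpt of its entries.  */
#define DEMO_CUDA_CALLS \
  CUDA_ONE_CALL (cuInit) \
  CUDA_ONE_CALL (cuLaunchKernel) \
  CUDA_ONE_CALL_MAYBE_NULL (cuGetErrorString) \
  CUDA_ONE_CALL_MAYBE_NULL (cuOccupancyMaxPotentialBlockSize)

/* Generic pointer type; the real entries have distinct prototypes.  */
typedef void (*demo_fn) (void);

/* Assumed lookup callback: returns NULL when the symbol is missing.  */
typedef demo_fn (*demo_lookup) (const char *name);

/* First expansion: one function-pointer field per table entry.  */
#define CUDA_ONE_CALL(call) demo_fn call;
#define CUDA_ONE_CALL_MAYBE_NULL(call) demo_fn call;
struct demo_cuda_lib { DEMO_CUDA_CALLS };
#undef CUDA_ONE_CALL
#undef CUDA_ONE_CALL_MAYBE_NULL

/* Second expansion: resolve every entry by name.  A missing CUDA_ONE_CALL
   symbol fails the load; a missing CUDA_ONE_CALL_MAYBE_NULL symbol is left
   as NULL for the caller to test.  Returns 1 on success, 0 on failure.  */
static int
demo_load (struct demo_cuda_lib *lib, demo_lookup lookup)
{
#define CUDA_ONE_CALL(call) \
  lib->call = lookup (#call); \
  if (lib->call == NULL) \
    return 0;
#define CUDA_ONE_CALL_MAYBE_NULL(call) \
  lib->call = lookup (#call);
  DEMO_CUDA_CALLS
#undef CUDA_ONE_CALL
#undef CUDA_ONE_CALL_MAYBE_NULL
  return 1;
}

static void demo_stub (void) {}

/* Simulates a driver that predates the occupancy API.  */
static demo_fn
demo_old_driver_lookup (const char *name)
{
  if (strcmp (name, "cuOccupancyMaxPotentialBlockSize") == 0)
    return NULL;
  return demo_stub;
}

/* Simulates a driver that lacks a mandatory entry point.  */
static demo_fn
demo_broken_lookup (const char *name)
{
  if (strcmp (name, "cuLaunchKernel") == 0)
    return NULL;
  return demo_stub;
}

/* Exercises both cases; aborts via assert if the sketch misbehaves.  */
static void
demo_self_check (void)
{
  struct demo_cuda_lib lib;
  assert (demo_load (&lib, demo_old_driver_lookup) == 1);
  assert (lib.cuOccupancyMaxPotentialBlockSize == NULL);
  assert (demo_load (&lib, demo_broken_lookup) == 0);
}
```

With this arrangement a caller can test whether the optional pointer is NULL before using it and fall back to fixed defaults otherwise, which matches the "when the driver supports it" wording in the ChangeLog entry for nvptx_exec.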