gcc/libgomp/plugin/cuda-lib.def
Cesar Philippidis bd9b3d3d1a [nvptx] Use CUDA driver API to select default runtime launch geometry
Starting with version 6.5, the CUDA driver API offers a set of runtime
functions to calculate several occupancy-related measures, as a replacement
for the occupancy calculator spreadsheet.

This patch adds a heuristic for default runtime launch geometry, based on the
new runtime function cuOccupancyMaxPotentialBlockSize.

Built on x86_64 with nvptx accelerator and ran libgomp testsuite.

2018-08-13  Cesar Philippidis  <cesar@codesourcery.com>
	    Tom de Vries  <tdevries@suse.de>

	PR target/85590
	* plugin/cuda/cuda.h (CUoccupancyB2DSize): New typedef.
	(cuOccupancyMaxPotentialBlockSize): Declare.
	* plugin/cuda-lib.def (cuOccupancyMaxPotentialBlockSize): New
	CUDA_ONE_CALL_MAYBE_NULL.
	* plugin/plugin-nvptx.c (CUDA_VERSION < 6050): Define
	CUoccupancyB2DSize and declare
	cuOccupancyMaxPotentialBlockSize.
	(nvptx_exec): Use cuOccupancyMaxPotentialBlockSize to set the
	default num_gangs and num_workers when the driver supports it.

Co-Authored-By: Tom de Vries <tdevries@suse.de>

From-SVN: r263505
2018-08-13 12:04:24 +00:00
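
The commit message describes choosing default num_gangs and num_workers from cuOccupancyMaxPotentialBlockSize when the driver provides it. The following is a minimal sketch of that idea, not the code in nvptx_exec. All `demo_` names are hypothetical stand-ins, the real types come from cuda.h, and the mapping from the suggested grid and block sizes to gangs and workers is an assumption made for illustration. Only the parameter order of the function pointer follows the driver API.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the driver's types (the real ones are in cuda.h).  */
typedef int demo_CUresult;                  /* 0 plays the role of CUDA_SUCCESS */
typedef void *demo_CUfunction;
typedef size_t (*demo_B2DSize) (int block_size);

/* Same parameter order as cuOccupancyMaxPotentialBlockSize.  */
typedef demo_CUresult (*demo_occupancy_fn) (int *min_grid_size,
                                            int *block_size,
                                            demo_CUfunction func,
                                            demo_B2DSize smem_callback,
                                            size_t dynamic_smem_size,
                                            int block_size_limit);

struct demo_geometry
{
  int gangs;
  int workers;
};

/* Assumed heuristic: one gang per suggested grid block, and as many workers
   as fit in the suggested block size for the given vector length.  When the
   entry point is missing (NULL) or fails, the caller's fallback is kept.  */
struct demo_geometry
demo_default_geometry (demo_occupancy_fn occupancy, demo_CUfunction kernel,
                       int vector_length, struct demo_geometry fallback)
{
  int grids = 0, blocks = 0;
  struct demo_geometry g;

  if (occupancy == NULL || vector_length <= 0)
    return fallback;
  if (occupancy (&grids, &blocks, kernel, NULL, 0, 0) != 0)
    return fallback;

  g.gangs = grids;
  g.workers = blocks / vector_length;
  if (g.workers < 1)
    g.workers = 1;
  return g;
}

/* Fake driver entry point for experiments without a GPU: always suggests
   40 grid blocks of 1024 threads.  */
demo_CUresult
demo_fake_occupancy (int *min_grid_size, int *block_size,
                     demo_CUfunction func, demo_B2DSize smem_callback,
                     size_t dynamic_smem_size, int block_size_limit)
{
  (void) func;
  (void) smem_callback;
  (void) dynamic_smem_size;
  (void) block_size_limit;
  *min_grid_size = 40;
  *block_size = 1024;
  return 0;
}
```

Calling `demo_default_geometry (demo_fake_occupancy, NULL, 32, fallback)` yields 40 gangs and 32 workers under these assumptions, while passing NULL as the first argument returns the fallback unchanged, which mirrors the "when the driver supports it" condition in the ChangeLog entry.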


CUDA_ONE_CALL (cuCtxCreate)
CUDA_ONE_CALL (cuCtxDestroy)
CUDA_ONE_CALL (cuCtxGetCurrent)
CUDA_ONE_CALL (cuCtxGetDevice)
CUDA_ONE_CALL (cuCtxPopCurrent)
CUDA_ONE_CALL (cuCtxPushCurrent)
CUDA_ONE_CALL (cuCtxSynchronize)
CUDA_ONE_CALL (cuDeviceGet)
CUDA_ONE_CALL (cuDeviceGetAttribute)
CUDA_ONE_CALL (cuDeviceGetCount)
CUDA_ONE_CALL (cuEventCreate)
CUDA_ONE_CALL (cuEventDestroy)
CUDA_ONE_CALL (cuEventElapsedTime)
CUDA_ONE_CALL (cuEventQuery)
CUDA_ONE_CALL (cuEventRecord)
CUDA_ONE_CALL (cuEventSynchronize)
CUDA_ONE_CALL (cuFuncGetAttribute)
CUDA_ONE_CALL_MAYBE_NULL (cuGetErrorString)
CUDA_ONE_CALL (cuInit)
CUDA_ONE_CALL (cuLaunchKernel)
CUDA_ONE_CALL (cuLinkAddData)
CUDA_ONE_CALL_MAYBE_NULL (cuLinkAddData_v2)
CUDA_ONE_CALL (cuLinkComplete)
CUDA_ONE_CALL (cuLinkCreate)
CUDA_ONE_CALL_MAYBE_NULL (cuLinkCreate_v2)
CUDA_ONE_CALL (cuLinkDestroy)
CUDA_ONE_CALL (cuMemAlloc)
CUDA_ONE_CALL (cuMemAllocHost)
CUDA_ONE_CALL (cuMemcpy)
CUDA_ONE_CALL (cuMemcpyDtoDAsync)
CUDA_ONE_CALL (cuMemcpyDtoH)
CUDA_ONE_CALL (cuMemcpyDtoHAsync)
CUDA_ONE_CALL (cuMemcpyHtoD)
CUDA_ONE_CALL (cuMemcpyHtoDAsync)
CUDA_ONE_CALL (cuMemFree)
CUDA_ONE_CALL (cuMemFreeHost)
CUDA_ONE_CALL (cuMemGetAddressRange)
CUDA_ONE_CALL (cuMemHostGetDevicePointer)
CUDA_ONE_CALL (cuModuleGetFunction)
CUDA_ONE_CALL (cuModuleGetGlobal)
CUDA_ONE_CALL (cuModuleLoad)
CUDA_ONE_CALL (cuModuleLoadData)
CUDA_ONE_CALL (cuModuleUnload)
CUDA_ONE_CALL_MAYBE_NULL (cuOccupancyMaxPotentialBlockSize)
CUDA_ONE_CALL (cuStreamCreate)
CUDA_ONE_CALL (cuStreamDestroy)
CUDA_ONE_CALL (cuStreamQuery)
CUDA_ONE_CALL (cuStreamSynchronize)
CUDA_ONE_CALL (cuStreamWaitEvent)
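
A list like the one above is typically consumed with the X-macro technique: the including file defines CUDA_ONE_CALL and CUDA_ONE_CALL_MAYBE_NULL, includes the .def file, and then redefines the macros for a second expansion. The sketch below illustrates this under stated assumptions. It uses a three-entry stand-in macro instead of including cuda-lib.def, a generic function-pointer type instead of the real prototypes, and a hypothetical `demo_lookup` in place of a dynamic symbol lookup. None of the `demo_` names come from libgomp.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for cuda-lib.def: a hypothetical three-entry list.  A real user
   would #include the .def file at each expansion point instead.  */
#define DEMO_CUDA_LIB_DEF \
  CUDA_ONE_CALL (cuInit) \
  CUDA_ONE_CALL (cuMemAlloc) \
  CUDA_ONE_CALL_MAYBE_NULL (cuOccupancyMaxPotentialBlockSize)

/* First expansion: one function-pointer member per listed entry point.  */
typedef void (*demo_fn) (void);
#define CUDA_ONE_CALL(call) demo_fn call;
#define CUDA_ONE_CALL_MAYBE_NULL(call) CUDA_ONE_CALL (call)
struct demo_cuda_lib
{
  DEMO_CUDA_LIB_DEF
};
#undef CUDA_ONE_CALL
#undef CUDA_ONE_CALL_MAYBE_NULL

static void
demo_stub (void)
{
}

/* Hypothetical symbol lookup.  With new_driver == false it behaves like a
   driver older than 6.5 and does not know the occupancy entry point.  */
demo_fn
demo_lookup (const char *name, bool new_driver)
{
  if (!new_driver && strcmp (name, "cuOccupancyMaxPotentialBlockSize") == 0)
    return NULL;
  return demo_stub;
}

/* Second expansion: resolve every entry.  A missing CUDA_ONE_CALL symbol is
   fatal, a missing CUDA_ONE_CALL_MAYBE_NULL symbol is left as NULL.  */
#define CUDA_ONE_CALL(call) CUDA_ONE_CALL_1 (call, false)
#define CUDA_ONE_CALL_MAYBE_NULL(call) CUDA_ONE_CALL_1 (call, true)
#define CUDA_ONE_CALL_1(call, allow_null) \
  lib->call = demo_lookup (#call, new_driver); \
  if (!(allow_null) && lib->call == NULL) \
    return false;

bool
demo_init_cuda_lib (struct demo_cuda_lib *lib, bool new_driver)
{
  DEMO_CUDA_LIB_DEF
  return true;
}
#undef CUDA_ONE_CALL_1
#undef CUDA_ONE_CALL
#undef CUDA_ONE_CALL_MAYBE_NULL
```

With this arrangement, adding an entry point means adding one line to the list. Callers of an optional entry such as cuOccupancyMaxPotentialBlockSize are expected to check the pointer for NULL before use.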