gcc/libgomp
Cesar Philippidis bd9b3d3d1a [nvptx] Use CUDA driver API to select default runtime launch geometry
The CUDA driver API, starting with version 6.5, offers a set of runtime functions to
calculate several occupancy-related measures, as a replacement for the occupancy
calculator spreadsheet.

This patch adds a heuristic for default runtime launch geometry, based on the
new runtime function cuOccupancyMaxPotentialBlockSize.

Built on x86_64 with nvptx accelerator and ran the libgomp testsuite.

2018-08-13  Cesar Philippidis  <cesar@codesourcery.com>
	    Tom de Vries  <tdevries@suse.de>

	PR target/85590
	* plugin/cuda/cuda.h (CUoccupancyB2DSize): New typedef.
	(cuOccupancyMaxPotentialBlockSize): Declare.
	* plugin/cuda-lib.def (cuOccupancyMaxPotentialBlockSize): New
	CUDA_ONE_CALL_MAYBE_NULL.
	* plugin/plugin-nvptx.c (CUDA_VERSION < 6050): Define
	CUoccupancyB2DSize and declare
	cuOccupancyMaxPotentialBlockSize.
	(nvptx_exec): Use cuOccupancyMaxPotentialBlockSize to set the
	default num_gangs and num_workers when the driver supports it.
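
The heuristic described above, together with the "when the driver supports it" fallback, might be sketched roughly as follows. This is an illustration only, not the code from plugin-nvptx.c: the `occupancy_fn` type, `choose_default_dims`, `fake_occupancy`, and the fallback constants are all hypothetical names and values, and the real cuOccupancyMaxPotentialBlockSize takes a CUfunction plus further arguments that are left out here. The mapping from block size to workers (block size divided by the warp size) is likewise an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the optional driver entry point.  It
   returns 0 on success and stores a suggested grid size and block
   size, loosely modelled on cuOccupancyMaxPotentialBlockSize.  */
typedef int (*occupancy_fn) (int *min_grid_size, int *block_size);

struct launch_dims
{
  int gangs;
  int workers;
  int vectors;
};

/* Assumed values for illustration only.  */
enum { FALLBACK_GANGS = 32, FALLBACK_WORKERS = 8, WARP_SIZE = 32 };

/* Pick default launch geometry.  OCCUPANCY may be NULL, which models
   a driver too old to provide the occupancy call; in that case, or if
   the call fails, the static fallback values are kept.  */
static struct launch_dims
choose_default_dims (occupancy_fn occupancy)
{
  struct launch_dims d = { FALLBACK_GANGS, FALLBACK_WORKERS, WARP_SIZE };
  int grids = 0, blocks = 0;

  if (occupancy == NULL)
    return d;

  if (occupancy (&grids, &blocks) != 0 || grids <= 0 || blocks < WARP_SIZE)
    return d;

  /* One gang per suggested grid block; split each block into
     warp-sized vectors, one worker per warp.  */
  d.gangs = grids;
  d.workers = blocks / WARP_SIZE;
  assert (d.workers >= 1);
  return d;
}

/* Stub driver query used to exercise the sketch without a GPU.  */
static int
fake_occupancy (int *min_grid_size, int *block_size)
{
  *min_grid_size = 40;
  *block_size = 512;
  return 0;
}
```

Calling `choose_default_dims (NULL)` keeps the fallback geometry, while `choose_default_dims (fake_occupancy)` derives gangs and workers from the values the stub reports.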

Co-Authored-By: Tom de Vries <tdevries@suse.de>

From-SVN: r263505
2018-08-13 12:04:24 +00:00
config [libgomp] Truncate config/nvptx/oacc-parallel.c 2018-08-01 13:01:45 -07:00
plugin [nvptx] Use CUDA driver API to select default runtime launch geometry 2018-08-13 12:04:24 +00:00
testsuite [nvptx] Ignore c++ exceptions 2018-08-02 15:59:01 +00:00
acinclude.m4
aclocal.m4
affinity.c
alloc.c
atomic.c
barrier.c
ChangeLog [nvptx] Use CUDA driver API to select default runtime launch geometry 2018-08-13 12:04:24 +00:00
ChangeLog.graphite
config.h.in
configure [libgomp, nvptx, --without-cuda-driver] Don't use system cuda driver 2018-08-04 20:07:22 +00:00
configure.ac
configure.tgt
critical.c
env.c
error.c
fortran.c
hashtab.h
icv-device.c
icv.c
iter_ull.c
iter.c
libgomp_f.h.in
libgomp_g.h
libgomp-plugin.c
libgomp-plugin.h
libgomp.h
libgomp.map
libgomp.spec.in
libgomp.texi libgomp.texi (Top): Move www.openmp.org to https. 2018-06-24 20:38:14 +00:00
lock.c
loop_ull.c
loop.c
Makefile.am
Makefile.in
oacc-async.c
oacc-cuda.c
oacc-host.c
oacc-init.c
oacc-int.h
oacc-mem.c
oacc-parallel.c
oacc-plugin.c
oacc-plugin.h
omp_lib.f90.in
omp_lib.h.in
omp.h.in
openacc_lib.h
openacc.f90
openacc.h
ordered.c
parallel.c
priority_queue.c
priority_queue.h
sections.c
secure_getenv.h
single.c
splay-tree.c
splay-tree.h
target.c
task.c
taskloop.c
team.c
work.c